PostgreSQL+Pgpool实现HA主备切换的操作
PostgreSQL流复制实现HA主备切换
环境说明和主机规划
| 操作系统 | 主机名 | 主机 | 角色 | 端口 |
|---|---|---|---|---|
| CentOS7 | master | 10.0.0.11 | PG-Master | 54321 |
| CentOS7 | slave | 10.0.0.12 | PG-Slave | 54321 |
| CentOS7 | pool | 10.0.0.13 | pgpool | 54321 |
基础环境配置(所有主机操作)
配置HOSTS
echo-e"10.0.0.11master\n10.0.0.12slave\n10.0.0.13pool">>/etc/hosts#执行一次即可
配置统一的时间(若已配置,请忽略)
yuminstall-yntpdate&&ntpdatentp1.aliyun.com echo-e"#synctimefromntp1.aliyun.com\n5****/usr/sbin/ntpdatentp1.aliyun.com>/dev/null2>&1 ">>/var/spool/cron/root#写入定时任务,执行一次即可
创建postgres用户
useraddpostgres&&echo"your_password"|passwd--stdinpostgres
配置免密钥登陆
su-postgres ssh-keygen-trsa-f/home/postgres/.ssh/id_rsa-P"" cd~/.ssh/ ssh-copy-idpostgres@master#三台主机执行 scpauthorized_keyspostgres@slave:~/.ssh#只在master主机执行 scpauthorized_keyspostgres@pool:~/.ssh#只在master主机执行
安装Postgresql数据库(PG9.6)
yuminstall-yhttps://download.postgresql.org/pub/repos/yum/9.6/redhat/rhel-7-x86_64/pgdg-centos96-9.6-3.noarch.rpm
yuminstall-ypostgresql96-serverpostgresql96-contribpostgresql96postgresql96-libs
创建统一的目录结构
mkdir/data1/pg_{data,bin,logs}-p
chown-Rpostgres.postgres/data1/
修改系统变量
vi/etc/profile#增加以下内容 exportPGHOME=/usr/pgsql-9.6/ exportPGDATA=/data1/pg_data exportPGPORT=54321 exportPATH=$PATH:$PGHOME/bin #生效 source/etc/profile
PostgreSQL流复制结构(master和slave主机操作)
master主机操作
初始化系统
/usr/pgsql-9.6/bin/postgresql96-setupinitdb
vi/usr/lib/systemd/system/postgresql-9.6.service
修改postgresql-9.6.service
内容如下:
#Includethedefaultconfig: .include/usr/lib/systemd/system/postgresql-9.6.service [Service] Environment=PGDATA=/data1/pg_data
重启PG服务
systemctldaemon-reload su-postgres-c'/usr/pgsql-9.6/bin/initdb-D/data1/pg_data' systemctlrestartpostgresql-9.6 systemctlenablepostgresql-9.6.service
修改系统配置(以下用postgres用户操作)
cp/data1/pg_data/pg_hba.conf{,.bak}
cat>/data1/pg_data/pg_hba.conf<
64G
cp/data1/pg_data/postgresql.conf{,.bak}
cat>/data1/pg_data/postgresql.conf<
128G
listen_addresses='*'
port=54321
max_connections=256
shared_buffers=32GB
effective_cache_size=96GB
work_mem=128MB
maintenance_work_mem=2GB
min_wal_size=2GB
max_wal_size=4GB
checkpoint_completion_target=0.9
wal_buffers=16MB
default_statistics_target=100
wal_level=hot_standby
wal_log_hints=on
max_wal_senders=1
hot_standby=on
logging_collector=on
log_directory='pg_log'
在主库中创建流复制用户(stream_replication)和PGPool用户(srcheck)
CREATEUSERstream_replicationreplicationLOGINCONNECTIONLIMIT5ENCRYPTEDPASSWORD'your_password';
CREATEUSERsrcheckreplicationLOGINCONNECTIONLIMIT5ENCRYPTEDPASSWORD'your_password';
修改主库pg_hba.conf文件(已操作见cat>/data1/pg_data/pg_hba.conf<
hostreplicationstream_replication0.0.0.0/0md5
slave主机操作
初始化系统
/usr/pgsql-9.6/bin/postgresql96-setupinitdb
vi/usr/lib/systemd/system/postgresql-9.6.service
修改postgresql-9.6.service
内容如下:
#Includethedefaultconfig:
.include/usr/lib/systemd/system/postgresql-9.6.service
[Service]
Environment=PGDATA=/data1/pg_data
重启PG服务
systemctldaemon-reload
基础备份复制到备库服务器
rm-rf/data1/pg_data#如果没有重要数据可操作,主要为同步主库路径
su-postgres-c'pg_basebackup-D$PGDATA--format=p-hmaster-p54321-Ustream_replication-W'
修改备库配置信息
cp$PGHOME/share/recovery.conf.sample$PGDATA/recovery.conf
vi$PGDATA/recovery.conf
增加以下内容
standby_mode='on'
primary_conninfo='host=masterport=54321user=stream_replicationpassword=your_password'
restore_command=''
recovery_target_timeline='latest'
#重启PG服务
systemctlrestartpostgresql-9.6
systemctlenablepostgresql-9.6.service
验证
主节点执行
createtabletest(idint4,create_timetimestamp(0)withouttimezone);
insertintotestvalues(1,now());
select*fromtest;
备节点执行
select*fromtest;
其他查询
进入测试数据库test,主库上执行如下命令返回f,备库上返回t。selectpg_is_in_recovery();
执行如下命令查看快照,它返回主库记录点、备库记录点;主库每增加一条写入,记录点的值就会加1。
selecttxid_current_snapshot();
执行如下命令可以查看主备同步状态。
select*frompg_stat_replication;
字段state显示的同步状态有:startup(连接中)、catchup(同步中)、streaming(同步);字段sync_state显示的模式有:async(异步)、sync(同步)、potential(虽然现在是异步模式,但是有可能升级到同步模式)。
主备切换
假设主库崩溃了,备库如何从只读状态切换为读写状态呢?只要把备库的postgresql.conf中hot_standby修改为off,并且删除recovery.conf,然后重启库就可以提供服务了。
PGPool2(pool主机操作)
安装PGPool2
yuminstall-yhttp://www.pgpool.net/yum/rpms/3.6/redhat/rhel-7-x86_64/pgpool-II-release-3.6-1.noarch.rpm
yum-yinstallpgpool-II-pg96pgpool-II-pg96-debuginfopgpool-II-pg96-develpgpool-II-pg96-extensions
systemctlenablepgpool.service#开启自动启动
添加Pgpool-II运行用户
useraddpostgres#环境准备时已操作
chown-Rpostgres.postgres/etc/pgpool-II
chown-Rpostgres.postgres/var/run/pgpool/
配置pool_hba.conf
cp/etc/pgpool-II/pool_hba.conf{,.bak}
vi/etc/pgpool-II/pool_hba.conf
增加内容
hostallall0.0.0.0/0md5
配置pcp.conf
主节点登陆后执行:
postgres=#selectrolname,rolpasswordfrompg_authid;
rolname|rolpassword
--------------------+-------------------------------------
pg_signal_backend|
srcheck|md5662c10f61b27a9ab38ce69157186b25f
postgres|md5d3612d57ee8d4c147cf27b11e3a0974d
stream_replication|md59279ef6b904bc483e4f85e6d44cfc0ed
(4rows)
vi/etc/pgpool-II/pool_passwd
增加SQL执行结果的内容,形式为$rolname:$rolpassword例如:
srcheck:md5662c10f61b27a9ab38ce69157186b25f
或者:
pg_md5-upostgresyour_password
vi/etc/pgpool-II/pcp.conf##加入postgres:上一命令的输出
配置pgpool.conf
cp/etc/pgpool-II/pgpool.conf{,.bak}
vi/etc/pgpool-II/pgpool.conf
内容如下:
#CONNECTIONS
listen_addresses='*'
port=54321
socket_dir='/var/run/pgpool'
pcp_listen_addresses='*'
pcp_port=9898
pcp_socket_dir='/var/run/pgpool'
#-BackendConnectionSettings-
backend_hostname0='master'
backend_port0=54321
backend_weight0=1
backend_data_directory0='/data1/pg_data'
backend_flag0='ALLOW_TO_FAILOVER'
backend_hostname1='slave'
backend_port1=54321
backend_weight1=1
backend_data_directory1='/data1/pg_data'
backend_flag1='ALLOW_TO_FAILOVER'
#-Authentication-
enable_pool_hba=on
pool_passwd='pool_passwd'
#FILELOCATIONS
pid_file_name='/var/run/pgpool/pgpool.pid'
logdir='/data1/pg_logs'
replication_mode=off
load_balance_mode=on
master_slave_mode=on
master_slave_sub_mode='stream'
sr_check_period=5
sr_check_user='srcheck'
sr_check_password='123456'
sr_check_database='postgres'
#HEALTHCHECK健康检查
health_check_period=10
health_check_timeout=20
health_check_user='srcheck'
health_check_password='123456'
health_check_database='postgres'
#FAILOVERANDFAILBACK
failover_command='/data1/pg_bin/failover_stream.sh%H'
failover_stream.sh脚本
vim/data1/pg_bin/failover_stream.sh
chmod777/data1/pg_bin/failover_stream.sh
chmodu+s/sbin/ifconfig
chmodu+s/usr/sbin
pgpool-n-d-D>/data1/pg_logs/pgpool.log2>&1#启动
pgpool-mfaststop##关闭
failover_stream.sh内容:
#!/bin/sh
#Failovercommandforstreamingreplication.
#Arguments:$1:newmasterhostname.
new_master=$1
trigger_command="$PGHOME/bin/pg_ctlpromote-D$PGDATA"
#Promptestandbydatabase.
/usr/bin/ssh-T$new_master$trigger_command
exit0;
登陆设置
当执行pgpool-n-d-D>/data1/pg_logs/pgpool.log2>&1&后可查看集群状态:
[postgres@poolpgpool-II]$psql-p54321-h10.0.0.13-Usrcheck-dpostgres
postgres=#showpool_nodes;
node_id|hostname|port|status|lb_weight|role|select_cnt|load_balance_node|replication_delay
---------+----------+-------+--------+-----------+---------+------------+-------------------+-------------------
0|master|54321|up|0.500000|primary|0|false|0
1|slave|54321|up|0.500000|standby|0|true|0
(2rows)
如果未发现集群状态,请在master和slave主机分别执行以下操作:
[postgres@pool~]$pcp_attach_node-d-Upostgres-hpool-p9898-n0
[postgres@pool~]$pcp_attach_node-d-Upostgres-hpool-p9898-n1
#详情查询命令pcp_attach_node
HA切换
模拟master主机宕机
Master端:
[postgres@master~]$pg_ctlstop
waitingforservertoshutdown....done
serverstopped
当前集群状态
[postgres@pool~]$psql-p54321-h10.0.0.13-Usrcheck-dpostgres
psql(9.6.1)
Type"help"forhelp.
postgres=#showpool_nodes;
node_id|hostname|port|status|lb_weight|role|select_cnt|load_balance_node|replication_delay
---------+----------+------+--------+-----------+---------+------------+-------------------+-------------------
0|master|5432|down|0.500000|standby|0|false|0
1|slave|5432|up|0.500000|primary|0|true|0
(2rows)
发现master已经是standby了,且down机了
修改master,启动
当master主机宕机后,此时slave主机PG数据库成为主库,修改master成为slave的从库即可
[postgres@master~]$vimrecovery.conf
standby_mode='on'
primary_conninfo='host=slaveport=54321user=stream_replicationpassword=your_password'
restore_command=''
recovery_target_timeline='latest'
同步时间线
#如果报时间线冲突落后,先停掉pg服务,然后执行同步时间线,否知直接看状态
[postgres@master~]$pg_rewind--target-pgdata=/data1/pg_data--source-server='host=slaveport=54321user=postgresdbname=postgres'
serversdivergedatWALposition0/5000098ontimeline1
rewindingfromlastcommoncheckpointat0/5000028ontimeline1
Done!
#重新启动数据库
[postgres@master~]$pg_ctlstart
再次查看当前状态
[postgres@pool~]$psql-p54321-h10.0.0.13-Usrcheck-dpostgres
postgres=#showpool_nodes;
node_id|hostname|port|status|lb_weight|role|select_cnt|load_balance_node|replication_delay
---------+----------+------+--------+-----------+---------+------------+-------------------+-------------------
0|master|5432|down|0.500000|standby|0|false|0
1|slave|5432|up|0.500000|primary|0|true|0
(2rows)
#注意虽然master已经启动了,但是还是down,需要手动将master节点添加进pgpool,master的node_id是0,所以-n0
[postgres@pool~]$pcp_attach_node-d-Upostgres-hpool-p54321-n0
#提示输入密码,输入pcp管理密码
#查看当前状态
[postgres@pool~]$psql-p54321-h10.0.0.13-Usrcheck-dpostgres
postgres=#showpool_nodes;
node_id|hostname|port|status|lb_weight|role|select_cnt|load_balance_node|replication_delay
---------+----------+------+--------+-----------+---------+------------+-------------------+-------------------
0|master|5432|up|0.500000|standby|0|false|0
1|slave|5432|up|0.500000|primary|0|true|0
(2rows)
现在两个节点都是up了。
主从两节点pgpool健康检查脚本(pgpool_check.sh)
说明:此脚本是基于PGpool只安装到master和slave两个主机上的情况下使用,在master主机有了pgpool进程后,可在slave主机执行shpgpool_check.sh&即可
#!/bin/bash
#CheckMasterhostpgpool-process
whiletrue
do
pgcount=$(nmap10.0.0.11|egrep'9898|9999'|wc-l)
if[$pgcount-eq2];then
echo'MasterhostpgpoolisGOOD!!!'>/dev/null2>&1
else
echo-e"Masterhostpgpoolis\033[31mBAD!!!\033[0m"
echo-e"Masterhostpgpoolis\033[31mBAD!!!\033[0m"
echo-e"Masterhostpgpoolis\033[31mBAD!!!\033[0m"
echo-e"SYSTEMWILLDOTHESHELL:\033[34msu-postgres-c'pgpool-n-d-D>/data1/pg_logs/pgpool.log2>&1&'\033[0m"
su-postgres-c'pgpool-n-d-D>/data1/pg_logs/pgpool.log2>&1&'
pgport=$(netstat-lntup|egrep'9898|9999'|wc-l)
[$pgport-gt0]&&echo-e"Slavehostpgpoolis\033[32mRUNNING!!!\033[0m"
exit0
fi
done
以上为个人经验,希望能给大家一个参考,也希望大家多多支持毛票票。如有错误或未考虑完全的地方,望不吝赐教。