PostgreSQL+Pgpool实现HA主备切换的操作
PostgreSQL流复制实现HA主备切换
环境说明和主机规划
操作系统 | 主机名 | 主机 | 角色 | 端口 |
---|---|---|---|---|
CentOS7 | master | 10.0.0.11 | PG-Master | 54321 |
CentOS7 | slave | 10.0.0.12 | PG-Slave | 54321 |
CentOS7 | pool | 10.0.0.13 | pgpool | 54321 |
基础环境配置(所有主机操作)
配置HOSTS
echo-e"10.0.0.11master\n10.0.0.12slave\n10.0.0.13pool">>/etc/hosts#执行一次即可
配置统一的时间(若已配置,请忽略)
yuminstall-yntpdate&&ntpdatentp1.aliyun.com echo-e"#synctimefromntp1.aliyun.com\n5****/usr/sbin/ntpdatentp1.aliyun.com>/dev/null2>&1 ">>/var/spool/cron/root#写入定时任务,执行一次即可
创建postgres用户
useraddpostgres&&echo"your_password"|passwd--stdinpostgres
配置免密钥登陆
su-postgres ssh-keygen-trsa-f/home/postgres/.ssh/id_rsa-P"" cd~/.ssh/ ssh-copy-idpostgres@master#三台主机执行 scpauthorized_keyspostgres@slave:~/.ssh#只在master主机执行 scpauthorized_keyspostgres@pool:~/.ssh#只在master主机执行
安装Postgresql数据库(PG9.6)
yuminstall-yhttps://download.postgresql.org/pub/repos/yum/9.6/redhat/rhel-7-x86_64/pgdg-centos96-9.6-3.noarch.rpm
yuminstall-ypostgresql96-serverpostgresql96-contribpostgresql96postgresql96-libs
创建统一的目录结构
mkdir/data1/pg_{data,bin,logs}-p
chown-Rpostgres.postgres/data1/
修改系统变量
vi/etc/profile#增加以下内容 exportPGHOME=/usr/pgsql-9.6/ exportPGDATA=/data1/pg_data exportPGPORT=54321 exportPATH=$PATH:$PGHOME/bin #生效 source/etc/profile
PostgreSQL流复制结构(master和slave主机操作)
master主机操作
初始化系统
/usr/pgsql-9.6/bin/postgresql96-setupinitdb
vi/usr/lib/systemd/system/postgresql-9.6.service
修改postgresql-9.6.service
内容如下:
#Includethedefaultconfig: .include/usr/lib/systemd/system/postgresql-9.6.service [Service] Environment=PGDATA=/data1/pg_data
重启PG服务
systemctldaemon-reload su-postgres-c'/usr/pgsql-9.6/bin/initdb-D/data1/pg_data' systemctlrestartpostgresql-9.6 systemctlenablepostgresql-9.6.service
修改系统配置(以下用postgres用户操作)
cp/data1/pg_data/pg_hba.conf{,.bak} cat>/data1/pg_data/pg_hba.conf<64G
cp/data1/pg_data/postgresql.conf{,.bak} cat>/data1/pg_data/postgresql.conf<128G
listen_addresses='*' port=54321 max_connections=256 shared_buffers=32GB effective_cache_size=96GB work_mem=128MB maintenance_work_mem=2GB min_wal_size=2GB max_wal_size=4GB checkpoint_completion_target=0.9 wal_buffers=16MB default_statistics_target=100 wal_level=hot_standby wal_log_hints=on max_wal_senders=1 hot_standby=on logging_collector=on log_directory='pg_log'在主库中创建流复制用户(stream_replication)和PGPool用户(srcheck)
CREATEUSERstream_replicationreplicationLOGINCONNECTIONLIMIT5ENCRYPTEDPASSWORD'your_password';
CREATEUSERsrcheckreplicationLOGINCONNECTIONLIMIT5ENCRYPTEDPASSWORD'your_password';
修改主库pg_hba.conf文件(已操作见cat>/data1/pg_data/pg_hba.conf<
hostreplicationstream_replication0.0.0.0/0md5
slave主机操作
初始化系统
/usr/pgsql-9.6/bin/postgresql96-setupinitdb
vi/usr/lib/systemd/system/postgresql-9.6.service
修改postgresql-9.6.service
内容如下:
#Includethedefaultconfig: .include/usr/lib/systemd/system/postgresql-9.6.service [Service] Environment=PGDATA=/data1/pg_data重启PG服务
systemctldaemon-reload
基础备份复制到备库服务器
rm-rf/data1/pg_data#如果没有重要数据可操作,主要为同步主库路径
su-postgres-c'pg_basebackup-D$PGDATA--format=p-hmaster-p54321-Ustream_replication-W'
修改备库配置信息
cp$PGHOME/share/recovery.conf.sample$PGDATA/recovery.conf
vi$PGDATA/recovery.conf
增加以下内容
standby_mode='on' primary_conninfo='host=masterport=54321user=stream_replicationpassword=your_password' restore_command='' recovery_target_timeline='latest' #重启PG服务 systemctlrestartpostgresql-9.6 systemctlenablepostgresql-9.6.service验证
主节点执行
createtabletest(idint4,create_timetimestamp(0)withouttimezone); insertintotestvalues(1,now()); select*fromtest;备节点执行
select*fromtest;
其他查询
进入测试数据库test,主库上执行如下命令返回f,备库上返回t。selectpg_is_in_recovery();
执行如下命令查看快照,它返回主库记录点、备库记录点;主库每增加一条写入,记录点的值就会加1。
selecttxid_current_snapshot();
执行如下命令可以查看主备同步状态。
select*frompg_stat_replication;
字段state显示的同步状态有:startup(连接中)、catchup(同步中)、streaming(同步);字段sync_state显示的模式有:async(异步)、sync(同步)、potential(虽然现在是异步模式,但是有可能升级到同步模式)。
主备切换
假设主库崩溃了,备库如何从只读状态切换为读写状态呢?只要把备库的postgresql.conf中hot_standby修改为off,并且删除recovery.conf,然后重启库就可以提供服务了。
PGPool2(pool主机操作)
安装PGPool2
yuminstall-yhttp://www.pgpool.net/yum/rpms/3.6/redhat/rhel-7-x86_64/pgpool-II-release-3.6-1.noarch.rpm yum-yinstallpgpool-II-pg96pgpool-II-pg96-debuginfopgpool-II-pg96-develpgpool-II-pg96-extensions systemctlenablepgpool.service#开启自动启动添加Pgpool-II运行用户
useraddpostgres#环境准备时已操作 chown-Rpostgres.postgres/etc/pgpool-II chown-Rpostgres.postgres/var/run/pgpool/配置pool_hba.conf
cp/etc/pgpool-II/pool_hba.conf{,.bak}
vi/etc/pgpool-II/pool_hba.conf
增加内容
hostallall0.0.0.0/0md5
配置pcp.conf
主节点登陆后执行:
postgres=#selectrolname,rolpasswordfrompg_authid; rolname|rolpassword --------------------+------------------------------------- pg_signal_backend| srcheck|md5662c10f61b27a9ab38ce69157186b25f postgres|md5d3612d57ee8d4c147cf27b11e3a0974d stream_replication|md59279ef6b904bc483e4f85e6d44cfc0ed (4rows)vi/etc/pgpool-II/pool_passwd
增加SQL执行结果的内容,形式为$rolname:$rolpassword例如:
srcheck:md5662c10f61b27a9ab38ce69157186b25f
或者:
pg_md5-upostgresyour_password
vi/etc/pgpool-II/pcp.conf##加入postgres:上一命令的输出
配置pgpool.conf
cp/etc/pgpool-II/pgpool.conf{,.bak}
vi/etc/pgpool-II/pgpool.conf
内容如下:
#CONNECTIONS listen_addresses='*' port=54321 socket_dir='/var/run/pgpool' pcp_listen_addresses='*' pcp_port=9898 pcp_socket_dir='/var/run/pgpool' #-BackendConnectionSettings- backend_hostname0='master' backend_port0=54321 backend_weight0=1 backend_data_directory0='/data1/pg_data' backend_flag0='ALLOW_TO_FAILOVER' backend_hostname1='slave' backend_port1=54321 backend_weight1=1 backend_data_directory1='/data1/pg_data' backend_flag1='ALLOW_TO_FAILOVER' #-Authentication- enable_pool_hba=on pool_passwd='pool_passwd' #FILELOCATIONS pid_file_name='/var/run/pgpool/pgpool.pid' logdir='/data1/pg_logs' replication_mode=off load_balance_mode=on master_slave_mode=on master_slave_sub_mode='stream' sr_check_period=5 sr_check_user='srcheck' sr_check_password='123456' sr_check_database='postgres' #HEALTHCHECK健康检查 health_check_period=10 health_check_timeout=20 health_check_user='srcheck' health_check_password='123456' health_check_database='postgres' #FAILOVERANDFAILBACK failover_command='/data1/pg_bin/failover_stream.sh%H'failover_stream.sh脚本
vim/data1/pg_bin/failover_stream.sh chmod777/data1/pg_bin/failover_stream.sh chmodu+s/sbin/ifconfig chmodu+s/usr/sbin pgpool-n-d-D>/data1/pg_logs/pgpool.log2>&1#启动 pgpool-mfaststop##关闭failover_stream.sh内容:
#!/bin/sh #Failovercommandforstreamingreplication. #Arguments:$1:newmasterhostname. new_master=$1 trigger_command="$PGHOME/bin/pg_ctlpromote-D$PGDATA" #Promptestandbydatabase. /usr/bin/ssh-T$new_master$trigger_command exit0;登陆设置
当执行pgpool-n-d-D>/data1/pg_logs/pgpool.log2>&1&后可查看集群状态:
[postgres@poolpgpool-II]$psql-p54321-h10.0.0.13-Usrcheck-dpostgres postgres=#showpool_nodes; node_id|hostname|port|status|lb_weight|role|select_cnt|load_balance_node|replication_delay ---------+----------+-------+--------+-----------+---------+------------+-------------------+------------------- 0|master|54321|up|0.500000|primary|0|false|0 1|slave|54321|up|0.500000|standby|0|true|0 (2rows)如果未发现集群状态,请在master和slave主机分别执行以下操作:
[postgres@pool~]$pcp_attach_node-d-Upostgres-hpool-p9898-n0 [postgres@pool~]$pcp_attach_node-d-Upostgres-hpool-p9898-n1 #详情查询命令pcp_attach_nodeHA切换
模拟master主机宕机
Master端:
[postgres@master~]$pg_ctlstop waitingforservertoshutdown....done serverstopped当前集群状态
[postgres@pool~]$psql-p54321-h10.0.0.13-Usrcheck-dpostgres psql(9.6.1) Type"help"forhelp. postgres=#showpool_nodes; node_id|hostname|port|status|lb_weight|role|select_cnt|load_balance_node|replication_delay ---------+----------+------+--------+-----------+---------+------------+-------------------+------------------- 0|master|5432|down|0.500000|standby|0|false|0 1|slave|5432|up|0.500000|primary|0|true|0 (2rows)发现master已经是standby了,且down机了
修改master,启动
当master主机宕机后,此时slave主机PG数据库成为主库,修改master成为slave的从库即可
[postgres@master~]$vimrecovery.conf standby_mode='on' primary_conninfo='host=slaveport=54321user=stream_replicationpassword=your_password' restore_command='' recovery_target_timeline='latest'同步时间线
#如果报时间线冲突落后,先停掉pg服务,然后执行同步时间线,否知直接看状态 [postgres@master~]$pg_rewind--target-pgdata=/data1/pg_data--source-server='host=slaveport=54321user=postgresdbname=postgres' serversdivergedatWALposition0/5000098ontimeline1 rewindingfromlastcommoncheckpointat0/5000028ontimeline1 Done! #重新启动数据库 [postgres@master~]$pg_ctlstart再次查看当前状态
[postgres@pool~]$psql-p54321-h10.0.0.13-Usrcheck-dpostgres postgres=#showpool_nodes; node_id|hostname|port|status|lb_weight|role|select_cnt|load_balance_node|replication_delay ---------+----------+------+--------+-----------+---------+------------+-------------------+------------------- 0|master|5432|down|0.500000|standby|0|false|0 1|slave|5432|up|0.500000|primary|0|true|0 (2rows) #注意虽然master已经启动了,但是还是down,需要手动将master节点添加进pgpool,master的node_id是0,所以-n0 [postgres@pool~]$pcp_attach_node-d-Upostgres-hpool-p54321-n0 #提示输入密码,输入pcp管理密码 #查看当前状态 [postgres@pool~]$psql-p54321-h10.0.0.13-Usrcheck-dpostgres postgres=#showpool_nodes; node_id|hostname|port|status|lb_weight|role|select_cnt|load_balance_node|replication_delay ---------+----------+------+--------+-----------+---------+------------+-------------------+------------------- 0|master|5432|up|0.500000|standby|0|false|0 1|slave|5432|up|0.500000|primary|0|true|0 (2rows)现在两个节点都是up了。
主从两节点pgpool健康检查脚本(pgpool_check.sh)
说明:此脚本是基于PGpool只安装到master和slave两个主机上的情况下使用,在master主机有了pgpool进程后,可在slave主机执行shpgpool_check.sh&即可
#!/bin/bash #CheckMasterhostpgpool-process whiletrue do pgcount=$(nmap10.0.0.11|egrep'9898|9999'|wc-l) if[$pgcount-eq2];then echo'MasterhostpgpoolisGOOD!!!'>/dev/null2>&1 else echo-e"Masterhostpgpoolis\033[31mBAD!!!\033[0m" echo-e"Masterhostpgpoolis\033[31mBAD!!!\033[0m" echo-e"Masterhostpgpoolis\033[31mBAD!!!\033[0m" echo-e"SYSTEMWILLDOTHESHELL:\033[34msu-postgres-c'pgpool-n-d-D>/data1/pg_logs/pgpool.log2>&1&'\033[0m" su-postgres-c'pgpool-n-d-D>/data1/pg_logs/pgpool.log2>&1&' pgport=$(netstat-lntup|egrep'9898|9999'|wc-l) [$pgport-gt0]&&echo-e"Slavehostpgpoolis\033[32mRUNNING!!!\033[0m" exit0 fi done以上为个人经验,希望能给大家一个参考,也希望大家多多支持毛票票。如有错误或未考虑完全的地方,望不吝赐教。