PostgreSQL流复制参数max_wal_senders的用法说明
环境:
PostgreSQL9.2.4
主机:192.25.10.76
从机:192.25.10.71
做postgresql的流复制主从时,会遇到调整max_wal_sengers这个参数,官方文档对这个参数做了一个简要的说明(9.2.4比早先版本多了几句话并做了一些微调),但没有实际的例子。
1.参数说明:
Specifiesthemaximumnumberofconcurrentconnectionsfromstandbyserversorstreamingbasebackupclients(i.e.,themaximumnumberofsimultaneouslyrunningWALsenderprocesses).Thedefaultiszero,meaningreplicationisdisabled.WALsenderprocessescounttowardsthetotalnumberofconnections,sotheparametercannotbesethigherthanmax_connections.Thisparametercanonlybesetatserverstart.wal_levelmustbesettoarchiveorhot_standbytoallowconnectionsfromstandbyservers.
也就是说,这个参数是在主机上设置的,是从机连接到主机的并发连接数之总和,所以这个参数是个正整型。默认值是0,也即默认没有流复制功能。该并发连接数从进程上看,就是各个walsender进程数之和,可以通过ps-ef|grepsenders来查看,所以该值不能超过系统的最大连接数(max_connections,该BUG在9.1.5被修复),可以允许超过实际的流复制用户数。该参数更改需要重启DB,比如我只配了一个从机:
[postgres@ndb2database]$ps-ef|grepsender postgres2125721247020:57?00:00:00postgres:walsenderprocessrepuser192.25.10.71(46161)streaming0/4018ED8 postgres2219320949023:02pts/000:00:00grepsender
2.异常情况
很多时候配置主从的时候会遗漏这个参数,或者没有设置正确(比如实际配的从机数超过设置的连接数),这个时候一般会报错numberofrequestedstandbyconnectionsexceedsmax_wal_senders(currentlyX):备机上显示的日志异常:
2013-08-1220:53:42.132CST,,,8859,,5208dad6.229b,1,,2013-08-1220:53:42CST,,0,FATAL,XX000,"couldnotconnecttotheprimaryserver:FATAL:numberofrequestedstandbyconnectionsexceedsmax_wal_senders(currently0) ",,,,,,,,,"" 2013-08-1220:53:47.137CST,,,8861,,5208dadb.229d,1,,2013-08-1220:53:47CST,,0,FATAL,XX000,"couldnotconnecttotheprimaryserver:FATAL:numberofrequestedstandbyconnectionsexceedsmax_wal_senders(currently0) ",,,,,,,,,"" 2013-08-1220:53:52.142CST,,,8862,,5208dae0.229e,1,,2013-08-1220:53:52CST,,0,FATAL,XX000,"couldnotconnecttotheprimaryserver:FATAL:numberofrequestedstandbyconnectionsexceedsmax_wal_senders(currently0) ",,,,,,,,,"" 2013-08-1220:53:57.148CST,,,8864,,5208dae5.22a0,1,,2013-08-1220:53:57CST,,0,FATAL,XX000,"couldnotconnecttotheprimaryserver:FATAL:numberofrequestedstandbyconnectionsexceedsmax_wal_senders(currently0) ",,,,,,,,,"
主机上显示的日志异常:
receiver" 2013-08-1220:43:26.937CST,,,21064,"",5208d86e.5248,1,"",2013-08-1220:43:26CST,,0,LOG,00000,"connectionreceived:host=192.25.10.71port=46085",,,,,,,,,"" 2013-08-1220:43:26.938CST,"repuser","",21064,"192.25.10.71:46085",5208d86e.5248,2,"authentication",2013-08-1220:43:26CST,2/1195,0,LOG,00000,"replicationconnectionauthorized:user=repuser",,,,,,,,,"" 2013-08-1220:43:26.938CST,"repuser","",21064,"192.25.10.71:46085",5208d86e.5248,3,"startup",2013-08-1220:43:26CST,2/0,0,FATAL,53300,"numberofrequestedstandbyconnectionsexceedsmax_wal_senders(currently0)",,,,,,,,,"walreceiver" 2013-08-1220:43:26.939CST,"repuser","",21064,"192.25.10.71:46085",5208d86e.5248,4,"startup",2013-08-1220:43:26CST,,0,LOG,00000,"disconnection:sessiontime:0:00:00.002user=repuserdatabase=host=192.25.10.71port=46085",,,,,,,,,"wa lreceiver" 2013-08-1220:43:41.513CST,,,21066,"",5208d87d.524a,1,"",2013-08-1220:43:41CST,,0,LOG,00000,"connectionreceived:host=192.25.10.71port=46086",,,,,,,,,"" 2013-08-1220:43:41.514CST,"repuser","",21066,"192.25.10.71:46086",5208d87d.524a,2,"authentication",2013-08-1220:43:41CST,2/1198,0,LOG,00000,"replicationconnectionauthorized:user=repuser",,,,,,,,,"" 2013-08-1220:43:41.514CST,"repuser","",21066,"192.25.10.71:46086",5208d87d.524a,3,"startup",2013-08-1220:43:41CST,2/0,0,FATAL,53300,"numberofrequestedstandbyconnectionsexceedsmax_wal_senders(currently0)",,,,,,,,,"walreceiver" 2013-08-1220:43:41.515CST,"repuser","",21066,"192.25.10.71:46086",5208d87d.524a,4,"startup",2013-08-1220:43:41CST,,0,LOG,00000,"disconnection:sessiontime:0:00:00.002user=repuserdatabase=host=192.25.10.71port=46086",,,,,,,,,"wa lreceiver" ...
遇到如此问题,则需要检查postgresql.conf文件的max_wal_senders参数了。
3.参数生效
文档上说明需要重启DB使之生效,如果动态使之生效会报错如下:
[postgres@proxy1]$psql psql(9.2.4) Type"help"forhelp. postgres=#showmax_wal_senders; max_wal_senders ----------------- 0 (1row) postgres=#setmax_wal_senders=1; ERROR:parameter"max_wal_senders"cannotbechangedwithoutrestartingtheserver postgres=#
补充:postgresql基于流复制(streamingreplication)的warm-standby
实例一枚:
Primary:
l归档设置:
Wal_level=archive Archive_mode=on archive_command='cp-i%p/data/pgsql/archived_wal/%f'
l流复制相关设置:
max_wal_senders='10'#启动复制进程数量限制,必须大于0 max_replication_slots='10'#为使用replicationslot,必须大于0;replicationslot作用是保证wal没有同步到standby之前不能从pg_xlog移走; wal_keep_segments='50'#指定pg_xlog中最少保留的wal数量 selectpg_create_physical_replication_slot(‘gp1_a_slot');#创建replicationslot select*frompg_replication_slots;#查询创建的replicationslot
l编辑pg_hba.conf
#Allowreplicationconnectionsfromlocalhost,byauserwiththereplicationprivilege. #hostreplicationpostgres127.0.0.1/32trust #hostreplicationpostgres::1/128trust localreplicationpostgrestrust hostreplicationpostgres192.168.12.0/24trust
l联机备份过程(基础备份)
#touch/var/lib/pgsql/backup_in_progress $psql–c"selectpg_start_backup('hot_backup');" $tar-cf/var/lib/pgsql/backup.tar/var/lib/pgsql/data/ $psql-c"selectpg_stop_backup();" #rm/var/lib/pgsql/backup_in_progress tar-rf/var/lib/pgsql/backup.tar/var/lib/pgsql/archive/#打包归档
实例:
psql-c"selectpg_start_backup('pgbk10');" tar-zcfpgbk10.tgzdata/ psql-c"selectpg_stop_backup();"
Standby:
l编辑recovery.conf
standby_mode='on' primary_conninfo='host=192.168.12.38port=5666user=postgres' primary_slot_name='gp1_a_slot' #restore_command='cp/data/pgsql/archived_wal/%f%p' #archive_cleanup_command='pg_archivecleanup/data/pgsql/archived_wal%r'
l将primary上的基础备份传输到standby上
$scpprimary:/var/lib/pgsql/backup.tar.
解压备份到standby上的$PGDATA
l启动standby
$pg_ctlstart–D$PGDATA
启动standby后,postgres开始从primary上接收wal日志进行恢复,并且一直保持恢复状态,psql不能登录;
以上为个人经验,希望能给大家一个参考,也希望大家多多支持毛票票。如有错误或未考虑完全的地方,望不吝赐教。