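【Posted】: 2016-03-01 23:02:31
【Question】:
I am testing a toy PostgreSQL infrastructure on a few local virtual machines to see how pgpool behaves on failover. I have configured a basic setup with two database machines (192.168.0.2 and 192.168.0.3) and one pgpool machine (192.168.0.4). 192.168.0.3 is set up as a slave of 192.168.0.2 using streaming replication. pgpool-II is configured as follows: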
listen_addresses = '*'
backend_hostname0 = '192.168.0.2'
backend_port0 = 5432
backend_weight0 = 1
backend_data_directory0 = '/var/lib/postgresql/9.4/main/'
backend_flag0 = 'ALLOW_TO_FAILOVER'
backend_hostname1 = '192.168.0.3'
backend_port1 = 5432
backend_weight1 = 1
backend_data_directory1 = '/var/lib/postgresql/9.4/main/'
backend_flag1 = 'ALLOW_TO_FAILOVER'
enable_pool_hba = on
replication_mode = false
master_slave_mode = on
master_slave_sub_mode = 'stream'
fail_over_on_backend_error = true
failover_command = '/root/pgpool_failover_stream.sh %d %H /tmp/postgresql.trigger.5432'
load_balance_mode = false
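I have confirmed that all of this works. That is, when I change data on the master database, replication works, and I can connect to the master, the slave, and pgpool-II from a sample application and get the results I expect.
Now, I started a long-running application connected to pgpool, then tried to force a failover by SSHing into the master database server and stopping postgres (service postgresql stop as root). My application continues to execute queries correctly, but no failover occurs (the script has not run). I even tested connecting directly to the master database, and in that case stopping the postgres service does end up crashing the application.
Am I doing this wrong? Have I misconfigured pgpool? Or is there a better way to trigger a failover?
EDIT: As requested, here is the part of the log where the first errors occur: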
...
2016-03-15 18:47:15: pid 1232: DEBUG: initializing backend status
2016-03-15 18:47:15: pid 1231: DEBUG: initializing backend status
2016-03-15 18:47:15: pid 1230: DEBUG: initializing backend status
2016-03-15 18:47:15: pid 1209: ERROR: failed to authenticate
2016-03-15 18:47:15: pid 1209: DETAIL: invalid authentication message response type, Expecting 'R' and received 'E'
2016-03-15 18:47:15: pid 1209: LOG: find_primary_node: checking backend no 1
2016-03-15 18:47:15: pid 1209: ERROR: failed to authenticate
2016-03-15 18:47:15: pid 1209: DETAIL: invalid authentication message response type, Expecting 'R' and received 'E'
2016-03-15 18:47:15: pid 1209: DEBUG: find_primary_node: no primary node found
...
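Strangely, I can still connect to pgpool and execute queries, so clearly there is something I don't understand there.
EDIT 2: These are the errors I get after running service postgresql shutdown on the master. I am showing everything up to the point where pgpool starts shutting down.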
...
2016-03-16 17:24:57: pid 1012: DEBUG: session context: clearing doing extended query messaging. DONE
2016-03-16 17:24:57: pid 1012: DEBUG: session context: setting doing extended query messaging. DONE
2016-03-16 17:24:57: pid 1012: DEBUG: session context: setting query in progress. DONE
2016-03-16 17:24:57: pid 1012: DEBUG: reading backend data packet kind
2016-03-16 17:24:57: pid 1012: DETAIL: backend:0 of 2 kind = 'E'
2016-03-16 17:24:57: pid 1012: DEBUG: processing backend response
2016-03-16 17:24:57: pid 1012: DETAIL: received kind 'E'(45) from backend
2016-03-16 17:24:57: pid 1012: ERROR: unable to forward message to frontend
2016-03-16 17:24:57: pid 1012: DETAIL: FATAL error occured on backend
2016-03-16 17:24:57: pid 1012: DEBUG: session context: setting query in progress. DONE
2016-03-16 17:24:57: pid 1012: DEBUG: decide where to send the queries
2016-03-16 17:24:57: pid 1012: DETAIL: destination = 3 for query= "DISCARD ALL"
2016-03-16 17:24:57: pid 1012: DEBUG: waiting for query response
2016-03-16 17:24:57: pid 1012: DETAIL: waiting for backend:0 to complete the query
2016-03-16 17:24:57: pid 1012: FATAL: unable to read data from DB node 0
2016-03-16 17:24:57: pid 1012: DETAIL: EOF encountered with backend
2016-03-16 17:24:57: pid 998: DEBUG: reaper handler
2016-03-16 17:24:57: pid 998: LOG: child process with pid: 1012 exits with status 256
2016-03-16 17:24:57: pid 998: LOG: fork a new child process with pid: 1033
2016-03-16 17:24:57: pid 998: DEBUG: reaper handler: exiting normally
2016-03-16 17:24:57: pid 1033: DEBUG: initializing backend status
2016-03-16 17:25:02: pid 1031: DEBUG: PCP child receives shutdown request signal 2
2016-03-16 17:25:02: pid 1029: LOG: child process received shutdown request signal 2
...
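Note that my sample application does in fact die when the master is shut down.
EDIT 3: These are the errors I get in the new log after correctly setting sr_check_period, sr_check_user, and sr_check_password. All of the previous errors are now gone: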
2016-03-31 17:45:00: pid 18363: DEBUG: detect error: kind: 1
2016-03-31 17:45:00: pid 18363: DEBUG: reading backend data packet kind
2016-03-31 17:45:00: pid 18363: DETAIL: backend:0 of 2 kind = '1'
...
2016-03-31 17:45:00: pid 18363: DEBUG: detect error: kind: S
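For reference, the three parameters named in EDIT 3 are set in pgpool.conf. The fragment below is only a minimal sketch of their general shape: the interval, user name, and password are placeholder assumptions, not the values used in the setup above.

```
# Streaming replication check settings (placeholder values, assumptions only)
sr_check_period = 10               # seconds between replication checks; 0 disables the check
sr_check_user = 'pgpool_check'     # hypothetical role; it must exist on every backend
sr_check_password = 'change_me'    # hypothetical password for that role
```

Whatever role is chosen also has to be allowed to connect from the pgpool host (192.168.0.4 here) in each backend's pg_hba.conf.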
【Discussion】:
Tags: postgresql failover postgresql-9.4 pgpool