https://yq.aliyun.com/articles/238882?spm=5176.8067842.tagmain.18.73PjU3

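Summary: MHA failover GTID special topic. This uses masterha_master_switch as the backdrop to walk through the scenarios you may encounter. Assumed environment (classic three-node setup): host_1(host_1:3306) (current master) +--host_2(host_2:3306 slave[candidat...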

This article uses masterha_master_switch as the backdrop to walk through the scenarios you may encounter.

Assumed environment (classic three-node setup)

host_1(host_1:3306) (current master)
 +--host_2(host_2:3306 slave[candidate master])
 +--host_3(host_3:3306 etl)


I. Master: MySQL down

1.1 The etl slave is delayed by 8 hours

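Adding no_check_delay=0 to the configuration file is enough to ignore the error (upstream MHA documents this switch as check_repl_delay=0).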

1.2 The slave (candidate master) lags even further behind than etl

  • 1.2.1 Some of the master's logs have not yet reached either slave when MySQL on the master goes down
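
As a rough illustration of why the lagging candidate still wins in this scenario, the two search steps that MHA prints in the failover log further down ("Searching from candidate_master slaves which have received the latest relay log events", then "Searching from all candidate_master slaves") can be modelled as below. This is a simplified sketch, not MHA's actual Perl implementation: the `Slave` class, its fields, and `pick_new_master` are hypothetical names, and only the two steps visible in the log are modelled.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Slave:
    """Hypothetical, minimal view of a slave as described in the failover log."""
    name: str
    candidate_master: bool = False  # candidate_master is set in the MHA config
    no_master: bool = False         # no_master is set in the MHA config
    is_latest: bool = False         # received the latest relay log events


def pick_new_master(slaves: Sequence[Slave]) -> Optional[Slave]:
    """Simplified model of the two search steps printed in the log."""
    eligible = [s for s in slaves if s.candidate_master and not s.no_master]
    # Step 1: candidate_master slaves that received the latest relay log events.
    for slave in eligible:
        if slave.is_latest:
            return slave
    # Step 2: any candidate_master slave, even one behind the latest slave.
    return eligible[0] if eligible else None


# State taken from the log: host_1 is the candidate, host_3 (etl) is the
# latest slave but is excluded because no_master is set.
slaves = [
    Slave("host_1", candidate_master=True),
    Slave("host_3", no_master=True, is_latest=True),
]
new_master = pick_new_master(slaves)
print(new_master.name if new_master else "no candidate found")
```

With host_3 marked no_master, the first step finds nothing and the second step falls back to host_1, which matches "New master is host_1(host_1:3306)" in the log. The log then shows host_1 replicating from the latest slave host_3 to catch up before it takes over.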
### Simulated scenario: GTID state of the three DBs

* master host_2

dba:lc> show master status;
+---------------------+----------+--------------+------------------+------------------------------------------------------------------------------------------+
| File                | Position | Binlog_Do_DB | Binlog_Ignore_DB | Executed_Gtid_Set                                                                        |
+---------------------+----------+--------------+------------------+------------------------------------------------------------------------------------------+
| host_1.000002 |     2885 |              |                  | 0923e916-3c36-11e6-82a5-ecf4bbf1f518:1-16,
ebd9ff93-c5b2-11e6-b21d-ecf4bbf1f42c:1-446362 |
+---------------------+----------+--------------+------------------+------------------------------------------------------------------------------------------+
1 row in set (0.00 sec)


* slave (candidate master) host_1

           Retrieved_Gtid_Set: ebd9ff93-c5b2-11e6-b21d-ecf4bbf1f42c:446353
            Executed_Gtid_Set: 0923e916-3c36-11e6-82a5-ecf4bbf1f518:1-16,
ebd9ff93-c5b2-11e6-b21d-ecf4bbf1f42c:1-446353
                Auto_Position: 1

* etl (other slave) host_3

           Retrieved_Gtid_Set: 0923e916-3c36-11e6-82a5-ecf4bbf1f518:4-16,
ebd9ff93-c5b2-11e6-b21d-ecf4bbf1f42c:446353-446356
            Executed_Gtid_Set: 0923e916-3c36-11e6-82a5-ecf4bbf1f518:1-16,
ebd9ff93-c5b2-11e6-b21d-ecf4bbf1f42c:1-446356
                Auto_Position: 1
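
To make the gap between the three servers concrete, the Executed_Gtid_Set values above can be compared offline. The following is a minimal sketch, assuming only the plain `uuid:first-last` syntax seen in this output; `parse_gtid_set`, `expand` and `missing` are hypothetical helper names, and on a live server MySQL's built-in GTID_SUBTRACT() does the same job.

```python
from typing import Dict, Iterable, List, Set, Tuple

Interval = Tuple[int, int]  # inclusive (first, last) transaction numbers


def parse_gtid_set(text: str) -> Dict[str, List[Interval]]:
    """Parse 'uuid:1-16,uuid:5:7-9' (newlines allowed) into {uuid: [(1, 16), ...]}."""
    result: Dict[str, List[Interval]] = {}
    for part in text.replace("\n", "").split(","):
        part = part.strip()
        if not part:
            continue
        uuid, *ranges = part.split(":")
        for item in ranges:
            first, _, last = item.partition("-")
            result.setdefault(uuid, []).append((int(first), int(last or first)))
    return result


def expand(intervals: Iterable[Interval]) -> Set[int]:
    """Expand intervals into individual transaction numbers (fine at this size)."""
    return {n for first, last in intervals for n in range(first, last + 1)}


def missing(reference: str, other: str) -> Dict[str, List[int]]:
    """Transactions present in `reference` but absent from `other`, per server UUID."""
    ref, oth = parse_gtid_set(reference), parse_gtid_set(other)
    gaps: Dict[str, List[int]] = {}
    for uuid, intervals in ref.items():
        gap = sorted(expand(intervals) - expand(oth.get(uuid, [])))
        if gap:
            gaps[uuid] = gap
    return gaps


# Executed_Gtid_Set values copied from the output above.
OLD = "0923e916-3c36-11e6-82a5-ecf4bbf1f518:1-16"
SRC = "ebd9ff93-c5b2-11e6-b21d-ecf4bbf1f42c"
master = f"{OLD},\n{SRC}:1-446362"
candidate = f"{OLD},\n{SRC}:1-446353"
etl = f"{OLD},\n{SRC}:1-446356"

for label, ref, other in [
    ("candidate behind master", master, candidate),
    ("etl behind master", master, etl),
    ("candidate behind etl", etl, candidate),
]:
    for uuid, gap in missing(ref, other).items():
        print(f"{label}: {uuid}:{gap[0]}-{gap[-1]} ({len(gap)} transactions)")
```

Against these values the candidate is 9 transactions behind the master (446354-446362) and etl is 6 behind (446357-446362). Three of the candidate's missing transactions (446354-446356) exist on etl; the remaining six exist only on the dead master.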



### Failover log
masterha_master_switch --global_conf=/data/online/agent/MHA/conf/masterha_default.cnf --conf=/data/online/agent/MHA/conf/bak_mha_test.cnf  --dead_master_host=host_2  --dead_master_port=3306 --master_state=dead --interactive=0 --ignore_last_failover --ignore_binlog_server_error

Thu Nov  9 10:43:49 2017 - [info] MHA::MasterFailover version 0.56.
Thu Nov  9 10:43:49 2017 - [info] Starting master failover.
Thu Nov  9 10:43:49 2017 - [info]
Thu Nov  9 10:43:49 2017 - [info] * Phase 1: Configuration Check Phase..
Thu Nov  9 10:43:49 2017 - [info]
Thu Nov  9 10:43:50 2017 - [info] HealthCheck: SSH to host_2 is reachable.
Thu Nov  9 10:43:50 2017 - [info] Binlog server host_2 is reachable.
Thu Nov  9 10:43:50 2017 - [info] HealthCheck: SSH to host_1 is reachable.
Thu Nov  9 10:43:50 2017 - [info] Binlog server host_1 is reachable.
Thu Nov  9 10:43:50 2017 - [info] HealthCheck: SSH to host_3 is reachable.
Thu Nov  9 10:43:50 2017 - [info] Binlog server host_3 is reachable.
Thu Nov  9 10:43:51 2017 - [warning] SQL Thread is stopped(no error) on host_1(host_1:3306)
Thu Nov  9 10:43:51 2017 - [warning] SQL Thread is stopped(no error) on host_3(host_3:3306)
Thu Nov  9 10:43:51 2017 - [info] GTID failover mode = 1
Thu Nov  9 10:43:51 2017 - [info] Dead Servers:
Thu Nov  9 10:43:51 2017 - [info]   host_2(host_2:3306)
Thu Nov  9 10:43:51 2017 - [info] Checking master reachability via MySQL(double check)...
Thu Nov  9 10:43:51 2017 - [info]  ok.
Thu Nov  9 10:43:51 2017 - [info] Alive Servers:
Thu Nov  9 10:43:51 2017 - [info]   host_1(host_1:3306)
Thu Nov  9 10:43:51 2017 - [info]   host_3(host_3:3306)
Thu Nov  9 10:43:51 2017 - [info] Alive Slaves:
Thu Nov  9 10:43:51 2017 - [info]   host_1(host_1:3306)  Version=5.7.13-log (oldest major version between slaves) log-bin:enabled
Thu Nov  9 10:43:51 2017 - [info]     GTID ON
Thu Nov  9 10:43:51 2017 - [info]     Replicating from host_2(host_2:3306)
Thu Nov  9 10:43:51 2017 - [info]     Primary candidate for the new Master (candidate_master is set)
Thu Nov  9 10:43:51 2017 - [info]   host_3(host_3:3306)  Version=5.7.13-log (oldest major version between slaves) log-bin:enabled
Thu Nov  9 10:43:51 2017 - [info]     GTID ON
Thu Nov  9 10:43:51 2017 - [info]     Replicating from host_2(host_2:3306)
Thu Nov  9 10:43:51 2017 - [info]     Not candidate for the new Master (no_master is set)
Thu Nov  9 10:43:51 2017 - [info]  Starting SQL thread on host_1(host_1:3306) ..
Thu Nov  9 10:43:51 2017 - [info]   done.
Thu Nov  9 10:43:51 2017 - [info]  Starting SQL thread on host_3(host_3:3306) ..
Thu Nov  9 10:43:51 2017 - [info]   done.
Thu Nov  9 10:43:51 2017 - [info] Starting GTID based failover.
Thu Nov  9 10:43:51 2017 - [info]
Thu Nov  9 10:43:51 2017 - [info] ** Phase 1: Configuration Check Phase completed.
Thu Nov  9 10:43:51 2017 - [info]
Thu Nov  9 10:43:51 2017 - [info] * Phase 2: Dead Master Shutdown Phase..
Thu Nov  9 10:43:51 2017 - [info]
Thu Nov  9 10:43:51 2017 - [info] HealthCheck: SSH to host_2 is reachable.
Thu Nov  9 10:43:51 2017 - [info] Forcing shutdown so that applications never connect to the current master..
Thu Nov  9 10:43:51 2017 - [info] Executing master IP deactivation script:
Thu Nov  9 10:43:51 2017 - [info]   /data/online/agent/MHA/masterha/bak_mha_test/master_ip_failover_mha_test --orig_master_host=host_2 --orig_master_ip=host_2 --orig_master_port=3306 --command=stopssh --ssh_user=root
Thu Nov  9 10:43:53 2017 - [info]  done.
Thu Nov  9 10:43:53 2017 - [warning] shutdown_script is not set. Skipping explicit shutting down of the dead master.
Thu Nov  9 10:43:53 2017 - [info] * Phase 2: Dead Master Shutdown Phase completed.
Thu Nov  9 10:43:53 2017 - [info]
Thu Nov  9 10:43:53 2017 - [info] * Phase 3: Master Recovery Phase..
Thu Nov  9 10:43:53 2017 - [info]
Thu Nov  9 10:43:53 2017 - [info] * Phase 3.1: Getting Latest Slaves Phase..
Thu Nov  9 10:43:53 2017 - [info]
Thu Nov  9 10:43:53 2017 - [info] The latest binary log file/position on all slaves is host_1.000002:1115
Thu Nov  9 10:43:53 2017 - [info] Retrieved Gtid Set: 0923e916-3c36-11e6-82a5-ecf4bbf1f518:4-16,
Thu Nov  9 10:43:53 2017 - [info] Latest slaves (Slaves that received relay log files to the latest):
Thu Nov  9 10:43:53 2017 - [info]   host_3(host_3:3306)  Version=5.7.13-log (oldest major version between slaves) log-bin:enabled
Thu Nov  9 10:43:53 2017 - [info]     GTID ON
Thu Nov  9 10:43:53 2017 - [info]     Replicating from host_2(host_2:3306)
Thu Nov  9 10:43:53 2017 - [info]     Not candidate for the new Master (no_master is set)
Thu Nov  9 10:43:53 2017 - [info] The oldest binary log file/position on all slaves is host_1.000002:230
Thu Nov  9 10:43:53 2017 - [info] Retrieved Gtid Set: ebd9ff93-c5b2-11e6-b21d-ecf4bbf1f42c:446353
Thu Nov  9 10:43:53 2017 - [info] Oldest slaves:
Thu Nov  9 10:43:53 2017 - [info]   host_1(host_1:3306)  Version=5.7.13-log (oldest major version between slaves) log-bin:enabled
Thu Nov  9 10:43:53 2017 - [info]     GTID ON
Thu Nov  9 10:43:53 2017 - [info]     Replicating from host_2(host_2:3306)
Thu Nov  9 10:43:53 2017 - [info]     Primary candidate for the new Master (candidate_master is set)
Thu Nov  9 10:43:53 2017 - [info]
Thu Nov  9 10:43:53 2017 - [info] * Phase 3.3: Determining New Master Phase..
Thu Nov  9 10:43:53 2017 - [info]
Thu Nov  9 10:43:53 2017 - [info] Searching new master from slaves..
Thu Nov  9 10:43:53 2017 - [info]  Candidate masters from the configuration file:
Thu Nov  9 10:43:53 2017 - [info]   host_1(host_1:3306)  Version=5.7.13-log (oldest major version between slaves) log-bin:enabled
Thu Nov  9 10:43:53 2017 - [info]     GTID ON
Thu Nov  9 10:43:53 2017 - [info]     Replicating from host_2(host_2:3306)
Thu Nov  9 10:43:53 2017 - [info]     Primary candidate for the new Master (candidate_master is set)
Thu Nov  9 10:43:53 2017 - [info]  Non-candidate masters:
Thu Nov  9 10:43:53 2017 - [info]   host_3(host_3:3306)  Version=5.7.13-log (oldest major version between slaves) log-bin:enabled
Thu Nov  9 10:43:53 2017 - [info]     GTID ON
Thu Nov  9 10:43:53 2017 - [info]     Replicating from host_2(host_2:3306)
Thu Nov  9 10:43:53 2017 - [info]     Not candidate for the new Master (no_master is set)
Thu Nov  9 10:43:53 2017 - [info]  Searching from candidate_master slaves which have received the latest relay log events..
Thu Nov  9 10:43:53 2017 - [info]   Not found.
Thu Nov  9 10:43:53 2017 - [info]  Searching from all candidate_master slaves..
Thu Nov  9 10:43:53 2017 - [info] New master is host_1(host_1:3306)
Thu Nov  9 10:43:53 2017 - [info] Starting master failover..
Thu Nov  9 10:43:53 2017 - [info]
Thu Nov  9 10:43:53 2017 - [info]
Thu Nov  9 10:43:53 2017 - [info] * Phase 3.3: New Master Recovery Phase..
Thu Nov  9 10:43:53 2017 - [info]
Thu Nov  9 10:43:53 2017 - [info]  Waiting all logs to be applied..
Thu Nov  9 10:43:53 2017 - [info]   done.
Thu Nov  9 10:43:53 2017 - [info]  Replicating from the latest slave host_3(host_3:3306) and waiting to apply..
Thu Nov  9 10:43:53 2017 - [info]  Waiting all logs to be applied on the latest slave..
Thu Nov  9 10:43:53 2017 - [info]  Resetting slave host_1(host_1:3306) and starting replication from the new master host_3(host_3:3306)..
Thu Nov  9 10:43:53 2017 - [info]  Executed CHANGE MASTER.
Thu Nov  9 10:43:54 2017 - [info]  Slave started.
Thu Nov  9 10:43:54 2017 - [info]  Waiting to execute all relay logs on host_1(host_1:3306)..
Thu Nov  9 10:43:54 2017 - [info]  master_pos_wait(host_3.000049:18041) completed on host_1(host_1:3306). Executed 0 events.
Thu Nov  9 10:43:54 2017 - [info]   done.
Thu Nov  9 10:43:54 2017 - [info]   done.
Thu Nov  9 10:43:54 2017 - [info] -- Saving binlog from host host_2 started, pid: 150294
Thu Nov  9 10:43:54 2017 - [info] -- Saving binlog from host host_1 started, pid: 150295
Thu Nov  9 10:43:54 2017 - [info] -- Saving binlog from host host_3 started, pid: 150297
Thu Nov  9 10:43:54 2017 - [info]
Thu Nov  9 10:43:54 2017 - [info] Log messages from host_1 ...
Thu Nov  9 10:43:54 2017 - [info]
Thu Nov  9 10:43:54 2017 - [info] Fetching binary logs from binlog server host_1..
Thu Nov  9 10:43:54 2017 - [info] Executing binlog save command: save_binary_logs --command=save --start_file=host_1.000002  --start_pos=1115 --output_file=/var/log/masterha/mha_test/saved_binlog_binlog2_20171109104349.binlog --handle_raw_binlog=0 --skip_filter=1 --disable_log_bin=0 --manager_version=0.56 --oldest_version=5.7.13-log  --binlog_dir=/data/mysql.bin
Thu Nov  9 10:43:54 2017 - [error][/usr/share/perl5/vendor_perl/MHA/MasterFailover.pm, ln660] Failed to save binary log events from the binlog server. Maybe disks on binary logs are not accessible or binary log itself is corrupt?
Thu Nov  9 10:43:54 2017 - [info] End of log messages from host_1.
Thu Nov  9 10:43:54 2017 - [warning] Got error from host_1.
Thu Nov  9 10:43:54 2017 - [info]
Thu Nov  9 10:43:54 2017 - [info] Log messages from host_3 ...
Thu Nov  9 10:43:54 2017 - [info]
Thu Nov  9 10:43:54 2017 - [info] Fetching binary logs from binlog server host_3..
Thu Nov  9 10:43:54 2017 - [info] Executing binlog save command: save_binary_logs --command=save --start_file=host_1.000002  --start_pos=1115 --output_file=/var/log/masterha/mha_test/saved_binlog_binlog3_20171109104349.binlog --handle_raw_binlog=0 --skip_filter=1 --disable_log_bin=0 --manager_version=0.56 --oldest_version=5.7.13-log  --binlog_dir=/data/mysql.bin
Thu Nov  9 10:43:54 2017 - [error][/usr/share/perl5/vendor_perl/MHA/MasterFailover.pm, ln660] Failed to save binary log events from the binlog server. Maybe disks on binary logs are not accessible or binary log itself is corrupt?
Thu Nov  9 10:43:54 2017 - [info] End of log messages from host_3.
Thu Nov  9 10:43:54 2017 - [warning] Got error from host_3.
Thu Nov  9 10:43:55 2017 - [info]
Thu Nov 
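
A log of this length is easier to triage by machine. The sketch below is a small helper built on assumptions, not part of MHA: it assumes every entry follows the `<timestamp> - [level] message` layout seen above, and `LogEntry`, `parse_line` and `problems_by_phase` are hypothetical names. It pulls out warnings and errors and tags each one with the most recent phase banner.

```python
import re
from typing import List, NamedTuple, Optional, Tuple

# Layout assumed from the log above: "Thu Nov  9 10:43:49 2017 - [info] message",
# with an optional "[file, lnNNN]" source tag right after the level.
LINE_RE = re.compile(
    r"^(?P<ts>\w{3} \w{3} [ \d]\d \d{2}:\d{2}:\d{2} \d{4}) - "
    r"\[(?P<level>\w+)\](?:\[(?P<source>[^\]]+)\])?\s*(?P<message>.*)$"
)
# Phase banners that open a phase end in ".."; "... completed." lines do not match.
PHASE_RE = re.compile(r"^\*+ (Phase [\d.]+: .+?)\.\.$")


class LogEntry(NamedTuple):
    timestamp: str
    level: str
    message: str


def parse_line(line: str) -> Optional[LogEntry]:
    """Return a LogEntry, or None for lines that do not fit the layout."""
    m = LINE_RE.match(line.rstrip())
    if m is None:
        return None
    return LogEntry(m["ts"], m["level"], m["message"])


def problems_by_phase(log_text: str) -> List[Tuple[str, LogEntry]]:
    """Collect non-info entries together with the phase they occurred in."""
    phase = "(before first phase)"
    found: List[Tuple[str, LogEntry]] = []
    for line in log_text.splitlines():
        entry = parse_line(line)
        if entry is None:
            continue  # truncated or foreign line
        banner = PHASE_RE.match(entry.message)
        if banner:
            phase = banner.group(1)
        elif entry.level != "info":
            found.append((phase, entry))
    return found


# A few lines copied from the log above, including the truncated last line.
SAMPLE = """\
Thu Nov  9 10:43:49 2017 - [info] * Phase 1: Configuration Check Phase..
Thu Nov  9 10:43:51 2017 - [warning] SQL Thread is stopped(no error) on host_1(host_1:3306)
Thu Nov  9 10:43:53 2017 - [info] * Phase 3.3: New Master Recovery Phase..
Thu Nov  9 10:43:54 2017 - [error][/usr/share/perl5/vendor_perl/MHA/MasterFailover.pm, ln660] Failed to save binary log events from the binlog server. Maybe disks on binary logs are not accessible or binary log itself is corrupt?
Thu Nov
"""

for phase, entry in problems_by_phase(SAMPLE):
    print(f"[{entry.level}] {phase}: {entry.message}")
```

Run over the full log, this would surface the two "SQL Thread is stopped" warnings from Phase 1 and the "Failed to save binary log events" errors from the New Master Recovery phase without scrolling through the info lines.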
