

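MHA provides a graceful way to do this: it blocks the application for only 0.5 to 2 seconds, and during that window the application can neither read nor write.
Candidate master: reads
Monitor host: monitors the cluster group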
# masterha_stop --conf=/etc/masterha/app1.cnf
#/usr/local/bin/masterha_master_switch --conf=/etc/masterha/app1.cnf --master_state=alive --new_master_host= --new_master_port=3306 --orig_master_is_new_slave --running_updates_limit=10000
--running_updates_limit defaults to 1 second. In other words, if the replication lag (Seconds_Behind_Master) or the run time of a DML statement shown by SHOW PROCESSLIST on the master exceeds 1 second, the switch is not performed.
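The rule above can be sketched as a small predicate. This is only an illustration of the condition as described here, not MHA's own code (MHA is written in Perl); the function name `switch_allowed` and its arguments are hypothetical, and treating a NULL lag as a failed check is an assumption of the sketch.

```python
from typing import Iterable, Optional


def switch_allowed(
    slave_lags: Iterable[Optional[int]],
    dml_runtimes: Iterable[int],
    running_updates_limit: int = 1,
) -> bool:
    """Return True when no slave lag and no running DML exceeds the limit.

    slave_lags: Seconds_Behind_Master per slave (None when the value is NULL).
    dml_runtimes: seconds each DML seen in SHOW PROCESSLIST has been running.
    """
    for lag in slave_lags:
        # Assumption of this sketch: a NULL lag counts as a failed check.
        if lag is None or lag > running_updates_limit:
            return False
    return all(t <= running_updates_limit for t in dml_runtimes)
```

With the default limit, a slave that is 5 seconds behind blocks the switch; with the value 10000 passed on the command line above, the same lag would be accepted.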
Tue Apr 11 15:28:32 2017 - [info] MHA::MasterRotate version 0.56.
Tue Apr 11 15:28:32 2017 - [info] Starting online master switch..
Tue Apr 11 15:28:32 2017 - [info] 
Tue Apr 11 15:28:32 2017 - [info] * Phase 1: Configuration Check Phase..
Tue Apr 11 15:28:32 2017 - [info] 
Tue Apr 11 15:28:32 2017 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Tue Apr 11 15:28:32 2017 - [info] Reading application default configuration from /etc/masterha/app1.cnf..
Tue Apr 11 15:28:32 2017 - [info] Reading server configuration from /etc/masterha/app1.cnf..
Tue Apr 11 15:28:34 2017 - [info] GTID failover mode = 0
Tue Apr 11 15:28:34 2017 - [info] Current Alive Master:
Tue Apr 11 15:28:34 2017 - [info] Alive Slaves:
Tue Apr 11 15:28:34 2017 - [info]  Version=5.6.31-log (oldest major version between slaves) log-bin:enabled
Tue Apr 11 15:28:34 2017 - [info]     Replicating from
Tue Apr 11 15:28:34 2017 - [info]     Primary candidate for the new Master (candidate_master is set)
Tue Apr 11 15:28:34 2017 - [info]  Version=5.6.31-log (oldest major version between slaves) log-bin:enabled
Tue Apr 11 15:28:34 2017 - [info]     Replicating from

It is better to execute FLUSH NO_WRITE_TO_BINLOG TABLES on the master before switching. Is it ok to execute on .244.10:3306)? (YES/no): yes
Tue Apr 11 15:28:47 2017 - [info] Executing FLUSH NO_WRITE_TO_BINLOG TABLES. This may take long time..
Tue Apr 11 15:28:47 2017 - [info] ok.
Tue Apr 11 15:28:47 2017 - [info] Checking MHA is not monitoring or doing failover..
Tue Apr 11 15:28:47 2017 - [info] Checking replication health on
Tue Apr 11 15:28:47 2017 - [info] ok.
Tue Apr 11 15:28:47 2017 - [info] Checking replication health on
Tue Apr 11 15:28:47 2017 - [info] ok.
Tue Apr 11 15:28:47 2017 - [info] can be new master.
Tue Apr 11 15:28:47 2017 - [info]
From:
 (current master)
 +--
 +--

To:
 (new master)
 +--
 +--

Starting master switch from to (yes/NO): yes
Tue Apr 11 15:29:00 2017 - [info] Checking whether is ok for the new master..
Tue Apr 11 15:29:00 2017 - [info] ok.
Tue Apr 11 15:29:00 2017 - [info] SHOW SLAVE STATUS returned empty result. To check replication filtering rules, temporarily executing CHANGE MASTER to a dummy host.
Tue Apr 11 15:29:00 2017 - [info] Resetting slave pointing to the dummy host.
Tue Apr 11 15:29:00 2017 - [info] ** Phase 1: Configuration Check Phase completed.
Tue Apr 11 15:29:00 2017 - [info]
Tue Apr 11 15:29:00 2017 - [info] * Phase 2: Rejecting updates Phase..
Tue Apr 11 15:29:00 2017 - [info]
Tue Apr 11 15:29:00 2017 - [info] Executing master ip online change script to disable write on the current master:
Tue Apr 11 15:29:00 2017 - [info] /usr/local/bin/master_ip_online_change --command=stop --orig_master_host= --orig_master_ip= --orig_master_port=3306 --orig_master_user='monitor' --orig_master_password='monitor123' --new_master_host= --new_master_ip= --new_master_port=3306 --new_master_user='monitor' --new_master_password='monitor123' --orig_master_ssh_user=root --new_master_ssh_user=root --orig_master_is_new_slave
Tue Apr 11 15:29:00 2017 476501 Set read_only on the new master.. ok.
Tue Apr 11 15:29:00 2017 911951 Set read_only=1 on the orig master.. ok.
Tue Apr 11 15:29:00 2017 919517 Killing all application threads..
Tue Apr 11 15:29:00 2017 919552 done.
Disabling the VIP an old master: SIOCSIFFLAGS: Cannot assign requested address
Tue Apr 11 15:29:00 2017 - [info] ok.
Tue Apr 11 15:29:00 2017 - [info] Locking all tables on the orig master to reject updates from everybody (including root):
Tue Apr 11 15:29:00 2017 - [info] Executing FLUSH TABLES WITH READ LOCK..
Tue Apr 11 15:29:00 2017 - [info] ok.
Tue Apr 11 15:29:00 2017 - [info] Orig master binlog:pos is mysql-bin.000016:211.
Tue Apr 11 15:29:00 2017 - [info] Waiting to execute all relay logs on
Tue Apr 11 15:29:01 2017 - [info] master_pos_wait(mysql-bin.000016:211) completed on Executed 0 events.
Tue Apr 11 15:29:01 2017 - [info] done.
Tue Apr 11 15:29:01 2017 - [info] Getting new master's binlog name and position..
Tue Apr 11 15:29:01 2017 - [info] mysql-bin.000009:211
Tue Apr 11 15:29:01 2017 - [info] All other slaves should start replication from here. Statement should be: CHANGE MASTER TO MASTER_HOST='', MASTER_PORT=3306, MASTER_LOG_FILE='mysql-bin.000009', MASTER_LOG_POS=211, MASTER_USER='repl', MASTER_PASSWORD='xxx';
Tue Apr 11 15:29:01 2017 - [info] Executing master ip online change script to allow write on the new master:
Tue Apr 11 15:29:01 2017 - [info] /usr/local/bin/master_ip_online_change --command=start --orig_master_host= --orig_master_ip= --orig_master_port=3306 --orig_master_user='monitor' --orig_master_password='monitor123' --new_master_host= --new_master_ip= --new_master_port=3306 --new_master_user='monitor' --new_master_password='monitor123' --orig_master_ssh_user=root --new_master_ssh_user=root --orig_master_is_new_slave
Tue Apr 11 15:29:01 2017 109040 Set read_only=0 on the new master.
Enabling the VIP on the new master:
Tue Apr 11 15:29:01 2017 - [info] ok.
Tue Apr 11 15:29:01 2017 - [info]
Tue Apr 11 15:29:01 2017 - [info] * Switching slaves in parallel..
Tue Apr 11 15:29:01 2017 - [info]
Tue Apr 11 15:29:01 2017 - [info] -- Slave switch on host started, pid: 17651
Tue Apr 11 15:29:01 2017 - [info]
Tue Apr 11 15:29:02 2017 - [info] Log messages from ...
Tue Apr 11 15:29:02 2017 - [info]
Tue Apr 11 15:29:01 2017 - [info] Waiting to execute all relay logs on
Tue Apr 11 15:29:01 2017 - [info] master_pos_wait(mysql-bin.000016:211) completed on Executed 0 events.
Tue Apr 11 15:29:01 2017 - [info] done.
Tue Apr 11 15:29:01 2017 - [info] Resetting slave and starting replication from the new master
Tue Apr 11 15:29:01 2017 - [info] Executed CHANGE MASTER.
Tue Apr 11 15:29:01 2017 - [info] Slave started.
Tue Apr 11 15:29:02 2017 - [info] End of log messages from ...
Tue Apr 11 15:29:02 2017 - [info]
Tue Apr 11 15:29:02 2017 - [info] -- Slave switch on host succeeded.
Tue Apr 11 15:29:02 2017 - [info] Unlocking all tables on the orig master:
Tue Apr 11 15:29:02 2017 - [info] Executing UNLOCK TABLES..
Tue Apr 11 15:29:02 2017 - [info] ok.
Tue Apr 11 15:29:02 2017 - [info] Starting orig master as a new slave..
Tue Apr 11 15:29:02 2017 - [info] Resetting slave and starting replication from the new master
Tue Apr 11 15:29:02 2017 - [info] Executed CHANGE MASTER.
Tue Apr 11 15:29:02 2017 - [info] Slave started.
Tue Apr 11 15:29:02 2017 - [info] All new slave servers switched successfully.
Tue Apr 11 15:29:02 2017 - [info]
Tue Apr 11 15:29:02 2017 - [info] * Phase 5: New master cleanup phase..
Tue Apr 11 15:29:02 2017 - [info]
Tue Apr 11 15:29:02 2017 - [info] Resetting slave info succeeded.
Tue Apr 11 15:29:02 2017 - [info] Switching master to completed successfully.
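To check how long writes were actually blocked, the timestamps in a log like the one above can be compared. The following is a rough sketch that assumes one log entry per line in the `Tue Apr 11 15:29:00 2017 - [info] message` form and an English locale for the day and month names; `parse_line` and `seconds_between` are hypothetical helper names, not part of MHA.

```python
import re
from datetime import datetime
from typing import Iterable, Optional, Tuple

# Matches entries such as: "Tue Apr 11 15:29:00 2017 - [info] message"
LINE_RE = re.compile(r"^(\w{3} \w{3} +\d+ \d\d:\d\d:\d\d \d{4}) - \[(\w+)\] ?(.*)$")


def parse_line(line: str) -> Optional[Tuple[datetime, str, str]]:
    """Split one log entry into (timestamp, level, message), or None if it does not match."""
    m = LINE_RE.match(line)
    if m is None:
        return None
    stamp = datetime.strptime(m.group(1), "%a %b %d %H:%M:%S %Y")
    return stamp, m.group(2), m.group(3).strip()


def seconds_between(lines: Iterable[str], start_marker: str, end_marker: str) -> Optional[float]:
    """Seconds from the first entry containing start_marker to the next one containing end_marker."""
    start = None
    for line in lines:
        parsed = parse_line(line)
        if parsed is None:
            continue  # script output without the "- [level]" part is ignored
        stamp, _level, message = parsed
        if start is None:
            if start_marker in message:
                start = stamp
        elif end_marker in message:
            return (stamp - start).total_seconds()
    return None
```

Called with the markers "Phase 2" and "UNLOCK TABLES", the entries stamped 15:29:00 and 15:29:02 in the log above give 2.0. The log has one-second resolution, so this is only a coarse measurement.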

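This includes reading the MHA configuration file /etc/masterha/app1.cnf and checking the health of the current slaves.
1> Wait 1.5 s ($time_until_kill_threads * 100 ms) for the current connections to disconnect.
3> Wait 0.5 s for the current DML operations to complete.
3. Wait for the new master to execute all relay logs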
Waiting to execute all relay logs on
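The two waits in the rejecting-updates step add up as follows. This is a small arithmetic sketch: the value 15 for time_until_kill_threads is inferred from 1.5 s divided by 100 ms rather than quoted from the script, and `wait_budget_ms` is a hypothetical name.

```python
SLEEP_UNIT_MS = 100  # each wait iteration is 100 ms, per the step description above


def wait_budget_ms(time_until_kill_threads: int, dml_wait_ms: int = 500) -> int:
    """Sum of the connection-drain wait and the DML-drain wait, in milliseconds."""
    return time_until_kill_threads * SLEEP_UNIT_MS + dml_wait_ms


# 15 iterations * 100 ms + 500 ms
print(wait_budget_ms(15))
```

The result, 2000 ms, matches the upper end of the 0.5 to 2 s window quoted at the top of this article.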

1> Wait for the slave to apply the relay logs it generated by replicating from the original master, then run CHANGE MASTER to switch it to the new master.
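The order of statements for re-pointing a slave can be sketched as below. The sequence follows what the general logs later in this article show (MASTER_POS_WAIT, STOP SLAVE, RESET SLAVE, CHANGE MASTER TO, START SLAVE); the function `repoint_statements` and its parameters are hypothetical, MASTER_PASSWORD is deliberately left out and would have to be supplied, and the naive string formatting is for illustration only.

```python
from typing import List


def repoint_statements(
    orig_file: str,
    orig_pos: int,
    new_host: str,
    new_port: int,
    new_file: str,
    new_pos: int,
    repl_user: str,
) -> List[str]:
    """Build the SQL a slave would run to catch up and then follow the new master."""
    return [
        # Block until the SQL thread has applied everything up to the old master's final position.
        f"SELECT MASTER_POS_WAIT('{orig_file}', {orig_pos}, 0)",
        "STOP SLAVE",
        "RESET SLAVE",
        # Point replication at the new master's binlog coordinates.
        f"CHANGE MASTER TO MASTER_HOST='{new_host}', MASTER_PORT={new_port}, "
        f"MASTER_USER='{repl_user}', MASTER_LOG_FILE='{new_file}', MASTER_LOG_POS={new_pos}",
        "START SLAVE",
    ]
```

The wait comes first because the slave must finish applying the old master's events before its replication coordinates are reset.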
Mainly runs RESET SLAVE ALL to clear the previous replication information.
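2. Seconds_Behind_Master on every slave is less than or equal to running_updates_limit. If this parameter is not specified explicitly, it defaults to 1 second.
3. In the SHOW PROCESSLIST output on the master, no DML operation has been running longer than running_updates_limit.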
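The SHOW PROCESSLIST condition can be illustrated with a filter over result rows. This is a sketch under assumptions: rows are dictionaries keyed by the SHOW PROCESSLIST column names (Command, Time, Info), and which statements count as DML (INSERT, UPDATE, DELETE, REPLACE here) is a choice made for this example, not taken from MHA.

```python
import re
from typing import Dict, List, Union

Row = Dict[str, Union[str, int, None]]

# Assumption: statements starting with these keywords are treated as DML.
DML_RE = re.compile(r"^\s*(insert|update|delete|replace)\b", re.IGNORECASE)


def long_running_dml(processlist: List[Row], running_updates_limit: int = 1) -> List[Row]:
    """Return the rows whose DML statement has run longer than the limit (in seconds)."""
    offenders = []
    for row in processlist:
        if row.get("Command") != "Query":
            continue  # sleeping connections, binlog dump threads, etc.
        info = str(row.get("Info") or "")
        runtime = int(row.get("Time") or 0)
        if DML_RE.match(info) and runtime > running_updates_limit:
            offenders.append(row)
    return offenders
```

An empty result means this condition is satisfied; any returned row is a statement that would prevent the switch.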
With the general log enabled during the online switch, the operations on each server are as follows.
1. It is better to execute FLUSH NO_WRITE_TO_BINLOG TABLES on the master before switching. Is it ok to execute on (YES/no):
2. Starting master switch from to (yes/NO):
General log on what appears to be the original master (it is set to read_only=1, locked, and finally re-pointed with CHANGE MASTER):
170412 16:52:38    23 Connect    monitor@node4 on 
                   23 Query    set autocommit=1
                   23 Query    SELECT CONNECTION_ID() AS Value
170412 16:52:39    24 Connect    monitor@node4 on 
                   24 Query    set autocommit=1
                   24 Query    SELECT CONNECTION_ID() AS Value
                   24 Query    SET wait_timeout=86400
                   24 Query    SELECT @@global.server_id As Value
                   24 Query    SELECT VERSION() AS Value
                   24 Query    SELECT @@global.gtid_mode As Value
                   24 Query    SHOW GLOBAL VARIABLES LIKE 'log_bin'
                   24 Query    SHOW MASTER STATUS
                   24 Query    SELECT @@global.datadir AS Value
                   24 Query    SELECT @@global.slave_parallel_workers AS Value
                   24 Query    SHOW SLAVE STATUS
                   24 Query    SELECT @@global.read_only As Value
                   24 Query    SELECT @@global.relay_log_purge As Value

170412 16:54:06    24 Query    FLUSH NO_WRITE_TO_BINLOG TABLES
                   24 Query    SELECT GET_LOCK('MHA_Master_High_Availability_Monitor', '0') AS Value
                   24 Query    SHOW PROCESSLIST

170412 16:55:51    24 Query    SHOW SLAVE STATUS
                   24 Query    CHANGE MASTER TO MASTER_HOST='dummy_host'
170412 16:55:52    24 Query    SHOW SLAVE STATUS
                   24 Query    RESET SLAVE /*!50516 ALL */
                   24 Query    SELECT RELEASE_LOCK('MHA_Master_High_Availability_Monitor') As Value
                   24 Quit    
                   25 Connect    monitor@node4 on 
                   25 Query    set autocommit=1
                   25 Query    SELECT CONNECTION_ID() AS Value
                   25 Query    SET sql_log_bin=0
                   25 Query    SHOW PROCESSLIST
                   25 Query    SELECT @@global.read_only As Value
                   25 Query    SET GLOBAL read_only=1
                   25 Query    SELECT @@global.read_only As Value
                   25 Query    SHOW PROCESSLIST
                   25 Query    SET sql_log_bin=1
                   25 Quit    
                   26 Connect    monitor@node4 on 
                   26 Query    set autocommit=1
                   26 Query    SELECT CONNECTION_ID() AS Value
                   26 Query    SET wait_timeout=86400
                   26 Query    FLUSH TABLES WITH READ LOCK
                   26 Query    SHOW MASTER STATUS
170412 16:55:53    26 Query    UNLOCK TABLES
                   26 Query    CHANGE MASTER TO MASTER_HOST = '' MASTER_USER = 'repl' MASTER_PASSWORD =  MASTER_PORT = 3306 MASTER_LOG_FILE = 'mysql-bin.000010' MASTER_LOG_POS = 120
                   26 Query    SET GLOBAL relay_log_purge=0
                   26 Query    START SLAVE
                   27 Connect Out    repl@
                   26 Query    SHOW SLAVE STATUS
                   26 Query    SELECT RELEASE_LOCK('MHA_Master_High_Availability_Failover') As Value
                   26 Quit    

General log on what appears to be the new master (it is set to read_only=0 and then serves Binlog Dump requests from the other nodes):

170412 16:52:38    23 Connect    monitor@node4 on 
                   23 Query    set autocommit=1
                   23 Query    SELECT CONNECTION_ID() AS Value
170412 16:52:39    24 Connect    monitor@node4 on 
                   24 Query    set autocommit=1
                   24 Query    SELECT CONNECTION_ID() AS Value
                   24 Query    SET wait_timeout=86400
                   24 Query    SELECT @@global.server_id As Value
                   24 Query    SELECT VERSION() AS Value
                   24 Query    SELECT @@global.gtid_mode As Value
                   24 Query    SHOW GLOBAL VARIABLES LIKE 'log_bin'
                   24 Query    SHOW MASTER STATUS
                   24 Query    SELECT @@global.datadir AS Value
                   24 Query    SELECT @@global.slave_parallel_workers AS Value
                   24 Query    SHOW SLAVE STATUS
                   24 Query    SELECT @@global.read_only As Value
                   24 Query    SELECT @@global.relay_log_purge As Value
                   24 Query    SELECT @@global.relay_log_info_repository AS Value
                   24 Query    SELECT @@global.datadir AS Value
                   24 Query    SELECT @@global.relay_log_info_file AS Value
                   24 Query    SHOW SLAVE STATUS
                   24 Query    SELECT Repl_slave_priv AS Value FROM mysql.user WHERE user = 'repl'

170412 16:54:06    24 Query    SELECT GET_LOCK('MHA_Master_High_Availability_Failover', '0') AS Value
                   24 Query    SHOW SLAVE STATUS
                   24 Query    SHOW SLAVE STATUS

170412 16:55:52    24 Query    SHOW PROCESSLIST
                   25 Connect    monitor@node4 on 
                   25 Query    set autocommit=1
                   25 Query    SELECT CONNECTION_ID() AS Value
                   25 Query    SELECT @@global.read_only As Value
                   25 Query    SELECT @@global.read_only As Value
                   25 Quit    
                   24 Query    SHOW SLAVE STATUS
                   24 Query    SELECT MASTER_POS_WAIT('mysql-bin.000017','120',0) AS Result
                   24 Query    STOP SLAVE SQL_THREAD
                   24 Query    SHOW SLAVE STATUS
                   24 Query    SHOW MASTER STATUS
                   26 Connect    monitor@node4 on 
                   26 Query    set autocommit=1
                   26 Query    SELECT CONNECTION_ID() AS Value
                   26 Query    SET sql_log_bin=0
                   26 Query    SELECT @@global.read_only As Value
                   26 Query    SET GLOBAL read_only=0
                   26 Query    SET sql_log_bin=1
                   26 Quit    
                   24 Query    SELECT @@global.read_only As Value
                   27 Connect    repl@node3 on 
                   27 Query    SELECT UNIX_TIMESTAMP()
                   27 Query    SHOW VARIABLES LIKE 'SERVER_ID'
                   27 Query    SET @master_heartbeat_period= 1799999979520
                   27 Query    SET @master_binlog_checksum= @@global.binlog_checksum
                   27 Query    SELECT @master_binlog_checksum
                   27 Query    SELECT @@GLOBAL.GTID_MODE
                   27 Query    SHOW VARIABLES LIKE 'SERVER_UUID'
                   27 Query    SET @slave_uuid= '8a1093c8-1d00-11e7-954f-000c299a5715'
                   27 Binlog Dump    Log: 'mysql-bin.000010'  Pos: 120
170412 16:55:53    28 Connect    repl@node1 on 
                   28 Query    SELECT UNIX_TIMESTAMP()
                   28 Query    SHOW VARIABLES LIKE 'SERVER_ID'
                   28 Query    SET @master_heartbeat_period= 1799999979520
                   28 Query    SET @master_binlog_checksum= @@global.binlog_checksum
                   28 Query    SELECT @master_binlog_checksum
                   28 Query    SELECT @@GLOBAL.GTID_MODE
                   28 Query    SHOW VARIABLES LIKE 'SERVER_UUID'
                   24 Query    STOP SLAVE
                   28 Query    SET @slave_uuid= '2a6365e0-1d05-11e7-956d-000c29c64704'
                   28 Binlog Dump    Log: 'mysql-bin.000010'  Pos: 120
                   24 Query    SHOW SLAVE STATUS
                   24 Query    RESET SLAVE /*!50516 ALL */
                   24 Query    SHOW SLAVE STATUS
                   24 Query    SELECT RELEASE_LOCK('MHA_Master_High_Availability_Failover') As Value
                   24 Quit    

General log on what appears to be the remaining slave (it waits for its relay logs, then runs CHANGE MASTER toward the new master):

170412 16:52:37    16 Connect    monitor@node4 on 
                   16 Query    set autocommit=1
                   16 Query    SELECT CONNECTION_ID() AS Value
170412 16:52:38    17 Connect    monitor@node4 on 
                   17 Query    set autocommit=1
                   17 Query    SELECT CONNECTION_ID() AS Value
                   17 Query    SET wait_timeout=86400
                   17 Query    SELECT @@global.server_id As Value
                   17 Query    SELECT VERSION() AS Value
                   17 Query    SELECT @@global.gtid_mode As Value
                   17 Query    SHOW GLOBAL VARIABLES LIKE 'log_bin'
                   17 Query    SHOW MASTER STATUS
                   17 Query    SELECT @@global.datadir AS Value
                   17 Query    SELECT @@global.slave_parallel_workers AS Value
                   17 Query    SHOW SLAVE STATUS
                   17 Query    SELECT @@global.read_only As Value
                   17 Query    SELECT @@global.relay_log_purge As Value
                   17 Query    SELECT @@global.relay_log_info_repository AS Value
                   17 Query    SELECT @@global.datadir AS Value
                   17 Query    SELECT @@global.relay_log_info_file AS Value
                   17 Query    SHOW SLAVE STATUS
                   17 Query    SELECT Repl_slave_priv AS Value FROM mysql.user WHERE user = 'repl'

170412 16:54:05    17 Query    SELECT GET_LOCK('MHA_Master_High_Availability_Failover', '0') AS Value
                   17 Query    SHOW SLAVE STATUS
                   17 Query    SHOW SLAVE STATUS

170412 16:55:50    17 Query    SHOW SLAVE STATUS
170412 16:55:51    17 Query    SHOW SLAVE STATUS
                   17 Query    SELECT MASTER_POS_WAIT('mysql-bin.000017','120',0) AS Result
                   17 Query    STOP SLAVE SQL_THREAD
                   17 Query    SHOW SLAVE STATUS
                   17 Query    STOP SLAVE
                   17 Query    STOP SLAVE
                   17 Query    SHOW SLAVE STATUS
                   17 Query    RESET SLAVE
                   17 Query    CHANGE MASTER TO MASTER_HOST = '' MASTER_USER = 'repl' MASTER_PASSWORD =  MASTER_PORT = 3306 MASTER_LOG_FILE = 'mysql-bin.000010' MASTER_LOG_POS = 120
                   17 Query    SET GLOBAL relay_log_purge=0
                   17 Query    START SLAVE
                   18 Connect Out    repl@
                   17 Query    SHOW SLAVE STATUS
170412 16:55:52    17 Query    SELECT RELEASE_LOCK('MHA_Master_High_Availability_Failover') As Value
                   17 Quit    
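
General-log dumps like the ones above are easier to follow when grouped by connection id. The following is a rough sketch that assumes the layout shown here (an optional `YYMMDD HH:MM:SS` timestamp, then the thread id, the command, and its argument); `parse_general_log` and `by_thread` are hypothetical helper names, and only the command types that occur in this article are recognized.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, List, Optional, Tuple

Event = Tuple[Optional[str], int, str, str]

# "Connect Out" must be listed before "Connect" so the longer name wins.
LINE_RE = re.compile(
    r"^(\d{6} +\d{1,2}:\d\d:\d\d)?\s*(\d+) "
    r"(Connect Out|Binlog Dump|Connect|Query|Quit)\s*(.*)$"
)


def parse_general_log(lines: Iterable[str]) -> List[Event]:
    """Turn log lines into (timestamp, thread_id, command, argument) tuples."""
    events: List[Event] = []
    current_ts: Optional[str] = None
    for line in lines:
        m = LINE_RE.match(line)
        if m is None:
            continue  # blank lines and wrapped continuation lines are skipped
        ts, thread_id, command, argument = m.groups()
        if ts:
            current_ts = ts  # the log prints a timestamp only when the second changes
        events.append((current_ts, int(thread_id), command, argument.strip()))
    return events


def by_thread(events: Iterable[Event]) -> Dict[int, List[Tuple[Optional[str], str, str]]]:
    """Group events by connection id, preserving their order."""
    grouped: Dict[int, List[Tuple[Optional[str], str, str]]] = defaultdict(list)
    for ts, thread_id, command, argument in events:
        grouped[thread_id].append((ts, command, argument))
    return dict(grouped)
```

Grouping this way separates, for example, the short-lived connection that toggles read_only from the long-lived connection that drives the rest of the switch.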
