MySQL高可用性方案MHAオンライン切替の手順と原理

72679 ワード

日常の仕事では、mysqlデータベースのアップグレード、プライマリサーバのハードウェアのアップグレードなど、書き込み操作を別のサーバに切り替える必要がある場合がありますが、オンライン切り替えはどうすればいいのでしょうか.また,スイッチングプロセスが短く,業務への影響が小さいことが要求される.
MHAはこのような優雅な方法を提供し、業務0.5~2 sの時間を塞ぐだけで、この間、業務は読み取りと書き込みができない.

クラスタ情報
ロールIPアドレスサーバIDタイプ
Master 192.168.244.110書き込み
Candicate master 192.168.244.22読み
Slave 192.168.244.33読み
Monitor host 192.168.244.40モニタクラスタグループ

MHAの具体的な構築手順と原理は、別のブログを参考にすることができます.
MySQL高可用性スキームMHAの導入と原理

オンライン切り替えの手順
1.MHAモニタをオフにする
# masterha_stop --conf=/etc/masterha/app1.cnf
2.オンライン切り替え
#/usr/local/bin/masterha_master_switch --conf=/etc/masterha/app1.cnf --master_state=alive --new_master_host=192.168.244.20 --new_master_port=3306 --orig_master_is_new_slave --running_updates_limit=10000
ここで、
--orig_master_is_new_slaveは元のmasterを新しいマスターに切り替えるslaveで、デフォルトでは追加されません.
--running_updates_Limitのデフォルトは1 sです.つまり、プライマリスレーブ遅延時間(Seconds_Behind_Master)またはmaster show processlistのdml操作が1 sより大きい場合、切り替えは実行されません.

オンライン切替出力

Tue Apr 11 15:28:32 2017 - [info] MHA::MasterRotate version 0.56.
Tue Apr 11 15:28:32 2017 - [info] Starting online master switch..
Tue Apr 11 15:28:32 2017 - [info] 
Tue Apr 11 15:28:32 2017 - [info] * Phase 1: Configuration Check Phase..
Tue Apr 11 15:28:32 2017 - [info] 
Tue Apr 11 15:28:32 2017 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Tue Apr 11 15:28:32 2017 - [info] Reading application default configuration from /etc/masterha/app1.cnf..
Tue Apr 11 15:28:32 2017 - [info] Reading server configuration from /etc/masterha/app1.cnf..
Tue Apr 11 15:28:34 2017 - [info] GTID failover mode = 0
Tue Apr 11 15:28:34 2017 - [info] Current Alive Master: 192.168.244.10(192.168.244.10:3306)
Tue Apr 11 15:28:34 2017 - [info] Alive Slaves:
Tue Apr 11 15:28:34 2017 - [info]   192.168.244.20(192.168.244.20:3306)  Version=5.6.31-log (oldest major version between slaves) log
-bin:enabledTue Apr 11 15:28:34 2017 - [info]     Replicating from 192.168.244.10(192.168.244.10:3306)
Tue Apr 11 15:28:34 2017 - [info]     Primary candidate for the new Master (candidate_master is set)
Tue Apr 11 15:28:34 2017 - [info]   192.168.244.30(192.168.244.30:3306)  Version=5.6.31-log (oldest major version between slaves) log
-bin:enabledTue Apr 11 15:28:34 2017 - [info]     Replicating from 192.168.244.10(192.168.244.10:3306)

It is better to execute FLUSH NO_WRITE_TO_BINLOG TABLES on the master before switching. Is it ok to execute on 192.168.244.10(192.168
.244.10:3306)? (YES/no): yes
Tue Apr 11 15:28:47 2017 - [info] Executing FLUSH NO_WRITE_TO_BINLOG TABLES. This may take long time..
Tue Apr 11 15:28:47 2017 - [info]  ok.
Tue Apr 11 15:28:47 2017 - [info] Checking MHA is not monitoring or doing failover..
Tue Apr 11 15:28:47 2017 - [info] Checking replication health on 192.168.244.20..
Tue Apr 11 15:28:47 2017 - [info]  ok.
Tue Apr 11 15:28:47 2017 - [info] Checking replication health on 192.168.244.30..
Tue Apr 11 15:28:47 2017 - [info]  ok.
Tue Apr 11 15:28:47 2017 - [info] 192.168.244.20 can be new master.
Tue Apr 11 15:28:47 2017 - [info] 
From:
192.168.244.10(192.168.244.10:3306) (current master)
 +--192.168.244.20(192.168.244.20:3306)
 +--192.168.244.30(192.168.244.30:3306)

To:
192.168.244.20(192.168.244.20:3306) (new master)
 +--192.168.244.30(192.168.244.30:3306)
 +--192.168.244.10(192.168.244.10:3306)

Starting master switch from 192.168.244.10(192.168.244.10:3306) to 192.168.244.20(192.168.244.20:3306)? (yes/NO): yes
Tue Apr 11 15:29:00 2017 - [info] Checking whether 192.168.244.20(192.168.244.20:3306) is ok for the new master..
Tue Apr 11 15:29:00 2017 - [info]  ok.
Tue Apr 11 15:29:00 2017 - [info] 192.168.244.10(192.168.244.10:3306): SHOW SLAVE STATUS returned empty result. To check replication 
filtering rules, temporarily executing CHANGE MASTER to a dummy host.Tue Apr 11 15:29:00 2017 - [info] 192.168.244.10(192.168.244.10:3306): Resetting slave pointing to the dummy host.
Tue Apr 11 15:29:00 2017 - [info] ** Phase 1: Configuration Check Phase completed.
Tue Apr 11 15:29:00 2017 - [info] 
Tue Apr 11 15:29:00 2017 - [info] * Phase 2: Rejecting updates Phase..
Tue Apr 11 15:29:00 2017 - [info] 
Tue Apr 11 15:29:00 2017 - [info] Executing master ip online change script to disable write on the current master:
Tue Apr 11 15:29:00 2017 - [info]   /usr/local/bin/master_ip_online_change --command=stop --orig_master_host=192.168.244.10 --orig_ma
ster_ip=192.168.244.10 --orig_master_port=3306 --orig_master_user='monitor' --orig_master_password='monitor123' --new_master_host=192.168.244.20 --new_master_ip=192.168.244.20 --new_master_port=3306 --new_master_user='monitor' --new_master_password='monitor123' --orig_master_ssh_user=root --new_master_ssh_user=root   --orig_master_is_new_slaveTue Apr 11 15:29:00 2017 476501 Set read_only on the new master.. ok.
Tue Apr 11 15:29:00 2017 911951 Set read_only=1 on the orig master.. ok.
Tue Apr 11 15:29:00 2017 919517 Killing all application threads..
Tue Apr 11 15:29:00 2017 919552 done.
Disabling the VIP an old master: 192.168.244.10 
SIOCSIFFLAGS: Cannot assign requested address
Tue Apr 11 15:29:00 2017 - [info]  ok.
Tue Apr 11 15:29:00 2017 - [info] Locking all tables on the orig master to reject updates from everybody (including root):
Tue Apr 11 15:29:00 2017 - [info] Executing FLUSH TABLES WITH READ LOCK..
Tue Apr 11 15:29:00 2017 - [info]  ok.
Tue Apr 11 15:29:00 2017 - [info] Orig master binlog:pos is mysql-bin.000016:211.
Tue Apr 11 15:29:00 2017 - [info]  Waiting to execute all relay logs on 192.168.244.20(192.168.244.20:3306)..
Tue Apr 11 15:29:01 2017 - [info]  master_pos_wait(mysql-bin.000016:211) completed on 192.168.244.20(192.168.244.20:3306). Executed 0
 events.Tue Apr 11 15:29:01 2017 - [info]   done.
Tue Apr 11 15:29:01 2017 - [info] Getting new master's binlog name and position..
Tue Apr 11 15:29:01 2017 - [info]  mysql-bin.000009:211
Tue Apr 11 15:29:01 2017 - [info]  All other slaves should start replication from here. Statement should be: CHANGE MASTER TO MASTER_
HOST='192.168.244.20', MASTER_PORT=3306, MASTER_LOG_FILE='mysql-bin.000009', MASTER_LOG_POS=211, MASTER_USER='repl', MASTER_PASSWORD='xxx';Tue Apr 11 15:29:01 2017 - [info] Executing master ip online change script to allow write on the new master:
Tue Apr 11 15:29:01 2017 - [info]   /usr/local/bin/master_ip_online_change --command=start --orig_master_host=192.168.244.10 --orig_m
aster_ip=192.168.244.10 --orig_master_port=3306 --orig_master_user='monitor' --orig_master_password='monitor123' --new_master_host=192.168.244.20 --new_master_ip=192.168.244.20 --new_master_port=3306 --new_master_user='monitor' --new_master_password='monitor123' --orig_master_ssh_user=root --new_master_ssh_user=root   --orig_master_is_new_slaveTue Apr 11 15:29:01 2017 109040 Set read_only=0 on the new master.
Enabling the VIP 192.168.244.188 on the new master: 192.168.244.20 
Tue Apr 11 15:29:01 2017 - [info]  ok.
Tue Apr 11 15:29:01 2017 - [info] 
Tue Apr 11 15:29:01 2017 - [info] * Switching slaves in parallel..
Tue Apr 11 15:29:01 2017 - [info] 
Tue Apr 11 15:29:01 2017 - [info] -- Slave switch on host 192.168.244.30(192.168.244.30:3306) started, pid: 17651
Tue Apr 11 15:29:01 2017 - [info] 
Tue Apr 11 15:29:02 2017 - [info] Log messages from 192.168.244.30 ...
Tue Apr 11 15:29:02 2017 - [info] 
Tue Apr 11 15:29:01 2017 - [info]  Waiting to execute all relay logs on 192.168.244.30(192.168.244.30:3306)..
Tue Apr 11 15:29:01 2017 - [info]  master_pos_wait(mysql-bin.000016:211) completed on 192.168.244.30(192.168.244.30:3306). Executed 0
 events.Tue Apr 11 15:29:01 2017 - [info]   done.
Tue Apr 11 15:29:01 2017 - [info]  Resetting slave 192.168.244.30(192.168.244.30:3306) and starting replication from the new master 1
92.168.244.20(192.168.244.20:3306)..Tue Apr 11 15:29:01 2017 - [info]  Executed CHANGE MASTER.
Tue Apr 11 15:29:01 2017 - [info]  Slave started.
Tue Apr 11 15:29:02 2017 - [info] End of log messages from 192.168.244.30 ...
Tue Apr 11 15:29:02 2017 - [info] 
Tue Apr 11 15:29:02 2017 - [info] -- Slave switch on host 192.168.244.30(192.168.244.30:3306) succeeded.
Tue Apr 11 15:29:02 2017 - [info] Unlocking all tables on the orig master:
Tue Apr 11 15:29:02 2017 - [info] Executing UNLOCK TABLES..
Tue Apr 11 15:29:02 2017 - [info]  ok.
Tue Apr 11 15:29:02 2017 - [info] Starting orig master as a new slave..
Tue Apr 11 15:29:02 2017 - [info]  Resetting slave 192.168.244.10(192.168.244.10:3306) and starting replication from the new master 1
92.168.244.20(192.168.244.20:3306)..Tue Apr 11 15:29:02 2017 - [info]  Executed CHANGE MASTER.
Tue Apr 11 15:29:02 2017 - [info]  Slave started.
Tue Apr 11 15:29:02 2017 - [info] All new slave servers switched successfully.
Tue Apr 11 15:29:02 2017 - [info] 
Tue Apr 11 15:29:02 2017 - [info] * Phase 5: New master cleanup phase..
Tue Apr 11 15:29:02 2017 - [info] 
Tue Apr 11 15:29:02 2017 - [info]  192.168.244.20: Resetting slave info succeeded.
Tue Apr 11 15:29:02 2017 - [info] Switching master to 192.168.244.20(192.168.244.20:3306) completed successfully.

MHAオンライン切替の原理
1.現在の構成情報及びマスタスレーブサーバの情報を確認する
MHAを読み出すプロファイル/etc/masterha/app 1を含む.cnfおよび現在のslaveの健康状態の検査
2.現在のmasterの更新をブロックする
主に次の手順に従います.
1>1.5 s($time_until_kill_threads*100 ms)待ち、現在の接続が切断されるのを待ちます.
2>read_の実行only=1、新しいDML操作を阻止
3>0.5 s待ち、現在のDML操作の完了待ち.
4>killはすべての接続を削除します.
5> FLUSH NO_WRITE_TO_BINLOG TABLES
6> FLUSH TABLES WITH READ LOCK
3.新しいマスターがすべてのrelay logを実行するのを待つ

Waiting to execute all relay logs on 192.168.244.20(192.168.244.20:3306)..

4.新しいマスターのread_をonlyをoffに設定し、VIPを追加
5.slaveを新しいマスターに切り替えます.
1>slave(192.168.244.30)がコピーから生成したrelay logを元のマスターに適用した後、change master操作を実行して新しいmasterに切り替えるのを待つ.
2>元のmasterに追加されたロックを解除します.
3>因masterha_master_switchコマンドラインに--orig_master_is_new_slaveパラメータなので、元のmasterも新しいmasterのスレーブに切り替えます.
6.新しいマスターに関する情報を整理します.
主にreset slave all操作を実行し、前のレプリケーション情報をクリアします.

MHAオンライン切替に必要な条件
MHAはオンライン切替を実行する前に、現在の主従レプリケーション情報を判断し、以下の条件を満たしてこそ切替動作を実行することができる.
1.すべてのSLAVEのIOスレッドとSQLスレッドが実行されています.
2.すべてのslaveのSeconds_Behind_Masterがrunning_以下であるupdates_limitの値です.このパラメータに指定が表示されていない場合は、デフォルトは1 sです.
3.マスター上でshow processlist出力により、running_より大きなDML動作時間は1つもないupdates_limitの値.

オンライン切り替え時にGeneral logを開き、各サーバの操作情報
注意:masterhaを実行中master_switchコマンドの場合、2回確認操作があります
1. It is better to execute FLUSH NO_WRITE_TO_BINLOG TABLES on the master before switching. Is it ok to execute on 192.168.244.10(192.168.244.10:3306)? (YES/no):
2. Starting master switch from 192.168.244.10(192.168.244.10:3306) to 192.168.244.20(192.168.244.20:3306)? (yes/NO):
以下の出力の中間には2回のブランクがあり、そのうち1回目のブランクより前の出力は1回目の確認より前、2回目の前の出力は2回目の確認より前に対応する.

元master 192.168.244.10

170412 16:52:38    23 Connect    monitor@node4 on 
                   23 Query    set autocommit=1
                   23 Query    SELECT CONNECTION_ID() AS Value
170412 16:52:39    24 Connect    monitor@node4 on 
                   24 Query    set autocommit=1
                   24 Query    SELECT CONNECTION_ID() AS Value
                   24 Query    SET wait_timeout=86400
                   24 Query    SELECT @@global.server_id As Value
                   24 Query    SELECT VERSION() AS Value
                   24 Query    SELECT @@global.gtid_mode As Value
                   24 Query    SHOW GLOBAL VARIABLES LIKE 'log_bin'
                   24 Query    SHOW MASTER STATUS
                   24 Query    SELECT @@global.datadir AS Value
                   24 Query    SELECT @@global.slave_parallel_workers AS Value
                   24 Query    SHOW SLAVE STATUS
                   24 Query    SELECT @@global.read_only As Value
                   24 Query    SELECT @@global.relay_log_purge As Value

           
170412 16:54:06    24 Query    FLUSH NO_WRITE_TO_BINLOG TABLES
                   24 Query    SELECT GET_LOCK('MHA_Master_High_Availability_Monitor', '0') AS Value
                   24 Query    SHOW PROCESSLIST
                   

170412 16:55:51    24 Query    SHOW SLAVE STATUS
                   24 Query    CHANGE MASTER TO MASTER_HOST='dummy_host'
170412 16:55:52    24 Query    SHOW SLAVE STATUS
                   24 Query    RESET SLAVE /*!50516 ALL */
                   24 Query    SELECT RELEASE_LOCK('MHA_Master_High_Availability_Monitor') As Value
                   24 Quit    
                   25 Connect    monitor@node4 on 
                   25 Query    set autocommit=1
                   25 Query    SELECT CONNECTION_ID() AS Value
                   25 Query    SET sql_log_bin=0
                   25 Query    SHOW PROCESSLIST
                   25 Query    SELECT @@global.read_only As Value
                   25 Query    SET GLOBAL read_only=1
                   25 Query    SELECT @@global.read_only As Value
                   25 Query    SHOW PROCESSLIST
                   25 Query    SET sql_log_bin=1
                   25 Quit    
                   26 Connect    monitor@node4 on 
                   26 Query    set autocommit=1
                   26 Query    SELECT CONNECTION_ID() AS Value
                   26 Query    SET wait_timeout=86400
                   26 Query    FLUSH TABLES WITH READ LOCK
                   26 Query    SHOW MASTER STATUS
170412 16:55:53    26 Query    UNLOCK TABLES
                   26 Query    CHANGE MASTER TO MASTER_HOST = '192.168.244.20' MASTER_USER = 'repl' MASTER_PASSWORD =  MASTE
R_PORT = 3306 MASTER_LOG_FILE = 'mysql-bin.000010' MASTER_LOG_POS = 120           26 Query    SET GLOBAL relay_log_purge=0
                   26 Query    START SLAVE
                   27 Connect Out    repl@192.168.244.20:3306
                   26 Query    SHOW SLAVE STATUS
                   26 Query    SELECT RELEASE_LOCK('MHA_Master_High_Availability_Failover') As Value
                   26 Quit

新しいmaster 192.168.244.220

170412 16:52:38    23 Connect    monitor@node4 on 
                   23 Query    set autocommit=1
                   23 Query    SELECT CONNECTION_ID() AS Value
170412 16:52:39    24 Connect    monitor@node4 on 
                   24 Query    set autocommit=1
                   24 Query    SELECT CONNECTION_ID() AS Value
                   24 Query    SET wait_timeout=86400
                   24 Query    SELECT @@global.server_id As Value
                   24 Query    SELECT VERSION() AS Value
                   24 Query    SELECT @@global.gtid_mode As Value
                   24 Query    SHOW GLOBAL VARIABLES LIKE 'log_bin'
                   24 Query    SHOW MASTER STATUS
                   24 Query    SELECT @@global.datadir AS Value
                   24 Query    SELECT @@global.slave_parallel_workers AS Value
                   24 Query    SHOW SLAVE STATUS
                   24 Query    SELECT @@global.read_only As Value
                   24 Query    SELECT @@global.relay_log_purge As Value
                   24 Query    SELECT @@global.relay_log_info_repository AS Value
                   24 Query    SELECT @@global.datadir AS Value
                   24 Query    SELECT @@global.relay_log_info_file AS Value
                   24 Query    SHOW SLAVE STATUS
                   24 Query    SELECT Repl_slave_priv AS Value FROM mysql.user WHERE user = 'repl'

           
           
170412 16:54:06    24 Query    SELECT GET_LOCK('MHA_Master_High_Availability_Failover', '0') AS Value
                   24 Query    SHOW SLAVE STATUS
                   24 Query    SHOW SLAVE STATUS

           
170412 16:55:52    24 Query    SHOW PROCESSLIST
                   25 Connect    monitor@node4 on 
                   25 Query    set autocommit=1
                   25 Query    SELECT CONNECTION_ID() AS Value
                   25 Query    SELECT @@global.read_only As Value
                   25 Query    SELECT @@global.read_only As Value
                   25 Quit    
                   24 Query    SHOW SLAVE STATUS
                   24 Query    SELECT MASTER_POS_WAIT('mysql-bin.000017','120',0) AS Result
                   24 Query    STOP SLAVE SQL_THREAD
                   24 Query    SHOW SLAVE STATUS
                   24 Query    SHOW MASTER STATUS
                   26 Connect    monitor@node4 on 
                   26 Query    set autocommit=1
                   26 Query    SELECT CONNECTION_ID() AS Value
                   26 Query    SET sql_log_bin=0
                   26 Query    SELECT @@global.read_only As Value
                   26 Query    SET GLOBAL read_only=0
                   26 Query    SET sql_log_bin=1
                   26 Quit    
                   24 Query    SELECT @@global.read_only As Value
                   27 Connect    repl@node3 on 
                   27 Query    SELECT UNIX_TIMESTAMP()
                   27 Query    SHOW VARIABLES LIKE 'SERVER_ID'
                   27 Query    SET @master_heartbeat_period= 1799999979520
                   27 Query    SET @master_binlog_checksum= @@global.binlog_checksum
                   27 Query    SELECT @master_binlog_checksum
                   27 Query    SELECT @@GLOBAL.GTID_MODE
                   27 Query    SHOW VARIABLES LIKE 'SERVER_UUID'
                   27 Query    SET @slave_uuid= '8a1093c8-1d00-11e7-954f-000c299a5715'
                   27 Binlog Dump    Log: 'mysql-bin.000010'  Pos: 120
170412 16:55:53    28 Connect    repl@node1 on 
                   28 Query    SELECT UNIX_TIMESTAMP()
                   28 Query    SHOW VARIABLES LIKE 'SERVER_ID'
                   28 Query    SET @master_heartbeat_period= 1799999979520
                   28 Query    SET @master_binlog_checksum= @@global.binlog_checksum
                   28 Query    SELECT @master_binlog_checksum
                   28 Query    SELECT @@GLOBAL.GTID_MODE
                   28 Query    SHOW VARIABLES LIKE 'SERVER_UUID'
                   24 Query    STOP SLAVE
                   28 Query    SET @slave_uuid= '2a6365e0-1d05-11e7-956d-000c29c64704'
                   28 Binlog Dump    Log: 'mysql-bin.000010'  Pos: 120
                   24 Query    SHOW SLAVE STATUS
                   24 Query    RESET SLAVE /*!50516 ALL */
                   24 Query    SHOW SLAVE STATUS
                   24 Query    SELECT RELEASE_LOCK('MHA_Master_High_Availability_Failover') As Value
                   24 Quit

slave 192.168.244.30

170412 16:52:37    16 Connect    monitor@node4 on 
                   16 Query    set autocommit=1
                   16 Query    SELECT CONNECTION_ID() AS Value
170412 16:52:38    17 Connect    monitor@node4 on 
                   17 Query    set autocommit=1
                   17 Query    SELECT CONNECTION_ID() AS Value
                   17 Query    SET wait_timeout=86400
                   17 Query    SELECT @@global.server_id As Value
                   17 Query    SELECT VERSION() AS Value
                   17 Query    SELECT @@global.gtid_mode As Value
                   17 Query    SHOW GLOBAL VARIABLES LIKE 'log_bin'
                   17 Query    SHOW MASTER STATUS
                   17 Query    SELECT @@global.datadir AS Value
                   17 Query    SELECT @@global.slave_parallel_workers AS Value
                   17 Query    SHOW SLAVE STATUS
                   17 Query    SELECT @@global.read_only As Value
                   17 Query    SELECT @@global.relay_log_purge As Value
                   17 Query    SELECT @@global.relay_log_info_repository AS Value
                   17 Query    SELECT @@global.datadir AS Value
                   17 Query    SELECT @@global.relay_log_info_file AS Value
                   17 Query    SHOW SLAVE STATUS
                   17 Query    SELECT Repl_slave_priv AS Value FROM mysql.user WHERE user = 'repl'

           
170412 16:54:05    17 Query    SELECT GET_LOCK('MHA_Master_High_Availability_Failover', '0') AS Value
                   17 Query    SHOW SLAVE STATUS
                   17 Query    SHOW SLAVE STATUS
                   

170412 16:55:50    17 Query    SHOW SLAVE STATUS
170412 16:55:51    17 Query    SHOW SLAVE STATUS
                   17 Query    SELECT MASTER_POS_WAIT('mysql-bin.000017','120',0) AS Result
                   17 Query    STOP SLAVE SQL_THREAD
                   17 Query    SHOW SLAVE STATUS
                   17 Query    STOP SLAVE
                   17 Query    STOP SLAVE
                   17 Query    SHOW SLAVE STATUS
                   17 Query    RESET SLAVE
                   17 Query    CHANGE MASTER TO MASTER_HOST = '192.168.244.20' MASTER_USER = 'repl' MASTER_PASSWORD =  MASTE
R_PORT = 3306 MASTER_LOG_FILE = 'mysql-bin.000010' MASTER_LOG_POS = 120           17 Query    SET GLOBAL relay_log_purge=0
                   17 Query    START SLAVE
                   18 Connect Out    repl@192.168.244.20:3306
                   17 Query    SHOW SLAVE STATUS
170412 16:55:52    17 Query    SELECT RELEASE_LOCK('MHA_Master_High_Availability_Failover') As Value
                   17 Quit

リファレンス
『深入浅出MySQL』

Oracle一括変更データ-Mergeでの小結の使用

Excel sheetのカラム名からカラム番号への変換