Redis Cluster Implementation (6): Disaster Tolerance and Failure Recovery
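When implementing a cluster, it is important to guarantee high availability so that the service keeps running despite software and hardware failures. There are generally two approaches. In the first, the nodes exchange data with and monitor one another, and cooperate to handle a failure when one occurs. In the second, a separate coordination component is added to monitor the cluster in real time and handle failures. The second approach is the more widely used today: the modules are loosely coupled, and it is simpler to engineer than the first.

The previous section introduced the raft protocol; with that background, sentinel should be easier to understand. sentinel in redis scans the nodes in real time and, when it finds a node that is down, carries out operations such as failover and leader election. Let's look at the concrete steps.

First we start a sentinel cluster with three nodes. To do that we need to edit the sentinel configuration file. The relevant settings are:

port: the port the sentinel runs on. Each of the three sentinels needs a different port, so this must be changed in every file.
dir: the sentinel's working directory at run time.
sentinel monitor: monitors a master under the given master name. We do not need to monitor the slaves explicitly; once a master is monitored, its slaves are added to sentinel automatically. The trailing quorum value is the minimum number of sentinels that must agree: a decision only counts once at least quorum sentinels agree on it.
sentinel down-after-milliseconds: how long a monitored node may go without replying before this sentinel marks it subjectively down. Once quorum sentinels consider the node down, it is marked objectively down.
sentinel parallel-syncs: the maximum number of slaves (numslaves) that may be reconfigured to synchronize with the new master at the same time during a failover.

The three files we modified are sentinel1.conf, sentinel2.conf and sentinel3.conf. Their contents are as follows.

sentinel1.conf: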
# Example sentinel.conf
# port <sentinelport>
# The port that this sentinel instance will run on
port 27000
# dir <working-directory>
# Every long running process should have a well-defined working directory.
# For Redis Sentinel to chdir to /tmp at startup is the simplest thing
# for the process to don't interfere with administrative tasks such as
# unmounting filesystems.
dir /tmp
# sentinel monitor <master-name> <ip> <redis-port> <quorum>
#
# Tells Sentinel to monitor this master, and to consider it in O_DOWN
# (Objectively Down) state only if at least <quorum> sentinels agree.
#
# Note that whatever is the ODOWN quorum, a Sentinel will require to
# be elected by the majority of the known Sentinels in order to
# start a failover, so no failover can be performed in minority.
#
# Slaves are auto-discovered, so you don't need to specify slaves in
# any way. Sentinel itself will rewrite this configuration file adding
# the slaves using additional configuration options.
# Also note that the configuration file is rewritten when a
# slave is promoted to master.
#
# Note: master name should not include special characters or spaces.
# The valid charset is A-z 0-9 and the three characters ".-_".
sentinel monitor master1 127.0.0.1 7000 2
sentinel monitor master2 127.0.0.1 7004 2
sentinel monitor master3 127.0.0.1 7005 2
# sentinel down-after-milliseconds <master-name> <milliseconds>
#
# Number of milliseconds the master (or any attached slave or sentinel) should
# be unreachable (as in, not acceptable reply to PING, continuously, for the
# specified period) in order to consider it in S_DOWN state (Subjectively
# Down).
#
# Default is 30 seconds.
sentinel down-after-milliseconds master1 30000
sentinel down-after-milliseconds master2 30000
sentinel down-after-milliseconds master3 30000
# sentinel parallel-syncs <master-name> <numslaves>
#
# How many slaves we can reconfigure to point to the new master simultaneously
# during the failover. Use a low number if you use the slaves to serve query
# to avoid that all the slaves will be unreachable at about the same
# time while performing the synchronization with the master.
sentinel parallel-syncs master1 1
sentinel parallel-syncs master2 1
sentinel parallel-syncs master3 1
# sentinel failover-timeout <master-name> <milliseconds>
#
# Specifies the failover timeout in milliseconds. It is used in many ways:
#
# - The time needed to re-start a failover after a previous failover was
# already tried against the same master by a given Sentinel, is two
# times the failover timeout.
#
# - The time needed for a slave replicating to a wrong master according
# to a Sentinel current configuration, to be forced to replicate
# with the right master, is exactly the failover timeout (counting since
# the moment a Sentinel detected the misconfiguration).
#
# - The time needed to cancel a failover that is already in progress but
# did not produced any configuration change (SLAVEOF NO ONE yet not
# acknowledged by the promoted slave).
#
# - The maximum time a failover in progress waits for all the slaves to be
# reconfigured as slaves of the new master. However even after this time
# the slaves will be reconfigured by the Sentinels anyway, but not with
# the exact parallel-syncs progression as specified.
#
# Default is 3 minutes.
sentinel failover-timeout master1 180000
sentinel failover-timeout master2 180000
sentinel failover-timeout master3 180000
sentinel2.conf:
# Example sentinel.conf
# port <sentinelport>
# The port that this sentinel instance will run on
port 27001
# dir <working-directory>
# Every long running process should have a well-defined working directory.
# For Redis Sentinel to chdir to /tmp at startup is the simplest thing
# for the process to don't interfere with administrative tasks such as
# unmounting filesystems.
dir /tmp
# sentinel monitor <master-name> <ip> <redis-port> <quorum>
#
# Tells Sentinel to monitor this master, and to consider it in O_DOWN
# (Objectively Down) state only if at least <quorum> sentinels agree.
#
# Note that whatever is the ODOWN quorum, a Sentinel will require to
# be elected by the majority of the known Sentinels in order to
# start a failover, so no failover can be performed in minority.
#
# Slaves are auto-discovered, so you don't need to specify slaves in
# any way. Sentinel itself will rewrite this configuration file adding
# the slaves using additional configuration options.
# Also note that the configuration file is rewritten when a
# slave is promoted to master.
#
# Note: master name should not include special characters or spaces.
# The valid charset is A-z 0-9 and the three characters ".-_".
sentinel monitor master1 127.0.0.1 7000 2
sentinel monitor master2 127.0.0.1 7004 2
sentinel monitor master3 127.0.0.1 7005 2
# sentinel down-after-milliseconds <master-name> <milliseconds>
#
# Number of milliseconds the master (or any attached slave or sentinel) should
# be unreachable (as in, not acceptable reply to PING, continuously, for the
# specified period) in order to consider it in S_DOWN state (Subjectively
# Down).
#
# Default is 30 seconds.
sentinel down-after-milliseconds master1 30000
sentinel down-after-milliseconds master2 30000
sentinel down-after-milliseconds master3 30000
# sentinel parallel-syncs <master-name> <numslaves>
#
# How many slaves we can reconfigure to point to the new master simultaneously
# during the failover. Use a low number if you use the slaves to serve query
# to avoid that all the slaves will be unreachable at about the same
# time while performing the synchronization with the master.
sentinel parallel-syncs master1 1
sentinel parallel-syncs master2 1
sentinel parallel-syncs master3 1
# sentinel failover-timeout <master-name> <milliseconds>
#
# Specifies the failover timeout in milliseconds. It is used in many ways:
#
# - The time needed to re-start a failover after a previous failover was
# already tried against the same master by a given Sentinel, is two
# times the failover timeout.
#
# - The time needed for a slave replicating to a wrong master according
# to a Sentinel current configuration, to be forced to replicate
# with the right master, is exactly the failover timeout (counting since
# the moment a Sentinel detected the misconfiguration).
#
# - The time needed to cancel a failover that is already in progress but
# did not produced any configuration change (SLAVEOF NO ONE yet not
# acknowledged by the promoted slave).
#
# - The maximum time a failover in progress waits for all the slaves to be
# reconfigured as slaves of the new master. However even after this time
# the slaves will be reconfigured by the Sentinels anyway, but not with
# the exact parallel-syncs progression as specified.
#
# Default is 3 minutes.
sentinel failover-timeout master1 180000
sentinel failover-timeout master2 180000
sentinel failover-timeout master3 180000
sentinel3.conf:
# Example sentinel.conf
# port <sentinelport>
# The port that this sentinel instance will run on
port 27002
# dir <working-directory>
# Every long running process should have a well-defined working directory.
# For Redis Sentinel to chdir to /tmp at startup is the simplest thing
# for the process to don't interfere with administrative tasks such as
# unmounting filesystems.
dir /tmp
# sentinel monitor <master-name> <ip> <redis-port> <quorum>
#
# Tells Sentinel to monitor this master, and to consider it in O_DOWN
# (Objectively Down) state only if at least <quorum> sentinels agree.
#
# Note that whatever is the ODOWN quorum, a Sentinel will require to
# be elected by the majority of the known Sentinels in order to
# start a failover, so no failover can be performed in minority.
#
# Slaves are auto-discovered, so you don't need to specify slaves in
# any way. Sentinel itself will rewrite this configuration file adding
# the slaves using additional configuration options.
# Also note that the configuration file is rewritten when a
# slave is promoted to master.
#
# Note: master name should not include special characters or spaces.
# The valid charset is A-z 0-9 and the three characters ".-_".
sentinel monitor master1 127.0.0.1 7000 2
sentinel monitor master2 127.0.0.1 7004 2
sentinel monitor master3 127.0.0.1 7005 2
# sentinel down-after-milliseconds <master-name> <milliseconds>
#
# Number of milliseconds the master (or any attached slave or sentinel) should
# be unreachable (as in, not acceptable reply to PING, continuously, for the
# specified period) in order to consider it in S_DOWN state (Subjectively
# Down).
#
# Default is 30 seconds.
sentinel down-after-milliseconds master1 30000
sentinel down-after-milliseconds master2 30000
sentinel down-after-milliseconds master3 30000
# sentinel parallel-syncs <master-name> <numslaves>
#
# How many slaves we can reconfigure to point to the new master simultaneously
# during the failover. Use a low number if you use the slaves to serve query
# to avoid that all the slaves will be unreachable at about the same
# time while performing the synchronization with the master.
sentinel parallel-syncs master1 1
sentinel parallel-syncs master2 1
sentinel parallel-syncs master3 1
# sentinel failover-timeout <master-name> <milliseconds>
#
# Specifies the failover timeout in milliseconds. It is used in many ways:
#
# - The time needed to re-start a failover after a previous failover was
# already tried against the same master by a given Sentinel, is two
# times the failover timeout.
#
# - The time needed for a slave replicating to a wrong master according
# to a Sentinel current configuration, to be forced to replicate
# with the right master, is exactly the failover timeout (counting since
# the moment a Sentinel detected the misconfiguration).
#
# - The time needed to cancel a failover that is already in progress but
# did not produced any configuration change (SLAVEOF NO ONE yet not
# acknowledged by the promoted slave).
#
# - The maximum time a failover in progress waits for all the slaves to be
# reconfigured as slaves of the new master. However even after this time
# the slaves will be reconfigured by the Sentinels anyway, but not with
# the exact parallel-syncs progression as specified.
#
# Default is 3 minutes.
sentinel failover-timeout master1 180000
sentinel failover-timeout master2 180000
sentinel failover-timeout master3 180000
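The three files above differ only in their port line, so they could also be generated from a single template rather than edited by hand. The following is a minimal Python sketch of that idea, not part of the original setup: the helper names render_sentinel_conf and write_sentinel_confs are hypothetical, the output goes to a temporary directory, and the values are simply copied from the files shown above.

```python
import tempfile
from pathlib import Path

# Masters to monitor, (name, ip, port), copied from the configs above.
MASTERS = [
    ("master1", "127.0.0.1", 7000),
    ("master2", "127.0.0.1", 7004),
    ("master3", "127.0.0.1", 7005),
]


def render_sentinel_conf(port, quorum=2, workdir="/tmp",
                         down_after_ms=30000, parallel_syncs=1,
                         failover_timeout_ms=180000):
    """Return the text of one sentinel config file for the given port."""
    lines = [f"port {port}", f"dir {workdir}"]
    for name, ip, master_port in MASTERS:
        # "sentinel monitor" comes first so the later options refer
        # to a master name that has already been declared.
        lines.append(f"sentinel monitor {name} {ip} {master_port} {quorum}")
        lines.append(f"sentinel down-after-milliseconds {name} {down_after_ms}")
        lines.append(f"sentinel parallel-syncs {name} {parallel_syncs}")
        lines.append(f"sentinel failover-timeout {name} {failover_timeout_ms}")
    return "\n".join(lines) + "\n"


def write_sentinel_confs(target_dir, ports=(27000, 27001, 27002)):
    """Write sentinel1.conf, sentinel2.conf, ... into target_dir."""
    paths = []
    for index, port in enumerate(ports, start=1):
        path = Path(target_dir) / f"sentinel{index}.conf"
        path.write_text(render_sentinel_conf(port))
        paths.append(path)
    return paths


if __name__ == "__main__":
    out_dir = tempfile.mkdtemp(prefix="sentinel-conf-")
    for conf in write_sentinel_confs(out_dir):
        print(conf)
```

Keeping the shared values in one place avoids the three files drifting apart, for example one sentinel ending up with a different quorum from the others.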
Then run:
redis-sentinel sentinel1.conf
redis-sentinel sentinel2.conf
redis-sentinel sentinel3.conf
This brings up a pseudo-cluster of three sentinels. Each one prints log output showing that it has identified the three masters, their slaves and the other sentinels:
56161:X 04 Dec 09:23:09.855 # Sentinel runid is 4dd7b82766f7faac95c251235682e42079e0a701
56161:X 04 Dec 09:23:09.855 # +monitor master master0 192.168.39.153 7000 quorum 2
56161:X 04 Dec 09:23:09.855 # +monitor master master2 192.168.39.153 7005 quorum 2
56161:X 04 Dec 09:23:09.856 # +monitor master master1 192.168.39.153 7004 quorum 2
56161:X 04 Dec 09:23:10.842 * +slave slave 192.168.39.153:7003 192.168.39.153 7003 @ master0 192.168.39.153 7000
56161:X 04 Dec 09:23:10.842 * +slave slave 192.168.39.153:7002 192.168.39.153 7002 @ master2 192.168.39.153 7005
56161:X 04 Dec 09:23:10.843 * +slave slave 192.168.39.153:7001 192.168.39.153 7001 @ master1 192.168.39.153 7004
56161:X 04 Dec 09:23:19.505 * +sentinel sentinel 192.168.39.153:27001 192.168.39.153 27001 @ master0 192.168.39.153 7000
56161:X 04 Dec 09:23:19.506 * +sentinel sentinel 192.168.39.153:27001 192.168.39.153 27001 @ master2 192.168.39.153 7005
56161:X 04 Dec 09:23:19.508 * +sentinel sentinel 192.168.39.153:27001 192.168.39.153 27001 @ master1 192.168.39.153 7004
56161:X 04 Dec 09:23:25.240 * +sentinel sentinel 192.168.39.153:27002 192.168.39.153 27002 @ master1 192.168.39.153 7004
56161:X 04 Dec 09:23:25.241 * +sentinel sentinel 192.168.39.153:27002 192.168.39.153 27002 @ master2 192.168.39.153 7005
56161:X 04 Dec 09:23:25.242 * +sentinel sentinel 192.168.39.153:27002 192.168.39.153 27002 @ master0 192.168.39.153 7000
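The log shows three sentinels in total (27000, 27001, 27002) and a quorum of 2 for every master. The config comments distinguish two thresholds: quorum decides when a master is objectively down, while starting a failover additionally requires election by a majority of the known sentinels. The sketch below only models those two rules as stated in the comments; the function names are hypothetical, and the real sentinel applies further conditions that are not covered here.

```python
def is_objectively_down(sdown_votes, quorum):
    """O_DOWN: at least `quorum` sentinels consider the master down."""
    return sdown_votes >= quorum


def majority(known_sentinels):
    """Votes needed to be elected: strictly more than half."""
    return known_sentinels // 2 + 1


def can_start_failover(sdown_votes, leader_votes, known_sentinels, quorum):
    """A failover needs O_DOWN plus a leader elected by the majority."""
    return (is_objectively_down(sdown_votes, quorum)
            and leader_votes >= majority(known_sentinels))


if __name__ == "__main__":
    # Three sentinels with quorum 2, as in the setup above.
    print(majority(3))
    print(can_start_failover(sdown_votes=2, leader_votes=2,
                             known_sentinels=3, quorum=2))
    # Only one sentinel left: it can neither reach quorum nor a majority.
    print(can_start_failover(sdown_votes=1, leader_votes=1,
                             known_sentinels=3, quorum=2))
```

With three sentinels and quorum 2 the two thresholds coincide, so this setup tolerates the loss of one sentinel.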
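Normally, taking one master offline would make the cluster unavailable. With sentinel in place, a failover is carried out once the master goes offline, and the cluster becomes available again within a short time.
Initially there are six nodes, three masters and three slaves, in the following state: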
127.0.0.1:7000> cluster info
cluster_state:ok
cluster_slots_assigned:16384
cluster_slots_ok:16384
cluster_slots_pfail:0
cluster_slots_fail:0
cluster_known_nodes:6
cluster_size:3
cluster_current_epoch:12
cluster_my_epoch:10
cluster_stats_messages_sent:2424257
cluster_stats_messages_received:2423717
127.0.0.1:7000> cluster nodes
930daea84150b5fabd32a95592781b27ceab1b71 192.168.39.153:7001 slave 81c884ebfc919ad293f02d797aff1033025ac27e 0 1480817793875 9 connected
8a6707d5b9269b6260315b47f300c1ab599733b7 192.168.39.153:7005 master - 0 1480817794879 11 connected 10923-16383
bdb62bb6ffce71588961f513c74b0d5a1a7145ea 192.168.39.153:7002 slave 8a6707d5b9269b6260315b47f300c1ab599733b7 0 1480817793372 11 connected
81c884ebfc919ad293f02d797aff1033025ac27e 192.168.39.153:7004 master - 0 1480817794378 9 connected 5461-10922
099cfc6fbb785449a8bf5369a53d21a9e127fa42 192.168.39.153:7000 myself,master - 0 0 10 connected 0-5460
a8081e97862d9cf76c72d364f9a173187376f215 192.168.39.153:7003 slave 099cfc6fbb785449a8bf5369a53d21a9e127fa42 0 1480817792868 10 connected
We manually send an INT signal (SIGINT) to terminate the process. The process list then confirms that the redis-server process on port 7004 has been killed:
ubuntu@ubuntu-virtual-machine:~/redis-3.0.0/src$ ps aux | grep redis
ubuntu 6067 0.0 0.4 33148 4080 ? Ss 11 27 0:00 SCREEN -S redis
ubuntu 7192 0.0 0.8 42300 8392 ? Ssl 11 27 7:22 redis-server *:7000 [cluster]
ubuntu 7196 0.0 1.0 42300 10632 ? Ssl 11 27 7:19 redis-server *:7001 [cluster]
ubuntu 7200 0.0 1.0 42300 10504 ? Ssl 11 27 7:21 redis-server *:7002 [cluster]
ubuntu 7205 0.0 1.0 42300 10524 ? Ssl 11 27 7:21 redis-server *:7003 [cluster]
ubuntu 7218 0.0 0.8 42300 8556 ? Ssl 11 27 7:21 redis-server *:7005 [cluster]
ubuntu 56036 0.0 0.3 31128 3232 pts/6 S+ 09:15 0:00 screen -r redis
ubuntu 56161 0.2 0.7 42304 7532 pts/25 Sl+ 09:23 0:10 redis-sentinel *:27000 [sentinel]
ubuntu 56176 0.2 0.7 42304 7444 pts/26 Sl+ 09:23 0:10 redis-sentinel *:27001 [sentinel]
ubuntu 56192 0.2 0.9 42304 9424 pts/27 Sl+ 09:23 0:10 redis-sentinel *:27002 [sentinel]
ubuntu 56536 0.0 0.2 15944 2396 pts/12 R+ 10:29 0:00 grep --color=auto redis
ubuntu@ubuntu-virtual-machine:~/redis-3.0.0/src$ redis-cli -p 7000
127.0.0.1:7000> cluster info
cluster_state:ok
cluster_slots_assigned:16384
cluster_slots_ok:16384
cluster_slots_pfail:0
cluster_slots_fail:0
cluster_known_nodes:6
cluster_size:3
cluster_current_epoch:13
cluster_my_epoch:10
cluster_stats_messages_sent:2433366
cluster_stats_messages_received:2433005
127.0.0.1:7000> cluster nodes
930daea84150b5fabd32a95592781b27ceab1b71 192.168.39.153:7001 master - 0 1480818606296 13 connected 5461-10922
8a6707d5b9269b6260315b47f300c1ab599733b7 192.168.39.153:7005 master - 0 1480818606797 11 connected 10923-16383
bdb62bb6ffce71588961f513c74b0d5a1a7145ea 192.168.39.153:7002 slave 8a6707d5b9269b6260315b47f300c1ab599733b7 0 1480818608306 11 connected
81c884ebfc919ad293f02d797aff1033025ac27e 192.168.39.153:7004 master,fail - 1480818583889 1480818583084 9 disconnected
099cfc6fbb785449a8bf5369a53d21a9e127fa42 192.168.39.153:7000 myself,master - 0 0 10 connected 0-5460
a8081e97862d9cf76c72d364f9a173187376f215 192.168.39.153:7003 slave 099cfc6fbb785449a8bf5369a53d21a9e127fa42 0 1480818607301 10 connected
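Reading this output by eye gets tedious with more nodes. As a rough sketch, the cluster nodes text can be split into fields to list the failed nodes and the current slot owners. The parser below assumes the redis 3.0 line layout seen above (id, address, flags, master id, ping sent, pong received, config epoch, link state, then slots); the function names are hypothetical and the two sample lines are copied from the output above.

```python
SAMPLE = """\
930daea84150b5fabd32a95592781b27ceab1b71 192.168.39.153:7001 master - 0 1480818606296 13 connected 5461-10922
81c884ebfc919ad293f02d797aff1033025ac27e 192.168.39.153:7004 master,fail - 1480818583889 1480818583084 9 disconnected
"""


def parse_cluster_nodes(text):
    """Parse `cluster nodes` output into a list of dictionaries."""
    nodes = []
    for line in text.strip().splitlines():
        fields = line.split()
        if len(fields) < 8:
            continue  # skip anything that is not a complete node line
        nodes.append({
            "id": fields[0],
            "addr": fields[1],
            "flags": fields[2].split(","),
            "master_id": None if fields[3] == "-" else fields[3],
            "link_state": fields[7],
            "slots": fields[8:],  # empty for slaves and failed masters
        })
    return nodes


def failed_nodes(nodes):
    """Addresses of nodes carrying the `fail` flag."""
    return [n["addr"] for n in nodes if "fail" in n["flags"]]


def slot_owners(nodes):
    """Map each master that currently serves slots to its slot ranges."""
    return {n["addr"]: n["slots"]
            for n in nodes if "master" in n["flags"] and n["slots"]}


if __name__ == "__main__":
    parsed = parse_cluster_nodes(SAMPLE)
    print(failed_nodes(parsed))
    print(slot_owners(parsed))
```

On the two sample lines this reports 192.168.39.153:7004 as failed and 192.168.39.153:7001 as the owner of slots 5461-10922, which matches the failover visible in the listing.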
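After the master node was stopped, the cluster handled the failover within a short time and became available again, and we can see that the former slave (7001) has become a master. In the next section we will look at how sentinel implements failover at the source-code level.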