Manual split brain recovery
DRBD detects split brain at the time connectivity becomes available again and the peer nodes exchange the initial DRBD protocol handshake. If DRBD detects that both nodes are (or were at some point, while disconnected) in the primary role, it immediately tears down the replication connection. The tell-tale sign of this is a message like the following appearing in the system log:
Split-Brain detected, dropping connection!
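One way to check for this condition is to search the kernel log for the message. The following is a minimal sketch, assuming the message text matches the line quoted above. The function name has_split_brain is hypothetical, and where the kernel log is read from (dmesg, the journal, a syslog file) depends on the system.

```shell
#!/bin/sh
# Hypothetical helper (not part of DRBD): reads log text on stdin and
# succeeds (exit status 0) if it contains the split brain message.
has_split_brain() {
    grep -q 'Split-Brain detected'
}
```

For example, `dmesg | has_split_brain && echo "split brain detected"` prints a notice only when the message is present in the kernel ring buffer.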
After split brain has been detected, one node will always have the resource in the StandAlone connection state. The other node might also be in the StandAlone state (if both nodes detected the split brain simultaneously), or in WFConnection (if the peer tore down the connection before the other node had a chance to detect split brain).
At this point, unless you configured DRBD to automatically recover from split brain, you must manually intervene by selecting one node whose modifications will be discarded (this node is referred to as the split brain victim). In the commands below, <resource> stands for the name of the affected resource.
The split brain victim needs to be in the StandAlone connection state, or the commands that follow will return an error. You can ensure it is StandAlone by issuing, on the victim:
drbdadm disconnect <resource>
Then, still on the victim, demote the resource and reconnect while discarding its local modifications:
drbdadm secondary <resource>
drbdadm connect --discard-my-data <resource>
On the other node (the split brain survivor), if its connection state is also StandAlone, you would enter:
drbdadm connect <resource>
You may omit this step if the node is already in the WFConnection state; it will then reconnect automatically.
If the resource affected by the split brain is a stacked resource, use drbdadm --stacked instead of just drbdadm.
Upon connection, your split brain victim immediately changes its connection state to SyncTarget, and has its modifications overwritten by the remaining primary node.
The split brain victim is not subjected to a full device synchronization. Instead, it has its local modifications rolled back, and any modifications made on the split brain survivor propagate to the victim.
After resynchronization has completed, the split brain is considered resolved and the two nodes form a fully consistent, redundant replicated storage system again.
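The recovery steps described above can be summarized as a small dry-run helper. This is a sketch under stated assumptions, not a supported tool: the function name recovery_commands, its argument order, and the resource name r0 in the usage note are all hypothetical. It only prints the drbdadm commands for one node so that an operator can review them before running anything, and it does not handle stacked resources.

```shell
#!/bin/sh
# Hypothetical dry-run helper: prints (does not execute) the drbdadm
# commands for one node in a manual split brain recovery.
#   $1 = role in the recovery: "victim" or "survivor"
#   $2 = resource name
#   $3 = current connection state of this node (StandAlone, WFConnection, ...)
recovery_commands() {
    role=$1
    res=$2
    cstate=$3
    case "$role" in
        victim)
            # Ensure StandAlone first, then demote and reconnect,
            # discarding this node's local modifications.
            echo "drbdadm disconnect $res"
            echo "drbdadm secondary $res"
            echo "drbdadm connect --discard-my-data $res"
            ;;
        survivor)
            # Only a StandAlone survivor needs an explicit connect;
            # a node in WFConnection reconnects automatically.
            if [ "$cstate" = "StandAlone" ]; then
                echo "drbdadm connect $res"
            fi
            ;;
        *)
            echo "unknown role: $role" >&2
            return 1
            ;;
    esac
}
```

Calling `recovery_commands victim r0 StandAlone` lists the three victim-side commands, while `recovery_commands survivor r0 WFConnection` prints nothing because no action is needed on that node. How the current connection state is obtained is left to the caller; on DRBD 8.x, `drbdadm cstate <resource>` is one option, but check this against the version in use.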