Automatic failover for PostgreSQL streaming replication with corosync + pacemaker (Part 2)
5. Testing
5.1 Standby node failure
Kill the postgres processes on node2 to simulate a database crash on the standby node:
[root@node2 ~]# killall -9 postgres
Check the cluster status:
[root@node1 ~]# crm_mon -Afr -1
Last updated: Wed Jan 22 02:15:06 2014
Last change: Wed Jan 22 02:15:33 2014 via crm_attribute on node1
Stack: classic openais (with plugin)
Current DC: node1 - partition with quorum
Version: 1.1.10-14.el6_5.2-368c726
2 Nodes configured, 2 expected votes
7 Resources configured
Online: [ node1 node2 ]
Full list of resources:
vip-slave (ocf::heartbeat:IPaddr2): Started node1
Resource Group: master-group
vip-master (ocf::heartbeat:IPaddr2): Started node1
vip-rep (ocf::heartbeat:IPaddr2): Started node1
Master/Slave Set: msPostgresql [pgsql]
Masters: [ node1 ]
Stopped: [ node2 ]
Clone Set: clnPingCheck [pingCheck]
Started: [ node1 node2 ]
Node Attributes:
* Node node1:
+ default_ping_set : 100
+ master-pgsql : 1000
+ pgsql-data-status : LATEST
+ pgsql-master-baseline : 0000000006000078
+ pgsql-status : PRI
* Node node2:
+ default_ping_set : 100
+ master-pgsql : -INFINITY
+ pgsql-data-status : DISCONNECT
+ pgsql-status : STOP
Migration summary:
* Node node2:
pgsql: migration-threshold=1 fail-count=1 last-failure='Wed Jan 22 02:15:35 2014'
* Node node1:
Failed actions:
pgsql_monitor_7000 on node2 'not running' (7): call=42, status=complete, last-rc-change='Wed Jan 22 02:14:58 2014', queued=0ms, exec=0ms
{The vip-slave resource was switched over to node1}
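The switch can also be checked from a script rather than by reading the output. This is a minimal sketch. It assumes bash and awk, and it relies on the one-line `<name> (<agent>): Started <node>` format shown in the crm_mon output above. `resource_node` is a hypothetical helper name, not a Pacemaker command.

```bash
#!/usr/bin/env bash
# resource_node (hypothetical helper): read `crm_mon -Afr -1` output on stdin
# and print the node on which the named primitive is reported as "Started".
# Returns 1 if the resource is not listed as Started anywhere.
resource_node() {
    local rsc="$1"
    awk -v rsc="$rsc" '
        # e.g. " vip-slave (ocf::heartbeat:IPaddr2): Started node1"
        $1 == rsc && $3 == "Started" { print $4; found = 1; exit }
        END { exit (found ? 0 : 1) }
    '
}

# Usage on a cluster node:
#   crm_mon -Afr -1 | resource_node vip-slave
```

Because it only parses text, the helper depends on the crm_mon output format of this Pacemaker version. Other releases may lay the lines out differently.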
Restart corosync on node2; the database is started again along with it:
[root@node2 ~]# service corosync restart
[root@node1 ~]# crm_mon -Afr -1
Last updated: Wed Jan 22 02:16:24 2014
Last change: Wed Jan 22 02:16:55 2014 via crm_attribute on node1
Stack: classic openais (with plugin)
Current DC: node1 - partition with quorum
Version: 1.1.10-14.el6_5.2-368c726
2 Nodes configured, 2 expected votes
7 Resources configured
Online: [ node1 node2 ]
Full list of resources:
vip-slave (ocf::heartbeat:IPaddr2): Started node2
Resource Group: master-group
vip-master (ocf::heartbeat:IPaddr2): Started node1
vip-rep (ocf::heartbeat:IPaddr2): Started node1
Master/Slave Set: msPostgresql [pgsql]
Masters: [ node1 ]
Slaves: [ node2 ]
Clone Set: clnPingCheck [pingCheck]
Started: [ node1 node2 ]
Node Attributes:
* Node node1:
+ default_ping_set : 100
+ master-pgsql : 1000
+ pgsql-data-status : LATEST
+ pgsql-master-baseline : 0000000006000078
+ pgsql-status : PRI
* Node node2:
+ default_ping_set : 100
+ master-pgsql : 100
+ pgsql-data-status : STREAMING|SYNC
+ pgsql-status : HS:sync
Migration summary:
* Node node2:
* Node node1:
{vip-slave has moved back to node2}
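You can script the wait instead of re-running crm_mon by hand until node2 rejoins. This is a minimal sketch. It assumes the `Node Attributes:` layout shown above, where a `* Node <name>:` line is followed by `+ <attr> : <value>` lines. `node_attr` and `wait_for_attr` are hypothetical helpers. The status command is passed in as arguments so that it can be swapped out.

```bash
#!/usr/bin/env bash
# node_attr NODE ATTR (hypothetical helper): print the value of one node
# attribute from `crm_mon -Afr -1` output read on stdin.
node_attr() {
    awk -v node="$1" -v attr="$2" '
        # "* Node node2:" starts a node block
        $1 == "*" && $2 == "Node" { cur = $3; sub(/:$/, "", cur); next }
        # "+ pgsql-status : HS:sync" -> fields: "+", name, ":", value
        cur == node && $1 == "+" && $2 == attr { print $4; exit }
    '
}

# wait_for_attr NODE ATTR EXPECTED TIMEOUT_SECONDS STATUS_CMD...
# Polls STATUS_CMD about once per second until NODE's ATTR equals EXPECTED.
wait_for_attr() {
    local node="$1" attr="$2" expected="$3" timeout="$4" waited=0 value
    shift 4
    while :; do
        value=$("$@" | node_attr "$node" "$attr")
        if [ "$value" = "$expected" ]; then
            return 0
        fi
        if [ "$waited" -ge "$timeout" ]; then
            echo "timed out: $node $attr=${value:-<unset>}, expected $expected" >&2
            return 1
        fi
        sleep 1
        waited=$((waited + 1))
    done
}

# Usage on a cluster node, e.g. after restarting corosync on node2:
#   wait_for_attr node2 pgsql-status HS:sync 120 crm_mon -Afr -1
```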
5.2 Primary node failover
Kill the postgres processes on node1 to simulate a database crash on the primary node:
[root@node1 ~]# killall -9 postgres
Check the cluster status:
[root@node2 ~]# crm_mon -Afr -1
Last updated: Wed Jan 22 02:17:50 2014
Last change: Wed Jan 22 02:18:16 2014 via crm_attribute on node2
Stack: classic openais (with plugin)
Current DC: node1 - partition with quorum
Version: 1.1.10-14.el6_5.2-368c726
2 Nodes configured, 2 expected votes
7 Resources configured
Online: [ node1 node2 ]
Full list of resources:
vip-slave (ocf::heartbeat:IPaddr2): Started node2
Resource Group: master-group
vip-master (ocf::heartbeat:IPaddr2): Started node2
vip-rep (ocf::heartbeat:IPaddr2): Started node2
Master/Slave Set: msPostgresql [pgsql]
Masters: [ node2 ]
Stopped: [ node1 ]
Clone Set: clnPingCheck [pingCheck]
Started: [ node1 node2 ]
Node Attributes:
* Node node1:
+ default_ping_set : 100
+ master-pgsql : -INFINITY
+ pgsql-data-status : DISCONNECT
+ pgsql-status : STOP
* Node node2:
+ default_ping_set : 100
+ master-pgsql : 1000
+ pgsql-data-status : LATEST
+ pgsql-master-baseline : 0000000008014A70
+ pgsql-status : PRI
Migration summary:
* Node node2:
* Node node1:
pgsql: migration-threshold=1 fail-count=1 last-failure='Wed Jan 22 02:18:11 2014'
Failed actions:
pgsql_monitor_2000 on node1 'not running' (7): call=2435, status=complete, last-rc-change='Wed Jan 22 02:18:11 2014', queued=0ms, exec=0ms
{vip-master and vip-rep were switched over to node2, node2 became the master, and the status of its PostgreSQL database changed to PRI}
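The `Migration summary` block decides whether pgsql may run on node1 again. With `migration-threshold=1`, a single failure keeps the resource off that node until its fail-count is cleared. This is a minimal sketch that lists the non-zero fail-counts from that block. It assumes the line format shown above. `failed_resources` is a hypothetical helper name.

```bash
#!/usr/bin/env bash
# failed_resources (hypothetical helper): read `crm_mon -Afr -1` output on
# stdin and print "<node> <resource> <fail-count>" for every non-zero
# fail-count listed under "Migration summary:".
failed_resources() {
    awk '
        # Only the lines between "Migration summary:" and "Failed actions:"
        /Migration summary:/ { in_summary = 1; next }
        /Failed actions:/    { in_summary = 0; next }
        !in_summary          { next }
        # "* Node node1:" names the node for the lines that follow
        $1 == "*" && $2 == "Node" { node = $3; sub(/:$/, "", node); next }
        # "pgsql: migration-threshold=1 fail-count=1 last-failure=..."
        {
            for (i = 2; i <= NF; i++) {
                if ($i ~ /^fail-count=/) {
                    rsc = $1
                    sub(/:$/, "", rsc)
                    count = substr($i, length("fail-count=") + 1)
                    if (count != "0") print node, rsc, count
                }
            }
        }
    '
}

# Usage on a cluster node:
#   crm_mon -Afr -1 | failed_resources
```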
Stop corosync on node1:
[root@node1 ~]# service corosync stop
Take a fresh base backup from the new master to resynchronize node1:
[postgres@node1 data]$ pwd
/opt/pgsql/data
[postgres@node1 data]$ rm -rf *
[postgres@node1 data]$ pg_basebackup -h 192.168.1.3 -U postgres -D /opt/pgsql/data/ -P
19172/19172 kB (100%), 1/1 tablespace
NOTICE: pg_stop_backup complete, all required WAL segments have been archived
[postgres@node1 data]$ ls
backup_label base pg_clog pg_ident.conf pg_notify pg_stat_tmp pg_tblspc PG_VERSION postgresql.conf
backup_label.old global pg_hba.conf pg_multixact pg_serial pg_subtrans pg_twophase pg_xlog recovery.done
Start corosync on node1:
[root@node1 ~]# service corosync start
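The re-seeding steps above can be collected into one guarded script: stop corosync, empty the data directory, run pg_basebackup against the new master, then start corosync. This is a minimal sketch. It assumes bash, the data directory and master IP used in this article, and a `postgres` OS user. `rebuild_standby` and `run_cmd` are hypothetical names. By default (`DRY_RUN=1`) the script only prints the commands.

```bash
#!/usr/bin/env bash
# run_cmd: print the command when DRY_RUN=1 (the default), otherwise run it.
run_cmd() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# rebuild_standby (hypothetical): re-seed this node from the current master.
# PGDATA and MASTER_IP default to the values used in this article; assumes
# pg_basebackup is on the postgres user's PATH.
rebuild_standby() {
    local pgdata="${PGDATA:-/opt/pgsql/data}"
    local master_ip="${MASTER_IP:-192.168.1.3}"

    # Never run rm -rf on an empty path or on /.
    case "$pgdata" in
        ""|/) echo "refusing to wipe PGDATA='$pgdata'" >&2; return 1 ;;
    esac

    run_cmd service corosync stop || return 1
    # Equivalent to "rm -rf *" inside the data directory, run as postgres.
    run_cmd su - postgres -c "rm -rf $pgdata/*" || return 1
    run_cmd su - postgres -c "pg_basebackup -h $master_ip -U postgres -D $pgdata -P" || return 1
    run_cmd service corosync start
}

# Usage: review the printed plan first, then run it for real as root:
#   rebuild_standby
#   DRY_RUN=0 rebuild_standby
```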
5.3 Primary node recovery
Repair the original primary node and bring it back as the current standby node.
On node1, take a base backup:
[postgres@node1 data]$ pwd
/opt/pgsql/data
[postgres@node1 data]$ rm -rf *
[postgres@node1 data]$ pg_basebackup -h 192.168.2.3 -U postgres -D /opt/pgsql/data/ -P
19172/19172 kB (100%), 1/1 tablespace
NOTICE: pg_stop_backup complete, all required WAL segments have been archived
[postgres@node1 data]$ ls
backup_label base pg_clog pg_ident.conf pg_notify pg_stat_tmp pg_tblspc PG_VERSION postgresql.conf
backup_label.old global pg_hba.conf pg_multixact pg_serial pg_subtrans pg_twophase pg_xlog recovery.done
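Before starting Heartbeat, delete the resource lock file; otherwise the resources will not be started along with Heartbeat:
[root@node1 ~]# rm -rf /var/lib/pgsql/tmp/PGSQL.lock
{This lock file is created while a node is the primary. It is not removed automatically if heartbeat stops abnormally or if the database or the system terminates abnormally. So when restoring a node that has ever acted as the primary, the lock file must be cleaned up manually}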
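The lock has to be removed by hand, so a small guard helps avoid deleting it on a node where PostgreSQL is still running. This is a minimal sketch. It assumes `pgrep` is available and that the lock lives at the path used above; the pgsql agent's tmpdir may differ on other installations. `clear_pgsql_lock` is a hypothetical name.

```bash
#!/usr/bin/env bash
# clear_pgsql_lock [LOCKFILE] (hypothetical): remove the lock file the pgsql
# resource agent leaves behind on a former primary.
clear_pgsql_lock() {
    local lock="${1:-/var/lib/pgsql/tmp/PGSQL.lock}"
    # Do not touch the lock while a local PostgreSQL server is still running.
    if pgrep -x postgres >/dev/null 2>&1; then
        echo "postgres is running on this node; leaving $lock in place" >&2
        return 1
    fi
    if [ -e "$lock" ]; then
        rm -f -- "$lock" && echo "removed $lock"
    else
        echo "no lock file at $lock"
    fi
}

# Usage on the recovered node, before starting the cluster stack:
#   clear_pgsql_lock
```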
Restart heartbeat on node1:
[root@node1 ~]# service heartbeat restart
After a while, check the cluster status:
[root@node2 ~]# crm_mon -Afr1
============
Last updated: Mon Jan 27 08:50:43 2014
Stack: Heartbeat
Current DC: node2 (f2dcd1df-7429-42f5-82e9-b73921f97cab) - partition with quorum
Version: 1.0.12-unknown
2 Nodes configured, unknown expected votes
4 Resources configured.
============
Online: [ node1 node2 ]
Full list of resources:
vip-slave (ocf::heartbeat:IPaddr2): Started node1
Resource Group: master-group
vip-master (ocf::heartbeat:IPaddr2): Started node2
vip-rep (ocf::heartbeat:IPaddr2): Started node2
Master/Slave Set: msPostgresql
Masters: [ node2 ]
Slaves: [ node1 ]
Clone Set: clnPingCheck
Started: [ node1 node2 ]
Node Attributes:
* Node node1:
+ default_ping_set : 100
+ master-pgsql:0 : 100
+ pgsql-data-status : STREAMING|SYNC
+ pgsql-status : HS:sync
* Node node2:
+ default_ping_set : 100
+ master-pgsql:1 : 1000
+ pgsql-data-status : LATEST
+ pgsql-master-baseline : 00000000120000B0
+ pgsql-status : PRI
Migration summary:
* Node node1:
* Node node2:
{vip-slave was switched over to node1, and node1 is now a streaming replication standby}
6. Administration
6.1 Starting and stopping corosync
[root@node1 ~]# service corosync start
[root@node1 ~]# service corosync stop
6.2 Viewing the HA status
[root@node1 ~]# crm status
Last updated: Tue Jan 21 23:55:13 2014
Last change: Tue Jan 21 23:37:36 2014 via crm_attribute on node1
Stack: classic openais (with plugin)
Current DC: node1 - partition with quorum
Version: 1.1.10-14.el6_5.2-368c726
2 Nodes configured, 2 expected votes
7 Resources configured
Online: [ node1 node2 ]
vip-slave (ocf::heartbeat:IPaddr2): Started node2
Resource Group: master-group
vip-master (ocf::heartbeat:IPaddr2): Started node1
vip-rep (ocf::heartbeat:IPaddr2): Started node1
Master/Slave Set: msPostgresql [pgsql]
Masters: [ node1 ]
Slaves: [ node2 ]
Clone Set: clnPingCheck [pingCheck]
Started: [ node1 node2 ]
6.3 Viewing resource status and node attributes
[root@node1 ~]# crm_mon -Afr -1
Last updated: Tue Jan 21 23:37:20 2014
Last change: Tue Jan 21 23:37:36 2014 via crm_attribute on node1
Stack: classic openais (with plugin)
Current DC: node1 - partition with quorum
Version: 1.1.10-14.el6_5.2-368c726
2 Nodes configured, 2 expected votes
7 Resources configured
Online: [ node1 node2 ]
Full list of resources:
vip-slave (ocf::heartbeat:IPaddr2): Started node2
Resource Group: master-group
vip-master (ocf::heartbeat:IPaddr2): Started node1
vip-rep (ocf::heartbeat:IPaddr2): Started node1
Master/Slave Set: msPostgresql [pgsql]
Masters: [ node1 ]
Slaves: [ node2 ]
Clone Set: clnPingCheck [pingCheck]
Started: [ node1 node2 ]
Node Attributes:
* Node node1:
+ default_ping_set : 100
+ master-pgsql : 1000
+ pgsql-data-status : LATEST
+ pgsql-master-baseline : 0000000006000078
+ pgsql-status : PRI
* Node node2:
+ default_ping_set : 100
+ master-pgsql : 100
+ pgsql-data-status : STREAMING|SYNC
+ pgsql-status : HS:sync
Migration summary:
* Node node2:
* Node node1:
6.4 Viewing the configuration
[root@node1 ~]# crm configure show
node node1 \
attributes pgsql-data-status="LATEST"
node node2 \
attributes pgsql-data-status="STREAMING|SYNC"
primitive pgsql ocf:heartbeat:pgsql \
params pgctl="/opt/pgsql/bin/pg_ctl" psql="/opt/pgsql/bin/psql" pgdata="/opt/pgsql/data/" start_opt="-p 5432" rep_mode="sync" node_list="node1 node2" restore_command="cp /opt/archivelog/%f %p" primary_conninfo_opt="keepalives_idle=60 keepalives_interval=5 keepalives_count=5" master_ip="192.168.1.3" stop_escalate="0" \
op start timeout="60s" interval="0s" on-fail="restart" \
op monitor timeout="60s" interval="7s" on-fail="restart" \
op monitor timeout="60s" interval="2s" on-fail="restart" role="Master" \
op promote timeout="60s" interval="0s" on-fail="restart" \
op demote timeout="60s" interval="0s" on-fail="stop" \
……
……
6.5 Monitoring HA in real time
[root@node1 ~]# crm_mon -Afr
Last updated: Wed Jan 22 00:40:12 2014
Last change: Tue Jan 21 23:37:36 2014 via crm_attribute on node1
Stack: classic openais (with plugin)
Current DC: node1 - partition with quorum
Version: 1.1.10-14.el6_5.2-368c726
2 Nodes configured, 2 expected votes
7 Resources configured
Online: [ node1 node2 ]
Full list of resources:
vip-slave (ocf::heartbeat:IPaddr2): Started node2
Resource Group: master-group
vip-master (ocf::heartbeat:IPaddr2): Started node1
vip-rep (ocf::heartbeat:IPaddr2): Started node1
Master/Slave Set: msPostgresql [pgsql]
Masters: [ node1 ]
Slaves: [ node2 ]
Clone Set: clnPingCheck [pingCheck]
Started: [ node1 node2 ]
Node Attributes:
* Node node1:
+ default_ping_set : 100
+ master-pgsql : 1000
+ pgsql-data-status : LATEST
+ pgsql-master-baseline : 0000000006000078
+ pgsql-status : PRI
* Node node2:
+ default_ping_set : 100
+ master-pgsql : 100
+ pgsql-data-status : STREAMING|SYNC
+ pgsql-status : HS:sync
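Migration summary:
* Node node2:
* Node node1:
6.6 crm_resource commands
Start/stop a resource (by setting its target-role meta attribute):
[root@node1 ~]# crm_resource --resource vip-master --set-parameter target-role --meta --parameter-value Started
[root@node1 ~]# crm_resource --resource vip-master --set-parameter target-role --meta --parameter-value Stopped
List resources: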
[root@node1 ~]# crm_resource -L
vip-slave (ocf::heartbeat:IPaddr2): Started
Resource Group: master-group
vip-master (ocf::heartbeat:IPaddr2): Started
vip-rep (ocf::heartbeat:IPaddr2): Started
Master/Slave Set: msPostgresql [pgsql]
Masters: [ node1 ]
Slaves: [ node2 ]
Clone Set: clnPingCheck [pingCheck]
Started: [ node1 node2 ]
Show where a resource is running:
[root@node1 ~]# crm_resource -W -r pgsql
resource pgsql is running on: node2
Migrate a resource:
[root@node1 ~]# crm_resource -M -r vip-slave -N node2
Delete a resource:
[root@node1 ~]# crm_resource -D -r vip-slave -t primitive
6.7 crm commands
List the resource agents of a given class and provider:
[root@node1 ~]# crm ra list ocf pacemaker
ClusterMon Dummy HealthCPU HealthSMART Stateful SysInfo SystemHealth controld ping pingd
remote
Delete a node:
[root@node1 ~]# crm node delete node2
Put a node into standby:
[root@node1 ~]# crm node standby node2
Bring a node back online:
[root@node1 ~]# crm node online node2
Configure pacemaker interactively:
[root@node1 ~]# crm configure
crm(live)configure#
……
……
crm(live)configure# commit
crm(live)configure# quit
6.8 Resetting the failcount
[root@node1 ~]# crm resource
crm(live)resource# failcount pgsql set node1 0
crm(live)resource# failcount pgsql show node1
scope=status name=fail-count-pgsql value=0
[root@node1 ~]# crm resource cleanup pgsql
Cleaning up pgsql:0 on node1
Waiting for 1 replies from the CRMd. OK
[root@node1 ~]# crm_failcount -G -U node1 -r pgsql
scope=status name=fail-count-pgsql value=INFINITY
[root@node1 ~]# crm_failcount -D -U node1 -r pgsql
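When several nodes have failed, as in 5.1 and 5.2, the same crm_failcount and cleanup steps have to be repeated for each node. This is a minimal sketch. It assumes the crm_failcount options used above (`-D`, `-U`, `-r`). `reset_failcounts` is a hypothetical helper. It only prints the commands unless `DRY_RUN=0` is set.

```bash
#!/usr/bin/env bash
# reset_failcounts RESOURCE NODE... (hypothetical): delete the fail-count of
# RESOURCE on each NODE with crm_failcount, then run "crm resource cleanup".
# DRY_RUN=1 (the default) only prints the commands.
reset_failcounts() {
    local rsc="$1" node
    shift
    for node in "$@"; do
        if [ "${DRY_RUN:-1}" = "1" ]; then
            echo "crm_failcount -D -U $node -r $rsc"
        else
            crm_failcount -D -U "$node" -r "$rsc" || return 1
        fi
    done
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "crm resource cleanup $rsc"
    else
        crm resource cleanup "$rsc"
    fi
}

# Usage:
#   DRY_RUN=0 reset_failcounts pgsql node1 node2
```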
7. Problem log
7.1 Q1
Problem:
corosync.log showed the following errors:
Jan 15 10:23:57 node1 lrmd: [6327]: info: RA output: (pgsql:0:monitor:stderr)/usr/lib/ocf/resource.d//heartbeat/pgsql: line 1749: ocf_local_nodename: command not found
Jan 15 10:23:57 node1 crm_attribute: [11094]: info: Invoked:/usr/sbin/crm_attribute -l reboot -N -n -v 0000000006000090 pgsql-xlog-loc lrm_get_rsc_type_metadata(578)
Jan 15 10:23:57 node1 lrmd: [6327]: info: RA output: (pgsql:0:monitor:stderr) Could not map uname=-n to a UUID: The object/attribute does not exist
Solution:
The pgsql script calls ocf_local_nodename, a function that should be defined in ocf-shellfuncs.in, but the installed ocf-shellfuncs.in does not define it. A related forum thread:
http://www.gossamer-threads.com/lists/linuxha/users/89379?do=post_view_threaded
According to that thread, a patch that adds the ocf_local_nodename function is required:
https://github.com/ClusterLabs/resource-agents/commit/abc1c3f6464f6e5e7a1e41cd7c9b8179896c1903
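Since the latest version does not provide the ocf_local_nodename function, the following version is used instead:
{Note: the Pacemaker version here is > 1.1.8; confirm whether the `crm_node -n` command is usable}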
https://github.com/ClusterLabs/resource-agents/blob/a6f4ddf76cb4bbc1b3df4c9b6632a6351b63c19e/heartbeat/pgsql
https://github.com/ClusterLabs/resource-agents/tree/abc1c3f6464f6e5e7a1e41cd7c9b8179896c1903/heartbeat
pgsql script that does not use the ocf_local_nodename function:
https://raw.github.com/ClusterLabs/resource-agents/a6f4ddf76cb4bbc1b3df4c9b6632a6351b63c19e/heartbeat/pgsql
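The idea behind that patch is to ask Pacemaker for the node name when Pacemaker is new enough, and to fall back to `uname -n` otherwise. The following is a simplified sketch of such a function. It is not the upstream code. It assumes that `pacemakerd --version` prints a line such as `Pacemaker 1.1.10...` and that GNU `sort -V` is available, and it uses the 1.1.8 threshold from the note above. `version_gt` is a hypothetical helper.

```bash
#!/usr/bin/env bash
# version_gt A B (hypothetical helper): true if version A sorts strictly
# after version B in GNU "sort -V" ordering.
version_gt() {
    [ "$1" != "$2" ] && [ "$(printf '%s\n' "$1" "$2" | sort -V | tail -n 1)" = "$1" ]
}

# Simplified stand-in for ocf_local_nodename (assumption, not the upstream
# patch): use "crm_node -n" when Pacemaker > 1.1.8 and crm_node exists,
# otherwise fall back to "uname -n".
ocf_local_nodename() {
    local version name
    if command -v pacemakerd >/dev/null 2>&1 && command -v crm_node >/dev/null 2>&1; then
        # e.g. "Pacemaker 1.1.10-14.el6_5.2" -> "1.1.10"
        version=$(pacemakerd --version 2>/dev/null | awk '/^Pacemaker /{ print $2; exit }')
        version=${version%%-*}
        if [ -n "$version" ] && version_gt "$version" 1.1.8; then
            name=$(crm_node -n 2>/dev/null)
            if [ -n "$name" ]; then
                echo "$name"
                return 0
            fi
        fi
    fi
    uname -n
}
```

The fallback to `uname -n` matters here because the error log above shows crm_attribute being invoked with an empty node name (`-N -n`) when the lookup fails.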
7.2 Q2
Problem:
[root@node1 ~]# crm configure load update pgsql.crm
WARNING: pingCheck: specified timeout 60s for start is smaller than the advised 90
WARNING: pingCheck: specified timeout 60s for stop is smaller than the advised 100
WARNING: pgsql: specified timeout 60s for stop is smaller than the advised 120
WARNING: pgsql: specified timeout 60s for start is smaller than the advised 120
WARNING: pgsql: specified timeout 60s for notify is smaller than the advised 90
WARNING: pgsql: specified timeout 60s for demote is smaller than the advised 120
WARNING: pgsql: specified timeout 60s for promote is smaller than the advised 120
ERROR: master-group: attribute ordered does not exist
Do you still want to commit? no
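Solution:
The error message says that the defined master-group has no attribute named ordered.
(1) The problem comes from the Pacemaker version: pacemaker-1.1 does not support the ordered and colocated attributes. I tried to work around it by replacing the current cibconfig.py with the version from 1.0, as shown below, but this failed:
[root@node1 ~]# vim /usr/lib64/python2.6/site-packages/crmsh/cibconfig.py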
[root@node1 ~]# cd /usr/lib64/python2.6/site-packages/crmsh/
[root@node1 crmsh]# mv cibconfig.py cibconfig.py.bak
[root@node1 crmsh]# wget https://github.com/ClusterLabs/pacemaker-1.0/blob/fa1a99ab36e0ed015f1bcbbb28f7db962a9d1abc/shell/modules/cibconfig.py
(2) Removing the ordered definition from the configuration script (this worked):
group master-group \
vip-master \
vip-rep \
meta \
ordered="false"
After the change:
group master-group \
vip-master \
vip-rep
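Before running `crm configure load update`, you can scan the configuration file for these settings. This is a minimal sketch built on `grep`. `find_group_meta` is a hypothetical helper. Note that the check also matches a clone's `ordered` meta attribute, which is a legitimate clone option, so review each hit rather than deleting it blindly.

```bash
#!/usr/bin/env bash
# find_group_meta FILE (hypothetical): print "line:text" for every ordered= or
# colocated= setting in a crm configuration file. Exit status 1 means none.
find_group_meta() {
    grep -nE '(^|[[:space:]])(ordered|colocated)=' "$1"
}

# Usage:
#   find_group_meta pgsql.crm
```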
7.3 Q3
Problem:
Pacemaker installation error:
# yum install pacemaker*
……
--> Processing Dependency: libesmtp.so.5()(64bit) for package: pacemaker
--> Finished Dependency Resolution
pacemaker-1.0.12-1.el5.centos.i386 from clusterlabs has depsolving problems
--> Missing Dependency: libesmtp.so.5 is needed by package pacemaker-1.0.12-1.el5.centos.i386 (clusterlabs)
pacemaker-1.0.12-1.el5.centos.x86_64 from clusterlabs has depsolving problems
--> Missing Dependency: libesmtp.so.5()(64bit) is needed by package pacemaker-1.0.12-1.el5.centos.x86_64 (clusterlabs)
Error: Missing Dependency: libesmtp.so.5 is needed by package pacemaker-1.0.12-1.el5.centos.i386 (clusterlabs)
Error: Missing Dependency: libesmtp.so.5()(64bit) is needed by package pacemaker-1.0.12-1.el5.centos.x86_64 (clusterlabs)
You could try using --skip-broken to work around the problem
You could try running: package-cleanup --problems
package-cleanup --dupes
rpm -Va --nofiles --nodigest
The program package-cleanup is found in the yum-utils package.
Solution:
The message says libesmtp is missing, so installing it fixes the problem:
# wget ftp://ftp.univie.ac.at/systems/linux/fedora/epel/5/x86_64/libesmtp-1.0.4-5.el5.x86_64.rpm
# wget ftp://ftp.univie.ac.at/systems/linux/fedora/epel/5/i386/libesmtp-1.0.4-5.el5.i386.rpm
# rpm -ivh libesmtp-1.0.4-5.el5.x86_64.rpm
# rpm -ivh libesmtp-1.0.4-5.el5.i386.rpm
7.4 Q4
Problem:
Error when loading the crm configuration:
[root@node1 ~]# crm configure load update pgsql.crm
ERROR: pgsql: parameter rep_mode does not exist
ERROR: pgsql: parameter node_list does not exist
ERROR: pgsql: parameter master_ip does not exist
ERROR: pgsql: parameter restore_command does not exist
ERROR: pgsql: parameter primary_conninfo_opt does not exist
WARNING: pgsql: specified timeout 60s for stop is smaller than the advised 120
WARNING: pgsql: action monitor_Master not advertised in meta-data, it may not be supported by the RA
WARNING: pgsql: specified timeout 60s for start is smaller than the advised 120
WARNING: pgsql: action notify not advertised in meta-data, it may not be supported by the RA
WARNING: pgsql: action demote not advertised in meta-data, it may not be supported by the RA
WARNING: pgsql: action promote not advertised in meta-data, it may not be supported by the RA
WARNING: pingCheck: specified timeout 60s for start is smaller than the advised 90
WARNING: pingCheck: specified timeout 60s for stop is smaller than the advised 100
Do you still want to commit? no
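Solution:
The parameters are reported as nonexistent because the installed pgsql script is too old. Replace it, together with ocf-shellfuncs.in: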
scp pgsql [email protected]:/usr/lib/ocf/resource.d/heartbeat/
scp ocf-shellfuncs.in [email protected]:/usr/lib/ocf/lib/heartbeat/
scp pgsql [email protected]:/usr/lib/ocf/resource.d/heartbeat/
scp ocf-shellfuncs.in [email protected]:/usr/lib/ocf/lib/heartbeat/
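Before copying files around, you can check whether the installed agent already knows the parameters that the configuration uses. This is a minimal sketch. It assumes the agent embeds its meta-data as `<parameter name="...">` elements, as the pgsql agent does. `ra_has_params` is a hypothetical helper. Running the agent's own `meta-data` action would be the more formal check.

```bash
#!/usr/bin/env bash
# ra_has_params RA_SCRIPT PARAM... (hypothetical): check that a resource
# agent's embedded meta-data declares every PARAM as <parameter name="...">.
# Prints each missing parameter to stderr; exit status 1 if any is missing.
ra_has_params() {
    local ra="$1" p missing=0
    shift
    for p in "$@"; do
        if ! grep -q "<parameter name=\"$p\"" "$ra"; then
            echo "missing parameter: $p" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Usage:
#   ra_has_params /usr/lib/ocf/resource.d/heartbeat/pgsql \
#       rep_mode node_list master_ip restore_command primary_conninfo_opt
```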
7.5 Q5
Problem:
[root@node1 ~]# crm_mon -Afr -1
Last updated: Tue Jan 21 05:10:56 2014
Last change: Tue Jan 21 05:10:08 2014 via cibadmin on node1
Stack: classic openais (with plugin)
Current DC: node1 - partition with quorum
Version: 1.1.10-14.el6_5.2-368c726
2 Nodes configured, 2 expected votes
7 Resources configured
Online: [ node1 node2 ]
Full list of resources:
vip-slave (ocf::heartbeat:IPaddr2): Stopped
Resource Group: master-group
vip-master (ocf::heartbeat:IPaddr2): Stopped
vip-rep (ocf::heartbeat:IPaddr2): Stopped
Master/Slave Set: msPostgresql [pgsql]
Stopped: [ node1 node2 ]
Clone Set: clnPingCheck [pingCheck]
Stopped: [ node1 node2 ]
Node Attributes:
* Node node1:
* Node node2:
Migration summary:
* Node node1:
* Node node2:
Failed actions:
pingCheck_monitor_0 on node1 'invalid parameter' (2): call=23, status=complete, last-rc-change='Tue Jan 21 05:10:10 2014', queued=200ms, exec=0ms
pingCheck_monitor_0 on node2 'invalid parameter' (2): call=23, status=complete, last-rc-change='Tue Jan 21 05:09:36 2014', queued=281ms, exec=0ms
Solution:
This error occurs because the pingd agent called by the pingCheck resource in the configuration script received a parameter it does not know: ocf:pacemaker:pingd turned out to have no multiplier parameter.
primitive pingCheck ocf:pacemaker:pingd \
params \
name="default_ping_set" \
host_list="192.168.100.1" \
multiplier="100" \
op start timeout="60s" interval="0s" on-fail="restart" \
op monitor timeout="60s" interval="10s" on-fail="restart" \
op stop timeout="60s" interval="0s" on-fail="ignore"
The resource was therefore changed to call ocf:heartbeat:pingd instead.
7.6 Q6
Problem:
corosync logged the following errors:
Jan 21 04:36:02 corosync [TOTEM ] Received message has invalid digest... ignoring.
Jan 21 04:36:02 corosync [TOTEM ] Invalid packet data
Solution:
This means another cluster on the same network is using the same multicast address. Changing the multicast address fixes it.
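To change the address, edit the `mcastaddr` setting in the totem interface section of corosync.conf on every node, then restart corosync. This is a minimal sketch. It assumes GNU sed and that you choose an unused multicast address yourself. `set_mcastaddr` is a hypothetical helper.

```bash
#!/usr/bin/env bash
# set_mcastaddr CONF ADDRESS (hypothetical): rewrite every "mcastaddr:" line
# in corosync.conf to ADDRESS, keeping a backup in CONF.bak (GNU sed assumed).
set_mcastaddr() {
    local conf="$1" addr="$2"
    cp -p -- "$conf" "$conf.bak" || return 1
    sed -i -E "s/^([[:space:]]*mcastaddr:[[:space:]]*).*/\1$addr/" "$conf"
}

# Usage on each node (239.255.42.1 is only an example address):
#   set_mcastaddr /etc/corosync/corosync.conf 239.255.42.1
#   service corosync restart
```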
8. References
Script:
https://github.com/ClusterLabs/resource-agents/blob/master/heartbeat/pgsql
Script usage:
https://github.com/t-matsuo/resource-agents/wiki/Resource-Agent-for-PostgreSQL-9.1-streaming-replication
crm_resource command:
http://www.novell.com/zh-cn/documentation/sle_ha/book_sleha/data/man_crmresource.html
crm_failcount command:
http://www.novell.com/zh-cn/documentation/sle_ha/book_sleha/data/man_crmfailcount.html
[root@node2 ~]# killall -9 postgres
[root@node1 ~]# crm_mon -Afr -1
Last updated: Wed Jan 22 02:15:06 2014
Last change: Wed Jan 22 02:15:33 2014 via crm_attribute on node1
Stack: classic openais (with plugin)
Current DC: node1 - partition with quorum
Version: 1.1.10-14.el6_5.2-368c726
2 Nodes configured, 2 expected votes
7 Resources configured
Online: [ node1 node2 ]
Full list of resources:
vip-slave (ocf::heartbeat:IPaddr2): Started node1
Resource Group: master-group
vip-master (ocf::heartbeat:IPaddr2): Started node1
vip-rep (ocf::heartbeat:IPaddr2): Started node1
Master/Slave Set: msPostgresql [pgsql]
Masters: [ node1 ]
Stopped: [ node2 ]
Clone Set: clnPingCheck [pingCheck]
Started: [ node1 node2 ]
Node Attributes:
* Node node1:
+ default_ping_set : 100
+ master-pgsql : 1000
+ pgsql-data-status : LATEST
+ pgsql-master-baseline : 0000000006000078
+ pgsql-status : PRI
* Node node2:
+ default_ping_set : 100
+ master-pgsql : -INFINITY
+ pgsql-data-status : DISCONNECT
+ pgsql-status : STOP
Migration summary:
* Node node2:
pgsql: migration-threshold=1 fail-count=1 last-failure='Wed Jan 22 02:15:35 2014'
* Node node1:
Failed actions:
pgsql_monitor_7000 on node2 'not running' (7): call=42, status=complete, last-rc-change='Wed Jan 22 02:14:58 2014', queued=0ms, exec=0ms
[root@node2 ~]# service corosync restart
[root@node1 ~]# crm_mon -Afr -1
Last updated: Wed Jan 22 02:16:24 2014
Last change: Wed Jan 22 02:16:55 2014 via crm_attribute on node1
Stack: classic openais (with plugin)
Current DC: node1 - partition with quorum
Version: 1.1.10-14.el6_5.2-368c726
2 Nodes configured, 2 expected votes
7 Resources configured
Online: [ node1 node2 ]
Full list of resources:
vip-slave (ocf::heartbeat:IPaddr2): Started node2
Resource Group: master-group
vip-master (ocf::heartbeat:IPaddr2): Started node1
vip-rep (ocf::heartbeat:IPaddr2): Started node1
Master/Slave Set: msPostgresql [pgsql]
Masters: [ node1 ]
Slaves: [ node2 ]
Clone Set: clnPingCheck [pingCheck]
Started: [ node1 node2 ]
Node Attributes:
* Node node1:
+ default_ping_set : 100
+ master-pgsql : 1000
+ pgsql-data-status : LATEST
+ pgsql-master-baseline : 0000000006000078
+ pgsql-status : PRI
* Node node2:
+ default_ping_set : 100
+ master-pgsql : 100
+ pgsql-data-status : STREAMING|SYNC
+ pgsql-status : HS:sync
Migration summary:
* Node node2:
* Node node1:
[root@node1 ~]# killall -9 postgres
[root@node2 ~]# crm_mon -Afr -1
Last updated: Wed Jan 22 02:17:50 2014
Last change: Wed Jan 22 02:18:16 2014 via crm_attribute on node2
Stack: classic openais (with plugin)
Current DC: node1 - partition with quorum
Version: 1.1.10-14.el6_5.2-368c726
2 Nodes configured, 2 expected votes
7 Resources configured
Online: [ node1 node2 ]
Full list of resources:
vip-slave (ocf::heartbeat:IPaddr2): Started node2
Resource Group: master-group
vip-master (ocf::heartbeat:IPaddr2): Started node2
vip-rep (ocf::heartbeat:IPaddr2): Started node2
Master/Slave Set: msPostgresql [pgsql]
Masters: [ node2 ]
Stopped: [ node1 ]
Clone Set: clnPingCheck [pingCheck]
Started: [ node1 node2 ]
Node Attributes:
* Node node1:
+ default_ping_set : 100
+ master-pgsql : -INFINITY
+ pgsql-data-status : DISCONNECT
+ pgsql-status : STOP
* Node node2:
+ default_ping_set : 100
+ master-pgsql : 1000
+ pgsql-data-status : LATEST
+ pgsql-master-baseline : 0000000008014A70
+ pgsql-status : PRI
Migration summary:
* Node node2:
* Node node1:
pgsql: migration-threshold=1 fail-count=1 last-failure='Wed Jan 22 02:18:11 2014'
Failed actions:
pgsql_monitor_2000 on node1 'not running' (7): call=2435, status=complete, last-rc-change='Wed Jan 22 02:18:11 2014', queued=0ms, exec=0ms
[root@node1 ~]# service corosync stop
[postgres@node1 data]$ pwd
/opt/pgsql/data
[postgres@node1 data]$ rm -rf *
[postgres@node1 data]$ pg_basebackup -h 192.168.1.3 -U postgres -D /opt/pgsql/data/ -P
19172/19172 kB (100%), 1/1 tablespace
NOTICE: pg_stop_backup complete, all required WAL segments have been archived
[postgres@node1 data]$ ls
backup_label base pg_clog pg_ident.conf pg_notify pg_stat_tmp pg_tblspc PG_VERSION postgresql.conf
backup_label.old global pg_hba.conf pg_multixact pg_serial pg_subtrans pg_twophase pg_xlog recovery.done
[root@node1 ~]# service corosync start
[postgres@node1 data]$ pwd
/opt/pgsql/data
[postgres@node1 data]$ rm -rf *
[postgres@node1 data]$ pg_basebackup -h 192.168.2.3 -U postgres -D /opt/pgsql/data/ -P
19172/19172 kB (100%), 1/1 tablespace
NOTICE: pg_stop_backup complete, all required WAL segments have been archived
[postgres@node1 data]$ ls
backup_label base pg_clog pg_ident.conf pg_notify pg_stat_tmp pg_tblspc PG_VERSION postgresql.conf
backup_label.old global pg_hba.conf pg_multixact pg_serial pg_subtrans pg_twophase pg_xlog recovery.done
[root@node1 ~]# rm -rf /var/lib/pgsql/tmp/PGSQL.lock
[root@node1 ~]# service heartbeat restart
[root@node2 ~]# crm_mon -Afr1
============
Last updated: Mon Jan 27 08:50:43 2014
Stack: Heartbeat
Current DC: node2 (f2dcd1df-7429-42f5-82e9-b73921f97cab) - partition with quorum
Version: 1.0.12-unknown
2 Nodes configured, unknown expected votes
4 Resources configured.
============
Online: [ node1 node2 ]
Full list of resources:
vip-slave (ocf::heartbeat:IPaddr2): Started node1
Resource Group: master-group
vip-master (ocf::heartbeat:IPaddr2): Started node2
vip-rep (ocf::heartbeat:IPaddr2): Started node2
Master/Slave Set: msPostgresql
Masters: [ node2 ]
Slaves: [ node1 ]
Clone Set: clnPingCheck
Started: [ node1 node2 ]
Node Attributes:
* Node node1:
+ default_ping_set : 100
+ master-pgsql:0 : 100
+ pgsql-data-status : STREAMING|SYNC
+ pgsql-status : HS:sync
* Node node2:
+ default_ping_set : 100
+ master-pgsql:1 : 1000
+ pgsql-data-status : LATEST
+ pgsql-master-baseline : 00000000120000B0
+ pgsql-status : PRI
Migration summary:
* Node node1:
* Node node2:
6.1 corosyncの起動と停止
[root@node1 ~]# service corosync start
[root@node1 ~]# service corosync stop
6.2 HAステータスの表示
[root@node1 ~]# crm status
Last updated: Tue Jan 21 23:55:13 2014
Last change: Tue Jan 21 23:37:36 2014 via crm_attribute on node1
Stack: classic openais (with plugin)
Current DC: node1 - partition with quorum
Version: 1.1.10-14.el6_5.2-368c726
2 Nodes configured, 2 expected votes
7 Resources configured
Online: [ node1 node2 ]
vip-slave (ocf::heartbeat:IPaddr2): Started node2
Resource Group: master-group
vip-master (ocf::heartbeat:IPaddr2): Started node1
vip-rep (ocf::heartbeat:IPaddr2): Started node1
Master/Slave Set: msPostgresql [pgsql]
Masters: [ node1 ]
Slaves: [ node2 ]
Clone Set: clnPingCheck [pingCheck]
Started: [ node1 node2 ]
6.3リソースステータスおよびノード属性の表示
[root@node1 ~]# crm_mon -Afr -1
Last updated: Tue Jan 21 23:37:20 2014
Last change: Tue Jan 21 23:37:36 2014 via crm_attribute on node1
Stack: classic openais (with plugin)
Current DC: node1 - partition with quorum
Version: 1.1.10-14.el6_5.2-368c726
2 Nodes configured, 2 expected votes
7 Resources configured
Online: [ node1 node2 ]
Full list of resources:
vip-slave (ocf::heartbeat:IPaddr2): Started node2
Resource Group: master-group
vip-master (ocf::heartbeat:IPaddr2): Started node1
vip-rep (ocf::heartbeat:IPaddr2): Started node1
Master/Slave Set: msPostgresql [pgsql]
Masters: [ node1 ]
Slaves: [ node2 ]
Clone Set: clnPingCheck [pingCheck]
Started: [ node1 node2 ]
Node Attributes:
* Node node1:
+ default_ping_set : 100
+ master-pgsql : 1000
+ pgsql-data-status : LATEST
+ pgsql-master-baseline : 0000000006000078
+ pgsql-status : PRI
* Node node2:
+ default_ping_set : 100
+ master-pgsql : 100
+ pgsql-data-status : STREAMING|SYNC
+ pgsql-status : HS:sync
Migration summary:
* Node node2:
* Node node1:
6.4構成の表示
[root@node1 ~]# crm configure show
node node1 \
attributes pgsql-data-status="LATEST"
node node2 \
attributes pgsql-data-status="STREAMING|SYNC"
primitive pgsql ocf:heartbeat:pgsql \
params pgctl="/opt/pgsql/bin/pg_ctl" psql="/opt/pgsql/bin/psql" pgdata="/opt/pgsql/data/" start_opt="-p 5432" rep_mode="sync" node_list="node1 node2" restore_command="cp /opt/archivelog/%f %p" primary_conninfo_opt="keepalives_idle=60 keepalives_interval=5 keepalives_count=5" master_ip="192.168.1.3" stop_escalate="0" \
op start timeout="60s" interval="0s" on-fail="restart" \
op monitor timeout="60s" interval="7s" on-fail="restart" \
op monitor timeout="60s" interval="2s" on-fail="restart" role="Master" \
op promote timeout="60s" interval="0s" on-fail="restart" \
op demote timeout="60s" interval="0s" on-fail="stop" \
……
……
6.5リアルタイム監視HA
[root@node1 ~]# crm_mon -Afr
Last updated: Wed Jan 22 00:40:12 2014
Last change: Tue Jan 21 23:37:36 2014 via crm_attribute on node1
Stack: classic openais (with plugin)
Current DC: node1 - partition with quorum
Version: 1.1.10-14.el6_5.2-368c726
2 Nodes configured, 2 expected votes
7 Resources configured
Online: [ node1 node2 ]
Full list of resources:
vip-slave (ocf::heartbeat:IPaddr2): Started node2
Resource Group: master-group
vip-master (ocf::heartbeat:IPaddr2): Started node1
vip-rep (ocf::heartbeat:IPaddr2): Started node1
Master/Slave Set: msPostgresql [pgsql]
Masters: [ node1 ]
Slaves: [ node2 ]
Clone Set: clnPingCheck [pingCheck]
Started: [ node1 node2 ]
Node Attributes:
* Node node1:
+ default_ping_set : 100
+ master-pgsql : 1000
+ pgsql-data-status : LATEST
+ pgsql-master-baseline : 0000000006000078
+ pgsql-status : PRI
* Node node2:
+ default_ping_set : 100
+ master-pgsql : 100
+ pgsql-data-status : STREAMING|SYNC
+ pgsql-status : HS:sync
Migration summary:* Node node2: * Node node1:
6.6 crm_义齿
リソースの起動/停止:
[root@node1 ~]# crm_resource -r vip-master -v started
[root@node1 ~]# crm_resource -r vip-master -v stoped
リソースを列挙:
[root@node1 ~]# crm_resource -L
vip-slave (ocf::heartbeat:IPaddr2): Started
Resource Group: master-group
vip-master (ocf::heartbeat:IPaddr2): Started
vip-rep (ocf::heartbeat:IPaddr2): Started
Master/Slave Set: msPostgresql [pgsql]
Masters: [ node1 ]
Slaves: [ node2 ]
Clone Set: clnPingCheck [pingCheck]
Started: [ node1 node2 ]
リソースの場所を表示するには、次の手順に従います。
[root@node1 ~]# crm_resource -W -r pgsql
resource pgsql is running on: node2
リソースの移行:
[root@node1 ~]# crm_resource -M -r vip-slave -N node2
リソースを削除するには
[root@node1 ~]# crm_resource -D -r vip-slave -t primitive
6.7 crmコマンド
指定されたRAを列挙:
[root@node1 ~]# crm ra list ocf pacemaker
ClusterMon Dummy HealthCPU HealthSMART Stateful SysInfo SystemHealth controld ping pingd
remote
ノードを削除するには
[root@node1 ~]# crm node delete node2
ノードの非アクティブ化:
[root@node1 ~]# crm node standby node2
ノードの有効化:
[root@node1 ~]# crm node online node2
pacemakerの構成:
[root@node1 ~]# crm configure
crm(live)configure#
……
……
crm(live)configure# commit
crm(live)configure# quit
6.8 failcountをリセット
[root@node1 ~]# crm resource
crm(live)resource# failcount pgsql set node1 0
crm(live)resource# failcount pgsql show node1
scope=status name=fail-count-pgsql value=0
[root@node1 ~]# crm resource cleanup pgsql
Cleaning up pgsql:0 on node1
Waiting for 1 replies from the CRMd. OK
[root@node1 ~]# crm_failcount -G -U node1 -r pgsql
scope=status name=fail-count-pgsql value=INFINITY
[root@node1 ~]# crm_failcount -D -U node1 -r pgsql
七、問題記録
7.1 Q1
問題:
corosync.logログにエラーが表示されました:
Jan 15 10:23:57 node1 lrmd: [6327]: info: RA output: (pgsql:0:monitor:stderr)/usr/lib/ocf/resource.d//heartbeat/pgsql: line 1749: ocf_local_nodename: command not found
Jan 15 10:23:57 node1 crm_attribute: [11094]: info: Invoked:/usr/sbin/crm_attribute -l reboot -N -n -v 0000000006000090 pgsql-xlog-loc lrm_get_rsc_type_metadata(578)
Jan 15 10:23:57 node1 lrmd: [6327]: info: RA output: (pgsql:0:monitor:stderr) Could not map uname=-n to a UUID: The object/attribute does not exist
解決方法:
pgsqlスクリプトを表示するとocf_が使用されていることがわかりますlocal_nodename、この関数はocf-shellfuncsにあるはずだった.inには定義がありますが、この関数はありません.インターネットで関連フォーラムを表示します.
http://www.gossamer-threads.com/lists/linuxha/users/89379?do=post_view_threaded
この場合、ocf_を解決するためにパッチが必要であることを示します.local_nodename関数のパッチ:
https://github.com/ClusterLabs/resource-agents/commit/abc1c3f6464f6e5e7a1e41cd7c9b8179896c1903
最新バージョンはocf_がありませんlocal_Nodename関数なので、次のバージョンを使用します.
{注意:pacemakerバージョン>1.1.8、crm_node-nコマンドが使用できないことを確認します}
https://github.com/ClusterLabs/resource-agents/blob/a6f4ddf76cb4bbc1b3df4c9b6632a6351b63c19e/heartbeat/pgsql
https://github.com/ClusterLabs/resource-agents/tree/abc1c3f6464f6e5e7a1e41cd7c9b8179896c1903/heartbeat
ocfを含まないlocal_nodename関数のpgsqlスクリプト:
https://raw.github.com/ClusterLabs/resource-agents/a6f4ddf76cb4bbc1b3df4c9b6632a6351b63c19e/heartbeat/pgsql
7.2 Q2
問題:[root@node1 ~]# crm configure load update pgsql.crm
WARNING: pingCheck: specified timeout 60s for start is smaller than the advised 90
WARNING: pingCheck: specified timeout 60s for stop is smaller than the advised 100
WARNING: pgsql: specified timeout 60s for stop is smaller than the advised 120
WARNING: pgsql: specified timeout 60s for start is smaller than the advised 120
WARNING: pgsql: specified timeout 60s for notify is smaller than the advised 90
WARNING: pgsql: specified timeout 60s for demote is smaller than the advised 120
WARNING: pgsql: specified timeout 60s for promote is smaller than the advised 120
ERROR: master-group: attribute ordered does not exist
Do you still want to commit? no
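Solution:
The error message says that the ordered attribute does not exist for the defined master-group.
(1) The cause is the Pacemaker version: Pacemaker 1.1 does not support the ordered and colocated attributes. I tried to work around this by replacing the current cibconfig.py with the one from Pacemaker 1.0, as shown below, but this did not work.
[root@node1 ~]# vim /usr/lib64/python2.6/site-packages/crmsh/cibconfig.py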
[root@node1 ~]# cd /usr/lib64/python2.6/site-packages/crmsh/
[root@node1 crmsh]# mv cibconfig.py cibconfig.py.bak
[root@node1 crmsh]# wget https://github.com/ClusterLabs/pacemaker-1.0/blob/fa1a99ab36e0ed015f1bcbbb28f7db962a9d1abc/shell/modules/cibconfig.py
(2) Removing the ordered definition from the configuration script worked:
group master-group \
    vip-master \
    vip-rep \
    meta \
        ordered="false"
After the change:
group master-group \
    vip-master \
    vip-rep
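In a long configuration file, a quick search shows which lines still set the group attributes that Pacemaker 1.1 rejects, as described in the problem above. This is a minimal sketch: find_group_legacy_meta is a hypothetical helper name. It only reports the lines and leaves the edit to you, because removing a "meta" clause also means fixing the trailing backslash on the line before it.

```shell
#!/bin/sh
# Sketch: list lines (with line numbers) in a crm configuration file that set
# the group attributes "ordered" or "colocated".
# find_group_legacy_meta is a hypothetical helper name.
# Returns non-zero when nothing is found (grep's exit status).
find_group_legacy_meta() {
    grep -nE '(^|[[:space:]])(ordered|colocated)=' "$1"
}

# Usage: find_group_legacy_meta pgsql.crm
```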
7.3 Q3
Problem:
Pacemaker installation error:
# yum install pacemaker*
……
--> Processing Dependency: libesmtp.so.5()(64bit) for package: pacemaker
--> Finished Dependency Resolution
pacemaker-1.0.12-1.el5.centos.i386 from clusterlabs has depsolving problems
--> Missing Dependency: libesmtp.so.5 is needed by package pacemaker-1.0.12-1.el5.centos.i386 (clusterlabs)
pacemaker-1.0.12-1.el5.centos.x86_64 from clusterlabs has depsolving problems
--> Missing Dependency: libesmtp.so.5()(64bit) is needed by package pacemaker-1.0.12-1.el5.centos.x86_64 (clusterlabs)
Error: Missing Dependency: libesmtp.so.5 is needed by package pacemaker-1.0.12-1.el5.centos.i386 (clusterlabs)
Error: Missing Dependency: libesmtp.so.5()(64bit) is needed by package pacemaker-1.0.12-1.el5.centos.x86_64 (clusterlabs)
You could try using --skip-broken to work around the problem
You could try running: package-cleanup --problems
package-cleanup --dupes
rpm -Va --nofiles --nodigest
The program package-cleanup is found in the yum-utils package.
Solution:
The message shows that libesmtp is missing; installing it resolves the error:
# wget ftp://ftp.univie.ac.at/systems/linux/fedora/epel/5/x86_64/libesmtp-1.0.4-5.el5.x86_64.rpm
# wget ftp://ftp.univie.ac.at/systems/linux/fedora/epel/5/i386/libesmtp-1.0.4-5.el5.i386.rpm
# rpm -ivh libesmtp-1.0.4-5.el5.x86_64.rpm
# rpm -ivh libesmtp-1.0.4-5.el5.i386.rpm
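The two downloads differ only in the architecture, so one loop can handle both. This sketch reuses the mirror, package version and rpm commands above. The helper names are hypothetical, and the EPEL 5 path on that mirror may no longer be served.

```shell
#!/bin/sh
# Sketch: download and install libesmtp for several architectures.
# Mirror and version are taken from the commands above; the mirror may have
# moved or dropped EPEL 5. libesmtp_url / install_libesmtp are hypothetical.

# Print the package URL for one architecture (e.g. x86_64, i386).
libesmtp_url() {
    echo "ftp://ftp.univie.ac.at/systems/linux/fedora/epel/5/$1/libesmtp-1.0.4-5.el5.$1.rpm"
}

# Download and install the package for each architecture given as an argument.
install_libesmtp() {
    for arch in "$@"; do
        url=$(libesmtp_url "$arch")
        wget "$url" || return 1
        # ${url##*/} strips everything up to the last "/", leaving the file name.
        rpm -ivh "${url##*/}" || return 1
    done
}

# Usage on the node: install_libesmtp x86_64 i386
```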
7.4 Q4
Problem:
Error when loading the crm configuration:
[root@node1 ~]# crm configure load update pgsql.crm
ERROR: pgsql: parameter rep_mode does not exist
ERROR: pgsql: parameter node_list does not exist
ERROR: pgsql: parameter master_ip does not exist
ERROR: pgsql: parameter restore_command does not exist
ERROR: pgsql: parameter primary_conninfo_opt does not exist
WARNING: pgsql: specified timeout 60s for stop is smaller than the advised 120
WARNING: pgsql: action monitor_Master not advertised in meta-data, it may not be supported by the RA
WARNING: pgsql: specified timeout 60s for start is smaller than the advised 120
WARNING: pgsql: action notify not advertised in meta-data, it may not be supported by the RA
WARNING: pgsql: action demote not advertised in meta-data, it may not be supported by the RA
WARNING: pgsql: action promote not advertised in meta-data, it may not be supported by the RA
WARNING: pingCheck: specified timeout 60s for start is smaller than the advised 90
WARNING: pingCheck: specified timeout 60s for stop is smaller than the advised 100
Do you still want to commit? no
Solution:
These parameters are reported as nonexistent because the installed pgsql script is outdated, so it must be replaced:
scp pgsql [email protected]:/usr/lib/ocf/resource.d/heartbeat/
scp ocf-shellfuncs.in [email protected]:/usr/lib/ocf/lib/heartbeat/
scp pgsql [email protected]:/usr/lib/ocf/resource.d/heartbeat/
scp ocf-shellfuncs.in [email protected]:/usr/lib/ocf/lib/heartbeat/
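After replacing the script, check that the new pgsql agent really declares the parameters that were reported as missing. Do this before loading the configuration again. This sketch assumes the agent's embedded meta-data contains <parameter name="..."> elements, which is how OCF agents normally declare their parameters. missing_ra_params is a hypothetical helper.

```shell
#!/bin/sh
# Sketch: print each parameter name that the RA script does not declare.
# Assumes the RA meta-data uses <parameter name="..."> elements.
# missing_ra_params is a hypothetical helper name.
# Usage: missing_ra_params RA_FILE PARAM...
# Returns 1 if any parameter is missing, 0 otherwise.
missing_ra_params() {
    ra=$1; shift
    rc=0
    for p in "$@"; do
        if ! grep -q "<parameter name=\"$p\"" "$ra"; then
            echo "$p"
            rc=1
        fi
    done
    return $rc
}

# Usage with the parameters from the error above:
#   missing_ra_params /usr/lib/ocf/resource.d/heartbeat/pgsql \
#       rep_mode node_list master_ip restore_command primary_conninfo_opt
```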
7.5 Q5
Problem:
[root@node1 ~]# crm_mon -Afr -1
Last updated: Tue Jan 21 05:10:56 2014
Last change: Tue Jan 21 05:10:08 2014 via cibadmin on node1
Stack: classic openais (with plugin)
Current DC: node1 - partition with quorum
Version: 1.1.10-14.el6_5.2-368c726
2 Nodes configured, 2 expected votes
7 Resources configured
Online: [ node1 node2 ]
Full list of resources:
vip-slave (ocf::heartbeat:IPaddr2): Stopped
Resource Group: master-group
vip-master (ocf::heartbeat:IPaddr2): Stopped
vip-rep (ocf::heartbeat:IPaddr2): Stopped
Master/Slave Set: msPostgresql [pgsql]
Stopped: [ node1 node2 ]
Clone Set: clnPingCheck [pingCheck]
Stopped: [ node1 node2 ]
Node Attributes:
* Node node1:
* Node node2:
Migration summary:
* Node node1:
* Node node2:
Failed actions:
pingCheck_monitor_0 on node1 'invalid parameter' (2): call=23, status=complete, last-rc-change='Tue Jan 21 05:10:10 2014', queued=200ms, exec=0ms
pingCheck_monitor_0 on node2 'invalid parameter' (2): call=23, status=complete, last-rc-change='Tue Jan 21 05:09:36 2014', queued=281ms, exec=0ms
Solution:
The error occurs because the pingCheck resource in the configuration script passes an unknown parameter to the pingd agent it calls: ocf:pacemaker:pingd turned out to have no multiplier parameter.
primitive pingCheck ocf:pacemaker:pingd \
    params \
        name="default_ping_set" \
        host_list="192.168.100.1" \
        multiplier="100" \
    op start timeout="60s" interval="0s" on-fail="restart" \
    op monitor timeout="60s" interval="10s" on-fail="restart" \
    op stop timeout="60s" interval="0s" on-fail="ignore"
Therefore, change the agent to ocf:heartbeat:pingd.
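In crm_mon output, each failed action line names the resource, the operation, the node and the reason. Return code 2 (OCF_ERR_ARGS, shown as 'invalid parameter') points at the resource configuration rather than at the service itself. The sketch below extracts these fields with sed. It assumes operation IDs of the form <resource>_<action>_<interval>, as in the output above. failed_actions is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: read crm_mon output on stdin and print "resource action node reason"
# for every failed action line, e.g.
#   pingCheck_monitor_0 on node1 'invalid parameter' (2): ...
# becomes
#   pingCheck monitor node1 invalid parameter
# Non-matching lines are ignored (sed -n with the p flag).
failed_actions() {
    sed -n "s/^[[:space:]]*\([^[:space:]]*\)_\([a-z]*\)_\([0-9]*\) on \([^[:space:]]*\) '\([^']*\)'.*/\1 \2 \4 \5/p"
}

# Usage: crm_mon -Afr -1 | failed_actions
```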
7.6 Q6
Problem:
The corosync log shows the following errors:
Jan 21 04:36:02 corosync [TOTEM ] Received message has invalid digest... ignoring.
Jan 21 04:36:02 corosync [TOTEM ] Invalid packet data
Solution:
This means that another cluster on the network is already using the same multicast address; changing the multicast address fixes it.
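To change the multicast address, edit the mcastaddr line in the totem interface section of corosync.conf on every node, then restart corosync. The sketch below makes a backup and then performs the edit with sed. The path and the address in the usage comment are examples only. set_mcastaddr is a hypothetical helper.

```shell
#!/bin/sh
# Sketch: replace the value of "mcastaddr:" in a corosync.conf, keeping a
# backup in FILE.bak. Indentation before "mcastaddr:" is preserved.
# set_mcastaddr is a hypothetical helper; run it on every node.
set_mcastaddr() {
    conf=$1 addr=$2
    cp -p "$conf" "$conf.bak" || return 1
    sed "s/^\([[:space:]]*mcastaddr:[[:space:]]*\).*/\1$addr/" "$conf.bak" > "$conf"
}

# Usage (example path and hypothetical address):
#   set_mcastaddr /etc/corosync/corosync.conf 239.255.42.1
#   service corosync restart
```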
8. References
Script:
https://github.com/ClusterLabs/resource-agents/blob/master/heartbeat/pgsql
Script usage:
https://github.com/t-matsuo/resource-agents/wiki/Resource-Agent-for-PostgreSQL-9.1-streaming-replication
crm_resource command:
http://www.novell.com/zh-cn/documentation/sle_ha/book_sleha/data/man_crmresource.html
crm_failcount command:
http://www.novell.com/zh-cn/documentation/sle_ha/book_sleha/data/man_crmfailcount.html