Dataguard構築問題メモ
DGの問題をテストしたいので、Vmware上にテスト環境を構築し、primaryとstandbyを同じホストに置きます.構築過程は言うまでもなく、最後に、いくつかの問題に遭遇しました.ここで記録して、後続の調査に備えます.
----構築中のいくつかの命令----
6.最もstdのできるpfileファイルの内容
問題1:
Listenerを設定する過程で、静的リスニングを設定するため、standbyが起動すると同時に動的リスニングが登録され、現在は2つのリスニングがあり、動的リスニングステータスはblockedです.問題は、duplicate接続時にORA-12528エラーが発生し、standbyに正常に接続できないことです.
tnsnamesを設定します.ora、この問題を解決することができます.
問題2:
次のコマンドを実行した後、直接primaryはcrashを削除し、alertを調べたところ、redo brokenによるALTER DATABASE RECOVER MANAGED STANDBY DATABASE DISCONNECT FROM SESSIONであることが分かった.
standby alertを検索すると、次の情報が表示されます.(ALTER DATABASE RECOVER...が開始された後、最初のことはclear redo logで、primary crashを直接招きます)
何度も再建された後も、問題は依然として残っている.Datafileもarchivelogもconvertパラメータを使用して変換されていますが、redoは変換できません.この問題をどのように解決しますか?分析の結果、duplicateの最後に警告メッセージがいくつかあることがわかりました.
後で手動で変更standby_file_managementはMANUALモードであり、duplicateでrenameに成功し、rename正broken 0~2.(これは私がpfileでこのstandby_file_management=autoを持っています)
ALTER DATABASE RECOVERを実行中...の場合は、依然としてredoをクリアする必要がありますが、redo renameをクリアすべきなので、メインライブラリのredo logには影響しません.
まとめ:これまでのDG構築は、2つの異なるホスト上で実施されていたため、この問題にも関心を持っていなかった.standbyがprimary redoをクリアする動作は失敗したに違いないからだ.しかし、現在はホスト上で、standbyは元のprimaryのredoを発見することができ、renameが失敗したため、直接clear primary redoでprimaryが起動できない.root causeは私がpfileを編集したとき、standby_をたくさん書きました.file_management=auto、実は書かないとデフォルト値がmanualなので、この問題は発生しません.
----構築中のいくつかの命令----
1. backup database format '/tmp/bk_%U';
2. backup current controlfile for standby format '/tmp/stdbyctl.bkp';
3. catalog start with '/tmp/';
4. set controlfile autobackup format for device type disk to '/tmp/%F';
restore standby controlfile from '/tmp/stdbyctl.bkp';
5. RMAN> CONNECT TARGET SYS/oracle@db;
RMAN> CONNECT AUXILIARY SYS/oracle@stddb;
6.最もstdのできるpfileファイルの内容
db_file_name_convert= '+data/db/','+reco/stddb/'
log_file_name_convert= '+RECO/db/archivelog/','+RECO/stddb/archivelog_std/'
log_archive_format=%t_%s_%r.arc
standby_file_management=auto
compatible='11.2.0.0.0'
control_files='+RECO/stddb/controlfile/current.260.834947597' <<<<<< control , control_files='+RECO' ,
db_block_size=8192
processes=150
remote_login_passwordfile='EXCLUSIVE'
undo_tablespace='UNDOTBS1'
undo_management=auto
log_archive_config='dg_config=(primdb,stbdb)'
log_archive_dest_1='location=+RECO/stddb/archivelog_std/ valid_for=(all_logfiles,all_roles) db_unique_name=stbdb'
問題1:
Listenerを設定する過程で、静的リスニングを設定するため、standbyが起動すると同時に動的リスニングが登録され、現在は2つのリスニングがあり、動的リスニングステータスはblockedです.問題は、duplicate接続時にORA-12528エラーが発生し、standbyに正常に接続できないことです.
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
[oracle@OEL ~]$ lsnrctl status
LSNRCTL for Linux: Version 11.2.0.1.0 - Production on 30-DEC-2013 10:46:03
Copyright (c) 1991, 2009, Oracle. All rights reserved.
Connecting to (ADDRESS=(PROTOCOL=tcp)(HOST=)(PORT=1521))
STATUS of the LISTENER
------------------------
Alias LISTENER
Version TNSLSNR for Linux: Version 11.2.0.1.0 - Production
Start Date 27-NOV-2013 12:40:26
Uptime 32 days 22 hr. 5 min. 37 sec
Trace Level off
Security ON: Local OS Authentication
SNMP OFF
Listener Parameter File /u01/app/11.2.0/grid/network/admin/listener.ora
Listener Log File /u01/app/oracle/diag/tnslsnr/OEL/listener/alert/log.xml
Listening Endpoints Summary...
(DESCRIPTION=(ADDRESS=(PROTOCOL=ipc)(KEY=EXTPROC1521)))
(DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=OEL.localdomain)(PORT=1521)))
Services Summary...
Service "+ASM" has 1 instance(s).
Instance "+ASM", status READY, has 1 handler(s) for this service...
Service "PRIMDB" has 1 instance(s).
Instance "db", status READY, has 1 handler(s) for this service...
Service "dbXDB" has 1 instance(s).
Instance "db", status READY, has 1 handler(s) for this service...
Service "stbdb" has 1 instance(s).
Instance "stbdb", status UNKNOWN, has 1 handler(s) for this service... <<<<<<<<<<<<<<<<<<<<<<
Service "stddb" has 1 instance(s).
Instance "stddb", status READY, has 1 handler(s) for this service... <<<<<<<<<<<<<<<<<<<<<<
The command completed successfully
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
tnsnamesを設定します.ora、この問題を解決することができます.
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
stddb =
(DESCRIPTION =
(ADDRESS_LIST =
(ADDRESS = (PROTOCOL = TCP)(HOST = OEL.localdomain)(PORT = 1521))
)
(CONNECT_DATA = (SERVICE_NAME = stddb)(UR=A)) <--------------------In order to avoid error ORA-12528 )
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
問題2:
次のコマンドを実行した後、直接primaryはcrashを削除し、alertを調べたところ、redo brokenによるALTER DATABASE RECOVER MANAGED STANDBY DATABASE DISCONNECT FROM SESSIONであることが分かった.
standby alertを検索すると、次の情報が表示されます.(ALTER DATABASE RECOVER...が開始された後、最初のことはclear redo logで、primary crashを直接招きます)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# tail -f alert_stddb.log
Clearing online redo logfile 1 complete
Clearing online redo logfile 2 +DATA/db/redo02.log
Clearing online log 2 of thread 1 sequence number 5
Clearing online redo logfile 2 complete
Clearing online redo logfile 3 +DATA/db/redo03.log
Clearing online log 3 of thread 1 sequence number 3
Tue Dec 24 14:32:49 2013
Clearing online redo logfile 3 complete
Tue Dec 24 14:32:49 2013
Media Recovery Waiting for thread 1 sequence 4
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
何度も再建された後も、問題は依然として残っている.Datafileもarchivelogもconvertパラメータを使用して変換されていますが、redoは変換できません.この問題をどのように解決しますか?分析の結果、duplicateの最後に警告メッセージがいくつかあることがわかりました.
~~~~~~~~~duplicate the standby databsae~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
RMAN> CONNECT TARGET SYS/oracle@db;
RMAN> CONNECT AUXILIARY SYS/oracle@stddb;
connected to auxiliary database: DB (not mounted)
RMAN> duplicate target database for standby;
Starting Duplicate Db at 24-DEC-13
allocated channel: ORA_AUX_DISK_1
channel ORA_AUX_DISK_1: SID=13 device type=DISK
contents of Memory Script:
{
restore clone standby controlfile;
}
executing Memory Script
Starting restore at 24-DEC-13
using channel ORA_AUX_DISK_1
channel ORA_AUX_DISK_1: starting datafile backup set restore
channel ORA_AUX_DISK_1: restoring control file
channel ORA_AUX_DISK_1: reading from backup piece /tmp/stdbyctl.bkp
channel ORA_AUX_DISK_1: piece handle=/tmp/stdbyctl.bkp tag=TAG20131224T133449
channel ORA_AUX_DISK_1: restored backup piece 1
channel ORA_AUX_DISK_1: restore complete, elapsed time: 00:00:08
output file name=+RECO/stddb/controlfile/current.259.835018533
Finished restore at 24-DEC-13
contents of Memory Script:
{
sql clone 'alter database mount standby database';
}
executing Memory Script
sql statement: alter database mount standby database
contents of Memory Script:
{
set newname for tempfile 1 to
"+RECO/stddb/temp01.dbf";
switch clone tempfile all;
set newname for datafile 1 to
"+RECO/stddb/system01.dbf";
set newname for datafile 2 to
"+RECO/stddb/sysaux01.dbf";
set newname for datafile 3 to
"+RECO/stddb/undotbs01.dbf";
set newname for datafile 4 to
"+RECO/stddb/users01.dbf";
restore
clone database
;
}
executing Memory Script
executing command: SET NEWNAME
renamed tempfile 1 to +RECO/stddb/temp01.dbf in control file
executing command: SET NEWNAME
executing command: SET NEWNAME
executing command: SET NEWNAME
executing command: SET NEWNAME
Starting restore at 24-DEC-13
using channel ORA_AUX_DISK_1
channel ORA_AUX_DISK_1: starting datafile backup set restore
channel ORA_AUX_DISK_1: specifying datafile(s) to restore from backup set
channel ORA_AUX_DISK_1: restoring datafile 00001 to +RECO/stddb/system01.dbf
channel ORA_AUX_DISK_1: restoring datafile 00002 to +RECO/stddb/sysaux01.dbf
channel ORA_AUX_DISK_1: restoring datafile 00003 to +RECO/stddb/undotbs01.dbf
channel ORA_AUX_DISK_1: restoring datafile 00004 to +RECO/stddb/users01.dbf
channel ORA_AUX_DISK_1: reading from backup piece /tmp/bk_01osanei_1_1
channel ORA_AUX_DISK_1: piece handle=/tmp/bk_01osanei_1_1 tag=TAG20131224T132953
channel ORA_AUX_DISK_1: restored backup piece 1
channel ORA_AUX_DISK_1: restore complete, elapsed time: 00:04:27
Finished restore at 24-DEC-13
contents of Memory Script:
{
switch clone datafile all;
}
executing Memory Script
datafile 1 switched to datafile copy
input datafile copy RECID=1 STAMP=835019033 file name=+RECO/stddb/system01.dbf
datafile 2 switched to datafile copy
input datafile copy RECID=2 STAMP=835019033 file name=+RECO/stddb/sysaux01.dbf
datafile 3 switched to datafile copy
input datafile copy RECID=3 STAMP=835019033 file name=+RECO/stddb/undotbs01.dbf
datafile 4 switched to datafile copy
input datafile copy RECID=4 STAMP=835019033 file name=+RECO/stddb/users01.dbf
ORACLE error from auxiliary database: ORA-01511: error in renaming log/data files <<<<<<<<<<<<<<<<<<<<<<<<
ORA-01275: Operation RENAME is not allowed if standby file management is automatic. <<<<<<<<<<<<<<<<<<<<<<<<
RMAN-05535: WARNING: All redo log files were not defined properly.
ORACLE error from auxiliary database: ORA-01511: error in renaming log/data files
ORA-01275: Operation RENAME is not allowed if standby file management is automatic.
RMAN-05535: WARNING: All redo log files were not defined properly.
ORACLE error from auxiliary database: ORA-01511: error in renaming log/data files
ORA-01275: Operation RENAME is not allowed if standby file management is automatic.
RMAN-05535: WARNING: All redo log files were not defined properly.
Finished Duplicate Db at 24-DEC-13
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
後で手動で変更standby_file_managementはMANUALモードであり、duplicateでrenameに成功し、rename正broken 0~2.(これは私がpfileでこのstandby_file_management=autoを持っています)
checkpoint is 977553
last deallocation scn is 959057
Undo Optimization current scn is 974128
Tue Dec 24 15:16:57 2013
Switch of datafile 1 complete to datafile copy
checkpoint is 977553
Switch of datafile 2 complete to datafile copy
checkpoint is 977553
Switch of datafile 3 complete to datafile copy
checkpoint is 977553
Switch of datafile 4 complete to datafile copy
checkpoint is 977553
alter database rename file '+DATA/db/redo01.log' to 'broken0'
Completed: alter database rename file '+DATA/db/redo01.log' to 'broken0' <<<<<<<<
alter database rename file '+DATA/db/redo02.log' to 'broken1'
Completed: alter database rename file '+DATA/db/redo02.log' to 'broken1'
alter database rename file '+DATA/db/redo03.log' to 'broken2'
Completed: alter database rename file '+DATA/db/redo03.log' to 'broken2'
RFS connections are allowed
ALTER DATABASE RECOVERを実行中...の場合は、依然としてredoをクリアする必要がありますが、redo renameをクリアすべきなので、メインライブラリのredo logには影響しません.
~~~~~~~~~~~~~~~~~~~~~
Tue Dec 24 15:45:37 2013
ALTER DATABASE RECOVER MANAGED STANDBY DATABASE DISCONNECT FROM SESSION
Attempt to start background Managed Standby Recovery process (stddb)
Tue Dec 24 15:45:37 2013
MRP0 started with pid=21, OS id=21343
MRP0: Background Managed Standby Recovery process started (stddb)
started logmerger process
Tue Dec 24 15:45:43 2013
Managed Standby Recovery not using Real Time Apply
Parallel Media Recovery started with 2 slaves
Waiting for all non-current ORLs to be archived...
All non-current ORLs have been archived.
Errors in file /u01/app/oracle/product/11.2.0/dbhome_1/log/diag/rdbms/stddb/stddb/trace/stddb_mrp0_21343.trc:
ORA-00313: open failed for members of log group 1 of thread 1
ORA-00312: online log 1 thread 1: '/u01/app/oracle/product/11.2.0/dbhome_1/dbs/broken0' <<<<<<<<<<<<<<<<
ORA-27037: unable to obtain file status
Linux-x86_64 Error: 2: No such file or directory
Additional information: 3
......
Completed: ALTER DATABASE RECOVER MANAGED STANDBY DATABASE DISCONNECT FROM SESSION
Clearing online redo logfile 1 complete
Errors in file /u01/app/oracle/product/11.2.0/dbhome_1/log/diag/rdbms/stddb/stddb/trace/stddb_mrp0_21343.trc:
ORA-00313: open failed for members of log group 2 of thread 1
ORA-00312: online log 2 thread 1: '/u01/app/oracle/product/11.2.0/dbhome_1/dbs/broken1'
ORA-27037: unable to obtain file status
Linux-x86_64 Error: 2: No such file or directory
Additional information: 3
......
Tue Dec 24 15:45:48 2013
Clearing online redo logfile 2 complete
Errors in file /u01/app/oracle/product/11.2.0/dbhome_1/log/diag/rdbms/stddb/stddb/trace/stddb_mrp0_21343.trc:
ORA-00313: open failed for members of log group 3 of thread 1
ORA-00312: online log 3 thread 1: '/u01/app/oracle/product/11.2.0/dbhome_1/dbs/broken2'
ORA-27037: unable to obtain file status
Linux-x86_64 Error: 2: No such file or directory
Additional information: 3
......
Clearing online redo logfile 3 complete
Media Recovery Waiting for thread 1 sequence 4
まとめ:これまでのDG構築は、2つの異なるホスト上で実施されていたため、この問題にも関心を持っていなかった.standbyがprimary redoをクリアする動作は失敗したに違いないからだ.しかし、現在はホスト上で、standbyは元のprimaryのredoを発見することができ、renameが失敗したため、直接clear primary redoでprimaryが起動できない.root causeは私がpfileを編集したとき、standby_をたくさん書きました.file_management=auto、実は書かないとデフォルト値がmanualなので、この問題は発生しません.