Different Results from Data Guard’s Show Configuration Command

[Image: cat_lion_mirror]
Recently I built a Data Guard environment on two Exadatas with three RAC databases and ran a lot of tests. Show Configuration is probably the command I used most often in DG Broker.

When running show configuration from dgmgrl, we usually see the same result no matter where the command is executed, on the primary or on any standby database. During one switchover test, I ran into a weird situation: the show configuration command returned three different results from the one primary database and the two standby databases, just like the image above (the cat turns into a lion in the mirror). Here are the results:

Primary Database (wzxdb)

DGMGRL> show configuration
Configuration - DG_Config

  Protection Mode: MaxPerformance
  Databases:
	wzxdb - Primary database
	wzsdb - Physical standby database
	  Error: ORA-16664: unable to receive the result from a database

	wzpdb - Physical standby database

Fast-Start Failover: DISABLED

Configuration Status:
ERROR 

I checked the DG Broker log at /u01/app/oracle/diag/rdbms/wzxdb/wzkdb1/trace/drcwzkdb1.log and found the following:

07/16/2014 09:30:20
Site wzsdb returned ORA-16664.
Data Guard Broker Status Summary:
  Type                        Name              Severity  Status
  Configuration               DG_Config         Warning  ORA-16607
  Primary Database            wzxdb             Success  ORA-00000
  Physical Standby Database   wzsdb               Error  ORA-16664
  Physical Standby Database   wzpdb             Success  ORA-00000

Let’s continue by checking the status on the standby databases.
1st Standby Database, wzpdb

DGMGRL> show configuration
Configuration - DG_Config

  Protection Mode: MaxPerformance
  Databases:
	wzxdb - Primary database
	wzsdb - Physical standby database
	wzpdb - Physical standby database

Fast-Start Failover: DISABLED

Configuration Status:
SUCCESS

2nd Standby Database, wzsdb

DGMGRL> show configuration
Configuration - DG_Config

  Protection Mode: MaxPerformance
  Databases:
	wzxdb - Primary database
	wzsdb - Physical standby database
	wzpdb - Physical standby database

Fast-Start Failover: DISABLED

Configuration Status:
ORA-16501: the Data Guard broker operation failed
ORA-16625: cannot reach database "wzxdb"
DGM-17017: unable to determine configuration status	
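
One way to spot this kind of disagreement quickly is to capture the show configuration output from each database and compare only the status section. The following is a minimal sketch and was not part of the original troubleshooting: the function name config_status is made up for illustration, and it assumes the output layout shown above, where everything after the "Configuration Status:" line is the status.

```shell
#!/bin/sh
# Hypothetical helper: print the non-empty lines that follow
# "Configuration Status:" in `show configuration` output read from stdin.
config_status() {
  awk '
    found && NF { gsub(/^[ \t]+|[ \t]+$/, ""); print }   # trimmed status lines
    /^Configuration Status:/ { found = 1 }               # start after this line
  '
}

# Example with the status section captured on the second standby (wzsdb).
config_status <<'EOF'
Fast-Start Failover: DISABLED

Configuration Status:
ORA-16501: the Data Guard broker operation failed
ORA-16625: cannot reach database "wzxdb"
DGM-17017: unable to determine configuration status
EOF
```

In practice the input could come from something like dgmgrl -silent / "show configuration" piped into config_status on each host. That invocation assumes OS authentication is available locally, which is an assumption about the setup rather than something shown in the tests above.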

The first thing I checked was whether Data Guard replication was still working. I did a few log switches on the primary and could see the logs were shipped to both standby databases. I then verified the Data Guard related parameters and the tnsnames and listener entries on all databases and found no issues. At this point I had narrowed the problem down to DG Broker and suspected it was related to the DG Broker configuration. After a few tries, I found a solution to fix this issue:
1. On primary db (wzxdb), remove the database wzsdb from DG Broker configuration, then add it back.
2. On standby db (wzsdb), bounce the database.

Here are the detailed steps:

Primary Database (wzxdb)
DGMGRL> remove database wzsdb
Removed database "wzsdb" from the configuration
DGMGRL> add database wzsdb as connect identifier is wzsdb;
Database "wzsdb" added
DGMGRL> enable configuration
Enabled.
DGMGRL> show configuration

Configuration - DG_Config

  Protection Mode: MaxPerformance
  Databases:
	wzxdb - Primary database
	wzpdb - Physical standby database
	wzsdb - Physical standby database

Fast-Start Failover: DISABLED

Configuration Status:
SUCCESS

After fixing the issue on the primary database, let’s go to the standby database that had the problem. It still returned the same error from the show configuration command, so I went ahead and bounced the database.
srvctl stop database -d wzsdb
srvctl start database -d wzsdb

DGMGRL> show configuration

Configuration - DG_Config

  Protection Mode: MaxPerformance
  Databases:
	wzxdb - Primary database
	wzpdb - Physical standby database
	wzsdb - Physical standby database

Fast-Start Failover: DISABLED

Configuration Status:
SUCCESS

Here is part of the content from data guard broker log on this standby database.

07/16/2014 09:29:58
Failed to send message to site wzxdb. Error code is ORA-16501.
07/16/2014 09:31:09
REMOVE DATABASE wzsdb [PRESERVE DESTINATIONS]
Failed to send message to site wzxdb. Error code is ORA-16501.
07/16/2014 09:31:37
DMON Deregistering service wzsdb_DGB with listener(s)
07/16/2014 09:32:12
DMON Registering service wzsdb_DGB with listener(s)
07/16/2014 09:32:15
Apply Instance for Database wzsdb set to wzdb1
07/16/2014 09:32:19
Failed to send message to site wzxdb. Error code is ORA-16501.
Command EDIT DATABASE wzsdb SET PROPERTY ActualApplyInstance = wzdb1 completed
07/16/2014 09:32:29
Command ENABLE CONFIGURATION completed
07/16/2014 09:32:48
Failed to send message to site wzxdb. Error code is ORA-16501.
07/16/2014 09:33:04
Failed to send message to site wzxdb. Error code is ORA-16501.
07/16/2014 09:33:23
Failed to connect to remote database wzxdb. Error is ORA-12154
Failed to send message to site wzxdb. Error code is ORA-16501.
07/16/2014 09:34:15
Failed to send message to site wzxdb. Error code is ORA-16501.
07/16/2014 09:35:30
Failed to send message to site wzxdb. Error code is ORA-16501.
07/16/2014 09:36:45
Failed to send message to site wzxdb. Error code is ORA-16501.
07/16/2014 09:38:00
Failed to send message to site wzxdb. Error code is ORA-16501.
07/16/2014 09:38:27
Data Guard Broker shutting down
RSM0 successfully terminated
07/16/2014 09:38:29
>> DMON Process Shutdown <<
07/16/2014 09:39:18
>> Starting Data Guard Broker bootstrap <<
Broker Configuration File Locations:
	  dg_broker_config_file1 = "+RECO/wzsdb/dataguardconfig/dgb_config02.ora"
	  dg_broker_config_file2 = "+DATA/wzsdb/dataguardconfig/dgb_config01.ora"
07/16/2014 09:39:22
DMON Registering service wzsdb_DGB with listener(s)
Broker Configuration:       "DG_Config"
	  Protection Mode:            Maximum Performance
	  Fast-Start Failover (FSFO): Disabled, flags=0x0, version=0
	  Primary Database:           wzxdb (0x03010000)
	  Standby Database:           wzsdb, Enabled Physical Standby (0x01010000)
	  Standby Database:           wzpdb, Enabled Physical Standby (0x02010000)
07/16/2014 09:39:25
wzsdb version check successfully completed
wzsdb has current configuration metadata,
	  completing bootstrap
Creating process RSM0
07/16/2014 09:39:28
Apply Instance for Database wzsdb set to wzdb1
07/16/2014 09:39:37 

The standby database appears to have received commands such as REMOVE DATABASE wzsdb and ENABLE CONFIGURATION from the primary’s DG Broker, but it could not send messages back to the primary database. After the standby database was bounced, it returned to normal and could communicate with the primary database again.
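
To see how often the standby failed to reach the primary, the broker log can be summarized instead of read line by line. This is a small sketch under the assumption that the failure lines always have the wording shown in the log above; the function name send_failures is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: count "Failed to send message to site" lines in a
# broker log read from stdin, grouped by target site and error code.
send_failures() {
  awk '
    /^Failed to send message to site/ {
      site = $7;  sub(/\.$/, "", site)    # "wzxdb."     -> "wzxdb"
      code = $NF; sub(/\.$/, "", code)    # "ORA-16501." -> "ORA-16501"
      count[site " " code]++
    }
    END { for (key in count) print key, count[key] }
  ' | sort
}

# Example with a few lines taken from the log above.
send_failures <<'EOF'
07/16/2014 09:33:23
Failed to connect to remote database wzxdb. Error is ORA-12154
Failed to send message to site wzxdb. Error code is ORA-16501.
07/16/2014 09:34:15
Failed to send message to site wzxdb. Error code is ORA-16501.
EOF
```

Each output line has the form "site error-code count". Feeding the function the standby’s full broker log gives a quick picture of whether the failures stop after the bounce.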

Finally, all databases report the SUCCESS status no matter where I run the show configuration command.

Script to identify the restore/recover point for archive logs

I did some work on an interesting project to keep a standby database in sync with a production primary database manually. This is not a true standby database, as the primary database does not communicate with it. For certain reasons, we could not configure Data Guard to replicate data between these two databases, so there was no way to ship redo logs as we would in a Data Guard environment. By manually, I mean that we take the archivelog backup from the previous day, then restore and recover it on this standby database. As this database is a VLDB, the daily volume of archive log files is several terabytes. We use an Exadata X-4 full rack to host this standby database. Even when restoring with all db nodes and 200+ channels, the restore alone still takes several hours, and recovering these archive logs takes a similar amount of time, not to mention the time spent copying files between the two data centers. It takes a lot of effort to keep up with the production primary database and reduce the lag between the two databases.

The benefit of doing this manually is the minimal impact on the current production environment. The only overhead on the production db comes from copying files to the Exadata, and that impact is quite low. We scp the rman backup pieces using all 8 db nodes to make the most of the available bandwidth.

One major task during this restore and recover is to identify the correct restore and recover points from the daily rman backupset of the archive logs. Different people might have different opinions about the right recover point. Just like the image below: how many bars can you see, three or four?

[Image: three_or_four_bars]

There are many blogs and articles discussing how to identify the correct restore and recover points. The majority of people like to use the v$archived_log view to get the recover point. In my scenario it did not work well, because I could get the correct recover point only after restoring all the archive log files. What I wanted to know, right after cataloging the rman backup pieces, was the last applied archive log sequence for each thread and the next recover point for the backup pieces that were just cataloged.

Using both the v$archived_log and v$backup_archivelog_details views, I created a script that answers all of these questions:
1. The restore command to use for each thread
2. The last applied archive log sequence for each thread, including the timestamp and next change SCN#
3. The last possible recover point for each thread for the cataloged rman backup pieces
4. The recover command

The script is listed as follows:

db_arc_seq_range.sql

col "Restore Command" for a100
col "Applied Logs" for a100
col "Catalog Logs" for a100
col "Recover Command" for a80
select ' restore archivelog from logseq ' || applied_arc.startNo || ' until logseq ' || catalog_arc.endNo || ' thread=' || catalog_arc.thread# || ';' "Restore Command"
from
--(select thread#,max(sequence#) + 1 startNo from gv$archived_log where applied='YES' group by thread#) applied_arc,
(select thread#,max(sequence#) startNo from gv$archived_log where applied='YES' group by thread#) applied_arc,
(select thread#, max(sequence#) endNo from v$backup_archivelog_details group by thread#) catalog_arc
where applied_arc.thread# = catalog_arc.thread#;

prompt '=========== Archive Log Info ============='
select distinct 'Thread ' || thread# || ': last applied archive log ' || sequence# || ' at ' || to_char(next_time, 'MON/DD/YYYY HH24:MI:SS') || ' next change# ' || next_change# "Applied Logs"
from v$archived_log
where thread# || '_' || sequence# in
(select thread# || '_' || max(sequence#) from v$archived_log where applied='YES' group by thread#)
--and applied='YES'
;
select 'Thread ' || thread# || ': last cataloged archive log ' || sequence# || ' at ' || to_char(next_time, 'MON/DD/YYYY HH24:MI:SS') || ' next change# ' || next_change# "Catalog Logs"
from v$backup_archivelog_details
where thread# || '_' || sequence# in
(select thread# || '_' || max(sequence#) from v$backup_archivelog_details group by thread#)
;

prompt '=========== recover point ================'
--select 'recover database until sequence ' || seq# || ' thread ' || thread# || ' delete archivelog maxsize 4000g; ' Content
select 'set until sequence ' || seq# || ' thread ' || thread# || '; ' || chr(13)|| chr(10) || 'recover database delete archivelog maxsize 4000g; ' "Recover Command"
from (
select * from (
select thread#, sequence# + 1 seq#, next_change# from (
select * from v$backup_archivelog_details
where thread# || '_' || sequence# in
(select thread# || '_' || max(sequence#) from v$backup_archivelog_details group by thread#)
)
order by next_change#
)
where
rownum = 1
)
;

The following example shows the execution of the script.

SYS> @db_arc_seq_range
Restore Command
----------------------------------------------------------------------------------------------------
restore archivelog from logseq 323498 until logseq 324015 thread=1;
restore archivelog from logseq 351250 until logseq 351828 thread=2;
restore archivelog from logseq 308766 until logseq 309396 thread=3;
restore archivelog from logseq 345805 until logseq 346271 thread=4;
restore archivelog from logseq 629650 until logseq 630749 thread=5;
restore archivelog from logseq 502202 until logseq 502899 thread=6;

6 rows selected.

'=========== Archive Log Info ============='

Applied Logs
----------------------------------------------------------------------------------------------------
Thread 1: last applied archive log 323498 at SEP/16/2014 22:41:27 next change# 10900757473229
Thread 2: last applied archive log 351250 at SEP/16/2014 22:41:28 next change# 10900757476463
Thread 3: last applied archive log 308766 at SEP/16/2014 22:44:30 next change# 10900759270706
Thread 4: last applied archive log 345805 at SEP/16/2014 22:43:42 next change# 10900758591989
Thread 5: last applied archive log 629650 at SEP/16/2014 22:43:39 next change# 10900758575645
Thread 6: last applied archive log 502202 at SEP/16/2014 22:42:06 next change# 10900757720611

6 rows selected.
Catalog Logs
----------------------------------------------------------------------------------------------------
Thread 1: last cataloged archive log 324015 at SEP/17/2014 23:12:31 next change# 10902104866903
Thread 2: last cataloged archive log 351828 at SEP/17/2014 23:12:31 next change# 10902104871653
Thread 3: last cataloged archive log 309396 at SEP/17/2014 23:12:29 next change# 10902104850854
Thread 4: last cataloged archive log 346271 at SEP/17/2014 23:12:30 next change# 10902104860405
Thread 5: last cataloged archive log 630749 at SEP/17/2014 23:12:30 next change# 10902104862135
Thread 6: last cataloged archive log 502899 at SEP/17/2014 23:12:32 next change# 10902104879394

6 rows selected.

'=========== recover point ================'

Recover Command
--------------------------------------------------------------------------------
set until sequence 309397 thread 3;
recover database delete archivelog maxsize 4000g;

It shows that we completed the restore and recover of Sep. 16’s archive logs, and that the recover point for Sep. 17’s archive logs is sequence 309397 on thread 3, the thread whose last cataloged log has the lowest next change#.
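
To make the selection in the last query concrete, here is a small sketch that reproduces it outside the database. It takes the last cataloged log of each thread, picks the row with the lowest next change#, and adds 1 to its sequence, which is what the ORDER BY next_change# with rownum = 1 does in the script. The function name recover_point and the three-column input format are assumptions made for this illustration.

```shell
#!/bin/sh
# Hypothetical helper: read "thread sequence next_change#" rows (the last
# cataloged log of each thread) and print the SET UNTIL clause for the thread
# with the lowest next_change#, mirroring the ORDER BY ... ROWNUM = 1 query.
recover_point() {
  awk '
    NR == 1 || $3 + 0 < min { min = $3 + 0; thread = $1; seq = $2 }
    END { if (NR) printf "set until sequence %d thread %d;\n", seq + 1, thread }
  '
}

# Rows taken from the "Catalog Logs" output above.
recover_point <<'EOF'
1 324015 10902104866903
2 351828 10902104871653
3 309396 10902104850854
4 346271 10902104860405
5 630749 10902104862135
6 502899 10902104879394
EOF
```

With the six rows from the example run, this prints the same set until clause as the script: sequence 309397 on thread 3.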

Switchover Failed in Data Guard Broker

Recently I did some Data Guard tests on 11.2.0.3 RAC. The primary and standby databases were on different Exadata quarter racks. During one test, I may have messed up some Data Guard parameters. When I performed the switchover wzsdb->wzpdb, it failed in the middle of the process. This is an interesting scenario I had never run into before. Here is the result from the execution:

DGMGRL> show configuration
Configuration - DG_Config
  Protection Mode: MaxPerformance
  Databases:
	wzsdb - Primary database
	wzpdb - Physical standby database
Fast-Start Failover: DISABLED
Configuration Status:
SUCCESS

DGMGRL> switchover to wzpdb
Performing switchover NOW, please wait...
Error: ORA-16664: unable to receive the result from a database
Failed.
Unable to switchover, primary database is still "wzsdb"

[Image: two_primary]
Most of the time, when a switchover through DG Broker runs into an issue, bouncing both the new primary and the new standby database resolves it. It didn’t work this time. I bounced both databases multiple times and restarted MRP manually, but none of it worked. Both databases claimed to be the primary database in DG Broker, just like the two bears above. Here is what the results from DG Broker looked like.

Database wzpdb (Supposed new primary database)
DGMGRL> show configuration
Configuration - DG_Config
Protection Mode: MaxPerformance
Databases:
wzsdb - Primary database
wzpdb - Physical standby database
Fast-Start Failover: DISABLED
Configuration Status:
ORA-16623: database detected role change
ORA-16625: cannot reach database "wzsdb"
DGM-17017: unable to determine configuration status

Database wzsdb (Supposed new standby database)
DGMGRL> show configuration
Configuration - DG_Config
Protection Mode: MaxPerformance
Databases:
wzpdb - Primary database
wzsdb - Physical standby database
Fast-Start Failover: DISABLED
Configuration Status:
ORA-16597: Data Guard broker detects two or more primary databases
ORA-16625: cannot reach database "wzpdb"
DGM-17017: unable to determine configuration status
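
The disagreement between the two brokers can also be checked mechanically by pulling out the database each one lists as primary. This is only a sketch: the function name primary_of is hypothetical, and it assumes the "name - Primary database" line format shown in the outputs above.

```shell
#!/bin/sh
# Hypothetical helper: print the database that `show configuration` output
# (read from stdin) lists as the primary.
primary_of() {
  awk '/Primary database[ \t]*$/ { print $1 }'
}

# Views reported by the two brokers above, reduced to the database lines.
seen_by_wzpdb=$(printf '%s\n' 'wzsdb - Primary database' \
                              'wzpdb - Physical standby database' | primary_of)
seen_by_wzsdb=$(printf '%s\n' 'wzpdb - Primary database' \
                              'wzsdb - Physical standby database' | primary_of)

if [ "$seen_by_wzpdb" != "$seen_by_wzsdb" ]; then
  echo "brokers disagree: wzpdb sees $seen_by_wzpdb, wzsdb sees $seen_by_wzsdb"
fi
```

Here each broker names the other database as the primary, which matches the role-change and two-primaries errors in the status sections.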

Obviously I should not have two primary databases in Data Guard. The next thing I wanted to check was whether the issue was confined to Data Guard Broker, so I ran the following queries on both databases.
Database wzpdb (Supposed new primary database)

SYS:wzdb1> @db_mode
    INST_ID DATABASE_ROLE    OPEN_MODE            LOG_MODE     FLASHBACK_ON       FOR
----------- ---------------- -------------------- ------------ ------------------ ---
          1 PRIMARY          READ WRITE           ARCHIVELOG   NO                 YES
          2 PRIMARY          READ WRITE           ARCHIVELOG   NO                 YES

The above result is what I expected: wzpdb is in the primary role.

Database wzsdb (Supposed new standby database)

SYS@wzdb1> @db_mode
   INST_ID DATABASE_ROLE    OPEN_MODE            LOG_MODE     FLASHBACK_ON       FOR
---------- ---------------- -------------------- ------------ ------------------ ---
	 2 PHYSICAL STANDBY MOUNTED              ARCHIVELOG   NO                 YES
	 1 PHYSICAL STANDBY MOUNTED              ARCHIVELOG   NO                 YES

The result is also correct on wzsdb: it is in the standby role.

Next I tested the MRP process.

SYS@wzdb1> @dg_sby_process
PROCESS   CLIENT_P  SEQUENCE# STATUS
--------- -------- ---------- ------------
ARCH      ARCH              0 CONNECTED
ARCH      ARCH              0 CONNECTED
ARCH      ARCH              0 CONNECTED
ARCH      ARCH              0 CONNECTED
ARCH      ARCH              0 CONNECTED
ARCH      ARCH              0 CONNECTED
ARCH      ARCH              0 CONNECTED
ARCH      ARCH              0 CONNECTED
8 rows selected.

SYS@wzdb1> alter database recover managed standby database using current logfile disconnect;
Database altered.				 

SYS@wzdb1> @dg_sby_process
PROCESS   CLIENT_P  SEQUENCE# STATUS
--------- -------- ---------- ------------
ARCH      ARCH              0 CONNECTED
ARCH      ARCH              0 CONNECTED
ARCH      ARCH              0 CONNECTED
ARCH      ARCH              0 CONNECTED
ARCH      ARCH              0 CONNECTED
ARCH      ARCH              0 CONNECTED
ARCH      ARCH              0 CONNECTED
ARCH      ARCH              0 CONNECTED
MRP0      N/A             168 WAIT_FOR_LOG
9 rows selected.

PROCESS   STATUS          THREAD#  SEQUENCE#     BLOCK#     BLOCKS
--------- ------------ ---------- ---------- ---------- ----------
MRP0      WAIT_FOR_LOG          1        168          0          0

The Data Guard processes on the standby also looked ok. I retried the show configuration command on both databases and got the same errors. At this point, the solution seemed to be recreating the DG Broker configuration, so I went ahead and did the following to recreate the broker on each database:
Standby Database (wzsdb)
Step 1. Make sure to stop MRP first
SYS@wzdb1> alter database recover managed standby database cancel;
Database altered.
Step 2. Stop dg broker and remove the files
SYS@wzdb1> alter system set dg_broker_start=false scope=both sid='*';
System altered.

ASMCMD> cd +data/wzsdb/dataguardconfig
ASMCMD> ls
dgb_config01.ora
wzsdb.1188.852022993
ASMCMD> rm dgb_config*
You may delete multiple files and/or directories.
Are you sure? (y/n) y
ASMCMD> cd +reco/wzsdb/dataguardconfig
ASMCMD> ls
dgb_config02.ora
wzsdb.1049.852022971
ASMCMD> rm dgb_config*
You may delete multiple files and/or directories.
Are you sure? (y/n) y
ASMCMD> ls

Step 3. Recreate the broker
SYS@wzdb1> alter system set dg_broker_start=false scope=both sid='*';
System altered.
SYS@wzdb1> ALTER SYSTEM SET DG_BROKER_CONFIG_FILE1='+reco/wzsdb/DATAGUARDCONFIG/dgb_config02.ora' SCOPE=BOTH sid='*';
System altered.
SYS@wzdb1> ALTER SYSTEM SET DG_BROKER_CONFIG_FILE2='+data/wzsdb/DATAGUARDCONFIG/dgb_config01.ora' SCOPE=BOTH sid='*';
System altered.
SYS@wzdb1> alter system set dg_broker_start=true scope=both sid='*';
System altered.

Primary Database (wzpdb)
Perform similar steps as above on primary database.

Reconfigure the DG Broker
At this point, DG Broker was started on both the primary and the standby. Now create and enable the configuration.
[enkdb01:oracle:wzdb1] /home/oracle/wzhou/dg
> dgmgrl
DGMGRL for Linux: Version 11.2.0.3.0 - 64bit Production
Copyright (c) 2000, 2009, Oracle. All rights reserved.
Welcome to DGMGRL, type "help" for information.
DGMGRL> connect sys
Password:
Connected.
DGMGRL> show configuration
ORA-16532: Data Guard broker configuration does not exist
Configuration details cannot be determined by DGMGRL

DGMGRL> CREATE CONFIGURATION 'DG_Config' AS PRIMARY DATABASE IS 'wzpdb' CONNECT IDENTIFIER IS 'wzpdb';
Configuration "DG_Config" created with primary database "wzpdb"
DGMGRL> ADD DATABASE 'wzsdb' AS CONNECT IDENTIFIER IS wzsdb;
Database "wzsdb" added
DGMGRL> enable configuration;
Enabled.

Let’s see the result.

DGMGRL> show configuration
Configuration - DG_Config
  Protection Mode: MaxPerformance
  Databases:
	wzpdb - Primary database
	wzsdb - Physical standby database
Fast-Start Failover: DISABLED
Configuration Status:
SUCCESS

We are back in business. Another possible solution is not to recreate the DG Broker files completely, but just to remove the DG Broker configuration from dgmgrl and then recreate the configuration in dgmgrl. If I run into a similar issue again, I will try it out.
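
For reference, the lighter alternative might look roughly like the commands printed by the sketch below. It is untested, just like the idea itself: REMOVE CONFIGURATION is a standard DGMGRL command, but whether it succeeds while the broker is in the broken two-primary state is not known. The function name rebuild_commands is hypothetical, and the sketch assumes each connect identifier equals the database name, as in this post.

```shell
#!/bin/sh
# Hypothetical helper: print the DGMGRL commands that drop and rebuild a
# two-database configuration. Arguments: config name, primary, standby.
# Assumes each connect identifier equals the database name, as in this post.
rebuild_commands() {
  printf "REMOVE CONFIGURATION;\n"
  printf "CREATE CONFIGURATION '%s' AS PRIMARY DATABASE IS '%s' CONNECT IDENTIFIER IS %s;\n" "$1" "$2" "$2"
  printf "ADD DATABASE '%s' AS CONNECT IDENTIFIER IS %s;\n" "$3" "$3"
  printf "ENABLE CONFIGURATION;\n"
  printf "SHOW CONFIGURATION;\n"
}

rebuild_commands DG_Config wzpdb wzsdb
```

The function only prints text, so nothing is changed until the commands are reviewed and entered in a dgmgrl session connected to the primary.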

Scripts used in this post:
db_mode.sql
select inst_id, database_role, open_mode, log_mode, flashback_on, force_logging from gv$database;

dg_sby_process.sql

select process, client_process, sequence#, status from v$managed_standby;
select process, status, thread#, sequence#, block#, blocks from gv$managed_standby where thread# <> 0;