Switchover Failed in Data Guard Broker

Recently I did some Data Guard tests on 11.2.0.3 RAC. Both primary and standby databases were on different Exadata QuarterRack. During one test, I might mess up some data guard parameters. When I performed switchover operation wzsdb->wzpdb, it failed in the middle of the process. This is an interesing scenario I have never run into in the past. Here is the result from the execution:

DGMGRL> show configuration
Configuration - DG_Config
  Protection Mode: MaxPerformance
  Databases:
	wzsdb - Primary database
	wzpdb - Physical standby database
Fast-Start Failover: DISABLED
Configuration Status:
SUCCESS

DGMGRL> switchover to wzpdb
Performing switchover NOW, please wait...
Error: ORA-16664: unable to receive the result from a database
Failed.
Unable to switchover, primary database is still "wzsdb"

two_primary
Majority of the time when there is an issue during the switchover using DG Broker, bounce both new primary database and new standby can usually resolve the issue. It didn’t work this time. Tried multiple bounce of both databases, restarted MRP manually. None of them works. Both databases claimed to be Primary database in the DG Broker, just like the two bears above. Here is what the result from DG Broker looks like.

Database wzpdb (Supposed new primary database)
DGMGRL> show configuration
Configuration – DG_Config
Protection Mode: MaxPerformance
Databases:
wzsdb – Primary database
wzpdb – Physical standby database
Fast-Start Failover: DISABLED
Configuration Status:
ORA-16623: database detected role change
ORA-16625: cannot reach database “wzsdb”
DGM-17017: unable to determine configuration status

Database wzsdb (Supposed new standby database)
DGMGRL> show configuration
Configuration – DG_Config
Protection Mode: MaxPerformance
Databases:
wzpdb – Primary database
wzsdb – Physical standby database
Fast-Start Failover: DISABLED
Configuration Status:
ORA-16597: Data Guard broker detects two or more primary databases
ORA-16625: cannot reach database “wzpdb”
DGM-17017: unable to determine configuration status

Obvious I should not have two primary databases in Data Guard. Next thing I would like to check whether this is the issue inside Data Guard Broker. I run the the following queries on both databases.
Database wzpdb (Supposed new primary database)

SYS:wzdb1> @db_mode
    INST_ID DATABASE_ROLE    OPEN_MODE            LOG_MODE     FLASHBACK_ON       FOR
----------- ---------------- -------------------- ------------ ------------------ ---
          1 PRIMARY          READ WRITE           ARCHIVELOG   NO                 YES
          2 PRIMARY          READ WRITE           ARCHIVELOG   NO                 YES

The above result is what I expected and wzpdb is in primary role

Database wzsdb (Supposed new standby database)

SYS@wzdb1> @db_mode
   INST_ID DATABASE_ROLE    OPEN_MODE            LOG_MODE     FLASHBACK_ON       FOR
---------- ---------------- -------------------- ------------ ------------------ ---
	 2 PHYSICAL STANDBY MOUNTED              ARCHIVELOG   NO                 YES
	 1 PHYSICAL STANDBY MOUNTED              ARCHIVELOG   NO                 YES

The result is also correct in wzsdb and it is in standby role.

Tried to test out MRP process

SYS@wzdb1> @dg_sby_process
PROCESS   CLIENT_P  SEQUENCE# STATUS
--------- -------- ---------- ------------
ARCH      ARCH              0 CONNECTED
ARCH      ARCH              0 CONNECTED
ARCH      ARCH              0 CONNECTED
ARCH      ARCH              0 CONNECTED
ARCH      ARCH              0 CONNECTED
ARCH      ARCH              0 CONNECTED
ARCH      ARCH              0 CONNECTED
ARCH      ARCH              0 CONNECTED
8 rows selected.

SYS@wzdb1> alter database recover managed standby database using current logfile disconnect;
Database altered.				 

SYS@wzdb1> @dg_sby_process
PROCESS   CLIENT_P  SEQUENCE# STATUS
--------- -------- ---------- ------------
ARCH      ARCH              0 CONNECTED
ARCH      ARCH              0 CONNECTED
ARCH      ARCH              0 CONNECTED
ARCH      ARCH              0 CONNECTED
ARCH      ARCH              0 CONNECTED
ARCH      ARCH              0 CONNECTED
ARCH      ARCH              0 CONNECTED
ARCH      ARCH              0 CONNECTED
MRP0      N/A             168 WAIT_FOR_LOG
9 rows selected.

PROCESS   STATUS          THREAD#  SEQUENCE#     BLOCK#     BLOCKS
--------- ------------ ---------- ---------- ---------- ----------
MRP0      WAIT_FOR_LOG          1        168          0          0

The data guard processes on standby also looked ok. Retried show configuration command on both databases and got the same errors. At this moment, it seems like the solution is to recreate the DG Broker. So I went ahead and do the followings to recreate the brokers:
Standby Database (wzsdb)
Step 1. Make sure to stop MRP first
SYS@wzdb1> alter database recover managed standby database cancel;
Database altered.
Step 2. Stop dg broker and remove the files
SYS@wzdb1> alter system set dg_broker_start=false scope=both sid=’*’;
System altered.

ASMCMD> cd +data/wzsdb/dataguardconfig
ASMCMD> ls
dgb_config01.ora
wzsdb.1188.852022993
ASMCMD> rm dgb_config*
You may delete multiple files and/or directories.
Are you sure? (y/n) y
ASMCMD> cd +reco/wzsdb/dataguardconfig
ASMCMD> ls
dgb_config02.ora
wzsdb.1049.852022971
ASMCMD> rm dgb_config*
You may delete multiple files and/or directories.
Are you sure? (y/n) y
ASMCMD> ls

Step 3. Recreate the broker
SYS@wzdb1> alter system set dg_broker_start=false scope=both sid=’*’;
System altered.
SYS@wzdb1> ALTER SYSTEM SET DG_BROKER_CONFIG_FILE1=’+reco/wzsdb/DATAGUARDCONFIG/dgb_config02.ora’ SCOPE=BOTH sid=’*’;
System altered.
SYS@wzdb1> ALTER SYSTEM SET DG_BROKER_CONFIG_FILE2=’+data/wzsdb/DATAGUARDCONFIG/dgb_config01.ora’ SCOPE=BOTH sid=’*’;
System altered.
SYS@wzdb1> alter system set dg_broker_start=true scope=both sid=’*’;
System altered.

Primary Database (wzpdb)
Perform similar steps as above on primary database.

Reconfigure the DG Broker
At this moment, both dg broker on both primary and standby were started. Configure and enable the configuration.
[enkdb01:oracle:wzdb1] /home/oracle/wzhou/dg
> dgmgrl
DGMGRL for Linux: Version 11.2.0.3.0 – 64bit Production
Copyright (c) 2000, 2009, Oracle. All rights reserved.
Welcome to DGMGRL, type “help” for information.
DGMGRL> connect sys
Password:
Connected.
DGMGRL> show configuration
ORA-16532: Data Guard broker configuration does not exist
Configuration details cannot be determined by DGMGRL

DGMGRL> CREATE CONFIGURATION ‘DG_Config’ AS PRIMARY DATABASE IS ‘wzpdb’ CONNECT IDENTIFIER IS ‘wzpdb’;
Configuration “DG_Config” created with primary database “wzpdb”
DGMGRL> ADD DATABASE ‘wzsdb’ AS CONNECT IDENTIFIER IS wzsdb;
Database “wzsdb” added
DGMGRL> enable configuration;
Enabled.

Let see the result.

DGMGRL> show configuration
Configuration - DG_Config
  Protection Mode: MaxPerformance
  Databases:
	wzpdb - Primary database
	wzsdb - Physical standby database
Fast-Start Failover: DISABLED
Configuration Status:
SUCCESS

We are back in business. Another possible solution is not to recreate the DG Broker files completely, but just remove the DG Broker configuration from dgmgrl, and then recreate the the configuration in dgmgrl. Next time if I run into the similar issue, I will try it out.

Script used in the blog:
db_mode.sql
select inst_id, database_role, open_mode, log_mode, flashback_on, force_logging from gv$database;

dg_sby_process.sql

select process, client_process, sequence#, status from v$managed_standby;select process, status, thread#, sequence#, block#, blocks from gv$managed_standby where thread# <> 0;