Moving SQL Profile during Exadata Migration

My last post discussed the steps to move stats to an Exadata environment. During the same cutover window, we actually performed another task: moving certain SQL Profiles from the UAT environment to the PROD environment. Just like the photo below, let's do the moving in one run.
moving_stuff

For SQL Profiles, I usually treat them as the last resort for resolving production performance issues. I always recommend that my clients first change application code to tune poorly performing queries. Only when no code change is allowed and the production database is suffering significantly from bad queries do I use SQL Profiles. For a large legacy database that suffered from poor performance in the past, it is no surprise to find many SQL Profiles. Ten or twenty SQL Profiles are manageable, but there is no way to maintain hundreds or even thousands of them, especially during a migration from an 11.2.0.3 non-Exadata environment to an 11.2.0.4 Exadata environment. So after we restored the legacy database into a UAT environment on Exadata, we disabled all SQL Profiles in the database. During testing, if query performance was not good in the Exadata environment, we implemented a new SQL Profile, or reused the old one if it performed better. Using this strategy, we cut the SQL Profiles down from 256 before the migration to 9 during the migration. During the cutover window, we only needed to move these 9 SQL Profiles from UAT to PROD.

The following are the sample steps for moving SQL Profiles, using the same db_my_stats schema created in the previous post. Here is a summary of the steps.
1. Copy the enabled SQL Profiles to a staging table under the db_my_stats schema.
2. Export the staging table.
3. Copy the export data pump files to PROD database on Exadata.
4. Disable all SQL Profiles in the PROD database.
5. Import the enabled SQL Profiles in PROD database.

Steps 2 and 3 are discussed in my last post, so I am not going to repeat them here. Only steps 4 and 5 are performed during the cutover window. Here are sample scripts to illustrate the steps:

Copy the enabled SQL Profile to a staging table
Generate necessary sql scripts.
vi gen_exp_imp_sqlprofile.sql

set echo off
set feedback off
set lines 150
set pages 0

prompt ---------------- Generate SQL Profile Export and Import -----------------

prompt
prompt -- generate SQL to export SQL profile
prompt
spool EXPORT_TO_SQLPROFILE_TABLE.SQL
select 'spool EXPORT_TO_SQLPROFILE_TABLE.log' from dual;
select
  'EXEC DBMS_SQLTUNE.PACK_STGTAB_SQLPROF ( ' ||
  'staging_table_name => ''PROFILE_STG'', profile_name=>''' || name || '''); ' content
  from dba_sql_profiles
 where status = 'ENABLED'
order by name;
select 'spool off' from dual;

prompt
prompt -- generate SQL to get total rows in the SQL Profile table
prompt
spool GET_EXPORT_SQL_PROFILE_ROW_COUNT.SQL
select 'spool GET_EXPORT_SQL_PROFILE_ROW_COUNT.log' from dual;
select 'prompt SQL profiles qualified for export ' from dual;
select 'SELECT name, status FROM dba_sql_profiles where status = ''ENABLED'' order by name;' from dual;

select 'prompt SQL profiles exported ' from dual;
select 'SELECT count(*) FROM DB_MY_STATS.PROFILE_STG;' from dual;
select 'select obj_name, obj_type, sql_handle, status from DB_MY_STATS.PROFILE_STG order by obj_name;' from dual;
select 'spool off' from dual;

prompt
prompt -- generate SQL to import SQL profile
prompt
spool IMPORT_TO_SQLPROFILE_TABLE.SQL
select 'spool IMPORT_TO_SQLPROFILE_TABLE.log' from dual;
select
  'EXEC DBMS_SQLTUNE.UNPACK_STGTAB_SQLPROF(REPLACE => TRUE, ' ||
  'staging_table_name => ''PROFILE_STG'', profile_name=>''' || name || '''); ' content
  from dba_sql_profiles
 where status = 'ENABLED'
order by name;
select 'spool off' from dual;

spool off

Run the gen script.
@gen_exp_imp_sqlprofile

Export SQL Profile
sqlplus / as sysdba
connect db_my_stats/db_my_stats

Create SQL profile staging table
EXEC DBMS_SQLTUNE.CREATE_STGTAB_SQLPROF (table_name => 'PROFILE_STG', schema_name=>'DB_MY_STATS');

Run the script to export SQL Profile to the staging table and get the row count for the exported profiles.
@EXPORT_TO_SQLPROFILE_TABLE.SQL
@GET_EXPORT_SQL_PROFILE_ROW_COUNT.SQL

At this moment, the export is done. Then follow the export and import steps from the last post to populate the staging table PROFILE_STG in the target database.

Disable all SQL Profiles
vi disable_all_sqlprofile.sql

BEGIN
for r in
( select name, category, status, sql_text, force_matching
    from dba_sql_profiles
   order by category, name)
  loop
     DBMS_SQLTUNE.ALTER_SQL_PROFILE (name => r.name, attribute_name => 'STATUS', value =>  'DISABLED');
  end loop;
END;
/

Run the script.
@disable_all_sqlprofile.sql

Import SQL Profile
@IMPORT_TO_SQLPROFILE_TABLE.SQL

Verify the enabled SQL Profiles
select name, category, type, status from dba_sql_profiles
where status = 'ENABLED';

Ok, we are done moving the SQL Profiles here.

Moving STATS for Large Database during Exadata Migration

I have done many Exadata migrations over the past two and a half years. For the majority of them, my clients had enough downtime to complete the statistics gathering during the cutover window. However, this strategy is not going to work for a large database. Recently we migrated a 110+ TB database to Exadata. There were many challenges involved, like stats gathering, a tablespace reaching the 32 TB limit and the 1,022-datafile limit, an upgrade from 11.2.0.3 to 11.2.0.4 during the cutover window, and critical query tuning with only a few weeks left before cutover. Anyway, there were many moving parts during the migration. In this blog, I am focusing only on the stats strategy during the cutover window.

This database had issues gathering stats in the past: stats jobs ran for days without completing in the legacy environment. As the stats gathering had a significant impact on database performance, the client had to turn it off and lock the stats, so many tables' stats were at least two years old. With the database moving from a non-Exadata 11.2.0.3 environment to an Exadata 11.2.0.4 environment, we needed the new stats available before the database could be released to business users. The question was how we were going to do it. Just like the drawing below, we needed to find the fastest window to get our work done.

fast_stats

Both the UAT and PROD environments use X4 full rack Exadata. Even with eight database servers, a full stats gathering with a parallelism of 384 could still take a few days, and even incremental stats gathering could take 10+ hours. That is definitely not going to work during a cutover window with limited downtime available.

For this migration we used Data Guard, and our physical standby on Exadata was an exact physical copy of the legacy primary database. We also built the UAT database with an RMAN restore, and the last build of UAT was just two weeks before the cutover date. So we used the following strategy to work around the stats gathering problem during the cutover window.
1. Gather stats on the UAT database. It doesn't matter whether it takes one day or two; as long as it is outside the cutover window, we are fine with that.
2. Copy the STATS in UAT database to STATS staging tables.
3. Export the staging tables.
4. Copy the export data pump files to the PROD database on Exadata.
5. Import the stats to PROD DB.

Steps 1 to 4 have no downtime requirement and can be done ahead of time; only the last step needs to be performed during the cutover window. In this blog, I am not going to discuss Steps 1 and 4, only the exporting and importing of the stats. I used our lab to demonstrate the steps and scripts involved.

Export Stats from Source Database

cd /dbfs/work/wzhou/test_stat

sqlplus / as sysdba
create user db_my_stats identified by db_my_stats default tablespace users temporary tablespace temp; 
grant connect, resource, dba to db_my_stats;  
create directory DB_MY_STATS_DIR as '/dbfs/work/wzhou/test_stat'; 
grant read, write on directory DB_MY_STATS_DIR to system;

Generate necessary sql scripts.
vi gen_exp_imp_stats.sql

-- Generate all the necessary scripts for export and import schema stats
--set time on
--set timing on
set echo off
set feedback off
set lines 150
set pages 0

prompt
prompt generate SQL to create schema stat table
prompt
spool CREATE_SCHEMA_STATS_TABLE.SQL
select 'set timing on' from dual;
select 'spool CREATE_SCHEMA_STATS_TABLE.log' from dual;
select 'prompt '||owner||chr(10)||
  'exec DBMS_STATS.CREATE_STAT_TABLE(''DB_MY_STATS'', ''STATS_' || owner || ''');' content
  from dba_segments
 where owner not in ( 'DB_MY_STATS', 'ANONYMOUS','APPQOSSYS','DBSNMP','ORACLE_OCM','OUTLN','SYS','SYSTEM','WMSYS','XDB','XS$NULL', 'ORDSYS','ORDDATA')
group by owner
order by owner;
select 'spool off' from dual;
spool off

prompt
prompt -- generate SQL to export schema stat to stat table
prompt
spool EXPORT_TO_SCHEMA_STATS_TABLE.SQL
select 'set timing on' from dual;
select 'spool EXPORT_TO_SCHEMA_STATS_TABLE.log' from dual;
select 'prompt '||owner||chr(10)||
  'exec DBMS_STATS.EXPORT_SCHEMA_STATS( ' ||
  'ownname => ''' || owner || ''', ' ||
  'stattab => ''STATS_' || owner || ''', ' ||
  'statid => ''' || owner || ''', ' ||
  'statown => ''DB_MY_STATS'');' content
  from dba_segments
 where owner not in ( 'DB_MY_STATS', 'ANONYMOUS','APPQOSSYS','DBSNMP','ORACLE_OCM','OUTLN','SYS','SYSTEM','WMSYS','XDB','XS$NULL', 'ORDSYS','ORDDATA')
group by owner
order by owner;
select 'spool off' from dual;
spool off


prompt
prompt -- generate SQL to import schema stat to stat table
prompt
spool IMPORT_TO_SCHEMA_STATS_TABLE.SQL
select 'set timing on' from dual;
select 'spool IMPORT_TO_SCHEMA_STATS_TABLE.log' from dual;
select 'prompt '|| owner || chr(10) ||
  --'--exec dbms_stats.delete_schema_stats (ownname => '''||owner||''');' || chr(10)||
  'exec DBMS_STATS.IMPORT_SCHEMA_STATS( ' ||
  'ownname => ''' || owner || ''', ' ||
  'stattab => ''STATS_' || owner || ''', ' ||
  'statid => ''' || owner || ''', ' ||
  'statown => ''DB_MY_STATS'');' content
  from dba_segments
 where owner not in ( 'DB_MY_STATS', 'ANONYMOUS','APPQOSSYS','DBSNMP','ORACLE_OCM','OUTLN','SYS','SYSTEM','WMSYS','XDB','XS$NULL', 'ORDSYS','ORDDATA')
group by owner
order by owner;
select 'spool off' from dual;
spool off

Run the gen script.
@gen_exp_imp_stats

Create stats staging table.
@CREATE_SCHEMA_STATS_TABLE.SQL

Export schema stats to the stats staging table
@EXPORT_TO_SCHEMA_STATS_TABLE.SQL

Prepare the export script
vi run_exp_stats
expdp parfile=exp_schema_stats.par

vi exp_schema_stats.par
USERID=db_my_stats/db_my_stats
PARALLEL=4
DIRECTORY=DB_MY_STATS_DIR
DUMPFILE=stats_%U.dmp
LOGFILE=stats_exp.log
METRICS=Y
FILESIZE=4G
SCHEMAS=DB_MY_STATS
JOB_NAME=stats_exp

Run the script to export the schema
run_exp_stats

Ok, at this moment, the stats export is done. Copy the data pump files to the target system.

Import Stats to Target Database
On the target database, perform similar steps as above.

create user db_my_stats identified by db_my_stats default tablespace users temporary tablespace temp; 
grant connect, resource, dba to db_my_stats;  
create directory DB_MY_STATS_DIR as '/dbfs/work/wzhou/test_stat'; 
grant read, write on directory DB_MY_STATS_DIR to system;

Prepare the import script.
vi run_imp_stats
impdp parfile=imp_schema_stats.par

vi imp_schema_stats.par
USERID=db_my_stats/db_my_stats
PARALLEL=4
DIRECTORY=DB_MY_STATS_DIR
DUMPFILE=stats_%U.dmp
LOGFILE=stats_imp.log
METRICS=Y
TABLE_EXISTS_ACTION=REPLACE
SCHEMAS=DB_MY_STATS
JOB_NAME=stats_imp

Run the script to import the schema
run_imp_stats

Import the stats from stats staging tables.
@IMPORT_TO_SCHEMA_STATS_TABLE.SQL

Ok, we’re done with the stats import.

The above strategy worked pretty well during the cutover window. The export of the stats took about half an hour, and the import took slightly less time than the export. We are happy that the stats work did not take a significant amount of time for this 110+ TB database during the cutover.

Out of Space Error While Still Having Space

Recently I worked on a large database on an X4 full rack Exadata for a few months. I had been using the sqlplus command on this database every day without any issue. Then suddenly, I got the following error on db node 1 when trying to run sqlplus.

$ sqlplus / as sysdba
ORA-09925: Unable to create audit trail file
Linux-x86_64 Error: 28: No space left on device
Additional information: 9925
ORA-09925: Unable to create audit trail file
Linux-x86_64 Error: 28: No space left on device
Additional information: 9925

I knew this was an issue at the OS level, not the database level. The audit trail directory should be under /u01, so I ran the df command. Interesting: it still showed about 12G available on /u01, just like the parking space image below.

$ df -kh /u01
Filesystem            Size  Used Avail Use% Mounted on
/dev/mapper/VGExaDb-LVDbOra1
                       99G   82G   12G  88% /u01

space_issue_201410
I know /u01 uses some space, but 82G out of 99G seemed too much. I checked the same /u01 on db node 2: it had only 22G used, and sqlplus worked there without any issue. At this moment I roughly knew the cause of the issue. I had seen a similar space problem at another client in the past, where the database had serious performance issues; the cause was millions of audit files under the audit directory.

When we run the df or du command, the majority of the time we are only interested in how much space is used and how much is available. That is the space limit of the file system. There is another limit: the inode limit. An inode holds the metadata of a file, containing information like file size, owner, group, access/modify/change times, and much more. When a file is created, its metadata is stored in an inode, and each file has a unique inode number that is used internally by the file system. When accessing a file, the system first searches the inode table for that inode number; with the information from the inode, the file can be found and accessed.

We don't usually see inodes reach their limit. The df -i command can help identify an inode limit issue. Here is the result after running df -i:

$ df -h -i /u01
Filesystem            Inodes   IUsed   IFree IUse% Mounted on
/dev/mapper/VGExaDb-LVDbOra1
                         13M     13M       0  100% /u01

Obviously we used up 100% of all 13M inodes in /u01 file system.

At this moment, I knew where an Oracle database can generate many files: the audit directory. I ran a command to get the space usage of the audit folder; after almost 6 minutes it still had not returned a result, and I had to kill the process.

Then I used another way to find the top 10 space consumers. As expected, the audit directory took a lot of space, over 50G.

$ cd /u01/app/oracle/product/11.2.0.4/dbhome_1
$ du -a . | sort -n -r | head -n 10
57566392        .
52851540        ./rdbms
52722128        ./rdbms/audit
788268  ./lib
768116  ./bin
559916  ./owb
335520  ./oc4j
309500  ./assistants
303920  ./assistants/dbca
303596  ./ctx

There is another way to find out whether the directory is big. Run ls command from parent directory.

$ cd /u01/app/oracle/product/11.2.0.4/dbhome_1/rdbms
$ ls -l
total 1039920
drwxr-xr-x 2 oracle oinstall      49152 Aug 22 10:33 admin
drwxr-xr-x 2 oracle oinstall 1063747584 Oct 20 19:20 audit
drwxr-xr-x 2 oracle oinstall       4096 Jun 27 14:47 demo
drwxr-xr-x 2 oracle oinstall       4096 Jun 27 14:47 doc
drwxr-xr-x 5 oracle oinstall       4096 Jun 27 14:48 install
drwxr-xr-x 2 oracle oinstall       4096 Jun 27 14:47 jlib
drwxr-xr-x 2 oracle oinstall       4096 Jul 31 11:45 lib
drwxr-xr-x 2 oracle oinstall       4096 Aug 22 18:30 log
drwxr-xr-x 2 oracle oinstall       4096 Jun 27 14:47 mesg
drwxr-xr-x 2 oracle oinstall       4096 Jun 27 14:47 public
drwxr-xr-x 5 oracle oinstall       4096 Jun 27 14:46 xml

So the solution seemed easy: just remove the .aud files under the audit directory. I ran the rm command; it ran for a few minutes and finally gave the error message below.

$ rm *.aud
-bash: /bin/rm: Argument list too long

It seemed there were a lot of files in this directory, and I wanted to find the total number of files in it. I tried ls -l | wc -l to get the file count, but it never finished and took forever to run. The reason ls -l is so slow is that, by default, ls sorts the files alphabetically. So if you're only interested in listing some files quickly, you can use ls -f | head -100 to get a list of files without sorting.
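To get a file count without the sorting overhead that slows down ls -l, find can stream directory entries straight into wc. A minimal sketch below; the scratch directory path and file count are made up for illustration:

```shell
# Create a scratch directory with 500 dummy files (illustrative only)
demo=/tmp/ls_count_demo
mkdir -p "$demo"
for i in $(seq 1 500); do : > "$demo/ora_$i.aud"; done

# Count directory entries without sorting them first
find "$demo" -maxdepth 1 -type f | wc -l
```

Because find emits each entry as it reads the directory, the count starts streaming immediately instead of waiting for the full listing to be sorted.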

At this moment, I had to use rm -rf from the parent directory, and it worked. Even with this method, it took more than 8 hours to complete.
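In hindsight, find with -delete might also have worked here: it unlinks each match itself, so the shell never builds the huge argument list that broke "rm *.aud". A minimal sketch, with a made-up scratch directory and file count:

```shell
# Scratch directory with dummy .aud files (illustrative path and count)
demo=/tmp/aud_delete_demo
mkdir -p "$demo"
for i in $(seq 1 300); do : > "$demo/ora_$i.aud"; done

# "rm *.aud" fails once the expanded argument list exceeds the kernel limit;
# find -delete removes each matching file as it is found, no argument list needed
find "$demo" -maxdepth 1 -name '*.aud' -type f -delete
```

The -maxdepth 1 guard keeps the delete from descending into subdirectories, which matters in a real audit directory.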

In the middle of this delete, I stopped the process once the free inodes reached about 1% of the total. Since that left some free inodes available for other processes and I was not worried about this space issue in the short term, I wanted to find out how long it would take to calculate the space usage under the audit directory and how many audit files were in it.

Here are the results:

$ cd /u01/app/oracle/product/11.2.0.3/dbhome_1/rdbms
$ df -i /u01
Filesystem            Inodes   IUsed   IFree IUse% Mounted on
/dev/mapper/VGExaDb-LVDbOra1
                     13107200 12955396  151804   99% /u01

$ time du -khs /u01/app/oracle/product/11.2.0.4/dbhome_1/rdbms/audit
50G     /u01/app/oracle/product/11.2.0.4/dbhome_1/rdbms/audit

real    26m6.220s
user    0m5.299s
sys     2m25.058s

$ time ls -l | wc -l
12839426

real    29m16.458s
user    2m47.544s
sys     2m9.860s

The above results show it took 29 minutes to count the 12.8 million files in the directory, and 26 minutes to get the space usage of the audit directory. So basically, forget about using the ls -l command when there are millions of files in a directory.

There are a few other useful commands for finding inode information.

$ ls -i
4653873 admin  4949095 demo  4653875 install  4653881 lib  4653883 mesg    4653885 xml
4734977 audit  4653874 doc   4653880 jlib     4653882 log  4653884 public

$ stat audit
  File: `audit'
  Size: 1063747584      Blocks: 2079672    IO Block: 4096   directory
Device: fc03h/64515d    Inode: 4734977     Links: 2
Access: (0755/drwxr-xr-x)  Uid: ( 4085/  oracle)   Gid: ( 1140/oinstall)
Access: 2014-10-21 14:39:39.000000000 -0400
Modify: 2014-10-21 17:30:32.000000000 -0400
Change: 2014-10-21 17:30:32.000000000 -0400

As ls -l command is painfully slow to list files in a directory with millions of files, there are other ways to retrieve the files faster.

find . -type f -printf '%T+ %p\n' | sort -r | head -100
This command finds all files under the current directory, sorts them from newest to oldest, and prints only the first 100.

find . -type f -mtime -3 -printf '%T+ %p\n' | sort -r | head -100
This command restricts the search to files modified within the last three days (-mtime -3), so only those files are sorted.

find . -type f -mmin -20 -printf '%T+ %p\n' | sort -r | head -100
For finer control, this command selects only files modified within the last 20 minutes.

Different Results from Data Guard’s Show Configuration Command

cat_lion_mirror
Recently I built a Data Guard environment across two Exadatas with three RAC databases and did a lot of testing. show configuration is probably the command I used most frequently in DG Broker.

When running show configuration from dgmgrl, we usually see the same result no matter where the command is executed, on the primary or on any standby database. During one switchover test, I ran into a weird situation: the show configuration command returned three different results from the one primary database and the two standby databases, just like the image above (the cat sees a lion in the mirror). Here are the results:

Primary Database (wzxdb)

DGMGRL> show configuration
Configuration - DG_Config

  Protection Mode: MaxPerformance
  Databases:
	wzxdb - Primary database
	wzsdb - Physical standby database
	  Error: ORA-16664: unable to receive the result from a database

	wzpdb - Physical standby database

Fast-Start Failover: DISABLED

Configuration Status:
ERROR 

I checked the DG Broker log at /u01/app/oracle/diag/rdbms/wzxdb/wzkdb1/trace/drcwzkdb1.log; it contained the following:

07/16/2014 09:30:20
Site wzsdb returned ORA-16664.
Data Guard Broker Status Summary:
  Type                        Name              Severity  Status
  Configuration               DG_Config         Warning  ORA-16607
  Primary Database            wzxdb             Success  ORA-00000
  Physical Standby Database   wzsdb               Error  ORA-16664
  Physical Standby Database   wzpdb             Success  ORA-00000

Let's continue and check the status of the standby databases.
1st Standby Database, wzpdb

DGMGRL> show configuration
Configuration - DG_Config

  Protection Mode: MaxPerformance
  Databases:
	wzxdb - Primary database
	wzsdb - Physical standby database
	wzpdb - Physical standby database

Fast-Start Failover: DISABLED

Configuration Status:
SUCCESS

2nd Standby Database, wzsdb

DGMGRL> show configuration
Configuration - DG_Config

  Protection Mode: MaxPerformance
  Databases:
	wzxdb - Primary database
	wzsdb - Physical standby database
	wzpdb - Physical standby database

Fast-Start Failover: DISABLED

Configuration Status:
ORA-16501: the Data Guard broker operation failed
ORA-16625: cannot reach database "wzxdb"
DGM-17017: unable to determine configuration status	

The first thing I checked was whether Data Guard replication was still working. I did a few log switches on the primary and could see the logs were shipped to both standby databases. I also verified the Data Guard related parameters, tnsnames, and listener entries in all databases, and found no issue there. At this point I narrowed the issue down to DG Broker and suspected it related to the DG Broker configuration. After a few tries, I found a fix:
1. On the primary db (wzxdb), remove the database wzsdb from the DG Broker configuration, then add it back.
2. On the standby db (wzsdb), bounce the database.

Here are the detailed steps:

Primary Database (wzxdb)
DGMGRL> remove database wzsdb
Removed database "wzsdb" from the configuration
DGMGRL> add database wzsdb as connect identifier is wzsdb;
Database "wzsdb" added
DGMGRL> enable configuration
Enabled.
DGMGRL> show configuration

Configuration - DG_Config

  Protection Mode: MaxPerformance
  Databases:
	wzxdb - Primary database
	wzpdb - Physical standby database
	wzsdb - Physical standby database

Fast-Start Failover: DISABLED

Configuration Status:
SUCCESS

After fixing the issue on the primary database, let's go to the standby database with the issue. It still returned the same error from the show configuration command, so I went ahead and bounced the database.
srvctl stop database -d wzsdb
srvctl start database -d wzsdb

DGMGRL> show configuration

Configuration - DG_Config

  Protection Mode: MaxPerformance
  Databases:
	wzxdb - Primary database
	wzpdb - Physical standby database
	wzsdb - Physical standby database

Fast-Start Failover: DISABLED

Configuration Status:
SUCCESS

Here is part of the content from the Data Guard broker log on this standby database.

07/16/2014 09:29:58
Failed to send message to site wzxdb. Error code is ORA-16501.
07/16/2014 09:31:09
REMOVE DATABASE wzsdb [PRESERVE DESTINATIONS]
Failed to send message to site wzxdb. Error code is ORA-16501.
07/16/2014 09:31:37
DMON Deregistering service wzsdb_DGB with listener(s)
07/16/2014 09:32:12
DMON Registering service wzsdb_DGB with listener(s)
07/16/2014 09:32:15
Apply Instance for Database wzsdb set to wzdb1
07/16/2014 09:32:19
Failed to send message to site wzxdb. Error code is ORA-16501.
Command EDIT DATABASE wzsdb SET PROPERTY ActualApplyInstance = wzdb1 completed
07/16/2014 09:32:29
Command ENABLE CONFIGURATION completed
07/16/2014 09:32:48
Failed to send message to site wzxdb. Error code is ORA-16501.
07/16/2014 09:33:04
Failed to send message to site wzxdb. Error code is ORA-16501.
07/16/2014 09:33:23
Failed to connect to remote database wzxdb. Error is ORA-12154
Failed to send message to site wzxdb. Error code is ORA-16501.
07/16/2014 09:34:15
Failed to send message to site wzxdb. Error code is ORA-16501.
07/16/2014 09:35:30
Failed to send message to site wzxdb. Error code is ORA-16501.
07/16/2014 09:36:45
Failed to send message to site wzxdb. Error code is ORA-16501.
07/16/2014 09:38:00
Failed to send message to site wzxdb. Error code is ORA-16501.
07/16/2014 09:38:27
Data Guard Broker shutting down
RSM0 successfully terminated
07/16/2014 09:38:29
>> DMON Process Shutdown <<
07/16/2014 09:39:18
>> Starting Data Guard Broker bootstrap <<
Broker Configuration File Locations:
	  dg_broker_config_file1 = "+RECO/wzsdb/dataguardconfig/dgb_config02.ora"
	  dg_broker_config_file2 = "+DATA/wzsdb/dataguardconfig/dgb_config01.ora"
07/16/2014 09:39:22
DMON Registering service wzsdb_DGB with listener(s)
Broker Configuration:       "DG_Config"
	  Protection Mode:            Maximum Performance
	  Fast-Start Failover (FSFO): Disabled, flags=0x0, version=0
	  Primary Database:           wzxdb (0x03010000)
	  Standby Database:           wzsdb, Enabled Physical Standby (0x01010000)
	  Standby Database:           wzpdb, Enabled Physical Standby (0x02010000)
07/16/2014 09:39:25
wzsdb version check successfully completed
wzsdb has current configuration metadata,
	  completing bootstrap
Creating process RSM0
07/16/2014 09:39:28
Apply Instance for Database wzsdb set to wzdb1
07/16/2014 09:39:37 

We can see that the standby database received commands like REMOVE DATABASE wzsdb and ENABLE CONFIGURATION from the primary's DG Broker, but it just could not send messages back to the primary database. After bouncing the standby database, it returned to normal and could communicate with the primary again.

Finally, all databases showed the SUCCESS status no matter where I ran the show configuration command.

Script to identify the restore/recover point for archive logs

I did some work on an interesting project to keep a standby database in sync with a production primary database manually. This is not a true standby database, as the primary database does not communicate with it. For certain reasons, we could not configure Data Guard to replicate data between these two databases, so there was no redo log shipping as in a Data Guard environment. By manually, I mean we take the archive log backups from the previous day, then restore and recover them on this standby database. As this database is a VLDB, the volume of daily archive log files runs to multiple terabytes. We use an Exadata X4 full rack to host this standby database. Even restoring with all db nodes and 200+ channels, it still takes several hours for the restore alone, and a similar amount of time to recover these archive logs, not to mention the time spent copying files between the two data centers. It takes a lot of effort to keep up with the production primary database and reduce the lag between the two databases.

The benefit of doing this manually is the minimal impact on the current production environment. The only overhead on the production db comes when copying files to Exadata, and that impact is quite low. We scp the RMAN backup pieces using all 8 db nodes to maximize bandwidth utilization.

One major task during this restore and recover is identifying the correct restore and recover points from the daily RMAN backupsets for the archive logs. On identifying the right recover point, different people might have different opinions. Just like the image below: how many bars can you see, three or four?

three_or_four_bars

There are many blogs and articles discussing ways to identify the correct restore and recover points. Most people like to use the v$archived_log view to get the recover point. In my scenario, that did not work well, as I could get the correct recovery point only after restoring all the archive log files. What I want to know, right after cataloging the RMAN backup pieces, is the last applied archive log sequence for each thread and the next recover point for the backup pieces that were just cataloged.

Using both the v$archived_log and v$backup_archivelog_details views, I created a script that answers all the questions I have:
1. The restore command to use for each thread.
2. The last applied archive log sequence for each thread, including the timestamp and next change SCN.
3. The last possible recover point per thread for the cataloged RMAN backup pieces.
4. The recover command.
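Conceptually, items 3 and 4 reduce to a simple rule: among each thread's last cataloged archive log, pick the one with the lowest next_change#, and recover until that sequence + 1 on that thread. A minimal sketch of that selection outside the database, using three sample (thread#, sequence#, next_change#) rows taken from the example output further down:

```shell
# Sample "last cataloged log per thread" rows: thread#, sequence#, next_change#
cat > /tmp/cataloged_demo.txt <<'EOF'
1 324015 10902104866903
2 351828 10902104871653
3 309396 10902104850854
EOF

# The lowest next_change# wins; recovery stops at its sequence + 1 on that thread
sort -k3,3n /tmp/cataloged_demo.txt | head -1 \
  | awk '{printf "set until sequence %d thread %d;\n", $2 + 1, $1}'
# -> set until sequence 309397 thread 3;
```

This is why the SQL version orders the per-thread maxima by next_change# and takes the first row: recovering past that point would need a log that is not in the cataloged backup pieces yet.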

The script is listed as follows:

db_arc_seq_range.sql

col "Restore Command" for a100
col "Applied Logs" for a100
col "Catalog Logs" for a100
col "Recover Command" for a80
select ' restore archivelog from logseq ' || applied_arc.startNo || ' until logseq ' || catalog_arc.endNo || ' thread=' || catalog_arc.thread# || ';' "Restore Command"
from
--(select thread#,max(sequence#) + 1 startNo from gv$archived_log where applied='YES' group by thread#) applied_arc,
(select thread#,max(sequence#) startNo from gv$archived_log where applied='YES' group by thread#) applied_arc,
(select thread#, max(sequence#) endNo from v$backup_archivelog_details group by thread#) catalog_arc
where applied_arc.thread# = catalog_arc.thread#;

prompt '=========== Archive Log Info ============='
select distinct 'Thread ' || thread# || ': last applied archive log ' || sequence# || ' at ' || to_char(next_time, 'MON/DD/YYYY HH24:MI:SS') || ' next change# ' || next_change# "Applied Logs"
from v$archived_log
where thread# || '_' || sequence# in
(select thread# || '_' || max(sequence#) from v$archived_log where applied='YES' group by thread#)
--and applied='YES'
;
select 'Thread ' || thread# || ': last cataloged archive log ' || sequence# || ' at ' || to_char(next_time, 'MON/DD/YYYY HH24:MI:SS') || ' next change# ' || next_change# "Catalog Logs"
from v$backup_archivelog_details
where thread# || '_' || sequence# in
(select thread# || '_' || max(sequence#) from v$backup_archivelog_details group by thread#)
;

prompt '=========== recover point ================'
--select 'recover database until sequence ' || seq# || ' thread ' || thread# || ' delete archivelog maxsize 4000g; ' Content
select 'set until sequence ' || seq# || ' thread ' || thread# || '; ' || chr(13)|| chr(10) || 'recover database delete archivelog maxsize 4000g; ' "Recover Command"
from (
select * from (
select thread#, sequence# + 1 seq#, next_change# from (
select * from v$backup_archivelog_details
where thread# || '_' || sequence# in
(select thread# || '_' || max(sequence#) from v$backup_archivelog_details group by thread#)
)
order by next_change#
)
where
rownum = 1
)
;

The following example shows the execution of the script.

SYS> @db_arc_seq_range
Restore Command
----------------------------------------------------------------------------------------------------
restore archivelog from logseq 323498 until logseq 324015 thread=1;
restore archivelog from logseq 351250 until logseq 351828 thread=2;
restore archivelog from logseq 308766 until logseq 309396 thread=3;
restore archivelog from logseq 345805 until logseq 346271 thread=4;
restore archivelog from logseq 629650 until logseq 630749 thread=5;
restore archivelog from logseq 502202 until logseq 502899 thread=6;

6 rows selected.

'=========== Archive Log Info ============='

Applied Logs
----------------------------------------------------------------------------------------------------
Thread 1: last applied archive log 323498 at SEP/16/2014 22:41:27 next change# 10900757473229
Thread 2: last applied archive log 351250 at SEP/16/2014 22:41:28 next change# 10900757476463
Thread 3: last applied archive log 308766 at SEP/16/2014 22:44:30 next change# 10900759270706
Thread 4: last applied archive log 345805 at SEP/16/2014 22:43:42 next change# 10900758591989
Thread 5: last applied archive log 629650 at SEP/16/2014 22:43:39 next change# 10900758575645
Thread 6: last applied archive log 502202 at SEP/16/2014 22:42:06 next change# 10900757720611

6 rows selected.
Catalog Logs
----------------------------------------------------------------------------------------------------
Thread 1: last cataloged archive log 324015 at SEP/17/2014 23:12:31 next change# 10902104866903
Thread 2: last cataloged archive log 351828 at SEP/17/2014 23:12:31 next change# 10902104871653
Thread 3: last cataloged archive log 309396 at SEP/17/2014 23:12:29 next change# 10902104850854
Thread 4: last cataloged archive log 346271 at SEP/17/2014 23:12:30 next change# 10902104860405
Thread 5: last cataloged archive log 630749 at SEP/17/2014 23:12:30 next change# 10902104862135
Thread 6: last cataloged archive log 502899 at SEP/17/2014 23:12:32 next change# 10902104879394

6 rows selected.

'=========== recover point ================'

Recover Command
--------------------------------------------------------------------------------
set until sequence 309397 thread 3;
recover database delete archivelog maxsize 4000g;

It shows that we completed the restore and recover of the Sep. 16 archive logs, and that the recover point for the Sep. 17 archive logs is sequence 309397, thread 3.