Oracle Database Appliance (ODA) is the entry-level Oracle Engineered System. It is like Exadata, but without cell nodes and InfiniBand; it has two db nodes and shared storage with 24 disks. Both ODA and Exadata use the same Grid Infrastructure to manage ASM across multiple database nodes. As the ASM db files can be accessed by all db nodes, companies usually implement the RAC or RAC ONE options to handle failover in case of hardware failure. Since both ODA and Exadata can also support single-instance databases, it made me wonder how failover works in that case, and how I can bring up the database on a different node after a hardware failure or a similar failure.
As many people know, failover with RAC and RAC ONE is super easy: you don't need to do anything, and if a db node goes down, the db continues to run on the other db node. The difference between RAC and RAC ONE is that RAC performs a hot failover while RAC ONE performs a cold failover. I wouldn't expect a single-instance (non-RAC) db to do a hot failover, but some kind of cold failover is fine with me. It's better than nothing.
At first, I thought just shutting down the db, creating some necessary directories, and adding an entry to the oratab file on the other node should work. I remember reading an article about this procedure a long time ago: change certain db parameters in the pfile, then bring up the db using the modified pfile. I searched the internet and could not find it anymore. The closest article I could find is How To Setup ASM (11.2) On An Active/Passive Cluster (Non-RAC) (Doc ID 1296124.1). The document discusses many steps to install and configure GI on the 2nd db node, which I can basically skip. When I worked on the most important step, srvctl add database, I got the following errors:
oracle:testdb@odap2 ~ > /u01/app/oracle/product/11.2.0.3/dbhome_1/bin/srvctl add database -d testdb -n odap2 -o /u01/app/oracle/product/11.2.0.3/dbhome_1 -p +data/TESTDB/spfiletestdb.ora -s OPEN -y AUTOMATIC -a DATA,RECO,REDO -t IMMEDIATE
PRCS-1007 : Server pool testdb already exists
PRCR-1086 : server pool ora.testdb is already registered
It looks weird that this db already exists on db node 2 even before I created it there. Later I realized the srvctl command can see resources across all db nodes. Obviously this document was not going to work for me, so let me try another way: what if I shut down the db on node 1 and bring it up on node 2 directly? Wrong! After I did that, I got many errors during the mount stage, as follows:
SQL> startup nomount
ORACLE instance started.
Total System Global Area 4275781632 bytes
Fixed Size 2235208 bytes
Variable Size 838862008 bytes
Database Buffers 3388997632 bytes
Redo Buffers 45686784 bytes
SQL> alter database mount
2 ;
alter database mount
*
ERROR at line 1:
ORA-03113: end-of-file on communication channel
Process ID: 2838
Session ID: 584 Serial number: 1
The alert log also shows the following errors:
Thu Sep 05 19:56:38 2013
alter database mount
Thu Sep 05 19:56:38 2013
NOTE: Loaded library: System
ORA-15025: could not open disk "/dev/mapper/SSD_E0_S20_805849611p1"
ORA-27041: unable to open file
Linux-x86_64 Error: 13: Permission denied
Additional information: 9
ORA-15025: could not open disk "/dev/mapper/SSD_E0_S21_805849344p1"
ORA-27041: unable to open file
Linux-x86_64 Error: 13: Permission denied
Additional information: 9
ORA-15025: could not open disk "/dev/mapper/SSD_E0_S22_805849349p1"
ORA-27041: unable to open file
Linux-x86_64 Error: 13: Permission denied
Additional information: 9
ORA-15025: could not open disk "/dev/mapper/SSD_E0_S23_805849238p1"
ORA-27041: unable to open file
Linux-x86_64 Error: 13: Permission denied
Additional information: 9
Thu Sep 05 19:56:38 2013
SUCCESS: diskgroup REDO was mounted
Thu Sep 05 19:56:38 2013
NOTE: dependency between database testdb and diskgroup resource ora.REDO.dg is established
Errors in file /u01/app/oracle/diag/rdbms/testdb/testdb/trace/testdb_ckpt_2813.trc (incident=24145):
ORA-00600: internal error code, arguments: [kfioTranslateIO03], [], [], [], [], [], [], [], [], [], [], []
Incident details in: /u01/app/oracle/diag/rdbms/testdb/testdb/incident/incdir_24145/testdb_ckpt_2813_i24145.trc
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
Errors in file /u01/app/oracle/diag/rdbms/testdb/testdb/trace/testdb_ckpt_2813.trc (incident=24146):
ORA-00600: internal error code, arguments: [17090], [], [], [], [], [], [], [], [], [], [], []
Incident details in: /u01/app/oracle/diag/rdbms/testdb/testdb/incident/incdir_24146/testdb_ckpt_2813_i24146.trc
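In hindsight, the PRCS-1007/PRCR-1086 errors from srvctl add database make sense: the OCR is shared cluster-wide, so a database registered from node 1 is already visible to srvctl on node 2, and the right verb there would have been modify, not add. A minimal sketch of that decision (the `choose_srvctl_verb` helper is hypothetical, and the registered list is injected as a string so the sketch runs without a real cluster):

```shell
# Sketch: decide between "srvctl add database" and "srvctl modify database".
# The OCR is cluster-wide, so a database registered on node 1 is already
# visible to srvctl on node 2. The registered list is passed in as an
# argument (normally you would capture "srvctl config database" output).

choose_srvctl_verb() {
    db="$1"            # db_unique_name we want to register
    registered="$2"    # one registered db_unique_name per line
    if printf '%s\n' "$registered" | grep -qx "$db"; then
        echo "srvctl modify database -d $db"   # already in the OCR
    else
        echo "srvctl add database -d $db"      # genuinely new
    fi
}

# Example: testdb is already registered cluster-wide, newdb is not.
choose_srvctl_verb testdb "testdb
mytestprd"
# -> srvctl modify database -d testdb
choose_srvctl_verb newdb "testdb
mytestprd"
# -> srvctl add database -d newdb
```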
Out of ideas, I posted my question to Enkitec's internal technical distribution list. My colleague, Andy Colvin, came back with an interesting idea. He recommended using the srvctl modify database command to tell the OCR to use the other node, and then simply running srvctl start database to start the database. It is an interesting idea and seemed to make sense to me. However, when I checked the Oracle documentation for srvctl modify database, there was no parameter to specify the hostname. srvctl uses -n to specify the hostname in many other commands, but for this command -n means the db name. Here is the syntax for this command from the documentation:
srvctl modify database -d db_unique_name [-o oracle_home] [-u oracle_user]
[-m domain_name] [-n db_name] [-p spfile] [-s start_options]
[-t stop_options] [-r {PRIMARY | PHYSICAL_STANDBY | LOGICAL_STANDBY |
SNAPSHOT_STANDBY}] [-y {automatic | manual}] [-a disk_group_list] [-z]
That looked weird to me, and I was pretty sure there must be some way to specify the hostname when creating the db. Luckily, whenever I create a db using dbca, I always save a copy of the creation scripts. So I ran the following to search them:
oracle:testdb@odap1 scripts> cd /u01/app/oracle/admin/testdb/scripts
oracle:testdb@odap1 scripts> grep odap1 *
testdb.sql:host /u01/app/oracle/product/11.2.0.3/dbhome_1/bin/srvctl add database -d testdb -o /u01/app/oracle/product/11.2.0.3/dbhome_1 -p +DATA/testdb/spfiletestdb.ora -n testdb -x odap1 -a "DATA,RECO"
Ah, here is the answer: there is an undocumented parameter, -x, that specifies the hostname. So I ran the following:
/u01/app/oracle/product/11.2.0.3/dbhome_1/bin/srvctl modify database -d testdb -o /u01/app/oracle/product/11.2.0.3/dbhome_1 -p +DATA/testdb/spfiletestdb.ora -n testdb -x odap2 -a "DATA,RECO"
Then the database could be brought up using sqlplus on the 2nd db node.
Sounds like the end of the story. Later on, another colleague, Frits Hoogland, brought up an even nicer and cleaner solution than what I did. He mentioned the approach can also be used on Exadata. Basically, on the 2nd db node I don't need to change the oratab file or create directories, but I must use the srvctl command instead of sqlplus to do the work. Another interesting idea, so I reproduced his recommendation and it worked easily without any issue. To make the example usable for different scenarios, I created a db on node 1 with the following information:
db name : mytestdb
db unique name : mytestprd
service : mytest
Here are the steps:
Step 1. Verify some information about the db and directories
On node 1:
oracle:mytestdb@odap1 ~> srvctl config database -d mytestprd
Database unique name: mytestprd
Database name: mytestdb
Oracle home: /u01/app/oracle/product/11.2.0.3/dbhome_1
Oracle user: oracle
Spfile: +DATA/mytestprd/spfilemytestdb.ora
Domain:
Start options: open
Stop options: immediate
Database role: PRIMARY
Management policy: AUTOMATIC
Server pools: mytestprd
Database instance: mytestdb
Disk Groups: DATA,RECO,REDO
Mount point paths:
Services: mytest
Type: SINGLE
Database is administrator managed
SYS@mytestdb> show parameter name

NAME                        TYPE        VALUE
--------------------------- ----------- ------------------------------
db_file_name_convert        string
db_name                     string      mytestdb
db_unique_name              string      mytestprd
global_names                boolean     TRUE
instance_name               string      mytestdb
lock_name_space             string
log_file_name_convert       string
processor_group_name        string
service_names               string      mytest
On db node 2, check whether the oratab entry and db directories exist. Of course, they should not exist yet.
oracle:testdb@odap2 ~> grep mytest /etc/oratab
oracle:testdb@odap2 ~> ls -l /u01/app/oracle/diag/rdbms/mytestprd/mytestdb/trace
ls: /u01/app/oracle/diag/rdbms/mytestprd/mytestdb/trace: No such file or directory
oracle:testdb@odap2 ~> ls -l /u01/app/oracle/admin/mytestprd/adump
ls: /u01/app/oracle/admin/mytestprd/adump: No such file or directory
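The Step 1 checks on the target node can be scripted. Here is a small sketch; `check_absent` is a hypothetical helper, and the paths are the ones used in this article (on a box where they do not exist, each reports "absent"):

```shell
# Sketch: confirm that nothing for the db exists yet on the target node
# before relocating it. Paths are taken from the example above.

check_absent() {
    for p in "$@"; do
        if [ -e "$p" ]; then
            echo "$p: exists (unexpected before relocation)"
        else
            echo "$p: absent (as expected)"
        fi
    done
}

check_absent \
    /u01/app/oracle/diag/rdbms/mytestprd/mytestdb/trace \
    /u01/app/oracle/admin/mytestprd/adump
```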
Step 2. Configure the db to move from db node 1 to node 2
oracle:testdb@odap2 ~> srvctl status database -d mytestprd
Instance mytestdb is running on node odap1
oracle:testdb@odap2 ~> srvctl status service -d mytestprd -s mytest
Service mytest is running on instance(s) mytestdb
oracle:testdb@odap2 ~> srvctl stop database -d mytestprd
oracle:testdb@odap2 ~> srvctl modify database -d mytestprd -x odap2
oracle:testdb@odap2 ~> srvctl start database -d mytestprd
oracle:testdb@odap2 ~> srvctl status database -d mytestprd
Instance mytestdb is running on node odap2
oracle:testdb@odap2 ~> srvctl status service -d mytestprd -s mytest
Service mytest is running on instance(s) mytestdb
oracle:testdb@odap2 ~> srvctl config database -d mytestprd
Database unique name: mytestprd
Database name: mytestdb
Oracle home: /u01/app/oracle/product/11.2.0.3/dbhome_1
Oracle user: oracle
Spfile: +DATA/mytestprd/spfilemytestdb.ora
Domain:
Start options: open
Stop options: immediate
Database role: PRIMARY
Management policy: AUTOMATIC
Server pools: mytestprd
Database instance: mytestdb
Disk Groups: DATA,RECO,REDO
Mount point paths:
Services: mytest
Type: SINGLE
Database is administrator managed
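The three commands that do the actual move (stop, modify -x, start) can be wrapped up for reuse. The sketch below is hypothetical, not part of the original procedure: `relocate_db` is my own helper name, and with DRYRUN=1 (the default here) it only prints the srvctl commands, so the sketch runs outside a real cluster.

```shell
# Sketch of Step 2 as a reusable helper. With DRYRUN=1 it only echoes the
# commands it would run; set DRYRUN=0 on a real cluster node to execute them.

DRYRUN=${DRYRUN:-1}

run() {
    if [ "$DRYRUN" = 1 ]; then
        echo "$@"          # show what would be executed
    else
        "$@"               # really execute it
    fi
}

relocate_db() {
    db="$1"      # db_unique_name, e.g. mytestprd
    node="$2"    # target node, e.g. odap2
    run srvctl stop database -d "$db"
    run srvctl modify database -d "$db" -x "$node"   # -x is undocumented in 11.2
    run srvctl start database -d "$db"
}

relocate_db mytestprd odap2
```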
Now we can verify whether the oratab entry and directories exist.
Verify the entry in the oratab file:
oracle:testdb@odap2 ~> grep mytest /etc/oratab
mytestdb:/u01/app/oracle/product/11.2.0.3/dbhome_1:N            # line added by Agent
oracle:testdb@odap2 ~> ls -l /u01/app/oracle/diag/rdbms/mytestprd/mytestdb/trace
total 160
-rw-r----- 1 oracle asmadmin 10037 Sep  7 10:20 alert_mytestdb.log
-rw-r----- 1 oracle asmadmin   893 Sep  7 10:20 mytestdb_asmb_31403.trc
-rw-r----- 1 oracle asmadmin    64 Sep  7 10:20 mytestdb_asmb_31403.trm
-rw-r----- 1 oracle asmadmin  1062 Sep  7 10:20 mytestdb_cjq0_31545.trc
....
oracle:testdb@odap2 ~> ls -l /u01/app/oracle/admin/mytestprd/adump
total 32
-rw-r----- 1 oracle asmadmin  796 Sep  7 10:20 mytestdb_ora_31064_2.aud
-rw-r----- 1 oracle asmadmin 2600 Sep  7 10:20 mytestdb_ora_31421_1.aud
....
Sqlplus also shows the db is relocated to the 2nd db node:
SQL> col host_name for a20
SQL> select instance_name, host_name from v$instance;

INSTANCE_NAME    HOST_NAME
---------------- --------------------
mytestdb         odap2
Let's do some checks under the $ORACLE_HOME/dbs directory on node 2.
oracle:mytestdb@odap2 dbs> ls -l *mytestdb*
-rw-rw---- 1 oracle asmadmin 1544 Sep  7 10:20 hc_mytestdb.dat
-rw-r--r-- 1 oracle oinstall   67 Sep  7 10:19 initmytestdb.ora
oracle:mytestdb@odap2 dbs> cat initmytestdb.ora
SPFILE='+DATA/mytestprd/spfilemytestdb.ora'          # line added by Agent
Check out the same $ORACLE_HOME/dbs directory on node 1
oracle:mytestdb@odap1 dbs> ls -l *mytest*
-rw-rw---- 1 oracle asmadmin 1544 Sep  7 10:19 hc_mytestdb.dat
-rw-r----- 1 oracle oinstall   44 Sep  6 09:12 initmytestdb.ora
-rw-r----- 1 oracle oinstall 1536 Sep  6 09:05 orapwmytestdb
Notice that the password file is missing on node 2, so copy it over from db node 1.
oracle:mytestdb@odap2 dbs> cd $ORACLE_HOME/dbs
oracle:mytestdb@odap2 dbs> scp -p oracle@odap1:$PWD/orapwmytestdb .
orapwmytestdb                                 100% 1536     1.5KB/s   00:00
oracle:mytestdb@odap2 dbs> ls -lart *mytest*
-rw-r----- 1 oracle oinstall 1536 Sep  6 09:05 orapwmytestdb
-rw-r--r-- 1 oracle oinstall   67 Sep  7 10:19 initmytestdb.ora
-rw-rw---- 1 oracle asmadmin 1544 Sep  7 10:20 hc_mytestdb.dat
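As the listings above suggest, the agent recreates initSID.ora and hc_SID.dat on the target node at first start, but the password file must be copied by hand. A small sketch of that check; `check_dbs_files` is a hypothetical helper, and the directory and SID are parameters so it can be pointed at a scratch directory:

```shell
# Sketch: check which per-instance files already exist under a dbs directory.
# In this scenario initSID.ora is created by the agent, while the password
# file orapwSID must be copied manually from the other node.

check_dbs_files() {
    dbs_dir="$1"   # normally $ORACLE_HOME/dbs
    sid="$2"       # instance name, e.g. mytestdb
    for f in "init$sid.ora" "orapw$sid"; do
        if [ -f "$dbs_dir/$f" ]; then
            echo "$f: present"
        else
            echo "$f: MISSING"
        fi
    done
}

# Example against a scratch directory containing only the init file.
tmp=$(mktemp -d)
touch "$tmp/initmytestdb.ora"
check_dbs_files "$tmp" mytestdb
# -> initmytestdb.ora: present
# -> orapwmytestdb: MISSING
rm -rf "$tmp"
```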
Running the crsstat command also shows the right result.
oracle:mytestdb@odap2 dbs> crsstat
Resource Name            Resource Type    Target     State        Node            FC RC State Details
------------------------ ---------------- ---------- ------------ --------------- -- -- ------
ora.DATA.dg              diskgroup      C ONLINE     ONLINE       odap1            0  0
ora.DATA.dg              diskgroup      C ONLINE     ONLINE       odap2            0  0
ora.LISTENER.lsnr        Listener       C ONLINE     ONLINE       odap1            0  0
ora.LISTENER.lsnr        Listener       C ONLINE     ONLINE       odap2            0  0
ora.LISTENER_SCAN1.lsnr  SCAN Listener  L ONLINE     ONLINE       odap1            0  0
ora.LISTENER_SCAN2.lsnr  SCAN Listener  L ONLINE     ONLINE       odap2            0  0
ora.RECO.dg              diskgroup      C ONLINE     ONLINE       odap1            0  0
ora.RECO.dg              diskgroup      C ONLINE     ONLINE       odap2            0  0
ora.REDO.dg              diskgroup      C ONLINE     ONLINE       odap1            0  0
ora.REDO.dg              diskgroup      C ONLINE     ONLINE       odap2            0  0
ora.asm                  ASM            C ONLINE     ONLINE       odap1            0  0 Started
ora.asm                  ASM            C ONLINE     ONLINE       odap2            0  0 Started
ora.cvu                  cvu            L ONLINE     ONLINE       odap2            0  0
ora.dbm.db               database       L ONLINE     ONLINE       odap1            0  0 Open
ora.odap1.vip            Cluster VIP    L ONLINE     ONLINE       odap1            0  0
ora.odap2.vip            Cluster VIP    L ONLINE     ONLINE       odap2            0  0
ora.gsd                  Gbl Svc Daemon C OFFLINE    OFFLINE      (odap1)          0  0
ora.gsd                  Gbl Svc Daemon C OFFLINE    OFFLINE      (odap2)          0  0
ora.mytestprd.db         database       L ONLINE     ONLINE       odap2            0  0 Open
ora.mytestprd.mytest.svc service        L ONLINE     ONLINE       odap2            0  0
ora.net1.network         Network (VIP)  C ONLINE     ONLINE       odap1            0  0
ora.net1.network         Network (VIP)  C ONLINE     ONLINE       odap2            0  0
ora.oc4j                 OC4J           L ONLINE     ONLINE       odap2            0  0
ora.ons                  Ora Notif Svc  C ONLINE     ONLINE       odap1            0  0
ora.ons                  Ora Notif Svc  C ONLINE     ONLINE       odap2            0  0
ora.reco.acfsvol.acfs    acfs           C ONLINE     ONLINE       odap1            0  0 mounted on /cloudfs
ora.reco.acfsvol.acfs    acfs           C ONLINE     ONLINE       odap2            0  0 mounted on /cloudfs
ora.registry.acfs        registry       C ONLINE     ONLINE       odap1            0  0
ora.registry.acfs        registry       C ONLINE     ONLINE       odap2            0  0
ora.scan1.vip            SCAN VIP       L ONLINE     ONLINE       odap1            0  0
ora.scan2.vip            SCAN VIP       L ONLINE     ONLINE       odap2            0  0
ora.testdb.db            database       L ONLINE     ONLINE       odap2            0  0 Open
ora.testwz.db            database       L ONLINE     ONLINE       odap1            0  0 Open
After the db is relocated to node 2, I want to run some tests on a few interesting scenarios.
Scenario 1. While the db is running on node 2, what happens if I try to bring it up on node 1?
Of course, everyone knows it should not work for a single-instance db, but I just want to see the result.
SYS@mytestdb> startup nomount
ORA-00304: requested INSTANCE_NUMBER is busy
As I expected, the db cannot start. Otherwise, who would buy an Oracle RAC license!
Scenario 2. What happens if I stop the db from node 2 and start it on node 1 using sqlplus?
On node 1:
oracle:mytestdb@odap1 dbs> srvctl stop database -d mytestdb
oracle:mytestdb@odap1 dbs> sqlplus / as sysdba
SQL*Plus: Release 11.2.0.3.0 Production on Sat Sep 7 10:44:56 2013
Copyright (c) 1982, 2011, Oracle. All rights reserved.
Connected to an idle instance.
SYS@mytestdb> startup nomount
ORACLE instance started.
Total System Global Area 4275781632 bytes
Fixed Size 2235208 bytes
Variable Size 822084792 bytes
Database Buffers 3422552064 bytes
Redo Buffers 28909568 bytes
SYS@mytestdb> alter database mount;
Database altered.
SYS@mytestdb> alter database open;
Database altered.
SYS@mytestdb> col host_name for a20
select instance_name, host_name from v$instance;
INSTANCE_NAME    HOST_NAME
---------------- --------------------
mytestdb         odap1
It's an interesting result, as I expected it could not start on node 1. So what happens if I shut down the db and start it using srvctl on db node 1?
SYS@mytestdb> shutdown immediate
Database closed.
Database dismounted.
ORACLE instance shut down.
SYS@mytestdb> quit
oracle:mytestdb@odap1 dbs> srvctl start database -d mytestprd
oracle:mytestdb@odap1 dbs> srvctl status database -d mytestprd
Instance mytestdb is running on node odap2
oracle:mytestdb@odap1 dbs> srvctl status service -d mytestprd -s mytest
Service mytest is running on instance(s) mytestdb
No surprise. The db and service were started on db node 2 as expected.
Scenario 3. Based on scenario 2, with the db started up on node 1 via sqlplus, can I start the service on node 1 as well?
oracle:mytestdb@odap1 dbs> sqlplus / as sysdba
SQL*Plus: Release 11.2.0.3.0 Production on Sat Sep 7 10:59:20 2013
Copyright (c) 1982, 2011, Oracle. All rights reserved.
Connected to an idle instance.
SYS@mytestdb> startup
ORACLE instance started.
Total System Global Area 4275781632 bytes
Fixed Size 2235208 bytes
Variable Size 822084792 bytes
Database Buffers 3422552064 bytes
Redo Buffers 28909568 bytes
Database mounted.
Database opened.
SYS@mytestdb> quit
oracle:mytestdb@odap1 dbs> srvctl status database -d mytestprd
Instance mytestdb is not running on node odap2
oracle:mytestdb@odap1 dbs> srvctl status service -d mytestprd -s mytest
Service mytest is not running.
oracle:mytestdb@odap1 dbs> srvctl start service -d mytestprd -s mytest
PRCD-1084 : Failed to start service mytest
PRCR-1079 : Failed to start resource ora.mytestprd.mytest.svc
CRS-5017: The resource action “ora.mytestprd.db start” encountered the following error:
ORA-01034: ORACLE not available
ORA-27101: shared memory realm does not exist
Linux-x86_64 Error: 2: No such file or directory
Process ID: 0
Session ID: 0 Serial number: 0
. For details refer to "(:CLSN00107:)" in "/u01/app/11.2.0.3/grid/log/odap2/agent/crsd/oraagent_oracle/oraagent_oracle.log".
CRS-2674: Start of 'ora.mytestprd.db' on 'odap2' failed
CRS-2632: There are no more servers to try to place resource 'ora.mytestprd.mytest.svc' on that would satisfy its placement policy
oracle:mytestdb@odap1 dbs> ls -l /u01/app/11.2.0.3/grid/log/odap2/agent/crsd/oraagent_oracle/oraagent_oracle.log
ls: /u01/app/11.2.0.3/grid/log/odap2/agent/crsd/oraagent_oracle/oraagent_oracle.log: No such file or directory
There is no odap2 directory under /u01/app/11.2.0.3/grid/log on node 1; it is on db node 2.
oracle:mytestdb@odap1 dbs> ls -l /u01/app/11.2.0.3/grid/log
total 12
drwxr-xr-x 2 grid oinstall 4096 Apr 5 22:29 crs
drwxr-xr-t 25 root oinstall 4096 Jul 1 12:10 odap1
drwxrwxr-x 5 grid asmadmin 4096 Jul 1 12:44 diag
oracle:mytestdb@odap1 dbs> crsstat | egrep 'Target|mytest'
Resource Name            Resource Type    Target     State        Node            FC RC State Details
ora.mytestprd.db         database       L ONLINE     OFFLINE      (odap2)          0  0 Instance Shutdown
ora.mytestprd.mytest.svc service       L ONLINE     OFFLINE      (odap2)          0  0
Therefore, from the above tests: once the db is configured to start on a different node, you can bring it up from either database node using sqlplus. However, if you are using the service name, you have to start the db on the node you associated it with via the "srvctl modify database" command. In general, this trick can be used in non-RAC environments on ODA, Exadata, and any cluster that uses the same GI across multiple nodes.
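After a relocation, it is worth confirming programmatically which node the instance actually landed on. A small sketch of that check; `running_node` is a hypothetical helper that parses the "Instance ... is running on node ..." line that `srvctl status database` prints, fed here with canned strings so it runs without a cluster:

```shell
# Sketch: extract the node name from "srvctl status database" output.
# Returns nothing if the instance is not running.

running_node() {
    # expects a line like: "Instance mytestdb is running on node odap2"
    printf '%s\n' "$1" | sed -n 's/^Instance .* is running on node //p'
}

status="Instance mytestdb is running on node odap2"
node=$(running_node "$status")
echo "$node"
# -> odap2
if [ "$node" = odap2 ]; then
    echo "relocation confirmed"
fi
```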