Replace Cisco Ethernet Switch on Exadata


There is usually no need to replace the Cisco switch on Exadata. However, some enterprises have their own standards that require a different switch, and in that case the Cisco Ethernet switch on Exadata has to be replaced. Oracle Support has a nice document about the process, How to replace a Cisco ethernet switch in an Engineered Systems rack (Doc ID 1531203.1). It is a good document, but it focuses only on the steps for replacing the Cisco switch and does not say which additional steps need to be performed.

At first, I thought the Cisco switch only affected traffic on the Exadata Management Network and that I would not have to shut down the database and cell nodes. When I discussed this with my colleague Andy Colvin, he raised some good points. Although it is not required to shut down the system, there will be no way to get into any of the components via SSH while the switch is out. Furthermore, the storage servers will lose connectivity to DNS, which will have adverse consequences on performance. With so many network cables moving around, it would definitely be easier to shut down the entire system and replace the switch. Yes, that makes sense. Here are the high-level steps to replace the Cisco switch.
1. Shut down the database nodes
2. Shut down the cell nodes
3. Flip off the switches on the PDUs to make sure everything is down
4. Replace the Cisco switch
5. Turn on the PDUs and verify the new Ethernet switch
6. Start the cell nodes
7. Start the database nodes

Here are the detailed steps.
Step 1. Shut down database nodes

1) Log on as the oracle user to db node 1, source one db environment, and get the status of the database.

crsctl status res -t | more

Check the status of the Oracle instances.
ps -ef |grep pmon

The above steps are optional; they just make sure all databases are running normally. If you see issues in a database, you might want to resolve them before replacing the Cisco switch. You don’t want to add the complexity of existing issues in the middle of the switch change.

2) Stop all the databases currently running on Exadata by using the srvctl command.
srvctl stop database -d yourdbname
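When several databases run on the rack, the stop command can be looped over a list of names instead of typed once per database. The following is a minimal sketch, not a tested procedure: the function name `stop_all_databases` and the `SRVCTL` override variable are hypothetical names introduced here so the loop can be dry-run first.

```shell
# Hypothetical helper: stop every database whose name arrives on stdin,
# one name per line. Set SRVCTL=echo for a dry run that only prints the
# arguments that would be passed to srvctl.
stop_all_databases() {
    srvctl_cmd="${SRVCTL:-srvctl}"
    while IFS= read -r db; do
        # Skip blank lines in the input.
        [ -n "$db" ] || continue
        # </dev/null keeps the command from consuming the name list.
        "$srvctl_cmd" stop database -d "$db" </dev/null \
            || echo "failed to stop $db" >&2
    done
}
```

A dry run would look like `srvctl config database | SRVCTL=echo stop_all_databases`; dropping `SRVCTL=echo` issues the real stop commands. `srvctl config database` with no arguments prints the configured database names.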

3) Log on as the root user to db node 1 and stop CRS on the current node.
/u01/app/11.2/grid/bin/crsctl stop crs

While CRS is shutting down, run the following command regularly to check the number of Oracle clusterware processes. It should drop to 0 when CRS is stopped.
ps -ef|grep d.bin|grep -v grep|wc -l
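Rather than rerunning the count by hand, the check can be wrapped in a small polling loop. This is only a sketch: `count_dbin` and `wait_for_crs_down` are hypothetical names, and the default interval (10 seconds) and retry limit (60) are arbitrary assumptions.

```shell
# Count clusterware daemons (process names containing d.bin) in `ps -ef`
# output read from stdin. Same idea as:
#   ps -ef | grep d.bin | grep -v grep | wc -l
count_dbin() {
    awk '/d\.bin/ && !/grep/ { n++ } END { print n + 0 }'
}

# Hypothetical polling loop: re-check every INTERVAL seconds until no
# d.bin process remains or MAX_TRIES checks have been made.
wait_for_crs_down() {
    interval="${1:-10}"
    max_tries="${2:-60}"
    tries=0
    while [ "$tries" -lt "$max_tries" ]; do
        remaining="$(ps -ef | count_dbin)"
        if [ "$remaining" -eq 0 ]; then
            echo "CRS is down"
            return 0
        fi
        echo "$remaining d.bin processes still running"
        sleep "$interval"
        tries=$((tries + 1))
    done
    echo "timed out waiting for CRS to stop" >&2
    return 1
}
```

Calling `wait_for_crs_down 5 120` in a second terminal while `crsctl stop crs` runs would check every 5 seconds for up to 10 minutes.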

4) Verify all Oracle databases are shut down.
ps -ef | grep pmon
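The plain `grep pmon` also matches the grep process itself, which makes an empty result harder to spot. A minimal sketch that prints only the instance names is shown here; the function name `list_pmon_instances` is hypothetical, and it assumes the usual background process naming of `ora_pmon_<SID>` for databases and `asm_pmon_<SID>` for ASM.

```shell
# Print the instance name from every pmon process found in `ps -ef`
# output read from stdin. Lines without an ora_pmon_ or asm_pmon_ field
# (including the grep process itself) produce no output.
list_pmon_instances() {
    awk '{
        for (i = 1; i <= NF; i++)
            if ($i ~ /^(ora|asm)_pmon_/) {
                sub(/^(ora|asm)_pmon_/, "", $i)
                print $i
            }
    }'
}
```

Run it as `ps -ef | list_pmon_instances`. No output means no database or ASM instance is left on the node.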

5) Power off the node
Log on to ILOM to power off the server
shutdown -h -y now

6) Repeat the above steps for the rest of the database nodes

Step 2. Shut down cell nodes
1) Log on as the root user to cell node 1

2) Check cell service, verify the disk status
service celld status
cellcli -e list cell detail

3) Verify the grid disk status. The disks should be ONLINE, not in SYNCING state
cellcli -e list griddisk attributes name,asmmodestatus
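On a cell with many grid disks it is easy to miss one line that is not ONLINE. A small filter such as the following sketch prints only the offenders. The function name `griddisks_not_online` is hypothetical, and the sketch assumes the output is two whitespace-separated columns (name, then asmmodestatus).

```shell
# Read `cellcli -e list griddisk attributes name,asmmodestatus` output
# from stdin and print every grid disk whose status is not ONLINE.
# Returns 1 if at least one such disk was printed, 0 otherwise.
griddisks_not_online() {
    awk 'NF >= 2 && $2 != "ONLINE" { print $1, $2; bad = 1 }
         END { exit bad + 0 }'
}
```

Usage: `cellcli -e list griddisk attributes name,asmmodestatus | griddisks_not_online && echo "all ONLINE"`.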

4) Stop the cell service
service celld stop
cellcli -e alter cell shutdown services all

5) Verify the cell services are down
service celld status

6) Power off the node
Log on to ILOM to power off the server
shutdown -h -y now

7) Repeat the same process for the rest of the cell nodes

Step 3. Turn off the PDUs
There is no power button on the IB switches. As long as the PDUs are on, the IB switches are always on. Before turning off the PDUs, verify that ILOM for the IB switches is working.

Log in to the IB switches using ILOM to verify that you can log in from there.

If ILOM for the IB switches is working, flip off the switches on the PDUs.

Step 4. Replace the Cisco Switch
Use Oracle Support document to replace the switch
How to replace a Cisco ethernet switch in an Engineered Systems rack (Doc ID 1531203.1)

Step 5. Turn on PDUs and verify accessibility to/from IB switches

1) Turn on PDUs
After the PDUs are turned on, the IB switches start automatically. Give the IB switches a few minutes to boot up fully before doing anything.

2) Verify the IB switch
To verify that the IB switch is OK, run the following command as the root user on the IB switch

3) Verify the network connectivity to/from the IB switch. You don’t want to start the cell nodes if you know you have connectivity issues to/from the IB switches. There is no nslookup command on the IB switch, so you have to use the ping command to figure out whether DNS is working on the IB switches.
a. First, ping the IB switch and ssh to it as the root user

b. After logging in, ping a server outside Exadata by hostname. It should work.

c. Then ping a db node and a cell node by hostname

d. Finally, log in to the IB switch using ILOM to verify that you can log in from there
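The ping checks by hostname can be bundled into one helper so that no host is forgotten. This is a sketch under assumptions: `check_hosts` and the `PING` override variable are hypothetical names, the hostnames in the usage line are placeholders, and it assumes the shell on the switch provides a `ping` that accepts `-c 1`.

```shell
# Hypothetical helper: ping each hostname once and report OK or FAIL.
# A successful ping by name implies the name resolved, so DNS is working
# from this machine. Set PING to a stub command for a dry run.
# Returns 0 only if every host answered.
check_hosts() {
    ping_cmd="${PING:-ping}"
    failed=0
    for host in "$@"; do
        if "$ping_cmd" -c 1 "$host" >/dev/null 2>&1; then
            echo "OK   $host"
        else
            echo "FAIL $host"
            failed=$((failed + 1))
        fi
    done
    [ "$failed" -eq 0 ]
}
```

For example, `check_hosts outside-server exadb01 exacel01` would cover one outside server, one db node, and one cell node. Substitute your own hostnames.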

Step 6. Start the cell nodes
1) Verify you can access ALL cell nodes from ILOM

2) From ILOM, boot up the cell node and monitor the progress from the remote console

3) ssh to the cell node as the root user to verify that the NET0 connection is working

4) Verify all cell services are up
service celld status
cellcli -e list cell detail

5) Verify all disks go from SYNCING state to ONLINE state
cellcli -e list griddisk attributes name,asmmodestatus

6) Wait until the disks on all cell nodes show ONLINE state. I highly recommend waiting for them to complete SYNCING before starting the db nodes.
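The waiting can be automated on each cell with a polling loop around the same cellcli command. What follows is a minimal sketch, not an Oracle-provided tool: `count_not_online`, `wait_for_griddisks_online`, and the `CELLCLI` override variable are hypothetical names, and the default interval (60 seconds) and retry limit (120) are arbitrary assumptions.

```shell
# Count grid disks whose status column is not ONLINE in cellcli output
# read from stdin (columns: name, asmmodestatus).
count_not_online() {
    awk 'NF >= 2 && $2 != "ONLINE" { n++ } END { print n + 0 }'
}

# Hypothetical polling loop, run as root on a cell node. Checks every
# INTERVAL seconds, up to MAX_TRIES times. Empty cellcli output is treated
# as "not ready" rather than as success.
wait_for_griddisks_online() {
    interval="${1:-60}"
    max_tries="${2:-120}"
    cellcli_cmd="${CELLCLI:-cellcli}"
    tries=0
    while [ "$tries" -lt "$max_tries" ]; do
        status="$("$cellcli_cmd" -e list griddisk attributes name,asmmodestatus)"
        pending="$(printf '%s\n' "$status" | count_not_online)"
        if [ -z "$status" ]; then
            echo "no grid disk status returned"
        elif [ "$pending" -eq 0 ]; then
            echo "all grid disks ONLINE"
            return 0
        else
            echo "$pending grid disks not ONLINE yet"
        fi
        sleep "$interval"
        tries=$((tries + 1))
    done
    echo "gave up waiting for grid disks" >&2
    return 1
}
```

Running `wait_for_griddisks_online 30` on each cell would re-check every 30 seconds and return once nothing is left in a non-ONLINE state.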

Step 7. Start the DB Nodes
1) Verify you can access ALL db nodes from ILOM.

2) From ILOM, boot up db node 1 and monitor the progress from the remote console.

3) ssh to the db node as the oracle user and source the ASM environment.

4) Check whether the databases are online with the following command.
crsctl stat res -t

5) Repeat the above process for the rest of database nodes.

6) Check the database alert logs for anything unusual.
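A quick first pass over an alert log is to look for ORA- errors near the end of the file. The sketch below does that. `scan_alert_log` is a hypothetical name, the 500-line default is an arbitrary assumption, and an ORA- scan is only a starting point, not a substitute for reading the log.

```shell
# Print lines containing ORA- from the last N lines of an alert log.
# $1 = path to the alert log, $2 = number of trailing lines (default 500).
# Returns 2 if the file cannot be read.
scan_alert_log() {
    log_file="$1"
    lines="${2:-500}"
    if [ ! -r "$log_file" ]; then
        echo "cannot read $log_file" >&2
        return 2
    fi
    tail -n "$lines" "$log_file" | grep 'ORA-' \
        || echo "no ORA- errors in the last $lines lines"
}
```

In an 11.2 installation the alert log normally sits under `$ORACLE_BASE/diag/rdbms/<dbname>/<SID>/trace/alert_<SID>.log`, so a call might look like `scan_alert_log "$ORACLE_BASE/diag/rdbms/yourdbname/yourdbname1/trace/alert_yourdbname1.log"`, where the database and instance names are placeholders.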