Disabling Firewall after Turning off Firewall

Many applications require the firewall to be disabled on Linux. The most commonly used commands are as follows:

Stop the ipchains service.
# service ipchains stop
Stop the iptables service.
# service iptables stop
Keep the ipchains service from starting after a reboot.
# chkconfig ipchains off
Keep the iptables service from starting after a reboot.
# chkconfig iptables off

Another popular one is to set SELINUX=disabled in the /etc/selinux/config file to disable some extra security restrictions.

The above usually works fine for me when turning off the firewall. Recently I ran into a situation that made me add an extra check for firewall settings. A consultant tried to install Oracle Big Data Discovery (BDD) on a Red Hat Linux VM and connect it to an Oracle Big Data Appliance (BDA) X6-2 Starter Rack. He used an approach similar to the above to turn off the firewall and Linux security between this Red Hat VM and the BDA, but still ran into a weird issue: when the BDD application on the BDA nodes tried to pull a request from a web service on this Red Hat VM, the result never came back.

I tried ping and ssh. Both worked, so there was basic connectivity between the two. It still looked like a firewall issue, so I checked with the network infrastructure team. They had firewall rules defined between the two, but the rules were not enabled yet.

I noticed the OS was Red Hat Linux 7.1. Could there be some new firewall feature in 7.1? After some investigation, yes, there is. On Red Hat 7, the firewall runs as the firewalld daemon. So let me find out what it is doing.

[root@bddhost ~]# firewall-cmd --zone=public --list-services 
dhcpv6-client ssh

[root@bddhost ~]# firewall-cmd --get-default-zone 
public

[root@bddhost ~]# firewall-cmd --list-all
public (default, active)
  interfaces: eth0 eth2
  sources:
  services: dhcpv6-client ssh
  ports:
  masquerade: no
  forward-ports:
  icmp-blocks:
  rich rules:

The above commands show that the firewall allows only the dhcpv6-client and ssh services. No wonder the HTTP web service is not working.
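
To make this kind of check repeatable, the listing can be scanned by a small helper. The following is a minimal sketch and was not part of the original troubleshooting: the function name zone_allows is my own, and it assumes the "services:" and "ports:" line layout shown in the output above.

```shell
#!/usr/bin/env bash
# zone_allows SERVICE PORT
# Reads `firewall-cmd --list-all` output on stdin and succeeds (exit 0)
# if SERVICE is on the "services:" line or PORT is on the "ports:" line.
# zone_allows is a hypothetical helper name, not a firewalld command.
zone_allows() {
    awk -v svc="$1" -v port="$2" '
        $1 == "services:" { for (i = 2; i <= NF; i++) if ($i == svc)  found = 1 }
        $1 == "ports:"    { for (i = 2; i <= NF; i++) if ($i == port) found = 1 }
        END { exit(found ? 0 : 1) }
    '
}

# Example (run on the host itself):
#   firewall-cmd --list-all | zone_allows http 7003/tcp || echo "blocked"
```

Against the listing above, a check for http or 7003/tcp would fail, which matches what the BDA nodes experienced.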

Ok, let me stop it.

[root@bddhost ~]# systemctl stop firewalld
[root@bddhost ~]# firewall-cmd --list-ports
FirewallD is not running

Now wget works from the BDA to the BDD VM.

[root@uat-bda1node01 ~]# wget http://192.168.2113:7003/endeca-server/ws/config?wsdl
--2016-10-03 18:56:29--  http://192.168.2113:7003/endeca-server/ws/config?wsdl
Connecting to 192.168.2113:7003... connected.
HTTP request sent, awaiting response... 200 OK
Length: 2529 (2.5K) [text/xml]
Saving to: “config?wsdl”
100%[============================================>] 2,529       --.-K/s   in 0s
2016-10-03 18:56:29 (456 MB/s) - “config?wsdl” saved [2529/2529]

The above change holds only until the server is rebooted. The service is stopped but still enabled:

[root@bddhost ~]# systemctl status firewalld
firewalld.service - firewalld - dynamic firewall daemon
   Loaded: loaded (/usr/lib/systemd/system/firewalld.service; enabled)
   Active: inactive (dead) since Mon 2016-10-03 18:56:22 SGT; 10min ago
 Main PID: 1016 (code=exited, status=0/SUCCESS)

Sep 30 12:52:35 localhost.localdomain systemd[1]: Started firewalld - dynamic fire....
Sep 30 15:13:09 bddhost.example.com firewalld[1016]: 2016-09-30 15:13:09 ERR...
Oct 03 18:56:21 bddhost systemd[1]: Stopping firewalld - dynamic firewall dae.....
Oct 03 18:56:22 bddhost systemd[1]: Stopped firewalld - dynamic firewall daemon.
Hint: Some lines were ellipsized, use -l to show in full.

To make the change permanent, do the following:

[root@bddhost ~]# systemctl disable firewalld
rm '/etc/systemd/system/dbus-org.fedoraproject.FirewallD1.service'
rm '/etc/systemd/system/basic.target.wants/firewalld.service'

[root@bddhost ~]# systemctl status firewalld
firewalld.service - firewalld - dynamic firewall daemon
   Loaded: loaded (/usr/lib/systemd/system/firewalld.service; disabled)
   Active: inactive (dead)

Sep 30 12:52:33 localhost.localdomain systemd[1]: Starting firewalld - dynamic fir....
Sep 30 12:52:35 localhost.localdomain systemd[1]: Started firewalld - dynamic fire....
Sep 30 15:13:09 bddhost.example.com firewalld[1016]: 2016-09-30 15:13:09 ERR...
Oct 03 18:56:21 bddhost systemd[1]: Stopping firewalld - dynamic firewall dae.....
Oct 03 18:56:22 bddhost systemd[1]: Stopped firewalld - dynamic firewall daemon.
Hint: Some lines were ellipsized, use -l to show in full.

To learn more about this firewalld daemon, please check out this link at https://www.digitalocean.com/community/tutorials/how-to-set-up-a-firewall-using-firewalld-on-centos-7.
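
If the goal is only to let the BDA reach the web service, a narrower option than disabling firewalld is to open just the port that was blocked. This is a sketch under assumptions rather than what was done on this system: port 7003/tcp is taken from the wget test above, the public zone from the firewall-cmd output, and the open_port function and DRY_RUN variable are names made up for illustration.

```shell
#!/usr/bin/env bash
# open_port PORT/PROTO [ZONE]
# Permanently opens one port in a firewalld zone, then reloads the rules.
# Set DRY_RUN=1 to print the commands instead of running them.
# open_port and DRY_RUN are illustrative names, not part of firewalld.
open_port() {
    local port="$1" zone="${2:-public}"
    local run="${DRY_RUN:+echo}"
    $run firewall-cmd --permanent --zone="$zone" --add-port="$port" &&
    $run firewall-cmd --reload
}

# Example, as root on the BDD VM (firewalld must be running):
#   open_port 7003/tcp
```

Running it with DRY_RUN=1 first shows the two firewall-cmd calls without changing anything.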

Replace Cisco Ethernet Switch on Exadata

Usually there is no need to replace the Cisco switch on Exadata. However, some enterprises have their own standards that call for a different switch. In this case, the Cisco Ethernet switch on Exadata will be replaced. Oracle Support has a nice document about the process, How to replace a Cisco ethernet switch in an Engineered Systems rack (Doc ID 1531203.1). This document is a good one, but it focuses only on the steps for replacing the Cisco switch and does not specify which additional steps need to be performed.

At first, I thought the Cisco switch only affected traffic on the Management Network on Exadata and that I didn't have to shut down the database and cell nodes. When I discussed it with my colleague, Andy Colvin, he brought up some good points. Although it is not required to shut down the system, there will be no way to get into any of the components via SSH. Furthermore, the storage servers will lose connectivity to DNS, which will have adverse consequences on performance. With so many network cables moving around, it would definitely be easier to shut down the entire system and replace the switch. Yes, that makes sense. Here are the high-level steps to replace the Cisco switch.
1. Shut down database nodes
2. Shut down cell nodes
3. Flip off the switches on PDUs to make sure everything is down.
4. Replace the Cisco switch
5. Turn on PDUs and verify new Ethernet switch
6. Start cell nodes
7. Start database nodes.

Here are the detailed steps.
Step 1. Shut down database nodes

1) Log on as the oracle user to db node 1, source a db environment, and get the status of the databases.

crsctl status res -t | more

Check the status of the Oracle instances.
ps -ef | grep pmon

The above steps are optional. They just make sure all databases are running normally. If you see issues in a database, you might want to resolve them before replacing the Cisco switch. You don't want to add the complexity of unrelated issues in the middle of a switch change.

2) Stop all the databases currently running on Exadata by using the srvctl command.
srvctl stop database -d yourdbname

3) Logon as root user to db node 1 and stop crs on the current node.
/u01/app/11.2/grid/bin/crsctl stop crs

During the shutdown process of CRS, run the following command regularly to check the number of oracle processes. It should reduce to 0 when CRS is stopped.
ps -ef|grep d.bin|grep -v grep|wc -l

4) Verify all oracle databases are shut down.
ps -ef | grep pmon

5) Power off the node
Logon to ILOM to Power Off Server
or
shutdown -h -y now

6) Repeat the above steps for the rest of database nodes
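
The process-count check in item 3) can be put in a loop instead of being rerun by hand. This is a minimal sketch, assuming bash on the db node; the function names count_dbin and wait_until_zero and the 10-second default interval are my own choices.

```shell
#!/usr/bin/env bash
# Counts Clusterware daemons (processes whose command line contains "d.bin"),
# based on the pipeline from item 3).
count_dbin() {
    ps -ef | grep 'd\.bin' | grep -v grep | wc -l
}

# wait_until_zero COUNTER [INTERVAL_SECONDS]
# Calls COUNTER repeatedly until it prints 0, reporting progress in between.
# Both function names are hypothetical helpers, not Oracle tools.
wait_until_zero() {
    local counter="$1" interval="${2:-10}" n
    while n=$("$counter") && [ "$n" -gt 0 ]; do
        echo "$n processes still running"
        sleep "$interval"
    done
}

# Example, in a second session while "crsctl stop crs" is running:
#   wait_until_zero count_dbin
```

The counter is passed in by name so the same loop can be reused with a different check, for example one based on pmon.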

Step 2. Shut down cell nodes
1) Logon as root user to cell node 1

2) Check cell service, verify the disk status
service celld status
or
cellcli -e list cell detail

3) Verify disk status and they should be ONLINE, not SYNCING state
cellcli -e list griddisk attributes name,asmmodestatus

4) Stop the cell service
service celld stop
or
cellcli -e alter cell shutdown services all

5) Verify the cell services are down
service celld status

6) Power off the node
Log on to ILOM to power off the server
or
shutdown -h -y now

7) Repeat the same process for the rest of cell nodes

Step 3. Turn off the PDUs
There is no power button on the IB switches. As long as the PDUs are on, the IB switches are always on. Before turning off the PDUs, verify that the ILOM for each IB switch is working.

Log in to the IB switches using ILOM to verify that you can log in from there.
http://ib-switch-name/

If ILOM for IB switches is working, flip off the switch on PDUs

Step 4. Replace the Cisco Switch
Use Oracle Support document to replace the switch
How to replace a Cisco ethernet switch in an Engineered Systems rack (Doc ID 1531203.1)

Step 5. Turn on PDU and verify accessibility to/from IB switches

1) Turn on PDUs
After the PDUs are turned on, the IB switches start automatically. Give the IB switches a few minutes to boot up fully before doing anything.

2) Verify the IB switch
To verify IB switch is ok, run the following command as root user on IB switch
env_test

3) Verify the network connectivity to/from the IB switch. You don't want to start the cell nodes if you know you have connectivity issues from/to the IB switches. There is no nslookup command on the IB switch, so you have to use the ping command to figure out whether DNS is working on the IB switches.
a. First ping IB switch and ssh to it as root user

b. After login, ping a server outside Exadata by hostname. It should work.

c. Then ping a db node and a cell node by hostname

d. Finally, log in to the IB switch using ILOM to verify that you can log in from there
http://ib-switch-name/

Step 6. Start the cell nodes
1) Verify you can access ALL cell nodes from ILOM
http://cell-node-ilom/

2) From ILOM, boot up the cell node, monitor the progress from remote console

3) ssh to cell node as root user, this is to verify NET0 connection is working

4) Verify all cell services are up
service celld status
or
cellcli -e list cell detail

5) Verify all disks go from SYNCING state to ONLINE state
cellcli -e list griddisk attributes name,asmmodestatus

6) Wait until the disks on all cell nodes show the ONLINE state. I highly recommend waiting for them to complete SYNCING before starting the db nodes.
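
Whether any grid disk is still resyncing can be read off the griddisk listing mechanically. The sketch below assumes the two-column name/status output of the cellcli command in item 5); the function name not_online is hypothetical.

```shell
#!/usr/bin/env bash
# not_online
# Reads "cellcli -e list griddisk attributes name,asmmodestatus" output on
# stdin, prints every grid disk whose status is not ONLINE, and exits
# non-zero if there is at least one. not_online is a made-up helper name.
not_online() {
    awk 'NF >= 2 && $2 != "ONLINE" { print $1, $2; bad = 1 }
         END { exit(bad ? 1 : 0) }'
}

# Example, as root on a cell node:
#   cellcli -e list griddisk attributes name,asmmodestatus | not_online \
#       && echo "all grid disks ONLINE"
```

Note that empty input also counts as success, so it is worth confirming that cellcli itself returned a listing.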

Step 7. Start the DB Nodes
1) Verify you can access ALL db nodes from ILOM.
http://db-node-ilom/

2) From ILOM, boot up the db node 1, monitor the progress from remote console.

3) ssh to db node as oracle user, source ASM environment.

4) Check whether the databases are online with the following command.
crsctl stat res -t

5) Repeat the above process for the rest of database nodes.

6) Check the database alert logs for anything unusual.

Change DNS on Exadata

At least at the time I wrote this blog, there was no Oracle Support document showing how to change DNS on Exadata, so it might be a good idea to show how to do it. Similar to my previous post, Change Time Zone Configuration on Exadata, changing DNS involves changes in four components on Exadata.

  • DB nodes
  • Cell nodes
  • IB Switches
  • Ethernet Switches

In the following example, we assume the current DNS servers use the two IPs 192.168.10.12 and 192.168.10.13, and we would like to change the nameservers to 192.168.10.14 and 192.168.10.15.

Step 1. Change at InfiniBand Switches

1. Logon to the first IB switch as root user.
ssh root@enkx3sw-ib2.enkitec.com

2. Edit file /etc/resolv.conf

cp -p /etc/resolv.conf /etc/resolv.conf.yyyymmdd
vi /etc/resolv.conf

Change the lines
nameserver 192.168.10.12
nameserver 192.168.10.13
to
nameserver 192.168.10.14
nameserver 192.168.10.15
3. Verify the change
Note: Interestingly, there is no nslookup program on the CentOS installation on the InfiniBand switch, so you have to ping a hostname to see whether it resolves to an IP.

[root@enkx3sw-ib2 etc]# cat /etc/redhat-release
CentOS release 5.2 (Final)

[root@enkx3sw-ib2 etc]# nslookup enkitec.com
-bash: nslookup: command not found

[root@enkx3sw-ib2 etc]# ping enkitec.com
PING enkitec.com (192.168.100.19) 56(84) bytes of data.
64 bytes from esert-cloud.enkitec.com (192.168.100.19): icmp_seq=1 ttl=64 time=7.99 ms
64 bytes from esert-cloud.enkitec.com (192.168.100.19): icmp_seq=2 ttl=64 time=5.99 ms
64 bytes from esert-cloud.enkitec.com (192.168.100.19): icmp_seq=3 ttl=64 time=4.99 ms
^C

4. Go to the rest of the IB switches and make the same change
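
The edit in item 2 can also be scripted, which helps when the same change has to be repeated on several switches and db nodes. This is a sketch, assuming the old and new addresses from the example above; update_nameservers is a hypothetical function name.

```shell
#!/usr/bin/env bash
# update_nameservers FILE
# Backs FILE up as FILE.yyyymmdd, then rewrites the two old nameserver
# lines to the new addresses. Other lines are left untouched.
# update_nameservers is an illustrative name; the IPs come from the example.
update_nameservers() {
    local file="$1" backup
    backup="$file.$(date +%Y%m%d)"
    cp -p "$file" "$backup" || return 1
    sed -e 's/^nameserver 192\.168\.10\.12[[:space:]]*$/nameserver 192.168.10.14/' \
        -e 's/^nameserver 192\.168\.10\.13[[:space:]]*$/nameserver 192.168.10.15/' \
        "$backup" > "$file"
}

# Example, as root:
#   update_nameservers /etc/resolv.conf && cat /etc/resolv.conf
```

Writing the result back with a redirect, rather than replacing the file, keeps the original file's owner and permissions.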

Step 2. Changes at db nodes
1. Logon to db node 1 as root user
ssh root@enkx3db01.enkitec.com

2. Modify /etc/resolv.conf

cp -p /etc/resolv.conf /etc/resolv.conf.yyyymmdd
vi /etc/resolv.conf

Make the same changes as on the IB switches.

3. Use nslookup command to verify the change

[root@enkx3db01 ~]# nslookup enkx3cel01
Server:		192.168.10.14
Address:	192.168.10.14#53

Name:	enkx3cel01.enkitec.com
Address: 192.168.8.234

4. Repeat the same process on the rest of the db nodes

Step 3. Changes at cell nodes and their ILOM’s
When making changes to cell nodes, work on one cell node at a time and ensure all disks are re-synced before proceeding to the next cell node.

1. Make a backup of /opt/oracle.cellos/cell.conf

cd /opt/oracle.cellos
cp -p cell.conf cell.conf.yyyymmdd

2. Shutdown all cell services by running the following command
cellcli -e alter cell shutdown services all

3. Execute ipconf to change the DNS entries. Make sure you change only the DNS entries and nothing else.

[root@dm01cel01 oracle.cellos]# /opt/oracle.cellos/ipconf
Logging started to /var/log/cellos/ipconf.log
Interface ib0 is Linked.  hca: mlx4_0
Interface ib1 is Linked.  hca: mlx4_0
Interface eth0 is Linked.  driver/mac: igb/00:21:28:f8:6c:de
Interface eth1 is ... Unlinked.  driver/mac: igb/00:21:28:f8:6c:df
Interface eth2 is ... Unlinked.  driver/mac: igb/00:21:28:f8:6c:e0
Interface eth3 is ... Unlinked.  driver/mac: igb/00:21:28:f8:6c:e1

Network interfaces
Name     State      IP address      Netmask         Gateway         Net type     Hostname
ib0      Linked
ib1      Linked
eth0     Linked
eth1     Unlinked
eth2     Unlinked
eth3     Unlinked
Warning. Some network interface(s) are disconnected. Check cables and swicthes and retry
Do you want to retry (y/n) [y]: n

The current nameserver(s): 192.168.10.12 192.168.10.13
Do you want to change it (y/n) [n]: y
Nameserver: 192.168.10.14
Add more nameservers (y/n) [n]: y
Nameserver: 192.168.10.15
Add more nameservers (y/n) [n]: n
The current timezone: America/Chicago
Do you want to change it (y/n) [n]:
The current NTP server(s): 192.168.10.15
Do you want to change it (y/n) [n]:

Network interfaces
Name     State      IP address      Netmask         Gateway         Net type     Hostname
eth0     Linked     192.168.100.73  255.255.255.0   192.168.100.254 Management   enkx3cel01.enkitec.com
eth1     Unlinked
eth2     Unlinked
eth3     Unlinked
bondib0  ib0,ib1    172.30.1.3      255.255.255.0                   Private      enkx3cel01-priv.enkitec.com
Select interface name to configure or press Enter to continue:

Select canonical hostname from the list below
1: enkx3cel01.enkitec.com
2: enkx3cel01-priv.enkitec.com
Canonical fully qualified domain name [1]:

Select default gateway interface from the list below
1: eth0
Default gateway interface [1]:

Canonical hostname: enkx3cel01.enkitec.com
Nameservers: 192.168.10.14 192.168.10.15
Timezone: America/Chicago
NTP servers: 192.168.10.15
Default gateway device: eth0
Network interfaces
Name     State      IP address      Netmask         Gateway         Net type     Hostname
eth0     Linked     192.168.100.73  255.255.255.0   192.168.100.254 Management   enkx3cel01.enkitec.com
eth1     Unlinked
eth2     Unlinked
eth3     Unlinked
bondib0  ib0,ib1    172.30.1.3      255.255.255.0                   Private      enkx3cel01-priv.enkitec.com
Is this correct (y/n) [y]:

Do you want to configure basic ILOM settings (y/n) [y]:
Loading basic configuration settings from ILOM ...
ILOM Fully qualified hostname [enkx3cel01-ilom.enkitec.com]:
ILOM IP address [192.168.100.78]:
ILOM Netmask [255.255.255.0]:
ILOM Gateway or none [192.168.100.254]:
ILOM Nameserver or none [192.168.10.12]: 192.168.10.14
ILOM Use NTP Servers (enabled/disabled) [enabled]:
ILOM First NTP server. Fully qualified hostname or ip address or none [192.168.10.15]:
ILOM Second NTP server. Fully qualified hostname or ip address or none [none]:

Basic ILOM configuration settings:
Hostname             : enkx3cel01-ilom.enkitec.com
IP Address           : 192.168.100.78
Netmask              : 255.255.255.0
Gateway              : 192.168.100.254
DNS servers          : 192.168.10.14
Use NTP servers      : enabled
First NTP server     : 192.168.10.15
Second NTP server    : none
Timezone (read-only) : America/Chicago

Is this correct (y/n) [y]: y
Connected. Use ^D to exit.
-> set /SP/clients/dns nameserver=192.168.10.14
Set 'nameserver' to '192.168.10.14'

-> Session closed
Disconnected

Info. Run /opt/oracle.cellos/validations/init.d/saveconfig
Info. Custom changes have been detected in /etc/resolv.conf
Info. Original file will be saved in /etc/resolv.conf.backupbyExadata

Warning. You modified DNS name server.
         Ensure you also update the Infiniband Switch DNS server
         if the same DNS server was also used by the Infiniband switch.

4. Compare the differences between the following two files:

diff /opt/oracle.cellos/cell.conf /opt/oracle.cellos/cell.conf.yyyymmdd

You will see a lot of differences. But if you look more closely, the contents are actually the same; entries have just been moved around in the file.

5. Restart all cell services by running the following command
cellcli -e alter cell restart services all

6. Verify cell processes are up by running the following
cellcli -e list cell detail

The last three lines should be in running state

	 cellsrvStatus:     	 running
	 msStatus:          	 running
	 rsStatus:          	 running

7. Regularly run the following command to make sure griddisk status changes from SYNCING to ONLINE

[root@enkx3cel01 ~]# cellcli -e list griddisk attributes name,asmmodestatus
	 DATA_CD_00_enkx3cel01   	 SYNCING
	 DATA_CD_01_enkx3cel01   	 SYNCING
	 DATA_CD_02_enkx3cel01   	 SYNCING
	 DATA_CD_03_enkx3cel01   	 SYNCING
	 DATA_CD_04_enkx3cel01   	 SYNCING
	 DATA_CD_05_enkx3cel01   	 SYNCING
	 DATA_CD_06_enkx3cel01   	 SYNCING
	 DATA_CD_07_enkx3cel01   	 SYNCING
	 DATA_CD_08_enkx3cel01   	 SYNCING
	 DATA_CD_09_enkx3cel01   	 SYNCING
	 DATA_CD_10_enkx3cel01   	 SYNCING
	 DATA_CD_11_enkx3cel01   	 SYNCING
	 DBFS_DG_CD_02_enkx3cel01	 ONLINE
	 DBFS_DG_CD_03_enkx3cel01	 ONLINE
	 DBFS_DG_CD_04_enkx3cel01	 ONLINE
	 DBFS_DG_CD_05_enkx3cel01	 ONLINE
	 DBFS_DG_CD_06_enkx3cel01	 ONLINE
	 DBFS_DG_CD_07_enkx3cel01	 ONLINE
	 DBFS_DG_CD_08_enkx3cel01	 ONLINE
	 DBFS_DG_CD_09_enkx3cel01	 ONLINE
	 DBFS_DG_CD_10_enkx3cel01	 ONLINE
	 DBFS_DG_CD_11_enkx3cel01	 ONLINE
	 RECO_CD_00_enkx3cel01   	 SYNCING
	 RECO_CD_01_enkx3cel01   	 SYNCING
	 RECO_CD_02_enkx3cel01   	 SYNCING
	 RECO_CD_03_enkx3cel01   	 SYNCING
	 RECO_CD_04_enkx3cel01   	 SYNCING
	 RECO_CD_05_enkx3cel01   	 SYNCING
	 RECO_CD_06_enkx3cel01   	 SYNCING
	 RECO_CD_07_enkx3cel01   	 SYNCING
	 RECO_CD_08_enkx3cel01   	 SYNCING
	 RECO_CD_09_enkx3cel01   	 SYNCING
	 RECO_CD_10_enkx3cel01   	 SYNCING
	 RECO_CD_11_enkx3cel01   	 SYNCING

8. This step is optional. It is needed only when the iptables command still shows the old DNS entries.

[root@enkx3cel01 oracle.cellos]# iptables -L -n
Chain INPUT (policy ACCEPT)
target     prot opt source               destination
ACCEPT     udp  --  0.0.0.0/0            0.0.0.0/0
ACCEPT     tcp  --  0.0.0.0/0            0.0.0.0/0           tcp dpt:5042
ACCEPT     tcp  --  0.0.0.0/0            0.0.0.0/0           tcp dpt:5042 flags:0x17/0x02
ACCEPT     tcp  --  0.0.0.0/0            0.0.0.0/0           tcp dpt:3260
ACCEPT     tcp  --  0.0.0.0/0            0.0.0.0/0           tcp dpt:3260 flags:0x17/0x02
ACCEPT     tcp  --  0.0.0.0/0            0.0.0.0/0           tcp dpt:22
ACCEPT     tcp  --  0.0.0.0/0            0.0.0.0/0           tcp dpt:22 flags:0x17/0x02
ACCEPT     udp  --  192.168.100.240      0.0.0.0/0           udp spt:123
ACCEPT     tcp  --  192.168.10.14        0.0.0.0/0           tcp spt:53
ACCEPT     udp  --  192.168.10.14        0.0.0.0/0           udp spt:53
ACCEPT     tcp  --  192.168.10.15        0.0.0.0/0           tcp spt:53
ACCEPT     udp  --  192.168.10.15        0.0.0.0/0           udp spt:53
REJECT     tcp  --  0.0.0.0/0            0.0.0.0/0           tcp dpts:1024:65535 flags:0x17/0x02 reject-with icmp-port-unreachable
ACCEPT     tcp  --  0.0.0.0/0            0.0.0.0/0           tcp dpts:1024:65535
REJECT     tcp  --  0.0.0.0/0            0.0.0.0/0           reject-with icmp-port-unreachable
REJECT     udp  --  0.0.0.0/0            0.0.0.0/0           reject-with icmp-port-unreachable
ACCEPT     udp  --  192.168.100.78       0.0.0.0/0           udp dpt:162
ACCEPT     udp  --  192.168.100.78       0.0.0.0/0           udp spt:623 dpts:1024:65535
ACCEPT     tcp  --  0.0.0.0/0            0.0.0.0/0           tcp dpt:22 flags:0x17/0x02
ACCEPT     tcp  --  0.0.0.0/0            0.0.0.0/0           tcp dpt:22
ACCEPT     udp  --  192.168.100.240      0.0.0.0/0           udp spt:123
ACCEPT     tcp  --  192.168.10.14        0.0.0.0/0           tcp spt:53
ACCEPT     udp  --  192.168.10.14        0.0.0.0/0           udp spt:53
ACCEPT     tcp  --  192.168.10.15        0.0.0.0/0           tcp spt:53
ACCEPT     udp  --  192.168.10.15        0.0.0.0/0           udp spt:53
REJECT     tcp  --  0.0.0.0/0            0.0.0.0/0           tcp dpts:1024:65535 flags:0x17/0x02 reject-with icmp-port-unreachable
ACCEPT     tcp  --  0.0.0.0/0            0.0.0.0/0           tcp dpts:1024:65535
REJECT     tcp  --  0.0.0.0/0            0.0.0.0/0           reject-with icmp-port-unreachable
REJECT     udp  --  0.0.0.0/0            0.0.0.0/0           reject-with icmp-port-unreachable
ACCEPT     tcp  --  0.0.0.0/0            0.0.0.0/0
ACCEPT     udp  --  0.0.0.0/0            0.0.0.0/0
REJECT     udp  --  0.0.0.0/0            0.0.0.0/0           reject-with icmp-port-unreachable
REJECT     tcp  --  0.0.0.0/0            0.0.0.0/0           reject-with icmp-port-unreachable

Chain FORWARD (policy ACCEPT)
target     prot opt source               destination

Chain OUTPUT (policy ACCEPT)
target     prot opt source               destination

If you see there are still entries for old DNS, like 192.168.10.12 or 192.168.10.13, then you need to restart cellwall service. Cellwall implements firewall services on each cell using IPTables.

service cellwall restart

9. Verify the result using nslookup command.
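
The check in step 8 can be reduced to a single yes/no answer by scanning the rule listing for the old addresses. This is a minimal sketch, assuming the old nameservers from this example; has_old_dns is a hypothetical function name.

```shell
#!/usr/bin/env bash
# has_old_dns
# Reads "iptables -L -n" output on stdin and succeeds if any rule still
# names one of the old DNS servers as a whole field, so that an address
# such as 192.168.10.120 is not mistaken for 192.168.10.12.
# has_old_dns is a made-up helper name.
has_old_dns() {
    awk '{ for (i = 1; i <= NF; i++)
               if ($i == "192.168.10.12" || $i == "192.168.10.13") found = 1 }
         END { exit(found ? 0 : 1) }'
}

# Example, as root on the cell node:
#   iptables -L -n | has_old_dns && service cellwall restart
```

Comparing whole fields instead of using a plain substring match avoids false positives from longer addresses that merely start with the old ones.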

Commandline Utilities for Infiniband Network on Exadata

Many times I need to check the network traffic on Exadata. 12c OEM Cloud Control is a good way to monitor system performance on Exadata. However, sometimes I need something quick and want to see the network traffic from the command line. Here are the commands I usually use to check traffic on the Ethernet and InfiniBand networks.

The first command is dstat.
dstat -dnyc -N eth0,bondeth0,bondib0 -C total -f

Here is the result from our X3 box.

[Screenshot: dstat output]
The options I used are
-c enable cpu stats
-d enable disk read/write stats
-n enable network stats (receive, send)
-y enable system stats (interrupts, context switches)

If you add the three options l, m and s, it will also show load, memory usage and swap usage.
dstat -dnyclms -N eth0,bondeth0,bondib0 -C total -f

[Screenshot: dstat output with load, memory and swap columns]

Another command is sar
sar -n DEV 3 100|egrep 'bondib0|bondeth0|eth0'

[Screenshot: sar output]

The above command does not show the heading for the sar output. Here is the one with the heading:

[Screenshot: sar output with heading]
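
One way to keep the heading is to add the IFACE column title to the egrep pattern, since sar -n DEV prints IFACE in its heading line. This is a sketch of that idea and not necessarily the exact command used for the output above; the function name sar_net is hypothetical.

```shell
#!/usr/bin/env bash
# sar_net
# Filters "sar -n DEV" output on stdin, keeping the heading line (which
# contains the IFACE column title) and the rows for the three interfaces.
# sar_net is a made-up name for this sketch.
sar_net() {
    grep -E 'IFACE|bondib0|bondeth0|eth0'
}

# Example:
#   sar -n DEV 3 100 | sar_net
```

Because sar repeats its heading periodically, the heading line may show up more than once in a long run.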

In the previous post, iDB vs RDS vs SDP on Exadata, I discussed high level overview about these three different concepts used in Oracle Exadata and related Oracle Engineered Systems. I will show a few more useful commands to illustrate these protocols.

The first command I would like to talk about is ibhosts. This InfiniBand command discovers the InfiniBand fabric topology, or uses an existing topology file, to extract the channel adapter nodes. The following is the output from our X3 1/8 rack Exadata.

[root@enkx3db01 ~]# ibhosts
Ca	: 0x0010e00b4e20c000 ports 2 "SUN IB QDR GW switch enkbda1sw-ib2 192.168.8.150 Bridge 0"
Ca	: 0x0010e00b4e20c040 ports 2 "SUN IB QDR GW switch enkbda1sw-ib2 192.168.8.150 Bridge 1"
Ca	: 0x0021280001efdf70 ports 2 "enkbda1node10 BDA 192.168.12.110 HCA-1"
Ca	: 0x0021280001efd5ee ports 2 "enkbda1node09 BDA 192.168.12.109 HCA-1"
Ca	: 0x0021280001efd4ea ports 2 "enkbda1node12 BDA 192.168.12.112 HCA-1"
Ca	: 0x0021280001efd4d6 ports 2 "enkbda1node11 BDA 192.168.12.111 HCA-1"
Ca	: 0x0021280001efd5f6 ports 2 "enkbda1node14 BDA 192.168.12.114 HCA-1"
Ca	: 0x0021280001efd4e6 ports 2 "enkbda1node13 BDA 192.168.12.113 HCA-1"
Ca	: 0x0021280001ceda62 ports 2 "enkbda1node16 BDA 192.168.12.116 HCA-1"
Ca	: 0x0021280001cf5abe ports 2 "enkbda1node15 BDA 192.168.12.115 HCA-1"
Ca	: 0x0021280001efac6a ports 2 "enkbda1node18 BDA 192.168.12.118 HCA-1"
Ca	: 0x0021280001efd4fa ports 2 "enkbda1node17 BDA 192.168.12.117 HCA-1"
Ca	: 0x0021280001efdf68 ports 2 "enkbda1node08 BDA 192.168.12.108 HCA-1"
Ca	: 0x0021280001efd5e6 ports 2 "enkbda1node07 BDA 192.168.12.107 HCA-1"
Ca	: 0x0021280001efd606 ports 2 "enkbda1node05 BDA 192.168.12.105 HCA-1"
Ca	: 0x0021280001efd4ee ports 2 "enkbda1node06 BDA 192.168.12.106 HCA-1"
Ca	: 0x0021280001efd616 ports 2 "enkbda1node03 BDA 192.168.12.103 HCA-1"
Ca	: 0x0021280001efdf98 ports 2 "enkbda1node04 BDA 192.168.12.104 HCA-1"
Ca	: 0x0021280001efd84e ports 2 "enkbda1node01 BDA 192.168.12.101 HCA-1"
Ca	: 0x0021280001efdf6c ports 2 "enkbda1node02 BDA 192.168.12.102 HCA-1"
Ca	: 0x0010e00b88c0c000 ports 2 "SUN IB QDR GW switch enkbda1sw-ib3 192.168.8.151 Bridge 0"
Ca	: 0x0010e00b88c0c040 ports 2 "SUN IB QDR GW switch enkbda1sw-ib3 192.168.8.151 Bridge 1"
Ca	: 0x0021280001fcb9ec ports 2 "enkalytics EL-C 192.168.12.131 HCA-1"
Ca	: 0x0021280001fc4a1e ports 2 "enkx3db02 S 192.168.12.2 HCA-1"
Ca	: 0x0021280001fcbf5c ports 2 "enkx3cel03 C 192.168.12.5 HCA-1"
Ca	: 0x0021280001fbe18e ports 2 "enkx3cel01 C 192.168.12.3 HCA-1"
Ca	: 0x0021280001fc80c6 ports 2 "enkx3cel02 C 192.168.12.4 HCA-1"
Ca	: 0x0010e0000128ce64 ports 2 "enkx3db01 S 192.168.12.1 HCA-1"

A 1/8 rack has 2 database nodes, 3 cell nodes and 2 IB switches. You might notice we have many more nodes on the InfiniBand fabric than that. From the naming, you might figure out that we have our X3 Exadata, Oracle Big Data Appliance, and Oracle Exalytics all connected together within the same InfiniBand network.

If you want to list only the IB switches, use the ibswitches command.

[root@enkx3db01 ~]# ibswitches
Switch	: 0x002128f57326a0a0 ports 36 "SUN DCS 36P QDR enkbda1sw-ib1 192.168.8.149" enhanced port 0 lid 59 lmc 0
Switch	: 0x0010e00b88c0c0a0 ports 36 "SUN IB QDR GW switch enkbda1sw-ib3 192.168.8.151" enhanced port 0 lid 61 lmc 0
Switch	: 0x0010e00b4e20c0a0 ports 36 "SUN IB QDR GW switch enkbda1sw-ib2 192.168.8.150" enhanced port 0 lid 60 lmc 0
Switch	: 0x002128f575bba0a0 ports 36 "SUN DCS 36P QDR enkx3sw-ib3.enkitec.com" enhanced port 0 lid 1 lmc 0
Switch	: 0x002128f57469a0a0 ports 36 "SUN DCS 36P QDR enkx3sw-ib2.enkitec.com" enhanced port 0 lid 2 lmc 0

In a TCP/IP network, we use the ping command to verify whether a host can be reached. Similarly, in an InfiniBand network, we use the rds-ping command to ping another IB node. The following example shows an rds-ping from the Exalytics node to the ibvip on the first database node.

[root@enkalytics ~]# rds-ping -c 5 enkx3db01-ibvip.enkitec.com
   1: 240 usec
   2: 214 usec
   3: 201 usec
   4: 199 usec
   5: 269 usec

usec is microseconds.
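
For a quick summary of such a run, the per-packet times can be averaged. This is a small sketch, assuming the "N: value usec" line format shown above; rds_avg is a hypothetical function name.

```shell
#!/usr/bin/env bash
# rds_avg
# Reads rds-ping output on stdin and prints the mean round-trip time of
# the lines that look like "   1: 240 usec". rds_avg is an illustrative name.
rds_avg() {
    awk '$3 == "usec" { sum += $2; n++ }
         END { if (n) printf "%.1f usec over %d pings\n", sum / n, n }'
}

# Example:
#   rds-ping -c 5 enkx3db01-ibvip.enkitec.com | rds_avg
```

For the five samples above, this prints 224.6 usec over 5 pings.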

Another useful RDS-related command is rds-info. To save space, I removed some similar messages.

[root@enkx3db01 ~]# rds-info

RDS IB Connections:
      LocalAddr      RemoteAddr                         LocalDev                        RemoteDev
  192.168.12.31  192.168.12.131           fe80::10:e000:128:ce66           fe80::21:2800:1fc:b9ee
   192.168.12.1    192.168.12.3           fe80::10:e000:128:ce66           fe80::21:2800:1fb:e18f
   192.168.12.1    192.168.12.1           fe80::10:e000:128:ce66           fe80::10:e000:128:ce66
  192.168.12.31   192.168.12.31           fe80::10:e000:128:ce66           fe80::10:e000:128:ce66
   192.168.12.1  192.168.12.101                               ::                               ::
 169.254.87.194  169.254.87.194           fe80::10:e000:128:ce66           fe80::10:e000:128:ce66
   192.168.12.1  192.168.12.118                               ::                               ::
   192.168.12.1    192.168.12.5           fe80::10:e000:128:ce66           fe80::21:2800:1fc:bf5e
   192.168.12.1    192.168.12.4           fe80::10:e000:128:ce66           fe80::21:2800:1fc:80c8
   192.168.12.1    192.168.12.2           fe80::10:e000:128:ce66           fe80::21:2800:1fc:4a20
  192.168.12.31    192.168.12.2           fe80::10:e000:128:ce66           fe80::21:2800:1fc:4a20
 169.254.87.194  169.254.97.245           fe80::10:e000:128:ce66           fe80::21:2800:1fc:4a20
rds-info: Unable get statistics: Protocol not available

Counters:
              CounterName            Value
               conn_reset          2879033
   recv_drop_bad_checksum                0
        recv_drop_old_seq               17
        recv_drop_no_sock             2985
      recv_drop_dead_sock                0
       recv_deliver_raced                0
           recv_delivered        222260977
              recv_queued        130604931
     recv_immediate_retry                0
       recv_delayed_retry                0
        recv_ack_required         14752884
          recv_rdma_bytes     136276672512
                recv_ping           288786
         send_queue_empty         85915668
          send_queue_full               15
     send_lock_contention           764917
    send_lock_queue_raced            16222
     send_immediate_retry                0
       send_delayed_retry             1222
          send_drop_acked                0
        send_ack_required         12818936
              send_queued        115370469
                send_rdma           261202
          send_rdma_bytes     136280842240
                send_pong           288786
       page_remainder_hit        102755106
      page_remainder_miss         11378031
             copy_to_user     165149391125
           copy_from_user     126738139340
       cong_update_queued                0
     cong_update_received               49
          cong_send_error                0
        cong_send_blocked                0
         ib_connect_raced               24
   ib_listen_closed_stale                0
      ib_evt_handler_call        278198961
          ib_tasklet_call        278198961
           ib_tx_cq_event        138358343
          ib_tx_ring_full             1319
           ib_tx_throttle                0
 ib_tx_sg_mapping_failure                0
            ib_tx_stalled              259
     ib_tx_credit_updates                0
           ib_rx_cq_event        172332234
         ib_rx_ring_empty               83
     ib_rx_refill_from_cq                0
 ib_rx_refill_from_thread                0
        ib_rx_alloc_limit                0
     ib_rx_credit_updates                0
              ib_ack_sent         14648252
      ib_ack_send_failure                0
      ib_ack_send_delayed           125532
  ib_ack_send_piggybacked            73902
          ib_ack_received         12960925
         ib_rdma_mr_alloc             6355
          ib_rdma_mr_free             5488
          ib_rdma_mr_used         48938489
    ib_rdma_mr_pool_flush          6438472
     ib_rdma_mr_pool_wait                0
 ib_rdma_mr_pool_depleted                0
           ib_atomic_cswp                0
           ib_atomic_fadd                0
         iw_connect_raced                0
   iw_listen_closed_stale                0
            iw_tx_cq_call                0
           iw_tx_cq_event                0
          iw_tx_ring_full                0
           iw_tx_throttle                0
 iw_tx_sg_mapping_failure                0
            iw_tx_stalled                0
     iw_tx_credit_updates                0
            iw_rx_cq_call                0
           iw_rx_cq_event                0
         iw_rx_ring_empty                0
     iw_rx_refill_from_cq                0
 iw_rx_refill_from_thread                0
        iw_rx_alloc_limit                0
     iw_rx_credit_updates                0
              iw_ack_sent                0
      iw_ack_send_failure                0
      iw_ack_send_delayed                0
  iw_ack_send_piggybacked                0
          iw_ack_received                0
         iw_rdma_mr_alloc                0
          iw_rdma_mr_free                0
          iw_rdma_mr_used                0
    iw_rdma_mr_pool_flush                0
     iw_rdma_mr_pool_wait                0
 iw_rdma_mr_pool_depleted                0

RDS Sockets:
      BoundAddr BPort        ConnAddr CPort     SndBuf     RcvBuf    Inode
   192.168.12.1  7978         0.0.0.0     0     262144    2097152 1422468668
   192.168.12.1 31215         0.0.0.0     0     262144    2097152 1422492506
   192.168.12.1  7588         0.0.0.0     0     262144    2097152 1422492510

....

 169.254.87.194 61081         0.0.0.0     0     131072    2097152 1531223507
   192.168.12.1 16962         0.0.0.0     0     262144    2097152 1534922520
   192.168.12.1   442         0.0.0.0     0     131072    2097152 1534922522
   192.168.12.1 49167         0.0.0.0     0     262144    2097152 1539515072
   192.168.12.1 48917         0.0.0.0     0     131072    2097152 1539515074
   192.168.12.1 14675         0.0.0.0     0     262144    2097152 1539517764
   192.168.12.1 12371         0.0.0.0     0     131072    2097152 1539517766
        0.0.0.0     0         0.0.0.0     0     131072    2097152 1539617228

RDS Connections:
      LocalAddr      RemoteAddr           NextTX           NextRX Flg
  192.168.12.31  192.168.12.131               13               13 --C
   192.168.12.1    192.168.12.3         22473392          1304865 --C
   192.168.12.1    192.168.12.1          2385972           139482 --C
  192.168.12.31   192.168.12.31                9                9 --C
   192.168.12.1  192.168.12.101                4                0 ---
 169.254.87.194  169.254.87.194           492250                0 --C
   192.168.12.1  192.168.12.118              119                0 ---
   192.168.12.1    192.168.12.5         36615339        150138911 --C
   192.168.12.1    192.168.12.4         17227536         60920516 --C
   192.168.12.1    192.168.12.2         28714935          5551186 --C
  192.168.12.31    192.168.12.2              287              287 --C
      127.0.0.1       127.0.0.1            18895            18895 --C
 169.254.87.194  169.254.97.245          7302242          1582910 --C

Receive Message Queue:
      LocalAddr LPort      RemoteAddr RPort              Seq      Bytes
   192.168.12.1 22526    192.168.12.2 20819          3971282        168
   192.168.12.1 22526    192.168.12.2 20819          4210130        168
   192.168.12.1 22526    192.168.12.2 20819          5177334        168
   192.168.12.1 22526    192.168.12.2 20819          5288457        168
   192.168.12.1 44950    192.168.12.2 27716          4485037        168
   192.168.12.1 44950    192.168.12.2 27716          4603330        168
   192.168.12.1 44950    192.168.12.2 27716          4717860        168

....

 169.254.87.194 61209  169.254.97.245  2286          1322929        168
 169.254.87.194 61209  169.254.97.245 32997          1322939        168
   192.168.12.1 62729    192.168.12.2 62458          5513836        168
   192.168.12.1 62729    192.168.12.2 33848          5513844        168
   192.168.12.1 62729    192.168.12.2 37522          5513850        168

Send Message Queue:
      LocalAddr LPort      RemoteAddr RPort              Seq      Bytes

Retransmit Message Queue:
      LocalAddr LPort      RemoteAddr RPort              Seq      Bytes
 169.254.87.194 31175  169.254.87.194 42828           492248        156
 169.254.87.194 31175  169.254.87.194 42828           492249        156
 169.254.87.194   104  169.254.97.245 27039          7302241        252
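
The RDS Connections table above is the quickest place to spot peers that never came up. Here is a minimal sketch in Python that picks those rows out of saved rds-info output. It assumes the five-column layout shown above and assumes that a C in the Flg column marks an established connection. The function names and the shortened sample are illustrative only and are not part of the RDS tools.

```python
def parse_rds_connections(text):
    """Parse the 'RDS Connections:' table of rds-info output into dicts."""
    rows = []
    in_table = False
    for line in text.splitlines():
        stripped = line.strip()
        if stripped.startswith("RDS Connections:"):
            in_table = True
            continue
        if not in_table:
            continue
        if not stripped:
            break  # a blank line ends the table
        fields = stripped.split()
        # Skip the column header and anything that is not a 5-field data row.
        if fields[0] == "LocalAddr" or len(fields) != 5:
            continue
        local, remote, next_tx, next_rx, flags = fields
        rows.append({
            "local": local,
            "remote": remote,
            "next_tx": int(next_tx),
            "next_rx": int(next_rx),
            "flags": flags,
        })
    return rows


def not_connected(rows):
    """Return rows whose flags lack 'C' (assumed to mean 'connected')."""
    return [row for row in rows if "C" not in row["flags"]]


# A few rows copied from the output above.
SAMPLE = """\
RDS Connections:
      LocalAddr      RemoteAddr           NextTX           NextRX Flg
  192.168.12.31  192.168.12.131               13               13 --C
   192.168.12.1  192.168.12.101                4                0 ---
   192.168.12.1  192.168.12.118              119                0 ---
   192.168.12.1    192.168.12.5         36615339        150138911 --C

Receive Message Queue:
"""

if __name__ == "__main__":
    for row in not_connected(parse_rds_connections(SAMPLE)):
        print(row["local"], "->", row["remote"], "flags", row["flags"])
```

In practice you would feed it the text captured from the rds-info command instead of the embedded sample.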

To test the throughput between two IB nodes, you can use rds-stress.
First, start rds-stress on the target node.

[root@enkx3db01 ~]# rds-stress
waiting for incoming connection on 0.0.0.0:4000

Then run the following command on the source IB node.
rds-stress -s enkx3db01-ibvip.enkitec.com -p 4000 -t 1 -D 600000
-s specifies the hostname of the target node
-p specifies the port number
-t specifies the number of tasks
-D specifies the total bytes in the RDMA message
After it starts on the source node, the target node also shows the progress.

Source IB node

[root@enkalytics ~]#  rds-stress -s enkx3db01-ibvip.enkitec.com -p 4000 -t 1 -D 600000
connecting to 192.168.12.31:4000
negotiated options, tasks will start in 2 seconds
Starting up....
tsks   tx/s   rx/s  tx+rx K/s    mbi K/s    mbo K/s tx us/c   rtt us cpu %
   1   1668   1668    3531.12  977247.98  977247.98   31.57   558.62 -1.00
   1   1652   1653    3497.93  968354.24  967768.42   34.82   565.02 -1.00
   1   1673   1673    3541.62  980153.86  980153.86   35.05   556.53 -1.00
   1   1682   1682    3560.71  985437.49  985437.49   28.11   555.55 -1.00
   1   1673   1673    3541.71  980179.34  980179.34   29.50   558.08 -1.00
   1   1663   1663    3520.50  974308.84  974308.84   34.43   560.88 -1.00
   1   1692   1692    3581.88  991294.23  991294.23   30.13   552.21 -1.00
   1   1681   1681    3558.60  984852.60  984852.60   29.30   555.98 -1.00
   1   1666   1666    3526.84  976063.53  976063.53   34.09   560.13 -1.00
   1   1678   1678    3552.24  983093.02  983093.02   31.21   556.31 -1.00
   1   1726   1726    3653.88 1011220.94 1011220.94   31.72   538.88 -1.00
   1   1678   1678    3552.26  983097.93  983097.93   29.29   557.05 -1.00
^C

Target IB node

[root@enkx3db01 ~]# rds-stress
waiting for incoming connection on 0.0.0.0:4000
accepted connection from 192.168.12.131:19942 on 192.168.12.31:4000
negotiated options, tasks will start in 2 seconds
Starting up....
tsks   tx/s   rx/s  tx+rx K/s    mbi K/s    mbo K/s tx us/c   rtt us cpu %
   1   1670   1670    3531.99  977489.26  977489.26   15.12   573.84 -1.00
   1   1654   1653    3496.99  967509.78  968095.08   15.71   578.94 -1.00
   1   1675   1675    3542.75  981052.16  979881.45   14.75   570.90 -1.00
   1   1683   1683    3559.70  984572.15  985742.86   15.76   568.26 -1.00
   1   1674   1674    3540.45  979829.57  979829.57   15.93   570.79 -1.00
   1   1666   1666    3523.46  975127.51  975127.51   15.21   575.50 -1.00
   1   1692   1692    3581.02  991057.40  991057.40   16.02   566.28 -1.00
   1   1681   1682    3557.17  984749.23  984163.76   15.50   569.27 -1.00
   1   1667   1667    3526.36  975931.20  975931.20   15.12   574.15 -1.00
   1   1681   1680    3554.11  983903.24  983317.93   15.85   569.99 -1.00
   1   1728   1728    3654.66 1011436.98 1011436.98   16.37   556.93 -1.00
   1   1678   1678    3551.18  982799.19  982799.19   15.53   571.33 -1.00
---------------------------------------------
   1   1677   1677    3551.65  982954.38  982905.59   15.59   570.97 -1.00  (average)

After you press CTRL-C on the source node, the average of the results is printed on the target node.
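
Only the target node prints the average line, so if you want the same figures for the source node you have to compute them from the per-interval rows. Here is a minimal sketch in Python, assuming the nine-column layout shown in the output above. The function names are hypothetical, and the sample holds just the first three rows from the source node.

```python
# Column names as printed in the rds-stress header line.
COLUMNS = ["tsks", "tx/s", "rx/s", "tx+rx K/s", "mbi K/s", "mbo K/s",
           "tx us/c", "rtt us", "cpu %"]


def parse_rds_stress(text):
    """Return one dict per per-interval sample row in rds-stress output."""
    samples = []
    for line in text.splitlines():
        fields = line.split()
        # Sample rows have exactly nine numeric fields. The header, the
        # separator and the "(average)" line have a different token count.
        if len(fields) != len(COLUMNS):
            continue
        try:
            values = [float(f) for f in fields]
        except ValueError:
            continue  # nine tokens, but not all numeric
        samples.append(dict(zip(COLUMNS, values)))
    return samples


def column_means(samples):
    """Arithmetic mean of every column over the parsed samples."""
    if not samples:
        return {}
    return {name: sum(s[name] for s in samples) / len(samples)
            for name in COLUMNS}


SAMPLE = """\
tsks   tx/s   rx/s  tx+rx K/s    mbi K/s    mbo K/s tx us/c   rtt us cpu %
   1   1668   1668    3531.12  977247.98  977247.98   31.57   558.62 -1.00
   1   1652   1653    3497.93  968354.24  967768.42   34.82   565.02 -1.00
   1   1673   1673    3541.62  980153.86  980153.86   35.05   556.53 -1.00
"""

if __name__ == "__main__":
    means = column_means(parse_rds_stress(SAMPLE))
    print("mean tx/s: %.1f, mean rtt us: %.2f"
          % (means["tx/s"], means["rtt us"]))
```

Matching on the field count keeps the parser from counting the tool's own "(average)" line as a sample.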

Finally, let’s talk about SDP. My colleague, Andy Colvin, has set up an SDP listener on our X3 Exadata. The following commands show that Oracle is listening on the SDP port.

[enkx3db01:oracle:dbm1] /home/oracle
> srvctl status vip -i enkx3db01-ibvip
VIP enkx3db01-ibvip is enabled
VIP enkx3db01-ibvip is running on node: enkx3db01

[enkx3db01:oracle:dbm1] /home/oracle
> srvctl config listener -l LISTENER_IB
Name: LISTENER_IB
Network: 2, Owner: oracle
Home:
End points: TCP:1522/SDP:1522

[enkx3db01:oracle:+ASM1] /home/oracle
> lsnrctl status LISTENER_IB

LSNRCTL for Linux: Version 11.2.0.3.0 - Production on 09-AUG-2013 21:52:19

Copyright (c) 1991, 2011, Oracle.  All rights reserved.

Connecting to (DESCRIPTION=(ADDRESS=(PROTOCOL=IPC)(KEY=LISTENER_IB)))
STATUS of the LISTENER
------------------------
Alias                     LISTENER_IB
Version                   TNSLSNR for Linux: Version 11.2.0.3.0 - Production
Start Date                24-JUL-2013 11:36:40
Uptime                    16 days 10 hr. 15 min. 38 sec
Trace Level               off
Security                  ON: Local OS Authentication
SNMP                      OFF
Listener Parameter File   /u01/app/11.2.0.3/grid/network/admin/listener.ora
Listener Log File         /u01/app/11.2.0.3/grid/log/diag/tnslsnr/enkx3db01/listener_ib/alert/log.xml
Listening Endpoints Summary...
  (DESCRIPTION=(ADDRESS=(PROTOCOL=ipc)(KEY=LISTENER_IB)))
  (DESCRIPTION=(ADDRESS=(PROTOCOL=sdp)(HOST=192.168.12.31)(PORT=1522)))
  (DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=192.168.12.31)(PORT=1522)))
Services Summary...
Service "DBM_ETL" has 1 instance(s).
  Instance "dbm1", status READY, has 1 handler(s) for this service...
Service "DBM_REPORTING" has 1 instance(s).
  Instance "dbm1", status READY, has 1 handler(s) for this service...
Service "dbm" has 1 instance(s).
  Instance "dbm1", status READY, has 1 handler(s) for this service...
The command completed successfully
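
To check many listeners for an SDP endpoint, you can script the same test against saved lsnrctl status output. Here is a minimal sketch in Python, assuming the "Listening Endpoints Summary" layout shown above. The function name and the regular expression are my own illustration and are not part of any Oracle tool.

```python
import re

# Matches (PROTOCOL=x), optionally followed by (HOST=...) and (PORT=...).
ENDPOINT_RE = re.compile(
    r"\(PROTOCOL=(?P<protocol>\w+)\)"
    r"(?:\(HOST=(?P<host>[^)]+)\))?"
    r"(?:\(PORT=(?P<port>\d+)\))?",
    re.IGNORECASE,
)


def listening_endpoints(status_text):
    """Return (protocol, host, port) tuples from the endpoints summary."""
    endpoints = []
    in_summary = False
    for line in status_text.splitlines():
        if line.startswith("Listening Endpoints Summary"):
            in_summary = True
            continue
        if line.startswith("Services Summary"):
            break  # the endpoint list is over
        if not in_summary:
            continue
        match = ENDPOINT_RE.search(line)
        if match:
            port = match.group("port")
            endpoints.append((
                match.group("protocol").lower(),
                match.group("host"),            # None for IPC endpoints
                int(port) if port else None,
            ))
    return endpoints


# Lines copied from the lsnrctl output above.
SAMPLE = """\
Connecting to (DESCRIPTION=(ADDRESS=(PROTOCOL=IPC)(KEY=LISTENER_IB)))
Listening Endpoints Summary...
  (DESCRIPTION=(ADDRESS=(PROTOCOL=ipc)(KEY=LISTENER_IB)))
  (DESCRIPTION=(ADDRESS=(PROTOCOL=sdp)(HOST=192.168.12.31)(PORT=1522)))
  (DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=192.168.12.31)(PORT=1522)))
Services Summary...
"""

if __name__ == "__main__":
    sdp = [e for e in listening_endpoints(SAMPLE) if e[0] == "sdp"]
    print("SDP endpoints:", sdp)
```

Restricting the search to the summary section keeps the "Connecting to" line from being counted as a listening endpoint.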

Just as netstat shows TCP connections, there is a corresponding command, sdpnetstat, for SDP connections. Unfortunately, this command is not installed by default on Exadata for now, but it is available on Oracle Big Data Appliance and Oracle Exalytics. Here is an example of the output from Exalytics.

[root@enkalytics ~]# sdpnetstat
Active Internet connections (w/o servers)
Proto Recv-Q Send-Q Local Address           Foreign Address         State      
tcp        0      0 localhost.localdom:6700 localhost.localdo:35791 ESTABLISHED 
tcp        0      0 enkalytics.Enkitec:9710 enkalytics.Enkite:42991 ESTABLISHED 
tcp        0      0 enkalytics.Enkitec:9706 enkalytics.Enkite:11093 ESTABLISHED 
tcp        0      0 enkalytics.Enkitec:9701 enkalytics.Enkite:16776 ESTABLISHED 
tcp        0      0 localhost.localdom:6700 localhost.localdo:35793 ESTABLISHED 
tcp        0      0 enkalytics.Enkitec:9710 enkalytics.Enkite:10002 ESTABLISHED 
tcp        0      0 localhost.localdom:6700 localhost.localdo:35790 ESTABLISHED 
....
....
tcp        0      0 localhost.localdo:35791 localhost.localdom:6700 ESTABLISHED 
tcp        0      0 localhost.localdom:6700 localhost.localdo:35798 ESTABLISHED 
tcp        0      0 enkalytics.Enkite:11093 enkalytics.Enkitec:9706 ESTABLISHED 
tcp        0      0 localhost.localdom:6700 localhost.localdo:35797 ESTABLISHED 
tcp        0      0 localhost.localdo:35793 localhost.localdom:6700 ESTABLISHED 
tcp        0      0 enkalytics.Enkite:60565 enkalytics.Enkitec:9710 TIME_WAIT   
tcp        0      0 enkalytics.enkite:36136 enk03-vip.enki:ncube-lm ESTABLISHED 
tcp        0      0 enkalytics.enkite:21666 enkalytic:afs3-callback ESTABLISHED 
tcp        0      0 enkalytics.enkitec:9704 enkalytics.enkite:52854 TIME_WAIT   
tcp        0      0 enkalytics.enkitec:9704 enkalytics.enkite:52849 TIME_WAIT   
tcp        0      0 enkalytics.enkite:25478 enk04-vip.enki:ncube-lm ESTABLISHED 
tcp        0      0 enkalytic:afs3-callback enkalytics.enkite:11226 ESTABLISHED 
tcp        0      0 enkalytics.enkite:44861 enkalytics.enkitec:9704 ESTABLISHED 
tcp        0      0 enkalytics.enkitec:9704 enkalytics.enkite:52850 TIME_WAIT   
tcp        0      0 enkalytics.enkite:44867 enkalytics.enkitec:9704 ESTABLISHED 
tcp        0      0 enkalytics.enkitec:9704 enkalytics.enkite:52846 TIME_WAIT   
tcp        0      0 enkalytics.enkitec:9704 enkalytics.enkite:44867 ESTABLISHED 
tcp        0      0 localhost.localdo:35797 localhost.localdom:6700 ESTABLISHED 
tcp        0      0 enkalytics.enkite:11226 enkalytic:afs3-callback ESTABLISHED 
tcp        0      0 enkalytics.enkitec:9704 enkalytics.enkite:44861 ESTABLISHED 
tcp        0      0 enkalytic:afs3-callback enkalytics.enkite:11210 ESTABLISHED 
sdp        0      0 192.168.12.131:43307    enkx3db01-ib:ricardo-lm ESTABLISHED 
Active UNIX domain sockets (w/o servers)
Proto RefCnt Flags       Type       State         I-Node Path
unix  20     [ ]         DGRAM                    24350  /dev/log
unix  2      [ ]         DGRAM                    7979   @/org/kernel/udev/udevd
unix  2      [ ]         DGRAM                    27732  @/org/freedesktop/hal/udev_event
unix  2      [ ]         DGRAM                    4592850 
unix  2      [ ]         STREAM     CONNECTED     1138317 
unix  2      [ ]         STREAM     CONNECTED     1104247 
unix  2      [ ]         STREAM     CONNECTED     1099975 
unix  2      [ ]         STREAM     CONNECTED     1099714 
unix  3      [ ]         STREAM     CONNECTED     971435 /var/run/dbus/system_bus_socket
unix  3      [ ]         STREAM     CONNECTED     971434 
unix  2      [ ]         DGRAM                    478947 
unix  3      [ ]         STREAM     CONNECTED     29367  @/tmp/fam-root-
unix  3      [ ]         STREAM     CONNECTED     29366  
unix  3      [ ]         STREAM     CONNECTED     29353  /var/run/dbus/system_bus_socket
unix  3      [ ]         STREAM     CONNECTED     29352  
unix  3      [ ]         STREAM     CONNECTED     29170  /var/run/dbus/system_bus_socket
unix  3      [ ]         STREAM     CONNECTED     29169  
unix  3      [ ]         STREAM     CONNECTED     29164  
unix  3      [ ]         STREAM     CONNECTED     29163  
unix  2      [ ]         DGRAM                    29161  
unix  2      [ ]         DGRAM                    28889  
....
....
unix  2      [ ]         DGRAM                    27890  
unix  3      [ ]         STREAM     CONNECTED     27865  /var/run/dbus/system_bus_socket
unix  3      [ ]         STREAM     CONNECTED     27864  
unix  3      [ ]         STREAM     CONNECTED     27813  @/var/run/hald/dbus-aeLDAYiwqS
unix  3      [ ]         STREAM     CONNECTED     27812  
unix  3      [ ]         STREAM     CONNECTED     27798  @/var/run/hald/dbus-aeLDAYiwqS
unix  3      [ ]         STREAM     CONNECTED     27797  
unix  3      [ ]         STREAM     CONNECTED     27783  @/var/run/hald/dbus-aeLDAYiwqS
unix  3      [ ]         STREAM     CONNECTED     27782  
unix  3      [ ]         STREAM     CONNECTED     27766  /var/run/acpid.socket
unix  3      [ ]         STREAM     CONNECTED     27765  
unix  3      [ ]         STREAM     CONNECTED     27760  @/var/run/hald/dbus-aeLDAYiwqS
unix  3      [ ]         STREAM     CONNECTED     27759  
unix  3      [ ]         STREAM     CONNECTED     27727  @/var/run/hald/dbus-0e5V2Tfgxi
unix  3      [ ]         STREAM     CONNECTED     27726  
unix  2      [ ]         DGRAM                    27562  
unix  3      [ ]         STREAM     CONNECTED     27445  /var/run/dbus/system_bus_socket
unix  3      [ ]         STREAM     CONNECTED     27444  
unix  2      [ ]         DGRAM                    27433  
unix  2      [ ]         DGRAM                    27422  
unix  3      [ ]         STREAM     CONNECTED     27381  
unix  3      [ ]         STREAM     CONNECTED     27380  
unix  3      [ ]         STREAM     CONNECTED     27339  
unix  3      [ ]         STREAM     CONNECTED     27338  
unix  2      [ ]         DGRAM                    26918  
unix  2      [ ]         DGRAM                    24358  
unix  3      [ ]         STREAM     CONNECTED     24299  
unix  3      [ ]         STREAM     CONNECTED     24298  

[root@enkalytics ~]# ifconfig 
bond0     Link encap:InfiniBand  HWaddr 80:00:00:4B:FE:80:00:00:00:00:00:00:00:00:00:00:00:00:00:00  
          inet addr:192.168.12.131  Bcast:192.168.12.255  Mask:255.255.255.0
          inet6 addr: fe80::221:2800:1fc:b9ed/64 Scope:Link
          UP BROADCAST RUNNING MASTER MULTICAST  MTU:65520  Metric:1
          RX packets:102393 errors:0 dropped:0 overruns:0 frame:0
          TX packets:133607 errors:0 dropped:16 overruns:0 carrier:0
          collisions:0 txqueuelen:0 
          RX bytes:6058691 (5.7 MiB)  TX bytes:4830507 (4.6 MiB)
....
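
The single sdp row in the sdpnetstat output above is easy to miss among the TCP and UNIX socket rows. Its local address, 192.168.12.131, is the bond0 InfiniBand address reported by ifconfig. Here is a minimal sketch in Python that keeps only the SDP rows from saved sdpnetstat output. It assumes the netstat-style column layout shown above, and the function name is hypothetical.

```python
def sdp_connections(text):
    """Return the rows of sdpnetstat output whose Proto column is 'sdp'."""
    connections = []
    for line in text.splitlines():
        fields = line.split()
        # Proto Recv-Q Send-Q Local-Address Foreign-Address State
        if len(fields) >= 6 and fields[0] == "sdp":
            connections.append({
                "recv_q": int(fields[1]),
                "send_q": int(fields[2]),
                "local": fields[3],
                "remote": fields[4],
                "state": fields[5],
            })
    return connections


# A few rows copied from the output above.
SAMPLE = """\
Proto Recv-Q Send-Q Local Address           Foreign Address         State
tcp        0      0 localhost.localdom:6700 localhost.localdo:35791 ESTABLISHED
tcp        0      0 enkalytics.enkitec:9704 enkalytics.enkite:52854 TIME_WAIT
sdp        0      0 192.168.12.131:43307    enkx3db01-ib:ricardo-lm ESTABLISHED
unix  20     [ ]         DGRAM                    24350  /dev/log
"""

if __name__ == "__main__":
    for conn in sdp_connections(SAMPLE):
        print(conn["local"], "->", conn["remote"], conn["state"])
```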

iDB vs RDS vs SDP on Exadata

If you have worked on Exadata, you have probably heard many buzzwords, such as Storage Index, Smart Scan, and Offloading. Many of these features rely on the InfiniBand architecture, a high-speed interconnect with high throughput and low latency. When it comes to InfiniBand, many of us know about iDB and RDS on Exadata, but not many people know about SDP. In this post, I discuss iDB, RDS, and SDP in more detail.

Oracle Exadata uses the Intelligent Database protocol (iDB) to transfer data between the database nodes and the storage cell nodes. It is implemented in the database kernel and works as a function-shipping architecture that transparently maps database operations to Exadata operations. iDB is used to ship a SQL operation from a database node to a cell node and to return either query results or full data blocks from the cell node.

iDB is built on the Reliable Datagram Sockets (RDS v3) protocol and runs over InfiniBand ZDP (Zero-loss Zero-copy Datagram Protocol). The objective of ZDP is to eliminate unnecessary copying of blocks. RDS is based on the Socket API and offers low overhead, low latency, and high bandwidth. An Exadata cell node can send and receive large transfers using Remote Direct Memory Access (RDMA).

RDMA_copy

RDMA is direct memory access from the memory of one computer into that of another without involving either computer’s operating system. The transfer requires no work by CPUs, caches, or context switches, and transfers continue in parallel with other system operations. It is quite useful in massively parallel processing environments.

RDS is heavily used on Oracle Exadata. RDS provides highly available, low-overhead delivery of datagrams, which is like UDP but reliable and zero-copy. It accesses InfiniBand via the Socket API. RDS v3 supports both RDMA read and write and allows large data transfers of up to 8MB. It also supports control messages for asynchronous operations, covering submit and completion notifications.
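
Because RDS is exposed through the Socket API, opening an RDS endpoint looks much like opening a UDP one. Here is a minimal sketch in Python, assuming Python 3.3 or later on Linux, where the socket module defines AF_RDS, and assuming the rds kernel module is loaded. The helper name is hypothetical. It only opens and binds a socket, and it returns None on hosts without RDS support, so it runs anywhere.

```python
import socket


def open_rds_socket(local_ip, port):
    """Try to open and bind an RDS socket; return None if RDS is unavailable."""
    if not hasattr(socket, "AF_RDS"):
        return None  # this Python build or platform has no RDS support
    try:
        # RDS endpoints are sequenced-packet sockets in the AF_RDS family.
        sock = socket.socket(socket.AF_RDS, socket.SOCK_SEQPACKET, 0)
    except OSError:
        return None  # for example, the rds kernel module is not loaded
    try:
        # RDS sockets are bound to an IP address and port, like UDP sockets.
        sock.bind((local_ip, port))
    except OSError:
        sock.close()
        return None
    return sock


if __name__ == "__main__":
    # 127.0.0.1 and 4000 are placeholders; on an IB node you would use the
    # address of the InfiniBand interface instead.
    sock = open_rds_socket("127.0.0.1", 4000)
    if sock is None:
        print("RDS is not available on this host")
    else:
        print("bound RDS socket:", sock.getsockname())
        sock.close()
```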

rds_sdp_stack

If you want to optimize communications between Oracle Engineered Systems, such as Exadata, Big Data Appliance, and Exalytics, you can use the Sockets Direct Protocol (SDP) networking protocol. SDP deals only with stream sockets.

SDP allows high-performance zero-copy data transfers via RDMA network fabrics and uses a standard wire protocol over an RDMA fabric to support stream sockets (SOCK_STREAM). The goal of SDP is to provide an RDMA-accelerated alternative to TCP over IP while remaining transparent to the application.

It bypasses the OS resident TCP stack for stream connections between any endpoints on the RDMA fabric. All other socket types (such as datagram, raw, packet, etc.) are supported by the IP stack and operate over standard IP interfaces (i.e., IPoIB on InfiniBand fabrics). The IP stack has no dependency on the SDP stack; however, the SDP stack depends on IP drivers for local IP assignments and for IP address resolution for endpoint identifications.

sdp_stack

In a future post, I will discuss some useful commands for checking InfiniBand traffic, RDS, and SDP.