SQL Tuning beyond the Execution Plan

Recently a client asked me to help one of their development teams figure out why a query had been performing badly for the past few weeks.

The query is a simple UPDATE statement with SQL ID 1am0d1r44m7sx:

UPDATE MYDB_STAGING
SET IS_PROCESSED = 'R',
NM_MODIFIED ='MYDB_VERSIONING'
WHERE DT_COB =:B1
AND IS_LATEST =0
AND IS_PROCESSED = 'N'

I checked the plans captured in AWR and found two plans associated with this SQL:
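(dplan_awr.sql is one of the scripts referenced at the end of this post. If you don't have it, a minimal substitute, not the actual script, is a direct call to DBMS_XPLAN.DISPLAY_AWR; the ADVANCED format should also print the peeked binds shown below.)

select *
from table(dbms_xplan.display_awr('1am0d1r44m7sx', null, null, 'ADVANCED'));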

SQL> @dplan_awr
Enter value for sql_id: 1am0d1r44m7sx
Enter value for plan_hash_value:

PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
SQL_ID 1am0d1r44m7sx
--------------------
UPDATE MYDB_STAGING SET IS_PROCESSED = 'R', NM_MODIFIED
='MYDB_VERSIONING' WHERE DT_COB=:B1 AND IS_LATEST=0 AND IS_PROCESSED =
'N'

Plan hash value: 1281386787
-------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                           | Name                    | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |
-------------------------------------------------------------------------------------------------------------------------------
|   0 | UPDATE STATEMENT                    |                         |       |       |  2398 (100)|          |       |       |
|   1 |  UPDATE                             | MYDB_STAGING            |       |       |            |          |       |       |
|   2 |   PARTITION RANGE SINGLE            |                         |    26 |  1742 |  2398   (1)| 00:00:29 |   KEY |   KEY |
|   3 |    TABLE ACCESS BY LOCAL INDEX ROWID| MYDB_STAGING            |    26 |  1742 |  2398   (1)| 00:00:29 |   KEY |   KEY |
|   4 |     INDEX RANGE SCAN                | ACTIVE_MKT_DATA_STG_IDX |   104K|       |  2398   (1)| 00:00:29 |   KEY |   KEY |
-------------------------------------------------------------------------------------------------------------------------------
Peeked Binds (identified by position):
--------------------------------------
   1 - :B1 (DATE): 09/12/2014 00:00:00

SQL_ID 1am0d1r44m7sx
--------------------
UPDATE MYDB_STAGING SET IS_PROCESSED = 'R', NM_MODIFIED
='MYDB_VERSIONING' WHERE DT_COB=:B1 AND IS_LATEST=0 AND IS_PROCESSED =
'N'

Plan hash value: 1961169265
------------------------------------------------------------------------------------------------------------
| Id  | Operation               | Name             | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |
------------------------------------------------------------------------------------------------------------
|   0 | UPDATE STATEMENT        |                  |       |       |    25 (100)|          |       |       |
|   1 |  UPDATE                 | MYDB_STAGING     |       |       |            |          |       |       |
|   2 |   PARTITION RANGE SINGLE|                  |  1392 | 22272 |    25   (0)| 00:00:01 |   KEY |   KEY |
|   3 |    TABLE ACCESS FULL    | MYDB_STAGING     |  1392 | 22272 |    25   (0)| 00:00:01 |   KEY |   KEY |
------------------------------------------------------------------------------------------------------------
Peeked Binds (identified by position):
--------------------------------------
   1 - :B1 (DATE): 03/13/2015 00:00:00

Next I checked out some basic information about the table.

SQL> @dba_tables
Enter value for owner: MDDBO
Enter value for table_name: MYDB_STAGING

OWNER   TABLE_NAME    STATUS LAST_ANALYZED  NUM_ROWS   BLOCKS    DEGREE PAR
------- ------------- ------ -------------- ---------- --------- ------ ---
MDDBO   MYDB_STAGING  VALID  13-JUN-15      117989022  11234944       1 YES

The table is partitioned and not particularly big, about 120 million rows, with each partition holding 3~7 million rows. According to the user, each execution usually updates 2~3 million rows. The following query returns the partition key for the table; it's the DT_COB column.

SQL> select column_name, column_position from dba_part_key_columns
where owner='MDDBO' and name='MYDB_STAGING';  

COLUMN_NAME     COLUMN_POSITION
--------------- ---------------
DT_COB                        1
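
(As a sanity check on the per-partition row counts mentioned above, the partition-level statistics are available from the standard dictionary view; a quick sketch:)

select partition_name, num_rows, last_analyzed
from dba_tab_partitions
where table_owner = 'MDDBO'
  and table_name  = 'MYDB_STAGING'
order by partition_position;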

Based on the above information, the correct access path is a full scan of a single partition; in other words, the query should use plan 1961169265 instead of plan 1281386787. Having seen many plan flips in the past, I initially suspected this could be another example of a typical plan change, so I ran awr_plan_change.sql to verify that assumption:
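
(awr_plan_change.sql is also on the Scripts page mentioned at the end of this post. For readers without it, the query below is a rough sketch of the same idea against the standard AWR views, not the actual script; it lists per-snapshot executions, average elapsed time, and logical/physical IO for a SQL ID.)

select ss.snap_id,
       ss.instance_number node,
       sn.begin_interval_time,
       ss.plan_hash_value,
       ss.executions_delta execs,
       round(ss.elapsed_time_delta/1e6/decode(ss.executions_delta,0,1,ss.executions_delta),3) avg_etime,
       round(ss.buffer_gets_delta/decode(ss.executions_delta,0,1,ss.executions_delta),1)      avg_lio,
       round(ss.disk_reads_delta /decode(ss.executions_delta,0,1,ss.executions_delta),1)      avg_pio
from   dba_hist_sqlstat  ss,
       dba_hist_snapshot sn
where  sn.snap_id         = ss.snap_id
and    sn.instance_number = ss.instance_number
and    ss.sql_id          = '&sql_id'
order  by ss.snap_id, ss.instance_number;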

SQL> @awr_plan_change
Enter value for sql_id: 1am0d1r44m7sx
Enter value for instance_number:

   SNAP_ID   NODE BEGIN_INTERVAL_TIME            SQL_ID        PLAN_HASH_VALUE        EXECS    AVG_ETIME          AVG_LIO          AVG_PIO
---------- ------ ------------------------------ ------------- --------------- ------------ ------------ ---------------- ----------------
     97353      2 06-MAY-15 07.23.37.548 PM      1am0d1r44m7sx      1961169265            1      118.175      3,531,575.0        270,839.0
     97377      7 07-MAY-15 07.00.24.645 PM      1am0d1r44m7sx                            1    2,660.042      3,820,002.0        341,582.0
     97401      3 08-MAY-15 07.00.04.204 PM      1am0d1r44m7sx                            0    3,573.805      7,998,068.0        980,442.0
     97473      1 11-MAY-15 07.00.12.432 PM      1am0d1r44m7sx                            1      576.099      3,639,026.0        286,946.0
     97497      5 12-MAY-15 07.00.08.174 PM      1am0d1r44m7sx                            1    1,729.939      4,261,057.0        340,114.0
     97521      1 13-MAY-15 07.00.02.560 PM      1am0d1r44m7sx                            1    1,898.494      4,266,891.0        364,785.0
     97545      1 14-MAY-15 07.00.03.598 PM      1am0d1r44m7sx                            1      718.034      3,439,714.0        270,765.0
     97569      1 15-MAY-15 07.00.00.695 PM      1am0d1r44m7sx                            1      829.346      3,731,965.0        284,365.0
     97641      6 18-MAY-15 07.00.04.875 PM      1am0d1r44m7sx                            1      888.942      4,903,243.0        399,901.0
     97665      5 19-MAY-15 07.00.05.411 PM      1am0d1r44m7sx                            1      581.278      3,515,619.0        268,340.0
     97689      6 20-MAY-15 07.00.02.358 PM      1am0d1r44m7sx                            1    1,053.530      3,957,242.0        314,391.0
     97713      8 21-MAY-15 07.00.17.748 PM      1am0d1r44m7sx                            1    3,103.021      5,442,853.0        558,322.0
     97738      1 22-MAY-15 07.00.02.100 PM      1am0d1r44m7sx                            1      723.603      3,640,845.0        285,694.0
     97810      1 25-MAY-15 07.00.01.961 PM      1am0d1r44m7sx                            1      366.762      2,275,088.0        173,936.0
     97834      2 26-MAY-15 07.00.04.461 PM      1am0d1r44m7sx                            0    3,611.451      2,160,988.0        252,119.0
     97835      2 26-MAY-15 08.00.14.555 PM      1am0d1r44m7sx                            0    3,543.604     12,876,728.0      2,320,402.0
     97858      6 27-MAY-15 07.00.09.103 PM      1am0d1r44m7sx                            0    3,589.634      9,592,036.0      1,268,348.0
     97882      8 28-MAY-15 07.00.10.821 PM      1am0d1r44m7sx                            0    3,580.830      6,042,271.0        689,048.0
     97883      8 28-MAY-15 08.00.03.331 PM      1am0d1r44m7sx                            1    1,584.714     17,450,422.0      1,564,930.0
     97906      2 29-MAY-15 07.00.02.462 PM      1am0d1r44m7sx                            0    3,577.977      3,861,047.0        528,738.0
     97978      4 01-JUN-15 07.00.01.451 PM      1am0d1r44m7sx                            0    3,578.352      8,051,298.0        817,057.0
     97979      4 01-JUN-15 08.00.05.179 PM      1am0d1r44m7sx                            1      165.220      2,695,013.0        274,682.0
     98002      3 02-JUN-15 07.00.05.820 PM      1am0d1r44m7sx                            0    3,515.592      6,794,546.0        726,537.0
     98026      7 03-JUN-15 07.00.01.053 PM      1am0d1r44m7sx                            0    3,581.938     11,215,112.0      1,790,826.0
     98027      7 03-JUN-15 08.00.13.693 PM      1am0d1r44m7sx                            1    2,573.582     14,189,836.0      2,530,507.0
     98050      1 04-JUN-15 07.00.04.722 PM      1am0d1r44m7sx                            0    3,572.691      5,255,766.0        736,876.0
     98051      1 04-JUN-15 08.00.06.604 PM      1am0d1r44m7sx                            1    1,801.395     10,407,772.0      2,162,490.0
     98074      7 05-JUN-15 07.00.05.143 PM      1am0d1r44m7sx                            0    3,566.342      3,585,895.0        405,328.0
     98075      7 05-JUN-15 08.00.00.392 PM      1am0d1r44m7sx                            0    3,591.525     30,546,458.0      3,151,443.0
     98076      7 05-JUN-15 09.00.07.352 PM      1am0d1r44m7sx                            1    1,492.171     19,251,729.0      1,558,493.0
     98146      3 08-JUN-15 07.00.01.496 PM      1am0d1r44m7sx                            0    3,530.138      9,209,187.0        652,218.0
     98170      7 09-JUN-15 07.00.02.540 PM      1am0d1r44m7sx                            0    3,538.753      5,187,425.0        481,987.0
     98171      7 09-JUN-15 08.00.08.762 PM      1am0d1r44m7sx                            1      996.519      9,047,047.0        802,864.0
     98194      6 10-JUN-15 07.00.07.800 PM      1am0d1r44m7sx                            0    3,577.875      5,266,386.0        408,865.0
     98195      6 10-JUN-15 08.00.04.068 PM      1am0d1r44m7sx                            0    3,572.545      9,247,723.0      2,239,318.0
     98196      6 10-JUN-15 09.00.02.624 PM      1am0d1r44m7sx                            0    3,493.045     11,131,196.0      4,195,213.0
     98197      6 10-JUN-15 10.00.16.671 PM      1am0d1r44m7sx                            0    3,662.746     18,616,721.0      4,365,932.0
     98198      6 10-JUN-15 11.00.13.227 PM      1am0d1r44m7sx                            0    3,610.344     21,346,638.0      5,733,209.0
     98199      6 11-JUN-15 12.00.08.600 AM      1am0d1r44m7sx                            0    3,648.539      8,701,577.0      2,714,068.0
     98200      6 11-JUN-15 01.00.11.149 AM      1am0d1r44m7sx                            0    3,519.821     17,616,343.0      4,880,834.0
     98201      6 11-JUN-15 02.00.03.849 AM      1am0d1r44m7sx                            0    3,634.058     18,767,326.0      4,584,485.0
     98202      6 11-JUN-15 03.00.03.191 AM      1am0d1r44m7sx                            0    3,596.024     14,784,489.0      4,191,209.0
     98203      6 11-JUN-15 04.00.06.450 AM      1am0d1r44m7sx                            0    3,573.926     17,111,846.0      5,648,559.0
     98204      6 11-JUN-15 05.00.10.709 AM      1am0d1r44m7sx                            0    3,649.037     16,125,390.0      4,559,875.0
     98205      6 11-JUN-15 06.00.02.222 AM      1am0d1r44m7sx                            0    3,627.405     23,724,739.0      5,026,544.0
     98206      6 11-JUN-15 07.00.03.859 AM      1am0d1r44m7sx                            0    3,584.408     22,462,507.0      5,036,174.0
     98207      6 11-JUN-15 08.00.02.770 AM      1am0d1r44m7sx                            0    3,499.634     21,113,112.0      5,622,234.0
     98208      6 11-JUN-15 09.00.07.852 AM      1am0d1r44m7sx                            0    3,622.151     19,325,484.0      8,609,591.0
48 rows selected.

The plan was indeed doing a full scan on a single partition. Interestingly, there was no plan change: every execution used plan hash value 1961169265. The result shows that before May 25th the majority of executions finished in under 3,000 seconds. After May 25th, the query consistently took more than a few hours and, in the worst cases, ran for days without finishing. Both logical and physical IO after May 25th were 4~8 times higher than during the normal period before May 25th.

The client confirmed there was no change in data volume during this period, so a sudden jump in data volume was ruled out. Things became interesting here.

My next suspicion was other DML activity against the same table. The client said yes, there are always inserts and deletes against this table; but the same DML also happened before May 25th, so why is performance different now? Fair point. I quickly wrote a script, sql_text_info.sql, and used it to search for all SQL statements against this table.
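
(sql_text_info.sql is essentially a search of DBA_HIST_SQLSTAT joined to DBA_HIST_SQLTEXT on a SQL text pattern. The query below is a simplified sketch of that idea, not the script itself.)

select to_char(sn.begin_interval_time,'mon-dd-yy hh24:mi') interval_time,
       ss.instance_number inst,
       ss.sql_id,
       ss.plan_hash_value hash_value,
       ss.executions_delta execs,
       round(ss.elapsed_time_delta/1e6,2) etime,
       ss.buffer_gets_delta lio,
       dbms_lob.substr(st.sql_text,36,1) sql_text
from   dba_hist_sqlstat  ss,
       dba_hist_snapshot sn,
       dba_hist_sqltext  st
where  sn.snap_id         = ss.snap_id
and    sn.instance_number = ss.instance_number
and    st.sql_id          = ss.sql_id
and    upper(st.sql_text) like upper('&sql_text')
order  by sn.begin_interval_time desc, ss.instance_number;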

 
SQL> @sql_text_info
Enter value for sql_text: %MYDB_STAGING%

interval time    INST SQL_ID         HASH_VALUE      EXECS      ETIME   AVG_ETIME           LIO        AVG_LIO SQL_TEXT
---------------- ---- ------------- ----------- ---------- ---------- ----------- ------------- -------------- ------------------------------------
. . . .
jun-02-15 01:00     5 1j601hg74p2qh                  1,657   95.29188        .058     14,762616        8,909.2 INSERT INTO MDDBO.MYDB_STAGING(S
jun-01-15 21:00     2 1j601hg74p2qh                    988 110.640511        .112     15,292134       15,477.9 INSERT INTO MDDBO.MYDB_STAGING(S
jun-01-15 21:00     2 bgvcuszpxp1x9                    988 110.996902        .112     15,304010       15,489.9 DECLARE V_NM_SUPPLY   VARCHAR2 (400)
jun-01-15 20:00     4 buwna1hmvy2a0  2175852021          1 153.140902     153.141      1,825674    1,825,674.0 UPDATE MYDB_STAGING SET IS_LATES
jun-01-15 20:00     4 1am0d1r44m7sx  1961169265          1 165.219891     165.220      2,695013    2,695,013.0 UPDATE MYDB_STAGING SET IS_PROCE
jun-01-15 19:00     5 bgvcuszpxp1x9           0     11,308 186.595527        .017      1,845134          163.2 DECLARE V_NM_SUPPLY   VARCHAR2 (400)
jun-01-15 19:00     5 1j601hg74p2qh                 11,309 183.916696        .016      1,709844          151.2 INSERT INTO MDDBO.MYDB_STAGING(S
jun-01-15 19:00     2 1j601hg74p2qh                  9,494 210.490704        .022      8,176275          861.2 INSERT INTO MDDBO.MYDB_STAGING(S
jun-01-15 19:00     2 bgvcuszpxp1x9                  9,491 212.032152        .022      8,289925          873.5 DECLARE V_NM_SUPPLY   VARCHAR2 (400)
jun-01-15 16:00     8 1j601hg74p2qh                  2,843 123.610869        .043     20,252536        7,123.6 INSERT INTO MDDBO.MYDB_STAGING(S
jun-01-15 16:00     1 1j601hg74p2qh                  5,990 211.389932        .035     31,781333        5,305.7 INSERT INTO MDDBO.MYDB_STAGING(S
jun-01-15 16:00     8 bgvcuszpxp1x9                  2,838   123.9323        .044     20,155222        7,101.9 DECLARE V_NM_SUPPLY   VARCHAR2 (400)
jun-01-15 16:00     1 bgvcuszpxp1x9                  5,981 212.434201        .036     31,765722        5,311.1 DECLARE V_NM_SUPPLY   VARCHAR2 (400)
. . . .
may-25-15 22:00     7 1j601hg74p2qh                  1,498 105.318346        .070     17,110206       11,422.0 INSERT INTO MDDBO.MYDB_STAGING(S
may-25-15 22:00     8 1j601hg74p2qh                    500  51.938971        .104      8,436275       16,872.6 INSERT INTO MDDBO.MYDB_STAGING(S
may-25-15 22:00     8 bgvcuszpxp1x9                    500  52.122208        .104      8,442285       16,884.6 DECLARE V_NM_SUPPLY   VARCHAR2 (400)
may-25-15 22:00     4 1j601hg74p2qh                  1,000  96.772358        .097     16,339154       16,339.2 INSERT INTO MDDBO.MYDB_STAGING(S
may-25-15 22:00     4 bgvcuszpxp1x9                    994  96.550291        .097     16,243633       16,341.7 DECLARE V_NM_SUPPLY   VARCHAR2 (400)
may-25-15 22:00     6 bgvcuszpxp1x9                    505  52.105969        .103      8,401898       16,637.4 DECLARE V_NM_SUPPLY   VARCHAR2 (400)
may-25-15 22:00     6 1j601hg74p2qh                    505  51.917207        .103      8,395818       16,625.4 INSERT INTO MDDBO.MYDB_STAGING(S
may-25-15 19:00     1 1am0d1r44m7sx  1961169265          1 366.762256     366.762      2,275088    2,275,088.0 UPDATE MYDB_STAGING SET IS_PROCE
may-25-15 13:00     6 bgvcuszpxp1x9           0     33,406  1444.6719        .043    256,909395        7,690.5 DECLARE V_NM_SUPPLY   VARCHAR2 (400)
may-25-15 13:00     6 1j601hg74p2qh                 33,476 1439.05292        .043    258,297516        7,715.9 INSERT INTO MDDBO.MYDB_STAGING(S
may-25-15 13:00     3 bgvcuszpxp1x9                 48,792 2208.19674        .045    387,805213        7,948.1 DECLARE V_NM_SUPPLY   VARCHAR2 (400)
may-25-15 13:00     3 1j601hg74p2qh                 48,917 2205.34858        .045    389,636787        7,965.3 INSERT INTO MDDBO.MYDB_STAGING(S
may-25-15 13:00     7 1j601hg74p2qh                 36,201 1801.23233        .050    321,210434        8,873.0 INSERT INTO MDDBO.MYDB_STAGING(S
may-25-15 13:00     7 bgvcuszpxp1x9                 36,155 1807.59592        .050    320,840251        8,874.0 DECLARE V_NM_SUPPLY   VARCHAR2 (400)
may-25-15 13:00     2 1j601hg74p2qh                 37,718 1763.97237        .047    288,696415        7,654.1 INSERT INTO MDDBO.MYDB_STAGING(S
may-25-15 13:00     2 bgvcuszpxp1x9                 37,675 1770.14259        .047    288,827077        7,666.3 DECLARE V_NM_SUPPLY   VARCHAR2 (400)
may-25-15 13:00     4 1j601hg74p2qh                 32,156 1490.11108        .046    260,436584        8,099.2 INSERT INTO MDDBO.MYDB_STAGING(S
. . .  .
may-15-15 21:00     6 bgvcuszpxp1x9                 11,282 312.503981        .028     46,699981        4,139.3 DECLARE V_NM_SUPPLY   VARCHAR2 (400)
may-15-15 21:00     1 1j601hg74p2qh                 26,218 849.230019        .032    131,299020        5,008.0 INSERT INTO MDDBO.MYDB_STAGING(S
may-15-15 21:00     1 bgvcuszpxp1x9                 26,131 857.433845        .033    131,200097        5,020.9 DECLARE V_NM_SUPPLY   VARCHAR2 (400)
may-15-15 21:00     2 bgvcuszpxp1x9                 14,772 386.170741        .026     58,074157        3,931.4 DECLARE V_NM_SUPPLY   VARCHAR2 (400)
may-15-15 21:00     4 bgvcuszpxp1x9                 11,474 350.669289        .031     55,183416        4,809.4 DECLARE V_NM_SUPPLY   VARCHAR2 (400)
may-15-15 21:00     4 1j601hg74p2qh                 11,489 348.176292        .030     55,437597        4,825.3 INSERT INTO MDDBO.MYDB_STAGING(S
may-15-15 21:00     2 1j601hg74p2qh                 14,815 383.870156        .026     58,137060        3,924.2 INSERT INTO MDDBO.MYDB_STAGING(S
may-15-15 21:00     3 bgvcuszpxp1x9                 17,502 512.293434        .029     79,491976        4,541.9 DECLARE V_NM_SUPPLY   VARCHAR2 (400)
may-15-15 21:00     3 1j601hg74p2qh                 17,517  506.51756        .029     79,329693        4,528.7 INSERT INTO MDDBO.MYDB_STAGING(S
may-15-15 21:00     8 bgvcuszpxp1x9                  8,593 273.929099        .032     44,014364        5,122.1 DECLARE V_NM_SUPPLY   VARCHAR2 (400)
may-15-15 21:00     8 1j601hg74p2qh                  8,601 271.977545        .032     44,284047        5,148.7 INSERT INTO MDDBO.MYDB_STAGING(S
may-15-15 19:00     1 1am0d1r44m7sx  1961169265          1 829.346177     829.346      3,731965    3,731,965.0 UPDATE MYDB_STAGING SET IS_PROCE
may-15-15 17:00     3 bgvcuszpxp1x9           0     13,998 154.832559        .011     12,076269          862.7 DECLARE V_NM_SUPPLY   VARCHAR2 (400)
may-15-15 17:00     3 1j601hg74p2qh                  5,999  86.716744        .014      5,305216          884.4 INSERT INTO MDDBO.MYDB_STAGING(S
may-15-15 14:00     8 bgvcuszpxp1x9                  7,772 208.118833        .027     34,702132        4,465.0 DECLARE V_NM_SUPPLY   VARCHAR2 (400)
may-15-15 14:00     8 1j601hg74p2qh                  7,804 207.315633        .027     35,442987        4,541.6 INSERT INTO MDDBO.MYDB_STAGING(S
may-15-15 14:00     2 bgvcuszpxp1x9                  6,948 231.212222        .033     36,165505        5,205.2 DECLARE V_NM_SUPPLY   VARCHAR2 (400)
may-15-15 14:00     2 1j601hg74p2qh                  6,968 229.081633        .033     35,870516        5,147.9 INSERT INTO MDDBO.MYDB_STAGING(S
may-15-15 14:00     4 1j601hg74p2qh                  8,067 207.975898        .026     35,897394        4,449.9 INSERT INTO MDDBO.MYDB_STAGING(S
may-15-15 14:00     4 bgvcuszpxp1x9                  8,038 209.069809        .026     35,470064        4,412.8 DECLARE V_NM_SUPPLY   VARCHAR2 (400)
may-15-15 14:00     5 bgvcuszpxp1x9                  5,010 146.873536        .029     23,838187        4,758.1 DECLARE V_NM_SUPPLY   VARCHAR2 (400)
may-15-15 14:00     5 1j601hg74p2qh                  5,025 146.944746        .029     23,908322        4,757.9 INSERT INTO MDDBO.MYDB_STAGING(S
may-15-15 14:00     6 bgvcuszpxp1x9                  7,462 217.030479        .029     35,107121        4,704.8 DECLARE V_NM_SUPPLY   VARCHAR2 (400)

As I expected, since May 25th there had been insert activity during the same 7pm~8pm window in which the update runs. So I recommended killing the current update and rerunning it at a time with little or no insert activity. Unsurprisingly, the query then completed within its usual timeframe. Case closed. The scripts used in this post can be found on the Scripts page.
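
(As a side note, if you want to double-check what else is active in a given window before rescheduling a job like this, a quick ASH-style sanity check against the AWR history can help. This is not one of the scripts used above, and the timestamps below are just an example.)

select ash.sql_id, count(*) ash_samples
from   dba_hist_active_sess_history ash
where  ash.sample_time between timestamp '2015-06-01 19:00:00'
                           and timestamp '2015-06-01 20:00:00'
group  by ash.sql_id
order  by count(*) desc;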

Over the past two months I have also been involved in some other interesting activities.
1) Randy and I helped our client, one of the largest banks in the world, successfully migrate their Oracle database to the Exadata platform. It was the largest Oracle database migration in the bank's history and involved many development and operations teams. It was also one of the most challenging projects I have ever done: I hardly slept for the first week and was almost totally exhausted by the time the system stabilized. I'm glad the database is performing well on Exadata right now.
2) I attended our E4 event in early June. I have been to every E4 since the event was launched three years ago. It keeps getting bigger, and this year was the first time it ran under the Accenture brand. It is a great event for meeting so many top database experts and chatting with friends and colleagues.
3) Finally, I joined the Accenture team in New York and ran the JP Morgan Chase Corporate Challenge in Central Park with 20,000 other runners on June 4th. It was a fun 5K and I really enjoyed it. If I am still around New York at the same time next year, I will definitely participate again.

Here are some photos from the activities.
Celebration after the Success of the Migration
IMG_1101

The First Day at E4 2015
IMG_1091

My Run at JP Morgan Chase Corporate Challenge
JPMC_CC_2015_WeidongZhou


AWR Is Not Enough to Track Down an IO Problem on Exadata

Recently we ran into an interesting performance issue at one of our clients. For some time they had been reporting significant slowdowns in the system during the daytime. Their X2 Exadata does not host many databases: three production databases and a few test/QA databases. Of the three production databases, the most important and critical one, which I'll call CRDB (a fake name), is mainly OLTP with some reporting activity. The other two production databases are tiny and less important, with little activity. In other words, the majority of database activity happens in the CRDB database.

The slowdown was mysterious: it occurred randomly during the day and did not seem to follow a pattern. When it happened, the active sessions in CRDB shot up from an average of 2~4 to at least 30, sometimes reaching 100. At the same time, all the other databases on the same Exadata slowed down massively. To track down the issue, we requested a few AWR reports for the CRDB database from the client.

In the AWR report, the top event was cell smart table scan. For example, on db node 1 alone, from 10:30am to 2:30pm on August 6, 2013, there were 574,218 waits for cell smart table scan with an average wait time of 106 ms, equal to 60,980 seconds of DB time, or 78.20% of DB time. Beyond that, the AWR report did not reveal anything useful that should cause a performance issue, so we mainly focused on IO and smart-scan-related operations.
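
(To see how this event trends across snapshots, rather than in a single AWR report, a simple sketch against the AWR history views, assuming the relevant snapshots are still retained, is:)

select sn.begin_interval_time,
       se.instance_number inst,
       se.total_waits
         - lag(se.total_waits) over
             (partition by se.instance_number order by se.snap_id)        waits,
       round((se.time_waited_micro
         - lag(se.time_waited_micro) over
             (partition by se.instance_number order by se.snap_id))/1e6)  time_waited_sec
from   dba_hist_system_event se,
       dba_hist_snapshot     sn
where  sn.snap_id         = se.snap_id
and    sn.instance_number = se.instance_number
and    se.event_name      = 'cell smart table scan'
order  by se.snap_id, se.instance_number;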

On Exadata, cell smart table scan is the wait for large IOs scanned on the cell storage, and many offloading activities involve it. Since the client said there had been no recent application code change and the data volume remained similar to the past, we assumed the total logical and physical IO for the CRDB database should be at a similar level as before. They also mentioned that the performance issue began a few days after an Exadata patch.
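
(A quick way to gauge how much scan work is being offloaded on the database side is to look at a couple of the standard cell statistics in V$SYSSTAT; a sketch:)

select name, round(value/1024/1024/1024,1) gb
from   v$sysstat
where  name in ('cell physical IO bytes eligible for predicate offload',
                'cell physical IO interconnect bytes returned by smart scan');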

Their IORM plan gives a 50% allocation to CRDB and the other 50% to the rest of the databases, with an objective of LOW_LATENCY. This resource plan had been in production for over a year and had managed IO quite well.
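
(The current IORM plan and objective can be checked on every cell with CellCLI, for example via dcli from a compute node; this assumes the same ~/cell_group file used for the command later in this post.)

# dcli -g ~/cell_group -l root 'cellcli -e list iormplan detail'
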
Luckily, while we were tracking down the issue I was attending our company's 2013 Exadata conference (E4). It's an excellent Exadata- and Big Data-specific conference; not only does it have many great Exadata/Big Data speakers, it also draws Exadata experts and users from around the world. I happened to listen to a session about IORM presented by Oracle's Sue Lee. She is a Director of Development in the Oracle RDBMS division and is responsible for Oracle Resource Manager. Her presentation about IORM was excellent. She will also give a similar session at OpenWorld; if you work on Exadata and attend OpenWorld this year, I highly recommend her session, Using Resource Manager for Database Consolidation with Oracle Database 12c (Session ID: CON8884), where you will gain in-depth knowledge about IORM and Exadata internals.

During her presentation she mentioned a tool, the iorm_metrics script (Doc ID 1337265.1), and said it is frequently used in her group to track down IORM performance problems. It sounded interesting, so I immediately downloaded the script and asked our client to run the following on all of the cell nodes.

./metric_iorm.pl "where collectionTime > '2013-08-06T11:30:00-06:00' and collectionTime < '2013-08-06T13:20:00-06:00'" > metric_iorm_`hostname`.log

The result was quite interesting and I used one snapshot from cell node 1 as example:

 
Time: 2013-08-06T12:33:03-05:00
Database: CRDB
Utilization:     Small=4%    Large=7%
Flash Cache:     IOPS=226
Disk Throughput: MBPS=120
Small I/O's:     IOPS=258    Avg qtime=1.1ms
Large I/O's:     IOPS=114    Avg qtime=3026ms
	Consumer Group: _ORACLE_MEDPRIBG_GROUP_
	Utilization:     Small=0%    Large=0%
	Flash Cache:     IOPS=0.1
	Disk Throughput: MBPS=0
	Small I/O's:     IOPS=0.3    Avg qtime=0.0ms
	Large I/O's:     IOPS=0.0    Avg qtime=0.0ms
	Consumer Group: OTHER_GROUPS
	Utilization:     Small=0%    Large=7%
	Flash Cache:     IOPS=150
	Disk Throughput: MBPS=116
	Small I/O's:     IOPS=1.8    Avg qtime=0.8ms
	Large I/O's:     IOPS=112    Avg qtime=3077ms
	Consumer Group: _ORACLE_LOWPRIBG_GROUP_
	Utilization:     Small=1%    Large=0%
	Flash Cache:     IOPS=23.8
	Disk Throughput: MBPS=1
	Small I/O's:     IOPS=82.1    Avg qtime=2.7ms
	Large I/O's:     IOPS=0.0    Avg qtime=0.0ms
	Consumer Group: _ORACLE_BACKGROUND_GROUP_
	Utilization:     Small=3%    Large=0%
	Flash Cache:     IOPS=51.6
	Disk Throughput: MBPS=2
	Small I/O's:     IOPS=174    Avg qtime=0.3ms
	Large I/O's:     IOPS=1.9    Avg qtime=66.1ms

Database: _OTHER_DATABASE_
Utilization:     Small=10%    Large=9%
Flash Cache:     IOPS=89.7
Disk Throughput: MBPS=142
Small I/O's:     IOPS=504    Avg qtime=0.5ms
Large I/O's:     IOPS=134    Avg qtime=4137ms
	Consumer Group: _ORACLE_MEDPRIBG_GROUP_
	Utilization:     Small=0%    Large=0%
	Flash Cache:     IOPS=0.1
	Disk Throughput: MBPS=0
	Small I/O's:     IOPS=0.4    Avg qtime=0.2ms
	Large I/O's:     IOPS=0.0    Avg qtime=0.0ms
	Consumer Group: OTHER_GROUPS
	Utilization:     Small=0%    Large=9%
	Flash Cache:     IOPS=42.8
	Disk Throughput: MBPS=139
	Small I/O's:     IOPS=0.8    Avg qtime=0.6ms
	Large I/O's:     IOPS=134    Avg qtime=4138ms
	Consumer Group: _ORACLE_LOWPRIBG_GROUP_
	Utilization:     Small=0%    Large=0%
	Flash Cache:     IOPS=16.9
	Disk Throughput: MBPS=0
	Small I/O's:     IOPS=56.2    Avg qtime=3.5ms
	Large I/O's:     IOPS=0.0    Avg qtime=0.0ms
	Consumer Group: _ORACLE_BACKGROUND_GROUP_
	Utilization:     Small=9%    Large=0%
	Flash Cache:     IOPS=29.8
	Disk Throughput: MBPS=2
	Small I/O's:     IOPS=447    Avg qtime=0.2ms
	Large I/O's:     IOPS=0.0    Avg qtime=53.0ms

CELL METRICS SUMMARY

Cell Total Utilization:     Small=14%    Large=16%
Cell Total Flash Cache:     IOPS=315.7
Cell Total Disk Throughput: MBPS=249.454
Cell Total Small I/O's:     IOPS=762.2
Cell Total Large I/O's:     IOPS=245.6
Cell Avg small read latency:  11.24 ms
Cell Avg small write latency: 2.64 ms
Cell Avg large read latency:  13.35 ms
Cell Avg large write latency: 4.47 ms

From the above result, we can see the average queue time for large IOs (cell smart scan) was over 3,000 ms for the CRDB resource group and over 4,000 ms for OTHER_DATABASE; the normal range should be under 30 ms. The throughput for OTHER_DATABASE was 142 MB/second while CRDB was at 120 MB/second. This indicates saturated disk IO.

It's possible that a large amount of IO from CRDB caused OTHER_DATABASE to slow down, but in that case the disk throughput from OTHER_DATABASE should be small while its wait time grows longer. On the contrary, the high IO throughput from OTHER_DATABASE indicated something was not right. So I zoomed into the other databases and compared the IO throughput between them.

The following chart shows that the throughput for the CRDB database was around 300MB/second before 3:30pm.
iorm1308_io_CRDB
At the same time, for another much smaller and less used database, the throughput was much higher than for CRDB: mostly between 500MB and 600MB/second, peaking at over 1200MB/second a few times.

iorm1308_io_otherDB

It is normal to see a lot of large IOs from CRDB during the day, as it is the major database with many activities against it. However, it is unusual to see a tiny database drive a significant amount of large IO. This prompted me to investigate the small database further.

From the 12c OEM SQL Monitor screen, we could see many executions of SQL ID d0af9yxrrrvy5, each doing a lot of IO, over 80GB per execution, and each running for a long time. Multiple instances of the same query were executing during the same timeframe in which the slowdown happened.

iorm1308_slow_query
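
(The same information is available outside OEM from the SQL monitoring views; a sketch, assuming the executions are still present in GV$SQL_MONITOR:)

select inst_id, sql_id, status, sql_exec_id, sql_exec_start,
       round(elapsed_time/1e6)                      elapsed_sec,
       round(physical_read_bytes/1024/1024/1024,1)  gb_read
from   gv$sql_monitor
where  sql_id = 'd0af9yxrrrvy5'
order  by sql_exec_start;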

12c Cloud Control also shows that the Average Throttle Time for Disk I/Os for both CRDB and OTHER_DATABASE shot up to 1,500~2,000 milliseconds on the afternoons of August 6 and 7. This was exactly the time the query was executing in one of the other databases (the OTHER_DATABASE group).

iorm1308_disk_throttle_time

After this query was shut down, the system returned to normal. So the CRDB database was not the cause of the problem; it was the victim of IO throttling caused by another database.

That would seem to be the end of the story. Actually, not yet.

Although the system looked normal, two queries were running about 2~3 times slower than in the past. Enkitec's awr_plan_change.sql shows almost identical LIO and PIO for each execution, but the timing changed significantly after the patch date of July 15. It was another interesting issue.

SYS@CRDB1> @awr_plan_change
Enter value for sql_id: 4zbfxnv733dzb
Enter value for instance_number:
   SNAP_ID   NODE BEGIN_INTERVAL_TIME            SQL_ID              EXECS  AVG_ETIME        AVG_LIO        AVG_PIO
---------- ------ ------------------------------ ------------- ----------- ---------- -------------- --------------
9700	1 28-MAY-13 07.00.37.592 AM	 4zbfxnv733dzb   1	 288.559   55,876,213.0   55,875,801.0
9748	2 29-MAY-13 07.00.16.396 AM	 4zbfxnv733dzb   1	 334.543   55,876,213.0   55,875,801.0
9796	3 30-MAY-13 07.00.01.330 AM	 4zbfxnv733dzb	 1	 315.035   55,876,333.0   55,875,801.0
....
....
11956	3 14-JUL-13 07.00.09.104 AM	 4zbfxnv733dzb	 1	 258.629   55,876,269.0   55,875,804.0
12000	2 15-JUL-13 07.30.03.260 AM	 4zbfxnv733dzb	 1 1,549.712   43,115,159.0   43,107,149.0
12001	2 15-JUL-13 08.00.11.035 AM	 4zbfxnv733dzb	 0	 993.135   12,778,387.0   12,768,812.0
12047	1 16-JUL-13 07.00.11.740 AM	 4zbfxnv733dzb	 1	 565.923   55,876,638.0   55,875,801.0
12096	1 17-JUL-13 07.30.33.069 AM	 4zbfxnv733dzb	 1 1,148.289   55,878,883.0   55,875,923.0
12143	3 18-JUL-13 07.00.10.648 AM	 4zbfxnv733dzb	 1	 567.586   55,876,013.0   55,875,803.0
....
....
13057	1 06-AUG-13 07.00.12.679 AM	 4zbfxnv733dzb	 1	 645.235   55,876,538.0   55,875,821.0
13105	2 07-AUG-13 07.00.03.051 AM	 4zbfxnv733dzb	 1	 986.482   55,877,223.0   55,875,823.0
13153	3 08-AUG-13 07.00.23.646 AM	 4zbfxnv733dzb	 1	 587.454   55,875,957.0   55,875,801.0
13201	1 09-AUG-13 07.00.27.502 AM	 4zbfxnv733dzb	 1	 594.734   55,876,423.0   55,875,801.0
13249	3 10-AUG-13 07.00.48.524 AM	 4zbfxnv733dzb	 1	 515.732   55,877,880.0   55,875,801.0
13297	1 11-AUG-13 07.00.09.256 AM	 4zbfxnv733dzb	 1	 477.941   55,875,965.0   55,875,802.0

So I compared two AWR reports, one from before the patch and one from after, covering the same period (7am~8:30am) when both queries run. Although there were some differences here and there, I focused mostly on the difference in IO waits, especially large scan waits.
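
(A comparison like this can be generated with the standard AWR Compare Periods report; from SQL*Plus, the script shipped under rdbms/admin does the job.)

SQL> @?/rdbms/admin/awrddrpt.sql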

iorm1308_awr_compare_wait_events

From the above chart, we can see in the second line that the average wait time for cell smart table scan jumped from 7ms to 38ms and the total wait time jumped from 2,083 seconds to 3,000 seconds. At the same time, the average wait time dropped 24% for cell single block physical read, which is small IO. At this point I realized that IORM's internal handling logic seemed to have changed after the patch.

Checking the Oracle Support site, I found something interesting related to IORM.
For cell versions 11.2.2.2 – 11.2.3.1, IORM is disabled by default; to disable it explicitly, run the "alter iormplan objective=off;" command. 11.2.3.1 is the cell version our client used before the patch.

For cell version 11.2.3.2 and above, IORM is enabled by default with the basic objective; to disable it, run "alter iormplan objective=basic;". 11.2.3.2 is our client's current version.

The above indicates that something changed around IORM. The client's current objective is low_latency. Low_latency is good for fast response to OLTP requests, with lower wait times for small IO; at the same time, large IO is sacrificed with a longer wait time per request. Another possibility is that IORM changed the way it throttles IO. These two queries generate over 110 million physical IOs within 10 minutes, and I saw throttle wait times for CRDB between 100~200 ms when they ran in the morning.

So LOW_LATENCY might not be a good objective for this client. I remembered Sue Lee recommending the AUTO objective as a best practice, and to try the others only if that doesn't work. Obviously, this was something we wanted to follow, so the client made a one-line change to switch the objective from LOW_LATENCY to AUTO.

Here is the command to make the change to AUTO.
# dcli -g ~/cell_group -l root 'cellcli -e alter iormplan objective = auto'
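
To verify the objective after the change (same dcli/cell_group setup as above), something like the following should work:

# dcli -g ~/cell_group -l root 'cellcli -e list iormplan attributes objective'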

Here is the result after the objective was changed to AUTO.

   SNAP_ID   NODE BEGIN_INTERVAL_TIME            SQL_ID              EXECS  AVG_ETIME        AVG_LIO        AVG_PIO
---------- ------ ------------------------------ ------------- ----------- ---------- -------------- --------------
13633	3 18-AUG-13 07.00.21.207 AM	 4zbfxnv733dzb	 1	 471.955   55,875,957.0   55,875,801.0
13681	3 19-AUG-13 07.00.35.982 AM	 4zbfxnv733dzb	 1	 556.433   55,877,846.0   55,875,863.0
13729	3 20-AUG-13 07.00.23.422 AM	 4zbfxnv733dzb	 1	 537.509   55,877,821.0   55,875,861.0
13777	2 21-AUG-13 07.00.43.473 AM	 4zbfxnv733dzb   1	 166.424   55,876,181.0   55,875,806.0
13825	2 22-AUG-13 07.00.06.274 AM	 4zbfxnv733dzb	 1	 175.517   55,875,957.0   55,875,801.0

The result was amazing. Not only was there no negative impact on the system, but the two queries' running times immediately returned to their normal range, even better than before. The query above dropped from 500+ seconds to less than 200 seconds.

In a few emails back and forth with Sue Lee on this topic, she added a few good points:

When you look at the disk utilization charts, you should keep in mind that IORM is only actively regulating when you see IOs being throttled. So you should only see that the allocations are “kicking in” when the utilization is pretty high.

If you look at slide 59, you’ll see that with LOW LATENCY objective, we don’t allow the disks to run at their full utilization. This is why you got better results with AUTO objective for throughput-intensive applications like these reports. This is basically what you already said in the attached email.

When I checked her presentation slides again, of course, they explained everything. For low_latency, the peak disk utilization for scans is the lowest, only 40%; the next is balanced with 90%, then high throughput with 100%. No wonder the disk throttle time was that high.

My colleague Carlos Sierra, the author of the famous SQLT tool, also did an excellent analysis of the same issue from the SQLT perspective; here is the link to his analysis.

I must say I am very happy that I attended the E4 conference and could apply the knowledge immediately to a real production issue. In case you didn't attend, here is one photo I took at the conference. This year we had a larger room than at last year's E4, with more people coming. I am sure next year it will have an even larger room for a bigger audience.
e4_2013_photo1