Fixed Table STATS Issue after Gathering Fixed Object STATS

Recently we were working on another 130+ TB database with 5,200+ datafiles. After the database was upgraded from 11.2.0.3 to 11.2.0.4 on a full rack X4 Exadata, I did some verification and noticed that my query (using dba_free_space and dba_data_files) to report tablespace usage ran forever and never completed. I have run the same query on other 100+ TB databases and it could take some time, but I had never run into a situation where it simply never finished.
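The query was along these lines (a minimal sketch, not the exact production script; the column arithmetic is an assumption):

-- Rough sketch of a tablespace usage report joining dba_data_files and dba_free_space.
select df.tablespace_name,
       round(df.total_mb) total_mb,
       round(df.total_mb - nvl(fs.free_mb, 0)) used_mb,
       round(nvl(fs.free_mb, 0)) free_mb,
       round(100 * (df.total_mb - nvl(fs.free_mb, 0)) / df.total_mb, 1) pct_used
  from (select tablespace_name, sum(bytes) / 1024 / 1024 total_mb
          from dba_data_files
         group by tablespace_name) df,
       (select tablespace_name, sum(bytes) / 1024 / 1024 free_mb
          from dba_free_space
         group by tablespace_name) fs
 where df.tablespace_name = fs.tablespace_name(+)
 order by pct_used desc;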

I pulled out the execution plan for the query and found that the majority of the time was spent on the fixed table X$KTFBUE. The plan was doing a full table scan. At the time I killed the query, the actual number of rows processed was 10 million, while the estimated total number of rows was 100,000. By the way, 100,000 rows is the default cardinality used for X$KTFBUE when no stats are available. Obviously this was related to a stats issue, so I ran the query below to find out whether X$KTFBUE had stats or not.

select * from dba_tab_statistics where  table_name='X$KTFBUE';

It indeed did not have stats. Maybe we forgot to collect the fixed object stats? I ran the queries below and found that we did have stats on the majority of fixed tables.

SELECT * FROM DBA_TAB_STATISTICS 
WHERE OBJECT_TYPE='FIXED TABLE'; 

SELECT * FROM DBA_TAB_STATISTICS 
WHERE OBJECT_TYPE='FIXED TABLE' 
and last_analyzed is null;

Maybe we had an issue during the fixed object stats gathering, so I reran the stats collection for fixed objects as shown below.

SQL> exec DBMS_STATS.GATHER_FIXED_OBJECTS_STATS;
PL/SQL procedure successfully completed.
Elapsed: 00:05:19.42

It completed successfully in about 5 minutes. I checked X$KTFBUE again: still no stats. This became interesting to me. With a fixed table holding at least 10+ million rows, and one that seems important for getting tablespace sizing information, why is it excluded from fixed object stats collection?

With this question in mind, I did some research. There is a document discussing why stats are missing for certain fixed tables: Why do some Fixed Tables not have Optimizer Statistics (Doc ID 1355608.1). Here is the answer from the document.

Why do some fixed tables not have statistics even after running: DBMS_STATS.GATHER_FIXED_OBJECTS_STATS?

Some fixed tables are left without stats because the development team has flagged the code to avoid statistics gathering for some fixed tables. This is because they know they will perform better without stats or because it is not supported to gather the statistics upon them.
This was confirmed in : Bug 12844116 fixed tables statistics

The document applies to versions 10.2.0.4 to 11.2.0.2, but the same behavior obviously covers 11.2.0.4 in my case.

Another document also discusses similar behavior: Query Against DBA_FREE_SPACE is Slow After Applying 11.2.0.4 (Doc ID 1904677.1). The document mentions two major symptoms that are similar to my issue:
1. Slow query from dba_free_space after applying 11.2.0.4
2. The query is doing a Full Table Scan (FTS) against X$KTFBUE.

Some fixed tables (X$) have no CBO statistics against them and use defaults; it may be too expensive to gather stats, or the tables are too volatile such that any statistics are immediately outdated.

The sys.x$ktfbue table is one such object. These statistical inaccuracies can cause suboptimal plans in some cases.

The document mentions gathering stats on the x$ktfbue table, so I went ahead and collected the stats as follows:

EXEC DBMS_STATS.gather_table_stats('SYS', 'X$KTFBUE');

[Image: WaitFixedTableStats - a squirrel patiently waiting]
As it took only 5 minutes to collect fixed object stats, I thought a couple of minutes, or at most 10 minutes, would be enough to gather stats on the X$KTFBUE table. I was so wrong: one hour, two hours passed and the stats job was still going. Maybe gather_table_stats was stuck somewhere? Checking OEM SQL Monitor, I could see the number of rows processed still going up slowly. I was just like the squirrel above, patiently waiting for the finish. Finally, after 7.5 hours, it completed successfully. The table has about 99 million rows, roughly 1,000 times more than the default value of 100,000 rows. Now I understand why this fixed table is excluded from GATHER_FIXED_OBJECTS_STATS: it is just too expensive to collect stats on, and the majority of the time it should not cause an issue. This is the first Exadata database on which I have noticed a fixed object stats issue after running GATHER_FIXED_OBJECTS_STATS.
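Once the job finished, a quick check of the new statistics (a simple verification query; the actual num_rows on that system was around 99 million) confirms the table is no longer using the 100,000-row default:

-- Confirm X$KTFBUE now has statistics.
select owner, table_name, num_rows, last_analyzed
  from dba_tab_statistics
 where table_name = 'X$KTFBUE';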

While I waited for the stats gathering to complete, I found another Oracle document: Query against DBA_EXTENTS slow after upgrade to 11.2.0.3 (Doc ID 1453425.1). It applies to database version 11.2.0.3 and later. Here are some parts of the note:

SYMPTOMS
Queries on DBA_EXTENTS are much slower due to the presence of a merge join cartesian.
This can also influence the performance of script used to collect upgrade/migrate diagnostics as per instructions in:
Document 556610.1 Script to Collect DB Upgrade/Migrate Diagnostic Information (dbupgdiag.sql).
Running script on version 11.2.0.1 prior to upgrade, there was no performance issue.
The following SQL query from dbupgdiag.sql may run slowly (~2 hours) after applying patchset 11.2.0.3.
Prompt ===========================================
Prompt Tablespace and the owner of the aud$ table
Prompt ===========================================
select owner,tablespace_name from dba_extents where segment_name='AUD$' group by
owner,tablespace_name;

CHANGES
An execution plan regression occurs if there are a large number of records in the X$KTFBUE table. Gathering of dictionary or fixed object stats does not improve performance as this table is intentionally omitted from those packages.

CAUSE
The Cost Based Optimizer uses a cardinality estimate of 100,000 rows for this table, and the 11.2.0.3 execution plan is doing a full table scan. If you have a large number of extents, this query can take more than 1 hour to complete. There are a number of unpublished bugs open on slow performance for queries on DBA_EXTENTS.
For example, following bugs:
Bug:13542477 ON 11.2.0.3 QUERY DBA_EXTENTS IS SO SLOW
Bug:14221159 JOIN CARDINALITY INCORRECT CALCULATED AFTER FIX 11814428

SOLUTION
The following are possible workarounds to improve the performance:
Gather stats on the X$KTFBUE table using the following command:
EXEC DBMS_STATS.GATHER_TABLE_STATS('SYS', 'X$KTFBUE');
OR
alter session set "_optimizer_cartesian_enabled" = false;
OR
alter session set "_smm_auto_cost_enabled" = false;
OR
Disable the fix for unpublished Bug 11814428: POOR CARDINALITY ESTIMATE FROM JOIN WITH A UNIONALL VIEW:
alter session set "_FIX_CONTROL" = "11814428:0";

This document reminded me of a similar issue we ran into last year when we built the same database for POC purposes. At the time, when we tried to upgrade the POC database from 11.2.0.3 to 11.2.0.4, the dbupgdiag.sql script ran forever and never finished. We opened an SR with Oracle, and Oracle Support recommended commenting out the following line in the dbupgdiag.sql script as a workaround.
select owner,tablespace_name from dba_extents where segment_name='AUD$' group by owner,tablespace_name;

Finally, after the stats collection completed, I ran the same query again and the result came back within 15 seconds. Checking the execution plan, I could see it no longer used a full table scan of the X$KTFBUE table, but an index on this table instead. Issue solved and life is good.
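To confirm the plan change, something like the following can be used (the &sql_id substitution variable is a placeholder; grab the real sql_id from v$sql or OEM SQL Monitor):

-- Show the runtime plan for the tablespace usage query after the stats fix.
select * from table(dbms_xplan.display_cursor('&sql_id', null, 'TYPICAL'));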

[Image: fixedTableStatsIssue]

Additional notes:
1. So what exactly is the X$KTFBUE table used for? X$KTFBUE is a dynamically generated map of used extents in locally managed tablespaces.

The following are the definitions of the columns in x$ktfbue; a short sample query using a few of these columns follows the list.

ADDR : N/A
INDX : N/A
INST_ID : N/A
KTFBUESEGTSN : ts# containing this segment
KTFBUESEGFNO : Relative number of the file containing the segment header
KTFBUESEGBNO : segment header block number
KTFBUEEXTNO : Extent number
KTFBUEFNO : Relative number of the file containing the extent
KTFBUEBNO : Starting block number of the extent
KTFBUEBLKS : Size of the extent in ORACLE blocks
KTFBUECTM : commit_jtime, commit time of the undo in the extent expressed as a Julian date
KTFBUESTT : commit_wtime, commit time of the undo in the extent expressed as wall clock time
KTFBUESTA : transaction status of the undo in the extent; 1 = 'ACTIVE', 2 = 'EXPIRED', 3 = 'UNEXPIRED', 0 for non-undo
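As a quick illustration of how these columns are used (a sketch, and X$ tables are visible only when connected as SYS), the used extent count per tablespace can be pulled like this:

-- Count used extents per tablespace from the x$ktfbue map (run as SYS).
select ts.name tablespace_name, count(*) used_extents
  from sys.ts$ ts, sys.x$ktfbue u
 where ts.ts# = u.ktfbuesegtsn
 group by ts.name
 order by used_extents desc;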

2. Looking at the execution plan, even though we fixed the stats issue for the X$KTFBUE table, we might still run into an issue related to the recyclebin, since the plan shows a nested loop involving the recyclebin. Here is what the DBA_FREE_SPACE view looks like.

SQL> set long 90000;
SQL> set pages 999;
SQL> select dbms_metadata.get_ddl('VIEW','DBA_FREE_SPACE') FROM DUAL;

DBMS_METADATA.GET_DDL('VIEW','DBA_FREE_SPACE')
--------------------------------------------------------------------------------
  CREATE OR REPLACE FORCE VIEW "SYS"."DBA_FREE_SPACE" ("TABLESPACE_NAME", "FILE_
ID", "BLOCK_ID", "BYTES", "BLOCKS", "RELATIVE_FNO") AS
  select ts.name, fi.file#, f.block#,
       f.length * ts.blocksize, f.length, f.file#
from sys.ts$ ts, sys.fet$ f, sys.file$ fi
where ts.ts# = f.ts#
  and f.ts# = fi.ts#
  and f.file# = fi.relfile#
  and ts.bitmapped = 0
union all
select
       ts.name, fi.file#, f.ktfbfebno,
       f.ktfbfeblks * ts.blocksize, f.ktfbfeblks, f.ktfbfefno
from sys.ts$ ts, sys.x$ktfbfe f, sys.file$ fi
where ts.ts# = f.ktfbfetsn
  and f.ktfbfetsn = fi.ts#
  and f.ktfbfefno = fi.relfile#
  and ts.bitmapped <> 0 and ts.online$ in (1,4) and ts.contents$ = 0
union all
select
       ts.name, fi.file#, u.ktfbuebno,
       u.ktfbueblks * ts.blocksize, u.ktfbueblks, u.ktfbuefno
from sys.recyclebin$ rb, sys.ts$ ts, sys.x$ktfbue u, sys.file$ fi
where ts.ts# = rb.ts#
  and rb.ts# = fi.ts#
  and u.ktfbuefno = fi.relfile#
  and u.ktfbuesegtsn = rb.ts#
  and u.ktfbuesegfno = rb.file#
  and u.ktfbuesegbno = rb.block#
  and ts.bitmapped <> 0 and ts.online$ in (1,4) and ts.contents$ = 0
union all
select ts.name, fi.file#, u.block#,
       u.length * ts.blocksize, u.length, u.file#
from sys.ts$ ts, sys.uet$ u, sys.file$ fi, sys.recyclebin$ rb
where ts.ts# = u.ts#
  and u.ts# = fi.ts#
  and u.segfile# = fi.relfile#
  and u.ts# = rb.ts#
  and u.segfile# = rb.file#
  and u.segblock# = rb.block#
  and ts.bitmapped = 0

Right now our database was just built and the recyclebin was empty. If there were a lot of rows in the recyclebin, I would not be surprised if the performance became bad again. If that happens, one possible solution is to purge the recyclebin.
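For example, checking and purging as a DBA could look like this (purging permanently drops everything in the recyclebin, so make sure nothing needs to be restored first):

-- How many dropped objects are sitting in the recyclebin database-wide?
select count(*) from dba_recyclebin;

-- Purge the entire recyclebin.
purge dba_recyclebin;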

3. Another interesting observation is that the total number of rows in the X$KTFBUE table was about 99 million, while the total number of extents in the database was also around 99 million. So reducing the total number of extents in the database seems a good way to mitigate this stats issue. Pay attention to INITIAL and NEXT in the storage clause: setting them too small will produce a large number of extents and create headaches when querying fixed tables.
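To spot the worst offenders without going through the slow DBA_EXTENTS path, DBA_SEGMENTS already carries an extent count per segment (a sketch; the top-20 cutoff is arbitrary):

-- Top 20 segments by extent count.
select *
  from (select owner, segment_name, segment_type, extents
          from dba_segments
         order by extents desc)
 where rownum <= 20;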

Moving STATS for a Large Database during Exadata Migration

I have done many Exadata migrations over the past two and a half years. For the majority of migrations, my clients have enough downtime to complete the statistics gathering during the cutover window. However, this strategy is not going to work for a very large database. Recently we migrated a 110+ TB database to Exadata. There were many challenges involved in this migration, like stats gathering, tablespaces reaching the 32 TB limit and the 1022 datafile limit, an upgrade from 11.2.0.3 to 11.2.0.4 during the cutover window, and critical query tuning with only a few weeks left before cutover. Anyway, there were many moving parts during the migration. In this blog, I am focusing only on the stats strategy during the cutover window.

This database had had issues with gathering stats in the past; stats collection ran for days without completing in the legacy environment. As the stats gathering had a significant impact on database performance, the client had to turn it off and lock the stats, so many tables' stats were at least two years old. With the database moving from a non-Exadata 11.2.0.3 environment to an Exadata 11.2.0.4 environment, we needed to have new stats available before the database could be released to business users. The question was how we were going to do it. Just like the drawing below, we needed to find the fastest window to get our work done.

[Image: fast_stats]
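Before settling on a plan, a quick inventory of how locked and how stale the existing stats were helped size the problem (a sketch; the two-year cutoff is only illustrative):

-- Locked and stale/missing table stats per schema (illustrative 2-year cutoff).
select owner,
       count(*) total_tables,
       sum(case when stattype_locked is not null then 1 else 0 end) locked_tables,
       sum(case when last_analyzed is null
                  or last_analyzed < add_months(sysdate, -24) then 1 else 0 end) stale_or_missing
  from dba_tab_statistics
 where object_type = 'TABLE'
 group by owner
 order by owner;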

Both the UAT and PROD environments are on full rack X4 Exadata. Even with eight database servers, a full stats gathering at a parallelism of 384 could still take a few days, and even an incremental stats gathering could take 10+ hours. That is definitely not going to work during a cutover window with limited downtime available.
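For reference, the kind of full gather that took days looked roughly like this (a sketch; the exact options used on that system are an assumption):

-- Full database stats gather at high parallelism (the multi-day operation).
begin
  dbms_stats.gather_database_stats(
    estimate_percent => dbms_stats.auto_sample_size,
    degree           => 384,
    cascade          => true);
end;
/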

For this migration we used Data Guard, and our physical standby on Exadata was an exact physical copy of the legacy primary database. We also built the UAT database with an RMAN restore, and the last build of UAT was only two weeks before the cutover date. So we used the following strategy to work around the stats gathering problem during the cutover window.
1. Gather stats on the UAT database. It doesn't matter whether it takes one or two days; as long as it is outside the cutover window, we were fine with that.
2. Copy the stats in the UAT database to stats staging tables.
3. Export the staging tables.
4. Copy the Data Pump export files to the PROD database on Exadata.
5. Import the stats into the PROD DB.

Steps 1 to 4 have no downtime requirement and can be done ahead of time; only the last step needs to be performed during the cutover window. In this blog, I am not going to discuss Steps 1 and 4, only the exporting and importing of the stats. I used our lab to demonstrate the steps and scripts involved.

Export Stats from Source Database

cd /dbfs/work/wzhou/test_stat

sqlplus / as sysdba
create user db_my_stats identified by db_my_stats default tablespace users temporary tablespace temp; 
grant connect, resource, dba to db_my_stats;  
create directory DB_MY_STATS_DIR as '/dbfs/work/wzhou/test_stat'; 
grant read, write on directory DB_MY_STATS_DIR to system;

Generate necessary sql scripts.
vi gen_exp_imp_stats.sql

-- Generate all the necessary scripts for export and import schema stats
--set time on
--set timing on
set echo off
set feedback off
set lines 150
set pages 0

prompt
prompt generate SQL to create schema stat table
prompt
spool CREATE_SCHEMA_STATS_TABLE.SQL
select 'set timing on' from dual;
select 'spool CREATE_SCHEMA_STATS_TABLE.log' from dual;
select 'prompt '||owner||chr(10)||
  'exec DBMS_STATS.CREATE_STAT_TABLE(''DB_MY_STATS'', ''STATS_' || owner || ''');' content
  from dba_segments
 where owner not in ( 'DB_MY_STATS', 'ANONYMOUS','APPQOSSYS','DBSNMP','ORACLE_OCM','OUTLN','SYS','SYSTEM','WMSYS','XDB','XS$NULL', 'ORDSYS','ORDDATA')
group by owner
order by owner;
select 'spool off' from dual;
spool off

prompt
prompt -- generate SQL to export schema stat to stat table
prompt
spool EXPORT_TO_SCHEMA_STATS_TABLE.SQL
select 'set timing on' from dual;
select 'spool EXPORT_TO_SCHEMA_STATS_TABLE.log' from dual;
select 'prompt '||owner||chr(10)||
  'exec DBMS_STATS.EXPORT_SCHEMA_STATS( ' ||
  'ownname => ''' || owner || ''', ' ||
  'stattab => ''STATS_' || owner || ''', ' ||
  'statid => ''' || owner || ''', ' ||
  'statown => ''DB_MY_STATS'');' content
  from dba_segments
 where owner not in ( 'DB_MY_STATS', 'ANONYMOUS','APPQOSSYS','DBSNMP','ORACLE_OCM','OUTLN','SYS','SYSTEM','WMSYS','XDB','XS$NULL', 'ORDSYS','ORDDATA')
group by owner
order by owner;
select 'spool off' from dual;
spool off


prompt
prompt -- generate SQL to import schema stat to stat table
prompt
spool IMPORT_TO_SCHEMA_STATS_TABLE.SQL
select 'set timing on' from dual;
select 'spool IMPORT_TO_SCHEMA_STATS_TABLE.log' from dual;
select 'prompt '|| owner || chr(10) ||
  --'--exec dbms_stats.delete_schema_stats (ownname => '''||owner||''');' || chr(10)||
  'exec DBMS_STATS.IMPORT_SCHEMA_STATS( ' ||
  'ownname => ''' || owner || ''', ' ||
  'stattab => ''STATS_' || owner || ''', ' ||
  'statid => ''' || owner || ''', ' ||
  'statown => ''DB_MY_STATS'');' content
  from dba_segments
 where owner not in ( 'DB_MY_STATS', 'ANONYMOUS','APPQOSSYS','DBSNMP','ORACLE_OCM','OUTLN','SYS','SYSTEM','WMSYS','XDB','XS$NULL', 'ORDSYS','ORDDATA')
group by owner
order by owner;
select 'spool off' from dual;
spool off

Run the gen script.
@gen_exp_imp_stats

Create stats staging table.
@CREATE_SCHEMA_STATS_TABLE.SQL

Export schema stats to the stats staging table
@EXPORT_TO_SCHEMA_STATS_TABLE.SQL

Prepare the export script
vi run_exp_stats
expdp parfile=exp_schema_stats.par

vi exp_schema_stats.par
USERID=db_my_stats/db_my_stats
PARALLEL=4
DIRECTORY=DB_MY_STATS_DIR
DUMPFILE=stats_%U.dmp
LOGFILE=stats_exp.log
METRICS=Y
FILESIZE=4G
SCHEMAS=DB_MY_STATS
JOB_NAME=stats_exp

Run the script to export the schema
run_exp_stats

OK, at this point the stats export is done. Copy the Data Pump files to the target system.

Import Stats to Target Database
On the target database, perform similar steps as above.

create user db_my_stats identified by db_my_stats default tablespace users temporary tablespace temp; 
grant connect, resource, dba to db_my_stats;  
create directory DB_MY_STATS_DIR as '/dbfs/work/wzhou/test_stat'; 
grant read, write on directory DB_MY_STATS_DIR to system;

Prepare the import script.
vi run_imp_stats
impdp parfile=imp_schema_stats.par

vi imp_schema_stats.par
USERID=db_my_stats/db_my_stats
PARALLEL=4
DIRECTORY=DB_MY_STATS_DIR
DUMPFILE=stats_%U.dmp
LOGFILE=stats_imp.log
METRICS=Y
TABLE_EXISTS_ACTION=REPLACE
SCHEMAS=DB_MY_STATS
JOB_NAME=stats_imp

Run the script to import the schema
run_imp_stats

Import the stats from stats staging tables.
@IMPORT_TO_SCHEMA_STATS_TABLE.SQL

Ok, we’re done with the stats import.
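As a quick sanity check after the import (not part of the original runbook, just one way to verify), the latest analyze dates per schema on the target should now reflect the fresh UAT gather:

-- Verify imported stats: table count and latest analyze date per schema.
select owner, count(*) tables_with_stats, max(last_analyzed) latest_analyze
  from dba_tab_statistics
 where last_analyzed is not null
 group by owner
 order by owner;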

The above strategy worked pretty well during the cutover window. The export of the stats took about half an hour, and the import took slightly less time than the export. We are happy the stats work did not take a significant amount of time for this 110+ TB database during the cutover.