Compression Methods on Exadata: Use parallel to speed up compression – Part 3 of 6

[Photo: parallel lanes on a highway]

In Part 1 and Part 2 of this series, I showed examples of creating compressed tables using CREATE TABLE and ALTER TABLE MOVE. For the small tables in those examples, skipping parallelism might be acceptable, but for big tables parallelism is highly recommended. Just like the highway in the photo above, you need many lanes to achieve high throughput; I wish adding a lane to a highway were as easy as changing the DOP in a SQL statement. This post discusses parallelism for the compression methods.

Since the commands are similar across the different compression methods, I pick one HCC method and test the timing with different DOP values. QUERY LOW gives a lower compression ratio and I am very unlikely to use it in production, so it is out. ARCHIVE HIGH can give a much better compression ratio, but its processing time is usually several times longer than the other HCC methods, and it can also take longer to decompress the data when it has to be moved to a non-Exadata environment; I would use it in production only if I really needed to save a significant amount of space. That leaves QUERY HIGH and ARCHIVE LOW. Both use the ZLIB (gzip) compression algorithm, and in my experience their timings and compression ratios are quite close, so either would do. I chose QUERY HIGH as the base for this test.
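For reference, the four HCC levels are specified with the same ALTER TABLE MOVE syntax covered in Parts 1 and 2. A quick sketch (the table name t is a placeholder):

```sql
-- The four HCC levels, roughly from fastest/least compressed
-- to slowest/most compressed (t is a hypothetical table name):
ALTER TABLE t MOVE COMPRESS FOR QUERY LOW;     -- LZO
ALTER TABLE t MOVE COMPRESS FOR QUERY HIGH;    -- ZLIB (gzip)
ALTER TABLE t MOVE COMPRESS FOR ARCHIVE LOW;   -- ZLIB (gzip), higher level
ALTER TABLE t MOVE COMPRESS FOR ARCHIVE HIGH;  -- BZIP2
```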

I also increase the size of the test table to 80 million rows.

Build Test Data


WZHOU@dbm1> create table paratab_no_parallel as select * from testcp;

Table created.

Elapsed: 00:00:09.17

WZHOU@dbm1> insert /*+ APPEND */ into paratab_no_parallel select * from paratab_no_parallel;

10000000 rows created.

Elapsed: 00:00:17.32

WZHOU@dbm1> insert /*+ APPEND */ into paratab_no_parallel select * from paratab_no_parallel;
insert /*+ APPEND */ into paratab_no_parallel select * from paratab_no_parallel
                          *
ERROR at line 1:
ORA-12838: cannot read/modify an object after modifying it in parallel

Elapsed: 00:00:00.00

It seems I need to commit first before I can do a further insert: a direct-path (APPEND) insert leaves the object unreadable and unmodifiable within the same transaction until it commits, which is what ORA-12838 is complaining about.

WZHOU@dbm1> commit;

Commit complete.

Elapsed: 00:00:00.00

WZHOU@dbm1> insert /*+ APPEND */ into paratab_no_parallel select * from paratab_no_parallel;

20000000 rows created.

Elapsed: 00:00:24.68

WZHOU@dbm1> commit;

Commit complete.

Elapsed: 00:00:00.01

WZHOU@dbm1> insert /*+ APPEND */ into paratab_no_parallel select * from paratab_no_parallel;

40000000 rows created.

Elapsed: 00:00:44.70

WZHOU@dbm1> select count(*) from paratab_no_parallel
                     *
ERROR at line 1:
ORA-12838: cannot read/modify an object after modifying it in parallel

Elapsed: 00:00:00.01

WZHOU@dbm1> commit;

Commit complete.

Elapsed: 00:00:00.02

WZHOU@dbm1> select count(*) from paratab_no_parallel;

  COUNT(*)
----------
  80000000

Elapsed: 00:00:34.27

Let’s find out the size of this table.

WZHOU@dbm1> @table_size
Enter value for owner: WZHOU
Enter value for table_name: PARATAB%
Enter value for type:

OWNER  SEGMENT_NAME         TYPE    TOTALSIZE_MEGS TABLESPACE_NAME
------ -------------------- ------- -------------- ---------------
WZHOU  PARATAB_NO_PARALLEL  TABLE          8,824.4 USERS
                                       --------------
sum                                        8,824.4

Elapsed: 00:00:00.09

Then I create three more copies of the table.

WZHOU@dbm1> create table paratab_dop_2 as select * from paratab_no_parallel;

Table created.

Elapsed: 00:01:09.56

WZHOU@dbm1> create table paratab_dop_4 as select * from paratab_no_parallel;

Table created.

Elapsed: 00:01:08.55

WZHOU@dbm1> create table paratab_dop_8 as select * from paratab_no_parallel;

Table created.

Elapsed: 00:01:08.86

Parallel Tests

WZHOU@dbm1> ALTER TABLE paratab_no_parallel MOVE COMPRESS for query high;

Table altered.

Elapsed: 00:03:32.30

WZHOU@dbm1> ALTER TABLE paratab_dop_2 MOVE COMPRESS for query high parallel 2;

Table altered.

Elapsed: 00:01:48.43

WZHOU@dbm1> ALTER TABLE paratab_dop_4 MOVE COMPRESS for query high parallel 4;

Table altered.

Elapsed: 00:00:57.07

WZHOU@dbm1> ALTER TABLE paratab_dop_8 MOVE COMPRESS for query high parallel 8;

Table altered.

Elapsed: 00:00:29.63

WZHOU@dbm1> @table_size
Enter value for owner: WZHOU
Enter value for table_name: PARATAB%
Enter value for type:

OWNER  SEGMENT_NAME         TYPE    TOTALSIZE_MEGS TABLESPACE_NAME
------ -------------------- ------- -------------- ---------------
WZHOU  PARATAB_DOP_2        TABLE           453.3  USERS
WZHOU  PARATAB_DOP_4        TABLE           454.2  USERS
WZHOU  PARATAB_DOP_8        TABLE           456.7  USERS
WZHOU  PARATAB_NO_PARALLEL  TABLE           464.0  USERS
                                    --------------
sum                                       1,828.2

As the DOP increases, the processing time drops almost linearly. The compression time for this roughly 9 GB table falls from about three and a half minutes with no parallelism to under half a minute with a DOP of 8. Another interesting observation is that the final compressed size differs slightly between DOP values.
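A quick back-of-the-envelope check of the scaling, using the elapsed times and sizes measured above:

```python
# Elapsed times (in seconds) for ALTER TABLE ... MOVE COMPRESS FOR QUERY HIGH,
# keyed by DOP, taken from the test output above (DOP 1 = no parallelism).
times = {1: 3 * 60 + 32.30, 2: 60 + 48.43, 4: 57.07, 8: 29.63}

# Speedup relative to the serial run is close to the DOP itself.
for dop, t in times.items():
    print(f"DOP {dop}: {t:6.1f}s  speedup {times[1] / t:.1f}x")

# Compression ratio for the serial case: 8,824.4 MB down to 464.0 MB.
print(f"compression ratio ~{8824.4 / 464.0:.1f}x")
```

The DOP 8 run comes in at about a 7.2x speedup rather than a perfect 8x, which is still close enough to linear for practical purposes.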


