Mysterious Time Gap in Spark Job Timeline

Some time ago, one of my clients asked me a question while reviewing a Spark job: why is there a time gap in the event timeline, sometimes as long as one minute? A gap of a few seconds would make sense as Spark's overhead between job runs, but one minute seems too long for any overhead activity when the whole job takes only 8~9 minutes. I didn't have a good answer at the time. Recently I ran some benchmarks for a Spark job on an X3 full rack Oracle BDA in our lab and noticed the same behavior. I tracked down the issue and finally figured out the cause of the timeline gap. I am going to share my findings in this blog.

My benchmark ran on an X3 full rack BDA with Spark 1.6 and CDH 5.7. The Spark test script is pretty simple, and the lines relevant to this timeline gap are listed below:

line 42: val myDF = hiveContext.sql("select * from wzdb.sales_part")
line 44: myDF.show()
line 47: val rawDF = hiveContext.sql("select * from wzdb.sales_raw limit 100")
line 48: rawDF.show()

Line 42 pulls all data from the wzdb.sales_part table, a Hive partitioned table stored as Parquet with SNAPPY compression. The table has about 1.3 billion rows and 1,680 partitions. Line 44 simply shows the DataFrame myDF; by default, show() prints 20 rows. Similarly, line 47 pulls 100 rows from the wzdb.sales_raw table and line 48 shows the first 20 rows of that result. The code could not be simpler than that.
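
For context, here is a minimal, self-contained sketch of what the test script roughly looks like. The object name and scaffolding below are assumptions for illustration; only the four numbered lines above come from the real test2.scala.

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.hive.HiveContext

// Minimal Spark 1.6-style driver. The name SparkTimelineTest is hypothetical;
// the two sql()/show() pairs mirror lines 42-48 of the real script.
object SparkTimelineTest {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("SparkTimelineTest")
    val sc = new SparkContext(conf)
    val hiveContext = new HiveContext(sc)

    // lines 42/44: full table, then show the first 20 rows
    val myDF = hiveContext.sql("select * from wzdb.sales_part")
    myDF.show()

    // lines 47/48: limited query against the raw table
    val rawDF = hiveContext.sql("select * from wzdb.sales_raw limit 100")
    rawDF.show()

    sc.stop()
  }
}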

After I started the Spark job, it finished in 40 seconds. However, when I checked the Event Timeline, it showed a time gap between Job (or Stage) Id 1 and Job Id 2. Job Id 1 started at 18:13:24 and completed at 18:13:26, but Job Id 2 did not start until 18:13:35. That is a 9-second gap, about 25% of the total execution time, which seems like a lot to me. Job Id 1 executed line 42 while Job Id 2 executed line 44, and there is no executable code on line 43. Things were getting interesting.

spark_event_timeline_1

Then I checked the Executors page. It shows two executors, each with about 6~7 seconds of task time. I clicked the stdout Logs link for each executor and paid particular attention to the window between 18:13:24 and 18:13:35.

spark_executors_2

Here is the relevant part of the output from Executor 1:

16/12/02 18:13:14 INFO executor.CoarseGrainedExecutorBackend: Started daemon with process name: 27032@enkbda1node10.enkitec.com
16/12/02 18:13:14 INFO executor.CoarseGrainedExecutorBackend: Registered signal handlers for [TERM, HUP, INT]
16/12/02 18:13:16 INFO spark.SecurityManager: Changing view acls to: yarn,oracle
16/12/02 18:13:17 INFO spark.SecurityManager: Changing modify acls to: yarn,oracle
16/12/02 18:13:17 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(yarn, oracle); users with modify permissions: Set(yarn, oracle)
16/12/02 18:13:18 INFO spark.SecurityManager: Changing view acls to: yarn,oracle
16/12/02 18:13:18 INFO spark.SecurityManager: Changing modify acls to: yarn,oracle
16/12/02 18:13:18 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(yarn, oracle); users with modify permissions: Set(yarn, oracle)
16/12/02 18:13:18 INFO slf4j.Slf4jLogger: Slf4jLogger started
16/12/02 18:13:18 INFO Remoting: Starting remoting
16/12/02 18:13:19 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://sparkExecutorActorSystem@enkbda1node10.enkitec.com:38184]
16/12/02 18:13:19 INFO Remoting: Remoting now listens on addresses: [akka.tcp://sparkExecutorActorSystem@enkbda1node10.enkitec.com:38184]
16/12/02 18:13:19 INFO util.Utils: Successfully started service 'sparkExecutorActorSystem' on port 38184.
16/12/02 18:13:19 INFO storage.DiskBlockManager: Created local directory at /u12/hadoop/yarn/nm/usercache/oracle/appcache/application_1480516695248_0056/blockmgr-7af60bfd-9e27-45c8-bf30-a4bf126681f0
16/12/02 18:13:19 INFO storage.DiskBlockManager: Created local directory at /u11/hadoop/yarn/nm/usercache/oracle/appcache/application_1480516695248_0056/blockmgr-1c6099f9-37f2-4b4e-8b60-5c209bffc924
16/12/02 18:13:19 INFO storage.DiskBlockManager: Created local directory at /u10/hadoop/yarn/nm/usercache/oracle/appcache/application_1480516695248_0056/blockmgr-ec991001-9dc1-4017-ba55-314b54dd9109
16/12/02 18:13:19 INFO storage.DiskBlockManager: Created local directory at /u09/hadoop/yarn/nm/usercache/oracle/appcache/application_1480516695248_0056/blockmgr-c0df794d-5ee6-46fe-ad57-cf6186cd5ba7
16/12/02 18:13:19 INFO storage.DiskBlockManager: Created local directory at /u08/hadoop/yarn/nm/usercache/oracle/appcache/application_1480516695248_0056/blockmgr-673f1d0b-7e44-47e7-b36e-eed65c656c87
16/12/02 18:13:19 INFO storage.DiskBlockManager: Created local directory at /u07/hadoop/yarn/nm/usercache/oracle/appcache/application_1480516695248_0056/blockmgr-ab27950b-7dfd-48eb-a33c-7fbd02d29137
16/12/02 18:13:19 INFO storage.DiskBlockManager: Created local directory at /u06/hadoop/yarn/nm/usercache/oracle/appcache/application_1480516695248_0056/blockmgr-fe6697c4-d64a-47b3-9781-27d583370710
16/12/02 18:13:19 INFO storage.DiskBlockManager: Created local directory at /u05/hadoop/yarn/nm/usercache/oracle/appcache/application_1480516695248_0056/blockmgr-e0a928ab-5895-46b6-8b10-ad883e895632
16/12/02 18:13:19 INFO storage.DiskBlockManager: Created local directory at /u04/hadoop/yarn/nm/usercache/oracle/appcache/application_1480516695248_0056/blockmgr-4d39319c-b6be-4a17-8755-89477f81e899
16/12/02 18:13:19 INFO storage.DiskBlockManager: Created local directory at /u03/hadoop/yarn/nm/usercache/oracle/appcache/application_1480516695248_0056/blockmgr-cfd8fd9c-22cd-443f-8a1d-99b9867c8507
16/12/02 18:13:19 INFO storage.DiskBlockManager: Created local directory at /u02/hadoop/yarn/nm/usercache/oracle/appcache/application_1480516695248_0056/blockmgr-fff46796-d06a-45af-816b-c46d356be447
16/12/02 18:13:19 INFO storage.DiskBlockManager: Created local directory at /u01/hadoop/yarn/nm/usercache/oracle/appcache/application_1480516695248_0056/blockmgr-6cf05120-f651-4615-8abe-14631c5aadb1
16/12/02 18:13:19 INFO storage.MemoryStore: MemoryStore started with capacity 530.0 MB
16/12/02 18:13:19 INFO executor.CoarseGrainedExecutorBackend: Connecting to driver: spark://CoarseGrainedScheduler@192.168.12.105:33339
16/12/02 18:13:19 INFO executor.CoarseGrainedExecutorBackend: Successfully registered with driver
16/12/02 18:13:19 INFO executor.Executor: Starting executor ID 2 on host enkbda1node10.enkitec.com
16/12/02 18:13:19 INFO util.Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 45880.
16/12/02 18:13:19 INFO netty.NettyBlockTransferService: Server created on 45880
16/12/02 18:13:19 INFO storage.BlockManager: external shuffle service port = 7337
16/12/02 18:13:19 INFO storage.BlockManagerMaster: Trying to register BlockManager
16/12/02 18:13:19 INFO storage.BlockManagerMaster: Registered BlockManager
16/12/02 18:13:19 INFO storage.BlockManager: Registering executor with local external shuffle service.
16/12/02 18:13:19 INFO executor.CoarseGrainedExecutorBackend: Got assigned task 1
16/12/02 18:13:19 INFO executor.Executor: Running task 1.0 in stage 0.0 (TID 1)
16/12/02 18:13:20 INFO executor.Executor: Fetching spark://192.168.12.105:33339/jars/scalawztest1_2.10-1.0.jar with timestamp 1480723975104
16/12/02 18:13:20 INFO util.Utils: Fetching spark://192.168.12.105:33339/jars/scalawztest1_2.10-1.0.jar to /u12/hadoop/yarn/nm/usercache/oracle/appcache/application_1480516695248_0056/spark-4647d3e3-6655-4da1-b75b-94f3d872c71a/fetchFileTemp8581640132534419761.tmp
16/12/02 18:13:20 INFO util.Utils: Copying /u12/hadoop/yarn/nm/usercache/oracle/appcache/application_1480516695248_0056/spark-4647d3e3-6655-4da1-b75b-94f3d872c71a/8566654191480723975104_cache to /u09/hadoop/yarn/nm/usercache/oracle/appcache/application_1480516695248_0056/container_e81_1480516695248_0056_01_000003/./scalawztest1_2.10-1.0.jar
16/12/02 18:13:20 INFO executor.Executor: Adding file:/u09/hadoop/yarn/nm/usercache/oracle/appcache/application_1480516695248_0056/container_e81_1480516695248_0056_01_000003/./scalawztest1_2.10-1.0.jar to class loader
16/12/02 18:13:20 INFO broadcast.TorrentBroadcast: Started reading broadcast variable 0
16/12/02 18:13:20 INFO storage.MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 25.6 KB, free 25.6 KB)
16/12/02 18:13:20 INFO broadcast.TorrentBroadcast: Reading broadcast variable 0 took 101 ms
16/12/02 18:13:20 INFO storage.MemoryStore: Block broadcast_0 stored as values in memory (estimated size 72.5 KB, free 98.1 KB)
16/12/02 18:13:22 INFO client.FusionCommon: Initialized FusionHdfs with URI: hdfs://enkbda-ns/user/hive/warehouse/wzdb.db/sales_part/year=2014/month=4/day=1, FileSystem: DFS[DFSClient[clientName=DFSClient_NONMAPREDUCE_468103070_110, ugi=oracle (auth:SIMPLE)]], instance: 426700950
16/12/02 18:13:22 INFO sources.HadoopFsRelation: Listing hdfs://enkbda-ns/user/hive/warehouse/wzdb.db/sales_part/year=2014/month=4/day=1
16/12/02 18:13:22 INFO sources.HadoopFsRelation: Listing hdfs://enkbda-ns/user/hive/warehouse/wzdb.db/sales_part/year=2014/month=4/day=10
16/12/02 18:13:22 INFO sources.HadoopFsRelation: Listing hdfs://enkbda-ns/user/hive/warehouse/wzdb.db/sales_part/year=2014/month=4/day=11
16/12/02 18:13:22 INFO sources.HadoopFsRelation: Listing hdfs://enkbda-ns/user/hive/warehouse/wzdb.db/sales_part/year=2014/month=4/day=12
16/12/02 18:13:22 INFO sources.HadoopFsRelation: Listing hdfs://enkbda-ns/user/hive/warehouse/wzdb.db/sales_part/year=2014/month=4/day=13
16/12/02 18:13:22 INFO sources.HadoopFsRelation: Listing hdfs://enkbda-ns/user/hive/warehouse/wzdb.db/sales_part/year=2014/month=4/day=14
16/12/02 18:13:22 INFO sources.HadoopFsRelation: Listing hdfs://enkbda-ns/user/hive/warehouse/wzdb.db/sales_part/year=2014/month=4/day=15
16/12/02 18:13:22 INFO sources.HadoopFsRelation: Listing hdfs://enkbda-ns/user/hive/warehouse/wzdb.db/sales_part/year=2014/month=4/day=16
16/12/02 18:13:22 INFO sources.HadoopFsRelation: Listing hdfs://enkbda-ns/user/hive/warehouse/wzdb.db/sales_part/year=2014/month=4/day=17
16/12/02 18:13:22 INFO sources.HadoopFsRelation: Listing hdfs://enkbda-ns/user/hive/warehouse/wzdb.db/sales_part/year=2014/month=4/day=18
16/12/02 18:13:22 INFO sources.HadoopFsRelation: Listing hdfs://enkbda-ns/user/hive/warehouse/wzdb.db/sales_part/year=2014/month=4/day=19
16/12/02 18:13:22 INFO sources.HadoopFsRelation: Listing hdfs://enkbda-ns/user/hive/warehouse/wzdb.db/sales_part/year=2014/month=4/day=2
....
16/12/02 18:13:24 INFO sources.HadoopFsRelation: Listing hdfs://enkbda-ns/user/hive/warehouse/wzdb.db/sales_part/year=2016/month=9/day=27
16/12/02 18:13:24 INFO sources.HadoopFsRelation: Listing hdfs://enkbda-ns/user/hive/warehouse/wzdb.db/sales_part/year=2016/month=9/day=28
16/12/02 18:13:24 INFO sources.HadoopFsRelation: Listing hdfs://enkbda-ns/user/hive/warehouse/wzdb.db/sales_part/year=2016/month=9/day=3
16/12/02 18:13:24 INFO sources.HadoopFsRelation: Listing hdfs://enkbda-ns/user/hive/warehouse/wzdb.db/sales_part/year=2016/month=9/day=4
16/12/02 18:13:24 INFO sources.HadoopFsRelation: Listing hdfs://enkbda-ns/user/hive/warehouse/wzdb.db/sales_part/year=2016/month=9/day=5
16/12/02 18:13:24 INFO sources.HadoopFsRelation: Listing hdfs://enkbda-ns/user/hive/warehouse/wzdb.db/sales_part/year=2016/month=9/day=6
16/12/02 18:13:24 INFO sources.HadoopFsRelation: Listing hdfs://enkbda-ns/user/hive/warehouse/wzdb.db/sales_part/year=2016/month=9/day=7
16/12/02 18:13:24 INFO sources.HadoopFsRelation: Listing hdfs://enkbda-ns/user/hive/warehouse/wzdb.db/sales_part/year=2016/month=9/day=8
16/12/02 18:13:24 INFO sources.HadoopFsRelation: Listing hdfs://enkbda-ns/user/hive/warehouse/wzdb.db/sales_part/year=2016/month=9/day=9
16/12/02 18:13:24 INFO executor.Executor: Finished task 1.0 in stage 0.0 (TID 1). 727069 bytes result sent to driver
16/12/02 18:13:24 INFO executor.CoarseGrainedExecutorBackend: Got assigned task 3
16/12/02 18:13:24 INFO executor.Executor: Running task 1.0 in stage 1.0 (TID 3)
16/12/02 18:13:24 INFO broadcast.TorrentBroadcast: Started reading broadcast variable 1
16/12/02 18:13:24 INFO storage.MemoryStore: Block broadcast_1_piece0 stored as bytes in memory (estimated size 25.5 KB, free 123.6 KB)
16/12/02 18:13:24 INFO broadcast.TorrentBroadcast: Reading broadcast variable 1 took 24 ms
16/12/02 18:13:24 INFO storage.MemoryStore: Block broadcast_1 stored as values in memory (estimated size 72.4 KB, free 196.0 KB)
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/opt/cloudera/parcels/CDH-5.7.0-1.cdh5.7.0.p1464.1349/jars/parquet-hadoop-bundle-1.5.0-cdh5.7.0.jar!/shaded/parquet/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/opt/cloudera/parcels/CDH-5.7.0-1.cdh5.7.0.p1464.1349/jars/parquet-format-2.1.0-cdh5.7.0.jar!/shaded/parquet/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/opt/cloudera/parcels/CDH-5.7.0-1.cdh5.7.0.p1464.1349/jars/parquet-pig-bundle-1.5.0-cdh5.7.0.jar!/shaded/parquet/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/opt/cloudera/parcels/CDH-5.7.0-1.cdh5.7.0.p1464.1349/jars/hive-exec-1.1.0-cdh5.7.0.jar!/shaded/parquet/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/opt/cloudera/parcels/CDH-5.7.0-1.cdh5.7.0.p1464.1349/jars/hive-jdbc-1.1.0-cdh5.7.0-standalone.jar!/shaded/parquet/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [shaded.parquet.org.slf4j.helpers.NOPLoggerFactory]
16/12/02 18:13:26 INFO executor.Executor: Finished task 1.0 in stage 1.0 (TID 3). 1503 bytes result sent to driver
16/12/02 18:13:35 INFO executor.CoarseGrainedExecutorBackend: Got assigned task 4
16/12/02 18:13:35 INFO executor.Executor: Running task 0.0 in stage 2.0 (TID 4)
16/12/02 18:13:35 INFO broadcast.TorrentBroadcast: Started reading broadcast variable 3
16/12/02 18:13:35 INFO storage.MemoryStore: Block broadcast_3_piece0 stored as bytes in memory (estimated size 93.1 KB, free 93.1 KB)
16/12/02 18:13:35 INFO broadcast.TorrentBroadcast: Reading broadcast variable 3 took 9 ms
16/12/02 18:13:35 INFO storage.MemoryStore: Block broadcast_3 stored as values in memory (estimated size 684.5 KB, free 777.6 KB)
16/12/02 18:13:35 INFO parquet.ParquetRelation$$anonfun$buildInternalScan$1$$anon$1: Input split: ParquetInputSplit{part: hdfs://enkbda-ns/user/hive/warehouse/wzdb.db/sales_part/year=2012/month=1/day=1/000022_0 start: 0 end: 2028037 length: 2028037 hosts: []}
16/12/02 18:13:35 INFO broadcast.TorrentBroadcast: Started reading broadcast variable 2
16/12/02 18:13:35 INFO storage.MemoryStore: Block broadcast_2_piece0 stored as bytes in memory (estimated size 24.0 KB, free 801.6 KB)
16/12/02 18:13:35 INFO broadcast.TorrentBroadcast: Reading broadcast variable 2 took 7 ms
16/12/02 18:13:35 INFO storage.MemoryStore: Block broadcast_2 stored as values in memory (estimated size 334.5 KB, free 1136.1 KB)
16/12/02 18:13:35 INFO Configuration.deprecation: mapred.min.split.size is deprecated. Instead, use mapreduce.input.fileinputformat.split.minsize
16/12/02 18:13:36 INFO codegen.GenerateUnsafeProjection: Code generated in 148.139655 ms
16/12/02 18:13:36 INFO codegen.GenerateUnsafeProjection: Code generated in 21.996524 ms
16/12/02 18:13:36 INFO codegen.GenerateSafeProjection: Code generated in 15.975923 ms
16/12/02 18:13:36 WARN parquet.CorruptStatistics: Ignoring statistics because created_by is null or empty! See PARQUET-251 and PARQUET-297
16/12/02 18:13:36 INFO executor.Executor: Finished task 0.0 in stage 2.0 (TID 4). 5365 bytes result sent to driver
16/12/02 18:13:40 INFO executor.CoarseGrainedExecutorBackend: Driver commanded a shutdown
16/12/02 18:13:40 WARN executor.CoarseGrainedExecutorBackend: An unknown (enkbda1node05.enkitec.com:33339) driver disconnected.
16/12/02 18:13:40 ERROR executor.CoarseGrainedExecutorBackend: Driver 192.168.12.105:33339 disassociated! Shutting down.
16/12/02 18:13:40 INFO storage.DiskBlockManager: Shutdown hook called
16/12/02 18:13:40 INFO util.ShutdownHookManager: Shutdown hook called
16/12/02 18:13:40 INFO util.ShutdownHookManager: Deleting directory /u04/hadoop/yarn/nm/usercache/oracle/appcache/application_1480516695248_0056/spark-47de642a-8665-4853-a4f7-a5ba3ece4295
16/12/02 18:13:40 INFO util.ShutdownHookManager: Deleting directory /u03/hadoop/yarn/nm/usercache/oracle/appcache/application_1480516695248_0056/spark-f1aaa12c-c0d5-4c75-b1e6-e222ea9112b6
16/12/02 18:13:40 INFO util.ShutdownHookManager: Deleting directory /u01/hadoop/yarn/nm/usercache/oracle/appcache/application_1480516695248_0056/spark-764ee07d-b082-4bc1-8b4d-3834d6cd14cd
16/12/02 18:13:40 INFO util.ShutdownHookManager: Deleting directory /u02/hadoop/yarn/nm/usercache/oracle/appcache/application_1480516695248_0056/spark-f79cf620-4754-41e2-bb4c-6e242f8e16ad
16/12/02 18:13:40 INFO util.ShutdownHookManager: Deleting directory /u11/hadoop/yarn/nm/usercache/oracle/appcache/application_1480516695248_0056/spark-ae63ef76-400a-4e8f-b68b-3a34ed25414e
16/12/02 18:13:40 INFO util.ShutdownHookManager: Deleting directory /u09/hadoop/yarn/nm/usercache/oracle/appcache/application_1480516695248_0056/spark-37543a1b-c5ec-4f06-887d-9357bea573e4
16/12/02 18:13:40 INFO util.ShutdownHookManager: Deleting directory /u05/hadoop/yarn/nm/usercache/oracle/appcache/application_1480516695248_0056/spark-9a5eb5c9-4009-49ff-aca6-64103429b245
16/12/02 18:13:40 INFO util.ShutdownHookManager: Deleting directory /u07/hadoop/yarn/nm/usercache/oracle/appcache/application_1480516695248_0056/spark-6ac13e05-62ec-485f-8016-3bb4d1830492
16/12/02 18:13:40 INFO util.ShutdownHookManager: Deleting directory /u10/hadoop/yarn/nm/usercache/oracle/appcache/application_1480516695248_0056/spark-15e0aa40-4a67-46c7-8eb0-fe8c8f2d0c6e
16/12/02 18:13:40 INFO util.ShutdownHookManager: Deleting directory /u06/hadoop/yarn/nm/usercache/oracle/appcache/application_1480516695248_0056/spark-112c8eb2-e039-4f73-beae-e3c05a04c3c6
16/12/02 18:13:40 INFO util.ShutdownHookManager: Deleting directory /u08/hadoop/yarn/nm/usercache/oracle/appcache/application_1480516695248_0056/spark-752c1bcc-ce1f-41a1-b779-347f77b04227
16/12/02 18:13:40 INFO util.ShutdownHookManager: Deleting directory /u12/hadoop/yarn/nm/usercache/oracle/appcache/application_1480516695248_0056/spark-4647d3e3-6655-4da1-b75b-94f3d872c71a

Executor 1 handled 840 Hive partitions. There is no additional logging between 18:13:26 and 18:13:35; the log jumps straight from 16/12/02 18:13:26 INFO executor.Executor: Finished task 1.0 in stage 1.0 (TID 3). 1503 bytes result sent to driver to 16/12/02 18:13:35 INFO executor.CoarseGrainedExecutorBackend: Got assigned task 4.

Executor 2 shows similar behavior and also processed exactly 840 partitions. The relevant log lines are shown below:
16/12/02 18:13:25 INFO executor.Executor: Finished task 0.0 in stage 1.0 (TID 2). 931 bytes result sent to driver
16/12/02 18:13:38 INFO executor.CoarseGrainedExecutorBackend: Got assigned task 5

The executor logs did not tell me anything useful, so let me check the Stages page. I am most interested in Stage Id 1 and Stage Id 2. Here is the Stages summary screen.
spark_stage_summary_3

Let's look at the Stage Id 1 screen shown below.
spark_stage_stage1_4

The duration is only between 0.2 and 1 second for the two executors. Another interesting statistic is that the Peak Execution Memory is 0 bytes for both. I don't believe this stage could load 1.3 billion rows of data without any memory usage. In other words, this stage does not appear to do any IO-related work, even though it is nominally executing select * from wzdb.sales_part.
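
One way to double-check this on the driver, without triggering the full scan, is to print the query plan for the line-42 DataFrame. This is a suggestion based on the standard DataFrame API rather than something captured in the original test:

// Prints the parsed, analyzed, optimized and physical plans to driver stdout.
// explain() only resolves metadata; it does not read the 1.3 billion rows.
// The physical plan shows the Parquet scan that will actually run later,
// when show() is called.
myDF.explain(true)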

OK, let me check the next stage, Stage 2. The DAG chart is so huge that it takes 20 seconds to show up on the screen. There are literally 3,365 RDD partitions unioned together to produce the result of the show() function.

spark_stage_stage2_5

spark_stage_stage2_6

The metrics for this stage give another interesting result.
spark_stage_stage2_7

The total duration is unbelievably fast: 1 second, with an input size of 1980 KB and 21 records. Remember, by default the show() function prints only 20 rows, so the 1980 KB and 21 records are clearly tied to that 20-row result. But with 3,365 RDD partitions unioned together, 1 second still seems unbelievably fast. Please note the block size is 256 MB in our environment. I just don't believe it is physically possible to perform the Stage 1 operation (select * from wzdb.sales_part against a 1.3-billion-row Hive Parquet table) in 1 second and then show 20 rows of the result in the following second. Yes, Spark processes data in memory and is super fast, but according to the DAG it goes through all 1.3 billion rows, and that can't finish in 2 seconds, even on a full rack BDA. Something else must be missing from the picture.
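
To make the "too fast" intuition concrete, here is a rough back-of-the-envelope calculation using only the numbers above. This is illustrative arithmetic, not output from the test:

// If the scan really happened inside these stages, each of the ~3,365
// unioned RDD partitions would still carry a large chunk of the table.
val totalRows = 1.3e9       // rows in wzdb.sales_part
val rddPartitions = 3365    // RDD partitions shown in the Stage 2 DAG
println(f"${totalRows / rddPartitions}%.0f rows per RDD partition") // ~386,330
// Scanning hundreds of thousands of Parquet rows per partition, across the
// whole table, is not plausible within the 1~2 seconds reported here.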

Luckily, for this test I used client mode as the deployment mode, so all of the driver log output was printed in my executing session. That is where I found out where the missing time went.

16/12/02 18:13:23 INFO scheduler.TaskSetManager: Finished task 0.0 in stage 0.0 (TID 0) in 4178 ms on enkbda1node09.enkitec.com (1/2)
16/12/02 18:13:24 INFO scheduler.TaskSetManager: Finished task 1.0 in stage 0.0 (TID 1) in 4458 ms on enkbda1node10.enkitec.com (2/2)
16/12/02 18:13:24 INFO scheduler.DAGScheduler: ResultStage 0 (sql at test2.scala:42) finished in 20.648 s
16/12/02 18:13:24 INFO cluster.YarnScheduler: Removed TaskSet 0.0, whose tasks have all completed, from pool 
16/12/02 18:13:24 INFO scheduler.DAGScheduler: Job 0 finished: sql at test2.scala:42, took 21.026555 s
16/12/02 18:13:24 INFO spark.SparkContext: Starting job: sql at test2.scala:42
16/12/02 18:13:24 INFO scheduler.DAGScheduler: Got job 1 (sql at test2.scala:42) with 2 output partitions
16/12/02 18:13:24 INFO scheduler.DAGScheduler: Final stage: ResultStage 1 (sql at test2.scala:42)
16/12/02 18:13:24 INFO scheduler.DAGScheduler: Parents of final stage: List()
16/12/02 18:13:24 INFO scheduler.DAGScheduler: Missing parents: List()
16/12/02 18:13:24 INFO scheduler.DAGScheduler: Submitting ResultStage 1 (MapPartitionsRDD[4] at sql at test2.scala:42), which has no missing parents
16/12/02 18:13:24 INFO storage.MemoryStore: Block broadcast_1 stored as values in memory (estimated size 72.4 KB, free 170.5 KB)
16/12/02 18:13:24 INFO storage.MemoryStore: Block broadcast_1_piece0 stored as bytes in memory (estimated size 25.5 KB, free 196.0 KB)
16/12/02 18:13:24 INFO storage.BlockManagerInfo: Added broadcast_1_piece0 in memory on 192.168.12.105:14015 (size: 25.5 KB, free: 530.0 MB)
16/12/02 18:13:24 INFO spark.SparkContext: Created broadcast 1 from broadcast at DAGScheduler.scala:1006
16/12/02 18:13:24 INFO scheduler.DAGScheduler: Submitting 2 missing tasks from ResultStage 1 (MapPartitionsRDD[4] at sql at test2.scala:42)
16/12/02 18:13:24 INFO cluster.YarnScheduler: Adding task set 1.0 with 2 tasks
16/12/02 18:13:24 INFO scheduler.TaskSetManager: Starting task 0.0 in stage 1.0 (TID 2, enkbda1node09.enkitec.com, partition 0,PROCESS_LOCAL, 2044 bytes)
16/12/02 18:13:24 INFO scheduler.TaskSetManager: Starting task 1.0 in stage 1.0 (TID 3, enkbda1node10.enkitec.com, partition 1,PROCESS_LOCAL, 2143 bytes)
16/12/02 18:13:24 INFO storage.BlockManagerInfo: Added broadcast_1_piece0 in memory on enkbda1node09.enkitec.com:19013 (size: 25.5 KB, free: 530.0 MB)
16/12/02 18:13:24 INFO storage.BlockManagerInfo: Added broadcast_1_piece0 in memory on enkbda1node10.enkitec.com:45880 (size: 25.5 KB, free: 530.0 MB)
16/12/02 18:13:25 INFO scheduler.TaskSetManager: Finished task 0.0 in stage 1.0 (TID 2) in 294 ms on enkbda1node09.enkitec.com (1/2)
16/12/02 18:13:26 INFO scheduler.TaskSetManager: Finished task 1.0 in stage 1.0 (TID 3) in 1614 ms on enkbda1node10.enkitec.com (2/2)
16/12/02 18:13:26 INFO scheduler.DAGScheduler: ResultStage 1 (sql at test2.scala:42) finished in 1.620 s
16/12/02 18:13:26 INFO cluster.YarnScheduler: Removed TaskSet 1.0, whose tasks have all completed, from pool 
16/12/02 18:13:26 INFO scheduler.DAGScheduler: Job 1 finished: sql at test2.scala:42, took 1.665575 s
16/12/02 18:13:26 INFO storage.BlockManagerInfo: Removed broadcast_1_piece0 on 192.168.12.105:14015 in memory (size: 25.5 KB, free: 530.0 MB)
16/12/02 18:13:26 INFO storage.BlockManagerInfo: Removed broadcast_1_piece0 on enkbda1node10.enkitec.com:45880 in memory (size: 25.5 KB, free: 530.0 MB)
16/12/02 18:13:26 INFO storage.BlockManagerInfo: Removed broadcast_1_piece0 on enkbda1node09.enkitec.com:19013 in memory (size: 25.5 KB, free: 530.0 MB)
16/12/02 18:13:26 INFO storage.BlockManagerInfo: Removed broadcast_0_piece0 on 192.168.12.105:14015 in memory (size: 25.6 KB, free: 530.0 MB)
16/12/02 18:13:26 INFO storage.BlockManagerInfo: Removed broadcast_0_piece0 on enkbda1node10.enkitec.com:45880 in memory (size: 25.6 KB, free: 530.0 MB)
16/12/02 18:13:26 INFO storage.BlockManagerInfo: Removed broadcast_0_piece0 on enkbda1node09.enkitec.com:19013 in memory (size: 25.6 KB, free: 530.0 MB)
16/12/02 18:13:26 INFO spark.ContextCleaner: Cleaned accumulator 1
16/12/02 18:13:26 INFO spark.ContextCleaner: Cleaned accumulator 2
16/12/02 18:13:26 INFO datasources.DataSourceStrategy: Selected 1680 partitions out of 1680, pruned 0.0% partitions.
16/12/02 18:13:26 INFO storage.MemoryStore: Block broadcast_2 stored as values in memory (estimated size 108.4 KB, free 108.4 KB)
16/12/02 18:13:26 INFO storage.MemoryStore: Block broadcast_2_piece0 stored as bytes in memory (estimated size 24.0 KB, free 132.3 KB)
16/12/02 18:13:26 INFO storage.BlockManagerInfo: Added broadcast_2_piece0 in memory on 192.168.12.105:14015 (size: 24.0 KB, free: 530.0 MB)
16/12/02 18:13:26 INFO spark.SparkContext: Created broadcast 2 from show at test2.scala:44
16/12/02 18:13:27 INFO Configuration.deprecation: mapred.min.split.size is deprecated. Instead, use mapreduce.input.fileinputformat.split.minsize
16/12/02 18:13:27 INFO parquet.ParquetRelation: Reading Parquet file(s) from hdfs://enkbda-ns/user/hive/warehouse/wzdb.db/sales_part/year=2012/month=1/day=1/000022_0, hdfs://enkbda-ns/user/hive/warehouse/wzdb.db/sales_part/year=2012/month=1/day=1/000022_0_copy_1, hdfs://enkbda-ns/user/hive/warehouse/wzdb.db/sales_part/year=2012/month=1/day=1/000022_0_copy_2, hdfs://enkbda-ns/user/hive/warehouse/wzdb.db/sales_part/year=2012/month=1/day=1/000022_0_copy_3, hdfs://enkbda-ns/user/hive/warehouse/wzdb.db/sales_part/year=2012/month=1/day=1/000128_0, hdfs://enkbda-ns/user/hive/warehouse/wzdb.db/sales_part/year=2012/month=1/day=1/000128_0_copy_1, hdfs://enkbda-ns/user/hive/warehouse/wzdb.db/sales_part/year=2012/month=1/day=1/000284_0
16/12/02 18:13:27 INFO parquet.ParquetRelation: Reading Parquet file(s) from hdfs://enkbda-ns/user/hive/warehouse/wzdb.db/sales_part/year=2012/month=1/day=10/000031_0, hdfs://enkbda-ns/user/hive/warehouse/wzdb.db/sales_part/year=2012/month=1/day=10/000031_0_copy_1, hdfs://enkbda-ns/user/hive/warehouse/wzdb.db/sales_part/year=2012/month=1/day=10/000031_0_copy_2, hdfs://enkbda-ns/user/hive/warehouse/wzdb.db/sales_part/year=2012/month=1/day=10/000031_0_copy_3, hdfs://enkbda-ns/user/hive/warehouse/wzdb.db/sales_part/year=2012/month=1/day=10/000137_0, hdfs://enkbda-ns/user/hive/warehouse/wzdb.db/sales_part/year=2012/month=1/day=10/000137_0_copy_1, hdfs://enkbda-ns/user/hive/warehouse/wzdb.db/sales_part/year=2012/month=1/day=10/000293_0
16/12/02 18:13:27 INFO parquet.ParquetRelation: Reading Parquet file(s) from hdfs://enkbda-ns/user/hive/warehouse/wzdb.db/sales_part/year=2012/month=1/day=11/000032_0, hdfs://enkbda-ns/user/hive/warehouse/wzdb.db/sales_part/year=2012/month=1/day=11/000032_0_copy_1, hdfs://enkbda-ns/user/hive/warehouse/wzdb.db/sales_part/year=2012/month=1/day=11/000032_0_copy_2, hdfs://enkbda-ns/user/hive/warehouse/wzdb.db/sales_part/year=2012/month=1/day=11/000032_0_copy_3, hdfs://enkbda-ns/user/hive/warehouse/wzdb.db/sales_part/year=2012/month=1/day=11/000138_0, hdfs://enkbda-ns/user/hive/warehouse/wzdb.db/sales_part/year=2012/month=1/day=11/000138_0_copy_1, hdfs://enkbda-ns/user/hive/warehouse/wzdb.db/sales_part/year=2012/month=1/day=11/000294_0
16/12/02 18:13:27 INFO parquet.ParquetRelation: Reading Parquet file(s) from hdfs://enkbda-ns/user/hive/warehouse/wzdb.db/sales_part/year=2012/month=1/day=12/000033_0, hdfs://enkbda-ns/user/hive/warehouse/wzdb.db/sales_part/year=2012/month=1/day=12/000033_0_copy_1, hdfs://enkbda-ns/user/hive/warehouse/wzdb.db/sales_part/year=2012/month=1/day=12/000033_0_copy_2, hdfs://enkbda-ns/user/hive/warehouse/wzdb.db/sales_part/year=2012/month=1/day=12/000033_0_copy_3, hdfs://enkbda-ns/user/hive/warehouse/wzdb.db/sales_part/year=2012/month=1/day=12/000139_0, hdfs://enkbda-ns/user/hive/warehouse/wzdb.db/sales_part/year=2012/month=1/day=12/000139_0_copy_1, hdfs://enkbda-ns/user/hive/warehouse/wzdb.db/sales_part/year=2012/month=1/day=12/000295_0
16/12/02 18:13:28 INFO parquet.ParquetRelation: Reading Parquet file(s) from hdfs://enkbda-ns/user/hive/warehouse/wzdb.db/sales_part/year=2012/month=1/day=13/000034_0, hdfs://enkbda-ns/user/hive/warehouse/wzdb.db/sales_part/year=2012/month=1/day=13/000034_0_copy_1, hdfs://enkbda-ns/user/hive/warehouse/wzdb.db/sales_part/year=2012/month=1/day=13/000034_0_copy_2, hdfs://enkbda-ns/user/hive/warehouse/wzdb.db/sales_part/year=2012/month=1/day=13/000034_0_copy_3, hdfs://enkbda-ns/user/hive/warehouse/wzdb.db/sales_part/year=2012/month=1/day=13/000140_0, hdfs://enkbda-ns/user/hive/warehouse/wzdb.db/sales_part/year=2012/month=1/day=13/000140_0_copy_1, hdfs://enkbda-ns/user/hive/warehouse/wzdb.db/sales_part/year=2012/month=1/day=13/000296_0

....
16/12/02 18:13:34 INFO parquet.ParquetRelation: Reading Parquet file(s) from hdfs://enkbda-ns/user/hive/warehouse/wzdb.db/sales_part/year=2016/month=9/day=19/000076_0, hdfs://enkbda-ns/user/hive/warehouse/wzdb.db/sales_part/year=2016/month=9/day=19/000076_0_copy_1, hdfs://enkbda-ns/user/hive/warehouse/wzdb.db/sales_part/year=2016/month=9/day=19/000076_0_copy_2, hdfs://enkbda-ns/user/hive/warehouse/wzdb.db/sales_part/year=2016/month=9/day=19/000076_0_copy_3, hdfs://enkbda-ns/user/hive/warehouse/wzdb.db/sales_part/year=2016/month=9/day=19/000144_0, hdfs://enkbda-ns/user/hive/warehouse/wzdb.db/sales_part/year=2016/month=9/day=19/000144_0_copy_1, hdfs://enkbda-ns/user/hive/warehouse/wzdb.db/sales_part/year=2016/month=9/day=19/000266_0
16/12/02 18:13:34 INFO parquet.ParquetRelation: Reading Parquet file(s) from hdfs://enkbda-ns/user/hive/warehouse/wzdb.db/sales_part/year=2016/month=9/day=2/000059_0, hdfs://enkbda-ns/user/hive/warehouse/wzdb.db/sales_part/year=2016/month=9/day=2/000059_0_copy_1, hdfs://enkbda-ns/user/hive/warehouse/wzdb.db/sales_part/year=2016/month=9/day=2/000059_0_copy_2, hdfs://enkbda-ns/user/hive/warehouse/wzdb.db/sales_part/year=2016/month=9/day=2/000059_0_copy_3, hdfs://enkbda-ns/user/hive/warehouse/wzdb.db/sales_part/year=2016/month=9/day=2/000127_0, hdfs://enkbda-ns/user/hive/warehouse/wzdb.db/sales_part/year=2016/month=9/day=2/000127_0_copy_1, hdfs://enkbda-ns/user/hive/warehouse/wzdb.db/sales_part/year=2016/month=9/day=2/000249_0
16/12/02 18:13:34 INFO parquet.ParquetRelation: Reading Parquet file(s) from hdfs://enkbda-ns/user/hive/warehouse/wzdb.db/sales_part/year=2016/month=9/day=20/000077_0, hdfs://enkbda-ns/user/hive/warehouse/wzdb.db/sales_part/year=2016/month=9/day=20/000077_0_copy_1, hdfs://enkbda-ns/user/hive/warehouse/wzdb.db/sales_part/year=2016/month=9/day=20/000077_0_copy_2, hdfs://enkbda-ns/user/hive/warehouse/wzdb.db/sales_part/year=2016/month=9/day=20/000077_0_copy_3, hdfs://enkbda-ns/user/hive/warehouse/wzdb.db/sales_part/year=2016/month=9/day=20/000145_0, hdfs://enkbda-ns/user/hive/warehouse/wzdb.db/sales_part/year=2016/month=9/day=20/000145_0_copy_1, hdfs://enkbda-ns/user/hive/warehouse/wzdb.db/sales_part/year=2016/month=9/day=20/000267_0
16/12/02 18:13:34 INFO parquet.ParquetRelation: Reading Parquet file(s) from hdfs://enkbda-ns/user/hive/warehouse/wzdb.db/sales_part/year=2016/month=9/day=21/000000_0, hdfs://enkbda-ns/user/hive/warehouse/wzdb.db/sales_part/year=2016/month=9/day=21/000000_0_copy_1, hdfs://enkbda-ns/user/hive/warehouse/wzdb.db/sales_part/year=2016/month=9/day=21/000000_0_copy_2, hdfs://enkbda-ns/user/hive/warehouse/wzdb.db/sales_part/year=2016/month=9/day=21/000000_0_copy_3, hdfs://enkbda-ns/user/hive/warehouse/wzdb.db/sales_part/year=2016/month=9/day=21/000146_0, hdfs://enkbda-ns/user/hive/warehouse/wzdb.db/sales_part/year=2016/month=9/day=21/000146_0_copy_1, hdfs://enkbda-ns/user/hive/warehouse/wzdb.db/sales_part/year=2016/month=9/day=21/000268_0
16/12/02 18:13:34 INFO parquet.ParquetRelation: Reading Parquet file(s) from hdfs://enkbda-ns/user/hive/warehouse/wzdb.db/sales_part/year=2016/month=9/day=22/000001_0, hdfs://enkbda-ns/user/hive/warehouse/wzdb.db/sales_part/year=2016/month=9/day=22/000001_0_copy_1, hdfs://enkbda-ns/user/hive/warehouse/wzdb.db/sales_part/year=2016/month=9/day=22/000001_0_copy_2, hdfs://enkbda-ns/user/hive/warehouse/wzdb.db/sales_part/year=2016/month=9/day=22/000001_0_copy_3, hdfs://enkbda-ns/user/hive/warehouse/wzdb.db/sales_part/year=2016/month=9/day=22/000147_0, hdfs://enkbda-ns/user/hive/warehouse/wzdb.db/sales_part/year=2016/month=9/day=22/000147_0_copy_1, hdfs://enkbda-ns/user/hive/warehouse/wzdb.db/sales_part/year=2016/month=9/day=22/000269_0
16/12/02 18:13:34 INFO parquet.ParquetRelation: Reading Parquet file(s) from hdfs://enkbda-ns/user/hive/warehouse/wzdb.db/sales_part/year=2016/month=9/day=23/000002_0, hdfs://enkbda-ns/user/hive/warehouse/wzdb.db/sales_part/year=2016/month=9/day=23/000002_0_copy_1, hdfs://enkbda-ns/user/hive/warehouse/wzdb.db/sales_part/year=2016/month=9/day=23/000002_0_copy_2, hdfs://enkbda-ns/user/hive/warehouse/wzdb.db/sales_part/year=2016/month=9/day=23/000002_0_copy_3, hdfs://enkbda-ns/user/hive/warehouse/wzdb.db/sales_part/year=2016/month=9/day=23/000148_0, hdfs://enkbda-ns/user/hive/warehouse/wzdb.db/sales_part/year=2016/month=9/day=23/000148_0_copy_1, hdfs://enkbda-ns/user/hive/warehouse/wzdb.db/sales_part/year=2016/month=9/day=23/000270_0
16/12/02 18:13:34 INFO parquet.ParquetRelation: Reading Parquet file(s) from hdfs://enkbda-ns/user/hive/warehouse/wzdb.db/sales_part/year=2016/month=9/day=24/000003_0, hdfs://enkbda-ns/user/hive/warehouse/wzdb.db/sales_part/year=2016/month=9/day=24/000003_0_copy_1, hdfs://enkbda-ns/user/hive/warehouse/wzdb.db/sales_part/year=2016/month=9/day=24/000003_0_copy_2, hdfs://enkbda-ns/user/hive/warehouse/wzdb.db/sales_part/year=2016/month=9/day=24/000003_0_copy_3, hdfs://enkbda-ns/user/hive/warehouse/wzdb.db/sales_part/year=2016/month=9/day=24/000149_0, hdfs://enkbda-ns/user/hive/warehouse/wzdb.db/sales_part/year=2016/month=9/day=24/000149_0_copy_1, hdfs://enkbda-ns/user/hive/warehouse/wzdb.db/sales_part/year=2016/month=9/day=24/000271_0
16/12/02 18:13:34 INFO parquet.ParquetRelation: Reading Parquet file(s) from hdfs://enkbda-ns/user/hive/warehouse/wzdb.db/sales_part/year=2016/month=9/day=25/000004_0, hdfs://enkbda-ns/user/hive/warehouse/wzdb.db/sales_part/year=2016/month=9/day=25/000004_0_copy_1, hdfs://enkbda-ns/user/hive/warehouse/wzdb.db/sales_part/year=2016/month=9/day=25/000004_0_copy_2, hdfs://enkbda-ns/user/hive/warehouse/wzdb.db/sales_part/year=2016/month=9/day=25/000004_0_copy_3, hdfs://enkbda-ns/user/hive/warehouse/wzdb.db/sales_part/year=2016/month=9/day=25/000150_0, hdfs://enkbda-ns/user/hive/warehouse/wzdb.db/sales_part/year=2016/month=9/day=25/000150_0_copy_1, hdfs://enkbda-ns/user/hive/warehouse/wzdb.db/sales_part/year=2016/month=9/day=25/000272_0
16/12/02 18:13:34 INFO parquet.ParquetRelation: Reading Parquet file(s) from hdfs://enkbda-ns/user/hive/warehouse/wzdb.db/sales_part/year=2016/month=9/day=26/000005_0, hdfs://enkbda-ns/user/hive/warehouse/wzdb.db/sales_part/year=2016/month=9/day=26/000005_0_copy_1, hdfs://enkbda-ns/user/hive/warehouse/wzdb.db/sales_part/year=2016/month=9/day=26/000005_0_copy_2, hdfs://enkbda-ns/user/hive/warehouse/wzdb.db/sales_part/year=2016/month=9/day=26/000005_0_copy_3, hdfs://enkbda-ns/user/hive/warehouse/wzdb.db/sales_part/year=2016/month=9/day=26/000151_0, hdfs://enkbda-ns/user/hive/warehouse/wzdb.db/sales_part/year=2016/month=9/day=26/000151_0_copy_1, hdfs://enkbda-ns/user/hive/warehouse/wzdb.db/sales_part/year=2016/month=9/day=26/000273_0
16/12/02 18:13:35 INFO parquet.ParquetRelation: Reading Parquet file(s) from hdfs://enkbda-ns/user/hive/warehouse/wzdb.db/sales_part/year=2016/month=9/day=27/000006_0, hdfs://enkbda-ns/user/hive/warehouse/wzdb.db/sales_part/year=2016/month=9/day=27/000006_0_copy_1, hdfs://enkbda-ns/user/hive/warehouse/wzdb.db/sales_part/year=2016/month=9/day=27/000006_0_copy_2, hdfs://enkbda-ns/user/hive/warehouse/wzdb.db/sales_part/year=2016/month=9/day=27/000006_0_copy_3, hdfs://enkbda-ns/user/hive/warehouse/wzdb.db/sales_part/year=2016/month=9/day=27/000152_0, hdfs://enkbda-ns/user/hive/warehouse/wzdb.db/sales_part/year=2016/month=9/day=27/000152_0_copy_1, hdfs://enkbda-ns/user/hive/warehouse/wzdb.db/sales_part/year=2016/month=9/day=27/000274_0
16/12/02 18:13:35 INFO parquet.ParquetRelation: Reading Parquet file(s) from hdfs://enkbda-ns/user/hive/warehouse/wzdb.db/sales_part/year=2016/month=9/day=28/000007_0, hdfs://enkbda-ns/user/hive/warehouse/wzdb.db/sales_part/year=2016/month=9/day=28/000007_0_copy_1, hdfs://enkbda-ns/user/hive/warehouse/wzdb.db/sales_part/year=2016/month=9/day=28/000007_0_copy_2, hdfs://enkbda-ns/user/hive/warehouse/wzdb.db/sales_part/year=2016/month=9/day=28/000007_0_copy_3, hdfs://enkbda-ns/user/hive/warehouse/wzdb.db/sales_part/year=2016/month=9/day=28/000153_0, hdfs://enkbda-ns/user/hive/warehouse/wzdb.db/sales_part/year=2016/month=9/day=28/000153_0_copy_1, hdfs://enkbda-ns/user/hive/warehouse/wzdb.db/sales_part/year=2016/month=9/day=28/000275_0
16/12/02 18:13:35 INFO parquet.ParquetRelation: Reading Parquet file(s) from hdfs://enkbda-ns/user/hive/warehouse/wzdb.db/sales_part/year=2016/month=9/day=3/000060_0, hdfs://enkbda-ns/user/hive/warehouse/wzdb.db/sales_part/year=2016/month=9/day=3/000060_0_copy_1, hdfs://enkbda-ns/user/hive/warehouse/wzdb.db/sales_part/year=2016/month=9/day=3/000060_0_copy_2, hdfs://enkbda-ns/user/hive/warehouse/wzdb.db/sales_part/year=2016/month=9/day=3/000060_0_copy_3, hdfs://enkbda-ns/user/hive/warehouse/wzdb.db/sales_part/year=2016/month=9/day=3/000128_0, hdfs://enkbda-ns/user/hive/warehouse/wzdb.db/sales_part/year=2016/month=9/day=3/000128_0_copy_1, hdfs://enkbda-ns/user/hive/warehouse/wzdb.db/sales_part/year=2016/month=9/day=3/000250_0
16/12/02 18:13:35 INFO parquet.ParquetRelation: Reading Parquet file(s) from hdfs://enkbda-ns/user/hive/warehouse/wzdb.db/sales_part/year=2016/month=9/day=4/000061_0, hdfs://enkbda-ns/user/hive/warehouse/wzdb.db/sales_part/year=2016/month=9/day=4/000061_0_copy_1, hdfs://enkbda-ns/user/hive/warehouse/wzdb.db/sales_part/year=2016/month=9/day=4/000061_0_copy_2, hdfs://enkbda-ns/user/hive/warehouse/wzdb.db/sales_part/year=2016/month=9/day=4/000061_0_copy_3, hdfs://enkbda-ns/user/hive/warehouse/wzdb.db/sales_part/year=2016/month=9/day=4/000129_0, hdfs://enkbda-ns/user/hive/warehouse/wzdb.db/sales_part/year=2016/month=9/day=4/000129_0_copy_1, hdfs://enkbda-ns/user/hive/warehouse/wzdb.db/sales_part/year=2016/month=9/day=4/000251_0
16/12/02 18:13:35 INFO parquet.ParquetRelation: Reading Parquet file(s) from hdfs://enkbda-ns/user/hive/warehouse/wzdb.db/sales_part/year=2016/month=9/day=5/000062_0, hdfs://enkbda-ns/user/hive/warehouse/wzdb.db/sales_part/year=2016/month=9/day=5/000062_0_copy_1, hdfs://enkbda-ns/user/hive/warehouse/wzdb.db/sales_part/year=2016/month=9/day=5/000062_0_copy_2, hdfs://enkbda-ns/user/hive/warehouse/wzdb.db/sales_part/year=2016/month=9/day=5/000062_0_copy_3, hdfs://enkbda-ns/user/hive/warehouse/wzdb.db/sales_part/year=2016/month=9/day=5/000130_0, hdfs://enkbda-ns/user/hive/warehouse/wzdb.db/sales_part/year=2016/month=9/day=5/000130_0_copy_1, hdfs://enkbda-ns/user/hive/warehouse/wzdb.db/sales_part/year=2016/month=9/day=5/000252_0
16/12/02 18:13:35 INFO parquet.ParquetRelation: Reading Parquet file(s) from hdfs://enkbda-ns/user/hive/warehouse/wzdb.db/sales_part/year=2016/month=9/day=6/000063_0, hdfs://enkbda-ns/user/hive/warehouse/wzdb.db/sales_part/year=2016/month=9/day=6/000063_0_copy_1, hdfs://enkbda-ns/user/hive/warehouse/wzdb.db/sales_part/year=2016/month=9/day=6/000063_0_copy_2, hdfs://enkbda-ns/user/hive/warehouse/wzdb.db/sales_part/year=2016/month=9/day=6/000063_0_copy_3, hdfs://enkbda-ns/user/hive/warehouse/wzdb.db/sales_part/year=2016/month=9/day=6/000131_0, hdfs://enkbda-ns/user/hive/warehouse/wzdb.db/sales_part/year=2016/month=9/day=6/000131_0_copy_1, hdfs://enkbda-ns/user/hive/warehouse/wzdb.db/sales_part/year=2016/month=9/day=6/000253_0
16/12/02 18:13:35 INFO parquet.ParquetRelation: Reading Parquet file(s) from hdfs://enkbda-ns/user/hive/warehouse/wzdb.db/sales_part/year=2016/month=9/day=7/000064_0, hdfs://enkbda-ns/user/hive/warehouse/wzdb.db/sales_part/year=2016/month=9/day=7/000064_0_copy_1, hdfs://enkbda-ns/user/hive/warehouse/wzdb.db/sales_part/year=2016/month=9/day=7/000064_0_copy_2, hdfs://enkbda-ns/user/hive/warehouse/wzdb.db/sales_part/year=2016/month=9/day=7/000064_0_copy_3, hdfs://enkbda-ns/user/hive/warehouse/wzdb.db/sales_part/year=2016/month=9/day=7/000132_0, hdfs://enkbda-ns/user/hive/warehouse/wzdb.db/sales_part/year=2016/month=9/day=7/000132_0_copy_1, hdfs://enkbda-ns/user/hive/warehouse/wzdb.db/sales_part/year=2016/month=9/day=7/000254_0
16/12/02 18:13:35 INFO parquet.ParquetRelation: Reading Parquet file(s) from hdfs://enkbda-ns/user/hive/warehouse/wzdb.db/sales_part/year=2016/month=9/day=8/000065_0, hdfs://enkbda-ns/user/hive/warehouse/wzdb.db/sales_part/year=2016/month=9/day=8/000065_0_copy_1, hdfs://enkbda-ns/user/hive/warehouse/wzdb.db/sales_part/year=2016/month=9/day=8/000065_0_copy_2, hdfs://enkbda-ns/user/hive/warehouse/wzdb.db/sales_part/year=2016/month=9/day=8/000065_0_copy_3, hdfs://enkbda-ns/user/hive/warehouse/wzdb.db/sales_part/year=2016/month=9/day=8/000133_0, hdfs://enkbda-ns/user/hive/warehouse/wzdb.db/sales_part/year=2016/month=9/day=8/000133_0_copy_1, hdfs://enkbda-ns/user/hive/warehouse/wzdb.db/sales_part/year=2016/month=9/day=8/000255_0
16/12/02 18:13:35 INFO parquet.ParquetRelation: Reading Parquet file(s) from hdfs://enkbda-ns/user/hive/warehouse/wzdb.db/sales_part/year=2016/month=9/day=9/000066_0, hdfs://enkbda-ns/user/hive/warehouse/wzdb.db/sales_part/year=2016/month=9/day=9/000066_0_copy_1, hdfs://enkbda-ns/user/hive/warehouse/wzdb.db/sales_part/year=2016/month=9/day=9/000066_0_copy_2, hdfs://enkbda-ns/user/hive/warehouse/wzdb.db/sales_part/year=2016/month=9/day=9/000066_0_copy_3, hdfs://enkbda-ns/user/hive/warehouse/wzdb.db/sales_part/year=2016/month=9/day=9/000134_0, hdfs://enkbda-ns/user/hive/warehouse/wzdb.db/sales_part/year=2016/month=9/day=9/000134_0_copy_1, hdfs://enkbda-ns/user/hive/warehouse/wzdb.db/sales_part/year=2016/month=9/day=9/000256_0
16/12/02 18:13:35 INFO spark.SparkContext: Starting job: show at test2.scala:44
16/12/02 18:13:35 INFO scheduler.DAGScheduler: Got job 2 (show at test2.scala:44) with 1 output partitions
16/12/02 18:13:35 INFO scheduler.DAGScheduler: Final stage: ResultStage 2 (show at test2.scala:44)
16/12/02 18:13:35 INFO scheduler.DAGScheduler: Parents of final stage: List()
16/12/02 18:13:35 INFO scheduler.DAGScheduler: Missing parents: List()
16/12/02 18:13:35 INFO scheduler.DAGScheduler: Submitting ResultStage 2 (MapPartitionsRDD[3367] at show at test2.scala:44), which has no missing parents
16/12/02 18:13:35 INFO storage.MemoryStore: Block broadcast_3 stored as values in memory (estimated size 684.5 KB, free 816.9 KB)
16/12/02 18:13:35 INFO storage.MemoryStore: Block broadcast_3_piece0 stored as bytes in memory (estimated size 93.1 KB, free 909.9 KB)
16/12/02 18:13:35 INFO storage.BlockManagerInfo: Added broadcast_3_piece0 in memory on 192.168.12.105:14015 (size: 93.1 KB, free: 529.9 MB)
16/12/02 18:13:35 INFO spark.SparkContext: Created broadcast 3 from broadcast at DAGScheduler.scala:1006
16/12/02 18:13:35 INFO scheduler.DAGScheduler: Submitting 1 missing tasks from ResultStage 2 (MapPartitionsRDD[3367] at show at test2.scala:44)
16/12/02 18:13:35 INFO cluster.YarnScheduler: Adding task set 2.0 with 1 tasks
16/12/02 18:13:35 INFO scheduler.TaskSetManager: Starting task 0.0 in stage 2.0 (TID 4, enkbda1node10.enkitec.com, partition 0,RACK_LOCAL, 2384 bytes)
16/12/02 18:13:35 INFO storage.BlockManagerInfo: Added broadcast_3_piece0 in memory on enkbda1node10.enkitec.com:45880 (size: 93.1 KB, free: 529.9 MB)
16/12/02 18:13:35 INFO storage.BlockManagerInfo: Added broadcast_2_piece0 in memory on enkbda1node10.enkitec.com:45880 (size: 24.0 KB, free: 529.9 MB)
16/12/02 18:13:36 INFO scheduler.TaskSetManager: Finished task 0.0 in stage 2.0 (TID 4) in 1336 ms on enkbda1node10.enkitec.com (1/1)
16/12/02 18:13:36 INFO scheduler.DAGScheduler: ResultStage 2 (show at test2.scala:44) finished in 1.336 s
16/12/02 18:13:36 INFO cluster.YarnScheduler: Removed TaskSet 2.0, whose tasks have all completed, from pool 
16/12/02 18:13:36 INFO scheduler.DAGScheduler: Job 2 finished: show at test2.scala:44, took 1.604959 s
16/12/02 18:13:36 INFO spark.ContextCleaner: Cleaned accumulator 3
16/12/02 18:13:36 INFO storage.BlockManagerInfo: Removed broadcast_3_piece0 on 192.168.12.105:14015 in memory (size: 93.1 KB, free: 530.0 MB)
16/12/02 18:13:36 INFO storage.BlockManagerInfo: Removed broadcast_3_piece0 on enkbda1node10.enkitec.com:45880 in memory (size: 93.1 KB, free: 530.0 MB)
+-------------+----------------+--------+--------------------+-------------+-------+----------------+----------+------------+----+-----+---+
|      tran_id|  tran_timestamp|store_id|           item_name|item_category|dept_id|       dept_name|tran_price|cashier_name|year|month|day|
+-------------+----------------+--------+--------------------+-------------+-------+----------------+----------+------------+----+-----+---+
|1479564513475|2012-01-01 15:47|       4|rxkexrwwrnohuenpm...|          891|     26|             Toy|    185.17|       Maria|2012|    1|  1|
|1479564513608|2012-01-01 10:11|      27|                  zz|          790|     26|   Auto and Tire|     68.55|         Von|2012|    1|  1|
|1479564513748|2012-01-01 16:53|      26|fqzlqxvmpktwjwwgg...|          279|     10|Home Improvement|    233.47|         Von|2012|    1|  1|
|1479564513750|2012-01-01 21:10|      22|               ndmeu|          487|     35|           Photo|     92.42|       Ileen|2012|    1|  1|
|1479564526973|2012-01-01 07:52|       6|sbzmvrnxrvbohorbp...|          632|     18|         Jewelry|    164.34|        Keri|2012|    1|  1|
|1479564469852|2012-01-01 18:54|      27|ptcilplqfvednxmmh...|          416|      3|    Baby Toddler|    144.86|      Michel|2012|    1|  1|
|1479564523772|2012-01-01 11:07|       2|gvjrsdgidzunbbmfi...|          269|     17|          Sports|    231.67|       Penny|2012|    1|  1|
|1479564524666|2012-01-01 08:51|       6|rfskpcezchhbhzsbd...|          595|     19|            Food|    175.85|         Rus|2012|    1|  1|
|1479564470133|2012-01-01 18:36|      17|wzswebdjowfjjbslh...|          679|     10|   Health Beauty|    350.13|        Keri|2012|    1|  1|
|1479564537634|2012-01-01 07:52|      12|             bhxoevw|          281|     34|    Baby Toddler|    352.02|        Keri|2012|    1|  1|
|1479564470197|2012-01-01 06:04|       5|plqxmnrcuqisfygkl...|          152|     19|             Toy|     53.67|     Dorothy|2012|    1|  1|
|1479564470201|2012-01-01 08:23|      13|frcatrjwwrbomxmnj...|           74|     20|   Auto and Tire|    359.81|       Ileen|2012|    1|  1|
|1479564470386|2012-01-01 10:16|      15|cevezkxpsrzszshen...|          814|     13|   Auto and Tire|     27.92|     Sherril|2012|    1|  1|
|1479564470724|2012-01-01 01:44|      26|jjiqfklffyzxzkyiz...|          248|      5|   Auto and Tire|    219.66|       Lidia|2012|    1|  1|
|1479564470799|2012-01-01 23:26|      18|     voakgmajahxfgbq|          769|     17|          Sports|    251.07|       Susan|2012|    1|  1|
|1479564470941|2012-01-01 13:28|      14|axkytaxghyujudtaw...|          207|      5|   Auto and Tire|    168.34|  Christoper|2012|    1|  1|
|1479564471016|2012-01-01 15:37|       3|sdcnxhosatucnwwqk...|          192|     23|         Jewelry|       2.5|      Michel|2012|    1|  1|
|1479564471049|2012-01-01 23:27|       9|zoppybkpqpgitrwlo...|          120|     32|          Sports|    147.28|     Dorothy|2012|    1|  1|
|1479564471063|2012-01-01 23:51|      24|zknmvfsrsdxdysmdw...|          169|      6|         Jewelry|    292.59|   Broderick|2012|    1|  1|
|1479564471113|2012-01-01 19:42|      20|uaqmjikgtisidskzm...|          388|     36|            Food|      3.55|       Maria|2012|    1|  1|
+-------------+----------------+--------+--------------------+-------------+-------+----------------+----------+------------+----+-----+---+
only showing top 20 rows

16/12/02 18:13:36 INFO datasources.DataSourceStrategy: Selected 1680 partitions out of 1680, pruned 0.0% partitions.
16/12/02 18:13:36 INFO storage.MemoryStore: Block broadcast_4 stored as values in memory (estimated size 282.1 KB, free 414.5 KB)

The above log shows that the IO-related operations, such as parquet.ParquetRelation: Reading Parquet file(s), happen completely outside the timelines of Job (and Stage) 1 and Job 2. This is where the missing time goes. Spending only 9~10 seconds on the driver to work through the Parquet files behind all 1,680 partitions of this 1.3-billion-row table is actually pretty good. Mystery solved.
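
A simple way to make this driver-side work visible without digging through the logs is to time each call on the driver. The timed helper below is hypothetical and was not part of the original script; the point is that the wall-clock time of show() includes the Parquet listing even though the submitted job itself only runs for about a second:

// Hypothetical timing wrapper, for illustration only.
def timed[T](label: String)(body: => T): T = {
  val start = System.nanoTime()
  val result = body
  println(f"$label took ${(System.nanoTime() - start) / 1e9}%.1f s")
  result
}

val myDF = timed("line 42 sql()") { hiveContext.sql("select * from wzdb.sales_part") }
timed("line 44 show()") { myDF.show() }   // includes the driver-side file listing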

With the above findings in mind, I expected that doing partition pruning and limiting the number of rows scanned in the line-42 query would reduce the timeline gap, since less IO is needed to read the data. So I added partition pruning to the query on line 42: select * from wzdb.sales_part where year=2013 and month=11 and day=13, and reran the test. The result was exactly what I expected.
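
In code, the only change is to the line-42 query; the rest of the script stays the same:

// line 42, now with partition-key predicates so that only one of the
// 1,680 partitions needs to be listed and read
val myDF = hiveContext.sql(
  "select * from wzdb.sales_part where year=2013 and month=11 and day=13")
myDF.show()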

Here is the new timeline:
spark_event_timeline_8

As you can see, there is only a 1-second gap between Job Id 1 and Job Id 2. Here is the execution log; only one partition of data was read.

16/12/03 11:11:10 INFO scheduler.DAGScheduler: Job 1 finished: sql at test2.scala:42, took 1.322394 s
16/12/03 11:11:10 INFO storage.BlockManagerInfo: Removed broadcast_1_piece0 on 192.168.12.105:46755 in memory (size: 25.5 KB, free: 530.0 MB)
16/12/03 11:11:10 INFO storage.BlockManagerInfo: Removed broadcast_1_piece0 on enkbda1node09.enkitec.com:37048 in memory (size: 25.5 KB, free: 530.0 MB)
16/12/03 11:11:10 INFO storage.BlockManagerInfo: Removed broadcast_1_piece0 on enkbda1node03.enkitec.com:19685 in memory (size: 25.5 KB, free: 530.0 MB)
16/12/03 11:11:10 INFO spark.ContextCleaner: Cleaned accumulator 2
16/12/03 11:11:10 INFO storage.BlockManagerInfo: Removed broadcast_0_piece0 on 192.168.12.105:46755 in memory (size: 25.6 KB, free: 530.0 MB)
16/12/03 11:11:10 INFO storage.BlockManagerInfo: Removed broadcast_0_piece0 on enkbda1node09.enkitec.com:37048 in memory (size: 25.6 KB, free: 530.0 MB)
16/12/03 11:11:10 INFO storage.BlockManagerInfo: Removed broadcast_0_piece0 on enkbda1node03.enkitec.com:19685 in memory (size: 25.6 KB, free: 530.0 MB)
16/12/03 11:11:10 INFO spark.ContextCleaner: Cleaned accumulator 1
16/12/03 11:11:11 INFO datasources.DataSourceStrategy: Selected 1 partitions out of 1680, pruned 99.94047619047619% partitions.
16/12/03 11:11:11 INFO storage.MemoryStore: Block broadcast_2 stored as values in memory (estimated size 108.4 KB, free 108.4 KB)
16/12/03 11:11:11 INFO storage.MemoryStore: Block broadcast_2_piece0 stored as bytes in memory (estimated size 24.0 KB, free 132.3 KB)
16/12/03 11:11:11 INFO storage.BlockManagerInfo: Added broadcast_2_piece0 in memory on 192.168.12.105:46755 (size: 24.0 KB, free: 530.0 MB)
16/12/03 11:11:11 INFO spark.SparkContext: Created broadcast 2 from show at test2.scala:44
16/12/03 11:11:11 INFO Configuration.deprecation: mapred.min.split.size is deprecated. Instead, use mapreduce.input.fileinputformat.split.minsize
16/12/03 11:11:11 INFO parquet.ParquetRelation: Reading Parquet file(s) from hdfs://enkbda-ns/user/hive/warehouse/wzdb.db/sales_part/year=2013/month=11/day=13/000057_0, hdfs://enkbda-ns/user/hive/warehouse/wzdb.db/sales_part/year=2013/month=11/day=13/000057_0_copy_1, hdfs://enkbda-ns/user/hive/warehouse/wzdb.db/sales_part/year=2013/month=11/day=13/000057_0_copy_2, hdfs://enkbda-ns/user/hive/warehouse/wzdb.db/sales_part/year=2013/month=11/day=13/000057_0_copy_3, hdfs://enkbda-ns/user/hive/warehouse/wzdb.db/sales_part/year=2013/month=11/day=13/000165_0, hdfs://enkbda-ns/user/hive/warehouse/wzdb.db/sales_part/year=2013/month=11/day=13/000165_0_copy_1, hdfs://enkbda-ns/user/hive/warehouse/wzdb.db/sales_part/year=2013/month=11/day=13/000191_0
16/12/03 11:11:11 INFO spark.SparkContext: Starting job: show at test2.scala:44
16/12/03 11:11:11 INFO scheduler.DAGScheduler: Got job 2 (show at test2.scala:44) with 1 output partitions
16/12/03 11:11:11 INFO scheduler.DAGScheduler: Final stage: ResultStage 2 (show at test2.scala:44)
16/12/03 11:11:11 INFO scheduler.DAGScheduler: Parents of final stage: List()
16/12/03 11:11:11 INFO scheduler.DAGScheduler: Missing parents: List()
16/12/03 11:11:11 INFO scheduler.DAGScheduler: Submitting ResultStage 2 (MapPartitionsRDD[9] at show at test2.scala:44), which has no missing parents
16/12/03 11:11:11 INFO storage.MemoryStore: Block broadcast_3 stored as values in memory (estimated size 143.0 KB, free 275.3 KB)
16/12/03 11:11:11 INFO storage.MemoryStore: Block broadcast_3_piece0 stored as bytes in memory (estimated size 16.5 KB, free 291.8 KB)
16/12/03 11:11:11 INFO storage.BlockManagerInfo: Added broadcast_3_piece0 in memory on 192.168.12.105:46755 (size: 16.5 KB, free: 530.0 MB)
16/12/03 11:11:11 INFO spark.SparkContext: Created broadcast 3 from broadcast at DAGScheduler.scala:1006
16/12/03 11:11:11 INFO scheduler.DAGScheduler: Submitting 1 missing tasks from ResultStage 2 (MapPartitionsRDD[9] at show at test2.scala:44)
16/12/03 11:11:11 INFO cluster.YarnScheduler: Adding task set 2.0 with 1 tasks
16/12/03 11:11:11 INFO scheduler.TaskSetManager: Starting task 0.0 in stage 2.0 (TID 4, enkbda1node09.enkitec.com, partition 0,NODE_LOCAL, 2386 bytes)
16/12/03 11:11:11 INFO storage.BlockManagerInfo: Added broadcast_3_piece0 in memory on enkbda1node09.enkitec.com:37048 (size: 16.5 KB, free: 530.0 MB)
16/12/03 11:11:11 INFO storage.BlockManagerInfo: Added broadcast_2_piece0 in memory on enkbda1node09.enkitec.com:37048 (size: 24.0 KB, free: 530.0 MB)
16/12/03 11:11:12 INFO scheduler.TaskSetManager: Finished task 0.0 in stage 2.0 (TID 4) in 1523 ms on enkbda1node09.enkitec.com (1/1)

The lesson learned from this test is that Spark metrics are helpful for identifying the bottleneck in a Spark application, but they may not tell the complete story. In this case, if we focused only on the 1- or 2-second operations, it would seem that nothing needs tuning. Instead, we should focus on reducing the IO needed to access the 1+ billion row table by filtering on the partition keys and limiting the total number of rows scanned.

Compression Methods on Exadata: Compression on partition tables – Part 4 of 6

In part 1, part 2, and part 3 of this blog series, I showed examples of creating compressed tables using CREATE TABLE and ALTER TABLE MOVE, and how to improve compression performance using parallelism. In big databases, partitioned tables are quite common, and this post shows how a partitioned table behaves with the different compression methods. A nice thing about partitioned tables is that you can compress certain partitions while leaving other partitions uncompressed. This is a very useful feature, as you might need to compress historical data while keeping the current partition uncompressed. Printer manufacturer Brother has an interesting newspaper ad for its USP product, shown above. Obviously this guy needs a better partitioning strategy, as well as a compression approach, to store his files digitally.

First, let's create a partitioned table. I made a slight change to the create table script from part 1.

CREATE TABLE PARTAB
(
    ID NUMBER,
    OWNER VARCHAR2(30) NOT NULL ENABLE,
    OBJECT_NAME VARCHAR2(30) NOT NULL ENABLE,
    SUBOBJECT_NAME VARCHAR2(30),
    OBJECT_ID NUMBER NOT NULL ENABLE,
    DATA_OBJECT_ID NUMBER,
    OBJECT_TYPE VARCHAR2(19),
    CREATED DATE NOT NULL ENABLE,
    LAST_DDL_TIME DATE NOT NULL ENABLE,
    TIMESTAMP VARCHAR2(19),
    STATUS VARCHAR2(7),
    TEMPORARY VARCHAR2(1),
    GENERATED VARCHAR2(1),
    SECONDARY VARCHAR2(1),
    NAMESPACE NUMBER NOT NULL ENABLE,
    EDITION_NAME VARCHAR2(30)
)
PARTITION BY RANGE (OBJECT_ID)
    (PARTITION P1 values less than (1500),
    PARTITION P2 values less than (4000),
    PARTITION P3 values less than (6000),
    PARTITION P4 values less than (maxvalue)
    );

Then, populate some data with 10 million rows in total.

declare
    l_cnt  number;
    l_rows number := 10000000;
begin
    insert /*+ append */
    into partab
    select rownum, a.*
      from all_objects a;
    l_cnt := sql%rowcount;
    commit;

    while (l_cnt < l_rows)
    loop
        insert /*+ APPEND */ into partab
        select rownum+l_cnt,
               OWNER, OBJECT_NAME, SUBOBJECT_NAME,
               OBJECT_ID, DATA_OBJECT_ID,
               OBJECT_TYPE, CREATED, LAST_DDL_TIME,
               TIMESTAMP, STATUS, TEMPORARY,
               GENERATED, SECONDARY, NAMESPACE, EDITION_NAME
          from partab
         where rownum <= l_rows-l_cnt;
        l_cnt := l_cnt + sql%rowcount;
        commit;
    end loop;
end;
/

Check out the row count and table size.

 WZHOU@dbm1> select count(*) from partab;

  COUNT(*)
----------
  10000000

Elapsed: 00:00:07.43

WZHOU@dbm1> select count(*) from partab partition(p1);

  COUNT(*)
----------
   1364992

Elapsed: 00:00:00.08

WZHOU@dbm1> select count(*) from partab partition(p2);

  COUNT(*)
----------
   2560000

Elapsed: 00:00:00.15

WZHOU@dbm1> select count(*) from partab partition(p3);

  COUNT(*)
----------
   1445504

Elapsed: 00:00:00.08

WZHOU@dbm1> select count(*) from partab partition(p4);

  COUNT(*)
----------
   4629504

Elapsed: 00:00:00.27

Before doing the compression work, let me check out the table size and partition size. You can find part_size.sql on the Script page. This script is also one of the Enkitec scripts I use frequently.
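As a rough stand-in for part_size.sql (an assumption of what it reports, not the real script), a query against DBA_SEGMENTS can list per-partition sizes like this:

-- Approximate stand-in for part_size.sql: per-partition segment sizes in MB.
select owner, segment_name, partition_name, segment_type,
       round(bytes/1024/1024, 1) as totalsize_megs, tablespace_name
  from dba_segments
 where owner = upper('&owner')
   and segment_name like nvl(upper('&table_name'), '%')
   and segment_type like 'TABLE%PARTITION'
 order by partition_name;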

WZHOU@dbm1> @table_size
Enter value for owner: WZHOU
Enter value for table_name: PARTAB
Enter value for type:

OWNER  SEGMENT_NAME TYPE             TOTALSIZE_MEGS TABLESPACE_NAME
------ ------------ ---------------- -------------- ------------------------------
WZHOU        PARTAB TABLE PARTITION       1,112.0   USERS
                                     --------------
sum                                       1,112.0

WZHOU@dbm1> @part_size
Enter value for owner: WZHOU
Enter value for table_name: PARTAB
Enter value for partition_name:
Enter value for type:

OWNER  SEGMENT_NAME PARTITION_NAME TYPE            TOTALSIZE_MEGS TABLESPACE_NAME
------ ------------ -------------- --------------- -------------- ---------------
WZHOU  PARTAB             P1       TABLE PARTITION       152.0    USERS
WZHOU  PARTAB             P2       TABLE PARTITION       280.0    USERS
WZHOU  PARTAB             P3       TABLE PARTITION       152.0    USERS
WZHOU  PARTAB             P4       TABLE PARTITION       528.0    USERS
                                                   --------------
sum                                                    1,112.0

Now, let me start with Basic compression.

Basic Compression

WZHOU@dbm1> ALTER TABLE partab MOVE partition p4 COMPRESS;

Table altered.

Elapsed: 00:00:14.00

WZHOU@dbm1> select table_owner,table_name,partition_name,compress_for from DBA_TAB_PARTITIONS
    where table_name = 'PARTAB' and partition_name='P4';

TABLE_OWNER TABLE_NAME PARTITION_NAME COMPRESS_FOR
----------- ---------- -------------- ------------
WZHOU       PARTAB     P4             BASIC

WZHOU@dbm1> @part_size
Enter value for owner: WZHOU
Enter value for table_name: PARTAB
Enter value for partition_name: P4
Enter value for type:

OWNER  SEGMENT_NAME PARTITION_NAME TYPE            TOTALSIZE_MEGS TABLESPACE_NAME
------ ------------ -------------- --------------- -------------- ---------------
WZHOU  PARTAB             P4       TABLE PARTITION       192.0    USERS
                                                   --------------
sum                                                      192.0

There is a small change we need to make before we proceed to the next test: we need to restore this partition to its uncompressed state using the nocompress keyword.

OLTP Compression

WZHOU@dbm1> ALTER TABLE partab MOVE partition p4 nocompress;

Table altered.

Elapsed: 00:00:05.99

WZHOU@dbm1> ALTER TABLE partab MOVE partition p4 compress for oltp;

Table altered.

Elapsed: 00:00:14.22
WZHOU@dbm1> select table_owner,table_name,partition_name,compress_for from DBA_TAB_PARTITIONS
    where table_name = 'PARTAB' and partition_name='P4';

TABLE_OWNER TABLE_NAME PARTITION_NAME COMPRESS_FOR
----------- ---------- -------------- ------------
WZHOU       PARTAB     P4             OLTP

Elapsed: 00:00:00.01

WZHOU@dbm1> @part_size
Enter value for owner: WZHOU
Enter value for table_name: PARTAB
Enter value for partition_name: P4
Enter value for type:

OWNER  SEGMENT_NAME PARTITION_NAME TYPE            TOTALSIZE_MEGS TABLESPACE_NAME
------ ------------ -------------- --------------- -------------- ---------------
WZHOU  PARTAB             P4       TABLE PARTITION       192.0    USERS
                                                   --------------
sum                                                      192.0

Query Low Compression

WZHOU@dbm1> ALTER TABLE partab MOVE partition p4 nocompress;

Table altered.

Elapsed: 00:00:06.01
WZHOU@dbm1> ALTER TABLE partab MOVE partition p4 compress for query low;

Table altered.

Elapsed: 00:00:06.83
WZHOU@dbm1> select table_owner,table_name,partition_name,compress_for from DBA_TAB_PARTITIONS
    where table_name = 'PARTAB' and partition_name='P4';

TABLE_OWNER TABLE_NAME PARTITION_NAME COMPRESS_FOR
----------- ---------- -------------- ------------
WZHOU       PARTAB     P4             QUERY LOW

Elapsed: 00:00:00.01

WZHOU@dbm1> @part_size
Enter value for owner: WZHOU
Enter value for table_name: PARTAB
Enter value for partition_name: P4
Enter value for type:

OWNER  SEGMENT_NAME PARTITION_NAME TYPE            TOTALSIZE_MEGS TABLESPACE_NAME
------ ------------ -------------- --------------- -------------- ---------------
WZHOU  PARTAB             P4       TABLE PARTITION        80.0    USERS
                                                   --------------
sum                                                       80.0

Query High Compression

WZHOU@dbm1> ALTER TABLE partab MOVE partition p4 nocompress;

Table altered.

Elapsed: 00:00:05.26

WZHOU@dbm1> ALTER TABLE partab MOVE partition p4 compress for query high;

Table altered.

Elapsed: 00:00:13.08

WZHOU@dbm1> select table_owner,table_name,partition_name,compress_for from DBA_TAB_PARTITIONS
    where table_name = 'PARTAB' and partition_name='P4';

TABLE_OWNER TABLE_NAME PARTITION_NAME COMPRESS_FOR
----------- ---------- -------------- ------------
WZHOU       PARTAB     P4             QUERY HIGH

Elapsed: 00:00:00.01

WZHOU@dbm1> @part_size
Enter value for owner: WZHOU
Enter value for table_name: PARTAB
Enter value for partition_name: P4
Enter value for type:

OWNER  SEGMENT_NAME PARTITION_NAME TYPE            TOTALSIZE_MEGS TABLESPACE_NAME
------ ------------ -------------- --------------- -------------- ---------------
WZHOU  PARTAB             P4       TABLE PARTITION        32.0    USERS
                                                   --------------
sum                                                       32.0

Archive Low Compression

WZHOU@dbm1> ALTER TABLE partab MOVE partition p4 nocompress;

Table altered.

Elapsed: 00:00:05.09

WZHOU@dbm1> ALTER TABLE partab MOVE partition p4 compress for archive low;

Table altered.

Elapsed: 00:00:13.60

WZHOU@dbm1> select table_owner,table_name,partition_name,compress_for from DBA_TAB_PARTITIONS
  where table_name = 'PARTAB' and partition_name='P4';

TABLE_OWNER TABLE_NAME PARTITION_NAME COMPRESS_FOR
----------- ---------- -------------- ------------
WZHOU       PARTAB     P4             ARCHIVE LOW

Elapsed: 00:00:00.00

WZHOU@dbm1> @part_size
Enter value for owner: WZHOU
Enter value for table_name: PARTAB
Enter value for partition_name: P4
Enter value for type:

OWNER  SEGMENT_NAME PARTITION_NAME TYPE            TOTALSIZE_MEGS TABLESPACE_NAME
------ ------------ -------------- --------------- -------------- ---------------
WZHOU  PARTAB             P4       TABLE PARTITION         32.0   USERS
                                                   --------------
sum                                                        32.0

Archive High Compression

WZHOU@dbm1> ALTER TABLE partab MOVE partition p4 nocompress;

Table altered.

Elapsed: 00:00:04.93

WZHOU@dbm1> ALTER TABLE partab MOVE partition p4 compress for archive high;

Table altered.

Elapsed: 00:01:13.33

WZHOU@dbm1> select table_owner,table_name,partition_name,compress_for from DBA_TAB_PARTITIONS
  where table_name = 'PARTAB' and partition_name='P4';

TABLE_OWNER TABLE_NAME PARTITION_NAME COMPRESS_FOR
----------- ---------- -------------- ------------
WZHOU       PARTAB     P4             ARCHIVE HIGH

Elapsed: 00:00:00.01

WZHOU@dbm1> @part_size
Enter value for owner: WZHOU
Enter value for table_name: PARTAB
Enter value for partition_name: P4
Enter value for type:

OWNER  SEGMENT_NAME PARTITION_NAME TYPE            TOTALSIZE_MEGS TABLESPACE_NAME
------ ------------ -------------- --------------- -------------- ---------------
WZHOU  PARTAB             P4       TABLE PARTITION        24.0    USERS
                                                   --------------
sum                                                       24.0

The timing varies quite a bit: most compression methods take 13~14 seconds (QUERY LOW takes under 7 seconds), while ARCHIVE HIGH takes about 73 seconds, roughly five times longer. The compression ratios are similar to what we saw for the non-partitioned table. The partition size drops from 528 MB uncompressed to 192 MB for BASIC and OLTP, and to 24~80 MB for the HCC methods.
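Here is a quick recap of the numbers above for partition P4 (528 MB uncompressed):

Method         MOVE time   P4 size after MOVE
-------------  ----------  ------------------
BASIC           14.00 s    192 MB
OLTP            14.22 s    192 MB
QUERY LOW        6.83 s     80 MB
QUERY HIGH      13.08 s     32 MB
ARCHIVE LOW     13.60 s     32 MB
ARCHIVE HIGH    73.33 s     24 MB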

Another interesting aspect of partition tables is that you can use different compression methods for different partitions. This is useful because it allows you to compress recent data with a faster method, while very old historical data can use a method with a longer processing time but a higher compression ratio. Here is a test showing different compression methods on different partitions.

WZHOU@dbm1> ALTER TABLE partab MOVE partition p1 compress for query low;

Table altered.

Elapsed: 00:00:01.88

WZHOU@dbm1> ALTER TABLE partab MOVE partition p2 compress for query high;

Table altered.

Elapsed: 00:00:05.75

WZHOU@dbm1> ALTER TABLE partab MOVE partition p3 compress for archive low;

Table altered.

Elapsed: 00:00:02.34

WZHOU@dbm1> select table_owner,table_name,partition_name,compress_for from DBA_TAB_PARTITIONS
     where table_name = 'PARTAB';

TABLE_OWNER TABLE_NAME PARTITION_NAME COMPRESS_FOR
----------- ---------- -------------- ------------
WZHOU       PARTAB     P1             QUERY LOW
WZHOU       PARTAB     P2             QUERY HIGH
WZHOU       PARTAB     P3             ARCHIVE LOW
WZHOU       PARTAB     P4             ARCHIVE HIGH

Elapsed: 00:00:00.04

WZHOU@dbm1> @part_size
Enter value for owner: WZHOU
Enter value for table_name: PARTAB
Enter value for partition_name:
Enter value for type:

OWNER  SEGMENT_NAME PARTITION_NAME TYPE            TOTALSIZE_MEGS TABLESPACE_NAME
------ ------------ -------------- --------------- -------------- ---------------
WZHOU  PARTAB             P1       TABLE PARTITION        16.0    USERS
WZHOU  PARTAB             P2       TABLE PARTITION        16.0    USERS
WZHOU  PARTAB             P3       TABLE PARTITION         8.0    USERS
WZHOU  PARTAB             P4       TABLE PARTITION        24.0    USERS
                                                   --------------
sum                                                       64.0

Seen enough code? In the next part, I am going to go back to the basics and discuss the layout of HCC, as well as some recommendations on when to use and when not to use HCC.

Compression Methods on Exadata: Use parallel to speed up compression – Part 3 of 6

parallalData

In part 1 and part 2 of the blog series, I showed examples of creating compressed tables using CREATE TABLE and ALTER TABLE MOVE. For the small tables in those examples, it might be OK not to use parallelism. But for big tables, parallelism is highly recommended, just like the highway in the photo above: you need many lanes to allow high throughput. I wish adding a new lane to a highway could be as easy as changing the DOP in SQL. This post discusses parallelism for the compression methods.

As the command is similar for the different compression methods, I pick one HCC method and test the timing for different DOP values. QUERY LOW gives a lower compression ratio and I am very unlikely to use it in production, so QUERY LOW is out. ARCHIVE HIGH can give a much better compression ratio, but its processing time is usually several times longer than the other HCC methods. Given that it could also take longer to uncompress the data when we need to move it to a non-Exadata environment, I would be very unlikely to use this method in production unless I really needed to save a significant amount of space. That leaves QUERY HIGH and ARCHIVE LOW. Both use the ZLIB (gzip) compression algorithm, and in my experience their timings and compression ratios are quite close. So either of them is fine for this test, and I choose the QUERY HIGH compression method as the base.

I also increase the size of the test table to 80 million rows.

Build Test Data


WZHOU@dbm1> create table paratab_no_parallel as select * from testcp;

Table created.

Elapsed: 00:00:09.17

WZHOU@dbm1> insert /*+ APPEND */ into paratab_no_parallel select * from paratab_no_parallel;

10000000 rows created.

Elapsed: 00:00:17.32

WZHOU@dbm1> insert /*+ APPEND */ into paratab_no_parallel select * from paratab_no_parallel;
insert /*+ APPEND */ into paratab_no_parallel select * from paratab_no_parallel
                          *
ERROR at line 1:
ORA-12838: cannot read/modify an object after modifying it in parallel

Elapsed: 00:00:00.00

It seems I need to commit first before I can do further insert operations. This is because the APPEND hint performs a direct-path insert, and Oracle does not allow you to read or modify the object in the same transaction after a direct-path insert, hence the ORA-12838 error.

WZHOU@dbm1> commit;

Commit complete.

Elapsed: 00:00:00.00

WZHOU@dbm1> insert /*+ APPEND */ into paratab_no_parallel select * from paratab_no_parallel;

20000000 rows created.

Elapsed: 00:00:24.68

WZHOU@dbm1> commit;

Commit complete.

Elapsed: 00:00:00.01

WZHOU@dbm1> insert /*+ APPEND */ into paratab_no_parallel select * from paratab_no_parallel;

40000000 rows created.

Elapsed: 00:00:44.70

WZHOU@dbm1> select count(*) from paratab_no_parallel
                     *
ERROR at line 1:
ORA-12838: cannot read/modify an object after modifying it in parallel

Elapsed: 00:00:00.01

WZHOU@dbm1> commit;

Commit complete.

Elapsed: 00:00:00.02

WZHOU@dbm1> select count(*) from paratab_no_parallel;

  COUNT(*)
----------
  80000000

Elapsed: 00:00:34.27

Let's find out the size of this table.

WZHOU@dbm1> @table_size
Enter value for owner: WZHOU
Enter value for table_name: PARATAB%
Enter value for type:

OWNER  SEGMENT_NAME         TYPE    TOTALSIZE_MEGS TABLESPACE_NAME
------ -------------------- ------- -------------- ---------------
WZHOU  PARATAB_NO_PARALLEL  TABLE          8,824.4 USERS
                                       --------------
sum                                        8,824.4

Elapsed: 00:00:00.09

Then create three more similar tables.

WZHOU@dbm1> create table paratab_dop_2 as select * from paratab_no_parallel;

Table created.

Elapsed: 00:01:09.56

WZHOU@dbm1> create table paratab_dop_4 as select * from paratab_no_parallel;

Table created.

Elapsed: 00:01:08.55

WZHOU@dbm1> create table paratab_dop_8 as select * from paratab_no_parallel;

Table created.

Elapsed: 00:01:08.86

Parallel Tests

WZHOU@dbm1> ALTER TABLE paratab_no_parallel MOVE COMPRESS for query high;

Table altered.

Elapsed: 00:03:32.30

WZHOU@dbm1> ALTER TABLE paratab_dop_2 MOVE COMPRESS for query high parallel 2;

Table altered.

Elapsed: 00:01:48.43

WZHOU@dbm1> ALTER TABLE paratab_dop_4 MOVE COMPRESS for query high parallel 4;

Table altered.

Elapsed: 00:00:57.07

WZHOU@dbm1> ALTER TABLE paratab_dop_8 MOVE COMPRESS for query high parallel 8;

Table altered.

Elapsed: 00:00:29.63

WZHOU@dbm1> @table_size
Enter value for owner: WZHOU
Enter value for table_name: PARATAB%
Enter value for type:

OWNER  SEGMENT_NAME         TYPE    TOTALSIZE_MEGS TABLESPACE_NAME
------ -------------------- ------- -------------- ---------------
WZHOU  PARATAB_DOP_2        TABLE           453.3  USERS
WZHOU  PARATAB_DOP_4        TABLE           454.2  USERS
WZHOU  PARATAB_DOP_8        TABLE           456.7  USERS
WZHOU  PARATAB_NO_PARALLEL  TABLE           464.0  USERS
                                    --------------
sum                                       1,828.2

As the DOP increases, the processing time drops almost linearly. The compression time for this roughly 9 GB table goes from three and a half minutes with no parallelism to less than half a minute with a DOP of 8. Another interesting observation is that the final compressed size is slightly different for different DOP values.
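One follow-up worth checking (an assumption to verify on your own system, not part of the test above): specifying PARALLEL in ALTER TABLE ... MOVE also sets the table's default degree of parallelism, so later queries against it may run in parallel unless you reset it.

-- Check the degree attribute left behind by the MOVE ... PARALLEL commands.
select table_name, degree from dba_tables where table_name like 'PARATAB%';

-- Reset to serial if parallel query is not wanted for the normal workload.
alter table paratab_dop_8 noparallel;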

Compression Methods on Exadata: Create compression table using ALTER TABLE MOVE – Part 2 of 6

moveData
In the last post, I showed examples of creating compression tables using the CREATE TABLE clause. In reality, the majority of the time we have existing tables in place and we just need to change them to compressed tables. It is important to know when the compression actually happens. For example, if you use the ALTER TABLE ... COMPRESS FOR QUERY HIGH command, there is NO change to the existing data in the table; only after you use the MOVE keyword to rebuild the segment does the compression work really happen. As usual, I show the different ways to use MOVE to compress tables below. Of course, in my examples I have no plan to use a dolly to move data. But if you have multiple petabytes or even exabytes of data, a dolly from a truck may be a faster way to move it than a slow network transfer over a WAN across the country.
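To make that distinction concrete, here is a hedged sketch using a hypothetical table name (not one of the tables below):

-- Changes only the compression attribute: existing rows stay as they are,
-- and only future direct-path loads into the table will be compressed.
ALTER TABLE some_existing_table COMPRESS FOR QUERY HIGH;

-- Rebuilds the segment and compresses the rows that are already there.
ALTER TABLE some_existing_table MOVE COMPRESS FOR QUERY HIGH;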

BASIC Compression:

WZHOU@dbm1> create table movetab_basic as select * from testcp;

Table created.

Elapsed: 00:00:09.38

WZHOU@dbm1> select table_name,logging,compression,compress_for from dba_tables
    where table_name='MOVETAB_BASIC';

TABLE_NAME             LOG COMPRESS COMPRESS_FOR
------------------------------ --- -------- ------------
MOVETAB_BASIC              YES DISABLED

Elapsed: 00:00:00.01

WZHOU@dbm1> ALTER TABLE movetab_basic MOVE COMPRESS;

Table altered.

Elapsed: 00:00:29.78

WZHOU@dbm1> select table_name,logging,compression,compress_for from dba_tables
  2  where table_name='MOVETAB_BASIC';

TABLE_NAME             LOG COMPRESS COMPRESS_FOR
------------------------------ --- -------- ------------
MOVETAB_BASIC              YES ENABLED  BASIC

Elapsed: 00:00:00.01

OLTP Compression:


WZHOU@dbm1> create table movetab_oltp as select * from testcp;

Table created.

Elapsed: 00:00:08.30

WZHOU@dbm1> ALTER TABLE movetab_oltp MOVE COMPRESS FOR oltp;

Table altered.

Elapsed: 00:00:29.03

WZHOU@dbm1> select table_name,logging,compression,compress_for from dba_tables
   where table_name='MOVETAB_OLTP';

TABLE_NAME             LOG COMPRESS COMPRESS_FOR
------------------------------ --- -------- ------------
MOVETAB_OLTP               YES ENABLED  OLTP

Elapsed: 00:00:00.00

Query Low Compression:

WZHOU@dbm1> create table movetab_query_low as select * from testcp;

Table created.

Elapsed: 00:00:08.82

WZHOU@dbm1> select table_name,logging,compression,compress_for from dba_tables
    where table_name='MOVETAB_QUERY_LOW';

TABLE_NAME             LOG COMPRESS COMPRESS_FOR
------------------------------ --- -------- ------------
MOVETAB_QUERY_LOW          YES DISABLED

Elapsed: 00:00:00.00

WZHOU@dbm1> ALTER TABLE movetab_query_low MOVE COMPRESS for query low;

Table altered.

Elapsed: 00:00:15.70

WZHOU@dbm1> select table_name,logging,compression,compress_for from dba_tables
    where table_name='MOVETAB_QUERY_LOW';

TABLE_NAME             LOG COMPRESS COMPRESS_FOR
------------------------------ --- -------- ------------
MOVETAB_QUERY_LOW          YES ENABLED  QUERY LOW

Elapsed: 00:00:00.01

Query High Compression:


WZHOU@dbm1> create table movetab_query_high as select * from testcp;

Table created.

Elapsed: 00:00:08.21

WZHOU@dbm1> select table_name,logging,compression,compress_for from dba_tables
  2  where table_name='MOVETAB_QUERY_HIGH';

TABLE_NAME             LOG COMPRESS COMPRESS_FOR
------------------------------ --- -------- ------------
MOVETAB_QUERY_HIGH         YES DISABLED

Elapsed: 00:00:00.00

WZHOU@dbm1> ALTER TABLE movetab_query_high MOVE COMPRESS for query high;

Table altered.

Elapsed: 00:00:27.60

WZHOU@dbm1> select table_name,logging,compression,compress_for from dba_tables
  2  where table_name='MOVETAB_QUERY_HIGH';

TABLE_NAME             LOG COMPRESS COMPRESS_FOR
------------------------------ --- -------- ------------
MOVETAB_QUERY_HIGH         YES ENABLED  QUERY HIGH

Elapsed: 00:00:00.00

Archive Low Compression:


WZHOU@dbm1> create table movetab_archive_low as select * from testcp;

Table created.

Elapsed: 00:00:08.74

WZHOU@dbm1> select table_name,logging,compression,compress_for from dba_tables
  2  where table_name='MOVETAB_ARCHIVE_LOW';

TABLE_NAME             LOG COMPRESS COMPRESS_FOR
------------------------------ --- -------- ------------
MOVETAB_ARCHIVE_LOW        YES DISABLED

Elapsed: 00:00:00.00

WZHOU@dbm1> ALTER TABLE movetab_archive_low MOVE COMPRESS for archive low;

Table altered.

Elapsed: 00:00:28.26

WZHOU@dbm1> select table_name,logging,compression,compress_for from dba_tables
  2  where table_name='MOVETAB_ARCHIVE_LOW';

TABLE_NAME             LOG COMPRESS COMPRESS_FOR
------------------------------ --- -------- ------------
MOVETAB_ARCHIVE_LOW        YES ENABLED  ARCHIVE LOW

Elapsed: 00:00:00.00

Archive High Compression:

WZHOU@dbm1> create table movetab_archive_high as select * from testcp;

Table created.

Elapsed: 00:00:08.23

WZHOU@dbm1> select table_name,logging,compression,compress_for from dba_tables
    where table_name='MOVETAB_ARCHIVE_HIGH';

TABLE_NAME             LOG COMPRESS COMPRESS_FOR
------------------------------ --- -------- ------------
MOVETAB_ARCHIVE_HIGH           YES DISABLED

Elapsed: 00:00:00.00

WZHOU@dbm1> ALTER TABLE movetab_archive_high MOVE COMPRESS for archive high;

Table altered.

Elapsed: 00:02:32.26
WZHOU@dbm1> select table_name,logging,compression,compress_for from dba_tables
    where table_name='MOVETAB_ARCHIVE_HIGH';

TABLE_NAME             LOG COMPRESS COMPRESS_FOR
------------------------------ --- -------- ------------
MOVETAB_ARCHIVE_HIGH           YES ENABLED  ARCHIVE HIGH

Elapsed: 00:00:00.00

Remember from the last post that the uncompressed table size is 1,152 MB. You can see that the table sizes after compression are quite similar to those from the CREATE TABLE approach in the last post.

WZHOU@dbm1> @table_size
Enter value for owner: WZHOU
Enter value for table_name: MOVE%
Enter value for type:

OWNER  SEGMENT_NAME          TYPE    TOTALSIZE_MEGS TABLESPACE_NAME
------ --------------------- ------- -------------- ---------------
WZHOU  MOVETAB_ARCHIVE_HIGH  TABLE             48.0 USERS
WZHOU  MOVETAB_ARCHIVE_LOW   TABLE             58.0 USERS
WZHOU  MOVETAB_QUERY_HIGH    TABLE             59.0 USERS
WZHOU  MOVETAB_QUERY_LOW     TABLE            160.0 USERS
WZHOU  MOVETAB_BASIC         TABLE            384.0 USERS
WZHOU  MOVETAB_OLTP          TABLE            424.0 USERS
                                     --------------
sum                                         1,133.0

6 rows selected.

[update 1]
So what happens if I need to access the table during the move? Can I still run SELECT or other DML operations against it while the MOVE is in progress? With this question in mind, I did a few tests to see how the table behaves during the MOVE operation. To make the MOVE take longer, I use the ARCHIVE HIGH compression method. I also opened 5 more db sessions to run SELECT, INSERT, UPDATE, and DELETE statements and to monitor the operations.

Session 1:

WZHOU@dbm1> create table movetab_chkop as select * from testcp;
Table created.
Elapsed: 00:00:08.32

Then in the same session, run the compression for archive high. During the compression process, run other operations in different sessions as follows:

ALTER TABLE movetab_chkop MOVE COMPRESS for archive high;

Session 2 (check SELECT):
select id, object_name from movetab_chkop where id = 10;
select count(*) from movetab_chkop;

Session 3 (check INSERT):
insert into movetab_chkop select * from testcp where id = 1;

Session 4 (check UPDATE):
update movetab_chkop set id = 1001 where id = 10000;

Session 5 (check DELETE):
delete from movetab_chkop where id = 5000;

Session 6 (Monitor the other sessions)
Run the as.sql script to check out the active sessions.
as.sql

The script to check who is blocking whom:

select l1.sid, ' IS BLOCKING ', l2.sid
from v$lock l1, v$lock l2
where l1.block =1 and l2.request > 0
and l1.id1=l2.id1
and l1.id2=l2.id2;

The results are listed as follows:

Session 1 (Do archive high compression)

WZHOU@dbm1> ALTER TABLE movetab_chkop MOVE COMPRESS for archive high;
Table altered.
Elapsed: 00:02:32.15

Session 2 (check SELECT)

WZHOU@dbm1> select id, object_name from movetab_chkop where id = 10;

	ID OBJECT_NAME
---------- ------------------------------
	10 I_OBJ5

Elapsed: 00:00:00.70

WZHOU@dbm1> select count(*) from movetab_chkop;

  COUNT(*)
----------
  10000000

Elapsed: 00:00:00.65

WZHOU@dbm1> select id, object_name from movetab_chkop where id = 10000;

	ID OBJECT_NAME
---------- ------------------------------
     10000 KU$_M_VIEW_SCM_T

Elapsed: 00:00:00.04

WZHOU@dbm1> select id, object_name from movetab_chkop where id = 5000;

	ID OBJECT_NAME
---------- ------------------------------
      5000 UTL_RECOMP_ALL_OBJECTS
Elapsed: 00:00:00.02

Session 3 (check INSERT)

WZHOU@dbm1> insert into movetab_chkop select * from testcp where id = 1;

1 row created.
Elapsed: 00:02:09.70

Session 4 (check UPDATE)

WZHOU@dbm1> update movetab_chkop set id = 1001 where id = 10000;

1 row updated.
Elapsed: 00:02:04.50

Session 5 (check DELETE)

WZHOU@dbm1> delete from movetab_chkop where id = 5000;

1 row deleted.
Elapsed: 00:01:59.17

Session 6 (Monitor the other sessions)

WZHOU@dbm1> @as

  SID	 SERIAL# PROG	    SQL_ID	   CHILD PLAN_HASH_VALUE      EXECS   AVG_ETIME OFF SQL_TEXT
----- ---------- ---------- ------------- ------ --------------- ---------- ----------- --- -----------------------------------------
 1167	     587 sqlplus@en 1vz2hfh9wccwd      0      3566611749	  0	    .00 No  insert into movetab_chkop select * from t
 1945	      15 sqlplus@en 8856nfxk46wv8      0       900447529	  0	    .00 No  delete from movetab_chkop where id = 5000
  976	     669 sqlplus@en 9qta3dvp1jp5f      0       143915756	  0	  67.22 Yes ALTER TABLE movetab_chkop MOVE COMPRESS f
 1751	      15 sqlplus@en b2qff928t8ca7      1      2858995896	  8	    .35 No  select sid, serial#,substr(program,1,19)
 1557	     341 sqlplus@en f23grns86fvmn      0      3466372318	  0	    .01 No  update movetab_chkop set id = 1001 where

Elapsed: 00:00:00.45

WZHOU@dbm1> select l1.sid, ' IS BLOCKING ', l2.sid
  from v$lock l1, v$lock l2
    where l1.block =1 and l2.request > 0
    and l1.id1=l2.id1
        and l1.id2=l2.id2;

  SID 'ISBLOCKING'    SID
----- ------------- -----
  976  IS BLOCKING   1167
  976  IS BLOCKING   1557
  976  IS BLOCKING   1945

Elapsed: 00:00:00.08

After the compression completes, all DML operations are unblocked.

WZHOU@dbm1> @as

  SID	 SERIAL# PROG	    SQL_ID	   CHILD PLAN_HASH_VALUE      EXECS   AVG_ETIME OFF SQL_TEXT
----- ---------- ---------- ------------- ------ --------------- ---------- ----------- --- -----------------------------------------
 1751	      15 sqlplus@en b2qff928t8ca7      1      2858995896	  9	    .36 No  select sid, serial#,substr(program,1,19)

Elapsed: 00:00:00.44

WZHOU@dbm1> select l1.sid, ' IS BLOCKING ', l2.sid
  from v$lock l1, v$lock l2
    where l1.block =1 and l2.request > 0
    and l1.id1=l2.id1
        and l1.id2=l2.id2;

no rows selected

From the above results, we can see that the MOVE operation blocks all DML operations, while SELECT operations are not blocked. So if you have a big table to compress, there will be a period during the move when all DML operations against it are blocked.
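If that DML blocking window is a concern, one alternative worth considering (a hedged sketch, not something tested in this post) is an online redefinition with DBMS_REDEFINITION, which copies the data into a compressed interim table while DML continues against the original:

-- Hypothetical online approach; check DBMS_REDEFINITION.CAN_REDEF_TABLE first.
create table movetab_chkop_int compress for archive high
  as select * from movetab_chkop where 1=2;

begin
    dbms_redefinition.start_redef_table(
        uname        => 'WZHOU',
        orig_table   => 'MOVETAB_CHKOP',
        int_table    => 'MOVETAB_CHKOP_INT',
        options_flag => dbms_redefinition.cons_use_rowid);  -- the test table has no primary key
    dbms_redefinition.finish_redef_table('WZHOU', 'MOVETAB_CHKOP', 'MOVETAB_CHKOP_INT');
end;
/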

Compression Methods on Exadata: Create compression table using CREATE TABLE – Part 1 of 6

dataCompression

I have done compression work on Exadata in the past, but never ran a benchmark to compare the different compression methods on Exadata. Of course, I don't need the tank shown above to do the compression, but you can see the tank compresses things really hard.

Let's start with the basic ones. The default compression is the BASIC method, which was introduced before 11g. OLTP compression was introduced in 11g. On Exadata, Oracle also offers Hybrid Columnar Compression (HCC), which includes the Query Low, Query High, Archive Low, and Archive High methods. This is a six-part blog series on compression methods on Exadata.

The tests were run on an Exadata X3 eighth rack. First, let's create a test table with 10 million rows. The create table script is based on Tom Kyte's test case (http://asktom.oracle.com) with slight modifications.

create table testcp
as
select rownum id, a.*
  from all_objects a
 where 1=0;

declare
    l_cnt  number;
    l_rows number := 10000000;
begin
    -- Copy ALL_OBJECTS
    insert /*+ append */
    into testcp
    select rownum, a.*
      from all_objects a;
    l_cnt := sql%rowcount;
    commit;

    -- Generate Rows
    while (l_cnt < l_rows)
    loop
        insert /*+ APPEND */ into testcp
        select rownum+l_cnt,
               OWNER, OBJECT_NAME, SUBOBJECT_NAME,
               OBJECT_ID, DATA_OBJECT_ID,
               OBJECT_TYPE, CREATED, LAST_DDL_TIME,
               TIMESTAMP, STATUS, TEMPORARY,
               GENERATED, SECONDARY, NAMESPACE, EDITION_NAME
          from testcp
         where rownum <= l_rows-l_cnt;
        l_cnt := l_cnt + sql%rowcount;
        commit;
    end loop;
end;
/

Let me verify the row count.

 WZHOU@dbm1> select count(*) from testcp;

COUNT(*)
----------
 10000000

My Enkitec colleagues have many nice scripts and I am going to use a few of them in my tests. The complete scripts used here are listed on the Script page.

OK, let's look at the size of the testcp table.

WZHOU@dbm1>  @table_size
Enter value for owner: WZHOU
Enter value for table_name: TESTCP
Enter value for type:

OWNER	SEGMENT_NAME	TYPE	     TOTALSIZE_MEGS TABLESPACE_NAME
------- --------------- ------------ -------------- -------------------------
WZHOU	TESTCP          TABLE              1,152.0  USERS
                                         ----------
sum                                        1,152.0

BASIC Compression:

WZHOU@dbm1> create table testcp_basic compress as select * from testcp where 1=2;
Table created.
Elapsed: 00:00:00.36

WZHOU@dbm1> select table_name,logging,compression,compress_for from dba_tables
  2  where table_name='TESTCP_BASIC';

TABLE_NAME		       LOG COMPRESS COMPRESS_FOR
------------------------------ --- -------- ------------
TESTCP_BASIC                   YES ENABLED  BASIC
Elapsed: 00:00:00.02

WZHOU@dbm1> insert /*+ append */ into testcp_basic select * from testcp;
10000000 rows created.
Elapsed: 00:00:28.18

WZHOU@dbm1> commit;
Commit complete.
Elapsed: 00:00:00.00

OLTP Compression:

WZHOU@dbm1> create table testcp_oltp compress for oltp as select * from testcp where 1=2;
Table created.
Elapsed: 00:00:00.33

WZHOU@dbm1> select table_name,logging,compression,compress_for from dba_tables
where table_name='TESTCP_OLTP';
TABLE_NAME		       LOG COMPRESS COMPRESS_FOR
------------------------------ --- -------- ------------
TESTCP_OLTP            YES ENABLED  OLTP
Elapsed: 00:00:00.02

WZHOU@dbm1> insert /*+ append */ into testcp_oltp select * from testcp;
10000000 rows created.
Elapsed: 00:00:28.87

WZHOU@dbm1> commit;
Commit complete.
Elapsed: 00:00:00.01

Exadata Query Low Compression

WZHOU@dbm1> create table testcp_query_low compress for query low as select * from testcp where 1=2;
Table created.
Elapsed: 00:00:00.34

WZHOU@dbm1> select table_name,logging,compression,compress_for from dba_tables
where table_name='TESTCP_QUERY_LOW';
TABLE_NAME             LOG COMPRESS COMPRESS_FOR
------------------------------ --- -------- ------------
TESTCP_QUERY_LOW       YES ENABLED  QUERY LOW
Elapsed: 00:00:00.02

WZHOU@dbm1> insert /*+ append */ into testcp_query_low select * from testcp;
10000000 rows created.
Elapsed: 00:00:15.41

WZHOU@dbm1> commit;
Commit complete.
Elapsed: 00:00:00.01

Exadata Query High Compression

WZHOU@dbm1> create table testcp_query_high compress for query high as select * from testcp where 1=2;
Table created.
Elapsed: 00:00:00.34

WZHOU@dbm1> select table_name,logging,compression,compress_for from dba_tables
    where table_name='TESTCP_QUERY_HIGH';

TABLE_NAME             LOG COMPRESS COMPRESS_FOR
------------------------------ --- -------- ------------
TESTCP_QUERY_HIGH      YES ENABLED  QUERY HIGH
Elapsed: 00:00:00.02

WZHOU@dbm1> insert /*+ append */ into testcp_query_high select * from testcp;
10000000 rows created.
Elapsed: 00:00:27.59

WZHOU@dbm1> commit;
Commit complete.
Elapsed: 00:00:00.00

Exadata Archive Low Compression

WZHOU@dbm1> create table testcp_archive_low compress for archive low as select * from testcp where 1=2;
Table created.
Elapsed: 00:00:00.34

WZHOU@dbm1> select table_name,logging,compression,compress_for from dba_tables
    where table_name='TESTCP_ARCHIVE_LOW';

TABLE_NAME             LOG COMPRESS COMPRESS_FOR
------------------------------ --- -------- ------------
TESTCP_ARCHIVE_LOW     YES ENABLED  ARCHIVE LOW
Elapsed: 00:00:00.02

WZHOU@dbm1> insert /*+ append */ into testcp_archive_low select * from testcp;
10000000 rows created.
Elapsed: 00:00:28.97

WZHOU@dbm1> commit;
Commit complete.
Elapsed: 00:00:00.01

Exadata Archive High Compression

WZHOU@dbm1> create table testcp_archive_high compress for archive high as select * from testcp where 1=2;
Table created.
Elapsed: 00:00:00.34

WZHOU@dbm1> select table_name,logging,compression,compress_for from dba_tables
    where table_name='TESTCP_ARCHIVE_HIGH';

TABLE_NAME             LOG COMPRESS COMPRESS_FOR
------------------------------ --- -------- ------------
TESTCP_ARCHIVE_HIGH        YES ENABLED  ARCHIVE HIGH
Elapsed: 00:00:00.02

WZHOU@dbm1> insert /*+ append */ into testcp_archive_high select * from testcp;
10000000 rows created.
Elapsed: 00:02:04.72

WZHOU@dbm1> commit;
Commit complete.
Elapsed: 00:00:00.01

So far, we have created six different compression tables. Let’s look at the table size for each of them.

WZHOU@dbm1> @table_size
Enter value for owner: WZHOU
Enter value for table_name: TESTCP%
Enter value for type:

OWNER  SEGMENT_NAME         TYPE    TOTALSIZE_MEGS TABLESPACE_NAME
------ -------------------- ------- -------------- ---------------
WZHOU  TESTCP_QUERY_HIGH    TABLE            59.0 USERS
WZHOU  TESTCP_ARCHIVE_HIGH  TABLE            59.0 USERS
WZHOU  TESTCP_ARCHIVE_LOW   TABLE            59.0 USERS
WZHOU  TESTCP_QUERY_LOW     TABLE           161.0 USERS
WZHOU  TESTCP_BASIC         TABLE           384.0 USERS
WZHOU  TESTCP_OLTP          TABLE           424.0 USERS
WZHOU  TESTCP               TABLE         1,152.0 USERS
                                       --------------
sum                                       2,298.0

7 rows selected.

It's pretty amazing to see the Exadata HCC compression methods achieve a 10~20x compression ratio. From the timing perspective, only Archive High stands out: the load takes roughly four times longer than with the other compression methods (about 2 minutes versus under 30 seconds).
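As a rough way to compute those ratios from the sizes above (a hedged sketch; 1,152 MB is the uncompressed TESTCP size reported earlier):

-- Approximate compression ratio of each test table versus the 1,152 MB uncompressed TESTCP.
select segment_name,
       round(bytes/1024/1024, 1)          as size_megs,
       round(1152 / (bytes/1024/1024), 1) as approx_ratio
  from dba_segments
 where owner = 'WZHOU'
   and segment_name like 'TESTCP\_%' escape '\'
 order by approx_ratio;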