Export data from Hive table to Oracle Database

In the last few posts, I discussed the following:
1. Install Cloudera Hadoop Cluster using Cloudera Manager
2. Configurations after CDH Installation
3. Load Data to Hive Table.
4. Import Data to Hive from Oracle Database

This post covers the reverse operation: exporting data from a Hive table to an Oracle database, just like a goldfish jumping from its bowl onto the laptop. Again, we use Sqoop to perform this operation.

The following example exports the data from the my_all_objects_sqoop table in Hive to the WZHOU.TEST_IMPORT_FROM_SCOOP table in the DBM database on X3.

1. Create WZHOU.TEST_IMPORT_FROM_SCOOP table in Oracle.

CREATE TABLE WZHOU.TEST_IMPORT_FROM_SCOOP
(
OWNER VARCHAR2(30),
OBJECT_NAME VARCHAR2(30),
OBJECT_ID NUMBER(10),
OBJECT_TYPE VARCHAR2(19),
CREATED_DATE VARCHAR2(30)
);

select count(*) from WZHOU.TEST_IMPORT_FROM_SCOOP;

2. Run the sqoop export job.

sqoop export \
--connect jdbc:oracle:thin:@enkx3-scan:1521:dbm1 \
--username wzhou \
--password wzhou \
--direct \
--export-dir '/user/hive/warehouse/test_oracle.db/my_all_objects_sqoop' \
--table WZHOU.TEST_IMPORT_FROM_SCOOP \
--fields-terminated-by '\001'

Note: the --export-dir argument specifies the location of the Hive table's source directory, not an individual filename. --fields-terminated-by '\001' must be included in the command; otherwise, you will see the following error:
Caused by: java.lang.RuntimeException: Can't parse input data: 'SYSTEMLOGMNR_REFCON$7604TABLE2013-03-12'
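The reason for that error is that Hive's default field delimiter for text tables is the non-printing Ctrl-A character (\001). Without the --fields-terminated-by flag, Sqoop cannot find the field boundaries, so the whole row looks like one fused token, exactly as in the error message above. A minimal Python sketch illustrates this:

```python
# A row as stored in the Hive warehouse file: fields separated by Ctrl-A (\x01).
row = "SYSTEM\x01LOGMNR_REFCON$\x017604\x01TABLE\x012013-03-12"

# Because \x01 is invisible when printed, the row appears as one fused
# string -- the same string shown in the RuntimeException above.
print(row.replace("\x01", ""))  # SYSTEMLOGMNR_REFCON$7604TABLE2013-03-12

# Splitting on the correct delimiter recovers the individual columns.
print(row.split("\x01"))  # ['SYSTEM', 'LOGMNR_REFCON$', '7604', 'TABLE', '2013-03-12']
```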

Result:

[wzhou@vmhost1 ~]$ sqoop export \
>     --connect jdbc:oracle:thin:@enkx3-scan:1521:dbm1 \
>     --username wzhou \
>     --password wzhou \
>     --direct \
>     --export-dir '/user/hive/warehouse/test_oracle.db/my_all_objects_sqoop' \
>     --table WZHOU.TEST_IMPORT_FROM_SCOOP  \
>     --fields-terminated-by '\001' 
Warning: /usr/lib/sqoop/../accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
15/09/22 07:02:46 INFO sqoop.Sqoop: Running Sqoop version: 1.4.5-cdh5.4.3
15/09/22 07:02:46 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
15/09/22 07:02:46 INFO manager.SqlManager: Using default fetchSize of 1000
15/09/22 07:02:47 INFO oracle.OraOopOracleQueries: Current schema is: WZHOU
15/09/22 07:02:48 INFO oracle.OraOopManagerFactory: 
**************************************************
*** Using Data Connector for Oracle and Hadoop ***
**************************************************
15/09/22 07:02:48 INFO oracle.OraOopManagerFactory: Oracle Database version: Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 - 64bit Production
15/09/22 07:02:48 INFO oracle.OraOopManagerFactory: This Oracle database is a RAC.
15/09/22 07:02:49 INFO oracle.OraOopManagerFactory: 	Database time on dbm1 is 2015-09-22 07:02:49.060215
15/09/22 07:02:50 INFO oracle.OraOopManagerFactory: 	Database time on dbm2 is 2015-09-22 07:02:50.03952
15/09/22 07:02:50 INFO oracle.OraOopManagerFactory: Data Connector for Oracle and Hadoop will load-balance sessions across the following instances of the Oracle RAC:
	Instance: dbm1 	 URL: jdbc:oracle:thin:@enkx3db01.enkitec.com:1521:dbm1
	Instance: dbm2 	 URL: jdbc:oracle:thin:@enkx3db02.enkitec.com:1521:dbm2

15/09/22 07:02:50 INFO Configuration.deprecation: mapred.map.max.attempts is deprecated. Instead, use mapreduce.map.maxattempts
15/09/22 07:02:50 INFO tool.CodeGenTool: Beginning code generation
15/09/22 07:02:50 INFO manager.SqlManager: Executing SQL statement: SELECT "OWNER","OBJECT_NAME","OBJECT_ID","OBJECT_TYPE","CREATED_DATE" FROM WZHOU.TEST_IMPORT_FROM_SCOOP WHERE 0=1
15/09/22 07:02:50 INFO manager.SqlManager: Executing SQL statement: SELECT OWNER,OBJECT_NAME,OBJECT_ID,OBJECT_TYPE,CREATED_DATE FROM WZHOU.TEST_IMPORT_FROM_SCOOP WHERE 1=0
15/09/22 07:02:50 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /usr/lib/hadoop-mapreduce
Note: /tmp/sqoop-wzhou/compile/f33da649a2f40c4d01ab1701ed0d21a5/WZHOU_TEST_IMPORT_FROM_SCOOP.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
15/09/22 07:02:52 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-wzhou/compile/f33da649a2f40c4d01ab1701ed0d21a5/WZHOU.TEST_IMPORT_FROM_SCOOP.jar
15/09/22 07:02:52 INFO mapreduce.ExportJobBase: Beginning export of WZHOU.TEST_IMPORT_FROM_SCOOP
15/09/22 07:02:52 INFO Configuration.deprecation: mapred.jar is deprecated. Instead, use mapreduce.job.jar
15/09/22 07:02:53 INFO manager.SqlManager: Executing SQL statement: SELECT OWNER,OBJECT_NAME,OBJECT_ID,OBJECT_TYPE,CREATED_DATE FROM WZHOU.TEST_IMPORT_FROM_SCOOP WHERE 1=0
15/09/22 07:02:53 INFO Configuration.deprecation: mapred.reduce.tasks.speculative.execution is deprecated. Instead, use mapreduce.reduce.speculative
15/09/22 07:02:53 INFO Configuration.deprecation: mapred.map.tasks.speculative.execution is deprecated. Instead, use mapreduce.map.speculative
15/09/22 07:02:53 INFO Configuration.deprecation: mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
15/09/22 07:02:53 INFO client.RMProxy: Connecting to ResourceManager at vmhost1.local/192.168.56.71:8032
15/09/22 07:02:55 INFO input.FileInputFormat: Total input paths to process : 5
15/09/22 07:02:56 INFO input.FileInputFormat: Total input paths to process : 5
15/09/22 07:02:56 INFO mapreduce.JobSubmitter: number of splits:4
15/09/22 07:02:56 INFO Configuration.deprecation: mapred.map.tasks.speculative.execution is deprecated. Instead, use mapreduce.map.speculative
15/09/22 07:02:56 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1442921609180_0004
15/09/22 07:02:56 INFO impl.YarnClientImpl: Submitted application application_1442921609180_0004
15/09/22 07:02:56 INFO mapreduce.Job: The url to track the job: http://vmhost1.local:8088/proxy/application_1442921609180_0004/
15/09/22 07:02:56 INFO mapreduce.Job: Running job: job_1442921609180_0004
15/09/22 07:03:05 INFO mapreduce.Job: Job job_1442921609180_0004 running in uber mode : false
15/09/22 07:03:05 INFO mapreduce.Job:  map 0% reduce 0%
15/09/22 07:03:14 INFO mapreduce.Job:  map 25% reduce 0%
15/09/22 07:03:38 INFO mapreduce.Job:  map 45% reduce 0%
15/09/22 07:03:39 INFO mapreduce.Job:  map 50% reduce 0%
15/09/22 07:04:24 INFO mapreduce.Job:  map 73% reduce 0%
15/09/22 07:04:25 INFO mapreduce.Job:  map 75% reduce 0%
15/09/22 07:04:54 INFO mapreduce.Job:  map 100% reduce 0%
15/09/22 07:04:54 INFO mapreduce.Job: Job job_1442921609180_0004 completed successfully
15/09/22 07:04:54 INFO mapreduce.Job: Counters: 31
	File System Counters
		FILE: Number of bytes read=0
		FILE: Number of bytes written=550000
		FILE: Number of read operations=0
		FILE: Number of large read operations=0
		FILE: Number of write operations=0
		HDFS: Number of bytes read=1176438
		HDFS: Number of bytes written=0
		HDFS: Number of read operations=25
		HDFS: Number of large read operations=0
		HDFS: Number of write operations=0
	Job Counters 
		Launched map tasks=4
		Data-local map tasks=1
		Rack-local map tasks=3
		Total time spent by all maps in occupied slots (ms)=102232
		Total time spent by all reduces in occupied slots (ms)=0
		Total time spent by all map tasks (ms)=102232
		Total vcore-seconds taken by all map tasks=102232
		Total megabyte-seconds taken by all map tasks=104685568
	Map-Reduce Framework
		Map input records=22519
		Map output records=22519
		Input split bytes=1033
		Spilled Records=0
		Failed Shuffles=0
		Merged Map outputs=0
		GC time elapsed (ms)=421
		CPU time spent (ms)=6480
		Physical memory (bytes) snapshot=481697792
		Virtual memory (bytes) snapshot=6458949632
		Total committed heap usage (bytes)=127401984
	File Input Format Counters 
		Bytes Read=0
	File Output Format Counters 
		Bytes Written=0
15/09/22 07:04:54 INFO mapreduce.ExportJobBase: Transferred 1.1219 MB in 121.1924 seconds (9.4797 KB/sec)
15/09/22 07:04:55 INFO mapreduce.ExportJobBase: Exported 22519 records.

3. Verify the result on Exadata. The row count and the sample rows both look good.

SYS:dbm1> select count(*) from WZHOU.TEST_IMPORT_FROM_SCOOP;

   COUNT(*)
-----------
      22519

SYS:dbm1> select * from WZHOU.TEST_IMPORT_FROM_SCOOP where OWNER='WZHOU' and rownum < 4;

OWNER           OBJECT_NAME      OBJECT_ID   OBJECT_TYPE   CREATED_DATE
--------------- ---------------- ----------- ------------- -----------------
WZHOU           T1               29436       TABLE         2013-07-30
WZHOU           MOVETAB_BASIC    28832       TABLE         2013-07-23
WZHOU           MOVETAB_OLTP     28834       TABLE         2013-07-23
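Beyond the row count, it can help to confirm up front that the exported rows actually fit the Oracle column definitions from step 1, since an over-length field would fail the export with an ORA error. Below is a minimal pre-flight sketch; the sample rows are hypothetical, and the column widths mirror the CREATE TABLE statement above:

```python
# Column widths from the WZHOU.TEST_IMPORT_FROM_SCOOP DDL in step 1.
# None means no length check (OBJECT_ID is NUMBER(10)).
ORACLE_COLS = [
    ("OWNER", 30),
    ("OBJECT_NAME", 30),
    ("OBJECT_ID", None),
    ("OBJECT_TYPE", 19),
    ("CREATED_DATE", 30),
]

def check_row(line):
    """Return True if a \x01-delimited Hive row fits the Oracle table."""
    fields = line.rstrip("\n").split("\x01")
    if len(fields) != len(ORACLE_COLS):
        return False
    return all(width is None or len(value) <= width
               for (name, width), value in zip(ORACLE_COLS, fields))

# Hypothetical sample row matching the query output above.
sample = "WZHOU\x01T1\x0129436\x01TABLE\x012013-07-30"
print(check_row(sample))  # True
```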

In the next post, I am going to discuss how to use Impala to access the same Hive data and compare query performance between Hive and Impala.

