In the last few posts, I discussed the following:
1. Install Cloudera Hadoop Cluster using Cloudera Manager
2. Configurations after CDH Installation
3. Load Data to Hive Table
4. Import Data to Hive from Oracle Database
This post discusses the reverse operation: exporting data from a Hive table to an Oracle database, just like a goldfish jumping from the fishbowl onto the laptop. Again, we use Sqoop to perform this operation.
The following example will export the data from my_all_objects_sqoop table in Hive to WZHOU.TEST_IMPORT_FROM_SCOOP table in DBM database on X3.
1. Create WZHOU.TEST_IMPORT_FROM_SCOOP table in Oracle.
CREATE TABLE WZHOU.TEST_IMPORT_FROM_SCOOP
(
OWNER VARCHAR2(30),
OBJECT_NAME VARCHAR2(30),
OBJECT_ID NUMBER(10),
OBJECT_TYPE VARCHAR2(19),
CREATED_DATE VARCHAR2(30)
);
select count(*) from WZHOU.TEST_IMPORT_FROM_SCOOP;
2. Run the sqoop export job.
sqoop export \
--connect jdbc:oracle:thin:@enkx3-scan:1521:dbm1 \
--username wzhou \
--password wzhou \
--direct \
--export-dir '/user/hive/warehouse/test_oracle.db/my_all_objects_sqoop' \
--table WZHOU.TEST_IMPORT_FROM_SCOOP \
--fields-terminated-by '\001'
Note: the --export-dir argument specifies the location of the Hive source directory, not an individual filename. --fields-terminated-by '\001' must be included in the command; otherwise, you will see the following error:
Caused by: java.lang.RuntimeException: Can't parse input data: 'SYSTEMLOGMNR_REFCON$7604TABLE2013-03-12'
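Hive's default text format joins fields with the non-printing Ctrl-A character (octal \001), which is why the columns in the error message above appear run together. A quick local sketch (the sample row below is invented to mimic the error message) shows how the delimiter separates the fields:

```shell
# Write one sample line the way Hive's default text format stores it:
# fields joined by the Ctrl-A (octal \001) control character.
printf 'SYSTEM\001LOGMNR_REFCON$\0017604\001TABLE\0012013-03-12\n' > sample_row.txt

# Splitting on \001 recovers the five individual columns.
awk -F'\001' '{print "owner="$1, "name="$2, "id="$3, "type="$4, "created="$5}' sample_row.txt
```

Without --fields-terminated-by '\001', Sqoop assumes comma-separated input, sees the whole line as a single field, and fails to parse it.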
Result:
[wzhou@vmhost1 ~]$ sqoop export \
> --connect jdbc:oracle:thin:@enkx3-scan:1521:dbm1 \
> --username wzhou \
> --password wzhou \
> --direct \
> --export-dir '/user/hive/warehouse/test_oracle.db/my_all_objects_sqoop' \
> --table WZHOU.TEST_IMPORT_FROM_SCOOP \
> --fields-terminated-by '\001'
Warning: /usr/lib/sqoop/../accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
15/09/22 07:02:46 INFO sqoop.Sqoop: Running Sqoop version: 1.4.5-cdh5.4.3
15/09/22 07:02:46 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
15/09/22 07:02:46 INFO manager.SqlManager: Using default fetchSize of 1000
15/09/22 07:02:47 INFO oracle.OraOopOracleQueries: Current schema is: WZHOU
15/09/22 07:02:48 INFO oracle.OraOopManagerFactory:
**************************************************
*** Using Data Connector for Oracle and Hadoop ***
**************************************************
15/09/22 07:02:48 INFO oracle.OraOopManagerFactory: Oracle Database version: Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 - 64bit Production
15/09/22 07:02:48 INFO oracle.OraOopManagerFactory: This Oracle database is a RAC.
15/09/22 07:02:49 INFO oracle.OraOopManagerFactory: Database time on dbm1 is 2015-09-22 07:02:49.060215
15/09/22 07:02:50 INFO oracle.OraOopManagerFactory: Database time on dbm2 is 2015-09-22 07:02:50.03952
15/09/22 07:02:50 INFO oracle.OraOopManagerFactory: Data Connector for Oracle and Hadoop will load-balance sessions across the following instances of the Oracle RAC:
    Instance: dbm1 URL: jdbc:oracle:thin:@enkx3db01.enkitec.com:1521:dbm1
    Instance: dbm2 URL: jdbc:oracle:thin:@enkx3db02.enkitec.com:1521:dbm2
15/09/22 07:02:50 INFO Configuration.deprecation: mapred.map.max.attempts is deprecated. Instead, use mapreduce.map.maxattempts
15/09/22 07:02:50 INFO tool.CodeGenTool: Beginning code generation
15/09/22 07:02:50 INFO manager.SqlManager: Executing SQL statement: SELECT "OWNER","OBJECT_NAME","OBJECT_ID","OBJECT_TYPE","CREATED_DATE" FROM WZHOU.TEST_IMPORT_FROM_SCOOP WHERE 0=1
15/09/22 07:02:50 INFO manager.SqlManager: Executing SQL statement: SELECT OWNER,OBJECT_NAME,OBJECT_ID,OBJECT_TYPE,CREATED_DATE FROM WZHOU.TEST_IMPORT_FROM_SCOOP WHERE 1=0
15/09/22 07:02:50 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /usr/lib/hadoop-mapreduce
Note: /tmp/sqoop-wzhou/compile/f33da649a2f40c4d01ab1701ed0d21a5/WZHOU_TEST_IMPORT_FROM_SCOOP.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
15/09/22 07:02:52 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-wzhou/compile/f33da649a2f40c4d01ab1701ed0d21a5/WZHOU.TEST_IMPORT_FROM_SCOOP.jar
15/09/22 07:02:52 INFO mapreduce.ExportJobBase: Beginning export of WZHOU.TEST_IMPORT_FROM_SCOOP
15/09/22 07:02:52 INFO Configuration.deprecation: mapred.jar is deprecated. Instead, use mapreduce.job.jar
15/09/22 07:02:53 INFO manager.SqlManager: Executing SQL statement: SELECT OWNER,OBJECT_NAME,OBJECT_ID,OBJECT_TYPE,CREATED_DATE FROM WZHOU.TEST_IMPORT_FROM_SCOOP WHERE 1=0
15/09/22 07:02:53 INFO Configuration.deprecation: mapred.reduce.tasks.speculative.execution is deprecated. Instead, use mapreduce.reduce.speculative
15/09/22 07:02:53 INFO Configuration.deprecation: mapred.map.tasks.speculative.execution is deprecated. Instead, use mapreduce.map.speculative
15/09/22 07:02:53 INFO Configuration.deprecation: mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
15/09/22 07:02:53 INFO client.RMProxy: Connecting to ResourceManager at vmhost1.local/192.168.56.71:8032
15/09/22 07:02:55 INFO input.FileInputFormat: Total input paths to process : 5
15/09/22 07:02:56 INFO input.FileInputFormat: Total input paths to process : 5
15/09/22 07:02:56 INFO mapreduce.JobSubmitter: number of splits:4
15/09/22 07:02:56 INFO Configuration.deprecation: mapred.map.tasks.speculative.execution is deprecated. Instead, use mapreduce.map.speculative
15/09/22 07:02:56 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1442921609180_0004
15/09/22 07:02:56 INFO impl.YarnClientImpl: Submitted application application_1442921609180_0004
15/09/22 07:02:56 INFO mapreduce.Job: The url to track the job: http://vmhost1.local:8088/proxy/application_1442921609180_0004/
15/09/22 07:02:56 INFO mapreduce.Job: Running job: job_1442921609180_0004
15/09/22 07:03:05 INFO mapreduce.Job: Job job_1442921609180_0004 running in uber mode : false
15/09/22 07:03:05 INFO mapreduce.Job: map 0% reduce 0%
15/09/22 07:03:14 INFO mapreduce.Job: map 25% reduce 0%
15/09/22 07:03:38 INFO mapreduce.Job: map 45% reduce 0%
15/09/22 07:03:39 INFO mapreduce.Job: map 50% reduce 0%
15/09/22 07:04:24 INFO mapreduce.Job: map 73% reduce 0%
15/09/22 07:04:25 INFO mapreduce.Job: map 75% reduce 0%
15/09/22 07:04:54 INFO mapreduce.Job: map 100% reduce 0%
15/09/22 07:04:54 INFO mapreduce.Job: Job job_1442921609180_0004 completed successfully
15/09/22 07:04:54 INFO mapreduce.Job: Counters: 31
    File System Counters
        FILE: Number of bytes read=0
        FILE: Number of bytes written=550000
        FILE: Number of read operations=0
        FILE: Number of large read operations=0
        FILE: Number of write operations=0
        HDFS: Number of bytes read=1176438
        HDFS: Number of bytes written=0
        HDFS: Number of read operations=25
        HDFS: Number of large read operations=0
        HDFS: Number of write operations=0
    Job Counters
        Launched map tasks=4
        Data-local map tasks=1
        Rack-local map tasks=3
        Total time spent by all maps in occupied slots (ms)=102232
        Total time spent by all reduces in occupied slots (ms)=0
        Total time spent by all map tasks (ms)=102232
        Total vcore-seconds taken by all map tasks=102232
        Total megabyte-seconds taken by all map tasks=104685568
    Map-Reduce Framework
        Map input records=22519
        Map output records=22519
        Input split bytes=1033
        Spilled Records=0
        Failed Shuffles=0
        Merged Map outputs=0
        GC time elapsed (ms)=421
        CPU time spent (ms)=6480
        Physical memory (bytes) snapshot=481697792
        Virtual memory (bytes) snapshot=6458949632
        Total committed heap usage (bytes)=127401984
    File Input Format Counters
        Bytes Read=0
    File Output Format Counters
        Bytes Written=0
15/09/22 07:04:54 INFO mapreduce.ExportJobBase: Transferred 1.1219 MB in 121.1924 seconds (9.4797 KB/sec)
15/09/22 07:04:55 INFO mapreduce.ExportJobBase: Exported 22519 records.
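The log above warns that passing the password on the command line is insecure. Besides -P (interactive prompt), Sqoop accepts a --password-file option. A minimal sketch, assuming a local home-directory path (Sqoop also accepts an HDFS URI here):

```shell
# Store the password (example value) in a file readable only by the owner.
# printf '%s' writes no trailing newline; the Sqoop docs warn that any
# trailing whitespace in the file is treated as part of the password.
PWFILE="$HOME/.sqoop_pwd"
printf '%s' 'wzhou' > "$PWFILE"
chmod 400 "$PWFILE"
```

The export command can then replace --password wzhou with --password-file file://$HOME/.sqoop_pwd.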
3. Verify the result on Exadata. The row count matches and the data looks good.
SYS:dbm1> select count(*) from WZHOU.TEST_IMPORT_FROM_SCOOP;

  COUNT(*)
-----------
     22519

SYS:dbm1> select * from WZHOU.TEST_IMPORT_FROM_SCOOP where OWNER='WZHOU' and rownum < 4;

OWNER           OBJECT_NAME      OBJECT_ID   OBJECT_TYPE   CREATED_DATE
--------------- ---------------- ----------- ------------- -----------------
WZHOU           T1               29436       TABLE         2013-07-30
WZHOU           MOVETAB_BASIC    28832       TABLE         2013-07-23
WZHOU           MOVETAB_OLTP     28834       TABLE         2013-07-23
In the next post, I am going to discuss how to use Impala to access the same Hive data, and compare the performance of Hive and Impala.