In the last three blogs below, I mainly discussed how to query data between BDA and Exadata and a way to load data from BDA to Exadata using OLH.
Data Query between BDA and Exadata (Part 1): Query Hive Table from Oracle on Exadata
Data Query between BDA and Exadata (Part 2): Query Oracle Table from Hive on BDA
Data Query between BDA and Exadata (Part 3): Load data from BDA to Oracle Table on Exadata Using OLH
When moving data from Oracle Database to Hadoop, I discussed the approach using Sqoop in this blog. There is actually another better approach on BDA by using Copy to BDA. Copy to BDA allows you to copy tables from an Oracle database into Hadoop in the format of regular Oracle Data Pump format, dmp file. After generating Data Pump format files from the tables and copying the files to HDFS, you can use Hive to query the data locally without accessing remote Oracle database on Exadata.
Ok, here is the overview of my plan to do the work. I am going to use the same Oracle table, ORA_TARGET_STOCK_PRICE1, used in my last blog on Exadata to generate an Oracle Data Pump dmp file. Then copy the Data Pump format file to the BDA and load it into HDFS. After the dmp file is available on HDFS, link the dmp file to a Hive external table. Then you can query the Oracle Data Pump file directly by using Hive on BDA.
Here are the detail steps:
On Exadata:
Step 1: Create db directory and generate Oracle Data Pump format file
mkdir -p /home/oracle/wzhou/mytest/expdir sqlplus / as sysdba CREATE DIRECTORY exp_2hdfs_dir AS '/home/oracle/wzhou/mytest/expdir'; GRANT READ, WRITE ON DIRECTORY exp_2hdfs_dir TO WZHOU;
Step 2: Create data pump format files for the data
sqlplus wzhou/wzhou WZHOU:SQL> SELECT COUNT(*) FROM ORA_TARGET_STOCK_PRICE1; COUNT(*) ----------- 65 WZHOU:SQL> CREATE TABLE EXT_DMP_ORA_TEST2 ORGANIZATION EXTERNAL ( TYPE oracle_datapump DEFAULT DIRECTORY exp_2hdfs_dir LOCATION('exa_ora_dmp_test2.dmp') ) AS SELECT * FROM ORA_TARGET_STOCK_PRICE1; 2 3 4 5 6 7 8 Table created.
Verify the result
WZHOU:SQL> desc EXT_DMP_ORA_TEST2; Name Null? Type ------------------ -------- ----------------- STOCK_DATE VARCHAR2(20) CLOSE_PRICE NUMBER(8,3) VOLUME NUMBER(8) OPEN_PRICE NUMBER(8,3) HIGH_PRICE NUMBER(8,3) LOW_PRICE NUMBER(8,3) WZHOU:SQL> select count(*) from EXT_DMP_ORA_TEST2; COUNT(*) ----------- 65 WZHOU:SQL> !ls -l /home/oracle/wzhou/mytest/expdir total 44 -rw-r----- 1 oracle dba 16384 Nov 5 19:12 exa_ora_dmp_test2.dmp -rw-r--r-- 1 oracle dba 258 Nov 5 19:15 EXT_DMP_ORA_TEST2_180093.log
Step 3: Copy the dump file to Hadoop HDFS
. $ORACLE_HOME/bigdatasql/hadoop_enkbda.env cd /home/oracle/wzhou/mytest/expdir ls *.dmp hadoop fs -mkdir /user/oracle/mytest/dmp hadoop fs -put *.dmp /user/oracle/mytest/dmp hadoop fs -ls /user/oracle/mytest/dmp
Ok, I am done with the work on Exadata and move my work to BDA side
On BDA
Step 4: Create the mapping external table in Hive
[root@enkbda1node01 ~]# su – oracle
[oracle@enkbda1node01 ~]$ hive
hive> use default;
OK
Time taken: 0.486 seconds
hive>
CREATE EXTERNAL TABLE ext_ora_dump_test2
ROW FORMAT SERDE ‘oracle.hadoop.hive.datapump.DPSerDe’
STORED AS
INPUTFORMAT ‘oracle.hadoop.hive.datapump.DPInputFormat’
OUTPUTFORMAT ‘org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat’
LOCATION ‘/user/oracle/mytest/dmp’;
OK
Time taken: 0.48 seconds
Step 5: Querying the Data in Hive
hive> DESCRIBE ext_ora_dump_test2; OK stock_date varchar(20) from deserializer close_price decimal(8,3) from deserializer volume int from deserializer open_price decimal(8,3) from deserializer high_price decimal(8,3) from deserializer low_price decimal(8,3) from deserializer Time taken: 0.188 seconds, Fetched: 6 row(s) hive> show create table ext_ora_dump_test2; OK CREATE EXTERNAL TABLE `ext_ora_dump_test2`( `stock_date` varchar(20) COMMENT 'from deserializer', `close_price` decimal(8,3) COMMENT 'from deserializer', `volume` int COMMENT 'from deserializer', `open_price` decimal(8,3) COMMENT 'from deserializer', `high_price` decimal(8,3) COMMENT 'from deserializer', `low_price` decimal(8,3) COMMENT 'from deserializer') ROW FORMAT SERDE 'oracle.hadoop.hive.datapump.DPSerDe' STORED AS INPUTFORMAT 'oracle.hadoop.hive.datapump.DPInputFormat' OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat' LOCATION 'hdfs://enkbda-ns/user/oracle/mytest/dmp' TBLPROPERTIES ( 'COLUMN_STATS_ACCURATE'='false', 'numFiles'='0', 'numRows'='-1', 'rawDataSize'='-1', 'totalSize'='0', 'transient_lastDdlTime'='1478391818') Time taken: 0.092 seconds, Fetched: 22 row(s) hive> select * from ext_ora_dump_test2 limit 3; OK 9/23/16 24.05 56837 24.13 24.22 23.88 9/22/16 24.1 56675 23.49 24.18 23.49 9/21/16 23.38 70925 23.21 23.58 23.025 Time taken: 0.237 seconds, Fetched: 3 row(s)
Ok, I can get the result from ora dump file in Hive. Nice feature.
Additional Note:
You must be logged in to post a comment.