Data Query between BDA and Exadata (Part 4): Query Oracle Dump File on BDA Using Copy2BDA

In the three blogs below, I discussed how to query data between BDA and Exadata, as well as a way to load data from BDA to Exadata using OLH.
Data Query between BDA and Exadata (Part 1): Query Hive Table from Oracle on Exadata
Data Query between BDA and Exadata (Part 2): Query Oracle Table from Hive on BDA
Data Query between BDA and Exadata (Part 3): Load data from BDA to Oracle Table on Exadata Using OLH

For moving data from Oracle Database to Hadoop, I discussed the Sqoop approach in this blog. On BDA there is actually a better approach: Copy to BDA. Copy to BDA lets you copy tables from an Oracle database into Hadoop as regular Oracle Data Pump (dmp) files. After generating the Data Pump files from the tables and copying them to HDFS, you can use Hive to query the data locally without accessing the remote Oracle database on Exadata.

Ok, here is an overview of my plan. I am going to use ORA_TARGET_STOCK_PRICE1, the same Oracle table used in my last blog, to generate an Oracle Data Pump dmp file on Exadata. Then I copy the dmp file to the BDA and load it into HDFS. Once the dmp file is available on HDFS, I map it to a Hive external table, and the Oracle Data Pump file can then be queried directly from Hive on BDA.

Here are the detailed steps:
On Exadata:
Step 1: Create db directory and generate Oracle Data Pump format file

mkdir -p /home/oracle/wzhou/mytest/expdir
sqlplus / as sysdba
CREATE DIRECTORY exp_2hdfs_dir AS '/home/oracle/wzhou/mytest/expdir';
GRANT READ, WRITE ON DIRECTORY exp_2hdfs_dir TO WZHOU;

Step 2: Create data pump format files for the data

sqlplus wzhou/wzhou
WZHOU:SQL> SELECT COUNT(*) FROM ORA_TARGET_STOCK_PRICE1;

   COUNT(*)
-----------
         65
         
WZHOU:SQL> CREATE TABLE EXT_DMP_ORA_TEST2
ORGANIZATION EXTERNAL
(                    
   TYPE oracle_datapump
   DEFAULT DIRECTORY exp_2hdfs_dir
   LOCATION('exa_ora_dmp_test2.dmp')
)
AS SELECT * FROM ORA_TARGET_STOCK_PRICE1;
Table created.

Verify the result

WZHOU:SQL> desc EXT_DMP_ORA_TEST2;
 Name               Null?    Type
 ------------------ -------- -----------------
 STOCK_DATE                  VARCHAR2(20)
 CLOSE_PRICE                 NUMBER(8,3)
 VOLUME                      NUMBER(8)
 OPEN_PRICE                  NUMBER(8,3)
 HIGH_PRICE                  NUMBER(8,3)
 LOW_PRICE                   NUMBER(8,3)

WZHOU:SQL> select count(*) from EXT_DMP_ORA_TEST2;
   COUNT(*)
-----------
         65

WZHOU:SQL> !ls -l /home/oracle/wzhou/mytest/expdir
total 44
-rw-r----- 1 oracle dba 16384 Nov  5 19:12 exa_ora_dmp_test2.dmp
-rw-r--r-- 1 oracle dba   258 Nov  5 19:15 EXT_DMP_ORA_TEST2_180093.log

Step 3: Copy the dump file to Hadoop HDFS

. $ORACLE_HOME/bigdatasql/hadoop_enkbda.env
cd /home/oracle/wzhou/mytest/expdir
ls *.dmp
hadoop fs -mkdir /user/oracle/mytest/dmp
hadoop fs -put *.dmp /user/oracle/mytest/dmp
hadoop fs -ls /user/oracle/mytest/dmp

Ok, I am done with the work on Exadata and can now move to the BDA side.
On BDA:
Step 4: Create the mapping external table in Hive
[root@enkbda1node01 ~]# su - oracle
[oracle@enkbda1node01 ~]$ hive
hive> use default;
OK
Time taken: 0.486 seconds
hive>
CREATE EXTERNAL TABLE ext_ora_dump_test2
ROW FORMAT SERDE 'oracle.hadoop.hive.datapump.DPSerDe'
STORED AS
INPUTFORMAT 'oracle.hadoop.hive.datapump.DPInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION '/user/oracle/mytest/dmp';

OK
Time taken: 0.48 seconds

Step 5: Querying the Data in Hive

hive> DESCRIBE ext_ora_dump_test2;
OK
stock_date          	varchar(20)         	from deserializer   
close_price         	decimal(8,3)        	from deserializer   
volume              	int                 	from deserializer   
open_price          	decimal(8,3)        	from deserializer   
high_price          	decimal(8,3)        	from deserializer   
low_price           	decimal(8,3)        	from deserializer   
Time taken: 0.188 seconds, Fetched: 6 row(s)

hive> show create table  ext_ora_dump_test2;
OK
CREATE EXTERNAL TABLE `ext_ora_dump_test2`(
  `stock_date` varchar(20) COMMENT 'from deserializer', 
  `close_price` decimal(8,3) COMMENT 'from deserializer', 
  `volume` int COMMENT 'from deserializer', 
  `open_price` decimal(8,3) COMMENT 'from deserializer', 
  `high_price` decimal(8,3) COMMENT 'from deserializer', 
  `low_price` decimal(8,3) COMMENT 'from deserializer')
ROW FORMAT SERDE 
  'oracle.hadoop.hive.datapump.DPSerDe' 
STORED AS INPUTFORMAT 
  'oracle.hadoop.hive.datapump.DPInputFormat' 
OUTPUTFORMAT 
  'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION
  'hdfs://enkbda-ns/user/oracle/mytest/dmp'
TBLPROPERTIES (
  'COLUMN_STATS_ACCURATE'='false', 
  'numFiles'='0', 
  'numRows'='-1', 
  'rawDataSize'='-1', 
  'totalSize'='0', 
  'transient_lastDdlTime'='1478391818')
Time taken: 0.092 seconds, Fetched: 22 row(s)

hive> select * from ext_ora_dump_test2 limit 3;
OK
9/23/16	24.05	56837	24.13	24.22	23.88
9/22/16	24.1	56675	23.49	24.18	23.49
9/21/16	23.38	70925	23.21	23.58	23.025
Time taken: 0.237 seconds, Fetched: 3 row(s)

Ok, I can get the result from the Oracle dump file in Hive. Nice feature.

Additional Note:

  • In my test, I use only one dmp file. You can actually generate multiple dmp files to speed up both the export and the data access on Hadoop. The syntax for multiple dmp files is LOCATION('file1.dmp', 'file2.dmp'); see the sketch after this list.
  • This feature is quite useful when you want to reuse the same Oracle dmp file on both sides. For example, you can use the dmp file on HDFS to query data directly on BDA without connecting to the Oracle database on Exadata. At the same time, if you want to reuse the same data on Exadata, you can easily reimport the same dmp file into the Oracle database there.
  • When the source Oracle table changes, you regenerate the dmp file on Exadata and then refresh the copy in Hadoop (also covered in the sketch below). Copy to BDA is therefore primarily useful for Oracle tables that are relatively static and do not require frequent refreshes.
  • Copy to BDA is licensed under Oracle Big Data SQL, so you must have an Oracle Big Data SQL license before using this feature.
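
For illustration, here is a minimal sketch of a multi-file export and of the refresh workflow. The table name EXT_DMP_ORA_TEST2_MULTI, the dmp file names, and the PARALLEL degree are made up for this sketch; with PARALLEL 2, each parallel server writes its own dump file.

sqlplus wzhou/wzhou
CREATE TABLE EXT_DMP_ORA_TEST2_MULTI
ORGANIZATION EXTERNAL
(
   TYPE oracle_datapump
   DEFAULT DIRECTORY exp_2hdfs_dir
   LOCATION('exa_ora_dmp_part1.dmp', 'exa_ora_dmp_part2.dmp')
)
PARALLEL 2
AS SELECT * FROM ORA_TARGET_STOCK_PRICE1;

To refresh the copy in Hadoop after the source table changes, drop and recreate the external table to regenerate the dmp files, then overwrite the old copies on HDFS:

hadoop fs -put -f /home/oracle/wzhou/mytest/expdir/*.dmp /user/oracle/mytest/dmp
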
    Data Query between BDA and Exadata (Part 3): Load data from BDA to Oracle Table on Exadata Using OLH

    In the last two blogs, I showed how to query data between the two systems: Hive on BDA and Oracle on Exadata.
    Data Query between BDA and Exadata (Part 1): Query Hive Table from Oracle on Exadata
    Data Query between BDA and Exadata (Part 2): Query Oracle Table from Hive on BDA

    We know we can dump massive amounts of raw data into Hadoop, do some analysis or ETL transformation, and then load the result data into an Oracle database. There are many ways to load the data into Oracle:

  • Copy the result files to an NFS mount that is visible on both BDA and Exadata, then use sqlldr or an Oracle external table to load the data into the database. There are many articles/blogs about sqlldr and Oracle external tables, so I am not going to repeat them here.
  • Use Sqoop to load the data from HDFS to Oracle. I actually wrote a blog last year covering this topic: Export data from Hive table to Oracle Database.
  • Oracle Loader for Hadoop (OLH). This is the topic I am going to cover in this blog.

    OLH is part of Oracle Big Data Connectors. It is a MapReduce utility to load data from Hadoop into Oracle Database. It first sorts, partitions, and converts the data into Oracle Database formats on the Hadoop cluster, then loads the converted data into the database. One of the main advantages of OLH is that it preprocesses the data on the Hadoop side, reducing CPU and I/O utilization on the Oracle database.

    In my test, I am going to reuse the stock_price.csv file from my first blog to load into an Oracle table on Exadata.

    Step 1: Create target Oracle table on Exadata
    This time the table is created as a normal table instead of an external table.

    sqlplus wzhou/wzhou
    CREATE TABLE ORA_TARGET_STOCK_PRICE1
    (    
        STOCK_DATE  VARCHAR2(20),
        CLOSE_PRICE NUMBER(8,3),
        VOLUME NUMBER(8),
        OPEN_PRICE NUMBER(8,3),
        HIGH_PRICE NUMBER(8,3),
        LOW_PRICE NUMBER(8,3)               
    );
    

    Step 2: Clean up OLH output folder
    OLH requires a temporary output folder on HDFS. This directory needs to be removed before the data loading.
    In my example, I am using /user/oracle/oraloadout directory on HDFS.
    [oracle@enkbda1node01 ~]$ hdfs dfs -rm -R -skipTrash /user/oracle/oraloadout
    Deleted /user/oracle/oraloadout

    Step 3: Create a job configuration file

    vi MyOLHJobConf.xml
    <?xml version="1.0" encoding="UTF-8" ?>
    <configuration>
     
    <!--                          Input settings                             -->
     
     <property>
       <name>mapreduce.inputformat.class</name>
       <value>oracle.hadoop.loader.lib.input.DelimitedTextInputFormat</value>
     </property>
     
     <property>
       <name>mapred.input.dir</name>
       <value>/user/oracle/mytest</value>
     </property>
     
     <!-- \u002C is a comma -->
     <property>
       <name>oracle.hadoop.loader.input.fieldTerminator</name>
       <value>\u002C</value>
     </property>
      
     <property>
        <name>oracle.hadoop.loader.input.fieldNames</name>
           <value>STOCK_DATE,CLOSE_PRICE,VOLUME,OPEN_PRICE,HIGH_PRICE,LOW_PRICE</value>
     </property>
     <property>
     	<name>oracle.hadoop.loader.defaultDateFormat</name>
     	<value>yyyy-MM-dd:HH:mm:ss</value>
     </property> 
    
    <!--                          Output settings                    -->         
     <property>
       <name>mapreduce.job.outputformat.class</name>
       <value>oracle.hadoop.loader.lib.output.JDBCOutputFormat</value>
     </property>
        
     <property>
       <name>mapreduce.output.fileoutputformat.outputdir</name>
       <value>oraloadout</value>
     </property>
     
    <!--                          Table information     -->                       
     
     <property>
       <name>oracle.hadoop.loader.loaderMap.targetTable</name>
       <value>ORA_TARGET_STOCK_PRICE1</value>
     </property>      
       
     
      
    <!--                          Connection information        -->              
     
    <property>
      <name>oracle.hadoop.loader.connection.url</name>
      <value>jdbc:oracle:thin:@${HOST}:${TCPPORT}/${SERVICE_NAME}</value>
    </property>
     
    <property>
      <name>TCPPORT</name>
      <value>1521</value>
    </property>
     
    <property>
      <name>HOST</name>
      <value>enkx3-scan</value>
    </property>
     
    <property>
     <name>SERVICE_NAME</name>
     <value>bigdatadb</value>
    </property>
     
    <property>
      <name>oracle.hadoop.loader.connection.user</name>
      <value>wzhou</value>
    </property>
     
    <property>
      <name>oracle.hadoop.loader.connection.password</name>
      <value>wzhou</value>        
       
    </property>  
     
    </configuration>
    

    Step 4: Execute the job to load data from HDFS to Oracle table on Exadata

    [oracle@enkbda1node01 test]$ export OLH_HOME=/opt/oracle/oraloader-3.6.0-h2
    [oracle@enkbda1node01 test]$ export HADOOP_CLASSPATH=$OLH_HOME/jlib/*:$HADOOP_CLASSPATH
    [oracle@enkbda1node01 test]$ 
    [oracle@enkbda1node01 test]$ hadoop jar $OLH_HOME/jlib/oraloader.jar oracle.hadoop.loader.OraLoader \
    > -conf MyOLHJobConf.xml -libjars $OLH_HOME/jlib/oraloader.jar
    Oracle Loader for Hadoop Release 3.6.0 - Production
    
    Copyright (c) 2011, 2016, Oracle and/or its affiliates. All rights reserved.
    
    16/11/03 05:04:41 INFO loader.OraLoader: Oracle Loader for Hadoop Release 3.6.0 - Production
    
    Copyright (c) 2011, 2016, Oracle and/or its affiliates. All rights reserved.
    
    16/11/03 05:04:41 INFO loader.OraLoader: Built-Against: hadoop-2.2.0 hive-0.13.0 avro-1.7.6 jackson-1.8.8
    16/11/03 05:04:42 INFO loader.OraLoader: oracle.hadoop.loader.loadByPartition is disabled because table: ORA_TARGET_STOCK_PRICE1 is not partitioned
    16/11/03 05:04:42 INFO loader.OraLoader: oracle.hadoop.loader.enableSorting disabled, no sorting key provided
    16/11/03 05:04:42 INFO loader.OraLoader: Reduce tasks set to 0 because of no partitioning or sorting. Loading will be done in the map phase.
    16/11/03 05:04:42 INFO output.DBOutputFormat: Setting map tasks speculative execution to false for : oracle.hadoop.loader.lib.output.JDBCOutputFormat
    16/11/03 05:04:43 WARN loader.OraLoader: Sampler is disabled because the number of reduce tasks is less than two. Job will continue without sampled information.
    16/11/03 05:04:43 INFO loader.OraLoader: Submitting OraLoader job OraLoader
    16/11/03 05:04:43 INFO client.ConfiguredRMFailoverProxyProvider: Failing over to rm28
    16/11/03 05:04:43 INFO input.FileInputFormat: Total input paths to process : 1
    16/11/03 05:04:43 INFO mapreduce.JobSubmitter: number of splits:1
    16/11/03 05:04:43 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1476296428851_0053
    16/11/03 05:04:43 INFO impl.YarnClientImpl: Submitted application application_1476296428851_0053
    16/11/03 05:04:43 INFO mapreduce.Job: The url to track the job: http://enkbda1node04.enkitec.com:8088/proxy/application_1476296428851_0053/
    16/11/03 05:04:48 INFO loader.OraLoader: map 0% reduce 0%
    16/11/03 05:04:51 INFO loader.OraLoader: map 100% reduce 0%
    16/11/03 05:04:51 INFO loader.OraLoader: Job complete: OraLoader (job_1476296428851_0053)
    16/11/03 05:04:51 INFO loader.OraLoader: Counters: 34
    	File System Counters
    		FILE: Number of bytes read=0
    		FILE: Number of bytes written=0
    		FILE: Number of read operations=0
    		FILE: Number of large read operations=0
    		FILE: Number of write operations=0
    		HDFS: Number of bytes read=2704
    		HDFS: Number of bytes written=185843
    		HDFS: Number of read operations=69
    		HDFS: Number of large read operations=0
    		HDFS: Number of write operations=8
    	Job Counters 
    		Launched map tasks=1
    		Other local map tasks=1
    		Total time spent by all maps in occupied slots (ms)=2089
    		Total time spent by all reduces in occupied slots (ms)=0
    		TOTAL_LAUNCHED_UBERTASKS=1
    		NUM_UBER_SUBMAPS=1
    		Total time spent by all map tasks (ms)=2089
    		Total vcore-seconds taken by all map tasks=2089
    		Total megabyte-seconds taken by all map tasks=2139136
    	Map-Reduce Framework
    		Map input records=66
    		Map output records=65
    		Input split bytes=116
    		Spilled Records=0
    		Failed Shuffles=0
    		Merged Map outputs=0
    		GC time elapsed (ms)=40
    		CPU time spent (ms)=3150
    		Physical memory (bytes) snapshot=474824704
    		Virtual memory (bytes) snapshot=2996711424
    		Total committed heap usage (bytes)=632291328
    	Rows skipped by input error
    		Parse Error=1
    		Total rows skipped by input error=1
    	File Input Format Counters 
    		Bytes Read=2495
    	File Output Format Counters 
    		Bytes Written=39900
    

    Note the "Rows skipped by input error" counters above: 66 map input records produced only 65 output rows, with one row skipped due to a parse error, presumably the header line of the CSV file.

    Step 5: Verify the result on Exadata

    sqlplus wzhou/wzhou
    WZHOU:SQL > select count(*) from ORA_TARGET_STOCK_PRICE1;
    
       COUNT(*)
    -----------
             65
    

    Data Query between BDA and Exadata (Part 2): Query Oracle Table from Hive on BDA

    In the last post (Data Query between BDA and Exadata (Part 1): Query Hive Table from Oracle on Exadata), I showed how to use Oracle Big Data SQL to access a Hive table on BDA from an Oracle table. In this post, I am going to show the data flow in the opposite direction: querying an Oracle table on Exadata from Hive on BDA.

    Step 1: Load some data to an Oracle table on Exadata
    On one of the DB nodes on Exadata, run the following in SQLPlus.

    sqlplus wzhou/wzhou
    CREATE TABLE ORA_STOCK_PRICE1
    (    
        STOCK_DATE  VARCHAR2(20),
        CLOSE_PRICE NUMBER(8,3),
        VOLUME NUMBER(8),
        OPEN_PRICE NUMBER(8,3),
        HIGH_PRICE NUMBER(8,3),
        LOW_PRICE NUMBER(8,3)               
    );
    INSERT INTO ORA_STOCK_PRICE1 values ( '6/18/16', 23.6,320000,23.62,23.74,23.5 );
    INSERT INTO ORA_STOCK_PRICE1 values ( '6/19/16', 23.72,350000,23.73,23.8,23.38 );
    COMMIT;
    

    Let’s check out the content in the table.

    WZHOU:SQL> set lines 120
    WZHOU:SQL> select * from ORA_STOCK_PRICE1; 
    
    STOCK_DATE           CLOSE_PRICE      VOLUME  OPEN_PRICE  HIGH_PRICE   LOW_PRICE
    -------------------- ----------- ----------- ----------- ----------- -----------
    6/18/16                     23.6      320000       23.62       23.74        23.5
    6/19/16                    23.72      350000       23.73        23.8       23.38
    

    Step 2: Create the script for the external table in Hive.
    To access data in an Oracle table, one way is to use Oracle Table Access for Hadoop and Spark (OTA4H). OTA4H is an Oracle BDA feature that exposes Oracle tables as Hadoop data sources, allowing direct access to master data in the Oracle database using Hive SQL.

    On BDA node, create a Hive external table.

    vi hive_create_ext_ora_to_stock_price_test1.hql
    DROP TABLE EXT_ORA_STOCK_PRICE_TEST1;
    CREATE EXTERNAL TABLE IF NOT EXISTS EXT_ORA_STOCK_PRICE_TEST1 (
        stock_date string,
        close_price float,
        volume int,
        open_price float,
        high_price float,
        low_price float
    )
    STORED BY 'oracle.hcat.osh.OracleStorageHandler'
    WITH SERDEPROPERTIES (
        'oracle.hcat.osh.columns.mapping' = 'stock_date,close_price,volume,open_price,high_price,low_price')
    TBLPROPERTIES(
        'mapreduce.jdbc.url'='jdbc:oracle:thin:@enkx3-scan:1521/bigdatadb',
        'mapreduce.jdbc.username'='wzhou',
        'mapreduce.jdbc.password'='wzhou',
        'mapreduce.jdbc.input.table.name'='ORA_STOCK_PRICE1'
    );
    

    Step 3: Create the Hive table on BDA

    [oracle@enkbda1node01 test]$ export HIVE_AUX_JARS_PATH=/opt/oracle/ota4h/jlib/osh.jar:/opt/oracle/ota4h/jlib/ucp.jar:/opt/oracle/ota4h/jlib/ojdbc7.jar
    [oracle@enkbda1node01 test]$ hive -f hive_create_ext_ora_to_stock_price_test1.hql
    Java HotSpot(TM) 64-Bit Server VM warning: ignoring option MaxPermSize=512M; support was removed in 8.0
    Java HotSpot(TM) 64-Bit Server VM warning: Using incremental CMS is deprecated and will likely be removed in a future release
    16/11/02 14:29:29 WARN mapreduce.TableMapReduceUtil: The hbase-prefix-tree module jar containing PrefixTreeCodec is not present.  Continuing without it.
    Java HotSpot(TM) 64-Bit Server VM warning: ignoring option MaxPermSize=512M; support was removed in 8.0
    
    Logging initialized using configuration in jar:file:/opt/cloudera/parcels/CDH-5.7.0-1.cdh5.7.0.p1464.1349/jars/hive-common-1.1.0-cdh5.7.0.jar!/hive-log4j.properties
    OK
    Time taken: 0.642 seconds
    OK
    Time taken: 0.381 seconds
    

    Step 4: Verify the result
    We should see two rows in the result.

    [oracle@enkbda1node01 test]$ hive
    hive> select * from EXT_ORA_STOCK_PRICE_TEST1;
    OK
    6/18/16	23.6	320000	23.62	23.74	23.5
    6/19/16	23.72	350000	23.73	23.8	23.38
    Time taken: 2.093 seconds, Fetched: 2 row(s)
    

    Ok, the result is exactly what I expect.

    Data Query between BDA and Exadata (Part 1): Query Hive Table from Oracle on Exadata

    Last year I wrote several posts, listed below, demonstrating how to load data between Hadoop Hive tables and Oracle tables using Sqoop.
    Import Data to Hive from Oracle Database
    Export data from Hive table to Oracle Database
    Use incremental import in sqoop to load data from Oracle (Part I)
    Use incremental import in sqoop to load data from Oracle (Part II)

    To access data between Big Data Appliance (BDA) and Exadata, there are a few other approaches that can be used in addition to Sqoop. Over the next few blogs, I am going to discuss ways to query a Hive table from Oracle, query an Oracle table from Hive on BDA, and load data from BDA to Exadata.
    In the last post, I showed how to install Big Data SQL on Exadata. In this post, I am going to show how to use Big Data SQL to query a Hive table on BDA from Exadata.

    Step 1: Load test dataset to HDFS on BDA
    I use a small CSV file, stock_price.csv, which contains the stock price for one stock. The columns in the file include date, close, volume, open, high, and low. For simplicity, I am going to define the date column as STRING (Hive)/VARCHAR2 (Oracle) instead of a DATE type. Part of the file content is shown below:

    [oracle@enkbda1node01 test]$ cat stock_price.csv
    date,close,volume,open,high,low
    9/23/16,24.05,56837,24.13,24.22,23.88
    9/22/16,24.1,56675,23.49,24.18,23.49
    9/21/16,23.38,70925,23.21,23.58,23.025
    9/20/16,23.07,35429,23.17,23.264,22.98
    9/19/16,23.12,34257,23.22,23.27,22.96
    9/16/16,23.16,83309,22.96,23.21,22.96
    9/15/16,23.01,43258,22.7,23.25,22.53
    9/14/16,22.69,33891,22.81,22.88,22.66
    9/13/16,22.81,59871,22.75,22.89,22.53
    9/12/16,22.85,109145,22.9,22.95,22.74
    9/9/16,23.03,115901,23.53,23.53,23.02
    9/8/16,23.6,32717,23.8,23.83,23.55
    9/7/16,23.85,143635,23.69,23.89,23.69
    9/6/16,23.68,43577,23.78,23.79,23.43
    ....
    
    [oracle@enkbda1node01 test]$ vi stock_price.csv
    [oracle@enkbda1node01 test]$ hdfs dfs -mkdir /user/oracle/mytest
    [oracle@enkbda1node01 test]$ hdfs dfs -put stock_price.csv /user/oracle/mytest
    [oracle@enkbda1node01 test]$ hdfs dfs -ls /user/oracle/mytest
    Found 1 items
    -rw-r--r--   3 oracle hadoop       2495 2016-10-30 15:36 /user/oracle/mytest/stock_price.csv
    

    Step 2: Create a hive table using the datafile
    vi hive_create_stock_price1.hql
    Input the following content in the hive_create_stock_price1.hql file.

    CREATE EXTERNAL TABLE hive_stock_price1 (
    stock_date string,
    close_price float,
    volume int,
    open_price float,
    high_price float,
    low_price float
    )
    ROW FORMAT delimited fields terminated by ','
    LOCATION '/user/oracle/mytest'
    tblproperties ("skip.header.line.count"="1");

    Create the table and verify the result.

    [oracle@enkbda1node01 test]$ hive -f hive_create_stock_price1.hql
    Java HotSpot(TM) 64-Bit Server VM warning: ignoring option MaxPermSize=512M; support was removed in 8.0
    Java HotSpot(TM) 64-Bit Server VM warning: Using incremental CMS is deprecated and will likely be removed in a future release
    16/11/01 15:47:47 WARN mapreduce.TableMapReduceUtil: The hbase-prefix-tree module jar containing PrefixTreeCodec is not present.  Continuing without it.
    Java HotSpot(TM) 64-Bit Server VM warning: ignoring option MaxPermSize=512M; support was removed in 8.0
    
    Logging initialized using configuration in jar:file:/opt/cloudera/parcels/CDH-5.7.0-1.cdh5.7.0.p1464.1349/jars/hive-common-1.1.0-cdh5.7.0.jar!/hive-log4j.properties
    OK
    Time taken: 0.839 seconds
    
    [oracle@enkbda1node01 test]$ hive
    hive> select * from hive_stock_price1 limit 5;
    OK
    9/23/16	24.05	56837	24.13	24.22	23.88
    9/22/16	24.1	56675	23.49	24.18	23.49
    9/21/16	23.38	70925	23.21	23.58	23.025
    9/20/16	23.07	35429	23.17	23.264	22.98
    9/19/16	23.12	34257	23.22	23.27	22.96
    Time taken: 1.93 seconds, Fetched: 5 row(s)
    

    Step 3: Create an Oracle external table on Exadata
    Go to another session and log on to a DB node on Exadata. Create an external table using type ORACLE_HIVE. The parameter com.oracle.bigdata.cluster refers to the Hadoop cluster name.

    sqlplus wzhou/wzhou
    CREATE TABLE EXT_HDFS_STOCK_PRICE1
    (    
        STOCK_DATE  VARCHAR2(20),
        CLOSE_PRICE NUMBER(8,3),
        VOLUME NUMBER(8),
        OPEN_PRICE NUMBER(8,3),
        HIGH_PRICE NUMBER(8,3),
        LOW_PRICE NUMBER(8,3)               
    )
    ORGANIZATION EXTERNAL
    ( TYPE ORACLE_HIVE
          DEFAULT DIRECTORY "DEFAULT_DIR"
          ACCESS PARAMETERS
          ( com.oracle.bigdata.cluster=enkbda
            com.oracle.bigdata.tablename: hive_stock_price1
          )
    )
    REJECT LIMIT UNLIMITED ;
    

    Step 4: Query the result

    WZHOU:SQL> select * from EXT_HDFS_STOCK_PRICE1 where rownum < 6;
    STOCK_DATE           CLOSE_PRICE      VOLUME  OPEN_PRICE  HIGH_PRICE   LOW_PRICE
    -------------------- ----------- ----------- ----------- ----------- -----------
    date
    9/23/16                    24.05       56837       24.13       24.22       23.88
    9/22/16                     24.1       56675       23.49       24.18       23.49
    9/21/16                    23.38       70925       23.21       23.58      23.025
    9/20/16                    23.07       35429       23.17      23.264       22.98
    

    You might notice that the first row is actually the header of the CSV file on HDFS. When I created the Hive table, I used the property tblproperties ("skip.header.line.count"="1") to skip the header row when querying the Hive table. Unfortunately, I have not found a way to skip the header row from Big Data SQL. Anyway, it is not a big deal for me; I just need to remember to remove the header row when loading the data to HDFS, for example as sketched below.
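
    Here is a minimal way to strip the header before loading; the file name stock_price_nohdr.csv is made up for this sketch:

    [oracle@enkbda1node01 test]$ tail -n +2 stock_price.csv > stock_price_nohdr.csv
    [oracle@enkbda1node01 test]$ hdfs dfs -rm /user/oracle/mytest/stock_price.csv
    [oracle@enkbda1node01 test]$ hdfs dfs -put stock_price_nohdr.csv /user/oracle/mytest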

    Install Oracle Big Data SQL on Exadata

    Oracle Big Data SQL is part of the Oracle Big Data family. It allows users to use regular Oracle SQL to access data in Oracle Database, Hadoop, and other sources at the same time. Recently I installed the new version of Oracle Big Data SQL (BDSQL), v3.0, on two different systems:

    • An X3-2 Full Rack Big Data Appliance (BDA) connecting with an X3-2 Quarter Rack Exadata. This BDA is non-kerberosed.
    • An X6-2 Starter Rack BDA connecting with an X6-2 Eighth Rack Exadata. This BDA is kerberosed.

    Unlike previous BDSQL versions, the installation of this version on both systems was relatively smooth and straightforward. I list the installation steps as follows:

    Prerequisite

    • The BDA’s version is at least 4.5.
    • The April 2016 Proactive Bundle Patch (12.1.0.2.160419 BP) for Oracle Database must be pre-installed on Exadata.

    BDSQL requires installation on both BDA and Exadata. Perform the installation on BDA first, then on Exadata.

    BDA Installation
    Run bdacli command to check whether BDSQL is enabled or not on BDA.
    [root@enkbda1node01 ~]# bdacli getinfo cluster_big_data_sql_enabled
    true

    If not enabled, run the following command:
    [root@enkbda1node01 ~]# bdacli enable big_data_sql

    After BDSQL is enabled on BDA, you should be able to see the service in BDA's Cloudera Manager.

    Exadata Installation

    Logon as root user on Exadata DB Node 1.
    Step 1: Copy the bds-exa-install.sh file from BDA to Exadata

    [root@enkx3db01 ~]# mkdir bdsql
    [root@enkx3db01 ~]# cd bdsql
    [root@enkx3db01 bdsql]# curl -O http://enkbda1node01/bda/bds-exa-install.sh
      % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                     Dload  Upload   Total   Spent    Left  Speed
    100 40258  100 40258    0     0  13.6M      0 --:--:-- --:--:-- --:--:-- 38.3M
    
    [root@enkx3db01 bdsql]# ls -l
    total 40
    -rw-r--r-- 1 root root 40258 Sep 28 07:04 bds-exa-install.sh
    

    Step 2. Change the ownership of the script and the bdsql directory.
    We need to use the oracle user for the installation.

    [root@enkx3db01 bdsql]# chown oracle:oinstall bds-exa-install.sh
    [root@enkx3db01 bdsql]# chmod +x bds-exa-install.sh
    [root@enkx3db01 bdsql]# ls -l
    total 40
    -rwxr-xr-x 1 oracle oinstall 40258 Sep 28 07:04 bds-exa-install.sh
    [root@enkx3db01 bdsql]# cd 
    [root@enkx3db01 ~]# chown oracle:oinstall bdsql
    

    Step 3. Perform the installation
    Log on as the oracle user, not root. Set the environment as follows:

    export ORACLE_HOME=/u01/app/oracle/product/12.1.0.2/dbhome_1
    export ORACLE_SID=dbm1
    export GI_HOME=/u01/app/12.1.0.2/grid
    export TNS_ADMIN=/u01/app/12.1.0.2/grid/network/admin

    echo $ORACLE_HOME
    echo $ORACLE_SID
    echo $GI_HOME
    echo $TNS_ADMIN

    Run the installation script.

    [oracle@enkx3db01 ~] cd bdsql
    oracle:dbm1@enkx3db01 bdsql > ./bds-exa-install.sh
    bds-exa-install: setup script started at   : Wed Sep 28 07:23:06 SGT 2016
    bds-exa-install: bds version               : bds-2.0-2.el6.x86_64
    bds-exa-install: bda cluster name          : enkbda
    bds-exa-install: bda web server            : enkbda1node01.enkitec.local
    bds-exa-install: cloudera manager url      : enkbda1node03.enkitec.local:7180
    bds-exa-install: hive version              : hive-1.1.0-cdh5.7.0
    bds-exa-install: hadoop version            : hadoop-2.6.0-cdh5.7.0
    bds-exa-install: bds install date          : 09/24/2016 05:48 SGT
    bds-exa-install: bd_cell version           : bd_cell-12.1.2.0.101_LINUX.X64_160701-1.x86_64
    bds-exa-install: action                    : setup
    bds-exa-install: crs                       : true
    bds-exa-install: db resource               : dbm
    bds-exa-install: database type             : RAC
    bds-exa-install: cardinality               : 2
    ************************
    README--README--README--README--README--README--README--README--README--README
    ************************
    Detected a multi instance database (dbm). Run this script on all instances.
    Please read all option of this program (bds-exa-install --help)
    This script does extra work on the last instance.  The last instance is determined as
    the instance with the largest instance_id number.
    press <return>
    
    bds-exa-install: root shell script         : /u01/app/oracle/product/12.1.0.2/dbhome_1/install/bds-root-enkbda-setup.sh
    please run as root:
    
    /u01/app/oracle/product/12.1.0.2/dbhome_1/install/bds-root-enkbda-setup.sh
    
    waiting for root script to complete, press <enter> to continue checking.. q<enter> to quit
    

    Open another session, run the following command as root user.

    [root@enkx3db01 ~]# /u01/app/oracle/product/12.1.0.2/dbhome_1/install/bds-root-enkbda-setup.sh
    bds-root-enkbda-setup.sh: enkbda: removing existing entries for 192.168.9.1 from /etc/hosts
    bds-root-enkbda-setup.sh: enkbda: removing existing entries for 192.168.9.2 from /etc/hosts
    bds-root-enkbda-setup.sh: enkbda: removing existing entries for 192.168.9.3 from /etc/hosts
    bds-root-enkbda-setup.sh: enkbda: removing existing entries for 192.168.9.4 from /etc/hosts
    bds-root-enkbda-setup.sh: enkbda: removing existing entries for 192.168.9.5 from /etc/hosts
    bds-root-enkbda-setup.sh: enkbda: removing existing entries for 192.168.9.6 from /etc/hosts
    bds-root-enkbda-setup.sh: enkbda: added entry "192.168.9.1 enkbda1node01.enkitec.local    enkbda1node01" to /etc/hosts
    bds-root-enkbda-setup.sh: enkbda: added entry "192.168.9.2 enkbda1node02.enkitec.local    enkbda1node02" to /etc/hosts
    bds-root-enkbda-setup.sh: enkbda: added entry "192.168.9.3 enkbda1node03.enkitec.local    enkbda1node03" to /etc/hosts
    bds-root-enkbda-setup.sh: enkbda: added entry "192.168.9.4 enkbda1node04.enkitec.local    enkbda1node04" to /etc/hosts
    bds-root-enkbda-setup.sh: enkbda: added entry "192.168.9.5 enkbda1node05.enkitec.local    enkbda1node05" to /etc/hosts
    bds-root-enkbda-setup.sh: enkbda: added entry "192.168.9.6 enkbda1node06.enkitec.local    enkbda1node06" to /etc/hosts
    
    After it completes, close this root session and go back to the oracle installation session. Then press the ENTER key; the installation continues.
    
    bds-exa-install: root script seem to have succeeded, continuing with setup bds
    /bin/mkdir: created directory `/u01/app/oracle/product/12.1.0.2/dbhome_1/bigdatasql'
    bds-exa-install: working directory         : /u01/app/oracle/product/12.1.0.2/dbhome_1/install
    bds-exa-install: downloading JDK
    bds-exa-install: working directory         : /u01/app/oracle/product/12.1.0.2/dbhome_1/install
    bds-exa-install: installing JDK tarball
    bds-exa-install: working directory         : /u01/app/oracle/product/12.1.0.2/dbhome_1/bigdatasql/jdk1.8.0_92/jre/lib/security
    bds-exa-install: Copying JCE policy jars
    /bin/mkdir: created directory `default_dir'
    /bin/mkdir: created directory `bigdata_config'
    /bin/mkdir: created directory `log'
    /bin/mkdir: created directory `jlib'
    bds-exa-install: working directory         : /u01/app/oracle/product/12.1.0.2/dbhome_1/bigdatasql/jlib
    bds-exa-install: removing old oracle bds jars if any
    bds-exa-install: downloading oracle bds jars
    bds-exa-install: installing oracle bds jars
    bds-exa-install: working directory         : /u01/app/oracle/product/12.1.0.2/dbhome_1/bigdatasql
    bds-exa-install: downloading               : hadoop-2.6.0-cdh5.7.0.tar.gz
    bds-exa-install: downloading               : hive-1.1.0-cdh5.7.0.tar.gz
    bds-exa-install: unpacking                 : hadoop-2.6.0-cdh5.7.0.tar.gz
    bds-exa-install: unpacking                 : hive-1.1.0-cdh5.7.0.tar.gz
    bds-exa-install: working directory         : /u01/app/oracle/product/12.1.0.2/dbhome_1/bigdatasql/hadoop-2.6.0-cdh5.7.0/lib
    bds-exa-install: downloading               : cdh-ol6-native.tar.gz
    bds-exa-install: creating /u01/app/oracle/product/12.1.0.2/dbhome_1/bigdatasql/hadoop_enkbda.env for hdfs/mapred client access
    bds-exa-install: working directory         : /u01/app/oracle/product/12.1.0.2/dbhome_1/bigdatasql
    bds-exa-install: creating bds property files
    bds-exa-install: working directory         : /u01/app/oracle/product/12.1.0.2/dbhome_1/bigdatasql/bigdata_config
    bds-exa-install: created bigdata.properties
    bds-exa-install: created  bigdata-log4j.properties
    bds-exa-install: creating default and cluster directories needed by big data external tables
    bds-exa-install: note this will grant default and cluster directories to public!
    catcon: ALL catcon-related output will be written to /u01/app/oracle/product/12.1.0.2/dbhome_1/install/bdscatcon_catcon_387868.lst
    catcon: See /u01/app/oracle/product/12.1.0.2/dbhome_1/install/bdscatcon*.log files for output generated by scripts
    catcon: See /u01/app/oracle/product/12.1.0.2/dbhome_1/install/bdscatcon_*.lst files for spool files, if any
    catcon.pl: completed successfully
    bds-exa-install: granted default and cluster directories to public!
    bds-exa-install: mta set to use listener end point : LISTENER
    bds-exa-install: mta will be setup
    bds-exa-install: creating /u01/app/oracle/product/12.1.0.2/dbhome_1/hs/admin/initbds_dbm_enkbda.ora
    bds-exa-install: mta setting agent home as : /u01/app/oracle/product/12.1.0.2/dbhome_1/hs/admin
    bds-exa-install: mta shutdown              : bds_dbm_enkbda
    ORA-28593: agent control utility: command terminated with error
    bds-exa-install: registering crs resource  : bds_dbm_enkbda
    bds-exa-install: skipping crs registration on this instance
    bds-exa-install: patching view LOADER_DIR_OBJS
    catcon: ALL catcon-related output will be written to /u01/app/oracle/product/12.1.0.2/dbhome_1/install/bdscatcon_catcon_388081.lst
    catcon: See /u01/app/oracle/product/12.1.0.2/dbhome_1/install/bdscatcon*.log files for output generated by scripts
    catcon: See /u01/app/oracle/product/12.1.0.2/dbhome_1/install/bdscatcon_*.lst files for spool files, if any
    catcon.pl: completed successfully
    bds-exa-install: creating mta dblinks
    bds-exa-install: cluster name              : enkbda
    bds-exa-install: extproc sid               : bds_dbm_enkbda
    bds-exa-install: cdb                       : true
    catcon: ALL catcon-related output will be written to /u01/app/oracle/product/12.1.0.2/dbhome_1/install/bdscatcon_dbcluster_dropdblink_catcon_388131.lst
    catcon: See /u01/app/oracle/product/12.1.0.2/dbhome_1/install/bdscatcon_dbcluster_dropdblink*.log files for output generated by scripts
    catcon: See /u01/app/oracle/product/12.1.0.2/dbhome_1/install/bdscatcon_dbcluster_dropdblink_*.lst files for spool files, if any
    catcon.pl: completed successfully
    catcon: ALL catcon-related output will be written to /u01/app/oracle/product/12.1.0.2/dbhome_1/install/bdscatcon_default_dropdblink_catcon_388153.lst
    catcon: See /u01/app/oracle/product/12.1.0.2/dbhome_1/install/bdscatcon_default_dropdblink*.log files for output generated by scripts
    catcon: See /u01/app/oracle/product/12.1.0.2/dbhome_1/install/bdscatcon_default_dropdblink_*.lst files for spool files, if any
    catcon.pl: completed successfully
    catcon: ALL catcon-related output will be written to /u01/app/oracle/product/12.1.0.2/dbhome_1/install/bdscatcon_dbcluster_createdblink_catcon_388175.lst
    catcon: See /u01/app/oracle/product/12.1.0.2/dbhome_1/install/bdscatcon_dbcluster_createdblink*.log files for output generated by scripts
    catcon: See /u01/app/oracle/product/12.1.0.2/dbhome_1/install/bdscatcon_dbcluster_createdblink_*.lst files for spool files, if any
    catcon.pl: completed successfully
    catcon: ALL catcon-related output will be written to /u01/app/oracle/product/12.1.0.2/dbhome_1/install/bdscatcon_default_createdblink_catcon_388210.lst
    catcon: See /u01/app/oracle/product/12.1.0.2/dbhome_1/install/bdscatcon_default_createdblink*.log files for output generated by scripts
    catcon: See /u01/app/oracle/product/12.1.0.2/dbhome_1/install/bdscatcon_default_createdblink_*.lst files for spool files, if any
    catcon.pl: completed successfully
    catcon: ALL catcon-related output will be written to /u01/app/oracle/product/12.1.0.2/dbhome_1/install/bdscatcon_catcon_388232.lst
    catcon: See /u01/app/oracle/product/12.1.0.2/dbhome_1/install/bdscatcon*.log files for output generated by scripts
    catcon: See /u01/app/oracle/product/12.1.0.2/dbhome_1/install/bdscatcon_*.lst files for spool files, if any
    catcon.pl: completed successfully
    catcon: ALL catcon-related output will be written to /u01/app/oracle/product/12.1.0.2/dbhome_1/install/bdscatcon_catcon_388258.lst
    catcon: See /u01/app/oracle/product/12.1.0.2/dbhome_1/install/bdscatcon*.log files for output generated by scripts
    catcon: See /u01/app/oracle/product/12.1.0.2/dbhome_1/install/bdscatcon_*.lst files for spool files, if any
    catcon.pl: completed successfully
    bds-exa-install: setup script completed all steps
    

    Step 4. Install Kerberos-related components
    If the BDA is not kerberosed, skip this step.

    Log on as the root user and copy the krb5.conf file from BDA node 1.

    [root@enkx3db01 etc]# cp -p krb5.conf krb5.conf.orig
    [root@enkx3db01 etc]# scp root@enkbda1node01:/etc/krb5.conf .
    root@enkbda1node01's password:
    krb5.conf                                           100%  888     0.9KB/s   00:00 
    

    Download the krb5-workstation-1.10.3-57.el6.x86_64.rpm file and install it with yum localinstall on Exadata.

    [root@enkx3db01 tmp]# yum localinstall krb5-workstation-1.10.3-57.el6.x86_64.rpm

    Step 5: Verify Hadoop Access from Exadata
    I used Hive to verify whether Hadoop access is working. If the BDA environment is non-kerberosed, skip the kinit and klist steps.

    The hive principal is used for kinit; the oracle@ENKITEC.KDC principal works as well.

    [root@enkbda1node01 ~]# kinit hive
    Password for hive@ENKITEC.KDC: 
    [root@enkbda1node01 ~]# klist
    Ticket cache: FILE:/tmp/krb5cc_0
    Default principal: hive@ENKITEC.KDC
    Valid starting     Expires            Service principal
    09/28/16 10:23:28  09/29/16 10:23:28  krbtgt/ENKITEC.KDC@ENKITEC.KDC
    	renew until 10/05/16 10:23:28
    	
    [root@enkbda1node01 ~]# hive
    hive> select count(*) from sample_07;
    Query ID = root_20160928102323_f691ee8c-17f7-4ba8-af36-dad452bac458
    Total jobs = 1
    Launching Job 1 out of 1
    Number of reduce tasks determined at compile time: 1
    In order to change the average load for a reducer (in bytes):
      set hive.exec.reducers.bytes.per.reducer=<number>
    In order to limit the maximum number of reducers:
      set hive.exec.reducers.max=<number>
    In order to set a constant number of reducers:
      set mapreduce.job.reduces=<number>
    Starting Job = job_1475025273405_0001, Tracking URL = http://enkbda1node04.enkitec.local:8088/proxy/application_1475025273405_0001/
    Kill Command = /opt/cloudera/parcels/CDH-5.7.0-1.cdh5.7.0.p1464.1349/lib/hadoop/bin/hadoop job  -kill job_1475025273405_0001
    Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 1
    2016-09-28 10:24:11,810 Stage-1 map = 0%,  reduce = 0%
    2016-09-28 10:24:20,134 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 1.94 sec
    2016-09-28 10:24:25,280 Stage-1 map = 100%,  reduce = 100%, Cumulative CPU 4.27 sec
    MapReduce Total cumulative CPU time: 4 seconds 270 msec
    Ended Job = job_1475025273405_0001
    MapReduce Jobs Launched: 
    Stage-Stage-1: Map: 1  Reduce: 1   Cumulative CPU: 4.27 sec   HDFS Read: 52641 HDFS Write: 4 SUCCESS
    Total MapReduce CPU Time Spent: 4 seconds 270 msec
    OK
    823
    Time taken: 28.544 seconds, Fetched: 1 row(s)
    
    hive> select * from sample_07 limit 10;
    OK
    00-0000	All Occupations	134354250	40690
    11-0000	Management occupations	6003930	96150
    11-1011	Chief executives	299160	151370
    11-1021	General and operations managers	1655410	103780
    11-1031	Legislators	61110	33880
    11-2011	Advertising and promotions managers	36300	91100
    11-2021	Marketing managers	165240	113400
    11-2022	Sales managers	322170	106790
    11-2031	Public relations managers	47210	97170
    11-3011	Administrative services managers	239360	76370
    Time taken: 0.054 seconds, Fetched: 10 row(s)
    hive> quit;
    [root@enkbda1node01 ~]# 
    

    Check Hadoop access from the Oracle database by running the following SQL in SQLPlus. You can see BDSQL is installed under the $ORACLE_HOME/bigdatasql directory.

    set lines 150
    col OWNER for a10
    col DIRECTORY_NAME for a25
    col DIRECTORY_PATH for a50
    select OWNER, DIRECTORY_NAME, DIRECTORY_PATH  from dba_directories where directory_name like '%BIGDATA%';
    
    OWNER      DIRECTORY_NAME            DIRECTORY_PATH
    ---------- ------------------------- --------------------------------------------------
    SYS        ORA_BIGDATA_CL_enkbda
    SYS        ORACLE_BIGDATA_CONFIG     /u01/app/oracle/product/12.1.0.2/dbhome_1/bigdatas
                                         ql/bigdata_config
    

    Create a bds user in the Oracle database. This is the user I will use to test querying a Hive table from SQLPlus.

    SQL> create user bds identified by bds;
    User created.
    SQL> grant connect , resource to bds;
    Grant succeeded.
    

    Create external tables for sample_07 and sample_08; both tables exist in Hive on BDA. The DDL for sample_07 is below, and a sketch for sample_08 follows.

    CREATE TABLE sample_07 (
      code VARCHAR2(4000),
      description VARCHAR2(4000),
      total_emp NUMBER,
      salary NUMBER
    ) ORGANIZATION EXTERNAL
    (TYPE ORACLE_HIVE
     DEFAULT DIRECTORY DEFAULT_DIR
     ACCESS PARAMETERS
     (
      com.oracle.bigdata.cluster=enkbda
      com.oracle.bigdata.tablename=default.sample_07)
    ) PARALLEL 2 REJECT LIMIT UNLIMITED;
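
    The sample_08 table can be mapped the same way. Here is a sketch that assumes sample_08 has the same columns as sample_07; check the actual schema with DESCRIBE sample_08 in Hive first:

    CREATE TABLE sample_08 (
      code VARCHAR2(4000),
      description VARCHAR2(4000),
      total_emp NUMBER,
      salary NUMBER
    ) ORGANIZATION EXTERNAL
    (TYPE ORACLE_HIVE
     DEFAULT DIRECTORY DEFAULT_DIR
     ACCESS PARAMETERS
     (
      com.oracle.bigdata.cluster=enkbda
      com.oracle.bigdata.tablename=default.sample_08)
    ) PARALLEL 2 REJECT LIMIT UNLIMITED;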
    

    Here are several other useful queries against the Hive metadata from SQLPlus:

    set lines 150
    set pages 999
    col cluster_id for a20
    col database_name for a8
    col table_name for a15
    col location for a20
    col owner for a8
    col table_type for a8
    col input_format for a20
    col hive_uri for a25
    select cluster_id, database_name, table_name, location,
    owner, table_type, input_format, hive_uri
    from all_hive_tables
    order by owner, table_name;
    
    CLUST  DATABASE TABLE_NAME      LOCATION             OWNER    TABLE_TY INPUT_FORMAT         HIVE_URI
    ------ -------- --------------- -------------------- -------- -------- -------------------- -------------------------
    enkbda default  sample_07       hdfs://enkbda-ns/use hive     MANAGED_ org.apache.hadoop.ma thrift://enkbda1node0
                                    r/hive/warehouse/sam          TABLE    pred.TextInputFormat 4.enkitec.local:9083
                                    ple_07
    
    col partitioned for a15
    SELECT cluster_id, database_name, owner, table_name, partitioned FROM all_hive_tables;
    
    CLUSTER_ID   DATABASE OWNER    TABLE_NAME      PARTITIONED
    ------------ -------- -------- --------------- ---------------
    enkbda       default  hive     sample_07       UN-PARTITIONED
    
    
    col column_name for a20
    col hive_column_type for a20
    col oracle_column_type for a15
    SELECT table_name, column_name, hive_column_type, oracle_column_type 
    FROM all_hive_columns;
    
    TABLE_NAME      COLUMN_NAME          HIVE_COLUMN_TYPE     ORACLE_COLUMN_T
    --------------- -------------------- -------------------- ---------------
    sample_07       code                 string               VARCHAR2(4000)
    sample_07       description          string               VARCHAR2(4000)
    sample_07       total_emp            int                  NUMBER
    sample_07       salary               int                  NUMBER
    
    4 rows selected.
    

    Ok, let me get some rows from the sample_07 table.

    oracle:dbm1@enkx3db01 bdsql > sqlplus bds/bds
    SQL> select * from bds.sample_07;
    select * from bds.sample_07
    *
    ERROR at line 1:
    ORA-29913: error in executing ODCIEXTTABLEOPEN callout
    ORA-29400: data cartridge error
    ORA-28575: unable to open RPC connection to external procedure agent

    After some investigation, I realized the above error was caused by a quirk of the BDSQL installation: it must be done on ALL DB nodes on Exadata before you can access a Hive table on BDA. Not a big deal. I performed the installation from Step 1 to Step 4 on DB node 2 on Exadata. After it completed, I bounced BDSQL on BDA using the following commands as the root user:

    bdacli stop big_data_sql_cluster
    bdacli start big_data_sql_cluster

    Step 6: Final Verification
    After BDSQL was installed on DB node 2 and bounced on BDA, I was able to query the sample_07 table from SQLPlus.

    SQL> select * from sample_07 where rownum < 3; 
    CODE
    --------------------------------------------------------------------------------
    DESCRIPTION
    --------------------------------------------------------------------------------
     TOTAL_EMP     SALARY
    ---------- ----------
    00-0000
    All Occupations
     134354250	40690
    
    11-0000
    Management occupations
       6003930	96150
    
    SQL> select count(*) from sample_07;
    
      COUNT(*)
    ----------
           823
    

    Cool, BDSQL is working on Exadata.

    A few more tips
    If you stop a database that has BDSQL installed without stopping the BDSQL resource first, you will see the following message:
    oracle:dbm1@enkx3db01 ~ > srvctl stop database -d dbm
    PRCR-1133 : Failed to stop database dbm and its running services
    PRCR-1132 : Failed to stop resources using a filter
    CRS-2529: Unable to act on 'ora.dbm.db' because that would require stopping or relocating 'bds_dbm_enkbda', but the force option was not specified
    CRS-2529: Unable to act on 'ora.dbm.db' because that would require stopping or relocating 'bds_dbm_enkbda', but the force option was not specified

    The reason is that the BDSQL installation created a new resource, bds_dbm_enkbda, that runs on all DB nodes. We can see the resource in the crsctl status resource output.

    oracle:+ASM1@enkx3db01 ~ > crsctl status resource -t
    --------------------------------------------------------------------------------
    Name           Target  State        Server                   State details       
    --------------------------------------------------------------------------------
    Local Resources
    --------------------------------------------------------------------------------
    ora.DATAC1.dg
                   ONLINE  ONLINE       enkx3db01          STABLE
                   ONLINE  ONLINE       enkx3db02          STABLE
    ora.DBFS_DG.dg
                   ONLINE  ONLINE       enkx3db01          STABLE
                   ONLINE  ONLINE       enkx3db02          STABLE
    ora.LISTENER.lsnr
                   ONLINE  ONLINE       enkx3db01          STABLE
                   ONLINE  ONLINE       enkx3db02          STABLE
    ora.RECOC1.dg
                   ONLINE  ONLINE       enkx3db01          STABLE
                   ONLINE  ONLINE       enkx3db02          STABLE
    ora.asm
                   ONLINE  ONLINE       enkx3db01          Started,STABLE
                   ONLINE  ONLINE       enkx3db02          Started,STABLE
    ora.net1.network
                   ONLINE  ONLINE       enkx3db01          STABLE
                   ONLINE  ONLINE       enkx3db02          STABLE
    ora.ons
                   ONLINE  ONLINE       enkx3db01          STABLE
                   ONLINE  ONLINE       enkx3db02          STABLE
    --------------------------------------------------------------------------------
    Cluster Resources
    --------------------------------------------------------------------------------
    bds_dbm_enkbda
          1        ONLINE  ONLINE       enkx3db01          STABLE
          2        ONLINE  ONLINE       enkx3db02          STABLE
    ora.LISTENER_SCAN1.lsnr
          1        ONLINE  ONLINE       enkx3db02          STABLE
    ora.LISTENER_SCAN2.lsnr
          1        ONLINE  ONLINE       enkx3db01          STABLE
    ora.LISTENER_SCAN3.lsnr
          1        ONLINE  ONLINE       enkx3db01          STABLE
    ora.MGMTLSNR
          1        ONLINE  ONLINE       enkx3db01          169.254.33.130 192.1
                                                           68.10.1 192.168.10.2
                                                            ,STABLE
    ora.cvu
          1        ONLINE  ONLINE       enkx3db01          STABLE
    ora.dbm.db
          1        ONLINE  ONLINE       enkx3db01          Open,STABLE
          2        ONLINE  ONLINE       enkx3db02          Open,STABLE
    ora.enkx3db01.vip
          1        ONLINE  ONLINE       enkx3db01          STABLE
    ora.enkx3db02.vip
          1        ONLINE  ONLINE       enkx3db02          STABLE
    ora.mgmtdb
          1        ONLINE  ONLINE       enkx3db01          Open,STABLE
    ora.oc4j
          1        ONLINE  ONLINE       enkx3db01          STABLE
    ora.scan1.vip
          1        ONLINE  ONLINE       enkx3db02          STABLE
    ora.scan2.vip
          1        ONLINE  ONLINE       enkx3db01          STABLE
    ora.scan3.vip
          1        ONLINE  ONLINE       enkx3db01          STABLE
    --------------------------------------------------------------------------------
    

    So to shut down the dbm database, you need to run the crsctl stop resource bds_dbm_enkbda command first.
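
    A minimal sketch of the stop sequence (the resource name bds_dbm_enkbda is taken from the crsctl output above; run as a user with access to the Grid Infrastructure tools):

    oracle:dbm1@enkx3db01 ~ > crsctl stop resource bds_dbm_enkbda
    oracle:dbm1@enkx3db01 ~ > srvctl stop database -d dbm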

    This resource is actually an agent for BDSQL. Running the ps command shows the agent process.

    oracle:dbm2@enkx3db02 ~ > ps -ef|grep bds
    oracle    21046   1945  0 11:42 pts/1    00:00:00 grep bds
    oracle   367388      1  0 11:19 ?        00:00:02 extprocbds_dbm_enkbda -mt
    

    If you run the following lsnrctl command, you should be able to see 30 handlers for bds.
    lsnrctl status | grep bds