Create Cloudera Hadoop Cluster Using Cloudera Director on Google Cloud

Several years ago, I wrote a blog discussing how to install a Cloudera Hadoop cluster. It took at least half a day to complete the installation in my VM cluster. In my last post, I discussed an approach to deploy a Hadoop cluster using Dataproc on Google Cloud Platform. It literally took less than two minutes to create a Hadoop cluster. Although it is nice to have a cluster launched in such a short time, it does not have a nice UI like Cloudera Manager because the Hadoop distribution used by Dataproc is not CDH. I could repeat my earlier blogs to build a Hadoop cluster from VM instances on Google Cloud Platform, but that would take time and involve a lot of work. There is actually another way to create a Hadoop cluster on the cloud. Cloudera has a product called Cloudera Director. It currently supports not only Google Cloud, but AWS and Azure as well. It is designed to deploy CDH clusters faster and to make it easier to scale the cluster on the cloud. Another important feature is that Cloudera Director allows you to move your deployment scripts or steps easily from one cloud provider to another, so you are not locked into one cloud vendor. In this blog, I will show you the way to create a CDH cluster using Cloudera Director.

The first step is to start my Cloudera Director instance. In my case, I have already installed Cloudera Director based on the instructions from Cloudera. It is a pretty straightforward process and I am not going to repeat it here. The Cloudera Director instance is where you launch your CDH cluster deployment.
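If the Director VM happens to be stopped, you can bring it up with the gcloud CLI before doing anything else. A minimal sketch, reusing the instance name, zone and project id that appear in the SSH command later in this post:

# Start the (already created) Cloudera Director VM; the name, zone and project come from my environment.
gcloud compute instances start cdh-director-1 \
    --project cdh-director-173715 \
    --zone us-central1-c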

Both the Cloudera Director and Cloudera Manager UIs are browser-based, and you have to set up a secure connection between your local machine and the VM instances on the cloud. To achieve this, you configure a SOCKS proxy on your local machine that is used to connect to the Cloudera Director VM. It provides a secure way to connect to your VM on the cloud and lets you use the VM's internal IP and hostname in the web browser. Google has a nice note about the steps, Securely Connecting to VM Instances. Following this note will help you set up the SOCKS proxy.

Ok, here are the steps.
Log on to Cloudera Director
Open a terminal session locally and run the following command:

gcloud compute ssh cdh-director-1 \
    --project cdh-director-173715 \
    --zone us-central1-c \
    --ssh-flag="-D" \
    --ssh-flag="1080" \
    --ssh-flag="-N"    

cdh-director-1 is the name of my Cloudera Director instance on Google Cloud and cdh-director-173715 is my Google Cloud project id. After executing the above command, the session appears to hang and never completes. This is CORRECT behavior. Do not kill or exit this session. Open a browser and type in the internal IP of the Cloudera Director instance with port number 7189. For my cdh-director-1 instance, the internal IP is 10.128.0.2.
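With the tunnel from the command above still running, the browser has to be told to use it. The Google note does this by launching a separate browser instance against the SOCKS port; a minimal sketch, assuming Chrome on Linux, the binary path shown here, and the port 1080 used above:

# Launch a throwaway Chrome profile that sends all traffic through the SSH tunnel on localhost:1080.
/usr/bin/google-chrome \
    --proxy-server="socks5://localhost:1080" \
    --user-data-dir=/tmp/cdh-director-proxy-profile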

After entering the URL http://10.128.0.2:7189 for Cloudera Director, the login screen shows up. Log in as the admin user.

Deployment
After login, the initial setup wizard shows up. Click Let’s get started.

In the Add Environment screen, input the information as follows. The Client ID JSON Key is the file you can create during the initial setup of your Google project, along with the SSH key setup.

In the next Add Cloudera Manager screen, I usually create the instance templates first. Click the Instance Template drop-down, then select Create a new instance template. I need at least three templates: one for Cloudera Manager, one for the Master nodes, and one for the Worker nodes. To save resources in my Google Cloud environment, I did not create a template for Edge nodes. Here are the configurations for all three templates.

Cloudera Manager Template

Master Node Template

Worker Node Template

Input the following for Cloudera Manager. For my test, I use the Embedded Database. If this is for production, you need to set up an external database first and register it here.

After clicking Continue, the Add Cluster screen shows up. There is a gateway instance group, and I removed it by clicking Delete Group because I don't have an edge node here. Input the corresponding template and number of instances for the masters and workers.

After clicking Continue, the deployment starts.

After about 20 minutes, it completes. Click Continue.

Review Cluster
The nice Cloudera Director dashboard shows up.

You can also login to Cloudera Manager from the link on Cloudera Director.

Nice and easy. Excellent product from Cloudera. For more information about deploying CDH cluster on Google Cloud, you can also check out Cloudera’s document, Getting Started on Google Cloud Platform.


E4 2017

I joined Enkitec in the summer of 2012, just a few weeks before the first Enkitec E4 conference. Really good timing. Since then, I have been to every E4 conference for five years. It's a really interesting conference, focused more on technical content and architecture, and I always learn something new each time. The conference has also grown from an Exadata focus to a broader focus on Oracle Engineered Systems, Big Data and Cloud. This year, I will be a speaker at E4, co-presenting with Rashmi Kansakar of 84.51 on the topic of Analytics as a Business with Exadata and Big Data. The presentation will have a lot of technical content. The conference runs from June 12 to 15. For more information about E4, please visit Accenture Enkitec Group's E4 site at https://registration.accenture.com/ehome/e4/sessions.

Validate Java Keystore on BDA

In many projects, I need to create a keystore to store SSL certificates. Most of the time I hardly worry about the validity of a keystore: my keystores just work, and I can see the content of all certificates by using the keytool command. That worked pretty well until recently, when I needed to configure TLS for Cloudera Manager on BDA.
BDA has its own command to enable TLS for Cloudera Manager, Hue and Oozie in a pretty easy way: just run the command bdacli enable https_cm_hue_oozie. The only drawback of this command is that it uses self-signed certificates, not the user's own certificates. Although that works from a security perspective, it's not a good idea in the long run. I need to replace Oracle's self-signed certificates with the client's certificates on BDA, and neither Cloudera's approach nor Oracle's approach works as-is. Anyway, that is a different topic and I will discuss it in a different blog.

During my work to enable TLS for Cloudera Manager using the client's certificates, I ran into various issues. After looking at many of them in detail, I suspected the key problem might come from an incorrect keystore. Unfortunately, configuring TLS for Cloudera Manager, the agents and the services requires shutting down the CDH cluster and many steps before reaching the stage where I can test the keystore. That is too time consuming for a busy BDA cluster. This blog discusses an approach to verify whether the content of a keystore is valid in a way that is fast, easy and independent of the CDH cluster. Most importantly, it avoids the bridge-building mistake shown below.

As my topic is related to BDA, I am going to list both the Cloudera way and the Oracle way of creating a keystore.

Cloudera Way
See Cloudera's document Step 1: Obtain Encryption Keys and Certificates for Cloudera Manager Server.
I just highlight the key steps and commands below:
1. Generate Keystore for Cloudera Manager Host (Node 3 on BDA)

# keytool -genkeypair -alias cmhost -keyalg RSA -keystore \
/opt/cloudera/security/jks/cmhost-keystore.jks -keysize 2048 -dname \
"CN=cmhost.sec.example.com,OU=Security,O=Example,L=Denver,ST=Colorado,C=US" \
-storepass password -keypass password

2. Generate a CSR for the host.

# keytool -certreq -alias cmhost \
-keystore /opt/cloudera/security/jks/cmhost-keystore.jks \
-file /opt/cloudera/security/x509/cmhost.csr -storepass password \
-keypass password

3. Submit the .csr file created by the -certreq command to the Certificate Authority to obtain a server certificate.
4. Copy the root CA certificate and any intermediate CA certificates to /opt/cloudera/security/CAcerts/.
The /opt/cloudera/security/CAcerts/ directory does not exist on BDA, and I don't believe it is necessary.
I actually prefer Oracle's approach of simply copying the root and intermediate CA certificates to the /opt/cloudera/security/jks directory. But I do like Cloudera's approach of importing the root CA and intermediate CA certificates into the alternative system JDK truststore, jssecacerts, before importing them into the Java keystore on BDA. This is what Oracle's approach is missing.

# cp $JAVA_HOME/jre/lib/security/cacerts $JAVA_HOME/jre/lib/security/jssecacerts

# keytool -importcert -alias RootCA -keystore $JAVA_HOME/jre/lib/security/jssecacerts \
-file /opt/cloudera/security/CAcerts/RootCA.cer -storepass changeit

# keytool -importcert -alias SubordinateCA -keystore \
$JAVA_HOME/jre/lib/security/jssecacerts \
-file /opt/cloudera/security/CAcerts/SubordinateCA.cer -storepass changeit

5. Import the root and intermediate certificates into keystore.

# keytool -importcert -trustcacerts -alias RootCA -keystore \
/opt/cloudera/security/jks/cmhost-keystore.jks -file \
/opt/cloudera/security/CAcerts/RootCA.cer -storepass password

# keytool -importcert -trustcacerts -alias SubordinateCA -keystore \
/opt/cloudera/security/jks/cmhost-keystore.jks -file \
/opt/cloudera/security/CAcerts/SubordinateCA.cer -storepass password

6. Import the signed host certificate

# cp certificate-file.cer  /opt/cloudera/security/x509/cmhost.pem

# keytool -importcert -trustcacerts -alias cmhost \
-file /opt/cloudera/security/x509/cmhost.pem \
-keystore /opt/cloudera/security/jks/cmhost-keystore.jks -storepass password

Oracle Way
See Oracle Note How to Use Certificates Signed by a User’s Certificate Authority for Web Consoles and Hadoop Network Encryption Use on the BDA (Doc ID 2187903.1)

1. Create the keystore on all nodes, called /opt/cloudera/security/jks/node.jks.
This is where I like Oracle's approach. Cloudera also requires a keystore on every host, but documents it in separate chapters for Cloudera Manager and the agents. Only when I was done with the configuration did I realize the steps could have been combined into a single one. This is where Oracle's approach is much simpler and easier.

# dcli -C keytool -validity 720 -keystore /opt/cloudera/security/jks/node.jks \
-alias \$HOSTNAME -genkeypair -keyalg RSA -storepass $PW -keypass $PW \
-dname "CN=\${HOSTNAME},OU=,O=,L=,S=,C="  

# dcli -C ls -l /opt/cloudera/security/jks/node.jks

2. Create CSR for each node.

# dcli -C keytool -keystore /opt/cloudera/security/jks/node.jks -alias \$HOSTNAME \
-certreq -file /root/\$HOSTNAME-cert-file -keypass $PW -storepass $PW 

3. Submit the node-specific CSRs to the CA to be signed.
4. Copy the signed certificate to each node as cert_file_signed (a scripted sketch follows the mapping below):
cert_file_signed_bdanode01 would be copied to Node 1 as: /opt/cloudera/security/jks/cert_file_signed
cert_file_signed_bdanode02 would be copied to Node 2 as: /opt/cloudera/security/jks/cert_file_signed

cert_file_signed_bdanode0n would be copied to Node n as: /opt/cloudera/security/jks/cert_file_signed
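Step 4 is easy to script if the signed certificates are staged on one node. A minimal sketch, assuming the signed files sit in /tmp/staging and the hosts are named bdanode01 through bdanode06 (both the path and the hostnames are placeholders for your own environment):

# Copy each node's signed certificate to that node under the common name cert_file_signed.
for i in 01 02 03 04 05 06; do
  scp /tmp/staging/cert_file_signed_bdanode$i \
      bdanode$i:/opt/cloudera/security/jks/cert_file_signed
done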
5. Copy CA public certificate to /opt/cloudera/security/jks/ca.crt

# cp /tmp/staging/ca.crt /opt/cloudera/security/jks/ca.crt  
# dcli -C -f /opt/cloudera/security/jks/ca.crt -d /opt/cloudera/security/jks/ca.crt  
# dcli -C ls -ltr /opt/cloudera/security/jks/ca.crt

6. Import the CA public certificate /opt/cloudera/security/jks/ca.crt into the keystore on each node

# dcli -C keytool -keystore /opt/cloudera/security/jks/node.jks -alias CARoot \
-import -file /opt/cloudera/security/jks/ca.crt -storepass $PW -keypass $PW -noprompt

7. Import the signed certificate for each node on BDA

# dcli -C keytool -keystore /opt/cloudera/security/jks/node.jks -alias \$HOSTNAME \
-import -file /opt/cloudera/security/jks/cert_file_signed -storepass $PW -keypass $PW -noprompt 

So for TLS on BDA, the keystore file is /opt/cloudera/security/jks/node.jks. Another important file is the truststore at /opt/cloudera/security/jks/.truststore. The approach to build this file is quite similar to node.jks; a sketch is shown below.
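A minimal sketch of that truststore build, assuming the same ca.crt staged in step 5 above and a truststore password held in $TRUSTPW (both the alias and the password variable are illustrative, not taken from the Oracle note):

# Import the CA public certificate into the shared truststore on every node.
dcli -C keytool -importcert -noprompt -alias CARoot \
    -keystore /opt/cloudera/security/jks/.truststore \
    -file /opt/cloudera/security/jks/ca.crt -storepass $TRUSTPW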

Ok, I have the node.jks file. How do I verify that it is valid? Like many people, I used to use the keytool command to check the content of a keystore file. For example,

[root@enkx4bda1node01 ~]# keytool -list -v -keystore /opt/cloudera/security/jks/node.jks
Enter keystore password:  

*****************  WARNING WARNING WARNING  *****************
* The integrity of the information stored in your keystore  *
* has NOT been verified!  In order to verify its integrity, *
* you must provide your keystore password.                  *
*****************  WARNING WARNING WARNING  *****************

Keystore type: JKS
Keystore provider: SUN

Your keystore contains 1 entry

Alias name: enkx4bda1node01.enkitec.local
Creation date: Mar 5, 2016
Entry type: PrivateKeyEntry
Certificate chain length: 1
Certificate[1]:
Owner: CN=enkx4bda1node01.enkitec.local, OU=, O=, L=, ST=, C=
Issuer: CN=enkx4bda1node01.enkitec.local, OU=, O=, L=, ST=, C=
Serial number: 26a1471b
Valid from: Sat Mar 05 02:17:40 CST 2016 until: Fri Feb 23 02:17:40 CST 2018
Certificate fingerprints:
	 MD5:  10B:30:3A:40:CD:94:38:7D:3A:33:1F:DD:49:B7:DF:99
	 SHA1: 98:6F:FC:84:68:BA:BD:25:37:8A:1B:D6:07:6F:FE:14:41:76:5B:09
	 SHA256: L3:43:4C:4C:9B:0E:36:18:DD:F1:10:84:46:9E:77:AA:BB:C7:85:E5:FC:19:4F:29:7F:70:BA:D4:0C:55:AD:F7
	 Signature algorithm name: SHA256withRSA
	 Version: 3

Extensions: 

#1: ObjectId: 2.5.29.14 Criticality=false
SubjectKeyIdentifier [
KeyIdentifier [
0000: GH FD 23 C9 9A A3 28 F9   3D C5 3B 1E E7 97 49 4E  ......(.=.:...IN
0010: 12 69 27 D5                                        .i(.
]
]

*******************************************
*******************************************

It usually works, but with certain limitations. Even if the keystore has all the necessary certificates, it might not be valid if they are not in the right order. As I suspected my keystore on BDA might not be valid, I tried to find other potential tools beyond keytool. Luckily, I found the blog Installing Trusted Certificates into a Java Keystore by Oracle's Jim Connors. It's a very nice blog about various tools for keystores. I am really interested in one of the tools he talked about: the ValidateCertChain program in weblogic.jar.

I happen to have an OEM Cloud Control 13c R2 environment that I built recently. Ok, let me give it a try.

[root@enkx4bdacli02 tmp]# java -cp /u01/app/oracle/oem/wlserver/server/lib/weblogic.jar utils.ValidateCertChain -jks enkx4bda1node03.enkitec.com node.jks
Cert[0]: CN=enkx4bda1node03.enkitec.com,OU=Bigdata,O=Enkitec,L=Irving,ST=TX,C=US
Certificate chain is incomplete, can't confirm the entire chain is valid
Certificate chain appears valid

It indeed found something and told me my certificate chain was incomplete. This gave me the clue to focus only on the keystore-building steps. After I figured out the issue and fixed the import sequence of the certificates, I reran the command. Here is the result:

[root@enkx4bdacli02 tmp]# java -cp /u01/app/oracle/oem/wlserver/server/lib/weblogic.jar utils.ValidateCertChain -jks enkx4bda1node03.enkitec.com node.jks
Cert[0]: CN=enkx4bda1node03.enkitec.com,OU=Bigdata,O=Enkitec,L=Irving,ST=TX,C=US
Cert[1]: CN=EnkLab Intermediate CA,OU=Bigdata,O=Enkitec,ST=Texas,C=US
Cert[2]: CN=EnkLab ROOT CA,OU=Bigdata,O=Enkitec,L=Irving,ST=TX,C=US
Certificate chain appears valid

Looks much better. It correctly shows one root CA certificate, one intermediate CA certificate, and one host certificate. The incomplete chain turned out to be one of the major issues I had in building the keystore on BDA.
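For reference, the sequence that finally produced a complete chain for me follows the same pattern as the Cloudera steps earlier in this post: trust anchors first, signed host certificate last. A sketch against the Oracle-style node.jks, assuming separate root and intermediate CA certificate files (the /tmp/staging file names are placeholders):

# Import order matters: 1. root CA, 2. intermediate CA, 3. the CA-signed host certificate,
# so keytool can assemble the full chain behind the host's private key entry.
keytool -importcert -trustcacerts -noprompt -alias RootCA \
    -keystore /opt/cloudera/security/jks/node.jks -file /tmp/staging/rootca.cer -storepass $PW
keytool -importcert -trustcacerts -noprompt -alias SubordinateCA \
    -keystore /opt/cloudera/security/jks/node.jks -file /tmp/staging/intermediateca.cer -storepass $PW
keytool -importcert -trustcacerts -noprompt -alias $HOSTNAME \
    -keystore /opt/cloudera/security/jks/node.jks -file /opt/cloudera/security/jks/cert_file_signed -storepass $PW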

There is another command, openssl s_client, that can be used to validate the keystore, but it is only useful once everything is configured.

# openssl s_client -connect enkx4bda1node03.enkitec.com:7183 -CAfile root.enkitec.com.cert.pem
CONNECTED(00000003)
depth=2 C = US, ST = TX, L = Irving, O = Enkitec, OU = bigdata, CN = Enklab ROOT CA
verify return:1
depth=1 C = US, ST = TX, O = Enkitec, OU = bigdata, CN = Enklab Intermediate CA
verify return:1
depth=0 C = US, ST = TX, L = Irving, O = Enkitec, OU = bigdata, CN = enkx4bda1node03.enkitec.com
verify return:1
---
Certificate chain
 0 s:/C=US/ST=TX/L=Irving/O=Enkitec/OU=Bigdata/CN=enkx4bda1node03.enkitec.com
   i:/C=US/ST=TX/O=Enkitec/OU=Bigdata/CN=Bigdata Intermediate CA
 1 s:/C=US/ST=TX/O=Enkitec/OU=Bigdata/CN=Bigdata Intermediate CA
   i:/C=US/ST=TX/L=Irving/O=Enkitec/OU=Bigdata/CN=Bigdata ROOT CA
 2 s:/C=US/ST=TX/L=Irving/O=Enkitec/OU=Bigdata/CN=Bigdata ROOT CA
   i:/C=US/ST=TX/L=Irving/O=Enkitec/OU=Bigdata/CN=Bigdata ROOT CA
---
Server certificate
-----BEGIN CERTIFICATE-----

MIIDXTCCAkWgAwIBAgIEQn3HnzANBgkqhkiG9w0BAQsFADBfMQkwBwYDVQQGEwAx
CTAHBgNVBAgTADEJMAcGA1UEBxMAMQkwBwYDVQQKEwAxCTAHBgNVBAsTADEmMCQG
A1UEAxMdZW5reDRiZGExbm9kZTAzLmVua2l0ZWMubG9jYWwwHhcNMTYwMzA1MDgx
NzQ1WhcNMTgwMjIzMDgxNzQ1WjBfMQkwBwYDVQQGEwAxCTAHBgNVBAgTADEJMAcG
A1UEBxMAMQkwBwYDVQQKEwAxCTAHBgNVBAsTADEmMCQGA1UEAxMdZW5reDRiZGEx
bm9kZTAzLmVua2l0ZWMubG9jYWwwggEiMA0GCSqGSIb3DQEBAQUAA4IBDwAwggEK
AoIBAQDXcThbyBV4FAm2EJJBhZpg5XLqRcswMm748QUxBzTBj+LeXZJw7wTX3SzJ
Eup6YeJKczDYTjPLpHZ6ruOnhz4WSA/39e+U9MvqNZMnwdwgA7/d++4BA4ZGWs1N
3G/NmYHR1eKJntPFrExz/1XSJpW7xVfAaNsQNUb9HkAEtXN25GOF/H7jQBwxx5Wq
mnIZAgNC7shg6DCusvaURllsOih+XY4kf8HYKLLihXUmbeNauG/ixZyXm3kKu5mN
vfXF48Y4OKMHkYMS5BfZzaRw43+PWIWPbsy2RR+GRypsFMSCa5MHIwL+2tHJHBwC
kwXMB7RlA7yVd57iXPzlCAf1mijjAgMBAAGjITAfMB0GA1UdDgQWBBQ20j1Jr+LG
ejzGFNVNZIHybvIstjANBgkqhkiG9w0BAQsFAAOCAQEArZ6x6qIRxhqJ8Qd20Xkf
T3NsbzEUMBIGA1UECgwLU3RhdG5ldHQgU0YxDjAMBgNVBAsMkFDs1FAjXrt8fo7S
QTVe225bCiTYgIJl7UwOAonKBZLRIhwjbh1TDij1iyNuSrX1kisVkrmtQrsNTpqH
D8m3k1M6XCUU3RV2+I6UY2WhLNvojlCYPXnQHXo5BJPDRuaXQu/OUi2cr5LVzOhC
5NdBjMUDwfsWx5NYtTK5iNvt7CBGZOXF5RgdDhZMywR0qY0pMiBjGoCxvhv9v8Ob
xk/WfbfXfcviUrb5lnqCX8NUG+/fKv09Csx0CBiXXNU+9R5HAlTZG5xptIi22CXZ
Kw==
-----END CERTIFICATE-----
subject=/C=US/ST=TX/L=Irving/O=Enkitec/OU=Bigdata/CN=enkx4bda1node03.enkitec.com
issuer=/C=US/ST=TX/O=Enkitec/OU=Bigdata/CN=Bigdata Intermediate CA
---
No client certificate CA names sent
Server Temp Key: ECDH, secp521r1, 521 bits
---
SSL handshake has read 4430 bytes and written 443 bytes
---
New, TLSv1/SSLv3, Cipher is ECDHE-RSA-AES256-GCM-SHA384
Server public key is 2048 bit
Secure Renegotiation IS supported
Compression: NONE
Expansion: NONE
SSL-Session:
    Protocol  : TLSv1.2
    Cipher    : ECDHE-RSA-AES256-GCM-SHA384
    Session-ID: 39023B1EB131C30355F20CD8F012DCF2FFC95E1A1F9F8D8D2B6954942E9
    Session-ID-ctx: 
    Master-Key: XMB7RlA7yVd57iXPzl5EE73EAAB9B18B04B2718CAf1mijjAgMBAA5126650B5A3GjITAfM8EA269DBFE17A750EBBC5EC
    Key-Arg   : None
    Krb5 Principal: None
    PSK identity: None
    PSK identity hint: None
    Start Time: 9023528453
    Timeout   : 300 (sec)
    Verify return code: 0 (ok)
---
closed

Change BDA Cluster Name in OEM Cloud Control 13c

Oracle OEM Cloud Control 13c has some improvements over OEM 12c. But for BDA, the weirdest thing after OEM BDA discovery is the target name for the BDA cluster. By default, the target name is BDA Network 1 for the first BDA cluster and BDA Network 2 for the second BDA cluster. Think about it this way: Oracle BDA already has a BDA cluster name that is different from Cloudera Manager's cluster name, and now OEM comes up with yet another cluster name. If you have two BDAs and use OEM 13c to discover the DR BDA first, the DR BDA will take the BDA Network 1 name and the primary BDA will be discovered as BDA Network 2. It's really an annoying new feature in OEM 13c. Ideally, I want to change the BDA Network name to something meaningful. BDA Network 1 is a useless naming standard, just like the door below. Mapping to either the BDA cluster name or Cloudera Manager's cluster name would be fine with me. In this blog, I am going to discuss whether I can change this name to something I like.

There are two types of targets in OEM: Repository Side Targets and Agent Side Targets. Each managed target in OEM has a Display Name and a Target Name. So for BDA Network, I am wondering which category this target falls into.

Run the query for Repository Side targets:

set lines 200
set pages 999
col ENTITY_TYPE for a30
col TYPE_DISPLAY_NAME for a35
col ENTITY_NAME for a45
col DISPLAY_NAME for a40

SELECT ENTITY_TYPE,
       TYPE_DISPLAY_NAME,
       ENTITY_NAME,
       DISPLAY_NAME
FROM   SYSMAN.EM_MANAGEABLE_ENTITIES
WHERE  MANAGE_STATUS = 2
AND    REP_SIDE_AVAIL = 1
ORDER  BY 1,2;
ENTITY_TYPE		       TYPE_DISPLAY_NAME		   ENTITY_NAME					 DISPLAY_NAME
------------------------------ ----------------------------------- --------------------------------------------- ----------------------------------------
j2ee_application_cluster       Clustered Application Deployment    /EMGC_GCDomain/GCDomain/BIP_cluster/bipublish bipublisher(11.1.1)
								  er(11.1.1)

j2ee_application_cluster       Clustered Application Deployment    /EMGC_GCDomain/GCDomain/BIP_cluster/ESSAPP	 ESSAPP
oracle_em_service	       EM Service			   EM Jobs Service				 EM Jobs Service
oracle_emsvrs_sys	       EM Servers System		   Management_Servers				 Management Servers
oracle_si_netswitch	       Systems Infrastructure Switch	   enkx4bda1sw-ib2				 enkx4bda1sw-ib2
oracle_si_netswitch	       Systems Infrastructure Switch	   enkx4bda1sw-ip				 enkx4bda1sw-ip
oracle_si_netswitch	       Systems Infrastructure Switch	   enkx4bda1sw-ib3				 enkx4bda1sw-ib3
oracle_si_server_map	       Systems Infrastructure Server	   enkbda1node08-ilom				 enkbda1node08-ilom
oracle_si_server_map	       Systems Infrastructure Server	   enkx4bda1node06-ilom 			 enkx4bda1node06-ilom
oracle_si_server_map	       Systems Infrastructure Server	   enkbda1node17-ilom				 enkbda1node17-ilom
oracle_si_server_map	       Systems Infrastructure Server	   enkbda1node11-ilom				 enkbda1node11-ilom
oracle_si_server_map	       Systems Infrastructure Server	   enkx4bda1node04.enkitec.local/server 	 enkx4bda1node04.enkitec.local/server
oracle_si_server_map	       Systems Infrastructure Server	   enkbda1node18-ilom				 enkbda1node18-ilom
oracle_si_server_map	       Systems Infrastructure Server	   enkx4bda1node05-ilom 			 enkx4bda1node05-ilom
oracle_si_server_map	       Systems Infrastructure Server	   enkbda1node15-ilom				 enkbda1node15-ilom
oracle_si_server_map	       Systems Infrastructure Server	   enkbda1node14-ilom				 enkbda1node14-ilom
oracle_si_server_map	       Systems Infrastructure Server	   enkbda1node02-ilom				 enkbda1node02-ilom
oracle_si_server_map	       Systems Infrastructure Server	   enkbda1node01-ilom				 enkbda1node01-ilom
oracle_si_server_map	       Systems Infrastructure Server	   enkbda1node13-ilom				 enkbda1node13-ilom
oracle_si_server_map	       Systems Infrastructure Server	   enkbda1node06-ilom				 enkbda1node06-ilom
oracle_si_server_map	       Systems Infrastructure Server	   enkbda1node09-ilom				 enkbda1node09-ilom
oracle_si_server_map	       Systems Infrastructure Server	   enkx4bda1node03-ilom 			 enkx4bda1node03-ilom
oracle_si_server_map	       Systems Infrastructure Server	   enkbda1node16-ilom				 enkbda1node16-ilom
oracle_si_server_map	       Systems Infrastructure Server	   enkbda1node03-ilom				 enkbda1node03-ilom
oracle_si_server_map	       Systems Infrastructure Server	   enkbda1node10-ilom				 enkbda1node10-ilom
oracle_si_server_map	       Systems Infrastructure Server	   enkx4bda1node01-ilom 			 enkx4bda1node01-ilom
oracle_si_server_map	       Systems Infrastructure Server	   enkbda1node04-ilom				 enkbda1node04-ilom
oracle_si_server_map	       Systems Infrastructure Server	   enkbda1node05-ilom				 enkbda1node05-ilom
oracle_si_server_map	       Systems Infrastructure Server	   enkbda1node12-ilom				 enkbda1node12-ilom
oracle_si_server_map	       Systems Infrastructure Server	   enkx4bda1node02-ilom 			 enkx4bda1node02-ilom
weblogic_cluster	       Oracle WebLogic Cluster		   /EMGC_GCDomain/GCDomain/BIP_cluster		 BIP_cluster

31 rows selected.

Not found in this category. Let me try the Agent Side targets.

set lines 200
set pages 999
col ENTITY_TYPE for a30
col TYPE_DISPLAY_NAME for a30
col ENTITY_NAME for a35
col DISPLAY_NAME for a35
col EMD_URL for a60

SELECT ENTITY_TYPE,
       TYPE_DISPLAY_NAME,
       ENTITY_NAME,
       DISPLAY_NAME,
       EMD_URL
FROM   SYSMAN.EM_MANAGEABLE_ENTITIES
WHERE  MANAGE_STATUS = 2
AND    REP_SIDE_AVAIL = 0
AND    EMD_URL IS NOT NULL
ORDER  BY 1,2,3;

ENTITY_TYPE		       TYPE_DISPLAY_NAME	      ENTITY_NAME			  DISPLAY_NAME			      EMD_URL
------------------------------ ------------------------------ ----------------------------------- ----------------------------------- ------------------------------------------------------------
host			       Host			      enkx4bda1node01.enkitec.local	  enkx4bda1node01.enkitec.local       https://enkx4bda1node01.enkitec.local:1830/emd/main/
host			       Host			      enkx4bda1node02.enkitec.local	  enkx4bda1node02.enkitec.local       https://enkx4bda1node02.enkitec.local:1830/emd/main/
host			       Host			      enkx4bda1node03.enkitec.local	  enkx4bda1node03.enkitec.local       https://enkx4bda1node03.enkitec.local:1830/emd/main/
host			       Host			      enkx4bda1node04.enkitec.local	  enkx4bda1node04.enkitec.local       https://enkx4bda1node04.enkitec.local:1830/emd/main/
host			       Host			      enkx4bda1node05.enkitec.local	  enkx4bda1node05.enkitec.local       https://enkx4bda1node05.enkitec.local:1830/emd/main/
host			       Host			      enkx4bda1node06.enkitec.local	  enkx4bda1node06.enkitec.local       https://enkx4bda1node06.enkitec.local:1830/emd/main/
host			       Host			      enkx4bdacli02.enkitec.local	  enkx4bdacli02.enkitec.local	      https://enkx4bdacli02.enkitec.local:3872/emd/main/
oracle_apache		       Oracle HTTP Server	      /EMGC_GCDomain/GCDomain/ohs1	  ohs1				      https://enkx4bdacli02.enkitec.local:3872/emd/main/
oracle_bda_cluster	       BDA Network		      BDA Network 1			  BDA Network 1 		      https://enkx4bda1node02.enkitec.local:1830/emd/main/
oracle_beacon		       Beacon			      EM Management Beacon		  EM Management Beacon		      https://enkx4bdacli02.enkitec.local:3872/emd/main/
oracle_big_data_sql	       Oracle Big Data SQL	      bigdatasql_enkx4bda		  bigdatasql_enkx4bda		      https://enkx4bda1node01.enkitec.local:1830/emd/main/
oracle_cloudera_manager        Cloudera Manager 	      Cloudera Manager - enkx4bda	  Cloudera Manager - enkx4bda	      https://enkx4bda1node01.enkitec.local:1830/emd/main/
oracle_emd		       Agent			      enkx4bda1node01.enkitec.local:1830  enkx4bda1node01.enkitec.local:1830  https://enkx4bda1node01.enkitec.local:1830/emd/main/
oracle_emd		       Agent			      enkx4bda1node02.enkitec.local:1830  enkx4bda1node02.enkitec.local:1830  https://enkx4bda1node02.enkitec.local:1830/emd/main/
oracle_emd		       Agent			      enkx4bda1node03.enkitec.local:1830  enkx4bda1node03.enkitec.local:1830  https://enkx4bda1node03.enkitec.local:1830/emd/main/
oracle_emd		       Agent			      enkx4bda1node04.enkitec.local:1830  enkx4bda1node04.enkitec.local:1830  https://enkx4bda1node04.enkitec.local:1830/emd/main/
oracle_emd		       Agent			      enkx4bda1node05.enkitec.local:1830  enkx4bda1node05.enkitec.local:1830  https://enkx4bda1node05.enkitec.local:1830/emd/main/
oracle_emd		       Agent			      enkx4bda1node06.enkitec.local:1830  enkx4bda1node06.enkitec.local:1830  https://enkx4bda1node06.enkitec.local:1830/emd/main/
oracle_emd		       Agent			      enkx4bdacli02.enkitec.local:3872	  enkx4bdacli02.enkitec.local:3872    https://enkx4bdacli02.enkitec.local:3872/emd/main/
oracle_emrep		       OMS and Repository	      Management Services and Repository  Management Services and Repository  https://enkx4bdacli02.enkitec.local:3872/emd/main/
oracle_hadoop_cluster	       Hadoop Cluster		      enkx4bda				  enkx4bda			      https://enkx4bda1node01.enkitec.local:1830/emd/main/
oracle_hadoop_datanode	       Hadoop DataNode		      DN_enkx4bda1node01_enkx4bda	  DN_enkx4bda1node01_enkx4bda	      https://enkx4bda1node01.enkitec.local:1830/emd/main/
oracle_hadoop_datanode	       Hadoop DataNode		      DN_enkx4bda1node02_enkx4bda	  DN_enkx4bda1node02_enkx4bda	      https://enkx4bda1node02.enkitec.local:1830/emd/main/
oracle_hadoop_datanode	       Hadoop DataNode		      DN_enkx4bda1node03_enkx4bda	  DN_enkx4bda1node03_enkx4bda	      https://enkx4bda1node03.enkitec.local:1830/emd/main/
oracle_hadoop_datanode	       Hadoop DataNode		      DN_enkx4bda1node04_enkx4bda	  DN_enkx4bda1node04_enkx4bda	      https://enkx4bda1node04.enkitec.local:1830/emd/main/
oracle_hadoop_datanode	       Hadoop DataNode		      DN_enkx4bda1node05_enkx4bda	  DN_enkx4bda1node05_enkx4bda	      https://enkx4bda1node05.enkitec.local:1830/emd/main/
oracle_hadoop_datanode	       Hadoop DataNode		      DN_enkx4bda1node06_enkx4bda	  DN_enkx4bda1node06_enkx4bda	      https://enkx4bda1node06.enkitec.local:1830/emd/main/
oracle_hadoop_failoverctl      Hadoop Failover Controller     FC_NNA_enkx4bda			  FC_NNA_enkx4bda		      https://enkx4bda1node02.enkitec.local:1830/emd/main/
oracle_hadoop_failoverctl      Hadoop Failover Controller     FC_NNB_enkx4bda			  FC_NNB_enkx4bda		      https://enkx4bda1node01.enkitec.local:1830/emd/main/
oracle_hadoop_hdfs	       Hadoop HDFS		      hdfs_enkx4bda			  hdfs_enkx4bda 		      https://enkx4bda1node01.enkitec.local:1830/emd/main/
oracle_hadoop_historyserver    Hadoop Job History Server      JHS_enkx4bda			  JHS_enkx4bda			      https://enkx4bda1node03.enkitec.local:1830/emd/main/
oracle_hadoop_hive	       Hadoop Hive		      hive_enkx4bda			  hive_enkx4bda 		      https://enkx4bda1node01.enkitec.local:1830/emd/main/
oracle_hadoop_hive_metaserver  Hadoop Hive Metastore Server   Metastore_enkx4bda		  Metastore_enkx4bda		      https://enkx4bda1node04.enkitec.local:1830/emd/main/
oracle_hadoop_hive_server      Hadoop Hive Server2	      HiveServer2_enkx4bda		  HiveServer2_enkx4bda		      https://enkx4bda1node04.enkitec.local:1830/emd/main/
oracle_hadoop_hive_webhcat     Hadoop Hive WebHCat Server     WebHCat_enkx4bda			  WebHCat_enkx4bda		      https://enkx4bda1node04.enkitec.local:1830/emd/main/
oracle_hadoop_hue	       Hadoop Hue		      hue_enkx4bda			  hue_enkx4bda			      https://enkx4bda1node01.enkitec.local:1830/emd/main/
oracle_hadoop_impala	       Hadoop Impala		      impala_enkx4bda			  impala_enkx4bda		      https://enkx4bda1node01.enkitec.local:1830/emd/main/
oracle_hadoop_impala_demon     Hadoop Impala Daemon	      ImpalaD_enkx4bda1node01_enkx4bda	  ImpalaD_enkx4bda1node01_enkx4bda    https://enkx4bda1node01.enkitec.local:1830/emd/main/
oracle_hadoop_impala_demon     Hadoop Impala Daemon	      ImpalaD_enkx4bda1node02_enkx4bda	  ImpalaD_enkx4bda1node02_enkx4bda    https://enkx4bda1node02.enkitec.local:1830/emd/main/
oracle_hadoop_impala_demon     Hadoop Impala Daemon	      ImpalaD_enkx4bda1node03_enkx4bda	  ImpalaD_enkx4bda1node03_enkx4bda    https://enkx4bda1node03.enkitec.local:1830/emd/main/
oracle_hadoop_impala_demon     Hadoop Impala Daemon	      ImpalaD_enkx4bda1node04_enkx4bda	  ImpalaD_enkx4bda1node04_enkx4bda    https://enkx4bda1node04.enkitec.local:1830/emd/main/
oracle_hadoop_impala_demon     Hadoop Impala Daemon	      ImpalaD_enkx4bda1node06_enkx4bda	  ImpalaD_enkx4bda1node06_enkx4bda    https://enkx4bda1node06.enkitec.local:1830/emd/main/
oracle_hadoop_impala_server_cat Hadoop Impala Server Catalogue ImpalaCatSrv_enkx4bda		  ImpalaCatSrv_enkx4bda 	      https://enkx4bda1node06.enkitec.local:1830/emd/main/
oracle_hadoop_impala_statestore Hadoop Impala State Store      StateStore_enkx4bda		  StateStore_enkx4bda		      https://enkx4bda1node06.enkitec.local:1830/emd/main/
oracle_hadoop_journalnode      Hadoop Journal Node	      JN_enkx4bda1node01_enkx4bda	  JN_enkx4bda1node01_enkx4bda	      https://enkx4bda1node01.enkitec.local:1830/emd/main/
oracle_hadoop_journalnode      Hadoop Journal Node	      JN_enkx4bda1node02_enkx4bda	  JN_enkx4bda1node02_enkx4bda	      https://enkx4bda1node02.enkitec.local:1830/emd/main/
oracle_hadoop_journalnode      Hadoop Journal Node	      JN_enkx4bda1node03_enkx4bda	  JN_enkx4bda1node03_enkx4bda	      https://enkx4bda1node03.enkitec.local:1830/emd/main/
oracle_hadoop_kerberos	       Kerberos 		      kerberos_enkx4bda 		  kerberos_enkx4bda		      https://enkx4bda1node01.enkitec.local:1830/emd/main/
oracle_hadoop_mysql	       MySql			      mysql_enkx4bda			  mysql_enkx4bda		      https://enkx4bda1node01.enkitec.local:1830/emd/main/
oracle_hadoop_namenode	       Hadoop NameNode		      NNA_enkx4bda			  NNA_enkx4bda			      https://enkx4bda1node01.enkitec.local:1830/emd/main/
oracle_hadoop_namenode	       Hadoop NameNode		      NNB_enkx4bda			  NNB_enkx4bda			      https://enkx4bda1node02.enkitec.local:1830/emd/main/
oracle_hadoop_nodemgr	       Hadoop NodeManager	      NM_enkx4bda1node01_enkx4bda	  NM_enkx4bda1node01_enkx4bda	      https://enkx4bda1node01.enkitec.local:1830/emd/main/
oracle_hadoop_nodemgr	       Hadoop NodeManager	      NM_enkx4bda1node02_enkx4bda	  NM_enkx4bda1node02_enkx4bda	      https://enkx4bda1node02.enkitec.local:1830/emd/main/
oracle_hadoop_nodemgr	       Hadoop NodeManager	      NM_enkx4bda1node03_enkx4bda	  NM_enkx4bda1node03_enkx4bda	      https://enkx4bda1node03.enkitec.local:1830/emd/main/
oracle_hadoop_nodemgr	       Hadoop NodeManager	      NM_enkx4bda1node04_enkx4bda	  NM_enkx4bda1node04_enkx4bda	      https://enkx4bda1node04.enkitec.local:1830/emd/main/
oracle_hadoop_nodemgr	       Hadoop NodeManager	      NM_enkx4bda1node05_enkx4bda	  NM_enkx4bda1node05_enkx4bda	      https://enkx4bda1node05.enkitec.local:1830/emd/main/
oracle_hadoop_nodemgr	       Hadoop NodeManager	      NM_enkx4bda1node06_enkx4bda	  NM_enkx4bda1node06_enkx4bda	      https://enkx4bda1node06.enkitec.local:1830/emd/main/
oracle_hadoop_oozie	       Hadoop Oozie		      oozie_enkx4bda			  oozie_enkx4bda		      https://enkx4bda1node01.enkitec.local:1830/emd/main/
oracle_hadoop_oozie_server     Hadoop Oozie Server	      OozieServer_enkx4bda		  OozieServer_enkx4bda		      https://enkx4bda1node04.enkitec.local:1830/emd/main/
oracle_hadoop_resourcemgr      Hadoop ResourceManager	      RMA_enkx4bda			  RMA_enkx4bda			      https://enkx4bda1node04.enkitec.local:1830/emd/main/
oracle_hadoop_resourcemgr      Hadoop ResourceManager	      RMB_enkx4bda			  RMB_enkx4bda			      https://enkx4bda1node03.enkitec.local:1830/emd/main/
oracle_hadoop_solr	       Hadoop Solr		      solr_enkx4bda			  solr_enkx4bda 		      https://enkx4bda1node01.enkitec.local:1830/emd/main/
oracle_hadoop_solr_server      Hadoop Solr Server	      SolrServer_enkx4bda		  SolrServer_enkx4bda		      https://enkx4bda1node03.enkitec.local:1830/emd/main/
oracle_hadoop_spark_on_yarn    Hadoop Spark On Yarn	      spark_on_yarn_enkx4bda		  spark_on_yarn_enkx4bda	      https://enkx4bda1node01.enkitec.local:1830/emd/main/
oracle_hadoop_yarn	       Hadoop Yarn		      yarn_enkx4bda			  yarn_enkx4bda 		      https://enkx4bda1node01.enkitec.local:1830/emd/main/
oracle_hadoop_zookeeper        Hadoop ZooKeeper 	      zookeeper_enkx4bda		  zookeeper_enkx4bda		      https://enkx4bda1node01.enkitec.local:1830/emd/main/
oracle_hadoop_zookeeper_server Hadoop ZooKeeper Server	      ZKS_enkx4bda1node01_enkx4bda	  ZKS_enkx4bda1node01_enkx4bda	      https://enkx4bda1node01.enkitec.local:1830/emd/main/
oracle_hadoop_zookeeper_server Hadoop ZooKeeper Server	      ZKS_enkx4bda1node02_enkx4bda	  ZKS_enkx4bda1node02_enkx4bda	      https://enkx4bda1node02.enkitec.local:1830/emd/main/
oracle_hadoop_zookeeper_server Hadoop ZooKeeper Server	      ZKS_enkx4bda1node03_enkx4bda	  ZKS_enkx4bda1node03_enkx4bda	      https://enkx4bda1node03.enkitec.local:1830/emd/main/

....

oracle_oms		       Oracle Management Service      enkx4bdacli02.enkitec.local:4889_Ma enkx4bdacli02.enkitec.local:4889_Ma https://enkx4bdacli02.enkitec.local:3872/emd/main/
							      nagement_Service			  nagement_Service
oracle_oms_console	       OMS Console		      enkx4bdacli02.enkitec.local:4889_Ma enkx4bdacli02.enkitec.local:4889_Ma https://enkx4bdacli02.enkitec.local:3872/emd/main/
							      nagement_Service_CONSOLE		  nagement_Service_CONSOLE
oracle_oms_pbs		       OMS Platform		      enkx4bdacli02.enkitec.local:4889_Ma enkx4bdacli02.enkitec.local:4889_Ma https://enkx4bdacli02.enkitec.local:3872/emd/main/
							      nagement_Service_PBS		  nagement_Service_PBS
oracle_si_pdu		       Systems Infrastructure PDU     enkx4bda1-pdua			  enkx4bda1-pdua		      https://enkx4bda1node02.enkitec.local:1830/emd/main/
oracle_si_pdu		       Systems Infrastructure PDU     enkx4bda1-pdub			  enkx4bda1-pdub		      https://enkx4bda1node02.enkitec.local:1830/emd/main/


164 rows selected

The following query shows just oracle_bda_cluster type of target.

col ENTITY_TYPE for a20
col TYPE_DISPLAY_NAME for a20
col ENTITY_NAME for a16
col DISPLAY_NAME for a16
col EMD_URL for a55
SELECT ENTITY_TYPE,
       TYPE_DISPLAY_NAME,
       ENTITY_NAME,
       DISPLAY_NAME,
       EMD_URL
FROM   SYSMAN.EM_MANAGEABLE_ENTITIES
WHERE  
ENTITY_TYPE = 'oracle_bda_cluster'
;
ENTITY_TYPE	     TYPE_DISPLAY_NAME	  ENTITY_NAME	   DISPLAY_NAME     EMD_URL
-------------------- -------------------- ---------------- ---------------- -------------------------------------------------------
oracle_bda_cluster   BDA Network	  BDA Network 1    BDA Network 1    https://enkx4bda1node02.enkitec.local:1830/emd/main/

Ok, we can see entity_type is oracle_bda_cluster for BDA Network. Both target name and display name are BDA Network 1.

Next, I will check whether I can rename the target name of BDA Network 1. I have used the emcli rename_target command in the past to rename OEM targets, and it usually works. So I ran the following commands:

[oracle@enkx4bdacli02 ~]$ emcli show_bda_clusters
BDA Network 1 : enkx4bda

[oracle@enkx4bdacli02 ~]$ emcli get_targets -targets="oracle_bda_cluster"
Status  Status           Target Type           Target Name                        
 ID                                                                               
-9      N/A              oracle_bda_cluster    BDA Network 1  

[oracle@enkx4bdacli02 ~]$ emcli rename_target -target_type="oracle_bda_cluster" -target_name="BDA Network 1" -new_target_name="X4BDA"
Rename not supported for the given Target Type.

No luck. It doesn't work. Since renaming the target name does not work, let me try changing the display name instead.

[oracle@enkx4bdacli02 ~]$ emcli modify_target -type="oracle_bda_cluster" -name="BDA Network 1" -display_name="X4BDA"
Target "BDA Network 1:oracle_bda_cluster" modified successfully

It works. Rerun the query to check oracle_bda_cluster type.

ENTITY_TYPE	     TYPE_DISPLAY_NAME	  ENTITY_NAME	   DISPLAY_NAME     EMD_URL
-------------------- -------------------- ---------------- ---------------- -------------------------------------------------------
oracle_bda_cluster   BDA Network	  BDA Network 1    X4BDA	    https://enkx4bda1node02.enkitec.local:1830/emd/main/

Well, it works partially. For some screens, it works perfectly.

But some other screens still show the same annoying name.

Another lesson I learned recently is that you need to be very careful about using default passwords when setting up BDA. If you set up BDA with the default passwords and OEM BDA discovery uses those default passwords for its Named Credentials, you will run into issues after you change the default passwords later on. In the worst case, such as a Cloudera Manager password change, it requires removing the current BDA target and redoing the BDA discovery. I may write about this topic in a different blog if I have time.

Missing Classpath Issue during Sqoop Import

Recently I ran into an issue when using Sqoop import, with the following error messages:

17/02/22 10:44:17 INFO mapreduce.Job: The url to track the job: http://quickstart.cloudera:8088/proxy/application_1487780389745_0001/
17/02/22 10:44:17 INFO mapreduce.Job: Running job: job_1487780389745_0001
17/02/22 10:44:27 INFO mapreduce.Job: Job job_1487780389745_0001 running in uber mode : false
17/02/22 10:44:27 INFO mapreduce.Job:  map 0% reduce 0%
17/02/22 10:44:38 INFO mapreduce.Job: Task Id : attempt_1487780389745_0001_m_000000_0, Status : FAILED
Error: java.lang.ClassNotFoundException: org.apache.commons.lang3.StringUtils
	at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
	at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
	at java.security.AccessController.doPrivileged(Native Method)
	at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
	at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
	at org.apache.sqoop.manager.oracle.OracleUtils.escapeIdentifier(OracleUtils.java:36)
	at org.apache.sqoop.manager.oracle.OraOopOracleQueries.getTableColumns(OraOopOracleQueries.java:683)
	at org.apache.sqoop.manager.oracle.OraOopOracleQueries.getTableColumns(OraOopOracleQueries.java:767)
	at org.apache.sqoop.manager.oracle.OraOopDBRecordReader.getSelectQuery(OraOopDBRecordReader.java:195)
	at org.apache.sqoop.mapreduce.db.DBRecordReader.nextKeyValue(DBRecordReader.java:235)
	at org.apache.sqoop.manager.oracle.OraOopDBRecordReader.nextKeyValue(OraOopDBRecordReader.java:356)
	at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.nextKeyValue(MapTask.java:556)
	at org.apache.hadoop.mapreduce.task.MapContextImpl.nextKeyValue(MapContextImpl.java:80)
	at org.apache.hadoop.mapreduce.lib.map.WrappedMapper$Context.nextKeyValue(WrappedMapper.java:91)
	at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
	at org.apache.sqoop.mapreduce.AutoProgressMapper.run(AutoProgressMapper.java:64)
	at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:787)
	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
	at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:415)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1693)
	at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)

I have used sqoop import many times and never run into this issue before, which made me wonder what was going on.
I checked the CDH version; it is 5.8.0, whereas I used to use CDH 5.7 a lot.

[root@quickstart lib]# hadoop version
Hadoop 2.6.0-cdh5.8.0
Subversion http://github.com/cloudera/hadoop -r 57e7b8556919574d517e874abfb7ebe31a366c2b
Compiled by jenkins on 2016-06-16T19:38Z
Compiled with protoc 2.5.0
From source with checksum 9e99ecd28376acfd5f78c325dd939fed
This command was run using /usr/lib/hadoop/hadoop-common-2.6.0-cdh5.8.0.jar

Checking the Sqoop version shows it is 1.4.6:

[root@quickstart lib]# sqoop version
Warning: /usr/lib/sqoop/../accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
17/02/22 13:37:28 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6-cdh5.8.0
Sqoop 1.4.6-cdh5.8.0
git commit id 
Compiled by jenkins on Thu Jun 16 12:25:21 PDT 2016

The Cloudera Quickstart VM is using CDH 5.8.0, and this is the first time I am touching CDH 5.8, so the problem might be related to CDH 5.8. I did some research and found someone who had the exact same issue. It is a Sqoop bug: SQOOP-2999, Sqoop ClassNotFoundException (org.apache.commons.lang3.StringUtils) is thrown when executing Oracle direct import map task.

It looks like the new version of Sqoop in CDH 5.8.0 uses a new class, OracleUtils, that has a dependency on org.apache.commons.lang3.StringUtils. The jar that contains this class is not on the classpath that Sqoop passes to the mappers, and therefore the exception is thrown at runtime. Unfortunately the fix is in CDH 5.8.2, as indicated in Cloudera's release notes; the first Sqoop bug listed in the notes is this one, SQOOP-2999.

At this moment, I don't want to go through an upgrade or patching of CDH. I tried to see whether there is any workaround on the internet, but could not find anything related to this one. So I will fix this issue my own way.

First I need to find out which jar file contains this class. There is a post discussing this class and the related jar file: commons-lang3-3.1.jar. Luckily, I do have a commons-lang3-3.1.jar file on the system.

[root@quickstart lib]# locate commons-lang3-3.1.jar
/usr/lib/mahout/lib/commons-lang3-3.1.jar
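Before wiring this jar into any classpath, it is worth a quick sanity check that the class is really in there. A minimal sketch:

# List the jar's contents and look for the exact class the mapper failed to load.
unzip -l /usr/lib/mahout/lib/commons-lang3-3.1.jar | grep 'org/apache/commons/lang3/StringUtils.class'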

Next I need to figure out how to add this jar to the classpath for Sqoop. Most Hadoop environment variable settings can be found in the /etc/default directory, and I found a file there called sqoop2-server.

[root@quickstart default]# cat sqoop2-server
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements.  See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License.  You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

CATALINA_BASE=/var/lib/sqoop2/tomcat-deployment
SQOOP_USER=sqoop2
SQOOP_CONFIG=/etc/sqoop2/conf
SQOOP_LOG=/var/log/sqoop2
SQOOP_TEMP=/var/run/sqoop2
SQOOP_PID=/var/run/sqoop2/sqoop-server-sqoop2.pid
CATALINA_BIN=/usr/lib/bigtop-tomcat/bin
CATALINA_TMPDIR=/var/tmp/sqoop2
CATALINA_OPTS=-Xmx1024m
CATALINA_OUT=/var/log/sqoop2/sqoop-tomcat.log
#AUX_CLASSPATH=

Uncomment the AUX_CLASSPATH= line and change it to the following:
AUX_CLASSPATH=/usr/lib/mahout/lib/commons-lang3-3.1.jar
After restarting the Sqoop service, no luck. It is still not working.

I could change the /etc/sqoop2/conf/setenv.sh file and add this jar to the CLASSPATH, but there is no CLASSPATH used in that script, and I don't want to change something and completely forget about it later on. So I tried the option of adding to the classpath in Cloudera Manager. The tricky question is which parameter to add it to.

As Sqoop import jobs run as MapReduce jobs on YARN, I tried adding the jar to the YARN Application Classpath (yarn.application.classpath).
The result is shown below.
sqoop_classpath_missing_issue
After the change, redeploy the client configuration and bounce the CDH cluster. I reran the Sqoop job and it completed without error. Please note: this is just a temporary workaround; the right approach is still to upgrade to a newer version of CDH.
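If touching yarn.application.classpath cluster-wide feels too heavy, another workaround I would expect to help (untested here, so treat it as an assumption) is to ship the jar with the job itself via Hadoop's generic -libjars option, which must come right after the tool name. The connection details below are placeholders:

# Distribute commons-lang3 to the map tasks for this one job only.
sqoop import -libjars /usr/lib/mahout/lib/commons-lang3-3.1.jar \
    --connect jdbc:oracle:thin:@//dbhost:1521/orcl \
    --username scott --password tiger \
    --table EMP --target-dir /user/cloudera/emp \
    --direct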