Use OEM 13c R2 to Discover Oracle BDA

OEM 13c Cloud Control is a powerful monitoring tool, not only for Exadata and Oracle Database, but also for Oracle Big Data Appliance (BDA). There are many articles and blogs about Exadata discovery using OEM 12c or 13c, but not many places discuss OEM BDA discovery, especially with the new version of OEM, 13c Cloud Control. In this blog, I am going to discuss the steps to discover a BDA using OEM 13c R2.

First, do not use OEM 13c R1 for BDA Discovery. It is very time consuming and very likely not going to work. OEM 13c R2 is much better; at least I have been able to complete BDA Discovery successfully on every BDA I have worked on.

Secondly, unlike OEM Exadata Discovery, BDA Discovery usually requires one extra step before the manual OEM BDA Discovery: running the bdacli enable em command first. Theoretically, if that command works, there is nothing left to do in the manual discovery process. Unfortunately, I have never run into this perfect situation in any BDA environment and always get some kind of error at the end.

Preparation
There are a few useful My Oracle Support notes about OEM BDA Discovery.
1) Instructions to Install 12.1.0.4 BDA Plug-in on Oracle Big Data Appliance (BDA) V2.*/V3.0.*/V3.1/V4.* (Doc ID 1682558.1)
2) BDA Credentials for Enterprise Manager 13.x Plugin (Doc ID 2206111.1)
3) Instructions to Enable / Disable the 13.x BDA Enterprise Manager Plug-in on Oracle Big Data Appliance (BDA) V4.5-V4.7 (Doc ID 2206207.1)

Execute bdacli command
Run bdacli enable em. For BDA versions below 4.5, run bdacli enable em --force. I can almost guarantee you won't see the successful completion message from this command. For example, you may get the following error at the end.

INFO: Running: /opt/oracle/emcli_home/emcli discover_bda_cluster -hostname=enkx4bda1node01.enkitec.local -cloudera_credential=BDA_ENKX4BDA_CM_CRED -host_credential=BDA_ENKX4BDA_HOSTS_CRED -cisco_credential=BDA_ENKX4BDA_CISCO_CRED -ilom_credential=BDA_ENKX4BDA_ILOM_CRED -infiniband_credential=BDA_ENKX4BDA_IB_CRED -pdu_credential=BDA_ENKX4BDA_PDU_CRED -cisco_snmp_string="snmp_v3;;SNMPV3Creds;authUser:none;authPwd:none;authProtocol:none;privPwd:none" -pdu_snmp_string="snmp_v1v2_v3;;SNMPV1Creds;COMMUNITY:none" -switch_snmp_string="snmp_v1v2_v3;;SNMPV3Creds;authUser:none;authPwd:none;authProtocol:none;privPwd:none"
ERROR: Syntax Error: Unrecognized argument -cisco_snmp_string #Step Syntax Error: Unrecognized argument -pdu_snmp_string#
Are you sure you want to completely cleanup em and lose all related state ?

When you see the above message, always type N and do not roll back the changes. At this point the OEM agents are already deployed; you just need to figure out which node to use as the starting point for the manual OEM BDA Discovery.

On each node, run the following command:

[root@enkx4bda1node06 ~]# java -classpath /opt/oracle/EMAgent/agent_13.2.0.0.0/jlib/*:/opt/oracle/EMAgent/agent_13.2.0.0.0/plugins/oracle.sysman.bda.discovery.plugin_13.2.2.0.0/archives/* oracle.sysman.bda.discovery.pojo.GetHadoopClusters http://enkx4bda1node03.enkitec.local:7180/api/v1/clusters admin admin_password
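
Rather than logging on to every node one by one, the same check could be pushed out with dcli. This is only a convenience sketch under the assumption that the agent and plugin paths are identical on every node (they are copied from the command above), with the same Cloudera Manager URL and credential placeholders:

# Run the cluster check from every node in one pass; expect the Jersey error on most nodes
# and the "clustername;;" output on the node that works.
dcli -C "java -classpath /opt/oracle/EMAgent/agent_13.2.0.0.0/jlib/*:/opt/oracle/EMAgent/agent_13.2.0.0.0/plugins/oracle.sysman.bda.discovery.plugin_13.2.2.0.0/archives/* oracle.sysman.bda.discovery.pojo.GetHadoopClusters http://enkx4bda1node03.enkitec.local:7180/api/v1/clusters admin admin_password"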

On many nodes, you will see the error below.

Apr 10, 2017 10:14:44 AM com.sun.jersey.api.client.ClientResponse getEntity
SEVERE: A message body reader for Java class [Loracle.sysman.bda.discovery.pojo.Items;, and Java type class [Loracle.sysman.bda.discovery.pojo.Items;, and MIME media type text/html was not found
Apr 10, 2017 10:14:44 AM com.sun.jersey.api.client.ClientResponse getEntity
SEVERE: The registered message body readers compatible with the MIME media type are:
*/* ->
  com.sun.jersey.core.impl.provider.entity.FormProvider
  com.sun.jersey.core.impl.provider.entity.MimeMultipartProvider
  com.sun.jersey.core.impl.provider.entity.StringProvider
  com.sun.jersey.core.impl.provider.entity.ByteArrayProvider
  com.sun.jersey.core.impl.provider.entity.FileProvider
  com.sun.jersey.core.impl.provider.entity.InputStreamProvider
  com.sun.jersey.core.impl.provider.entity.DataSourceProvider
  com.sun.jersey.core.impl.provider.entity.XMLJAXBElementProvider$General
  com.sun.jersey.core.impl.provider.entity.ReaderProvider
  com.sun.jersey.core.impl.provider.entity.DocumentProvider
  com.sun.jersey.core.impl.provider.entity.SourceProvider$StreamSourceReader
  com.sun.jersey.core.impl.provider.entity.SourceProvider$SAXSourceReader
  com.sun.jersey.core.impl.provider.entity.SourceProvider$DOMSourceReader
  com.sun.jersey.core.impl.provider.entity.XMLRootElementProvider$General
  com.sun.jersey.core.impl.provider.entity.XMLListElementProvider$General
  com.sun.jersey.core.impl.provider.entity.XMLRootObjectProvider$General
  com.sun.jersey.core.impl.provider.entity.EntityHolderReader

On a certain node, you will see a successful result, shown below.

enkx4bda;;

In my case, it is node 2, so I will use node 2 for the manual BDA Discovery in the following steps.

Manual OEM BDA Discovery
Log on to OEM as the sysman user. Select Add Target -> Add Target Manually.

Select Add Targets Using Guided Process

Select Oracle Big Data Appliance

The Add Targets Manually page shows up. Select node 2 from the list. Click Next.

After it completes, it will show the following hardware information. Click Next.

The Hardware Credentials screen shows up. If all Host credentials show a green check, you don't need to do anything for the hosts. Move on to the next component, for example, the IB Switch. Select Set Credentials -> All Infiniband Switches, then set the SNMP credential type and community string. Most of the time the community string is public. Then click OK.

If successful, it shows the green check.

Follow a similar procedure for all other hardware components, such as ILOM, PDU and Cisco Switch. At the end, you should see the following screen.

One interesting note about the PDUs: they always behave in a weird way during discovery. In this case they show a green check as successful, but after the discovery OEM reports the PDUs as DOWN. In other discoveries I have done for different BDA environments, the green check never showed up on this page, yet the PDUs showed UP status after the discovery. So the result is inconsistent.

Click Next. The screen for Cloudera Manager shows up. Click Edit and verify the credential for the Cloudera Manager admin user. Then click Next.

The Software page shows up. Click Next.

The review page shows up. Click Submit.

If successful, you will see the message below. Click OK.

The BDA Discovery is completed.
You might notice the new BDA cluster is called BDA Network 1. This is not a good way to name a cluster, especially when you have multiple BDAs managed by the same OEM. I don't understand why OEM does not use the BDA's cluster name or Cloudera Manager's cluster name; either one would be much better. Even worse, while you can change many target names in OEM, you cannot change this one. I have another blog (Change BDA Cluster Name in OEM Cloud Control 13c) discussing a partial workaround for this issue.

To view the details of a host target, you will see something like the following:

The presentation looks better than in OEM 12c. In general, OEM 13c for BDA is a good tool, but pay attention to the following points; otherwise you will spend a lot of additional time.
1) Before performing OEM BDA Discovery, make sure you have changed all of your default passwords on BDA. It's easier to use the default passwords during the discovery, but it is a huge pain after you later change the passwords for the accounts used in BDA discovery. Updating the Named Credentials is not enough; you have to delete the whole BDA target in OEM and redo the discovery.

2) Similarly, if you configure TLS with Cloudera Manager after BDA Discovery, you will have to remove the BDA target and redo the discovery. This is clearly a bug in OEM, at least not fixed at the time I am writing this blog.

3) Sometimes you might see tons of alerts from almost every port on the Cisco switch. If they came from a few ports, I might believe them, but when almost every port raises an alert, there is no way they are all genuine. As a matter of fact, Oracle Support confirmed they appear to be false alerts. When I had to redo the BDA discovery after configuring TLS with Cloudera Manager, I happened to notice that all the Cisco port alerts were gone after the rediscovery.

4) Both the Oracle documentation and Oracle Support say OEM 13c R2 supports BDA v4.5+ and that any version below that is not supported. It's true that lower BDA versions run into additional issues, but I managed to find workarounds and make it work for BDA v4.3.

Validate Java Keystore on BDA

In many projects I need to create a keystore to store SSL certificates. Most of the time I hardly worry about the validity of a keystore: my keystores just work, and I can see the content of all the certificates using the keytool command. That worked pretty well until recently, when I needed to configure TLS for Cloudera Manager on BDA.
BDA has its own command to enable TLS for Cloudera Manager, Hue and Oozie in a pretty easy way: just run bdacli enable https_cm_hue_oozie. The only drawback of this command is that it uses self-signed certificates, not the users' own certificates. Although that works from a security perspective, it's not a good idea in the long run. I needed to replace Oracle's self-signed certificates with the client's certificates on BDA, and neither Cloudera's documented approach nor Oracle's is going to work as-is. Anyway, that is a different topic and I will discuss it in a different blog.

During my work to enable TLS with Cloudera Manager using the client's certificates, I ran into various issues. After looking at many of them in detail, I suspected the key issue might be an incorrectly built keystore. Unfortunately, configuring TLS for Cloudera Manager, the agents and the services requires shutting down the CDH cluster and many steps before reaching the stage where I can test the keystore. That is too time consuming for a busy BDA cluster. This blog discusses a fast, easy way, independent of the CDH cluster, to verify whether the content of a keystore is valid. Most importantly, it helps avoid the bridge-building mistake shown below.

As my topic is related to BDA, I am going to list the ways to create a keystore in both the Cloudera way and the Oracle way.

Cloudera Way
See Cloudera’s document Step 1: Obtain Encryption Keys and Certificates for Cloudera Manager Server
I just highlight the key steps and commands as follows:
1. Generate Keystore for Cloudera Manager Host (Node 3 on BDA)

# keytool -genkeypair -alias cmhost -keyalg RSA -keystore \
/opt/cloudera/security/jks/cmhost-keystore.jks -keysize 2048 -dname \
"CN=cmhost.sec.example.com,OU=Security,O=Example,L=Denver,ST=Colorado,C=US" \
-storepass password -keypass password

2. Generate a CSR for the host.

# keytool -certreq -alias cmhost \
-keystore /opt/cloudera/security/jks/cmhost-keystore.jks \
-file /opt/cloudera/security/x509/cmhost.csr -storepass password \
-keypass password
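
Before sending the CSR off to the CA in step 3 below, it doesn't hurt to sanity-check what it contains. This is optional and not part of Cloudera's steps; the path is taken from the command above:

# Print the subject and key details of the CSR in readable form.
openssl req -in /opt/cloudera/security/x509/cmhost.csr -noout -text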

3. Submit the .csr file created by the -certreq command to Certificate Authority to obtain a server certificate.
4. Copy the root CA certificate and any intermediate CA certificates to /opt/cloudera/security/CAcerts/.
The /opt/cloudera/security/CAcerts/ directory does not exist on BDA, and I don't believe it is necessary.
Actually, I prefer Oracle's approach here: just copy the root and intermediate CA certificates to the /opt/cloudera/security/jks directory. However, I do like Cloudera's practice of importing the root CA and intermediate CA certificates into the alternate system JDK truststore, jssecacerts, before importing them into the Java keystore on BDA. This step is missing from Oracle's approach.

# cp $JAVA_HOME/jre/lib/security/cacerts $JAVA_HOME/jre/lib/security/jssecacerts

# keytool -importcert -alias RootCA -keystore $JAVA_HOME/jre/lib/security/jssecacerts \
-file /opt/cloudera/security/CAcerts/RootCA.cer -storepass changeit

# keytool -importcert -alias SubordinateCA -keystore \
$JAVA_HOME/jre/lib/security/jssecacerts \
-file /opt/cloudera/security/CAcerts/SubordinateCA.cer -storepass changeit

5. Import the root and intermediate certificates into keystore.

# keytool -importcert -trustcacerts -alias RootCA -keystore \
/opt/cloudera/security/jks/cmhost-keystore.jks -file \
/opt/cloudera/security/CAcerts/RootCA.cer -storepass password

# keytool -importcert -trustcacerts -alias SubordinateCA -keystore \
/opt/cloudera/security/jks/cmhost-keystore.jks -file \
/opt/cloudera/security/CAcerts/SubordinateCA.cer -storepass password

6. Import the signed host certificate

# cp certificate-file.cer  /opt/cloudera/security/x509/cmhost.pem

# keytool -importcert -trustcacerts -alias cmhost \
-file /opt/cloudera/security/x509/cmhost.pem \
-keystore /opt/cloudera/security/jks/cmhost-keystore.jks -storepass password

Oracle Way
See Oracle Note How to Use Certificates Signed by a User’s Certificate Authority for Web Consoles and Hadoop Network Encryption Use on the BDA (Doc ID 2187903.1)

1. Create the keystore on all nodes called /opt/cloudera/security/jks/node.jks
This is where I like Oracle's approach. Cloudera also requires a keystore on every host, but documents it in separate chapters for Cloudera Manager and the agents. Only when I was done with the configuration did I realize these could have been combined into one single step. This is where Oracle's approach is much simpler and easier.

# dcli -C keytool -validity 720 -keystore /opt/cloudera/security/jks/node.jks \
-alias \$HOSTNAME -genkeypair -keyalg RSA -storepass $PW -keypass $PW \
-dname "CN=\${HOSTNAME},OU=,O=,L=,S=,C="  

# dcli -C ls -l /opt/cloudera/security/jks/node.jks
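
One thing the steps gloss over: the dcli/keytool commands here reference $PW, but nothing above sets it. A minimal sketch, assuming you run all the dcli commands from the same shell on node 1 (the password value is only an illustration):

# Keystore/key password used by the keytool calls in steps 1, 2, 6 and 7.
export PW=MyKeyst0rePassword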

2. Create CSR for each node.

# dcli -C keytool -keystore /opt/cloudera/security/jks/node.jks -alias \$HOSTNAME \
-certreq -file /root/\$HOSTNAME-cert-file -keypass $PW -storepass $PW 

3. Submit each node-specific CSR to the CA to be signed.
4. Copy each signed certificate back to its node as cert_file_signed:
cert_file_signed_bdanode01 would be copied to Node 1 as: /opt/cloudera/security/jks/cert_file_signed
cert_file_signed_bdanode02 would be copied to Node 2 as: /opt/cloudera/security/jks/cert_file_signed

cert_file_signed_bdanode0n would be copied to Node n as: /opt/cloudera/security/jks/cert_file_signed
5. Copy CA public certificate to /opt/cloudera/security/jks/ca.crt

# cp /tmp/staging/ca.crt /opt/cloudera/security/jks/ca.crt  
# dcli -C -f /opt/cloudera/security/jks/ca.crt -d /opt/cloudera/security/jks/ca.crt  
# dcli -C ls -ltr /opt/cloudera/security/jks/ca.crt

6. Import the CA public certificate /opt/cloudera/security/jks/ca.crt into the keystore on each node

# dcli -C keytool -keystore /opt/cloudera/security/jks/node.jks -alias CARoot \
-import -file /opt/cloudera/security/jks/ca.crt -storepass $PW -keypass $PW -noprompt

7. Import the signed certificate for each node on BDA

# dcli -C keytool -keystore /opt/cloudera/security/jks/node.jks -alias \$HOSTNAME \
-import -file /opt/cloudera/security/jks/cert_file_signed -storepass $PW -keypass $PW -noprompt 
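
After step 7, a quick sanity check can confirm that each node's keystore now holds both the CARoot entry and the host entry. This is just a sketch; it assumes $PW is still set and simply counts the entry lines that keytool -list prints:

# Each node should report 2 (the trusted CARoot entry plus the host's PrivateKeyEntry).
dcli -C "keytool -list -keystore /opt/cloudera/security/jks/node.jks -storepass $PW | grep -c Entry"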

So for TLS on BDA, the keystore file is /opt/cloudera/security/jks/node.jks. Another important file is the truststore at /opt/cloudera/security/jks/.truststore. The approach to build this file is quite similar to node.jks.
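
Only the keystore steps are spelled out above; as a rough sketch of the "quite similar" truststore build, the CA certificate can be imported into .truststore on every node (the alias, CA file path and $PW are assumptions carried over from the steps above):

# Import the CA public certificate into the truststore on each node.
dcli -C keytool -importcert -trustcacerts -alias CARoot \
-keystore /opt/cloudera/security/jks/.truststore \
-file /opt/cloudera/security/jks/ca.crt -storepass $PW -noprompt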

OK, I have the node.jks file. How do I verify that it is a valid one? Like many people, I used to use the keytool command to check the contents of a keystore file. For example:

[root@enkx4bda1node01 ~]# keytool -list -v -keystore /opt/cloudera/security/jks/node.jks
Enter keystore password:  

*****************  WARNING WARNING WARNING  *****************
* The integrity of the information stored in your keystore  *
* has NOT been verified!  In order to verify its integrity, *
* you must provide your keystore password.                  *
*****************  WARNING WARNING WARNING  *****************

Keystore type: JKS
Keystore provider: SUN

Your keystore contains 1 entry

Alias name: enkx4bda1node01.enkitec.local
Creation date: Mar 5, 2016
Entry type: PrivateKeyEntry
Certificate chain length: 1
Certificate[1]:
Owner: CN=enkx4bda1node01.enkitec.local, OU=, O=, L=, ST=, C=
Issuer: CN=enkx4bda1node01.enkitec.local, OU=, O=, L=, ST=, C=
Serial number: 26a1471b
Valid from: Sat Mar 05 02:17:40 CST 2016 until: Fri Feb 23 02:17:40 CST 2018
Certificate fingerprints:
	 MD5:  10B:30:3A:40:CD:94:38:7D:3A:33:1F:DD:49:B7:DF:99
	 SHA1: 98:6F:FC:84:68:BA:BD:25:37:8A:1B:D6:07:6F:FE:14:41:76:5B:09
	 SHA256: L3:43:4C:4C:9B:0E:36:18:DD:F1:10:84:46:9E:77:AA:BB:C7:85:E5:FC:19:4F:29:7F:70:BA:D4:0C:55:AD:F7
	 Signature algorithm name: SHA256withRSA
	 Version: 3

Extensions: 

#1: ObjectId: 2.5.29.14 Criticality=false
SubjectKeyIdentifier [
KeyIdentifier [
0000: GH FD 23 C9 9A A3 28 F9   3D C5 3B 1E E7 97 49 4E  ......(.=.:...IN
0010: 12 69 27 D5                                        .i(.
]
]

*******************************************
*******************************************

It usually works, but with certain limitations. Even if the keystore has all the necessary certificates, if they are not in the right order it might not be a valid one. As I suspected my keystore on BDA might not be valid, I tried to find other tools beyond keytool. Luckily, I found the blog Installing Trusted Certificates into a Java Keystore by Oracle's Jim Connors. It's a very nice post about various tools for keystores. I was really interested in one of the tools he talked about: the ValidateCertChain program in weblogic.jar.

I happened to have built an OEM Cloud Control 13c R2 environment, which comes with weblogic.jar. OK, let me give it a try.

[root@enkx4bdacli02 tmp]# java -cp /u01/app/oracle/oem/wlserver/server/lib/weblogic.jar utils.ValidateCertChain -jks enkx4bda1node03.enkitec.com node.jks
Cert[0]: CN=enkx4bda1node03.enkitec.com,OU=Bigdata,O=Enkitec,L=Irving,ST=TX,C=US
Certificate chain is incomplete, can't confirm the entire chain is valid
Certificate chain appears valid

It indeed found something and told me my certificate chain was incomplete. This gave me the clue to focus on the keystore-building steps. After I figured out the issue and fixed the import sequence of the certificates, I reran the command. Here is the result:

[root@enkx4bdacli02 tmp]# java -cp /u01/app/oracle/oem/wlserver/server/lib/weblogic.jar utils.ValidateCertChain -jks enkx4bda1node03.enkitec.com node.jks
Cert[0]: CN=enkx4bda1node03.enkitec.com,OU=Bigdata,O=Enkitec,L=Irving,ST=TX,C=US
Cert[1]: CN=EnkLab Intermediate CA,OU=Bigdata,O=Enkitec,ST=Texas,C=US
Cert[2]: CN=EnkLab ROOT CA,OU=Bigdata,O=Enkitec,L=Irving,ST=TX,C=US
Certificate chain appears valid

Looks much better. It correctly shows there are one root CA certificate, one intermediate CA certificate, and one host certificate. The certificate import order turned out to be one of the major issues in building my keystore on BDA.
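
The corrected sequence itself is not repeated here, but the point is that keytool needs the root and intermediate CA certificates present as trusted entries before the signed host certificate is imported under the private-key alias. A rough sketch only (ca.crt comes from the Oracle steps above, ca_int.crt is made up for illustration, and $PW follows the same convention as earlier):

# 1. Root CA first, 2. intermediate CA next, 3. signed host certificate last (same alias as the key).
keytool -importcert -trustcacerts -alias CARoot -file /opt/cloudera/security/jks/ca.crt \
-keystore /opt/cloudera/security/jks/node.jks -storepass $PW -noprompt
keytool -importcert -trustcacerts -alias CAInt -file /opt/cloudera/security/jks/ca_int.crt \
-keystore /opt/cloudera/security/jks/node.jks -storepass $PW -noprompt
keytool -importcert -alias $HOSTNAME -file /opt/cloudera/security/jks/cert_file_signed \
-keystore /opt/cloudera/security/jks/node.jks -storepass $PW -noprompt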

There is another command, openssl s_client, that can validate the certificate chain, but it is only useful once everything is configured and the service is up.

# openssl s_client -connect enkx4bda1node03.enkitec.com:7183 -CAfile root.enkitec.com.cert.pem
CONNECTED(00000003)
depth=2 C = US, ST = TX, L = Irving, O = Enkitec, OU = bigdata, CN = Enklab ROOT CA
verify return:1
depth=1 C = US, ST = TX, O = Enkitec, OU = bigdata, CN = Enklab Intermediate CA
verify return:1
depth=0 C = US, ST = TX, L = Irving, O = Enkitec, OU = bigdata, CN = enkx4bda1node03.enkitec.com
verify return:1
---
Certificate chain
 0 s:/C=US/ST=TX/L=Irving/O=Enkitec/OU=Bigdata/CN=enkx4bda1node03.enkitec.com
   i:/C=US/ST=TX/O=Enkitec/OU=Bigdata/CN=Bigdata Intermediate CA
 1 s:/C=US/ST=TX/O=Enkitec/OU=Bigdata/CN=Bigdata Intermediate CA
   i:/C=US/ST=TX/L=Irving/O=Enkitec/OU=Bigdata/CN=Bigdata ROOT CA
 2 s:/C=US/ST=TX/L=Irving/O=Enkitec/OU=Bigdata/CN=Bigdata ROOT CA
   i:/C=US/ST=TX/L=Irving/O=Enkitec/OU=Bigdata/CN=Bigdata ROOT CA
---
Server certificate
-----BEGIN CERTIFICATE-----

MIIDXTCCAkWgAwIBAgIEQn3HnzANBgkqhkiG9w0BAQsFADBfMQkwBwYDVQQGEwAx
CTAHBgNVBAgTADEJMAcGA1UEBxMAMQkwBwYDVQQKEwAxCTAHBgNVBAsTADEmMCQG
A1UEAxMdZW5reDRiZGExbm9kZTAzLmVua2l0ZWMubG9jYWwwHhcNMTYwMzA1MDgx
NzQ1WhcNMTgwMjIzMDgxNzQ1WjBfMQkwBwYDVQQGEwAxCTAHBgNVBAgTADEJMAcG
A1UEBxMAMQkwBwYDVQQKEwAxCTAHBgNVBAsTADEmMCQGA1UEAxMdZW5reDRiZGEx
bm9kZTAzLmVua2l0ZWMubG9jYWwwggEiMA0GCSqGSIb3DQEBAQUAA4IBDwAwggEK
AoIBAQDXcThbyBV4FAm2EJJBhZpg5XLqRcswMm748QUxBzTBj+LeXZJw7wTX3SzJ
Eup6YeJKczDYTjPLpHZ6ruOnhz4WSA/39e+U9MvqNZMnwdwgA7/d++4BA4ZGWs1N
3G/NmYHR1eKJntPFrExz/1XSJpW7xVfAaNsQNUb9HkAEtXN25GOF/H7jQBwxx5Wq
mnIZAgNC7shg6DCusvaURllsOih+XY4kf8HYKLLihXUmbeNauG/ixZyXm3kKu5mN
vfXF48Y4OKMHkYMS5BfZzaRw43+PWIWPbsy2RR+GRypsFMSCa5MHIwL+2tHJHBwC
kwXMB7RlA7yVd57iXPzlCAf1mijjAgMBAAGjITAfMB0GA1UdDgQWBBQ20j1Jr+LG
ejzGFNVNZIHybvIstjANBgkqhkiG9w0BAQsFAAOCAQEArZ6x6qIRxhqJ8Qd20Xkf
T3NsbzEUMBIGA1UECgwLU3RhdG5ldHQgU0YxDjAMBgNVBAsMkFDs1FAjXrt8fo7S
QTVe225bCiTYgIJl7UwOAonKBZLRIhwjbh1TDij1iyNuSrX1kisVkrmtQrsNTpqH
D8m3k1M6XCUU3RV2+I6UY2WhLNvojlCYPXnQHXo5BJPDRuaXQu/OUi2cr5LVzOhC
5NdBjMUDwfsWx5NYtTK5iNvt7CBGZOXF5RgdDhZMywR0qY0pMiBjGoCxvhv9v8Ob
xk/WfbfXfcviUrb5lnqCX8NUG+/fKv09Csx0CBiXXNU+9R5HAlTZG5xptIi22CXZ
Kw==
-----END CERTIFICATE-----
subject=/C=US/ST=TX/L=Irving/O=Enkitec/OU=Bigdata/CN=enkx4bda1node03.enkitec.com
issuer=/C=US/ST=TX/O=Enkitec/OU=Bigdata/CN=Bigdata Intermediate CA
---
No client certificate CA names sent
Server Temp Key: ECDH, secp521r1, 521 bits
---
SSL handshake has read 4430 bytes and written 443 bytes
---
New, TLSv1/SSLv3, Cipher is ECDHE-RSA-AES256-GCM-SHA384
Server public key is 2048 bit
Secure Renegotiation IS supported
Compression: NONE
Expansion: NONE
SSL-Session:
    Protocol  : TLSv1.2
    Cipher    : ECDHE-RSA-AES256-GCM-SHA384
    Session-ID: 39023B1EB131C30355F20CD8F012DCF2FFC95E1A1F9F8D8D2B6954942E9
    Session-ID-ctx: 
    Master-Key: XMB7RlA7yVd57iXPzl5EE73EAAB9B18B04B2718CAf1mijjAgMBAA5126650B5A3GjITAfM8EA269DBFE17A750EBBC5EC
    Key-Arg   : None
    Krb5 Principal: None
    PSK identity: None
    PSK identity hint: None
    Start Time: 9023528453
    Timeout   : 300 (sec)
    Verify return code: 0 (ok)
---
closed

Finding out Keystore and Truststore Passwords on BDA

I am working on a project involving configuring SSL with Cloudera Manager on BDA. There are several ways to do it: go with Oracle's bdacli approach or use Cloudera's approach. For BDA-related work, I usually prefer Oracle's approach because it writes some information to Oracle BDA's configuration files, which are usually outside the control of Cloudera Manager. Cloudera's approach definitely works as well, but during a BDA upgrade or patching, if mammoth can't find the correct values in BDA's configuration files, it might cause unnecessary trouble. For example, if mammoth thinks certain features are not enabled, it could skip the steps to disable those features before the upgrade. Anyway, that is another, unrelated topic.

Enabling TLS for Cloudera Manager is pretty easy on BDA, compared to the many steps in Cloudera Manager's documentation. On BDA, just run the following command:
bdacli enable https_cm_hue_oozie

The command automatically enables TLS for the major services on CDH, such as Cloudera Manager, Hue and Oozie. Please note: TLS on the Cloudera Manager agents is automatically enabled during BDA installation. Usually running this command is enough for many clients, as they just need to encrypt the traffic when communicating with Cloudera Manager. There is a downside to this approach: BDA uses self-signed certificates during the execution of bdacli enable https_cm_hue_oozie. Such self-signed certificates are fine from a security perspective, but the browser warnings can be annoying. Therefore some users might prefer to use their own signed SSL certificates.

After I worked with Eric from Oracle Support, he recommended an approach that is actually pretty well documented in Doc ID 2187903.1: How to Use Certificates Signed by a User's Certificate Authority for Web Consoles and Hadoop Network Encryption Use on the BDA. The key to this approach is getting the keystore's and truststore's paths and passwords, creating a new keystore and truststore, and then importing the customer's certificates. Unfortunately, this approach works only for BDA version 4.5 and above, so it is not going to work in my current client environment, which is using BDA v4.3. One major issue is that BDA v4.5 and above has the following bdacli commands while BDA v4.3 does not:
bdacli getinfo cluster_https_keystore_password
bdacli getinfo cluster_https_truststore_password

Eric then recommended a potential workaround of querying the MySQL database directly with the commands below:

use scm;
select * from CONFIGS where ATTR = 'truststore_password' or ATTR = 'keystore_password'; 
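
As a side note, the same check can be run straight from the shell on the node hosting the CM repository database (node 3 in these examples), assuming you have the MySQL root password; this is just a one-liner version of the interactive sessions shown below:

# Prompt for the MySQL root password and print only the two attributes of interest.
mysql -u root -p scm -e "select ATTR, VALUE from CONFIGS where ATTR in ('keystore_password','truststore_password');"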

I then used two BDAs in our lab for verification.
First, I tested on our X4 starter rack.

[root@enkx4bda1node01 ~]# bdacli getinfo cluster_https_keystore_password
Enter the admin user for CM (press enter for admin): 
Enter the admin password for CM: 
******

[root@enkx4bda1node01 ~]# bdacli getinfo cluster_https_truststore_password
Enter the admin user for CM (press enter for admin): 
Enter the admin password for CM: 

Interestingly, the keystore password still shows as ****** while the truststore password is empty. I can understand the empty truststore password, as nothing is configured for the truststore, but the keystore password shouldn't show a masked value like ******.

Query the MySQL database on the same rack.

[root@enkx4bda1node03 ~]# mysql -u root -p
Enter password: 
mysql> show databases;
+--------------------+
| Database           |
+--------------------+
| information_schema |
| activity_monitor   |
| hive               |
| host_monitor       |
| hue                |
| mysql              |
| navigator          |
| navigator_metadata |
| oozie              |
| performance_schema |
| reports_manager    |
| resource_manager   |
| scm                |
| sentry_db          |
| service_monitor    |
| studio             |
+--------------------+
16 rows in set (0.00 sec)

mysql> use scm;
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A
Database changed

mysql> select * from CONFIGS where ATTR = 'truststore_password' or ATTR = 'keystore_password'; 
+-----------+---------+-------------------+--------+------------+---------+---------------------+-------------------------+----------------------+---------+
| CONFIG_ID | ROLE_ID | ATTR              | VALUE  | SERVICE_ID | HOST_ID | CONFIG_CONTAINER_ID | OPTIMISTIC_LOCK_VERSION | ROLE_CONFIG_GROUP_ID | CONTEXT |
+-----------+---------+-------------------+--------+------------+---------+---------------------+-------------------------+----------------------+---------+
|         8 |    NULL | keystore_password | ****** |       NULL |    NULL |                   2 |                       2 |                 NULL | NONE    |
+-----------+---------+-------------------+--------+------------+---------+---------------------+-------------------------+----------------------+---------+
1 row in set (0.00 sec)

The MySQL database also stores the password as ******. I remember my colleague mentioned this BDA has some issues; this could be one of them.

OK, this rack doesn't really tell me anything, so I moved to the second BDA, a full rack, and performed the same commands there.

[root@enkbda1node03 ~]# bdacli getinfo cluster_https_keystore_password 
Enter the admin user for CM (press enter for admin): 
Enter the admin password for CM: 
KUSld8yni8PMQcJbltvCnZEr2XG4BgKohAfnW6O02jB3tCP8v1DYlbMO5PqhJCVR

[root@enkbda1node03 ~]# bdacli getinfo cluster_https_truststore_password
Enter the admin user for CM (press enter for admin): 
Enter the admin password for CM: 


[root@enkbda1node03 ~]# mysql -u root -p
Enter password: 
mysql> use scm;
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A

Database changed
mysql> select * from CONFIGS where ATTR = 'truststore_password' or ATTR = 'keystore_password'; 
+-----------+---------+---------------------+------------------------------------------------------------------+------------+---------+---------------------+-------------------------+----------------------+---------+
| CONFIG_ID | ROLE_ID | ATTR                | VALUE                                                            | SERVICE_ID | HOST_ID | CONFIG_CONTAINER_ID | OPTIMISTIC_LOCK_VERSION | ROLE_CONFIG_GROUP_ID | CONTEXT |
+-----------+---------+---------------------+------------------------------------------------------------------+------------+---------+---------------------+-------------------------+----------------------+---------+
|         7 |    NULL | keystore_password   | KUSld8yni8PMQcJbltvCnZEr2XG4BgKohAfnW6O02jB3tCP8v1DYlbMO5PqhJCVR |       NULL |    NULL |                   2 |                       0 |                 NULL | NULL    |
|       991 |    NULL | truststore_password | NULL                                                             |       NULL |    NULL |                   2 |                       1 |                 NULL | NONE    |
+-----------+---------+---------------------+------------------------------------------------------------------+------------+---------+---------------------+-------------------------+----------------------+---------+
2 rows in set (0.00 sec)

The MySQL database shows the same value as the result from the command bdacli getinfo cluster_https_keystore_password. This is exactly what I wanted to know. It looks like I can use a MySQL query to get the necessary passwords for my work.
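
As a quick sketch of how the retrieved value can be used: once you have the keystore password, keytool no longer has to be run with an empty password and the integrity warning goes away (replace <keystore_password> with the value returned by bdacli or the MySQL query above):

# List the keystore contents non-interactively with the real store password.
keytool -list -v -keystore /opt/cloudera/security/jks/node.jks -storepass '<keystore_password>'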

One side note: in case you want to check out those self-signed certificates on BDA, run the following commands. When prompted for the keystore password, just press ENTER.

[root@enkx4bda1node03 ~]# bdacli getinfo cluster_https_keystore_path
Enter the admin user for CM (press enter for admin): 
Enter the admin password for CM: 
/opt/cloudera/security/jks/node.jks

[root@enkx4bda1node03 ~]# keytool -list -v -keystore /opt/cloudera/security/jks/node.jks
Enter keystore password:  

*****************  WARNING WARNING WARNING  *****************
* The integrity of the information stored in your keystore  *
* has NOT been verified!  In order to verify its integrity, *
* you must provide your keystore password.                  *
*****************  WARNING WARNING WARNING  *****************

Keystore type: JKS
Keystore provider: SUN

Your keystore contains 1 entry

Alias name: enkx4bda1node03.enkitec.local
Creation date: Mar 5, 2016
Entry type: PrivateKeyEntry
Certificate chain length: 1
Certificate[1]:
Owner: CN=enkx4bda1node03.enkitec.local, OU=, O=, L=, ST=, C=
Issuer: CN=enkx4bda1node03.enkitec.local, OU=, O=, L=, ST=, C=
Serial number: 427dc79f
Valid from: Sat Mar 05 02:17:45 CST 2016 until: Fri Feb 23 02:17:45 CST 2018
Certificate fingerprints:
	 MD5:  A1:F9:78:EE:D4:C7:C0:D0:65:25:4C:30:09:D8:18:6E
	 SHA1: 8B:E3:7B:5F:76:B1:81:33:35:03:B9:00:97:D0:F7:F9:03:F9:74:C2
	 SHA256: EC:B5:F3:EB:E5:DC:D9:19:DB:2A:D6:3E:71:9C:62:55:10:0A:59:59:E6:98:2C:AD:23:AC:24:48:E4:68:6A:AF
	 Signature algorithm name: SHA256withRSA
	 Version: 3

Extensions: 

#1: ObjectId: 2.5.29.14 Criticality=false
SubjectKeyIdentifier [
KeyIdentifier [
0000: 36 D2 3D 49 AF E2 C6 7A   3C C6 14 D5 4D 64 81 F2  6.=I...z<...Md..
0010: 6E F2 2C B6                                        n.,.
]
]

*******************************************
*******************************************

If you don't like this kind of default password, you can use the command keytool -storepasswd -keystore /opt/cloudera/security/jks/node.jks to change the password.
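
A sketch of that password change across the whole cluster, assuming you want all nodes to keep the same store password (the current and new values are placeholders):

# Change the store password of node.jks on every node in one pass.
dcli -C keytool -storepasswd -keystore /opt/cloudera/security/jks/node.jks \
-storepass '<current_password>' -new '<new_password>'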

Change BDA Cluster Name in OEM Cloud Control 13c

Oracle OEM Cloud Control 13c has some improvements over OEM 12c, but for BDA the weirdest thing after OEM BDA Discovery is the target name for the BDA cluster. By default, the target is called BDA Network 1 for the first BDA cluster and BDA Network 2 for the second. Think about it this way: Oracle BDA already has a BDA cluster name that differs from Cloudera Manager's cluster name, and now OEM comes up with yet another cluster name. If we have two BDAs and use OEM 13c to discover the DR BDA first, the DR BDA takes the BDA Network 1 name and the primary BDA is discovered as BDA Network 2. It's a really annoying new feature in OEM 13c. Ideally, I want to change the BDA Network name to something meaningful; BDA Network 1 is a useless naming standard, just like the door below. Mapping it to either the BDA cluster name or Cloudera Manager's cluster name would be fine with me. In this blog, I am going to discuss whether I can change this name to something I like.

There are two types of targets in OEM: repository-side targets and agent-side targets. Each managed target in OEM has a Display Name and a Target Name. So for BDA Network, I want to find out which category this target falls into.

Run the query for Repository Side targets:

set lines 200
set pages 999
col ENTITY_TYPE for a30
col TYPE_DISPLAY_NAME for a35
col ENTITY_NAME for a45
col DISPLAY_NAME for a40

SELECT ENTITY_TYPE,
       TYPE_DISPLAY_NAME,
       ENTITY_NAME,
       DISPLAY_NAME
FROM   SYSMAN.EM_MANAGEABLE_ENTITIES
WHERE  MANAGE_STATUS = 2
AND    REP_SIDE_AVAIL = 1
ORDER  BY 1,2;
ENTITY_TYPE		       TYPE_DISPLAY_NAME		   ENTITY_NAME					 DISPLAY_NAME
------------------------------ ----------------------------------- --------------------------------------------- ----------------------------------------
j2ee_application_cluster       Clustered Application Deployment    /EMGC_GCDomain/GCDomain/BIP_cluster/bipublish bipublisher(11.1.1)
								  er(11.1.1)

j2ee_application_cluster       Clustered Application Deployment    /EMGC_GCDomain/GCDomain/BIP_cluster/ESSAPP	 ESSAPP
oracle_em_service	       EM Service			   EM Jobs Service				 EM Jobs Service
oracle_emsvrs_sys	       EM Servers System		   Management_Servers				 Management Servers
oracle_si_netswitch	       Systems Infrastructure Switch	   enkx4bda1sw-ib2				 enkx4bda1sw-ib2
oracle_si_netswitch	       Systems Infrastructure Switch	   enkx4bda1sw-ip				 enkx4bda1sw-ip
oracle_si_netswitch	       Systems Infrastructure Switch	   enkx4bda1sw-ib3				 enkx4bda1sw-ib3
oracle_si_server_map	       Systems Infrastructure Server	   enkbda1node08-ilom				 enkbda1node08-ilom
oracle_si_server_map	       Systems Infrastructure Server	   enkx4bda1node06-ilom 			 enkx4bda1node06-ilom
oracle_si_server_map	       Systems Infrastructure Server	   enkbda1node17-ilom				 enkbda1node17-ilom
oracle_si_server_map	       Systems Infrastructure Server	   enkbda1node11-ilom				 enkbda1node11-ilom
oracle_si_server_map	       Systems Infrastructure Server	   enkx4bda1node04.enkitec.local/server 	 enkx4bda1node04.enkitec.local/server
oracle_si_server_map	       Systems Infrastructure Server	   enkbda1node18-ilom				 enkbda1node18-ilom
oracle_si_server_map	       Systems Infrastructure Server	   enkx4bda1node05-ilom 			 enkx4bda1node05-ilom
oracle_si_server_map	       Systems Infrastructure Server	   enkbda1node15-ilom				 enkbda1node15-ilom
oracle_si_server_map	       Systems Infrastructure Server	   enkbda1node14-ilom				 enkbda1node14-ilom
oracle_si_server_map	       Systems Infrastructure Server	   enkbda1node02-ilom				 enkbda1node02-ilom
oracle_si_server_map	       Systems Infrastructure Server	   enkbda1node01-ilom				 enkbda1node01-ilom
oracle_si_server_map	       Systems Infrastructure Server	   enkbda1node13-ilom				 enkbda1node13-ilom
oracle_si_server_map	       Systems Infrastructure Server	   enkbda1node06-ilom				 enkbda1node06-ilom
oracle_si_server_map	       Systems Infrastructure Server	   enkbda1node09-ilom				 enkbda1node09-ilom
oracle_si_server_map	       Systems Infrastructure Server	   enkx4bda1node03-ilom 			 enkx4bda1node03-ilom
oracle_si_server_map	       Systems Infrastructure Server	   enkbda1node16-ilom				 enkbda1node16-ilom
oracle_si_server_map	       Systems Infrastructure Server	   enkbda1node03-ilom				 enkbda1node03-ilom
oracle_si_server_map	       Systems Infrastructure Server	   enkbda1node10-ilom				 enkbda1node10-ilom
oracle_si_server_map	       Systems Infrastructure Server	   enkx4bda1node01-ilom 			 enkx4bda1node01-ilom
oracle_si_server_map	       Systems Infrastructure Server	   enkbda1node04-ilom				 enkbda1node04-ilom
oracle_si_server_map	       Systems Infrastructure Server	   enkbda1node05-ilom				 enkbda1node05-ilom
oracle_si_server_map	       Systems Infrastructure Server	   enkbda1node12-ilom				 enkbda1node12-ilom
oracle_si_server_map	       Systems Infrastructure Server	   enkx4bda1node02-ilom 			 enkx4bda1node02-ilom
weblogic_cluster	       Oracle WebLogic Cluster		   /EMGC_GCDomain/GCDomain/BIP_cluster		 BIP_cluster

31 rows selected.

The target is not found in this category. Try the agent-side targets.

set lines 200
set pages 999
col ENTITY_TYPE for a30
col TYPE_DISPLAY_NAME for a30
col ENTITY_NAME for a35
col DISPLAY_NAME for a35
col EMD_URL for a60

SELECT ENTITY_TYPE,
       TYPE_DISPLAY_NAME,
       ENTITY_NAME,
       DISPLAY_NAME,
       EMD_URL
FROM   SYSMAN.EM_MANAGEABLE_ENTITIES
WHERE  MANAGE_STATUS = 2
AND    REP_SIDE_AVAIL = 0
AND    EMD_URL IS NOT NULL
ORDER  BY 1,2,3;

ENTITY_TYPE		       TYPE_DISPLAY_NAME	      ENTITY_NAME			  DISPLAY_NAME			      EMD_URL
------------------------------ ------------------------------ ----------------------------------- ----------------------------------- ------------------------------------------------------------
host			       Host			      enkx4bda1node01.enkitec.local	  enkx4bda1node01.enkitec.local       https://enkx4bda1node01.enkitec.local:1830/emd/main/
host			       Host			      enkx4bda1node02.enkitec.local	  enkx4bda1node02.enkitec.local       https://enkx4bda1node02.enkitec.local:1830/emd/main/
host			       Host			      enkx4bda1node03.enkitec.local	  enkx4bda1node03.enkitec.local       https://enkx4bda1node03.enkitec.local:1830/emd/main/
host			       Host			      enkx4bda1node04.enkitec.local	  enkx4bda1node04.enkitec.local       https://enkx4bda1node04.enkitec.local:1830/emd/main/
host			       Host			      enkx4bda1node05.enkitec.local	  enkx4bda1node05.enkitec.local       https://enkx4bda1node05.enkitec.local:1830/emd/main/
host			       Host			      enkx4bda1node06.enkitec.local	  enkx4bda1node06.enkitec.local       https://enkx4bda1node06.enkitec.local:1830/emd/main/
host			       Host			      enkx4bdacli02.enkitec.local	  enkx4bdacli02.enkitec.local	      https://enkx4bdacli02.enkitec.local:3872/emd/main/
oracle_apache		       Oracle HTTP Server	      /EMGC_GCDomain/GCDomain/ohs1	  ohs1				      https://enkx4bdacli02.enkitec.local:3872/emd/main/
oracle_bda_cluster	       BDA Network		      BDA Network 1			  BDA Network 1 		      https://enkx4bda1node02.enkitec.local:1830/emd/main/
oracle_beacon		       Beacon			      EM Management Beacon		  EM Management Beacon		      https://enkx4bdacli02.enkitec.local:3872/emd/main/
oracle_big_data_sql	       Oracle Big Data SQL	      bigdatasql_enkx4bda		  bigdatasql_enkx4bda		      https://enkx4bda1node01.enkitec.local:1830/emd/main/
oracle_cloudera_manager        Cloudera Manager 	      Cloudera Manager - enkx4bda	  Cloudera Manager - enkx4bda	      https://enkx4bda1node01.enkitec.local:1830/emd/main/
oracle_emd		       Agent			      enkx4bda1node01.enkitec.local:1830  enkx4bda1node01.enkitec.local:1830  https://enkx4bda1node01.enkitec.local:1830/emd/main/
oracle_emd		       Agent			      enkx4bda1node02.enkitec.local:1830  enkx4bda1node02.enkitec.local:1830  https://enkx4bda1node02.enkitec.local:1830/emd/main/
oracle_emd		       Agent			      enkx4bda1node03.enkitec.local:1830  enkx4bda1node03.enkitec.local:1830  https://enkx4bda1node03.enkitec.local:1830/emd/main/
oracle_emd		       Agent			      enkx4bda1node04.enkitec.local:1830  enkx4bda1node04.enkitec.local:1830  https://enkx4bda1node04.enkitec.local:1830/emd/main/
oracle_emd		       Agent			      enkx4bda1node05.enkitec.local:1830  enkx4bda1node05.enkitec.local:1830  https://enkx4bda1node05.enkitec.local:1830/emd/main/
oracle_emd		       Agent			      enkx4bda1node06.enkitec.local:1830  enkx4bda1node06.enkitec.local:1830  https://enkx4bda1node06.enkitec.local:1830/emd/main/
oracle_emd		       Agent			      enkx4bdacli02.enkitec.local:3872	  enkx4bdacli02.enkitec.local:3872    https://enkx4bdacli02.enkitec.local:3872/emd/main/
oracle_emrep		       OMS and Repository	      Management Services and Repository  Management Services and Repository  https://enkx4bdacli02.enkitec.local:3872/emd/main/
oracle_hadoop_cluster	       Hadoop Cluster		      enkx4bda				  enkx4bda			      https://enkx4bda1node01.enkitec.local:1830/emd/main/
oracle_hadoop_datanode	       Hadoop DataNode		      DN_enkx4bda1node01_enkx4bda	  DN_enkx4bda1node01_enkx4bda	      https://enkx4bda1node01.enkitec.local:1830/emd/main/
oracle_hadoop_datanode	       Hadoop DataNode		      DN_enkx4bda1node02_enkx4bda	  DN_enkx4bda1node02_enkx4bda	      https://enkx4bda1node02.enkitec.local:1830/emd/main/
oracle_hadoop_datanode	       Hadoop DataNode		      DN_enkx4bda1node03_enkx4bda	  DN_enkx4bda1node03_enkx4bda	      https://enkx4bda1node03.enkitec.local:1830/emd/main/
oracle_hadoop_datanode	       Hadoop DataNode		      DN_enkx4bda1node04_enkx4bda	  DN_enkx4bda1node04_enkx4bda	      https://enkx4bda1node04.enkitec.local:1830/emd/main/
oracle_hadoop_datanode	       Hadoop DataNode		      DN_enkx4bda1node05_enkx4bda	  DN_enkx4bda1node05_enkx4bda	      https://enkx4bda1node05.enkitec.local:1830/emd/main/
oracle_hadoop_datanode	       Hadoop DataNode		      DN_enkx4bda1node06_enkx4bda	  DN_enkx4bda1node06_enkx4bda	      https://enkx4bda1node06.enkitec.local:1830/emd/main/
oracle_hadoop_failoverctl      Hadoop Failover Controller     FC_NNA_enkx4bda			  FC_NNA_enkx4bda		      https://enkx4bda1node02.enkitec.local:1830/emd/main/
oracle_hadoop_failoverctl      Hadoop Failover Controller     FC_NNB_enkx4bda			  FC_NNB_enkx4bda		      https://enkx4bda1node01.enkitec.local:1830/emd/main/
oracle_hadoop_hdfs	       Hadoop HDFS		      hdfs_enkx4bda			  hdfs_enkx4bda 		      https://enkx4bda1node01.enkitec.local:1830/emd/main/
oracle_hadoop_historyserver    Hadoop Job History Server      JHS_enkx4bda			  JHS_enkx4bda			      https://enkx4bda1node03.enkitec.local:1830/emd/main/
oracle_hadoop_hive	       Hadoop Hive		      hive_enkx4bda			  hive_enkx4bda 		      https://enkx4bda1node01.enkitec.local:1830/emd/main/
oracle_hadoop_hive_metaserver  Hadoop Hive Metastore Server   Metastore_enkx4bda		  Metastore_enkx4bda		      https://enkx4bda1node04.enkitec.local:1830/emd/main/
oracle_hadoop_hive_server      Hadoop Hive Server2	      HiveServer2_enkx4bda		  HiveServer2_enkx4bda		      https://enkx4bda1node04.enkitec.local:1830/emd/main/
oracle_hadoop_hive_webhcat     Hadoop Hive WebHCat Server     WebHCat_enkx4bda			  WebHCat_enkx4bda		      https://enkx4bda1node04.enkitec.local:1830/emd/main/
oracle_hadoop_hue	       Hadoop Hue		      hue_enkx4bda			  hue_enkx4bda			      https://enkx4bda1node01.enkitec.local:1830/emd/main/
oracle_hadoop_impala	       Hadoop Impala		      impala_enkx4bda			  impala_enkx4bda		      https://enkx4bda1node01.enkitec.local:1830/emd/main/
oracle_hadoop_impala_demon     Hadoop Impala Daemon	      ImpalaD_enkx4bda1node01_enkx4bda	  ImpalaD_enkx4bda1node01_enkx4bda    https://enkx4bda1node01.enkitec.local:1830/emd/main/
oracle_hadoop_impala_demon     Hadoop Impala Daemon	      ImpalaD_enkx4bda1node02_enkx4bda	  ImpalaD_enkx4bda1node02_enkx4bda    https://enkx4bda1node02.enkitec.local:1830/emd/main/
oracle_hadoop_impala_demon     Hadoop Impala Daemon	      ImpalaD_enkx4bda1node03_enkx4bda	  ImpalaD_enkx4bda1node03_enkx4bda    https://enkx4bda1node03.enkitec.local:1830/emd/main/
oracle_hadoop_impala_demon     Hadoop Impala Daemon	      ImpalaD_enkx4bda1node04_enkx4bda	  ImpalaD_enkx4bda1node04_enkx4bda    https://enkx4bda1node04.enkitec.local:1830/emd/main/
oracle_hadoop_impala_demon     Hadoop Impala Daemon	      ImpalaD_enkx4bda1node06_enkx4bda	  ImpalaD_enkx4bda1node06_enkx4bda    https://enkx4bda1node06.enkitec.local:1830/emd/main/
oracle_hadoop_impala_server_cat Hadoop Impala Server Catalogue ImpalaCatSrv_enkx4bda		  ImpalaCatSrv_enkx4bda 	      https://enkx4bda1node06.enkitec.local:1830/emd/main/
oracle_hadoop_impala_statestore Hadoop Impala State Store      StateStore_enkx4bda		  StateStore_enkx4bda		      https://enkx4bda1node06.enkitec.local:1830/emd/main/
oracle_hadoop_journalnode      Hadoop Journal Node	      JN_enkx4bda1node01_enkx4bda	  JN_enkx4bda1node01_enkx4bda	      https://enkx4bda1node01.enkitec.local:1830/emd/main/
oracle_hadoop_journalnode      Hadoop Journal Node	      JN_enkx4bda1node02_enkx4bda	  JN_enkx4bda1node02_enkx4bda	      https://enkx4bda1node02.enkitec.local:1830/emd/main/
oracle_hadoop_journalnode      Hadoop Journal Node	      JN_enkx4bda1node03_enkx4bda	  JN_enkx4bda1node03_enkx4bda	      https://enkx4bda1node03.enkitec.local:1830/emd/main/
oracle_hadoop_kerberos	       Kerberos 		      kerberos_enkx4bda 		  kerberos_enkx4bda		      https://enkx4bda1node01.enkitec.local:1830/emd/main/
oracle_hadoop_mysql	       MySql			      mysql_enkx4bda			  mysql_enkx4bda		      https://enkx4bda1node01.enkitec.local:1830/emd/main/
oracle_hadoop_namenode	       Hadoop NameNode		      NNA_enkx4bda			  NNA_enkx4bda			      https://enkx4bda1node01.enkitec.local:1830/emd/main/
oracle_hadoop_namenode	       Hadoop NameNode		      NNB_enkx4bda			  NNB_enkx4bda			      https://enkx4bda1node02.enkitec.local:1830/emd/main/
oracle_hadoop_nodemgr	       Hadoop NodeManager	      NM_enkx4bda1node01_enkx4bda	  NM_enkx4bda1node01_enkx4bda	      https://enkx4bda1node01.enkitec.local:1830/emd/main/
oracle_hadoop_nodemgr	       Hadoop NodeManager	      NM_enkx4bda1node02_enkx4bda	  NM_enkx4bda1node02_enkx4bda	      https://enkx4bda1node02.enkitec.local:1830/emd/main/
oracle_hadoop_nodemgr	       Hadoop NodeManager	      NM_enkx4bda1node03_enkx4bda	  NM_enkx4bda1node03_enkx4bda	      https://enkx4bda1node03.enkitec.local:1830/emd/main/
oracle_hadoop_nodemgr	       Hadoop NodeManager	      NM_enkx4bda1node04_enkx4bda	  NM_enkx4bda1node04_enkx4bda	      https://enkx4bda1node04.enkitec.local:1830/emd/main/
oracle_hadoop_nodemgr	       Hadoop NodeManager	      NM_enkx4bda1node05_enkx4bda	  NM_enkx4bda1node05_enkx4bda	      https://enkx4bda1node05.enkitec.local:1830/emd/main/
oracle_hadoop_nodemgr	       Hadoop NodeManager	      NM_enkx4bda1node06_enkx4bda	  NM_enkx4bda1node06_enkx4bda	      https://enkx4bda1node06.enkitec.local:1830/emd/main/
oracle_hadoop_oozie	       Hadoop Oozie		      oozie_enkx4bda			  oozie_enkx4bda		      https://enkx4bda1node01.enkitec.local:1830/emd/main/
oracle_hadoop_oozie_server     Hadoop Oozie Server	      OozieServer_enkx4bda		  OozieServer_enkx4bda		      https://enkx4bda1node04.enkitec.local:1830/emd/main/
oracle_hadoop_resourcemgr      Hadoop ResourceManager	      RMA_enkx4bda			  RMA_enkx4bda			      https://enkx4bda1node04.enkitec.local:1830/emd/main/
oracle_hadoop_resourcemgr      Hadoop ResourceManager	      RMB_enkx4bda			  RMB_enkx4bda			      https://enkx4bda1node03.enkitec.local:1830/emd/main/
oracle_hadoop_solr	       Hadoop Solr		      solr_enkx4bda			  solr_enkx4bda 		      https://enkx4bda1node01.enkitec.local:1830/emd/main/
oracle_hadoop_solr_server      Hadoop Solr Server	      SolrServer_enkx4bda		  SolrServer_enkx4bda		      https://enkx4bda1node03.enkitec.local:1830/emd/main/
oracle_hadoop_spark_on_yarn    Hadoop Spark On Yarn	      spark_on_yarn_enkx4bda		  spark_on_yarn_enkx4bda	      https://enkx4bda1node01.enkitec.local:1830/emd/main/
oracle_hadoop_yarn	       Hadoop Yarn		      yarn_enkx4bda			  yarn_enkx4bda 		      https://enkx4bda1node01.enkitec.local:1830/emd/main/
oracle_hadoop_zookeeper        Hadoop ZooKeeper 	      zookeeper_enkx4bda		  zookeeper_enkx4bda		      https://enkx4bda1node01.enkitec.local:1830/emd/main/
oracle_hadoop_zookeeper_server Hadoop ZooKeeper Server	      ZKS_enkx4bda1node01_enkx4bda	  ZKS_enkx4bda1node01_enkx4bda	      https://enkx4bda1node01.enkitec.local:1830/emd/main/
oracle_hadoop_zookeeper_server Hadoop ZooKeeper Server	      ZKS_enkx4bda1node02_enkx4bda	  ZKS_enkx4bda1node02_enkx4bda	      https://enkx4bda1node02.enkitec.local:1830/emd/main/
oracle_hadoop_zookeeper_server Hadoop ZooKeeper Server	      ZKS_enkx4bda1node03_enkx4bda	  ZKS_enkx4bda1node03_enkx4bda	      https://enkx4bda1node03.enkitec.local:1830/emd/main/

....

oracle_oms		       Oracle Management Service      enkx4bdacli02.enkitec.local:4889_Ma enkx4bdacli02.enkitec.local:4889_Ma https://enkx4bdacli02.enkitec.local:3872/emd/main/
							      nagement_Service			  nagement_Service
oracle_oms_console	       OMS Console		      enkx4bdacli02.enkitec.local:4889_Ma enkx4bdacli02.enkitec.local:4889_Ma https://enkx4bdacli02.enkitec.local:3872/emd/main/
							      nagement_Service_CONSOLE		  nagement_Service_CONSOLE
oracle_oms_pbs		       OMS Platform		      enkx4bdacli02.enkitec.local:4889_Ma enkx4bdacli02.enkitec.local:4889_Ma https://enkx4bdacli02.enkitec.local:3872/emd/main/
							      nagement_Service_PBS		  nagement_Service_PBS
oracle_si_pdu		       Systems Infrastructure PDU     enkx4bda1-pdua			  enkx4bda1-pdua		      https://enkx4bda1node02.enkitec.local:1830/emd/main/
oracle_si_pdu		       Systems Infrastructure PDU     enkx4bda1-pdub			  enkx4bda1-pdub		      https://enkx4bda1node02.enkitec.local:1830/emd/main/


164 rows selected

The following query shows just the oracle_bda_cluster type of target.

col ENTITY_TYPE for a20
col TYPE_DISPLAY_NAME for a20
col ENTITY_NAME for a16
col DISPLAY_NAME for a16
col EMD_URL for a55
SELECT ENTITY_TYPE,
       TYPE_DISPLAY_NAME,
       ENTITY_NAME,
       DISPLAY_NAME,
       EMD_URL
FROM   SYSMAN.EM_MANAGEABLE_ENTITIES
WHERE  
ENTITY_TYPE = 'oracle_bda_cluster'
;
ENTITY_TYPE	     TYPE_DISPLAY_NAME	  ENTITY_NAME	   DISPLAY_NAME     EMD_URL
-------------------- -------------------- ---------------- ---------------- -------------------------------------------------------
oracle_bda_cluster   BDA Network	  BDA Network 1    BDA Network 1    https://enkx4bda1node02.enkitec.local:1830/emd/main/

OK, we can see the entity_type for BDA Network is oracle_bda_cluster. Both the target name and the display name are BDA Network 1.

Next, I will check whether I can rename the BDA Network 1 target. I have used the emcli rename_target command in the past to rename OEM targets and it usually works, so I ran the following commands:

[oracle@enkx4bdacli02 ~]$ emcli show_bda_clusters
BDA Network 1 : enkx4bda

[oracle@enkx4bdacli02 ~]$ emcli get_targets -targets="oracle_bda_cluster"
Status  Status           Target Type           Target Name                        
 ID                                                                               
-9      N/A              oracle_bda_cluster    BDA Network 1  

[oracle@enkx4bdacli02 ~]$ emcli rename_target -target_type="oracle_bda_cluster" -target_name="BDA Network 1" -new_target_name="X4BDA"
Rename not supported for the given Target Type.

No luck; it doesn't work. If renaming the target name is not possible, let me try changing the display name instead.

[oracle@enkx4bdacli02 ~]$ emcli modify_target -type="oracle_bda_cluster" -name="BDA Network 1" -display_name="X4BDA"
Target "BDA Network 1:oracle_bda_cluster" modified successfully

It works. Rerun the query to check oracle_bda_cluster type.

ENTITY_TYPE	     TYPE_DISPLAY_NAME	  ENTITY_NAME	   DISPLAY_NAME     EMD_URL
-------------------- -------------------- ---------------- ---------------- -------------------------------------------------------
oracle_bda_cluster   BDA Network	  BDA Network 1    X4BDA	    https://enkx4bda1node02.enkitec.local:1830/emd/main/

Well, it works partially. On some screens it works perfectly.

But on some other screens, it still shows the same annoying name.

Another lesson I learned recently is that you need to be very careful about using default passwords when setting up BDA. Once BDA is set up with default passwords and OEM BDA Discovery uses those default passwords for its Named Credentials, you will run into issues after you change the default passwords later on. In the worst case, such as a Cloudera Manager password change, it requires removing the current BDA target and redoing the BDA Discovery. I may write about this topic in a different blog if I have time.

Mysterious Time Gap in Spark Job Timeline

Some time ago one of my clients asked me a question when reviewing a Spark job: why is there a time gap in the event timeline, sometimes as long as one minute? If it were a few seconds, it would make sense that it relates to Spark's overhead between job runs. But one minute seems too long for any overhead activity when the whole job takes only 8~9 minutes. I didn't have a good answer for the question. Recently I did some benchmarking of a Spark job on an X3 full rack Oracle BDA in our lab and noticed the same behavior. I tracked down the issue and finally figured out the cause of the timeline gap. I am going to share my findings in this blog.

My benchmark is on an X3 full rack BDA with Spark 1.6 and CDH 5.7. The Spark test script is a pretty simple one, and the important lines related to this timeline gap are listed as follows:

line 42: val myDF = hiveContext.sql("select * from wzdb.sales_part ")
line 44: myDF.show()
line 47: val rawDF = hiveContext.sql("select * from wzdb.sales_raw limit 100 ")
line 48: rawDF.show()

Line 42 pulls all data from the wzdb.sales_part table, a Hive partitioned table using Parquet and SNAPPY compression. The table has about 1.3 billion rows and 1,680 partitions. Line 44 just shows the DataFrame myDF; by default it shows 20 rows. Similarly, line 47 pulls 100 rows from the wzdb.sales_raw table and line 48 shows the first 20 rows from that table. OK, the code cannot be simpler than that.
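
For context, here is a hedged sketch of how a script like this would typically be submitted on the cluster (the jar name appears in the executor log further down; the class name, path and YARN settings are assumptions, not the actual benchmark settings):

# Submit in yarn-client mode so the driver output of myDF.show / rawDF.show comes back to the terminal.
spark-submit --master yarn --deploy-mode client \
--class wztest.SparkTest1 \
/home/oracle/scalawztest1_2.10-1.0.jar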

After the Spark job started, it finished in 40 seconds. However, when I checked out the Event Timeline, it showed a time gap between Job (or stage) Id 1 and Job Id 2. Job Id 1 started at 18:13:24 and completed at 18:13:26, but Job Id 2 did not start until 18:13:35: a 9-second gap, about 25% of the total execution time. 25% seems like a lot to me. Job Id 1 executed line 42 while Job Id 2 executed line 44, and there is no executable code at line 43. Things were getting interesting.

[Screenshot: spark_event_timeline_1]

Then I checked out the Executors page. It shows there are two executors and each took about 6~7 seconds of task time. Then I clicked the link to the stdout logs for each executor and paid attention to the timeline between 18:13:24 and 18:13:35.

[Screenshot: spark_executors_2]

Here is part of the output from Executor 1:

16/12/02 18:13:14 INFO executor.CoarseGrainedExecutorBackend: Started daemon with process name: 27032@enkbda1node10.enkitec.com
16/12/02 18:13:14 INFO executor.CoarseGrainedExecutorBackend: Registered signal handlers for [TERM, HUP, INT]
16/12/02 18:13:16 INFO spark.SecurityManager: Changing view acls to: yarn,oracle
16/12/02 18:13:17 INFO spark.SecurityManager: Changing modify acls to: yarn,oracle
16/12/02 18:13:17 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(yarn, oracle); users with modify permissions: Set(yarn, oracle)
16/12/02 18:13:18 INFO spark.SecurityManager: Changing view acls to: yarn,oracle
16/12/02 18:13:18 INFO spark.SecurityManager: Changing modify acls to: yarn,oracle
16/12/02 18:13:18 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(yarn, oracle); users with modify permissions: Set(yarn, oracle)
16/12/02 18:13:18 INFO slf4j.Slf4jLogger: Slf4jLogger started
16/12/02 18:13:18 INFO Remoting: Starting remoting
16/12/02 18:13:19 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://sparkExecutorActorSystem@enkbda1node10.enkitec.com:38184]
16/12/02 18:13:19 INFO Remoting: Remoting now listens on addresses: [akka.tcp://sparkExecutorActorSystem@enkbda1node10.enkitec.com:38184]
16/12/02 18:13:19 INFO util.Utils: Successfully started service 'sparkExecutorActorSystem' on port 38184.
16/12/02 18:13:19 INFO storage.DiskBlockManager: Created local directory at /u12/hadoop/yarn/nm/usercache/oracle/appcache/application_1480516695248_0056/blockmgr-7af60bfd-9e27-45c8-bf30-a4bf126681f0
16/12/02 18:13:19 INFO storage.DiskBlockManager: Created local directory at /u11/hadoop/yarn/nm/usercache/oracle/appcache/application_1480516695248_0056/blockmgr-1c6099f9-37f2-4b4e-8b60-5c209bffc924
16/12/02 18:13:19 INFO storage.DiskBlockManager: Created local directory at /u10/hadoop/yarn/nm/usercache/oracle/appcache/application_1480516695248_0056/blockmgr-ec991001-9dc1-4017-ba55-314b54dd9109
16/12/02 18:13:19 INFO storage.DiskBlockManager: Created local directory at /u09/hadoop/yarn/nm/usercache/oracle/appcache/application_1480516695248_0056/blockmgr-c0df794d-5ee6-46fe-ad57-cf6186cd5ba7
16/12/02 18:13:19 INFO storage.DiskBlockManager: Created local directory at /u08/hadoop/yarn/nm/usercache/oracle/appcache/application_1480516695248_0056/blockmgr-673f1d0b-7e44-47e7-b36e-eed65c656c87
16/12/02 18:13:19 INFO storage.DiskBlockManager: Created local directory at /u07/hadoop/yarn/nm/usercache/oracle/appcache/application_1480516695248_0056/blockmgr-ab27950b-7dfd-48eb-a33c-7fbd02d29137
16/12/02 18:13:19 INFO storage.DiskBlockManager: Created local directory at /u06/hadoop/yarn/nm/usercache/oracle/appcache/application_1480516695248_0056/blockmgr-fe6697c4-d64a-47b3-9781-27d583370710
16/12/02 18:13:19 INFO storage.DiskBlockManager: Created local directory at /u05/hadoop/yarn/nm/usercache/oracle/appcache/application_1480516695248_0056/blockmgr-e0a928ab-5895-46b6-8b10-ad883e895632
16/12/02 18:13:19 INFO storage.DiskBlockManager: Created local directory at /u04/hadoop/yarn/nm/usercache/oracle/appcache/application_1480516695248_0056/blockmgr-4d39319c-b6be-4a17-8755-89477f81e899
16/12/02 18:13:19 INFO storage.DiskBlockManager: Created local directory at /u03/hadoop/yarn/nm/usercache/oracle/appcache/application_1480516695248_0056/blockmgr-cfd8fd9c-22cd-443f-8a1d-99b9867c8507
16/12/02 18:13:19 INFO storage.DiskBlockManager: Created local directory at /u02/hadoop/yarn/nm/usercache/oracle/appcache/application_1480516695248_0056/blockmgr-fff46796-d06a-45af-816b-c46d356be447
16/12/02 18:13:19 INFO storage.DiskBlockManager: Created local directory at /u01/hadoop/yarn/nm/usercache/oracle/appcache/application_1480516695248_0056/blockmgr-6cf05120-f651-4615-8abe-14631c5aadb1
16/12/02 18:13:19 INFO storage.MemoryStore: MemoryStore started with capacity 530.0 MB
16/12/02 18:13:19 INFO executor.CoarseGrainedExecutorBackend: Connecting to driver: spark://CoarseGrainedScheduler@192.168.12.105:33339
16/12/02 18:13:19 INFO executor.CoarseGrainedExecutorBackend: Successfully registered with driver
16/12/02 18:13:19 INFO executor.Executor: Starting executor ID 2 on host enkbda1node10.enkitec.com
16/12/02 18:13:19 INFO util.Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 45880.
16/12/02 18:13:19 INFO netty.NettyBlockTransferService: Server created on 45880
16/12/02 18:13:19 INFO storage.BlockManager: external shuffle service port = 7337
16/12/02 18:13:19 INFO storage.BlockManagerMaster: Trying to register BlockManager
16/12/02 18:13:19 INFO storage.BlockManagerMaster: Registered BlockManager
16/12/02 18:13:19 INFO storage.BlockManager: Registering executor with local external shuffle service.
16/12/02 18:13:19 INFO executor.CoarseGrainedExecutorBackend: Got assigned task 1
16/12/02 18:13:19 INFO executor.Executor: Running task 1.0 in stage 0.0 (TID 1)
16/12/02 18:13:20 INFO executor.Executor: Fetching spark://192.168.12.105:33339/jars/scalawztest1_2.10-1.0.jar with timestamp 1480723975104
16/12/02 18:13:20 INFO util.Utils: Fetching spark://192.168.12.105:33339/jars/scalawztest1_2.10-1.0.jar to /u12/hadoop/yarn/nm/usercache/oracle/appcache/application_1480516695248_0056/spark-4647d3e3-6655-4da1-b75b-94f3d872c71a/fetchFileTemp8581640132534419761.tmp
16/12/02 18:13:20 INFO util.Utils: Copying /u12/hadoop/yarn/nm/usercache/oracle/appcache/application_1480516695248_0056/spark-4647d3e3-6655-4da1-b75b-94f3d872c71a/8566654191480723975104_cache to /u09/hadoop/yarn/nm/usercache/oracle/appcache/application_1480516695248_0056/container_e81_1480516695248_0056_01_000003/./scalawztest1_2.10-1.0.jar
16/12/02 18:13:20 INFO executor.Executor: Adding file:/u09/hadoop/yarn/nm/usercache/oracle/appcache/application_1480516695248_0056/container_e81_1480516695248_0056_01_000003/./scalawztest1_2.10-1.0.jar to class loader
16/12/02 18:13:20 INFO broadcast.TorrentBroadcast: Started reading broadcast variable 0
16/12/02 18:13:20 INFO storage.MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 25.6 KB, free 25.6 KB)
16/12/02 18:13:20 INFO broadcast.TorrentBroadcast: Reading broadcast variable 0 took 101 ms
16/12/02 18:13:20 INFO storage.MemoryStore: Block broadcast_0 stored as values in memory (estimated size 72.5 KB, free 98.1 KB)
16/12/02 18:13:22 INFO client.FusionCommon: Initialized FusionHdfs with URI: hdfs://enkbda-ns/user/hive/warehouse/wzdb.db/sales_part/year=2014/month=4/day=1, FileSystem: DFS[DFSClient[clientName=DFSClient_NONMAPREDUCE_468103070_110, ugi=oracle (auth:SIMPLE)]], instance: 426700950
16/12/02 18:13:22 INFO sources.HadoopFsRelation: Listing hdfs://enkbda-ns/user/hive/warehouse/wzdb.db/sales_part/year=2014/month=4/day=1
16/12/02 18:13:22 INFO sources.HadoopFsRelation: Listing hdfs://enkbda-ns/user/hive/warehouse/wzdb.db/sales_part/year=2014/month=4/day=10
16/12/02 18:13:22 INFO sources.HadoopFsRelation: Listing hdfs://enkbda-ns/user/hive/warehouse/wzdb.db/sales_part/year=2014/month=4/day=11
16/12/02 18:13:22 INFO sources.HadoopFsRelation: Listing hdfs://enkbda-ns/user/hive/warehouse/wzdb.db/sales_part/year=2014/month=4/day=12
16/12/02 18:13:22 INFO sources.HadoopFsRelation: Listing hdfs://enkbda-ns/user/hive/warehouse/wzdb.db/sales_part/year=2014/month=4/day=13
16/12/02 18:13:22 INFO sources.HadoopFsRelation: Listing hdfs://enkbda-ns/user/hive/warehouse/wzdb.db/sales_part/year=2014/month=4/day=14
16/12/02 18:13:22 INFO sources.HadoopFsRelation: Listing hdfs://enkbda-ns/user/hive/warehouse/wzdb.db/sales_part/year=2014/month=4/day=15
16/12/02 18:13:22 INFO sources.HadoopFsRelation: Listing hdfs://enkbda-ns/user/hive/warehouse/wzdb.db/sales_part/year=2014/month=4/day=16
16/12/02 18:13:22 INFO sources.HadoopFsRelation: Listing hdfs://enkbda-ns/user/hive/warehouse/wzdb.db/sales_part/year=2014/month=4/day=17
16/12/02 18:13:22 INFO sources.HadoopFsRelation: Listing hdfs://enkbda-ns/user/hive/warehouse/wzdb.db/sales_part/year=2014/month=4/day=18
16/12/02 18:13:22 INFO sources.HadoopFsRelation: Listing hdfs://enkbda-ns/user/hive/warehouse/wzdb.db/sales_part/year=2014/month=4/day=19
16/12/02 18:13:22 INFO sources.HadoopFsRelation: Listing hdfs://enkbda-ns/user/hive/warehouse/wzdb.db/sales_part/year=2014/month=4/day=2
....
16/12/02 18:13:24 INFO sources.HadoopFsRelation: Listing hdfs://enkbda-ns/user/hive/warehouse/wzdb.db/sales_part/year=2016/month=9/day=27
16/12/02 18:13:24 INFO sources.HadoopFsRelation: Listing hdfs://enkbda-ns/user/hive/warehouse/wzdb.db/sales_part/year=2016/month=9/day=28
16/12/02 18:13:24 INFO sources.HadoopFsRelation: Listing hdfs://enkbda-ns/user/hive/warehouse/wzdb.db/sales_part/year=2016/month=9/day=3
16/12/02 18:13:24 INFO sources.HadoopFsRelation: Listing hdfs://enkbda-ns/user/hive/warehouse/wzdb.db/sales_part/year=2016/month=9/day=4
16/12/02 18:13:24 INFO sources.HadoopFsRelation: Listing hdfs://enkbda-ns/user/hive/warehouse/wzdb.db/sales_part/year=2016/month=9/day=5
16/12/02 18:13:24 INFO sources.HadoopFsRelation: Listing hdfs://enkbda-ns/user/hive/warehouse/wzdb.db/sales_part/year=2016/month=9/day=6
16/12/02 18:13:24 INFO sources.HadoopFsRelation: Listing hdfs://enkbda-ns/user/hive/warehouse/wzdb.db/sales_part/year=2016/month=9/day=7
16/12/02 18:13:24 INFO sources.HadoopFsRelation: Listing hdfs://enkbda-ns/user/hive/warehouse/wzdb.db/sales_part/year=2016/month=9/day=8
16/12/02 18:13:24 INFO sources.HadoopFsRelation: Listing hdfs://enkbda-ns/user/hive/warehouse/wzdb.db/sales_part/year=2016/month=9/day=9
16/12/02 18:13:24 INFO executor.Executor: Finished task 1.0 in stage 0.0 (TID 1). 727069 bytes result sent to driver
16/12/02 18:13:24 INFO executor.CoarseGrainedExecutorBackend: Got assigned task 3
16/12/02 18:13:24 INFO executor.Executor: Running task 1.0 in stage 1.0 (TID 3)
16/12/02 18:13:24 INFO broadcast.TorrentBroadcast: Started reading broadcast variable 1
16/12/02 18:13:24 INFO storage.MemoryStore: Block broadcast_1_piece0 stored as bytes in memory (estimated size 25.5 KB, free 123.6 KB)
16/12/02 18:13:24 INFO broadcast.TorrentBroadcast: Reading broadcast variable 1 took 24 ms
16/12/02 18:13:24 INFO storage.MemoryStore: Block broadcast_1 stored as values in memory (estimated size 72.4 KB, free 196.0 KB)
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/opt/cloudera/parcels/CDH-5.7.0-1.cdh5.7.0.p1464.1349/jars/parquet-hadoop-bundle-1.5.0-cdh5.7.0.jar!/shaded/parquet/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/opt/cloudera/parcels/CDH-5.7.0-1.cdh5.7.0.p1464.1349/jars/parquet-format-2.1.0-cdh5.7.0.jar!/shaded/parquet/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/opt/cloudera/parcels/CDH-5.7.0-1.cdh5.7.0.p1464.1349/jars/parquet-pig-bundle-1.5.0-cdh5.7.0.jar!/shaded/parquet/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/opt/cloudera/parcels/CDH-5.7.0-1.cdh5.7.0.p1464.1349/jars/hive-exec-1.1.0-cdh5.7.0.jar!/shaded/parquet/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/opt/cloudera/parcels/CDH-5.7.0-1.cdh5.7.0.p1464.1349/jars/hive-jdbc-1.1.0-cdh5.7.0-standalone.jar!/shaded/parquet/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [shaded.parquet.org.slf4j.helpers.NOPLoggerFactory]
16/12/02 18:13:26 INFO executor.Executor: Finished task 1.0 in stage 1.0 (TID 3). 1503 bytes result sent to driver
16/12/02 18:13:35 INFO executor.CoarseGrainedExecutorBackend: Got assigned task 4
16/12/02 18:13:35 INFO executor.Executor: Running task 0.0 in stage 2.0 (TID 4)
16/12/02 18:13:35 INFO broadcast.TorrentBroadcast: Started reading broadcast variable 3
16/12/02 18:13:35 INFO storage.MemoryStore: Block broadcast_3_piece0 stored as bytes in memory (estimated size 93.1 KB, free 93.1 KB)
16/12/02 18:13:35 INFO broadcast.TorrentBroadcast: Reading broadcast variable 3 took 9 ms
16/12/02 18:13:35 INFO storage.MemoryStore: Block broadcast_3 stored as values in memory (estimated size 684.5 KB, free 777.6 KB)
16/12/02 18:13:35 INFO parquet.ParquetRelation$$anonfun$buildInternalScan$1$$anon$1: Input split: ParquetInputSplit{part: hdfs://enkbda-ns/user/hive/warehouse/wzdb.db/sales_part/year=2012/month=1/day=1/000022_0 start: 0 end: 2028037 length: 2028037 hosts: []}
16/12/02 18:13:35 INFO broadcast.TorrentBroadcast: Started reading broadcast variable 2
16/12/02 18:13:35 INFO storage.MemoryStore: Block broadcast_2_piece0 stored as bytes in memory (estimated size 24.0 KB, free 801.6 KB)
16/12/02 18:13:35 INFO broadcast.TorrentBroadcast: Reading broadcast variable 2 took 7 ms
16/12/02 18:13:35 INFO storage.MemoryStore: Block broadcast_2 stored as values in memory (estimated size 334.5 KB, free 1136.1 KB)
16/12/02 18:13:35 INFO Configuration.deprecation: mapred.min.split.size is deprecated. Instead, use mapreduce.input.fileinputformat.split.minsize
16/12/02 18:13:36 INFO codegen.GenerateUnsafeProjection: Code generated in 148.139655 ms
16/12/02 18:13:36 INFO codegen.GenerateUnsafeProjection: Code generated in 21.996524 ms
16/12/02 18:13:36 INFO codegen.GenerateSafeProjection: Code generated in 15.975923 ms
16/12/02 18:13:36 WARN parquet.CorruptStatistics: Ignoring statistics because created_by is null or empty! See PARQUET-251 and PARQUET-297
16/12/02 18:13:36 INFO executor.Executor: Finished task 0.0 in stage 2.0 (TID 4). 5365 bytes result sent to driver
16/12/02 18:13:40 INFO executor.CoarseGrainedExecutorBackend: Driver commanded a shutdown
16/12/02 18:13:40 WARN executor.CoarseGrainedExecutorBackend: An unknown (enkbda1node05.enkitec.com:33339) driver disconnected.
16/12/02 18:13:40 ERROR executor.CoarseGrainedExecutorBackend: Driver 192.168.12.105:33339 disassociated! Shutting down.
16/12/02 18:13:40 INFO storage.DiskBlockManager: Shutdown hook called
16/12/02 18:13:40 INFO util.ShutdownHookManager: Shutdown hook called
16/12/02 18:13:40 INFO util.ShutdownHookManager: Deleting directory /u04/hadoop/yarn/nm/usercache/oracle/appcache/application_1480516695248_0056/spark-47de642a-8665-4853-a4f7-a5ba3ece4295
16/12/02 18:13:40 INFO util.ShutdownHookManager: Deleting directory /u03/hadoop/yarn/nm/usercache/oracle/appcache/application_1480516695248_0056/spark-f1aaa12c-c0d5-4c75-b1e6-e222ea9112b6
16/12/02 18:13:40 INFO util.ShutdownHookManager: Deleting directory /u01/hadoop/yarn/nm/usercache/oracle/appcache/application_1480516695248_0056/spark-764ee07d-b082-4bc1-8b4d-3834d6cd14cd
16/12/02 18:13:40 INFO util.ShutdownHookManager: Deleting directory /u02/hadoop/yarn/nm/usercache/oracle/appcache/application_1480516695248_0056/spark-f79cf620-4754-41e2-bb4c-6e242f8e16ad
16/12/02 18:13:40 INFO util.ShutdownHookManager: Deleting directory /u11/hadoop/yarn/nm/usercache/oracle/appcache/application_1480516695248_0056/spark-ae63ef76-400a-4e8f-b68b-3a34ed25414e
16/12/02 18:13:40 INFO util.ShutdownHookManager: Deleting directory /u09/hadoop/yarn/nm/usercache/oracle/appcache/application_1480516695248_0056/spark-37543a1b-c5ec-4f06-887d-9357bea573e4
16/12/02 18:13:40 INFO util.ShutdownHookManager: Deleting directory /u05/hadoop/yarn/nm/usercache/oracle/appcache/application_1480516695248_0056/spark-9a5eb5c9-4009-49ff-aca6-64103429b245
16/12/02 18:13:40 INFO util.ShutdownHookManager: Deleting directory /u07/hadoop/yarn/nm/usercache/oracle/appcache/application_1480516695248_0056/spark-6ac13e05-62ec-485f-8016-3bb4d1830492
16/12/02 18:13:40 INFO util.ShutdownHookManager: Deleting directory /u10/hadoop/yarn/nm/usercache/oracle/appcache/application_1480516695248_0056/spark-15e0aa40-4a67-46c7-8eb0-fe8c8f2d0c6e
16/12/02 18:13:40 INFO util.ShutdownHookManager: Deleting directory /u06/hadoop/yarn/nm/usercache/oracle/appcache/application_1480516695248_0056/spark-112c8eb2-e039-4f73-beae-e3c05a04c3c6
16/12/02 18:13:40 INFO util.ShutdownHookManager: Deleting directory /u08/hadoop/yarn/nm/usercache/oracle/appcache/application_1480516695248_0056/spark-752c1bcc-ce1f-41a1-b779-347f77b04227
16/12/02 18:13:40 INFO util.ShutdownHookManager: Deleting directory /u12/hadoop/yarn/nm/usercache/oracle/appcache/application_1480516695248_0056/spark-4647d3e3-6655-4da1-b75b-94f3d872c71a

Executor 1 handled 840 Hive partitions. There is no additional logging between 18:13:26 and 18:13:35; the log jumps directly from 16/12/02 18:13:26 INFO executor.Executor: Finished task 1.0 in stage 1.0 (TID 3). 1503 bytes result sent to driver to 16/12/02 18:13:35 INFO executor.CoarseGrainedExecutorBackend: Got assigned task 4.

Executor 2 shows similar behavior and also processed exactly 840 partitions. The relevant log lines are shown below:
16/12/02 18:13:25 INFO executor.Executor: Finished task 0.0 in stage 1.0 (TID 2). 931 bytes result sent to driver
16/12/02 18:13:38 INFO executor.CoarseGrainedExecutorBackend: Got assigned task 5

It did not tell me anything useful. Ok, let me check out the Stages page. I am more interested in Stage Id 1 and Stage Id 2. Here is the Stages summary screen.
spark_stage_summary_3

Let's check out the Stage Id 1 screen, shown below.
spark_stage_stage1_4

The duration is only between 0.2 and 1 second for the two executors. Another interesting statistic is that Peak Execution Memory is 0 bytes for both. I don't believe this stage can load 1.3 billion rows of data without any memory usage. In other words, I believe no IO-related work happens in this stage, even though the stage corresponds to select * from wzdb.sales_part.

Ok, let me check out the next stage, Stage 2. The DAG chart is so huge that it takes 20 seconds to show up on the screen. There are literally 3,365 RDD partitions combined by a union operation to provide the result for the show() function.

spark_stage_stage2_5

spark_stage_stage2_6

The metrics for this stage give another interesting result.
spark_stage_stage2_7

The total duration is unbelievably fast, about 1 second, with an input size of 1980 KB and 21 records. Remember, by default the show() function prints only 20 rows, so the 1980 KB and 21 records are clearly related to this 20-row show() result. But with 3,365 RDD partitions unioned together, 1 second still seems unbelievably fast. Please note the block size is 256 MB in our environment. I just don't believe it is physically possible to perform the stage 1 operation (select * from wzdb.sales_part against a 1.3-billion-row Hive Parquet table) in 1 second and then show 20 rows of result in the following second. Yes, Spark does in-memory processing and is super fast, but according to the DAG it goes through all 1.3 billion rows, and that can't finish in 2 seconds even with a full-rack BDA. Something else must be missing from the picture.
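As a quick illustration of that default (a sketch, not part of test2.scala): in this Spark version, show() with no arguments is equivalent to asking for the first 20 rows, so only enough input to produce those 20 rows needs to be materialized.

// show() with no arguments prints the first 20 rows, with long values truncated
myDF.show()
// ask for the first 100 rows instead
myDF.show(100)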

Luckily, for this test I used client mode as the deployment mode, so all of the driver log output was written to my executing session. That is where I found out where the missing time went.

16/12/02 18:13:23 INFO scheduler.TaskSetManager: Finished task 0.0 in stage 0.0 (TID 0) in 4178 ms on enkbda1node09.enkitec.com (1/2)
16/12/02 18:13:24 INFO scheduler.TaskSetManager: Finished task 1.0 in stage 0.0 (TID 1) in 4458 ms on enkbda1node10.enkitec.com (2/2)
16/12/02 18:13:24 INFO scheduler.DAGScheduler: ResultStage 0 (sql at test2.scala:42) finished in 20.648 s
16/12/02 18:13:24 INFO cluster.YarnScheduler: Removed TaskSet 0.0, whose tasks have all completed, from pool 
16/12/02 18:13:24 INFO scheduler.DAGScheduler: Job 0 finished: sql at test2.scala:42, took 21.026555 s
16/12/02 18:13:24 INFO spark.SparkContext: Starting job: sql at test2.scala:42
16/12/02 18:13:24 INFO scheduler.DAGScheduler: Got job 1 (sql at test2.scala:42) with 2 output partitions
16/12/02 18:13:24 INFO scheduler.DAGScheduler: Final stage: ResultStage 1 (sql at test2.scala:42)
16/12/02 18:13:24 INFO scheduler.DAGScheduler: Parents of final stage: List()
16/12/02 18:13:24 INFO scheduler.DAGScheduler: Missing parents: List()
16/12/02 18:13:24 INFO scheduler.DAGScheduler: Submitting ResultStage 1 (MapPartitionsRDD[4] at sql at test2.scala:42), which has no missing parents
16/12/02 18:13:24 INFO storage.MemoryStore: Block broadcast_1 stored as values in memory (estimated size 72.4 KB, free 170.5 KB)
16/12/02 18:13:24 INFO storage.MemoryStore: Block broadcast_1_piece0 stored as bytes in memory (estimated size 25.5 KB, free 196.0 KB)
16/12/02 18:13:24 INFO storage.BlockManagerInfo: Added broadcast_1_piece0 in memory on 192.168.12.105:14015 (size: 25.5 KB, free: 530.0 MB)
16/12/02 18:13:24 INFO spark.SparkContext: Created broadcast 1 from broadcast at DAGScheduler.scala:1006
16/12/02 18:13:24 INFO scheduler.DAGScheduler: Submitting 2 missing tasks from ResultStage 1 (MapPartitionsRDD[4] at sql at test2.scala:42)
16/12/02 18:13:24 INFO cluster.YarnScheduler: Adding task set 1.0 with 2 tasks
16/12/02 18:13:24 INFO scheduler.TaskSetManager: Starting task 0.0 in stage 1.0 (TID 2, enkbda1node09.enkitec.com, partition 0,PROCESS_LOCAL, 2044 bytes)
16/12/02 18:13:24 INFO scheduler.TaskSetManager: Starting task 1.0 in stage 1.0 (TID 3, enkbda1node10.enkitec.com, partition 1,PROCESS_LOCAL, 2143 bytes)
16/12/02 18:13:24 INFO storage.BlockManagerInfo: Added broadcast_1_piece0 in memory on enkbda1node09.enkitec.com:19013 (size: 25.5 KB, free: 530.0 MB)
16/12/02 18:13:24 INFO storage.BlockManagerInfo: Added broadcast_1_piece0 in memory on enkbda1node10.enkitec.com:45880 (size: 25.5 KB, free: 530.0 MB)
16/12/02 18:13:25 INFO scheduler.TaskSetManager: Finished task 0.0 in stage 1.0 (TID 2) in 294 ms on enkbda1node09.enkitec.com (1/2)
16/12/02 18:13:26 INFO scheduler.TaskSetManager: Finished task 1.0 in stage 1.0 (TID 3) in 1614 ms on enkbda1node10.enkitec.com (2/2)
16/12/02 18:13:26 INFO scheduler.DAGScheduler: ResultStage 1 (sql at test2.scala:42) finished in 1.620 s
16/12/02 18:13:26 INFO cluster.YarnScheduler: Removed TaskSet 1.0, whose tasks have all completed, from pool 
16/12/02 18:13:26 INFO scheduler.DAGScheduler: Job 1 finished: sql at test2.scala:42, took 1.665575 s
16/12/02 18:13:26 INFO storage.BlockManagerInfo: Removed broadcast_1_piece0 on 192.168.12.105:14015 in memory (size: 25.5 KB, free: 530.0 MB)
16/12/02 18:13:26 INFO storage.BlockManagerInfo: Removed broadcast_1_piece0 on enkbda1node10.enkitec.com:45880 in memory (size: 25.5 KB, free: 530.0 MB)
16/12/02 18:13:26 INFO storage.BlockManagerInfo: Removed broadcast_1_piece0 on enkbda1node09.enkitec.com:19013 in memory (size: 25.5 KB, free: 530.0 MB)
16/12/02 18:13:26 INFO storage.BlockManagerInfo: Removed broadcast_0_piece0 on 192.168.12.105:14015 in memory (size: 25.6 KB, free: 530.0 MB)
16/12/02 18:13:26 INFO storage.BlockManagerInfo: Removed broadcast_0_piece0 on enkbda1node10.enkitec.com:45880 in memory (size: 25.6 KB, free: 530.0 MB)
16/12/02 18:13:26 INFO storage.BlockManagerInfo: Removed broadcast_0_piece0 on enkbda1node09.enkitec.com:19013 in memory (size: 25.6 KB, free: 530.0 MB)
16/12/02 18:13:26 INFO spark.ContextCleaner: Cleaned accumulator 1
16/12/02 18:13:26 INFO spark.ContextCleaner: Cleaned accumulator 2
16/12/02 18:13:26 INFO datasources.DataSourceStrategy: Selected 1680 partitions out of 1680, pruned 0.0% partitions.
16/12/02 18:13:26 INFO storage.MemoryStore: Block broadcast_2 stored as values in memory (estimated size 108.4 KB, free 108.4 KB)
16/12/02 18:13:26 INFO storage.MemoryStore: Block broadcast_2_piece0 stored as bytes in memory (estimated size 24.0 KB, free 132.3 KB)
16/12/02 18:13:26 INFO storage.BlockManagerInfo: Added broadcast_2_piece0 in memory on 192.168.12.105:14015 (size: 24.0 KB, free: 530.0 MB)
16/12/02 18:13:26 INFO spark.SparkContext: Created broadcast 2 from show at test2.scala:44
16/12/02 18:13:27 INFO Configuration.deprecation: mapred.min.split.size is deprecated. Instead, use mapreduce.input.fileinputformat.split.minsize
16/12/02 18:13:27 INFO parquet.ParquetRelation: Reading Parquet file(s) from hdfs://enkbda-ns/user/hive/warehouse/wzdb.db/sales_part/year=2012/month=1/day=1/000022_0, hdfs://enkbda-ns/user/hive/warehouse/wzdb.db/sales_part/year=2012/month=1/day=1/000022_0_copy_1, hdfs://enkbda-ns/user/hive/warehouse/wzdb.db/sales_part/year=2012/month=1/day=1/000022_0_copy_2, hdfs://enkbda-ns/user/hive/warehouse/wzdb.db/sales_part/year=2012/month=1/day=1/000022_0_copy_3, hdfs://enkbda-ns/user/hive/warehouse/wzdb.db/sales_part/year=2012/month=1/day=1/000128_0, hdfs://enkbda-ns/user/hive/warehouse/wzdb.db/sales_part/year=2012/month=1/day=1/000128_0_copy_1, hdfs://enkbda-ns/user/hive/warehouse/wzdb.db/sales_part/year=2012/month=1/day=1/000284_0
16/12/02 18:13:27 INFO parquet.ParquetRelation: Reading Parquet file(s) from hdfs://enkbda-ns/user/hive/warehouse/wzdb.db/sales_part/year=2012/month=1/day=10/000031_0, hdfs://enkbda-ns/user/hive/warehouse/wzdb.db/sales_part/year=2012/month=1/day=10/000031_0_copy_1, hdfs://enkbda-ns/user/hive/warehouse/wzdb.db/sales_part/year=2012/month=1/day=10/000031_0_copy_2, hdfs://enkbda-ns/user/hive/warehouse/wzdb.db/sales_part/year=2012/month=1/day=10/000031_0_copy_3, hdfs://enkbda-ns/user/hive/warehouse/wzdb.db/sales_part/year=2012/month=1/day=10/000137_0, hdfs://enkbda-ns/user/hive/warehouse/wzdb.db/sales_part/year=2012/month=1/day=10/000137_0_copy_1, hdfs://enkbda-ns/user/hive/warehouse/wzdb.db/sales_part/year=2012/month=1/day=10/000293_0
16/12/02 18:13:27 INFO parquet.ParquetRelation: Reading Parquet file(s) from hdfs://enkbda-ns/user/hive/warehouse/wzdb.db/sales_part/year=2012/month=1/day=11/000032_0, hdfs://enkbda-ns/user/hive/warehouse/wzdb.db/sales_part/year=2012/month=1/day=11/000032_0_copy_1, hdfs://enkbda-ns/user/hive/warehouse/wzdb.db/sales_part/year=2012/month=1/day=11/000032_0_copy_2, hdfs://enkbda-ns/user/hive/warehouse/wzdb.db/sales_part/year=2012/month=1/day=11/000032_0_copy_3, hdfs://enkbda-ns/user/hive/warehouse/wzdb.db/sales_part/year=2012/month=1/day=11/000138_0, hdfs://enkbda-ns/user/hive/warehouse/wzdb.db/sales_part/year=2012/month=1/day=11/000138_0_copy_1, hdfs://enkbda-ns/user/hive/warehouse/wzdb.db/sales_part/year=2012/month=1/day=11/000294_0
16/12/02 18:13:27 INFO parquet.ParquetRelation: Reading Parquet file(s) from hdfs://enkbda-ns/user/hive/warehouse/wzdb.db/sales_part/year=2012/month=1/day=12/000033_0, hdfs://enkbda-ns/user/hive/warehouse/wzdb.db/sales_part/year=2012/month=1/day=12/000033_0_copy_1, hdfs://enkbda-ns/user/hive/warehouse/wzdb.db/sales_part/year=2012/month=1/day=12/000033_0_copy_2, hdfs://enkbda-ns/user/hive/warehouse/wzdb.db/sales_part/year=2012/month=1/day=12/000033_0_copy_3, hdfs://enkbda-ns/user/hive/warehouse/wzdb.db/sales_part/year=2012/month=1/day=12/000139_0, hdfs://enkbda-ns/user/hive/warehouse/wzdb.db/sales_part/year=2012/month=1/day=12/000139_0_copy_1, hdfs://enkbda-ns/user/hive/warehouse/wzdb.db/sales_part/year=2012/month=1/day=12/000295_0
16/12/02 18:13:28 INFO parquet.ParquetRelation: Reading Parquet file(s) from hdfs://enkbda-ns/user/hive/warehouse/wzdb.db/sales_part/year=2012/month=1/day=13/000034_0, hdfs://enkbda-ns/user/hive/warehouse/wzdb.db/sales_part/year=2012/month=1/day=13/000034_0_copy_1, hdfs://enkbda-ns/user/hive/warehouse/wzdb.db/sales_part/year=2012/month=1/day=13/000034_0_copy_2, hdfs://enkbda-ns/user/hive/warehouse/wzdb.db/sales_part/year=2012/month=1/day=13/000034_0_copy_3, hdfs://enkbda-ns/user/hive/warehouse/wzdb.db/sales_part/year=2012/month=1/day=13/000140_0, hdfs://enkbda-ns/user/hive/warehouse/wzdb.db/sales_part/year=2012/month=1/day=13/000140_0_copy_1, hdfs://enkbda-ns/user/hive/warehouse/wzdb.db/sales_part/year=2012/month=1/day=13/000296_0

....
16/12/02 18:13:34 INFO parquet.ParquetRelation: Reading Parquet file(s) from hdfs://enkbda-ns/user/hive/warehouse/wzdb.db/sales_part/year=2016/month=9/day=19/000076_0, hdfs://enkbda-ns/user/hive/warehouse/wzdb.db/sales_part/year=2016/month=9/day=19/000076_0_copy_1, hdfs://enkbda-ns/user/hive/warehouse/wzdb.db/sales_part/year=2016/month=9/day=19/000076_0_copy_2, hdfs://enkbda-ns/user/hive/warehouse/wzdb.db/sales_part/year=2016/month=9/day=19/000076_0_copy_3, hdfs://enkbda-ns/user/hive/warehouse/wzdb.db/sales_part/year=2016/month=9/day=19/000144_0, hdfs://enkbda-ns/user/hive/warehouse/wzdb.db/sales_part/year=2016/month=9/day=19/000144_0_copy_1, hdfs://enkbda-ns/user/hive/warehouse/wzdb.db/sales_part/year=2016/month=9/day=19/000266_0
16/12/02 18:13:34 INFO parquet.ParquetRelation: Reading Parquet file(s) from hdfs://enkbda-ns/user/hive/warehouse/wzdb.db/sales_part/year=2016/month=9/day=2/000059_0, hdfs://enkbda-ns/user/hive/warehouse/wzdb.db/sales_part/year=2016/month=9/day=2/000059_0_copy_1, hdfs://enkbda-ns/user/hive/warehouse/wzdb.db/sales_part/year=2016/month=9/day=2/000059_0_copy_2, hdfs://enkbda-ns/user/hive/warehouse/wzdb.db/sales_part/year=2016/month=9/day=2/000059_0_copy_3, hdfs://enkbda-ns/user/hive/warehouse/wzdb.db/sales_part/year=2016/month=9/day=2/000127_0, hdfs://enkbda-ns/user/hive/warehouse/wzdb.db/sales_part/year=2016/month=9/day=2/000127_0_copy_1, hdfs://enkbda-ns/user/hive/warehouse/wzdb.db/sales_part/year=2016/month=9/day=2/000249_0
16/12/02 18:13:34 INFO parquet.ParquetRelation: Reading Parquet file(s) from hdfs://enkbda-ns/user/hive/warehouse/wzdb.db/sales_part/year=2016/month=9/day=20/000077_0, hdfs://enkbda-ns/user/hive/warehouse/wzdb.db/sales_part/year=2016/month=9/day=20/000077_0_copy_1, hdfs://enkbda-ns/user/hive/warehouse/wzdb.db/sales_part/year=2016/month=9/day=20/000077_0_copy_2, hdfs://enkbda-ns/user/hive/warehouse/wzdb.db/sales_part/year=2016/month=9/day=20/000077_0_copy_3, hdfs://enkbda-ns/user/hive/warehouse/wzdb.db/sales_part/year=2016/month=9/day=20/000145_0, hdfs://enkbda-ns/user/hive/warehouse/wzdb.db/sales_part/year=2016/month=9/day=20/000145_0_copy_1, hdfs://enkbda-ns/user/hive/warehouse/wzdb.db/sales_part/year=2016/month=9/day=20/000267_0
16/12/02 18:13:34 INFO parquet.ParquetRelation: Reading Parquet file(s) from hdfs://enkbda-ns/user/hive/warehouse/wzdb.db/sales_part/year=2016/month=9/day=21/000000_0, hdfs://enkbda-ns/user/hive/warehouse/wzdb.db/sales_part/year=2016/month=9/day=21/000000_0_copy_1, hdfs://enkbda-ns/user/hive/warehouse/wzdb.db/sales_part/year=2016/month=9/day=21/000000_0_copy_2, hdfs://enkbda-ns/user/hive/warehouse/wzdb.db/sales_part/year=2016/month=9/day=21/000000_0_copy_3, hdfs://enkbda-ns/user/hive/warehouse/wzdb.db/sales_part/year=2016/month=9/day=21/000146_0, hdfs://enkbda-ns/user/hive/warehouse/wzdb.db/sales_part/year=2016/month=9/day=21/000146_0_copy_1, hdfs://enkbda-ns/user/hive/warehouse/wzdb.db/sales_part/year=2016/month=9/day=21/000268_0
16/12/02 18:13:34 INFO parquet.ParquetRelation: Reading Parquet file(s) from hdfs://enkbda-ns/user/hive/warehouse/wzdb.db/sales_part/year=2016/month=9/day=22/000001_0, hdfs://enkbda-ns/user/hive/warehouse/wzdb.db/sales_part/year=2016/month=9/day=22/000001_0_copy_1, hdfs://enkbda-ns/user/hive/warehouse/wzdb.db/sales_part/year=2016/month=9/day=22/000001_0_copy_2, hdfs://enkbda-ns/user/hive/warehouse/wzdb.db/sales_part/year=2016/month=9/day=22/000001_0_copy_3, hdfs://enkbda-ns/user/hive/warehouse/wzdb.db/sales_part/year=2016/month=9/day=22/000147_0, hdfs://enkbda-ns/user/hive/warehouse/wzdb.db/sales_part/year=2016/month=9/day=22/000147_0_copy_1, hdfs://enkbda-ns/user/hive/warehouse/wzdb.db/sales_part/year=2016/month=9/day=22/000269_0
16/12/02 18:13:34 INFO parquet.ParquetRelation: Reading Parquet file(s) from hdfs://enkbda-ns/user/hive/warehouse/wzdb.db/sales_part/year=2016/month=9/day=23/000002_0, hdfs://enkbda-ns/user/hive/warehouse/wzdb.db/sales_part/year=2016/month=9/day=23/000002_0_copy_1, hdfs://enkbda-ns/user/hive/warehouse/wzdb.db/sales_part/year=2016/month=9/day=23/000002_0_copy_2, hdfs://enkbda-ns/user/hive/warehouse/wzdb.db/sales_part/year=2016/month=9/day=23/000002_0_copy_3, hdfs://enkbda-ns/user/hive/warehouse/wzdb.db/sales_part/year=2016/month=9/day=23/000148_0, hdfs://enkbda-ns/user/hive/warehouse/wzdb.db/sales_part/year=2016/month=9/day=23/000148_0_copy_1, hdfs://enkbda-ns/user/hive/warehouse/wzdb.db/sales_part/year=2016/month=9/day=23/000270_0
16/12/02 18:13:34 INFO parquet.ParquetRelation: Reading Parquet file(s) from hdfs://enkbda-ns/user/hive/warehouse/wzdb.db/sales_part/year=2016/month=9/day=24/000003_0, hdfs://enkbda-ns/user/hive/warehouse/wzdb.db/sales_part/year=2016/month=9/day=24/000003_0_copy_1, hdfs://enkbda-ns/user/hive/warehouse/wzdb.db/sales_part/year=2016/month=9/day=24/000003_0_copy_2, hdfs://enkbda-ns/user/hive/warehouse/wzdb.db/sales_part/year=2016/month=9/day=24/000003_0_copy_3, hdfs://enkbda-ns/user/hive/warehouse/wzdb.db/sales_part/year=2016/month=9/day=24/000149_0, hdfs://enkbda-ns/user/hive/warehouse/wzdb.db/sales_part/year=2016/month=9/day=24/000149_0_copy_1, hdfs://enkbda-ns/user/hive/warehouse/wzdb.db/sales_part/year=2016/month=9/day=24/000271_0
16/12/02 18:13:34 INFO parquet.ParquetRelation: Reading Parquet file(s) from hdfs://enkbda-ns/user/hive/warehouse/wzdb.db/sales_part/year=2016/month=9/day=25/000004_0, hdfs://enkbda-ns/user/hive/warehouse/wzdb.db/sales_part/year=2016/month=9/day=25/000004_0_copy_1, hdfs://enkbda-ns/user/hive/warehouse/wzdb.db/sales_part/year=2016/month=9/day=25/000004_0_copy_2, hdfs://enkbda-ns/user/hive/warehouse/wzdb.db/sales_part/year=2016/month=9/day=25/000004_0_copy_3, hdfs://enkbda-ns/user/hive/warehouse/wzdb.db/sales_part/year=2016/month=9/day=25/000150_0, hdfs://enkbda-ns/user/hive/warehouse/wzdb.db/sales_part/year=2016/month=9/day=25/000150_0_copy_1, hdfs://enkbda-ns/user/hive/warehouse/wzdb.db/sales_part/year=2016/month=9/day=25/000272_0
16/12/02 18:13:34 INFO parquet.ParquetRelation: Reading Parquet file(s) from hdfs://enkbda-ns/user/hive/warehouse/wzdb.db/sales_part/year=2016/month=9/day=26/000005_0, hdfs://enkbda-ns/user/hive/warehouse/wzdb.db/sales_part/year=2016/month=9/day=26/000005_0_copy_1, hdfs://enkbda-ns/user/hive/warehouse/wzdb.db/sales_part/year=2016/month=9/day=26/000005_0_copy_2, hdfs://enkbda-ns/user/hive/warehouse/wzdb.db/sales_part/year=2016/month=9/day=26/000005_0_copy_3, hdfs://enkbda-ns/user/hive/warehouse/wzdb.db/sales_part/year=2016/month=9/day=26/000151_0, hdfs://enkbda-ns/user/hive/warehouse/wzdb.db/sales_part/year=2016/month=9/day=26/000151_0_copy_1, hdfs://enkbda-ns/user/hive/warehouse/wzdb.db/sales_part/year=2016/month=9/day=26/000273_0
16/12/02 18:13:35 INFO parquet.ParquetRelation: Reading Parquet file(s) from hdfs://enkbda-ns/user/hive/warehouse/wzdb.db/sales_part/year=2016/month=9/day=27/000006_0, hdfs://enkbda-ns/user/hive/warehouse/wzdb.db/sales_part/year=2016/month=9/day=27/000006_0_copy_1, hdfs://enkbda-ns/user/hive/warehouse/wzdb.db/sales_part/year=2016/month=9/day=27/000006_0_copy_2, hdfs://enkbda-ns/user/hive/warehouse/wzdb.db/sales_part/year=2016/month=9/day=27/000006_0_copy_3, hdfs://enkbda-ns/user/hive/warehouse/wzdb.db/sales_part/year=2016/month=9/day=27/000152_0, hdfs://enkbda-ns/user/hive/warehouse/wzdb.db/sales_part/year=2016/month=9/day=27/000152_0_copy_1, hdfs://enkbda-ns/user/hive/warehouse/wzdb.db/sales_part/year=2016/month=9/day=27/000274_0
16/12/02 18:13:35 INFO parquet.ParquetRelation: Reading Parquet file(s) from hdfs://enkbda-ns/user/hive/warehouse/wzdb.db/sales_part/year=2016/month=9/day=28/000007_0, hdfs://enkbda-ns/user/hive/warehouse/wzdb.db/sales_part/year=2016/month=9/day=28/000007_0_copy_1, hdfs://enkbda-ns/user/hive/warehouse/wzdb.db/sales_part/year=2016/month=9/day=28/000007_0_copy_2, hdfs://enkbda-ns/user/hive/warehouse/wzdb.db/sales_part/year=2016/month=9/day=28/000007_0_copy_3, hdfs://enkbda-ns/user/hive/warehouse/wzdb.db/sales_part/year=2016/month=9/day=28/000153_0, hdfs://enkbda-ns/user/hive/warehouse/wzdb.db/sales_part/year=2016/month=9/day=28/000153_0_copy_1, hdfs://enkbda-ns/user/hive/warehouse/wzdb.db/sales_part/year=2016/month=9/day=28/000275_0
16/12/02 18:13:35 INFO parquet.ParquetRelation: Reading Parquet file(s) from hdfs://enkbda-ns/user/hive/warehouse/wzdb.db/sales_part/year=2016/month=9/day=3/000060_0, hdfs://enkbda-ns/user/hive/warehouse/wzdb.db/sales_part/year=2016/month=9/day=3/000060_0_copy_1, hdfs://enkbda-ns/user/hive/warehouse/wzdb.db/sales_part/year=2016/month=9/day=3/000060_0_copy_2, hdfs://enkbda-ns/user/hive/warehouse/wzdb.db/sales_part/year=2016/month=9/day=3/000060_0_copy_3, hdfs://enkbda-ns/user/hive/warehouse/wzdb.db/sales_part/year=2016/month=9/day=3/000128_0, hdfs://enkbda-ns/user/hive/warehouse/wzdb.db/sales_part/year=2016/month=9/day=3/000128_0_copy_1, hdfs://enkbda-ns/user/hive/warehouse/wzdb.db/sales_part/year=2016/month=9/day=3/000250_0
16/12/02 18:13:35 INFO parquet.ParquetRelation: Reading Parquet file(s) from hdfs://enkbda-ns/user/hive/warehouse/wzdb.db/sales_part/year=2016/month=9/day=4/000061_0, hdfs://enkbda-ns/user/hive/warehouse/wzdb.db/sales_part/year=2016/month=9/day=4/000061_0_copy_1, hdfs://enkbda-ns/user/hive/warehouse/wzdb.db/sales_part/year=2016/month=9/day=4/000061_0_copy_2, hdfs://enkbda-ns/user/hive/warehouse/wzdb.db/sales_part/year=2016/month=9/day=4/000061_0_copy_3, hdfs://enkbda-ns/user/hive/warehouse/wzdb.db/sales_part/year=2016/month=9/day=4/000129_0, hdfs://enkbda-ns/user/hive/warehouse/wzdb.db/sales_part/year=2016/month=9/day=4/000129_0_copy_1, hdfs://enkbda-ns/user/hive/warehouse/wzdb.db/sales_part/year=2016/month=9/day=4/000251_0
16/12/02 18:13:35 INFO parquet.ParquetRelation: Reading Parquet file(s) from hdfs://enkbda-ns/user/hive/warehouse/wzdb.db/sales_part/year=2016/month=9/day=5/000062_0, hdfs://enkbda-ns/user/hive/warehouse/wzdb.db/sales_part/year=2016/month=9/day=5/000062_0_copy_1, hdfs://enkbda-ns/user/hive/warehouse/wzdb.db/sales_part/year=2016/month=9/day=5/000062_0_copy_2, hdfs://enkbda-ns/user/hive/warehouse/wzdb.db/sales_part/year=2016/month=9/day=5/000062_0_copy_3, hdfs://enkbda-ns/user/hive/warehouse/wzdb.db/sales_part/year=2016/month=9/day=5/000130_0, hdfs://enkbda-ns/user/hive/warehouse/wzdb.db/sales_part/year=2016/month=9/day=5/000130_0_copy_1, hdfs://enkbda-ns/user/hive/warehouse/wzdb.db/sales_part/year=2016/month=9/day=5/000252_0
16/12/02 18:13:35 INFO parquet.ParquetRelation: Reading Parquet file(s) from hdfs://enkbda-ns/user/hive/warehouse/wzdb.db/sales_part/year=2016/month=9/day=6/000063_0, hdfs://enkbda-ns/user/hive/warehouse/wzdb.db/sales_part/year=2016/month=9/day=6/000063_0_copy_1, hdfs://enkbda-ns/user/hive/warehouse/wzdb.db/sales_part/year=2016/month=9/day=6/000063_0_copy_2, hdfs://enkbda-ns/user/hive/warehouse/wzdb.db/sales_part/year=2016/month=9/day=6/000063_0_copy_3, hdfs://enkbda-ns/user/hive/warehouse/wzdb.db/sales_part/year=2016/month=9/day=6/000131_0, hdfs://enkbda-ns/user/hive/warehouse/wzdb.db/sales_part/year=2016/month=9/day=6/000131_0_copy_1, hdfs://enkbda-ns/user/hive/warehouse/wzdb.db/sales_part/year=2016/month=9/day=6/000253_0
16/12/02 18:13:35 INFO parquet.ParquetRelation: Reading Parquet file(s) from hdfs://enkbda-ns/user/hive/warehouse/wzdb.db/sales_part/year=2016/month=9/day=7/000064_0, hdfs://enkbda-ns/user/hive/warehouse/wzdb.db/sales_part/year=2016/month=9/day=7/000064_0_copy_1, hdfs://enkbda-ns/user/hive/warehouse/wzdb.db/sales_part/year=2016/month=9/day=7/000064_0_copy_2, hdfs://enkbda-ns/user/hive/warehouse/wzdb.db/sales_part/year=2016/month=9/day=7/000064_0_copy_3, hdfs://enkbda-ns/user/hive/warehouse/wzdb.db/sales_part/year=2016/month=9/day=7/000132_0, hdfs://enkbda-ns/user/hive/warehouse/wzdb.db/sales_part/year=2016/month=9/day=7/000132_0_copy_1, hdfs://enkbda-ns/user/hive/warehouse/wzdb.db/sales_part/year=2016/month=9/day=7/000254_0
16/12/02 18:13:35 INFO parquet.ParquetRelation: Reading Parquet file(s) from hdfs://enkbda-ns/user/hive/warehouse/wzdb.db/sales_part/year=2016/month=9/day=8/000065_0, hdfs://enkbda-ns/user/hive/warehouse/wzdb.db/sales_part/year=2016/month=9/day=8/000065_0_copy_1, hdfs://enkbda-ns/user/hive/warehouse/wzdb.db/sales_part/year=2016/month=9/day=8/000065_0_copy_2, hdfs://enkbda-ns/user/hive/warehouse/wzdb.db/sales_part/year=2016/month=9/day=8/000065_0_copy_3, hdfs://enkbda-ns/user/hive/warehouse/wzdb.db/sales_part/year=2016/month=9/day=8/000133_0, hdfs://enkbda-ns/user/hive/warehouse/wzdb.db/sales_part/year=2016/month=9/day=8/000133_0_copy_1, hdfs://enkbda-ns/user/hive/warehouse/wzdb.db/sales_part/year=2016/month=9/day=8/000255_0
16/12/02 18:13:35 INFO parquet.ParquetRelation: Reading Parquet file(s) from hdfs://enkbda-ns/user/hive/warehouse/wzdb.db/sales_part/year=2016/month=9/day=9/000066_0, hdfs://enkbda-ns/user/hive/warehouse/wzdb.db/sales_part/year=2016/month=9/day=9/000066_0_copy_1, hdfs://enkbda-ns/user/hive/warehouse/wzdb.db/sales_part/year=2016/month=9/day=9/000066_0_copy_2, hdfs://enkbda-ns/user/hive/warehouse/wzdb.db/sales_part/year=2016/month=9/day=9/000066_0_copy_3, hdfs://enkbda-ns/user/hive/warehouse/wzdb.db/sales_part/year=2016/month=9/day=9/000134_0, hdfs://enkbda-ns/user/hive/warehouse/wzdb.db/sales_part/year=2016/month=9/day=9/000134_0_copy_1, hdfs://enkbda-ns/user/hive/warehouse/wzdb.db/sales_part/year=2016/month=9/day=9/000256_0
16/12/02 18:13:35 INFO spark.SparkContext: Starting job: show at test2.scala:44
16/12/02 18:13:35 INFO scheduler.DAGScheduler: Got job 2 (show at test2.scala:44) with 1 output partitions
16/12/02 18:13:35 INFO scheduler.DAGScheduler: Final stage: ResultStage 2 (show at test2.scala:44)
16/12/02 18:13:35 INFO scheduler.DAGScheduler: Parents of final stage: List()
16/12/02 18:13:35 INFO scheduler.DAGScheduler: Missing parents: List()
16/12/02 18:13:35 INFO scheduler.DAGScheduler: Submitting ResultStage 2 (MapPartitionsRDD[3367] at show at test2.scala:44), which has no missing parents
16/12/02 18:13:35 INFO storage.MemoryStore: Block broadcast_3 stored as values in memory (estimated size 684.5 KB, free 816.9 KB)
16/12/02 18:13:35 INFO storage.MemoryStore: Block broadcast_3_piece0 stored as bytes in memory (estimated size 93.1 KB, free 909.9 KB)
16/12/02 18:13:35 INFO storage.BlockManagerInfo: Added broadcast_3_piece0 in memory on 192.168.12.105:14015 (size: 93.1 KB, free: 529.9 MB)
16/12/02 18:13:35 INFO spark.SparkContext: Created broadcast 3 from broadcast at DAGScheduler.scala:1006
16/12/02 18:13:35 INFO scheduler.DAGScheduler: Submitting 1 missing tasks from ResultStage 2 (MapPartitionsRDD[3367] at show at test2.scala:44)
16/12/02 18:13:35 INFO cluster.YarnScheduler: Adding task set 2.0 with 1 tasks
16/12/02 18:13:35 INFO scheduler.TaskSetManager: Starting task 0.0 in stage 2.0 (TID 4, enkbda1node10.enkitec.com, partition 0,RACK_LOCAL, 2384 bytes)
16/12/02 18:13:35 INFO storage.BlockManagerInfo: Added broadcast_3_piece0 in memory on enkbda1node10.enkitec.com:45880 (size: 93.1 KB, free: 529.9 MB)
16/12/02 18:13:35 INFO storage.BlockManagerInfo: Added broadcast_2_piece0 in memory on enkbda1node10.enkitec.com:45880 (size: 24.0 KB, free: 529.9 MB)
16/12/02 18:13:36 INFO scheduler.TaskSetManager: Finished task 0.0 in stage 2.0 (TID 4) in 1336 ms on enkbda1node10.enkitec.com (1/1)
16/12/02 18:13:36 INFO scheduler.DAGScheduler: ResultStage 2 (show at test2.scala:44) finished in 1.336 s
16/12/02 18:13:36 INFO cluster.YarnScheduler: Removed TaskSet 2.0, whose tasks have all completed, from pool 
16/12/02 18:13:36 INFO scheduler.DAGScheduler: Job 2 finished: show at test2.scala:44, took 1.604959 s
16/12/02 18:13:36 INFO spark.ContextCleaner: Cleaned accumulator 3
16/12/02 18:13:36 INFO storage.BlockManagerInfo: Removed broadcast_3_piece0 on 192.168.12.105:14015 in memory (size: 93.1 KB, free: 530.0 MB)
16/12/02 18:13:36 INFO storage.BlockManagerInfo: Removed broadcast_3_piece0 on enkbda1node10.enkitec.com:45880 in memory (size: 93.1 KB, free: 530.0 MB)
+-------------+----------------+--------+--------------------+-------------+-------+----------------+----------+------------+----+-----+---+
|      tran_id|  tran_timestamp|store_id|           item_name|item_category|dept_id|       dept_name|tran_price|cashier_name|year|month|day|
+-------------+----------------+--------+--------------------+-------------+-------+----------------+----------+------------+----+-----+---+
|1479564513475|2012-01-01 15:47|       4|rxkexrwwrnohuenpm...|          891|     26|             Toy|    185.17|       Maria|2012|    1|  1|
|1479564513608|2012-01-01 10:11|      27|                  zz|          790|     26|   Auto and Tire|     68.55|         Von|2012|    1|  1|
|1479564513748|2012-01-01 16:53|      26|fqzlqxvmpktwjwwgg...|          279|     10|Home Improvement|    233.47|         Von|2012|    1|  1|
|1479564513750|2012-01-01 21:10|      22|               ndmeu|          487|     35|           Photo|     92.42|       Ileen|2012|    1|  1|
|1479564526973|2012-01-01 07:52|       6|sbzmvrnxrvbohorbp...|          632|     18|         Jewelry|    164.34|        Keri|2012|    1|  1|
|1479564469852|2012-01-01 18:54|      27|ptcilplqfvednxmmh...|          416|      3|    Baby Toddler|    144.86|      Michel|2012|    1|  1|
|1479564523772|2012-01-01 11:07|       2|gvjrsdgidzunbbmfi...|          269|     17|          Sports|    231.67|       Penny|2012|    1|  1|
|1479564524666|2012-01-01 08:51|       6|rfskpcezchhbhzsbd...|          595|     19|            Food|    175.85|         Rus|2012|    1|  1|
|1479564470133|2012-01-01 18:36|      17|wzswebdjowfjjbslh...|          679|     10|   Health Beauty|    350.13|        Keri|2012|    1|  1|
|1479564537634|2012-01-01 07:52|      12|             bhxoevw|          281|     34|    Baby Toddler|    352.02|        Keri|2012|    1|  1|
|1479564470197|2012-01-01 06:04|       5|plqxmnrcuqisfygkl...|          152|     19|             Toy|     53.67|     Dorothy|2012|    1|  1|
|1479564470201|2012-01-01 08:23|      13|frcatrjwwrbomxmnj...|           74|     20|   Auto and Tire|    359.81|       Ileen|2012|    1|  1|
|1479564470386|2012-01-01 10:16|      15|cevezkxpsrzszshen...|          814|     13|   Auto and Tire|     27.92|     Sherril|2012|    1|  1|
|1479564470724|2012-01-01 01:44|      26|jjiqfklffyzxzkyiz...|          248|      5|   Auto and Tire|    219.66|       Lidia|2012|    1|  1|
|1479564470799|2012-01-01 23:26|      18|     voakgmajahxfgbq|          769|     17|          Sports|    251.07|       Susan|2012|    1|  1|
|1479564470941|2012-01-01 13:28|      14|axkytaxghyujudtaw...|          207|      5|   Auto and Tire|    168.34|  Christoper|2012|    1|  1|
|1479564471016|2012-01-01 15:37|       3|sdcnxhosatucnwwqk...|          192|     23|         Jewelry|       2.5|      Michel|2012|    1|  1|
|1479564471049|2012-01-01 23:27|       9|zoppybkpqpgitrwlo...|          120|     32|          Sports|    147.28|     Dorothy|2012|    1|  1|
|1479564471063|2012-01-01 23:51|      24|zknmvfsrsdxdysmdw...|          169|      6|         Jewelry|    292.59|   Broderick|2012|    1|  1|
|1479564471113|2012-01-01 19:42|      20|uaqmjikgtisidskzm...|          388|     36|            Food|      3.55|       Maria|2012|    1|  1|
+-------------+----------------+--------+--------------------+-------------+-------+----------------+----------+------------+----+-----+---+
only showing top 20 rows

16/12/02 18:13:36 INFO datasources.DataSourceStrategy: Selected 1680 partitions out of 1680, pruned 0.0% partitions.
16/12/02 18:13:36 INFO storage.MemoryStore: Block broadcast_4 stored as values in memory (estimated size 282.1 KB, free 414.5 KB)

The above log shows that IO operations such as parquet.ParquetRelation: Reading Parquet file(s) happen completely outside the timelines of Job (stage) 1 and Job 2. This is where the missing time goes. Taking only 9 to 10 seconds to work through all 1.3 billion rows is actually pretty good. Mystery solved.

With the above findings in mind, I figured that if I add partition pruning and limit the number of rows scanned in the line 42 query, the timeline gap should shrink because less IO is needed to read the data. So I changed the query on line 42 to select * from wzdb.sales_part where year=2013 and month=11 and day=13 and reran the test. The result was exactly what I expected.
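As a sketch, the only change relative to the earlier program outline is the partition-pruning predicate on line 42; everything else stays the same.

// line 42 with the partition-pruning predicate added
val myDF = sqlContext.sql(
  "select * from wzdb.sales_part where year=2013 and month=11 and day=13")
// line 44 unchanged: still prints the first 20 rows
myDF.show()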

Here is the new timeline:
spark_event_timeline_8

As you can see, there is only a 1-second gap between Job Id 1 and Job Id 2. Here is the execution log; only one partition of data was read.

16/12/03 11:11:10 INFO scheduler.DAGScheduler: Job 1 finished: sql at test2.scala:42, took 1.322394 s
16/12/03 11:11:10 INFO storage.BlockManagerInfo: Removed broadcast_1_piece0 on 192.168.12.105:46755 in memory (size: 25.5 KB, free: 530.0 MB)
16/12/03 11:11:10 INFO storage.BlockManagerInfo: Removed broadcast_1_piece0 on enkbda1node09.enkitec.com:37048 in memory (size: 25.5 KB, free: 530.0 MB)
16/12/03 11:11:10 INFO storage.BlockManagerInfo: Removed broadcast_1_piece0 on enkbda1node03.enkitec.com:19685 in memory (size: 25.5 KB, free: 530.0 MB)
16/12/03 11:11:10 INFO spark.ContextCleaner: Cleaned accumulator 2
16/12/03 11:11:10 INFO storage.BlockManagerInfo: Removed broadcast_0_piece0 on 192.168.12.105:46755 in memory (size: 25.6 KB, free: 530.0 MB)
16/12/03 11:11:10 INFO storage.BlockManagerInfo: Removed broadcast_0_piece0 on enkbda1node09.enkitec.com:37048 in memory (size: 25.6 KB, free: 530.0 MB)
16/12/03 11:11:10 INFO storage.BlockManagerInfo: Removed broadcast_0_piece0 on enkbda1node03.enkitec.com:19685 in memory (size: 25.6 KB, free: 530.0 MB)
16/12/03 11:11:10 INFO spark.ContextCleaner: Cleaned accumulator 1
16/12/03 11:11:11 INFO datasources.DataSourceStrategy: Selected 1 partitions out of 1680, pruned 99.94047619047619% partitions.
16/12/03 11:11:11 INFO storage.MemoryStore: Block broadcast_2 stored as values in memory (estimated size 108.4 KB, free 108.4 KB)
16/12/03 11:11:11 INFO storage.MemoryStore: Block broadcast_2_piece0 stored as bytes in memory (estimated size 24.0 KB, free 132.3 KB)
16/12/03 11:11:11 INFO storage.BlockManagerInfo: Added broadcast_2_piece0 in memory on 192.168.12.105:46755 (size: 24.0 KB, free: 530.0 MB)
16/12/03 11:11:11 INFO spark.SparkContext: Created broadcast 2 from show at test2.scala:44
16/12/03 11:11:11 INFO Configuration.deprecation: mapred.min.split.size is deprecated. Instead, use mapreduce.input.fileinputformat.split.minsize
16/12/03 11:11:11 INFO parquet.ParquetRelation: Reading Parquet file(s) from hdfs://enkbda-ns/user/hive/warehouse/wzdb.db/sales_part/year=2013/month=11/day=13/000057_0, hdfs://enkbda-ns/user/hive/warehouse/wzdb.db/sales_part/year=2013/month=11/day=13/000057_0_copy_1, hdfs://enkbda-ns/user/hive/warehouse/wzdb.db/sales_part/year=2013/month=11/day=13/000057_0_copy_2, hdfs://enkbda-ns/user/hive/warehouse/wzdb.db/sales_part/year=2013/month=11/day=13/000057_0_copy_3, hdfs://enkbda-ns/user/hive/warehouse/wzdb.db/sales_part/year=2013/month=11/day=13/000165_0, hdfs://enkbda-ns/user/hive/warehouse/wzdb.db/sales_part/year=2013/month=11/day=13/000165_0_copy_1, hdfs://enkbda-ns/user/hive/warehouse/wzdb.db/sales_part/year=2013/month=11/day=13/000191_0
16/12/03 11:11:11 INFO spark.SparkContext: Starting job: show at test2.scala:44
16/12/03 11:11:11 INFO scheduler.DAGScheduler: Got job 2 (show at test2.scala:44) with 1 output partitions
16/12/03 11:11:11 INFO scheduler.DAGScheduler: Final stage: ResultStage 2 (show at test2.scala:44)
16/12/03 11:11:11 INFO scheduler.DAGScheduler: Parents of final stage: List()
16/12/03 11:11:11 INFO scheduler.DAGScheduler: Missing parents: List()
16/12/03 11:11:11 INFO scheduler.DAGScheduler: Submitting ResultStage 2 (MapPartitionsRDD[9] at show at test2.scala:44), which has no missing parents
16/12/03 11:11:11 INFO storage.MemoryStore: Block broadcast_3 stored as values in memory (estimated size 143.0 KB, free 275.3 KB)
16/12/03 11:11:11 INFO storage.MemoryStore: Block broadcast_3_piece0 stored as bytes in memory (estimated size 16.5 KB, free 291.8 KB)
16/12/03 11:11:11 INFO storage.BlockManagerInfo: Added broadcast_3_piece0 in memory on 192.168.12.105:46755 (size: 16.5 KB, free: 530.0 MB)
16/12/03 11:11:11 INFO spark.SparkContext: Created broadcast 3 from broadcast at DAGScheduler.scala:1006
16/12/03 11:11:11 INFO scheduler.DAGScheduler: Submitting 1 missing tasks from ResultStage 2 (MapPartitionsRDD[9] at show at test2.scala:44)
16/12/03 11:11:11 INFO cluster.YarnScheduler: Adding task set 2.0 with 1 tasks
16/12/03 11:11:11 INFO scheduler.TaskSetManager: Starting task 0.0 in stage 2.0 (TID 4, enkbda1node09.enkitec.com, partition 0,NODE_LOCAL, 2386 bytes)
16/12/03 11:11:11 INFO storage.BlockManagerInfo: Added broadcast_3_piece0 in memory on enkbda1node09.enkitec.com:37048 (size: 16.5 KB, free: 530.0 MB)
16/12/03 11:11:11 INFO storage.BlockManagerInfo: Added broadcast_2_piece0 in memory on enkbda1node09.enkitec.com:37048 (size: 24.0 KB, free: 530.0 MB)
16/12/03 11:11:12 INFO scheduler.TaskSetManager: Finished task 0.0 in stage 2.0 (TID 4) in 1523 ms on enkbda1node09.enkitec.com (1/1)

The lesson learned from this test is that Spark metrics are helpful for identifying the bottleneck of a Spark application, but they may not tell the complete story. In this case, if we focused only on the 1-to-2-second operations, it would seem nothing needs to be tuned. Instead, we should focus on reducing the IO required to access the 1+ billion row table, by filtering on the partition keys and limiting the total number of rows scanned.