The Workaround for a Google Cloud VM That Cannot Be Connected to After Reboot

I have been using Google Cloud for the past few years on different projects, and in general I am happy with GCP's VM performance. A few days ago, I ran into a weird issue. I manually created a VM from the GCP console and could SSH to the VM using the command gcloud compute ssh. I did a few installations. All was good until I stopped the VM and tried to reconnect. The connection never worked and always got a timeout error. The same gcloud compute ssh command did not work anymore.

At first, I thought maybe I had a firewall issue. But I had no problem connecting to another VM with an almost identical setup after reboot. The only difference is that I created that VM some time back. I also tried adding an SSH key, which did not work either. This was weird. The only thing I could think of was a bad OS image. I was using a CentOS 7 image (centos-7-v20200714). I then tried other versions, like centos-7-v20200618 and centos-7-v20200403, and even a Red Hat 7 version. None of them worked, so I could rule out an image issue. I also tried taking a VM image before the reboot and restoring from that image. No luck either. Maybe it was the rpm installations, so I tried skipping them. Then the connection after reboot worked. This was insane. What was going on?

I did some research. Surprisingly, I am not the only one who has had this miserable experience over the past few days. It looks like there is a bug that impacts both Red Hat and CentOS, versions 7 and 8. It is triggered by yum update, which was exactly the action I performed. Here is the link to the bug: System hangs after POST and the grub menu never loads after applying the RHSA-2020:3216 or RHSA-2020:3217. There is also an active issue tracker: yum update breaks GCE Instances running RHEL and CentOS 7 and 8.

I tried out the workaround solution and it worked for me. Please note: this workaround only works for a VM that has not been rebooted yet. If you stopped the VM before you could apply the change, bad luck: you will need a different workaround for the issue.

Here are the steps:
1. Run the command rpm -q shim-x64. If the result is one of the following, you are impacted:

CentOS 7: shim-x64-15-7.el7_9.x86_64
CentOS 8: shim-x64-15-13.el8.x86_64
RHEL 7: shim-x64-15-7.el7_8.x86_64
RHEL 8: shim-x64-15-14.el8_2.x86_64

2. Run the downgrade command:

# yum downgrade shim\* grub2\* mokutil

3. Then add the following line to the /etc/yum.conf file:

exclude=grub2* shim* mokutil

After that, reboot the VM. You should be able to connect, although the first connection takes a little longer. Good luck to anyone who has the same issue.
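The three steps above can be rolled into one small script. This is just a minimal sketch of the workaround, not an official fix: the list of bad builds simply mirrors the table in step 1, and the downgrade and /etc/yum.conf lines assume you are running as root on the affected VM.

```shell
# Known-bad shim builds from step 1 above.
affected_builds="shim-x64-15-7.el7_9.x86_64
shim-x64-15-13.el8.x86_64
shim-x64-15-7.el7_8.x86_64
shim-x64-15-14.el8_2.x86_64"

# Return success if the installed shim-x64 build is one of the bad ones.
is_affected() {
  echo "$affected_builds" | grep -qxF "$1"
}

if is_affected "$(rpm -q shim-x64 2>/dev/null)"; then
  # Step 2: roll back the boot-chain packages.
  yum downgrade -y shim\* grub2\* mokutil
  # Step 3: keep yum from pulling the broken versions back in.
  echo "exclude=grub2* shim* mokutil" >> /etc/yum.conf
fi
```

If the installed shim-x64 build is not in the list, the script leaves the system alone.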

How to do tagging efficiently in Terraform?

I have done a lot of Terraform work on all three major clouds (AWS, Azure, GCP) over the past few months. If you need to build up a lot of resources on the cloud, Terraform is the way to go. However, sometimes it seems easier to copy part of the code from one place to another. In a huge enterprise-level environment, this kind of duplicated code is a nightmare for changes and maintenance. Tagging is one area where it is very easy to end up with duplicated code everywhere. In this blog, I am going to discuss how to do tagging efficiently in Terraform.

Here is an example of how tagging is done in an Azure environment.

resource "azurerm_resource_group" "default" {
   name     = "my-test-rg"
   location = "eastus"

   tags {
    environment = "Dev"
    project     = "MyLab"
    owner       = "weidong.zhou"
   }
}
resource "azurerm_virtual_network" "default" {
   name                = "mytest-vnet1"
   address_space       = [""]
   location            = "${azurerm_resource_group.default.location}"
   resource_group_name = "${azurerm_resource_group.default.name}"

   tags {
    environment = "Dev"
    project     = "MyLab"
    owner       = "weidong.zhou"
    network     = "vnet1"
    support     = "IT operation"
   }
}
You can see that the tags section can easily be duplicated into almost every resource that needs to be tagged. It is going to be a nightmare to add or change tags in the future. I remembered I had seen some code like the following somewhere.

resource "aws_vpc" "this" {
  count = "${var.create_vpc ? 1 : 0}"

  cidr_block                       = "${var.cidr}"
  instance_tenancy                 = "${var.instance_tenancy}"
  enable_dns_hostnames             = "${var.enable_dns_hostnames}"
  enable_dns_support               = "${var.enable_dns_support}"
  assign_generated_ipv6_cidr_block = "${var.assign_generated_ipv6_cidr_block}"

  tags = "${merge(map("Name", format("%s", var.name)), var.tags, var.vpc_tags)}"
}

Pay attention to the last line: in a single line, it does all the tagging the resource needs. This is the direction I was looking for. After some investigation, I figured out a similar way to implement it. Here is the sample code:

locals {
  common_tags = {
    environment = "${var.environment}"
    project     = "${var.project}"
    owner       = "${var.owner}"
  }

  extra_tags = {
    network = "${var.network1_name}"
    support = "${var.network_support_name}"
  }
}

resource "azurerm_resource_group" "default" {
   name     = "my-test-rg"
   location = "eastus"

   tags = "${merge(local.common_tags, local.extra_tags)}"
}

resource "azurerm_virtual_network" "default" {
   name                = "mytest-vnet1"
   address_space       = [""]
   location            = "${azurerm_resource_group.default.location}"
   resource_group_name = "${azurerm_resource_group.default.name}"

   tags = "${merge(local.common_tags, local.extra_tags)}"
}

For a project with a lot of tags in every resource, this approach will help with maintenance.
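One more detail about merge() worth knowing: later arguments win when keys collide, so a single resource can override a common tag without duplicating the whole map. A small sketch (the resource and variable values here are made up for illustration):

```hcl
locals {
  common_tags = {
    environment = "Dev"
    project     = "MyLab"
  }
}

# merge() gives precedence to later maps, so this resource gets
# environment = "Prod" while still inheriting project = "MyLab".
resource "azurerm_resource_group" "prod_example" {
  name     = "my-prod-rg"
  location = "eastus"

  tags = "${merge(local.common_tags, map("environment", "Prod"))}"
}
```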

Create Cloudera Hadoop Cluster Using Cloudera Director on Google Cloud

Several years ago, I wrote a blog discussing how to install a Cloudera Hadoop cluster. It took at least half a day to complete the installation in my VM cluster. In my last post, I discussed an approach to deploy a Hadoop cluster using Dataproc on Google Cloud Platform. It literally took less than two minutes to create a Hadoop cluster. Although it is nice to have a cluster launched in a very short time, it does not have a nice UI like Cloudera Manager, as the Hadoop distribution used by Dataproc is not CDH. I could repeat my earlier blogs to build a Hadoop cluster using VM instances on Google Cloud Platform, but that would take time and involve a lot of work. Actually, there is another way to create a Hadoop cluster on the cloud. Cloudera has a product called Cloudera Director. It currently supports not only Google Cloud but AWS and Azure as well. It is designed to deploy CDH clusters faster and to make it easier to scale a cluster on the cloud. Another important feature is that Cloudera Director allows you to move your deployment scripts or steps easily from one cloud provider to another, so you are not locked into one cloud vendor. In this blog, I will show you how to create a CDH cluster using Cloudera Director.

The first step is to start my Cloudera Director instance. In my case, I have already installed Cloudera Director based on the instructions from Cloudera. It is a pretty straightforward process and I am not going to repeat it here. The Cloudera Director instance is where you launch your CDH cluster deployment.

Both the Cloudera Director and Cloudera Manager UIs are browser based, and you have to set up a secure connection between your local machine and the VM instances on the cloud. To achieve this, you need to configure a SOCKS proxy on your local machine that is used to connect to the Cloudera Director VM. It provides a secure way to connect to your VM on the cloud and lets you use the VM's internal IP and hostname in the web browser. Google has a nice note about the steps, Securely Connecting to VM Instances. Following this note will help you set up the SOCKS proxy.

OK, here are the steps.
Log on to Cloudera Director
Open a terminal session locally and run the following command:

gcloud compute ssh cdh-director-1 \
    --project cdh-director-173715 \
    --zone us-central1-c \
    --ssh-flag="-D" \
    --ssh-flag="1080"

cdh-director-1 is the name of my Cloudera Director instance on Google Cloud and cdh-director-173715 is my Google Cloud project id. After executing the above command, it looks like it hangs and never completes. This is the CORRECT behavior. Do not kill or exit this session. Open a browser and type in the internal IP of the Cloudera Director instance with port number 7189. For my cdh-director-1 instance, the internal IP is
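On the browser side, Google's note has you point the browser at the SOCKS proxy that the gcloud session just opened on port 1080. A hedged sketch for Chrome on Linux (the profile directory name is my own choice; the flags come from the Securely Connecting to VM Instances note):

```shell
# Point a separate browser profile at the SOCKS proxy that the
# gcloud ssh session above opened on localhost:1080.
if command -v google-chrome >/dev/null 2>&1; then
  google-chrome \
      --proxy-server="socks5://localhost:1080" \
      --user-data-dir="$HOME/chrome-proxy-profile" &
fi
```

With the proxy in place, the browser can reach the VM's internal IP and hostname directly.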

After inputting the URL for Cloudera Director, the login screen shows up. Log in as the admin user.

After login, the initial setup wizard shows up. Click Let’s get started.

In the Add Environment screen, input the information as follows. The Client ID JSON Key is the file you created during the initial setup of your Google project, along with the SSH key.

In the next screen, Add Cloudera Manager, I usually create the instance templates first. Click the drop-down for Instance Template, then select Create a new instance template. I need at least three templates: one for Cloudera Manager, one for Master nodes, and one for Worker nodes. To save resources in my Google Cloud environment, I did not create a template for Edge nodes. Here are the configurations for all three templates.

Cloudera Manager Template

Master Node Template

Worker Node Template

Input the following for Cloudera Manager. For my test, I use the Embedded Database. If this is for production, you need to set up an external database first and register the external database here.

After clicking Continue, the Add Cluster screen shows up. There is a gateway instance group, and I removed it by clicking Delete Group because I don't have an edge node here. Input the corresponding template and number of instances for masters and workers.

After clicking Continue, the deployment starts.

After about 20 minutes, it completes. Click Continue.

Review Cluster
The nice Cloudera Director dashboard shows up.

You can also login to Cloudera Manager from the link on Cloudera Director.

Nice and easy. Excellent product from Cloudera. For more information about deploying a CDH cluster on Google Cloud, you can also check out Cloudera's document, Getting Started on Google Cloud Platform.

E4 2017

I joined Enkitec in the summer of 2012, just a few weeks before the first Enkitec E4 conference. Really good timing. Since then, I have been to every E4 conference for five years. It's a really interesting conference, focused more on technical topics and architecture, and I always learn something new each time. The conference has also grown from an Exadata focus to cover Oracle Engineered Systems, Big Data, and Cloud. This year, I will be a speaker at E4, co-presenting with Rashmi Kansakar of 84.51 on the topic of Analytics as a Business with Exadata and Big Data. There will be a lot of technical content in the presentation. The conference runs from June 12 to 15. For more information about E4, please visit Accenture Enkitec Group's E4 site at

Validate Java Keystore on BDA

In many projects, I need to create a keystore to store SSL certificates. Most of the time I hardly worry about the validity of a keystore: my keystores just work, and I can see the content of all the certificates using the keytool command. That worked pretty well until recently, when I needed to configure TLS for Cloudera Manager on BDA.
BDA has its own command to enable TLS for Cloudera Manager, Hue, and Oozie in a pretty easy way: just run the command bdacli enable https_cm_hue_oozie. The only drawback of this command is that it uses self-signed certificates rather than the users' own certificates. Although it works from a security perspective, it's not a good idea in the long run. I needed to replace Oracle's self-signed certificates with the client's certificates on BDA. Neither Cloudera's approach nor Oracle's approach alone was going to work. Anyway, that is a different topic and I will discuss it in a different blog.

During my work to enable TLS for Cloudera Manager using the client's certificates, I ran into various issues. After looking at many of them in detail, I suspected the root cause of my problem might be an incorrect keystore. Unfortunately, configuring TLS for the Cloudera Manager server, agents, and services requires shutting down the CDH cluster and many steps before reaching the stage where the keystore can be tested. That is too time consuming for a busy BDA cluster. This blog discusses a fast, easy way, independent of the CDH cluster, to verify whether the content of a keystore is valid. Most importantly, it avoids the bridge-building mistake shown below.

As my topic is related to BDA, I am going to list the ways to create a keystore in both the Cloudera way and the Oracle way.

Cloudera Way
See Cloudera’s document Step 1: Obtain Encryption Keys and Certificates for Cloudera Manager Server
I just highlight the key steps and commands as follows:
1. Generate Keystore for Cloudera Manager Host (Node 3 on BDA)

# keytool -genkeypair -alias cmhost -keyalg RSA -keystore \
/opt/cloudera/security/jks/cmhost-keystore.jks -keysize 2048 -dname \
",OU=Security,O=Example,L=Denver,ST=Colorado,C=US" \
-storepass password -keypass password

2. Generate a CSR for the host.

# keytool -certreq -alias cmhost \
-keystore /opt/cloudera/security/jks/cmhost-keystore.jks \
-file /opt/cloudera/security/x509/cmhost.csr -storepass password \
-keypass password

3. Submit the .csr file created by the -certreq command to the Certificate Authority to obtain a server certificate.
4. Copy the root CA certificate and any intermediate CA certificates to /opt/cloudera/security/CAcerts/.
There is no /opt/cloudera/security/CAcerts/ directory on BDA, and I don't believe it is necessary.
Actually, I like Oracle's approach here: just copy the root and intermediate CA certificates to the /opt/cloudera/security/jks directory. But I do like Cloudera's approach of importing the root CA and intermediate CA certificates into the alternative system JDK truststore, jssecacerts, before importing them into the Java keystore on BDA. This is what Oracle's approach is missing.

# cp $JAVA_HOME/jre/lib/security/cacerts $JAVA_HOME/jre/lib/security/jssecacerts

# keytool -importcert -alias RootCA -keystore $JAVA_HOME/jre/lib/security/jssecacerts \
-file /opt/cloudera/security/CAcerts/RootCA.cer -storepass changeit

# keytool -importcert -alias SubordinateCA -keystore \
$JAVA_HOME/jre/lib/security/jssecacerts \
-file /opt/cloudera/security/CAcerts/SubordinateCA.cer -storepass changeit

5. Import the root and intermediate certificates into keystore.

# keytool -importcert -trustcacerts -alias RootCA -keystore \
/opt/cloudera/security/jks/cmhost-keystore.jks -file \
/opt/cloudera/security/CAcerts/RootCA.cer -storepass password

# keytool -importcert -trustcacerts -alias SubordinateCA -keystore \ 
/opt/cloudera/security/jks/cmhost-keystore.jks -file \
/opt/cloudera/security/CAcerts/SubordinateCA.cer -storepass password

6. Import the signed host certificate

# cp certificate-file.cer  /opt/cloudera/security/x509/cmhost.pem

# keytool -importcert -trustcacerts -alias cmhost \ 
-file /opt/cloudera/security/x509/cmhost.pem \ 
-keystore /opt/cloudera/security/jks/cmhost-keystore.jks -storepass password

Oracle Way
See Oracle Note How to Use Certificates Signed by a User’s Certificate Authority for Web Consoles and Hadoop Network Encryption Use on the BDA (Doc ID 2187903.1)

1. Create the keystore on all nodes, called /opt/cloudera/security/jks/node.jks.
This is where I like Oracle's approach. Cloudera also requires a keystore on all hosts, but documents it in separate chapters: Cloudera Manager and Agent. Only when I was done with the configuration did I realize they could have been combined into a single step. This is where Oracle's approach is much simpler and easier.

# dcli -C keytool -validity 720 -keystore /opt/cloudera/security/jks/node.jks \
-alias \$HOSTNAME -genkeypair -keyalg RSA -storepass $PW -keypass $PW \
-dname "CN=\${HOSTNAME},OU=,O=,L=,S=,C="  

# dcli -C ls -l /opt/cloudera/security/jks/node.jks

2. Create CSR for each node.

# dcli -C keytool -keystore /opt/cloudera/security/jks/node.jks -alias \$HOSTNAME \
-certreq -file /root/\$HOSTNAME-cert-file -keypass $PW -storepass $PW 

3. Submit the node specific CSR to CA and signed.
4. Copy the signed certificate to cert_file_signed
cert_file_signed_bdanode01 would be copied to Node 1 as: /opt/cloudera/security/jks/cert_file_signed
cert_file_signed_bdanode02 would be copied to Node 2 as: /opt/cloudera/security/jks/cert_file_signed

cert_file_signed_bdanode0n would be copied to Node n as: /opt/cloudera/security/jks/cert_file_signed
5. Copy CA public certificate to /opt/cloudera/security/jks/ca.crt

# cp /tmp/staging/ca.crt /opt/cloudera/security/jks/ca.crt  
# dcli -C -f /opt/cloudera/security/jks/ca.crt -d /opt/cloudera/security/jks/ca.crt  
# dcli -C ls -ltr /opt/cloudera/security/jks/ca.crt

6. Import the CA public certificate /opt/cloudera/security/jks/ca.crt into the keystore on each node

# dcli -C keytool -keystore /opt/cloudera/security/jks/node.jks -alias CARoot \
-import -file /opt/cloudera/security/jks/ca.crt -storepass $PW -keypass $PW -noprompt

7. Import the signed certificate for each node on BDA

# dcli -C keytool -keystore /opt/cloudera/security/jks/node.jks -alias \$HOSTNAME \
-import -file /opt/cloudera/security/jks/cert_file_signed -storepass $PW -keypass $PW -noprompt 

So for TLS on BDA, the keystore file is /opt/cloudera/security/jks/node.jks. Another important file is the truststore at /opt/cloudera/security/jks/.truststore. The approach to building this file is quite similar to node.jks.

OK, I have the node.jks file. How do I verify that it is a valid one? Like many people, I used to use the keytool command to check the content of a keystore file. For example,

[root@enkx4bda1node01 ~]# keytool -list -v -keystore /opt/cloudera/security/jks/node.jks
Enter keystore password:  

*****************  WARNING WARNING WARNING  *****************
* The integrity of the information stored in your keystore  *
* has NOT been verified!  In order to verify its integrity, *
* you must provide your keystore password.                  *
*****************  WARNING WARNING WARNING  *****************

Keystore type: JKS
Keystore provider: SUN

Your keystore contains 1 entry

Alias name: enkx4bda1node01.enkitec.local
Creation date: Mar 5, 2016
Entry type: PrivateKeyEntry
Certificate chain length: 1
Owner: CN=enkx4bda1node01.enkitec.local, OU=, O=, L=, ST=, C=
Issuer: CN=enkx4bda1node01.enkitec.local, OU=, O=, L=, ST=, C=
Serial number: 26a1471b
Valid from: Sat Mar 05 02:17:40 CST 2016 until: Fri Feb 23 02:17:40 CST 2018
Certificate fingerprints:
	 MD5:  10B:30:3A:40:CD:94:38:7D:3A:33:1F:DD:49:B7:DF:99
	 SHA1: 98:6F:FC:84:68:BA:BD:25:37:8A:1B:D6:07:6F:FE:14:41:76:5B:09
	 SHA256: L3:43:4C:4C:9B:0E:36:18:DD:F1:10:84:46:9E:77:AA:BB:C7:85:E5:FC:19:4F:29:7F:70:BA:D4:0C:55:AD:F7
	 Signature algorithm name: SHA256withRSA
	 Version: 3


#1: ObjectId: Criticality=false
SubjectKeyIdentifier [
KeyIdentifier [
0000: GH FD 23 C9 9A A3 28 F9   3D C5 3B 1E E7 97 49 4E  ......(.=.:...IN
0010: 12 69 27 D5                                        .i(.


It usually works, but with certain limitations. Even if the keystore has all the necessary certificates, if they are not in the right order, it might not be valid. As I suspected my keystore on BDA might not be valid, I tried to find other potential tools beyond keytool. Luckily, I found a blog, Installing Trusted Certificates into a Java Keystore, by Oracle's Jim Connors. It's a very nice blog about various tools for keystores. I was really interested in one of the tools he talked about: using weblogic.jar's ValidateCertChain program.

I happened to have built an OEM Cloud Control 13c R2 environment, so let me give it a try.

[root@enkx4bdacli02 tmp]# java -cp /u01/app/oracle/oem/wlserver/server/lib/weblogic.jar utils.ValidateCertChain -jks node.jks
Certificate chain is incomplete, can't confirm the entire chain is valid
Certificate chain appears valid

It indeed found something and told me my certificate chain was incomplete. This gave me the clue to focus on the keystore-building steps. After I figured out the issue and fixed the import sequence of the certificates, I reran the command. Here is the result:

[root@enkx4bdacli02 tmp]# java -cp /u01/app/oracle/oem/wlserver/server/lib/weblogic.jar utils.ValidateCertChain -jks node.jks
Cert[1]: CN=EnkLab Intermediate CA,OU=Bigdata,O=Enkitec,ST=Texas,C=US
Cert[2]: CN=EnkLab ROOT CA,OU=Bigdata,O=Enkitec,L=Irving,ST=TX,C=US
Certificate chain appears valid

This looks much better. It correctly shows that there is one root certificate, one intermediate CA certificate, and one host certificate. This import-order problem was one of my major issues in building the keystore on BDA.
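If you don't have a weblogic.jar handy, openssl can do a similar chain check against the raw certificate files before they ever go into the keystore. The verify line below is the general pattern, with placeholder file names; the rest is a self-contained demo with a throwaway self-signed root, so it runs anywhere openssl is installed:

```shell
# General pattern: confirm the host certificate chains through the
# intermediate CA up to the root, using only the certificate files:
#   openssl verify -CAfile RootCA.cer -untrusted SubordinateCA.cer cmhost.pem

# Self-contained demo with a throwaway self-signed root certificate:
tmp=$(mktemp -d)
openssl req -x509 -newkey rsa:2048 -nodes \
    -keyout "$tmp/ca.key" -out "$tmp/ca.crt" \
    -days 1 -subj "/CN=DemoRootCA" 2>/dev/null
# A self-signed root verifies against itself; a broken chain would not.
openssl verify -CAfile "$tmp/ca.crt" "$tmp/ca.crt"
rm -rf "$tmp"
```

This does not replace ValidateCertChain (it never opens the JKS file), but it is a quick sanity check on the certificates themselves.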

There is another command, openssl s_client, that can validate the keystore, but it is only useful once everything is configured.

# openssl s_client -connect -CAfile
depth=2 C = US, ST = TX, L = Irving, O = Enkitec, OU = bigdata, CN = Enklab ROOT CA
verify return:1
depth=1 C = US, ST = TX, O = Enkitec, OU = bigdata, CN = Enklab Intermediate CA
verify return:1
depth=0 C = US, ST = TX, L = Irving, O = Enkitec, OU = bigdata, CN =
verify return:1
Certificate chain
 0 s:/C=US/ST=TX/L=Irving/O=Enkitec/OU=Bigdata/
   i:/C=US/ST=TX/O=Enkitec/OU=Bigdata/CN=Bigdata Intermediate CA
 1 s:/C=US/ST=TX/O=Enkitec/OU=Bigdata/CN=Bigdata Intermediate CA
   i:/C=US/ST=TX/L=Irving/O=Enkitec/OU=Bigdata/CN=Bigdata ROOT CA
 2 s:/C=US/ST=TX/L=Irving/O=Enkitec/OU=Bigdata/CN=Bigdata ROOT CA
   i:/C=US/ST=TX/L=Irving/O=Enkitec/OU=Bigdata/CN=Bigdata ROOT CA
Server certificate

issuer=/C=US/ST=TX/O=Enkitec/OU=Bigdata/CN=Bigdata Intermediate CA
No client certificate CA names sent
Server Temp Key: ECDH, secp521r1, 521 bits
SSL handshake has read 4430 bytes and written 443 bytes
New, TLSv1/SSLv3, Cipher is ECDHE-RSA-AES256-GCM-SHA384
Server public key is 2048 bit
Secure Renegotiation IS supported
Compression: NONE
Expansion: NONE
    Protocol  : TLSv1.2
    Cipher    : ECDHE-RSA-AES256-GCM-SHA384
    Session-ID: 39023B1EB131C30355F20CD8F012DCF2FFC95E1A1F9F8D8D2B6954942E9
    Master-Key: XMB7RlA7yVd57iXPzl5EE73EAAB9B18B04B2718CAf1mijjAgMBAA5126650B5A3GjITAfM8EA269DBFE17A750EBBC5EC
    Key-Arg   : None
    Krb5 Principal: None
    PSK identity: None
    PSK identity hint: None
    Start Time: 9023528453
    Timeout   : 300 (sec)
    Verify return code: 0 (ok)