A Workaround for Google Cloud VMs You Cannot Connect to after a Reboot

I have been using Google Cloud for the past few years in different projects. In general, I am happy with GCP's VM performance. Over the past few days, though, I ran into a weird issue. I manually created a VM from the GCP console and could SSH to it using the command gcloud compute ssh. I did a few installations. All was good until I stopped the VM and tried to reconnect. It never worked and I always got a timeout error. The same gcloud compute ssh command did not work anymore.

At first, I thought I might have a firewall issue. But I had no problem connecting to another VM with an almost identical setup after a reboot. The only difference was that I had created that VM some time back. I also tried adding an SSH key, and that did not work either. This was weird. The only thing I could think of was a bad OS image. I was using a CentOS 7 image (centos-7-v20200714). So I tried other versions, like centos-7-v20200618 and centos-7-v20200403, and even a Red Hat 7 image. None of them worked, so I could rule out the image. I also tried taking an image of the VM before the reboot and restoring from it. No luck there either. Maybe it was the rpm installations, so I tried skipping them. Then the connection after reboot worked. This is insane. What's going on?

I did some research. Surprisingly, I am not the only one who has had this miserable experience over the past few days. It looks like there is a bug that affects both Red Hat and CentOS, versions 7 and 8. It is triggered by yum update, which was exactly the action I performed. Here is the bug: System hangs after POST and the grub menu never loads after applying the RHSA-2020:3216 or RHSA-2020:3217. There is also an active issue tracker entry: yum update breaks GCE Instances running RHEL and CentOS 7 and 8.

I tried out the workaround and it worked for me. Please note: this workaround only works for a VM that has not been rebooted yet. If you stopped the VM before applying the change below, you are out of luck and will need a different workaround.

Here are the steps:
1. Run the command rpm -q shim-x64. If the result is one of the following, you are affected:

CentOS 7: shim-x64-15-7.el7_9.x86_64
CentOS 8: shim-x64-15-13.el8.x86_64
RHEL 7: shim-x64-15-7.el7_8.x86_64
RHEL 8: shim-x64-15-14.el8_2.x86_64

2. Run the downgrade command:

# yum downgrade shim\* grub2\* mokutil

3. Then add the following line to the /etc/yum.conf file so that a later yum update does not pull the affected packages back in:

exclude=grub2* shim* mokutil
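For anyone who wants to script it, here is a rough sketch of the three steps above. The function names is_affected and add_exclude and the --apply switch are my own, made up for illustration. It assumes /etc/yum.conf has no exclude= line yet and that [main] is its last section (as in the default file), so appending at the end is enough. Treat it as a starting point, not a tested fix.

```shell
#!/bin/sh
# Sketch of the workaround steps. is_affected, add_exclude and --apply are
# illustrative names, not part of yum or rpm.

# Step 1: succeed if the given package name is one of the affected shim builds.
is_affected() {
    case "$1" in
        shim-x64-15-7.el7_9.x86_64)  return 0 ;;  # CentOS 7
        shim-x64-15-13.el8.x86_64)   return 0 ;;  # CentOS 8
        shim-x64-15-7.el7_8.x86_64)  return 0 ;;  # RHEL 7
        shim-x64-15-14.el8_2.x86_64) return 0 ;;  # RHEL 8
        *) return 1 ;;
    esac
}

# Step 3: append the exclude line to the given config file unless that exact
# line is already there. Assumption: no other exclude= line exists; if one
# does, merge the package names into it by hand instead.
add_exclude() {
    conf="$1"
    line='exclude=grub2* shim* mokutil'
    grep -qxF "$line" "$conf" || printf '%s\n' "$line" >> "$conf"
}

# Only touch the system when explicitly asked to (run as root).
if [ "${1:-}" = "--apply" ]; then
    pkg="$(rpm -q shim-x64)"
    if is_affected "$pkg"; then
        # Step 2: downgrade, then pin the packages only if that succeeded.
        yum downgrade 'shim*' 'grub2*' mokutil && add_exclude /etc/yum.conf
    else
        echo "shim-x64 build is not in the affected list: $pkg"
    fi
fi
```

Run without arguments it does nothing; run as root with --apply it checks the shim version, downgrades, and adds the exclude line.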

After that, reboot the VM. You should be able to connect, although the first connection takes a little longer than usual. Good luck to anyone who has the same issue.