Use Oven to Fix iMac Hardware Issue

Seriously? Yes, I am not joking. I like repairing stuff, not for the purpose saving money, but just want to see whether I can do it. It is fun to challenge something seem quite impossible. Ok, get the story short. I have a iMac for almost nine years. It’s an old, but still functioning good. Recently it show the issue of stripped screen as shown below.

I know it’s time to change a new computer for this one. I have done some repairs to my home window PC and laptop longtime back, but have never seriously fixing some hardware issue on Mac. From what I read on the internet, it seems relating to graphics card. It would cost about $150 for a new graphic card, far exceed what this iMac is worth right now. Ok, let me find out whether anyone has fix similar issue before. Then I found this one: 2010 iMac 27″ Graphics Card Replacement.

It is a nice video showing how to open iMac and replace graphics card with a new one. It has a very good instruction in how to open iMac and take parts out. Mine was actually more complicated than this one and I had to take out the whole motherboard to get the graphics card. Then I found another video Burn the VGA and Fix your iMac – Don’t Pay Shitty Problems!!!. It’s insane approach to fix this striped screen issue: put the graphics card into oven with 380 F degrees for 8 minutes. This is probably the craziest idea to repair Mac I have ever heard. But from the comments, it seems it indeed fixed many people’s problem. Anyway, I am going to throw out this computer. So there is nothing to lose if I can fix it.

So my journey began during the Thanksgiving holiday. First I bought the required tools from Amazon: OWC General Servicing Kit for all Apple iMacs 2007 & Later. Actually the most useful tools are T10, T8 Torx Drivers and Suction Cups. 

I followed the videos’ instructions to remove the monitor screen and take out monitor related stuff. Here is what it looks like when monitor is removed.

As you can see, Apple intentionally makes the repair work of iMac super painful. Everything is packed in this limited space with many wirings and cables. I was not even sure I can put back anything after I disassembly the iMac. After I removed many screws, the whole motherboard was still stuck there and could not be taken out. I reviewed the videos many times and still did not have a clue why it was still stuck. Finally I remembered I upgraded RAM before and know RAM is located at the bottom of the iMac. Maybe it was RAM. So I took out the RAM and tried again. Bingo! It was RAM. Here is the photo for the motherboard:

Checked the clock, 2 hrs passed. Get the graphic card out of motherboard is quite simple, just a few minutes work.

As the video instructed, I put the card in the oven (380 F degrees) for 8 minutes. Then took it out and cool off completely. I thought it might be easier to put them back. I was completely wrong. It took 200% to 300% effort to put them back in place. As the space is very tight in iMac, any small variation in positioning of motherboard would cause issue and had to redo the whole assembly process. Some issues are:

  • One time. I assembled everything. Then I realized I could not put in RAMs at the bottom. Managed to squeeze one bank of RAM, but still had issues for the rest of 3 banks of RAMs.
  • Sometimes I had a perfect fit for the RAMs. Then realized I forgot to plug in Data cable from hard disk to motherboard. The computer can show the Apple logo, but it would show a folder logo with question mark.
  • A few times, I forgot to connect monitor cable to motherboard. During the boot, it had the normal noise of startup, but with complete black screen.
  • Sometimes, a few cables was blocked behind the motherboard and I couldn’t find them when I tried to plug them in.

Finally after almost 10 times re-assembly process, I finally made it working. It shows the nice Apple logo.

Exhausted after 9 hrs of work, but really had fun in fixing stuff.


Could not Get External IP for Load Balancer on Azure AKS

I used Kubernetes service on Google Cloud Platform and it was a great service. I also wrote one blog, Running Spark on Kubernetes, on this area. Recently I used Azure Kubernetes Service (AKS) for a different project and run into some issues. One of major annoying issues was that I could not get external IP for load balancer on AKS. This blog discusses the process I identified the issue and solution for this problem.

I used the example from Microsoft, Use Azure Kubernetes Service with Kafka on HDInsight, for my testing. The source code can be accessed at The example is pretty simple and straight forward and the most import part is file kafka-aks-test.yaml. Here is the content of the file.

apiVersion: apps/v1beta1
kind: Deployment
  name: kafka-aks-test
  replicas: 1
      maxSurge: 1
      maxUnavailable: 1
  minReadySeconds: 5
        app: kafka-aks-test
        - name: kafka-aks-test
          image: microsoft/kafka-aks-test:v1
            - containerPort: 80
              cpu: 250m
              cpu: 500m
apiVersion: v1
kind: Service
  name: kafka-aks-test
  type: LoadBalancer
    - port: 80
    app: kafka-aks-test

We can see the Service is using LoadBalancer. So it should automatically get an External IP for my load balancer of the service. Unfortunately, I can not get this external IP and was stuck in Pending stage forever.

[root@ Kafka-AKS-Test]# kubectl get service kafka-aks-test --watch
NAME             TYPE           CLUSTER-IP       EXTERNAL-IP   PORT(S)        AGE
kafka-aks-test   LoadBalancer   <pending>     80:32656/TCP   10s

To make the debugging process simpler, I used the following two lines of commands to create a NGIX service. This is a nice and quick way to find out whether AKS is working or not.

kubectl run my-nginx --image=nginx --replicas=1 --port=80
kubectl expose deployment my-nginx --port=80 --type=LoadBalancer

Got the same issue. For AKS service, a good way to find out what’s going on in the service is to use kubectl describe service command. Here is output from this command.

[root@ AKS-Test]# kubectl describe service my-nginx
Name:                     my-nginx
Namespace:                default
Labels:                   run=my-nginx
Annotations:              <none>
Selector:                 run=my-nginx
Type:                     LoadBalancer
IP:                       <pending>
Port:                     <unset>  80/TCP
TargetPort:               80/TCP
NodePort:                 <unset>  31478/TCP
Session Affinity:         None
External Traffic Policy:  Cluster
  Type     Reason                      Age               From                Message
  ----     ------                      ----              ----                -------
  Warning  CreatingLoadBalancerFailed  2m (x3 over 3m)   service-controller  Error creating 
  load balancer (will retry): failed to ensure load balancer for service default/my-nginx: 
  [ensure(default/my-nginx): lb(kubernetes) - failed to ensure host in pool: "network.InterfacesClient#CreateOrUpdate: 
  Failure responding to request: StatusCode=403 -- Original Error: autorest/azure: Service 
  returned an error. Status=403 Code=\"LinkedAuthorizationFailed\" Message=\"The client 
  '11b7e54a-e1bc-4092-af66-b014c11d9b87' with object id '11b7e54a-e1bc-4092-af66-b014c11d9b87' 
  has permission to perform action 'Microsoft.Network/networkInterfaces/write' on scope 
  however, it does not have permission to perform action 'Microsoft.Network/virtualNetworks/
  subnets/join/action' on the linked scope(s) '/subscriptions/763d9895-8916-4d35-8b43-d51b52642cef/
  subnets/snet-aks2'.\"", ensure(default/my-nginx): lb(kubernetes) - failed to ensure host 
  in pool: "network.InterfacesClient#CreateOrUpdate: Failure responding to request: StatusCode=403 
  -- Original Error: autorest/azure: Service returned an error. Status=403 Code=\"LinkedAuthorizationFailed\" 
  Message=\"The client '11b7e54a-e1bc-4092-af66-b014c11d9b87' with object id '11b7e54a-e1bc-4092-af66-b014c11d9b87' 
  has permission to perform action 'Microsoft.Network/networkInterfaces/write' on scope
  . . . . 

It seems this is a common issue and many people run into the similar issue. Checked out the issue site for Github and found out one issue related to my problem, Azure AKS CreatingLoadBalancerFailed on AKS cluster with advanced networking. One of recommendations was to add AKS’ Service Principal (SP) to the subnet or VNet as Contributor. Did not work on me. Tried to add the SP as Owner. It didn’t work.

If running command kubectl get all –all-namespaces, it provides everything related to Kubernetes on AKS.

[root@ ~]# kubectl get all --all-namespaces
NAMESPACE     NAME                                                                  READY     STATUS             RESTARTS   AGE
kube-system   pod/addon-http-application-routing-default-http-backend-66c97fw842d   1/1       Running            1          2d
kube-system   pod/addon-http-application-routing-external-dns-c547864b7-r7zts       1/1       Running            1          5d
kube-system   pod/addon-http-application-routing-nginx-ingress-controller-642qfcp   0/1       CrashLoopBackOff   4          1m
kube-system   pod/azureproxy-79c5db744-7ndvk                                        1/1       Running            4          5d
kube-system   pod/heapster-55f855b47-q5jtf                                          2/2       Running            0          2d
kube-system   pod/kube-dns-v20-7c556f89c5-5ngp5                                     3/3       Running            0          5d
kube-system   pod/kube-dns-v20-7c556f89c5-djf7d                                     3/3       Running            3          2d
kube-system   pod/kube-proxy-dpt28                                                  1/1       Running            2          5d
kube-system   pod/kube-proxy-jq8hx                                                  1/1       Running            1          5d
kube-system   pod/kube-proxy-v4xc5                                                  1/1       Running            0          5d
kube-system   pod/kube-svc-redirect-77kj4                                           1/1       Running            2          5d
kube-system   pod/kube-svc-redirect-j9545                                           1/1       Running            1          5d
kube-system   pod/kube-svc-redirect-kvh2r                                           1/1       Running            0          5d
kube-system   pod/kubernetes-dashboard-546f987686-ws5nm                             1/1       Running            0          2d
kube-system   pod/omsagent-4xn72                                                    1/1       Running            2          5d
kube-system   pod/omsagent-fbjsp                                                    1/1       Running            1          5d
kube-system   pod/omsagent-pvfrt                                                    1/1       Running            0          5d
kube-system   pod/tiller-deploy-7ccf99cd64-tstvl                                    1/1       Running            1          23h
kube-system   pod/tunnelfront-55bbb6b96c-nhlbk                                      1/1       Running            0          5d

NAMESPACE     NAME                                                          TYPE           CLUSTER-IP        EXTERNAL-IP   PORT(S)                      AGE
default       service/kubernetes                                            ClusterIP       <none>        443/TCP                      1d
kube-system   service/addon-http-application-routing-default-http-backend   ClusterIP    <none>        80/TCP                       5d
kube-system   service/addon-http-application-routing-nginx-ingress          LoadBalancer    <pending>     80:32704/TCP,443:32663/TCP   5d
kube-system   service/heapster                                              ClusterIP     <none>        80/TCP                       5d
kube-system   service/kube-dns                                              ClusterIP      <none>        53/UDP,53/TCP                5d
kube-system   service/kubernetes-dashboard                                  ClusterIP   <none>        80/TCP                       5d
kube-system   service/tiller-deploy                                         ClusterIP    <none>        44134/TCP                    23h

NAMESPACE     NAME                                     DESIRED   CURRENT   READY     UP-TO-DATE   AVAILABLE   NODE SELECTOR                 AGE
kube-system   daemonset.extensions/kube-proxy          3         3         3         3            3    5d
kube-system   daemonset.extensions/kube-svc-redirect   3         3         3         3            3    5d
kube-system   daemonset.extensions/omsagent            3         3         3         3            3    5d

NAMESPACE     NAME                                                                            DESIRED   CURRENT   UP-TO-DATE   AVAILABLE   AGE
kube-system   deployment.extensions/addon-http-application-routing-default-http-backend       1         1         1            1           5d
kube-system   deployment.extensions/addon-http-application-routing-external-dns               1         1         1            1           5d
kube-system   deployment.extensions/addon-http-application-routing-nginx-ingress-controller   1         1         1            0           5d
kube-system   deployment.extensions/azureproxy                                                1         1         1            1           5d
kube-system   deployment.extensions/heapster                                                  1         1         1            1           5d
kube-system   deployment.extensions/kube-dns-v20                                              2         2         2            2           5d
kube-system   deployment.extensions/kubernetes-dashboard                                      1         1         1            1           5d
kube-system   deployment.extensions/tiller-deploy                                             1         1         1            1           23h
kube-system   deployment.extensions/tunnelfront                                               1         1         1            1           5d

NAMESPACE     NAME                                                                                       DESIRED   CURRENT   READY     AGE
kube-system   replicaset.extensions/addon-http-application-routing-default-http-backend-66c97f5dc7       1         1         1         5d
kube-system   replicaset.extensions/addon-http-application-routing-external-dns-c547864b7                1         1         1         5d
kube-system   replicaset.extensions/addon-http-application-routing-nginx-ingress-controller-6449fd79f9   1         1         0         5d
kube-system   replicaset.extensions/azureproxy-79c5db744                                                 1         1         1         5d
kube-system   replicaset.extensions/heapster-55f855b47                                                   1         1         1         5d
kube-system   replicaset.extensions/heapster-56c6f9566f                                                  0         0         0         5d
kube-system   replicaset.extensions/kube-dns-v20-7c556f89c5                                              2         2         2         5d
kube-system   replicaset.extensions/kubernetes-dashboard-546f987686                                      1         1         1         5d
kube-system   replicaset.extensions/tiller-deploy-7ccf99cd64                                             1         1         1         23h
kube-system   replicaset.extensions/tunnelfront-55bbb6b96c                                               1         1         1         5d

NAMESPACE     NAME                               DESIRED   CURRENT   READY     UP-TO-DATE   AVAILABLE   NODE SELECTOR                 AGE
kube-system   daemonset.apps/kube-proxy          3         3         3         3            3    5d
kube-system   daemonset.apps/kube-svc-redirect   3         3         3         3            3    5d
kube-system   daemonset.apps/omsagent            3         3         3         3            3    5d

NAMESPACE     NAME                                                                      DESIRED   CURRENT   UP-TO-DATE   AVAILABLE   AGE
kube-system   deployment.apps/addon-http-application-routing-default-http-backend       1         1         1            1           5d
kube-system   deployment.apps/addon-http-application-routing-external-dns               1         1         1            1           5d
kube-system   deployment.apps/addon-http-application-routing-nginx-ingress-controller   1         1         1            0           5d
kube-system   deployment.apps/azureproxy                                                1         1         1            1           5d
kube-system   deployment.apps/heapster                                                  1         1         1            1           5d
kube-system   deployment.apps/kube-dns-v20                                              2         2         2            2           5d
kube-system   deployment.apps/kubernetes-dashboard                                      1         1         1            1           5d
kube-system   deployment.apps/tiller-deploy                                             1         1         1            1           23h
kube-system   deployment.apps/tunnelfront                                               1         1         1            1           5d

NAMESPACE     NAME                                                                                 DESIRED   CURRENT   READY     AGE
kube-system   replicaset.apps/addon-http-application-routing-default-http-backend-66c97f5dc7       1         1         1         5d
kube-system   replicaset.apps/addon-http-application-routing-external-dns-c547864b7                1         1         1         5d
kube-system   replicaset.apps/addon-http-application-routing-nginx-ingress-controller-6449fd79f9   1         1         0         5d
kube-system   replicaset.apps/azureproxy-79c5db744                                                 1         1         1         5d
kube-system   replicaset.apps/heapster-55f855b47                                                   1         1         1         5d
kube-system   replicaset.apps/heapster-56c6f9566f                                                  0         0         0         5d
kube-system   replicaset.apps/kube-dns-v20-7c556f89c5                                              2         2         2         5d
kube-system   replicaset.apps/kubernetes-dashboard-546f987686                                      1         1         1         5d
kube-system   replicaset.apps/tiller-deploy-7ccf99cd64                                             1         1         1         23h
kube-system   replicaset.apps/tunnelfront-55bbb6b96c                                               1         1         1         5d

Pay attention more on the pod that has CrashLoopBackOff error. I saw this CrashLoopBackOff thing restarted over 1000 times within 5 days in our first AKS cluster. This is one Pod that is used internally by AKS before we can deploy anything else.

I opened a ticket with Microsoft and got Microsoft Support to work with me. After a very long conference call and even completely reinstalled AKS cluster, we finally figured out the way to get around this issue. The key is to give correct permission for AKS Service Principal.

There is one drawback when deploying AKS with Azure UI. You can not specify the name of Service Principal and SP is automatically created with the name like . For us, we have installed and uninstall AKS multiple times, so we have a few SP names. It is confusing to decide which one is the one we really care. Finding out the correct SP name is a challenge task. Anyway, the followings are the steps to add correct permission to AKS Service Principal.

1. Get Client ID
Run the following command to get client id.

[root@ AKS-Test]# az aks show -n exa-aksc2 -g exa-dev01-ue1-aksc2-vnet2-rg | grep clientId
"clientId": "27ae6273-9706-4156-b546-607279623990"

2. Get SP Name
Click Azure Active Directory, then click App registrations. Change dropdown from My Apps to All apps. Then input the clientId. It should show the SP name as screen below.

3. Set Correct Permission for the SP
At the time when AKS creates the cluster, it creates a SP showing above. Then grant Contributor role to the SP. This is the problem as certain operations require OWNER permissions. So need to add Owner role to the SP. All the resources used by AKS cluster are under MC_* resource group. In our case, it is MC_exa-dev01-ue1-aksc2-vnet2-rg_exa-aksc2_eastus.

Click Resource Group, then MC_exa-dev01-ue1-aksc2-vnet2-rg_exa-aksc2_eastus. Click Access Control (IAM), then click + Add.

After this change, our issue was gone. Here is the result from describe service. No error this time.

[root@exa-dev01-ue1-kfclient1-vm Kafka-AKS-Test]# kubectl describe service my-nginx
Name:                     my-nginx
Namespace:                default
Labels:                   run=my-nginx
Annotations:              <none>
Selector:                 run=my-nginx
Type:                     LoadBalancer
Port:                     <unset>  80/TCP
TargetPort:               80/TCP
NodePort:                 <unset>  32026/TCP
Session Affinity:         None
External Traffic Policy:  Cluster
  Type    Reason                Age   From                Message
  ----    ------                ----  ----                -------
  Normal  EnsuringLoadBalancer  8s    service-controller  Ensuring load balancer

The deployment also looks good.
Sample output:

[root@exa-dev01-ue1-kfclient1-vm Kafka-AKS-Test]# kubectl describe deployment my-nginx
Name: my-nginx
Namespace: default
CreationTimestamp: Thu, 14 Jun 2018 15:03:23 +0000
Labels: run=my-nginx
Selector: run=my-nginx
Replicas: 1 desired | 1 updated | 1 total | 1 available | 0 unavailable
StrategyType: RollingUpdate
MinReadySeconds: 0
RollingUpdateStrategy: 1 max unavailable, 1 max surge
Pod Template:
Labels: run=my-nginx
Image: nginx
Port: 80/TCP
Host Port: 0/TCP
Type Status Reason
---- ------ ------
Available True MinimumReplicasAvailable
NewReplicaSet: my-nginx-9d5677d94 (1/1 replicas created)

For more information about our issue, you can check it out at

Running Spark on Kubernetes

If 2017 is the year of Docker, 2018 is the year for Kubernetes. Kubernetes allows easy container management. It does not manage containers directly, but pods. A pod has one or more tightly coupled containers as a deployed object. Kubernetes also supports horizontal autoscaling for the pods. When the application is accessed by a large number of users, you can instruct Kubernetes to replicate your pods to balance the load. As expected, Spark can be deployed on Kubernetes. Currently there are a few ways to run Spark on Kubernetes.

1. Standalone Spark Cluster
Spark Standalone Mode is a nice way to quickly start a Spark cluster without using YARN or Mesos. In this way, you don’t have to use HDFS to store huge datasets. Instead you can use cloud storage to store whatever you like and decouple Spark Cluster with its storage. For a spark cluster, you will have one pod for Spark Master and multiple pods for Spark workers. In the case when you want to run the job, just deploy Spark Master and create a Master service. Then you could deploy multiple Spark workers. Once the job completes, delete all the pods from Kubernetes Workload.

Actually this is the recommended way to run jobs against big dataset on cloud. You don’t need 200 nodes Spark cluster running all the time, just run whenever you need to run the job. This is going to save significantly on the cloud cost. The Standalone Spark Cluster is not my topic in this blog and I may cover it in a different blog.

2. Spark on Kubernetes
Spark on Kubernetes is another interesting mode to run Spark cluster. It uses native Kubernetes scheduler for the resource management of Spark cluster. Here is the architecture of Spark on Kubernetes.

There is a blog, Apache Spark 2.3 with Native Kubernetes Support, which go through the steps to start a basic example Pi. However, I followed the steps and it did not work. Many steps and stuffs are missing. After some research, I figured out the correct steps to run it on Google Cloud Platform (GCP). This blog discusses the steps to show how to run the Pi example on Kubernetes.

Download Apache Spark 2.3
One of the major changes in this release is the inclusion of new Kubernetes Scheduler backend.The software can be downloaded at or After downloading the software, unzip the file in the local machine.

Build Docker Image
The Spark on Kubernetes requires to specify an image for its driver and executors. I can get a Spark image from somewhere. But I like to build the image by myself. So I can easily customize it in the future. There is a docker file under spark-2.3.0-bin-hadoop2.7/kubernetes/dockerfiles/spark directory.

[root@docker1 spark]# cat Dockerfile 
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements.  See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License.  You may obtain a copy of the License at
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# See the License for the specific language governing permissions and
# limitations under the License.

FROM openjdk:8-alpine

ARG spark_jars=jars
ARG img_path=kubernetes/dockerfiles

# Before building the docker image, first build and make a Spark distribution following
# the instructions in
# If this docker file is being used in the context of building your images from a Spark
# distribution, the docker build command should be invoked from the top level directory
# of the Spark distribution. E.g.:
# docker build -t spark:latest -f kubernetes/dockerfiles/spark/Dockerfile .

RUN set -ex && \
    apk upgrade --no-cache && \
    apk add --no-cache bash tini libc6-compat && \
    mkdir -p /opt/spark && \
    mkdir -p /opt/spark/work-dir \
    touch /opt/spark/RELEASE && \
    rm /bin/sh && \
    ln -sv /bin/bash /bin/sh && \
    chgrp root /etc/passwd && chmod ug+rw /etc/passwd

COPY ${spark_jars} /opt/spark/jars
COPY bin /opt/spark/bin
COPY sbin /opt/spark/sbin
COPY conf /opt/spark/conf
COPY ${img_path}/spark/ /opt/
COPY examples /opt/spark/examples
COPY data /opt/spark/data

ENV SPARK_HOME /opt/spark

WORKDIR /opt/spark/work-dir

ENTRYPOINT [ "/opt/" ]

Pay more attention of line COPY examples /opt/spark/examples. The associated jar file for Pi example is in the examples directory. You need to remember to use this path /opt/spark/examples instead of the path on your local machine that run the job submission. I run into an issue of SparkPi class not found. It was caused by the fact I included the local path to the jar file on my local computer instead of the path on the docker image.

I has a Docker VM and use it for all Docker related operations. Logon the docker VM and run the followings to download/unzip the software:

[root@docker1 ]# mkdir spark-2.3
[root@docker1 ]# cd spark-2.3
[root@docker1 spark-2.3]# wget
--2018-04-24 19:11:09--
Resolving (, 2001:bc8:2142:300::
Connecting to (||:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 226128401 (216M) [application/x-gzip]
Saving to: ‘spark-2.3.0-bin-hadoop2.7.tgz’

100%[===========================================================================================================================================>] 226,128,401 26.8MB/s   in 8.8s   

2018-04-24 19:11:18 (24.6 MB/s) - ‘spark-2.3.0-bin-hadoop2.7.tgz’ saved [226128401/226128401]

[root@docker1 spark-2.3]# ls -l
total 220856
-rw-r--r--. 1 root root     22860 Apr 24 19:10 spark-2.3.0-bin-hadoop2.7.tgz
-rw-r--r--. 1 root root 226128401 Feb 22 19:54 spark-2.3.0-bin-hadoop2.7.tgz.1
[root@docker1 spark-2.3]# tar -xzf spark-2.3.0-bin-hadoop2.7.tgz

Build the image and push to my google private container registry.

[root@docker1 spark-2.3.0-bin-hadoop2.7]# bin/ -r -t k8s-spark-2.3 build
Sending build context to Docker daemon  256.4MB
Step 1/14 : FROM openjdk:8-alpine
8-alpine: Pulling from library/openjdk
ff3a5c916c92: Pull complete 
5de5f69f42d7: Pull complete 
fd869c8b9b59: Pull complete 
. . . .
Step 13/14 : WORKDIR /opt/spark/work-dir
Removing intermediate container ed4b6fe3efd6
 ---> 69cd2dd1cae8
Step 14/14 : ENTRYPOINT [ "/opt/" ]
 ---> Running in 07da54b9fd34
Removing intermediate container 07da54b9fd34
 ---> 9c3bd46e026d
Successfully built 9c3bd46e026d
Successfully tagged

[root@docker1 spark-2.3.0-bin-hadoop2.7]# bin/ -r -t k8s-spark-2.3 push
The push refers to repository []
e7930b27b5e2: Pushed 
6f0480c071be: Pushed 
d7e218db3d89: Pushed 
8281f673b660: Pushed 
92e162ecfbe3: Pushed 
938ba54601ba: Pushed 
dc1345b437d9: Pushed 
4e3f1d639db8: Pushed 
685fdd7e6770: Layer already exists 
c9b26f41504c: Layer already exists 
cd7100a72410: Layer already exists 
k8s-spark-2.3: digest: sha256:2f865bf17985317909c866d036ba7988e1dbfc5fe10440a95f366264ceee0518 size: 2624

[root@docker1 ~]# docker image ls
REPOSITORY                                       TAG                 IMAGE ID            CREATED             SIZE                 k8s-spark-2.3       9c3bd46e026d        3 days ago          346MB
ubuntu                                           16.04               c9d990395902        2 weeks ago         113MB
hello-world                                      latest              e38bc07ac18e        2 weeks ago         1.85kB
openjdk                                          8-alpine            224765a6bdbe        3 months ago        102MB

Check Google Container Registry. It shows the image with the correct tag k8s-spark-2.3.

Configure RBAC
I have already had a Kubernetes cluster up and running with 3 nodes. I have to setup Role-Based Access Control (RBAC) to allow Spark on Kubernetes working. Otherwise it will throw the error as follows during job execution:

Caused by: io.fabric8.kubernetes.client.KubernetesClientException: Failure executing: GET at: https://kubernetes.default.svc/api/v1/namespaces/default/pods/spark-pi-449efacd5a4a386ca31177faddb8eab4-driver. Message: Forbidden!Configured service account doesn’t have access. Service account may have been revoked. pods “spark-pi-449efacd5a4a386ca31177faddb8eab4-driver” is forbidden: User “system:serviceaccount:default:default” cannot get pods in the namespace “default”: Unknown user “system:serviceaccount:default:default”.

Check service account and clusterrolebinding.

weidong.zhou:@macpro spark-2.3.0-bin-hadoop2.7 > kubectl get serviceaccount
default   1         5m
weidong.zhou:@macpro spark-2.3.0-bin-hadoop2.7 > kubectl get clusterrolebinding
NAME                                           AGE
cluster-admin                                  5m
event-exporter-rb                              5m
gce:beta:kubelet-certificate-bootstrap         5m
gce:beta:kubelet-certificate-rotation          5m
heapster-binding                               5m
kube-apiserver-kubelet-api-admin               5m
kubelet-cluster-admin                          5m
npd-binding                                    5m
system:basic-user                              5m
system:controller:attachdetach-controller      5m
. . . .
system:controller:statefulset-controller       5m
system:controller:ttl-controller               5m
system:discovery                               5m
system:kube-controller-manager                 5m
system:kube-dns                                5m
system:kube-dns-autoscaler                     5m
system:kube-scheduler                          5m
system:node                                    5m
system:node-proxier                            5m

Create the spark service account and cluster role binding.

weidong.zhou:@macpro spark-2.3.0-bin-hadoop2.7 > kubectl create serviceaccount spark
serviceaccount "spark" created
weidong.zhou:@macpro spark-2.3.0-bin-hadoop2.7 > kubectl create clusterrolebinding spark-role --clusterrole=edit --serviceaccount=default:spark --namespace=default
clusterrolebinding "spark-role" created

weidong.zhou:@macpro spark-2.3.0-bin-hadoop2.7 > kubectl get serviceaccount
default   1         1h
spark     1         56m

Run Spark Application
You might need to set SPARK_LOCAL_IP. Also need to find out MASTER_IP by running kubectl cluster-info | grep master |awk ‘{print $6}’. Use the following commands to set environment.

export PROJECT_ID="wz-gcptest-357812"
export ZONE="us-east1-b"
export KUBE_CLUSTER_NAME="wz-kube1"

gcloud config set project ${PROJECT_ID}
gcloud config set compute/zone ${ZONE}
gcloud container clusters get-credentials ${KUBE_CLUSTER_NAME}

Finally I can run the job. I intentionally gave a parameter of 1000000 to make the job running for a long time.

bin/spark-submit \
    --master k8s:// \
    --deploy-mode cluster \
    --name spark-pi \
    --class org.apache.spark.examples.SparkPi \
    --conf spark.executor.instances=2 \
    --conf \
    --conf spark.kubernetes.authenticate.driver.serviceAccountName=spark  \
    --conf \
local:///opt/spark/examples/jars/spark-examples_2.11-2.3.0.jar 1000000

If checking out GCP’s Kubernetes Workload screen, you will see one Spark driver and two executors running.

Monitor the Spark Job
If the job can run for a longer time, you will see the screen below when checking out Pod details. It shows CPU, Memory and Disk usage. It is usually good enough for monitoring purpose.

But how do I check out Spark UI screen? There are no resource manager like YARN in the picture. At this moment I need to use port forwarding to access Spark UI. Find out the driver pod and then setup the port forwarding.

weidong.zhou:@macpro ~ > kubectl get pods
NAME                                               READY     STATUS    RESTARTS   AGE
spark-pi-6e2c3b5d707531689031d3259f57b2ea-driver   1/1       Running   0          7m
spark-pi-6e2c3b5d707531689031d3259f57b2ea-exec-1   1/1       Running   0          7m
spark-pi-6e2c3b5d707531689031d3259f57b2ea-exec-2   1/1       Running   0          7m
weidong.zhou:@macpro ~ > kubectl port-forward spark-pi-6e2c3b5d707531689031d3259f57b2ea-driver 4040:4040
Forwarding from -> 4040

Find out the IP for the pod.

weidong.zhou:@macpro mytest_gcp > kubectl get pod -o wide
NAME                                               READY     STATUS    RESTARTS   AGE       IP          NODE
spark-pi-6e2c3b5d707531689031d3259f57b2ea-driver   1/1       Running   0          10m   gke-wz-kube1-default-pool-2aac262a-thw0
spark-pi-6e2c3b5d707531689031d3259f57b2ea-exec-1   1/1       Running   0          10m   gke-wz-kube1-default-pool-2aac262a-09vt
spark-pi-6e2c3b5d707531689031d3259f57b2ea-exec-2   1/1       Running   0          10m   gke-wz-kube1-default-pool-2aac262a-23gk

Now we can see the familiar Spark UI.

If want to check out the logs from the driver pod, just run the followings:

weidong.zhou:@macpro mytest_gcp > kubectl -n=default logs -f spark-pi-6e2c3b5d707531689031d3259f57b2ea-driver
2018-04-27 20:40:02 INFO  TaskSetManager:54 - Starting task 380242.0 in stage 0.0 (TID 380242,, executor 2, partition 380242, PROCESS_LOCAL, 7865 bytes)
2018-04-27 20:40:02 INFO  TaskSetManager:54 - Finished task 380240.0 in stage 0.0 (TID 380240) in 3 ms on (executor 2) (380241/1000000)
2018-04-27 20:40:02 INFO  TaskSetManager:54 - Starting task 380243.0 in stage 0.0 (TID 380243,, executor 1, partition 380243, PROCESS_LOCAL, 7865 bytes)
2018-04-27 20:40:02 INFO  TaskSetManager:54 - Finished task 380241.0 in stage 0.0 (TID 380241) in 5 ms on (executor 1) (380242/1000000)
2018-04-27 20:40:02 INFO  TaskSetManager:54 - Starting task 380244.0 in stage 0.0 (TID 380244,, executor 2, partition 380244, PROCESS_LOCAL, 7865 bytes)

Killing Executor and Driver
What’s happened if I killed one of executors?

weidong.zhou:@macpro mytest_gcp > kubectl get pods
NAME                                               READY     STATUS    RESTARTS   AGE
spark-pi-6e2c3b5d707531689031d3259f57b2ea-driver   1/1       Running   0          23m
spark-pi-6e2c3b5d707531689031d3259f57b2ea-exec-1   1/1       Running   0          23m
spark-pi-6e2c3b5d707531689031d3259f57b2ea-exec-2   1/1       Running   0          23m
weidong.zhou:@macpro mytest_gcp > kubectl delete pod spark-pi-6e2c3b5d707531689031d3259f57b2ea-exec-1
pod "spark-pi-6e2c3b5d707531689031d3259f57b2ea-exec-1" deleted
weidong.zhou:@macpro mytest_gcp > kubectl get pods
NAME                                               READY     STATUS    RESTARTS   AGE
spark-pi-6e2c3b5d707531689031d3259f57b2ea-driver   1/1       Running   0          25m
spark-pi-6e2c3b5d707531689031d3259f57b2ea-exec-2   1/1       Running   0          25m

After 30 seconds, check again. A new executor starts.

weidong.zhou:@macpro mytest_gcp > kubectl get pods
NAME                                               READY     STATUS    RESTARTS   AGE
spark-pi-6e2c3b5d707531689031d3259f57b2ea-driver   1/1       Running   0          26m
spark-pi-6e2c3b5d707531689031d3259f57b2ea-exec-2   1/1       Running   0          25m
spark-pi-6e2c3b5d707531689031d3259f57b2ea-exec-3   1/1       Running   0          19s

The Spark UI show the executor changes.

This is actually what I expected. Ok, what’s happened if I killed the driver?

weidong.zhou:@macpro mytest_gcp > kubectl get pods
NAME                                               READY     STATUS    RESTARTS   AGE
spark-pi-6e2c3b5d707531689031d3259f57b2ea-driver   1/1       Running   0          31m
spark-pi-6e2c3b5d707531689031d3259f57b2ea-exec-2   1/1       Running   0          31m
spark-pi-6e2c3b5d707531689031d3259f57b2ea-exec-3   1/1       Running   0          5m
weidong.zhou:@macpro mytest_gcp > kubectl delete pod spark-pi-6e2c3b5d707531689031d3259f57b2ea-driver
pod "spark-pi-6e2c3b5d707531689031d3259f57b2ea-driver" deleted
weidong.zhou:@macpro mytest_gcp > kubectl get pods
No resources found, use --show-all to see completed objects.

So killing driver pod is actually the way to stop the Spark Application during the execution.

The nice thing about Spark on Kubernets is that all pods disappear whether the Spark job completes by it self or is killed. This allows the free of resource automatically. Overall, Spark on Kubernetes is an easy to quickly run Spark application on Kubernetes.

Use Jupyter Notebook to Access H2O Driverless AI

I discussed H2O Driverless AI installation in my last blog, Install H2O Driverless AI on Google Cloud Platform. H2O AI docker image contains the deployment of Jupyter Notebook. Once H2O AI starts, we can use Jupyter notebook directly. In this blog, I am going to discuss how to use Jupyter Notebook to connect to H2O AI.

To login Jupyter Notebook, I need to know the login token. It is usually shown in the console output at the ‎time starting Jupyter. However If I check out the Docker logs command, it shows the output from H2O AI.

root@h2otest:~# docker ps
CONTAINER ID        IMAGE                    COMMAND             CREATED             STATUS              PORTS                                                                                                NAMES
5b803337e8b5        opsh2oai/h2oai-runtime   "./"          About an hour ago   Up About an hour>8888/tcp,>9090/tcp,>12345/tcp,>54321/tcp   h2oai

root@h2otest:~# docker logs h2oai
Welcome to's Driverless AI
     version: 1.0.30

- Put data in the volume mounted at /data
- Logs are written to the volume mounted at /log/20180424-140930
- Connect to Driverless AI on port 12345 inside the container
- Connect to Jupyter notebook on port 8888 inside the container

But the output at least tells me the logfile location. SSH to the container and check out Jupyter log.

root@h2otest:~# ./ 
root@5b803337e8b5:/# cd /log/20180424-140930
root@5b803337e8b5:/log/20180424-140930# ls -l
total 84
-rw-r--r-- 1 root root 61190 Apr 24 14:53 h2oai.log
-rw-r--r-- 1 root root 14340 Apr 24 15:14 h2o.log
-rw-r--r-- 1 root root  2700 Apr 24 14:58 jupyter.log
-rw-r--r-- 1 root root    52 Apr 24 14:09 procsy.log
root@5b803337e8b5:/log/20180424-140930# cat jupyter.log
[I 14:10:01.512 NotebookApp] Writing notebook server cookie secret to /jupyter/.local/share/jupyter/runtime/notebook_cookie_secret
[W 14:10:04.062 NotebookApp] WARNING: The notebook server is listening on all IP addresses and not using encryption. This is not recommended.
[I 14:10:04.224 NotebookApp] Serving notebooks from local directory: /jupyter
[I 14:10:04.224 NotebookApp] 0 active kernels
[I 14:10:04.224 NotebookApp] The Jupyter Notebook is running at:
[I 14:10:04.224 NotebookApp] http://[all ip addresses on your system]:8888/?token=f1b8f6dc7fb0aab7caec278a2bf971249b765140e4b3b338
[I 14:10:04.224 NotebookApp] Use Control-C to stop this server and shut down all kernels (twice to skip confirmation).
[C 14:10:04.224 NotebookApp] 
    Copy/paste this URL into your browser when you connect for the first time,
    to login with a token:
[W 14:19:26.189 NotebookApp] 401 POST /login?next=%2Ftree%3F ( 834.30ms referer=
[I 14:20:15.706 NotebookApp] 302 POST /login?next=%2Ftree%3F ( 1.36ms

Although this approach worked majority of time, I did run into issue for a few times that Jupyter login said the token is invalid. After some research, I found out another way that guarantees to get the correct token. It’s a json file under /jupyter/.local/share/jupyter/runtime directory. The filename nbserver-xx.json changes each time H2O AI starts.

root@5b803337e8b5:/# ls -l /jupyter/.local/share/jupyter/runtime
total 12
-rw-r--r-T 1 root root  263 Apr 24 14:24 kernel-b225302b-f2d9-47ac-b99c-f1f55eb54021.json
-rw-r--r-- 1 root root  245 Apr 24 14:10 nbserver-51.json
-rw------- 1 root root 1386 Apr 24 14:10 notebook_cookie_secret
root@5b803337e8b5:/# cat /jupyter/.local/share/jupyter/runtime/nbserver-51.json
  "base_url": "/",
  "hostname": "localhost",
  "notebook_dir": "/jupyter",
  "password": false,
  "pid": 51,
  "port": 8888,
  "secure": false,
  "token": "f1b8f6dc7fb0aab7caec278a2bf971249b765140e4b3b338",
  "url": "http://localhost:8888/"

Based on that, I created a script to get the token without ssh to the container.

root@h2otest:~# cat 

JSON_FILENAME=`docker exec -it h2oai ls -l /jupyter/.local/share/jupyter/runtime | grep nbserver |awk '{print $9}' | tr -d "\r"`
docker exec -it h2oai grep token /jupyter/.local/share/jupyter/runtime/$JSON_FILENAME

Run the script and got the token.

root@h2otest:~# ./ 
  "token": "f1b8f6dc7fb0aab7caec278a2bf971249b765140e4b3b338",

Ok, let me go to the login screen and input the token.

The Jupyter screen shows up.

There is two sample notebooks installed by default. I tried to make it working. However the sample data in docker image does not seem working. There is no detail API document available at this moment. So I just did a few basic stuff to prove it work. The following is the code I input in the notebook.

import h2oai_client
import numpy as np
import pandas as pd
# import h2o
import requests
import math
from h2oai_client import Client, ModelParameters, InterpretParameters

address = ''
username = 'h2o'
password = 'h2o'
h2oai = Client(address = address, username = username, password = password)

stock_path = '/data/stock_price.csv'
stockData = h2oai.create_dataset_sync(stock_path)

I went back to H2O AI UI and found out three more stock_price dataset were created by my Jupyter notebook.

So each time I run the command h2oai.create_dataset_sync(stock_path), it creates a new dataset. The dataset with same path is not going to eliminated. To avoid duplication, I have to manually delete the duplicated one from UI. It’s not a big deal. Just need to remember to cleanup the duplicated dataset if run the same notebook multiple times. Another way to get around this issue is to use different login name. As different login name sees the datasets only belong to the current user, you could have a login name for production use and a different login name for development or testing. You can safely remove the duplicated dataset in the development username without worrying about removing the wrong one.

Install H2O Driverless AI on Google Cloud Platform

I wrote many blogs about H2O and H2O Sparkling Water in the past. Today I am going to discuss the installation of H2O Driverless AI (H2O AI). H2O AI targets machine learning, especially deep learning. While H2O focuses more on algorithm, models, and predication, H2O AI automates some of the most difficult data science and ML workflows to offer automatic visualizations and Machine Learning Interpretability (MLI). Here is the architecture of H2O AI.

There are some difference in different installation environment. To check out different environment, use H2O Driverless AI installation document at

This blog discusses the topic only related to Google Cloud. Here are a few important things to know before the installation.
1. It requires a lot of memory and CPUs, if possible use GPU. I uses 8 CPUs and 52 GB memory on Google cloud. If you can use GPU, add GPU option. For me, I don’t have the access to GPU in my account.
2. The OS is based on Ubuntu 16.04 and I believe it is the minimum version supported.
3. OS disk size should be >= 64GB. I used 64GB.
4. Instead of installation software package, H2O AI uses Docker image. Yes, Docker needs to be installed first.
5. If plan to use python to connect the H2O AI, the supported version of python is v3.6.

Ok, here is the installation procedure on GCP:
1. Create a new firewall rule
Click VPC Network -> Firewall Rules -> Create Firewall Rule
Input the following:
Name : h2oai
Description: The firewall rule for H2O driverless AI
Target tags: h2o
Source IP ranges:
Protocols and ports: tcp:12345,54321
Please note: H2O’s documentation misses the port 54321, which is used by H2O Flow UI. Needs to open this port. Otherwise you can not access H2O Flow UI.

2. Create a new VM instance
Name: h2otest
Zone: us-east1-c
Cores: 8 vCPU
Memory: 52 GB
Boot disk: 64 GB, Ubuntu 16.04
Service account: use your GCP service account
Network tags: h2o

3. Install and configure Docker
Logon to h2otest VM instance and su to root user.
Create a script,

apt-get -y update
apt-get -y --no-install-recommends install \
  curl \
  apt-utils \
  python-software-properties \

add-apt-repository -y "deb [arch=amd64] $(lsb_release -cs) stable"
curl -fsSL | apt-key add -

apt-get update
apt-get install -y docker-ce

Run the script

root@h2otest:~# chmod u+x
root@h2otest:~# ./

Created required directories.

mkdir ~/tmp
mkdir ~/log
mkdir ~/data
mkdir ~/scripts
mkdir ~/license
mkdir ~/demo
mkdir -p ~/jupyter/notebooks

Adding current user to Docker container is optional. I did anyway.

root@h2otest:~# usermod -aG docker weidong.zhou
root@h2otest:~# id weidong.zhou
uid=1001(weidong.zhou) gid=1002(weidong.zhou) groups=1002(weidong.zhou),4(adm),20(dialout),24(cdrom),25(floppy),29(audio),30(dip),44(video),46(plugdev),109(netdev),110(lxd),1000(ubuntu),1001(google-sudoers),999(docker)

4. Download and Load H2O AI Docker Image
Download the docker image.

root@h2otest:~# wget
--2018-04-18 16:43:31--
Resolving (
Connecting to (||:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 2167098485 (2.0G) [application/gzip]
Saving to: ‘driverless-ai-docker-runtime-latest-release.gz’

driverless-ai-docker-runtime-latest-release.g 100%[==============================================================================================>]   2.02G  26.2MB/s    in 94s     

2018-04-18 16:45:05 (22.0 MB/s) - ‘driverless-ai-docker-runtime-latest-release.gz’ saved [2167098485/2167098485]

Load Docker image.

root@h2otest:~# docker load < driverless-ai-docker-runtime-latest-release.gz 9d3227c1793b: Loading layer [==================================================>]  121.3MB/121.3MB
a1a54d352248: Loading layer [==================================================>]  15.87kB/15.87kB
. . . .
ed86b627a562: Loading layer [==================================================>]  1.536kB/1.536kB
7d38d6d61cec: Loading layer [==================================================>]  1.536kB/1.536kB
de539994349c: Loading layer [==================================================>]  3.584kB/3.584kB
8e992954a9eb: Loading layer [==================================================>]  3.584kB/3.584kB
ff71b3e896ef: Loading layer [==================================================>]  8.192kB/8.192kB
Loaded image: opsh2oai/h2oai-runtime:latest
root@h2otest:~# docker image ls
REPOSITORY               TAG                 IMAGE ID            CREATED             SIZE
opsh2oai/h2oai-runtime   latest              dff251c69407        12 days ago         5.46GB

5. Start H2O AI
Create a startup script, Please note: H2O document has an error, missing port 54321 for H2O Flow UI. Then run the script

root@h2otest:~# cat 

docker run \
    --rm \
    -u `id -u`:`id -g` \
    -p 12345:12345 \
    -p 54321:54321 \
    -p 8888:8888 \
    -p 9090:9090 \
    -v `pwd`/data:/data \
    -v `pwd`/log:/log \
    -v `pwd`/license:/license \
    -v `pwd`/tmp:/tmp \

root@h2otest:~# chmod a+x
root@h2otest:~# ./ 
Welcome to's Driverless AI
     version: 1.0.30

- Put data in the volume mounted at /data
- Logs are written to the volume mounted at /log/20180419-094058
- Connect to Driverless AI on port 12345 inside the container
- Connect to Jupyter notebook on port 8888 inside the container

Also create a script, to quickly ssh to the H2O AI container without knowing the container id first.

root@h2otest:~# vi
root@h2otest:~# cat 

CONTAINER_ID=`docker ps|grep h2oai-runtime|awk '{print $1}'`
docker exec -it $CONTAINER_ID bash
root@h2otest:~# chmod a+x 
root@h2otest:~# ./

6. Use H2O AI
Get the external IP of the H2O VM. In my case, it is Then access URL at You will see H2O AI evaluation agreement screen. Click I Agree to these Terms to continue.
The Logon screen shows up. I use the following information to sign in.
Username: h2o
Password: h2o
Actually it doesn’t matter what you input. You can use any username to login. It just didn’t check. I know it has the feature to integrate with LDAP. I just didn’t give a try this time.

After sign in, it will ask you to input license information. Fill out your information at and you will receive a 21-day trail license in the email.

The first screen shows up is the Datasets overview. You can add dataset from one of three sources: File System, Hadoop File System, Amazon S3. To use some sample data, I chose Amazon S3‘s file.

For every dataset, there are two kinds of Actions: Visualize or Predict
Click Visualize. Many interesting visualization charts show up.

If click Predict, Experiment screen shows up. Choose a Target Column. In my example, I chose ArrTime column. Click Launch Experiment

Once finished, it will show a list of options. For example, I clicked Interpret this model on original features

For people familiar with H2O Flow UI, H2O AI still has this UI, just click H2O-3 from the menu. The H2O Flow UI will show up.

In general, H2O AI has an impressive UI and tons of new stuff. No wonder it is not a free version. In the next blog, I am going to discuss how to configure python client to access H2O AI.