Install H2O Driverless AI on Google Cloud Platform

I wrote many blogs about H2O and H2O Sparkling Water in the past. Today I am going to discuss the installation of H2O Driverless AI (H2O AI). H2O AI targets machine learning, especially deep learning. While H2O focuses more on algorithm, models, and predication, H2O AI automates some of the most difficult data science and ML workflows to offer automatic visualizations and Machine Learning Interpretability (MLI). Here is the architecture of H2O AI.

There are some difference in different installation environment. To check out different environment, use H2O Driverless AI installation document at http://docs.h2o.ai/driverless-ai/latest-stable/docs/userguide/installing.html.

This blog discusses the topic only related to Google Cloud. Here are a few important things to know before the installation.
1. It requires a lot of memory and CPUs, if possible use GPU. I uses 8 CPUs and 52 GB memory on Google cloud. If you can use GPU, add GPU option. For me, I don’t have the access to GPU in my account.
2. The OS is based on Ubuntu 16.04 and I believe it is the minimum version supported.
3. OS disk size should be >= 64GB. I used 64GB.
4. Instead of installation software package, H2O AI uses Docker image. Yes, Docker needs to be installed first.
5. If plan to use python to connect the H2O AI, the supported version of python is v3.6.

Ok, here is the installation procedure on GCP:
1. Create a new firewall rule
Click VPC Network -> Firewall Rules -> Create Firewall Rule
Input the following:
Name : h2oai
Description: The firewall rule for H2O driverless AI
Target tags: h2o
Source IP ranges: 0.0.0.0/0
Protocols and ports: tcp:12345,54321
Please note: H2O’s documentation misses the port 54321, which is used by H2O Flow UI. Needs to open this port. Otherwise you can not access H2O Flow UI.


2. Create a new VM instance
Name: h2otest
Zone: us-east1-c
Cores: 8 vCPU
Memory: 52 GB
Boot disk: 64 GB, Ubuntu 16.04
Service account: use your GCP service account
Network tags: h2o

3. Install and configure Docker
Logon to h2otest VM instance and su to root user.
Create a script, build.sh

apt-get -y update
apt-get -y --no-install-recommends install \
  curl \
  apt-utils \
  python-software-properties \
  software-properties-common

add-apt-repository -y "deb [arch=amd64] https://download.docker.com/linux/ubuntu $(lsb_release -cs) stable"
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | apt-key add -

apt-get update
apt-get install -y docker-ce

Run the script

root@h2otest:~# chmod u+x build.sh
root@h2otest:~# ./build.sh

Created required directories.

mkdir ~/tmp
mkdir ~/log
mkdir ~/data
mkdir ~/scripts
mkdir ~/license
mkdir ~/demo
mkdir -p ~/jupyter/notebooks

Adding current user to Docker container is optional. I did anyway.

root@h2otest:~# usermod -aG docker weidong.zhou
root@h2otest:~# id weidong.zhou
uid=1001(weidong.zhou) gid=1002(weidong.zhou) groups=1002(weidong.zhou),4(adm),20(dialout),24(cdrom),25(floppy),29(audio),30(dip),44(video),46(plugdev),109(netdev),110(lxd),1000(ubuntu),1001(google-sudoers),999(docker)

4. Download and Load H2O AI Docker Image
Download the docker image.

root@h2otest:~# wget https://s3-us-west-2.amazonaws.com/h2o-internal-release/docker/driverless-ai-docker-runtime-latest-release.gz
--2018-04-18 16:43:31--  https://s3-us-west-2.amazonaws.com/h2o-internal-release/docker/driverless-ai-docker-runtime-latest-release.gz
Resolving s3-us-west-2.amazonaws.com (s3-us-west-2.amazonaws.com)... 52.218.209.8
Connecting to s3-us-west-2.amazonaws.com (s3-us-west-2.amazonaws.com)|52.218.209.8|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 2167098485 (2.0G) [application/gzip]
Saving to: ‘driverless-ai-docker-runtime-latest-release.gz’

driverless-ai-docker-runtime-latest-release.g 100%[==============================================================================================>]   2.02G  26.2MB/s    in 94s     

2018-04-18 16:45:05 (22.0 MB/s) - ‘driverless-ai-docker-runtime-latest-release.gz’ saved [2167098485/2167098485]

Load Docker image.

root@h2otest:~# docker load < driverless-ai-docker-runtime-latest-release.gz 9d3227c1793b: Loading layer [==================================================>]  121.3MB/121.3MB
a1a54d352248: Loading layer [==================================================>]  15.87kB/15.87kB
. . . .
ed86b627a562: Loading layer [==================================================>]  1.536kB/1.536kB
7d38d6d61cec: Loading layer [==================================================>]  1.536kB/1.536kB
de539994349c: Loading layer [==================================================>]  3.584kB/3.584kB
8e992954a9eb: Loading layer [==================================================>]  3.584kB/3.584kB
ff71b3e896ef: Loading layer [==================================================>]  8.192kB/8.192kB
Loaded image: opsh2oai/h2oai-runtime:latest
root@h2otest:~# docker image ls
REPOSITORY               TAG                 IMAGE ID            CREATED             SIZE
opsh2oai/h2oai-runtime   latest              dff251c69407        12 days ago         5.46GB

5. Start H2O AI
Create a startup script, start_h2oai.sh. Please note: H2O document has an error, missing port 54321 for H2O Flow UI. Then run the script

root@h2otest:~# cat start_h2oai.sh 
#!/bin/bash

docker run \
    --rm \
    -u `id -u`:`id -g` \
    -p 12345:12345 \
    -p 54321:54321 \
    -p 8888:8888 \
    -p 9090:9090 \
    -v `pwd`/data:/data \
    -v `pwd`/log:/log \
    -v `pwd`/license:/license \
    -v `pwd`/tmp:/tmp \
    opsh2oai/h2oai-runtime

root@h2otest:~# chmod a+x start_h2oai.sh
root@h2otest:~# ./start_h2oai.sh 
---------------------------------
Welcome to H2O.ai's Driverless AI
---------------------------------
     version: 1.0.30

- Put data in the volume mounted at /data
- Logs are written to the volume mounted at /log/20180419-094058
- Connect to Driverless AI on port 12345 inside the container
- Connect to Jupyter notebook on port 8888 inside the container

Also create a script, ssh_h2oai.sh to quickly ssh to the H2O AI container without knowing the container id first.

root@h2otest:~# vi ssh_h2oai.sh
root@h2otest:~# cat ssh_h2oai.sh 
#!/bin/bash

CONTAINER_ID=`docker ps|grep h2oai-runtime|awk '{print $1}'`
docker exec -it $CONTAINER_ID bash
root@h2otest:~# chmod a+x ssh_h2oai.sh 
root@h2otest:~# ./ssh_h2oai.sh
root@09bd138f4f41:/#

6. Use H2O AI
Get the external IP of the H2O VM. In my case, it is 35.196.90.114. Then access URL at http://35.196.90.114:12345/. You will see H2O AI evaluation agreement screen. Click I Agree to these Terms to continue.
The Logon screen shows up. I use the following information to sign in.
Username: h2o
Password: h2o
Actually it doesn’t matter what you input. You can use any username to login. It just didn’t check. I know it has the feature to integrate with LDAP. I just didn’t give a try this time.

After sign in, it will ask you to input license information. Fill out your information at https://www.h2o.ai/try-driverless-ai/ and you will receive a 21-day trail license in the email.

The first screen shows up is the Datasets overview. You can add dataset from one of three sources: File System, Hadoop File System, Amazon S3. To use some sample data, I chose Amazon S3‘s AirlinesTest.csv.zip file.



For every dataset, there are two kinds of Actions: Visualize or Predict
Click Visualize. Many interesting visualization charts show up.



If click Predict, Experiment screen shows up. Choose a Target Column. In my example, I chose ArrTime column. Click Launch Experiment

Once finished, it will show a list of options. For example, I clicked Interpret this model on original features


For people familiar with H2O Flow UI, H2O AI still has this UI, just click H2O-3 from the menu. The H2O Flow UI will show up.

In general, H2O AI has an impressive UI and tons of new stuff. No wonder it is not a free version. In the next blog, I am going to discuss how to configure python client to access H2O AI.

Advertisements

Leave a Reply

Please log in using one of these methods to post your comment:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google photo

You are commenting using your Google account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s