Use Jupyter Notebook to Access H2O Driverless AI

I covered H2O Driverless AI installation in my last blog, Install H2O Driverless AI on Google Cloud Platform. The H2O AI Docker image also includes Jupyter Notebook, so once H2O AI starts, Jupyter is available right away. In this blog, I am going to discuss how to use Jupyter Notebook to connect to H2O AI.

To log in to Jupyter Notebook, I need to know the login token. It is usually shown in the console output when Jupyter starts. However, when I run the docker logs command, it shows only the output from H2O AI.

root@h2otest:~# docker ps
CONTAINER ID        IMAGE                    COMMAND             CREATED             STATUS              PORTS                                                                                                NAMES
5b803337e8b5        opsh2oai/h2oai-runtime   "./run.sh"          About an hour ago   Up About an hour    0.0.0.0:8888->8888/tcp, 0.0.0.0:9090->9090/tcp, 0.0.0.0:12345->12345/tcp, 0.0.0.0:54321->54321/tcp   h2oai

root@h2otest:~# docker logs h2oai
---------------------------------
Welcome to H2O.ai's Driverless AI
---------------------------------
     version: 1.0.30

- Put data in the volume mounted at /data
- Logs are written to the volume mounted at /log/20180424-140930
- Connect to Driverless AI on port 12345 inside the container
- Connect to Jupyter notebook on port 8888 inside the container

But the output at least tells me the log file location. SSH to the container and check the Jupyter log.

root@h2otest:~# ./ssh_h2oai.sh 
root@5b803337e8b5:/# cd /log/20180424-140930
root@5b803337e8b5:/log/20180424-140930# ls -l
total 84
-rw-r--r-- 1 root root 61190 Apr 24 14:53 h2oai.log
-rw-r--r-- 1 root root 14340 Apr 24 15:14 h2o.log
-rw-r--r-- 1 root root  2700 Apr 24 14:58 jupyter.log
-rw-r--r-- 1 root root    52 Apr 24 14:09 procsy.log
root@5b803337e8b5:/log/20180424-140930# cat jupyter.log
config:
    /jupyter/.jupyter
    /h2oai_env/etc/jupyter
    /usr/local/etc/jupyter
    /etc/jupyter
data:
    /jupyter/.local/share/jupyter
    /h2oai_env/share/jupyter
    /usr/local/share/jupyter
    /usr/share/jupyter
runtime:
    /jupyter/.local/share/jupyter/runtime
[I 14:10:01.512 NotebookApp] Writing notebook server cookie secret to /jupyter/.local/share/jupyter/runtime/notebook_cookie_secret
[W 14:10:04.062 NotebookApp] WARNING: The notebook server is listening on all IP addresses and not using encryption. This is not recommended.
[I 14:10:04.224 NotebookApp] Serving notebooks from local directory: /jupyter
[I 14:10:04.224 NotebookApp] 0 active kernels
[I 14:10:04.224 NotebookApp] The Jupyter Notebook is running at:
[I 14:10:04.224 NotebookApp] http://[all ip addresses on your system]:8888/?token=f1b8f6dc7fb0aab7caec278a2bf971249b765140e4b3b338
[I 14:10:04.224 NotebookApp] Use Control-C to stop this server and shut down all kernels (twice to skip confirmation).
[C 14:10:04.224 NotebookApp] 
    
    Copy/paste this URL into your browser when you connect for the first time,
    to login with a token:
        http://localhost:8888/?token=f1b8f6dc7fb0aab7caec278a2bf971249b765140e4b3b338
[W 14:19:26.189 NotebookApp] 401 POST /login?next=%2Ftree%3F (10.142.0.2) 834.30ms referer=http://10.142.0.2:8888/login?next=%2Ftree%3F
[I 14:20:15.706 NotebookApp] 302 POST /login?next=%2Ftree%3F (10.142.0.2) 1.36ms
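
The token can also be pulled out of the log text programmatically. The following is a minimal Python sketch, assuming the log keeps the ?token= URL format shown above. The helper name extract_token is hypothetical and is not part of Jupyter or H2O.

```python
import re
from typing import Optional

# Matches the hex token in a URL such as http://localhost:8888/?token=abc123
TOKEN_PATTERN = re.compile(r"[?&]token=([0-9a-f]+)")


def extract_token(log_text: str) -> Optional[str]:
    """Return the last token found in the log text, or None if there is none."""
    matches = TOKEN_PATTERN.findall(log_text)
    # If several URLs appear in the log, take the last one printed.
    return matches[-1] if matches else None


# Two lines copied from the jupyter.log output above.
sample = (
    "[I 14:10:04.224 NotebookApp] The Jupyter Notebook is running at:\n"
    "[I 14:10:04.224 NotebookApp] http://[all ip addresses on your system]:8888/"
    "?token=f1b8f6dc7fb0aab7caec278a2bf971249b765140e4b3b338\n"
)
print(extract_token(sample))
```

To use it on the real file, pass in the contents of jupyter.log, for example open('/log/20180424-140930/jupyter.log').read() from inside the container.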

Although this approach worked most of the time, a few times the Jupyter login said the token was invalid. After some research, I found another way that reliably gives the correct token: a JSON file under the /jupyter/.local/share/jupyter/runtime directory. The filename nbserver-xx.json, where xx is the process ID of the Jupyter server, changes each time H2O AI starts.

root@5b803337e8b5:/# ls -l /jupyter/.local/share/jupyter/runtime
total 12
-rw-r--r-T 1 root root  263 Apr 24 14:24 kernel-b225302b-f2d9-47ac-b99c-f1f55eb54021.json
-rw-r--r-- 1 root root  245 Apr 24 14:10 nbserver-51.json
-rw------- 1 root root 1386 Apr 24 14:10 notebook_cookie_secret
root@5b803337e8b5:/# cat /jupyter/.local/share/jupyter/runtime/nbserver-51.json
{
  "base_url": "/",
  "hostname": "localhost",
  "notebook_dir": "/jupyter",
  "password": false,
  "pid": 51,
  "port": 8888,
  "secure": false,
  "token": "f1b8f6dc7fb0aab7caec278a2bf971249b765140e4b3b338",
  "url": "http://localhost:8888/"
}

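The fields in this file are enough to assemble a login URL that already carries the token. Below is a small Python sketch of that idea. The function name login_url and the address 203.0.113.10 are placeholders of mine; substitute the external IP of your own VM.

```python
import json


def login_url(nbserver_json: str, public_host: str) -> str:
    """Build a token login URL from the contents of an nbserver-*.json file."""
    info = json.loads(nbserver_json)
    scheme = "https" if info.get("secure") else "http"
    # The file records "localhost", so swap in the address the browser can reach.
    return f"{scheme}://{public_host}:{info['port']}{info['base_url']}?token={info['token']}"


# Same fields as the nbserver-51.json file shown above.
sample = """{
  "base_url": "/",
  "hostname": "localhost",
  "port": 8888,
  "secure": false,
  "token": "f1b8f6dc7fb0aab7caec278a2bf971249b765140e4b3b338",
  "url": "http://localhost:8888/"
}"""
print(login_url(sample, "203.0.113.10"))
```

Reading the JSON with a parser instead of grep also avoids the trailing comma and quotes that come with the raw line.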
Based on that, I created a script to get the token without having to SSH into the container.

root@h2otest:~# cat get_jy_token.sh 
#!/bin/bash

# Find the nbserver-*.json file inside the container. No -t flag here: a TTY would append \r to the captured output.
JSON_FILENAME=$(docker exec h2oai sh -c 'ls /jupyter/.local/share/jupyter/runtime/nbserver-*.json' | head -n 1)
#echo "$JSON_FILENAME"
# Print the token line from that file.
docker exec h2oai grep token "$JSON_FILENAME"

Running the script returns the token.

root@h2otest:~# ./get_jy_token.sh 
  "token": "f1b8f6dc7fb0aab7caec278a2bf971249b765140e4b3b338",

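The same lookup can be written in Python if you prefer to get the bare token value rather than the whole line. This is only a sketch: it assumes the container is named h2oai as in the docker ps output above, and the function names docker_exec and get_token are my own. The run parameter exists so the logic can be tried without Docker.

```python
import json
import subprocess
from typing import Callable, List

RUNTIME_DIR = "/jupyter/.local/share/jupyter/runtime"


def docker_exec(args: List[str]) -> str:
    """Run a command inside the h2oai container and return its stdout."""
    result = subprocess.run(
        ["docker", "exec", "h2oai", *args],
        check=True, capture_output=True, text=True,
    )
    return result.stdout


def get_token(run: Callable[[List[str]], str] = docker_exec) -> str:
    """Read the token from the first nbserver-*.json in the runtime directory."""
    names = [
        name for name in run(["ls", RUNTIME_DIR]).split()
        if name.startswith("nbserver-") and name.endswith(".json")
    ]
    if not names:
        raise FileNotFoundError("no nbserver-*.json found; is Jupyter running?")
    content = run(["cat", f"{RUNTIME_DIR}/{names[0]}"])
    return json.loads(content)["token"]

# On the Docker host, call: print(get_token())
```

Because the output is captured without a TTY, there is no carriage return to strip.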
OK, let me go to the login screen and enter the token.

The Jupyter screen shows up.

There are two sample notebooks installed by default. I tried to get them working, but the sample data in the Docker image does not seem to work. There is no detailed API documentation available at this moment, so I just did a few basic things to prove that the connection works. The following is the code I entered in the notebook.

import h2oai_client
import numpy as np
import pandas as pd
# import h2o
import requests
import math
from h2oai_client import Client, ModelParameters, InterpretParameters

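# Connect to the Driverless AI server on port 12345
address = 'http://35.229.57.147:12345'
username = 'h2o'
password = 'h2o'
h2oai = Client(address=address, username=username, password=password)

# Create a dataset from a CSV file in the /data volume and dump the returned object
stock_path = '/data/stock_price.csv'
stockData = h2oai.create_dataset_sync(stock_path)
stockData.dump()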


I went back to the H2O AI UI and found that three more stock_price datasets had been created by my Jupyter notebook.

So each time I run h2oai.create_dataset_sync(stock_path), it creates a new dataset; an existing dataset with the same path is not replaced. To avoid duplication, I have to manually delete the duplicates from the UI. It's not a big deal. I just need to remember to clean up the duplicated datasets if I run the same notebook multiple times. Another way around this issue is to use different login names. Because each login sees only the datasets that belong to that user, you could have one login for production use and a different one for development or testing. You can then safely remove duplicated datasets under the development login without worrying about removing the wrong one.
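
Within a single notebook session, one more way to cut down on duplicates is to remember which paths have already been loaded. The sketch below is my own illustration, not something the H2O client provides: DatasetCache is a hypothetical helper, and fake_create stands in for h2oai.create_dataset_sync so the example runs without a server.

```python
from typing import Any, Callable, Dict


class DatasetCache:
    """Remember datasets already created in this session, keyed by file path."""

    def __init__(self, create: Callable[[str], Any]) -> None:
        self._create = create  # e.g. h2oai.create_dataset_sync
        self._by_path: Dict[str, Any] = {}

    def get(self, path: str) -> Any:
        # Only call the server the first time a path is requested.
        if path not in self._by_path:
            self._by_path[path] = self._create(path)
        return self._by_path[path]


# Stand-in for the real client call; it records every invocation.
calls = []


def fake_create(path: str) -> dict:
    calls.append(path)
    return {"path": path}


cache = DatasetCache(fake_create)
cache.get("/data/stock_price.csv")
cache.get("/data/stock_price.csv")  # served from the cache, no second create
print(len(calls))
```

In the notebook, the cache would be created once in its own cell with DatasetCache(h2oai.create_dataset_sync), and later cells would call cache.get(stock_path). The cache is lost when the kernel restarts, so it reduces duplicates but does not replace the cleanup described above.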
