Running H2O Cluster in Background and at Specific Port Number

In a few of my previous blogs, I discussed H2O vs Sparkling Water, Sparkling Water Shell: Cloud size under 12 Exception, and Access Sparkling Water via R Studio.

When using H2O Sparkling Water, there are two common issues. The first is the port number. The default port is 54321, but if the H2O cluster has been bounced multiple times, it can come up on a different port, namely the next available port number, such as 54323. If the cluster is used by many data analysts, it is inconvenient to inform all of them every time the port number changes. You want users to remember only one port number.

The second issue is that the sparkling shell session cannot run in the background. If you close the PuTTY session running the sparkling shell, the H2O cluster is terminated.

This blog discusses how to work around these two issues.

For the port number, there are actually two related parameters: spark.ext.h2o.client.port.base and spark.ext.h2o.node.port.base. The spark.ext.h2o.client.port.base is the port number for the H2O UI, while spark.ext.h2o.node.port.base is the port used internally by the H2O cluster for communication among H2O nodes. Make sure these two port numbers are different; setting them to the same value will cause issues. I also added spark.ext.h2o.cloud.name to set the name of my H2O cluster.
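Since identical values cause problems, a small pre-flight check in the launch script can catch the mistake before the cluster starts. The following is a minimal sketch; check_port_bases is a hypothetical helper written for illustration and is not part of Sparkling Water.

```shell
# Hypothetical helper (not part of Sparkling Water).
# Returns 0 when both arguments are numeric and different, 1 otherwise.
#   $1 = value intended for spark.ext.h2o.client.port.base
#   $2 = value intended for spark.ext.h2o.node.port.base
check_port_bases() {
  for p in "$1" "$2"; do
    case "$p" in
      ''|*[!0-9]*)
        echo "not a port number: '$p'" >&2
        return 1
        ;;
    esac
  done
  if [ "$1" -eq "$2" ]; then
    echo "client and node port bases must differ" >&2
    return 1
  fi
  return 0
}
```

One way to use it would be a line such as check_port_bases 26000 26005 || exit 1 placed before the bin/sparkling-shell command, so the script stops early when the two values collide.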

I created two separate scripts: run_sparkling_shell.sh, which holds the command that launches the sparkling shell, and sparkling-shell-init.scala, which holds the Scala commands that start the H2O cluster.

[wzhou@enkbda1node05 ~]$ cat run_sparkling_shell.sh
bin/sparkling-shell \
--master yarn \
--conf spark.ext.h2o.cloud.name=WeidongH2O-Cluster \
--conf spark.ext.h2o.client.port.base=26000 \
--conf spark.ext.h2o.node.port.base=26005 \
--conf spark.executor.instances=10 \
--conf spark.executor.memory=12g \
--conf spark.driver.memory=8g \
--conf spark.executor.cores=4 \
--conf spark.yarn.executor.memoryOverhead=4g \
--conf spark.yarn.driver.memoryOverhead=4g \
--conf spark.scheduler.maxRegisteredResourcesWaitingTime=1000000 \
--conf spark.ext.h2o.fail.on.unsupported.spark.param=false \
--conf spark.dynamicAllocation.enabled=false \
--conf spark.sql.autoBroadcastJoinThreshold=-1 \
--conf spark.locality.wait=30000 \
--conf spark.yarn.queue=WZTestPool \
--conf spark.scheduler.minRegisteredResourcesRatio=1 \
-i  sparkling-shell-init.scala

The code for sparkling-shell-init.scala:

import org.apache.spark.h2o._
val h2oContext = H2OContext.getOrCreate(spark)
import h2oContext._

To execute sparkling-shell in the background, my first try was to use nohup. It didn’t work. When the sparkling-shell-init.scala script is called this way, a :quit command is automatically added at the end, which terminates the H2O cluster.

When I worked on Exadata, I used the screen command a lot. It is a very useful tool for protecting long-running critical jobs, such as patching/upgrades and imports/exports. Therefore, I use the same screen trick to get around the background issue. Here are the steps.

1. Start screen session
Use the screen -ls command to check whether screen is available.

[wzhou@enkbda1node05 ~]$ screen -ls
No Sockets found in /var/run/screen/S-wzhou.

Start a screen session.
[wzhou@enkbda1node05 ~]$ screen

2. Start H2O Cluster
Run the script to start the H2O cluster.

[wzhou@enkbda1node05 ~]$ ./run_sparkling_shell.sh

-----
  Spark master (MASTER)     : yarn
  Spark home   (SPARK_HOME) :
  H2O build version         : 3.14.0.7 (weierstrass)
  Spark build version       : 2.2.0
  Scala version             : 2.11
----

SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/opt/anaconda2/lib/python2.7/site-packages/pyspark/jars/slf4j-log4j12-1.7.16.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/opt/cloudera/parcels/CDH-5.10.1-1.cdh5.10.1.p0.10/jars/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
17/12/09 10:12:58 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
17/12/09 10:12:58 WARN shortcircuit.DomainSocketFactory: The short-circuit local reads feature cannot be used because libhadoop cannot be loaded.
17/12/09 10:12:59 WARN yarn.Client: Neither spark.yarn.jars nor spark.yarn.archive is set, falling back to uploading libraries under SPARK_HOME.
Spark context Web UI available at http://192.168.10.14:4040
Spark context available as 'sc' (master = yarn, app id = application_2567590118914_1007).
Spark session available as 'spark'.
Loading sparkling-shell-init.scala...
import org.apache.spark.h2o._
h2oContext: org.apache.spark.h2o.H2OContext =

Sparkling Water Context:
 * H2O name: WeidongH2O-Cluster
 * cluster size: 10
 * list of used nodes:
  (executorId, host, port)
  ------------------------
  (2,enkbda1node08.enkitec.com,26005)
  (4,enkbda1node12.enkitec.com,26005)
  (9,enkbda1node17.enkitec.com,26005)
  (5,enkbda1node13.enkitec.com,26005)
  (7,enkbda1node15.enkitec.com,26005)
  (1,enkbda1node09.enkitec.com,26005)
  (8,enkbda1node04.enkitec.com,26005)
  (6,enkbda1node10.enkitec.com,26005)
  (3,enkbda1node11.enkitec.com,26005)
  (10,enkbda1node05.enkitec.com,26005)
  ------------------------

  Open H2O Flow in browser: http://192.168.10.14:26000 (CMD + click in Mac OSX)


import h2oContext._

Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 2.2.0
      /_/

Using Scala version 2.11.8 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_121)
Type in expressions to have them evaluated.
Type :help for more information.
scala>
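
Steps 1 and 2 can also be combined by launching the script inside a named screen session that starts already detached. The following is a minimal sketch, assuming GNU screen is installed and that it is run from the same directory as run_sparkling_shell.sh. The -d -m options start the session detached and -S names it, so it can be reattached by name instead of by PID. The function names, the session name h2o, and the SCREEN_BIN variable are illustrative choices, not anything from Sparkling Water; SCREEN_BIN exists only so the commands can be dry-run with echo.

```shell
# Hypothetical helpers; names and defaults are illustrative only.
# Set SCREEN_BIN=echo to print the arguments instead of running screen.
SCREEN_BIN="${SCREEN_BIN:-screen}"

# Start the launch script inside a detached, named screen session.
#   $1 = session name   (default: h2o)
#   $2 = launch script  (default: ./run_sparkling_shell.sh)
start_h2o_screen() {
  "$SCREEN_BIN" -dmS "${1:-h2o}" "${2:-./run_sparkling_shell.sh}"
}

# Reattach to the detached session by name, e.g. after reconnecting.
#   $1 = session name   (default: h2o)
attach_h2o_screen() {
  "$SCREEN_BIN" -r "${1:-h2o}"
}
```

After start_h2o_screen, the session should show up in screen -ls as Detached. Inside an attached session, Ctrl-a d detaches again without stopping the sparkling shell.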

3. Verify H2O Cluster
Now close the PuTTY session and open a new one. Check for the existing screen session.

[wzhou@enkbda1node05 ~]$ screen -ls
There is a screen on:
        19044.pts-0.enkbda1node05       (Detached)
1 Socket in /var/run/screen/S-wzhou.

Now, attach to the existing session.
[wzhou@enkbda1node05 ~]$ screen -x 19044

You should see the original session that was running the sparkling shell. The UI is still working as expected.
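
To confirm from the command line that the UI still answers on the fixed port, a quick check along these lines may help. This is a minimal sketch, assuming curl is installed and that the H2O REST endpoint /3/Cloud is served on the same host and port as H2O Flow (192.168.10.14:26000 in the startup output above). The function names are illustrative.

```shell
# Hypothetical helpers; names are illustrative only.

# Build the URL of the H2O cluster status endpoint.
#   $1 = host running the H2O client (the Flow UI host)
#   $2 = port set via spark.ext.h2o.client.port.base
h2o_cloud_url() {
  printf 'http://%s:%s/3/Cloud\n' "$1" "$2"
}

# Return 0 if the endpoint answers with a successful HTTP status
# within 10 seconds, non-zero otherwise.
check_h2o_flow() {
  curl --silent --fail --max-time 10 "$(h2o_cloud_url "$1" "$2")" > /dev/null
}
```

For example, check_h2o_flow 192.168.10.14 26000 && echo "H2O is up" could be run from any node that can reach the driver host.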
