In my last blog, I compared Sparkling Water and H2O. Before I got sparkling-shell to work, I ran into a number of issues. One of the more annoying errors was the runtime exception: Cloud size under xx. I searched the internet and found that many people have had similar problems. There are many recommendations, ranging from downloading the latest matching version to setting certain parameters during startup. Unfortunately, none of them worked for me. I finally figured out the issue and would like to share my solution in this blog.
After downloading Sparkling Water and unzipping the file, I ran the sparkling-shell command as shown at http://h2o-release.s3.amazonaws.com/sparkling-water/rel-2.2/2/index.html. It looked good initially.
[sparkling-water-2.2.2]$ bin/sparkling-shell --conf "spark.executor.memory=1g"

-----
  Spark master (MASTER)     : local[*]
  Spark home (SPARK_HOME)   : /opt/cloudera/parcels/SPARK2-2.2.0.cloudera1-1.cdh5.12.0.p0.142354/lib/spark2
  H2O build version         : 3.14.0.7 (weierstrass)
  Spark build version       : 2.2.0
  Scala version             : 2.11
----

Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
Spark context Web UI available at http://10.132.110.145:4040
Spark context available as 'sc' (master = yarn, app id = application_3608740912046_1343).
Spark session available as 'spark'.
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/ '_/
   /___/ .__/\_,_/_/ /_/\_\   version 2.2.0.cloudera1
      /_/

Using Scala version 2.11.8 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_144)
Type in expressions to have them evaluated.
Type :help for more information.

scala>
But when I ran the command val h2oContext = H2OContext.getOrCreate(spark), it gave me the following errors:
scala> import org.apache.spark.h2o._
import org.apache.spark.h2o._

scala> val h2oContext = H2OContext.getOrCreate(spark)
17/11/05 10:07:48 WARN internal.InternalH2OBackend: Increasing 'spark.locality.wait' to value 30000
17/11/05 10:07:48 WARN internal.InternalH2OBackend: Due to non-deterministic behavior of Spark broadcast-based joins
We recommend to disable them by configuring `spark.sql.autoBroadcastJoinThreshold` variable to value `-1`:
sqlContext.sql("SET spark.sql.autoBroadcastJoinThreshold=-1")
17/11/05 10:07:48 WARN internal.InternalH2OBackend: The property 'spark.scheduler.minRegisteredResourcesRatio' is not specified!
We recommend to pass `--conf spark.scheduler.minRegisteredResourcesRatio=1`
17/11/05 10:07:48 WARN internal.InternalH2OBackend: Unsupported options spark.dynamicAllocation.enabled detected!
17/11/05 10:07:48 WARN internal.InternalH2OBackend: The application is going down, since the parameter (spark.ext.h2o.fail.on.unsupported.spark.param,true) is true!
If you would like to skip the fail call, please, specify the value of the parameter to false.
java.lang.IllegalArgumentException: Unsupported argument: (spark.dynamicAllocation.enabled,true)
  at org.apache.spark.h2o.backends.internal.InternalBackendUtils$$anonfun$checkUnsupportedSparkOptions$1.apply(InternalBackendUtils.scala:46)
  at org.apache.spark.h2o.backends.internal.InternalBackendUtils$$anonfun$checkUnsupportedSparkOptions$1.apply(InternalBackendUtils.scala:38)
  at scala.collection.immutable.List.foreach(List.scala:381)
  at org.apache.spark.h2o.backends.internal.InternalBackendUtils$class.checkUnsupportedSparkOptions(InternalBackendUtils.scala:38)
  at org.apache.spark.h2o.backends.internal.InternalH2OBackend.checkUnsupportedSparkOptions(InternalH2OBackend.scala:30)
  at org.apache.spark.h2o.backends.internal.InternalH2OBackend.checkAndUpdateConf(InternalH2OBackend.scala:60)
  at org.apache.spark.h2o.H2OContext.<init>(H2OContext.scala:90)
  at org.apache.spark.h2o.H2OContext$.getOrCreate(H2OContext.scala:355)
  at org.apache.spark.h2o.H2OContext$.getOrCreate(H2OContext.scala:383)
  ... 50 elided
You can see I needed to pass in more parameters when starting sparkling-shell. I changed the parameters as follows:
bin/sparkling-shell \
--master yarn \
--conf spark.executor.memory=1g \
--conf spark.scheduler.maxRegisteredResourcesWaitingTime=1000000 \
--conf spark.ext.h2o.fail.on.unsupported.spark.param=false \
--conf spark.dynamicAllocation.enabled=false \
--conf spark.sql.autoBroadcastJoinThreshold=-1 \
--conf spark.locality.wait=30000 \
--conf spark.scheduler.minRegisteredResourcesRatio=1
OK, this time it looked better; at least the warning messages disappeared. But I got the error message java.lang.RuntimeException: Cloud size under 2.
[sparkling-water-2.2.2]$ bin/sparkling-shell \
> --master yarn \
> --conf spark.executor.memory=1g \
> --conf spark.scheduler.maxRegisteredResourcesWaitingTime=1000000 \
> --conf spark.ext.h2o.fail.on.unsupported.spark.param=false \
> --conf spark.dynamicAllocation.enabled=false \
> --conf spark.sql.autoBroadcastJoinThreshold=-1 \
> --conf spark.locality.wait=30000 \
> --conf spark.scheduler.minRegisteredResourcesRatio=1

-----
  Spark master (MASTER)     : yarn
  Spark home (SPARK_HOME)   : /opt/cloudera/parcels/SPARK2-2.2.0.cloudera1-1.cdh5.12.0.p0.142354/lib/spark2
  H2O build version         : 3.14.0.7 (weierstrass)
  Spark build version       : 2.2.0
  Scala version             : 2.11
----

Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
Spark context Web UI available at http://10.132.110.145:4040
Spark context available as 'sc' (master = yarn, app id = application_3608740912046_1344).
Spark session available as 'spark'.
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/ '_/
   /___/ .__/\_,_/_/ /_/\_\   version 2.2.0.cloudera1
      /_/

Using Scala version 2.11.8 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_144)
Type in expressions to have them evaluated.
Type :help for more information.

scala> import org.apache.spark.h2o._
import org.apache.spark.h2o._

scala> val h2oContext = H2OContext.getOrCreate(spark)
java.lang.RuntimeException: Cloud size under 2
  at water.H2O.waitForCloudSize(H2O.java:1689)
  at org.apache.spark.h2o.backends.internal.InternalH2OBackend.init(InternalH2OBackend.scala:117)
  at org.apache.spark.h2o.H2OContext.init(H2OContext.scala:121)
  at org.apache.spark.h2o.H2OContext$.getOrCreate(H2OContext.scala:355)
  at org.apache.spark.h2o.H2OContext$.getOrCreate(H2OContext.scala:383)
  ... 50 elided
Not a big deal. I hadn't specified the number of executors or the executor memory, so it might be complaining that the H2O cluster is too small. I added the following three parameters:
--conf spark.executor.instances=12 \
--conf spark.executor.memory=10g \
--conf spark.driver.memory=8g \
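For reference, the complete launch command with all of the flags combined would look like the sketch below. The memory sizes and executor count are the values from my environment; tune them to fit your own cluster.

```shell
bin/sparkling-shell \
  --master yarn \
  --conf spark.executor.instances=12 \
  --conf spark.executor.memory=10g \
  --conf spark.driver.memory=8g \
  --conf spark.scheduler.maxRegisteredResourcesWaitingTime=1000000 \
  --conf spark.ext.h2o.fail.on.unsupported.spark.param=false \
  --conf spark.dynamicAllocation.enabled=false \
  --conf spark.sql.autoBroadcastJoinThreshold=-1 \
  --conf spark.locality.wait=30000 \
  --conf spark.scheduler.minRegisteredResourcesRatio=1
```

Note that spark.dynamicAllocation.enabled must stay false: H2O forms a fixed-size cloud, so Spark cannot be allowed to add or remove executors underneath it.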
Rerunning the whole thing got the same error, with the size number changed to 12. That did not look right to me, so I checked the H2O error log file and found tons of messages like the following:
11-06 10:23:16.355 10.132.110.145:54325 30452 #09:54321 ERRR: Got IO error when sending batch UDP bytes: java.net.ConnectException: Connection refused
11-06 10:23:16.790 10.132.110.145:54325 30452 #06:54321 ERRR: Got IO error when sending batch UDP bytes: java.net.ConnectException: Connection refused
It looked like Sparkling Water could not connect to the Spark cluster. After some investigation, I realized that I had installed and run sparkling-shell from an edge node of the BDA. When the H2O cluster runs inside a Spark application, the Spark cluster on the BDA communicates over the BDA's private network, the InfiniBand network. The edge node cannot communicate directly with the InfiniBand network on the BDA. With this assumption in mind, I installed and ran Sparkling Water on one of the BDA nodes, and it worked perfectly without any issue. Problem solved!
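A quick way to spot this kind of network split is to check whether the node running sparkling-shell actually shares a subnet with the Spark worker nodes. The sketch below is a minimal illustration assuming a /24 netmask; the two IP addresses are made-up examples, so substitute your own edge-node and worker-node IPs. A port probe such as `nc -z <worker-node> 54321` is another handy check, since H2O nodes talk to each other on port 54321 by default.

```shell
# Minimal sketch: compare the /24 network prefixes of two IPv4 addresses.
# If the prefixes differ, the hosts sit on different subnets and H2O
# node-to-node traffic may fail with "Connection refused".
same_subnet24() {
  # ${ip%.*} strips the final octet, leaving the /24 prefix (e.g. 10.132.110)
  [ "${1%.*}" = "${2%.*}" ]
}

edge_node_ip="10.132.110.145"   # example: edge node on the client network
bda_node_ip="192.168.40.21"     # example: BDA node on the private network

if same_subnet24 "$edge_node_ip" "$bda_node_ip"; then
  echo "same /24 subnet - direct H2O node-to-node traffic should work"
else
  echo "different subnets - H2O cloud formation will likely fail"
fi
```

If moving onto a cluster node is not an option, newer Sparkling Water versions also expose network-mask settings (spark.ext.h2o.node.network.mask, in the versions I have seen) to pin H2O to a particular interface; check the documentation for your release.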
Hi, could you elaborate on what exactly you did to solve the issue?
Run the Sparkling Water software on the same subnet as your Spark cluster.