Missing Classpath Issue during Sqoop Import

Recently I ran into an issue when using Sqoop import, with the following error messages:

17/02/22 10:44:17 INFO mapreduce.Job: The url to track the job: http://quickstart.cloudera:8088/proxy/application_1487780389745_0001/
17/02/22 10:44:17 INFO mapreduce.Job: Running job: job_1487780389745_0001
17/02/22 10:44:27 INFO mapreduce.Job: Job job_1487780389745_0001 running in uber mode : false
17/02/22 10:44:27 INFO mapreduce.Job:  map 0% reduce 0%
17/02/22 10:44:38 INFO mapreduce.Job: Task Id : attempt_1487780389745_0001_m_000000_0, Status : FAILED
Error: java.lang.ClassNotFoundException: org.apache.commons.lang3.StringUtils
	at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
	at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
	at java.security.AccessController.doPrivileged(Native Method)
	at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
	at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
	at org.apache.sqoop.manager.oracle.OracleUtils.escapeIdentifier(OracleUtils.java:36)
	at org.apache.sqoop.manager.oracle.OraOopOracleQueries.getTableColumns(OraOopOracleQueries.java:683)
	at org.apache.sqoop.manager.oracle.OraOopOracleQueries.getTableColumns(OraOopOracleQueries.java:767)
	at org.apache.sqoop.manager.oracle.OraOopDBRecordReader.getSelectQuery(OraOopDBRecordReader.java:195)
	at org.apache.sqoop.mapreduce.db.DBRecordReader.nextKeyValue(DBRecordReader.java:235)
	at org.apache.sqoop.manager.oracle.OraOopDBRecordReader.nextKeyValue(OraOopDBRecordReader.java:356)
	at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.nextKeyValue(MapTask.java:556)
	at org.apache.hadoop.mapreduce.task.MapContextImpl.nextKeyValue(MapContextImpl.java:80)
	at org.apache.hadoop.mapreduce.lib.map.WrappedMapper$Context.nextKeyValue(WrappedMapper.java:91)
	at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
	at org.apache.sqoop.mapreduce.AutoProgressMapper.run(AutoProgressMapper.java:64)
	at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:787)
	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
	at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:415)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1693)
	at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)

I have used Sqoop import many times and had never run into this issue, which made me wonder what was going on.
I checked the CDH version. It is 5.8.0, whereas I had mostly used CDH 5.7 before.

[root@quickstart lib]# hadoop version
Hadoop 2.6.0-cdh5.8.0
Subversion http://github.com/cloudera/hadoop -r 57e7b8556919574d517e874abfb7ebe31a366c2b
Compiled by jenkins on 2016-06-16T19:38Z
Compiled with protoc 2.5.0
From source with checksum 9e99ecd28376acfd5f78c325dd939fed
This command was run using /usr/lib/hadoop/hadoop-common-2.6.0-cdh5.8.0.jar

Next I checked the Sqoop version, which is 1.4.6:

[root@quickstart lib]# sqoop version
Warning: /usr/lib/sqoop/../accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
17/02/22 13:37:28 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6-cdh5.8.0
Sqoop 1.4.6-cdh5.8.0
git commit id 
Compiled by jenkins on Thu Jun 16 12:25:21 PDT 2016

The Cloudera Quickstart VM uses CDH 5.8.0, and this was my first time touching CDH 5.8, so the problem might be related to CDH 5.8. I did some research and found that someone had hit exactly the same issue. It is a Sqoop bug, SQOOP-2999: Sqoop ClassNotFoundException (org.apache.commons.lang3.StringUtils) is thrown when executing Oracle direct import map task.

It looks like the new version of Sqoop in CDH 5.8.0 uses a new class, OracleUtils, that depends on org.apache.commons.lang3.StringUtils. The jar that contains that class is not on the classpath that Sqoop passes to the mappers, so the exception is thrown at runtime. Unfortunately, the fix only arrives in CDH 5.8.2, as indicated in Cloudera’s release notes, where SQOOP-2999 is the first Sqoop bug listed.
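To check whether a given cluster falls in the affected range, the version string from `hadoop version` or `sqoop version` can be compared against the release that carries the fix. The sketch below is only an illustration: the function names are hypothetical, the window (5.8.0 up to but not including 5.8.2) is an assumption based on what is described in this post, and `sort -V` requires GNU coreutils.

```shell
# Extract the CDH version (e.g. 5.8.0) from a string such as
# "Hadoop 2.6.0-cdh5.8.0" or "Sqoop 1.4.6-cdh5.8.0".
cdh_version() {
  printf '%s\n' "$1" |
    sed -n 's/.*cdh\([0-9][0-9]*\.[0-9][0-9]*\.[0-9][0-9]*\).*/\1/p' |
    head -n 1
}

# Succeed if version $1 is strictly lower than version $2 (GNU sort -V).
version_lt() {
  [ "$1" != "$2" ] &&
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$1" ]
}

# Succeed if the version string falls in the window assumed to be affected
# by SQOOP-2999: at least 5.8.0 and lower than 5.8.2 (assumption from this post).
affected_by_sqoop_2999() {
  v=$(cdh_version "$1")
  [ -n "$v" ] && ! version_lt "$v" 5.8.0 && version_lt "$v" 5.8.2
}

# Example (not run here):
#   affected_by_sqoop_2999 "$(hadoop version | head -n 1)" && echo "workaround needed"
```

For the version shown above, `Hadoop 2.6.0-cdh5.8.0`, the check succeeds, which matches the failure seen in the import job.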

At the moment, I don’t want to go through upgrading or patching CDH. I looked for a workaround on the internet but could not find anything related to this issue, so I fixed it my own way.

First I needed to find out which jar file contains this class. One post discussing the class points to commons-lang3-3.1.jar. Luckily, I do have a commons-lang3-3.1.jar file on the system.

[root@quickstart lib]# locate commons-lang3-3.1.jar
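If no post had named the jar, another way to find it would be to scan the jars on disk for the class file itself. The following is a minimal sketch, assuming `unzip` is installed (`jar tf` would work similarly); the function names are hypothetical and not part of any Sqoop or Hadoop tooling.

```shell
# Convert a fully qualified class name to its entry path inside a jar,
# e.g. org.apache.commons.lang3.StringUtils
#   -> org/apache/commons/lang3/StringUtils.class
class_to_entry() {
  printf '%s.class\n' "$(printf '%s' "$1" | tr '.' '/')"
}

# Print every jar under a directory whose listing contains the class entry.
find_jars_with_class() {
  class_name=$1
  search_dir=$2
  entry=$(class_to_entry "$class_name")
  find "$search_dir" -type f -name '*.jar' 2>/dev/null |
    while IFS= read -r jar; do
      # unzip -l lists the archive contents without extracting anything
      if unzip -l "$jar" 2>/dev/null | grep -q -F "$entry"; then
        printf '%s\n' "$jar"
      fi
    done
}

# Example (not run here; /usr/lib is just a plausible place to search):
#   find_jars_with_class org.apache.commons.lang3.StringUtils /usr/lib
```

Scanning a large directory tree this way is slow, so `locate` remains the quicker option when the jar name is already known.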

Next I needed to figure out how to add this jar to the classpath for Sqoop. Most Hadoop environment variable settings can be found in the /etc/default directory, and I did find a file there called sqoop2-server.

[root@quickstart default]# cat sqoop2-server
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements.  See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License.  You may obtain a copy of the License at
#     http://www.apache.org/licenses/LICENSE-2.0
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# See the License for the specific language governing permissions and
# limitations under the License.


I uncommented the AUX_CLASSPATH= line and changed it to point to the jar.
After restarting the Sqoop service, no luck: the import still failed.

I could change the /etc/sqoop2/conf/setenv.sh file and add this jar to CLASSPATH, but CLASSPATH is not used in that script. I also don’t want to change something by hand and completely forget about it later. So I tried adding the jar to the classpath through Cloudera Manager instead. The tricky question is which parameter to add it to.

As Sqoop jobs run on YARN, I tried adding the jar to the YARN Application Classpath setting (yarn.application.classpath).
The result is shown below.
After the change, I redeployed the client configuration and bounced the CDH cluster. I reran the Sqoop job and it completed without error. Please note that this is just a temporary workaround. The right approach is still to upgrade to a newer version of CDH.
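Before rerunning the job, it can help to confirm that the jar really appears in the classpath value that was saved. Here is a small sketch of such a check. The function name is hypothetical, the classpath value is expected to be pasted in by hand from the Cloudera Manager field, and the jar path in the example comment is a placeholder rather than the actual location on my VM.

```shell
# Succeed if a comma- or colon-separated classpath value lists the given
# jar file explicitly. Wildcard entries such as /some/dir/* are not expanded.
classpath_has_jar() {
  jar_name=$2
  printf '%s\n' "$1" | tr ',:' '\n\n' |
    while read -r entry; do
      # read without IFS= trims the spaces that often follow commas
      case "$entry" in
        */"$jar_name"|"$jar_name") echo found; break ;;
      esac
    done | grep -q found
}

# Example (placeholder value, not the real setting from this cluster):
#   classpath_has_jar '$HADOOP_COMMON_HOME/*, /path/to/commons-lang3-3.1.jar' \
#     commons-lang3-3.1.jar && echo "jar is listed"
```

The check only looks at the text of the setting. It does not prove that the file exists on every node, which still has to be verified separately.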