Setting external jars to hadoop classpath
Asked by mnm on 2014-11-05T10:35:36

I am trying to add external jars to the Hadoop classpath, but no luck so far.

I have the following setup

$ hadoop version
Hadoop 2.0.6-alpha
Subversion https://git-wip-us.apache.org/repos/asf/bigtop.git -r ca4c88898f95aaab3fd85b5e9c194ffd647c2109
Compiled by jenkins on 2013-10-31T07:55Z
From source with checksum 95e88b2a9589fa69d6d5c1dbd48d4e
This command was run using /usr/lib/hadoop/hadoop-common-2.0.6-alpha.jar

Classpath

$ echo $HADOOP_CLASSPATH
/home/tom/workspace/libs/opencsv-2.3.jar

I can see that the above HADOOP_CLASSPATH has been picked up by hadoop:

$ hadoop classpath
/etc/hadoop/conf:/usr/lib/hadoop/lib/*:/usr/lib/hadoop/.//*:/home/tom/workspace/libs/opencsv-2.3.jar:/usr/lib/hadoop-hdfs/./:/usr/lib/hadoop-hdfs/lib/*:/usr/lib/hadoop-hdfs/.//*:/usr/lib/hadoop-yarn/lib/*:/usr/lib/hadoop-yarn/.//*:/usr/lib/hadoop-mapreduce/lib/*:/usr/lib/hadoop-mapreduce/.//*

Command

$ sudo hadoop jar FlightsByCarrier.jar FlightsByCarrier /user/root/1987.csv /user/root/result

I tried the -libjars option as well:

$ sudo hadoop jar FlightsByCarrier.jar FlightsByCarrier /user/root/1987.csv /user/root/result -libjars /home/tom/workspace/libs/opencsv-2.3.jar
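One detail worth checking in the command above: `-libjars` is a generic option consumed by GenericOptionsParser (via ToolRunner), and generic options are only recognized when they come before the positional application arguments. Placed last, as here, `-libjars` is passed through to the application as just another argument. As a rough illustration of that ordering rule (this is a simplified stand-in written for this answer, not Hadoop's actual parser), a parser in that style stops consuming options at the first positional token:

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Simplified stand-in for a generic-options parser: it consumes leading
// "-name value" pairs and stops at the first positional argument.
// Anything after that point is left for the application, which is why a
// trailing -libjars never reaches the framework.
public class GenericOptionsDemo {
    final Map<String, String> genericOptions = new LinkedHashMap<>();
    final List<String> remainingArgs = new ArrayList<>();

    GenericOptionsDemo(String[] args) {
        int i = 0;
        // Consume options only while they lead the argument list.
        while (i + 1 < args.length && args[i].startsWith("-")) {
            genericOptions.put(args[i], args[i + 1]);
            i += 2;
        }
        // Everything else would be handed to the application's run(String[]).
        for (; i < args.length; i++) {
            remainingArgs.add(args[i]);
        }
    }

    public static void main(String[] args) {
        // -libjars placed first: recognized as a generic option.
        GenericOptionsDemo good = new GenericOptionsDemo(new String[] {
            "-libjars", "/home/tom/workspace/libs/opencsv-2.3.jar",
            "/user/root/1987.csv", "/user/root/result"});
        System.out.println("first: " + good.genericOptions.keySet());

        // -libjars placed last: treated as a plain application argument.
        GenericOptionsDemo bad = new GenericOptionsDemo(new String[] {
            "/user/root/1987.csv", "/user/root/result",
            "-libjars", "/home/tom/workspace/libs/opencsv-2.3.jar"});
        System.out.println("last: " + bad.genericOptions.keySet());
    }
}
```

So the invocation would need the form `hadoop jar FlightsByCarrier.jar FlightsByCarrier -libjars /home/tom/workspace/libs/opencsv-2.3.jar /user/root/1987.csv /user/root/result`, and the driver must run through ToolRunner for the option to take effect at all.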

The stacktrace

14/11/04 16:43:23 INFO mapreduce.Job: Running job: job_1415115532989_0001
14/11/04 16:43:55 INFO mapreduce.Job: Job job_1415115532989_0001 running in uber mode : false
14/11/04 16:43:56 INFO mapreduce.Job:  map 0% reduce 0%
14/11/04 16:45:27 INFO mapreduce.Job:  map 50% reduce 0%
14/11/04 16:45:27 INFO mapreduce.Job: Task Id : attempt_1415115532989_0001_m_000001_0, Status : FAILED
Error: java.lang.ClassNotFoundException: au.com.bytecode.opencsv.CSVParser
    at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
    at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
    at FlightsByCarrierMapper.map(FlightsByCarrierMapper.java:19)
    at FlightsByCarrierMapper.map(FlightsByCarrierMapper.java:10)
    at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:757)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:339)
    at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:158)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1478)
    at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:153)

Any help is highly appreciated.

Copyright notice: content author 「mnm」, reproduced under the CC 4.0 BY-SA license with a link to the original source and this disclaimer.
Link to original article: https://stackoverflow.com/questions/26748811/setting-external-jars-to-hadoop-classpath

Answers
blackSmith 2014-11-05T05:54:56

Your external jar is missing on the nodes that run the map tasks. You have to add it to the distributed cache to make it available there. Try:

    DistributedCache.addFileToClassPath(new Path("pathToJar"), conf);

Not sure in which version DistributedCache was deprecated, but from Hadoop 2.2.0 onward you can use:

    job.addFileToClassPath(new Path("pathToJar"));


Isaiah4110 2016-03-27T00:20:55

If you are adding the external JAR to the Hadoop classpath, it is better to copy your JAR into one of the existing directories that hadoop already looks at. Run the command "hadoop classpath" on the command line, find a suitable folder, and copy your jar file to that location; hadoop will pick up the dependency from there. This won't work with Cloudera etc., as you may not have read/write rights to copy files into the hadoop classpath folders.

Looks like you tried the -libjars option as well. Did you edit your driver class to implement the Tool interface? First make sure your driver class looks like this:

    public class myDriverClass extends Configured implements Tool {

      public static void main(String[] args) throws Exception {
        int res = ToolRunner.run(new Configuration(), new myDriverClass(), args);
        System.exit(res);
      }

      public int run(String[] args) throws Exception {
        // Configuration processed by ToolRunner
        Configuration conf = getConf();
        Job job = new Job(conf, "My Job");

        ...

        return job.waitForCompletion(true) ? 0 : 1;
      }
    }

Now edit your "hadoop jar" command as shown below (note that -libjars, like all generic options, must come before the application arguments):

    hadoop jar YourApplication.jar myDriverClass -libjars path/to/jar/file args

Now let's understand what happens underneath. Basically we are handling the new command-line arguments by implementing the Tool interface. ToolRunner is used to run classes implementing the Tool interface. It works in conjunction with GenericOptionsParser to parse the generic hadoop command-line arguments and modify the Configuration of the Tool.

Within our main() we call ToolRunner.run(new Configuration(), new myDriverClass(), args); this runs the given Tool by Tool.run(String[]) after parsing the given generic arguments. It uses the given Configuration, or builds one if it is null, and then sets the Tool's configuration to the possibly modified version of the conf.

Now, within the run method, when we call getConf() we get that modified version of the Configuration. So make sure you have the line below in your code. If you implement everything else but still use Configuration conf = new Configuration(), nothing will work.

    Configuration conf = getConf();


nitinm 2016-06-02T10:54:06

I tried setting the opencsv jar in the hadoop classpath but it didn't work. We need to explicitly copy the jar into the classpath directories for this to work. It worked for me. Below are the steps I followed, on an HDP cluster: I copied my opencsv jar into the HBase libs and exported the classpath before running my jar.

To run the OpenCSV jar:

1. Copy the opencsv jar into the directories /usr/hdp/2.2.9.1-11/hbase/lib/ and /usr/hdp/2.2.9.1-11/hadoop-yarn/lib:

       sudo cp /home/sshuser/Amedisys/lib/opencsv-3.7.jar /usr/hdp/2.2.9.1-11/hbase/lib/

2. Give the file permissions using:

       sudo chmod 777 opencsv-3.7.jar

3. List the files:

       ls -lrt

4. Export the hadoop classpath:

       export HADOOP_CLASSPATH=$(hbase classpath)

5. Now run your jar. It will pick up the opencsv jar and execute properly.


mnm 2014-11-05T15:33:32

I found another workaround by implementing ToolRunner as below. With this approach hadoop accepts command-line options, so we can avoid hard-coding the files added to the DistributedCache.

    public class FlightsByCarrier extends Configured implements Tool {

      public int run(String[] args) throws Exception {
        // Configuration processed by ToolRunner
        Configuration conf = getConf();

        // Create a JobConf using the processed conf
        JobConf job = new JobConf(conf, FlightsByCarrier.class);

        // Process custom command-line options
        Path in = new Path(args[1]);
        Path out = new Path(args[2]);

        // Specify various job-specific parameters
        job.setJobName("my-app");
        job.setInputPath(in);
        job.setOutputPath(out);
        job.setMapperClass(MyMapper.class);
        job.setReducerClass(MyReducer.class);

        // Submit the job, then poll for progress until the job is complete
        JobClient.runJob(job);
        return 0;
      }

      public static void main(String[] args) throws Exception {
        // Let ToolRunner handle generic command-line options
        int res = ToolRunner.run(new Configuration(), new FlightsByCarrier(), args);

        System.exit(res);
      }
    }

