Error launching Hadoop Streaming job: Not a file:


Asked by Rajat Khirid on 2022-08-27T01:42:23

I am using AWS EMR-6.5.0 with Hadoop-3.2.1, and I'm following this guide to launch the streaming job: https://levelup.gitconnected.com/map-reduce-with-python-hadoop-on-aws-emr-341bdd07b804

When I run the command:

$ hadoop jar /usr/lib/hadoop/hadoop-streaming.jar -files mapper.py,reducer.py -mapper mapper.py -reducer reducer.py -input books-input -output books-output

I get the error:

ERROR streaming.StreamJob: Error Launching job : Not a file: hdfs://ip-172-31-55-89.ec2.internal/172.31.55.89:8032/user/hadoop/books-input/1340.txt
Streaming Command Failed!

Complete log:

2022-08-26 15:55:12,295 INFO client.RMProxy: Connecting to ResourceManager at ip-172-31-55-89.ec2.internal/172.31.55.89:8032
2022-08-26 15:55:12,592 INFO client.AHSProxy: Connecting to Application History server at ip-172-31-55-89.ec2.internal/172.31.55.89:8032
2022-08-26 15:55:12,653 INFO client.RMProxy: Connecting to ResourceManager at ip-172-31-55-89.ec2.internal/172.31.55.89:8032
2022-08-26 15:55:12,654 INFO client.AHSProxy: Connecting to Application History server at ip-172-31-55-89.ec2.internal/172.31.55.89:8032
2022-08-26 15:55:13,083 INFO mapreduce.JobResourceUploader: Disabling Erasure Coding for path: /tmp/hadoop-yarn/staging/hadoop/.staging/job_1661529292338_0001
2022-08-26 15:55:14,507 INFO lzo.GPLNativeCodeLoader: Loaded native gpl library
2022-08-26 15:55:14,518 INFO lzo.LzoCodec: Successfully loaded & initialized native-lzo library [hadoop-lzo rev 049362b7cf53ff5f739d6b1532457f2c6cd495e8]
2022-08-26 15:55:14,690 INFO mapred.FileInputFormat: Total input files to process : 49
2022-08-26 15:55:14,691 INFO mapreduce.JobSubmitter: Cleaning up the staging area /tmp/hadoop-yarn/staging/hadoop/.staging/job_1661529292338_0001
2022-08-26 15:55:14,769 ERROR streaming.StreamJob: Error Launching job : Not a file: hdfs:/ip-172-31-55-89.ec2.internal/172.31.55.89:8032/user/hadoop/books-input/1340.txt
Streaming Command Failed!

I don't understand why it reports "Not a file" for a .txt file.
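One possible cause, which the log above does not confirm: the input listing contains an entry named 1340.txt that is actually a directory in HDFS (for example, if the books were uploaded into per-book folders). As far as I know, the input format used by streaming rejects directories nested inside the input path unless recursive listing is switched on. The sketch below is only a diagnostic aid under that assumption. It runs `hdfs dfs -test -d` against the path from the error message and rebuilds the command from the question, adding the `mapreduce.input.fileinputformat.input.dir.recursive` property if the path turns out to be a directory. The helper names `is_hdfs_directory` and `streaming_command` are made up for this illustration.

```python
import subprocess

# Real Hadoop property; whether it resolves this particular job is an assumption.
RECURSIVE_FLAG = "mapreduce.input.fileinputformat.input.dir.recursive=true"


def is_hdfs_directory(path, run=subprocess.run):
    """Return True when `hdfs dfs -test -d PATH` exits with status 0."""
    result = run(["hdfs", "dfs", "-test", "-d", path])
    return result.returncode == 0


def streaming_command(input_dir, output_dir, recursive=False):
    """Build the streaming command from the question as an argument list."""
    cmd = ["hadoop", "jar", "/usr/lib/hadoop/hadoop-streaming.jar"]
    if recursive:
        # Generic options such as -D must come before streaming options.
        cmd += ["-D", RECURSIVE_FLAG]
    cmd += [
        "-files", "mapper.py,reducer.py",
        "-mapper", "mapper.py",
        "-reducer", "reducer.py",
        "-input", input_dir,
        "-output", output_dir,
    ]
    return cmd


if __name__ == "__main__":
    suspect = "books-input/1340.txt"  # path named in the error message
    nested = is_hdfs_directory(suspect)
    print(f"{suspect} is a directory: {nested}")
    # Print the command instead of running it, so it can be reviewed first.
    print(" ".join(streaming_command("books-input", "books-output", recursive=nested)))
```

Run it on the EMR master node as the hadoop user. If the check reports a directory, an alternative to the recursive flag is to re-upload the books so that the .txt files sit directly under books-input.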

Copyright Notice: Content Author: Rajat Khirid. Reproduced under the CC BY-SA 4.0 license with a link to the original source and this disclaimer.
Link to original article: https://stackoverflow.com/questions/73504754/error-launching-hadoop-streaming-job-not-a-file
