I am using AWS EMR 6.5.0 with Hadoop 3.2.1.
I'm following this guide to launch the streaming job: https://levelup.gitconnected.com/map-reduce-with-python-hadoop-on-aws-emr-341bdd07b804
When I run the command:
$ hadoop jar /usr/lib/hadoop/hadoop-streaming.jar -files mapper.py,reducer.py -mapper mapper.py -reducer reducer.py -input books-input -output books-output
I get the error:
ERROR streaming.StreamJob: Error Launching job : Not a file: hdfs://ip-172-31-55-89.ec2.internal/172.31.55.89:8032/user/hadoop/books-input/1340.txt
Streaming Command Failed!
Complete log:
2022-08-26 15:55:12,295 INFO client.RMProxy: Connecting to ResourceManager at ip-172-31-55-89.ec2.internal/172.31.55.89:8032
2022-08-26 15:55:12,592 INFO client.AHSProxy: Connecting to Application History server at ip-172-31-55-89.ec2.internal/172.31.55.89:8032
2022-08-26 15:55:12,653 INFO client.RMProxy: Connecting to ResourceManager at ip-172-31-55-89.ec2.internal/172.31.55.89:8032
2022-08-26 15:55:12,654 INFO client.AHSProxy: Connecting to Application History server at ip-172-31-55-89.ec2.internal/172.31.55.89:8032
2022-08-26 15:55:13,083 INFO mapreduce.JobResourceUploader: Disabling Erasure Coding for path: /tmp/hadoop-yarn/staging/hadoop/.staging/job_1661529292338_0001
2022-08-26 15:55:14,507 INFO lzo.GPLNativeCodeLoader: Loaded native gpl library
2022-08-26 15:55:14,518 INFO lzo.LzoCodec: Successfully loaded & initialized native-lzo library [hadoop-lzo rev 049362b7cf53ff5f739d6b1532457f2c6cd495e8]
2022-08-26 15:55:14,690 INFO mapred.FileInputFormat: Total input files to process : 49
2022-08-26 15:55:14,691 INFO mapreduce.JobSubmitter: Cleaning up the staging area /tmp/hadoop-yarn/staging/hadoop/.staging/job_1661529292338_0001
2022-08-26 15:55:14,769 ERROR streaming.StreamJob: Error Launching job : Not a file: hdfs:/ip-172-31-55-89.ec2.internal/172.31.55.89:8032/user/hadoop/books-input/1340.txt
Streaming Command Failed!
I don't understand why it reports "Not a file" for what is a .txt file.
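For what it's worth, Hadoop streaming submits through the older `org.apache.hadoop.mapred.FileInputFormat`, which, as far as I can tell, raises "Not a file: <path>" when one of the entries it lists under the input path is a directory rather than a regular file. That would fit the log: it counts 49 inputs, then fails on `books-input/1340.txt`. One way to check is to run `hdfs dfs -ls books-input` and look for entries whose permission string starts with `d`. The helper below is a hypothetical sketch (the name `find_directories` and the sample listing are made up for illustration) that does that check on the captured listing text, assuming the usual eight-column `-ls` layout.

```python
from typing import List


def find_directories(ls_output: str) -> List[str]:
    """Return the paths that `hdfs dfs -ls` output marks as directories.

    Hypothetical helper: assumes the usual 8-column listing
    (permissions, replication, owner, group, size, date, time, path).
    """
    dirs = []
    for line in ls_output.splitlines():
        # maxsplit=7 keeps a path containing spaces in one field.
        fields = line.split(maxsplit=7)
        # Skip blank lines and the "Found N items" header.
        if len(fields) < 8:
            continue
        permissions, path = fields[0], fields[7]
        # A leading "d" in the permission string marks a directory.
        if permissions.startswith("d"):
            dirs.append(path)
    return dirs


if __name__ == "__main__":
    # Made-up sample listing, for illustration only.
    sample = """Found 2 items
-rw-r--r--   1 hadoop hdfsadmingroup     597843 2022-08-26 15:50 books-input/11.txt
drwxr-xr-x   - hadoop hdfsadmingroup          0 2022-08-26 15:50 books-input/1340.txt
"""
    print(find_directories(sample))  # ['books-input/1340.txt']
```

If `1340.txt` does turn out to be a directory, `hdfs dfs -ls -R books-input` should show what ended up inside it.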
Copyright notice: content author Rajat Khirid. Reproduced under the CC BY-SA 4.0 license, with a link to the original source and this disclaimer.
Link to original article: https://stackoverflow.com/questions/73504754/error-launching-hadoop-streaming-job-not-a-file