Hadoop - How to run another mapreduce job while one is running?
Asked by Rishabh Dixit on 2018-10-22T18:18:32

I already have a very time-consuming MapReduce job running on my cluster. When I submit another job, it gets stuck at the point below, which suggests it is waiting for the currently running job to complete:

hive> select distinct(circle) from vf_final_table_orc_format1;
Query ID = hduser_20181022153503_335ffd89-1528-49be-b091-21213d702a03
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks not specified. Estimated from input data size: 10
In order to change the average load for a reducer (in bytes):
set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
set mapreduce.job.reduces=<number>
Starting Job = job_1539782606189_0033, Tracking URL = http://secondary:8088/proxy/application_1539782606189_0033/
Kill Command = /home/hduser/hadoop/bin/hadoop job  -kill job_1539782606189_0033
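
For reference, I believe the queued application can be confirmed as simply waiting for cluster resources (ACCEPTED state) with the standard YARN CLI; a minimal sketch, assuming a default Hadoop 2.x client on the same box:

# The second job should show up in ACCEPTED state while the first one
# holds on to the cluster's containers.
yarn application -list -appStates ACCEPTED,RUNNING

# Memory and vcores actually reported by each NodeManager.
yarn node -list -all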

I am currently running a MapReduce job on 166 GB of data. My setup consists of 7 nodes: 5 DataNodes with 32 GB RAM and 8.7 TB of HDD each, plus 1 NameNode and 1 Secondary NameNode, each with 32 GB RAM and 1.1 TB of HDD.

What settings do I need to tweak in order to execute the jobs in parallel? I am currently using Hadoop 2.5.2.
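
From what I have read so far, one suspect is the Capacity Scheduler's limit on concurrent ApplicationMasters (yarn.scheduler.capacity.maximum-am-resource-percent, default 0.1), which can prevent a second job's AM from starting while the first job runs. Below is only a sketch of what I might put in capacity-scheduler.xml; the 0.5 value is my own assumption, not a tested setting:

<!-- capacity-scheduler.xml: allow ApplicationMasters to use up to 50% of the
     queue's resources so a second job's AM can launch while the first job is
     still running (the default is 0.1) -->
<property>
  <name>yarn.scheduler.capacity.maximum-am-resource-percent</name>
  <value>0.5</value>
</property>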

EDIT: Right now my cluster is consuming only 8-10 GB of RAM out of the 32 GB available per node. The other Hive queries and MR jobs are stuck, waiting for a single job to finish. How do I increase memory utilization so that more jobs can execute in parallel? Here is the current output of the ps command:

[hduser@secondary ~]$ ps -ef | grep -i runjar | grep -v grep
hduser   110398      1  0 Nov11 ?        00:07:15 /opt/jdk1.8.0_77//bin/java -Dproc_jar -Xmx1000m 
-Dhadoop.log.dir=/home/hduser/hadoop/logs -Dyarn.log.dir=/home/hduser/hadoop/logs 
-Dhadoop.log.file=yarn.log -Dyarn.log.file=yarn.log -Dyarn.home.dir= 
-Dyarn.id.str= -Dhadoop.root.logger=INFO,console -Dyarn.root.logger=INFO,console -Dyarn.policy.file=hadoop-policy.xml
-Dhadoop.log.dir=/home/hduser/hadoop/logs -Dyarn.log.dir=/home/hduser/hadoop/logs 
-Dhadoop.log.file=yarn.log -Dyarn.log.file=yarn.log 
-Dyarn.home.dir=/home/hduser/hadoop -Dhadoop.home.dir=/home/hduser/hadoop 
-Dhadoop.root.logger=INFO,console 
-Dyarn.root.logger=INFO,console 
-classpath /home/hduser/hadoop/etc/hadoop:/home/hduser/hadoop/etc/hadoop:/home/hduser/hadoop/etc/hadoop:/home/hduser/hadoop/share/hadoop/common/lib/*:/home/hduser/hadoop/share/hadoop/common/*:/home/hduser/hadoop/share/hadoop/hdfs:/home/hduser/hadoop/share/hadoop/hdfs/lib/*:/home/hduser/hadoop/share/hadoop/hdfs/*:/home/hduser/hadoop/share/hadoop/yarn/lib/*:/home/hduser/hadoop/share/hadoop/yarn/*:/home/hduser/hadoop/share/hadoop/mapreduce/lib/*:/home/hduser/hadoop/share/hadoop/mapreduce/*:/home/hduser/hadoop/contrib/capacity-scheduler/*.jar:/home/hduser/hadoop/share/hadoop/yarn/*:/home/hduser/hadoop/share/hadoop/yarn/lib/* 
org.apache.hadoop.util.RunJar abc.jar def.mydriver2 /raw_data /mr_output/
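
From what I have read, the per-NodeManager memory (yarn.nodemanager.resource.memory-mb) defaults to 8192 MB, which would be consistent with only 8-10 GB being used per node. The snippet below is only a sketch of the yarn-site.xml properties I suspect need raising; the values are my assumptions for 32 GB nodes, not tested settings:

<!-- yarn-site.xml on each NodeManager: illustrative values, assuming roughly
     24 GB of each 32 GB node can be handed to YARN containers -->
<property>
  <name>yarn.nodemanager.resource.memory-mb</name>
  <value>24576</value>
</property>
<property>
  <name>yarn.nodemanager.resource.cpu-vcores</name>
  <value>8</value>
</property>
<!-- cap a single container so one job cannot occupy a whole node by itself -->
<property>
  <name>yarn.scheduler.maximum-allocation-mb</name>
  <value>8192</value>
</property>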

Copyright Notice: Content author: 「Rishabh Dixit」, reproduced under the CC 4.0 BY-SA copyright license with a link to the original source and this disclaimer.
Link to original article:https://stackoverflow.com/questions/52926999/hadoop-how-to-run-another-mapreduce-job-while-one-is-running
