Hadoop: count frequency and then set variable in second map/reduce
NickName:Sal Ask DateTime:2015-05-27T21:46:03

In a single Hadoop program I have three Map/Reduce jobs. The first one should count the total number of words in my data set. The second and third Map/Reduce do something else depending on the number from the first Map/Reduce. Is there a way to set the output of the first Map/Reduce to a global variable to be used throughout the rest of the program?

My first thought was to have the first Reduce step write the number as an output and then have the second Mapper read this file, but I would rather not do this.

Copyright Notice: Content author: Sal. Reproduced under the CC BY-SA 4.0 license with a link to the original source and this disclaimer.
Link to original article: https://stackoverflow.com/questions/30484383/hadoop-count-frequency-and-then-set-variable-in-second-map-reduce

Answers
gwgyk 2015-05-28T03:31:33

Can you combine these three jobs into one job? Then you could define a global variable to keep the number.
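
If the three jobs have to stay separate, keep in mind that a static field in the driver class is not shared with map or reduce tasks, because they run in separate JVMs. A "global variable" therefore has to reach the tasks some other way. One common pattern, sketched below under assumptions, is to accumulate the total in a Hadoop counter during the first job, read it in the driver once that job finishes, and copy it into the `Configuration` of the later jobs. The class names, the `WordStats` enum, the `example.total.words` key, and the whitespace tokenization are all hypothetical choices for this sketch, and the second mapper's logic is only a placeholder. A third job could be set up the same way as the second.

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.NullOutputFormat;

public class WordTotalDriver {

    // Hypothetical names, chosen for this sketch only.
    public enum WordStats { TOTAL_WORDS }
    public static final String TOTAL_WORDS_KEY = "example.total.words";

    // Job 1 mapper: adds the number of whitespace-separated tokens per line
    // to a counter and emits no records.
    public static class CountingMapper
            extends Mapper<LongWritable, Text, NullWritable, NullWritable> {
        @Override
        protected void map(LongWritable key, Text value, Context context) {
            String line = value.toString().trim();
            if (!line.isEmpty()) {
                context.getCounter(WordStats.TOTAL_WORDS)
                       .increment(line.split("\\s+").length);
            }
        }
    }

    // Job 2 mapper: reads the total from the job configuration in setup().
    public static class UsesTotalMapper
            extends Mapper<LongWritable, Text, Text, LongWritable> {
        private long totalWords;

        @Override
        protected void setup(Context context) {
            totalWords = context.getConfiguration().getLong(TOTAL_WORDS_KEY, 0L);
        }

        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            // Placeholder logic: emit each line together with the total.
            context.write(value, new LongWritable(totalWords));
        }
    }

    public static void main(String[] args) throws Exception {
        Path input = new Path(args[0]);
        Path output = new Path(args[1]);

        // Job 1: map-only, no output files; the result lives in the counter.
        Job countJob = Job.getInstance(new Configuration(), "count words");
        countJob.setJarByClass(WordTotalDriver.class);
        countJob.setMapperClass(CountingMapper.class);
        countJob.setNumReduceTasks(0);
        countJob.setOutputFormatClass(NullOutputFormat.class);
        FileInputFormat.addInputPath(countJob, input);
        if (!countJob.waitForCompletion(true)) {
            System.exit(1);
        }

        // Counters are aggregated across all tasks once the job has completed.
        long total = countJob.getCounters()
                             .findCounter(WordStats.TOTAL_WORDS)
                             .getValue();

        // Set the value BEFORE creating the job: Job.getInstance copies the conf.
        Configuration conf = new Configuration();
        conf.setLong(TOTAL_WORDS_KEY, total);

        Job secondJob = Job.getInstance(conf, "use total");
        secondJob.setJarByClass(WordTotalDriver.class);
        secondJob.setMapperClass(UsesTotalMapper.class);
        secondJob.setOutputKeyClass(Text.class);
        secondJob.setOutputValueClass(LongWritable.class);
        FileInputFormat.addInputPath(secondJob, input);
        FileOutputFormat.setOutputPath(secondJob, output);
        System.exit(secondJob.waitForCompletion(true) ? 0 : 1);
    }
}
```

This avoids having the second mapper read an output file from the first reducer, which the question wanted to avoid. The configuration route suits small scalar values like a single count; larger side data would normally still go through files.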

