hadoop reducer output was read in reducer iteratively
NickName:Suanmeiguo Ask DateTime:2013-09-17T00:18:20


I am testing the word count example on a 3-machine cluster. My code is the same as this example, except for the following:

I added two lines to the reducer, just before the "output.collect(key, new IntWritable(sum))" line:

System.out.println(key);
key.set(key + " - Key in Reducer");

Then I checked my reducer log (the last 8 KB) and found this:

3M3WI - Key in Reducer - Key in Reducer
3M3WIG - Key in Reducer - Key in Reducer
3M3WL - Key in Reducer - Key in Reducer
3M3WNWPLG - Key in Reducer - Key in Reducer
3M3WQ - Key in Reducer - Key in Reducer
3M3WQNG.K78QJ0WN, - Key in Reducer - Key in Reducer
3M3WWR - Key in Reducer - Key in Reducer
3M3WX - Key in Reducer - Key in Reducer
3M3X - Key in Reducer - Key in Reducer
3M3X,. - Key in Reducer - Key in Reducer
3M3X.KZA8J - Key in Reducer - Key in Reducer
3M3X1 - Key in Reducer - Key in Reducer
3M3X8RC - Key in Reducer - Key in Reducer
3M3XC - Key in Reducer - Key in Reducer
3M3XCBD9R337PK - Key in Reducer - Key in Reducer
3M3XD - Key in Reducer - Key in Reducer
3M3XLW - Key in Reducer - Key in Reducer
3M3XML - Key in Reducer - Key in Reducer
3M3XN - Key in Reducer - Key in Reducer
3M3XU - Key in Reducer - Key in Reducer
3M3XX - Key in Reducer - Key in Reducer
3M3XZ - Key in Reducer - Key in Reducer
3M3Y - Key in Reducer - Key in Reducer
3M3YAIJL - Key in Reducer - Key in Reducer

This means my reducer's output was fed back into the reducer as input. That shouldn't be how Hadoop works, right? It shouldn't be iterative... and my code is the same as the example on the hadoop.apache.org website.

Has anyone encountered the same issue?

My full code is attached below; it is mostly identical to the example.

package test;

import java.io.IOException;
import java.util.*;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.conf.*;
import org.apache.hadoop.io.*;
import org.apache.hadoop.mapred.*;
import org.apache.hadoop.util.*;

public class WordCount {

    public static class Map extends MapReduceBase implements Mapper<LongWritable, Text, Text, IntWritable> {
      private final static IntWritable one = new IntWritable(1);
      private Text word = new Text();

      public void map(LongWritable key, Text value, OutputCollector<Text, IntWritable> output, Reporter reporter) throws IOException {
        String line = value.toString();
        StringTokenizer tokenizer = new StringTokenizer(line);
        while (tokenizer.hasMoreTokens()) {
          word.set(tokenizer.nextToken());
          output.collect(word, one);
        }
      }
    }

    public static class Reduce extends MapReduceBase implements Reducer<Text, IntWritable, Text, IntWritable> {
      public void reduce(Text key, Iterator<IntWritable> values, OutputCollector<Text, IntWritable> output, Reporter reporter) throws IOException {
        int sum = 0;
        while (values.hasNext()) {
          sum += values.next().get();
        }
        System.out.println(key);
        key.set(key+" - Key in Reducer");
        output.collect(key, new IntWritable(sum));
      }
    }

    public static void main(String[] args) throws Exception {
      JobConf conf = new JobConf(WordCount.class);
      conf.setJobName("wordcount");

      conf.setOutputKeyClass(Text.class);
      conf.setOutputValueClass(IntWritable.class);

      conf.setMapperClass(Map.class);
      conf.setCombinerClass(Reduce.class);
      conf.setReducerClass(Reduce.class);

      conf.setInputFormat(TextInputFormat.class);
      conf.setOutputFormat(TextOutputFormat.class);

      FileInputFormat.setInputPaths(conf, new Path(args[0]));
      FileOutputFormat.setOutputPath(conf, new Path(args[1]));

      JobClient.runJob(conf);
    }
}
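For what it's worth, one detail in the job setup that may be relevant: conf.setCombinerClass(Reduce.class) registers the same Reduce class as the combiner, so the key-mutating reduce logic can run twice per key (once as the combiner on map output, once as the real reducer), which would produce exactly the doubled " - Key in Reducer" suffix seen in the log. A minimal, Hadoop-free sketch of that double application (class and method names are made up for illustration):

```java
public class CombinerEffect {

    // Mimics what the reducer does to the key: append the suffix once per pass.
    static String reducePass(String key) {
        return key + " - Key in Reducer";
    }

    public static void main(String[] args) {
        String key = "3M3WI";
        // Pass 1: the Reduce class runs as the combiner on the map output.
        String afterCombiner = reducePass(key);
        // Pass 2: the reducer then receives the combiner's already-mutated key.
        String afterReducer = reducePass(afterCombiner);
        System.out.println(afterReducer); // 3M3WI - Key in Reducer - Key in Reducer
    }
}
```

Under this reading, the job is not iterating over its own output; the same reduce code is simply applied in two separate phases of a single job run.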

Copyright notice: content by 「Suanmeiguo」, reproduced under the CC BY-SA 4.0 license with a link to the original source and this disclaimer.
Link to original article: https://stackoverflow.com/questions/18832776/hadoop-reducer-output-was-read-in-reducer-iteratively
