Split reduced data into output and new input in Hadoop

Asked by: Mennny, 2013-01-14T00:15:47

I've been looking around for days for a way to use reduced data for further mapping in Hadoop. I have objects of class A as input data and objects of class B as output data. The problem is that the map step generates not only Bs but new As as well.

Here's what I'd like to achieve:

1.1 input: a list of As
1.2 map result: for each A a list of new As and a list of Bs is generated
1.3 reduce: filtered Bs are saved as output, filtered As are added to the map jobs

2.1 input: a list of As produced by the first map/reduce
2.2 map result: for each A a list of new As and a list of Bs is generated
2.3 ...

3.1 ...

You should get the basic idea.
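
To make the intended data flow concrete, here is a minimal sketch in plain Java, without any Hadoop classes, that simulates the rounds listed above. Everything in it is an assumption made for illustration: A is modelled as an `Integer`, B as a `String`, the `map` rule (every A yields one B, and any A above 1 splits into two smaller As) is invented, and "filtering" in the reduce step is taken to mean dropping duplicates. It only shows the shape of the loop, not how the real classes behave.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

class IterativeSplit {

    /** Result of mapping one A: the newly generated As plus the generated Bs. */
    static final class MapResult {
        final List<Integer> newAs = new ArrayList<>();
        final List<String> bs = new ArrayList<>();
    }

    /** Hypothetical map rule: every A yields one B; an A above 1 splits into two smaller As. */
    static MapResult map(int a) {
        MapResult r = new MapResult();
        r.bs.add("B" + a);
        if (a > 1) {
            r.newAs.add(a / 2);
            r.newAs.add(a - a / 2);
        }
        return r;
    }

    /** Runs rounds until no As are left or maxRounds is reached; returns the collected Bs. */
    static List<String> run(List<Integer> initialAs, int maxRounds) {
        Set<String> output = new LinkedHashSet<>();          // Bs saved as output (duplicates dropped)
        List<Integer> input = new ArrayList<>(initialAs);    // As for the current round
        for (int round = 0; round < maxRounds && !input.isEmpty(); round++) {
            Set<Integer> nextInput = new LinkedHashSet<>();  // As for the next round (duplicates dropped)
            for (int a : input) {
                MapResult r = map(a);
                nextInput.addAll(r.newAs); // one part becomes new input
                output.addAll(r.bs);       // the other part becomes output
            }
            input = new ArrayList<>(nextInput);
        }
        return new ArrayList<>(output);
    }

    public static void main(String[] args) {
        System.out.println(run(Arrays.asList(5), 10)); // prints [B5, B2, B3, B1]
    }
}
```

In an actual Hadoop setup the same shape would presumably be a driver that submits one job per round and feeds the As written by round n in as the input path of round n+1. Hadoop's `MultipleOutputs` class is one way to write reducer records to separate named outputs, but whether it fits depends on how the As and Bs are serialized.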

I've read a lot about chaining, but I'm not sure how to combine ChainReducer and ChainMapper, or even whether chaining is the right approach.

So here's my question: how can I split the mapped data during the reduce step, so that one part is saved as output and the other part becomes new input data?

Copyright Notice：Content Author:「Mennny」，Reproduced under the CC 4.0 BY-SA copyright license with a link to the original source and this disclaimer.
Link to original article：https://stackoverflow.com/questions/14305351/split-reduced-data-into-output-and-new-input-in-hadoop

