I've been looking around for days trying to find a way to use reduced data for further mapping in Hadoop. I've got objects of class A as input data and objects of class B as output data. The problem is that while mapping, not only Bs are generated but new As as well.
Here's what I'd like to achieve:
1.1 input: a list of As
1.2 map result: for each A a list of new As and a list of Bs is generated
1.3 reduce: filtered Bs are saved as output, filtered As are added to the map jobs
2.1 input: a list of As produced by the first map/reduce
2.2 map result: for each A a list of new As and a list of Bs is generated
2.3 ...
3.1 ...
You should get the basic idea.
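To make the intended data flow concrete, here is a minimal plain-Java sketch of the rounds listed above. It does not use Hadoop at all; it only simulates the loop in memory. All names (`IterativeSplit`, `MapResult`, `run`, `keepA`, `keepB`, `maxRounds`) are hypothetical and made up for illustration, and the toy mapper in `main` is just an assumption to have something to run.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;
import java.util.function.Predicate;

final class IterativeSplit {

    /** What one map call emits for a single A: a list of new As and a list of Bs. */
    static final class MapResult<A, B> {
        final List<A> newAs;
        final List<B> bs;

        MapResult(List<A> newAs, List<B> bs) {
            this.newAs = newAs;
            this.bs = bs;
        }
    }

    /**
     * Runs map/reduce rounds until no As are left or maxRounds is reached.
     * Filtered Bs are collected as output; filtered As become the next round's input.
     */
    static <A, B> List<B> run(List<A> initial,
                              Function<A, MapResult<A, B>> mapper,
                              Predicate<A> keepA,
                              Predicate<B> keepB,
                              int maxRounds) {
        List<B> output = new ArrayList<>();
        List<A> input = new ArrayList<>(initial);            // x.1 input: a list of As

        for (int round = 0; round < maxRounds && !input.isEmpty(); round++) {
            List<A> nextInput = new ArrayList<>();
            for (A a : input) {
                MapResult<A, B> result = mapper.apply(a);    // x.2 map: new As and Bs
                for (B b : result.bs) {
                    if (keepB.test(b)) output.add(b);        // x.3 reduce: Bs go to output
                }
                for (A next : result.newAs) {
                    if (keepA.test(next)) nextInput.add(next); // x.3 reduce: As feed next round
                }
            }
            input = nextInput;
        }
        return output;
    }

    public static void main(String[] args) {
        // Toy assumption: A = Integer, B = String; each A emits (A - 1) and "b<A>".
        List<String> out = IterativeSplit.<Integer, String>run(
                Arrays.asList(3),
                a -> new MapResult<Integer, String>(Arrays.asList(a - 1), Arrays.asList("b" + a)),
                a -> a > 0,   // drop As that reached zero
                b -> true,    // keep every B
                10);
        System.out.println(out);
    }
}
```

In Hadoop terms, I imagine each iteration of the outer loop would be one map/reduce job, with the reducer writing the Bs to the final output and the As to a location that the next job reads as its input. That split inside the reducer is the part I don't know how to do.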
I've read a lot about chaining but I'm not sure how to combine ChainReducer and ChainMapper or even if this would be the right approach.
So here's my question: how can I split the mapped data while reducing, so that one part is saved as output and the other part is used as new input data?
Copyright Notice:Content Author:「Mennny」,Reproduced under the CC 4.0 BY-SA copyright license with a link to the original source and this disclaimer.
Link to original article:https://stackoverflow.com/questions/14305351/split-reduced-data-into-output-and-new-input-in-hadoop