In a single Hadoop program I have three MapReduce jobs. The first counts the total number of words in my data set. The second and third jobs then behave differently depending on the count produced by the first. Is there a way to expose the first job's output as a global variable that the rest of the program can use?
My first thought was to have the first Reduce step write the number to an output file and then have the second Mapper read that file back in, but I would rather avoid this.
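For reference, a common alternative to the file-based workaround is Hadoop's Counter mechanism: the first job's reducer increments a custom counter, the driver reads it after that job completes, and passes the value to the later jobs through their `Configuration`. The sketch below is not from the question; the class name, counter group, and key names (`WordTotalDriver`, `"stats"`, `"totalWords"`) are illustrative assumptions, and the mapper/reducer classes are elided. It assumes a working Hadoop setup and is meant as an outline, not a drop-in implementation.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

public class WordTotalDriver {
    // Hypothetical counter group and name; the first job's reducer would call
    // context.getCounter("stats", "totalWords").increment(n) as it emits counts.
    public static final String COUNTER_GROUP = "stats";
    public static final String COUNTER_NAME  = "totalWords";

    public static void main(String[] args) throws Exception {
        // Job 1: the word-count job (mapper/reducer/paths configured as usual).
        Job countJob = Job.getInstance(new Configuration(), "count words");
        // ... set mapper, reducer, input/output paths here ...
        if (!countJob.waitForCompletion(true)) {
            System.exit(1);
        }

        // Driver-side: read the total back from the finished job's counters.
        long totalWords = countJob.getCounters()
                .findCounter(COUNTER_GROUP, COUNTER_NAME)
                .getValue();

        // Job 2: hand the number to every task through the Configuration.
        Configuration conf2 = new Configuration();
        conf2.setLong("totalWords", totalWords);
        Job secondJob = Job.getInstance(conf2, "second pass");
        // Inside the second Mapper, each task can then read it with:
        //   long total = context.getConfiguration().getLong("totalWords", 0L);
        // ... set up and run the second job, then the third, the same way ...
    }
}
```

Since the driver only continues after `waitForCompletion(true)` returns, the counter value is final and consistent when the later jobs are configured, which is effectively the "global variable" behavior the question asks about.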
Copyright notice: content by "Sal", reproduced under the CC BY-SA 4.0 license with a link to the original source and this disclaimer.
Link to original article: https://stackoverflow.com/questions/30484383/hadoop-count-frequency-and-then-set-variable-in-second-map-reduce