Answers: Run and Reduce methods in Reducer class

【Question Title】: Run and Reduce methods in Reducer class
【Posted】: 2014-09-11 07:26:45
【Question Description】:

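Can anyone explain the execution flow of the run() and reduce() methods in the Reducer class? I am trying to compute the average word count in a MapReduce job. My Reducer class receives the word as the key and an Iterable of its occurrences as the value.

My goal is to compute the average number of occurrences across all words in the document. Can the run() method in the reducer loop over all the keys and count all the words? I could then use that total to compute the average while iterating over the values Iterable supplied for each key.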

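    import java.io.IOException;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Reducer;

    public class AverageReducer extends Reducer<Text, IntWritable, Text, IntWritable> {

        private IntWritable average = new IntWritable();

        // Total meant to be filled in by run() before any reduce() call
        private static int count = 0;

        // Overriding run() replaces the framework loop that calls reduce() for each key,
        // so with this empty body reduce() is never invoked.
        @Override
        public void run(Context context) throws IOException, InterruptedException {
            // loop through all the keys and increment count
        }

        @Override
        public void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable val : values) {
                sum = sum + val.get();
            }
            average.set(sum / count);   // integer division; throws if count is still 0
            context.write(key, average);
        }
    }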
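The reducer in the question divides each word's `sum` by a `count` that covers every word, so that total has to be known before the first key is reduced. The minimal plain-Java sketch below (no Hadoop involved; `TwoPassSketch` and `relativeFrequencies` are hypothetical names) shows the two passes this implies, assuming `count` means the total number of word occurrences. It also uses floating-point division, because `sum / count` on two `int`s truncates to 0 for any word whose count is below the total.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class TwoPassSketch {

    /**
     * Hypothetical helper: divides each word's count by the total of all counts.
     * The total must be complete before any word can be divided by it, so the
     * data is read twice.
     */
    static Map<String, Double> relativeFrequencies(Map<String, Integer> wordCounts) {
        long total = 0;
        for (int c : wordCounts.values()) {   // pass 1: total over all words
            total += c;
        }

        Map<String, Double> result = new LinkedHashMap<>();
        if (total == 0) {
            return result;                    // nothing to divide; avoids division by zero
        }
        for (Map.Entry<String, Integer> e : wordCounts.entrySet()) {   // pass 2: per-word value
            // Floating-point division; int / int would truncate to 0 for any word below the total.
            result.put(e.getKey(), e.getValue() / (double) total);
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        counts.put("hadoop", 3);
        counts.put("reduce", 1);
        System.out.println(relativeFrequencies(counts)); // {hadoop=0.75, reduce=0.25}
    }
}
```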

【Discussion】:

Tags: hadoop mapreduce

【Solution 1】:

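As explained here, you cannot iterate over the `values` Iterable twice. I also think overriding the run method is a bad idea: all it does is iterate over the keys and call the reduce method for each key/values pair (source). So you cannot compute the average of word occurrences with a single MapReduce job.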
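The control flow described above can be mimicked in plain Java, without Hadoop. `RunFlowSketch`, `MiniReducer` and `OnceIterable` are hypothetical stand-ins. The real `Reducer.run(Context)` calls `setup`, then calls `reduce` once per key while `context.nextKey()` returns true, and finally calls `cleanup`. In this sketch, `OnceIterable` throws on a second traversal so that the single-pass rule is visible. Hadoop's own Iterable behaves differently (it does not necessarily throw), but it likewise cannot be replayed without extra buffering.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class RunFlowSketch {

    /** Hypothetical Iterable that, like the values passed to reduce(), supports only one pass. */
    static final class OnceIterable<T> implements Iterable<T> {
        private final List<T> items;
        private boolean used = false;

        OnceIterable(List<T> items) {
            this.items = items;
        }

        @Override
        public Iterator<T> iterator() {
            if (used) {
                throw new IllegalStateException("values can only be iterated once");
            }
            used = true;
            return items.iterator();
        }
    }

    /** Hypothetical stand-in for Reducer: run() drives setup, one reduce() per key, cleanup. */
    static class MiniReducer {
        final List<String> output = new ArrayList<>();

        protected void setup() { }

        protected void reduce(String key, Iterable<Integer> values) {
            int sum = 0;
            for (int v : values) {
                sum += v;
            }
            output.add(key + "=" + sum);
        }

        protected void cleanup() { }

        // Same shape as Reducer.run(): keys arrive sorted, reduce() sees one key at a time.
        public void run(TreeMap<String, List<Integer>> grouped) {
            setup();
            try {
                for (Map.Entry<String, List<Integer>> e : grouped.entrySet()) {
                    reduce(e.getKey(), new OnceIterable<>(e.getValue()));
                }
            } finally {
                cleanup();
            }
        }
    }

    public static void main(String[] args) {
        TreeMap<String, List<Integer>> grouped = new TreeMap<>();
        grouped.put("reduce", List.of(1, 1));
        grouped.put("hadoop", List.of(1, 1, 1));

        MiniReducer reducer = new MiniReducer();
        reducer.run(grouped);
        System.out.println(reducer.output); // [hadoop=3, reduce=2]

        // A second pass over the same values fails in this sketch.
        OnceIterable<Integer> values = new OnceIterable<>(List.of(1, 2));
        values.iterator();
        try {
            values.iterator();
        } catch (IllegalStateException ex) {
            System.out.println(ex.getMessage()); // values can only be iterated once
        }
    }
}
```

Because `run()` only sees one key's values at a time and each values Iterable is single-pass, there is no point inside this loop where a global total is available before the first `reduce()` call.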

【Comments】:

Thanks, Alexey. That saved me time.