How can I count a specific word using mapreduce? Answers

【Question title】: How can I count a specific word using mapreduce?
【Posted】: 2015-10-05 15:42:23
【Question】:

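I am modifying the standard word count program, which counts every word, so that it counts only one specific word.

The reducer and map classes are the same as in the standard word count. The word is not being counted correctly. The specific word appears many times in my file, but the count is only one.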

// Mapper implementation.
public class wordcountmapper extends MapReduceBase implements Mapper<LongWritable, Text, Text, IntWritable>
{
    private final static IntWritable one = new IntWritable(1); // intwritable
    private Text word = new Text();

    public void map(LongWritable key, Text value, OutputCollector<Text, IntWritable> output, Reporter reporter) throws IOException {
        String line = value.toString();      // conversion in string
        StringTokenizer tokenizer = new StringTokenizer(line);
        while (tokenizer.hasMoreTokens()) {
            word.set(tokenizer.nextToken());
            if (line.compareTo("Cold") == 0) {  //cold is the specific word to get count for
                output.collect(word, one);      // getting 1 as a count for 'cold' as if its counting only first line 'cold' and not going to next line.
            }
        }
    }
}

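【Comments】:

So what problem are you facing?
I am not getting a count for every "Cold" in the file. I just get one as the count. Is there something wrong with how I am using the if statement? @VigneshI
Is the number of reducers set to 1? And are you summing the map-side values on the reducer side? You can also try debugging by putting a sysout after output.collect and checking how many times the key is written to the collector.
The number of reducers is 1. I do sum the values coming from the map side. I checked by putting a sysout after output.collect, and the collector gets the correct count, but under File System Counters, HDFS: Number of bytes written = 0. If I remove the if statement from the mapper, the code works fine and counts all the words in the file. It has something to do with my if condition, which I am not getting right! @VigneshI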

Tags: java hadoop mapreduce hdfs

【Solution 1】:

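First, your if statement compares the line object with "Cold", which is wrong. It should compare the tokenized word with "Cold": if(tokenizer.nextToken().equals("Cold")).

I am not sure how the current logic counts "Cold" as 1. Probably your input has a line that contains only the single word "Cold".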
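To illustrate the difference between the two comparisons, here is a minimal plain-Java sketch without any Hadoop dependency. The class name SpecificWordCounter and both method names are hypothetical, chosen only for this illustration. The first method mirrors the check from the question, the second compares each token instead. It assumes the match should be case-sensitive, as in the question.

```java
import java.util.StringTokenizer;

// Plain-Java sketch (no Hadoop) of the comparison performed inside map().
// Class and method names are hypothetical, used for illustration only.
class SpecificWordCounter {

    // Mirrors the check from the question: the WHOLE line is compared with
    // the target, so it matches only when the line is exactly that word.
    static int countByLineComparison(String line, String target) {
        int count = 0;
        StringTokenizer tokenizer = new StringTokenizer(line);
        while (tokenizer.hasMoreTokens()) {
            tokenizer.nextToken();
            if (line.compareTo(target) == 0) {
                count++;
            }
        }
        return count;
    }

    // Suggested check: read each token once, then compare that token.
    static int countByTokenComparison(String line, String target) {
        int count = 0;
        StringTokenizer tokenizer = new StringTokenizer(line);
        while (tokenizer.hasMoreTokens()) {
            String token = tokenizer.nextToken(); // consume each token exactly once
            if (token.equals(target)) {
                count++; // in the mapper this is where output.collect(word, one) would go
            }
        }
        return count;
    }

    public static void main(String[] args) {
        String line = "Cold days and Cold nights";
        System.out.println(countByLineComparison(line, "Cold"));   // prints 0
        System.out.println(countByTokenComparison(line, "Cold"));  // prints 2
        System.out.println(countByLineComparison("Cold", "Cold")); // prints 1
    }
}
```

When applying this to the mapper, it is probably safest to store the result of tokenizer.nextToken() in a variable and use it for both word.set(...) and the equals check. Calling nextToken() once in word.set(...) and again in the if condition would consume two tokens per loop iteration.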

【Comments】: