Need to fetch the files where the words exist in a MapReduce word count program (answers)

【Question title】: In a map reduce word count program need to fetch the files where the words exist
【Posted】: 2019-07-01 15:42:03
【Question description】:

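I am reading multiple input files for a word count problem.

Example file names: file1.txt file2.txt file3.txt

I can get the word counts, but what should I add if I also want to get, along with each count, the names of the files in which the word occurs?

For example,

Contents of file 1: Welcome to Hadoop

Contents of file 2: This is hadoop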

Current output:

Hadoop 2

is 1

this 1

to 1

welcome 1

Expected output:

Hadoop 2 File01.txt File02.txt

is 1 File02.txt

this 1 File02.txt

to 1 File01.txt

welcome 1 File01.txt

【Question comments】:

Possible duplicate of How to get the input file name in the mapper in a Hadoop program?
Thanks @BenWatson
Glad it helped.

Tags: java hadoop mapreduce hadoop2 hadoop-partitioning

【Solution 1】:

First get the input split and read the file name from it with String file = ((FileSplit)inputSplit).getPath().getName(); then emit the word and the file name as the mapper output.
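As a rough illustration of what the mapper emits, here is a plain-Java sketch with no Hadoop dependency. The class `MapSideSketch` and the method `mapLine` are hypothetical names. In a real job the file name would come from the `FileSplit` call and the pairs would be written to the output collector instead of being returned in a list. Lowercasing the tokens is an assumption, made so that "Hadoop" and "hadoop" end up under the same key as in the expected output.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for the map step: no Hadoop classes are used.
public class MapSideSketch {

    // Emits one (word, fileName) pair for every token in the line.
    // Lowercasing is an assumption so "Hadoop" and "hadoop" share a key.
    static List<Map.Entry<String, String>> mapLine(String fileName, String line) {
        List<Map.Entry<String, String>> pairs = new ArrayList<>();
        for (String token : line.trim().split("\\s+")) {
            if (!token.isEmpty()) {
                pairs.add(new SimpleEntry<>(token.toLowerCase(), fileName));
            }
        }
        return pairs;
    }

    public static void main(String[] args) {
        // Prints each word with the file it came from, tab-separated.
        for (Map.Entry<String, String> pair : mapLine("File01.txt", "Welcome to Hadoop")) {
            System.out.println(pair.getKey() + "\t" + pair.getValue());
        }
    }
}
```

Emitting the file name as the value (rather than the usual `1`) is what lets the reducer both count occurrences and list the files.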

In the reducer, for each key, iterate over the file names it received, incrementing a counter and appending each file name as you go.

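   // inside the loop over the values: append each file name, space-separated
   file += " " + filename;
   // after the loop: put the count in front of the file names and emit
   textString = counter + file;
   output.collect(key, new Text(textString));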

This solved the problem.
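The reducer step described in this answer can be sketched in plain Java as follows. This is only an illustration under assumptions: `ReduceSideSketch` and `reduce` are hypothetical names, and the values arrive as an in-memory list instead of Hadoop's value iterator. A `LinkedHashSet` is used so that each file name is listed once even when a word occurs several times in the same file, which the snippet in the answer does not handle on its own.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical stand-in for the reduce step: no Hadoop classes are used.
public class ReduceSideSketch {

    // Builds "<count> <file> <file> ..." from all file names seen for one word.
    static String reduce(List<String> fileNames) {
        int counter = 0;
        Set<String> distinctFiles = new LinkedHashSet<>(); // keeps first-seen order
        for (String fileName : fileNames) {
            counter++;                   // every value is one occurrence of the word
            distinctFiles.add(fileName); // duplicates are ignored by the set
        }
        StringBuilder text = new StringBuilder().append(counter);
        for (String fileName : distinctFiles) {
            text.append(' ').append(fileName);
        }
        return text.toString();
    }

    public static void main(String[] args) {
        // prints "hadoop 2 File01.txt File02.txt"
        System.out.println("hadoop " + reduce(Arrays.asList("File01.txt", "File02.txt")));
    }
}
```

In an actual job the returned string would be wrapped in a `Text` and passed to `output.collect(key, ...)`, as in the answer's snippet.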

【Comments】: