AWS: Increase maps and reducers using the Hadoop API

【Question title】: AWS: Increase maps and reducers using the Hadoop API
【Posted】: 2012-09-26 04:19:25
【Question】:

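I am running the WordCount example on an AWS server. I want to test my output and analyze it, so I would like to increase the number of mappers, the number of reducers, and also the number of blocks.

How can I achieve this?

Do I have to set the number of mappers/reducers when creating the job, or do I have to add some code? I am using Java.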

【Comments】:

Tags: java hadoop amazon-ec2 amazon-web-services mapreduce

【Solution 1】:

You can set the number of mappers and reducers in the main function of the Java program that launches the MapReduce job, using JobConf's conf.setNumMapTasks(int num) and conf.setNumReduceTasks(int num) respectively.
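
As a rough illustration, here is a minimal driver sketch using the old `org.apache.hadoop.mapred` API. The class name `WordCountDriver`, the task counts 10 and 4, and the use of the stock `TokenCountMapper` and `LongSumReducer` from `org.apache.hadoop.mapred.lib` are assumptions made for this example, not part of the original question. Note that the reducer setter on JobConf is spelled `setNumReduceTasks`.

```java
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.lib.LongSumReducer;
import org.apache.hadoop.mapred.lib.TokenCountMapper;

// Hypothetical driver class; args[0] is the input path, args[1] the output path.
public class WordCountDriver {
    public static void main(String[] args) throws Exception {
        JobConf conf = new JobConf(WordCountDriver.class);
        conf.setJobName("wordcount");

        // Stock word-count building blocks shipped with the old API (assumed here
        // in place of the question's own mapper and reducer).
        conf.setMapperClass(TokenCountMapper.class);
        conf.setCombinerClass(LongSumReducer.class);
        conf.setReducerClass(LongSumReducer.class);
        conf.setOutputKeyClass(Text.class);
        conf.setOutputValueClass(LongWritable.class);

        // Only a hint: the real map count follows the number of input splits.
        conf.setNumMapTasks(10);
        // Honored as given: the job runs with exactly this many reduce tasks.
        conf.setNumReduceTasks(4);

        FileInputFormat.setInputPaths(conf, new Path(args[0]));
        FileOutputFormat.setOutputPath(conf, new Path(args[1]));

        // Submits the job and blocks until it finishes.
        JobClient.runJob(conf);
    }
}
```

With four reducers the output directory should contain four part files, which is an easy way to confirm that the reducer setting took effect.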

For the mappers, note the following from the API documentation:

"This is only a hint to the framework. The actual number of spawned map tasks depends on the number of InputSplits generated by the job's InputFormat.getSplits(JobConf, int). A custom InputFormat is typically used to accurately control the number of map tasks for the job."

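Explicitly setting the number of input blocks is a bit harder. How the input is split is determined by the InputFormat you use and the InputSplits it produces. If you want to control how the input is split, you have to write your own custom InputFormat/InputSplits.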
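
To make the "hint" behaviour concrete, the sketch below mimics, in plain Java with no Hadoop dependency, the split-size arithmetic that the old-API `FileInputFormat` is understood to use for a single splittable file: `splitSize = max(minSize, min(totalSize / requestedMaps, blockSize))`, with a 10% slop on the last split. The class `SplitEstimator` and its method names are hypothetical, and the model ignores multiple input files and non-splittable (for example gzip-compressed) input.

```java
// Hypothetical helper: estimates how many map tasks a single splittable file yields.
final class SplitEstimator {
    // The last split may be up to 10% larger than splitSize before a new one is cut.
    private static final double SPLIT_SLOP = 1.1;

    // The requested map count only matters while goalSize is below the block size.
    static long computeSplitSize(long goalSize, long minSize, long blockSize) {
        return Math.max(minSize, Math.min(goalSize, blockSize));
    }

    static int estimateSplits(long fileSize, int requestedMaps, long minSize, long blockSize) {
        long goalSize = fileSize / (requestedMaps == 0 ? 1 : requestedMaps);
        long splitSize = computeSplitSize(goalSize, minSize, blockSize);
        int splits = 0;
        long remaining = fileSize;
        while ((double) remaining / splitSize > SPLIT_SLOP) {
            splits++;
            remaining -= splitSize;
        }
        if (remaining != 0) {
            splits++; // trailing remainder becomes the final split
        }
        return splits;
    }

    public static void main(String[] args) {
        long oneGiB = 1L << 30;
        long block64MiB = 64L << 20;
        // Asking for fewer maps than blocks has no effect: one split per block.
        System.out.println(estimateSplits(oneGiB, 2, 1L, block64MiB));   // prints 16
        // Asking for more maps than blocks shrinks the splits.
        System.out.println(estimateSplits(oneGiB, 32, 1L, block64MiB));  // prints 32
        // Raising the minimum split size is what reduces the map count.
        System.out.println(estimateSplits(oneGiB, 32, 128L << 20, block64MiB)); // prints 8
    }
}
```

Under this model the requested map count can raise the number of splits above the block count but cannot lower it; lowering it takes a larger minimum split size, and anything finer-grained takes a custom InputFormat.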

【Comments】: