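[Title]: Hadoop error in shuffle in fetcher: Exceeded MAX_FAILED_UNIQUE_FETCHES
[Posted]: 2014-06-05 17:07:10
[Question]:

I am new to Hadoop. I set up a Hadoop cluster (one master and one slave) with Kerberos security enabled on virtual machines. I am trying to run the 'pi' job from the Hadoop examples. The job terminates with the error Exceeded MAX_FAILED_UNIQUE_FETCHES. I searched for this error, but the solutions I found online do not seem to work for me. Maybe I am missing something obvious. I even tried removing the slave from the etc/hadoop/slaves file to see whether the job would run on the master alone, but it fails with the same error. The log is below. I am running this on 64-bit Ubuntu 14.04 virtual machines. Any help is appreciated.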

montauk@montauk-vmaster:/usr/local/hadoop$ sudo -u yarn bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.4.0.jar pi 2 10
Number of Maps  = 2
Samples per Map = 10
OpenJDK 64-Bit Server VM warning: You have loaded library /usr/local/hadoop/lib/native/libhadoop.so.1.0.0 which might have disabled stack guard. The VM will try to fix the stack guard now.
It's highly recommended that you fix the library with 'execstack -c <libfile>', or link it with '-z noexecstack'.
14/06/05 12:04:43 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Wrote input for Map #0
Wrote input for Map #1
Starting Job
14/06/05 12:04:49 INFO client.RMProxy: Connecting to ResourceManager at /192.168.0.29:8040
14/06/05 12:04:50 INFO hdfs.DFSClient: Created HDFS_DELEGATION_TOKEN token 17 for yarn on 192.168.0.29:54310
14/06/05 12:04:50 INFO security.TokenCache: Got dt for hdfs://192.168.0.29:54310; Kind: HDFS_DELEGATION_TOKEN, Service: 192.168.0.29:54310, Ident: (HDFS_DELEGATION_TOKEN token 17 for yarn)
14/06/05 12:04:50 INFO input.FileInputFormat: Total input paths to process : 2
14/06/05 12:04:51 INFO mapreduce.JobSubmitter: number of splits:2
14/06/05 12:04:51 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1401975262053_0007
14/06/05 12:04:51 INFO mapreduce.JobSubmitter: Kind: HDFS_DELEGATION_TOKEN, Service: 192.168.0.29:54310, Ident: (HDFS_DELEGATION_TOKEN token 17 for yarn)
14/06/05 12:04:53 INFO impl.YarnClientImpl: Submitted application application_1401975262053_0007
14/06/05 12:04:53 INFO mapreduce.Job: The url to track the job: http://montauk-vmaster:8088/proxy/application_1401975262053_0007/
14/06/05 12:04:53 INFO mapreduce.Job: Running job: job_1401975262053_0007
14/06/05 12:05:29 INFO mapreduce.Job: Job job_1401975262053_0007 running in uber mode : false
14/06/05 12:05:29 INFO mapreduce.Job:  map 0% reduce 0%
14/06/05 12:06:04 INFO mapreduce.Job:  map 50% reduce 0%
14/06/05 12:06:06 INFO mapreduce.Job:  map 100% reduce 0%
14/06/05 12:06:34 INFO mapreduce.Job:  map 100% reduce 100%
14/06/05 12:06:34 INFO mapreduce.Job: Task Id : attempt_1401975262053_0007_r_000000_0, Status : FAILED
Error: org.apache.hadoop.mapreduce.task.reduce.Shuffle$ShuffleError: error in shuffle in fetcher#4
    at org.apache.hadoop.mapreduce.task.reduce.Shuffle.run(Shuffle.java:134)
    at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:376)
    at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:167)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
    at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:162)
Caused by: java.io.IOException: Exceeded MAX_FAILED_UNIQUE_FETCHES; bailing-out.
    at org.apache.hadoop.mapreduce.task.reduce.ShuffleSchedulerImpl.checkReducerHealth(ShuffleSchedulerImpl.java:323)
    at org.apache.hadoop.mapreduce.task.reduce.ShuffleSchedulerImpl.copyFailed(ShuffleSchedulerImpl.java:245)
    at org.apache.hadoop.mapreduce.task.reduce.Fetcher.copyFromHost(Fetcher.java:347)
    at org.apache.hadoop.mapreduce.task.reduce.Fetcher.run(Fetcher.java:165)

[Comments]:

    Tags: hadoop mapreduce


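    [Answer 1]:

    I hit the same problem when I installed cdh5.1.0 with Kerberos security from the tarball. The solutions Google turned up blamed insufficient memory, but I don't think that was my case, because my input was very small (52K).

    After several days of digging, I found the cause at this link.

    The fix described at that link can be summarized as:

    1. Add the following property to yarn-site.xml, even though it is already the default in yarn-default.xml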

      <property>
        <name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
        <value>org.apache.hadoop.mapred.ShuffleHandler</value>
      </property>

    2. Remove the yarn.nodemanager.local-dirs property so that the default location under /tmp is used. Then run the following commands:

      mkdir -p /tmp/hadoop-yarn/nm-local-dir
      chown yarn:yarn /tmp/hadoop-yarn/nm-local-dir

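    The problem can be summarized as:

    Once the yarn.nodemanager.local-dirs property is set, the default value of yarn.nodemanager.aux-services.mapreduce_shuffle.class from yarn-default.xml no longer takes effect.

    I have not found out why this happens, either.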
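
    To see which of the two properties discussed above a given yarn-site.xml actually sets, something like the following could be used. This is only a sketch built on the JDK's XML parser. The class name YarnSiteCheck and its method names are made up for illustration and are not part of Hadoop, and it assumes the usual configuration/property/name/value layout of Hadoop config files.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper for illustration only; not part of Hadoop.
final class YarnSiteCheck {
    static final String SHUFFLE_CLASS_KEY =
            "yarn.nodemanager.aux-services.mapreduce_shuffle.class";
    static final String LOCAL_DIRS_KEY = "yarn.nodemanager.local-dirs";

    /** Reads every <property><name>..</name><value>..</value></property> into a map. */
    static Map<String, String> parse(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            Map<String, String> props = new LinkedHashMap<>();
            NodeList nodes = doc.getElementsByTagName("property");
            for (int i = 0; i < nodes.getLength(); i++) {
                Element property = (Element) nodes.item(i);
                NodeList names = property.getElementsByTagName("name");
                NodeList values = property.getElementsByTagName("value");
                if (names.getLength() > 0 && values.getLength() > 0) {
                    props.put(names.item(0).getTextContent().trim(),
                              values.item(0).getTextContent().trim());
                }
            }
            return props;
        } catch (Exception e) {
            throw new IllegalArgumentException("not a readable Hadoop config file", e);
        }
    }

    /** True if the file itself names ShuffleHandler instead of relying on the default. */
    static boolean shuffleClassExplicit(Map<String, String> props) {
        return "org.apache.hadoop.mapred.ShuffleHandler".equals(props.get(SHUFFLE_CLASS_KEY));
    }

    /** True if the file overrides the NodeManager local directories. */
    static boolean localDirsOverridden(Map<String, String> props) {
        return props.containsKey(LOCAL_DIRS_KEY);
    }

    public static void main(String[] args) throws Exception {
        // args[0] is the path to yarn-site.xml
        String xml = new String(Files.readAllBytes(Paths.get(args[0])), StandardCharsets.UTF_8);
        Map<String, String> props = parse(xml);
        System.out.println("local-dirs overridden: " + localDirsOverridden(props));
        System.out.println("shuffle class set explicitly: " + shuffleClassExplicit(props));
    }
}
```

    If it reports that local-dirs is overridden while the shuffle class is not set explicitly, the file matches the combination described in this answer.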

    [Comments]:

    • FYI, yarn-site.xml is usually at /etc/hadoop/yarn-site.xml, but for me on AWS EMR it is at /etc/hadoop/conf/yarn-site.xml, which is a symlink to the actual location /etc/alternatives/hadoop-conf/yarn-site.xml.
    [Answer 2]:

    I had the same problem. I was running a MapReduce job that has no reducer. I then fixed it with job.setNumReduceTasks(0);
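
    For anyone unsure where that call belongs: it is normally made in the job driver, on the Job object, before the job is submitted. Below is a minimal sketch of such a driver, assuming the Hadoop 2.x client libraries are on the classpath. The class name MapOnlyDriver, the job name, and the use of the base Mapper as a pass-through are choices made for this illustration, not anything taken from the original job.

```java
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

// Hypothetical driver for illustration; the class name is an assumption.
public class MapOnlyDriver {

    static Job buildJob(Configuration conf, String input, String output) throws IOException {
        Job job = Job.getInstance(conf, "map-only example");
        job.setJarByClass(MapOnlyDriver.class);

        // The base Mapper passes each (offset, line) record through unchanged.
        job.setMapperClass(Mapper.class);
        job.setOutputKeyClass(LongWritable.class);
        job.setOutputValueClass(Text.class);

        // No reduce phase: map output is written directly by the output
        // format, so no shuffle (and no fetcher) runs for this job.
        job.setNumReduceTasks(0);

        FileInputFormat.addInputPath(job, new Path(input));
        FileOutputFormat.setOutputPath(job, new Path(output));
        return job;
    }

    public static void main(String[] args) throws Exception {
        // args[0] = input path, args[1] = output path (must not exist yet)
        Job job = buildJob(new Configuration(), args[0], args[1]);
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```

    This only helps for jobs that genuinely need no reduce step. The 'pi' example from the question aggregates its map results in a reducer, so dropping the reduce phase is not an option there.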

    [Comments]:

    • Where does this line go? I'm not sure what to do with job.setNumReduceTasks(0);
    [Answer 3]:
    1. Change the following property in yarn-site.xml and create the directory.

      yarn.nodemanager.local-dirs=/tmp

      mkdir -p /tmp/hadoop-yarn/nm-local-dir
      chown yarn:yarn /tmp/hadoop-yarn/nm-local-dir

    2. Adjust the resource properties in mapred-site.xml

      mapreduce.reduce.shuffle.input.buffer.percent=0.50
      mapreduce.reduce.shuffle.memory.limit.percent=0.2
      mapreduce.reduce.shuffle.parallelcopies=4

    3. Restart the resourcemanager and nodemanager on their respective nodes.

    [Comments]:
