【Question title】: Hadoop Error: Error launching job, bad input path: File does not exist. Streaming Command Failed
【Posted】: 2017-03-18 00:48:14
【Question description】:

I am running an MRJob job on a Hadoop cluster and I get the following error:

No configs found; falling back on auto-configuration
Looking for hadoop binary in $PATH...
Found hadoop binary: /usr/local/hadoop/bin/hadoop
Using Hadoop version 2.7.3
Looking for Hadoop streaming jar in /usr/local/hadoop...
Found Hadoop streaming jar: /usr/local/hadoop/share/hadoop/tools/lib/hadoop-streaming-2.7.3.jar
Creating temp directory /tmp/Mr_Jobs.hduser.20170227.030012.446820
Copying local files to hdfs:///user/hduser/tmp/mrjob/Mr_Jobs.hduser.20170227.030012.446820/files/...
Running step 1 of 1...
  session.id is deprecated. Instead, use dfs.metrics.session-id
  Initializing JVM Metrics with processName=JobTracker, sessionId=
  Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
  Cleaning up the staging area file:/app/hadoop/tmp/mapred/staging/hduser1748755362/.staging/job_local1748755362_0001
  Error launching job , bad input path : File does not exist: /app/hadoop/tmp/mapred/staging/hduser1748755362/.staging/job_local1748755362_0001/files/Mr_Jobs.py#Mr_Jobs.py
  Streaming Command Failed!
Attempting to fetch counters from logs...
Can't fetch history log; missing job ID
No counters found
Scanning logs for probable cause of failure...
Can't fetch history log; missing job ID
Can't fetch task logs; missing application ID
Step 1 of 1 failed: Command '['/usr/local/hadoop/bin/hadoop', 'jar', '/usr/local/hadoop/share/hadoop/tools/lib/hadoop-streaming-2.7.3.jar', '-files', 'hdfs:///user/hduser/tmp/mrjob/Mr_Jobs.hduser.20170227.030012.446820/files/Mr_Jobs.py#Mr_Jobs.py,hdfs:///user/hduser/tmp/mrjob/Mr_Jobs.hduser.20170227.030012.446820/files/mrjob.zip#mrjob.zip,hdfs:///user/hduser/tmp/mrjob/Mr_Jobs.hduser.20170227.030012.446820/files/setup-wrapper.sh#setup-wrapper.sh', '-input', 'hdfs:///user/hduser/tmp/mrjob/Mr_Jobs.hduser.20170227.030012.446820/files/File.txt', '-output', 'hdfs:///user/hduser/tmp/mrjob/Mr_Jobs.hduser.20170227.030012.446820/output', '-mapper', 'sh -ex setup-wrapper.sh python3 Mr_Jobs.py --step-num=0 --mapper', '-combiner', 'sh -ex setup-wrapper.sh python3 Mr_Jobs.py --step-num=0 --combiner', '-reducer', 'sh -ex setup-wrapper.sh python3 Mr_Jobs.py --step-num=0 --reducer']' returned non-zero exit status 512

I am running the job with this command:

python3 /home/bhoots21304/Desktop/MrJobs-MR.py -r hadoop hdfs://input3/File.txt

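The first line of the output also says: No configs found; falling back on auto-configuration

I searched online, and it says there should be a file named mrjob.conf in the /etc/ folder. But it does not exist anywhere on my filesystem. Do I need to create this file? If so, what should its contents be?

I installed Hadoop following the instructions in this file: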

https://github.com/ev2900/Dev_Notes/blob/master/Hadoop/notes.txt

hadoop-env.sh, core-site.xml, mapred-site.xml, and hdfs-site.xml are also configured correctly, because a simple wordcount job (without MRJob) runs fine.

(MRJob was installed with 'sudo -H pip3 install mrjob'.)

【Comments】:

  • Did you find a solution? Could you share it?

Tags: python hadoop mrjob


【Solution 1】:

You need to specify python_bin and hadoop_streaming_jar in mrjob.conf. It should look something like this, depending on where your jar is located.

runners:
    hadoop:
        python_bin: python3
        hadoop_streaming_jar: /usr/local/hadoop/share/hadoop/tools/lib/hadoop-streaming-2.7.3.jar
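
Installing mrjob does not create mrjob.conf; you write it yourself. As far as I know, mrjob looks for it at the path in the MRJOB_CONF environment variable, then at ~/.mrjob.conf, then at /etc/mrjob.conf. The sketch below generates the file shown above. The helper names `render_mrjob_conf` and `write_mrjob_conf` are hypothetical, and the jar path is taken from the log in the question, so adjust it to your installation.

```python
import os
import tempfile

# Assumption: this is the jar path mrjob reported in the question's log.
STREAMING_JAR = (
    "/usr/local/hadoop/share/hadoop/tools/lib/hadoop-streaming-2.7.3.jar"
)


def render_mrjob_conf(python_bin="python3", streaming_jar=STREAMING_JAR):
    """Return the YAML text of a minimal mrjob.conf (hypothetical helper)."""
    return (
        "runners:\n"
        "    hadoop:\n"
        f"        python_bin: {python_bin}\n"
        f"        hadoop_streaming_jar: {streaming_jar}\n"
    )


def write_mrjob_conf(path, **kwargs):
    """Write the config to `path` (with ~ expanded) and return the final path."""
    path = os.path.expanduser(path)
    with open(path, "w") as f:
        f.write(render_mrjob_conf(**kwargs))
    return path


if __name__ == "__main__":
    # The demo writes into a temp directory so nothing is overwritten.
    # For real use, pass "~/.mrjob.conf" instead.
    demo_path = os.path.join(tempfile.mkdtemp(), "mrjob.conf")
    with open(write_mrjob_conf(demo_path)) as f:
        print(f.read())
```

Once the file is in place at ~/.mrjob.conf, the "No configs found" line should no longer appear when the job is run with -r hadoop.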

【Comments】:

  • Hey, I am facing the same problem, but I cannot find the mrjob.conf file. Can you suggest where I can find it on my virtual machine?