Hadoop Mapreduce 作业失败：java.lang.RuntimeException：配置对象时出错答案

【问题标题】：Hadoop Mapreduce Job failes :java.lang.RuntimeException: Error in configuring objectHadoop Mapreduce 作业失败：java.lang.RuntimeException：配置对象时出错
【发布时间】：2021-07-20 12:42:16
【问题描述】：

我正在尝试使用 python 实现 Hadoop Mapreduce 流，以获取本教程中的简单字数统计示例： https://www.geeksforgeeks.org/hadoop-streaming-using-python-word-count-problem/

我的映射器.py

#!/usr/bin/env python3
import sys

# Remove whitespace either side 
for line in sys.stdin:
    myline = line.strip() 
    
       # Break the line into words 
    words = myline.split() 
    
       # Iterate the words list
    for myword in words:
          # Write the results to standard output 
          print  (myword+'\t', 1)

还有我的 reducer.py

#!/usr/bin/env python3

from operator import itemgetter 
import sys 

current_word = ""
current_count = 0 
word = "" 

for line in sys.stdin:
   # Remove whitespace either side 
   myline = line.strip() 
   # Split the input we got from mapper.py word, 
   word,count = myline.split('\t', 1) 

   # Convert count variable to integer 
   try: 
      count = int(count) 

   except ValueError: 
       continue
   
   if current_word == word:
       current_count += count
       
   else :
       if(current_count>0):
           print(current_word, current_count)
       current_count = count
       current_word = word

if current_word == word:
        print (current_word,current_count)

当我使用此代码运行时，我的程序正在本地运行

cat word.txt | python mapper.py | sort -k1,1 | python reducer.py

图片中显示的输出：

但是当我尝试使用 hadoop 流式传输它时，我得到了以下错误：

21/04/27 09:53:46 INFO mapreduce.Job: Task Id : attempt_1619506264660_0001_r_000000_1, Status : FAILED
Error: java.lang.RuntimeException: Error in configuring object
    at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:113)
    at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:79)
    at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:137)
    at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:409)
    at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:392)
    at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:177)
    at java.base/java.security.AccessController.doPrivileged(Native Method)
    at java.base/javax.security.auth.Subject.doAs(Subject.java:423)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1886)
    at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:171)
Caused by: java.lang.reflect.InvocationTargetException
    at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.base/java.lang.reflect.Method.invoke(Method.java:566)
    at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:110)
    ... 9 more
Caused by: java.lang.RuntimeException: configuration exception
    at org.apache.hadoop.streaming.PipeMapRed.configure(PipeMapRed.java:222)
    at org.apache.hadoop.streaming.PipeReducer.configure(PipeReducer.java:67)
    ... 14 more
Caused by: java.io.IOException: Cannot run program "/home/edureka/reducer.py": error=2, No such file or directory
    at java.base/java.lang.ProcessBuilder.start(ProcessBuilder.java:1128)
    at java.base/java.lang.ProcessBuilder.start(ProcessBuilder.java:1071)
    at org.apache.hadoop.streaming.PipeMapRed.configure(PipeMapRed.java:209)
    ... 15 more
Caused by: java.io.IOException: error=2, No such file or directory
    at java.base/java.lang.ProcessImpl.forkAndExec(Native Method)
    at java.base/java.lang.ProcessImpl.<init>(ProcessImpl.java:340)
    at java.base/java.lang.ProcessImpl.start(ProcessImpl.java:271)
    at java.base/java.lang.ProcessBuilder.start(ProcessBuilder.java:1107)
    ... 17 more

我认为减速器的错误是因为映射器已 100% 完成，如图所示，但仍然不确定如何解决我的问题：

【问题讨论】：

标签： python python-3.x hadoop

【解决方案1】：

错误是因为我的减速器目录路径错误

【讨论】：