【发布时间】:2012-03-25 05:20:19
【问题描述】:
我是 hadoop 新手,正在尝试在我的 Windows 7 机器上安装 Hadoop 0.20.2 的单节点。
我的问题有两个方面 - 一个是关于安装本身的完整性,另一个是关于示例 Word Count 程序的 reduce 阶段的错误。
我的安装步骤如下:
我正在关注http://blog.benhall.me.uk/2011/01/installing-hadoop-0210-on-windows.html的安装过程。
我已经安装了 cygwin 并在我的本地主机上设置了无密码 ssh 我的java版本是:
java version "1.7.0_02"
Java(TM) SE Runtime Environment (build 1.7.0_02-b13)
Java HotSpot(TM) 64-Bit Server VM (build 22.0-b10, mixed mode)
conf/core-site.xml的内容:
<configuration>
<property>
<name>fs.default.name</name>
<value>hdfs://localhost:9000</value>
</property>
</configuration>
conf/hdfs-site.xml的内容:
<configuration>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
</configuration>
conf/mapred-site.xml 的内容:
<configuration>
<property>
<name>mapred.job.tracker</name>
<value>localhost:9001</value>
</property>
</configuration>
我设置了 JAVA_HOME 变量,命令“hadoop version”打印 0.20.2 hadoop namenode -format 创建 DFS 没有任何错误
start-all.sh 打印 namenode、secondarynamenode、datanode、jobtracker 和 tasktracker 都已启动。
但是,命令“jps”打印:
$ jps
4584 Jps
11008 JobTracker
2084 NameNode
我注意到 jps 还打印了 tasktracker、secondarynamenode 的 pid。
我可以查看
的输出 http://localhost:50030 对于jobtracker, http://localhost:50060 用于 tasktracker 和 http://localhost:50070 用于名称节点。我尝试了对 hdfs 的 put 和 get 命令,它们都成功了:
bin/hadoop fs -mkdir In
bin/hadoop fs -put *.txt In
mkdir temp
bin/hadoop fs -get In temp
ls -l temp/In
$ ls -l temp/In/
total 365
348624 Mar 24 23:59 CHANGES.txt
13366 Mar 24 23:59 LICENSE.txt
101 Mar 24 23:59 NOTICE.txt
1366 Mar 24 23:59 README.txt
我还可以通过 namenode 的 http 界面浏览 DFS 来查看这些文件
- 我的安装完成了吗?
- 如果是,为什么 jps 命令没有显示所有五个组件的 pid?
- 如果没有,那么我需要哪些步骤才能完成安装?
- 还有哪些其他健全性检查可用于测试安装的完整性?
我最初认为我的安装已经完成,并按照http://jayant7k.blogspot.com/2010/06/writing-your-first-map-reduce-program.html 的行运行了一个示例 WordCount map-reduce 程序
我得到以下输出:
12/03/25 00:10:26 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
12/03/25 00:10:26 INFO input.FileInputFormat: Total input paths to process : 1
12/03/25 00:10:27 INFO mapred.JobClient: Running job: job_201203242348_0001
12/03/25 00:10:28 INFO mapred.JobClient: map 0% reduce 0%
12/03/25 00:10:35 INFO mapred.JobClient: map 100% reduce 0%
12/03/25 00:21:29 INFO mapred.JobClient: Task Id : attempt_201203242348_0001_r_0
00000_0, Status : FAILED
Shuffle Error: Exceeded MAX_FAILED_UNIQUE_FETCHES; bailing-out.
12/03/25 00:32:25 INFO mapred.JobClient: Task Id : attempt_201203242348_0001_r_0
00000_1, Status : FAILED
Shuffle Error: Exceeded MAX_FAILED_UNIQUE_FETCHES; bailing-out.
12/03/25 00:44:02 INFO mapred.JobClient: Task Id : attempt_201203242348_0001_r_0
00000_2, Status : FAILED
Shuffle Error: Exceeded MAX_FAILED_UNIQUE_FETCHES; bailing-out.
12/03/25 00:55:00 INFO mapred.JobClient: Job complete: job_201203242348_0001
12/03/25 00:55:00 INFO mapred.JobClient: Counters: 12
12/03/25 00:55:00 INFO mapred.JobClient: Job Counters
12/03/25 00:55:00 INFO mapred.JobClient: Launched reduce tasks=4
12/03/25 00:55:00 INFO mapred.JobClient: Launched map tasks=1
12/03/25 00:55:00 INFO mapred.JobClient: Data-local map tasks=1
12/03/25 00:55:00 INFO mapred.JobClient: Failed reduce tasks=1
12/03/25 00:55:00 INFO mapred.JobClient: FileSystemCounters
12/03/25 00:55:00 INFO mapred.JobClient: HDFS_BYTES_READ=13366
12/03/25 00:55:00 INFO mapred.JobClient: FILE_BYTES_WRITTEN=23511
12/03/25 00:55:00 INFO mapred.JobClient: Map-Reduce Framework
12/03/25 00:55:00 INFO mapred.JobClient: Combine output records=0
12/03/25 00:55:00 INFO mapred.JobClient: Map input records=244
12/03/25 00:55:00 INFO mapred.JobClient: Spilled Records=1887
12/03/25 00:55:00 INFO mapred.JobClient: Map output bytes=19699
12/03/25 00:55:00 INFO mapred.JobClient: Combine input records=0
12/03/25 00:55:00 INFO mapred.JobClient: Map output records=1887
map 任务看似完成,但 reduce 任务在日志中显示以下错误:
2012-03-25 00:10:35,202 INFO org.apache.hadoop.mapred.ReduceTask: attempt_201203242348_0001_r_000000_0: Got 1 new map-outputs
2012-03-25 00:10:40,193 INFO org.apache.hadoop.mapred.ReduceTask: attempt_201203242348_0001_r_000000_0 Scheduled 1 outputs (0 slow hosts and0 dup hosts)
2012-03-25 00:10:40,243 INFO org.apache.hadoop.mapred.ReduceTask: header: attempt_201203242348_0001_m_000000_0, compressed len: 23479, decompressed len: 23475
2012-03-25 00:10:40,243 INFO org.apache.hadoop.mapred.ReduceTask: Shuffling 23475 bytes (23479 raw bytes) into RAM from attempt_201203242348_0001_m_000000_0
2012-03-25 00:11:35,194 INFO org.apache.hadoop.mapred.ReduceTask: attempt_201203242348_0001_r_000000_0 Need another 1 map output(s) where 1 is already in progress
2012-03-25 00:11:35,194 INFO org.apache.hadoop.mapred.ReduceTask: attempt_201203242348_0001_r_000000_0 Scheduled 0 outputs (0 slow hosts and0 dup hosts)
2012-03-25 00:12:35,197 INFO org.apache.hadoop.mapred.ReduceTask: attempt_201203242348_0001_r_000000_0 Need another 1 map output(s) where 1 is already in progress
2012-03-25 00:12:35,197 INFO org.apache.hadoop.mapred.ReduceTask: attempt_201203242348_0001_r_000000_0 Scheduled 0 outputs (0 slow hosts and0 dup hosts)
2012-03-25 00:13:35,202 INFO org.apache.hadoop.mapred.ReduceTask: attempt_201203242348_0001_r_000000_0 Need another 1 map output(s) where 1 is already in progress
2012-03-25 00:13:35,202 INFO org.apache.hadoop.mapred.ReduceTask: attempt_201203242348_0001_r_000000_0 Scheduled 0 outputs (0 slow hosts and0 dup hosts)
2012-03-25 00:13:40,249 INFO org.apache.hadoop.mapred.ReduceTask: Failed to shuffle from attempt_201203242348_0001_m_000000_0
java.net.SocketTimeoutException: Read timed out
at java.net.SocketInputStream.socketRead0(Native Method)
at java.net.SocketInputStream.read(SocketInputStream.java:150)
at java.net.SocketInputStream.read(SocketInputStream.java:121)
at java.io.BufferedInputStream.fill(BufferedInputStream.java:235)
at java.io.BufferedInputStream.read1(BufferedInputStream.java:275)
at java.io.BufferedInputStream.read(BufferedInputStream.java:334)
at sun.net.www.http.ChunkedInputStream.fastRead(ChunkedInputStream.java:239)
at sun.net.www.http.ChunkedInputStream.read(ChunkedInputStream.java:680)
at java.io.FilterInputStream.read(FilterInputStream.java:133)
at sun.net.www.protocol.http.HttpURLConnection$HttpInputStream.read(HttpURLConnection.java:2959)
at org.apache.hadoop.mapred.IFileInputStream.doRead(IFileInputStream.java:149)
at org.apache.hadoop.mapred.IFileInputStream.read(IFileInputStream.java:101)
at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.shuffleInMemory(ReduceTask.java:1522)
at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.getMapOutput(ReduceTask.java:1408)
at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.copyOutput(ReduceTask.java:1261)
at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.run(ReduceTask.java:1195)
以下是任务跟踪日志的内容:
2012-03-25 00:10:27,910 INFO org.apache.hadoop.mapred.TaskTracker: LaunchTaskAction (registerTask): attempt_201203242348_0001_m_000002_0 task's state:UNASSIGNED
2012-03-25 00:10:27,915 INFO org.apache.hadoop.mapred.TaskTracker: Trying to launch : attempt_201203242348_0001_m_000002_0
2012-03-25 00:10:27,915 INFO org.apache.hadoop.mapred.TaskTracker: In TaskLauncher, current free slots : 2 and trying to launch attempt_201203242348_0001_m_000002_0
2012-03-25 00:10:28,453 INFO org.apache.hadoop.mapred.JvmManager: In JvmRunner constructed JVM ID: jvm_201203242348_0001_m_625085452
2012-03-25 00:10:28,454 INFO org.apache.hadoop.mapred.JvmManager: JVM Runner jvm_201203242348_0001_m_625085452 spawned.
2012-03-25 00:10:29,217 INFO org.apache.hadoop.mapred.TaskTracker: JVM with ID: jvm_201203242348_0001_m_625085452 given task: attempt_201203242348_0001_m_000002_0
2012-03-25 00:10:29,523 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201203242348_0001_m_000002_0 0.0% setup
2012-03-25 00:10:29,524 INFO org.apache.hadoop.mapred.TaskTracker: Task attempt_201203242348_0001_m_000002_0 is done.
2012-03-25 00:10:29,524 INFO org.apache.hadoop.mapred.TaskTracker: reported output size for attempt_201203242348_0001_m_000002_0 was 0
2012-03-25 00:10:29,526 INFO org.apache.hadoop.mapred.TaskTracker: addFreeSlot : current free slots : 2
2012-03-25 00:10:29,718 INFO org.apache.hadoop.mapred.JvmManager: JVM : jvm_201203242348_0001_m_625085452 exited. Number of tasks it ran: 1
2012-03-25 00:10:30,911 INFO org.apache.hadoop.mapred.TaskTracker: org.apache.hadoop.util.DiskChecker$DiskErrorException: Could not find taskTracker/jobcache/job_201203242348_0001/attempt_201203242348_0001_m_000002_0/output/file.out in any of the configured local directories
2012-03-25 00:10:30,952 INFO org.apache.hadoop.mapred.TaskTracker: LaunchTaskAction (registerTask): attempt_201203242348_0001_m_000000_0 task's state:UNASSIGNED
2012-03-25 00:10:30,952 INFO org.apache.hadoop.mapred.TaskTracker: Trying to launch : attempt_201203242348_0001_m_000000_0
2012-03-25 00:10:30,952 INFO org.apache.hadoop.mapred.TaskTracker: In TaskLauncher, current free slots : 2 and trying to launch attempt_201203242348_0001_m_000000_0
2012-03-25 00:10:30,952 INFO org.apache.hadoop.mapred.TaskTracker: Received KillTaskAction for task: attempt_201203242348_0001_m_000002_0
2012-03-25 00:10:30,952 INFO org.apache.hadoop.mapred.TaskTracker: About to purge task: attempt_201203242348_0001_m_000002_0
2012-03-25 00:10:30,952 INFO org.apache.hadoop.mapred.TaskRunner: attempt_201203242348_0001_m_000002_0 done; removing files.
2012-03-25 00:10:30,952 INFO org.apache.hadoop.mapred.IndexCache: Map ID attempt_201203242348_0001_m_000002_0 not found in cache
2012-03-25 00:10:31,077 INFO org.apache.hadoop.mapred.JvmManager: In JvmRunner constructed JVM ID: jvm_201203242348_0001_m_-1399302881
2012-03-25 00:10:31,077 INFO org.apache.hadoop.mapred.JvmManager: JVM Runner jvm_201203242348_0001_m_-1399302881 spawned.
2012-03-25 00:10:31,812 INFO org.apache.hadoop.mapred.TaskTracker: JVM with ID: jvm_201203242348_0001_m_-1399302881 given task: attempt_201203242348_0001_m_000000_0
2012-03-25 00:10:32,642 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201203242348_0001_m_000000_0 1.0%
2012-03-25 00:10:32,642 INFO org.apache.hadoop.mapred.TaskTracker: Task attempt_201203242348_0001_m_000000_0 is done.
2012-03-25 00:10:32,642 INFO org.apache.hadoop.mapred.TaskTracker: reported output size for attempt_201203242348_0001_m_000000_0 was 0
2012-03-25 00:10:32,642 INFO org.apache.hadoop.mapred.TaskTracker: addFreeSlot : current free slots : 2
2012-03-25 00:10:32,822 INFO org.apache.hadoop.mapred.JvmManager: JVM : jvm_201203242348_0001_m_-1399302881 exited. Number of tasks it ran: 1
2012-03-25 00:10:33,982 INFO org.apache.hadoop.mapred.TaskTracker: LaunchTaskAction (registerTask): attempt_201203242348_0001_r_000000_0 task's state:UNASSIGNED
2012-03-25 00:10:33,982 INFO org.apache.hadoop.mapred.TaskTracker: Trying to launch : attempt_201203242348_0001_r_000000_0
2012-03-25 00:10:33,982 INFO org.apache.hadoop.mapred.TaskTracker: In TaskLauncher, current free slots : 2 and trying to launch attempt_201203242348_0001_r_000000_0
2012-03-25 00:10:34,057 INFO org.apache.hadoop.mapred.JvmManager: In JvmRunner constructed JVM ID: jvm_201203242348_0001_r_625085452
2012-03-25 00:10:34,057 INFO org.apache.hadoop.mapred.JvmManager: JVM Runner jvm_201203242348_0001_r_625085452 spawned.
2012-03-25 00:10:34,852 INFO org.apache.hadoop.mapred.TaskTracker: JVM with ID: jvm_201203242348_0001_r_625085452 given task: attempt_201203242348_0001_r_000000_0
2012-03-25 00:10:40,243 INFO org.apache.hadoop.mapred.TaskTracker: Sent out 23479 bytes for reduce: 0 from map: attempt_201203242348_0001_m_000000_0 given 23479/23475
2012-03-25 00:10:40,243 INFO org.apache.hadoop.mapred.TaskTracker.clienttrace: src: 192.168.1.33:50060, dest: 192.168.1.33:60790, bytes: 23479, op: MAPRED_SHUFFLE, cliID: attempt_201203242348_0001_m_000000_0
2012-03-25 00:10:41,153 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201203242348_0001_r_000000_0 0.0% reduce > copy >
2012-03-25 00:10:44,158 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201203242348_0001_r_000000_0 0.0% reduce > copy >
2012-03-25 00:16:05,244 INFO org.apache.hadoop.mapred.TaskTracker: Sent out 23479 bytes for reduce: 0 from map: attempt_201203242348_0001_m_000000_0 given 23479/23475
2012-03-25 00:16:05,244 INFO org.apache.hadoop.mapred.TaskTracker.clienttrace: src: 192.168.1.33:50060, dest: 192.168.1.33:60864, bytes: 23479, op: MAPRED_SHUFFLE, cliID: attempt_201203242348_0001_m_000000_0
2012-03-25 00:16:05,249 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201203242348_0001_r_000000_0 0.0% reduce > copy >
2012-03-25 00:16:08,249 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201203242348_0001_r_000000_0 0.0% reduce > copy >
2012-03-25 00:21:25,251 FATAL org.apache.hadoop.mapred.TaskTracker: Task: attempt_201203242348_0001_r_000000_0 - Killed due to Shuffle Failure: Exceeded MAX_FAILED_UNIQUE_FETCHES; bailing-out.
我在 windows 防火墙中打开了端口 9000 和 9001 我检查了 telnet 输出以验证这些端口确实是打开的:
C:\Windows\system32>netstat -a -n | grep -e "500[367]0"
TCP 0.0.0.0:50030 0.0.0.0:0 LISTENING
TCP 0.0.0.0:50060 0.0.0.0:0 LISTENING
TCP 0.0.0.0:50070 0.0.0.0:0 LISTENING
TCP [::]:50030 [::]:0 LISTENING
TCP [::]:50060 [::]:0 LISTENING
TCP [::]:50070 [::]:0 LISTENING
C:\Windows\system32>netstat -a -n | grep -e "900[01]"
TCP 127.0.0.1:9000 0.0.0.0:0 LISTENING
TCP 127.0.0.1:9000 127.0.0.1:60332 ESTABLISHED
TCP 127.0.0.1:9000 127.0.0.1:60987 ESTABLISHED
TCP 127.0.0.1:9001 0.0.0.0:0 LISTENING
TCP 127.0.0.1:9001 127.0.0.1:60410 ESTABLISHED
TCP 127.0.0.1:60332 127.0.0.1:9000 ESTABLISHED
TCP 127.0.0.1:60410 127.0.0.1:9001 ESTABLISHED
TCP 127.0.0.1:60987 127.0.0.1:9000 ESTABLISHED
您能帮助解决安装问题和让 reduce 任务工作吗?
我看了
http://wiki.apache.org/hadoop/SocketTimeout和其他一些链接并尝试了这些建议,但没有任何成功。
感谢您耐心阅读这篇文章,并很乐意提供更多详细信息。
提前致谢。
【问题讨论】: