由于内存错误，Hadoop Map Job 卡住了答案

【问题标题】：Hadoop Map Job gets stuck due to memory error由于内存错误，Hadoop Map Job 卡住了
【发布时间】：2015-09-08 19:19:49
【问题描述】：

我正在运行 Hive 脚本来对包含 5452689 行和 7GB 数据大小的表进行一些操作。但是，我的 map reduce 卡在 70% 左右，并给出 no space left 错误

错误如下：

Hadoop job information for Stage-4: number of mappers: 27; number of reducers: 1
2015-06-23 01:47:43,748 Stage-4 map = 0%,  reduce = 0%
2015-06-23 01:48:04,550 Stage-4 map = 1%,  reduce = 0%, Cumulative CPU 258.06 sec
2015-06-23 01:48:06,661 Stage-4 map = 2%,  reduce = 0%, Cumulative CPU 331.98 sec
2015-06-23 01:48:07,796 Stage-4 map = 3%,  reduce = 0%, Cumulative CPU 370.35 sec
2015-06-23 01:48:09,931 Stage-4 map = 4%,  reduce = 0%, Cumulative CPU 406.65 sec
2015-06-23 01:48:28,778 Stage-4 map = 7%,  reduce = 0%, Cumulative CPU 973.33 sec
2015-06-23 01:48:30,987 Stage-4 map = 8%,  reduce = 0%, Cumulative CPU 1034.17 sec
2015-06-23 01:48:34,251 Stage-4 map = 11%,  reduce = 0%, Cumulative CPU 1121.22 sec
2015-06-23 01:48:35,419 Stage-4 map = 13%,  reduce = 0%, Cumulative CPU 1173.4 sec
2015-06-23 01:48:36,458 Stage-4 map = 18%,  reduce = 0%, Cumulative CPU 1191.29 sec
2015-06-23 01:48:37,499 Stage-4 map = 22%,  reduce = 0%, Cumulative CPU 1215.44 sec
2015-06-23 01:48:38,607 Stage-4 map = 29%,  reduce = 0%, Cumulative CPU 1267.07 sec
2015-06-23 01:48:39,671 Stage-4 map = 32%,  reduce = 0%, Cumulative CPU 1289.57 sec
2015-06-23 01:48:40,883 Stage-4 map = 34%,  reduce = 0%, Cumulative CPU 1309.96 sec
2015-06-23 01:48:41,922 Stage-4 map = 36%,  reduce = 0%, Cumulative CPU 1366.31 sec
2015-06-23 01:48:48,693 Stage-4 map = 39%,  reduce = 0%, Cumulative CPU 1554.9 sec
2015-06-23 01:48:54,121 Stage-4 map = 40%,  reduce = 0%, Cumulative CPU 1709.04 sec
2015-06-23 01:49:00,973 Stage-4 map = 43%,  reduce = 0%, Cumulative CPU 1895.86 sec
2015-06-23 01:49:03,099 Stage-4 map = 46%,  reduce = 0%, Cumulative CPU 1976.89 sec
2015-06-23 01:49:05,180 Stage-4 map = 49%,  reduce = 0%, Cumulative CPU 2003.08 sec
2015-06-23 01:49:06,225 Stage-4 map = 58%,  reduce = 0%, Cumulative CPU 2062.33 sec
2015-06-23 01:49:07,353 Stage-4 map = 60%,  reduce = 0%, Cumulative CPU 2067.9 sec
2015-06-23 01:49:08,388 Stage-4 map = 66%,  reduce = 0%, Cumulative CPU 2087.55 sec
2015-06-23 01:49:09,551 Stage-4 map = 74%,  reduce = 2%, Cumulative CPU 2112.96 sec
2015-06-23 01:49:10,607 Stage-4 map = 75%,  reduce = 2%, Cumulative CPU 2118.14 sec
2015-06-23 01:49:11,669 Stage-4 map = 19%,  reduce = 0%, Cumulative CPU 433.75 sec
2015-06-23 01:49:12,699 Stage-4 map = 16%,  reduce = 0%, Cumulative CPU 350.93 sec
2015-06-23 01:49:14,760 Stage-4 map = 15%,  reduce = 0%, Cumulative CPU 263.95 sec
2015-06-23 01:49:26,177 Stage-4 map = 16%,  reduce = 0%, Cumulative CPU 341.29 sec
2015-06-23 01:49:31,365 Stage-4 map = 15%,  reduce = 0%, Cumulative CPU 334.86 sec
2015-06-23 01:49:39,713 Stage-4 map = 23%,  reduce = 0%, Cumulative CPU 300.53 sec
2015-06-23 01:49:40,758 Stage-4 map = 15%,  reduce = 0%, Cumulative CPU 300.53 sec
2015-06-23 01:49:43,868 Stage-4 map = 100%,  reduce = 100%, Cumulative CPU 263.95 sec
MapReduce Total cumulative CPU time: 4 minutes 23 seconds 950 msec
Ended Job = job_1434953415026_0374 with errors
Error during job, obtaining debugging information...
Examining task ID: task_1434953415026_0374_m_000025 (and more) from job job_1434953415026_0374
Examining task ID: task_1434953415026_0374_m_000004 (and more) from job job_1434953415026_0374
Examining task ID: task_1434953415026_0374_m_000019 (and more) from job job_1434953415026_0374
Examining task ID: task_1434953415026_0374_m_000003 (and more) from job job_1434953415026_0374
Examining task ID: task_1434953415026_0374_m_000022 (and more) from job job_1434953415026_0374
Examining task ID: task_1434953415026_0374_m_000002 (and more) from job job_1434953415026_0374
Examining task ID: task_1434953415026_0374_m_000005 (and more) from job job_1434953415026_0374

Task with the most failures(4): 
-----
Task ID:
  task_1434953415026_0374_m_000000

URL:
  http://pfaquaap1u:8088/taskdetails.jsp?jobid=job_1434953415026_0374&tipid=task_1434953415026_0374_m_000000
-----
Diagnostic Messages for this Task:
FSError: java.io.IOException: No space left on device

但是，运行df 表明我在本地系统上有超过 80 GB 的可用空间。 df -i 还表明没有一个 inode 被过度使用。如果我使用较小的数据输入，脚本运行良好。谁能告诉我我应该在这里做什么？提前致谢。

【问题讨论】：

你能检查任务跟踪日志吗。你在映射器代码中打印任何东西吗..？
检查您的查询..确保您没有得到它的笛卡尔积并填满您的磁盘..还可以使用“hadoop fs -df -h”检查磁盘上剩余的空间

标签： hadoop mapreduce hive ioexception

【解决方案1】：

我认为df 不会向您显示 hdfs 磁盘使用情况。而是尝试hadoop fs -df -h（请参阅此问题：HDFS free space available command）

我认为您可能只需要向集群中添加更多节点！

【讨论】：

真的很低，map-reduce 作业可能需要相当大的临时存储空间。你的集群的规格是什么？在 AWS 上，一个 m1.xlarge（现货价 0.05 美元/小时）有 4x420 GB HD，所以你真的应该考虑升级。