Hadoop container killed but job succeeded

【Question title】: Hadoop container killed but job succeeded
【Posted】: 2016-11-07 14:09:26
【Question】:

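I am trying to run a MapReduce program on Hadoop. When I submit the jar from my MacBook and run the job on my desktop, the job fails because a container exceeds its virtual memory limit. However, http://master-hadoop:8088/cluster tells me that the job succeeded, and the result appears to be correct.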

You can see that 170 MB of physical memory was used, while 17.8 GB of virtual memory was used. And the input file is only 10 MB.
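The two figures can be cross-checked against the process-tree dump in the log below. The following is a minimal sketch, assuming a 4 KiB memory page (typical for Linux on amd64, but not stated in the log) and that "MB" and "GB" in the kill message are binary units. The constants are copied from the VMEM_USAGE(BYTES) and RSSMEM_USAGE(PAGES) columns for the java and bash processes; the helper name is hypothetical.

```python
# Figures copied from the process-tree dump for the killed container.
# Each tuple is (VMEM_USAGE in bytes, RSSMEM_USAGE in pages).
PROCESSES = {
    "java": (19121176576, 42828),
    "bash": (17051648, 700),
}

PAGE_SIZE_BYTES = 4096  # assumption: 4 KiB pages, not stated in the log


def container_usage(processes, page_size=PAGE_SIZE_BYTES):
    """Sum virtual and resident memory over every process in the tree."""
    vmem_bytes = sum(vmem for vmem, _ in processes.values())
    rss_bytes = sum(pages for _, pages in processes.values()) * page_size
    return vmem_bytes, rss_bytes


vmem_bytes, rss_bytes = container_usage(PROCESSES)
print(f"physical: {rss_bytes / 2**20:.1f} MB")  # physical: 170.0 MB
print(f"virtual:  {vmem_bytes / 2**30:.1f} GB")  # virtual:  17.8 GB
```

Under those assumptions the sums reproduce the 170.0 MB and 17.8 GB in the kill message. Nearly all of the virtual memory belongs to the java process, even though it was started with -Xmx200m.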

What I cannot figure out is why the program uses so much virtual memory, and why Hadoop reports my job as successful even though a container was killed.

16/11/07 21:31:40 INFO Join: 20161107213140620
16/11/07 21:31:41 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
16/11/07 21:31:42 INFO client.RMProxy: Connecting to ResourceManager at master-hadoop/192.168.199.162:8032
16/11/07 21:31:43 WARN mapreduce.JobResourceUploader: Hadoop command-line option parsing not performed. Implement the Tool interface and execute your application with ToolRunner to remedy this.
16/11/07 21:31:44 INFO input.FileInputFormat: Total input paths to process : 2
16/11/07 21:31:44 INFO mapreduce.JobSubmitter: number of splits:2
16/11/07 21:31:44 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1478524274348_0001
16/11/07 21:31:46 INFO impl.YarnClientImpl: Submitted application application_1478524274348_0001
16/11/07 21:31:46 INFO mapreduce.Job: The url to track the job: http://master-hadoop:8088/proxy/application_1478524274348_0001/
16/11/07 21:31:46 INFO mapreduce.Job: Running job: job_1478524274348_0001
16/11/07 21:31:55 INFO mapreduce.Job: Job job_1478524274348_0001 running in uber mode : false
16/11/07 21:31:55 INFO mapreduce.Job:  map 0% reduce 0%
16/11/07 21:32:04 INFO mapreduce.Job:  map 100% reduce 0%
16/11/07 21:32:11 INFO mapreduce.Job:  map 100% reduce 100%
16/11/07 21:32:12 INFO mapreduce.Job: Job job_1478524274348_0001 completed successfully
16/11/07 21:32:12 INFO mapreduce.Job: Counters: 49
    File System Counters
        FILE: Number of bytes read=1974092
        FILE: Number of bytes written=4301313
        FILE: Number of read operations=0
        FILE: Number of large read operations=0
        FILE: Number of write operations=0
        HDFS: Number of bytes read=20971727
        HDFS: Number of bytes written=23746
        HDFS: Number of read operations=9
        HDFS: Number of large read operations=0
        HDFS: Number of write operations=2
    Job Counters 
        Launched map tasks=2
        Launched reduce tasks=1
        Data-local map tasks=2
        Total time spent by all maps in occupied slots (ms)=13291
        Total time spent by all reduces in occupied slots (ms)=3985
        Total time spent by all map tasks (ms)=13291
        Total time spent by all reduce tasks (ms)=3985
        Total vcore-milliseconds taken by all map tasks=13291
        Total vcore-milliseconds taken by all reduce tasks=3985
        Total megabyte-milliseconds taken by all map tasks=13609984
        Total megabyte-milliseconds taken by all reduce tasks=4080640
    Map-Reduce Framework
        Map input records=162852
        Map output records=162852
        Map output bytes=1648382
        Map output materialized bytes=1974098
        Input split bytes=207
        Combine input records=0
        Combine output records=0
        Reduce input groups=105348
        Reduce shuffle bytes=1974098
        Reduce input records=162852
        Reduce output records=4423
        Spilled Records=325704
        Shuffled Maps =2
        Failed Shuffles=0
        Merged Map outputs=2
        GC time elapsed (ms)=364
        CPU time spent (ms)=6300
        Physical memory (bytes) snapshot=705949696
        Virtual memory (bytes) snapshot=5738041344
        Total committed heap usage (bytes)=492830720
    Shuffle Errors
        BAD_ID=0
        CONNECTION=0
        IO_ERROR=0
        WRONG_LENGTH=0
        WRONG_MAP=0
        WRONG_REDUCE=0
    File Input Format Counters 
        Bytes Read=20971520
    File Output Format Counters 
        Bytes Written=23746
16/11/07 21:32:12 INFO client.RMProxy: Connecting to ResourceManager at master-hadoop/192.168.199.162:8032
16/11/07 21:32:12 WARN mapreduce.JobResourceUploader: Hadoop command-line option parsing not performed. Implement the Tool interface and execute your application with ToolRunner to remedy this.
16/11/07 21:32:12 INFO input.FileInputFormat: Total input paths to process : 2
16/11/07 21:32:12 INFO mapreduce.JobSubmitter: number of splits:2
16/11/07 21:32:13 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1478524274348_0002
16/11/07 21:32:13 INFO impl.YarnClientImpl: Submitted application application_1478524274348_0002
16/11/07 21:32:13 INFO mapreduce.Job: The url to track the job: http://master-hadoop:8088/proxy/application_1478524274348_0002/
16/11/07 21:32:13 INFO mapreduce.Job: Running job: job_1478524274348_0002
16/11/07 21:32:24 INFO mapreduce.Job: Job job_1478524274348_0002 running in uber mode : false
16/11/07 21:32:24 INFO mapreduce.Job:  map 0% reduce 0%
16/11/07 21:32:32 INFO mapreduce.Job:  map 100% reduce 0%
16/11/07 21:32:38 INFO mapreduce.Job: Task Id : attempt_1478524274348_0002_r_000000_0, Status : FAILED
Container [pid=4170,containerID=container_1478524274348_0002_01_000004] is running beyond virtual memory limits. Current usage: 170.0 MB of 1 GB physical memory used; 17.8 GB of 2.1 GB virtual memory used. Killing container.
Dump of the process-tree for container_1478524274348_0002_01_000004 :
    |- PID PPID PGRPID SESSID CMD_NAME USER_MODE_TIME(MILLIS) SYSTEM_TIME(MILLIS) VMEM_USAGE(BYTES) RSSMEM_USAGE(PAGES) FULL_CMD_LINE
    |- 4174 4170 4170 4170 (java) 407 30 19121176576 42828 /usr/lib/jvm/java-8-openjdk-amd64/bin/java -Djava.net.preferIPv4Stack=true -Dhadoop.metrics.log.level=WARN -Xmx200m -Djava.io.tmpdir=/usr/local/hadoop/tmp/nm-local-dir/usercache/lining/appcache/application_1478524274348_0002/container_1478524274348_0002_01_000004/tmp -Dlog4j.configuration=container-log4j.properties -Dyarn.app.container.log.dir=/usr/local/hadoop/logs/userlogs/application_1478524274348_0002/container_1478524274348_0002_01_000004 -Dyarn.app.container.log.filesize=0 -Dhadoop.root.logger=INFO,CLA -Dhadoop.root.logfile=syslog -Dyarn.app.mapreduce.shuffle.logger=INFO,shuffleCLA -Dyarn.app.mapreduce.shuffle.logfile=syslog.shuffle -Dyarn.app.mapreduce.shuffle.log.filesize=0 -Dyarn.app.mapreduce.shuffle.log.backups=0 org.apache.hadoop.mapred.YarnChild 127.0.1.1 33077 attempt_1478524274348_0002_r_000000_0 4 
    |- 4170 4168 4170 4170 (bash) 0 0 17051648 700 /bin/bash -c /usr/lib/jvm/java-8-openjdk-amd64/bin/java -Djava.net.preferIPv4Stack=true -Dhadoop.metrics.log.level=WARN  -Xmx200m -Djava.io.tmpdir=/usr/local/hadoop/tmp/nm-local-dir/usercache/lining/appcache/application_1478524274348_0002/container_1478524274348_0002_01_000004/tmp -Dlog4j.configuration=container-log4j.properties -Dyarn.app.container.log.dir=/usr/local/hadoop/logs/userlogs/application_1478524274348_0002/container_1478524274348_0002_01_000004 -Dyarn.app.container.log.filesize=0 -Dhadoop.root.logger=INFO,CLA -Dhadoop.root.logfile=syslog -Dyarn.app.mapreduce.shuffle.logger=INFO,shuffleCLA -Dyarn.app.mapreduce.shuffle.logfile=syslog.shuffle -Dyarn.app.mapreduce.shuffle.log.filesize=0 -Dyarn.app.mapreduce.shuffle.log.backups=0 org.apache.hadoop.mapred.YarnChild 127.0.1.1 33077 attempt_1478524274348_0002_r_000000_0 4 1>/usr/local/hadoop/logs/userlogs/application_1478524274348_0002/container_1478524274348_0002_01_000004/stdout 2>/usr/local/hadoop/logs/userlogs/application_1478524274348_0002/container_1478524274348_0002_01_000004/stderr  

Container killed on request. Exit code is 143
Container exited with a non-zero exit code 143

16/11/07 21:32:47 INFO mapreduce.Job:  map 100% reduce 100%
16/11/07 21:32:48 INFO mapreduce.Job: Job job_1478524274348_0002 completed successfully
16/11/07 21:32:48 INFO mapreduce.Job: Counters: 50
    File System Counters
        FILE: Number of bytes read=3373558
        FILE: Number of bytes written=7100224
        FILE: Number of read operations=0
        FILE: Number of large read operations=0
        FILE: Number of write operations=0
        HDFS: Number of bytes read=21019219
        HDFS: Number of bytes written=307797
        HDFS: Number of read operations=15
        HDFS: Number of large read operations=0
        HDFS: Number of write operations=2
    Job Counters 
        Failed reduce tasks=1
        Launched map tasks=2
        Launched reduce tasks=2
        Data-local map tasks=2
        Total time spent by all maps in occupied slots (ms)=12513
        Total time spent by all reduces in occupied slots (ms)=7584
        Total time spent by all map tasks (ms)=12513
        Total time spent by all reduce tasks (ms)=7584
        Total vcore-milliseconds taken by all map tasks=12513
        Total vcore-milliseconds taken by all reduce tasks=7584
        Total megabyte-milliseconds taken by all map tasks=12813312
        Total megabyte-milliseconds taken by all reduce tasks=7766016
    Map-Reduce Framework
        Map input records=162852
        Map output records=22115
        Map output bytes=3315932
        Map output materialized bytes=3373564
        Input split bytes=207
        Combine input records=0
        Combine output records=0
        Reduce input groups=177
        Reduce shuffle bytes=3373564
        Reduce input records=22115
        Reduce output records=17692
        Spilled Records=44230
        Shuffled Maps =2
        Failed Shuffles=0
        Merged Map outputs=2
        GC time elapsed (ms)=381
        CPU time spent (ms)=5320
        Physical memory (bytes) snapshot=727543808
        Virtual memory (bytes) snapshot=22958596096
        Total committed heap usage (bytes)=493355008
    Shuffle Errors
        BAD_ID=0
        CONNECTION=0
        IO_ERROR=0
        WRONG_LENGTH=0
        WRONG_MAP=0
        WRONG_REDUCE=0
    File Input Format Counters 
        Bytes Read=20971520
    File Output Format Counters 
        Bytes Written=307797
16/11/07 21:32:48 INFO Join: 20161107213248192

【Comments】:

I have never set anything related to vmem or pmem in my configuration.
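With nothing configured, the limits in the kill message would come from Hadoop's defaults. As a rough sketch, assuming the stock values of `mapreduce.reduce.memory.mb` (1024) and `yarn.nodemanager.vmem-pmem-ratio` (2.1), and assuming the virtual memory check (`yarn.nodemanager.vmem-check-enabled`) is left on, the "2.1 GB" ceiling follows from multiplying the two. The function name is hypothetical, and the actual defaults depend on the Hadoop version and the cluster's configuration files.

```python
# Assumed defaults; check mapred-default.xml and yarn-default.xml for your version.
REDUCE_MEMORY_MB = 1024  # mapreduce.reduce.memory.mb
VMEM_PMEM_RATIO = 2.1    # yarn.nodemanager.vmem-pmem-ratio

# Sum of VMEM_USAGE(BYTES) for the java and bash processes in the dump.
USED_VMEM_BYTES = 19121176576 + 17051648


def vmem_limit_mb(container_mb, ratio=VMEM_PMEM_RATIO):
    """Virtual memory ceiling for a container of the given physical size."""
    return container_mb * ratio


limit_mb = vmem_limit_mb(REDUCE_MEMORY_MB)
used_mb = USED_VMEM_BYTES / 2**20

print(f"limit: {limit_mb / 1024:.1f} GB")
print(f"over limit: {used_mb > limit_mb}")
```

If the large virtual size turns out to be reserved address space rather than memory the task really needs, raising the ratio or disabling the check in yarn-site.xml is a commonly used workaround; whether that is appropriate depends on the cluster.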

Tags: hadoop containers virtual-memory

【Solution 1】:

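The first attempt of one of your reduce tasks failed, but it was most likely rescheduled and then completed successfully, which is why the job as a whole is reported as successful.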
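The retry behaviour described here can be illustrated with a toy model. This is only a sketch, not Hadoop's scheduler: it assumes a task is retried until one attempt succeeds or a maximum number of attempts is reached (the value 4 for `mapreduce.reduce.maxattempts` is an assumed default), and the function and variable names are hypothetical.

```python
MAX_ATTEMPTS = 4  # assumed default of mapreduce.reduce.maxattempts


def run_task(attempt_outcomes, max_attempts=MAX_ATTEMPTS):
    """Return (task_succeeded, failed_attempts, launched_attempts).

    attempt_outcomes lists the outcome of each attempt in order:
    True for success, False for a failed or killed attempt.
    """
    failed = 0
    launched = 0
    for ok in attempt_outcomes[:max_attempts]:
        launched += 1
        if ok:
            return True, failed, launched
        failed += 1
    return False, failed, launched


# Reduce task r_000000: attempt _0 was killed (exit code 143);
# the retry is assumed to have succeeded.
succeeded, failed, launched = run_task([False, True])
print(succeeded, failed, launched)  # True 1 2
```

The result lines up with the second job's counters in the log, `Failed reduce tasks=1` and `Launched reduce tasks=2`, which is consistent with a single failed attempt followed by a successful retry.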
