【Posted】: 2023-03-03 10:28:01
【Question】:
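I ran a Hadoop MapReduce wordcount job on 1 MB of data. I have a few questions about the output below:
- What are counters?
- Why are there two map tasks? As far as I know, the number of maps is determined by the number of input splits, and the minimum input split size is 64 MB, so logically there should be only one map task.
- What is the size of the reducers' output data?
- For "CPU time spent", which CPU does this refer to, given that each TaskTracker has its own CPU and memory?
Thanks a lot!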
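For context on the map-count question, here is a minimal sketch of the split-size arithmetic as I understand it from the old-API `org.apache.hadoop.mapred.FileInputFormat` (the class named in the log). It assumes the formula `splitSize = max(minSize, min(goalSize, blockSize))` with `goalSize = totalSize / mapred.map.tasks`, a default of 2 for `mapred.map.tasks`, and a default minimum split size of 1 byte. The class and method names are hypothetical, there is no Hadoop dependency, and the input size is rounded to 1,000,000 bytes.

```java
// Sketch of the assumed split-size arithmetic; not Hadoop's actual source.
class SplitSketch {
    // Assumption: the last split may be up to 10% larger than splitSize.
    static final double SPLIT_SLOP = 1.1;

    // splitSize = max(minSize, min(goalSize, blockSize))
    static long computeSplitSize(long goalSize, long minSize, long blockSize) {
        return Math.max(minSize, Math.min(goalSize, blockSize));
    }

    // Counts the splits produced for a single file of fileSize bytes.
    static int countSplits(long fileSize, int requestedMaps, long minSize, long blockSize) {
        long goalSize = fileSize / Math.max(1, requestedMaps);
        long splitSize = computeSplitSize(goalSize, minSize, blockSize);
        int splits = 0;
        long remaining = fileSize;
        while ((double) remaining / splitSize > SPLIT_SLOP) {
            splits++;
            remaining -= splitSize;
        }
        if (remaining != 0) {
            splits++; // the tail becomes the final split
        }
        return splits;
    }

    public static void main(String[] args) {
        long fileSize = 1_000_000L;           // assumed: roughly 1 MB of input
        long blockSize = 64L * 1024 * 1024;   // 64 MB block size
        long minSize = 1L;                    // assumed default minimum split size
        System.out.println("requested maps = 2: " + countSplits(fileSize, 2, minSize, blockSize));
        System.out.println("requested maps = 1: " + countSplits(fileSize, 1, minSize, blockSize));
    }
}
```

Under those assumptions the 64 MB block size acts as an upper bound on a split rather than a minimum, which would be consistent with two map tasks for a 1 MB file.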
[user1@li417-43 ~]$ hadoop jar wordcount1.jar wordcount1.WordCount -D mapred.reduce.tasks=10 wordin wordout10-1m
14/12/16 19:55:46 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
14/12/16 19:55:46 INFO mapred.FileInputFormat: Total input paths to process : 1
14/12/16 19:55:46 INFO mapred.JobClient: Running job: job_201405031326_0032
14/12/16 19:55:47 INFO mapred.JobClient: map 0% reduce 0%
14/12/16 19:55:59 INFO mapred.JobClient: map 100% reduce 0%
14/12/16 19:56:04 INFO mapred.JobClient: map 100% reduce 40%
14/12/16 19:56:09 INFO mapred.JobClient: map 100% reduce 80%
14/12/16 19:56:14 INFO mapred.JobClient: map 100% reduce 100%
14/12/16 19:56:15 INFO mapred.JobClient: Job complete: job_201405031326_0032
14/12/16 19:56:15 INFO mapred.JobClient: Counters: 34
14/12/16 19:56:15 INFO mapred.JobClient: File System Counters
14/12/16 19:56:15 INFO mapred.JobClient: FILE: Number of bytes read=2008100
14/12/16 19:56:15 INFO mapred.JobClient: FILE: Number of bytes written=5988058
14/12/16 19:56:15 INFO mapred.JobClient: FILE: Number of read operations=0
14/12/16 19:56:15 INFO mapred.JobClient: FILE: Number of large read operations=0
14/12/16 19:56:15 INFO mapred.JobClient: FILE: Number of write operations=0
14/12/16 19:56:15 INFO mapred.JobClient: HDFS: Number of bytes read=1005254
14/12/16 19:56:15 INFO mapred.JobClient: HDFS: Number of bytes written=140119
14/12/16 19:56:15 INFO mapred.JobClient: HDFS: Number of read operations=14
14/12/16 19:56:15 INFO mapred.JobClient: HDFS: Number of large read operations=0
14/12/16 19:56:15 INFO mapred.JobClient: HDFS: Number of write operations=20
14/12/16 19:56:15 INFO mapred.JobClient: Job Counters
14/12/16 19:56:15 INFO mapred.JobClient: Launched map tasks=2
14/12/16 19:56:15 INFO mapred.JobClient: Launched reduce tasks=10
14/12/16 19:56:15 INFO mapred.JobClient: Data-local map tasks=1
14/12/16 19:56:15 INFO mapred.JobClient: Rack-local map tasks=1
14/12/16 19:56:15 INFO mapred.JobClient: Total time spent by all maps in occupied slots (ms)=12953
14/12/16 19:56:15 INFO mapred.JobClient: Total time spent by all reduces in occupied slots (ms)=49609
14/12/16 19:56:15 INFO mapred.JobClient: Total time spent by all maps waiting after reserving slots (ms)=0
14/12/16 19:56:15 INFO mapred.JobClient: Total time spent by all reduces waiting after reserving slots (ms)=0
14/12/16 19:56:15 INFO mapred.JobClient: Map-Reduce Framework
14/12/16 19:56:15 INFO mapred.JobClient: Map input records=35293
14/12/16 19:56:15 INFO mapred.JobClient: Map output records=181014
14/12/16 19:56:15 INFO mapred.JobClient: Map output bytes=1646012
14/12/16 19:56:15 INFO mapred.JobClient: Input split bytes=206
14/12/16 19:56:15 INFO mapred.JobClient: Combine input records=0
14/12/16 19:56:15 INFO mapred.JobClient: Combine output records=0
14/12/16 19:56:15 INFO mapred.JobClient: Reduce input groups=14276
14/12/16 19:56:15 INFO mapred.JobClient: Reduce shuffle bytes=2008160
14/12/16 19:56:15 INFO mapred.JobClient: Reduce input records=181014
14/12/16 19:56:15 INFO mapred.JobClient: Reduce output records=14276
14/12/16 19:56:15 INFO mapred.JobClient: Spilled Records=362028
14/12/16 19:56:15 INFO mapred.JobClient: CPU time spent (ms)=26020
14/12/16 19:56:15 INFO mapred.JobClient: Physical memory (bytes) snapshot=1427562496
14/12/16 19:56:15 INFO mapred.JobClient: Virtual memory (bytes) snapshot=8291246080
14/12/16 19:56:15 INFO mapred.JobClient: Total committed heap usage (bytes)=477896704
14/12/16 19:56:15 INFO mapred.JobClient: org.apache.hadoop.mapreduce.lib.input.FileInputFormatCounter
14/12/16 19:56:15 INFO mapred.JobClient: BYTES_READ=1002479
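The counter lines in the log above all end in `name=value`, so they can be collected into a map to compare figures such as `HDFS: Number of bytes written` (140119 here, which is probably the combined size of the reducers' output files, though that reading is an assumption). A minimal sketch follows; `CounterLines` and `parse` are hypothetical names, not Hadoop APIs.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper (not part of Hadoop): extracts "name=value" counters
// from JobClient console output like the log above.
class CounterLines {
    private static final String MARKER = "mapred.JobClient: ";

    static Map<String, Long> parse(List<String> lines) {
        Map<String, Long> counters = new LinkedHashMap<>();
        for (String line : lines) {
            int start = line.indexOf(MARKER);
            int eq = line.lastIndexOf('=');
            if (start < 0 || eq < start) {
                continue; // progress lines and headers carry no "name=value"
            }
            String name = line.substring(start + MARKER.length(), eq).trim();
            try {
                counters.put(name, Long.parseLong(line.substring(eq + 1).trim()));
            } catch (NumberFormatException e) {
                // value is not a plain integer; skip this line
            }
        }
        return counters;
    }

    public static void main(String[] args) {
        List<String> sample = Arrays.asList(
            "14/12/16 19:56:15 INFO mapred.JobClient: HDFS: Number of bytes written=140119",
            "14/12/16 19:56:15 INFO mapred.JobClient: Launched map tasks=2",
            "14/12/16 19:56:15 INFO mapred.JobClient: CPU time spent (ms)=26020");
        for (Map.Entry<String, Long> e : parse(sample).entrySet()) {
            System.out.println(e.getKey() + " -> " + e.getValue());
        }
    }
}
```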
【Discussion】:
Tags: performance hadoop mapreduce jobs