Background

I need Hadoop for work, so I've started learning it, and this post records that process. It mainly covers how I quickly set up a Hadoop environment and ran a demo of my own.

 

Setting up the environment

There are plenty of Hadoop setup guides online, but the ones I've seen are all fairly involved: install Java, install Hadoop, then tweak a pile of settings whose parameters I don't understand yet. My goal is much simpler: get a working environment with the least effort possible. At this stage the various variables and parameters don't matter much to me; I just need to be able to run one simple demo of my own. In practice I won't be the one maintaining the environment anyway, so simple is all I need.

As it happens, I've been studying Docker lately, so I decided to use Docker to build the environment and learn Hadoop and Docker at the same time.

First, install Docker. That part is simple and I won't cover it here; the official site provides a one-line install script.

Docker Hub has an official Hadoop example image:

https://hub.docker.com/r/sequenceiq/hadoop-docker/

I modified the command slightly:

I mount one extra directory, because I want to copy my own demo jar into the container and run it with Hadoop.

I also name the container hadoop2. I run a lot of containers, so naming them makes them easier to tell apart, and later I may want several Hadoop containers to build a cluster.

docker run -it -v /dockerVolumes/hadoop2:/dockerVolume --name hadoop2 sequenceiq/hadoop-docker:2.7.0 /etc/bootstrap.sh -bash

Once this command runs, the container is up. We can try the official example:
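Before starting the container, it's worth creating the host-side directory and confirming the bind mount works; a quick sketch, using the paths from the `docker run` command above:

```shell
# Create the host-side directory that -v maps into the container
mkdir -p /dockerVolumes/hadoop2

# Put a marker file in it; inside the container it appears under /dockerVolume
echo "hello from host" > /dockerVolumes/hadoop2/test.txt

# From the host, check the file inside the running container:
#   docker exec hadoop2 cat /dockerVolume/test.txt   # should print: hello from host
```

Any jar dropped into /dockerVolumes/hadoop2 on the host is then visible at /dockerVolume inside the container, which is how the demo jar gets in later.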

 

cd $HADOOP_PREFIX
# run the mapreduce
bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.0.jar grep input output 'dfs[a-z.]+'

# check the output
bin/hdfs dfs -cat output/*

Output:

bash-4.1# clear
bash-4.1# bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.0.jar grep input output 'dfs[a-z.]+'
18/06/11 07:35:38 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
18/06/11 07:35:39 INFO input.FileInputFormat: Total input paths to process : 31
18/06/11 07:35:39 INFO mapreduce.JobSubmitter: number of splits:31
18/06/11 07:35:40 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1528635021541_0007
18/06/11 07:35:40 INFO impl.YarnClientImpl: Submitted application application_1528635021541_0007
18/06/11 07:35:40 INFO mapreduce.Job: The url to track the job: http://e1bed6899d06:8088/proxy/application_1528635021541_0007/
18/06/11 07:35:40 INFO mapreduce.Job: Running job: job_1528635021541_0007
18/06/11 07:35:45 INFO mapreduce.Job: Job job_1528635021541_0007 running in uber mode : false
18/06/11 07:35:45 INFO mapreduce.Job:  map 0% reduce 0%
18/06/11 07:36:02 INFO mapreduce.Job:  map 10% reduce 0%
18/06/11 07:36:03 INFO mapreduce.Job:  map 19% reduce 0%
18/06/11 07:36:19 INFO mapreduce.Job:  map 35% reduce 0%
18/06/11 07:36:20 INFO mapreduce.Job:  map 39% reduce 0%
18/06/11 07:36:33 INFO mapreduce.Job:  map 42% reduce 0%
18/06/11 07:36:35 INFO mapreduce.Job:  map 55% reduce 0%
18/06/11 07:36:36 INFO mapreduce.Job:  map 55% reduce 15%
18/06/11 07:36:39 INFO mapreduce.Job:  map 55% reduce 18%
18/06/11 07:36:45 INFO mapreduce.Job:  map 58% reduce 18%
18/06/11 07:36:46 INFO mapreduce.Job:  map 61% reduce 18%
18/06/11 07:36:47 INFO mapreduce.Job:  map 65% reduce 18%
18/06/11 07:36:48 INFO mapreduce.Job:  map 65% reduce 22%
18/06/11 07:36:49 INFO mapreduce.Job:  map 71% reduce 22%
18/06/11 07:36:51 INFO mapreduce.Job:  map 71% reduce 24%
18/06/11 07:36:57 INFO mapreduce.Job:  map 74% reduce 24%
18/06/11 07:36:59 INFO mapreduce.Job:  map 77% reduce 24%
18/06/11 07:37:00 INFO mapreduce.Job:  map 77% reduce 26%
18/06/11 07:37:01 INFO mapreduce.Job:  map 84% reduce 26%
18/06/11 07:37:03 INFO mapreduce.Job:  map 87% reduce 28%
18/06/11 07:37:06 INFO mapreduce.Job:  map 87% reduce 29%
18/06/11 07:37:08 INFO mapreduce.Job:  map 90% reduce 29%
18/06/11 07:37:09 INFO mapreduce.Job:  map 94% reduce 29%
18/06/11 07:37:11 INFO mapreduce.Job:  map 100% reduce 29%
18/06/11 07:37:12 INFO mapreduce.Job:  map 100% reduce 100%
18/06/11 07:37:12 INFO mapreduce.Job: Job job_1528635021541_0007 completed successfully
18/06/11 07:37:12 INFO mapreduce.Job: Counters: 49
	File System Counters
		FILE: Number of bytes read=345
		FILE: Number of bytes written=3697476
		FILE: Number of read operations=0
		FILE: Number of large read operations=0
		FILE: Number of write operations=0
		HDFS: Number of bytes read=80529
		HDFS: Number of bytes written=437
		HDFS: Number of read operations=96
		HDFS: Number of large read operations=0
		HDFS: Number of write operations=2
	Job Counters
		Launched map tasks=31
		Launched reduce tasks=1
		Data-local map tasks=31
		Total time spent by all maps in occupied slots (ms)=400881
		Total time spent by all reduces in occupied slots (ms)=52340
		Total time spent by all map tasks (ms)=400881
		Total time spent by all reduce tasks (ms)=52340
		Total vcore-seconds taken by all map tasks=400881
		Total vcore-seconds taken by all reduce tasks=52340
		Total megabyte-seconds taken by all map tasks=410502144
		Total megabyte-seconds taken by all reduce tasks=53596160
	Map-Reduce Framework
		Map input records=2060
		Map output records=24
		Map output bytes=590
		Map output materialized bytes=525
		Input split bytes=3812
		Combine input records=24
		Combine output records=13
		Reduce input groups=11
		Reduce shuffle bytes=525
		Reduce input records=13
		Reduce output records=11
		Spilled Records=26
		Shuffled Maps =31
		Failed Shuffles=0
		Merged Map outputs=31
		GC time elapsed (ms)=2299
		CPU time spent (ms)=11090
		Physical memory (bytes) snapshot=8178929664
		Virtual memory (bytes) snapshot=21830377472
		Total committed heap usage (bytes)=6461849600
	Shuffle Errors
		BAD_ID=0
		CONNECTION=0
		IO_ERROR=0
		WRONG_LENGTH=0
		WRONG_MAP=0
		WRONG_REDUCE=0
	File Input Format Counters
		Bytes Read=76717
	File Output Format Counters
		Bytes Written=437
18/06/11 07:37:12 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
18/06/11 07:37:12 INFO input.FileInputFormat: Total input paths to process : 1
18/06/11 07:37:12 INFO mapreduce.JobSubmitter: number of splits:1
18/06/11 07:37:12 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1528635021541_0008
18/06/11 07:37:12 INFO impl.YarnClientImpl: Submitted application application_1528635021541_0008
18/06/11 07:37:12 INFO mapreduce.Job: The url to track the job: http://e1bed6899d06:8088/proxy/application_1528635021541_0008/
18/06/11 07:37:12 INFO mapreduce.Job: Running job: job_1528635021541_0008
18/06/11 07:37:24 INFO mapreduce.Job: Job job_1528635021541_0008 running in uber mode : false
18/06/11 07:37:24 INFO mapreduce.Job:  map 0% reduce 0%
18/06/11 07:37:29 INFO mapreduce.Job:  map 100% reduce 0%
18/06/11 07:37:35 INFO mapreduce.Job:  map 100% reduce 100%
18/06/11 07:37:35 INFO mapreduce.Job: Job job_1528635021541_0008 completed successfully
18/06/11 07:37:35 INFO mapreduce.Job: Counters: 49
	File System Counters
		FILE: Number of bytes read=291
		FILE: Number of bytes written=230541
		FILE: Number of read operations=0
		FILE: Number of large read operations=0
		FILE: Number of write operations=0
		HDFS: Number of bytes read=569
		HDFS: Number of bytes written=197
		HDFS: Number of read operations=7
		HDFS: Number of large read operations=0
		HDFS: Number of write operations=2
	Job Counters
		Launched map tasks=1
		Launched reduce tasks=1
		Data-local map tasks=1
		Total time spent by all maps in occupied slots (ms)=3210
		Total time spent by all reduces in occupied slots (ms)=3248
		Total time spent by all map tasks (ms)=3210
		Total time spent by all reduce tasks (ms)=3248
		Total vcore-seconds taken by all map tasks=3210
		Total vcore-seconds taken by all reduce tasks=3248
		Total megabyte-seconds taken by all map tasks=3287040
		Total megabyte-seconds taken by all reduce tasks=3325952
	Map-Reduce Framework
		Map input records=11
		Map output records=11
		Map output bytes=263
		Map output materialized bytes=291
		Input split bytes=132
		Combine input records=0
		Combine output records=0
		Reduce input groups=5
		Reduce shuffle bytes=291
		Reduce input records=11
		Reduce output records=11
		Spilled Records=22
		Shuffled Maps =1
		Failed Shuffles=0
		Merged Map outputs=1
		GC time elapsed (ms)=55
		CPU time spent (ms)=1090
		Physical memory (bytes) snapshot=415494144
		Virtual memory (bytes) snapshot=1373601792
		Total committed heap usage (bytes)=354942976
	Shuffle Errors
		BAD_ID=0
		CONNECTION=0
		IO_ERROR=0
		WRONG_LENGTH=0
		WRONG_MAP=0
		WRONG_REDUCE=0
	File Input Format Counters
		Bytes Read=437
	File Output Format Counters
		Bytes Written=197
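For intuition, the `grep` example job does in a distributed way what a plain shell pipeline does locally: the first job extracts every string matching the regex `dfs[a-z.]+` from the input files and counts the distinct matches, and the second sorts the results. A rough local sketch, using made-up sample lines rather than the actual input files:

```shell
# Fake a few config-style lines like those in Hadoop's etc/hadoop/*.xml input
printf '<name>dfs.replication</name>\n<name>dfs.replication</name>\n<name>dfs.permissions</name>\n' > sample.txt

# Extract every match of dfs[a-z.]+ and count distinct values --
# roughly what the map (grep), combine, and reduce (count) phases do
grep -oE 'dfs[a-z.]+' sample.txt | sort | uniq -c | sort -rn
```

The two job submissions in the log above (`_0007` and `_0008`) correspond to the count step and the sort step of this pipeline.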
