Motivation
I need Hadoop for work, so I'm learning it, and this post records how I quickly set up a Hadoop environment and ran a demo of my own.
Setting up the environment
I've seen plenty of Hadoop setup guides online, but most of them look fairly involved: install Java, install Hadoop, then tweak all sorts of settings whose parameters I don't understand yet. My goal is much simpler: get a working environment with the least possible effort. The variables and parameters don't matter much to me at this stage; I just want to run one simple demo of my own. In practice I won't be the one maintaining the environment anyway, so simple is all I need.
I've also been reading up on Docker lately, so I decided to build the environment with Docker and learn Hadoop and Docker at the same time.
First, install Docker. That part is straightforward (the official site has a one-line install script), so I won't cover it here.
There is a ready-made Hadoop example image on Docker Hub:
https://hub.docker.com/r/sequenceiq/hadoop-docker/
I modified the suggested command slightly:
I mount an extra directory, because I want to copy my own demo jar into the container and run it with Hadoop.
I also name the container hadoop2: I run a lot of containers, so names make them easier to tell apart, and later I may want several Hadoop containers to build a cluster.
docker run -it -v /dockerVolumes/hadoop2:/dockerVolume --name hadoop2 sequenceiq/hadoop-docker:2.7.0 /etc/bootstrap.sh -bash
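As for the demo jar itself: a real Hadoop job is written in Java against the MapReduce API, but a typical first demo is word count, and its map/reduce logic can be sketched in plain Python. This sketch is my own illustration of what such a demo computes, not code shipped in the image or in any particular jar:

```python
from collections import defaultdict

def map_phase(lines):
    # Mapper: emit a (word, 1) pair for every word in every input line.
    for line in lines:
        for word in line.split():
            yield word, 1

def reduce_phase(pairs):
    # Reducer: pairs are grouped by key (the word); sum the 1s per word.
    totals = defaultdict(int)
    for word, one in pairs:
        totals[word] += one
    return dict(totals)

lines = ["hello hadoop", "hello docker"]
print(reduce_phase(map_phase(lines)))  # {'hello': 2, 'hadoop': 1, 'docker': 1}
```

In a real job, Hadoop splits the input across mappers, shuffles the emitted pairs so each reducer sees one key group, and runs the phases in parallel; the per-key arithmetic is the same.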
Once that command has run, the container is up, and we can try the example that ships with the image:
cd $HADOOP_PREFIX
# run the mapreduce
bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.0.jar grep input output 'dfs[a-z.]+'
# check the output
bin/hdfs dfs -cat output/*
Output:
bash-4.1# bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.0.jar grep input output 'dfs[a-z.]+'
18/06/11 07:35:38 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
18/06/11 07:35:39 INFO input.FileInputFormat: Total input paths to process : 31
18/06/11 07:35:39 INFO mapreduce.JobSubmitter: number of splits:31
18/06/11 07:35:40 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1528635021541_0007
18/06/11 07:35:40 INFO impl.YarnClientImpl: Submitted application application_1528635021541_0007
18/06/11 07:35:40 INFO mapreduce.Job: The url to track the job: http://e1bed6899d06:8088/proxy/application_1528635021541_0007/
18/06/11 07:35:40 INFO mapreduce.Job: Running job: job_1528635021541_0007
18/06/11 07:35:45 INFO mapreduce.Job: Job job_1528635021541_0007 running in uber mode : false
18/06/11 07:35:45 INFO mapreduce.Job: map 0% reduce 0%
18/06/11 07:36:02 INFO mapreduce.Job: map 10% reduce 0%
18/06/11 07:36:03 INFO mapreduce.Job: map 19% reduce 0%
18/06/11 07:36:19 INFO mapreduce.Job: map 35% reduce 0%
18/06/11 07:36:20 INFO mapreduce.Job: map 39% reduce 0%
18/06/11 07:36:33 INFO mapreduce.Job: map 42% reduce 0%
18/06/11 07:36:35 INFO mapreduce.Job: map 55% reduce 0%
18/06/11 07:36:36 INFO mapreduce.Job: map 55% reduce 15%
18/06/11 07:36:39 INFO mapreduce.Job: map 55% reduce 18%
18/06/11 07:36:45 INFO mapreduce.Job: map 58% reduce 18%
18/06/11 07:36:46 INFO mapreduce.Job: map 61% reduce 18%
18/06/11 07:36:47 INFO mapreduce.Job: map 65% reduce 18%
18/06/11 07:36:48 INFO mapreduce.Job: map 65% reduce 22%
18/06/11 07:36:49 INFO mapreduce.Job: map 71% reduce 22%
18/06/11 07:36:51 INFO mapreduce.Job: map 71% reduce 24%
18/06/11 07:36:57 INFO mapreduce.Job: map 74% reduce 24%
18/06/11 07:36:59 INFO mapreduce.Job: map 77% reduce 24%
18/06/11 07:37:00 INFO mapreduce.Job: map 77% reduce 26%
18/06/11 07:37:01 INFO mapreduce.Job: map 84% reduce 26%
18/06/11 07:37:03 INFO mapreduce.Job: map 87% reduce 28%
18/06/11 07:37:06 INFO mapreduce.Job: map 87% reduce 29%
18/06/11 07:37:08 INFO mapreduce.Job: map 90% reduce 29%
18/06/11 07:37:09 INFO mapreduce.Job: map 94% reduce 29%
18/06/11 07:37:11 INFO mapreduce.Job: map 100% reduce 29%
18/06/11 07:37:12 INFO mapreduce.Job: map 100% reduce 100%
18/06/11 07:37:12 INFO mapreduce.Job: Job job_1528635021541_0007 completed successfully
18/06/11 07:37:12 INFO mapreduce.Job: Counters: 49
        File System Counters
                FILE: Number of bytes read=345
                FILE: Number of bytes written=3697476
                FILE: Number of read operations=0
                FILE: Number of large read operations=0
                FILE: Number of write operations=0
                HDFS: Number of bytes read=80529
                HDFS: Number of bytes written=437
                HDFS: Number of read operations=96
                HDFS: Number of large read operations=0
                HDFS: Number of write operations=2
        Job Counters
                Launched map tasks=31
                Launched reduce tasks=1
                Data-local map tasks=31
                Total time spent by all maps in occupied slots (ms)=400881
                Total time spent by all reduces in occupied slots (ms)=52340
                Total time spent by all map tasks (ms)=400881
                Total time spent by all reduce tasks (ms)=52340
                Total vcore-seconds taken by all map tasks=400881
                Total vcore-seconds taken by all reduce tasks=52340
                Total megabyte-seconds taken by all map tasks=410502144
                Total megabyte-seconds taken by all reduce tasks=53596160
        Map-Reduce Framework
                Map input records=2060
                Map output records=24
                Map output bytes=590
                Map output materialized bytes=525
                Input split bytes=3812
                Combine input records=24
                Combine output records=13
                Reduce input groups=11
                Reduce shuffle bytes=525
                Reduce input records=13
                Reduce output records=11
                Spilled Records=26
                Shuffled Maps =31
                Failed Shuffles=0
                Merged Map outputs=31
                GC time elapsed (ms)=2299
                CPU time spent (ms)=11090
                Physical memory (bytes) snapshot=8178929664
                Virtual memory (bytes) snapshot=21830377472
                Total committed heap usage (bytes)=6461849600
        Shuffle Errors
                BAD_ID=0
                CONNECTION=0
                IO_ERROR=0
                WRONG_LENGTH=0
                WRONG_MAP=0
                WRONG_REDUCE=0
        File Input Format Counters
                Bytes Read=76717
        File Output Format Counters
                Bytes Written=437
18/06/11 07:37:12 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
18/06/11 07:37:12 INFO input.FileInputFormat: Total input paths to process : 1
18/06/11 07:37:12 INFO mapreduce.JobSubmitter: number of splits:1
18/06/11 07:37:12 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1528635021541_0008
18/06/11 07:37:12 INFO impl.YarnClientImpl: Submitted application application_1528635021541_0008
18/06/11 07:37:12 INFO mapreduce.Job: The url to track the job: http://e1bed6899d06:8088/proxy/application_1528635021541_0008/
18/06/11 07:37:12 INFO mapreduce.Job: Running job: job_1528635021541_0008
18/06/11 07:37:24 INFO mapreduce.Job: Job job_1528635021541_0008 running in uber mode : false
18/06/11 07:37:24 INFO mapreduce.Job: map 0% reduce 0%
18/06/11 07:37:29 INFO mapreduce.Job: map 100% reduce 0%
18/06/11 07:37:35 INFO mapreduce.Job: map 100% reduce 100%
18/06/11 07:37:35 INFO mapreduce.Job: Job job_1528635021541_0008 completed successfully
18/06/11 07:37:35 INFO mapreduce.Job: Counters: 49
        File System Counters
                FILE: Number of bytes read=291
                FILE: Number of bytes written=230541
                FILE: Number of read operations=0
                FILE: Number of large read operations=0
                FILE: Number of write operations=0
                HDFS: Number of bytes read=569
                HDFS: Number of bytes written=197
                HDFS: Number of read operations=7
                HDFS: Number of large read operations=0
                HDFS: Number of write operations=2
        Job Counters
                Launched map tasks=1
                Launched reduce tasks=1
                Data-local map tasks=1
                Total time spent by all maps in occupied slots (ms)=3210
                Total time spent by all reduces in occupied slots (ms)=3248
                Total time spent by all map tasks (ms)=3210
                Total time spent by all reduce tasks (ms)=3248
                Total vcore-seconds taken by all map tasks=3210
                Total vcore-seconds taken by all reduce tasks=3248
                Total megabyte-seconds taken by all map tasks=3287040
                Total megabyte-seconds taken by all reduce tasks=3325952
        Map-Reduce Framework
                Map input records=11
                Map output records=11
                Map output bytes=263
                Map output materialized bytes=291
                Input split bytes=132
                Combine input records=0
                Combine output records=0
                Reduce input groups=5
                Reduce shuffle bytes=291
                Reduce input records=11
                Reduce output records=11
                Spilled Records=22
                Shuffled Maps =1
                Failed Shuffles=0
                Merged Map outputs=1
                GC time elapsed (ms)=55
                CPU time spent (ms)=1090
                Physical memory (bytes) snapshot=415494144
                Virtual memory (bytes) snapshot=1373601792
                Total committed heap usage (bytes)=354942976
        Shuffle Errors
                BAD_ID=0
                CONNECTION=0
                IO_ERROR=0
                WRONG_LENGTH=0
                WRONG_MAP=0
                WRONG_REDUCE=0
        File Input Format Counters
                Bytes Read=437
        File Output Format Counters
                Bytes Written=197
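For orientation: the log above shows two chained MapReduce jobs, which is how the grep example works. The first job counts every match of the regex 'dfs[a-z.]+' across the input files; the second sorts the matches by count. A rough plain-Python sketch of that logic (my own illustration, not Hadoop's actual implementation):

```python
import re
from collections import Counter

def grep_job(lines, pattern):
    # Job 1: the mappers emit (match, 1) for every regex match in every line;
    # the reducer sums the counts per match. Collapsed here into one loop.
    counts = Counter()
    regex = re.compile(pattern)
    for line in lines:
        for match in regex.findall(line):
            counts[match] += 1
    return counts

def sort_job(counts):
    # Job 2: invert to sort the (match, count) pairs by count, descending.
    return sorted(counts.items(), key=lambda kv: kv[1], reverse=True)

lines = ["dfs.replication=1", "the dfsadmin tool", "dfs.replication again"]
print(sort_job(grep_job(lines, r"dfs[a-z.]+")))
# [('dfs.replication', 2), ('dfsadmin', 1)]
```

That is why the second job in the log has only 1 input path and 11 input records: it reads the small per-match totals written by the first job, not the original 31 input files.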