Overview: this article walks through setting up a YARN environment and then runs a word count over Samuel Ullman's "Youth".
1. Edit mapred-site.xml
cd /root/app/hadoop-2.6.0-cdh5.7.0/etc/hadoop
cp mapred-site.xml.template mapred-site.xml
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
</configuration>
2. Edit yarn-site.xml
<configuration>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
</configuration>
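After editing both files, it can be useful to confirm that the values actually landed where expected. The following is a minimal sketch, not part of Hadoop: `get_property` is a hypothetical helper, and it assumes each `<name>` and `<value>` tag sits on its own line, as in the fragments above.

```shell
# Hypothetical helper (not a Hadoop command): print the <value> that follows
# a given <name> in a Hadoop *-site.xml file.
# Assumes one tag per line, as in the fragments shown in steps 1 and 2.
get_property() {
    # $1: path to the config file, $2: property name
    awk -v name="$2" '
        index($0, "<name>" name "</name>") { found = 1; next }
        found && match($0, /<value>.*<\/value>/) {
            # Strip the 7-character <value> and 8-character </value> tags
            print substr($0, RSTART + 7, RLENGTH - 15)
            exit
        }
    ' "$1"
}
```

Run from the etc/hadoop directory, `get_property mapred-site.xml mapreduce.framework.name` should print `yarn`, and `get_property yarn-site.xml yarn.nodemanager.aux-services` should print `mapreduce_shuffle`.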
3. Start YARN
cd /root/app/hadoop-2.6.0-cdh5.7.0/sbin
./start-yarn.sh
# Check the processes: ResourceManager and NodeManager should both be listed
jps
# Verify by opening the web UI in a local browser
http://hadoop:8088/cluster
# Stop YARN
./stop-yarn.sh
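The process check in this step can also be scripted rather than read by eye. This is a minimal sketch: `check_yarn_daemons` is a hypothetical helper name, and it assumes `jps` prints one `<pid> <name>` pair per line.

```shell
# Hypothetical helper: reads `jps`-style output ("<pid> <name>" per line)
# on stdin and reports whether both YARN daemons appear in it.
check_yarn_daemons() {
    jps_output=$(cat)
    status=0
    for daemon in ResourceManager NodeManager; do
        # awk exits 0 only if some line has the daemon name as its second field
        if printf '%s\n' "$jps_output" |
            awk -v d="$daemon" '$2 == d { found = 1 } END { exit !found }'; then
            echo "$daemon: running"
        else
            echo "$daemon: NOT running"
            status=1
        fi
    done
    return "$status"
}
```

After `./start-yarn.sh`, running `jps | check_yarn_daemons` returns a non-zero status if either daemon is missing.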
4. Submit a job to YARN (word count)
# Upload the input file to HDFS
hadoop fs -mkdir -p /input/wc
hadoop fs -put Youth.txt /input/wc
# Run from the bin directory
hadoop jar /root/app/hadoop-2.6.0-cdh5.7.0/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.6.0-cdh5.7.0.jar wordcount /input/wc/Youth.txt /output/wc
# Re-running the job after it has completed fails because the output path already exists; delete the output path first
hadoop fs -rm -r /output
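To sanity-check the job's output, the same counts can be approximated locally. The sketch below assumes the bundled wordcount example splits lines on whitespace and writes `word<TAB>count` pairs sorted by key; `local_wordcount` is a hypothetical helper, not a Hadoop command, and punctuation stays attached to words just as it does with whitespace splitting.

```shell
# Hypothetical helper: a local approximation of the wordcount example.
# Splits on whitespace, counts each distinct token, and prints
# "token<TAB>count" in byte order.
local_wordcount() {
    # $1: path to a local text file, e.g. Youth.txt
    tr -s '[:space:]' '\n' < "$1" \
        | grep -v '^$' \
        | LC_ALL=C sort \
        | uniq -c \
        | awk '{ print $2 "\t" $1 }'
}
```

Running `local_wordcount Youth.txt` and comparing it with `hadoop fs -cat /output/wc/part-r-00000` (before the output path is deleted) gives a quick cross-check; the `part-r-00000` file name assumes a single reducer.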
Partial results: