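Hadoop Ecosystem - Oozie Deployment in Practice
Author: Yin Zhengjie (尹正杰)
Copyright notice: This is an original work. Reproduction is not permitted; violators will be held legally liable.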
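I. Introduction to Oozie
1>. What is Oozie
The name Oozie means "mahout" (an elephant handler). Oozie is an open-source framework built around a workflow engine; it was originally developed at Yahoo! and later contributed to Apache. It schedules and coordinates Hadoop MapReduce and Pig jobs. Oozie runs as a web application and must be deployed in a Java Servlet container. It is mainly used for time-based job scheduling, and multiple jobs can be scheduled in the logical order in which they should execute.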
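2>. Oozie's functional modules
1>. Workflow: executes flow nodes in sequence; supports fork (branch into several nodes) and join (merge several branches back into one).
2>. Coordinator: triggers a workflow on a schedule, much like a timer.
3>. Bundle Job: binds multiple Coordinators together; it is a container for the bound jobs.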
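To make the Coordinator's "timer" role more concrete, here is a minimal sketch that writes a coordinator definition triggering one workflow per day. The app name `demo-coord`, the start and end dates, the staging directory, and the `workflowAppUri` parameter are all illustrative assumptions, not values from this deployment.

```shell
#!/usr/bin/env bash
# Sketch: generate a minimal coordinator.xml that fires a workflow once a day.
# Names, dates and paths below are placeholders chosen for illustration.
set -eu

coord_dir="${COORD_DIR:-$(mktemp -d)}"   # hypothetical local staging directory
coord_file="$coord_dir/coordinator.xml"

# The quoted 'EOF' keeps ${...} literal, so Oozie (not the shell) resolves the EL expressions.
cat > "$coord_file" <<'EOF'
<coordinator-app name="demo-coord" frequency="${coord:days(1)}"
                 start="2018-01-01T00:00Z" end="2018-12-31T00:00Z"
                 timezone="UTC" xmlns="uri:oozie:coordinator:0.2">
  <action>
    <workflow>
      <!-- HDFS directory containing the workflow.xml to trigger -->
      <app-path>${workflowAppUri}</app-path>
    </workflow>
  </action>
</coordinator-app>
EOF

echo "wrote $coord_file"
```

A Bundle would then reference one or more such coordinator applications; that layer is omitted here to keep the sketch short.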
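3>. Common Oozie nodes
1>. Control flow nodes (Control Flow Nodes): these mark where a workflow begins and ends (start, end, kill) and provide the mechanisms that steer its execution path (decision, fork, join).
2>. Action nodes (Action Nodes): these carry out the actual work, for example copying files or running a shell script.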
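The node types above can be sketched in a small workflow definition. The example below is only an illustration under assumed names: `demo-wf`, `split`, `merge`, `step-a`, `step-b`, the `/tmp/demo` paths, and the `nameNode` parameter are hypothetical. It uses two `fs` actions so that the fork, join, kill, and end control nodes all appear.

```shell
#!/usr/bin/env bash
# Sketch: write a workflow.xml that exercises start, fork, join, kill and end
# around two simple fs actions. All node names and paths are placeholders.
set -eu

wf_dir="${WF_DIR:-$(mktemp -d)}"   # hypothetical local staging directory
wf_file="$wf_dir/workflow.xml"

# Quoted 'EOF' so ${nameNode} and the wf: EL function reach Oozie unexpanded.
cat > "$wf_file" <<'EOF'
<workflow-app xmlns="uri:oozie:workflow:0.4" name="demo-wf">
  <start to="split"/>
  <!-- fork: run both actions in parallel -->
  <fork name="split">
    <path start="step-a"/>
    <path start="step-b"/>
  </fork>
  <action name="step-a">
    <fs><mkdir path="${nameNode}/tmp/demo/a"/></fs>
    <ok to="merge"/>
    <error to="fail"/>
  </action>
  <action name="step-b">
    <fs><mkdir path="${nameNode}/tmp/demo/b"/></fs>
    <ok to="merge"/>
    <error to="fail"/>
  </action>
  <!-- join: wait for both branches before continuing -->
  <join name="merge" to="done"/>
  <kill name="fail">
    <message>Workflow failed: [${wf:errorMessage(wf:lastErrorNode())}]</message>
  </kill>
  <end name="done"/>
</workflow-app>
EOF

echo "wrote $wf_file"
```

Every path opened by the fork has to reach the same join, which is why both actions transition to `merge` on success.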
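II. Deploying a Hadoop test environment
1>. Download the Hadoop release
I have uploaded the versions used for this test to Baidu Cloud. Link: https://pan.baidu.com/s/1w5G5ReKdJgDJe6931bA8Lw Password: nal3
2>. Extract the CDH build of Hadoop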
[yinzhengjie@s101 cdh]$ pwd
/home/yinzhengjie/download/cdh
[yinzhengjie@s101 cdh]$
[yinzhengjie@s101 cdh]$ ll
total 1298112
-rw-r--r-- 1 yinzhengjie yinzhengjie 3759787 Sep 26 2016 cdh5.3.6-snappy-lib-natirve.tar.gz
-rw-r--r-- 1 yinzhengjie yinzhengjie 293471952 Sep 19 2016 hadoop-2.5.0-cdh5.3.6.tar.gz
-rw-r--r-- 1 yinzhengjie yinzhengjie 1032028646 Sep 19 2016 oozie-4.0.0-cdh5.3.6.tar.gz
[yinzhengjie@s101 cdh]$
[yinzhengjie@s101 cdh]$ ll
total 1298116
-rw-r--r-- 1 yinzhengjie yinzhengjie 3759787 Sep 26 2016 cdh5.3.6-snappy-lib-natirve.tar.gz
drwxr-xr-x 14 yinzhengjie yinzhengjie 4096 Jul 28 2015 hadoop-2.5.0-cdh5.3.6
-rw-r--r-- 1 yinzhengjie yinzhengjie 293471952 Sep 19 2016 hadoop-2.5.0-cdh5.3.6.tar.gz
-rw-r--r-- 1 yinzhengjie yinzhengjie 1032028646 Sep 19 2016 oozie-4.0.0-cdh5.3.6.tar.gz
[yinzhengjie@s101 cdh]$
[yinzhengjie@s101 cdh]$ cd hadoop-2.5.0-cdh5.3.6/lib/native/
[yinzhengjie@s101 native]$
[yinzhengjie@s101 native]$ ll
total 0
[yinzhengjie@s101 native]$
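The listing above shows the directory before and after extraction but not the extraction command itself. A minimal sketch of that step follows, assuming the same `tar -zxf ... -C` form used for the Snappy archive; the `extract_archive` helper and the `CDH_DIR` variable are hypothetical names introduced here.

```shell
#!/usr/bin/env bash
# Sketch: unpack a .tar.gz into a target directory, refusing to run on a missing archive.
set -eu

extract_archive() {
  local archive="$1" dest="$2"
  if [ ! -f "$archive" ]; then
    echo "archive not found: $archive" >&2
    return 1
  fi
  mkdir -p "$dest"
  # -z gunzip, -x extract, -f archive file, -C switch to the destination first
  tar -zxf "$archive" -C "$dest"
}

# Example call using the paths from the listing; skipped when the archive is absent.
cdh_dir="${CDH_DIR:-/home/yinzhengjie/download/cdh}"
if [ -f "$cdh_dir/hadoop-2.5.0-cdh5.3.6.tar.gz" ]; then
  extract_archive "$cdh_dir/hadoop-2.5.0-cdh5.3.6.tar.gz" "$cdh_dir"
fi
```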
3>. Extract the Snappy native libraries
[yinzhengjie@s101 native]$ tar -zxf /home/yinzhengjie/download/cdh/cdh5.3.6-snappy-lib-natirve.tar.gz -C ./
[yinzhengjie@s101 native]$
[yinzhengjie@s101 native]$ ll
total 0
drwxrwxr-x 3 yinzhengjie yinzhengjie 19 Sep 13 2015 lib
[yinzhengjie@s101 native]$ mv lib/native/* ./
[yinzhengjie@s101 native]$
[yinzhengjie@s101 native]$ ll
total 15472
drwxrwxr-x 3 yinzhengjie yinzhengjie 19 Sep 13 2015 lib
-rw-rw-r-- 1 yinzhengjie yinzhengjie 1279980 Sep 13 2015 libhadoop.a
-rw-rw-r-- 1 yinzhengjie yinzhengjie 1487052 Sep 13 2015 libhadooppipes.a
lrwxrwxrwx 1 yinzhengjie yinzhengjie 18 Sep 13 2015 libhadoop.so -> libhadoop.so.1.0.0
-rwxrwxr-x 1 yinzhengjie yinzhengjie 747310 Sep 13 2015 libhadoop.so.1.0.0
-rw-rw-r-- 1 yinzhengjie yinzhengjie 582056 Sep 13 2015 libhadooputils.a
-rw-rw-r-- 1 yinzhengjie yinzhengjie 359770 Sep 13 2015 libhdfs.a
lrwxrwxrwx 1 yinzhengjie yinzhengjie 16 Sep 13 2015 libhdfs.so -> libhdfs.so.0.0.0
-rwxrwxr-x 1 yinzhengjie yinzhengjie 228715 Sep 13 2015 libhdfs.so.0.0.0
-rw-rw-r-- 1 yinzhengjie yinzhengjie 7684148 Sep 13 2015 libnativetask.a
lrwxrwxrwx 1 yinzhengjie yinzhengjie 22 Sep 13 2015 libnativetask.so -> libnativetask.so.1.0.0
-rwxrwxr-x 1 yinzhengjie yinzhengjie 3060775 Sep 13 2015 libnativetask.so.1.0.0
-rw-r--r-- 1 yinzhengjie yinzhengjie 233506 Sep 13 2015 libsnappy.a
-rwxr-xr-x 1 yinzhengjie yinzhengjie 961 Sep 13 2015 libsnappy.la
lrwxrwxrwx 1 yinzhengjie yinzhengjie 18 Sep 13 2015 libsnappy.so -> libsnappy.so.1.2.0
lrwxrwxrwx 1 yinzhengjie yinzhengjie 18 Sep 13 2015 libsnappy.so.1 -> libsnappy.so.1.2.0
-rwxr-xr-x 1 yinzhengjie yinzhengjie 147718 Sep 13 2015 libsnappy.so.1.2.0
[yinzhengjie@s101 native]$ rm -rf lib
[yinzhengjie@s101 native]$
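After moving the files, it can be worth confirming that the key shared libraries sit directly in `lib/native`. The sketch below is one way to do that; the `check_native_libs` helper and the `NATIVE_DIR` variable are hypothetical, and checking only `libhadoop.so` and `libsnappy.so` is an assumption about which libraries matter most here.

```shell
#!/usr/bin/env bash
# Sketch: report whether the expected native libraries exist in a directory.
set -eu

check_native_libs() {
  local dir="$1" missing=0 lib
  for lib in libhadoop.so libsnappy.so; do
    # -e follows symlinks, so a dangling link counts as missing
    if [ -e "$dir/$lib" ]; then
      echo "ok       $lib"
    else
      echo "missing  $lib"
      missing=1
    fi
  done
  return "$missing"
}

# Example call using the directory from the transcript; skipped if it does not exist.
native_dir="${NATIVE_DIR:-/home/yinzhengjie/download/cdh/hadoop-2.5.0-cdh5.3.6/lib/native}"
if [ -d "$native_dir" ]; then
  check_native_libs "$native_dir" || echo "some native libraries are missing in $native_dir" >&2
fi
```

Once the cluster is configured, running `hadoop checknative -a` from the Hadoop `bin` directory should give a more authoritative report of which native libraries Hadoop actually loads.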
4>. Edit the "mapred-site.xml" configuration file
[yinzhengjie@s101 hadoop-2.5.0-cdh5.3.6]$ pwd
/home/yinzhengjie/download/cdh/hadoop-2.5.0-cdh5.3.6
[yinzhengjie@s101 hadoop-2.5.0-cdh5.3.6]$ ll
total 20
drwxr-xr-x 2 yinzhengjie yinzhengjie 128 Jul 28 2015 bin
drwxr-xr-x 2 yinzhengjie yinzhengjie 4096 Jul 28 2015 bin-mapreduce1
drwxr-xr-x 3 yinzhengjie yinzhengjie 4096 Jul 28 2015 cloudera
drwxr-xr-x 6 yinzhengjie yinzhengjie 105 Jul 28 2015 etc
drwxr-xr-x 5 yinzhengjie yinzhengjie 40 Jul 28 2015 examples
drwxr-xr-x 3 yinzhengjie yinzhengjie 27 Jul 28 2015 examples-mapreduce1
drwxr-xr-x 2 yinzhengjie yinzhengjie 101 Jul 28 2015 include
drwxr-xr-x 3 yinzhengjie yinzhengjie 19 Jul 28 2015 lib
drwxr-xr-x 2 yinzhengjie yinzhengjie 4096 Jul 28 2015 libexec
drwxr-xr-x 3 yinzhengjie yinzhengjie 4096 Jul 28 2015 sbin
drwxr-xr-x 4 yinzhengjie yinzhengjie 29 Jul 28 2015 share
drwxr-xr-x 17 yinzhengjie yinzhengjie 4096 Jul 28 2015 src
[yinzhengjie@s101 hadoop-2.5.0-cdh5.3.6]$ cd etc/hadoop
[yinzhengjie@s101 hadoop]$
[yinzhengjie@s101 hadoop]$ rm -rf *.cmd
[yinzhengjie@s101 hadoop]$
[yinzhengjie@s101 hadoop]$ mv mapred-site.xml.template mapred-site.xml
[yinzhengjie@s101 hadoop]$
[yinzhengjie@s101 hadoop]$ vi mapred-site.xml
[yinzhengjie@s101 hadoop]$
[yinzhengjie@s101 hadoop]$ more mapred-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
  Licensed under the Apache License, Version 2.0 (the "License");
  you may not use this file except in compliance with the License.
  You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

  Unless required by applicable law or agreed to in writing, software
  distributed under the License is distributed on an "AS IS" BASIS,
  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  See the License for the specific language governing permissions and
  limitations under the License. See accompanying LICENSE file.
-->

<!-- Put site-specific property overrides in this file. -->

<configuration>
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>

    <!-- MapReduce JobHistory Server address; default port 10020 -->
    <property>
        <name>mapreduce.jobhistory.address</name>
        <value>s101:10020</value>
    </property>

    <!-- MapReduce JobHistory Server web UI address; default port 19888 -->
    <property>
        <name>mapreduce.jobhistory.webapp.address</name>
        <value>s101:19888</value>
    </property>
</configuration>
[yinzhengjie@s101 hadoop]$
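A quick way to double-check the edited file is to read the three properties back out of it. The sketch below uses a small awk-based helper; `get_property` and the `HADOOP_CONF_FILE` variable are hypothetical names, and the parsing assumes each `<name>` appears before its `<value>`, as in the file shown above. It is a line-oriented shortcut, not a real XML parser.

```shell
#!/usr/bin/env bash
# Sketch: read a property value from a Hadoop-style *-site.xml file.
set -eu

get_property() {
  # $1 = config file, $2 = property name
  awk -v key="$2" '
    /<name>/ {
      # remember the most recent property name seen
      name = $0
      sub(/.*<name>/, "", name); sub(/<\/name>.*/, "", name)
    }
    /<value>/ && name == key {
      # first value following the matching name wins
      value = $0
      sub(/.*<value>/, "", value); sub(/<\/value>.*/, "", value)
      print value
      exit
    }
  ' "$1"
}

# Example call against the file edited above; skipped when it is not present.
conf_file="${HADOOP_CONF_FILE:-etc/hadoop/mapred-site.xml}"
if [ -f "$conf_file" ]; then
  for key in mapreduce.framework.name mapreduce.jobhistory.address mapreduce.jobhistory.webapp.address; do
    echo "$key = $(get_property "$conf_file" "$key")"
  done
fi
```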