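Hadoop Deployment: Fully-Distributed Mode
Author: 尹正杰 (yinzhengjie)
Copyright notice: this is an original work. Reproduction is not permitted; violations will be pursued legally.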
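The virtual machines used in this post are linked clones of the pseudo-distributed environment (https://www.cnblogs.com/yinzhengjie/p/9058415.html), so a fully distributed deployment only requires changing a few configuration files. The architecture is one NameNode and three DataNodes. An experienced operations engineer will see at a glance that this cluster has a single point of failure. For a high-availability deployment, see: https://www.cnblogs.com/yinzhengjie/p/9070017.html.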
If you are on a Mac, I recommend Parallels; on Windows, VMware Workstation; on Linux, VirtualBox. My lab runs on Windows with VMware Workstation installed.
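I. Preparing the lab environment
You need four Linux servers, ideally with identical specifications. Because my virtual machines were cloned from the earlier pseudo-distributed deployment, they are all identical, and every virtual machine starts out configured for Hadoop pseudo-distributed mode.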
1>. NameNode server (172.16.30.101)
2>. DataNode server (172.16.30.102)
3>. DataNode server (172.16.30.103)
4>. DataNode server (172.16.30.104)
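The rest of the post refers to these machines by the hostnames s101 to s104, so each node has to be able to resolve those names. The sketch below prints /etc/hosts-style entries for the four servers. It assumes that host sNNN maps to 172.16.30.NNN, which matches the pairs shown in the SSH output later in the post (for example 's101 (172.16.30.101)'); the function name is made up for illustration.

```shell
#!/usr/bin/env bash
# Print /etc/hosts-style entries for the four lab machines.
# Assumption: host sNNN maps to 172.16.30.NNN.
print_hosts_entries() {
    local prefix="172.16.30"
    local n
    for n in 101 102 103 104; do
        printf '%s.%s s%s\n' "$prefix" "$n" "$n"
    done
}

print_hosts_entries
# To apply (as root, on every node): print_hosts_entries >> /etc/hosts
```

If your environment already resolves these names through DNS, this step is unnecessary.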
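II. Modifying the Hadoop configuration files
The configuration files I modify live in the full directory I copied earlier; its absolute path is "/soft/hadoop/etc/full". After editing the files in this directory, we point the hadoop directory at it with a symbolic link. When you need pseudo-distributed or local mode, simply change the directory the symbolic link points to. This lets the configuration files for all three modes coexist peacefully.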
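Switching modes by re-pointing the symbolic link can be sketched as a small shell function. This is a minimal sketch, not the author's script: the function name switch_hadoop_conf is hypothetical, and only the directory name full comes from the post, so any other mode directory name you pass is your own choice.

```shell
#!/usr/bin/env bash
# Point <conf_root>/hadoop at one of the per-mode configuration directories.
# Only "full" comes from the post; any other mode directory name is hypothetical.
switch_hadoop_conf() {
    local conf_root="$1"   # e.g. /soft/hadoop/etc
    local mode="$2"        # e.g. full
    if [ ! -d "$conf_root/$mode" ]; then
        echo "no such configuration directory: $conf_root/$mode" >&2
        return 1
    fi
    # Refuse to clobber a real directory; only a symlink (or nothing) may be replaced.
    if [ -e "$conf_root/hadoop" ] && [ ! -L "$conf_root/hadoop" ]; then
        echo "$conf_root/hadoop exists and is not a symlink" >&2
        return 1
    fi
    # -s symbolic, -f replace an existing link, -n do not follow the old link
    ln -sfn "$mode" "$conf_root/hadoop"
}

# Example: switch_hadoop_conf /soft/hadoop/etc full
```

The link target is written as a relative name, so the whole etc directory can be moved or copied to another node without breaking the link.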
1>. The core-site.xml configuration file
[yinzhengjie@s101 ~]$ more /soft/hadoop/etc/hadoop/core-site.xml
<?xml version="1.0" encoding="UTF-8"?>
<configuration>
        <property>
                <name>fs.defaultFS</name>
                <value>hdfs://s101:8020</value>
        </property>
        <property>
                <name>hadoop.tmp.dir</name>
                <value>/home/yinzhengjie/hadoop</value>
        </property>
</configuration>

<!--
Purpose of core-site.xml:
        Defines system-level parameters such as the HDFS URL, Hadoop's
        temporary directory, and the configuration used by rack-aware
        clusters. Parameters defined here override the defaults in
        core-default.xml.

fs.defaultFS:
        Declares the NameNode address, which amounts to declaring the
        HDFS file system.

hadoop.tmp.dir:
        Declares the location of Hadoop's working directory.
-->
[yinzhengjie@s101 ~]$
2>. The hdfs-site.xml configuration file
[yinzhengjie@s101 ~]$ more /soft/hadoop/etc/hadoop/hdfs-site.xml
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
        <property>
                <name>dfs.replication</name>
                <value>3</value>
        </property>
</configuration>

<!--
Purpose of hdfs-site.xml:
        HDFS settings such as the number of file replicas, the block
        size, and whether permissions are enforced. Parameters defined
        here override the defaults in hdfs-default.xml.

dfs.replication:
        For availability and redundancy, HDFS stores several replicas of
        each data block on different nodes; the default is 3. A
        single-node pseudo-distributed environment only needs one
        replica, which is set through dfs.replication. This is a
        software-level backup.
-->
[yinzhengjie@s101 ~]$
3>. The mapred-site.xml configuration file
[yinzhengjie@s101 ~]$ more /soft/hadoop/etc/hadoop/mapred-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
        <property>
                <name>mapreduce.framework.name</name>
                <value>yarn</value>
        </property>
</configuration>

<!--
Purpose of mapred-site.xml:
        MapReduce settings such as the default number of reduce tasks
        and the default upper and lower memory limits for tasks.
        Parameters defined here override the defaults in
        mapred-default.xml.

mapreduce.framework.name:
        Selects the MapReduce execution framework. There are three
        options: local, classic (the first-generation Hadoop execution
        framework), and yarn (the second-generation execution
        framework). Here we use yarn, the newest framework in the
        current version.
-->
[yinzhengjie@s101 ~]$
4>. The yarn-site.xml configuration file
[yinzhengjie@s101 ~]$ more /soft/hadoop/etc/hadoop/yarn-site.xml
<?xml version="1.0"?>
<configuration>
        <property>
                <name>yarn.resourcemanager.hostname</name>
                <value>s101</value>
        </property>
        <property>
                <name>yarn.nodemanager.aux-services</name>
                <value>mapreduce_shuffle</value>
        </property>
</configuration>

<!--
Purpose of yarn-site.xml:
        Mainly configures scheduler-level parameters.

yarn.resourcemanager.hostname:
        Specifies the hostname of the ResourceManager.

yarn.nodemanager.aux-services:
        Specifies that the NodeManager runs the shuffle service.
-->
[yinzhengjie@s101 ~]$
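5>. The slaves configuration file
[yinzhengjie@s101 ~]$ more /soft/hadoop/etc/hadoop/slaves
# Purpose of this file: it records which DataNode servers the NameNode needs to connect to. These are the target hosts for the remote commands sent when starting or stopping services.
s102
s103
s104
[yinzhengjie@s101 ~]$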
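Since the slaves file is just one hostname per line, it can also be generated rather than edited by hand. The following is a minimal sketch; the function name write_slaves_file is hypothetical, and the target path in the usage comment assumes the full directory described earlier.

```shell
#!/usr/bin/env bash
# Write one worker hostname per line to the given slaves file.
write_slaves_file() {
    local target="$1"; shift
    # Require at least one hostname so we never write an empty entry.
    [ "$#" -ge 1 ] || { echo "no hostnames given" >&2; return 1; }
    printf '%s\n' "$@" > "$target"
}

# Example: write_slaves_file /soft/hadoop/etc/full/slaves s102 s103 s104
```

Note that this overwrites the target file, so any comment line in an existing slaves file is lost.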
III. Configuring passwordless login from the NameNode to each DataNode
1>. Generate a public/private key pair locally (before generating it, delete the keys left over from the pseudo-distributed deployment)
[yinzhengjie@s101 ~]$ rm -rf ~/.ssh/*
[yinzhengjie@s101 ~]$ ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
Generating public/private rsa key pair.
Your identification has been saved in /home/yinzhengjie/.ssh/id_rsa.
Your public key has been saved in /home/yinzhengjie/.ssh/id_rsa.pub.
The key fingerprint is:
a3:a4:ae:d8:f7:7f:a2:b6:d6:15:74:29:de:fb:14:08 yinzhengjie@s101
The key's randomart image is:
+--[ RSA 2048]----+
| . |
| E o |
| o = . |
| o o . |
| . S . . . |
| o . .. . . |
| . .. . o |
| o .. o o . . |
|. oo.+++.o |
+-----------------+
[yinzhengjie@s101 ~]$
2>. Use ssh-copy-id to distribute the public key to the NameNode server itself (172.16.30.101)
[yinzhengjie@s101 ~]$ ssh-copy-id yinzhengjie@s101
The authenticity of host 's101 (172.16.30.101)' can't be established.
ECDSA key fingerprint is fa:25:bc:03:7e:99:eb:12:1e:bc:a8:c9:ce:39:ba:7b.
Are you sure you want to continue connecting (yes/no)? yes
/usr/bin/ssh-copy-id: INFO: attempting to log in with the new key(s), to filter out any that are already installed
/usr/bin/ssh-copy-id: INFO: 1 key(s) remain to be installed -- if you are prompted now it is to install the new keys
yinzhengjie@s101's password:

Number of key(s) added: 1

Now try logging into the machine, with: "ssh 'yinzhengjie@s101'"
and check to make sure that only the key(s) you wanted were added.

[yinzhengjie@s101 ~]$ ssh s101
Last login: Fri May 25 18:35:40 2018 from 172.16.30.1
[yinzhengjie@s101 ~]$ who
yinzhengjie pts/0 2018-05-25 18:35 (172.16.30.1)
yinzhengjie pts/1 2018-05-25 19:17 (s101)
[yinzhengjie@s101 ~]$ exit
logout
Connection to s101 closed.
[yinzhengjie@s101 ~]$ who
yinzhengjie pts/0 2018-05-25 18:35 (172.16.30.1)
[yinzhengjie@s101 ~]$
3>. Use ssh-copy-id to distribute the public key to the DataNode server (172.16.30.102)
[yinzhengjie@s101 ~]$ ssh-copy-id yinzhengjie@s102
/usr/bin/ssh-copy-id: INFO: attempting to log in with the new key(s), to filter out any that are already installed
/usr/bin/ssh-copy-id: INFO: 1 key(s) remain to be installed -- if you are prompted now it is to install the new keys
yinzhengjie@s102's password:

Number of key(s) added: 1

Now try logging into the machine, with: "ssh 'yinzhengjie@s102'"
and check to make sure that only the key(s) you wanted were added.

[yinzhengjie@s101 ~]$ ssh s102
Last login: Fri May 25 18:35:42 2018 from 172.16.30.1
[yinzhengjie@s102 ~]$ who
yinzhengjie pts/0 2018-05-25 18:35 (172.16.30.1)
yinzhengjie pts/1 2018-05-25 19:19 (s101)
[yinzhengjie@s102 ~]$ exit
logout
Connection to s102 closed.
[yinzhengjie@s101 ~]$ who
yinzhengjie pts/0 2018-05-25 18:35 (172.16.30.1)
[yinzhengjie@s101 ~]$
4>. Use ssh-copy-id to distribute the public key to the DataNode server (172.16.30.103)
[yinzhengjie@s101 ~]$ ssh-copy-id yinzhengjie@s103
The authenticity of host 's103 (172.16.30.103)' can't be established.
ECDSA key fingerprint is fa:25:bc:03:7e:99:eb:12:1e:bc:a8:c9:ce:39:ba:7b.
Are you sure you want to continue connecting (yes/no)? yes
/usr/bin/ssh-copy-id: INFO: attempting to log in with the new key(s), to filter out any that are already installed
/usr/bin/ssh-copy-id: INFO: 1 key(s) remain to be installed -- if you are prompted now it is to install the new keys
yinzhengjie@s103's password:

Number of key(s) added: 1

Now try logging into the machine, with: "ssh 'yinzhengjie@s103'"
and check to make sure that only the key(s) you wanted were added.

[yinzhengjie@s101 ~]$ ssh s103
Last login: Fri May 25 18:35:45 2018 from 172.16.30.1
[yinzhengjie@s103 ~]$ who
yinzhengjie pts/0 2018-05-25 18:35 (172.16.30.1)
yinzhengjie pts/1 2018-05-25 19:19 (s101)
[yinzhengjie@s103 ~]$ exit
logout
Connection to s103 closed.
[yinzhengjie@s101 ~]$ who
yinzhengjie pts/0 2018-05-25 18:35 (172.16.30.1)
[yinzhengjie@s101 ~]$
5>. Use ssh-copy-id to distribute the public key to the DataNode server (172.16.30.104)
[yinzhengjie@s101 ~]$ ssh-copy-id yinzhengjie@s104
The authenticity of host 's104 (172.16.30.104)' can't be established.
ECDSA key fingerprint is fa:25:bc:03:7e:99:eb:12:1e:bc:a8:c9:ce:39:ba:7b.
Are you sure you want to continue connecting (yes/no)? yes
/usr/bin/ssh-copy-id: INFO: attempting to log in with the new key(s), to filter out any that are already installed
/usr/bin/ssh-copy-id: INFO: 1 key(s) remain to be installed -- if you are prompted now it is to install the new keys
yinzhengjie@s104's password:

Number of key(s) added: 1

Now try logging into the machine, with: "ssh 'yinzhengjie@s104'"
and check to make sure that only the key(s) you wanted were added.

[yinzhengjie@s101 ~]$ ssh s104
Last login: Fri May 25 18:35:47 2018 from 172.16.30.1
[yinzhengjie@s104 ~]$ who
yinzhengjie pts/0 2018-05-25 18:35 (172.16.30.1)
yinzhengjie pts/1 2018-05-25 19:20 (s101)
[yinzhengjie@s104 ~]$ exit
logout
Connection to s104 closed.
[yinzhengjie@s101 ~]$ who
yinzhengjie pts/0 2018-05-25 18:35 (172.16.30.1)
[yinzhengjie@s101 ~]$
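The four ssh-copy-id runs above differ only in the hostname, so they can be condensed into a loop. The sketch below is one way to do that, not the author's script: the function name distribute_key and the DRY_RUN switch are hypothetical. With DRY_RUN left at its default of 1 the commands are only printed; set DRY_RUN=0 to actually run ssh-copy-id, which will still prompt for each host's password.

```shell
#!/usr/bin/env bash
# Copy the current user's public key to every host given as an argument.
# DRY_RUN is a hypothetical switch for this sketch: when it is 1 (the default
# here) the commands are only printed, not executed.
distribute_key() {
    local user="$1"; shift
    local host
    for host in "$@"; do
        if [ "${DRY_RUN:-1}" = "1" ]; then
            echo "ssh-copy-id ${user}@${host}"
        else
            ssh-copy-id "${user}@${host}"
        fi
    done
}

distribute_key yinzhengjie s101 s102 s103 s104
```

Printing the commands first makes it easy to confirm the user and host list before any key is installed.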
Note: the steps above configure passwordless login for a regular user. The procedure for root is the same, and it is best to set up passwordless login for root as well, because later in this post I run shell scripts that depend on it.
[yinzhengjie@s101 ~]$ su
Password:
[root@s101 yinzhengjie]# cd
[root@s101 ~]#
[root@s101 ~]# ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
Generating public/private rsa key pair.
Created directory '/root/.ssh'.
Your identification has been saved in /root/.ssh/id_rsa.
Your public key has been saved in /root/.ssh/id_rsa.pub.
The key fingerprint is:
9b:47:9a:ca:d2:f9:a5:57:79:35:40:be:07:3a:ed:40 root@s101
The key's randomart image is:
+--[ RSA 2048]----+
| .. |
| .. |
| E o. |
| . o o..|
| S .+ + o.|
| * * o |
| . .= o. o |
| ..o. +. |
| .o.o. |
+-----------------+
[root@s101 ~]#
[root@s101 ~]# ssh-copy-id root@s101
The authenticity of host 's101 (172.16.30.101)' can't be established.
ECDSA key fingerprint is fa:25:bc:03:7e:99:eb:12:1e:bc:a8:c9:ce:39:ba:7b.
Are you sure you want to continue connecting (yes/no)? yes
/usr/bin/ssh-copy-id: INFO: attempting to log in with the new key(s), to filter out any that are already installed
/usr/bin/ssh-copy-id: INFO: 1 key(s) remain to be installed -- if you are prompted now it is to install the new keys
root@s101's password:

Number of key(s) added: 1

Now try logging into the machine, with: "ssh 'root@s101'"
and check to make sure that only the key(s) you wanted were added.

[root@s101 ~]# ssh s101
Last login: Fri May 25 19:44:37 2018
[root@s101 ~]# who
yinzhengjie pts/0 2018-05-25 18:35 (172.16.30.1)
root pts/1 2018-05-25 19:49 (s101)
[root@s101 ~]# exit
logout
Connection to s101 closed.
[root@s101 ~]# who
yinzhengjie pts/0 2018-05-25 18:35 (172.16.30.1)
[root@s101 ~]#