2019-05-15 | Big Data Learning Path series, part 01
This guide walks through installing single-node Hadoop on macOS.
Installation directory
WZB-MacBook:50_bigdata wangzhibin$ pwd
/Users/wangzhibin/00_dev_suite/50_bigdata
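Prerequisites
JDK
Installing the JDK on a Mac is not covered here; see the article "MAC下安装多版本JDK和切换几种方式" (installing multiple JDK versions on a Mac and ways to switch between them).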
WZB-MacBook:50_bigdata wangzhibin$ java -version
java version "1.7.0_80"
Java(TM) SE Runtime Environment (build 1.7.0_80-b15)
Java HotSpot(TM) 64-Bit Server VM (build 24.80-b11, mixed mode)
WZB-MacBook:50_bigdata wangzhibin$ echo $JAVA_HOME
/Library/Java/JavaVirtualMachines/jdk1.7.0_80.jdk/Contents/Home
Download Hadoop
brew install wget
wget https://mirrors.tuna.tsinghua.edu.cn/apache/hadoop/core/hadoop-2.8.4/hadoop-2.8.4.tar.gz
WZB-MacBook:50_bigdata wangzhibin$ tar -zxvf hadoop-2.8.4.tar.gz
Install and configure Hadoop
Set the JDK path
WZB-MacBook:hadoop-2.8.4 wangzhibin$ vi etc/hadoop/hadoop-env.sh
Change export JAVA_HOME=${JAVA_HOME} to:
export JAVA_HOME=/Library/Java/JavaVirtualMachines/jdk1.7.0_80.jdk/Contents/Home
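The hard-coded path above breaks when the JDK is updated or installed elsewhere. As a hedged alternative, macOS ships /usr/libexec/java_home, which prints the home directory of an installed JDK. The sketch below wraps it in a helper named resolve_java_home (a hypothetical name, not part of Hadoop or macOS) and falls back to the path used in this guide when the tool is unavailable or fails. Depending on the macOS version, java_home may return the default JDK when no 1.7 JDK is installed, so check the result with echo $JAVA_HOME.

```shell
#!/bin/sh
# resolve_java_home FALLBACK: print a JDK home directory.
# On macOS, /usr/libexec/java_home locates an installed JDK; -v 1.7 asks
# for a 1.7 JDK to match the version used in this guide. If the tool is
# missing or exits with an error, print FALLBACK instead.
resolve_java_home() {
  if [ -x /usr/libexec/java_home ] &&
     found="$(/usr/libexec/java_home -v 1.7 2>/dev/null)"; then
    printf '%s\n' "$found"
  else
    printf '%s\n' "$1"
  fi
}

# Possible line for etc/hadoop/hadoop-env.sh or ~/.bash_profile:
JAVA_HOME="$(resolve_java_home /Library/Java/JavaVirtualMachines/jdk1.7.0_80.jdk/Contents/Home)"
export JAVA_HOME
```

If the helper feels like too much, the one-line form export JAVA_HOME="$(/usr/libexec/java_home -v 1.7)" does the same thing without the fallback.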
Verify Hadoop
WZB-MacBook:hadoop-2.8.4 wangzhibin$ bin/hadoop
Usage: hadoop [--config confdir] [COMMAND | CLASSNAME]
CLASSNAME run the class named CLASSNAME
or
where COMMAND is one of:
fs run a generic filesystem user client
version print the version
jar <jar> run a jar file
note: please use "yarn jar" to launch
YARN applications, not this command.
checknative [-a|-h] check native hadoop and compression libraries availability
distcp <srcurl> <desturl> copy file or directories recursively
archive -archiveName NAME -p <parent path> <src>* <dest> create a hadoop archive
classpath prints the class path needed to get the
credential interact with credential providers
Hadoop jar and the required libraries
daemonlog get/set the log level for each daemon
trace view and modify Hadoop tracing settings
Most commands print help when invoked w/o parameters.
Run in standalone mode
$ mkdir input
$ cp etc/hadoop/*.xml input
$ bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.8.4.jar grep input output 'dfs[a-z.]+'
$ cat output/*
1 dfsadmin
Configure core-site.xml
WZB-MacBook:hadoop-2.8.4 wangzhibin$ mkdir -p hdfs/tmp
WZB-MacBook:hadoop-2.8.4 wangzhibin$ vi etc/hadoop/core-site.xml
Add the following configuration:
<configuration>
<property>
<name>hadoop.tmp.dir</name>
<value>/Users/wangzhibin/00_dev_suite/50_bigdata/hadoop-2.8.4/hdfs/tmp</value>
<description>Abase for other temporary directories.</description>
</property>
<property>
<name>fs.defaultFS</name>
<value>hdfs://localhost:9000</value>
</property>
</configuration>
Configure hdfs-site.xml
WZB-MacBook:hadoop-2.8.4 wangzhibin$ vi etc/hadoop/hdfs-site.xml
Add the following configuration:
<configuration>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>/Users/wangzhibin/00_dev_suite/50_bigdata/hadoop-2.8.4/hdfs/tmp/dfs/name</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>/Users/wangzhibin/00_dev_suite/50_bigdata/hadoop-2.8.4/hdfs/tmp/dfs/data</value>
</property>
</configuration>
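A typo in a property name or value only surfaces once the daemons start, so it can help to read the values back first. The sketch below is a minimal check, assuming each <name> and <value> element sits on its own line as in the files above; get_prop is a hypothetical helper, not a Hadoop tool. Once bin/ is on the PATH, hdfs getconf -confKey fs.defaultFS reports the value Hadoop itself resolves.

```shell
#!/bin/sh
# get_prop FILE KEY: print the <value> that follows <name>KEY</name> in a
# Hadoop *-site.xml file. Hypothetical helper; assumes each <name> and
# <value> element sits on its own line, as in the snippets above.
get_prop() {
  awk -v key="$2" '
    index($0, "<name>" key "</name>") { found = 1; next }
    found && match($0, /<value>.*<\/value>/) {
      # Strip the 7-character <value> and 8-character </value> tags.
      print substr($0, RSTART + 7, RLENGTH - 15)
      exit
    }
  ' "$1"
}

# Example (run from the hadoop-2.8.4 directory):
#   get_prop etc/hadoop/core-site.xml fs.defaultFS
#   get_prop etc/hadoop/hdfs-site.xml dfs.replication
```

The helper prints nothing when the key is absent, which makes a missing or misspelled property easy to spot.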
Start and stop Hadoop
Configure .bash_profile
# Set up Hadoop (this path assumes hadoop-2.8.4 was renamed or symlinked to hadoop)
export HADOOP_HOME=/Users/wangzhibin/00_dev_suite/50_bigdata/hadoop
export PATH=$PATH:$HADOOP_HOME/bin
Format HDFS before the first start
WZB-MacBook:hadoop-2.8.4 wangzhibin$ ./bin/hdfs namenode -format
...
19/05/15 22:30:47 INFO common.Storage: Storage directory /Users/wangzhibin/00_dev_suite/50_bigdata/hadoop-2.8.4/hdfs/tmp/dfs/name has been successfully formatted.
...
Start HDFS
./sbin/start-dfs.sh
Stop HDFS
./sbin/stop-dfs.sh
Check HDFS status
- HDFS status: http://localhost:50070/dfshealth.html#tab-overview
- Secondary NameNode status: http://localhost:50090/status.html
- Local copy of the official documentation: [API docs](file:///Users/wangzhibin/00_dev_suite/50_bigdata/hadoop-2.8.4/share/doc/hadoop/index.html)
Verify HDFS
A quick check with hadoop commands:
$ hadoop fs -mkdir /test
WZB-MacBook:hadoop wangzhibin$ hadoop fs -ls /
Found 1 items
drwxr-xr-x - wangzhibin supergroup 0 2019-05-16 11:26 /test
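Pitfalls encountered at startup
1. ssh: connect to host localhost port 22: Connection refused
Starting HDFS at this point may fail with the error below. "Connection refused" means no SSH server is listening on port 22, because Remote Login is disabled on macOS by default. The start scripts also need passwordless SSH login to localhost.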
WZB-MacBook:hadoop-2.8.4 wangzhibin$ ./sbin/start-dfs.sh
19/05/15 22:38:25 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Starting namenodes on [localhost]
localhost: ssh: connect to host localhost port 22: Connection refused
localhost: ssh: connect to host localhost port 22: Connection refused
Starting secondary namenodes [0.0.0.0]
0.0.0.0: ssh: connect to host 0.0.0.0 port 22: Connection refused
19/05/15 22:38:28 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Fix it as follows:
1) Open System Preferences -> Sharing and enable Remote Login.
2) Set up passwordless login:
$ ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
$ cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
$ chmod 0600 ~/.ssh/authorized_keys
$ ssh localhost
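The final ssh localhost is interactive, so it does not tell a script whether the setup really works without prompts. The sketch below is one way to check that, using the standard OpenSSH options BatchMode and ConnectTimeout. check_passwordless_ssh is a hypothetical helper name. Note that it also fails if the host key for localhost has not been accepted yet, which the interactive ssh localhost above takes care of.

```shell
#!/bin/sh
# check_passwordless_ssh HOST: succeed only if key-based login to HOST works
# without any prompt. BatchMode=yes makes ssh fail instead of asking for a
# password or host-key confirmation; ConnectTimeout bounds the wait when
# nothing listens on port 22.
check_passwordless_ssh() {
  ssh -o BatchMode=yes -o ConnectTimeout=5 "$1" true >/dev/null 2>&1
}

if check_passwordless_ssh localhost; then
  echo "passwordless ssh to localhost: OK"
else
  echo "passwordless ssh to localhost: FAILED (Remote Login off, key not authorized, or host key not accepted)"
fi
```

Running this before ./sbin/start-dfs.sh separates SSH problems from Hadoop problems.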
2. Unable to load native-hadoop library for your platform
WZB-MacBook:hadoop-2.8.4 wangzhibin$ ./sbin/start-dfs.sh
19/05/15 22:50:02 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Starting namenodes on [localhost]
localhost: starting namenode, logging to /Users/wangzhibin/00_dev_suite/50_bigdata/hadoop-2.8.4/logs/hadoop-wangzhibin-namenode-WZB-MacBook.local.out
localhost: starting datanode, logging to /Users/wangzhibin/00_dev_suite/50_bigdata/hadoop-2.8.4/logs/hadoop-wangzhibin-datanode-WZB-MacBook.local.out
Starting secondary namenodes [0.0.0.0]
0.0.0.0: starting secondarynamenode, logging to /Users/wangzhibin/00_dev_suite/50_bigdata/hadoop-2.8.4/logs/hadoop-wangzhibin-secondarynamenode-WZB-MacBook.local.out
19/05/15 22:50:17 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
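The warning is harmless: as the message says, Hadoop falls back to its built-in Java classes, so the steps below are optional.
Solution: rebuild Hadoop from source and replace $HADOOP_HOME/lib/native with the compiled hadoop-dist/target/hadoop-2.8.4/lib/native.
- Install the build prerequisites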
$ brew install gcc autoconf automake libtool cmake snappy gzip bzip2 zlib
- Install protobuf.
wget https://github.com/google/protobuf/releases/download/v2.5.0/protobuf-2.5.0.tar.gz
tar zxvf protobuf-2.5.0.tar.gz
cd protobuf-2.5.0
./configure
make
make install
- Rebuild Hadoop
wget http://apache.fayea.com/hadoop/common/hadoop-2.8.4/hadoop-2.8.4-src.tar.gz
tar zxvf hadoop-2.8.4-src.tar.gz
cd hadoop-2.8.4-src
mvn package -Pdist,native -DskipTests -Dtar -e
cp -r /Users/wangzhibin/00_dev_suite/50_bigdata/hadoop-2.8.4-src/hadoop-dist/target/hadoop-2.8.4/lib/native/* $HADOOP_HOME/lib/native/
3. An Ant BuildException has occured: exec returned
WZB-MacBook:hadoop-2.8.4-src wangzhibin$ mvn package -Pdist,native -DskipTests -Dtar -e
...
[ERROR] Failed to execute goal org.apache.maven.plugins:maven-antrun-plugin:1.7:run (make) on project hadoop-pipes: An Ant BuildException has occured: exec returned: 1
[ERROR] around Ant part ...<exec failonerror="true" dir="/Users/wangzhibin/00_dev_suite/50_bigdata/hadoop-2.8.4-src/hadoop-tools/hadoop-pipes/target/native" executable="cmake">... @ 5:152 in /Users/wangzhibin/00_dev_suite/50_bigdata/hadoop-2.8.4-src/hadoop-tools/hadoop-pipes/target/antrun/build-main.xml
[ERROR] -> [Help 1]
org.apache.maven.lifecycle.LifecycleExecutionException: Failed to execute goal org.apache.maven.plugins:maven-antrun-plugin:1.7:run (make) on project hadoop-pipes: An Ant BuildException has occured: exec returned: 1
around Ant part ...<exec failonerror="true" dir="/Users/wangzhibin/00_dev_suite/50_bigdata/hadoop-2.8.4-src/hadoop-tools/hadoop-pipes/target/native" executable="cmake">... @ 5:152 in /Users/wangzhibin/00_dev_suite/50_bigdata/hadoop-2.8.4-src/hadoop-tools/hadoop-pipes/target/antrun/build-main.xml
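Reference: "mac下编译Hadoop 2.8.1报错An Ant BuildException has occured: exec returned: 1,排错过程" (troubleshooting the "An Ant BuildException has occured: exec returned: 1" error when building Hadoop 2.8.1 on a Mac)
Solution: set the environment variables OPENSSL_ROOT_DIR and OPENSSL_INCLUDE_DIR by editing ~/.bash_profile: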
# openssl
export OPENSSL_ROOT_DIR=/usr/local/Cellar/openssl/1.0.2r
export OPENSSL_INCLUDE_DIR=$OPENSSL_ROOT_DIR/include
Configure and start YARN
Configure mapred-site.xml
cd $HADOOP_HOME/etc/hadoop/
cp mapred-site.xml.template mapred-site.xml
vim mapred-site.xml
<configuration>
<!-- Tell the MapReduce framework to run on YARN -->
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
</configuration>
Configure yarn-site.xml
vim yarn-site.xml
<configuration>
<!-- Reducers fetch map output through the mapreduce_shuffle auxiliary service -->
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
</configuration>
Start and stop YARN
Start
cd $HADOOP_HOME
./sbin/start-yarn.sh
Stop
./sbin/stop-yarn.sh
View in a browser: http://localhost:8088
Check the processes with jps
WZB-MacBook:hadoop wangzhibin$ jps
534 NutstoreGUI
49135 DataNode
49834 ResourceManager
49234 SecondaryNameNode
49973 Jps
67596
49912 NodeManager
49057 NameNode
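Scanning the jps list by eye is easy to get wrong, since SecondaryNameNode contains the string NameNode. The sketch below compares jps output against the five daemons shown above and prints any that are missing. missing_daemons is a hypothetical helper name, and the daemon list assumes the HDFS plus YARN setup from this guide.

```shell
#!/bin/sh
# missing_daemons JPS_OUTPUT: print each expected Hadoop daemon that does not
# appear in the given jps output ("<pid> <name>" per line). Prints nothing
# when all five daemons are present. The name column is compared exactly, so
# SecondaryNameNode does not count as NameNode.
missing_daemons() {
  for daemon in NameNode DataNode SecondaryNameNode ResourceManager NodeManager; do
    printf '%s\n' "$1" |
      awk -v d="$daemon" '$2 == d { seen = 1 } END { exit !seen }' ||
      printf '%s\n' "$daemon"
  done
}

# Usage with the live process list:
#   missing_daemons "$(jps)"
```

An empty result means HDFS and YARN are both up. Any name printed points at the daemon whose log under logs/ is worth reading first.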
At this point, single-node Hadoop is configured and running!
Commands and verification
NameNode: http://localhost:50070
ResourceManager: http://localhost:8088/
NodeManager (node-specific info): http://localhost:8042/
Command
$ jps
$ yarn    # resource management; gives more information than the web interface
$ mapred  # detailed information about jobs