System Environment

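Heron provides an abstract scheduler concept, so it can use either Aurora or Mesos as its scheduler. Both options require Heron to be deployed on a Mesos cluster, because Aurora itself runs on top of Mesos. This article therefore describes how to install and configure a Mesos cluster for Heron, which lays the groundwork for the later high-availability cluster configuration.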

Basic environment setup:

1. On each host, /etc/hostname and /etc/hosts are already configured with the corresponding hostnames and IP addresses.

  • heron01: HERON01_IP
  • heron02: HERON02_IP
  • heron03: HERON03_IP

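2. Configure passwordless SSH login among the three hosts.

3. Install JDK 1.8.

4. Install and configure a ZooKeeper cluster; see "Installing, configuring, and using a ZooKeeper cluster on Ubuntu 16.04".

Note: this article is part of the Heron high-availability cluster configuration, so it reuses the cluster environment described there.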
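The first prerequisite can be checked mechanically. The following is a minimal sketch, assuming a hosts-format file (an IP address followed by one or more names). The helper name missing_hosts is hypothetical and is not part of Heron or Mesos.

```shell
# missing_hosts FILE HOST...
# Print each HOST that has no entry in FILE (hosts format: "IP name [alias...]").
# Hypothetical helper for illustration only.
missing_hosts() {
  file=$1
  shift
  for host in "$@"; do
    # Look for the name as a whole field after the IP column, skipping comment lines.
    if ! awk -v h="$host" '
        /^[[:space:]]*#/ { next }
        { for (i = 2; i <= NF; i++) if ($i == h) found = 1 }
        END { exit !found }' "$file"; then
      echo "$host"
    fi
  done
}

# Usage on a cluster node:
#   missing_hosts /etc/hosts heron01 heron02 heron03
```

An empty result means every listed hostname has an entry; any name printed still needs a line in /etc/hosts.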

Building and Installing Mesos

Note: this process must be carried out on every node in the cluster.

1. Download Mesos

$ wget http://www.apache.org/dist/mesos/1.4.1/mesos-1.4.1.tar.gz
$ tar -zxf mesos-1.4.1.tar.gz -C /home/yitian

2. Install the dependencies

# Update the packages.
$ sudo apt-get update
 
# Install a few utility tools.
$ sudo apt-get install -y tar wget git
 
# Install the latest OpenJDK.
$ sudo apt-get install -y openjdk-8-jdk
 
# Install autotools (Only necessary if building from git repository).
$ sudo apt-get install -y autoconf libtool
 
# Install other Mesos dependencies.
$ sudo apt-get -y install build-essential python-dev python-six python-virtualenv libcurl4-nss-dev libsasl2-dev libsasl2-modules maven libapr1-dev libsvn-dev zlib1g-dev

3. Build and install

Change into the extracted directory /home/yitian/mesos-1.4.1:

# Configure and build.
$ mkdir build
$ cd build
$ ../configure --prefix=/home/yitian/mesosinstall/ # --prefix sets the Mesos installation path
$ make -j 2 # -j sets the number of parallel build jobs (CPU cores to use)
# Run test suite.
$ make check
# Install (Optional).
$ make install -j 2 # -j sets the number of parallel jobs, as above

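Note: this step takes a long time. It is best to increase the virtual machine's memory (5 GB here); otherwise a multi-core build may fail with out-of-memory errors.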
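Because the safe value for -j depends on both memory and core count, it can help to derive it instead of hard-coding it. The sketch below assumes roughly 2 GiB of RAM per compile job, which is only a rule of thumb chosen for illustration and not a figure from the Mesos documentation. The function name pick_jobs is hypothetical.

```shell
# pick_jobs MEM_KB CPUS
# Print a job count for make -j: limited by memory and by CPU count, never below 1.
# Assumption: about 2 GiB of RAM per compile job (illustrative rule of thumb).
pick_jobs() {
  mem_kb=$1
  cpus=$2
  per_job_kb=$((2 * 1024 * 1024))
  n=$((mem_kb / per_job_kb))
  if [ "$n" -lt 1 ]; then
    n=1
  fi
  if [ "$n" -gt "$cpus" ]; then
    n=$cpus
  fi
  echo "$n"
}

# Usage on Linux (MemTotal in /proc/meminfo is reported in kB):
#   make -j "$(pick_jobs "$(awk '/^MemTotal:/ {print $2}' /proc/meminfo)" "$(nproc)")"
```

With 5 GB of memory and four cores this yields 2, which matches the -j 2 used above.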

Testing After the Build

Once the build has finished, run the local example from the Mesos documentation in the /home/yitian/mesos-1.4.1/build/ directory.

The official example is:

# Change into build directory.
$ cd build
# Start Mesos master (ensure work directory exists and has proper permissions).
$ ./bin/mesos-master.sh --ip=127.0.0.1 --work_dir=/var/lib/mesos
# Start Mesos agent (ensure work directory exists and has proper permissions).
$ ./bin/mesos-agent.sh --master=127.0.0.1:5050 --work_dir=/var/lib/mesos
# Visit the Mesos web page.
$ http://127.0.0.1:5050
# Run Java framework (exits after successfully running some tasks).
$ ./src/examples/java/test-framework 127.0.0.1:5050
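When these steps are scripted, the framework should not be launched before the master is answering requests. The following is a minimal retry sketch. The helper name wait_for is hypothetical, and the usage line assumes that curl is installed and that the master serves its /master/health endpoint on port 5050.

```shell
# wait_for TRIES CMD [ARG...]
# Run CMD up to TRIES times, one second apart; succeed as soon as CMD succeeds.
# Hypothetical helper for illustration only.
wait_for() {
  tries=$1
  shift
  while [ "$tries" -gt 0 ]; do
    if "$@"; then
      return 0
    fi
    tries=$((tries - 1))
    # Pause only if another attempt will follow.
    if [ "$tries" -gt 0 ]; then
      sleep 1
    fi
  done
  return 1
}

# Usage, assuming curl is available and the master was started as shown above:
#   wait_for 30 curl -fsS -o /dev/null http://127.0.0.1:5050/master/health
```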

My run is as follows:

1. Start mesos-master:

~/mesos-1.4.1/build$ sudo ./bin/mesos-master.sh --ip=127.0.0.1 --work_dir=/var/lib/mesos

2. Start the Mesos agent (mesos-slave) in a new Linux terminal:

~/mesos-1.4.1/build$ sudo ./bin/mesos-agent.sh --master=127.0.0.1:5050 --work_dir=/var/lib/mesos

3. Run the example framework:

~/mesos-1.4.1/build$ ./src/examples/java/test-framework 127.0.0.1:5050
I0217 00:39:48.763849 15537 sched.cpp:232] Version: 1.4.1
I0217 00:39:48.775424 15555 sched.cpp:336] New master detected at master@127.0.0.1:5050
I0217 00:39:48.777709 15555 sched.cpp:352] No credentials provided. Attempting to register without authentication
I0217 00:39:48.787648 15552 sched.cpp:759] Framework registered with 692f4e1d-4cdb-4f92-8a6a-661556ba4df9-0000
Registered! ID = 692f4e1d-4cdb-4f92-8a6a-661556ba4df9-0000
Received offer 692f4e1d-4cdb-4f92-8a6a-661556ba4df9-O0 with cpus: 4.0 and mem: 2898.0
Launching task 0 using offer 692f4e1d-4cdb-4f92-8a6a-661556ba4df9-O0
Launching task 1 using offer 692f4e1d-4cdb-4f92-8a6a-661556ba4df9-O0
Launching task 2 using offer 692f4e1d-4cdb-4f92-8a6a-661556ba4df9-O0
Launching task 3 using offer 692f4e1d-4cdb-4f92-8a6a-661556ba4df9-O0
Received offer 692f4e1d-4cdb-4f92-8a6a-661556ba4df9-O1 with cpus: 0.0 and mem: 2386.0
Received offer 692f4e1d-4cdb-4f92-8a6a-661556ba4df9-O2 with cpus: 0.0 and mem: 2386.0
Status update: task 3 is in state TASK_RUNNING
Status update: task 0 is in state TASK_RUNNING
Status update: task 2 is in state TASK_RUNNING
Status update: task 1 is in state TASK_RUNNING
Status update: task 3 is in state TASK_FINISHED
Finished tasks: 1
Status update: task 0 is in state TASK_FINISHED
Finished tasks: 2
Status update: task 2 is in state TASK_FINISHED
Finished tasks: 3
Status update: task 1 is in state TASK_FINISHED
Finished tasks: 4
Received offer 692f4e1d-4cdb-4f92-8a6a-661556ba4df9-O3 with cpus: 4.0 and mem: 2898.0
Launching task 4 using offer 692f4e1d-4cdb-4f92-8a6a-661556ba4df9-O3
Status update: task 4 is in state TASK_RUNNING
Status update: task 4 is in state TASK_FINISHED
Finished tasks: 5
I0217 00:39:53.944774 15552 sched.cpp:2021] Asked to stop the driver
I0217 00:39:53.945152 15552 sched.cpp:1203] Stopping framework 692f4e1d-4cdb-4f92-8a6a-661556ba4df9-0000
I0217 00:39:53.948561 15537 sched.cpp:2021] Asked to stop the driver

4. View the Mesos web UI: http://127.0.0.1:5050

[Screenshots: Mesos web UI]

5. Check that the Mesos processes are running on the host:

~$ ps -e | grep mesos
  35043 pts/11   00:00:02 lt-mesos-master
  35106 pts/4    00:00:00 lt-mesos-agent
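The same check can be made scriptable. The sketch below reads `ps -e` output and matches on the end of the process name, so it also accepts the libtool "lt-" prefix that appears when the binaries are run from the build tree. The helper name has_proc is hypothetical.

```shell
# has_proc NAME
# Read `ps -e` output on stdin; succeed if some process name ends with NAME.
# Hypothetical helper for illustration only.
has_proc() {
  awk -v n="$1" '
    { len = length(n); cmd = $NF }
    length(cmd) >= len && substr(cmd, length(cmd) - len + 1) == n { found = 1 }
    END { exit !found }'
}

# Usage:
#   ps -e | has_proc mesos-master && echo "master is running"
#   ps -e | has_proc mesos-agent  && echo "agent is running"
```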

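The steps above complete the build and installation of Mesos on one host and verify it. If the Mesos web UI looks like the screenshots above, Mesos has been built and installed successfully on that host. Next, configure the Mesos cluster.

Configuring the Mesos Cluster

Follow the same steps to install Mesos on the other two nodes, then carry out the cluster configuration below on each of them. Edit the Mesos configuration files (in /home/yitian/mesosinstall/etc/mesos/) on every node as follows:

1. Create the cluster configuration files from the templates: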

~/mesosinstall/etc/mesos$ ll
total 20
drwxrwxr-x 2 yitian yitian 4096 Feb 17 05:46 ./
drwxrwxr-x 3 yitian yitian 4096 Feb 17 05:46 ../
-rw-r--r-- 1 yitian yitian  595 Feb 17 05:46 mesos-agent-env.sh.template
-rw-r--r-- 1 yitian yitian  339 Feb 17 05:46 mesos-deploy-env.sh.template
-rw-r--r-- 1 yitian yitian  319 Feb 17 05:46 mesos-master-env.sh.template
lrwxrwxrwx 1 yitian yitian   27 Feb 17 05:46 mesos-slave-env.sh.template -> mesos-agent-env.sh.template
~/mesosinstall/etc/mesos$ cp mesos-master-env.sh.template mesos-master-env.sh
~/mesosinstall/etc/mesos$ cp mesos-slave-env.sh.template mesos-slave-env.sh
~/mesosinstall/etc/mesos$ cp mesos-deploy-env.sh.template mesos-deploy-env.sh
~/mesosinstall/etc/mesos$ cp mesos-agent-env.sh.template mesos-agent-env.sh

2. Create the configuration file masters and list the hosts that act as masters:

heron01

3. Create the configuration file slaves and list the hosts that act as slaves (agents):

heron02
heron03
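Since the same two files are needed on every node, writing them from a small function avoids typing mistakes. This is a sketch only: the function name write_node_lists is hypothetical, and the directory in the usage line is the configuration directory used in this article.

```shell
# write_node_lists DIR MASTER AGENT...
# Write DIR/masters (one master) and DIR/slaves (one agent per line).
# Hypothetical helper for illustration only.
write_node_lists() {
  dir=$1
  master=$2
  shift 2
  printf '%s\n' "$master" > "$dir/masters"
  printf '%s\n' "$@" > "$dir/slaves"
}

# Usage with the hosts from this article:
#   write_node_lists /home/yitian/mesosinstall/etc/mesos heron01 heron02 heron03
```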

4. Edit the configuration file mesos-master-env.sh:

# This file contains environment variables that are passed to mesos-master.
# To get a description of all options run mesos-master --help; any option
# supported as a command-line option is also supported as an environment
# variable.
# Some options you're likely to want to set:
# export MESOS_log_dir=/var/log/mesos
 
export MESOS_log_dir=/home/yitian/mesosdata/log
export MESOS_work_dir=/home/yitian/mesosdata/data
export MESOS_ZK=zk://heron01:2181,heron02:2181,heron03:2181/mesos
export MESOS_quorum=1 # must be set when using ZooKeeper
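The quorum value should be a majority of the masters, that is, more than half of them. With the single master listed in the masters file, that gives 1. A small sketch of the arithmetic follows; the function name quorum_for is hypothetical.

```shell
# quorum_for N
# Print the smallest majority of N masters (shell integer division rounds down).
# Hypothetical helper for illustration only.
quorum_for() {
  echo $(( $1 / 2 + 1 ))
}

# Usage:
#   export MESOS_quorum=$(quorum_for 1)
```

If all three hosts were later run as masters, the same formula would give a quorum of 2.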

5. Edit the configuration files mesos-slave-env.sh and mesos-agent-env.sh; both have the same content:

# This file contains environment variables that are passed to mesos-agent.
# To get a description of all options run mesos-agent --help; any option
# supported as a command-line option is also supported as an environment
# variable.
# You must at least set MESOS_master.
# The mesos master URL to contact. Should be host:port for
# non-ZooKeeper based masters, otherwise a zk:// or file:// URL.
 
export MESOS_master=heron01:5050
export MESOS_log_dir=/home/yitian/mesosdata/log
export MESOS_work_dir=/home/yitian/mesosdata/run

#export MESOS_isolation=cgroups
# Other options you're likely to want to set:
# export MESOS_log_dir=/var/log/mesos
# export MESOS_work_dir=/var/run/mesos
# export MESOS_isolation=cgroups
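As the template comments note, MESOS_master may also be a zk:// URL rather than host:port. The sketch below builds such a URL from a list of ZooKeeper hosts. It assumes port 2181 and the /mesos path used in mesos-master-env.sh above, and the function name zk_url is hypothetical.

```shell
# zk_url HOST...
# Print a zk:// URL for the given ZooKeeper hosts.
# Assumptions: client port 2181 and znode path /mesos, as in mesos-master-env.sh above.
zk_url() {
  port=2181
  path=/mesos
  list=
  for h in "$@"; do
    # Append with a comma separator once the list is non-empty.
    list="${list:+$list,}$h:$port"
  done
  printf 'zk://%s%s\n' "$list" "$path"
}

# Usage:
#   export MESOS_master=$(zk_url heron01 heron02 heron03)
```

This article keeps MESOS_master=heron01:5050 for the agents; the URL form is an alternative, not what was tested here.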

6. Edit /home/yitian/mesosinstall/sbin/mesos-daemon.sh:

# Increase the default number of open file descriptors.
# ulimit -n 8192
ulimit -n 1024

7. Add the Mesos environment variables to the host's system configuration file (/etc/profile) and apply them:

# Mesos configuration
export MESOS_HOME=/home/yitian/mesosinstall
export PATH=${MESOS_HOME}/sbin:${MESOS_HOME}/bin:$PATH

# Apply the configuration file
source /etc/profile
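Because this step is repeated on every node, and possibly more than once, it is worth making the edit idempotent. The following is a minimal sketch; the helper name add_line_once is hypothetical.

```shell
# add_line_once FILE LINE
# Append LINE to FILE unless an identical line is already present.
# Hypothetical helper for illustration only.
add_line_once() {
  file=$1
  line=$2
  # -q: quiet, -x: match the whole line, -F: treat LINE as a fixed string.
  if ! grep -qxF -- "$line" "$file" 2>/dev/null; then
    printf '%s\n' "$line" >> "$file"
  fi
}

# Usage (as root), with the two lines from this step:
#   add_line_once /etc/profile 'export MESOS_HOME=/home/yitian/mesosinstall'
#   add_line_once /etc/profile 'export PATH=${MESOS_HOME}/sbin:${MESOS_HOME}/bin:$PATH'
```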

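Note again: the configuration changes above must be made separately on every node in the cluster.

Starting the Mesos Cluster

1. The Mesos cluster depends on ZooKeeper, so start ZooKeeper (single node or cluster) before starting Mesos. After a successful start, the ZooKeeper status on each host looks like the following (which hosts are leader and follower may differ, because of ZooKeeper's election mechanism):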

# heron01
~$ ./zookeeper/zookeeper-3.4.10/bin/zkServer.sh status
ZooKeeper JMX enabled by default
Using config: /home/yitian/zookeeper/zookeeper-3.4.10/bin/../conf/zoo.cfg
Mode: follower

# heron02
~$ ./zookeeper/zookeeper-3.4.10/bin/zkServer.sh status
ZooKeeper JMX enabled by default
Using config: /home/yitian/zookeeper/zookeeper-3.4.10/bin/../conf/zoo.cfg
Mode: follower

# heron03
~$ ./zookeeper/zookeeper-3.4.10/bin/zkServer.sh status
ZooKeeper JMX enabled by default
Using config: /home/yitian/zookeeper/zookeeper-3.4.10/bin/../conf/zoo.cfg
Mode: leader

2. On heron01 (the master node), start the cluster as the root user:

/home/yitian# ./mesosinstall/sbin/mesos-start-cluster.sh
Starting mesos-master on heron01
ssh -o StrictHostKeyChecking=no -o ConnectTimeout=2 heron01 /home/yitian/mesosinstall/sbin/mesos-daemon.sh mesos-master </dev/null >/dev/null
Starting mesos-agent on heron02
ssh -o StrictHostKeyChecking=no -o ConnectTimeout=2 heron02 /home/yitian/mesosinstall/sbin/mesos-daemon.sh mesos-agent </dev/null >/dev/null
Starting mesos-agent on heron03
ssh -o StrictHostKeyChecking=no -o ConnectTimeout=2 heron03 /home/yitian/mesosinstall/sbin/mesos-daemon.sh mesos-agent </dev/null >/dev/null
Everything's started!

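Note: the cluster is started as root because some files under /run/ can only be accessed with root privileges. Since root starts the cluster, passwordless SSH login for the root user between the hosts must be configured during preparation.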

3. After a successful start, open the Mesos web UI on the heron01 master host to check the cluster: http://heron01:5050

(1) ** agents:

[Screenshot]

(2) Aurora is also running here:

[Screenshot]

(3) Configuration of the agent hosts:

[Screenshot]

(4) Roles:

[Screenshot]

(5) Offers:

[Screenshot]

Common Problems

1. Compiling Mesos fails with: g++: internal compiler error: Killed (program cc1plus). Solution: the main cause is insufficient memory; temporarily add a swap file to work around it.

sudo dd if=/dev/zero of=/swapfile bs=64M count=16
sudo mkswap /swapfile
sudo swapon /swapfile
# After compiling, you may wish to remove the swap file:
sudo swapoff /swapfile
sudo rm /swapfile

2. Running ./mesos-start-cluster.sh to start the cluster prints the following:

~/mesosinstall/sbin$ ./mesos-start-cluster.sh
Starting mesos-master on heron01
ssh -o StrictHostKeyChecking=no -o ConnectTimeout=2 heron01 /home/yitian/mesosinstall/sbin/mesos-daemon.sh mesos-master </dev/null >/dev/null
ssh: connect to host heron01 port 22: Connection timed out
Starting mesos-agent on heron02
ssh -o StrictHostKeyChecking=no -o ConnectTimeout=2 heron02 /home/yitian/mesosinstall/sbin/mesos-daemon.sh mesos-agent </dev/null >/dev/null
ssh: connect to host heron02 port 22: Connection timed out
Everything's started!

Solution: allow port 22 through the firewall on the cluster hosts, or disable the firewall.

~/mesosinstall/sbin$ sudo ufw status
Status: inactive
~/mesosinstall/sbin$ sudo ufw allow 22
Skipping adding existing rule
Skipping adding existing rule (v6)

Also watch for related issues, such as virtual machine IP addresses changing in NAT mode and SSH needing root login to be enabled.
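Note that the start script still prints "Everything's started!" even when the SSH connections fail, so the final line is not a reliable success signal. The sketch below extracts the hosts that could not be reached from the script's output. The function name failed_hosts is hypothetical, and the pattern matches only the "ssh: connect to host ..." message format shown above.

```shell
# failed_hosts
# Read mesos-start-cluster.sh output on stdin; print each host that ssh could not reach.
# Matches only lines of the form "ssh: connect to host NAME port N: ...".
failed_hosts() {
  sed -n 's/^ssh: connect to host \([^ ]*\) port [0-9][0-9]*: .*$/\1/p'
}

# Usage:
#   ./mesos-start-cluster.sh 2>&1 | failed_hosts
```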

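3. Compiling Mesos fails with: virtual memory exhausted: Cannot allocate memory

Memory is insufficient; increase the virtual machine's memory and run make again. See: "virtual memory exhausted: cannot allocate memory".

4. Slave (agent) nodes are unavailable

[Screenshot]

Solution: see the related question "Mesos agent always in Deactivated state". The main cause here was that the three hosts had hostnames and IP addresses configured only in /etc/hosts, while /etc/hostname did not set each host's own name. All hosts therefore used the default hostname ubuntu, and the agent nodes were unavailable after startup. To fix this, edit /etc/hostname on each host, or replace the hostnames in the Mesos configuration files with the corresponding IP addresses.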

5. When starting the cluster, the slave nodes are not discovered, and the ERROR log on a slave node contains the following exception:

Log file created at: 2018/02/17 06:57:48
Running on machine: ubuntu
Log line format: [IWEF]mmdd hh:mm:ss.uuuuuu threadid file:line] msg
E0217 06:57:48.811517 46316 main.cpp:468] EXIT with status 1: Failed to initialize systemd: Failed to create systemd slice 'mesos_executors.slice': Failed to write systemd slice `/run/systemd/system/mesos_executors.slice`: Failed to open file '/run/systemd/system/mesos_executors.slice': Permission denied

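Solution: checking the permissions of /run/systemd/ shows that the directory is owned by root, so start the Mesos cluster with: sudo /home/yitian/mesosinstall/sbin/mesos-start-cluster.sh. Starting it this way requires that root is allowed to log in over SSH between the hosts, and without a password. The earlier steps set up passwordless login but did not enable login for the root user. To allow root to log in over SSH, edit the configuration file /etc/ssh/sshd_config: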

# Authentication:
LoginGraceTime 120
# PermitRootLogin prohibit-password
PermitRootLogin yes
StrictModes yes
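To confirm the edit before retrying, the setting can be read back from the file. The sketch below prints the first uncommented PermitRootLogin value, since sshd uses the first value it obtains for a keyword. It does not evaluate Match blocks, and the function name root_login_setting is hypothetical.

```shell
# root_login_setting FILE
# Print the first uncommented PermitRootLogin value in an sshd_config file.
# Limitation: Match blocks are not evaluated. Hypothetical helper for illustration only.
root_login_setting() {
  awk 'tolower($1) == "permitrootlogin" { print $2; exit }' "$1"
}

# Usage:
#   root_login_setting /etc/ssh/sshd_config
```

The SSH daemon has to be restarted or reloaded before the new value takes effect.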
