High availability for httpd with corosync + pacemaker

corosync+pacemaker

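I. Understanding open-source high availability

OPEN SOURCE HIGH AVAILABILITY CLUSTER STACK
Since 1999, Corosync, Pacemaker, DRBD, ScanCore and many other projects have together been detecting and recovering from machine-level and application-level failures in production clusters.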

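II. Overview

1. Deployment

Many deployment scenarios are supported, from 2-node up to 32-node active/active configurations. Several active/passive clusters can also be combined so that they share a common backup node, which lowers cost.

2. Monitoring

The stack monitors the system for hardware and software failures. If a failure occurs, it automatically recovers your application and makes sure the application is available from one of the remaining machines in the cluster.

3. Recovery

After a failure, advanced algorithms quickly determine the best location for each service, based on relative node preferences and/or requirements to run together with other cluster services (these are called "constraints").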

corosync v2
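Corosync is part of the cluster management suite. A simple configuration file defines how it passes messages and which protocol it uses. Corosync is the messaging layer of the cluster and needs a resource manager such as pacemaker on top of it to form a complete high-availability cluster. It is open-source software that runs at the heartbeat layer (the cluster framework engine).
pacemaker
pacemaker, also known as the Cluster Resource Manager (CRM), is a cluster resource manager. It uses the messaging and membership capabilities of the cluster infrastructure (corosync or heartbeat) to detect and recover from node-level and resource-level failures, so that cluster services reach maximum availability.

Pacemaker handles resource failover and corosync handles heartbeat monitoring; used together, they manage a high-availability architecture automatically. Heartbeat monitoring checks whether a server is still providing service. If a server misbehaves, it is considered down, and pacemaker then moves its resources to another node.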
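
To make the "simple configuration file" more concrete, the sketch below writes an illustrative corosync v2 configuration to a scratch file. The cluster name ha1 and the node names come from the later steps of this article; the udpu transport and the two_node quorum option are assumptions about a typical two-node setup, not a copy of the file pcs generated here.

```shell
#!/usr/bin/env bash
# Write an illustrative corosync v2 configuration to a scratch file.
# On a real node the file is /etc/corosync/corosync.conf and is normally
# generated by pcs, so this copy is for reading only.
example_conf="${TMPDIR:-/tmp}/corosync.conf.example"

cat > "$example_conf" <<'EOF'
totem {
    version: 2
    cluster_name: ha1
    # Assumption: unicast UDP between the two nodes.
    transport: udpu
}

nodelist {
    node {
        ring0_addr: node1
        nodeid: 1
    }
    node {
        ring0_addr: node2
        nodeid: 2
    }
}

quorum {
    provider: corosync_votequorum
    # Assumption: special quorum handling for a two-node cluster.
    two_node: 1
}

logging {
    to_syslog: yes
}
EOF

echo "example written to $example_conf"
```

Comments sit on their own lines because corosync only treats a line as a comment when it starts with "#".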

Lifecycle management tools:
pcs: agent-based (pcsd)
crmsh: agentless (pssh), a command-line tool for pacemaker

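4. Core components of the pacemaker stack

libQB - core services (logging, IPC, etc.)
Corosync - membership, messaging and quorum
Resource agents - a collection of scripts that interact with the underlying services managed by the cluster
Fencing agents - a collection of scripts that interact with network power switches and SAN devices to isolate cluster members
Pacemaker itself - the cluster resource manager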

III. Pre-deployment preparation

1. Set hostnames and firewalld

On all nodes:

cat /etc/hosts
127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4
::1         localhost localhost.localdomain localhost6 localhost6.localdomain6
10.211.55.6 node2
10.211.55.4 node1
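
The heading mentions hostnames and firewalld, but only the hosts file is shown. The following is a minimal sketch of the remaining steps. The hostnamectl and firewall-cmd lines are assumptions about how this is usually done on CentOS 7 and are left as comments; check_hosts is a hypothetical helper that only verifies that each node name appears in a hosts file.

```shell
#!/usr/bin/env bash
# Return 0 if every given node name has an entry in the hosts file.
check_hosts() {
    local hosts_file=$1; shift
    local node
    for node in "$@"; do
        # Match the name as a whole field on a non-comment line.
        if ! awk -v n="$node" '
                !/^[ \t]*#/ { for (i = 2; i <= NF; i++) if ($i == n) found = 1 }
                END { exit !found }' "$hosts_file"; then
            echo "missing hosts entry: $node" >&2
            return 1
        fi
    done
}

# Intended use on each node, as root (not executed here):
#   hostnamectl set-hostname node1      # node2 on the second machine
#   firewall-cmd --permanent --add-service=high-availability
#   firewall-cmd --reload
#   check_hosts /etc/hosts node1 node2
```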

2. Passwordless SSH login

ssh-keygen -t rsa
ssh-copy-id -i /root/.ssh/id_rsa.pub root@node1
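
The commands above copy the key to node1 only. A small sketch of how the same step could be repeated for every node, assuming the key path shown above and the two node names from /etc/hosts; copy_keys and the DRY_RUN switch are hypothetical conveniences, not part of ssh.

```shell
#!/usr/bin/env bash
# Copy root's public key to every cluster node.
# With DRY_RUN=1 the commands are printed instead of executed.
NODES="node1 node2"

copy_keys() {
    local node
    for node in $NODES; do
        if [ "${DRY_RUN:-0}" = "1" ]; then
            echo "ssh-copy-id -i /root/.ssh/id_rsa.pub root@$node"
        else
            ssh-copy-id -i /root/.ssh/id_rsa.pub "root@$node" || return 1
        fi
    done
}

# Example: DRY_RUN=1 copy_keys
```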

3. Install and start pcs

[ALL] # yum install pacemaker pcs resource-agents -y

Installing:
 pacemaker            x86_64      1.1.20-5.el7_7.2            updates      472 k
 pcs                  x86_64      0.9.167-3.el7.centos.1      updates      4.1 M
 resource-agents      x86_64      4.1.1-30.el7_7.7            updates      453 k

Installed:
  pacemaker.x86_64 0:1.1.20-5.el7_7.2                  pcs.x86_64 0:0.9.167-3.el7.centos.1                  resource-agents.x86_64 0:4.1.1-30.el7_7.7                 

Dependency Installed:
  clufter-bin.x86_64 0:0.77.1-1.el7                           clufter-common.noarch 0:0.77.1-1.el7                corosync.x86_64 0:2.4.3-6.el7_7.1                 
  corosynclib.x86_64 0:2.4.3-6.el7_7.1                        libqb.x86_64 0:1.0.1-7.el7                          pacemaker-cli.x86_64 0:1.1.20-5.el7_7.2           
  pacemaker-cluster-libs.x86_64 0:1.1.20-5.el7_7.2            pacemaker-libs.x86_64 0:1.1.20-5.el7_7.2            python-clufter.noarch 0:0.77.1-1.el7              
  ruby.x86_64 0:2.0.0.648-36.el7                              ruby-irb.noarch 0:2.0.0.648-36.el7                  ruby-libs.x86_64 0:2.0.0.648-36.el7               
  rubygem-bigdecimal.x86_64 0:1.2.0-36.el7                    rubygem-io-console.x86_64 0:0.4.2-36.el7            rubygem-json.x86_64 0:1.7.7-36.el7                
  rubygem-psych.x86_64 0:2.0.0-36.el7                         rubygem-rdoc.noarch 0:4.0.0-36.el7                  rubygems.noarch 0:2.0.14.1-36.el7

Start the service and enable it at boot

[ALL] # systemctl start pcsd.service
[ALL] # systemctl enable pcsd.service

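The installation automatically creates the hacluster account.

4. Authenticate the nodes

Set the password:
[ALL] # echo CHANGEME | passwd --stdin hacluster
Verify that authentication works:
[one] # pcs cluster auth node1 node2 -u hacluster -p CHANGEME --force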

[root@node1 ~]# pcs cluster auth node1 node2 -u hacluster -p CHANGEME --force
node1: Authorized
node2: Authorized

IV. Cluster deployment

1. Authenticate the cluster nodes

pcs cluster auth node1 node2 -u hacluster -p CHANGEME --force

node1: Authorized
node2: Authorized

2. Create the cluster

pcs cluster setup --force --name ha1 node1 node2

Destroying cluster on nodes: node1, node2...
node1: Stopping Cluster (pacemaker)...
node2: Stopping Cluster (pacemaker)...
node2: Successfully destroyed cluster
node1: Successfully destroyed cluster

Sending 'pacemaker_remote authkey' to 'node1', 'node2'
node2: successful distribution of the file 'pacemaker_remote authkey'
node1: successful distribution of the file 'pacemaker_remote authkey'
Sending cluster config files to the nodes...
node1: Succeeded
node2: Succeeded

Synchronizing pcsd certificates on nodes node1, node2...
node1: Success
node2: Success
Restarting pcsd on the nodes in order to reload the certificates...
node1: Success
node2: Success

3. Start the cluster

pcs cluster start --all

node1: Starting Cluster (corosync)...
node2: Starting Cluster (corosync)...
node1: Starting Cluster (pacemaker)...
node2: Starting Cluster (pacemaker)...
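
The authenticate, create and start steps can be collected into one function. This is a sketch that assumes the pcs 0.9 syntax used in this article; run, bootstrap_cluster and DRY_RUN are hypothetical names, and the final enable step is taken from the command summary at the end of the article.

```shell
#!/usr/bin/env bash
# Print the command when DRY_RUN=1, otherwise execute it.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Usage: bootstrap_cluster CLUSTER_NAME NODE...
bootstrap_cluster() {
    local name=$1; shift
    # Authenticate pcsd on every node; pcs prompts for the password
    # because -p is not given here.
    run pcs cluster auth "$@" -u hacluster || return 1
    # Create the cluster configuration and push it to all nodes.
    run pcs cluster setup --name "$name" "$@" || return 1
    # Start corosync and pacemaker everywhere.
    run pcs cluster start --all || return 1
    # Start the cluster automatically at boot.
    run pcs cluster enable --all
}

# Example: DRY_RUN=1 bootstrap_cluster ha1 node1 node2
```

Running it once with DRY_RUN=1 shows the exact commands before anything touches the nodes.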

4. Check status

pcs cluster status

Cluster Status:
 Stack: corosync
 Current DC: node2 (version 1.1.20-5.el7_7.2-3c4c782f70) - partition with quorum
 Last updated: Wed Apr 22 00:19:54 2020
 Last change: Wed Apr 22 00:18:37 2020 by root via cibadmin on node1
 2 nodes configured
 0 resources configured

PCSD Status:
  node1: Online
  node2: Online
[root@node1 ~]#
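
The two things worth checking in this output are the "partition with quorum" marker and the Online lines under PCSD Status. A minimal sketch of a filter that extracts both, assuming the output format shown above; summarize_status is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Read `pcs cluster status` output on stdin and report whether the
# partition has quorum and how many nodes are listed as Online.
summarize_status() {
    awk '
        /partition with quorum/ { quorum = 1 }
        /^[ \t]+[^ \t]+: Online[ \t]*$/ { online++ }
        END {
            printf "quorum=%s online=%d\n", (quorum ? "yes" : "no"), online
        }'
}

# Example on a cluster node:
#   pcs cluster status | summarize_status
```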

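5. Configure the cluster

1) Configure the fence device

Pacemaker enables STONITH by default, but this cluster has no STONITH device yet, so STONITH has to be disabled first.
Check the configuration, then disable STONITH:
crm_verify -L -V
pcs property set stonith-enabled=false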

2) Configure the apache service

Install
yum install httpd -y

Edit the test page
vim /var/www/html/index.html    # on node1 add "hello node1", on node2 add "hello node2"

Start the service and enable it at boot
systemctl start httpd
systemctl enable httpd

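3) Configure storage

Storage can be a software-only mirrored cluster built on local disks, or a dedicated shared-disk device.

4) Configure the VIP

pcs resource create VIP ocf:heartbeat:IPaddr2 ip=10.211.55.100 cidr_netmask=24 op monitor interval=30s

pcs resource update VIP op monitor interval=15s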
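
Once the VIP resource is running, the address should appear on exactly one node. A small sketch for checking this from the output of ip, assuming the one-line format printed by `ip -o -4 addr show`; has_address is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Read `ip -o -4 addr show` output on stdin and succeed if the given
# address is configured on any interface of this node.
has_address() {
    awk -v want="$1" '
        $3 == "inet" { split($4, parts, "/"); if (parts[1] == want) found = 1 }
        END { exit !found }'
}

# Example on a cluster node:
#   ip -o -4 addr show | has_address 10.211.55.100 && echo "VIP is on this node"
```

The address is compared as a whole field, so 10.211.55.10 does not match 10.211.55.100.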

5) Add the web resource to the cluster

pcs resource create WEB apache configfile="/etc/httpd/conf/httpd.conf" statusurl="http://127.0.0.1/server-status"
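
The statusurl parameter only helps if Apache actually serves /server-status, which the steps above do not set up. The sketch below writes a possible Apache 2.4 snippet for it to a scratch file. The file name status.conf and the CONF_FILE variable are assumptions; on a node the snippet would typically go under /etc/httpd/conf.d/ on both machines.

```shell
#!/usr/bin/env bash
# Write an Apache 2.4 snippet that exposes /server-status to local
# requests only. CONF_FILE defaults to a scratch path so the script is
# safe to try; set CONF_FILE=/etc/httpd/conf.d/status.conf on a node.
CONF_FILE="${CONF_FILE:-${TMPDIR:-/tmp}/status.conf}"

cat > "$CONF_FILE" <<'EOF'
<Location /server-status>
    SetHandler server-status
    Require local
</Location>
EOF

echo "wrote $CONF_FILE"
```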

6) Set location preferences

pcs constraint location WEB prefers node1=50
pcs constraint location WEB prefers node2=45
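
The location preferences above apply to WEB only, so nothing yet forces WEB and VIP onto the same node. A sketch of the colocation and ordering constraints that are commonly added for this, assuming pcs 0.9 syntax; run, tie_web_to_vip and DRY_RUN are hypothetical names, and these two constraints are an addition to the original steps.

```shell
#!/usr/bin/env bash
# Print the command when DRY_RUN=1, otherwise execute it.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Usage: tie_web_to_vip WEB_RESOURCE VIP_RESOURCE
tie_web_to_vip() {
    local web=$1 vip=$2
    # Keep the web server on the node that holds the virtual IP.
    run pcs constraint colocation add "$web" with "$vip" INFINITY || return 1
    # Bring the address up before starting the web server.
    run pcs constraint order "$vip" then "$web"
}

# Example: DRY_RUN=1 tie_web_to_vip WEB VIP
```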

V. Verification

1. Access the VIP address

http://10.211.55.100/
ha-pacemaker-web1

2. Test by taking down the network on node2

systemctl stop network

3. Access it again

http://10.211.55.100/
ha-pacemaker-web2

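VI. Common commands

Summary of common commands:

Show cluster status: # pcs status

Show the current cluster configuration: # pcs config

Start the cluster automatically at boot: # pcs cluster enable --all

Start the cluster: # pcs cluster start --all

Show cluster resource status: # pcs resource show

Validate the cluster configuration: # crm_verify -L -V

Test a resource configuration: # pcs resource debug-start resource

Put a node into standby: # pcs cluster standby node1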