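Preface
I have recently been looking into cloud monitoring tools. I previously wrote up the installation steps for Ganglia; this time I am recording the installation steps for Nagios.
This article does not explain the underlying principles; please consult other references for those.
Goal: even if you have never worked with Nagios before, you should be able to build your own Nagios monitoring cluster by following the steps below.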
@Author duangr
@Website http://my.oschina.net/duangr/blog/183160
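1. About Nagios
Nagios is an open-source monitoring system that runs on Linux/Unix platforms. It monitors the state of specified local or remote hosts and services, including system and network information, and provides alerting. When a host or service enters an abnormal state, Nagios immediately notifies the operations staff by email or SMS, and it sends a recovery notification once the state returns to normal.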
2. Environment
| Host Name | IP | OS | Arch |
| duangr-1 | 192.168.56.10 | CentOS 6.4 | x86_64 |
| duangr-2 | 192.168.56.11 | CentOS 6.4 | x86_64 |
| duangr-3 | 192.168.56.12 | CentOS 6.4 | x86_64 |
3. Deployment plan
| Item | Value |
| Monitoring master node (Master) | duangr-1 |
| Monitored slave nodes (Slave) | duangr-2, duangr-3 |
The Nagios master node needs:
- nagios
- nagios-plugin
- nrpe
- php
- apache
The Nagios slave nodes need (see sections 6.3 and 6.4):
- nagios-plugin
- nrpe
Installation paths
| Item | Value |
| nagios install path | /usr/local/nagios |
| php install path | /usr/local/php |
| apache install path | /usr/local/apache2 |
4. Getting the source
5. Prerequisites
5.1 Host environment check (all nodes)
    # rpm -q gcc glibc glibc-common gd gd-devel xinetd openssl-devel
    ...
    glibc-common-2.14.1-6.x86_64
    gd-2.0.35-11.el6.x86_64
    package gd-devel is not installed
    package xinetd is not installed
    openssl-devel-1.0.0-27.el6.x86_64
If any package is missing, install it first. The packages can be downloaded from a CentOS mirror site.
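To see at a glance which packages still need installing, the `rpm -q` output can be filtered for its "is not installed" lines. This is a minimal sketch that assumes the English-locale message format shown in the listing; the function name `missing_packages` is made up for this illustration.

```shell
#!/bin/sh
# Hypothetical helper: read `rpm -q` output on stdin and print only the
# names of packages reported as "package NAME is not installed".
# Assumes the English-locale wording of the rpm message.
missing_packages() {
    sed -n 's/^package \(.*\) is not installed$/\1/p'
}
```

On a node it could be used as `LC_ALL=C rpm -q gcc glibc glibc-common gd gd-devel xinetd openssl-devel | missing_packages`, with the resulting names passed to whatever installer you use.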
After installing, the check should look like this:
    # rpm -q gcc glibc glibc-common gd gd-devel xinetd openssl-devel
    ...
    glibc-common-2.14.1-6.x86_64
    gd-2.0.35-11.el6.x86_64
    gd-devel-2.0.35-11.el6.x86_64
    xinetd-2.3.14-38.el6.x86_64
    openssl-devel-1.0.0-27.el6.x86_64
6. Build and install
6.1 Create the nagios user (all nodes)
    useradd nagios -d /usr/local/nagios
6.2 Install the Nagios core (master node only)
    tar -zxf nagios-4.0.2.tar.gz
    ...
    ./configure --prefix=/usr/local/nagios
    ...
    make install && make install-init && make install-commandmode && make install-config
Register nagios as a service:
    ...
    chkconfig --level 35 nagios on
    chkconfig --list nagios
    nagios 0:off 1:off 2:off 3:on 4:off 5:on 6:off
6.3 Install the Nagios plugins (all nodes)
    tar -zxf nagios-plugins-1.5.tar.gz
    ...
    ./configure --prefix=/usr/local/nagios --with-nagios-user=nagios --with-nagios-group=nagios
If MySQL-related compile errors appear, MySQL is installed somewhere other than its default path. Point --with-mysql at the actual location and run make again:
    ./configure --prefix=/usr/local/nagios --with-mysql=/usr/local/mysql
6.4 Install NRPE (all nodes)
    tar -zxf nrpe-2.15.tar.gz
    ...
    ./configure --enable-command-args
The following step is needed only on the monitored nodes:
    make install-daemon && make install-daemon-config && make install-xinetd
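6.4.1 Monitored node configuration
On a monitored node, NRPE must be configured to run as a daemon (here it is launched through xinetd).
1. Edit /etc/xinetd.d/nrpe so that the Nagios master node is allowed to connect:
    ...
    only_from = 127.0.0.1 192.168.56.10
2. Append an entry for nrpe to the end of /etc/services.
3. Enable support for command arguments:
    vi /usr/local/nagios/etc/nrpe.cfg
4. Start xinetd.
5. Verify that nrpe is listening:
    netstat -at | grep nrpe
6. Test that nrpe is working:
    /usr/local/nagios/libexec/check_nrpe -H localhost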
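Steps 2 and 3 edit system files by hand. The sketch below shows one way to make those edits repeatable from a script. It assumes NRPE's default port, 5666/tcp, for the /etc/services entry and the `dont_blame_nrpe` option in nrpe.cfg for argument support; neither value is spelled out in the steps above, and both function names are invented for this example.

```shell
#!/bin/sh
# Assumption: NRPE listens on its default port, 5666/tcp.
# Append an nrpe entry to a services file unless one is already present.
ensure_nrpe_service() {
    services_file="$1"
    if ! grep -q '^nrpe[[:space:]]' "$services_file"; then
        printf 'nrpe\t\t5666/tcp\t\t# NRPE\n' >> "$services_file"
    fi
}

# Set key=value in a config file: rewrite the line if the key exists,
# otherwise append it. Values containing '|', '&' or '\' are not handled.
set_cfg_option() {
    cfg_file="$1"
    key="$2"
    value="$3"
    if grep -q "^${key}=" "$cfg_file"; then
        tmp_file=$(mktemp)
        sed "s|^${key}=.*|${key}=${value}|" "$cfg_file" > "$tmp_file"
        # Copy the content back so the original file keeps its permissions.
        cat "$tmp_file" > "$cfg_file"
        rm -f "$tmp_file"
    else
        printf '%s=%s\n' "$key" "$value" >> "$cfg_file"
    fi
}
```

Run as root on a monitored node, this would look like `ensure_nrpe_service /etc/services` followed by `set_cfg_option /usr/local/nagios/etc/nrpe.cfg dont_blame_nrpe 1`. Both calls can be repeated safely because neither adds a duplicate line.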
6.4.2 Master node configuration
On the monitoring master node, once NRPE has been configured on every monitored node, check each of them in turn:
    /usr/local/nagios/libexec/check_nrpe -H 192.168.56.11
    ...
    /usr/local/nagios/libexec/check_nrpe -H 192.168.56.12
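With more than a couple of monitored nodes, the same check can be run in a loop. This is a minimal sketch; `check_all_nrpe` and the `CHECK_NRPE` override variable are names made up here, and the default path is the one used above.

```shell
#!/bin/sh
# Path to the check_nrpe plugin; can be overridden from the environment.
CHECK_NRPE="${CHECK_NRPE:-/usr/local/nagios/libexec/check_nrpe}"

# Run check_nrpe against every host given as an argument.
# Prints one line per host and returns non-zero if any host failed.
check_all_nrpe() {
    failed=0
    for host in "$@"; do
        if output=$("$CHECK_NRPE" -H "$host" 2>&1); then
            echo "OK   $host: $output"
        else
            echo "FAIL $host: $output"
            failed=$((failed + 1))
        fi
    done
    [ "$failed" -eq 0 ]
}
```

For the cluster in this article the call would be `check_all_nrpe 192.168.56.11 192.168.56.12`.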
6.5 Install Apache (master node only)
    tar -zxf httpd-2.2.23.tar.gz
    ...
    ./configure --prefix=/usr/local/apache2
6.6 Install PHP (master node only)
    cd /export/home/tools/soft/php
    tar -zxf php-5.4.10.tar.gz
    ...
    ./configure --prefix=/usr/local/php --with-apxs2=/usr/local/apache2/bin/apxs
6.7 Serve the PHP web interface through Apache
vi /usr/local/apache2/conf/httpd.conf
    ...
    DirectoryIndex index.html index.php
    AddType application/x-httpd-php .php
    ...
    ScriptAlias /nagios/cgi-bin "/usr/local/nagios/sbin"
    <Directory "/usr/local/nagios/sbin">
    ...
    AuthName "Nagios Access"
    AuthUserFile /usr/local/nagios/etc/htpasswd
    ...
    Alias /nagios "/usr/local/nagios/share"
    <Directory "/usr/local/nagios/share">
    ...
    AuthName "nagios Access"
    AuthUserFile /usr/local/nagios/etc/htpasswd
Create a user name and password for web access (the user name here is admin; pick your own if you prefer):
    /usr/local/apache2/bin/htpasswd -c /usr/local/nagios/etc/htpasswd admin
Start Apache:
    /usr/local/apache2/bin/apachectl start
Open the page:
http://192.168.56.10/nagios/
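7. Configuring Nagios
7.1 Configure the remote monitored nodes
7.1.1 Edit the configuration file
    ...
    $ vi /usr/local/nagios/etc/nrpe.cfg
Change the command definitions to the following:
    command[check_users]=/usr/local/nagios/libexec/check_users -w $ARG1$ -c $ARG2$
    command[check_load]=/usr/local/nagios/libexec/check_load -w $ARG1$ -c $ARG2$
    command[check_disk]=/usr/local/nagios/libexec/check_disk -w $ARG1$ -c $ARG2$ -p $ARG3$
    command[check_procs]=/usr/local/nagios/libexec/check_procs -w $ARG1$ -c $ARG2$ -s $ARG3$
    command[check_procs_args]=/usr/local/nagios/libexec/check_procs $ARG1$
    command[check_swap]=/usr/local/nagios/libexec/check_swap -w $ARG1$ -c $ARG2$
These commands check, in order: the number of logged-in users, the system load average, disk usage for a given path, the process count filtered by process state, the process count using arbitrary check_procs arguments, and swap usage. In each case -w sets the warning threshold and -c the critical threshold.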
7.1.2 Restart the xinetd service
After configuring the commands above, restart the xinetd service.
7.1.3 Verify the configuration
Check that the monitoring commands are configured correctly:
    /usr/local/nagios/libexec/check_nrpe -H localhost -c check_users -a 5 10
    /usr/local/nagios/libexec/check_nrpe -H localhost -c check_load -a 15,10,5 30,25,20
    /usr/local/nagios/libexec/check_nrpe -H localhost -c check_disk -a 20% 10% /
    /usr/local/nagios/libexec/check_nrpe -H localhost -c check_procs -a 200 400 RSZDT
    /usr/local/nagios/libexec/check_nrpe -H localhost -c check_swap -a 20% 10%
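The values after `-a` are handed to the command defined in nrpe.cfg as `$ARG1$`, `$ARG2$`, and so on. The sketch below only imitates that substitution in shell to make the mapping visible. It is not how NRPE itself is implemented, and `expand_nrpe_args` is an invented name.

```shell
#!/bin/sh
# Illustration only: substitute $ARG1$, $ARG2$, ... in a command template
# with the given values, the way the -a arguments map onto nrpe.cfg.
# Values containing '|', '&' or '\' are not handled by this sketch.
expand_nrpe_args() {
    template="$1"
    shift
    n=1
    for arg in "$@"; do
        template=$(printf '%s' "$template" | sed 's|\$ARG'"$n"'\$|'"$arg"'|g')
        n=$((n + 1))
    done
    printf '%s\n' "$template"
}
```

For example, `expand_nrpe_args '/usr/local/nagios/libexec/check_disk -w $ARG1$ -c $ARG2$ -p $ARG3$' '20%' '10%' /` prints `/usr/local/nagios/libexec/check_disk -w 20% -c 10% -p /`, which is the plugin invocation behind the check_disk test above.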
7.2 Configure the monitoring master node
7.2.1 cgi.cfg (controls access to the CGIs)
(as the nagios user)
vi /usr/local/nagios/etc/cgi.cfg
Change the following settings to grant permissions to the admin user:
    default_user_name=admin
    authorized_for_system_information=nagiosadmin,admin
    authorized_for_configuration_information=nagiosadmin,admin
    authorized_for_system_commands=nagiosadmin,admin
    authorized_for_all_services=nagiosadmin,admin
    authorized_for_all_hosts=nagiosadmin,admin
    authorized_for_all_service_commands=nagiosadmin,admin
    authorized_for_all_host_commands=nagiosadmin,admin
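7.2.2 nagios.cfg (the main Nagios configuration file)
(as the nagios user)
vi /usr/local/nagios/etc/nagios.cfg
    #cfg_file=/usr/local/nagios/etc/objects/localhost.cfg   (comment this line out)
    cfg_dir=/usr/local/nagios/etc/servers
With this change the main configuration file points at the servers directory for host and service definitions. That directory does not exist by default and must be created by hand.
Nagios reads every file ending in .cfg under the servers directory as configuration.
    cd /usr/local/nagios/etc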
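7.2.3 Define the monitored host group
Declare a host group and add all three hosts from the environment section to it.
vi /usr/local/nagios/etc/servers/group.cfg
This is a new file with the following content:
    ...
    hostgroup_name duangr-server
    ...
    members duangr-1,duangr-2,duangr-3
The settings mean:
- hostgroup_name: the name of the host group; any name will do
- alias: an alias for the host group; any text will do
- members: the members of the host group, with multiple host names separated by commas. Each name must match the host_name in the corresponding define host block.
Host definitions are covered in the next section.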
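For reference, a complete hostgroup object has the shape `define hostgroup { ... }` with the three directives described above inside it. The sketch below writes such a file from shell. The alias text and the function name `write_hostgroup` are placeholders chosen for this example, not values taken from the original configuration.

```shell
#!/bin/sh
# Hypothetical helper: write a Nagios hostgroup definition to a file.
#   $1 = output file, $2 = hostgroup_name, $3 = alias, $4 = members
write_hostgroup() {
    cat > "$1" <<EOF
define hostgroup {
    hostgroup_name  $2
    alias           $3
    members         $4
}
EOF
}
```

A call such as `write_hostgroup /usr/local/nagios/etc/servers/group.cfg duangr-server "duangr servers" duangr-1,duangr-2,duangr-3` would produce a group file for the three hosts in this article.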
7.2.4 Define the monitored hosts
Next, define the individual hosts.
7.2.4.1 Local host monitoring
Start with the local host, duangr-1.
vi /usr/local/nagios/etc/servers/duangr-1.cfg
This is a new file with the following content:
    ...
    service_description Host Alive
    check_command check-host-alive
    ...
    service_description Users
    check_command check_local_users!20!50
    ...
    service_description CPU
    check_command check_local_load!5.0,4.0,3.0!10.0,6.0,4.0
    ...
    service_description Disk Root
    check_command check_local_disk!20%!10%!/
    ...
    service_description Disk Home
    check_command check_local_disk!20%!10%!/export/home
    ...
    service_description Zombie Procs
    check_command check_local_procs!5!10!Z
    ...
    service_description Total Procs
    check_command check_local_procs!250!400!RSZDT
    ...
    service_description Swap Usage
    check_command check_local_swap!20!10
Because this host is also the one running the monitoring master, it can be monitored with the check_local_* commands.
The file above already covers the commonly used checks.
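In a check_command value, `!` separates the command name from its arguments, and Nagios makes those arguments available to the command definition as `$ARG1$`, `$ARG2$`, and so on. The sketch below splits a value the same way so the mapping is easy to see. It is an illustration in shell, not Nagios code, and `split_check_command` is an invented name.

```shell
#!/bin/sh
# Illustration only: split a check_command value on '!' and print the
# command name followed by the numbered arguments.
# The body runs in a subshell so IFS and 'set -f' do not leak out.
split_check_command() (
    set -f          # no filename globbing while splitting
    IFS='!'
    set -- $1       # split the single argument on '!'
    echo "command_name=$1"
    shift
    n=1
    for arg in "$@"; do
        echo "ARG${n}=${arg}"
        n=$((n + 1))
    done
)
```

Running `split_check_command 'check_local_disk!20%!10%!/'` prints `command_name=check_local_disk`, `ARG1=20%`, `ARG2=10%` and `ARG3=/`, one per line.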
7.2.4.2 Remote host monitoring
Now define the remote hosts, duangr-2 and duangr-3.
Before defining the checks for remote hosts, the check_nrpe commands must be defined.
vi /usr/local/nagios/etc/objects/commands.cfg
Append the following to the end of the file:
    # 'check_nrpe' command definition
    ...
    command_name check_nrpe
    command_line $USER1$/check_nrpe -H $HOSTADDRESS$ -t 30 -c $ARG1$
    ...
    command_name check_nrpe_args
    command_line $USER1$/check_nrpe -H $HOSTADDRESS$ -t 30 -c $ARG1$ -a $ARG2$
Define the monitoring configuration for host duangr-2:
$ vi /usr/local/nagios/etc/servers/duangr-2.cfg
This is a new file with the following content:
    ...
    address 192.168.56.11
    ...
    service_description Host Alive
    check_command check-host-alive
    ...
    service_description Users
    check_command check_nrpe_args!check_users!5 10
    ...
    service_description CPU
    check_command check_nrpe_args!check_load!15,10,5 30,25,20
    ...
    service_description Disk Root
    check_command check_nrpe_args!check_disk!20% 10% /
    ...
    service_description Disk /export/home
    check_command check_nrpe_args!check_disk!20% 10% /export/home
    ...
    service_description Procs Zombie
    check_command check_nrpe_args!check_procs!5 10 Z
    ...
    service_description Procs Total
    check_command check_nrpe_args!check_procs_args!"-w400 -c600"
    ...
    service_description Swap Usage
    check_command check_nrpe_args!check_swap!20% 10%
    ...
    ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
    ;; Checks for some common processes, mainly cloud platform processes
    ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
    ...
    service_description PS: crond
    check_command check_nrpe_args!check_procs_args!"-c1:1 -Ccrond"
    ...
    service_description PS: QuorumPeerMain
    check_command check_nrpe_args!check_procs_args!"-c1:1 -Cjava -aserver.quorum.QuorumPeerMain"
    ...
    service_description PS: supervisor
    check_command check_nrpe_args!check_procs_args!"-c1:1 -Cjava -adaemon.supervisor"
    ...
    service_description PS: nimbus
    check_command check_nrpe_args!check_procs_args!"-c1:1 -Cjava -adaemon.nimbus"
    ...
    service_description PS: MetaQ
    check_command check_nrpe_args!check_procs_args!"-c1:1 -Cjava -ametamorphosis-server-w"
    ...
    service_description PS: redis-server
    check_command check_nrpe_args!check_procs_args!"-c1:1 -Credis-server"
    ...
    ;; Monitor the NameNode process on the Hadoop master node
    ...
    service_description PS: NameNode
    check_command check_nrpe_args!check_procs_args!"-c1:1 -Cjava -aserver.namenode.NameNode"
    ...
    ;; Monitor the SecondaryNameNode process on the Hadoop master node
    ...
    service_description PS: SecondaryNameNode
    check_command check_nrpe_args!check_procs_args!"-c1:1 -Cjava -aserver.namenode.SecondaryNameNode"
    ...
    ;; Monitor the ResourceManager process on the Hadoop master node
    ...
    service_description PS: ResourceManager
    check_command check_nrpe_args!check_procs_args!"-c1:1 -Cjava -aserver.resourcemanager.ResourceManager"
    ...
    ;; Monitor the DataNode process on the Hadoop slave nodes
    ...
    service_description PS: DataNode
    check_command check_nrpe_args!check_procs_args!"-c1:1 -Cjava -aserver.datanode.DataNode"
    ...
    ;; Monitor the NodeManager process on the Hadoop slave nodes
    ...
    service_description PS: NodeManager
    check_command check_nrpe_args!check_procs_args!"-c1:1 -Cjava -aserver.nodemanager.NodeManager"
Because duangr-2 is a remote host, it is monitored through the check_nrpe_args command.
This file covers the commonly used checks and also includes process checks for Hadoop, Storm, ZooKeeper, MetaQ and Redis. The approach for those is simply to test whether the process exists: -c1:1 raises a critical alert unless exactly one matching process is found.
Define the monitoring configuration for host duangr-3:
vi duangr-3.cfg
The content is the same as duangr-2.cfg except for host_name, alias and address.
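Since only the host name, alias and address differ, the second file can be derived from the first instead of being typed again. This is a minimal sketch with `sed`, assuming the alias changes the same way as the host name wherever it repeats it; `clone_host_cfg` is an invented helper name.

```shell
#!/bin/sh
# Hypothetical helper: derive a host config from an existing one by
# replacing every occurrence of the host name and of the address.
#   $1 = source cfg, $2 = target cfg, $3 = old name, $4 = new name,
#   $5 = old address, $6 = new address
clone_host_cfg() {
    # Escape dots so the old address is matched literally, not as a regex.
    old_addr_re=$(printf '%s' "$5" | sed 's/\./\\./g')
    sed -e "s/$3/$4/g" -e "s/$old_addr_re/$6/g" "$1" > "$2"
}
```

From the servers directory this could be run as `clone_host_cfg duangr-2.cfg duangr-3.cfg duangr-2 duangr-3 192.168.56.11 192.168.56.12`. It is worth reading the result afterwards, since the replacement is purely textual.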
7.2.4.3 Email notification
Define the email address of the person to be notified.
vi /usr/local/nagios/etc/objects/contacts.cfg
    ...
    contact_name nagiosadmin ; Short name of user
    use generic-contact ; Inherit default values from generic-contact template (defined above)
    alias Nagios Admin ; Full name of user
    ...
    ; <<***** CHANGE THIS TO YOUR EMAIL ADDRESS ******
Besides configuring the recipient of alert emails, also make sure the master node is actually able to send mail.
7.2.4.4 Validate the configuration
    /usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg
7.2.4.5 Start Nagios
    /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
Since nagios is already registered as a service, the following also works:
    service nagios {start|stop|restart|status}
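Validation and restart can be chained so that a broken configuration never reaches the running service. This is a minimal sketch; `restart_if_valid` and the two override variables are names made up here, and the default paths are the ones used above.

```shell
#!/bin/sh
# Paths to the nagios binary and main config; overridable from the environment.
NAGIOS_BIN="${NAGIOS_BIN:-/usr/local/nagios/bin/nagios}"
NAGIOS_CFG="${NAGIOS_CFG:-/usr/local/nagios/etc/nagios.cfg}"

# Validate the configuration and run the given command (typically a
# restart) only if validation succeeds. The command is passed as arguments.
restart_if_valid() {
    if "$NAGIOS_BIN" -v "$NAGIOS_CFG" >/dev/null 2>&1; then
        "$@"
    else
        echo "nagios.cfg failed validation; not restarting" >&2
        return 1
    fi
}
```

Typical use would be `restart_if_valid service nagios restart`. When validation fails, running the `nagios -v` command by hand shows the detailed error messages.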
8. Monitoring page
http://192.168.56.10/nagios
9. Related links