[Title]: Monit EXEC not working when monitored process dies
[Posted]: 2015-12-10 08:59:10
[Question]:

Using Monit 5.15 on FreeBSD 10.2:

set daemon  5
set logfile syslog
set pidfile /var/run/monit.pid
set idfile /var/.monit.id
set statefile /var/.monit.state
set alert x@y.z
set mailserver localhost
set httpd port 2812 and
     use address 192.168.40.72
     allow 192.168.20.0/24
     allow admin:monit

check process haproxy with pidfile /var/run/haproxy.pid
     if failed host 192.168.40.72 port 9090 type tcp
       then exec "/bin/sh -c '/bin/echo `/bin/date` >> /tmp/monit.test'"

When I run monit with -vI and kill haproxy, I get the following output:

Adding net allow '192.168.20.0/24'
Adding credentials for user 'admin'
Runtime constants:
 Control file       = /usr/local/etc/monitrc
 Log file           = syslog
 Pid file           = /var/run/monit.pid
 Id file            = /var/.monit.id
 State file         = /var/.monit.state
 Debug              = True
 Log                = True
 Use syslog         = True
 Is Daemon          = True
 Use process engine = True
 Poll time          = 5 seconds with start delay 0 seconds
 Expect buffer      = 256 bytes
 Mail server(s)     = localhost:25 with timeout 30 seconds
 Mail from          = (not defined)
 Mail subject       = (not defined)
 Mail message       = (not defined)
 Start monit httpd  = True
 httpd bind address = 192.168.40.72
 httpd portnumber   = 2812
 httpd ssl          = Disabled
 httpd signature    = Enabled
 httpd auth. style  = Basic Authentication and Host/Net allow list
 Alert mail to      = root@localhost
   Alert on         = All events

The service list contains the following entries:

Process Name          = haproxy
 Pid file             = /var/run/haproxy.pid
 Monitoring mode      = active
 Existence            = if does not exist then restart
 Port                 = if failed [192.168.40.72]:9090 type TCP/IP protocol DEFAULT with timeout 5 seconds then exec '/bin/sh -c /bin/echo `/bin/date` >> /tmp/monit.test'

System Name           = appsrv01
 Monitoring mode      = active

-------------------------------------------------------------------------------
pidfile '/var/run/monit.pid' does not exist
Starting Monit 5.15 daemon with http interface at [192.168.40.72]:2812
Starting Monit HTTP server at [192.168.40.72]:2812
Monit HTTP server started
'appsrv01' Monit 5.15 started
Sending Monit instance changed notification to root@localhost
'haproxy' process is running with pid 42999
'haproxy' zombie check succeeded
'haproxy' succeeded testing protocol [DEFAULT] at [192.168.40.72]:9090 [TCP/IP]
'haproxy' connection succeeded to [192.168.40.72]:9090 [TCP/IP]
'haproxy' process is running with pid 42999
'haproxy' zombie check succeeded
'haproxy' succeeded testing protocol [DEFAULT] at [192.168.40.72]:9090 [TCP/IP]
'haproxy' connection succeeded to [192.168.40.72]:9090 [TCP/IP]
'haproxy' process is running with pid 42999
'haproxy' zombie check succeeded
'haproxy' succeeded testing protocol [DEFAULT] at [192.168.40.72]:9090 [TCP/IP]
'haproxy' connection succeeded to [192.168.40.72]:9090 [TCP/IP]
'haproxy' process test failed [pid=42999] -- No such process
'haproxy' process is not running
Sending Does not exist notification to root@localhost
'haproxy' trying to restart
'haproxy' stop skipped -- method not defined
'haproxy' start method not defined
'haproxy' monitoring enabled
'haproxy' process test failed [pid=42999] -- No such process
'haproxy' process is not running
'haproxy' trying to restart
'haproxy' stop skipped -- method not defined
'haproxy' start method not defined
'haproxy' monitoring enabled
^CShutting down Monit HTTP server
Monit HTTP server stopped
Monit daemon with pid [48685] stopped
'appsrv01' Monit 5.15 stopped
Sending Monit instance changed notification to root@localhost

The EXEC line is never executed, and no new lines appear in /tmp/monit.test.

If I change the checked port from 9090 to an invalid one, say 9190, and start monit (haproxy is running!), I get this:

Starting Monit 5.15 daemon with http interface at [192.168.40.72]:2812
Starting Monit HTTP server at [192.168.40.72]:2812
Monit HTTP server started
'appsrv01' Monit 5.15 started
Sending Monit instance changed notification to root@localhost
'haproxy' process is running with pid 50703
'haproxy' zombie check succeeded
Socket test failed for [192.168.40.72]:9190 -- Connection refused
'haproxy' failed protocol test [DEFAULT] at [192.168.40.72]:9190 [TCP/IP] -- Connection refused
Sending Connection failed notification to root@localhost
'haproxy' exec: /bin/sh
'haproxy' process is running with pid 50703
'haproxy' zombie check succeeded
Socket test failed for [192.168.40.72]:9190 -- Connection refused
'haproxy' failed protocol test [DEFAULT] at [192.168.40.72]:9190 [TCP/IP] -- Connection refused
'haproxy' exec: /bin/sh

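Why does the EXEC line work here, but not when I kill -9 haproxy? What I want is for monit to run the exec whenever haproxy fails. The exec line will then contain the command that switches the CARP IP to another host. haproxy itself is monitored with zabbix, so the NOC can investigate the cause of the failure later.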

[Discussion]:

    Tags: monit


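    [Answer 1]:

    When you run kill -9 on haproxy, you kill the daemon process itself. So when monit evaluates this "check process" block, it detects that the process does not exist and tries to restart it (your log shows "start method not defined", so nothing actually happens). It never runs the port test, because it has already found that the process is gone.

    It works when you give it an invalid port because the process still exists. Monit then gets as far as the port test, which fails and runs the script.

    You should add one more line to this check block: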

    check process haproxy with pidfile /var/run/haproxy.pid
         if failed host 192.168.40.72 port 9090 type tcp 
             then exec "/bin/sh -c '/bin/echo `/bin/date` >> /tmp/monit.test'"
         if restarted then exec "/bin/sh -c '/bin/echo `/bin/date` >>/tmp/monit.test'"
    

    This should run the shell command both on a restart and on a failed port check.
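
    As a sketch of an alternative (an assumption on my part, not tested on Monit 5.15 or FreeBSD): Monit's existence test accepts its own action, so the exec can be attached to it directly instead of relying on the port test, which is skipped once the process is gone. Declaring the rule explicitly should replace the default "if does not exist then restart" shown in the -vI output.

    ```
    check process haproxy with pidfile /var/run/haproxy.pid
         # Runs when the pid from the pidfile no longer exists (e.g. after kill -9).
         # Assumption: this overrides the implicit "if does not exist then restart".
         if does not exist
           then exec "/bin/sh -c '/bin/echo `/bin/date` >> /tmp/monit.test'"
         # Only evaluated while the process exists.
         if failed host 192.168.40.72 port 9090 type tcp
           then exec "/bin/sh -c '/bin/echo `/bin/date` >> /tmp/monit.test'"
    ```

    With this layout no start or stop program is needed, since monit is only asked to run the command, not to restart haproxy.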

    [Comments]:

    • Hi Dominic, I tried "if restarted", but it doesn't seem to work; "if 1 restart within 1 cycle" worked for me. Thanks anyway!