【Question title】: Automatic process monitoring/management with Python
【Posted】: 2015-10-04 00:55:32
【Question】:

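I have a continuously running Python process, possibly even running under Supervisor. What is the best way to implement the following monitoring?

  • If the process crashes, send an alert and restart it. I want to be notified automatically every time the process crashes, and have it restarted automatically.
  • If the process goes stale (i.e., it has not processed anything for 1 minute), send an alert and restart it.
  • Restart on demand.

I would like to do all of the above through Python. I know Supervisord will do most of it, but I want to see whether it can be done through Python itself.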

【Comments】:

    Tags: python ubuntu process subprocess monitoring


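    【Answer 1】:

    I think what you are looking for is Supervisor events. http://supervisord.org/events.html

    Also take a look at Superlance, a package of plugin utilities for monitoring and controlling processes that run under supervisor. [https://superlance.readthedocs.org/en/latest/]

    You can configure things such as crash emails, crash SMS messages, memory consumption alerts, and HTTP hooks.

    【Comments】: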
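
As a rough illustration of what a Supervisor event listener looks like in Python, here is a minimal sketch of the stdin/stdout protocol described on the events page linked above. The names `parse_header`, `run_listener` and `alert_on_exit` are made up for this example, and the callback only writes to stderr; sending a real alert is left out.

```python
import sys


def parse_header(line):
    """Turn 'key:value key:value' tokens into a dict (Supervisor header format)."""
    return dict(token.split(":", 1) for token in line.split())


def run_listener(stdin, stdout, on_event):
    """Speak the event listener protocol over the given text streams.

    on_event(headers, payload) is a hypothetical callback supplied by the caller.
    """
    while True:
        # Tell supervisord we are ready for the next event.
        stdout.write("READY\n")
        stdout.flush()

        header_line = stdin.readline()
        if not header_line:  # EOF: the pipe was closed
            return
        headers = parse_header(header_line)
        payload = stdin.read(int(headers["len"]))

        on_event(headers, payload)

        # Acknowledge the event ("OK" is 2 characters long).
        stdout.write("RESULT 2\nOK")
        stdout.flush()


def alert_on_exit(headers, payload):
    """Example callback: report unexpected exits on stderr."""
    if headers.get("eventname") == "PROCESS_STATE_EXITED":
        info = parse_header(payload)
        if info.get("expected") == "0":
            sys.stderr.write("process %s exited unexpectedly\n" % info.get("processname"))
            sys.stderr.flush()
```

A script that ends with `run_listener(sys.stdin, sys.stdout, alert_on_exit)` could then be registered in an `[eventlistener:...]` section of the Supervisor configuration with `events=PROCESS_STATE_EXITED`. Superlance ships ready-made listeners of this kind, so a hand-written one is only worth it for custom behavior.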

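      【Answer 2】:

      Well, if you want a homegrown solution, this is what I can come up with.

      Maintain both the actual and the expected state of the process in redis. You can monitor it any way you like, for example by building a web interface that shows the actual state and changes the expected state.

      Run a python script from crontab to check the state and take appropriate action when needed. Here I check every 3 seconds and alert the admin by email through SES.

      Disclaimer: the code has not been run or tested. I just wrote it now, so it is prone to errors.

      Open the crontab file: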

      $crontab -e
      

      Add this line at the end of it so that run_monitor.sh runs every minute.

      #Runs this process every 1 minute.
      */1 * * * * bash ~/path/to/run_monitor.sh
      

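      run_monitor.sh runs the python script. It runs a for loop that checks every 3 seconds.

      This is done because the smallest interval crontab offers is 1 minute. We want to check the process every 3 seconds, 20 times in total (3 seconds * 20 = 1 minute). So it runs for one minute, until crontab starts it again.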

      run_monitor.sh

      for count in {1..20}
      do
          cd '/path/to/check_status'
          /usr/local/bin/python check_status.py "myprocessname" "python startcommand.py"
          sleep 3 #check every 3 seconds.
      done
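
The same one-minute loop could also live in Python, which avoids starting a new interpreter every 3 seconds. This is only a sketch: `run_checks` is a hypothetical helper, and `check` stands for whatever function performs a single status check.

```python
import time


def run_checks(check, runs=20, interval=3.0, sleep=time.sleep):
    """Call check() `runs` times, pausing `interval` seconds after each call."""
    for _ in range(runs):
        check()
        sleep(interval)
```

With the defaults this sleeps for 20 * 3 = 60 seconds in total, plus however long the checks themselves take, so slow checks can overlap with the next cron run.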
      

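      Here I assume:

      * state 0 = stop or stopped (expected vs. actual)

      * state -1 = restart

      * state 1 = run or running

      You can add more states at your convenience; a stale process could also be a state.

      I have used the process name to kill, start, or check the process; you can easily modify this to read a specific PID file instead.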
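
One way a stale state could be detected is with a heartbeat: the worker records a timestamp after each unit of work, and the monitor treats the process as stale once that timestamp is too old. The sketch below uses a file's modification time rather than Redis so that it runs without a server; `beat`, `is_stale` and the heartbeat path are hypothetical names, and the same timestamp could just as well be stored under a Redis key.

```python
import os
import time


def beat(path):
    """Called by the worker after each unit of work: refresh the file's mtime."""
    with open(path, "a"):
        os.utime(path, None)


def is_stale(path, max_age=60.0, now=None):
    """Return True if the heartbeat file is missing or older than max_age seconds."""
    now = time.time() if now is None else now
    try:
        last_beat = os.path.getmtime(path)
    except OSError:
        return True
    return now - last_beat > max_age
```

The monitor would call `is_stale(...)` on each check and, when it returns True, alert and restart in the same way as for a crashed process.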
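
A PID-file based check might look roughly like the following. It assumes the monitored process writes its own PID to a file at startup and that the system is POSIX (on Windows `os.kill` behaves differently); `read_pid` and `pid_is_running` are hypothetical helper names.

```python
import os


def read_pid(pid_file):
    """Return the PID stored in pid_file, or None if it is missing or malformed."""
    try:
        with open(pid_file) as handle:
            return int(handle.read().strip())
    except (OSError, ValueError):
        return None


def pid_is_running(pid):
    """Probe a PID with signal 0, which checks for existence without killing."""
    if pid is None or pid <= 0:
        return False
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True
```

Compared with matching on the process name, this avoids accidentally matching unrelated processes, though a stale PID file can point at a PID that has since been reused.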

      check_status.py

      import shlex
      import subprocess
      import sys

      import boto.ses
      import redis


      ADMINS = ["abc@admin.com", "xyz@admin.com"]
      # Drops the grep processes and this monitor itself from the ps output;
      # their command lines also contain the process name.
      PS_FILTER = "ps -ef | grep {process_name} | grep -v grep | grep -v check_status.py"


      def send_mail(recipients, message_subject, message_body):
          """
          Uses AWS SES to send mail.
          """
          SENDER_MAIL = 'xxx@yyy.com'
          AWS_KEY = 'xxxxxxxxxxxxxxxxxxx'
          AWS_SECRET = 'xxxxxxxxxxxxxxxxxxx'
          AWS_REGION = 'xx-xxxx-x'

          mail_conn = boto.ses.connect_to_region(AWS_REGION,
                                                 aws_access_key_id=AWS_KEY,
                                                 aws_secret_access_key=AWS_SECRET
                                                 )

          mail_conn.send_email(SENDER_MAIL, message_subject, message_body, recipients, format='html')
          return True

      class Shell(object):
          '''
          Convenient wrapper over subprocess.
          '''
          def __init__(self, command, raise_on_error=True):
              self.command = command
              self.raise_on_error = raise_on_error
              self.output = None
              self.error = None
              self.return_code = None

          def run(self):
              process = subprocess.Popen(self.command, shell=True, stdout=subprocess.PIPE,
                                         stderr=subprocess.PIPE, universal_newlines=True)
              # communicate() drains both pipes and waits, so a full pipe cannot deadlock.
              self.output, self.error = process.communicate()
              self.return_code = process.returncode
              if self.return_code and self.raise_on_error:
                  print(self.error)
                  raise Exception("Error while executing %s::%s" % (self.command, self.error))


      redis_client = redis.Redis('xxxredis_hostxxx')

      def get_state(process_name, state_type): #state_type will be expected or actual.
          state = redis_client.get('{process_name}_{state_type}_state'.format(process_name=process_name, state_type=state_type))
          # Redis returns bytes (or None for a missing key), so convert before comparing with -1, 0 or 1.
          return int(state) if state is not None else None

      def set_state(process_name, state_type, state): #state_type will be expected or actual.
          return redis_client.set('{process_name}_{state_type}_state'.format(process_name=process_name, state_type=state_type), state)

      def get_stale_state(process_name):
          state = redis_client.get('{process_name}_stale_state'.format(process_name=process_name)) #value could be 0 or 1
          return int(state) if state is not None else None

      def check_running_status(process_name):
          command = (PS_FILTER + " | wc -l").format(process_name=shlex.quote(process_name))
          shell = Shell(command=command)
          shell.run()
          # wc -l prints the count followed by a newline, possibly padded with spaces.
          return int(shell.output.strip()) > 0

      def start_process(start_command):
          # Launch without waiting: Shell.run() would block until the long-running process exits.
          subprocess.Popen(start_command, shell=True, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)

      def stop_process(process_name):
          # The awk braces are doubled so that str.format() leaves them alone.
          command = (PS_FILTER + " | awk '{{print $2}}'").format(process_name=shlex.quote(process_name))
          shell = Shell(command=command, raise_on_error=False)
          shell.run()
          if not shell.output:
              return
          process_ids = shell.output.strip().split()
          for process_id in process_ids:
              command = 'kill {process_id}'.format(process_id=process_id)
              shell = Shell(command=command, raise_on_error=False)
              shell.run()


      def check_process(process_name, start_command):
          expected_state = get_state(process_name, 'expected')
          if expected_state == 0: #stop
              stop_process(process_name)
              set_state(process_name, 'actual', 0)

          elif expected_state == -1: #restart
              stop_process(process_name)
              set_state(process_name, 'actual', 0)
              start_process(start_command)
              set_state(process_name, 'actual', 1)
              set_state(process_name, 'expected', 1) #set expected back to 1 so we dont keep on restarting.

          elif expected_state == 1:
              running = check_running_status(process_name)
              if not running:
                  set_state(process_name, 'actual', 0)
                  send_mail(recipients=ADMINS, message_subject="Alert", message_body="Your process is down. Trying to restart.")
                  start_process(start_command)
                  running = check_running_status(process_name)
                  if running:
                      send_mail(recipients=ADMINS, message_subject="Alert", message_body="Your process was restarted.")
                      set_state(process_name, 'actual', 1)
                  else:
                      send_mail(recipients=ADMINS, message_subject="Alert", message_body="Your process could not be restarted.")


      if __name__ == '__main__':
          args = sys.argv[1:]
          process_name = args[0]
          start_command = args[1]
          check_process(process_name, start_command)
      

      【Comments】:
