【问题标题】:Is it possible to get a stack trace when a gunicorn worker hits a timeout?当 gunicorn 工作人员超时时,是否可以获得堆栈跟踪?
【发布时间】:2019-12-01 16:22:12
【问题描述】:

具体来说,我正在使用 gunicorn 中的默认工作人员运行 Flask 应用程序。我试图弄清楚如何调试/跟踪当工作人员在服务请求时因超时而被杀死时发生的事情。有没有办法获取堆栈跟踪或分析请求以进行调试?

【问题讨论】:

    标签: flask timeout gunicorn


    【解决方案1】:

    只是想补充一下,通过使用 v3.3 中引入的内置 Python 3 模块 faulthandler https://docs.python.org/dev/library/faulthandler.html,还有另一种方法可以获得类似的结果

    您可以在接受的答案中提到的 wsgi.py 中添加以下两行:

    import faulthandler
    faulthandler.enable()
    

    当超时发生时,以下输出将被打印到标准错误:

    [2020-12-24 13:38:32 +0000] [31304] [INFO] Starting gunicorn 20.0.4
    [2020-12-24 13:38:32 +0000] [31304] [INFO] Listening at: http://0.0.0.0:8888 (31304)
    [2020-12-24 13:38:32 +0000] [31304] [INFO] Using worker: sync
    [2020-12-24 13:38:32 +0000] [31307] [INFO] Booting worker with pid: 31307
    [2020-12-24 13:38:55 +0000] [31304] [CRITICAL] WORKER TIMEOUT (pid:31307)
    Fatal Python error: Aborted
    
    Current thread 0x00007f411d781700 (most recent call first):
      File "/tmp/app.py", line 14 in c
      File "/tmp/app.py", line 9 in b
      File "/tmp/app.py", line 6 in a
      File "/tmp/app.py", line 20 in catch_all
      File "/tmp/venv/lib/python3.5/site-packages/flask/app.py", line 1936 in dispatch_request
      File "/tmp/venv/lib/python3.5/site-packages/flask/app.py", line 1950 in full_dispatch_request
      File "/tmp/venv/lib/python3.5/site-packages/flask/app.py", line 2447 in wsgi_app
      File "/tmp/venv/lib/python3.5/site-packages/flask/app.py", line 2464 in __call__
      File "/tmp/venv/lib/python3.5/site-packages/gunicorn/workers/sync.py", line 175 in handle_request
      File "/tmp/venv/lib/python3.5/site-packages/gunicorn/workers/sync.py", line 134 in handle
      File "/tmp/venv/lib/python3.5/site-packages/gunicorn/workers/sync.py", line 29 in accept
      File "/tmp/venv/lib/python3.5/site-packages/gunicorn/workers/sync.py", line 67 in run_for_one
      File "/tmp/venv/lib/python3.5/site-packages/gunicorn/workers/sync.py", line 123 in run
      File "/tmp/venv/lib/python3.5/site-packages/gunicorn/workers/base.py", line 140 in init_process
      File "/tmp/venv/lib/python3.5/site-packages/gunicorn/arbiter.py", line 583 in spawn_worker
      File "/tmp/venv/lib/python3.5/site-packages/gunicorn/arbiter.py", line 616 in spawn_workers
      File "/tmp/venv/lib/python3.5/site-packages/gunicorn/arbiter.py", line 545 in manage_workers
      File "/tmp/venv/lib/python3.5/site-packages/gunicorn/arbiter.py", line 202 in run
      File "/tmp/venv/lib/python3.5/site-packages/gunicorn/app/base.py", line 72 in run
      File "/tmp/venv/lib/python3.5/site-packages/gunicorn/app/base.py", line 228 in run
      File "/tmp/venv/lib/python3.5/site-packages/gunicorn/app/wsgiapp.py", line 58 in run
      File "/tmp/venv/bin/gunicorn", line 8 in <module>
    [2020-12-24 13:38:55 +0000] [31307] [INFO] Worker exiting (pid: 31307)
    [2020-12-24 13:38:55 +0000] [31503] [INFO] Booting worker with pid: 31503
    

    【讨论】:

      【解决方案2】:

      老实说,这是一个非常好的问题,我不确定是否有可能实现相同的目标。但是当我开始挖掘时,我发现了很多有趣的线程

      Showing the stack trace from a running Python application

      Get stacktrace from stuck python process

      https://github.com/khamidou/lptrace

      https://gist.github.com/reywood/e221c4061bbf2eccea885c9b2e4ef496

      所以首先我用下面的代码创建了一个简单的烧瓶应用程序

      app.py

      from flask import Flask
      app = Flask(__name__)
      import time
      
      def a():
        b()
      
      def b():
        c ()
      
      def c():
         i = 0
         while i < 900:
             time.sleep(1)
             i += 1
      
      @app.route('/', defaults={'path': ''})
      @app.route('/<path:path>')
      def catch_all(path):
          a()
          return 'You want path: %s' % path
      
      if __name__ == '__main__':
          app.run()
      

      wsgi.py

      from app import app
      
      if __name__ == "__main__":
          app.run()
      

      现在像下面这样运行应用程序并执行curl localhost:8000/abc

      $ gunicorn wsgi:app
      [2019-08-01 08:19:06 +0000] [26825] [INFO] Starting gunicorn 19.9.0
      [2019-08-01 08:19:06 +0000] [26825] [INFO] Listening at: http://127.0.0.1:8000 (26825)
      [2019-08-01 08:19:06 +0000] [26825] [INFO] Using worker: sync
      [2019-08-01 08:19:06 +0000] [26828] [INFO] Booting worker with pid: 26828
      [2019-08-01 08:19:40 +0000] [26825] [CRITICAL] WORKER TIMEOUT (pid:26828)
      [2019-08-01 08:19:40 +0000] [26828] [INFO] Worker exiting (pid: 26828)
      [2019-08-01 08:19:40 +0000] [26832] [INFO] Booting worker with pid: 26832
      

      现在我们需要一个可以在工作被杀死之前调用的钩子。 gunicorn 支持配置文件中的服务器事件

      所以现在我们创建一个配置文件

      gunicorn_config.py

      timeout = 3
      
      def worker_abort(worker):
          pid = worker.pid
          print("worker is being killed - {}".format(pid))
      

      我们现在的输出是

      $ gunicorn -c gunicorn_config.py wsgi:app
      [2019-08-01 08:22:17 +0000] [26837] [INFO] Starting gunicorn 19.9.0
      [2019-08-01 08:22:17 +0000] [26837] [INFO] Listening at: http://127.0.0.1:8000 (26837)
      [2019-08-01 08:22:17 +0000] [26837] [INFO] Using worker: sync
      [2019-08-01 08:22:17 +0000] [26840] [INFO] Booting worker with pid: 26840
      [2019-08-01 08:22:22 +0000] [26837] [CRITICAL] WORKER TIMEOUT (pid:26840)
      worker is being killed - 26840
      [2019-08-01 08:22:22 +0000] [26840] [INFO] Worker exiting (pid: 26840)
      [2019-08-01 08:22:22 +0000] [26844] [INFO] Booting worker with pid: 26844
      

      这很好,现在我们需要混合我们以前对pyrasite 的知识和这个来获得堆栈。所以我们更新了如下配置文件

      gunicorn_config.py

      timeout = 3
      
      __code_dump_stack__ = """
      import sys, traceback
      
      for thread, frame in sys._current_frames().items():
          print('Thread 0x%x' % thread)
          traceback.print_stack(frame)
          print()
      """
      
      def dump_stack_for_process(pid):
          import pyrasite
      
          ipc = pyrasite.PyrasiteIPC(pid)
          ipc.connect()
          print(ipc.cmd(__code_dump_stack__))
          ipc.close()
      
      def worker_abort(worker):
          pid = worker.pid
          print("worker is being killed - {}".format(pid))
          dump_stack_for_process(pid)
      

      现在我们的输出是

      $ [2019-08-01 08:25:29 +0000] [26848] [INFO] Starting gunicorn 19.9.0
      [2019-08-01 08:25:29 +0000] [26848] [INFO] Listening at: http://127.0.0.1:8000 (26848)
      [2019-08-01 08:25:29 +0000] [26848] [INFO] Using worker: sync
      [2019-08-01 08:25:29 +0000] [26851] [INFO] Booting worker with pid: 26851
      [2019-08-01 08:25:38 +0000] [26848] [CRITICAL] WORKER TIMEOUT (pid:26851)
      worker is being killed - 26851
      Thread 0x7ff0a7a4b700
        File "/usr/lib/python3.5/threading.py", line 882, in _bootstrap
          self._bootstrap_inner()
        File "/usr/lib/python3.5/threading.py", line 914, in _bootstrap_inner
          self.run()
        File "<string>", line 72, in run
        File "<string>", line 92, in on_command
        File "<string>", line 6, in <module>
      
      Thread 0x7ff0ac512700
        File "/home/vagrant/.local/bin/gunicorn", line 11, in <module>
          sys.exit(run())
        File "/home/vagrant/.local/lib/python3.5/site-packages/gunicorn/app/wsgiapp.py", line 61, in run
          WSGIApplication("%(prog)s [OPTIONS] [APP_MODULE]").run()
        File "/home/vagrant/.local/lib/python3.5/site-packages/gunicorn/app/base.py", line 223, in run
          super(Application, self).run()
        File "/home/vagrant/.local/lib/python3.5/site-packages/gunicorn/app/base.py", line 72, in run
          Arbiter(self).run()
        File "/home/vagrant/.local/lib/python3.5/site-packages/gunicorn/arbiter.py", line 203, in run
          self.manage_workers()
        File "/home/vagrant/.local/lib/python3.5/site-packages/gunicorn/arbiter.py", line 545, in manage_workers
          self.spawn_workers()
        File "/home/vagrant/.local/lib/python3.5/site-packages/gunicorn/arbiter.py", line 616, in spawn_workers
          self.spawn_worker()
        File "/home/vagrant/.local/lib/python3.5/site-packages/gunicorn/arbiter.py", line 583, in spawn_worker
          worker.init_process()
        File "/home/vagrant/.local/lib/python3.5/site-packages/gunicorn/workers/base.py", line 134, in init_process
          self.run()
        File "/home/vagrant/.local/lib/python3.5/site-packages/gunicorn/workers/sync.py", line 124, in run
          self.run_for_one(timeout)
        File "/home/vagrant/.local/lib/python3.5/site-packages/gunicorn/workers/sync.py", line 68, in run_for_one
          self.accept(listener)
        File "/home/vagrant/.local/lib/python3.5/site-packages/gunicorn/workers/sync.py", line 30, in accept
          self.handle(listener, client, addr)
        File "/home/vagrant/.local/lib/python3.5/site-packages/gunicorn/workers/sync.py", line 135, in handle
          self.handle_request(listener, req, client, addr)
        File "/home/vagrant/.local/lib/python3.5/site-packages/gunicorn/workers/sync.py", line 176, in handle_request
          respiter = self.wsgi(environ, resp.start_response)
        File "/home/vagrant/.local/lib/python3.5/site-packages/flask/app.py", line 2463, in __call__
          return self.wsgi_app(environ, start_response)
        File "/home/vagrant/.local/lib/python3.5/site-packages/flask/app.py", line 2446, in wsgi_app
          response = self.full_dispatch_request()
        File "/home/vagrant/.local/lib/python3.5/site-packages/flask/app.py", line 1949, in full_dispatch_request
          rv = self.dispatch_request()
        File "/home/vagrant/.local/lib/python3.5/site-packages/flask/app.py", line 1935, in dispatch_request
          return self.view_functions[rule.endpoint](**req.view_args)
        File "/home/vagrant/remotedebug/app.py", line 20, in catch_all
          a()
        File "/home/vagrant/remotedebug/app.py", line 6, in a
          b()
        File "/home/vagrant/remotedebug/app.py", line 9, in b
          c ()
        File "/home/vagrant/remotedebug/app.py", line 14, in c
          time.sleep(1)
        File "/home/vagrant/.local/lib/python3.5/site-packages/gunicorn/workers/base.py", line 195, in handle_abort
          self.cfg.worker_abort(self)
        File "gunicorn_config.py", line 23, in worker_abort
          dump_stack_for_process(pid)
        File "gunicorn_config.py", line 17, in dump_stack_for_process
          print(ipc.cmd(__code_dump_stack__))
        File "/home/vagrant/.local/lib/python3.5/site-packages/pyrasite/ipc.py", line 161, in cmd
          return self.recv()
        File "/home/vagrant/.local/lib/python3.5/site-packages/pyrasite/ipc.py", line 174, in recv
          header_data = self.recv_bytes(4)
        File "/home/vagrant/.local/lib/python3.5/site-packages/pyrasite/ipc.py", line 187, in recv_bytes
          chunk = self.sock.recv(n - len(data))
      
      
      [2019-08-01 08:25:38 +0000] [26851] [INFO] Worker exiting (pid: 26851)
      [2019-08-01 08:25:38 +0000] [26862] [INFO] Booting worker with pid: 26862
      

      堆栈跟踪很大,但它为我们提供了我们需要的东西

      【讨论】:

      • 这里需要黄铁矿吗?从一些实验来看,worker_abort 似乎在工作进程上运行
      • 如果没有pyrasite,您是否能够从中获取堆栈跟踪?
      • 对于我刚刚尝试使用exec(__code_dump_stack__) 而不是几个pyrasite 行并打印正确的堆栈跟踪的价值。文档还提到 worker_abort 在工作进程中运行,所以这是有道理的。
      猜你喜欢
      • 1970-01-01
      • 2015-11-09
      • 1970-01-01
      • 2012-06-27
      • 2018-07-19
      • 1970-01-01
      • 2013-03-02
      • 2011-03-16
      • 1970-01-01
      相关资源
      最近更新 更多