【Question title】: Gunicorn does not respond to more than 6 requests at a time
【Posted】: 2018-03-23 16:15:05
【Question】:

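To give you some background:

I have two server environments running the same application. The first, which I intend to abandon, is the Google App Engine standard environment, which has many limitations. The second is a Google Kubernetes cluster running my Python application with Gunicorn.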

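Concurrency

On the first server, I can send multiple requests to the application and it answers many of them simultaneously. I run two batches of simultaneous requests against the application in both environments. On Google App Engine, the first and second batches are answered at the same time, and the first batch does not block the second.

On Kubernetes, the server answers only 6 requests at a time, and the first batch blocks the second. I have read some posts on how to achieve Gunicorn concurrency with gevent or multiple threads, and all of them say I need CPU cores, but no matter how much CPU I throw at it, the limitation remains. I have tried Google nodes from 1 vCPU to 8 vCPUs and it does not change much.

Can you give me any ideas on what I might be missing? Maybe a Google cluster node limitation?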

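Kubernetes response waterfall

As you can see, the second batch only starts being answered once the first batch starts to finish.

App Engine response waterfall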

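【Question comments】:

A question about the setup: in both cases, are all requests made against the same domain?
Yes! All of them.
This may be related: stackoverflow.com/questions/8404464/… If possible, could you share the request and response headers? I have a feeling I know what is happening.
If the problem is in Chrome, why does it work on App Engine?
My guess is that the traffic to one is HTTP/1.1 and to the other HTTP/2, which affects how connections are established. If you can share the headers, it will help :) See here: daniel.haxx.se/blog/2016/08/18/http2-connection-coalescing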
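
One way to check whether the cap of six comes from the browser rather than from the server (Chrome opens at most six concurrent HTTP/1.1 connections per host) is to repeat the test from a script, which has no such cap. The following is only a rough sketch: the helper names `timed_get` and `fire_batch` are made up for illustration, and the URL you pass in is whatever address your service is reachable at.

```python
import time
import urllib.request
from concurrent.futures import ThreadPoolExecutor

# Ignore any system proxy so that requests go straight to the server under test.
_opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))


def timed_get(url, t0):
    """Fetch url once; return (start, end) offsets in seconds relative to t0."""
    start = time.monotonic() - t0
    with _opener.open(url, timeout=30) as resp:
        resp.read()
    end = time.monotonic() - t0
    return start, end


def fire_batch(url, n):
    """Send n requests at once, one thread per request, and return their timings."""
    t0 = time.monotonic()
    with ThreadPoolExecutor(max_workers=n) as pool:
        futures = [pool.submit(timed_get, url, t0) for _ in range(n)]
        return [f.result() for f in futures]
```

Calling `fire_batch(url, 12)` against each environment and printing the returned pairs gives a rough text version of the waterfall. If all twelve requests finish together here but not in the browser, the limit is probably on the client side.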

Tags: python concurrency kubernetes gunicorn

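【Answer 1】:

What you describe suggests that you are running the Gunicorn server with the sync worker class for an I/O-bound application. Could you share your Gunicorn configuration?

Is it possible that Google's platform has some kind of autoscaling feature (I am not very familiar with their services) that your Kubernetes configuration is not triggering?

Generally speaking, increasing the number of cores of a single instance only helps if you also increase the number of workers spawned to handle incoming requests. Please see Gunicorn's design documentation, with particular attention to the section on worker types (and why sync workers are suboptimal for I/O-bound applications). It is a good read and explains this problem in more detail.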
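
As a rough illustration of tying the worker count to the core count, a Gunicorn configuration file (which is plain Python) might look like the sketch below. The bind address and the specific values are assumptions for illustration, not the asker's actual setup; `workers`, `worker_class` and `worker_connections` are standard Gunicorn settings.

```python
# gunicorn.conf.py (illustrative values only, adjust for your workload)
import multiprocessing

# Address to listen on; a placeholder, not taken from the question.
bind = "0.0.0.0:8000"

# Starting point suggested in the Gunicorn docs: (2 x cores) + 1 worker processes.
workers = multiprocessing.cpu_count() * 2 + 1

# Cooperative workers for I/O-bound apps; requires the gevent package.
worker_class = "gevent"

# Maximum simultaneous clients per gevent worker (1000 is Gunicorn's default).
worker_connections = 1000
```

It would be loaded with `gunicorn -c gunicorn.conf.py app:app`. Note that inside a container `multiprocessing.cpu_count()` typically reports the node's cores rather than the pod's CPU limit, so a fixed number may be the safer choice on Kubernetes.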

Just for fun, here is a small exercise to compare the two approaches:

import time

def app(env, start_response):
    time.sleep(1) # takes 1 second to process the request
    start_response('200 OK', [('Content-Type', 'text/plain')])
    return [b'Hello World']

Run Gunicorn with 4 sync workers: gunicorn --bind '127.0.0.1:9001' --workers 4 --worker-class sync --chdir app app:app

Let's fire 8 requests at the same time: ab -n 8 -c 8 "http://localhost:9001/"

This is ApacheBench, Version 2.3 <$Revision: 1706008 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/

Benchmarking localhost (be patient).....done


Server Software:        gunicorn/19.8.1
Server Hostname:        localhost
Server Port:            9001

Document Path:          /
Document Length:        11 bytes

Concurrency Level:      8
Time taken for tests:   2.007 seconds
Complete requests:      8
Failed requests:        0
Total transferred:      1096 bytes
HTML transferred:       88 bytes
Requests per second:    3.99 [#/sec] (mean)
Time per request:       2006.938 [ms] (mean)
Time per request:       250.867 [ms] (mean, across all concurrent requests)
Transfer rate:          0.53 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    1   0.2      1       1
Processing:  1003 1504 535.7   2005    2005
Waiting:     1002 1504 535.8   2005    2005
Total:       1003 1505 535.8   2006    2006

Percentage of the requests served within a certain time (ms)
  50%   2006
  66%   2006
  75%   2006
  80%   2006
  90%   2006
  95%   2006
  98%   2006
  99%   2006
 100%   2006 (longest request)

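The test takes about 2 seconds to complete. This is the behavior you are seeing in your tests: the first 4 requests keep your workers busy, and the second batch is queued until the first batch has been processed.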
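
The 2-second figure follows from simple arithmetic: each sync worker handles one request at a time, so the requests are served in waves. A back-of-the-envelope model (the function name is made up for illustration, and it ignores connection and scheduling overhead):

```python
import math


def sync_batch_time(n_requests, n_workers, seconds_per_request):
    """Estimate the wall time for n_requests when each worker serves one at a time."""
    waves = math.ceil(n_requests / n_workers)  # batches that must run back to back
    return waves * seconds_per_request


# 8 requests, 4 sync workers, 1 second each: two waves of four.
print(sync_batch_time(8, 4, 1.0))  # 2.0
```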

The same test, but let's tell Gunicorn to use async workers: gunicorn --bind '127.0.0.1:9001' --workers 4 --worker-class gevent --chdir app app:app

The same test as above: ab -n 8 -c 8 "http://localhost:9001/"

This is ApacheBench, Version 2.3 <$Revision: 1706008 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/

Benchmarking localhost (be patient).....done


Server Software:        gunicorn/19.8.1
Server Hostname:        localhost
Server Port:            9001

Document Path:          /
Document Length:        11 bytes

Concurrency Level:      8
Time taken for tests:   1.005 seconds
Complete requests:      8
Failed requests:        0
Total transferred:      1096 bytes
HTML transferred:       88 bytes
Requests per second:    7.96 [#/sec] (mean)
Time per request:       1005.463 [ms] (mean)
Time per request:       125.683 [ms] (mean, across all concurrent requests)
Transfer rate:          1.06 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    1   0.4      1       2
Processing:  1002 1003   0.6   1003    1004
Waiting:     1001 1003   0.9   1003    1004
Total:       1002 1004   0.9   1004    1005

Percentage of the requests served within a certain time (ms)
  50%   1004
  66%   1005
  75%   1005
  80%   1005
  90%   1005
  95%   1005
  98%   1005
  99%   1005
 100%   1005 (longest request)

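In effect, we have doubled the application's throughput here: answering all the requests took only about 1 second.

To understand what is going on, Gevent has a great tutorial about its architecture, and this article has a more in-depth explanation of coroutines.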
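
To get a feel for the idea without installing anything, here is a small sketch using the standard library's asyncio as a stand-in for gevent's greenlets (gevent works differently under the hood, by patching blocking calls, but the scheduling idea is similar). The names `handle_request` and `serve_all` are invented for this example. Eight "requests" that each wait 0.1 seconds finish in roughly 0.1 seconds in total, because a waiting coroutine hands control to the others instead of blocking.

```python
import asyncio
import time


async def handle_request(i, delay):
    """Pretend to serve request i; the sleep stands in for waiting on I/O."""
    await asyncio.sleep(delay)  # yields control to other coroutines while waiting
    return i


async def serve_all(n, delay):
    """Run n simulated requests concurrently on a single thread."""
    started = time.monotonic()
    results = await asyncio.gather(*(handle_request(i, delay) for i in range(n)))
    return results, time.monotonic() - started


results, elapsed = asyncio.run(serve_all(8, 0.1))
print(results, f"{elapsed:.2f}s")
```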

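I apologize in advance if this is way off the actual cause of your problem (I believe some additional information is missing from your original post for anyone to give a conclusive answer). If it does not help you, I hope it helps someone else. :)

Also note that I have oversimplified things a lot (my example is a simple proof of concept). Tuning an HTTP server configuration is mostly a trial-and-error exercise: it all depends on the type of workload the application has and the hardware it runs on.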

【Comments】: