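What you describe suggests that you are running Gunicorn with the sync worker class for an I/O-bound application. Can you share your Gunicorn configuration?
Could it be that Google's platform has some kind of autoscaling feature (I'm not really familiar with their services) that your Kubernetes configuration isn't triggering?
Generally speaking, increasing the number of cores on a single instance only helps if you also increase the number of workers spawned to handle incoming requests. See Gunicorn's design documentation, with particular emphasis on the worker types section (and why sync workers are suboptimal for I/O-bound applications). It's a good read and explains this problem in more detail.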
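Since I can't see your configuration, here is a rough sketch of what a Gunicorn config file (a plain Python file, e.g. `gunicorn.conf.py`) could look like when the worker count is tied to the core count. The setting names (`bind`, `workers`, `worker_class`, `worker_connections`) are real Gunicorn settings; the values are only assumptions for illustration, and the gevent worker requires the `gevent` package to be installed.

```python
# gunicorn.conf.py -- illustrative values only, tune them for your own workload
import multiprocessing

# Address and port to listen on (same as in the exercise in this answer)
bind = "127.0.0.1:9001"

# Gunicorn's docs suggest (2 x cores) + 1 as a starting point, so that
# adding cores to the instance actually results in more workers.
workers = multiprocessing.cpu_count() * 2 + 1

# An async worker class suits I/O-bound applications better than "sync"
worker_class = "gevent"

# Maximum simultaneous clients per async worker (assumed value)
worker_connections = 1000
```

It would be loaded with something like `gunicorn -c gunicorn.conf.py app:app`.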
Just for fun, here's a small exercise to compare the two approaches:
import time

def app(env, start_response):
    # Minimal WSGI app: sleeping stands in for a slow, I/O-bound request
    time.sleep(1)  # takes 1 second to process the request
    start_response('200 OK', [('Content-Type', 'text/plain')])
    return [b'Hello World']
Run Gunicorn with 4 sync workers: gunicorn --bind '127.0.0.1:9001' --workers 4 --worker-class sync --chdir app app:app
Let's fire 8 requests concurrently: ab -n 8 -c 8 "http://localhost:9001/"
This is ApacheBench, Version 2.3 <$Revision: 1706008 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/
Benchmarking localhost (be patient).....done
Server Software: gunicorn/19.8.1
Server Hostname: localhost
Server Port: 9001
Document Path: /
Document Length: 11 bytes
Concurrency Level: 8
Time taken for tests: 2.007 seconds
Complete requests: 8
Failed requests: 0
Total transferred: 1096 bytes
HTML transferred: 88 bytes
Requests per second: 3.99 [#/sec] (mean)
Time per request: 2006.938 [ms] (mean)
Time per request: 250.867 [ms] (mean, across all concurrent requests)
Transfer rate: 0.53 [Kbytes/sec] received
Connection Times (ms)
min mean[+/-sd] median max
Connect: 0 1 0.2 1 1
Processing: 1003 1504 535.7 2005 2005
Waiting: 1002 1504 535.8 2005 2005
Total: 1003 1505 535.8 2006 2006
Percentage of the requests served within a certain time (ms)
50% 2006
66% 2006
75% 2006
80% 2006
90% 2006
95% 2006
98% 2006
99% 2006
100% 2006 (longest request)
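The test took about 2 seconds to complete. This is the behavior you ran into in your tests: the first 4 requests kept your workers busy, and the second batch sat in the queue until the first batch had been processed.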
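The 2-second figure can be reproduced with a back-of-the-envelope model. This is only a sketch under simplifying assumptions: every request takes exactly the same time, each sync worker handles one request at a time, and queueing overhead is ignored. The function name `sync_wall_time` is made up for this illustration.

```python
import math

def sync_wall_time(requests, workers, service_time):
    """Rough wall-clock estimate for sync workers (one request per worker at a time)."""
    # Requests are served in "waves" of at most `workers` at once
    waves = math.ceil(requests / workers)
    return waves * service_time

# 8 concurrent requests, 4 sync workers, 1 second each: two waves
print(sync_wall_time(8, 4, 1.0))  # 2.0
```

With 8 workers the same model gives a single wave, which is why adding cores without adding workers doesn't change the result.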
Same test, but let's tell Gunicorn to use async workers: gunicorn --bind '127.0.0.1:9001' --workers 4 --worker-class gevent --chdir app app:app
The same test as above: ab -n 8 -c 8 "http://localhost:9001/"
This is ApacheBench, Version 2.3 <$Revision: 1706008 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/
Benchmarking localhost (be patient).....done
Server Software: gunicorn/19.8.1
Server Hostname: localhost
Server Port: 9001
Document Path: /
Document Length: 11 bytes
Concurrency Level: 8
Time taken for tests: 1.005 seconds
Complete requests: 8
Failed requests: 0
Total transferred: 1096 bytes
HTML transferred: 88 bytes
Requests per second: 7.96 [#/sec] (mean)
Time per request: 1005.463 [ms] (mean)
Time per request: 125.683 [ms] (mean, across all concurrent requests)
Transfer rate: 1.06 [Kbytes/sec] received
Connection Times (ms)
min mean[+/-sd] median max
Connect: 0 1 0.4 1 2
Processing: 1002 1003 0.6 1003 1004
Waiting: 1001 1003 0.9 1003 1004
Total: 1002 1004 0.9 1004 1005
Percentage of the requests served within a certain time (ms)
50% 1004
66% 1005
75% 1005
80% 1005
90% 1005
95% 1005
98% 1005
99% 1005
100% 1005 (longest request)
Effectively, we doubled the application's throughput here: it took only about 1 second to reply to all 8 requests.
To understand what is going on, Gevent has a great tutorial about its architecture, and this article has a more in-depth explanation of coroutines.
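To get a feel for the idea without installing anything, here is a small stdlib-only sketch. Note that it uses `asyncio` rather than gevent, so it is an analogy for cooperative scheduling and not how gevent is implemented; the function names and the 0.1-second delay are assumptions made for the illustration.

```python
import asyncio
import time

async def handle_request(request_id, delay):
    # Awaiting hands control back to the event loop, the way a
    # cooperative worker switches to another request while one waits on I/O.
    await asyncio.sleep(delay)
    return request_id

async def serve_all(count, delay):
    started = time.perf_counter()
    # All "requests" wait concurrently inside a single thread
    results = await asyncio.gather(
        *(handle_request(i, delay) for i in range(count))
    )
    return results, time.perf_counter() - started

def run_demo(count=8, delay=0.1):
    return asyncio.run(serve_all(count, delay))

if __name__ == "__main__":
    results, elapsed = run_demo()
    print(f"{len(results)} requests finished in {elapsed:.2f}s")
```

All 8 simulated requests finish in roughly one delay period instead of eight, because none of them blocks the thread while waiting.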
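I apologize in advance if I'm way off about the actual cause of your problem (I believe some additional information is missing from your initial comment for anyone to give a conclusive answer). If this doesn't help you, I hope it will be helpful to someone else. :)
Also note that I've oversimplified things a lot (my example is a simple proof of concept). Tweaking an HTTP server configuration is mostly a trial-and-error exercise: it all depends on the type of workload the application has and the hardware it runs on.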