【Posted】: 2016-02-18 11:23:31
【Question】:
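I have moved my server from apache2+fcgi to nginx+fpm because I wanted a lighter environment, and Apache's memory footprint was large. The server is a dual-core machine with 8 GB of RAM (not much, I know). It also runs a fairly busy FreeRadius server and the associated MySQL. The CPU load average is about 1, with some noticeable spikes.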
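The spikes happen every 30 minutes, when I receive web pings from some controlled devices. With Apache the server load rose sharply and everything slowed down. With nginx the processing is much faster (I also made some optimizations in the code), but now some of those connections get lost. I configured both nginx and fpm with what I thought should be enough, but I must be missing something, because at those moments PHP (apparently) cannot reply to nginx. Here is a summary of the configuration: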
nginx/1.8.1
user www-data;
worker_processes auto;
pid /var/run/nginx.pid;
events {
worker_connections 1024;
# multi_accept on;
}
client_body_buffer_size 10K;
client_header_buffer_size 1k;
client_max_body_size 20m;
large_client_header_buffers 2 1k;
location ~ \.php$ {
fastcgi_split_path_info ^(.+\.php)(.*)$;
set $fsn /$yii_bootstrap;
if (-f $document_root$fastcgi_script_name){
set $fsn $fastcgi_script_name;
}
fastcgi_pass 127.0.0.1:9011;
include fastcgi_params;
fastcgi_param SCRIPT_FILENAME $document_root$fsn;
fastcgi_param PATH_INFO $fastcgi_path_info;
fastcgi_param PATH_TRANSLATED $document_root$fsn;
fastcgi_read_timeout 150s;
}
php5-fpm 5.4.45-1~dotdeb+6.1
[pool01]
listen = 127.0.0.1:9011
listen.allowed_clients = 127.0.0.1
pm = dynamic
pm.max_children = 150
pm.start_servers = 2
pm.min_spare_servers = 2
pm.max_spare_servers = 8
pm.max_requests = 2000
pm.process_idle_timeout = 10s
When the spike arrives, I start seeing this in the fpm log:
[18-Feb-2016 11:30:04] WARNING: [pool pool01] seems busy (you may need to increase pm.start_servers, or pm.min/max_spare_servers), spawning 8 children, there are 0 idle, and 13 total children
[18-Feb-2016 11:30:05] WARNING: [pool pool01] seems busy (you may need to increase pm.start_servers, or pm.min/max_spare_servers), spawning 16 children, there are 0 idle, and 15 total children
[18-Feb-2016 11:30:06] WARNING: [pool pool01] seems busy (you may need to increase pm.start_servers, or pm.min/max_spare_servers), spawning 32 children, there are 0 idle, and 17 total children
[18-Feb-2016 11:30:07] WARNING: [pool pool01] seems busy (you may need to increase pm.start_servers, or pm.min/max_spare_servers), spawning 32 children, there are 0 idle, and 19 total children
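The four warnings can be checked mechanically. This minimal Python sketch extracts the announced spawn count and the total number of children from each line, then compares consecutive totals. The regex, `parse_busy_warnings` and the other names are illustrative, not part of php-fpm, and the pattern is written only against the log format shown above.

```python
import re

# Matches php-fpm's "seems busy" warning in the format shown in the log above
# (an illustrative pattern, not an official log grammar).
BUSY_RE = re.compile(
    r"\[(?P<ts>[^\]]+)\] WARNING: \[pool (?P<pool>[^\]]+)\] seems busy .*?"
    r"spawning (?P<spawning>\d+) children, there are (?P<idle>\d+) idle, "
    r"and (?P<total>\d+) total children"
)

def parse_busy_warnings(lines):
    """Return (timestamp, spawning, idle, total) for every 'seems busy' warning."""
    entries = []
    for line in lines:
        m = BUSY_RE.search(line)
        if m:
            entries.append((m["ts"], int(m["spawning"]), int(m["idle"]), int(m["total"])))
    return entries

LOG = [
    "[18-Feb-2016 11:30:04] WARNING: [pool pool01] seems busy (you may need to increase pm.start_servers, or pm.min/max_spare_servers), spawning 8 children, there are 0 idle, and 13 total children",
    "[18-Feb-2016 11:30:05] WARNING: [pool pool01] seems busy (you may need to increase pm.start_servers, or pm.min/max_spare_servers), spawning 16 children, there are 0 idle, and 15 total children",
    "[18-Feb-2016 11:30:06] WARNING: [pool pool01] seems busy (you may need to increase pm.start_servers, or pm.min/max_spare_servers), spawning 32 children, there are 0 idle, and 17 total children",
    "[18-Feb-2016 11:30:07] WARNING: [pool pool01] seems busy (you may need to increase pm.start_servers, or pm.min/max_spare_servers), spawning 32 children, there are 0 idle, and 19 total children",
]

entries = parse_busy_warnings(LOG)
announced = [e[1] for e in entries]  # what each warning says it is spawning
growth = [later[3] - earlier[3] for earlier, later in zip(entries, entries[1:])]  # real change in total
print(announced)
print(growth)
```

It prints `[8, 16, 32, 32]` and `[2, 2, 2]`. Each warning announces an ever larger spawn count, but between warnings the pool grows by only two children per second.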
It is worse in nginx's error.log:
2016/02/18 11:30:22 [error] 23400#23400: *209920 connect() failed (110: Connection timed out) while connecting to upstream, client: 79.1.1.9, server: host.domain.com, request: "GET /ping/?whoami=abc02 HTTP/1.1", upstream: "fastcgi://127.0.0.1:9011", host: "host.domain.com"
2016/02/18 11:30:22 [error] 23400#23400: *209923 connect() failed (110: Connection timed out) while connecting to upstream, client: 1.1.9.71, server: host.domain.com, request: "GET /utilz/pingme.php?whoami=abc01 HTTP/1.1", upstream: "fastcgi://127.0.0.1:9011", host: "host.domain.com"
2016/02/18 11:30:22 [error] 23400#23400: *209925 connect() failed (110: Connection timed out) while connecting to upstream, client: 3.7.0.4, server: host.domain.com, request: "GET /ping/?whoami=abc03 HTTP/1.1", upstream: "fastcgi://127.0.0.1:9011", host: "host.domain.com"
2016/02/18 11:30:22 [error] 23400#23400: *209926 connect() failed (110: Connection timed out) while connecting to upstream, client: 1.7.2.1, server: host.domain.com, request: "GET /ping/?whoami=abc04 HTTP/1.1", upstream: "fastcgi://127.0.0.1:9011", host: "host.domain.com"
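The "22 seconds" in the first question below is the gap between these error timestamps and the ping schedule. This small Python sketch pulls the errno, client and request out of each line and measures that gap. It assumes every ping started exactly on the :30 mark, because nginx logs when the error was written, not when the connection attempt began. The regex and `seconds_after_slot` are illustrative names, not an nginx tool.

```python
import re
from datetime import datetime

# Pattern for the "connect() failed ... while connecting to upstream" lines above
# (illustrative; not a general nginx error-log grammar).
ERR_RE = re.compile(
    r'^(?P<ts>\d{4}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}) \[error\] .*?'
    r'connect\(\) failed \((?P<errno>\d+): [^)]*\) while connecting to upstream, '
    r'client: (?P<client>[^,]+), .*?request: "(?P<request>[^"]+)"'
)

def seconds_after_slot(ts, slot_minutes=30):
    """Seconds elapsed since the most recent :00 or :30 schedule slot."""
    slot = ts.replace(minute=ts.minute - ts.minute % slot_minutes, second=0, microsecond=0)
    return (ts - slot).total_seconds()

LOG = [
    '2016/02/18 11:30:22 [error] 23400#23400: *209920 connect() failed (110: Connection timed out) while connecting to upstream, client: 79.1.1.9, server: host.domain.com, request: "GET /ping/?whoami=abc02 HTTP/1.1", upstream: "fastcgi://127.0.0.1:9011", host: "host.domain.com"',
    '2016/02/18 11:30:22 [error] 23400#23400: *209923 connect() failed (110: Connection timed out) while connecting to upstream, client: 1.1.9.71, server: host.domain.com, request: "GET /utilz/pingme.php?whoami=abc01 HTTP/1.1", upstream: "fastcgi://127.0.0.1:9011", host: "host.domain.com"',
    '2016/02/18 11:30:22 [error] 23400#23400: *209925 connect() failed (110: Connection timed out) while connecting to upstream, client: 3.7.0.4, server: host.domain.com, request: "GET /ping/?whoami=abc03 HTTP/1.1", upstream: "fastcgi://127.0.0.1:9011", host: "host.domain.com"',
    '2016/02/18 11:30:22 [error] 23400#23400: *209926 connect() failed (110: Connection timed out) while connecting to upstream, client: 1.7.2.1, server: host.domain.com, request: "GET /ping/?whoami=abc04 HTTP/1.1", upstream: "fastcgi://127.0.0.1:9011", host: "host.domain.com"',
]

rows = []
for line in LOG:
    m = ERR_RE.match(line)
    ts = datetime.strptime(m["ts"], "%Y/%m/%d %H:%M:%S")
    rows.append((m["errno"], m["client"], m["request"], seconds_after_slot(ts)))

for row in rows:
    print(row)
```

All four rows report errno `110` and `22.0` seconds after the 11:30 slot.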
Those connections are lost!
First question: if fastcgi_read_timeout is set to 150 seconds, why does nginx report a timeout after only 22 seconds (the pings are sent at minute 00 and minute 30 of every hour)?
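Second question: why do I get so many fpm warnings? The total number of children shown never reaches pm.max_children. I know warnings are not errors, but I am still getting them... Is there a relationship between these messages and the nginx timeouts?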
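A possible explanation for the slowly growing totals can be sketched in Python. It assumes the `pm = dynamic` maintenance logic of php-fpm 5.x as I understand it from its source (fpm_process_ctl.c), which I have not checked against 5.4.45 specifically. Once a second, if fewer than pm.min_spare_servers children are idle, fpm logs its current spawn rate. It then forks only min(rate, pm.min_spare_servers - idle) children and doubles the rate, up to a cap of 32. This is a simplified model, not fpm's code. The function name, the cap constant and the assumption that every new child immediately picks up a queued request (so idle stays at 0) are mine.

```python
FPM_MAX_SPAWN_RATE = 32  # assumption: fpm's cap on the per-second spawn rate

def simulate_dynamic_pm(total, idle_spawn_rate, min_spare, max_children, ticks):
    """Simplified model of one pm = dynamic pool under sustained load.

    Returns the (announced_rate, idle, total) triples a 'seems busy' warning
    would print on each one-second maintenance tick, plus the final total.
    """
    idle = 0  # assumption: each new child is taken immediately by a queued request
    busy_warnings = []
    for _ in range(ticks):
        if idle >= min_spare or total >= max_children:
            # Enough idle children, or the hard limit is reached
            # (fpm warns about pm.max_children instead): reset the rate.
            idle_spawn_rate = 1
            continue
        if idle_spawn_rate >= 8:
            busy_warnings.append((idle_spawn_rate, idle, total))
        # The announced rate is only an upper bound: fpm forks just enough
        # children to bring idle back up to pm.min_spare_servers.
        to_fork = min(idle_spawn_rate, min_spare - idle, max_children - total)
        total += to_fork
        if idle_spawn_rate < FPM_MAX_SPAWN_RATE:
            idle_spawn_rate *= 2
    return busy_warnings, total

# State at 11:30:04 from the log: 13 children, announced rate 8, pm.min_spare_servers = 2.
busy, final_total = simulate_dynamic_pm(13, 8, 2, 150, 4)
print(busy)
print(final_total)
```

With the question's settings the model reproduces the four warnings exactly: `[(8, 0, 13), (16, 0, 15), (32, 0, 17), (32, 0, 19)]`, ending at 21 children. With a hypothetical pm.min_spare_servers = 32, the same four ticks would end at 101 children. That value would also require raising pm.max_spare_servers to at least 32.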
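Given that the server handles regular traffic perfectly, and that during these peaks it has no problems with either RAM or swap (there is always about 1.5 GB or more free), is there better tuning for handling these ping connections (without changing the schedule)? Should I raise pm.start_servers and/or pm.min_spare_servers?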
【Discussion】:
-
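If PHP does not respond to nginx, the connection to the upstream will time out, even if fpm's own timeout is longer. This timeout is controlled by nginx, not by the upstream (which might be down).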
-
According to the documentation, fastcgi_read_timeout is how long nginx waits for the upstream server...
-
But fastcgi_connect_timeout might be the interesting setting! -
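Sorry, I was thinking of fastcgi_connect_timeout, not fastcgi_read_timeout. The latter only applies when PHP does reply but takes too long, while the former covers establishing the connection. -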
It looks like you are passing even static files through FastCGI, so PHP may be overloaded. I don't see any location blocks, so I'm not sure.
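The comments distinguish between two separate timers. A client talking to an upstream has one timer for establishing the TCP connection and another for waiting on data once connected. The minimal Python analogy below illustrates this. It is not nginx's implementation, and `probe` with its parameters is hypothetical. The demo upstream listens but never answers. The kernel completes the handshake from the listen backlog, so only the read phase can time out here. A connect-phase timeout needs an upstream that does not complete the handshake at all.

```python
import socket

def probe(host, port, connect_timeout, read_timeout):
    """Connect to host:port, then wait for one byte; report which phase timed out."""
    try:
        # Phase 1: TCP handshake, bounded by connect_timeout
        # (analogous to fastcgi_connect_timeout in nginx).
        sock = socket.create_connection((host, port), timeout=connect_timeout)
    except socket.timeout:
        return "connect timeout"
    with sock:
        # Phase 2: connected, now waiting for the reply, bounded by read_timeout
        # (analogous to fastcgi_read_timeout in nginx).
        sock.settimeout(read_timeout)
        try:
            data = sock.recv(1)
        except socket.timeout:
            return "read timeout"
    return "reply" if data else "closed"

# Hypothetical upstream: listening on an ephemeral loopback port, never accepting or replying.
server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.bind(("127.0.0.1", 0))
server.listen(8)
host, port = server.getsockname()
try:
    outcome = probe(host, port, connect_timeout=1.0, read_timeout=0.2)
finally:
    server.close()
print(outcome)
```

Here the connection succeeds and the read timer fires, so the outcome is `read timeout`. A long read timeout only matters once the connect phase has succeeded.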