nginx: connection timeout to php5-fpm (answers)

【Question title】: nginx: connection timeout to php5-fpm
【Posted】: 2016-02-18 11:23:31
【Question】:

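I have moved my server from apache2+fcgi to nginx+fpm because I wanted a lighter environment and Apache's memory footprint was large. The server is a dual core with 8G of RAM (I know, not much). It also runs a fairly busy FreeRadius server and the related MySQL. The CPU load average is about 1, with some noticeable peaks.

One of those peaks occurs every 30 minutes, when I receive web pings from some controlled devices. With Apache the server load rose sharply and everything slowed down. With nginx the processing is much faster (I also optimized the code a bit), but now I miss some of those connections. I configured both nginx and fpm with what I believe should be enough, but I must be missing something, because at those moments php is (apparently) unable to reply to nginx. Here is a recap of the configuration: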

nginx/1.8.1

user www-data;
worker_processes auto;
pid /var/run/nginx.pid;

events {
        worker_connections  1024;
        # multi_accept on;
}

client_body_buffer_size 10K;
client_header_buffer_size 1k; 
client_max_body_size 20m;
large_client_header_buffers 2 1k; 
location ~ \.php$ {
  fastcgi_split_path_info  ^(.+\.php)(.*)$;
  set $fsn /$yii_bootstrap;
  if (-f $document_root$fastcgi_script_name){
    set $fsn $fastcgi_script_name;
  }

  fastcgi_pass 127.0.0.1:9011;
  include fastcgi_params;
  fastcgi_param  SCRIPT_FILENAME  $document_root$fsn;
  fastcgi_param  PATH_INFO        $fastcgi_path_info;
  fastcgi_param  PATH_TRANSLATED  $document_root$fsn;
  fastcgi_read_timeout 150s;
}

php5-fpm 5.4.45-1~dotdeb+6.1

[pool01]
listen = 127.0.0.1:9011
listen.allowed_clients = 127.0.0.1
pm = dynamic
pm.max_children = 150 
pm.start_servers = 2 
pm.min_spare_servers = 2 
pm.max_spare_servers = 8 
pm.max_requests = 2000
pm.process_idle_timeout = 10s

When the peak comes, I start to see this in the fpm log:

[18-Feb-2016 11:30:04] WARNING: [pool pool01] seems busy (you may need to increase pm.start_servers, or pm.min/max_spare_servers), spawning 8 children, there are 0 idle, and 13 total children
[18-Feb-2016 11:30:05] WARNING: [pool pool01] seems busy (you may need to increase pm.start_servers, or pm.min/max_spare_servers), spawning 16 children, there are 0 idle, and 15 total children
[18-Feb-2016 11:30:06] WARNING: [pool pool01] seems busy (you may need to increase pm.start_servers, or pm.min/max_spare_servers), spawning 32 children, there are 0 idle, and 17 total children
[18-Feb-2016 11:30:07] WARNING: [pool pool01] seems busy (you may need to increase pm.start_servers, or pm.min/max_spare_servers), spawning 32 children, there are 0 idle, and 19 total children

It is worse in nginx's error.log:

2016/02/18 11:30:22 [error] 23400#23400: *209920 connect() failed (110: Connection timed out) while connecting to upstream, client: 79.1.1.9, server: host.domain.com, request: "GET /ping/?whoami=abc02 HTTP/1.1", upstream: "fastcgi://127.0.0.1:9011", host: "host.domain.com"
2016/02/18 11:30:22 [error] 23400#23400: *209923 connect() failed (110: Connection timed out) while connecting to upstream, client: 1.1.9.71, server: host.domain.com, request: "GET /utilz/pingme.php?whoami=abc01 HTTP/1.1", upstream: "fastcgi://127.0.0.1:9011", host: "host.domain.com"
2016/02/18 11:30:22 [error] 23400#23400: *209925 connect() failed (110: Connection timed out) while connecting to upstream, client: 3.7.0.4, server: host.domain.com, request: "GET /ping/?whoami=abc03 HTTP/1.1", upstream: "fastcgi://127.0.0.1:9011", host: "host.domain.com"
2016/02/18 11:30:22 [error] 23400#23400: *209926 connect() failed (110: Connection timed out) while connecting to upstream, client: 1.7.2.1, server: host.domain.com, request: "GET /ping/?whoami=abc04 HTTP/1.1", upstream: "fastcgi://127.0.0.1:9011", host: "host.domain.com"

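Those connections are lost!

First question: if fastcgi_read_timeout is set to 150s, why does nginx report a timeout within 22 seconds (the pings are sent at minutes 00 and 30 of every hour)?

Second question: why do I get so many fpm warnings? The total number of children shown never reaches pm.max_children. I know warnings are not errors, but I am being warned about something... Is there a relation between these messages and nginx's timeouts?

Given that the server handles regular traffic perfectly, and that during these peaks it has no problems with either RAM or swap (it always has about 1.5G or more free), is there a better tuning to handle these ping connections (one that does not involve changing the schedule)? Should I raise pm.start_servers and/or pm.min_spare_servers?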

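【Comments】:

If php does not respond to nginx, the connection to the upstream will time out, even if fpm has a longer timeout. This timeout is decided by nginx, not by the upstream provider (since the upstream might be down).
According to the documentation, fastcgi_read_timeout is the time nginx waits for the upstream server...
But fastcgi_connect_timeout may be the interesting setting!
Sorry, I was thinking of fastcgi_connect_timeout, not fastcgi_read_timeout. The latter only applies when php does reply but takes too long, while the former clearly covers establishing the connection.
It looks like you are running fast_cgi even for static files, so php may be overloaded. I don't see any location blocks, so I'm not sure.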
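
On the second question: the fpm log in the question shows the pool growing by only two children per second (13, 15, 17, 19 total children), even though the warning says "spawning 8", "16", "32" children. That pattern is consistent with how, as far as I understand it, php-fpm's dynamic process manager works: once per second it forks at most min(spawn rate, pm.min_spare_servers - idle) children, and the number printed in the warning is the spawn rate (which doubles up to 32), not the number actually forked. The Python sketch below is a simplified model of that loop, not php-fpm's code. The function names, the starting pool size of 8, and the assumption that every child stays busy are hypothetical choices made so the output lines up with the log.

```python
FPM_MAX_SPAWN_RATE = 32  # cap on the per-second spawn rate used in this model


def simulate_busy_pool(total, min_spare, max_children, seconds):
    """Model a saturated pm = dynamic pool, one maintenance tick per second.

    Assumes every child is busy at each tick (idle == 0), as in the log.
    Returns one (spawn_rate, forked, total_before) tuple per tick.
    """
    rate = 1
    idle = 0
    history = []
    for _ in range(seconds):
        if total >= max_children:
            # Pool is full: nothing can be forked and the rate resets.
            rate = 1
            history.append((rate, 0, total))
            continue
        # The rate is only an upper bound; the spare-server target caps it.
        forked = min(rate, min_spare - idle, max_children - total)
        history.append((rate, forked, total))
        total += forked
        if rate < FPM_MAX_SPAWN_RATE:
            rate *= 2
    return history


def pool_size_after(total, min_spare, max_children, seconds):
    """Pool size once the last simulated tick has forked its children."""
    rate, forked, before = simulate_busy_pool(total, min_spare, max_children, seconds)[-1]
    return before + forked


# Asker's pool: pm.min_spare_servers = 2, pm.max_children = 150.
for rate, forked, total in simulate_busy_pool(8, 2, 150, 7):
    if rate >= 8:  # the model only "warns" once the rate reaches 8
        print(f"spawning {rate} children (model forks {forked}), {total} total children")

# Same burst with a higher spare-server target (hypothetical value 16).
print(pool_size_after(8, 2, 150, 7), pool_size_after(8, 16, 150, 7))
```

In this model pm.max_children = 150 is never the limiting factor, which matches the asker's observation; the growth rate during a burst is bounded by pm.min_spare_servers, so raising it (and pm.start_servers) is what lets the pool ramp up faster.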

Tags: php nginx

【Solution 1】:

You need a few changes, and I also recommend upgrading your PHP to 5.6.

Nginx tuning: /etc/nginx/nginx.conf

user www-data;
pid /var/run/nginx.pid;
error_log /var/log/nginx/error.log crit;

# NOTE: Max simultaneous requests =    worker_processes*worker_connections/(keepalive_timeout*2)
worker_processes 1;
worker_rlimit_nofile 750000;

# handles connection stuff
events { 
worker_connections 50000;
multi_accept on;
use epoll;
}


# http request stuff
http {
access_log off;
log_format  main  '$remote_addr $host $remote_user [$time_local] "$request" $status $body_bytes_sent "$http_referer" "$http_user_agent" $ssl_cipher $request_time';

types_hash_max_size 2048;
server_tokens off;
fastcgi_read_timeout 180;
keepalive_timeout 20;
keepalive_requests 1000;
reset_timedout_connection on;
client_body_timeout 20;
client_header_timeout 10;
send_timeout 10;
tcp_nodelay on;
tcp_nopush on;
sendfile on; 
directio 100m;
client_max_body_size 100m;
server_names_hash_bucket_size 100;
include /etc/nginx/mime.types;
default_type application/octet-stream;

# index default files
index              index.html index.htm index.php;

# php-fpm upstream (only php5-fpm is listed here; no hhvm server is configured)
upstream php {
        keepalive 30;
        server 127.0.0.1:9001; # php5-fpm (check your port)
}

# Virtual Host Configs
include /etc/nginx/sites-available/*;

}
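
The NOTE comment at the top of this config gives a rule of thumb for the number of simultaneous requests. As a quick check, here is that rule evaluated in Python with the values from the config above. The formula is the answer's own heuristic, not something taken from the nginx documentation, and the function name is hypothetical.

```python
def max_simultaneous_requests(worker_processes, worker_connections, keepalive_timeout):
    """Evaluate the rule of thumb quoted in the config comment.

    This is the answer's heuristic, reproduced as written; it is not an
    nginx-documented limit.
    """
    return worker_processes * worker_connections / (keepalive_timeout * 2)


# Values from the config: worker_processes 1, worker_connections 50000,
# keepalive_timeout 20.
print(max_simultaneous_requests(1, 50000, 20))  # 1250.0
```

By this estimate the nginx side has far more headroom than the php-fpm pool, so the pool settings are the more likely bottleneck.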

For the default server, create /etc/nginx/sites-available/default.conf and add:

# default virtual host
server {
listen   80;
server_name localhost;
root /path/to/your/files;
access_log        off;
log_not_found     off;

# handle static files first
location / {
index index.html index.htm index.php ;
}

# serve static content directly by nginx without logs
location ~* \.(jpg|jpeg|gif|png|bmp|css|js|ico|txt|pdf|swf|flv|mp4|mp3)$ {
access_log off;
log_not_found off;
expires   7d;

# Enable gzip for some static content only
gzip on;
gzip_comp_level 6;
gzip_vary on;
gzip_types text/plain text/css application/json application/x-javascript application/javascript text/javascript image/svg+xml application/vnd.ms-fontobject application/x-font-ttf font/opentype;
}

# no cache for xml files
location ~* \.(xml)$ {
access_log off;
log_not_found off;
expires   0s;
add_header Pragma no-cache;
add_header Cache-Control "no-cache, no-store, must-revalidate, post-check=0, pre-check=0";
gzip on;
gzip_comp_level 6;
gzip_vary on;
gzip_types text/plain text/xml application/xml application/rss+xml;
}


# run php only when needed
location ~ \.php$ {
# basic php params
fastcgi_pass php; 
fastcgi_index   index.php;
fastcgi_keep_conn on;
fastcgi_connect_timeout 20s; 
fastcgi_send_timeout 30s; 
fastcgi_read_timeout 30s;

# fast cgi params
include fastcgi_params;
fastcgi_param  SCRIPT_FILENAME    $document_root$fastcgi_script_name;
fastcgi_param  SCRIPT_NAME        $fastcgi_script_name;
fastcgi_param  QUERY_STRING       $query_string;
fastcgi_param  REQUEST_METHOD     $request_method;
fastcgi_param  CONTENT_TYPE       $content_type;
fastcgi_param  CONTENT_LENGTH     $content_length;
fastcgi_param  REQUEST_URI        $request_uri;
fastcgi_param  DOCUMENT_URI       $document_uri;
fastcgi_param  DOCUMENT_ROOT      $document_root;
fastcgi_param  REMOTE_ADDR        $remote_addr;
fastcgi_param  REMOTE_PORT        $remote_port;
}

}

Ideally, you want php5-fpm to restart automatically if it starts failing, so you can set this in /etc/php5/fpm/php-fpm.conf:

emergency_restart_threshold = 60
emergency_restart_interval = 1m
process_control_timeout = 10s

Change /etc/php5/fpm/pool.d/www.conf:

[www]
user = www-data
listen.owner = www-data
listen.group = www-data
listen.mode = 0660
listen = 127.0.0.1:9001
listen.allowed_clients = 127.0.0.1
listen.backlog = 65000
pm = dynamic
pm.start_servers = 8
pm.min_spare_servers = 4
pm.max_spare_servers = 16

; max number of simultaneous requests that will be served (if each PHP page needs 32 MB, then 128 x 32 MB = 4 GB RAM)
pm.max_children = 128

; We want to keep it high (10k to 50k) to avoid frequent worker respawns; however, if the PHP code leaks memory this will be a problem.
pm.max_requests = 10000
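
The sizing comment above (128 x 32 MB = 4 GB) can be turned around to ask how many children fit in a given memory budget. The Python sketch below does that arithmetic. The 32 MB per child is the answer's assumption, the 1.5G budget is the free memory the asker reports during peaks, and the function names are hypothetical; measure the real per-process footprint before relying on these numbers.

```python
def worst_case_mb(max_children, per_child_mb):
    """Memory used if every child is alive at the assumed footprint."""
    return max_children * per_child_mb


def children_for_budget(ram_budget_mb, per_child_mb):
    """Largest pm.max_children whose worst case fits in the budget."""
    return ram_budget_mb // per_child_mb


PER_CHILD_MB = 32  # assumption taken from the answer's comment

print(worst_case_mb(128, PER_CHILD_MB))         # 4096 MB, the 4 GB in the comment
print(worst_case_mb(150, PER_CHILD_MB))         # 4800 MB for the asker's pm.max_children = 150
print(children_for_budget(1536, PER_CHILD_MB))  # 48 children fit in 1.5G (1536 MB)
```

Under these assumptions, neither 128 nor 150 children would fit in the 1.5G the asker says is free at peak time, so pm.max_children is worth checking against measured memory use.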

【Comments】:

What is the benefit of using 5.6?