Question

Grumpy

Asked: 2014-03-24 21:31:36 +0800 CST

nginx upstream timed out. Multiple servers at the same time

I have several servers serving a single site.

The main server runs nginx and php-fpm; all the other servers run only php-fpm. On the main server, nginx connects to php-fpm via a unix socket, and it connects to the other servers via TCP.

Roughly once an hour (not exactly; sometimes more often), something strange happens: all of nginx's connections to the php-fpm servers time out. nginx fails to establish a connection at all.

2014/03/24 04:59:09 [error] 2123#0: *925153 upstream timed out (110: Connection timed out) while connecting to upstream, client: <<client ip removed>>, server: www.example.com, request: "GET /some/address/here HTTP/1.1", upstream: "fastcgi://192.168.1.5:9000", host: "www.example.com", referrer: "http://www.example.com/some/address/here"
2014/03/24 04:59:09 [error] 2124#0: *926742 connect() to unix:/tmp/php-fpm.sock failed (11: Resource temporarily unavailable) while connecting to upstream, client: <<client ip removed>>, server: www.example.com, request: "GET /some/address/here HTTP/1.1", upstream: "fastcgi://unix:/tmp/php-fpm.sock:", host: "www.example.com", referrer: "http://www.example.com/some/address/here"
2014/03/24 04:59:09 [error] 2123#0: *925159 upstream timed out (110: Connection timed out) while connecting to upstream, client: <<client ip removed>>, server: www.example.com, request: "GET /some/address/here HTTP/1.1", upstream: "fastcgi://192.168.1.2:9000", host: "www.example.com", referrer: "http://www.example.com/some/address/here"
2014/03/24 04:59:09 [error] 2123#0: *923874 upstream timed out (110: Connection timed out) while connecting to upstream, client: <<client ip removed>>, server: www.example.com, request: "GET /some/address/here HTTP/1.1", upstream: "fastcgi://192.168.1.3:9000", host: "www.example.com", referrer: "http://www.example.com/some/address/here"
2014/03/24 04:59:09 [error] 2123#0: *925164 upstream timed out (110: Connection timed out) while connecting to upstream, client: <<client ip removed>>, server: www.example.com, request: "GET /some/address/here HTTP/1.1", upstream: "fastcgi://192.168.1.4:9000", host: "www.example.com", referrer: "http://www.example.com/some/address/here"
2014/03/24 04:59:09 [error] 2124#0: *909392 upstream timed out (110: Connection timed out) while connecting to upstream, client: <<client ip removed>>, server: www.example.com, request: "GET /some/address/here HTTP/1.1", upstream: "fastcgi://192.168.1.3:9000", host: "www.example.com", referrer: "http://www.example.com/some/address/here"
2014/03/24 04:59:09 [error] 2124#0: *923098 upstream timed out (110: Connection timed out) while connecting to upstream, client: <<client ip removed>>, server: www.example.com, request: "GET /some/address/here HTTP/1.1", upstream: "fastcgi://192.168.1.5:9000", host: "www.example.com", referrer: "http://www.example.com/some/address/here"
2014/03/24 04:59:09 [error] 2125#0: *923309 upstream timed out (110: Connection timed out) while connecting to upstream, client: <<client ip removed>>, server: www.example.com, request: "GET /some/address/here HTTP/1.1", upstream: "fastcgi://192.168.1.4:9000", host: "www.example.com", referrer: "http://www.example.com/some/address/here"

As this is a fairly busy site, entries like the ones above fill the log quite fast.

This lasts for roughly 10 to 15 seconds, and then everything goes back to normal. Apart from the connection timeout errors posted here, there don't seem to be any other errors.

I suspect the problem lies with nginx, since it happens on all the php-fpm servers at the same time.
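To check that every upstream really fails in the same second, one option is to tally the error log by timestamp and upstream. The sketch below is a hypothetical helper (`summarize_upstream_errors` is not part of nginx); it assumes the default error_log line layout shown in the excerpt above and only matches the two failure messages seen there.

```shell
#!/usr/bin/env bash
# Hypothetical helper: count nginx upstream connection failures per second
# and per upstream. Assumes the default error_log line layout.
summarize_upstream_errors() {
  local log=${1:-/var/log/nginx/error.log}
  # Keep only the "upstream timed out" and "connect() ... failed" messages,
  # reduce each to "<date> <time> <upstream>", then count identical lines.
  grep -E 'upstream timed out|connect\(\) to .* failed' "$log" |
    sed -nE 's#^([0-9/]+ [0-9:]+) .*upstream: "([^"]+)".*#\1 \2#p' |
    sort | uniq -c
}
```

Run it as `summarize_upstream_errors /var/log/nginx/error.log`. If a single second lists every upstream, including the unix socket, that matches the pattern described above.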

What could cause this, and how can it be resolved?

My nginx config is:

user  nginx;
worker_processes  4;
worker_rlimit_nofile 30000;

error_log  /var/log/nginx/error.log warn;
pid        /var/run/nginx.pid;

events {
    worker_connections  4096;
}

http {
    include       /etc/nginx/mime.types;
    default_type  application/octet-stream;

    log_format  main  '$remote_addr - $remote_user [$time_local] "$request" '
                      '$status $body_bytes_sent "$http_referer" '
                      '"$http_user_agent" "$http_x_forwarded_for"';

    access_log  /var/log/nginx/access.log  main;

    sendfile        on;

    keepalive_timeout  5;
    fastcgi_buffers 256 4k;
    gzip on;
    gzip_disable     "msie6";

    fastcgi_cache_path /dev/shm/caches/  levels=1:2 keys_zone=zoneone:4000m max_size=4000m inactive=30m;

    fastcgi_temp_path /var/www/tmp 1 2;
    fastcgi_cache_key "$scheme$proxy_host$request_uri";

    fastcgi_connect_timeout 3s;
    limit_req_zone  $binary_remote_addr  zone=limitone:200m   rate=1r/s;
    limit_req_zone  $binary_remote_addr  zone=limitcomic:500m   rate=40r/m;

    upstream partone {
        server unix:/tmp/php-fpm.sock;
    }

    upstream parttwo {
        server 192.168.1.3:9000 weight=10 max_fails=0 fail_timeout=2s;
        server 192.168.1.4:9000 weight=10 max_fails=0 fail_timeout=2s;
        server 192.168.1.5:9000 weight=10 max_fails=0 fail_timeout=2s;
    }

    upstream parttre {
        server 192.168.1.2:9000 weight=8 max_fails=0 fail_timeout=2s;
        server 192.168.1.3:9000 weight=10 max_fails=0 fail_timeout=2s;
        server 192.168.1.4:9000 weight=10 max_fails=0 fail_timeout=2s;
        server 192.168.1.5:9000 weight=10 max_fails=0 fail_timeout=2s;
    }
... stuff with server, locations and such...
}

As you can see, I don't even use all five servers in the same upstream block.

nginx version: nginx/1.4.5

1 Answer

Tero Kilkanen · Answer 1 · 2014-03-25T03:04:45+08:00

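This is an educated guess: the problem could be caused by exhaustion of the local TCP ports used for connections to the upstream servers. Each TCP connection nginx opens to a php-fpm server takes a local (ephemeral) port from a limited range, and a closed connection can keep its port tied up in the TIME_WAIT state for a while, so a busy site that opens many short-lived upstream connections can run out of ports.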
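One way to test this guess during an incident is to count the local sockets to the php-fpm port by TCP state; a very large TIME-WAIT count would be consistent with port exhaustion. The helper below is hypothetical: it parses `ss -tan` output on stdin, assumes its usual leading columns (State, Recv-Q, Send-Q, local address, peer address), and defaults to port 9000 as in the question's config.

```shell
#!/usr/bin/env bash
# Hypothetical helper: count TCP sockets per state whose peer address ends
# in ":<port>". Expects `ss -tan` output (header line first) on stdin.
count_upstream_states() {
  local port=${1:-9000}
  awk -v p=":$port" '
    # Skip the header; compare the tail of the peer column ($5) to ":port".
    NR > 1 && substr($5, length($5) - length(p) + 1) == p { n[$1]++ }
    END { for (s in n) print s, n[s] }' | sort
}
```

Usage: `ss -tan | count_upstream_states 9000`, ideally run both during one of the 10 to 15 second windows and during normal operation for comparison.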

You can check the range of local ports available for outgoing connections with:

sysctl net.ipv4.ip_local_port_range

The default on my Debian installation is 32768 - 61000.
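To get a feel for the numbers, a rough estimate divides the number of ports in the range by how long each port stays unusable after a connection closes. This is only a back-of-the-envelope sketch: `port_budget` is a hypothetical helper, and the 60 second default hold time is an assumption based on the Linux TIME_WAIT length. Linux only requires the full local/remote address pair to be unique, so treat the result as a rough per-upstream figure.

```shell
#!/usr/bin/env bash
# Hypothetical helper: estimate how many new connections per second one
# upstream address:port can sustain before the local port range runs out.
# Assumption: each port stays unusable for HOLD seconds (default 60) after close.
port_budget() {
  local low=$1 high=$2 hold=${3:-60}
  local ports=$(( high - low + 1 ))   # the range is inclusive at both ends
  echo "ports=$ports max_new_conn_per_sec=$(( ports / hold ))"
}
```

With the default range quoted above, `port_budget 32768 61000` prints `ports=28233 max_new_conn_per_sec=470`; the expanded range, `port_budget 1024 65535`, prints `ports=64512 max_new_conn_per_sec=1075`. You can also feed it the live values with `port_budget $(sysctl -n net.ipv4.ip_local_port_range)`.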

You can expand the range by entering the following command as root:

echo 1024 65535 > /proc/sys/net/ipv4/ip_local_port_range

If you are running Debian or a Debian-derived distribution, you can persist this setting across reboots by creating or editing /etc/sysctl.d/99-local.conf and adding this line:

net.ipv4.ip_local_port_range = 1024 65535
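If you would rather script this, a small helper can add the line, or replace an existing one so the file does not collect duplicates. `set_port_range_conf` is a hypothetical helper, and the file path is the one suggested above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: write (or replace) the ip_local_port_range entry in a
# sysctl drop-in file so the setting survives reboots.
set_port_range_conf() {
  local file=$1 low=$2 high=$3
  local line="net.ipv4.ip_local_port_range = $low $high"
  if [ -f "$file" ] && grep -q '^net\.ipv4\.ip_local_port_range' "$file"; then
    # Replace the existing entry instead of appending a second one.
    sed -i.bak "s/^net\.ipv4\.ip_local_port_range.*/$line/" "$file" && rm -f "$file.bak"
  else
    echo "$line" >> "$file"
  fi
}
```

As root: `set_port_range_conf /etc/sysctl.d/99-local.conf 1024 65535 && sysctl -p /etc/sysctl.d/99-local.conf`. The `sysctl -p` step loads the file immediately instead of waiting for the next reboot.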
