My nginx keeps crashing and reporting "bad gateway" errors in the browser. Nginx and PHP-FPM don't come preconfigured to handle large traffic loads, so I had to put a systemctl restart php7.0-fpm cron job in place, running each hour, just to make sure my sites don't stay down for too long when they go. Let's just get down to it.
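The workaround, for reference, is just an hourly restart (a sketch of the cron entry; the file name is whatever you prefer under /etc/cron.d):

# /etc/cron.d/restart-php-fpm: hourly band-aid, not a fix
0 * * * * root /bin/systemctl restart php7.0-fpm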
Some errors I get from /var/log/php7.0-fpm.log:
[20-Sep-2017 12:08:21] NOTICE: [pool web3] child 3495 started
[20-Sep-2017 12:08:21] NOTICE: [pool web3] child 2642 exited with code 0 after 499.814492 seconds from start
[20-Sep-2017 12:32:28] WARNING: [pool web3] seems busy (you may need to increase pm.start_servers, or pm.min/max_spare_servers), spawning 8 children, there are 7 idle, and 57 total children
Nothing jumps out at me in the nginx log. If I leave PHP-FPM running for too long without restarting it, I get gateway errors. I've tried following tutorials three times now, tweaking settings, but it's still no good. Right now I've probably got all kinds of settings way off, but it never works whichever way I do it.
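For what it's worth, this is roughly how I grep the nginx error log for upstream trouble (the exact message wording may vary by nginx version):

# Surface PHP-FPM upstream failures near the 502s
grep -E 'upstream|connect\(\) to unix' /var/log/nginx/error.log | tail -n 50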
/etc/nginx/nginx.conf:
user www-data;
worker_processes auto;
pid /run/nginx.pid;
worker_rlimit_nofile 100000;

events {
    worker_connections 4096;
    use epoll;
    multi_accept on;
}

http {
    sendfile on;
    reset_timedout_connection on;
    client_body_timeout 10;
    send_timeout 2;
    keepalive_timeout 30;
    keepalive_requests 100000;
    tcp_nopush on;
    tcp_nodelay on;
    types_hash_max_size 2048;
    fastcgi_read_timeout 300000;
    client_max_body_size 9000m;
    include /etc/nginx/mime.types;
    default_type application/octet-stream;
    ssl_protocols TLSv1 TLSv1.1 TLSv1.2; # Dropping SSLv3, ref: POODLE
    ssl_prefer_server_ciphers on;
    access_log /var/log/nginx/access.log;
    error_log /var/log/nginx/error.log;
    gzip on;
    gzip_disable "msie6";
    gzip_vary on;
    gzip_proxied any;
    gzip_comp_level 6;
    gzip_buffers 16 8k;
    gzip_http_version 1.1;
    gzip_types text/plain text/css application/json application/javascript text/xml application/xml application/xml+rss text/javascript;
    include /etc/nginx/conf.d/*.conf;
    include /etc/nginx/sites-enabled/*;
    open_file_cache max=200000 inactive=20s;
    open_file_cache_valid 30s;
    open_file_cache_min_uses 2;
    open_file_cache_errors on;
    access_log off;
}
/etc/php/7.0/fpm/php-fpm.conf:
[www]
pm = dynamic
pm.max_spare_servers = 200
pm.min_spare_servers = 100
pm.start_servers = 100
pm.max_children = 300
[global]
pid = /run/php/php7.0-fpm.pid
error_log = /var/log/php7.0-fpm.log
include=/etc/php/7.0/fpm/pool.d/*.conf
/etc/php/7.0/fpm/pool.d/www.conf:
[www]
user = www-data
group = www-data
listen = /run/php/php7.0-fpm.sock
listen.owner = www-data
listen.group = www-data
pm = dynamic
pm.max_children = 300
pm.start_servers = 100
pm.min_spare_servers = 100
pm.max_spare_servers = 200
pm.max_requests = 500
One of my sites (/etc/php/7.0/fpm/pool.d/web3.conf):
[web3]
listen = /var/lib/php7.0-fpm/web3.sock
listen.owner = web3
listen.group = www-data
listen.mode = 0660
user = web3
group = client1
pm = dynamic
pm.max_children = 141
pm.start_servers = 20
pm.min_spare_servers = 20
pm.max_spare_servers = 35
pm.max_requests = 500
chdir = /
env[HOSTNAME] = $HOSTNAME
env[TMP] = /var/www/clients/client1/web3/tmp
env[TMPDIR] = /var/www/clients/client1/web3/tmp
env[TEMP] = /var/www/clients/client1/web3/tmp
env[PATH] = /usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
Resource/proc usage from htop: [screenshot omitted]
The issue is with your database access. You have several MySQL processes using CPU, which indicates that database queries are taking a long time to execute. You need to look into your application for slow database queries; enabling the MySQL slow query log is a good way to find them.

The slow database queries cause PHP-FPM to run out of the child processes that serve client requests, which in turn causes 502 Bad Gateway errors. You can try to increase the pm.max_children setting for the web3 pool, since that is the pool producing the errors. This can remove the scalability symptoms, but it does not fix the root cause, which is application/database inefficiency.

If you are not using the www pool, you can remove it to save the resources it uses.
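For instance, a minimal sketch of that change in web3.conf (the numbers are illustrative; size them to your RAM and measured per-worker memory):

; /etc/php/7.0/fpm/pool.d/web3.conf
pm.max_children = 200
pm.start_servers = 30
pm.min_spare_servers = 25
pm.max_spare_servers = 50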
The ideal setting for pm.max_requests is zero, that is, PHP workers are never recycled. If your PHP workers don't leak memory due to badly coded libraries, you can use zero there. Otherwise, use whatever value keeps the memory usage of the workers reasonable; there really isn't any other good advice to give for this setting.
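To pick a sensible value, it helps to know what each worker actually consumes. A quick sketch (php-fpm7.0 is the Debian/Ubuntu process name; adjust for your distro):

# Average resident memory per PHP-FPM worker
ps -o rss= -C php-fpm7.0 | awk '{ sum += $1; n++ } END { if (n) printf "%d workers, %.1f MB average\n", n, sum / n / 1024 }'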
There isn't much you can do with the nginx settings here, since it is PHP-FPM that is intermittently unavailable. You could change gzip_comp_level to 1, which makes nginx spend a little less CPU compressing output, but that has a very small effect compared to application optimisation.
(This should be a comment, but it's a bit long.)

What you describe is not a capacity issue unless your server is so badly configured that the OOM killer is kicking in, and it is not the error you've quoted from your logs. A few observations:

Why do you have half a gig of swap on a box with 12 GB of RAM?
Your keepalive timeout is too high.
You have disabled access logging (your logs are the place to start looking for capacity issues).
The top output hints at problems with MySQL performance.
Your pm.max_requests is too low.
You've not capped the listen.backlog.

Everything you've shown us here has issues, and it's just the tip of the iceberg. Voting to close.
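A rough sketch of the directive-level fixes among those points (values illustrative, not tuned for this box):

# nginx.conf: re-enable access logging and shorten keepalive
access_log /var/log/nginx/access.log;
keepalive_timeout 5;

; web3 pool: cap the socket backlog and recycle workers less aggressively
listen.backlog = 511
pm.max_requests = 5000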
Is it the web3 site that is going offline? This log entry seems to suggest the cause:

[20-Sep-2017 12:32:28] WARNING: [pool web3] seems busy (you may need to increase pm.start_servers, or pm.min/max_spare_servers), spawning 8 children, there are 7 idle, and 57 total children

You've got really high values for start_servers/max_spare_servers in the www pool, but much lower values for web3.

You don't seem to be out of memory, so giving MySQL more memory may help. Unless your PHP app never queries MySQL, leaving MySQL out of your optimization process is a mistake.
To start, you'll want to look at your MySQL config. I believe most distributions are fairly conservative with memory and thread settings. Look for the MySQL example configs, e.g. my-large.cnf and my-medium.cnf, and compare them to yours. Debian-based distros ship them in /usr/share/doc/mysql-server-x.y/examples/ (where x.y is the major version).
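For example (paths vary by distro and version, and newer MySQL packages may not ship the examples at all):

# List the shipped examples, then compare one against the live config
ls /usr/share/doc/mysql-server-*/examples/
diff /etc/mysql/my.cnf /usr/share/doc/mysql-server-5.5/examples/my-large.cnf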
When adjusting the various knobs, I'd recommend small adjustments. For example, change a value from 8M to 16M.
If it's your PHP app, you'll also want to look at the slow query log, as suggested in Tero Kilkanen's answer.
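Enabling the slow query log is a small my.cnf change (threshold and path are illustrative):

[mysqld]
slow_query_log = 1
slow_query_log_file = /var/log/mysql/mysql-slow.log
long_query_time = 1   # log anything slower than one second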
Hope that helps.
In my experience, especially with a large site, PHP-FPM uses a lot of processor power. This happens when there is no cache available: the server has to render the page, cache it, and only then serve the cache. I've had the same issue with large sites before. The best thing to do is use httrack to crawl your site, with speed limits set so you don't overload the server. This builds your nginx cache, and once it's built you'll see pages load instantly with very little CPU or RAM usage.

The root cause usually comes down to page rendering time, which can be caused by too much JS or CSS, or, most likely, too many SQL requests or a poorly configured SQL database. Make sure to index database tables that are queried frequently.
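A sketch of such a rate-limited crawl (example.com, the flags, and the limits are placeholders; check httrack's manual for your version):

# Warm the cache slowly: cap transfer rate and parallel connections
httrack "https://example.com/" -O /tmp/site-mirror --max-rate=100000 --sockets=2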
htop appears to indicate that each of the 15 MySQL-associated PIDs has used more than 1:nn.nn of TIME, and each has at least 1G of VIRT RAM in use. Since you have 12 GB of RAM in total, is it time for you to share your MySQL configuration with us, to allow some reasonable checks on it, even if it is not the problem? An uptime of 1 day, 11 hours is encouraging.

Any idea what PID 6148 was doing that has a TIME of 28:+ invested in the effort?
From an earlier response today from @xendi: "Whenever this happens, all pages on all sites, no matter what scripts or content, error out with the gateway error. This happens to all pages and sites."
Have you looked at the php.ini setting session.gc_maxlifetime = nnnn (garbage collection seconds) as a possible cause?
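The relevant php.ini knobs, for reference (values shown are the common defaults, purely illustrative):

; session garbage collection: GC runs with probability gc_probability/gc_divisor per request
session.gc_maxlifetime = 1440
session.gc_probability = 1
session.gc_divisor = 1000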
09/24/2017: the nginx.conf questions above may have an impact; possibly a helpful link.
This seems to be all about memory. Try decreasing the number of PHP servers and limiting the memory used by the PHP and MySQL servers.
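A minimal sketch of what that could look like (all values illustrative, not tuned for this machine):

; /etc/php/7.0/fpm/pool.d/web3.conf: fewer workers
pm.max_children = 50

; /etc/php/7.0/fpm/php.ini: cap per-request memory
memory_limit = 128M

# /etc/mysql/my.cnf: cap the main InnoDB memory consumer
[mysqld]
innodb_buffer_pool_size = 2G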