I am in a situation where I cannot get PHP-FPM to cope with even a slight increase in traffic. I have been trying to trace the actual cause for a while, with no success so far.
It started with one particular site returning 502 errors. Looking into the PHP-FPM logs, I see this:
WARNING: [pool www-userA] seems busy (you may need to increase pm.start_servers, or pm.min/max_spare_servers), spawning 8 children, there are 39 idle, and 49 total children
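To make the numbers in that warning easier to track over time, the line can be parsed into its counts. This is only a minimal sketch: the regular expression is written against the exact wording shown above (other PHP-FPM versions may phrase it differently), and the function name and the derived "active" field (total minus idle) are my own, not something PHP-FPM reports.

```python
import re

# Pattern matches the "seems busy" wording quoted above (assumption: same wording in your log).
BUSY_RE = re.compile(
    r"\[pool (?P<pool>[^\]]+)\] seems busy .*?"
    r"spawning (?P<spawning>\d+) children, "
    r"there are (?P<idle>\d+) idle, "
    r"and (?P<total>\d+) total children"
)

def parse_busy_warning(line):
    """Return the pool name and child counts from a 'seems busy' line, or None."""
    m = BUSY_RE.search(line)
    if m is None:
        return None
    info = {k: int(v) for k, v in m.groupdict().items() if k != "pool"}
    info["pool"] = m.group("pool")
    # Derived value, not printed by PHP-FPM: children that are not idle.
    info["active"] = info["total"] - info["idle"]
    return info

line = ("WARNING: [pool www-userA] seems busy (you may need to increase "
        "pm.start_servers, or pm.min/max_spare_servers), spawning 8 children, "
        "there are 39 idle, and 49 total children")
print(parse_busy_warning(line))
```

For the line above this gives 49 total children with 39 idle, so only 10 are actually working at the moment the warning is logged.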
Next I checked server resources: top shows the server almost idle, with only 2-4% resource utilization. Then I tried tweaking the PHP-FPM pool:
pm = dynamic
pm.max_children = 800
pm.process_idle_timeout = 5s
pm.start_servers = 40
pm.min_spare_servers = 40
pm.max_spare_servers = 80
pm.max_requests = 500
php_admin_value[max_execution_time] = 60
;Added later to troubleshoot further
request_slowlog_timeout = 5s
slowlog = /var/log/pool_userA_fpm_slow_log
;Added later for troubleshooting, in case there is a queue issue
listen.backlog = 24000
I have been through almost every PHP-FPM post related to this topic, including: https://stackoverflow.com/questions/25097179/warning-pool-www-seems-busy-you-may-need-to-increase-pm-start-servers-or-pm
This server has around 12GB of RAM and an 8-core processor, used only for nginx + PHP-FPM. Each PHP process uses about 15-20MB.
I tried increasing pm.max_children to 1500, but after a while I saw the same "pool seems busy" warning again.
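As a back-of-the-envelope check on these pm.max_children values, the worst-case memory if every child were alive at once can be compared with the 12GB of RAM. This is a rough sketch using the 15-20MB per-process figure from above; the 2048MB reserve for nginx and the OS is a hypothetical number chosen only for illustration.

```python
RAM_MB = 12 * 1024          # "around 12GB" from the post
PER_CHILD_MB = (15, 20)     # observed size range of one PHP process
RESERVE_MB = 2048           # hypothetical headroom for nginx and the OS

def worst_case_mb(max_children, per_child_mb):
    """Memory needed if every allowed child is running at once."""
    return max_children * per_child_mb

def max_children_that_fit(ram_mb, per_child_mb, reserve_mb=RESERVE_MB):
    """Largest child count that fits in RAM after the reserve."""
    return (ram_mb - reserve_mb) // per_child_mb

for children in (800, 1500):
    low = worst_case_mb(children, PER_CHILD_MB[0])
    high = worst_case_mb(children, PER_CHILD_MB[1])
    print(f"max_children={children}: {low}-{high} MB worst case, RAM is {RAM_MB} MB")

print("children that fit at 20 MB each:", max_children_that_fit(RAM_MB, 20))
```

Under these assumptions, 800 children at 20MB each already need more than the installed RAM, so the higher limits could only ever be reached by swapping.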
I then enabled the slow log in PHP-FPM and also enabled the slow query log for MySQL.
- In the PHP-FPM slow log, I found a few PHP pages taking about 5 seconds to complete.
- In the MySQL slow log, I found some queries examining 2-5 million rows (taking about 5 seconds to complete).
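One way to relate these 5-second pages to busy workers is Little's law: the average number of occupied workers is roughly the request arrival rate multiplied by the average request duration. The sketch below is illustrative only. The 20 requests per second is a hypothetical figure, since I have not measured the real request rate, and the 49 workers is the total from the warning quoted earlier.

```python
def busy_workers(requests_per_second, avg_seconds):
    """Little's law: average number of workers occupied at once."""
    return requests_per_second * avg_seconds

def max_throughput(workers, avg_seconds):
    """Requests per second a fixed number of workers can sustain."""
    return workers / avg_seconds

# 49 children (from the warning above), each tied up for about 5 seconds per request.
print("sustainable req/s with 49 workers:", max_throughput(49, 5))

# Hypothetical traffic level: 20 slow requests per second.
print("workers needed at 20 req/s:", busy_workers(20, 5))
```

If most requests were this slow, 49 children could sustain only about 10 requests per second, which may be why a small traffic increase is enough to exhaust the pool while the CPU stays idle waiting on MySQL.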
Assuming that slow PHP scripts might be causing a queue or backlog, I added listen.backlog = 24000. In /etc/security/limits.conf I also added soft and hard limits for this particular user, so there is headroom for slow scripts:
userA soft nofile 4096
userA hard nofile 65536
Then in sysctl:
echo "net.core.somaxconn=65536" >> /etc/sysctl.conf
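Appending to /etc/sysctl.conf does not change the running kernel value until it is loaded (for example with sysctl -p) or the machine reboots, and Linux caps the backlog passed to listen() at net.core.somaxconn. The following sketch checks what the running kernel actually has and what backlog would result. The helper names are my own, and the /proc path only exists on Linux.

```python
def read_somaxconn(path="/proc/sys/net/core/somaxconn"):
    """Return the running kernel's somaxconn, or None if it cannot be read."""
    try:
        with open(path) as f:
            return int(f.read().strip())
    except (OSError, ValueError):
        return None

def effective_backlog(requested, somaxconn):
    """The kernel caps the listen() backlog at net.core.somaxconn."""
    return min(requested, somaxconn)

current = read_somaxconn()
if current is not None:
    print("running somaxconn:", current)
    print("effective backlog for listen.backlog = 24000:",
          effective_backlog(24000, current))
else:
    print("could not read somaxconn (not Linux, or /proc unavailable)")
```

If the running value is still a small default, the listen.backlog = 24000 setting would be capped at that value regardless of what the pool config says.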
Then in the PHP-FPM master config php-fpm.conf (i.e. outside the pool config) I added:
rlimit_files = 65536
rlimit_core = 0
My ulimit -Hn says:
524288
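The ulimit output reflects the limits of my shell, which are not necessarily those of the running PHP-FPM processes. Entries in limits.conf are applied through PAM to login sessions, so a daemon started by the init system may not pick them up. A minimal sketch for reading the limits of a specific process from /proc/<pid>/limits follows. The function names are hypothetical, the PID has to be supplied by hand, and the sample text is made up to show the format rather than taken from my server.

```python
def parse_max_open_files(limits_text):
    """Return (soft, hard) for 'Max open files' from /proc/<pid>/limits text."""
    for line in limits_text.splitlines():
        if line.startswith("Max open files"):
            parts = line.split()
            # Columns: Max open files <soft> <hard> files
            # Kept as strings because a value may be "unlimited".
            return parts[3], parts[4]
    return None

def read_process_limits(pid):
    """Read the limits of a running process (Linux only)."""
    with open(f"/proc/{pid}/limits") as f:
        return parse_max_open_files(f.read())

# Illustrative sample in the /proc/<pid>/limits format, not real output.
sample = (
    "Limit                     Soft Limit           Hard Limit           Units\n"
    "Max open files            1024                 524288               files\n"
)
print(parse_max_open_files(sample))
```

Calling read_process_limits with the PID of the PHP-FPM master or of a pool child would show whether the nofile and rlimit_files settings actually took effect.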
Since PHP-FPM keeps getting busy, I found that I can add the following directives so that it restarts itself in that case. However, the restart never happens, and I have to restart PHP-FPM manually to get the site working again:
[global]
emergency_restart_threshold = 10
emergency_restart_interval = 1m
process_control_timeout = 10s
As I said, the directives above do not trigger a restart when the "pool seems busy" warning appears in php-fpm.log.
So far my guess is that, due to slow PHP scripts, my PHP-FPM children are being exhausted, which causes the 502 errors. I have no control over the PHP code, so I need to present a solution by adjusting the server config.
I tried increasing pm.max_children to 2000, but the issue remains. Sometimes I also get 504 Gateway Time-out errors.
On the other hand, if I change to pm = ondemand, I first get the following notice:
listen.backlog(25000) was too low for the ondemand process manager. I updated it for you to 65535
Later I got this warning, again accompanied by 504 errors:
[11-Nov-2021 06:56:45] WARNING: [pool userA] server reached max_children setting (800), consider raising it.
One thing to note is that there is almost no load on the server in any of these cases, only 2-4% resource usage. So my guess is that this is more of a configuration issue than a resource problem.
I have been through almost all PHP-FPM related topics here on Server Fault and lots of docs, but still no luck. I am hoping someone can point me in the right direction.
Thanks