I should note that I'm not a sysadmin. You'll figure that out very shortly. :)
In a nutshell: Apache keeps taking a breather under heavy load, and all processes go idle at once. This is a polling server used by applications, and the polls come from a lot of different endpoints. Every 4-5 minutes, if I'm watching top, the httpd processes all go idle at the same time, stalling traffic for 10 seconds or so. It then recovers. The delay is problematic.
- Server is serving a lot of traffic. These are application polls via HTTPS, not web pages (though I doubt Apache knows the difference)
- The pauses noted above cause the traffic to become lopsided: after some time, I get a WHOLE BUNCH OF TRAFFIC, then a lull, then a WHOLE BUNCH OF TRAFFIC again
- Each poll requires a small database dip
Apache logs
Sometimes, but not always (mostly after a restart), I get the messages below in error_log. Most of the time when a stall happens, I see nothing in error_log at all.
    [Mon Jun 30 17:55:17 2014] [info] server seems busy, (you may need to increase StartServers, or Min/MaxSpareServers), spawning 8 children, there are 31 idle, and 98 total children
    [Mon Jun 30 17:55:18 2014] [info] server seems busy, (you may need to increase StartServers, or Min/MaxSpareServers), spawning 16 children, there are 14 idle, and 98 total children
    [Mon Jun 30 17:55:44 2014] [info] server seems busy, (you may need to increase StartServers, or Min/MaxSpareServers), spawning 8 children, there are 74 idle, and 99 total children
    [Mon Jun 30 17:55:54 2014] [info] server seems busy, (you may need to increase StartServers, or Min/MaxSpareServers), spawning 8 children, there are 61 idle, and 99 total children
    [Mon Jun 30 17:56:00 2014] [info] server seems busy, (you may need to increase StartServers, or Min/MaxSpareServers), spawning 8 children, there are 0 idle, and 97 total children
    [Mon Jun 30 17:56:02 2014] [info] server seems busy, (you may need to increase StartServers, or Min/MaxSpareServers), spawning 16 children, there are 36 idle, and 99 total children
    [Mon Jun 30 17:56:03 2014] [info] server seems busy, (you may need to increase StartServers, or Min/MaxSpareServers), spawning 32 children, there are 39 idle, and 99 total children
    [Mon Jun 30 18:08:17 2014] [info] server seems busy, (you may need to increase StartServers, or Min/MaxSpareServers), spawning 8 children, there are 18 idle, and 99 total children
    [Mon Jun 30 18:08:18 2014] [info] server seems busy, (you may need to increase StartServers, or Min/MaxSpareServers), spawning 16 children, there are 63 idle, and 98 total children
    [Mon Jun 30 18:08:19 2014] [info] server seems busy, (you may need to increase StartServers, or Min/MaxSpareServers), spawning 32 children, there are 74 idle, and 97 total children
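To make the idle and total counts in these entries easier to compare over time, here is a minimal sketch of a parser for them. It assumes the message format is exactly as pasted above; the names `BusyEvent` and `parse_busy_events` are made up for illustration and are not part of Apache.

```python
import re
import sys
from datetime import datetime
from typing import List, NamedTuple

# Matches the prefork "server seems busy" entries in the format pasted above.
BUSY_RE = re.compile(
    r"\[(?P<ts>[^\]]+)\] \[info\] server seems busy, .*?"
    r"spawning (?P<spawning>\d+) children, "
    r"there are (?P<idle>\d+) idle, "
    r"and (?P<total>\d+) total children"
)


class BusyEvent(NamedTuple):
    """One parsed log entry (hypothetical helper type)."""
    when: datetime
    spawning: int
    idle: int
    total: int


def parse_busy_events(text: str) -> List[BusyEvent]:
    """Return every 'server seems busy' entry found in text, in order."""
    events = []
    for m in BUSY_RE.finditer(text):
        # Timestamp format as it appears in error_log, e.g. "Mon Jun 30 17:55:17 2014".
        when = datetime.strptime(m.group("ts"), "%a %b %d %H:%M:%S %Y")
        events.append(
            BusyEvent(
                when,
                int(m.group("spawning")),
                int(m.group("idle")),
                int(m.group("total")),
            )
        )
    return events


if __name__ == "__main__":
    # Read the log from standard input and print one row per entry.
    for event in parse_busy_events(sys.stdin.read()):
        print(event.when.isoformat(), event.spawning, event.idle, event.total)
```

Saved under a name of your choosing (say `parse_busy.py`), it could be run as `python parse_busy.py < error_log` to get one row per entry with the timestamp, children spawned, idle count, and total count.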
Apache Config (old config commented out)
Just showing the config items that I suspect are relevant:
    #Timeout 60
    Timeout 20
    KeepAlive on
    MaxKeepAliveRequests 1000
    KeepAliveTimeout 2
    <IfModule prefork.c>
    StartServers 85
    MinSpareServers 85
    MaxSpareServers 100
    ServerLimit 100
    MaxClients 100
    #StartServers 60
    #MinSpareServers 60
    #MaxSpareServers 85
    #ServerLimit 85
    #MaxClients 85
    MaxRequestsPerChild 1000
    </IfModule>
Note that there's no difference between old and new configs in behavior.
Environment
EC2, c1.medium, mod_perl, persistent database connections, separate RDS server, no errors showing in MySQL error logs and no errors showing in Apache logs
As an aside, I've seen suggestions to install mod_status, but I haven't figured out how to do so, and I don't know what to look for if I do.