I have a website, which queries a Varnish server, which queries an Apache server, which queries a db server.
At 07:00:00, a request is sent to the Apache server, which triggers a db request that takes over 30 seconds to process. While the db server is "locked", concurrent db requests pile up, causing Apache requests to pile up as well. So far, this is not my issue.
In the meantime, Varnish polls Apache every 5 seconds, with a 1 second timeout. The probe target is an empty html file.
The Apache log tells me that every poll is answered with a 200 status code.
I get the following results from the combined Varnish/Apache logs:
Polled at   Served at   Delay (s)
07:00:26    07:00:26    0
07:00:31    07:00:34    3
07:00:37    07:01:01    24
07:00:43    07:01:01    18
07:00:49    07:01:01    12
07:00:55    07:01:01    6
07:01:01    07:01:01    0
07:01:06    07:01:06    0
What I don't understand is the following:
- Given that Apache serves every polling request, it should mean that MaxClients has not been reached. Otherwise, I guess Apache would reject any new incoming polling requests. Am I right?
- If Apache can accept connections for the polling requests, why is the response delayed? Serving an empty html file should be as fast as usual, even if many concurrent requests are still waiting for the db to "unlock". The timing suggests that Apache somehow needs the db to unlock, and other requests to be served, before it can process the polling request.
The delay causes Varnish to believe that my server is "unhealthy", so it automatically rejects all following requests, even though they could all be served within 30 seconds.
Varnish config:
backend foo {
    .timeout = 60s;
    .probe = {
        .url = "/check.html";
        .interval = 5s;
        .timeout = 1s;
        .window = 10;
        .threshold = 8;
    }
}
Apache configuration:
Timeout 300
KeepAlive On
MaxKeepAliveRequests 100
KeepAliveTimeout 15
<IfModule mpm_prefork_module>
    StartServers 5
    MinSpareServers 5
    MaxSpareServers 20
    ServerLimit 200
    MaxClients 200
    MaxRequestsPerChild 0
</IfModule>
Don't hesitate to ask for more configuration information or logs.
Apache does allow a queue of pending connections to build up if all of its HTTP workers are busy. This is controlled by the ListenBackLog directive:
https://httpd.apache.org/docs/current/mod/mpm_common.html#listenbacklog
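So it's possible that the probe requests are entering this queue when all the other requests back up, and that is causing your delay. A queued connection is not rejected; it simply waits until a worker becomes free, which would explain why every probe eventually gets a 200, but only once the db has unlocked.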
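For illustration only, this is roughly where the directive would sit in the main server configuration. It is a sketch rather than a tuning recommendation: 511 is the documented default, so setting it explicitly changes nothing, and the operating system may cap the effective value.

```apache
# Maximum length of the queue of pending connections.
# 511 is the documented default; shown explicitly only for illustration.
# The operating system may silently limit the effective value.
ListenBackLog 511
```

With MaxClients 200 as in your configuration, a connection that arrives while all 200 workers are busy would wait in this queue instead of being refused.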
I would also enable the /server-status handler and monitor it while your server is getting busy, rather than once it is already busy, because by then Apache won't be able to serve the server-status page.
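A minimal sketch of enabling the status page, assuming mod_status is loaded and that you are on Apache 2.4 (your use of MaxClients suggests you may be on 2.2, where access control uses the older Order/Deny/Allow directives instead of Require):

```apache
# Requires mod_status to be loaded.
# ExtendedStatus adds per-request details to the status page.
ExtendedStatus On
<Location "/server-status">
    SetHandler server-status
    # Apache 2.4 syntax: only allow requests coming from the server itself.
    Require local
</Location>
```

You can then request /server-status?auto from the server itself to get a machine-readable summary of busy and idle workers, which is convenient to poll from a script while the load builds up.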
Another trick is to add %D to your access log format, as that will tell you the time (in microseconds) Apache took to serve a request, from when it first received it to when it completed the request.
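As a sketch, assuming you currently log in the standard combined format, you could append %D like this. The nickname combined_timed and the log path are made up for the example, so adjust them to your setup:

```apache
# Standard "combined" format with %D (time taken to serve the request,
# in microseconds) appended as the last field.
# "combined_timed" is a hypothetical nickname; the log path is an assumption.
LogFormat "%h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\" %D" combined_timed
CustomLog "logs/access_log" combined_timed
```

If the probe requests for /check.html show a small %D while Varnish still sees a long delay, that would point towards the time being spent waiting in the queue rather than inside Apache.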