In New Relic, one of the metrics displayed as part of the application response time is "Request Queue".
To collect request queuing time, you need to mark the HTTP request with a timestamp when queuing starts. [1]
This is done by adding an HTTP header in the Apache httpd.conf:
RequestHeader set X-Request-Start "%t"
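For completeness, a minimal httpd.conf fragment for this looks something like the following (RequestHeader comes from mod_headers; the LoadModule path varies by distro):

# mod_headers provides the RequestHeader directive
LoadModule headers_module modules/mod_headers.so

# %t expands to "t=<microseconds since the epoch>", i.e. the moment
# Apache accepted the request, so the application tier can subtract
# it from the current time to get the queuing duration
RequestHeader set X-Request-Start "%t"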
New Relic's documentation mentions that:
For the request queuing bucket, a site operator can provision more application instances.
However, we have seen that adding new application instances (i.e. web nodes) doesn't affect the request queuing time; it stays constant at around 250ms.
What factors affect the request queue length and how can it be reduced?
[1] http://support.newrelic.com/help/kb/features/tracking-front-end-time
I think the best way to do this is to increase the ServerLimit and MaxClients parameters in the Apache config.
These dictate how many requests Apache can process simultaneously (one per worker process or thread, depending on the MPM). If you have a MaxClients value of 100, Apache can process up to 100 requests at the same time.
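As a rough sketch (assuming the prefork MPM, which is the common default with mod_php; the numbers are illustrative, not recommendations), the relevant httpd.conf section looks like:

# Illustrative prefork MPM sizing - tune to your own RAM budget,
# since with prefork every client is served by a full child process
<IfModule mpm_prefork_module>
    StartServers          10
    MinSpareServers       10
    MaxSpareServers       25
    # ServerLimit is the hard upper bound on MaxClients
    ServerLimit          256
    # MaxClients = maximum simultaneous requests served
    MaxClients           256
    MaxRequestsPerChild 4000
</IfModule>

Bear in mind that MaxClients is ultimately bounded by memory: if each child weighs in at, say, 50MB (not unusual with mod_php), 256 workers is already about 12.8GB.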
It's probably also worth noting that Apache is good for small files (text, maybe CSS/JS etc.) but not so great at larger files like images, video, Flash etc., because each of those requires its own request (unless you're using keep-alive, but that doesn't improve things much). So if you've got a page with 49 external resources (50 requests in total) that takes 1 second to load, and your MaxClients is set at 100, you can only process two page views per second before requests start being queued.
You can get round this in many ways. Try offloading your static content to a CDN (pricing starts from about $0.10/GB, but if your data transfer is high it might be worth getting in touch with EdgeCast or Akamai, as their pricing is a lot cheaper in bulk). That means your server doesn't have to worry about any of the static resources required to load a page, so in the example above you're now up to 100 page views per second before requests start queuing.
If you don't want to shell out for a CDN, I'd suggest getting two IPs on your server and attaching one to Apache and one to NGINX. NGINX is a very high-performance server that can handle far more concurrent connections than Apache; because it's event-driven and non-blocking, requests don't queue up behind a fixed pool of workers the way they do in Apache. Unfortunately NGINX doesn't have all the features of Apache - you can't, for example, run PHP directly in NGINX without proxying to Apache/FCGI/HipHop etc. On the Apache side, the split is just a matter of what Apache binds to (see the sketch below).
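A minimal sketch of the Apache half, assuming 192.0.2.10 and 192.0.2.11 are your two addresses (placeholders - substitute your own):

# Bind Apache to one address only; NGINX would then listen on the
# other address (192.0.2.11) and serve the static content from there
Listen 192.0.2.10:80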
As an add-on to that: in your question you say "web nodes", so would I be right in thinking you're using Apache as a front-end load balancer/proxy server to these nodes? If so, I'd suggest you test out something like NGINX, Varnish or HAProxy, as they are much better suited to that kind of job and to handling large numbers of simultaneous connections.
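For reference, the kind of Apache front end I'm assuming you have looks roughly like this mod_proxy_balancer sketch (node addresses and ports are made up; needs mod_proxy, mod_proxy_http and mod_proxy_balancer loaded):

# Hypothetical front-end balancer config. Each proxied request ties
# up one Apache worker for its whole duration, which is exactly
# where a fixed MaxClients pool starts queuing under load.
<Proxy balancer://appcluster>
    BalancerMember http://10.0.0.1:8080
    BalancerMember http://10.0.0.2:8080
</Proxy>
ProxyPass        / balancer://appcluster/
ProxyPassReverse / balancer://appcluster/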
--
EDIT:
I thought this might interest you with regard to front-end LB servers.
We were using Apache as a front end proxying to 16 application nodes split across two servers. The proxy server was running on a quad-core Intel Core i5 (so by no means under-spec). We started noticing a parabolic relationship between requests per second and response time: at about 2,000 requests per second CPU load would shoot up and each response took about 800ms to complete; by 3,000 r/s each response was taking about 2 seconds. We switched to NGINX and hit 5,000 r/s while adding only about 50ms of average latency, and CPU load was a quarter of what it was with Apache.
Obviously it entirely depends on your situation, what you're doing and what resources you have available, but I just thought I'd give you my take on it =)
I have to ask the obvious question: the documentation states that you should use the HTTP header X-Queue-Start (or X-Queue-Time), but you've mentioned that you're using X-Request-Start. Are you adding the correct header?
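If the KB article you linked really does ask for X-Queue-Start, the fix would just be the header name - the %t value stays the same:

# Same timestamp as before, under the header name the docs ask for
RequestHeader set X-Queue-Start "%t"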