I have been stuck on this for a while now: how can I get more detail on where the response time is being spent?
My problem is the extreme variance in response times. Sometimes the server takes 5 or 10 seconds or more to respond (especially for the first call). Firebug shows most of this time as "waiting". When I check localhost/server-status (which is affected by the delay as well), most slots are occupied, but half a second later they are free again. I can hardly imagine that there are enough load spikes to explain this behavior.
Another strange thing: requests for 100K JPG images sometimes take 1, 2, or even 10 seconds according to server-status (column Req). At the same time, PHP scripts that involve some CPU load are handled in 100 ms or less (although others also need 1 or 2 seconds). Requests for other (smaller) GIF or PNG images are even listed with a time of 0 ms.
This is where I am stuck: Is there any way to see what takes 10 seconds to send a simple JPG image?
Thanks for your good ideas!
-
System: I am talking about an Apache 2 webserver on Debian Linux (Squeeze) that mostly delivers PHP-generated pages and images. The server is running on a VPS at a professional German hosting provider. There is no memory swapping on the server (as far as I can see from the stats) and CPU load is not especially high (uptime reports a load average around 3 that can rise to about 32 under extreme load; I think it should be an 8-CPU system). Of course, I can never be sure what the other VPSs on the same host are doing.
Special Settings: Notably, the server sends all data via SSL. I also reduced the keep-alive timeout to 1 second, because users typically spend a long time on each page (30-60 sec.), and keeping these connections alive after the image(s) are retrieved would quickly exhaust the server's memory (or rather the 2 GB I may use on the VPS). Due to larger PHP scripts, a typical thread takes up 20 MB of RAM. Therefore there are only 50 server slots (MaxClients), of which 35 support keep-alive.
Material: I created a test page (https://www.soscisurvey.de/example/?debug&password=demo) that is monitored by the service site24x7.com (it usually responds in 1.4 seconds, but there are regular spikes of up to 20 or 30 seconds). To cross-check the results, I submitted it to Load Impact as well: http://loadimpact.com/load-test/www.soscisurvey.de-35648bef3b84d3269e1fc7cb11bf1721
The TamperData plugin for Firefox will show you explicitly what you're downloading from the server and how long each item is taking:
https://addons.mozilla.org/en-US/firefox/addon/tamper-data/
However, you may also have a problem resolving DNS if a download is taking 10 seconds.
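To check whether the time goes into DNS, the connection, the SSL handshake, or waiting for the server, one option is to time each phase of a single request from the client side. The following is only a minimal sketch using the Python standard library; the function name `time_request` and the phase labels are my own choices for illustration, not part of any tool mentioned here.

```python
import socket
import ssl
import time


def time_request(host, port=443, path="/", use_tls=True, timeout=30.0):
    """Return the seconds spent in each phase of one HTTP/1.1 GET request."""
    timings = {}

    t0 = time.perf_counter()
    # DNS lookup; repeated calls may be answered from a resolver cache.
    family, socktype, proto, _, addr = socket.getaddrinfo(
        host, port, type=socket.SOCK_STREAM)[0]
    t1 = time.perf_counter()
    timings["dns"] = t1 - t0

    sock = socket.socket(family, socktype, proto)
    sock.settimeout(timeout)
    try:
        # TCP handshake.
        sock.connect(addr)
        t2 = time.perf_counter()
        timings["connect"] = t2 - t1

        # SSL/TLS handshake (close to zero when use_tls is False).
        if use_tls:
            context = ssl.create_default_context()
            sock = context.wrap_socket(sock, server_hostname=host)
        t3 = time.perf_counter()
        timings["tls"] = t3 - t2

        request = (
            f"GET {path} HTTP/1.1\r\n"
            f"Host: {host}\r\n"
            "Connection: close\r\n\r\n"
        )
        sock.sendall(request.encode("ascii"))

        # Time until the first bytes arrive: roughly Firebug's "waiting".
        first = sock.recv(65536)
        t4 = time.perf_counter()
        timings["waiting"] = t4 - t3

        # Time to receive the rest of the response.
        size = len(first)
        while True:
            chunk = sock.recv(65536)
            if not chunk:
                break
            size += len(chunk)
        t5 = time.perf_counter()
        timings["download"] = t5 - t4
        timings["bytes"] = size
    finally:
        sock.close()
    return timings
```

Calling `time_request("www.soscisurvey.de", path="/example/?debug&password=demo")` in a loop and printing the result should show which phase grows during a slow request. A large `dns` value points at name resolution, while a large `waiting` value points at the server itself.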
You may also want to look into apachetop. Install it on your Apache webserver. I have it installed on mine and check it from time to time. It will show you the pages with the highest load:
http://www.howtogeek.com/howto/ubuntu/monitor-your-website-in-real-time-with-apachetop/
Adding this as an answer rather than just a comment, since this is what it turned out to be.
This sounded like a disk latency issue. There were several reasons I suspected this was the problem.
As you are not in control of the hardware, you have limited ways to solve this problem. You can contact the provider and have them try to fix it, use a RAM-backed filesystem or in-memory cache (which you experimented with), or switch providers.
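If you want some numbers to show the provider, a rough way to look for disk latency spikes is to time small random reads from a large file on the affected volume. This is only a sketch: the helper names `read_latencies` and `summarize` are made up for this example, and reads that are served from the OS page cache will look fast regardless of the disk, so it works best on a file much larger than the available RAM or one that has not been read recently.

```python
import os
import random
import statistics
import time


def read_latencies(path, samples=50, block_size=4096):
    """Time random block reads from `path`; returns latencies in milliseconds."""
    size = os.path.getsize(path)
    latencies = []
    # buffering=0 disables Python's own buffer; the OS page cache still applies.
    with open(path, "rb", buffering=0) as f:
        for _ in range(samples):
            offset = random.randrange(max(1, size - block_size + 1))
            start = time.perf_counter()
            f.seek(offset)
            f.read(block_size)
            latencies.append((time.perf_counter() - start) * 1000.0)
    return latencies


def summarize(latencies):
    """Return the median, an approximate 95th percentile, and the maximum."""
    ordered = sorted(latencies)
    return {
        "median_ms": statistics.median(ordered),
        "p95_ms": ordered[int(0.95 * (len(ordered) - 1))],
        "max_ms": ordered[-1],
    }
```

Running `summarize(read_latencies(some_large_file))` every few seconds, where `some_large_file` is a path you pick yourself, and logging the result would show whether the maximum occasionally jumps to seconds while the median stays low, which is the pattern you would expect from a contended disk on a shared host.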