I don't understand the performance I'm seeing from Apache. I would expect that more concurrent Apache requests would yield higher throughput than fewer, up to a point, but beyond 3 concurrent requests, overall performance is flat. For example, I see the same requests / sec whether I've got 3 or 4 concurrent requests. With each additional concurrent request, the average response time increases just enough that the overall request handling rate stays the same.
To test this, I created a new Ubuntu 10.04 VM on Slicehost. It's a 4-core VM. I set it up with:
aptitude update
aptitude install apache2 apache2-utils curl
curl localhost/ # verify hello world static page works
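Since concurrency behavior depends heavily on which MPM is in use, the one shipped with the stock Ubuntu package can be checked with something like:
apache2ctl -V | grep -i mpm   # prints e.g. "Server MPM: Worker" or "Prefork"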
Then I benchmarked the response time and reqs / sec.
Edit 4: I benchmarked with something like "for x in $(seq 1 40); do ab -n 10000 -c $x -q localhost/ | grep whatever; done".
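Spelled out, the loop was along these lines (the grep pattern here is illustrative; it just pulls ab's throughput and mean latency lines):
for x in $(seq 1 40); do
    echo "concurrency: $x"
    # grep pattern shown for illustration -- it extracts the throughput and mean latency lines
    ab -n 10000 -c "$x" -q http://localhost/ | grep -E 'Requests per second|Time per request'
done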
The exact commands and data are at https://docs.google.com/spreadsheet/pub?hl=en_US&hl=en_US&key=0AurdDQB5QBe7dGtiLUc1SWdOeWQ4dGo3VDI5Yk8zbWc&output=html
CPU usage was about 25% on each core while running the tests.
Edit 2: Memory usage was at 45 / 245 MB according to htop.
Edit 1: I just tried the same thing on an Ubuntu 11.04 VM and the overall issue is the same, but the performance is even worse: it gets around 2100 reqs / sec at most concurrency levels and uses about 50% CPU on each core.
Edit 3: I tried it on real hardware and saw a peak in reqs / sec around 4 or 5 concurrent requests, after which it dropped a little and flattened out.
Can anyone explain why this is happening, how I can figure out what the bottleneck is, or how I can improve this? I've done some searching and haven't found any answers.
It sounds like you're seeing exactly what you said you expected: more concurrent requests cause Apache to perform better, up to a point, and then performance is flat. What seems to have surprised you is that the point occurs at a low number of concurrent requests.
I'm not sure why you find that surprising. There's no real disk I/O, since the page is surely in RAM, so it's a purely CPU-bound and network-bound activity. Once you have enough requests to tie up all the cores and keep the network pipe full (one response going down while another request is coming up), there's no reason that more connections waiting in line would make things any better.
So that really only leaves the question of what the limiting factor is. It's hard to tell without more details, but I'd look at the amount of system CPU usage and the network bandwidth. Most likely, either the CPU or the network interface is maxing out.
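For example, watching something like the following while the benchmark runs would show whether it's system CPU time or the network interface that saturates first (sar assumes the sysstat package is installed, which may not be the case on a fresh slice):
vmstat 1        # watch the us/sy/id columns for user vs. system CPU saturation
sar -n DEV 1    # per-interface throughput; requires the sysstat package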
You are likely seeing the impact of overhead in the network stack. With increased concurrency you have more simultaneous connections open, so the system and Apache have to work harder to open and close those connections. This typically degrades Apache performance and results in a longer average time per request at higher concurrency levels.
I also suspect you had more Apache child processes running at higher concurrency levels, and it takes time to spin these up and down.
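If child-process churn is a factor, pre-spawning more children than the defaults should hide that cost. Roughly, something like this in apache2.conf for the prefork MPM (or the analogous block for worker); the numbers are purely illustrative, not tuned values for your slice:
<IfModule mpm_prefork_module>
    StartServers          10
    MinSpareServers       10
    MaxSpareServers       20
    MaxClients           150
    MaxRequestsPerChild    0
</IfModule>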
Network issues can be further complicated if you are running the test on the same system as the web server.
Tuning your TCP/IP stack, KeepAlive settings (if on), and Timeouts could improve this.
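For instance, along these lines (illustrative values, not tested recommendations for this workload):
# In apache2.conf, the relevant directives are e.g.:
#   KeepAlive On
#   KeepAliveTimeout 2
#   Timeout 30
# (note: ab only reuses connections when run with -k)
# TCP stack knobs that affect connection churn during a benchmark:
sysctl -w net.ipv4.tcp_tw_reuse=1
sysctl -w net.core.somaxconn=1024
sysctl -w net.ipv4.ip_local_port_range="15000 65000"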
However, this is a long-standing, well-known issue with scaling Apache.
Here's a classic article on the topic. PDF: http://www.stdlib.net/~colmmacc/Apachecon-EU2005/scaling-apache-handout.pdf
Please check out the (not yet official) performance documentation in the Apache httpd wiki:
http://wiki.apache.org/httpd/PerformanceScalingUp
A closing word: I don't know what kind of VM you are running, but the virtualization layer itself could be the performance bottleneck.
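One quick way to check for that (assuming a Linux guest with a reasonably recent kernel) is to watch CPU steal time while the benchmark runs; a consistently non-zero value means the hypervisor is withholding CPU from your guest:
vmstat 1 10     # the "st" (steal) column shows CPU time taken by the hypervisor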