I have a small RoR app that generates images using RMagick and the performance is not anywhere near where I thought it would be. The server configuration is a 64-bit Ubuntu 11.04 image on EC2 running Rails 3.1RC4 / Ruby 1.9.2 / Nginx / Passenger. I've tried several different Amazon instance types, from small (with a 32-bit image) to c1.xlarge and the results are very similar.
Firing up ApacheBench (ab -n 10 -c 1) gives me an acceptable average response time of 300ms and the test finishes in about 2 seconds, but increasing this to 5 or even just 2 concurrent requests pushes the average response time up to 1500ms and the test takes longer overall. Even on the big instance type, an ab run (ab -n 10 -c 10) slows the system down to 5000ms per request. I would expect response times to stay pretty much consistent while the total time drops. Is this an incorrect assumption? On every test the memory never climbs very high at all (< 1GB) but the CPUs are working at 100%. My MacBook Pro running in development mode can match these numbers.
It almost seems like something is single threaded somewhere. The app is almost as simple as you can get and the only complex thing is the RMagick calls. Is there something with RMagick that is causing threading problems? Is there a better RoR app server for this type of thing (Unicorn, Mongrel cluster, etc.)? Am I using ApacheBench incorrectly?
UPDATE
I have added some new text-only routes to the config and they perform very well. Returning 32K of plain text barely causes the CPU any problems and I can reach 72 req/sec (which is probably limited by my internet connection and not the EC2 server). Returning 5 bytes gets me over 250 req/sec.
There are so many possibilities... the fact that there's an apparent lack of concurrency would point to a lack of proper Passenger configuration (passenger_max_pool_size is the key variable), but with rmagick in the mix, the problem could be disk I/O (EBS volumes have horrendous, and variable, performance). On the other hand, the fact that your system stats say that CPU is maxing out would point to either rmagick chewing CPU like crazy (doing what?), or some other inefficiency in your code causing the CPUs to peg (although if you're managing to pull that on a c1.xlarge, I'm impressed).
Increase your passenger pool size and collect better per-process and system-wide statistics on what's actually happening, and the answer should present itself.
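As a rough illustration of why the pool size matters, here is a minimal Ruby sketch of a toy model. It is an assumption for illustration, not a measurement: each worker process handles one request at a time, every request costs the same fixed service time, and ab keeps a fixed number of requests in flight. The method name expected_latency_ms is hypothetical.

```ruby
# Toy model (assumption): `workers` processes each serve one request at a
# time, every request takes `service_ms`, and the load generator keeps
# `concurrency` requests in flight. Requests beyond the number of workers
# wait in a queue, so the worst-off request in a batch waits for
# ceil(concurrency / workers) service times.
def expected_latency_ms(service_ms, concurrency, workers)
  raise ArgumentError, 'workers must be positive' unless workers.positive?

  rounds = (concurrency.to_f / workers).ceil
  service_ms * rounds
end

puts expected_latency_ms(300, 5, 1)  # => 1500 (fully serialized)
puts expected_latency_ms(300, 5, 5)  # => 300  (one worker per request)
puts expected_latency_ms(300, 10, 4) # => 900  (three rounds of work)
```

With a 300ms service time and one effective worker, 5 concurrent requests come out at about 1500ms, which is what the question reports. That is consistent with requests being serialized somewhere, although the model says nothing about where.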
We just tracked down the cause of a performance problem with a Rails 3 + Nginx + Passenger + PostgreSQL stack on EC2 micro instances. Micro instances are subject to CPU throttling, which is explained quite well by this blog post: http://gregsramblings.com/2011/02/07/amazon-ec2-micro-instance-cpu-steal/
We ran a bunch of ab stress tests and couldn't really find a bottleneck, yet the whole system was slowing down. That's when we saw CPU steal at 100%.
The solution is either to switch to a small instance type, which doesn't seem to be throttled, or to try another hosting service such as Rackspace.
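If you want to check for steal yourself without watching top, here is a minimal Ruby sketch that computes the steal percentage from two samples of the aggregate "cpu" line in /proc/stat. It assumes a Linux kernel that reports the steal column (the eighth counter); the method names are hypothetical helpers, not part of any library.

```ruby
# Order of the first eight counters on the aggregate "cpu" line of
# /proc/stat on Linux. Values are cumulative ticks since boot.
CPU_FIELDS = %i[user nice system idle iowait irq softirq steal].freeze

# Turn a line such as "cpu  100 0 50 800 10 0 0 40 0 0" into a hash.
def parse_cpu_line(line)
  values = line.split.drop(1).first(CPU_FIELDS.size).map(&:to_i)
  CPU_FIELDS.zip(values).to_h
end

# Share of CPU time stolen by the hypervisor between two samples, in percent.
def steal_percent(before, after)
  deltas = CPU_FIELDS.map { |field| [field, after[field] - before[field]] }.to_h
  total = deltas.values.sum
  return 0.0 if total.zero?

  100.0 * deltas[:steal] / total
end

# Sample /proc/stat twice, `interval` seconds apart (Linux only).
def sample_steal(interval = 1)
  first = parse_cpu_line(File.foreach('/proc/stat').first)
  sleep interval
  second = parse_cpu_line(File.foreach('/proc/stat').first)
  steal_percent(first, second)
end
```

Calling sample_steal while ab is running should give a number comparable to the %st column in top. A value that stays high under load would suggest the instance is being throttled rather than the app being slow.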
The answer to this riddle was to recompile ImageMagick and turn off OpenMP. Apparently, the threads were all fighting for control and the process switching was completely killing performance. After the recompile a single ECU could handle more requests than 16 ECUs with OpenMP turned on. Amazing!
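For anyone who can't easily recompile (ImageMagick's configure script accepts --disable-openmp for that), a possible alternative is to cap the thread count through environment variables before the library loads. The sketch below is an assumption on my part and not what was tested above: MAGICK_THREAD_LIMIT is ImageMagick's documented thread limit, OMP_NUM_THREADS is the standard OpenMP setting, and limit_imagemagick_threads! is a hypothetical helper name.

```ruby
# Cap ImageMagick / OpenMP worker threads for this process.
# Assumption: both variables are read when the native library initializes,
# so this has to run before RMagick is required.
def limit_imagemagick_threads!(threads = 1)
  ENV['MAGICK_THREAD_LIMIT'] = threads.to_s # ImageMagick's own limit
  ENV['OMP_NUM_THREADS'] = threads.to_s     # generic OpenMP runtime limit
  threads
end

limit_imagemagick_threads!(1)
# Require RMagick only after this point so the limits are already in place.
```

Under Passenger this would need to run early in the boot process, or the variables could be set in the environment that starts the workers instead. Whether it matches the gain from a non-OpenMP build is something to measure with the same ab runs.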