I'm testing out various Ruby on Rails hosting setups: nginx, Apache, a couple of ISPs, some cloud computing platforms, and so on.
I'm noticing that when there are only one or two simultaneous requests being handled, the average response time is often tiny (<10ms), but of course that only serves so much traffic. If I instead try to maximize requests per second, the average response time grows quickly. For instance, on one server I found that throughput peaked at around 16 simultaneous requests in flight at any one moment, but at that point the average response time was over 200ms.
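(For reference, the relationship between these three numbers is roughly Little's Law: concurrency ≈ throughput × average latency. A quick sanity check in Ruby using the figures above:)

```ruby
# Little's Law: throughput (req/s) ≈ concurrency / avg response time (s)
concurrency   = 16      # simultaneous requests in flight
avg_latency_s = 0.200   # 200 ms average response time
throughput    = concurrency / avg_latency_s
puts throughput  # => 80.0 requests/second
```

So at 16 in-flight requests and 200ms per request, the server tops out around 80 requests/second no matter how the load is shaped.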
I wonder, what tricks and tips do you web server gurus have for balancing response time against requests per second?
You should read up on the C10K problem. Generally it is about application architecture, scalability, and how to achieve massive parallelism. My addition to that would be: start out small(ish) and don't overengineer; most web services won't need that kind of scalability right away. Trying to be C10K-capable from the start can take too much time to get "right", and you should never neglect the core content of your service. A C10K-capable site without content is a dead site.
First thing to determine: is it your app or the server?
How is static file performance from each solution? Are you able to pull a 50k image with reasonable results? Are you sure it is the Ruby stack that is causing the problems?
There are a number of pieces that could be causing problems, including the application, the backend database, file storage, etc. Depending on your code, quite a bit can introduce concurrency issues. Try writing a very basic test app to see what sort of benchmarks you can get, then add in pieces to see what causes the problem. Take a look at some of the profiling gems to see if you can use those to identify where your app is spending the most time.
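A minimal sketch of such a test harness, using only the Ruby standard library (the method name and structure are my own, not from any particular tool): run a block that stands in for one request under a given concurrency, then report requests/second and average latency. Swap the block for a real HTTP call (e.g. `Net::HTTP.get_response`) when pointing it at a server under test.

```ruby
require "benchmark"

# Run the given block `requests_per_thread` times on each of `concurrency`
# threads; return overall throughput and average per-request latency.
def bench(concurrency, requests_per_thread)
  latencies = Queue.new  # thread-safe collector
  elapsed = Benchmark.realtime do
    Array.new(concurrency) {
      Thread.new do
        requests_per_thread.times { latencies << Benchmark.realtime { yield } }
      end
    }.each(&:join)
  end
  total = concurrency * requests_per_thread
  { rps:    (total / elapsed).round(1),
    avg_ms: (Array.new(total) { latencies.pop }.sum / total * 1000).round(1) }
end
```

For example, `bench(16, 50) { Net::HTTP.get_response(uri) }` lets you sweep concurrency from 1 up to 16+ and watch where requests/second flattens out while average latency keeps climbing; that knee is where the interesting bottleneck lives.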