We have a new client for whom we're reviewing our server infrastructure.
I know the web API pretty well because I helped build it, and I'm now maintaining and pushing it forward on my own, so it's a big challenge and very interesting.
It runs on an Amazon m1.large instance, with nginx (+ SSL), Django, Amazon RDS (MySQL) and, for now, a self-hosted memcached.
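For reference, the memcached piece is wired into Django roughly like this (the host, port and backend below are illustrative placeholders, not our exact settings):

    # settings.py (sketch) -- pointing Django at the self-hosted memcached instance.
    # Host and port are placeholders for our actual box.
    CACHES = {
        "default": {
            "BACKEND": "django.core.cache.backends.memcached.MemcachedCache",
            "LOCATION": "127.0.0.1:11211",
        }
    }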
The thing is, our client told us they expect something like a maximum of 2,500 users connecting to the API over a four-hour window, at least twice a day.
We have no idea when exactly those connections will arrive and we shouldn't make any assumptions, so I ended up concluding that our server had better support all 2,500 connections at a single point in time.
I've been playing around with ApacheBench, sending 2,500 concurrent connections while enabling/disabling memcached or changing some nginx settings, just to see how the performance changes.
The best I got was around 100 requests per second, but the slowest requests take more than 20 seconds at 2,500 concurrent connections (with only 100 concurrent, requests take at most 1 s). From a user's point of view, I wouldn't want to wait more than 1 or 2 seconds to get my result...
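If I apply Little's Law as a rough sanity check (concurrent requests ≈ throughput × latency), those numbers actually fit together, and it also gives me a rough throughput target (the 2 s figure is just my own wish, not something the client asked for):

    # Rough sanity check: concurrency ~= throughput * latency (Little's Law).
    concurrency = 2500                   # connections ab keeps open
    throughput = 100                     # requests/second I measured
    print(concurrency / throughput)      # ~25 s average latency, consistent with the 20+ s I see

    target_latency = 2                   # seconds a user should wait, at most
    print(concurrency / target_latency)  # ~1250 req/s needed if all 2500 really hit at once

So if the "2,500 at the same instant" assumption holds, the goal would be on the order of 1,000+ requests per second, which is a long way from the 100 I'm measuring.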
I'd like to keep playing with all the settings I can tune in nginx, Django, MySQL or memcached, but at this point I think I need a methodology, and more than a methodology, I need a goal to reach.
Searching the web, I see blog posts about services that handle several hundred requests per second. I'm far from that.
All those numbers coming out of ApacheBench just give me the impression that I'm launching tests and looking at results, but that I don't really understand them and don't really know what to do with them to improve our API.
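One thing I'm considering, just to get numbers I can reason about (latency percentiles and error counts rather than ab's summary), is a small probe script along these lines. The endpoint URL and the request counts are made up, and it would have to run from a separate machine so the client doesn't skew the results:

    # Minimal concurrent latency probe (sketch); endpoint and counts are placeholders.
    import time
    import urllib.request
    from concurrent.futures import ThreadPoolExecutor

    URL = "https://api.example.com/some-endpoint/"   # hypothetical endpoint
    CONCURRENCY = 100                                # ramp this up step by step
    TOTAL_REQUESTS = 1000

    def timed_get(_):
        start = time.time()
        try:
            with urllib.request.urlopen(URL, timeout=30) as resp:
                resp.read()
            ok = True
        except Exception:
            ok = False
        return time.time() - start, ok

    with ThreadPoolExecutor(max_workers=CONCURRENCY) as pool:
        results = list(pool.map(timed_get, range(TOTAL_REQUESTS)))

    latencies = sorted(t for t, ok in results if ok)
    errors = sum(1 for _, ok in results if not ok)

    def pct(p):
        # approximate percentile from the sorted latency list
        return latencies[min(len(latencies) - 1, int(len(latencies) * p / 100))]

    print("ok=%d errors=%d" % (len(latencies), errors))
    print("p50=%.2fs p95=%.2fs p99=%.2fs max=%.2fs" % (pct(50), pct(95), pct(99), latencies[-1]))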
So what would be a good methodology, a good approach, to reach the goal of a web API that can cope with this number of connections and respond as quickly as possible?
If you need more details just ask!