We have a new client for whom we're reviewing our server infrastructure.
I know the web API pretty well because I helped build it, and now I'm maintaining it and pushing it forward on my own, so it's a big challenge and very interesting.
It runs on an Amazon m1.large instance with nginx (+ SSL), Django, Amazon RDS (MySQL), and, for now, a self-hosted memcached.
The thing is, our client told us to expect a maximum of about 2,500 users connecting to the API over a four-hour window, at least twice a day.
We have no idea exactly when those connections will arrive, and we shouldn't make assumptions, so I concluded that our server has to be able to support 2,500 concurrent connections at any point in time.
I've been playing around with ApacheBench, sending 2,500 concurrent connections while enabling/disabling memcached or tweaking nginx settings, just to see how performance changes.
The best I got was around 100 requests per second, but the slowest requests take more than 20 seconds (at 2,500 concurrent connections; with only 100, requests take at most 1 s). From a user's point of view, I wouldn't want to wait more than 1 or 2 seconds for a result...
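As a sanity check, Little's Law (concurrency = throughput × latency) ties those benchmark numbers together; a quick sketch using the figures above:

```python
# Little's Law: L = lambda * W, where L is the number of requests in the
# system (concurrency), lambda is throughput (req/s), and W is the average
# time each request spends in the system (seconds).

def average_latency(concurrency, throughput_rps):
    """Average latency implied by a given concurrency and throughput."""
    return concurrency / throughput_rps

def required_throughput(concurrency, target_latency_s):
    """Throughput needed to hold latency at a target for that concurrency."""
    return concurrency / target_latency_s

# At ~100 req/s with 2500 connections in flight, the average time in the
# system is about 25 s -- consistent with the 20+ s worst cases observed.
print(average_latency(2500, 100))      # -> 25.0

# To serve 2500 concurrent users at ~2 s per response, you'd need on the
# order of 1250 req/s (or fewer truly simultaneous requests than 2500).
print(required_throughput(2500, 2.0))  # -> 1250.0
```

So the 20-second tail isn't mysterious: it follows directly from pushing 2,500 concurrent requests through a ~100 req/s pipe.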
I'd like to keep playing with all the settings I can tune in nginx, Django, MySQL, or memcached, but at this point I think I need a methodology, and more than a methodology, a goal to reach.
Searching the web, I see blog posts about services that handle several hundred requests per second. I'm far from that.
All those numbers coming out of ApacheBench give me the impression that I'm launching tests and seeing results, but that I don't really understand them and don't know what to do with them to improve our API.
So what would be a good methodology, a good approach, to get a web API able to cope with this number of connections as fast as possible?
If you need more details just ask!
I have never worked with a Django setup, so I may not be able to get into Django specifics. It would be great if you could provide the CPU, IO, and memory stats for the point where you hit 100 requests per second. The 20-second delays could have varied causes depending on the nature of your resource crunch, and you won't be able to make sense of the performance statistics without knowing the health of your system under stress. A good place to start would be Amazon CloudWatch metrics, and/or enabling monitoring with Munin, Nagios, or similar, together with an appropriate graphing tool such as Graphite or Ganglia. Even tracking `vmstat` output could reveal a lot.

The key to identifying your problem is to gather enough data about your system's health and follow it. You could simply graph your traffic trend in Graphite along with other stats such as CPU usage, IO waits, context switches, number of interrupts, and available memory, and try to correlate that data. You could even split your request cycle into database, middleware, and render phases and track the time spent in each phase.
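To illustrate the phase-splitting idea, here is a minimal sketch of a timing middleware following Django's middleware protocol (a callable wrapping `get_response`), written as plain Python so it runs standalone; in a real deployment you would ship the timings to Graphite or statsd rather than keep them in a list:

```python
import time

class RequestTimingMiddleware:
    """Django-style middleware that records total wall time per request.

    This matches Django's middleware protocol (__init__ takes get_response,
    __call__ takes the request), but is shown with plain objects so the
    timing logic itself is self-contained and testable.
    """

    def __init__(self, get_response):
        self.get_response = get_response
        self.timings = []  # stand-in for a Graphite/statsd client

    def __call__(self, request):
        start = time.monotonic()
        response = self.get_response(request)  # rest of the stack runs here
        self.timings.append(time.monotonic() - start)
        return response

# Usage sketch with a fake view standing in for the rest of the stack:
def slow_view(request):
    time.sleep(0.05)  # pretend the DB + render phases took 50 ms
    return "response for %s" % request

middleware = RequestTimingMiddleware(slow_view)
middleware("/api/items/")
print("request took %.3f s" % middleware.timings[0])
```

The same wrapping trick, applied separately around your database calls and template rendering, gives you the per-phase breakdown.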
One thing to watch for is a high number of context switches in the `vmstat` output. This may happen if you run more worker processes than you have available cores. The server may also be waiting on block IO; some have experienced block IO latency on EBS, although I haven't had problems with it myself.

Hope this helps.
First you need to establish what the bottleneck is for this web service. It's probably slow DB queries and/or poor Django performance. Note that most frameworks for rapid web application development, Django included, are not really optimized for speed. Unless you can afford many servers and load balancing, you can't expect great performance.
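One quick way to check whether the database is the culprit: with `DEBUG = True`, Django records every query on `django.db.connection.queries` as a list of dicts with `'sql'` and `'time'` keys. A sketch of summarizing that list, shown here with hand-made sample data in that shape rather than a live connection:

```python
def summarize_queries(queries):
    """Summarize a django.db.connection.queries-style list of
    {'sql': ..., 'time': ...} dicts: count, total time, slowest query."""
    total = sum(float(q["time"]) for q in queries)
    slowest = max(queries, key=lambda q: float(q["time"])) if queries else None
    return {"count": len(queries), "total_time": total, "slowest": slowest}

# Sample data in the shape Django produces when DEBUG = True:
sample = [
    {"sql": "SELECT * FROM auth_user WHERE id = 1", "time": "0.002"},
    {"sql": "SELECT * FROM api_item WHERE owner_id = 1", "time": "0.850"},
]
stats = summarize_queries(sample)
print("%d queries, %.3f s total" % (stats["count"], stats["total_time"]))
print("slowest:", stats["slowest"]["sql"])
```

If one request fires dozens of queries, or one query dominates the total, that query is the one worth EXPLAINing, indexing, or caching before touching any nginx settings.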
Anyway... for starters I'd try to: