I'm maintaining, and planning capacity for, an EC2 instance running Apache2 on Ubuntu that currently receives up to about 10,000 (very simple) requests per hour. Each request is just some data coming in by POST, and the response is a dummy plain-text hyphen.
The number of requests will rise gradually over time to about a million per hour.
How do I (reliably) detect that the server has been brought to its knees and can no longer handle the incoming requests?
What I am doing at the moment is simply checking memory and CPU load in htop - and if those aren't bordering on full capacity, I assume everything is fine.
To quantify Apache's performance as experienced by end users, the time taken to serve a response is a useful metric, and it will increase with load on the server. I typically combine logging of this value with some web analytics software such as awstats or webalizer.
Unfortunately the default log format does not show this, so I use a custom log format in Apache. The custom log format directive %D gives the request time (the time taken to serve the request, in microseconds).
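A sketch of what such a LogFormat might look like - essentially the combined format with %D appended (the nickname "withtime" is just an illustrative name; adjust the rest of the format to taste):

    # Combined log format plus %D: time taken to serve the request, in microseconds
    LogFormat "%h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\" %D" withtime
    CustomLog ${APACHE_LOG_DIR}/access.log withtime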
Apache documentation:
http://httpd.apache.org/docs/current/mod/mod_log_config.html
Example from here:
http://www.moeding.net/archives/33-Logging-Apache-response-times.html
I think you're right to watch system resources: load average, IO load, memory, swap, CPU, etc.
You'll probably also benefit from some detail on Apache's internal status, like what its processes are actually doing.
http://www.tecmint.com/monitor-apache-web-server-load-and-page-statistics/
An example of what mod_status can show you, from www.apache.org:
http://www.apache.org/server-status
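If you want to expose the same thing on your own server, a rough sketch of enabling it (this assumes mod_status is enabled, e.g. with a2enmod status on Ubuntu; the IP range is a placeholder for wherever you monitor from):

    # Show full per-worker details on the status page
    ExtendedStatus On
    <Location "/server-status">
        SetHandler server-status
        # Restrict the status page to your monitoring network (placeholder range)
        Require ip 203.0.113.0/24
    </Location>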
This might help you collect that information over time, to look at later as a whole:
https://httpd.apache.org/docs/2.4/programs/log_server_status.html
Depending on your setup, you'll also need to watch independently the performance of any backend services Apache relies on, such as database servers.
This is how I do capacity planning for web servers.
The first consideration is that htop/top aren't suitable for this kind of analysis. They only show the most extreme short-term view, which is great for performance analysis but stinks for capacity planning. You really need to look over a longer timeframe to judge this accurately. It's like sampling a few people to determine the religious make-up of a country: how do you know that the short period you were watching was representative of what goes on with the server the rest of the time? What about when you're asleep? Do you even know whether it is the peak time or not?
htop also only uses a 2-second refresh interval by default, so spikes come and go, and they may or may not be important. Really, the minimum interval for capacity planning would be 5 minutes, but I prefer 1 hour. That will smooth out the spikes and show the underlying trend, which is what you need to plan against. However, if you have reason to believe that there are shorter-term patterns (for example, if all of the transfers occur in the 10 minutes at the start of each hour), then by all means look at that as well.
Step 1: Collect the data and store it.
MOSOplot uses the collectd agent, which is able to collect apache metrics as well as system metrics.
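Not that you need MOSOplot for this - a bare collectd.conf along these lines would cover the system-level metrics (the 300-second interval matches the 5-minute granularity mentioned above; the plugins and paths shown are illustrative):

    # Sample every 5 minutes so short-lived spikes are smoothed out
    Interval 300

    # System-level metrics: CPU, memory, load average, swap, disk
    LoadPlugin cpu
    LoadPlugin memory
    LoadPlugin load
    LoadPlugin swap
    LoadPlugin disk

    # Store the results locally as RRD files (or send them to another backend)
    LoadPlugin rrdtool
    <Plugin rrdtool>
        DataDir "/var/lib/collectd/rrd"
    </Plugin>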
To get the Apache statistics (response times and request volumes), you can use the collectd tail plugin, which reads the Apache log files and extracts the data you need. There is a dedicated apache plugin, but it doesn't capture response times.
Something like this should be a start, along with the config changes outlined by Tom H to include %D.
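A sketch of what that might look like, assuming the %D value ends up as the last whitespace-separated field of each access-log line (as in the LogFormat example further up):

    LoadPlugin tail
    <Plugin "tail">
      <File "/var/log/apache2/access.log">
        Instance "apache"
        # Average response time per interval: %D is the last numeric field on the line
        <Match>
          Regex " ([0-9]+)$"
          DSType "GaugeAverage"
          Type "response_time"
          Instance "roundtrip"
        </Match>
        # Request volume: count every POST recorded in the log
        <Match>
          Regex "POST "
          DSType "CounterInc"
          Type "counter"
          Instance "requests"
        </Match>
      </File>
    </Plugin>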
Step 2: Look at the key resource metrics
The number one metric to look at is the response time - unless your application is a batch system, in which case you can ignore this.
Try to correlate CPU and memory utilisation with the response time; this will give you an idea of when your system will break. Response time might be able to grow to 5x normal, but after that it could deteriorate rapidly. Some applications might not even tolerate that much, so it depends.
Here's a chart showing an example of comparing CPU to webserver hits/minute. In this case volumes are low, and it looks like we could comfortably reach 15 hits/minute. (Chart: CPU vs hits/minute.)
Step 3: When to upgrade
If you weren't able to collect response times, then you will need to use fixed thresholds. Definitely DO NOT wait until CPU and/or memory are bordering on full capacity. For a start, performance will probably begin to degrade at around 60% CPU (hourly average). Secondly, Amazon uses burst mode on some of their systems at the 60% mark, which is a good indication that they consider this a sensible threshold. If you are frequently exceeding the burst level, you should think about upgrading.
Memory should be OK up to around 90%, but given that Apache can sometimes have OOM issues (nginx is better at this), I'd be happier with 70% peak usage.
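If you are already collecting these metrics with collectd, its threshold plugin can flag when an average crosses those lines. A rough sketch, assuming collectd 5.5+ with the cpu plugin set to ValuesPercentage true and ReportByState false (so it reports a single "active" percentage) and the memory plugin set to ValuesPercentage true:

    LoadPlugin threshold
    <Plugin "threshold">
      # Notify when average active CPU passes 60%
      <Plugin "cpu">
        <Type "percent">
          Instance "active"
          WarningMax 60
          FailureMax 85
        </Type>
      </Plugin>
      # Notify when memory usage passes 70%
      <Plugin "memory">
        <Type "percent">
          Instance "used"
          WarningMax 70
          FailureMax 90
        </Type>
      </Plugin>
    </Plugin>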
Apache tends to be quite volatile with memory usage. On one webserver that we monitor, we ended up switching to nginx because it was a small instance with only 1 GB of RAM, and Apache was suffering OOM errors. Here's a chart showing how much the RAM usage jumped around under Apache compared to nginx. (Chart: memory usage, Apache vs nginx.)
Another thing to think about here is how quickly you can get approval to upgrade. While actually upgrading on AWS might be quick, assuming your app is scalable, getting the approval to upgrade at most of the clients I work with is, well, glacial! Give yourself a few weeks' or months' headroom if that's what you need.