So, I have been playing around with my Auto Scaling config and my CloudWatch alarms to try and keep all my instances purring but not roaring.
I can't seem to get rid of a constant yoyo. CPU usage goes up, introduce an instance, CPU usage goes down, kill an instance. Rinse and repeat.
I'm currently basing my alarm on 3 x 1-minute periods of average CPU >= 40%. Maybe I can base it on something else? CPU is a tricky one: when the graph is spiking, I can see some instances with idle CPU, so the average is being pushed up by a single instance.
I'm finding some people are getting 502s when I'm getting 200s. Obviously I would like this to be consistent, and I'd like to stop the constant spiking.
Thanks in advance.
EDIT 1: I have adjusted the CloudWatch alarm to trigger on 20% CPU over 2 minutes, and I also found an nginx error that may have contributed some additional load. Current graph looks like the below.
EDIT 2: Monitoring on load is so much better. See below for the load alarm. I'm getting alerts far less frequently and everything is running much nicer.
This is what I'm running in cron every minute:
/usr/local/bin/aws cloudwatch put-metric-data --namespace="NS" --metric-name="GroupLoad" --value "$(awk '{print $1}' /proc/loadavg)" --dimensions AutoScalingWebGroup=NS-WebGroup
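For anyone who would rather keep this in a script than in a cron one-liner, here is a rough sketch of the same idea. The `load1` and `publish_load` function names and the `AWS` override variable are made up for illustration. The namespace, metric name and dimension are the ones from the cron line.

```shell
#!/bin/sh
# Sketch only: same metric as the cron one-liner, split into small functions.

# Path to the AWS CLI; override AWS for a dry run (e.g. AWS=echo).
AWS="${AWS:-/usr/local/bin/aws}"

# Print the 1-minute load average (first field) from a loadavg-style file.
load1() {
    awk '{print $1}' "${1:-/proc/loadavg}"
}

# Send one data point. NS, GroupLoad and NS-WebGroup are the names from cron.
# An optional first argument points at an alternative loadavg-style file.
publish_load() {
    "$AWS" cloudwatch put-metric-data \
        --namespace "NS" \
        --metric-name "GroupLoad" \
        --value "$(load1 "${1:-/proc/loadavg}")" \
        --dimensions AutoScalingWebGroup=NS-WebGroup
}

# Publish only when invoked with "publish", so sourcing has no side effects.
if [ "${1:-}" = "publish" ]; then
    publish_load
fi
```

Setting `AWS=echo` before calling `publish_load` prints the command that would be sent instead of calling CloudWatch, which is handy for checking the value before wiring it into cron.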
Instead of scaling on CPU, try scaling on server load.
AWS AutoScaling can operate on any CloudWatch metric, and you can write your own custom CloudWatch metrics.
More Information on how AutoScaling works: http://docs.aws.amazon.com/AutoScaling/latest/DeveloperGuide/as-scale-based-on-demand.html
Creating a custom metric: http://aws.amazon.com/blogs/aws/amazon-cloudwatch-user-defined-metrics/
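Once a custom load metric is being published, an alarm on it can drive the scaling policy. Below is a minimal sketch, assuming the metric is called GroupLoad in namespace NS with the dimension AutoScalingWebGroup=NS-WebGroup, as in the question. The alarm name, the `create_load_alarm` function, the threshold and the policy ARN are all placeholders, and the period and evaluation count simply mirror the 3 x 1-minute setup described in the question.

```shell
#!/bin/sh
# Sketch only: an alarm on the custom GroupLoad metric.

# Path to the AWS CLI; override AWS for a dry run (e.g. AWS=echo).
AWS="${AWS:-/usr/local/bin/aws}"

# $1 = load threshold (hypothetical value, tune for your instance size)
# $2 = ARN of an existing scale-out policy (placeholder, not created here)
create_load_alarm() {
    "$AWS" cloudwatch put-metric-alarm \
        --alarm-name "NS-WebGroup-high-load" \
        --namespace "NS" \
        --metric-name "GroupLoad" \
        --dimensions Name=AutoScalingWebGroup,Value=NS-WebGroup \
        --statistic Average \
        --period 60 \
        --evaluation-periods 3 \
        --comparison-operator GreaterThanOrEqualToThreshold \
        --threshold "$1" \
        --alarm-actions "$2"
}
```

It would be called as something like `create_load_alarm 2 "<your scale-out policy ARN>"`. A second alarm for scaling in, with a noticeably lower threshold than the scale-out one, should help keep the two from firing back to back, which is likely part of what causes the yoyo.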