I'd like to hear your approaches for monitoring Linux instances running in EC2. I'm very accustomed to using Nagios to monitor many aspects of a Web-based application's ecosystem, but its model doesn't seem to lend itself particularly well to machines that are fairly frequently destroyed and recreated. My EC2 instances are intermediated by RightScale, which has its own monitoring scheme that I'm not finding hugely useful -- though I do plan to look into its monitoring some more.
The instances in question run normal open-source stuff: MySQL, Apache, Passenger, Rails.
Many thanks in advance.
It is possible to use the EC2 tools in a script to dynamically generate a Nagios config. If all EC2 instances need the same services, associate the services with a hostgroup rather than with individual hosts, and have a script run from cron regenerate the host and hostgroup definitions. You can then do a kill -HUP on the Nagios process (or /etc/init.d/nagios reload, or svcadm refresh nagios on Solaris) to make Nagios reload the new config. This is a lightweight operation (it doesn't require a restart), so it can be done quite often. The script just has to read the list of active instances and their addresses and write a host definition for each one; a sketch of such a script is below.
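A minimal sketch of that generator, assuming the boto Python library and illustrative names (the output path, the "ec2-web" hostgroup, and the generic-host template are placeholders to adapt to your own Nagios layout); the EC2 command-line tools would work just as well from a shell script:

```python
#!/usr/bin/env python
# Cron job sketch: regenerate Nagios host/hostgroup definitions from the
# currently running EC2 instances, then reload Nagios.
import boto

HOST_TEMPLATE = """define host {{
    use                 generic-host
    host_name           {name}
    address             {address}
    hostgroups          ec2-web
}}
"""

def running_instances():
    """Yield (instance id, public DNS name) for every running instance."""
    conn = boto.connect_ec2()  # AWS credentials come from the environment
    for reservation in conn.get_all_instances():
        for inst in reservation.instances:
            if inst.state == 'running':
                yield inst.id, inst.public_dns_name

def main():
    with open('/etc/nagios3/conf.d/ec2-instances.cfg', 'w') as f:
        # One hostgroup covers all instances; services attach to it, not to hosts.
        f.write('define hostgroup {\n'
                '    hostgroup_name  ec2-web\n'
                '    alias           EC2 web instances\n'
                '}\n\n')
        for instance_id, dns_name in running_instances():
            f.write(HOST_TEMPLATE.format(name=instance_id, address=dns_name))
    # After writing the file, the cron job would reload Nagios, e.g.:
    #   /etc/init.d/nagios3 reload

if __name__ == '__main__':
    main()
```

With the host definitions regenerated on each run, a single service definition attached to the ec2-web hostgroup automatically covers every instance that exists at the time, which is the point of using hostgroups here.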
Do you want to monitor each EC2 instance or overall uptime and performance?
We don't really care what each individual instance does; instead we monitor our overall web application response time and functionality. There are a few tools for this. We like AlertFox, which runs fairly complex iMacros-based transaction-monitoring scripts for us every 15 minutes.
It might be worth looking at Cloudkick. It depends on exactly what kind of monitoring you need to do, but it's specifically designed for EC2:
https://www.cloudkick.com/
I use Ganglia to monitor my cluster:
http://ganglia.info/
Just make sure to configure it to use unicast (EC2's network doesn't support multicast) and to drop dead hosts after some amount of time, so terminated instances don't linger in the frontend.
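For reference, a minimal gmond.conf sketch for that setup; the collector hostname, cluster name, and one-hour expiry are assumptions for illustration, not values from the original answer:

```
globals {
  /* Forget hosts that stop reporting after an hour, so terminated
     EC2 instances eventually disappear from the web frontend. */
  host_dmax = 3600 /* seconds */
}

cluster {
  name = "ec2-web"
}

/* Send metrics directly to a fixed collector instead of multicasting. */
udp_send_channel {
  host = ganglia-collector.example.com
  port = 8649
}

/* Only the collector host actually needs to listen on this channel. */
udp_recv_channel {
  port = 8649
}
```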