I'm going to be running dozens of Amazon Web Services spot instances. They come and go based on market price, and are good for workloads where you care more about cost than speed. I want to monitor their performance, tracking both generic system metrics and some metrics specific to my jobs.
The monitoring packages I'm familiar with require manual per-instance configuration, and expect systems to be up all the time. I'd like each machine to be able to add itself to the monitoring set, and if it is automatically terminated, I don't want that to be flagged as a problem. I'd also like some rollup stats, like total tasks completed per hour across all machines.
What monitoring packages should I look at for this brave new world where a server might only exist for a few hours?
Take a look at Ganglia (http://ganglia.sourceforge.net/). The configuration file would be the same for all of your instances ("send metrics via UDP to host a.b.c.d"). You get a variety of basic system metrics out of the box, and it's very easy to collect new metrics (there's a "gmetric" command line tool for doing this, and you can also interface with the metric collection daemon via Python modules). You don't need to do any configuration on the server side to accept new metrics; It Just Works.
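As a rough illustration of how a job could publish its own metric, here is a minimal Python sketch that shells out to gmetric. The metric name `tasks_completed` and the helper names are made up for the example, and it assumes gmetric is on the PATH and already configured to reach your collector; check `gmetric --help` on your version for the exact options.

```python
import shutil
import subprocess


def build_gmetric_command(name, value, metric_type="uint32", units=""):
    """Build the argument list for a single gmetric invocation."""
    cmd = ["gmetric", "--name", name, "--value", str(value), "--type", metric_type]
    if units:
        cmd += ["--units", units]
    return cmd


def send_metric(name, value, metric_type="uint32", units=""):
    """Publish one metric via gmetric.

    Returns False without doing anything if gmetric is not installed,
    so a missing monitoring tool never breaks the job itself.
    """
    if shutil.which("gmetric") is None:
        return False
    subprocess.run(build_gmetric_command(name, value, metric_type, units), check=True)
    return True
```

A job would call something like `send_metric("tasks_completed", 42, units="tasks")` each time it wants to report. Because every instance runs the same code with the same configuration, nothing has to be registered per host.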
Note that Ganglia is a metric collection tool; it doesn't do any sort of alerting (but it's very easy to integrate with, say, Nagios if you want that kind of thing).
If Ganglia thinks your host is down it may stop displaying metrics for that host, but they'll all come back when the server is back online. You can fake it out (i.e., make it think a host is up when it's down) using the spoofing capabilities of the gmetric tool.
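To make the spoofing idea concrete, the sketch below builds gmetric command lines that report on behalf of another host. It relies on gmetric's `--spoof` option, which takes an `IP:hostname` pair, and `--heartbeat`, which sends a spoofed heartbeat. The IP address, hostname, and function names are placeholders, and you should confirm both options against `gmetric --help` for your version before relying on them.

```python
import subprocess


def build_spoofed_metric(name, value, metric_type, spoof_ip, spoof_host):
    """Build a gmetric command that reports a metric as if it came from another host."""
    return [
        "gmetric",
        "--name", name,
        "--value", str(value),
        "--type", metric_type,
        "--spoof", f"{spoof_ip}:{spoof_host}",
    ]


def build_spoofed_heartbeat(spoof_ip, spoof_host):
    """Build a gmetric command that sends a heartbeat on behalf of another host."""
    return ["gmetric", "--heartbeat", "--spoof", f"{spoof_ip}:{spoof_host}"]


def run(cmd):
    """Execute a prepared gmetric command; raises if gmetric exits non-zero."""
    subprocess.run(cmd, check=True)
```

For example, `run(build_spoofed_heartbeat("10.0.0.5", "spot-worker-1"))` would keep a terminated worker looking alive; whether you want that for short-lived spot instances is a judgment call.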
Ganglia uses rrdtool on the backend.
I don't know of anything that gives you total tasks completed per hour out of the box. You will probably want to write a plugin for that. I'd suggest looking at Nagios Exchange: there are a lot of samples there, and one of them should make a good starting point. The problem is that your up/down requirement will be a pain with Nagios and similar approaches. You need the customization available from Nagios plugins, but in a cloud monitoring model. I'm not sure whether Cacti or Ganglia fits. I'm pretty sure AppFirst can do this.
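For a sense of what such a plugin might look like, here is a minimal sketch in the usual Nagios plugin style: one status line, optional performance data after a `|`, and an exit code of 0 for OK or 1 for WARNING. How the per-host counts get collected is left out entirely; the `per_host_counts` dictionary, the `warn_below` threshold, and the function name are assumptions made for the example.

```python
OK, WARNING = 0, 1


def check_total_tasks(per_host_counts, warn_below):
    """Sum per-host task counts for the last hour into one plugin-style result.

    per_host_counts maps hostname -> tasks completed in the last hour.
    Hosts that have been terminated are simply absent from the mapping,
    so a vanished spot instance is not treated as a failure by itself.
    """
    total = sum(per_host_counts.values())
    if total < warn_below:
        status, label = WARNING, "WARNING"
    else:
        status, label = OK, "OK"
    message = (
        f"{label} - {total} tasks completed in the last hour "
        f"across {len(per_host_counts)} hosts | tasks_per_hour={total}"
    )
    return status, message
```

A wrapper script would print the message and exit with the returned status. Alerting on the fleet-wide total rather than on individual hosts is one way around the up/down problem.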