I'm going to be running dozens of Amazon Web Services (AWS) spot instances. They come and go based on the market price, and are a good fit for workloads where you care more about cost than speed. I want to monitor their performance, tracking both generic system metrics and some metrics specific to my jobs.
The monitoring packages I'm familiar with require manual per-instance configuration and expect systems to be up all the time. I'd like each machine to be able to add itself to the monitoring set when it starts, and if an instance is automatically terminated, I don't want that to be reported as a problem. I'd also like some rollup stats, like total tasks completed per hour across all machines.
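To make that concrete, here is a rough sketch of the behaviour I'm after. It is not any real package's API: the `Rollup` class, its method names, and the 300-second timeout are all made up for illustration. The idea is that instances push their own reports (so nothing is configured per instance), the collector sums tasks per hour across everything, and an instance that stops reporting just ages out of the active set instead of raising an alert.

```python
from collections import defaultdict


class Rollup:
    """Hypothetical in-memory collector, for illustration only."""

    def __init__(self):
        # instance_id -> timestamp (seconds) of its most recent report
        self.last_seen = {}
        # hour bucket (timestamp // 3600) -> tasks completed by all instances
        self.tasks_per_hour = defaultdict(int)

    def report(self, instance_id, timestamp, tasks_completed):
        # An instance registers itself simply by sending its first report.
        self.last_seen[instance_id] = timestamp
        self.tasks_per_hour[int(timestamp // 3600)] += tasks_completed

    def active_instances(self, now, timeout=300):
        # Instances that went quiet (e.g. spot termination) drop out silently.
        return sorted(
            instance_id
            for instance_id, seen in self.last_seen.items()
            if now - seen <= timeout
        )
```

So two instances reporting in the first hour would be summed together, and one that disappears later would no longer be listed as active, without being flagged as down.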
What monitoring packages should I look at for this brave new world where a server might only exist for a few hours?