I need a monitoring system, much like Ganglia / Nagios, that is built for the cloud.
I need it to support:
- Adding / removing nodes dynamically. (A node shutting down does not imply node failure...)
- Dynamic node-based categorization, meaning nodes can identify themselves as being part of group X (Ganglia gets this almost right, but lacks the dynamic part...)
- Does not require multicast support (generally not allowed in cloud based setups)
- Plugins for recent tools such as Hadoop, Cassandra, and Mongo would be a plus.
Other desirable features include an external API, a web interface, and so on.
I've looked at Ganglia and Munin, and they both seem to be almost there (but not exactly). I would also go for a reasonably priced Software as a Service solution.
I'm currently doing research, so suggestions are highly appreciated.
Thank you,
Maxim
I was going to suggest Ganglia, but I see you already considered it. There's also Icinga and Reconnoiter, the latter being a very new addition. Reconnoiter has a hosted version with an amazing interface, called Circonus.
Really, what you are talking about is some sort of config management system to manage your monitoring. In essence, you need monitoring to be provisioned when you provision a new host and deprovisioned when you shut down a host. In-house we are using Puppet to provision our hosts into Nagios, and then removing them manually, as that is not a common task. As time goes by, the deprovisioning process will be automated as well.
I would look at Chef and Puppet as the top contenders in this space.
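To make the provision/deprovision idea concrete, here is a minimal sketch (not our actual Puppet setup) that regenerates Nagios host definitions from whatever nodes are currently alive. The `Node` class, the `render_hosts` helper, and the sample inventory are hypothetical; the `generic-host` template is assumed to exist in your Nagios config, as are the host groups the nodes name.

```python
from dataclasses import dataclass
from typing import Iterable, Tuple


@dataclass(frozen=True)
class Node:
    """Hypothetical inventory record; in practice this would come from
    your cloud API or config management tool."""
    name: str
    address: str
    groups: Tuple[str, ...] = ()  # groups the node says it belongs to


def render_host(node: Node, template: str = "generic-host") -> str:
    """Render one Nagios 'define host' object for a node.
    The template name is an assumption; use whatever your config defines."""
    lines = [
        "define host {",
        f"    use         {template}",
        f"    host_name   {node.name}",
        f"    address     {node.address}",
    ]
    if node.groups:
        lines.append(f"    hostgroups  {','.join(node.groups)}")
    lines.append("}")
    return "\n".join(lines)


def render_hosts(nodes: Iterable[Node]) -> str:
    """Render the whole hosts file from the current inventory.
    A node that was shut down is simply absent from the input, so it
    drops out of the file instead of showing up as a failed host."""
    ordered = sorted(nodes, key=lambda n: n.name)  # stable output, clean diffs
    return "\n\n".join(render_host(n) for n in ordered) + "\n"


if __name__ == "__main__":
    inventory = [
        Node("web2", "10.0.0.2", ("web",)),
        Node("db1", "10.0.0.5", ("db", "cassandra")),
    ]
    print(render_hosts(inventory))
```

Regenerating the full file on every run, rather than patching it, is what makes deprovisioning automatic: you write the output to a file under your Nagios config directory and reload Nagios whenever the inventory changes.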