Many of you have probably completed, or are contemplating, Green-IT projects with the goal of powering off idle or unneeded systems when demand for computing resources is low.
How did you handle this situation in your system monitoring? I'm especially interested in solutions for Nagios.
One idea is to schedule downtime in Nagios for the powered-off hosts. However, the drawback of this solution is that the hosts would still be listed in the 'Problems' view of the Nagios web interface. Is there a better solution without this "pollution" (i.e. one where the 'Problems' view only shows real problems that require action from a system administrator)?
A clean solution would be a new 'Green-IT poweroff' host state. But AFAIK this does not exist, does it? Do you have any other recommendations or solutions? What's the best way to monitor a dynamic IT environment?
The easy way:
There are built-in filters for the status view, at the top of the page. You can just have the admins watch "unacknowledged" problems, or problems on hosts that are not in scheduled downtime. Or any other number of combinations.
If you really want to go wild with filtering the CGI view, see the "HOST AND SERVICE FILTER PROPERTIES" section of cgiutils.h in the source code for a full list of filters that are available.
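As a rough sketch of what such a filtered view looks like, the snippet below builds a status.cgi URL that lists only hosts that are DOWN or UNREACHABLE, not in scheduled downtime, and not acknowledged. The bit values are the ones I remember from cgiutils.h, so check them against your Nagios version; the base URL and the function name problems_url are made up for the example.

```python
from urllib.parse import urlencode

# Bit flags as I recall them from cgiutils.h (assumption: verify for your version).
HOST_DOWN = 4
HOST_UNREACHABLE = 8
HOST_NO_SCHEDULED_DOWNTIME = 2
HOST_STATE_UNACKNOWLEDGED = 8


def problems_url(base_url):
    """Build a status.cgi URL showing only real, unhandled host problems."""
    params = {
        "hostgroup": "all",
        "style": "hostdetail",
        # Which host states to show: DOWN or UNREACHABLE.
        "hoststatustypes": HOST_DOWN | HOST_UNREACHABLE,
        # Which properties the host must have: no downtime, not acknowledged.
        "hostprops": HOST_NO_SCHEDULED_DOWNTIME | HOST_STATE_UNACKNOWLEDGED,
    }
    return f"{base_url}/status.cgi?{urlencode(params)}"


# Hypothetical base URL; replace with the cgi-bin path of your installation.
print(problems_url("http://nagios.example.com/nagios/cgi-bin"))
```

Bookmarking the resulting URL gives the admins a 'Problems' page that skips the hosts you put into scheduled downtime before powering them off.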
The hard way:
See the docs on adaptive monitoring. With this, you can change Nagios's runtime settings on the fly, via external commands, as systems are automatically powered off and on. For example, you can adjust the check periods, change the check commands to a check_dummy variant, enable/disable event handlers, etc.
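A minimal sketch of how a power-management script could drive this, assuming external commands are enabled and the command file sits at the default source-install path (/usr/local/nagios/var/rw/nagios.cmd; yours may differ). The helper names are invented for the example; the command names themselves (DISABLE_HOST_CHECK and friends) are standard Nagios external commands.

```python
import time

# Assumption: default command file location of a source install.
DEFAULT_COMMAND_FILE = "/usr/local/nagios/var/rw/nagios.cmd"


def external_command(name, *args, now=None):
    """Format one line in the '[timestamp] COMMAND;arg1;arg2' syntax."""
    ts = int(time.time() if now is None else now)
    return f"[{ts}] {';'.join([name, *map(str, args)])}\n"


def power_off_commands(host, now=None):
    """Commands to send just before a host is powered off."""
    return [
        external_command("DISABLE_HOST_CHECK", host, now=now),
        external_command("DISABLE_HOST_SVC_CHECKS", host, now=now),
        external_command("DISABLE_HOST_NOTIFICATIONS", host, now=now),
    ]


def power_on_commands(host, now=None):
    """Commands to send once the host is back up."""
    return [
        external_command("ENABLE_HOST_CHECK", host, now=now),
        external_command("ENABLE_HOST_SVC_CHECKS", host, now=now),
        external_command("ENABLE_HOST_NOTIFICATIONS", host, now=now),
    ]


def submit(commands, command_file=DEFAULT_COMMAND_FILE):
    """Write the formatted commands to the Nagios command file (a named pipe)."""
    with open(command_file, "w") as fh:
        fh.writelines(commands)
```

The power-off hook would call submit(power_off_commands("web01")) and the power-on hook submit(power_on_commands("web01")), where "web01" stands for whatever host name Nagios knows the machine by. Swapping in a check_dummy variant instead would use CHANGE_HOST_CHECK_COMMAND in the same way.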
I think you need a bit of custom development to create a new status view that removes hosts with scheduled downtime from the list of problem servers. I suspect someone in the Nagios developer community would be available to do this for a fee.