We're using Nagios to watch our servers' health. Now we have a task to add a server which will be up only at certain times, and while it is up we have to ensure that all its services are running. Unfortunately we don't know in advance when the host will be down, so we need some automatic way to achieve this.
1. Is there a way (a configuration directive) to not report when the host goes down? I mean even in Nagios clients like nagstamon. I don't like the idea of a black icon in the systray all day.
2. Is there a way to not report any of the services running on a host while the host is down?
3. While achieving points 1 and 2, is there a way to monitor all of the host's services when, and only when, the host is up?
Let me take the points in the wrong order.
2) NAGIOS should already do this; if a host is down, service alerts will not be sent.
1) I was thinking you could do this with flexible downtime: this is downtime of a given window duration which doesn't start at a known time; instead, the window starts automatically when the host goes down.
But then it occurred to me: all you really need to do is send no host alerts when the host is down. If you manage that, then:
- When the host is down, service alerts will not be sent. You don't care that the host is down, because as you say, you don't know when it'll come and go, so the absence of a host alert is immaterial. The HOST DOWN will still be logged, allowing you to retrospectively see what has gone on, but alerts will not be sent.
- When the host is up, service alerts will be sent anyway.
That's what you want, isn't it? If so, you need to add to the host definition
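The directive itself did not survive in the text above. As a minimal sketch, assuming the intent is simply to silence host notifications while keeping the host check running, it could look something like this (the host name, address and the `linux-server` template from the sample configuration are placeholders, not taken from the question):

```
define host {
    use                    linux-server     ; assumed template from the sample config
    host_name              parttime-host    ; hypothetical host name
    address                192.0.2.10       ; placeholder address
    notifications_enabled  0                ; never send host DOWN/UP notifications
}
```

The host check still runs, so the DOWN state is logged and service notifications are suppressed while the host is down; services keep their own notification settings and still alert while it is up.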
I think that also deals with point 3, as that's what happens normally. I can't speak for non-core clients like nagstamon. In my experience, these are usually screen-scrapers, and their decisions about what to notify aren't based on NAGIOS' notification logic. If your client honours NAGIOS' built-in rules, it should be fine; otherwise, you'll have to work with that particular tool to add similar logic.
You can define custom time periods using the timeperiod definition in your timeperiods.cfg. Here is an example
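The original example is not shown, so here is a rough sketch of what such a definition might look like. The period name and the hours are invented for illustration; substitute whatever window the host is actually expected to be up:

```
define timeperiod {
    timeperiod_name  parttime-hours                      ; hypothetical name
    alias            Hours when the part-time host is up
    monday           09:00-17:00                         ; example hours only
    tuesday          09:00-17:00
    wednesday        09:00-17:00
    thursday         09:00-17:00
    friday           09:00-17:00
}
```

Days that are not listed are treated as entirely outside the period.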
Then use this for the check_period value in your host and service definitions.
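As an illustration, assuming a time period named `parttime-hours` has been defined, and using the `linux-server` and `generic-service` templates and the `check_smtp` command from the sample configuration (all of these names are assumptions, as are the host name and address), the host and one of its services might be set up like this:

```
define host {
    use           linux-server      ; assumed sample template
    host_name     parttime-host     ; hypothetical host name
    address       192.0.2.10        ; placeholder address
    check_period  parttime-hours    ; only check the host inside this window
}

define service {
    use                  generic-service   ; assumed sample template
    host_name            parttime-host
    service_description  SMTP
    check_command        check_smtp        ; assumed sample command
    check_period         parttime-hours    ; only check the service inside this window
}
```

Note that this only helps if the host's up times follow a schedule you can write down; it does nothing for unpredictable outages inside the window.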
This answer will not cover third-party monitoring add-ons like nagstamon or the Firefox Nagios plugin, because their behaviour varies wildly.
A few ways that I can think of off the top of my head:
Schedule downtime for the host and all services (see above link).
What you could also do is put service dependencies to use. If you check the host via PING, then add a PING service and a servicedependency that makes every other service on that host depend on PING, and then shut the PING notifications off. This will look something like:
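The original snippet is missing, so the following is only a sketch of the idea. It assumes the `generic-service` template and the `check_ping` command from the sample configuration; the host name and the SMTP service are hypothetical stand-ins for your own:

```
define service {
    use                    generic-service                ; assumed sample template
    host_name              parttime-host                  ; hypothetical host name
    service_description    PING
    check_command          check_ping!100.0,20%!500.0,60%
    notifications_enabled  0                              ; PING itself never notifies
}

define servicedependency {
    host_name                      parttime-host
    service_description            PING                   ; the master service
    dependent_host_name            parttime-host
    dependent_service_description  SMTP                   ; repeat or list for each other service
    execution_failure_criteria     w,u,c                  ; skip dependent checks when PING is not OK
    notification_failure_criteria  w,u,c                  ; suppress dependent notifications too
}
```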
What this means in essence is that when PING is in a warning, unknown or critical state, only PING would notify; none of the dependent services will. (And again, shut notifications off for PING!) Also, while PING is in one of those states, the dependent services will not even execute.
I can't speak for NagStaMon, but the Firefox NAGIOS plugin has a preference that essentially says "ignore acknowledged services", meaning that if you acknowledge a service, schedule downtime for it, turn its notifications off, or modify it in any similar way, it will not render as "warning/critical" in the status bar even if it's in that state. I don't know what NagStaMon does or doesn't have in this regard.
Here is my idea for using passive checks for this, but I need to state an assumption first: that you don't want to monitor host uptime at all, just that when the host is up, the appropriate services are running.
On your random-uptime hosts, you can run something like the following shell script: https://gist.github.com/746998. This example only monitors SMTP, but it's fairly simple. It needs to run as a user that can ssh into your Nagios host using a keypair; securing that is left as an exercise for the reader (or post a new question). I haven't tested this, but it should work. The passive check documentation ( http://nagios.sourceforge.net/docs/3_0/passivechecks.html ) should be helpful.
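On the Nagios server side, the service that receives those results has to be defined as passive. A rough sketch, not taken from the gist: the host name, the `check_dummy_ok` command name and the `generic-service` template are assumptions, and the `service_description` must match whatever name the script submits.

```
define command {
    command_name  check_dummy_ok                                  ; hypothetical name
    command_line  $USER1$/check_dummy 0 "Passive service, no active check"
}

define service {
    use                     generic-service    ; assumed sample template
    host_name               parttime-host      ; hypothetical host name
    service_description     SMTP               ; must match the submitted result
    check_command           check_dummy_ok     ; placeholder, never run while active checks are off
    active_checks_enabled   0                  ; Nagios never polls this service itself
    passive_checks_enabled  1                  ; accept results pushed by the host
    check_freshness         0                  ; stale results while the host is down do not alert
}
```

For the submitted results to be processed, `check_external_commands` and `accept_passive_service_checks` also need to be enabled in nagios.cfg.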
This won't automate provisioning the hosts on your Nagios server, but you can use something like puppet for that.
For alerting you should check out Nagios BPI (Business Process Intelligence); Nagios Enterprises just released this new addon. Go to Nagios dot com and check out their roadmap.