Nagios notification intervals must be greater than or equal to the check interval, because this prevents Nagios from sending false-alarm notifications should a service return to an UP state between checks. I understand the reasoning behind that.
We have a number of checks that run every 30 minutes. This means that if a check fails, only one notification is sent each time the service is checked after the retries are used up.
What I need is to keep pestering the duty admin's pager every two minutes after a check has gone HARD DOWN/CRITICAL. I can't do this because the next notification will only go out on the next check, i.e., in another 30 minutes.
A feature we had on our old monitoring system was to set a new, lower check interval as soon as the check had gone HARD DOWN/CRITICAL. This meant we could keep rechecking every two minutes (and sending alerts) until a human acknowledged the alert or the service returned to UP, after which the check interval reverted to 30 minutes.
Is there a way to do this in Nagios?
I've been thinking about writing an event handler that reschedules a check for two minutes in the future after a check has gone HARD DOWN/CRITICAL (by sending a command directly to Nagios).
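A minimal sketch of that idea, assuming the handler receives the service state, state type, host name and service description as its four arguments, and that the external command file is `/usr/local/nagios/var/rw/nagios.cmd` (a common location for source installs; use whatever your `command_file` setting says). The function name and the `NAGIOS_CMD_FILE` override are hypothetical; `SCHEDULE_SVC_CHECK` is a standard Nagios external command.

```bash
#!/bin/bash
# Sketch of a rescheduling event handler (hypothetical name and arguments):
#   $1 = service state, $2 = state type, $3 = host name, $4 = service description
reschedule_if_critical() {
    local state="${1:-}" state_type="${2:-}" host="${3:-}" service="${4:-}"
    # Assumed location of the external command pipe; override via NAGIOS_CMD_FILE.
    local cmd_file="${NAGIOS_CMD_FILE:-/usr/local/nagios/var/rw/nagios.cmd}"
    local recheck_seconds=120   # two minutes

    # Only act on a confirmed (HARD) CRITICAL state.
    if [ "$state" = "CRITICAL" ] && [ "$state_type" = "HARD" ]; then
        local now next
        now=$(date +%s)
        next=$((now + recheck_seconds))
        # Format: [submission time] SCHEDULE_SVC_CHECK;<host>;<service>;<check time>
        printf '[%s] SCHEDULE_SVC_CHECK;%s;%s;%s\n' \
            "$now" "$host" "$service" "$next" >> "$cmd_file"
    fi
}

reschedule_if_critical "$@"
```

One caveat: Nagios runs event handlers when a service is in a SOFT problem state, when it first enters a HARD problem state, and when it recovers, not on every check. A handler like this would therefore reschedule only once per outage unless something else, such as a shorter check interval, keeps the checks coming.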
Has anyone else had to do something similar?
I'm running Nagios Core 3.2.3.
You can do it with the `CHANGE_NORMAL_SVC_CHECK_INTERVAL` and `CHANGE_NORMAL_HOST_CHECK_INTERVAL` external commands.
Add an event handler for your service:
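A sketch of such a service definition, written as a small shell helper that prints the object so it can be appended to whichever object file you use. It assumes the `generic-service` template from the Nagios sample configuration exists; the host, service and check command are placeholders, and the interval values are illustrative (in units of `interval_length`, 60 seconds by default).

```bash
#!/bin/bash
# Prints a service definition with the change_check_interval event handler attached.
# 'generic-service' is the sample-config template (assumed to exist in your setup);
# host, service and check command are placeholders.
service_definition() {
    local host="${1:-}" service="${2:-}" check="${3:-}"
    cat <<EOF
define service {
    use                     generic-service
    host_name               ${host}
    service_description     ${service}
    check_command           ${check}
    check_interval          30
    retry_interval          2
    max_check_attempts      3
    notification_interval   2
    event_handler           change_check_interval
    event_handler_enabled   1
}
EOF
}
```

For example, `service_definition web01 HTTP check_http >> services.cfg` (file name hypothetical), then run `nagios -v` against your main config file before reloading.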
The `change_check_interval` command was defined in `commands.cfg`:
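A plausible definition, printed by the helper below, assuming the script lives in `/usr/local/nagios/libexec/eventhandlers/` and takes the state, state type, host name and service description as arguments. `$SERVICESTATE$`, `$SERVICESTATETYPE$`, `$HOSTNAME$` and `$SERVICEDESC$` are standard Nagios macros expanded when the handler runs; the description is quoted because it may contain spaces.

```bash
#!/bin/bash
# Prints a command definition for the event handler.
# The script path is an assumption. The $...$ macros are expanded by Nagios at run
# time, so the heredoc delimiter is quoted to stop the shell from expanding them.
command_definition() {
    cat <<'EOF'
define command {
    command_name    change_check_interval
    command_line    /usr/local/nagios/libexec/eventhandlers/change_check_interval.sh $SERVICESTATE$ $SERVICESTATETYPE$ $HOSTNAME$ "$SERVICEDESC$"
}
EOF
}
```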
The content of `change_check_interval.sh`:
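A minimal sketch of what the script could look like, assuming arguments in the order `$SERVICESTATE$ $SERVICESTATETYPE$ $HOSTNAME$ $SERVICEDESC$`, the default `interval_length` of 60 seconds, and a command file at `/usr/local/nagios/var/rw/nagios.cmd` (the `NAGIOS_CMD_FILE` override is hypothetical). On HARD CRITICAL it switches the normal check interval to 2 with `CHANGE_NORMAL_SVC_CHECK_INTERVAL`; on OK it restores 30.

```bash
#!/bin/bash
# change_check_interval.sh: sketch of an event handler.
# Assumed arguments (must match the command definition):
#   $1 = $SERVICESTATE$, $2 = $SERVICESTATETYPE$, $3 = $HOSTNAME$, $4 = $SERVICEDESC$
# Intervals are in units of interval_length (assumed to be the default 60 s).
NORMAL_INTERVAL=30   # restored once the service is OK again
ALERT_INTERVAL=2     # used while the service is HARD CRITICAL

send_command() {
    # Assumed command file; override via NAGIOS_CMD_FILE.
    local cmd_file="${NAGIOS_CMD_FILE:-/usr/local/nagios/var/rw/nagios.cmd}"
    printf '[%s] %s\n' "$(date +%s)" "$1" >> "$cmd_file"
}

change_check_interval() {
    local state="${1:-}" state_type="${2:-}" host="${3:-}" service="${4:-}"
    case "$state" in
        CRITICAL)
            # Only switch to the short interval once the problem is HARD.
            if [ "$state_type" = "HARD" ]; then
                send_command "CHANGE_NORMAL_SVC_CHECK_INTERVAL;${host};${service};${ALERT_INTERVAL}"
                # Precaution: pull the next check forward, since the check that is
                # already scheduled may still be on the old 30-minute schedule.
                send_command "SCHEDULE_SVC_CHECK;${host};${service};$(( $(date +%s) + ALERT_INTERVAL * 60 ))"
            fi
            ;;
        OK)
            # Recovery: go back to the normal interval.
            send_command "CHANGE_NORMAL_SVC_CHECK_INTERVAL;${host};${service};${NORMAL_INTERVAL}"
            ;;
    esac
}

change_check_interval "$@"
```

The extra `SCHEDULE_SVC_CHECK` is a design choice in this sketch, in case the interval change only applies after the next already-scheduled check. Because every 2-minute recheck goes through normal notification logic, the pager keeps being notified at the service's `notification_interval` until the problem is acknowledged or the service recovers.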
Make sure that external commands are enabled in `nagios.cfg`:
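The relevant directive is `check_external_commands=1`; `command_file` in the same file names the pipe the scripts write to, and Nagios must be restarted or reloaded after changing either. Event handlers run as the Nagios user, which needs write access to that pipe. A small sketch that checks the setting, assuming the source-install path `/usr/local/nagios/etc/nagios.cfg` by default:

```bash
#!/bin/bash
# Returns 0 and prints the command_file line if external commands are enabled
# in the given nagios.cfg (default path is an assumption); returns 1 otherwise.
check_external_commands_enabled() {
    local cfg="${1:-/usr/local/nagios/etc/nagios.cfg}"
    if grep -q '^[[:space:]]*check_external_commands=1[[:space:]]*$' "$cfg"; then
        # Show where Nagios expects external commands to be written.
        grep '^[[:space:]]*command_file=' "$cfg"
        return 0
    fi
    echo "check_external_commands is not set to 1 in $cfg" >&2
    return 1
}
```

For example, `check_external_commands_enabled /usr/local/nagios/etc/nagios.cfg` prints the `command_file` line when external commands are on.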