I'm attempting to add email alerts to an existing Nagios install. I've been using the web interface to keep an eye on some non-critical systems for a few months and it's been running well; warnings and critical problems are detected without issue.
My next step is to enable the alerting functionality but despite hours of fiddling I've been unable to get even the simplest alert to fire. I'm flat out of ideas as to what could be going wrong. It's almost certainly something simple that I've just failed to pick up on so hopefully one of you guys will spot it with ease.
The command I'm testing with is dead simple. Initially I'm just trying to write to a file:
define command{
    command_name    alerter
    command_line    echo "Alerter command fired by Nagios" >> /usr/local/nagios/var/alerter.log
}
I've tested that the nagios user can execute this command using sudo. All seems well.
The hosts and services all refer to the 'admins' contact group. These are the templates they use; none of the hosts or services override any of these settings.
define host{
    name                            generic-host
    notifications_enabled           1
    event_handler_enabled           1
    flap_detection_enabled          1
    failure_prediction_enabled      1
    process_perf_data               1
    retain_status_information       1
    retain_nonstatus_information    1
    check_period                    24x7
    check_interval                  1
    retry_interval                  1
    max_check_attempts              10
    check_command                   check-host-alive
    notification_period             24x7
    notification_interval           120
    notification_options            d,u,r,s,f
    contact_groups                  admins
    register                        0
}
define service{
    name                            generic-service
    active_checks_enabled           1
    passive_checks_enabled          1
    parallelize_check               1
    obsess_over_service             1
    check_freshness                 0
    notifications_enabled           1
    event_handler_enabled           1
    flap_detection_enabled          1
    failure_prediction_enabled      1
    process_perf_data               1
    retain_status_information       1
    retain_nonstatus_information    1
    is_volatile                     0
    check_period                    24x7
    max_check_attempts              3
    normal_check_interval           1
    retry_check_interval            1
    contact_groups                  admins
    notification_options            w,u,c,r
    notification_interval           120
    notification_period             24x7
    register                        0
}
The contact and contact group are configured as follows:
define contact{
    name                            generic-contact
    service_notification_period     24x7
    host_notification_period        24x7
    service_notification_options    w,u,c,r,f,s
    host_notification_options       d,u,r,f,s
    service_notification_commands   alerter
    host_notification_commands      alerter
    register                        0
}
define contact{
    contact_name    nagiosadmin
    use             generic-contact
    alias           Nagios Admin
    email           [email protected]
}
define contactgroup{
    contactgroup_name   admins
    alias               Nagios Administrators
    members             nagiosadmin
}
When I cause an outage Nagios picks it up and logs it like this...
[1315210448] SERVICE ALERT: ifs.aleph;Test service;CRITICAL;HARD;3;HTTP CRITICAL: HTTP/1.1 400 Bad Request - string 'Blah blah' not found on 'http://aleph.tekretic.com.au:80/' - 168 bytes in 0.369 second response time
[1315210653] SERVICE ALERT: ifs.aleph;Test service;OK;HARD;3;HTTP OK: HTTP/1.1 200 OK - 416 bytes in 0.364 second response time
... but nothing is logged to my 'alerter.log' file. It's as though the alerter command is never fired.
What am I missing??
Make sure that you have the following in nagios.cfg:
Also try to increase the debug_level to 32 for notifications to see what it says:
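As a rough sketch of the settings being discussed, the relevant nagios.cfg lines might look something like the following. The debug_level value comes from the answer above. enable_notifications is only an assumption about which setting the answer had in mind, and the debug_file path and debug_verbosity value are illustrative and should be adapted to the install.

```
# nagios.cfg (sketch; path and verbosity are assumptions, adjust to your install)

# Assumed to be the setting referred to above: the global notification switch.
# If this is 0, no host or service notifications are sent at all.
enable_notifications=1

# 32 enables debug output for notifications only.
debug_level=32

# 0 = brief, 1 = more detailed, 2 = very detailed
debug_verbosity=1

# Where the debug output is written (hypothetical path under the install prefix).
debug_file=/usr/local/nagios/var/nagios.debug
```

After restarting Nagios and triggering another outage, the debug file should indicate whether a notification was attempted for the service and, if it was suppressed, which check suppressed it.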