Nagios is not sending emails. The logs show alerts being raised, but no emails go out. Any suggestions for debugging this?
/var/log/maillog doesn't show any entries. Emails sent manually from the command line do reach my inbox.
Log and config:
[1549711074] SERVICE FLAPPING ALERT: host001;Disk Space - /boot;STARTED; Service appears to have started flapping (21.6% change >= 20.0% threshold)
[1549711074] SERVICE FLAPPING ALERT: host001;Disk Space Warn Only - /boot;STARTED; Service appears to have started flapping (21.6% change >= 20.0% threshold)
[1549711194] SERVICE ALERT: host001;Disk Space - /boot;CRITICAL;SOFT;1;/boot: 100%used(98MB/99MB) (>90%) : CRITICAL
[1549711194] SERVICE ALERT: host001;Disk Space Warn Only - /boot;CRITICAL;SOFT;1;/boot: 100%used(98MB/99MB) (>90%) : CRITICAL
[1549711254] SERVICE ALERT: host001;Disk Space - /boot;CRITICAL;SOFT;2;/boot: 100%used(98MB/99MB) (>90%) : CRITICAL
[1549711254] SERVICE ALERT: host001;Disk Space Warn Only - /boot;CRITICAL;SOFT;2;/boot: 100%used(98MB/99MB) (>90%) : CRITICAL
[1549711314] SERVICE ALERT: host001;Disk Space - /boot;CRITICAL;HARD;3;/boot: 100%used(98MB/99MB) (>90%) : CRITICAL
[1549711314] SERVICE ALERT: host001;Disk Space Warn Only - /boot;CRITICAL;HARD;3;/boot: 100%used(98MB/99MB) (>90%) : CRITICAL
[1549711387] Caught SIGTERM, shutting down...
[1549711387] Successfully shutdown... (PID=28697)
[1549711387] Warning: aggregate_status_updates directive ignored. All status file updates are now aggregated.
[1549711387] Nagios 3.0.6 starting... (PID=29699)
[1549711387] Local time is Sat Feb 09 03:23:07 PST 2019
[1549711387] LOG VERSION: 2.0
[1549711387] Finished daemonizing... (New PID=29700)
[1549711387] SERVICE FLAPPING ALERT: host001;Disk Space - /boot;STARTED; Service appears to have started flapping (27.3% change >= 20.0% threshold)
[1549711387] SERVICE FLAPPING ALERT: host001;Disk Space Warn Only - /boot;STARTED; Service appears to have started flapping (27.3% change >= 20.0% threshold)
[1549712107] SERVICE ALERT: mysql-db03;eth0 status;UNKNOWN;SOFT;1;ERROR: No snmp response from 10.49.64.62 (alarm)
[1549712107] SERVICE ALERT: mysql-db03;eth1 status;UNKNOWN;HARD;3;ERROR: No snmp response from 10.49.64.62 (alarm)
[1549712157] SERVICE ALERT: mysql-db03;eth0 status;OK;SOFT;2;OK: Interface eth0 (index 2) is up.
[1549712277] SERVICE ALERT: mysql-db03;eth1 status;CRITICAL;HARD;3;CRITICAL: Interface eth1 (index 3) is administratively down.
[1549712277] SERVICE NOTIFICATION: rt;mysql-db03;eth1 status;CRITICAL;ngmail;CRITICAL: Interface eth1 (index 3) is administratively down.
[1549712292] SERVICE NOTIFICATION: 724_shift11;mysql-db03;eth1 status;CRITICAL;ngmail;CRITICAL: Interface eth1 (index 3) is administratively down.
[1549712307] SERVICE NOTIFICATION: skytel1;mysql-db03;eth1 status;CRITICAL;ngmail;CRITICAL: Interface eth1 (index 3) is administratively down.
[1549712322] SERVICE NOTIFICATION: skytel2;mysql-db03;eth1 status;CRITICAL;ngmail;CRITICAL: Interface eth1 (index 3) is administratively down.
[1549712337] SERVICE NOTIFICATION: skytel4;mysql-db03;eth1 status;CRITICAL;ngmail;CRITICAL: Interface eth1 (index 3) is administratively down.
[1549712352] SERVICE NOTIFICATION: skytel6;mysql-db03;eth1 status;CRITICAL;ngmail;CRITICAL: Interface eth1 (index 3) is administratively down.
[1549712367] SERVICE NOTIFICATION: skytel7;mysql-db03;eth1 status;CRITICAL;ngmail;CRITICAL: Interface eth1 (index 3) is administratively down.
[1549712382] SERVICE NOTIFICATION: pubfolders;mysql-db03;eth1 status;CRITICAL;notify-by-email;CRITICAL: Interface eth1 (index 3) is administratively down.
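The SERVICE NOTIFICATION lines in the log indicate that Nagios did invoke a notification command for those contacts, so it may help to see which contact and which command each one used. A minimal sketch, assuming the field layout seen in the excerpt above (contact;host;service;state;command;output); `summarize_notifications` is a hypothetical helper name, and the sample input is two lines copied from the excerpt:

```shell
#!/bin/sh
# Summarize SERVICE NOTIFICATION lines as "contact command count".
# Reads Nagios log text on stdin. Field layout is assumed from the excerpt:
# [epoch] SERVICE NOTIFICATION: contact;host;service;state;command;output
summarize_notifications() {
    awk -F';' '
        /SERVICE NOTIFICATION: / {
            contact = $1
            # Strip the timestamp and line type, leaving only the contact name.
            sub(/^.*SERVICE NOTIFICATION: /, "", contact)
            count[contact " " $5]++
        }
        END {
            for (key in count) print key, count[key]
        }
    ' | sort
}

# Sample input: two lines taken from the log above.
summarize_notifications <<'EOF'
[1549712277] SERVICE NOTIFICATION: rt;mysql-db03;eth1 status;CRITICAL;ngmail;CRITICAL: Interface eth1 (index 3) is administratively down.
[1549712382] SERVICE NOTIFICATION: pubfolders;mysql-db03;eth1 status;CRITICAL;notify-by-email;CRITICAL: Interface eth1 (index 3) is administratively down.
EOF
```

Run against the full excerpt, this would show that only pubfolders was notified through notify-by-email, that the other contacts went through ngmail (whose definition is not shown here), and that the ops contact does not appear at all.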
Notification config:
# 'notify-by-email' command definition
define command{
command_name notify-by-email
command_line /usr/bin/printf "%b" "***** Nagios *****\n\nNotification Type: $NOTIFICATIONTYPE$\nNotification Number : $NOTIFICATIONNUMBER$\nProblem Duration: $SERVICEDURATION$\n\nService: $SERVICEDESC$\nHost: $HOSTALIAS$\nAddress: $HOSTADDRESS$\nState: $SERVICESTATE$\n\nDate/Time: $SHORTDATETIME$\n\nAdditional Info:\n$SERVICEOUTPUT$\n\n" | /bin/mail -r $ADMINEMAIL$ -s "**$NOTIFICATIONTYPE$ alert #$NOTIFICATIONNUMBER$ - $HOSTALIAS$:$SERVICEDESC$ is $SERVICESTATE$**" $CONTACTEMAIL$
}
# 'host-notify-by-email' command definition
define command{
command_name host-notify-by-email
command_line /usr/bin/printf "%b" "***** Nagios *****\n\nNotification Type: $NOTIFICATIONTYPE$\nNotification Number : $NOTIFICATIONNUMBER$\nProblem Duration: $HOSTDURATION$\n\nHost: $HOSTNAME$\nState: $HOSTSTATE$\nAddress: $HOSTADDRESS$\nDate/Time: $SHORTDATETIME$\n\nAdditional Info: \n$HOSTOUTPUT$\n\n" | /bin/mail -r $ADMINEMAIL$ -s "HOST DOWN alert #$NOTIFICATIONNUMBER$ - $HOSTNAME$ is $HOSTSTATE$" $CONTACTEMAIL$
}
Contacts.cfg
define contact{
contact_name ops
alias Ops Email
service_notification_period 24x7
host_notification_period 24x7
service_notification_options w,u,c,r
host_notification_options d,u,r
service_notification_commands notify-by-email
host_notification_commands host-notify-by-email
email [email protected]
}
When I stopped the nrpe service, Nagios did send the alert. It seems Contacts.cfg was set to send only Down, Unreachable, and Recovery alerts.
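To check which notification types each contact is set to receive, the option letters can be pulled out of the contact definitions. A minimal sketch, assuming one directive per line as in the definition above; `list_contact_options` is a hypothetical helper name, and the input is a trimmed copy of the ops contact:

```shell
#!/bin/sh
# Print "contact_name service=<options> host=<options>" for each contact block.
# Reads Nagios object config on stdin; blocks without a contact_name are skipped.
list_contact_options() {
    awk '
        $1 == "define"                       { name = svc = host = "" }
        $1 == "contact_name"                 { name = $2 }
        $1 == "service_notification_options" { svc = $2 }
        $1 == "host_notification_options"    { host = $2 }
        $1 == "}" {
            if (name != "") print name, "service=" svc, "host=" host
            name = svc = host = ""
        }
    '
}

# Sample input: the ops contact from above, trimmed to the relevant directives.
list_contact_options <<'EOF'
define contact{
contact_name ops
service_notification_options w,u,c,r
host_notification_options d,u,r
}
EOF
```

For hosts, d,u,r stands for down, unreachable, and recovery; for services, w,u,c,r stands for warning, unknown, critical, and recovery. On a real system the function could be fed the actual file, for example `list_contact_options < /path/to/contacts.cfg`, where the path is a placeholder.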