I wrote a small bash script to use with nagios to check if nrpe is running.
The check works locally when run as root, but fails from the monitoring host.
On the host I'm trying to monitor, I have this line in my nrpe.cfg:
command[check_nrpe]=/usr/lib64/nagios/plugins/check_nrpe.sh
And made sure the check script is owned by the nagios user:
[root@ops:~] #ls -l /usr/lib64/nagios/plugins/check_nrpe.sh
-rwxr-xr-x. 1 nagios nagios 203 Jun 9 20:29 /usr/lib64/nagios/plugins/check_nrpe.sh
And if I run the script as the root user I get the correct result:
[root@ops:~] #/usr/lib64/nagios/plugins/check_nrpe.sh
OK: NRPE is running with pid: 24538
24538
But when I run it from the nagios host the check produces the opposite result:
[root@monitor1:~] #/usr/local/nagios/libexec/check_nrpe -H ops.mydomain.com -c check_nrpe
CRITICAL: NRPE is **NOT** Running
If I go back to the host I'm trying to monitor and become the nagios user, I get the same incorrect result as on the nagios host:
[root@ops:~] #su - nagios
Last login: Tue Jun 9 20:43:42 UTC 2015 on pts/3
-bash-4.2$ /usr/lib64/nagios/plugins/check_nrpe.sh
CRITICAL: NRPE is **NOT** Running
If I give the nagios user sudo access to that script, I get the correct result as the nagios user on the local host.
In /etc/sudoers I gave the nagios user access to the command and tried to disable the tty requirement by adding:
nagios ALL=(ALL) NOPASSWD: /usr/lib64/nagios/plugins/check_nrpe.sh !requiretty
Now, if I become the nagios user on the local host and use sudo, the check produces the correct result:
[root@ops:~] #su - nagios
Last login: Tue Jun 9 23:37:09 UTC 2015 on pts/0
-bash-4.2$ sudo /usr/lib64/nagios/plugins/check_nrpe.sh
OK: NRPE is running with pid: 24538
24538
I then edited nrpe.cfg on the local host to put sudo in front of the command:
[root@ops:~] #grep check_nrpe /etc/nagios/nrpe.cfg
command[check_nrpe]=/bin/sudo /usr/lib64/nagios/plugins/check_nrpe.sh
And restarted the nrpe service:
[root@ops:~] #systemctl restart nrpe
[root@ops:~] #lsof -i :5666
COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
nrpe 6137 nrpe 4u IPv4 493404 0t0 TCP *:5666 (LISTEN)
nrpe 6137 nrpe 5u IPv6 493405 0t0 TCP *:5666 (LISTEN)
But when I go back to the nagios host and run the check again, I get an output error:
[root@monitor1:~] #/usr/local/nagios/libexec/check_nrpe -H ops.jokefire.com -c check_nrpe
NRPE: Unable to read output
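Two things in the steps above may be worth checking, though neither is confirmed here. The lsof output shows the daemon running as user nrpe, so a sudoers rule written for nagios would not apply to commands that NRPE launches. Also, requiretty is normally switched off through a Defaults entry rather than appended to a command specification. A minimal sudoers sketch, assuming the daemon user really is nrpe:

```
# Assumption: the NRPE daemon runs as user "nrpe" (per the lsof output above).
# Turn off the tty requirement for that user only.
Defaults:nrpe !requiretty
# Let nrpe run just this one plugin as root, without a password.
nrpe ALL=(root) NOPASSWD: /usr/lib64/nagios/plugins/check_nrpe.sh
```

Running `visudo -c` afterwards checks that the file still parses. "Unable to read output" generally indicates that the command printed nothing on stdout, which would fit sudo refusing the request and writing its error to stderr.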
This is the contents of my check nrpe script:
[root@ops:~] #cat /usr/lib64/nagios/plugins/check_nrpe.sh
#!/bin/bash
pid=$(lsof -i :5666 | awk '{print $2}' | grep -i -v pid)
if [[ $pid ]]
then
    echo "OK: NRPE is running with pid: $pid"
    exit 0
else
    echo "CRITICAL: NRPE is **NOT** Running"
    exit 2
fi
HELP!! How do I get this check to return the correct result from the nagios host?
Thanks
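A possible explanation, offered as a sketch rather than a confirmed diagnosis: lsof run by an unprivileged user generally cannot see sockets that belong to another user's processes, which would account for the check passing as root and failing as nagios. One way to avoid needing root at all is to look for the process by name instead of by listening socket. The function below is hypothetical (`check_process` and its messages are not part of the original script) and assumes a Linux host where `/proc/<pid>/comm` is readable by any user.

```shell
#!/bin/bash
# Hypothetical helper, not from the original script: report whether a
# process with the given name exists by scanning /proc/<pid>/comm.
# Assumes Linux with /proc mounted and without hidepid restrictions.
check_process() {
    local name=$1 pids="" f comm pid
    for f in /proc/[0-9]*/comm; do
        # A process can exit between the glob and the read; skip it quietly.
        { read -r comm < "$f"; } 2>/dev/null || continue
        if [ "$comm" = "$name" ]; then
            pid=${f#/proc/}              # strip the leading "/proc/"
            pids="$pids ${pid%/comm}"    # strip the trailing "/comm"
        fi
    done
    if [ -n "$pids" ]; then
        echo "OK: $name is running with pid:$pids"
        return 0    # Nagios OK
    else
        echo "CRITICAL: $name is NOT running"
        return 2    # Nagios CRITICAL
    fi
}
```

In a plugin file, the last line would be something like `check_process nrpe; exit $?`, so the return code becomes the plugin's exit status. Where procps is installed, `pgrep -x nrpe` does the same name match in one command.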