Following virtualisation of a datacentre, I'm replacing an elderly, internal-use-only Nagios server with a new one on a new VM. For simplicity, and because we know it works, I've replicated the old system, including reinstalling Nagios 3.
Most of the detailed checks on the hosts are done with check_by_ssh, which reaches the remote systems using SSH keys. This always worked perfectly on the old system. On the new one, however, every check gives (for example) "Remote command check_disk -w5% -c3% -p /data -u GB returned status 3", both in nagios.log and on screen.
Having set up the keys, I can run the commands manually and they return the expected values, for instance:
ssh -i ~ng3/.ssh/id_rsa user@server "/usr/local/nagios/libexec/check_disk -w25% -c10% -p /store-web -u MB"
returns
DISK OK - free space: /store-web 1405862 MB (70%);| /store-web=609875MB;1511802;1814162;0;2015736
But the log and the front end say:
UNKNOWN - check_by_ssh: Remote command 'check_disk -w5% -c3% -p /store-web -u MB' returned status 3
Can anyone suggest what could be wrong? There are no SSH banners interrupting the output, passwordless SSH has been checked, and the correct users have full access. The commands are either specified with their full path or are on the PATH, and running them manually from the Nagios box works fine. The results are the same whether the private key is specified explicitly or left to the user's default.
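One way to compare the manual run with what Nagios sees is to replay the exact command string from the log as the Nagios user and look at stdout, stderr and the exit status separately. The logged command is the bare check_disk -w5% -c3% -p /store-web -u MB, without the /usr/local/nagios/libexec/ prefix and with different thresholds from the manual test, so it is worth replaying that string rather than the full-path one. The helper below, capture_result, is a hypothetical name: a minimal POSIX sh sketch, not part of Nagios or the plugins.

```shell
# capture_result: hypothetical helper, not part of Nagios or the plugins.
# Runs the given command and reports its exit status, the number of stdout
# lines, and the stdout and stderr contents separately.
capture_result() {
    cr_out=$(mktemp) || return 1
    cr_err=$(mktemp) || { rm -f "$cr_out"; return 1; }

    # Run the command exactly as given, keeping the two streams apart.
    "$@" >"$cr_out" 2>"$cr_err"
    cr_status=$?

    printf 'exit status: %s\n' "$cr_status"
    printf 'stdout lines: %s\n' "$(wc -l <"$cr_out" | tr -d ' ')"
    printf '%s\n' '--- stdout ---'
    cat "$cr_out"
    printf '%s\n' '--- stderr ---'
    cat "$cr_err"

    rm -f "$cr_out" "$cr_err"
    # Pass the command's own exit status back to the caller.
    return "$cr_status"
}
```

For example, assuming ng3 is the account the Nagios daemon runs as (the key path in the manual test suggests so, but that is a guess): capture_result sudo -u ng3 ssh -i ~ng3/.ssh/id_rsa user@server 'check_disk -w5% -c3% -p /store-web -u MB'. An exit status of 3 with zero stdout lines would match what the log reports.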
Were this a home-grown plugin, I could see the problem being incorrect (or missing) exit codes, but as the results are the same with the official Nagios plugins (in this case check_disk), I'm assuming it's not that.
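For reference, Nagios plugins report their state through the exit code: 0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN. The sketch below maps those codes and models, as an assumption based on a reading of the check_by_ssh source in the plugins package, how the message in the log is built: remote stdout is passed through when there is any, and only when it is empty does check_by_ssh print the "Remote command ... returned status N" line. If that reading is right, the logged message means the remote command exited with status 3 without printing anything to stdout. Both function names are hypothetical, and check_by_ssh's separate handling of remote stderr is not modelled.

```shell
# nagios_state: map a plugin exit code to its state name
# (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN). This sketch folds any
# other value into UNKNOWN.
nagios_state() {
    case "$1" in
        0) echo OK ;;
        1) echo WARNING ;;
        2) echo CRITICAL ;;
        *) echo UNKNOWN ;;
    esac
}

# report_like_check_by_ssh: hypothetical model of how check_by_ssh (in its
# normal, non-passive mode) turns the remote result into its own output.
# Arguments: remote command string, remote exit status, remote stdout.
# Remote stderr handling is deliberately not modelled here.
report_like_check_by_ssh() {
    rl_cmd=$1
    rl_status=$2
    rl_stdout=$3
    if [ -n "$rl_stdout" ]; then
        # Remote stdout present: it is passed through unchanged.
        printf '%s\n' "$rl_stdout"
    else
        # No remote stdout: only the state and the exit status are reported.
        printf "%s - check_by_ssh: Remote command '%s' returned status %s\n" \
            "$(nagios_state "$rl_status")" "$rl_cmd" "$rl_status"
    fi
    # The remote exit status becomes the check's own status.
    return "$rl_status"
}
```

With the values from the question, report_like_check_by_ssh 'check_disk -w5% -c3% -p /store-web -u MB' 3 '' prints the same line as the log, whereas passing the DISK OK text as the third argument prints that text instead.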