I have an amazingly strange issue monitoring a CIFS (SMB) share mounted on Linux machines with Nagios + NRPE.
The NRPE process runs on the Linux machines under the dedicated user nrpe:
# systemctl status nrpe
nrpe.service - Nagios Remote Program Executor
Loaded: loaded (/usr/lib/systemd/system/nrpe.service; enabled; vendor preset: disabled)
Active: active (running) since Tue 2023-05-02 14:46:47 IDT; 20h ago
Docs: http://www.nagios.org/documentation
Process: 30216 ExecStopPost=/bin/rm -f /run/nrpe/nrpe.pid (code=exited, status=0/SUCCESS)
Main PID: 30218 (nrpe)
CGroup: /system.slice/nrpe.service
└─30218 /usr/sbin/nrpe -c /etc/nagios/nrpe.cfg -f
# ps -ef | grep nrpe
nrpe 30218 1 0 May02 ? 00:00:05 /usr/sbin/nrpe -c /etc/nagios/nrpe.cfg -f
The monitoring command is defined in its configuration file, /etc/nagios/nrpe.cfg, this way:
command[check_backups_share]=/usr/lib64/nagios/plugins/check_disk -w 7% -c 5% -p /mnt/backups
If I run the command manually as the nrpe user, it succeeds on all machines:
# sudo -u nrpe bash
bash-4.2$ /usr/lib64/nagios/plugins/check_disk -w 7% -c 5% -p /mnt/backups
DISK OK - free space: /mnt/backups 2571991 MiB (61.32% inode=-);| /mnt/backups=1622248MiB;3900643;3984528;0;4194240
However, if I call it remotely from Nagios, it succeeds on one machine and fails on another:
$ /usr/local/nagios/libexec/check_nrpe -2 -H Machine01 -c check_backups_share
DISK OK - free space: /mnt/backups 2575536 MiB (61.40% inode=-);| /mnt/backups=1618703MiB;3900643;3984528;0;4194240
$ /usr/local/nagios/libexec/check_nrpe -2 -H Machine02 -c check_backups_share
DISK CRITICAL - /mnt/backups is not accessible: Permission denied
All other remote NRPE commands on Machine02 succeed. What's more, if I unmount the /mnt/backups folder on Machine02, the check also succeeds (it then reports on the root filesystem).
But when the share is mounted, I get this Permission denied error.
The folder is mounted identically on all machines, with the same credentials and options. In the /etc/fstab file:
//Backups-Server/backups /mnt/backups cifs vers=3.0,credentials=/path/to/creds 0 0
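Since /etc/fstab only records the requested options, it may be worth comparing the options the kernel actually applied on each machine (for CIFS these typically include uid, gid, file_mode and dir_mode). A minimal sketch, assuming the standard /proc/mounts layout; the helper name mount_opts is hypothetical and not part of Nagios or NRPE:

```shell
#!/bin/sh
# Print the effective mount options for a given mount point.
# mount_opts is a hypothetical helper; the optional second argument defaults
# to /proc/mounts and exists only so the function can be tried on a sample file.
mount_opts() {
    # Field 2 is the mount point, field 4 is the comma-separated option list.
    awk -v mp="$1" '$2 == mp { print $4 }' "${2:-/proc/mounts}"
}

# Example: run on each machine and compare the results.
# Prints nothing if the path is not currently a mount point.
mount_opts /mnt/backups
```

If the output differs between Machine01 and Machine02, the mounts are not as identical as the fstab entries suggest.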
So:
- all credentials, permissions, users, groups are the same;
- command executed locally on all machines under the same user produces the same results;
- but when executed remotely, it fails on one machine with a permissions error while succeeding on all the others;
- even though the executing nrpe process is configured the same way on all machines and has the same permissions.
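One way to double-check the last point is to look at the credentials of the running daemon itself rather than those of an interactive shell started with sudo -u nrpe, since a service does not necessarily carry the same groups or security context as such a shell. A minimal sketch, assuming a Linux /proc filesystem; the helper name show_proc_creds is hypothetical:

```shell
#!/bin/sh
# Print the identity a running process actually has, read from /proc (Linux only).
# show_proc_creds is a hypothetical helper name, not part of NRPE or Nagios.
show_proc_creds() {
    pid="$1"
    # Uid/Gid hold the real, effective, saved and filesystem IDs;
    # Groups lists the supplementary groups.
    grep -E '^(Uid|Gid|Groups):' "/proc/$pid/status"
    # Security context (for example an SELinux label), if the kernel exposes one.
    ctx="$(tr -d '\0' < "/proc/$pid/attr/current" 2>/dev/null)"
    if [ -n "$ctx" ]; then
        printf 'Context:\t%s\n' "$ctx"
    fi
}

# Example: inspect the current shell. On a monitored host, pass the daemon's
# PID instead, e.g. show_proc_creds "$(pgrep -o -x nrpe)".
show_proc_creds "$$"
```

Running this against the nrpe PID on both machines, and against the sudo -u nrpe shell, would show whether the three really run with the same identity.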
So what on earth could this be?
Update:
Solved, see below.