I'd like to monitor NFS mounts and the NFS server process using Monit.
On the server, I'd need a PID file, but I can't seem to find a way of getting that created with existing configuration files. Is there a way to do this, or has anyone monitored the server in a different way (checking if port 2049 is active, etc.)?
On clients, I was thinking of making Monit simply look for a specific file in an NFS mount, and if it's accessible, all is well. Problem is, if the NFS server does go down, file requests usually hang (perhaps even indefinitely, not sure). How would one get around this issue with Monit?
Any configuration examples would be greatly appreciated!
As for the "hanging" of the Monit process during NFS server faults, this can be circumvented by two methods.
Change the mount option from hard to soft, which causes the NFS layer to issue an I/O error to the accessing application after retrans retries. As this can introduce other problems with respect to data integrity (your writing applications need to be able to cope with I/O errors or at least exit cleanly, without corrupting the file written), you may also try to:
Hope this helps!
The general approach would be (assuming none of the Monit built-in rules are applicable) to write shell scripts that perform the checks and report the result through their exit status.
Let Monit test those scripts (example is from official documentation):
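For reference, such a check might look like the following sketch, modelled on the check program example in the Monit manual. The check name and script path are placeholders, not values from this thread.

```
check program myscript with path "/usr/local/bin/myscript.sh"
    if status != 0 then alert
```

Monit runs the script on its polling cycle and raises an alert whenever the exit status is non-zero.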
For the specific problem, this could mean
Server: It probably depends on your OS, Linux distro, NFS version (3 or 4), etc., but it should be easy to figure out. E.g. on Ubuntu 12.04, I would test whether the NFS server is running via
Create a shell script returning 0 if both commands return 'running'.
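A minimal sketch of such a script, under some loud assumptions: the service names are whatever you pass on the command line (for example nfs-kernel-server on Ubuntu), the status output contains the word "running" when the service is up, and STATUS_CMD is a hypothetical variable that only exists so the status command can be swapped out.

```shell
#!/bin/sh
# Sketch: succeed only if every named service reports "running".
# STATUS_CMD is a hypothetical override; it defaults to the `service` command.

all_running() {
    for svc in "$@"; do
        out=$(${STATUS_CMD:-service} "$svc" status 2>&1)
        case $out in
            *"not running"*) return 1 ;;   # SysV-style "is not running"
            *running*) ;;                  # e.g. "start/running, process 1234"
            *) return 1 ;;
        esac
    done
    return 0
}

# Run the check only when service names are given on the command line.
if [ "$#" -gt 0 ]; then
    all_running "$@"
fi
```

The "not running" case is tested first because that message also contains the word "running".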
Client: To check whether a certain NFS share is currently mounted, I mostly use df -h. So the corresponding shell script would look like
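Something along these lines, as a rough sketch rather than the exact script: it uses df -P instead of df -h because the POSIX output format is easier to parse, and the function names are made up for illustration. Mount points containing spaces are not handled.

```shell
#!/bin/sh
# Sketch: succeed if `df` lists a filesystem mounted at the given mount point.

# Reads `df -P` output on stdin and compares the last column of each
# data line (the mount point) with $1.
mounted_in_df() {
    awk -v mp="$1" 'NR > 1 && $NF == mp { found = 1 } END { exit !found }'
}

is_mounted() {
    df -P 2>/dev/null | mounted_in_df "$1"
}

# Run the check only when a mount point is passed, e.g. /mnt/data
if [ "$#" -gt 0 ]; then
    is_mounted "$1"
fi
```

Keep in mind that df itself may block on a hard-mounted share whose server is unreachable, so this check does not sidestep the hanging problem by itself.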
Did you check the init scripts for NFS already? I'd suspect that they are creating a PID file and sticking it somewhere for future restart or stop operations. If not, it should be pretty simple to modify them to do so.
As far as checking the mount goes, take a look at section 4.3.1 at http://nfs.sourceforge.net/nfs-howto/ar01s04.html#mounting_remote_dirs . If you mount it with the 'soft' option you will get behavior that lets you monitor it, but this should not be used for the actual mount. Perhaps you want a second mount just for monitoring?
I’m directly using the df test without a specific script:
I wanted to reply to claasz, but I do not have enough reputation points. The idea of using an external script is very good because it provides flexibility, and the suggestion to use portmap or rpcinfo to check for NFS server availability is quite smart.
I have found a script on GitHub from Thibaut Madelaine that I think should be interesting to many who face the same problem. He uses rpcinfo like this:
rpcinfo -u 123.456.789.12 nfs 3
where 123.456.789.12 is the IP address of your NFS server.
If all is good, the response will instantly be something like
program 100003 version 3 ready and waiting
and if it failed, something like
123.456.789.12: RPC: Program not registered
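To turn that into something Monit can run, a wrapper could look like the sketch below. This is not the script from GitHub, just an illustration: the function name is made up, RPCINFO is a hypothetical variable for substituting the command, and the "ready and waiting" match is taken from the sample output above.

```shell
#!/bin/sh
# Sketch: succeed if rpcinfo reports the NFS program as ready on the host.
# RPCINFO is a hypothetical override; it defaults to the rpcinfo command.

nfs_server_ready() {
    host=$1
    vers=${2:-3}    # NFS version to probe, 3 as in the example above
    out=$("${RPCINFO:-rpcinfo}" -u "$host" nfs "$vers" 2>&1) || return 1
    case $out in
        *"ready and waiting"*) return 0 ;;
        *) return 1 ;;
    esac
}

# Run the check only when a host is passed on the command line.
if [ "$#" -gt 0 ]; then
    nfs_server_ready "$@"
fi
```

If the wording differs on your system, adjust the matched string or rely on the exit status of rpcinfo alone.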
Of course, the response may vary depending on your system flavour, I guess.
Create a script called test-mount.sh to test the mount. I am using file create and delete, as I find just reading a file unreliable.
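A possible shape for test-mount.sh, as a sketch only: the probe file name and the 10 second default are assumptions, and the directory is passed as an argument rather than hard-coded. Where the GNU coreutils timeout command is available, it is used so that the probe does not block forever when the server stops answering.

```shell
#!/bin/sh
# test-mount.sh (sketch): exit 0 only if a file can be created and deleted
# under the given directory.

check_mount_writable() {
    dir=$1
    secs=${2:-10}                   # assumed time limit in seconds
    probe="$dir/.monit-probe.$$"    # hypothetical probe file name
    if command -v timeout >/dev/null 2>&1; then
        # Kill the probe if it is still stuck after $secs seconds.
        timeout -s KILL "$secs" sh -c 'touch "$1" && rm -f "$1"' sh "$probe"
    else
        touch "$probe" && rm -f "$probe"
    fi
}

# Run the check only when a directory is passed, e.g. test-mount.sh /mnt/data
if [ "$#" -gt 0 ]; then
    check_mount_writable "$@"
fi
```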
Create a test in the Monit config. This will run test-mount.sh, and if it fails, it will run remount-data.sh. You can replace this with anything you want to do in case of a failed mount.
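A sketch of what that Monit entry could look like. The check name, the script locations under /usr/local/bin, the /mnt/data argument and the 10 second timeout are all assumptions; drop the argument if your script hard-codes the mount path.

```
check program data_mount with path "/usr/local/bin/test-mount.sh /mnt/data" with timeout 10 seconds
    if status != 0 then exec "/usr/local/bin/remount-data.sh"
```

The timeout gives Monit a chance to give up on the check if the script hangs on an unresponsive mount, instead of waiting for the default program timeout.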