OK, so let's say I have a Nagios setup that monitors different services using the so-called nagios-plugins.
What would be the best practice for my Nagios plugin (probably written in Python) to determine whether a given service is running OK?
The particular service in question is a Python socket server that listens on some port. I will make sure Nagios checks that service frequently, and if it stops responding or dies, I should restart it. What should I do to know if the socket server is alive? And possibly, how would I check whether it is actually responding?
I have control over the service, so I can change the way it works if that would help me determine its health state.
Any ideas are welcome!
Keeping to the standard Nagios plugins found in, say, the Ubuntu repositories, you can use the check_tcp plugin to send a string (its -s option) and then check whether the service returns the expected response (its -e option).
Since you can modify your service, you can have the check send something like "Are you OK?" and look for "I'm OK". How far you take this depends on how involved you want to get with checking that the service is up and running.
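If you would rather write the check in Python than use check_tcp, the same send/expect idea can be sketched roughly as follows. This is only a minimal sketch: the probe and reply strings are the ones suggested above, the names check_service and main are made up for illustration, and it assumes the service answers in a single small packet. The exit codes are the standard Nagios ones (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN).

```python
#!/usr/bin/env python3
"""Sketch of a Nagios-style plugin: send a probe string, check the reply."""
import socket
import sys

# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def check_service(host, port, probe=b"Are you OK?\n", expected=b"I'm OK", timeout=5.0):
    """Do one send/expect round trip and return (exit_code, message).

    The probe and expected strings are assumptions; use whatever
    your own service actually implements.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            sock.sendall(probe)
            # Assumes the whole reply arrives in one read.
            reply = sock.recv(1024)
    except OSError as exc:  # refused, unreachable, timed out, ...
        return CRITICAL, f"CRITICAL - cannot talk to {host}:{port} ({exc})"
    if expected in reply:
        return OK, f"OK - {host}:{port} answered as expected"
    return CRITICAL, f"CRITICAL - unexpected reply {reply!r} from {host}:{port}"


def main(argv):
    """Command-line entry point: argv is [HOST, PORT]."""
    if len(argv) != 2 or not argv[1].isdigit():
        print("UNKNOWN - usage: check_myservice.py HOST PORT")
        return UNKNOWN
    code, message = check_service(argv[0], int(argv[1]))
    print(message)  # Nagios shows the first line of output as the status text
    return code
```

To use it as a real plugin, end the file with sys.exit(main(sys.argv[1:])) so that Nagios sees the exit code.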
You can also use check_procs to see whether the process for the service is there. This might be in conjunction with a check_tcp check, or as an alternative. Again, it depends on what you're doing and how much you actually want to do. If you want to get very involved, you can write a custom Nagios check that does all sorts of things to verify the functionality of the service and returns custom state messages to the Nagios server.

There are several ways to make sure a service is running. You can look for the process in the ps -ef output, or look for the listening socket with netstat -lnp | grep your_port.
You can use a Python script to check, as you suggested. Here is one I wrote that just checks one port: https://github.com/jonzobrist/Bash-Admin-Scripts/blob/master/tcpcheck.py
Here is a slightly different version that is much faster and checks the same port as many times as you specify. It'll hit a local server 1500 times in less than half a second.
https://github.com/jonzobrist/Bash-Admin-Scripts/blob/master/tcpcheck-bulk.py
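For a rough idea of what a repeated port check involves, here is a minimal sketch. It is not the code from the linked scripts, just an illustration of the same idea; the function name probe_port and its parameters are made up for this example.

```python
import socket
import time


def probe_port(host, port, attempts=10, timeout=2.0):
    """Open and close a TCP connection `attempts` times.

    Returns (succeeded, failed, elapsed_seconds).
    """
    succeeded = failed = 0
    start = time.monotonic()
    for _ in range(attempts):
        try:
            # A completed connect means something is listening on the port.
            with socket.create_connection((host, port), timeout=timeout):
                succeeded += 1
        except OSError:
            failed += 1
    return succeeded, failed, time.monotonic() - start
```

A call such as probe_port("127.0.0.1", 8000, attempts=100) only tells you that the port accepts connections. It does not tell you that the service behind it answers sensibly, so it complements a send/expect check and does not replace one.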
If you're looking for a local shell script, pgrep processname works well; in Bash, testing its exit status (0 when a matching process exists) should get you what you want.
You can do something similar with lsof -i :PORT. For HTTPS on TCP port 443, that would look like lsof -i :443.
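Since the question mentions writing the plugin in Python, the pgrep approach can also be wrapped in a small Python function. This is a sketch under a few assumptions: pgrep is installed on the monitored host, the process is matched by its exact name (-x), and the name check_process is made up for illustration. pgrep exits with 0 when at least one process matches and 1 when none does.

```python
import subprocess

# Nagios plugin exit codes used here (WARNING, 1, is not needed).
OK, CRITICAL, UNKNOWN = 0, 2, 3


def check_process(name):
    """Return (exit_code, message) depending on whether `name` is running."""
    try:
        result = subprocess.run(
            ["pgrep", "-x", name],  # -x: match the process name exactly
            stdout=subprocess.PIPE,
            stderr=subprocess.DEVNULL,
            text=True,
        )
    except FileNotFoundError:
        return UNKNOWN, "UNKNOWN - pgrep is not installed"
    if result.returncode == 0:
        pids = ", ".join(result.stdout.split())
        return OK, f"OK - {name} is running (pid {pids})"
    if result.returncode == 1:
        return CRITICAL, f"CRITICAL - no process named {name}"
    return UNKNOWN, f"UNKNOWN - pgrep exited with status {result.returncode}"
```

Keep in mind that a process can be present and still not respond, so this is best combined with a check that actually talks to the port.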