My webserver runs nginx with php5-fpm. If some trouble occurs, usually php5-fpm hung up, resulting in a "bad gateway" server error. Of course, I never know, if nginx may crash as well, some day.
When something happens, the both processes (ad their threads) usually exist and need a restart. I am not so much interested in the cause of a current problem, but want to restart both processes. To do so, I create two bash scripts /etc/monit/webserver.start.sh and /etc/monit/webserver.stop.sh.
Here's my monit configuration file (in conf.d):
check process webserver with pidfile /var/run/nginx.pid
start program = "/etc/monit/webserver.start.sh"
stop program = "/etc/monit/webserver.stop.sh"
if failed (url https://www.myserver.com/example/ and content == 'test string' and timeout 20 seconds)
then alert
if failed (url https://www.myserver.com/example/ and content == 'test string' and timeout 20 seconds)
for 2 cycles
then restart
if failed (url https://www.myserver.com/example/ and content == 'test string' and timeout 20 seconds)
for 4 cycles
then exec "/sbin/reboot"
That's not entirely wrong, but some questions remain:
- Actually, I do not want to monitor the
nginx
process here, but the ports/URLs. Can I use any other check instead ofcheck process
? - To do different actions after 1 fail, 2 fails, and 4 fails, I need three
if failed
conditions, resulting in three server requests. Is there any way to run one request per cycle and perform different activities after a different number of fails?
I tried to find the answers from the official monit reference, but obviously, I do not understand the possibilities as decribed in that source. Therefore, I would apprechiate some advice very much.
Update
After spending some mot time with the monit man page (it's much better structured than the online manual, in my opinion), I found this optimization:
CHECK HOST webserver WITH ADDRESS 127.0.0.1
START PROGRAM = "/etc/monit/webserver.start.sh"
STOP PROGRAM = "/etc/monit/webserver.stop.sh"
IF NOT EXIST THEN ALERT
IF FAILED (url https://www.mydomain.tld/example/ and content == 'test content' and timeout 20 seconds)
FOR 2 CYCLES
THEN RESTART
IF 2 RESTARTS WITHIN 5 CYCLES
THEN EXEC "/sbin/reboot"
This modification does not include the alert on first URL fail (a workaround would be to use dummy start/stop commands, here) but can do a restart after 2 fails and a reboot aber 4 fails - with only one server request.
It's still not perfect. If someone knows how to do it better, advice is stil appreciated :) Thanks!
Update
After some testing, I DO NOT recommend to use monit's timeout feature (IF 2 REsTARTS WITHIN...
) for second-order actions. It seems that the timeout action is re-run after a reboot under certaing circumstances. In my case, this caused multiple reboots:
[CET Dec 28 05:59:50] error : skipping queued event /var/monit/id - unknown data format
[CET Dec 28 05:59:50] error : skipping queued event /var/monit/state - unknown data format
[CET Dec 30 03:10:52] error : 'webserver' failed protocol test [HTTP] at INET[www.myserver.com/example/] via TCPSSL -- HTTP: Error receiving data -- Resource temporarily unavailable
[CET Jan 1 03:08:10] error : 'webserver' failed protocol test [HTTP] at INET[www.myserver.com/example/] via TCPSSL -- HTTP: Error receiving data -- Resource temporarily unavailable
[CET Jan 1 03:09:30] error : 'webserver' failed protocol test [HTTP] at INET[www.myserver.com/example/] via TCPSSL -- HTTP: Error receiving data -- Resource temporarily unavailable
[CET Jan 1 03:09:31] info : 'webserver' trying to restart
[CET Jan 1 03:09:31] info : 'webserver' stop: /etc/monit/webserver.stop.sh
[CET Jan 1 03:09:31] info : 'webserver' start: /etc/monit/webserver.start.sh
[CET Jan 1 03:10:31] error : 'webserver' failed, cannot open a connection to INET[www.myserver.com/example/] via TCPSSL
[CET Jan 1 03:10:31] info : 'webserver' trying to restart
[CET Jan 1 03:10:31] info : 'webserver' stop: /etc/monit/webserver.stop.sh
[CET Jan 1 03:10:31] info : 'webserver' start: /etc/monit/webserver.start.sh
[CET Jan 1 03:10:31] error : 'php-fpm' process is not running
[CET Jan 1 03:10:31] info : 'php-fpm' trying to restart
[CET Jan 1 03:10:31] info : 'php-fpm' start: /usr/sbin/service
[CET Jan 1 03:10:31] error : 'nginx' process is not running
[CET Jan 1 03:10:31] info : 'nginx' trying to restart
[CET Jan 1 03:10:31] info : 'nginx' start: /usr/sbin/service
[CET Jan 1 03:11:32] error : 'webserver' service restarted 2 times within 2 cycles(s) - exec
[CET Jan 1 03:11:32] info : 'webserver' exec: /sbin/reboot
[CET Jan 1 03:12:24] info : Starting monit daemon with http interface at [0.0.0.0:2812]
[CET Jan 1 03:12:24] info : Monit start delay set -- pause for 240s
[CET Jan 1 03:16:24] info : Starting monit HTTP server at [0.0.0.0:2812]
[CET Jan 1 03:16:24] info : monit HTTP server started
[CET Jan 1 03:16:24] info : 'Memory' Monit started
[CET Jan 1 03:16:24] error : skipping queued event /var/monit/id - unknown data format
[CET Jan 1 03:16:24] error : skipping queued event /var/monit/state - unknown data format
[CET Jan 1 03:16:24] error : 'webserver' service restarted 2 times within 2 cycles(s) - exec
[CET Jan 1 03:16:24] info : 'webserver' exec: /sbin/reboot
[CET Jan 1 03:17:04] info : Starting monit daemon with http interface at [0.0.0.0:2812]
[CET Jan 1 03:17:04] info : Monit start delay set -- pause for 240s
[CET Jan 1 03:21:04] info : Starting monit HTTP server at [0.0.0.0:2812]
[CET Jan 1 03:21:04] info : monit HTTP server started
[CET Jan 1 03:21:04] info : 'Memory' Monit started
[CET Jan 1 03:21:04] error : skipping queued event /var/monit/id - unknown data format
[CET Jan 1 03:21:04] error : skipping queued event /var/monit/state - unknown data format
[CET Jan 1 03:21:04] error : 'webserver' service restarted 2 times within 2 cycles(s) - exec
[CET Jan 1 03:21:04] info : 'webserver' exec: /sbin/reboot
[CET Jan 1 03:21:44] info : Starting monit daemon with http interface at [0.0.0.0:2812]
[CET Jan 1 03:21:44] info : Monit start delay set -- pause for 240s
[CET Jan 1 03:25:44] info : Starting monit HTTP server at [0.0.0.0:2812]
[CET Jan 1 03:25:44] info : monit HTTP server started
[CET Jan 1 03:25:44] info : 'Memory' Monit started
[CET Jan 1 03:25:44] error : skipping queued event /var/monit/id - unknown data format
[CET Jan 1 03:25:44] error : skipping queued event /var/monit/state - unknown data format
[CET Jan 1 03:25:44] error : 'webserver' service restarted 2 times within 2 cycles(s) - exec
[CET Jan 1 03:25:44] info : 'webserver' exec: /sbin/reboot
Unless anyone has a good idea, I shall switch back to multiple requests. Finally, they are not so time-consuming...
BurninLeo
I do not want to monitor the nginx process here, but the ports/URLs. Can I use any other check instead of check process?
you can use the host check, this is an example from the monit site:
To do different actions after 1 fail, 2 fails, and 4 fails, I need three if failed conditions, resulting in three server requests. Is there any way to run one request per cycle and perform different activities after a different number of fails?
EXEC can be used to execute an arbitrary program and send an alert. If you choose this action you must state the program to be executed and if the program require arguments you must enclose the program and its arguments in a quoted string. You may optionally specify the uid and gid the executed program should switch to upon start. For instance: