I'm looking for a way to kill all processes with a given name that have been running for more than a given amount of time. I spawn many instances of this particular executable, and sometimes one goes into a bad state and runs forever, taking up a lot of CPU.
I'm already using monit, but I don't know how to check for a process without a pid file. The rule would be something like this:
kill all processes named xxxx that have a running time greater than 2 minutes
How would you express this in monit?
In monit, you can match on a string instead of a PID file: a "check process" entry accepts a "matching" pattern, for example "myprocessname" for a process of that name.
One option is to test whether CPU usage stays above a certain level for 10 monitoring cycles (of 30 seconds each) and then restart or kill the process. Alternatively, you could use monit's timestamp test on a file related to the process.
There is no ready-to-use tool with that functionality. Let's assume you want to kill php-cgi processes that have been running for more than a minute. Do this:
pgrep php-cgi | xargs ps -o pid,time | perl -ne 'print "$1 " if /^\s*([0-9]+) ([0-9]+:[0-9]+:[0-9]+)/ && $2 gt "00:01:00"' | xargs kill
pgrep selects processes by name, ps -o pid,time prints the cumulative CPU time of each PID (use etime instead if you want elapsed wall-clock time), and the perl one-liner parses each line, extracts the time, and prints the PID if the time exceeds the given limit. The result is passed to kill.
I solved this exact issue with ps-watcher and wrote about it on linux.com a few years back. ps-watcher lets you monitor processes and kill them based on accumulated run time. Here's the relevant ps-watcher configuration, assuming your process is named 'foo':
The key is the line
which says 'if accumulated process time is > 1 hour AND I'm not the parent process, restart me'.
I realize this answer doesn't use monit, but it does work. ps-watcher is lightweight and simple to set up, so there's no harm in running it in addition to your monit setup.
Monit can do this as of version 5.4:
See: the project CHANGES file
You could work this into monit as an exec statement.
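For what it's worth, here is a minimal sketch of a script that something like a monit exec action (or cron) could call. It is not taken from any of the answers above. The function names (etime_to_seconds, pids_older_than, list_stale) are made up for illustration. It assumes a ps that supports "-o etime=" and "-p" with a comma-separated PID list, plus pgrep with "-x". It measures elapsed wall-clock time, not CPU time.

```shell
#!/bin/sh
# Hypothetical helper functions; names and limits are illustrative only.

# Convert ps "etime" output ([[dd-]hh:]mm:ss) to a number of seconds.
etime_to_seconds() {
    echo "$1" | awk -F'[-:]' '{
        if (NF == 4)      s = $1*86400 + $2*3600 + $3*60 + $4   # dd-hh:mm:ss
        else if (NF == 3) s = $1*3600 + $2*60 + $3              # hh:mm:ss
        else              s = $1*60 + $2                        # mm:ss
        print s
    }'
}

# Read "PID ETIME" lines on stdin; print the PIDs whose elapsed time
# is greater than the limit in seconds given as $1.
pids_older_than() {
    limit=$1
    while read -r pid etime; do
        [ -n "$pid" ] || continue
        if [ "$(etime_to_seconds "$etime")" -gt "$limit" ]; then
            echo "$pid"
        fi
    done
}

# Print PIDs of processes named exactly $1 that have run longer than $2 seconds.
list_stale() {
    pids=$(pgrep -x "$1" | paste -sd, -)
    # Nothing matched: stop here so ps is never run without a PID list.
    [ -n "$pids" ] || return 0
    ps -o pid= -o etime= -p "$pids" | pids_older_than "$2"
}

# Example use (left commented out so that running this file kills nothing):
# for pid in $(list_stale xxxx 120); do kill "$pid"; done
```

Checking for an empty PID list before calling ps avoids the case where ps falls back to listing your own processes when pgrep finds nothing. Try list_stale on its own first and look at what it prints before wiring in the kill.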