I've used Daemontools to provide a simple and reliable way to supervise Unix services on my servers. It works well, but it requires a different way of thinking (The DJB Way) and some common complaints are:
- TAI64N based timestamps
- Doesn't store scripts under /etc/init.d (or (/usr/local)/etc/rc.d)
- Doesn't always work with scripts like apachectl. Some scripts need to be rewritten.
I remember that some similar "supervisor/watchdog" daemons were in the works about two years ago, but some were still a little rough around the edges.
If you have switched from Daemontools to something else, what did you choose and did it work well for you? Does Red Hat or Ubuntu come with any process supervisor utilities by default?
Hrm, if you're using Ubuntu, their new init process, upstart, includes a level of process supervision. It can be used for your standard starting and stopping of services, a la SysV init scripts, and it can also monitor running applications and respawn them if they die.
You can also implement a poor man's process restarter via inittab, depending on what your needs are.
If you're primarily looking for something to keep an eye on a process, to make sure it's always running, and then restart it when it isn't, I've had great luck with restartd. Unfortunately, the only source for it that I know of is the Debian package. However, it's a very small and simple application, basically just a single .c and .h file, with a make file. Compiling it from the Debian source tarball on Red Hat is trivial (I even made an RPM of it at my previous job).
A final option I've heard of, but not used, is Supervisor. It looks like a promising tool, but restartd has worked well enough for me in the past, for what I needed, that I haven't yet bothered to play with it.
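For what it's worth, the "respawn it if it dies" behaviour that upstart, inittab and restartd provide boils down to a loop like the one below. This is only a rough sketch to show the idea, not how any of those tools is actually implemented; the function name `supervise` and its parameters (`max_restarts`, `min_uptime`, `backoff`) are made up for illustration.

```python
import subprocess
import time


def supervise(argv, max_restarts=None, min_uptime=1.0, backoff=1.0, sleep=time.sleep):
    """Run argv and restart it whenever it exits.

    max_restarts=None means "restart forever". Returns the list of exit
    codes once the restart limit is reached.
    """
    exit_codes = []
    while max_restarts is None or len(exit_codes) <= max_restarts:
        started = time.monotonic()
        # The supervisor is the parent of the service, so wait() returns the
        # moment the child dies. No pidfile and no polling are needed.
        proc = subprocess.Popen(argv)
        exit_codes.append(proc.wait())
        # If the child died almost immediately, pause before respawning so a
        # broken service does not turn into a busy loop.
        if time.monotonic() - started < min_uptime:
            sleep(backoff)
    return exit_codes
```

Note that this only works if the command stays in the foreground. A wrapper that forks into the background and exits (apachectl-style) looks like an immediate death to the loop, which is why such scripts often have to be rewritten for supervisors of this kind.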
+1 for runit. It has more features and is more flexible than daemontools, and it is compatible with existing daemontools arguments and options. Pretty neat.
But as you mentioned, a lot of tools come with their own control binaries: apache2ctl, ejabberdctl, poundctl, collectd, etc. And although hacks exist, sometimes it's just better to stick to the supplied tools, especially when you are not sure of the cleanest possible implementation. I usually compromise: most of the services run under runit's supervision, and the others are left to run the plain way, using their supplied tools.
Well, there's runit. I can't tell you what its differences and similarities with daemontools are, but judging by the Bernstein-esque website, I'd say there is a definite Bernstein influence.
Fedora seems poised to switch to systemd: http://0pointer.de/blog/projects/systemd.html
As an alternative to the already mentioned daemonize and daemontools, there is the daemon command of the libslack package. daemon is quite configurable and does care about all the tedious daemon stuff such as automatic restart, logging or pidfile handling.
There's supervisord
There's also libslack's daemon tool that is written in C and available for various (Unix) platforms.
It is quite configurable and does care about all the tedious daemon stuff such as automatic restart, logging or pidfile handling.
Ubuntu comes with Upstart. I don't know much about it, but I know it does have "supervisor" capabilities. Apple's launchd is another option (that Wikipedia article has a nice "see also" section that lists a bunch of others too, including Upstart & runit).
All of them have their good points and their own special brand of übersuck - Whenever someone asks me about "process supervisor"/"watchdog" programs I always ask the same question: Why do you need one?
There are no popular/community-consensus tools for this because everyone who goes down this road realizes it's a dead end. If your long-running processes fail too often for simple monitoring to be good enough, then stop using them and move your code inside something that will be more event driven.
edit: as Chris points out below, sometimes you're completely cornered, in which case a */1 cron job that looks for the process/pidfile, runs a start/restart if it's missing, and outputs the results in an email to the responsible developer/product-manager is your fallback position.
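A rough sketch of what that cron fallback could look like is below. It assumes the service writes its PID to a pidfile and has some start command; the function names and any paths you pass in are hypothetical, not taken from a real package. It relies on cron's usual behaviour of mailing whatever the job prints to the address in MAILTO.

```python
import os
import subprocess


def pid_is_running(pidfile):
    """Return True if pidfile names a process that currently exists."""
    try:
        with open(pidfile) as f:
            pid = int(f.read().strip())
    except (OSError, ValueError):
        return False  # missing, unreadable or malformed pidfile
    if pid <= 0:
        return False  # never signal a process group by accident
    try:
        os.kill(pid, 0)  # signal 0 only checks existence, nothing is delivered
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True


def check_and_restart(pidfile, start_cmd):
    """Start the service if it is not running.

    Returns True if a start was attempted, False if the service was alive.
    Output goes to stdout so that cron mails it to the responsible person.
    """
    if pid_is_running(pidfile):
        return False
    print(f"{pidfile}: process not running, running {start_cmd!r}")
    result = subprocess.run(start_cmd, capture_output=True, text=True)
    print(f"exit status {result.returncode}")
    print(result.stdout + result.stderr)
    return True
```

You would call `check_and_restart` with your own pidfile path and start command from a small script that cron runs every minute. Keep in mind that a pidfile check can be fooled: if the service died and its PID was reused by an unrelated process, this sketch will think the service is still up.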