Having managed to kill sshd on a remote machine (by running a script which used all available memory on the machine, oops...) to which I have no access other than by visiting the hosting facility[1], I was considering ways to ensure that sshd is always kept running.
Other than a hacky cron job to restart sshd every n minutes or hours, using inittab to get init to keep sshd running seems like a good idea.
Are there any drawbacks to this approach? It would seem like something which it would be sensible for Linux distros to do by default, since sshd is often the only available method of access for a machine.
Additionally, are there any other daemons which I should be using this approach for? Perhaps a monitoring agent such as nrpe for nagios?
[1] Yes, management cards or a network power switch would be a good idea, but they were deemed "unnecessary" at the time...
There are a few implementations of this idea. Upstart is used by Ubuntu and can restart services if they die, Solaris 10 has the Service Management Facility, runit is cross-platform, and there's daemontools, as already mentioned.
The only reason I can think of not to do the inittab thing is that restarting sshd after an upgrade is a little more inconvenient.
Other than that: interesting idea.
You can tell the Linux OOM killer not to kill sshd. Search for oom_adj for more details, or see, for example, the RHEL manual.
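As a rough illustration of the oom_adj idea, here is a minimal Python sketch. It assumes a newer kernel that exposes /proc/<pid>/oom_score_adj (range -1000 to 1000, where -1000 exempts the process from the OOM killer); older kernels use /proc/<pid>/oom_adj instead, where -17 has the same meaning. The function name and the proc_root parameter are made up for this example, and lowering the value needs root.

```python
from pathlib import Path


def protect_from_oom(pid: int, score: int = -1000, proc_root: str = "/proc") -> Path:
    """Write an OOM score adjustment for `pid` and return the file written.

    `proc_root` is a hypothetical parameter that exists only so the function
    can be tried against a scratch directory instead of the real /proc.
    """
    if not -1000 <= score <= 1000:
        raise ValueError("oom_score_adj must be between -1000 and 1000")
    # Assumption: the kernel provides oom_score_adj (older ones only have oom_adj).
    target = Path(proc_root) / str(pid) / "oom_score_adj"
    target.write_text(f"{score}\n")
    return target
```

You would call it with the PID of the listening sshd, for instance the number stored in its pid file. The setting does not survive a restart of sshd, so it has to be reapplied whenever the daemon is started again.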
There are benefits to having services that need to be reliable under a scheme that will ensure they're always running. I prefer to use daemontools, myself, for the reasons documented here: http://cr.yp.to/daemontools/faq/create.html
I've not run ssh this way, but I would be happy enough doing it if I was in the situation where I thought that my current SSH management wouldn't work. As far as your "running out of memory" problem, you can deprioritise certain processes like sshd so that they don't get killed by the OOM killer in favour of the program actually causing the problem.
An interesting idea.
I haven't tried anything like that but I would check at what time in the boot process the things from inittab are started. If it is too early you may not have the network running.
Monit is a monitoring daemon that is designed for just what you want to do here.
The only issue that I can foresee is if it were to attempt to respawn with a configuration error.
I thought you could rate limit respawning, but I can't seem to find any documentation to support this.
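If you end up writing your own respawn logic, rate limiting is easy enough to add yourself. The sketch below is only an illustration in Python: the class name, the default limit of 10 restarts and the 120 second window are all assumptions picked for the example, not settings taken from init or any other tool.

```python
import time
from collections import deque


class RespawnLimiter:
    """Allow at most `max_restarts` restarts within any `window` seconds."""

    def __init__(self, max_restarts=10, window=120.0, clock=time.monotonic):
        self._max = max_restarts
        self._window = window
        self._clock = clock      # injectable so the logic can be tested
        self._stamps = deque()   # times of recent permitted restarts

    def allow(self) -> bool:
        """Return True and record the restart if it is within the limit."""
        now = self._clock()
        # Forget restarts that have aged out of the window.
        while self._stamps and now - self._stamps[0] >= self._window:
            self._stamps.popleft()
        if len(self._stamps) >= self._max:
            return False
        self._stamps.append(now)
        return True
```

A supervisor loop would call allow() before each restart and back off (or alert someone) when it returns False, which avoids hammering a daemon that keeps dying because of a configuration error.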
Like others have noted before me, using an existing tool like daemontools or monit will probably be the smartest route. You can't use inittab to spawn sshd since, by default, it forks to the background, and init will try to run several sshd's. You'll most likely get "init: re-spawning too fast" messages.
You might want to write a small monitoring script that will run in a loop and make sure the original sshd (the one that accepts connections and forks to handle the sessions) is still running. Once it fails, just use the system's init script to re-run it.
Just a note: if your sshd is killed by the kernel's OOM handler, there's no guarantee your sshd will survive a restart...
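A minimal sketch of such a monitoring loop, in Python, might look like the following. The pid file path /var/run/sshd.pid and the init script /etc/init.d/ssh are assumptions and vary between distributions; the function names are made up for this example.

```python
import os
import subprocess
import time


def read_pid(pidfile):
    """Return the PID stored in `pidfile`, or None if it is missing or invalid."""
    try:
        with open(pidfile) as f:
            pid = int(f.read().strip())
    except (OSError, ValueError):
        return None
    return pid if pid > 0 else None


def pid_is_alive(pid):
    """Probe a process with signal 0, which checks existence without signalling."""
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True


def check_once(pidfile, restart_cmd, run=subprocess.call):
    """Restart the daemon if its PID is gone; return True if a restart was tried."""
    pid = read_pid(pidfile)
    if pid is not None and pid_is_alive(pid):
        return False
    run(list(restart_cmd))
    return True


def watch(pidfile="/var/run/sshd.pid",
          restart_cmd=("/etc/init.d/ssh", "start"),
          interval=30):
    """Loop forever, checking every `interval` seconds (paths are assumptions)."""
    while True:
        check_once(pidfile, restart_cmd)
        time.sleep(interval)
```

Checking the pid file only looks at the listening sshd, not the per-session children, which matches the "original sshd" described above. The watcher itself can of course also be killed, so it does not replace a proper supervisor.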