I'm trying to set up a cron job to reboot devices daily, with a safe fallback to a SysRq reset in case the reboot hangs (the issue being that SSH gets killed and the device never reboots, so it is lost and requires costly human intervention to restart).
The script that used to work for a while:
5 5 * * * root /sbin/reboot -f; sleep 30; /bin/echo `date -u +'\%Y-\%m-\%dT\%H:\%M:\%SZ'` >> /var/log/player-reboot.error.log; echo 1 > /proc/sys/kernel/sysrq; sync; echo b > /proc/sysrq-trigger
However, it's pretty brutal (a hard reboot -f), and recently some of our devices did not recover (a couple out of thousands every day).
I'm not sure what hangs: it looks like the log file is never written, so I'd say either the reboot itself or the echo hangs.
I was looking into using ampersands (&) so that nothing ever blocks and a proper reset is guaranteed to happen eventually. However, that does not seem to work at all (no more reboots):
5 5 * * * root /sbin/shutdown -r +2 &; sleep 240; /bin/echo `date -u +'\%Y-\%m-\%dT\%H:\%M:\%SZ'` >> /var/log/player-reboot.error.log &; echo 1 > /proc/sys/kernel/sysrq; sleep 1; echo b > /proc/sysrq-trigger
Can I use the ampersand in a cron script? Do you know a smarter way to achieve this? Thanks!
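On the ampersand question: cron runs the command part of a crontab line with /bin/sh by default, and in sh an "&" already terminates a command, just like ";". Writing "&;" therefore leaves an empty command between two separators, which is a syntax error, so the shell should reject the whole line before running any of it; that would be consistent with the reboots stopping entirely. The small script below checks this without rebooting anything, using sh's -n flag (parse only, execute nothing); "true" is just a placeholder for the real commands:

```shell
#!/bin/sh
# Parse-only check (-n): nothing is executed, so this is safe to run anywhere.
# "true" stands in for the real commands from the crontab line.

if sh -n -c 'true &; echo after'; then
    echo "'&;' parses"
else
    echo "'&;' is a syntax error"
fi

if sh -n -c 'true & echo after'; then
    echo "'&' on its own parses"
else
    echo "'&' on its own is a syntax error"
fi
```

If that is the cause, dropping the ";" after each "&" (for example "/sbin/shutdown -r +2 & sleep 240; ...") should at least make the line parse; apart from "%", cron does not treat "&" or other shell characters specially.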
The simpler approach is to schedule another process that checks for an uptime greater than 24 hours (i.e. 25h). If the check returns true, something obviously went wrong with the reboot, so the machine must be restarted via SysRq.

For maximum reliability, your periodic check should not depend on crond (which can be killed by the hanging shutdown process). Rather, use a polling scheme: a script that loops forever and checks the uptime at a fixed interval.

You can first start such a script with an @reboot cron entry, or with rc.local and friends.
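A minimal sketch of such a polling watchdog, assuming a Linux system with /proc, a script running as root, the 25h threshold from above and a 5-minute polling interval. The interval, the function names, the "run" argument and the install path used below are hypothetical choices for illustration, not part of the original answer. It reuses the log file and SysRq commands from the question, but swaps the sync command for SysRq "s" (emergency sync, which is queued asynchronously by the kernel) and appends the log line in the background, so that a stuck filesystem should not be able to block the final SysRq "b":

```shell
#!/bin/sh
# Hypothetical SysRq watchdog (sketch, not a tested production script).
# Assumptions: Linux with /proc, runs as root, 25 h limit, 5 min interval.

MAX_UPTIME=$((25 * 60 * 60))   # 25 h in seconds: the daily reboot plus 1 h of slack
INTERVAL=300                   # seconds between checks (assumed value)
LOG=/var/log/player-reboot.error.log

# Whole seconds since boot: first field of /proc/uptime, fractional part dropped.
uptime_seconds() {
    read -r up _ < /proc/uptime
    echo "${up%%.*}"
}

# Succeeds (exit status 0) when the given uptime is over the limit.
overdue() {
    [ "$1" -gt "$MAX_UPTIME" ]
}

force_reboot() {
    # Log in the background: if the filesystem is stuck, this must not block.
    date -u +'%Y-%m-%dT%H:%M:%SZ' >> "$LOG" &
    echo 1 > /proc/sys/kernel/sysrq    # enable all SysRq functions
    echo s > /proc/sysrq-trigger       # request an emergency sync of all filesystems
    sleep 5                            # give the queued sync a moment to run
    echo b > /proc/sysrq-trigger       # reboot immediately, without unmounting
}

# Only loop when called with "run", so the functions can be sourced and tested safely.
if [ "${1:-}" = run ]; then
    while :; do
        if overdue "$(uptime_seconds)"; then
            force_reboot
        fi
        sleep "$INTERVAL"
    done
fi
```

Installed as, say, /usr/local/sbin/reboot-watchdog.sh, it could be started at boot with a line such as "@reboot root /usr/local/sbin/reboot-watchdog.sh run" in /etc/crontab (the user field only applies to the system crontab), or from rc.local. Without the "run" argument the script only defines its functions, which keeps the threshold logic testable on a machine you do not want to reboot.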