I have a bunch of "night shift scripts" to maintain a server. The problem is that the "action window" in which those scripts may run is always different. Sometimes nothing is going on for minutes or hours; sometimes the server is crunching data all night long. Most of the scripts, but not all, are DB scripts.
One developer came up with the idea of implementing a daemon. This daemon would check the server conditions and, if it found enough free resources, start some of the scripts.
I find this idea interesting (not to say tempting ;-) ), but I don't really want to reinvent the wheel. Are there any proven patterns? Some Shinken or Nagios plugins, maybe?
In the Nagios world you can chain tasks by using event handlers on services.
Event handlers are in effect a second command run after the service check command (if enabled in the global configuration and for that service). Basic usage consists of launching the handler with the state of the service and the results of the check. The event handler script then analyzes the service state (are we OK/WARNING/CRITICAL? is it the first time the check reported this state? is it a hard or soft state? etc.) and decides whether to launch a command. The Nagios event handler documentation shows a basic bash script doing that (be careful: the event handler also runs after a successful result, not only after failures).
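To make the "analyze the state, then decide" step concrete, here is a minimal Python sketch of such a handler. It assumes the command definition passes the standard macros $SERVICESTATE$, $SERVICESTATETYPE$ and $SERVICEATTEMPT$ as arguments; the function names and the "only launch on a HARD OK" policy are my own illustration, not something Nagios prescribes.

```python
def should_launch(state, state_type):
    """Return True only for a confirmed OK state.

    Assumed policy: SOFT states are still being retried by Nagios,
    so only a HARD OK counts as "the server is really idle".
    """
    return state == "OK" and state_type == "HARD"


def handle_event(argv):
    """Decide what to do; argv mirrors sys.argv: [script, state, type, attempt]."""
    if len(argv) != 4:
        return "usage: handler STATE STATETYPE ATTEMPT"
    state, state_type, attempt = argv[1], argv[2], int(argv[3])
    if should_launch(state, state_type):
        return "launch"
    return "skip (state=%s, type=%s, attempt=%d)" % (state, state_type, attempt)
```

In a real handler you would call handle_event(sys.argv) and start the maintenance task (or set a flag) when it returns "launch".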
So you could add an event handler on a load average service, and this event handler could launch your CPU-intensive maintenance tasks when the service state is OK. Or it could simply set a flag somewhere on your filesystem, and your cron task would check that flag before running.
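The flag variant could look roughly like this on the event handler side. This is only a sketch: the flag location and the function names are hypothetical, and I assume the handler receives the service state as a plain string.

```python
import os
import tempfile

# Hypothetical flag location; pick any path both Nagios and cron can access.
FLAG_PATH = os.path.join(tempfile.gettempdir(), "maintenance_ok.flag")


def update_flag(state, flag_path=FLAG_PATH):
    """Create the flag when the service is OK, remove it otherwise."""
    if state == "OK":
        with open(flag_path, "w") as fh:
            fh.write("ok\n")
    else:
        try:
            os.remove(flag_path)
        except FileNotFoundError:
            pass  # flag was already absent, nothing to do


def flag_is_set(flag_path=FLAG_PATH):
    """What the cron task would call before starting any work."""
    return os.path.exists(flag_path)
```

Removing the flag on every non-OK state matters: otherwise a flag left over from a quiet period would let cron start heavy tasks while the server is busy again.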
Now you may need to merge several service results before deciding whether the system is really ready to launch the tasks, for several reasons:
Some checks like check_cluster can help you merge several service results and get a service in an OK state if, for example, 3 services out of 5 are in an OK state. You would then set the event handler on the service that uses check_cluster.
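Just to illustrate the "3 out of 5" rule itself, here it is in a few lines of Python. This is not how check_cluster is implemented or configured, only the merging logic; the function name and the choice of CRITICAL as the fallback state are assumptions.

```python
def cluster_state(states, min_ok):
    """Merge several service states into one.

    Returns "OK" when at least min_ok of the given states are "OK",
    otherwise "CRITICAL" (assumed fallback for this sketch).
    """
    ok_count = sum(1 for s in states if s == "OK")
    return "OK" if ok_count >= min_ok else "CRITICAL"
```

With five services and min_ok=3, two misbehaving services do not block the maintenance tasks, but three do.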
Managing the "I'm late" status is harder. The best place for that is the event handler code (ignore a CRITICAL or WARNING status if you are running late).
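One possible way to express the "I'm late" override in the handler, sketched in Python. The two-day limit, the function name and the idea of tracking the last successful run as a timestamp are all assumptions for the example.

```python
from datetime import datetime, timedelta


def may_run(state, last_run, now, max_delay=timedelta(days=2)):
    """Allow the task on OK, or on any state once it is overdue.

    last_run is the time of the last successful maintenance run;
    max_delay is how long we tolerate skipping it (assumed: 2 days).
    """
    if state == "OK":
        return True
    # Not OK: only run anyway if we have been waiting too long.
    return now - last_run > max_delay
```

For example, may_run("CRITICAL", datetime(2024, 1, 1), datetime(2024, 1, 4)) allows the run because three days have passed, while the same call one day after the last run does not.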
You could also have time period constraints (example: maintenance tasks should only run on Friday night). There are several recipes for that. IMHO the best one is to only set a flag with the event handler and to handle the time periods in the maintenance task scheduler (crontab). Nagios provides time periods that can be attached to your service, but even the latest releases of Nagios had some serious bugs with services not scheduled to run 24/7: bugs that pushed the next service execution outside of the time period, then pushed it one week later (why one week?), and then never launched the service again. Cron or any external scheduler will make a better and more robust maintenance scheduler (I haven't tested the Shinken scheduler; maybe it really supports the official advanced time periods).
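On the cron side, the wrapper that honors the flag can stay very small. A sketch, with hypothetical paths and names: cron decides when (the time period), and the flag written by the event handler decides whether.

```python
import os
import subprocess
import sys

# Example crontab entry (hypothetical path), Fridays at 23:00:
#   0 23 * * 5 /usr/bin/python3 /usr/local/bin/run_maintenance.py


def run_if_flagged(flag_path, command):
    """Run command only when the flag file exists.

    Returns the command's exit code, or None when the run was skipped.
    """
    if not os.path.exists(flag_path):
        print("flag not set, skipping maintenance", file=sys.stderr)
        return None
    return subprocess.run(command, check=False).returncode
```

The command is passed as a list (for example ["/usr/local/bin/db_cleanup.sh"], again a made-up path), so no shell is involved and arguments need no quoting.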