I have a command that runs a disk snapshot (on EC2, freezing an XFS disk and running an EBS snapshot command), which is set to run on a regular schedule as a cron job. Ideally I would like to be able to have the command delayed for a period of time if the disk is being used heavily at the moment the task is scheduled to run.
I'm afraid that using nice/ionice might not have the proper effect, as I would like the script to run with high priority while it is running (i.e. wait for a good time, then finish fast).
Thanks.
UPDATE:
This is what I ended up going with. It checks /proc/diskstats and runs my job when the number of I/Os in progress drops to 0, or when we time out. I'll probably have to tweak this once I see what kind of IO activity our servers actually get in production:
#!/bin/bash
DEVICE=sdf
# Take the snapshot once the number of I/Os in progress drops to this:
LOW_THRESHOLD=0
TIMER=0
MAX_SEC_DELAY=120
# Get the number of I/O operations in progress (field 12 of /proc/diskstats).
# Match the device name exactly so that partitions such as sdf1 are ignored:
ioInProgress() {
    awk -v dev="$DEVICE" '$3 == dev {print $12}' /proc/diskstats
}
# Wait for a good time to run the snapshot, else time out:
while [[ $TIMER -lt $MAX_SEC_DELAY && $(ioInProgress) -gt $LOW_THRESHOLD ]]; do
    TIMER=$((TIMER + 1))
    # Sleep one second per iteration so that TIMER counts seconds:
    sleep 1
done
# Record the delay that was required:
echo "Delayed ${TIMER}s"
echo "Executing snapshot"
run-the-snapshot
You can take a look at the batch command. Maybe it will fit your needs: it queues a job and runs it once the system load average drops below a threshold.
man batch
for further details (it is part of the at subsystem).
I found a Perl script here: http://www.skolnick.org/cgi-bin/list.pl?file=serverload.pl
This should do what you need.
You could probably implement this with a simple shell script that parses the output of uptime and only begins execution once the load average has dropped to a certain value.
Beware that if your server is constantly busy, for example because of a runaway process, then your cron jobs will never execute!
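A minimal sketch of that idea, with a timeout added so the job still runs on a permanently busy server. It reads /proc/loadavg rather than parsing uptime (same numbers, easier to parse). The names MAX_LOAD, MAX_WAIT, LOADAVG_FILE, load_is_low and wait_for_low_load are made up for illustration, and the default values are guesses that you would need to tune:

```shell
#!/bin/bash
# Illustrative defaults, not recommendations: tune them for your server.
MAX_LOAD=${MAX_LOAD:-1.0}                    # 1-minute load average to wait for
MAX_WAIT=${MAX_WAIT:-300}                    # give up waiting after this many seconds
LOADAVG_FILE=${LOADAVG_FILE:-/proc/loadavg}  # overridable, e.g. for testing

# Succeeds (exit status 0) when the 1-minute load average is below the limit.
load_is_low() {
    awk -v max="$1" '{ exit !($1 < max) }' "$LOADAVG_FILE"
}

# Polls every 5 seconds. Returns 0 if the load dropped, 1 on timeout.
wait_for_low_load() {
    waited=0
    until load_is_low "$MAX_LOAD"; do
        if [ "$waited" -ge "$MAX_WAIT" ]; then
            return 1
        fi
        sleep 5
        waited=$(( waited + 5 ))
    done
    return 0
}
```

In the cron script you would then call something like `wait_for_low_load || echo "load still high, running anyway"` before the real command. Keep in mind that load average is only an indirect signal for disk activity.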
Perhaps a better idea, if you have the memory, is to run your cron jobs at the lowest OS priority, so that they only consume spare resources.
What about writing a little poll/daemon script that, beginning at your scheduled time, checks iostat (vmstat) for low disk activity and continues to check every 5 minutes until the disk activity is below a preset threshold or a period of time has elapsed, whichever comes first?
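A rough sketch of such a polling script. Instead of parsing iostat output, it reads the completed read and write counters from /proc/diskstats (fields 4 and 8) twice and compares the difference against a threshold. The names DISKSTATS, INTERVAL, MAX_IOPS, MAX_WAIT, ios_completed and wait_for_quiet_disk are invented for this example, and the defaults are placeholders (a much shorter interval than 5 minutes, for instance):

```shell
#!/bin/bash
# Illustrative defaults, not recommendations: tune them for your workload.
DEVICE=${DEVICE:-sdf}
DISKSTATS=${DISKSTATS:-/proc/diskstats}  # overridable, e.g. for testing
INTERVAL=${INTERVAL:-5}                  # seconds between samples (must be >= 1)
MAX_IOPS=${MAX_IOPS:-10}                 # "quiet" means fewer I/Os per second than this
MAX_WAIT=${MAX_WAIT:-120}                # give up waiting after this many seconds

# Total reads + writes completed on the device since boot.
ios_completed() {
    awk -v dev="$DEVICE" '$3 == dev { printf "%.0f\n", $4 + $8 }' "$DISKSTATS"
}

# Returns 0 once the disk is quiet, 1 if MAX_WAIT elapsed first.
wait_for_quiet_disk() {
    waited=0
    while [ "$waited" -lt "$MAX_WAIT" ]; do
        before=$(ios_completed)
        sleep "$INTERVAL"
        after=$(ios_completed)
        waited=$(( waited + INTERVAL ))
        rate=$(( (after - before) / INTERVAL ))
        if [ "$rate" -lt "$MAX_IOPS" ]; then
            return 0
        fi
    done
    return 1
}
```

Called as `wait_for_quiet_disk; run-the-snapshot`, the snapshot runs either after the first quiet interval or once the time limit is reached. Measuring a rate over an interval is less sensitive to a single lucky instant than checking the in-progress count once.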