I've got a few servers that have begun OOM-killing their backup processes and, while I understand that hitting the OOM condition is quite bad in itself, I need those processes not to die so that backups complete while the memory issue is addressed.
To that end I've attempted to create a way to launch processes with an adjusted OOM score, similar to launching a process with nice.
#!/bin/bash
function oom_adj_exec() {
    while getopts ':n:' opt; do
        case $opt in
            n)
                # accept only an optionally signed integer
                if grep -q '^-\?[0-9]\+$' <(echo "$OPTARG"); then
                    if [ "$OPTARG" -ge -1000 ] && [ "$OPTARG" -le 1000 ]; then
                        oom_score_adjust=$OPTARG
                    else
                        echo "Acceptable values for -n are from -1000 to 1000" >&2
                        return 255
                    fi
                else
                    echo "Improper format for -n: $OPTARG" >&2
                    return 255
                fi
                break
                ;;
            :)
                echo "option -$OPTARG requires a value" >&2
                return 255
                ;;
            *)
                echo "Unknown option -$OPTARG" >&2
                return 255
                ;;
        esac
    done
    # default to no adjustment if -n wasn't given
    oom_score_adjust=${oom_score_adjust:-0}
    # keep the command and its arguments as separate words
    command=("${@:$OPTIND}")
    # job control requires the monitor option which
    # is usually not set for non-interactive shells
    prev_state=$(set +o | grep monitor)
    set -o monitor
    "${command[@]}" &
    pid=$!
    echo "$oom_score_adjust" > "/proc/$pid/oom_score_adj"
    # bring the job to the foreground and collect its exit code
    fg %% > /dev/null
    ecode=$?
    # restore the previous state of the shell
    $prev_state
    return $ecode
}
oom_adj_exec "$@"
Example usage:
./oom_adj_exec.sh -n -500 /usr/bin/mem_bloater
While it seems to work, I can't shake the feeling that there's something in there waiting to go horribly wrong. Is there anything that stands out as a truly terrible idea and/or a disaster waiting to happen?
I've also done this, though not quite as nicely, roughly along these lines (reusing the mem_bloater example from above):
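(echo 1000 > /proc/self/oom_score_adj && exec /usr/bin/mem_bloater)   # set the subshell's score, then replace it with the real program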
Because it's in parentheses, it launches a subshell, sets the OOM score for that shell (in this case to 1000, to make it extremely likely to get killed in an OOM situation), and then the exec replaces the subshell with the intended program while leaving the new OOM score intact. It also won't affect the OOM score of the parent process/shell, as everything is happening inside the subshell.
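To check that the new score actually survived the exec, you can read it back from /proc (assuming mem_bloater is the only matching process name):
cat /proc/"$(pgrep -n mem_bloater)"/oom_score_adj   # should print 1000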