systemd has the OOMScoreAdjust option, which allows you to adjust the OOM-killer score of the started process.
To quote from the systemd documentation:
OOMScoreAdjust=
Sets the adjustment level for the Out-Of-Memory killer for executed processes. Takes an integer between -1000 (to disable OOM killing for this process) and 1000 (to make killing of this process under memory pressure very likely). See proc.txt for details.
In my setup, I am deploying a Node.js server on AWS. Besides the Node server, there is not much else running on the EC2 instance (except monitoring and the essential OS processes). There are ELB health checks in place, which should eventually replace broken EC2 instances.
Still, I wonder if it is considered good practice to increase OOMScoreAdjust so that the kernel prefers to kill the Node server process if there are memory issues, since it can be restarted automatically. In systemd, it could look like this:
OOMScoreAdjust=1000
Restart=always
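For reference, here is a minimal sketch of what a complete unit file with these settings might look like; the description, user, ExecStart path, and the RestartSec delay are assumptions for illustration, not part of my actual setup:

[Unit]
Description=Example Node.js application server

[Service]
# Path and user are placeholders for the real deployment
ExecStart=/usr/bin/node /srv/app/server.js
User=app
# Prefer this process when the kernel has to pick an OOM victim
OOMScoreAdjust=1000
# Restart automatically, with a short delay to avoid a tight kill/restart loop
Restart=always
RestartSec=5

[Install]
WantedBy=multi-user.target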
I have to admit that my understanding of the OOM killer is limited. My current impression is that it will most likely not make a real difference, and that it is better to leave the defaults in place:
- If the memory-draining process is the Node server, it will most likely be killed anyway.
- If the culprit is another process, restarting the Node server will not help, and the ELB health checks should eventually take care of replacing the instance.
Still, I am curious if someone with a better understanding has already thought it through. Enabling it would only be one line in the systemd unit file. And when in doubt, I would rather have the kernel kill the Node process than any random system service.
In the case of a server running a single main process, it will probably not make a huge difference, but this can really shine if you have a process that frequently leaks memory.
For example, on the desktop, Firefox tends to use more and more memory until the OOM killer is invoked, and invariably it decides that Xorg is using the most memory and kills it, bringing down the whole desktop when really it was only the browser that needed to be restarted.
So in that case, giving the leaky program an OOM score adjustment of 1000 and restarting it immediately is not a problem: it will be killed first, and when it comes back up it will not be using as much memory as before, freeing up memory overall.
If the process has a fairly constant memory footprint, then it's unlikely to matter (though it certainly won't hurt), but if it's leaky, this will likely result in quicker recovery than waiting for the AWS ELB health checks to notice the problem and provision a new instance.
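For illustration, assuming the server runs as a unit called node-server.service (the name is made up here), the settings could be applied as a drop-in override created with systemctl edit node-server.service, instead of editing the unit file directly:

# /etc/systemd/system/node-server.service.d/override.conf
[Service]
# Make this service the preferred OOM victim under memory pressure
OOMScoreAdjust=1000
# Bring it straight back after it has been killed
Restart=always

After a systemctl daemon-reload and a restart of the service, the effective value should be visible in /proc/<main PID>/oom_score_adj and in the output of systemctl show -p OOMScoreAdjust node-server.service.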