Ping a Specific Port

Question

Eddie Parker

Asked: 2010-05-15 10:32:34 +0800 CST2010-05-15 10:32:34 +0800 CST 2010-05-15 10:32:34 +0800 CST

Avoid linux out-of-memory application teardown

772

I'm finding that on occasion my Linux box runs out of memory and it starts tearing down random processes to deal with it.

I'm curious what administrators do to avoid this? Is the only real solution to up the amount of memory (will upping the swap alone help?), or is there better ways to set up the box with software to avoid this? (i.e., quotas, or some such?).

8 Answers

Voted

voretaq7 · Answer 1 · 2010-05-15T11:02:54+08:00

By default Linux has a somewhat brain-damaged concept of memory management: it lets you allocate more memory than your system has, then randomly terminates a process when it gets in trouble. (The actual semantics of what gets killed are more complex than that - Google "Linux OOM Killer" for lots of details and arguments about whether it's a good or bad thing).

To restore some semblance of sanity to your memory management:

Disable the OOM Killer (Put vm.oom-kill = 0 in /etc/sysctl.conf)
Disable memory overcommit (Put vm.overcommit_memory = 2 in /etc/sysctl.conf)
Note that this is a trinary value: 0 = "estimate if we have enough RAM", 1 = "Always say yes", 2 = "say no if we don't have the memory")

These settings will make Linux behave in the traditional way (if a process requests more memory than is available malloc() will fail and the process requesting the memory is expected to cope with that failure).

Reboot your machine to make it reload /etc/sysctl.conf, or use the proc file system to enable right away, without reboot:

echo 2 > /proc/sys/vm/overcommit_memory

mctylr · Answer 2 · 2010-05-15T11:31:38+08:00

The short answer, for a server, is buy and install more RAM.

A server that routinely enough experienced OOM (Out-Of-Memory) errors, then besides the VM (virtual memory) manager's overcommit sysctl option in Linux kernels, this is not a good thing.

Upping the amount of swap (virtual memory that has been paged out to disk by the kernel's memory manager) will help if the current values are low, and the usage involves many tasks each such large amounts of memory, rather than a one or a few processes each requesting a huge amount of the total virtual memory available (RAM + swap).

For many applications allocating more than two time (2x) the amount of RAM as swap provides diminishing return on improvement. In some large computational simulations, this may be acceptable if the speed slow-down is bearable.

With RAM (ECC or not) be quite affordable for modest quantities, e.g. 4-16 GB, I have to admit, I haven't experienced this problem myself in a long time.

The basics at looking at the memory consumption including using free and top, sorted by memory usage, as the two most common quick evaluations of memory usage patterns. So be sure you understand the meaning of each field in the output of those commands at the very least.

With no specifics of applications (e.g. database, network service server, real-time video processing) and the server's usage (few power users, 100-1000s of user/client connections), I cannot think of any general recommendations in regards to dealing with the OOM problem.

janneb · Answer 3 · 2010-05-15T11:01:37+08:00

janneb

2010-05-15T11:01:37+08:002010-05-15T11:01:37+08:00

You can disable overcommit, see http://www.mjmwired.net/kernel/Documentation/sysctl/vm.txt#514

4

pehrs · Answer 4 · 2010-05-15T14:34:29+08:00

pehrs

2010-05-15T14:34:29+08:002010-05-15T14:34:29+08:00

You can use ulimit to reduce the amount of memory a process is allowed to claim before it's killed. It's very usefull if your problem is one or a few run away processes that crashes your server.

If your problem is that you simply don't have enough memory to run the services you need there are only three solutions:

Reduce the memory used by your services by limiting caches and similar
Create a larger swap area. It will cost you in performance, but can buy you some time.
Buy more memory

3

Magellan · Answer 5 · 2012-11-16T10:41:05+08:00

Increasing the amount of physical memory may not be an effective response in all circumstances.

One way to check this is the 'atop' command. Particularly these two lines.

This is out server when it was healthy:

MEM | tot   23.7G | free   10.0G | cache   3.9G | buff  185.4M | slab  207.8M |
SWP | tot    5.7G | free    5.7G |              | vmcom  28.1G | vmlim  27.0G |

When it was running poorly (and before we adjusted overcommit_memory from 50 to 90, we would see behavior with vmcom running well over 50G, oom-killer blowing up processes every few seconds, and the load kept radically bouncing due to NFSd child processes getting blown up and re-created continually.

We've recently duplicated cases where multi-user Linux terminal servers massively over-commit the virtual memory allocation but very few of the requested pages are actually consumed.

While it's not advised to follow this exact route, we adjusted overcommit-memory from the default 50 to 90 which alleviated some of the problem. We did end up having to move all the users to another terminal server and restart to see the full benefit.

noonex · Answer 6 · 2021-02-11T00:43:28+08:00

noonex

2021-02-11T00:43:28+08:002021-02-11T00:43:28+08:00

Despite a lot of answers here - the best you can do as an administrator is to investigate everything in oom killer report(s) and clearly understand why it triggers. Then that should give you a clue about next steps. It may be related to OS config or maybe a problem with particular piece of software.

1

Krzysztof Dryja · Answer 7 · 2017-02-04T01:23:33+08:00

Krzysztof Dryja

2017-02-04T01:23:33+08:002017-02-04T01:23:33+08:00

I had similar issue related to this bug and solution was to use older / newer (fixed) kernel.

However at the time I could not reboot my machine so some kind of ugly workaround was to login as root and clear system caches with this command:

echo 3 > /proc/sys/vm/drop_caches

0

c4f4t0r · Answer 8 · 2013-12-11T10:31:56+08:00

@voretaq7 linux doesn't has brain-damaged concept of memory management, by default vm.overcommit_ratio is 0,

0       -   Heuristic overcommit handling. Obvious overcommits of
            address space are refused. Used for a typical system. It
            ensures a seriously wild allocation fails while allowing
            overcommit to reduce swap usage.  root is allowed to
            allocate slightly more memory in this mode. This is the
            default.

In this way, if you have 4GB of ram and you try to allocate 4.2 GB with malloc of virtual memory, your allocation will fail.

With vm.overcommit_ratio = 1

            1    -   Always overcommit. Appropriate for some scientific
            applications. Classic example is code using sparse arrays
            and just relying on the virtual memory consisting almost
            entirely of zero pages.

With vm.overcommit_ratio = 2

           2    -   Don't overcommit. The total address space commit
            for the system is not permitted to exceed swap + a
            configurable percentage (default is 50) of physical RAM.
            Depending on the percentage you use, in most situations
            this means a process will not be killed while accessing
            pages but will receive errors on memory allocation as
            appropriate.

            Useful for applications that want to guarantee their
            memory allocations will be available in the future
            without having to initialize every page.

So by default linux doesn't overcommit, if your application more memory then you have, maybe your code is buggy

Avoid linux out-of-memory application teardown

Ping a Specific Port

How do I tell Git for Windows where to find my private RSA key?

How do you restart php-fpm?

What's the default superuser username/password for postgres after a new install?

What port does SFTP use?

Resolve host name from IP address

How can I sort du -h output by size

Command line to list users in a Windows Active Directory group?

What is a Pem file and how does it differ from other OpenSSL Generated Key File Formats?

How to determine if a bash variable is empty?