I'm running a Linux workstation without swap and I have installed the earlyoom daemon to automatically kill processes if I'm running out of RAM. earlyoom works by monitoring the kernel's MemAvailable value: if the available memory gets low enough, it kills less important processes.
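For context, the logic is roughly this (a sketch of the idea only, not earlyoom's actual code; the 500000 kB threshold is just an example):

while sleep 5; do
    avail_kb=$(awk '/^MemAvailable:/ {print $2}' /proc/meminfo)
    if [ "$avail_kb" -lt 500000 ]; then
        # earlyoom would select and kill a victim process at this point
        echo "MemAvailable low: ${avail_kb} kB"
    fi
done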
This has worked fine for a long time, but I'm now running into a situation where MemAvailable is suddenly really low compared to the rest of the system's memory statistics. For example:
$ grep -E '^(MemTotal|MemFree|MemAvailable|Buffers|Cached):' /proc/meminfo
MemTotal: 32362500 kB
MemFree: 5983300 kB
MemAvailable: 2141000 kB
Buffers: 665208 kB
Cached: 4228632 kB
Note how MemAvailable is much lower than MemFree + Buffers + Cached.
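With the numbers above, MemFree + Buffers + Cached = 5983300 + 665208 + 4228632 = 10877140 kB (roughly 10 GB), yet MemAvailable reports only 2141000 kB (roughly 2 GB).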
Are there any tools I can run to further investigate why this happens? I feel that system performance is a bit worse than normal, and I had to stop the earlyoom service because its logic does not work unless MemAvailable is reliable (that is, unless it correctly describes the memory available to user-mode processes).
According to https://superuser.com/a/980821/100154, MemAvailable is an estimate of how much memory is available for starting new applications without swapping. As I have no swap, what is this supposed to mean? Is it supposed to mean the amount of memory a new process can acquire before the OOM killer is triggered (because that would logically be the equivalent of "the swap is full")?
I had assumed that MemAvailable >= MemFree would always be true. Not here.
Additional info:
Searching around the internet suggests that the cause may be open files that are no longer backed by the filesystem and, as a result, cannot be freed from memory. The command sudo lsof | wc -l outputs 653100, so I definitely cannot go through that list manually.
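If the theory about deleted-but-still-open files is correct, one way to narrow that list down (just a guess at a useful filter, not something I have verified to explain the problem) is to ask lsof only for open files whose link count has dropped to zero:

$ sudo lsof +L1 | head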
The top of sudo slabtop says
Active / Total Objects (% used) : 10323895 / 10898372 (94.7%)
Active / Total Slabs (% used) : 404046 / 404046 (100.0%)
Active / Total Caches (% used) : 104 / 136 (76.5%)
Active / Total Size (% used) : 6213407.66K / 6293208.07K (98.7%)
Minimum / Average / Maximum Object : 0.01K / 0.58K / 23.88K
OBJS ACTIVE USE OBJ SIZE SLABS OBJ/SLAB CACHE SIZE NAME
4593690 4593656 99% 1.06K 153123 30 4899936K ext4_inode_cache
3833235 3828157 99% 0.19K 182535 21 730140K dentry
860224 551785 64% 0.06K 13441 64 53764K kmalloc-64
515688 510872 99% 0.66K 21487 24 343792K proc_inode_cache
168140 123577 73% 0.20K 8407 20 33628K vm_area_struct
136832 108023 78% 0.06K 2138 64 8552K pid
...
which looks normal to me.
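For completeness, the slab totals can also be cross-checked against the corresponding lines in /proc/meminfo (assuming the kernel exposes them, which any recent kernel should):

$ grep -E '^(Slab|SReclaimable|SUnreclaim):' /proc/meminfo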
Creating a rough summary of lsof
$ sudo lsof | awk '{ print $2 }' | sort | uniq -c | sort -h | tail
6516 1118
7194 2603
7884 18727
8673 19951
25193 28026
29637 31798
38631 15482
41067 3684
46800 3626
75744 17776
points to PID 17776, which is a VirtualBox instance. (Other processes with lots of open files are Chrome, Opera and Thunderbird.) So I wouldn't be overly surprised to later find out that the major cause of this problem is VirtualBox, because that's the only thing here that really messes with the kernel.
However, the problem does not go away even if I shut down VirtualBox and kill Chrome, Opera and Thunderbird.
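(For reference, I mapped the heaviest PIDs from the list above to process names with something like the following; the PIDs are obviously specific to my machine:)

$ ps -o pid,comm -p 17776,3626,3684,15482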
The discrepancy could be because you are using the wrong calculation. The answer you linked to does not highlight this, but look at the linked commit message:
The part of Cached which is not freeable as page cache (sigh) is counted as Shmem in /proc/meminfo.
You can also run free and look in the "shared" column.
Often this is caused by a mounted tmpfs. Check df -h -t tmpfs.
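A quick way to check all of that in one go (standard commands, nothing distribution-specific assumed):

$ grep -E '^Shmem:' /proc/meminfo
$ free -h
$ df -h -t tmpfs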
As you saw in the article you referenced, the whole set of calculations around MemAvailable is built around estimating how much memory is free to use without causing any swapping. You can see in the actual patch that implemented the MemAvailable number that, roughly,
MemAvailable = MemFree - LowWaterMark + (PageCache - min(PageCache / 2, LowWaterMark))
(the actual patch applies the same treatment to reclaimable slab as well, but the structure is the same).
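As a rough back-of-the-envelope check of that formula against your numbers (only a sketch: it approximates PageCache with the Cached line of /proc/meminfo and takes the low water mark as the sum of the per-zone "low" values in /proc/zoneinfo, assuming 4 KiB pages):

low_kb=$(awk '$1 == "low" {sum += $2} END {print sum * 4}' /proc/zoneinfo)
awk -v low="$low_kb" '
    /^MemFree:/ {free = $2}
    /^Cached:/  {cache = $2}
    END {
        half = cache / 2; if (half > low) half = low
        print "estimated MemAvailable ~", free - low + (cache - half), "kB (low watermark:", low, "kB)"
    }' /proc/meminfo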
This formula suggests that your system's MemAvailable is low because your low water mark, the amount of free memory your system thinks it needs as its working space, is likely very high. That makes sense in a swapless environment, where the system is much more concerned about running out of memory. You can look at what your current low watermark is:
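Two places to look (the exact /proc/zoneinfo layout varies a little between kernel versions, but these fields should be present):

$ cat /proc/sys/vm/min_free_kbytes
$ grep -E '^Node|^ +(min|low|high) ' /proc/zoneinfo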
I suspect in your case this is quite high.
Almost all the heuristics in Linux's memory management assume you will be operating with some swap space.