I'm trying to understand how memory requests and limits work with cgroup v2. In a Kubernetes manifest we can configure a memory request and a memory limit. Those values are then used to configure the cgroup interface (sketched in code after the list):
- memory.min is set to memory request
- memory.max is set to memory limit
- memory.high is set to memory limit * 0.8, unless memory request == limit, in which case memory.high remains unset
- memory.low is always unset
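To make the mapping concrete, here is a minimal Go sketch of what a runtime would conceptually do with these values, based purely on the list above. The cgroup path, the `applyMemoryCgroup` helper and the 0.8 factor are assumptions for illustration; the real kubelet logic is more involved.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

const highFactor = 0.8 // assumed throttling factor from the list above

// applyMemoryCgroup writes the cgroup v2 knobs for a container with the
// given memory request and limit (in bytes), following the mapping above.
func applyMemoryCgroup(cgroupDir string, request, limit int64) error {
	write := func(name, value string) error {
		return os.WriteFile(filepath.Join(cgroupDir, name), []byte(value), 0o644)
	}
	// memory.min <- request, memory.max <- limit
	if err := write("memory.min", fmt.Sprintf("%d", request)); err != nil {
		return err
	}
	if err := write("memory.max", fmt.Sprintf("%d", limit)); err != nil {
		return err
	}
	// memory.high <- limit * 0.8, but left unset when request == limit
	if request != limit {
		if err := write("memory.high", fmt.Sprintf("%d", int64(float64(limit)*highFactor))); err != nil {
			return err
		}
	}
	// memory.low is never written, i.e. it stays at its default of 0
	return nil
}

func main() {
	// Hypothetical cgroup path; on a real node this would be the pod/container cgroup.
	err := applyMemoryCgroup("/sys/fs/cgroup/kubepods.slice/demo", 256<<20, 512<<20)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
	}
}
```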
memory.max is pretty self-explanatory: when a process in the cgroup tries to allocate a page that would put memory usage over memory.max, and not enough pages can be reclaimed from the cgroup to satisfy the request within the limit, the OOM killer is invoked to terminate a process inside the cgroup. memory.high is more difficult to understand: the kernel documentation says that the cgroup is put under "high reclaim pressure" when the high watermark is reached, but what exactly does this mean?
Later on it says:
When hit, it throttles allocations by forcing them into direct reclaim to work off the excess, but it never invokes the OOM killer.
Am I correct to assume this means that when the cgroup tries to allocate a page beyond the memory.high watermark, it will synchronously walk the lruvecs and try to reclaim pages from the end of the lists until it is back under the high watermark? Or is the "reclaim pressure" something that happens asynchronously (through kswapd)?
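As a side note, the cgroup does keep counters for how often each of these limits was hit, which at least helps to tell high-watermark throttling apart from hard-limit OOM kills. A minimal Go sketch for reading memory.events; the cgroup path and helper name are made up for illustration:

```go
package main

import (
	"bufio"
	"fmt"
	"os"
	"strconv"
	"strings"
)

// readMemoryEvents parses a cgroup v2 memory.events file into a map of
// counter name -> value ("low", "high", "max", "oom", "oom_kill").
func readMemoryEvents(path string) (map[string]uint64, error) {
	f, err := os.Open(path)
	if err != nil {
		return nil, err
	}
	defer f.Close()

	events := make(map[string]uint64)
	sc := bufio.NewScanner(f)
	for sc.Scan() {
		fields := strings.Fields(sc.Text())
		if len(fields) != 2 {
			continue
		}
		n, err := strconv.ParseUint(fields[1], 10, 64)
		if err != nil {
			continue
		}
		events[fields[0]] = n
	}
	return events, sc.Err()
}

func main() {
	// Hypothetical cgroup path for illustration.
	ev, err := readMemoryEvents("/sys/fs/cgroup/kubepods.slice/demo/memory.events")
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		return
	}
	fmt.Printf("high watermark breaches: %d, max breaches: %d, OOM kills: %d\n",
		ev["high"], ev["max"], ev["oom_kill"])
}
```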
Question 2: What is even the point of using memory.high on Kubernetes? As far as I know, Kubernetes nodes typically run without swap space. The only pages that are reclaimable are anonymous pages (if there is enough swap available) and page cache. Since there is no swap, this only leaves page cache. The thing is that page cache would also be reclaimed when hitting memory.max, with the OOM killer invoked only as a last resort if nothing can be reclaimed. That would make memory.high essentially useless:
- As long as page cache is used, it can always be reclaimed, and memory.max would do so, too. With memory.high we are just throttling the application earlier than we have to. Might as well set memory.max lower in the first place.
- If no significant page cache is used (which is probably the case for the majority of applications running on Kubernetes today), then nothing can be reclaimed, so there is no throttling (no paging out of unused anonymous memory, no thrashing visible in the pressure stall information that would warn us; see the sketch after this list), and we run into memory.max none the wiser. Using memory.high has no effect.
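For what it's worth, whether throttling actually shows up can be checked via the cgroup's pressure stall information file. A minimal Go sketch that reads the "some" avg10 value from memory.pressure; the cgroup path is hypothetical:

```go
package main

import (
	"bufio"
	"fmt"
	"os"
	"strconv"
	"strings"
)

// someAvg10 returns the "some" avg10 value from a cgroup v2 memory.pressure
// file, i.e. the share of recent wall-clock time in which at least one task
// in the cgroup was stalled waiting on memory.
func someAvg10(path string) (float64, error) {
	f, err := os.Open(path)
	if err != nil {
		return 0, err
	}
	defer f.Close()

	sc := bufio.NewScanner(f)
	for sc.Scan() {
		line := sc.Text()
		if !strings.HasPrefix(line, "some ") {
			continue
		}
		for _, field := range strings.Fields(line)[1:] {
			if strings.HasPrefix(field, "avg10=") {
				return strconv.ParseFloat(strings.TrimPrefix(field, "avg10="), 64)
			}
		}
	}
	return 0, fmt.Errorf("no 'some' line found in %s", path)
}

func main() {
	// Hypothetical container cgroup path, for illustration only.
	v, err := someAvg10("/sys/fs/cgroup/kubepods.slice/demo/memory.pressure")
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		return
	}
	fmt.Printf("memory stall (some, 10s avg): %.2f%%\n", v)
}
```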
I don't think it will immediately go into direct reclaim (synchronous, as you call it) at that point, but I'm not sure. In my experience it will eventually hit direct reclaim when memory.high is stretched too far. It will certainly push up the memory pressure regardless.
Running without swap space is generally stupid and has been for a long time. Regardless, the only pages that are reclaimable are indeed mostly page cache. There are other reclaim strategies that might come into play, though.
But it's slim pickings.
In general, your observations match my experience too when there is no swap to evict anonymous pages to: MemoryHigh makes thrashing a lot worse, as you keep your page cache to an absolute minimum and end up doing a lot of IO.
We also turn it off on LXD/LXC instances, as it causes unnecessary thrashing (it's a hard limit in the code that we have to go back later to 'fix').
MemoryLow, however, can be useful as a soft reservation mechanism that tells the kernel "don't rob pages from this control group while it is below this memory range, choose another victim".
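If you manage the cgroup directly (outside Kubernetes or systemd), that soft reservation is just a write to memory.low; with systemd, MemoryLow= in the unit ends up setting the same file. A minimal sketch, with a hypothetical cgroup path and helper name:

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// reserveMemoryLow writes memory.low for a cgroup, asking the kernel to
// prefer reclaiming from other cgroups while this one stays below the
// given number of bytes.
func reserveMemoryLow(cgroupDir string, bytes int64) error {
	path := filepath.Join(cgroupDir, "memory.low")
	return os.WriteFile(path, []byte(fmt.Sprintf("%d", bytes)), 0o644)
}

func main() {
	// Hypothetical cgroup path; with systemd, MemoryLow=512M in the unit
	// would result in the same value landing in this file.
	if err := reserveMemoryLow("/sys/fs/cgroup/demo.slice", 512<<20); err != nil {
		fmt.Fprintln(os.Stderr, err)
	}
}
```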