I have a problem with my Debian servers. We run four servers, all with Intel CPUs and 128 GB of RAM. Two of them run Wheezy and two run Jessie. On these systems we run Java software that uses memory heavily and can consume all of it.
For those cases I set up a swap partition on every server, placed on a RAID 1 array of two SSDs.
The problem is on the Jessie systems: when a system nearly runs out of memory, it starts swapping. This is tuned with vm.swappiness = 10, which looks OK to me. But the swapping itself is so heavy that the system hangs completely; there is so much disk I/O that it stops responding.
I ran some tests on all systems and artificially filled the RAM to 120% using:
stress --vm-bytes $(awk '/MemFree/{printf "%d\n", $2 * 1.2;}' < /proc/meminfo)k --vm-keep -m 1
The system starts swapping and freezes while the extra 20% is being swapped out. After about 20 s it comes back and is usable again, but during the freeze nothing works.
Of course this behaviour is not acceptable for a production system. What I would expect is that swapping gets high priority but never uses more than 90% of all system resources, so that the system can still be managed somehow.
Tuning swappiness to different values didn't help.
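To repeat the test at other fill levels (for example just over 100%, closer to what production sees), the awk expression in the stress command can be parameterised. A minimal sketch; the function name vm_bytes_for is made up, and the optional file argument only exists so it can be tried against a fake meminfo:

```shell
# Source this file, then call vm_bytes_for PERCENT [MEMINFO_FILE].
# Prints the --vm-bytes value for stress: PERCENT of the current MemFree,
# in KiB with a "k" suffix (same logic as MemFree * 1.2 above).
vm_bytes_for() {
    percent=$1
    meminfo=${2:-/proc/meminfo}
    awk -v p="$percent" '/^MemFree:/ { printf "%dk\n", $2 * p / 100 }' "$meminfo"
}
```

Usage: `stress --vm-bytes "$(vm_bytes_for 101)" --vm-keep -m 1`.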
We're using the following kernels:
Wheezy: Linux A 3.2.0-4-amd64 #1 SMP Debian 3.2.68-1+deb7u1 x86_64 GNU/Linux
Jessie: Linux B 3.16.0-4-amd64 #1 SMP Debian 3.16.7-ckt20-1+deb8u4 (2016-02-29) x86_64 GNU/Linux
Has anyone run into the same problem and found a solution?
Edit: Thank you all for the comments and explanations. Of course I don't want to use swap as spare memory; the 120% usage was just a test. In production, the systems use perhaps 100.0001% of memory and already stop responding. With our software running in production there is also a high rate of data change, so the system may be kept busy the whole time just swapping a very small amount of data back and forth.
We're still facing this issue with our Java applications, even on servers running the current Debian Buster release.
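One way to confirm this kind of constant back-and-forth swapping is to watch the kernel's swap counters; `vmstat 1` shows the same thing in its si/so columns. A minimal sketch reading pswpin/pswpout from /proc/vmstat (the function names are hypothetical):

```shell
# Source this file, then run watch_swap (Ctrl-C to stop).
# Print the cumulative swap-in and swap-out page counts from /proc/vmstat.
# The optional argument lets you point it at another file for testing.
swap_counters() {
    awk '/^pswpin / { i = $2 } /^pswpout / { o = $2 } END { print i, o }' "${1:-/proc/vmstat}"
}

# Sample roughly once per second and print the pages swapped in/out
# since the previous sample.
watch_swap() {
    set -- $(swap_counters)
    prev_in=$1 prev_out=$2
    while sleep 1; do
        set -- $(swap_counters)
        echo "in=$(( $1 - prev_in )) out=$(( $2 - prev_out )) pages"
        prev_in=$1 prev_out=$2
    done
}
```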
What we did to prevent it: we add to the end of
the config parameter
Swap isn't used until the system really needs it. Besides that, we make sure to configure our Java app to use only a fixed maximum amount of memory.
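For Java, the usual way to cap memory is the heap limit -Xmx. It does not cover off-heap memory (metaspace, thread stacks, direct buffers), so the whole process will use somewhat more than the heap. A sketch that derives a heap limit from MemTotal while leaving a fixed amount of headroom; the helper name, the 16 GiB headroom and app.jar are illustrative assumptions, not our actual configuration:

```shell
# Source this file, then call xmx_for HEADROOM_GIB [MEMINFO_FILE].
# Hypothetical helper: print a -Xmx option that leaves HEADROOM_GIB of RAM
# for the OS, page cache and the JVM's off-heap memory.
xmx_for() {
    headroom_gib=$1
    meminfo=${2:-/proc/meminfo}
    awk -v h="$headroom_gib" '/^MemTotal:/ {
        heap = int($2 / 1048576) - h    # MemTotal is in KiB; convert to whole GiB
        if (heap < 1) heap = 1          # never go below 1 GiB
        printf "-Xmx%dg\n", heap
    }' "$meminfo"
}

# Usage (app.jar is a placeholder):
#   java "$(xmx_for 16)" -jar app.jar
```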
There are three options you may wish to consider:
1) Tune your application's memory usage so it does not exceed the memory available in the system, and disable swap entirely. I only configure systems with swap under very unusual circumstances. If your server has more than one NUMA node, check the configuration of your biggest memory consumer for NUMA-related options. If there aren't any, use numactl to interleave the process's memory across the nodes. Google for "mysql swapping insanity" for more details about why NUMA can cause unusual swapping and OOM conditions even when plenty of memory is available.
2) Set swappiness=100. This makes the kernel swap out pages at the first sign of memory pressure. Swapping may then happen more often, but in smaller increments, which takes the edge off the system grinding to a halt for a long time.
3) Put your swap on zram with lz4 compression. It is far faster than swapping to spinning rust or even a SATA SSD (probably slower than modern NVMe, though). Make sure you set the zram size to less than the memory available after deducting any memory reserved for huge pages. For example, if you have 128 GB of RAM and 64 GB reserved as huge pages, configure zram for, say, 60 GB. zram memory is allocated and freed dynamically, and zero-filled pages (you'd be surprised how many of those there are in working memory) are discarded outright.
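A sketch of the NUMA part of option 1: count the nodes under /sys/devices/system/node and, only if there is more than one, start the program under `numactl --interleave=all`. The function names are hypothetical. Disabling swap itself is just `swapoff -a` plus removing the swap entry from /etc/fstab.

```shell
# Source this file, then e.g.: run_interleaved java -jar app.jar
# Count NUMA nodes by the node<N> directories the kernel exposes.
# The optional argument lets you point it at another directory for testing.
numa_node_count() {
    node_dir=${1:-/sys/devices/system/node}
    count=0
    for d in "$node_dir"/node[0-9]*; do
        [ -d "$d" ] && count=$((count + 1))
    done
    echo "$count"
}

# Replace the shell with the given command, interleaving its memory across
# all NUMA nodes when there is more than one and numactl is installed.
run_interleaved() {
    if [ "$(numa_node_count)" -gt 1 ] && command -v numactl >/dev/null 2>&1; then
        exec numactl --interleave=all "$@"
    fi
    exec "$@"
}
```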
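Setting it at runtime and persisting it could look like this (run as root; the file name under /etc/sysctl.d/ is an arbitrary choice):

```shell
# Apply immediately; this alone does not survive a reboot.
sysctl -w vm.swappiness=100

# Persist it: files in /etc/sysctl.d/ are read at boot.
echo 'vm.swappiness = 100' > /etc/sysctl.d/99-swappiness.conf
sysctl -p /etc/sysctl.d/99-swappiness.conf
```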
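A sketch of option 3, assuming a kernel whose zram exposes comp_algorithm (3.15 or later as far as I know, so Jessie's 3.16 but not Wheezy's 3.2). zram_size_for applies the sizing rule above (total RAM minus huge pages, minus a margin; the 4 GiB default margin is an assumption), and setup_zram_swap does the actual setup. Both names are made up.

```shell
# Source this file. zram_size_for [MARGIN_GIB] [MEMINFO_FILE] prints a size
# such as "60G": RAM not reserved for huge pages, minus a safety margin.
zram_size_for() {
    margin_gib=${1:-4}
    meminfo=${2:-/proc/meminfo}
    awk -v m="$margin_gib" '
        /^MemTotal:/        { total = $2 }      # KiB
        /^HugePages_Total:/ { hp_count = $2 }   # number of huge pages
        /^Hugepagesize:/    { hp_size = $2 }    # KiB per huge page
        END {
            avail_gib = int((total - hp_count * hp_size) / 1048576)
            size = avail_gib - m
            if (size < 1) size = 1
            printf "%dG\n", size
        }' "$meminfo"
}

# Run as root: create /dev/zram0 with lz4 and enable it as swap.
setup_zram_swap() {
    size=$1
    modprobe zram num_devices=1 || return 1
    echo lz4 > /sys/block/zram0/comp_algorithm   # must be set before disksize
    echo "$size" > /sys/block/zram0/disksize
    mkswap /dev/zram0 || return 1
    swapon -p 100 /dev/zram0                     # higher priority is used first
}
```

For the example above (128 GB RAM, 64 GB of huge pages), zram_size_for prints 60G; then run `setup_zram_swap "$(zram_size_for)"` as root. The priority of 100 makes the kernel fill zram before the SSD swap partition if both stay enabled.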