I have a Docker container which starts a simple Java (JGroups-based) application through a bash script. The Java process is limited to a 128m heap through Xmx, and the container is allowed to use 256m (swap is disabled).
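The setup is roughly the following (the image name, script path and main class are placeholders, not my real values); setting --memory-swap equal to -m is one way to keep the container off swap:

    # assumed invocation: 256m hard limit, no swap for the container
    docker run -m 256m --memory-swap 256m my-image /opt/app/run.sh

    # inside run.sh, the JVM heap is capped at 128m
    java -Xmx128m -cp /opt/app/app.jar com.example.Main

Unfortunately, from time to time I face the following OOM messages: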
Jul 07 02:43:54 ip-10-1-2-125 kernel: oom_kill_process: 16 callbacks suppressed
Jul 07 02:43:54 ip-10-1-2-125 kernel: java invoked oom-killer: gfp_mask=0x2400040, order=0, oom_score_adj=0
Jul 07 02:43:54 ip-10-1-2-125 kernel: java cpuset=0ead341e639c2f2bd27a38666aa0834c969e8c7e6d2fb21516a2c698adce8d5f mems_allowed=0
Jul 07 02:43:54 ip-10-1-2-125 kernel: CPU: 0 PID: 26686 Comm: java Not tainted 4.4.0-28-generic #47-Ubuntu
Jul 07 02:43:54 ip-10-1-2-125 kernel: Hardware name: Xen HVM domU, BIOS 4.2.amazon 05/12/2016
Jul 07 02:43:54 ip-10-1-2-125 kernel: 0000000000000286 000000006ffe9d71 ffff8800bb3c7c88 ffffffff813eb1a3
Jul 07 02:43:54 ip-10-1-2-125 kernel: ffff8800bb3c7d68 ffff880033aea940 ffff8800bb3c7cf8 ffffffff812094fe
Jul 07 02:43:54 ip-10-1-2-125 kernel: 000000000000258c 000000000000000a ffffffff81e66760 0000000000000206
Jul 07 02:43:54 ip-10-1-2-125 kernel: Call Trace:
Jul 07 02:43:54 ip-10-1-2-125 kernel: [<ffffffff813eb1a3>] dump_stack+0x63/0x90
Jul 07 02:43:54 ip-10-1-2-125 kernel: [<ffffffff812094fe>] dump_header+0x5a/0x1c5
Jul 07 02:43:54 ip-10-1-2-125 kernel: [<ffffffff811913b2>] oom_kill_process+0x202/0x3c0
Jul 07 02:43:54 ip-10-1-2-125 kernel: [<ffffffff811fd304>] ? mem_cgroup_iter+0x204/0x390
Jul 07 02:43:54 ip-10-1-2-125 kernel: [<ffffffff811ff363>] mem_cgroup_out_of_memory+0x2b3/0x300
Jul 07 02:43:54 ip-10-1-2-125 kernel: [<ffffffff81200138>] mem_cgroup_oom_synchronize+0x338/0x350
Jul 07 02:43:54 ip-10-1-2-125 kernel: [<ffffffff811fb660>] ? kzalloc_node.constprop.48+0x20/0x20
Jul 07 02:43:54 ip-10-1-2-125 kernel: [<ffffffff81191a64>] pagefault_out_of_memory+0x44/0xc0
Jul 07 02:43:54 ip-10-1-2-125 kernel: [<ffffffff8106b2c2>] mm_fault_error+0x82/0x160
Jul 07 02:43:54 ip-10-1-2-125 kernel: [<ffffffff8106b778>] __do_page_fault+0x3d8/0x400
Jul 07 02:43:54 ip-10-1-2-125 kernel: [<ffffffff8106b7c2>] do_page_fault+0x22/0x30
Jul 07 02:43:54 ip-10-1-2-125 kernel: [<ffffffff81829838>] page_fault+0x28/0x30
Jul 07 02:43:54 ip-10-1-2-125 kernel: Task in /docker/0ead341e639c2f2bd27a38666aa0834c969e8c7e6d2fb21516a2c698adce8d5f killed as a result of limit of /docker/0ead341e639c2f2bd27a38666aa0834c96
Jul 07 02:43:54 ip-10-1-2-125 kernel: memory: usage 262144kB, limit 262144kB, failcnt 6868
Jul 07 02:43:54 ip-10-1-2-125 kernel: memory+swap: usage 0kB, limit 9007199254740988kB, failcnt 0
Jul 07 02:43:54 ip-10-1-2-125 kernel: kmem: usage 0kB, limit 9007199254740988kB, failcnt 0
Jul 07 02:43:54 ip-10-1-2-125 kernel: Memory cgroup stats for /docker/0ead341e639c2f2bd27a38666aa0834c969e8c7e6d2fb21516a2c698adce8d5f: cache:96KB rss:262048KB rss_huge:135168KB mapped_file:16
Jul 07 02:43:54 ip-10-1-2-125 kernel: [ pid ] uid tgid total_vm rss nr_ptes nr_pmds swapents oom_score_adj name
Jul 07 02:43:54 ip-10-1-2-125 kernel: [26659] 0 26659 1127 20 7 3 0 0 sh
Jul 07 02:43:54 ip-10-1-2-125 kernel: [26665] 0 26665 1127 20 7 3 0 0 run.sh
Jul 07 02:43:54 ip-10-1-2-125 kernel: [26675] 0 26675 688639 64577 204 7 0 0 java
Jul 07 02:43:54 ip-10-1-2-125 kernel: Memory cgroup out of memory: Kill process 26675 (java) score 988 or sacrifice child
Jul 07 02:43:54 ip-10-1-2-125 kernel: Killed process 26675 (java) total-vm:2754556kB, anon-rss:258308kB, file-rss:0kB
Jul 07 02:43:54 ip-10-1-2-125 docker[977]: Killed
As you can see, the RSS of my app is only about 64M, but for some reason the RSS of the cgroup is 256M (including 128M of huge pages).
Is that some kind of OS cache? If so, why doesn't the OOM killer flush it before killing user applications?
Oh! Looks like I forgot to post the answer.
The problem above is with my Java process; it's not related to Docker. I mistakenly thought that the OOM report prints RSS in kilobytes. That's wrong: the OOM report prints the number of pages, which usually take 4K each.
In my case, pid 26675 uses 64577 pages of RSS, which equals 64577 * 4K = 258,308 KB. Adding the two shell processes brings the total up to roughly the 262144kB limit of the current cgroup.
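A quick sanity check of that arithmetic, assuming the usual 4096-byte page size:

    # the rss column of the OOM report is in pages, so multiply by 4 to get KB
    echo $((64577 * 4))              # 258308 KB for the java process
    echo $(((64577 + 20 + 20) * 4))  # 258468 KB with both shells, close to the 262144 KB limit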
So, further analysis has to happen on the JVM side: heap/Metaspace analysis, native memory tracking, threads, etc.
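For example, HotSpot's native memory tracking can be enabled with a JVM flag and then queried with jcmd; the jar and class names here are placeholders:

    # start the JVM with native memory tracking in summary mode (adds a small overhead)
    java -Xmx128m -XX:NativeMemoryTracking=summary -cp app.jar com.example.Main

    # ask the running JVM for a breakdown: heap, class/metaspace, thread stacks, code cache, GC, ...
    jcmd <pid> VM.native_memory summary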
Not all Java memory is on the heap. Before Java 8 there was PermGen, most of which has moved to Metaspace. You also have a stack (possibly 1MB) for every thread, plus the code and working data of the JVM itself. It appears your container is undersized.
There are tunables for PermGen/Metaspace and for stack sizing. By default Metaspace will grow as much as required; there are demonstration programs that will grow the Metaspace to huge sizes.
Study how the JVM lays out memory before resizing your container. The JVM itself will fail with an out-of-memory error if the memory allocated to it is too small, and threads will fail if the stack size is too small.
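As a rough sketch of those tunables, the non-heap areas can be capped explicitly so the whole process fits under the container limit; the sizes and the jar/class names below are illustrative assumptions, not recommendations:

    # Illustrative sizing only:
    #   -Xss256k                       per-thread stack size
    #   -XX:MaxMetaspaceSize=64m       cap on class metadata (Metaspace)
    #   -XX:ReservedCodeCacheSize=32m  cap on JIT-compiled code
    #   -XX:MaxDirectMemorySize=16m    cap on direct (off-heap) NIO buffers
    java -Xmx128m -Xss256k \
         -XX:MaxMetaspaceSize=64m \
         -XX:ReservedCodeCacheSize=32m \
         -XX:MaxDirectMemorySize=16m \
         -cp app.jar com.example.Main

Even with all of these capped, leave some headroom between the sum of the JVM areas and the container limit, since the JVM also allocates native memory outside these pools.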