I have a Debian-based system where the OOM killer fires even though there appears to be plenty of free memory. The box runs for about 6-12 days and then begins killing anything that allocates memory (usually while allocating skbs). Eventually it kills Xorg and degenerates into a watchdog reboot; the box then runs another 6-12 days before hitting the same failure.
Here is the oom-killer log:
[521652.462829] Xorg invoked oom-killer: gfp_mask=0x400cc0(GFP_KERNEL_ACCOUNT), order=0, oom_score_adj=0
[521652.462841] CPU: 1 PID: 28603 Comm: Xorg Tainted: G C 5.4.59-v7l+ #37
[521652.462844] Hardware name: BCM2711
[521652.462847] Backtrace:
[521652.462864] [<c020dfb0>] (dump_backtrace) from [<c020e318>] (show_stack+0x20/0x24)
[521652.462869] r7:ffffffff r6:00000000 r5:60000013 r4:c129dc94
[521652.462877] [<c020e2f8>] (show_stack) from [<c0a4ca0c>] (dump_stack+0xd8/0x11c)
[521652.462887] [<c0a4c934>] (dump_stack) from [<c0381478>] (dump_header+0x64/0x200)
[521652.462892] r10:00000000 r9:00001400 r8:c12a0090 r7:c0dd8104 r6:c2a93c80 r5:eee7bd00
[521652.462896] r4:cf0e1c68 r3:7747a280
[521652.462903] [<c0381414>] (dump_header) from [<c0380834>] (oom_kill_process+0x178/0x184)
[521652.462907] r7:c0dd8104 r6:cf0e1c68 r5:eee7c250 r4:eee7bd00
[521652.462914] [<c03806bc>] (oom_kill_process) from [<c03812c8>] (out_of_memory+0x27c/0x33c)
[521652.462918] r7:c1208380 r6:cf0e1c68 r5:c1204f88 r4:eee7bd00
[521652.462927] [<c038104c>] (out_of_memory) from [<c03cef78>] (__alloc_pages_nodemask+0xc18/0x1288)
[521652.462931] r7:00000000 r6:c120508c r5:0000fe2b r4:00000000
[521652.462941] [<c03ce360>] (__alloc_pages_nodemask) from [<c08eebb8>] (alloc_skb_with_frags+0xdc/0x1a4)
[521652.462945] r10:c551f840 r9:00008000 r8:004008c0 r7:00000000 r6:00000008 r5:00000008
[521652.462948] r4:00000003
[521652.462955] [<c08eeadc>] (alloc_skb_with_frags) from [<c08e6518>] (sock_alloc_send_pskb+0x214/0x248)
[521652.462960] r10:cf0e1d7c r9:c1204f88 r8:c026d33c r7:cf0e1dcc r6:ffffe000 r5:00000000
[521652.462964] r4:d799ea00
[521652.462973] [<c08e6304>] (sock_alloc_send_pskb) from [<c0a02e40>] (unix_stream_sendmsg+0x144/0x3a0)
[521652.462978] r10:d799ea00 r9:d799e700 r8:00000f00 r7:cf0e1dcc r6:c551fb40 r5:00008f00
[521652.462981] r4:00008000
[521652.462987] [<c0a02cfc>] (unix_stream_sendmsg) from [<c08e1690>] (sock_write_iter+0xb0/0x114)
[521652.462991] r10:eee3c900 r9:d4722d00 r8:cf0e1e38 r7:00000000 r6:00000000 r5:c1204f88
[521652.462994] r4:cf0e1ed4
[521652.463002] [<c08e15e0>] (sock_write_iter) from [<c03f9734>] (do_iter_readv_writev+0x168/0x1d4)
[521652.463006] r10:00000000 r9:cf0e1f60 r8:c1204f88 r7:00000000 r6:eee3c900 r5:00000000
[521652.463009] r4:00000000
[521652.463016] [<c03f95cc>] (do_iter_readv_writev) from [<c03faa4c>] (do_iter_write+0x94/0x1a4)
[521652.463020] r10:00000001 r9:bed31b84 r8:cf0e1f60 r7:00000000 r6:eee3c900 r5:cf0e1ed4
[521652.463023] r4:00000000
[521652.463030] [<c03fa9b8>] (do_iter_write) from [<c03fac2c>] (vfs_writev+0x9c/0xe8)
[521652.463034] r9:bed31b84 r8:cf0e1f60 r7:eee3c900 r6:cf0e1ed4 r5:0001a780 r4:c1204f88
[521652.463041] [<c03fab90>] (vfs_writev) from [<c03face8>] (do_writev+0x70/0x144)
[521652.463045] r8:eee3c900 r7:00000092 r6:00000000 r5:eee3c901 r4:c1204f88
[521652.463052] [<c03fac78>] (do_writev) from [<c03fc4d4>] (sys_writev+0x1c/0x20)
[521652.463056] r10:00000092 r9:cf0e0000 r8:c02011c4 r6:0000002e r5:bed31b84 r4:00000001
[521652.463064] [<c03fc4b8>] (sys_writev) from [<c0201000>] (ret_fast_syscall+0x0/0x28)
[521652.463068] Exception stack(0xcf0e1fa8 to 0xcf0e1ff0)
[521652.463072] 1fa0: 00000001 bed31b84 0000002e bed31b84 00000001 00000000
[521652.463077] 1fc0: 00000001 bed31b84 0000002e 00000092 00000001 00000000 b6fcf6f4 00000000
[521652.463081] 1fe0: 00000002 bed318f8 00000000 b69f1654
[521652.463085] Mem-Info:
[521652.463097] active_anon:27936 inactive_anon:40717 isolated_anon:0
active_file:9032 inactive_file:15510 isolated_file:0
unevictable:22734 dirty:0 writeback:0 unstable:0
slab_reclaimable:3248 slab_unreclaimable:7041
mapped:27349 shmem:61689 pagetables:1321 bounce:0
free:253912 free_pcp:34 free_cma:60386
[521652.463106] Node 0 active_anon:111744kB inactive_anon:162868kB active_file:36128kB inactive_file:62040kB unevictable:90936kB isolated(anon):0kB isolated(file):0kB mapped:109396kB dirty:0kB writeback:0kB shmem:246756kB writeback_tmp:0kB unstable:0kB all_unreclaimable? no
[521652.463117] DMA free:258884kB min:20480kB low:24576kB high:28672kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:404kB unevictable:0kB writepending:0kB present:786432kB managed:679084kB mlocked:0kB kernel_stack:2280kB pagetables:0kB bounce:0kB free_pcp:136kB local_pcp:0kB free_cma:241544kB
[521652.463122] lowmem_reserve[]: 0 0 1204 1204
[521652.463139] HighMem free:756764kB min:512kB low:7948kB high:15384kB active_anon:111480kB inactive_anon:162868kB active_file:35840kB inactive_file:61440kB unevictable:90820kB writepending:0kB present:1232896kB managed:1232896kB mlocked:44kB kernel_stack:0kB pagetables:5284kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
[521652.463143] lowmem_reserve[]: 0 0 0 0
[521652.463155] DMA: 356*4kB (UE) 786*8kB (UEC) 683*16kB (UEC) 28*32kB (UEC) 19*64kB (UEC) 8*128kB (C) 5*256kB (C) 2*512kB (C) 6*1024kB (C) 2*2048kB (C) 55*4096kB (C) = 259600kB
[521652.463197] HighMem: 322*4kB (UM) 3624*8kB (UM) 3968*16kB (UM) 2072*32kB (UM) 2012*64kB (UM) 1118*128kB (UM) 376*256kB (UM) 132*512kB (M) 40*1024kB (M) 19*2048kB (UM) 20*4096kB (UM) = 757576kB
[521652.463238] 86322 total pagecache pages
[521652.463246] 0 pages in swap cache
[521652.463252] Swap cache stats: add 36, delete 36, find 10/14
[521652.463257] Free swap = 2121980kB
[521652.463262] Total swap = 2122748kB
[521652.463267] 504832 pages RAM
[521652.463272] 308224 pages HighMem/MovableOnly
[521652.463277] 26837 pages reserved
[521652.463281] 65536 pages cma reserved
Memory does not appear to be fragmented, as the buddy lists from the report show:
[521652.463155] DMA: 356*4kB (UE) 786*8kB (UEC) 683*16kB (UEC) 28*32kB (UEC) 19*64kB (UEC) 8*128kB (C) 5*256kB (C) 2*512kB (C) 6*1024kB (C) 2*2048kB (C) 55*4096kB (C) = 259600kB
[521652.463197] HighMem: 322*4kB (UM) 3624*8kB (UM) 3968*16kB (UM) 2072*32kB (UM) 2012*64kB (UM) 1118*128kB (UM) 376*256kB (UM) 132*512kB (M) 40*1024kB (M) 19*2048kB (UM) 20*4096kB (UM) = 757576kB
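To cross-check the totals on these buddy-list lines, a small sketch like the following sums the free blocks and groups them by the migrate-type letters in parentheses. My understanding is that U, M, E and C stand for unmovable, movable, reclaimable and CMA; treat that mapping as an assumption. The function name is made up for illustration.

```python
import re
from collections import defaultdict

# DMA buddy-list line copied from the OOM report above.
DMA_LINE = ("DMA: 356*4kB (UE) 786*8kB (UEC) 683*16kB (UEC) 28*32kB (UEC) "
            "19*64kB (UEC) 8*128kB (C) 5*256kB (C) 2*512kB (C) 6*1024kB (C) "
            "2*2048kB (C) 55*4096kB (C) = 259600kB")

# Matches entries such as "356*4kB (UE)"; the trailing "= 259600kB" has no '*'.
BLOCK_RE = re.compile(r"(\d+)\*(\d+)kB \(([A-Z]+)\)")

def summarize_buddy_line(line):
    """Return (total free kB, {type letters: free kB}) for one buddy-list line."""
    total_kb = 0
    by_tag = defaultdict(int)
    for count, size_kb, tag in BLOCK_RE.findall(line):
        kb = int(count) * int(size_kb)
        total_kb += kb
        by_tag[tag] += kb
    return total_kb, dict(by_tag)

total_kb, by_tag = summarize_buddy_line(DMA_LINE)
print("total free kB:", total_kb)
for tag, kb in sorted(by_tag.items(), key=lambda item: -item[1]):
    print(f"  ({tag}): {kb} kB")
```

The same function can be run on the HighMem line to see how the free blocks there are spread across orders and migrate types.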
Here is the gfp_mask decoded:
Xorg invoked oom-killer: gfp_mask=0x400cc0(GFP_KERNEL_ACCOUNT), order=0, oom_score_adj=0
order=0 means a single 4 kB page is being requested
0x400cc0
4: ___GFP_ACCOUNT (the extra bit that turns GFP_KERNEL into GFP_KERNEL_ACCOUNT)
C: ___GFP_KSWAPD_RECLAIM, ___GFP_DIRECT_RECLAIM
C: ___GFP_IO, ___GFP_FS
0: no zone modifier, i.e. a ZONE_NORMAL allocation
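A hand decode like this can be double-checked with a short script. The bit values below are my reading of include/linux/gfp.h for a 5.4 kernel; they move around between kernel versions, so treat the table as an assumption and compare it against the source tree of the kernel actually running. The kernel's own label in the log, GFP_KERNEL_ACCOUNT, is GFP_KERNEL plus the accounting flag, which is a useful sanity check on the result.

```python
# Assumed ___GFP_* bit values for a 5.4 kernel (verify against your own tree).
GFP_BITS = {
    0x01: "___GFP_DMA",
    0x02: "___GFP_HIGHMEM",
    0x04: "___GFP_DMA32",
    0x08: "___GFP_MOVABLE",
    0x10: "___GFP_RECLAIMABLE",
    0x20: "___GFP_HIGH",
    0x40: "___GFP_IO",
    0x80: "___GFP_FS",
    0x100: "___GFP_ZERO",
    0x200: "___GFP_ATOMIC",
    0x400: "___GFP_DIRECT_RECLAIM",
    0x800: "___GFP_KSWAPD_RECLAIM",
    0x1000: "___GFP_WRITE",
    0x2000: "___GFP_NOWARN",
    0x4000: "___GFP_RETRY_MAYFAIL",
    0x8000: "___GFP_NOFAIL",
    0x10000: "___GFP_NORETRY",
    0x20000: "___GFP_MEMALLOC",
    0x40000: "___GFP_COMP",
    0x80000: "___GFP_NOMEMALLOC",
    0x100000: "___GFP_HARDWALL",
    0x200000: "___GFP_THISNODE",
    0x400000: "___GFP_ACCOUNT",
}

def decode_gfp(mask):
    """Return the names of all known flag bits set in mask, lowest bit first."""
    names = []
    remaining = mask
    for bit in sorted(GFP_BITS):
        if mask & bit:
            names.append(GFP_BITS[bit])
            remaining &= ~bit
    if remaining:
        # Bits not present in the table are reported rather than dropped.
        names.append(f"unknown:{remaining:#x}")
    return names

print(decode_gfp(0x400cc0))
```

None of the four zone bits in the lowest nibble are set in 0x400cc0, which matches the "no zone modifier" reading.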
Here is the kernel's /proc/slabinfo. slabtop shows no kernel allocations growing over time, so I don't think this is a kernel leak:
slabinfo - version: 2.1
# name <active_objs> <num_objs> <objsize> <objperslab> <pagesperslab> : tunables <limit> <batchcount> <sharedfactor> : slabdata <active_slabs> <num_slabs> <sharedavail>
fuse_request 156 156 104 39 1 : tunables 0 0 0 : slabdata 4 4 0
fuse_inode 112 112 576 28 4 : tunables 0 0 0 : slabdata 4 4 0
PINGv6 0 0 896 18 4 : tunables 0 0 0 : slabdata 0 0 0
RAWv6 36 36 896 18 4 : tunables 0 0 0 : slabdata 2 2 0
UDPv6 68 68 960 17 4 : tunables 0 0 0 : slabdata 4 4 0
tw_sock_TCPv6 0 0 192 21 1 : tunables 0 0 0 : slabdata 0 0 0
request_sock_TCPv6 0 0 240 17 1 : tunables 0 0 0 : slabdata 0 0 0
TCPv6 53 53 1920 17 8 : tunables 0 0 0 : slabdata 4 4 0
ext4_groupinfo_4k 72 72 112 36 1 : tunables 0 0 0 : slabdata 2 2 0
ovl_inode 722 1156 456 17 2 : tunables 0 0 0 : slabdata 68 68 0
mqueue_inode_cache 25 25 640 25 4 : tunables 0 0 0 : slabdata 1 1 0
discard_entry 0 0 80 51 1 : tunables 0 0 0 : slabdata 0 0 0
nat_entry 0 0 24 170 1 : tunables 0 0 0 : slabdata 0 0 0
f2fs_inode_cache 0 0 752 21 4 : tunables 0 0 0 : slabdata 0 0 0
nfs_direct_cache 0 0 136 30 1 : tunables 0 0 0 : slabdata 0 0 0
nfs_inode_cache 0 0 736 22 4 : tunables 0 0 0 : slabdata 0 0 0
fat_inode_cache 64 64 496 16 2 : tunables 0 0 0 : slabdata 4 4 0
fat_cache 680 680 24 170 1 : tunables 0 0 0 : slabdata 4 4 0
squashfs_inode_cache 418 992 512 16 2 : tunables 0 0 0 : slabdata 62 62 0
jbd2_inode 408 408 40 102 1 : tunables 0 0 0 : slabdata 4 4 0
jbd2_journal_head 256 256 64 64 1 : tunables 0 0 0 : slabdata 4 4 0
ext4_inode_cache 264 264 744 22 4 : tunables 0 0 0 : slabdata 12 12 0
ext4_allocation_context 156 156 104 39 1 : tunables 0 0 0 : slabdata 4 4 0
ext4_prealloc_space 224 224 72 56 1 : tunables 0 0 0 : slabdata 4 4 0
ext4_io_end 340 340 48 85 1 : tunables 0 0 0 : slabdata 4 4 0
ext4_pending_reservation 256 256 16 256 1 : tunables 0 0 0 : slabdata 1 1 0
ext4_extent_status 512 512 32 128 1 : tunables 0 0 0 : slabdata 4 4 0
mbcache 408 408 40 102 1 : tunables 0 0 0 : slabdata 4 4 0
kioctx 18 18 448 18 2 : tunables 0 0 0 : slabdata 1 1 0
pid_namespace 0 0 120 34 1 : tunables 0 0 0 : slabdata 0 0 0
posix_timers_cache 88 88 184 22 1 : tunables 0 0 0 : slabdata 4 4 0
rpc_inode_cache 18 18 448 18 2 : tunables 0 0 0 : slabdata 1 1 0
rpc_buffers 16 16 2048 16 8 : tunables 0 0 0 : slabdata 1 1 0
ip4-frags 0 0 136 30 1 : tunables 0 0 0 : slabdata 0 0 0
xfrm_state 56 56 576 28 4 : tunables 0 0 0 : slabdata 2 2 0
PING 672 672 768 21 4 : tunables 0 0 0 : slabdata 32 32 0
RAW 63 63 768 21 4 : tunables 0 0 0 : slabdata 3 3 0
UDP 114 114 832 19 4 : tunables 0 0 0 : slabdata 6 6 0
tw_sock_TCP 84 84 192 21 1 : tunables 0 0 0 : slabdata 4 4 0
request_sock_TCP 68 68 240 17 1 : tunables 0 0 0 : slabdata 4 4 0
TCP 108 108 1792 18 8 : tunables 0 0 0 : slabdata 6 6 0
cachefiles_object_jar 0 0 256 16 1 : tunables 0 0 0 : slabdata 0 0 0
fscache_cookie_jar 42 42 96 42 1 : tunables 0 0 0 : slabdata 1 1 0
dquot 84 84 192 21 1 : tunables 0 0 0 : slabdata 4 4 0
eventpoll_pwq 510 510 40 102 1 : tunables 0 0 0 : slabdata 5 5 0
inotify_inode_mark 510 510 48 85 1 : tunables 0 0 0 : slabdata 6 6 0
scsi_data_buffer 0 0 16 256 1 : tunables 0 0 0 : slabdata 0 0 0
request_queue 46 46 1408 23 8 : tunables 0 0 0 : slabdata 2 2 0
blkdev_ioc 256 256 64 64 1 : tunables 0 0 0 : slabdata 4 4 0
biovec-max 100 100 3072 10 8 : tunables 0 0 0 : slabdata 10 10 0
biovec-128 42 42 1536 21 8 : tunables 0 0 0 : slabdata 2 2 0
biovec-64 84 84 768 21 4 : tunables 0 0 0 : slabdata 4 4 0
user_namespace 0 0 376 21 2 : tunables 0 0 0 : slabdata 0 0 0
sock_inode_cache 572 968 640 25 4 : tunables 0 0 0 : slabdata 41 41 0
skbuff_fclone_cache 336 336 384 21 2 : tunables 0 0 0 : slabdata 16 16 0
skbuff_head_cache 21690 21861 192 21 1 : tunables 0 0 0 : slabdata 1041 1041 0
configfs_dir_cache 73 73 56 73 1 : tunables 0 0 0 : slabdata 1 1 0
file_lock_cache 638 864 128 32 1 : tunables 0 0 0 : slabdata 27 27 0
fsnotify_mark_connector 680 680 24 170 1 : tunables 0 0 0 : slabdata 4 4 0
net_namespace 9 9 3456 9 8 : tunables 0 0 0 : slabdata 1 1 0
task_delay_info 3264 3264 80 51 1 : tunables 0 0 0 : slabdata 64 64 0
taskstats 92 92 344 23 2 : tunables 0 0 0 : slabdata 4 4 0
proc_dir_entry 672 672 128 32 1 : tunables 0 0 0 : slabdata 21 21 0
pde_opener 680 680 24 170 1 : tunables 0 0 0 : slabdata 4 4 0
proc_inode_cache 198 360 440 18 2 : tunables 0 0 0 : slabdata 20 20 0
seq_file 184 184 88 46 1 : tunables 0 0 0 : slabdata 4 4 0
bdev_cache 112 112 576 28 4 : tunables 0 0 0 : slabdata 4 4 0
shmem_inode_cache 2259 2431 456 17 2 : tunables 0 0 0 : slabdata 143 143 0
kernfs_iattrs_cache 1512 1512 72 56 1 : tunables 0 0 0 : slabdata 27 27 0
kernfs_node_cache 26334 26334 96 42 1 : tunables 0 0 0 : slabdata 627 627 0
filp 3347 4788 192 21 1 : tunables 0 0 0 : slabdata 228 228 0
inode_cache 9885 10460 400 20 2 : tunables 0 0 0 : slabdata 523 523 0
dentry 14017 26190 136 30 1 : tunables 0 0 0 : slabdata 873 873 0
names_cache 40 40 4096 8 8 : tunables 0 0 0 : slabdata 5 5 0
key_jar 1235 1365 192 21 1 : tunables 0 0 0 : slabdata 65 65 0
buffer_head 1783 2176 64 64 1 : tunables 0 0 0 : slabdata 34 34 0
uts_namespace 0 0 416 19 2 : tunables 0 0 0 : slabdata 0 0 0
vm_area_struct 9393 10842 104 39 1 : tunables 0 0 0 : slabdata 278 278 0
mm_struct 368 368 512 16 2 : tunables 0 0 0 : slabdata 23 23 0
files_cache 384 384 256 16 1 : tunables 0 0 0 : slabdata 24 24 0
signal_cache 575 575 704 23 4 : tunables 0 0 0 : slabdata 25 25 0
sighand_cache 423 480 1344 24 8 : tunables 0 0 0 : slabdata 20 20 0
task_struct 296 360 3904 8 8 : tunables 0 0 0 : slabdata 45 45 0
cred_jar 1837 1952 128 32 1 : tunables 0 0 0 : slabdata 61 61 0
anon_vma_chain 9461 10880 32 128 1 : tunables 0 0 0 : slabdata 85 85 0
anon_vma 5761 6351 56 73 1 : tunables 0 0 0 : slabdata 87 87 0
pid 2432 2432 64 64 1 : tunables 0 0 0 : slabdata 38 38 0
trace_event_file 1445 1445 48 85 1 : tunables 0 0 0 : slabdata 17 17 0
radix_tree_node 3240 6708 304 26 2 : tunables 0 0 0 : slabdata 258 258 0
task_group 192 192 256 16 1 : tunables 0 0 0 : slabdata 12 12 0
vmap_area 8064 8064 32 128 1 : tunables 0 0 0 : slabdata 63 63 0
dma-kmalloc-8k 0 0 8192 4 8 : tunables 0 0 0 : slabdata 0 0 0
dma-kmalloc-4k 0 0 4096 8 8 : tunables 0 0 0 : slabdata 0 0 0
dma-kmalloc-2k 0 0 2048 16 8 : tunables 0 0 0 : slabdata 0 0 0
dma-kmalloc-1k 0 0 1024 16 4 : tunables 0 0 0 : slabdata 0 0 0
dma-kmalloc-512 0 0 512 16 2 : tunables 0 0 0 : slabdata 0 0 0
dma-kmalloc-256 0 0 256 16 1 : tunables 0 0 0 : slabdata 0 0 0
dma-kmalloc-128 0 0 128 32 1 : tunables 0 0 0 : slabdata 0 0 0
dma-kmalloc-64 256 256 64 64 1 : tunables 0 0 0 : slabdata 4 4 0
dma-kmalloc-192 0 0 192 21 1 : tunables 0 0 0 : slabdata 0 0 0
kmalloc-rcl-8k 0 0 8192 4 8 : tunables 0 0 0 : slabdata 0 0 0
kmalloc-rcl-4k 0 0 4096 8 8 : tunables 0 0 0 : slabdata 0 0 0
kmalloc-rcl-2k 0 0 2048 16 8 : tunables 0 0 0 : slabdata 0 0 0
kmalloc-rcl-1k 0 0 1024 16 4 : tunables 0 0 0 : slabdata 0 0 0
kmalloc-rcl-512 0 0 512 16 2 : tunables 0 0 0 : slabdata 0 0 0
kmalloc-rcl-256 0 0 256 16 1 : tunables 0 0 0 : slabdata 0 0 0
kmalloc-rcl-192 0 0 192 21 1 : tunables 0 0 0 : slabdata 0 0 0
kmalloc-rcl-128 2496 2496 128 32 1 : tunables 0 0 0 : slabdata 78 78 0
kmalloc-rcl-64 2304 2304 64 64 1 : tunables 0 0 0 : slabdata 36 36 0
kmalloc-8k 64 68 8192 4 8 : tunables 0 0 0 : slabdata 17 17 0
kmalloc-4k 588 612 4096 8 8 : tunables 0 0 0 : slabdata 80 80 0
kmalloc-2k 272 272 2048 16 8 : tunables 0 0 0 : slabdata 17 17 0
kmalloc-1k 1318 1392 1024 16 4 : tunables 0 0 0 : slabdata 87 87 0
kmalloc-512 2130 2200 512 16 2 : tunables 0 0 0 : slabdata 138 138 0
kmalloc-256 660 752 256 16 1 : tunables 0 0 0 : slabdata 47 47 0
kmalloc-192 1701 1701 192 21 1 : tunables 0 0 0 : slabdata 81 81 0
kmalloc-128 2948 3392 128 32 1 : tunables 0 0 0 : slabdata 106 106 0
kmalloc-64 25371 28928 64 64 1 : tunables 0 0 0 : slabdata 452 452 0
kmem_cache_node 256 256 64 64 1 : tunables 0 0 0 : slabdata 4 4 0
kmem_cache 144 144 256 16 1 : tunables 0 0 0 : slabdata 9 9 0
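To put numbers on the slab dump, the following sketch estimates how much memory each cache occupies as num_slabs × pagesperslab × page size and lists the largest consumers. It assumes the slabinfo 2.1 column layout shown in the header line and 4 kB pages; the function name and the two-line sample are only for illustration.

```python
PAGE_KB = 4  # assumption: 4 kB pages, as on this 32-bit ARM kernel

def slab_usage_kb(slabinfo_text):
    """Map cache name -> approximate kB held, from /proc/slabinfo text."""
    usage = {}
    for line in slabinfo_text.splitlines():
        # Skip the version line, the column header, and anything malformed.
        if line.startswith(("slabinfo", "#")) or "slabdata" not in line:
            continue
        fields = line.split()
        name = fields[0]
        pages_per_slab = int(fields[5])
        # Layout after the marker: slabdata <active_slabs> <num_slabs> <sharedavail>
        num_slabs = int(fields[fields.index("slabdata") + 2])
        usage[name] = num_slabs * pages_per_slab * PAGE_KB
    return usage

# Two lines taken from the dump above, used as a small sample.
SAMPLE = """slabinfo - version: 2.1
skbuff_head_cache 21690 21861 192 21 1 : tunables 0 0 0 : slabdata 1041 1041 0
kmalloc-4k 588 612 4096 8 8 : tunables 0 0 0 : slabdata 80 80 0
"""

usage = slab_usage_kb(SAMPLE)
for name, kb in sorted(usage.items(), key=lambda item: -item[1]):
    print(f"{name:24s} {kb:8d} kB")
```

On the live system the text would come from reading /proc/slabinfo (which usually requires root). Comparing two snapshots taken days apart shows more directly whether any cache is growing.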
Can anyone help me work out which resource is running out? I haven't even been able to determine why the OOM killer is running. Any help is greatly appreciated.
Edit:
In response to comments:
The memory cgroup is not enabled, so this shouldn't be a cgroup issue:
cat /proc/cgroups
#subsys_name hierarchy num_cgroups enabled
cpuset 4 1 1
cpu 2 1 1
cpuacct 2 1 1
blkio 7 1 1
memory 0 57 0
devices 5 39 1
freezer 6 1 1
net_cls 3 1 1
pids 8 46 1
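For anyone who wants to script this check, here is a minimal sketch that reads the four-column /proc/cgroups format shown above and reports whether each controller is enabled. The function name is hypothetical.

```python
def enabled_controllers(cgroups_text):
    """Map controller name -> True/False from /proc/cgroups text."""
    result = {}
    for line in cgroups_text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and the column header
        name, _hierarchy, _num_cgroups, enabled = line.split()
        result[name] = enabled == "1"
    return result

# A few lines copied from the output above.
SAMPLE = """#subsys_name hierarchy num_cgroups enabled
cpuset 4 1 1
memory 0 57 0
pids 8 46 1
"""

status = enabled_controllers(SAMPLE)
print("memory controller enabled:", status["memory"])
```

On a live box the input would be the contents of /proc/cgroups, for example `open("/proc/cgroups").read()`.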
Here is /proc/meminfo:
cat /proc/meminfo
MemTotal: 1911980 kB
MemFree: 734172 kB
MemAvailable: 890688 kB
Buffers: 16664 kB
Cached: 479784 kB
SwapCached: 0 kB
Active: 395800 kB
Inactive: 273448 kB
Active(anon): 336436 kB
Inactive(anon): 111840 kB
Active(file): 59364 kB
Inactive(file): 161608 kB
Unevictable: 125944 kB
Mlocked: 7552 kB
HighTotal: 1232896 kB
HighFree: 446848 kB
LowTotal: 679084 kB
LowFree: 287324 kB
SwapTotal: 524284 kB
SwapFree: 524284 kB
Dirty: 0 kB
Writeback: 0 kB
AnonPages: 298848 kB
Mapped: 218932 kB
Shmem: 272964 kB
KReclaimable: 17748 kB
Slab: 43448 kB
SReclaimable: 17748 kB
SUnreclaim: 25700 kB
KernelStack: 2664 kB
PageTables: 5140 kB
NFS_Unstable: 0 kB
Bounce: 0 kB
WritebackTmp: 0 kB
CommitLimit: 1480272 kB
Committed_AS: 1766576 kB
VmallocTotal: 245760 kB
VmallocUsed: 6368 kB
VmallocChunk: 0 kB
Percpu: 512 kB
CmaTotal: 262144 kB
CmaFree: 240632 kB
I've already tried setting vm.overcommit_memory=2 and vm.overcommit_ratio=2 on another box to get a CommitLimit of 4GB, which is larger than my VM working set, to rule out a problem with the heuristic overcommit algorithm. It still crashed in the same manner.
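For reference, the CommitLimit value in /proc/meminfo can be reproduced from MemTotal, SwapTotal and vm.overcommit_ratio. The sketch below follows the formula as I understand it from the kernel's overcommit documentation (RAM minus huge pages, times the ratio, plus swap), computed in whole pages. It assumes 4 kB pages, no huge pages, and that vm.overcommit_kbytes is not set; the function name is made up.

```python
PAGE_KB = 4  # assumption: 4 kB pages

def commit_limit_kb(mem_total_kb, swap_total_kb, overcommit_ratio, hugetlb_kb=0):
    """Approximate CommitLimit in kB for a given vm.overcommit_ratio (percent)."""
    ram_pages = (mem_total_kb - hugetlb_kb) // PAGE_KB
    swap_pages = swap_total_kb // PAGE_KB
    # Integer arithmetic in pages, mirroring how the kernel rounds.
    return (ram_pages * overcommit_ratio // 100 + swap_pages) * PAGE_KB

# MemTotal and SwapTotal from the /proc/meminfo output above, default ratio of 50.
print(commit_limit_kb(1911980, 524284, 50))
```

With the values from the meminfo dump and the default ratio of 50, this gives 1480272 kB, which matches the CommitLimit line shown there. Plugging in another box's MemTotal and SwapTotal shows what limit a given ratio actually produces on that box.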