Ubuntu 16.04 x86-64, kernel 4.4.0, 8 CPUs, 31 GB memory. ZFS is the main filesystem and a CIFS share is mounted.
# sudo numactl --hardware
available: 1 nodes (0)
node 0 cpus: 0 1 2 3 4 5 6 7
node 0 size: 32157 MB
node 0 free: 2301 MB
node distances:
node 0
0: 10
cat /proc/meminfo | grep -i huge
AnonHugePages: 13080576 kB
HugePages_Total: 0
HugePages_Free: 0
HugePages_Rsvd: 0
HugePages_Surp: 0
Hugepagesize: 2048 kB
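For reading the output above: `AnonHugePages` counts anonymous memory currently backed by transparent huge pages (THP), while `HugePages_Total` counts statically reserved hugetlbfs pages. A minimal Python sketch that parses `/proc/meminfo`-style lines (the sample is copied from the output above; `parse_meminfo` is a hypothetical helper, not part of any tool) suggests roughly 12.5 GiB is in THP and no hugetlbfs pages are reserved:

```python
# Parse /proc/meminfo-style "Key: value [kB]" lines into a dict of ints.
def parse_meminfo(text):
    info = {}
    for line in text.splitlines():
        if ":" not in line:
            continue
        key, rest = line.split(":", 1)
        fields = rest.split()
        if fields:
            # First field is either a size in kB or a plain page count.
            info[key.strip()] = int(fields[0])
    return info


# Sample copied from the `grep -i huge` output above.
SAMPLE = """AnonHugePages: 13080576 kB
HugePages_Total: 0
HugePages_Free: 0
HugePages_Rsvd: 0
HugePages_Surp: 0
Hugepagesize: 2048 kB"""

info = parse_meminfo(SAMPLE)
thp_gib = info["AnonHugePages"] / (1024 * 1024)  # kB -> GiB
print(f"Transparent huge pages in use: {thp_gib:.2f} GiB")
print(f"Reserved hugetlbfs pages: {info['HugePages_Total']}")
```

On the live system, `parse_meminfo(open("/proc/meminfo").read())` would give the current values instead of the pasted sample.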
My server freezes randomly with the following messages in syslog (for the full log, see pastebin). I read an article that explains this kind of error and possible resolutions here.
Jan 15 02:35:01 centrallogserver CRON[55892]: (root) CMD (command -v debian-sa1 > /dev/null && debian-sa1 1 1)
Jan 15 02:36:49 centrallogserver kernel: [120146.673901] java: page allocation failure: order:4, mode:0x240c0c0
Jan 15 02:36:49 centrallogserver kernel: [120146.673908] CPU: 7 PID: 52372 Comm: java Tainted: P O 4.4.0-112-generic #135-Ubuntu
Jan 15 02:36:49 centrallogserver kernel: [120146.673911] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 09/19/2018
Jan 15 02:36:49 centrallogserver kernel: [120146.673915] 0000000000000286 d4f0e41d54eb99fa ffff88038a7cb968 ffffffff813fc233
Jan 15 02:36:49 centrallogserver kernel: [120146.673920] 000000000240c0c0 0000000000000000 ffff88038a7cb9f8 ffffffff8119696a
Jan 15 02:36:49 centrallogserver kernel: [120146.673924] d4f0e41d00000004 0000000000000004 0000000000000040 ffff880284f12a00
Jan 15 02:36:49 centrallogserver kernel: [120146.673929] Call Trace:
Jan 15 02:36:49 centrallogserver kernel: [120146.673938] [<ffffffff813fc233>] dump_stack+0x63/0x90
Jan 15 02:36:49 centrallogserver kernel: [120146.673945] [<ffffffff8119696a>] warn_alloc_failed+0xfa/0x150
Jan 15 02:36:49 centrallogserver kernel: [120146.673952] [<ffffffff8119a14f>] ? __alloc_pages_direct_compact+0x10f/0x130
Jan 15 02:36:49 centrallogserver kernel: [120146.673959] [<ffffffff8119a5fd>] __alloc_pages_slowpath.constprop.88+0x48d/0xb00
Jan 15 02:36:49 centrallogserver kernel: [120146.673966] [<ffffffff8119aef6>] __alloc_pages_nodemask+0x286/0x2a0
Jan 15 02:36:49 centrallogserver kernel: [120146.673975] [<ffffffff811e483c>] alloc_pages_current+0x8c/0x110
Jan 15 02:36:49 centrallogserver kernel: [120146.673980] [<ffffffff81198ac9>] alloc_kmem_pages+0x19/0x90
Jan 15 02:36:49 centrallogserver kernel: [120146.673986] [<ffffffff811b63ce>] kmalloc_order_trace+0x2e/0xe0
Jan 15 02:36:49 centrallogserver kernel: [120146.673993] [<ffffffff811f10ce>] __kmalloc+0x22e/0x250
Jan 15 02:36:49 centrallogserver kernel: [120146.674053] [<ffffffffc08e5c51>] smb2_unlock_range+0xa1/0x340 [cifs]
Jan 15 02:36:49 centrallogserver kernel: [120146.674094] [<ffffffffc08daef1>] ? smb2_add_credits+0xb1/0x250 [cifs]
Jan 15 02:36:49 centrallogserver kernel: [120146.674137] [<ffffffffc08bd600>] cifs_lock+0xc00/0x12a0 [cifs]
Jan 15 02:36:49 centrallogserver kernel: [120146.674142] [<ffffffff811f048b>] ? __slab_free+0xcb/0x2c0
Jan 15 02:36:49 centrallogserver kernel: [120146.674147] [<ffffffff811f048b>] ? __slab_free+0xcb/0x2c0
Jan 15 02:36:49 centrallogserver kernel: [120146.674154] [<ffffffff8139677e>] ? common_file_perm+0x6e/0x1a0
Jan 15 02:36:49 centrallogserver kernel: [120146.674160] [<ffffffff81266c6e>] vfs_lock_file+0x1e/0x40
Jan 15 02:36:49 centrallogserver kernel: [120146.674164] [<ffffffff81266f6b>] do_lock_file_wait+0x5b/0x100
Jan 15 02:36:49 centrallogserver kernel: [120146.674170] [<ffffffff811efc8a>] ? kmem_cache_alloc+0x1ca/0x1f0
Jan 15 02:36:49 centrallogserver kernel: [120146.674174] [<ffffffff812651bb>] ? locks_alloc_lock+0x1b/0x70
Jan 15 02:36:49 centrallogserver kernel: [120146.674179] [<ffffffff81268763>] fcntl_setlk+0x133/0x2c0
Jan 15 02:36:49 centrallogserver kernel: [120146.674186] [<ffffffff812244c2>] SyS_fcntl+0x3e2/0x5e0
Jan 15 02:36:49 centrallogserver kernel: [120146.674193] [<ffffffff818457ad>] entry_SYSCALL_64_fastpath+0x2b/0xe7
Jan 15 02:36:49 centrallogserver kernel: [120146.674197] Mem-Info:
Jan 15 02:36:49 centrallogserver kernel: [120146.674207] active_anon:3871678 inactive_anon:544913 isolated_anon:0
Jan 15 02:36:49 centrallogserver kernel: [120146.674207] active_file:181867 inactive_file:199383 isolated_file:0
Jan 15 02:36:49 centrallogserver kernel: [120146.674207] unevictable:5021 dirty:138 writeback:0 unstable:0
Jan 15 02:36:49 centrallogserver kernel: [120146.674207] slab_reclaimable:232459 slab_unreclaimable:1851907
Jan 15 02:36:49 centrallogserver kernel: [120146.674207] mapped:260688 shmem:6003 pagetables:26155 bounce:0
Jan 15 02:36:49 centrallogserver kernel: [120146.674207] free:179404 free_pcp:283 free_cma:0
Jan 15 02:36:49 centrallogserver kernel: [120146.674217] Node 0 DMA free:15840kB min:60kB low:72kB high:88kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15988kB managed:15904kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB slab_unreclaimable:32kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes
Jan 15 02:36:49 centrallogserver kernel: [120146.674230] lowmem_reserve[]: 0 2976 32142 32142 32142
Jan 15 02:36:49 centrallogserver kernel: [120146.674236] Node 0 DMA32 free:164924kB min:12132kB low:15164kB high:18196kB active_anon:465040kB inactive_anon:473384kB active_file:15584kB inactive_file:57088kB unevictable:1204kB isolated(anon):0kB isolated(file):0kB present:3129152kB managed:3048416kB mlocked:1204kB dirty:44kB writeback:0kB mapped:45748kB shmem:2908kB slab_reclaimable:146872kB slab_unreclaimable:1350916kB kernel_stack:6624kB pagetables:9124kB unstable:0kB bounce:0kB free_pcp:704kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
Jan 15 02:36:49 centrallogserver kernel: [120146.674249] lowmem_reserve[]: 0 0 29165 29165 29165
Jan 15 02:36:49 centrallogserver kernel: [120146.674255] Node 0 Normal free:536852kB min:118872kB low:148588kB high:178308kB active_anon:15021672kB inactive_anon:1706268kB active_file:711884kB inactive_file:740444kB unevictable:18880kB isolated(anon):0kB isolated(file):0kB present:30408704kB managed:29865212kB mlocked:18880kB dirty:508kB writeback:0kB mapped:997004kB shmem:21104kB slab_reclaimable:782964kB slab_unreclaimable:6056680kB kernel_stack:65472kB pagetables:95496kB unstable:0kB bounce:0kB free_pcp:428kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
Jan 15 02:36:49 centrallogserver kernel: [120146.674267] lowmem_reserve[]: 0 0 0 0 0
Jan 15 02:36:49 centrallogserver kernel: [120146.674273] Node 0 DMA: 0*4kB 0*8kB 0*16kB 1*32kB (U) 1*64kB (U) 1*128kB (U) 1*256kB (U) 0*512kB 1*1024kB (U) 1*2048kB (M) 3*4096kB (M) = 15840kB
Jan 15 02:36:49 centrallogserver kernel: [120146.674291] Node 0 DMA32: 504*4kB (UME) 2593*8kB (UME) 2117*16kB (UE) 3322*32kB (UH) 1*64kB (H) 2*128kB (H) 2*256kB (H) 2*512kB (H) 0*1024kB 0*2048kB 0*4096kB = 164792kB
Jan 15 02:36:49 centrallogserver kernel: [120146.674310] Node 0 Normal: 17501*4kB (UEH) 30839*8kB (UMH) 12373*16kB (UMH) 689*32kB (U) 0*64kB 1*128kB (H) 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 536860kB
Jan 15 02:36:49 centrallogserver kernel: [120146.674329] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB
Jan 15 02:36:49 centrallogserver kernel: [120146.674333] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
Jan 15 02:36:49 centrallogserver kernel: [120146.674335] 408507 total pagecache pages
Jan 15 02:36:49 centrallogserver kernel: [120146.674338] 19222 pages in swap cache
Jan 15 02:36:49 centrallogserver kernel: [120146.674341] Swap cache stats: add 382634, delete 363412, find 121020/166633
Jan 15 02:36:49 centrallogserver kernel: [120146.674344] Free swap = 3438328kB
Jan 15 02:36:49 centrallogserver kernel: [120146.674346] Total swap = 4194300kB
Jan 15 02:36:49 centrallogserver kernel: [120146.674348] 8388461 pages RAM
Jan 15 02:36:49 centrallogserver kernel: [120146.674351] 0 pages HighMem/MovableOnly
Jan 15 02:36:49 centrallogserver kernel: [120146.674353] 156078 pages reserved
Jan 15 02:36:49 centrallogserver kernel: [120146.674355] 0 pages cma reserved
Jan 15 02:36:49 centrallogserver kernel: [120146.674357] 0 pages hwpoisoned
Jan 15 02:36:49 centrallogserver kernel: [120146.674577] java: page allocation failure: order:4, mode:0x240c0c0
Jan 15 02:36:49 centrallogserver kernel: [120146.674581] CPU: 7 PID: 52372 Comm: java Tainted: P O 4.4.0-112-generic #135-Ubuntu
Jan 15 02:36:49 centrallogserver kernel: [120146.674585] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 09/19/2018
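A note on reading the Mem-Info dump: `order:4` means a request for 2^4 = 16 physically contiguous 4 kB pages (64 kB). The per-zone free lists show how much free memory exists at each block size. The Python sketch below parses the `DMA32` and `Normal` lines copied from the log and counts free blocks of order 4 or higher; `analyse` and `order_of` are hypothetical helpers, and it assumes 4 kB base pages. The letters in parentheses are migrate types (U = unmovable, M = movable, E = reclaimable, H = high-order atomic reserve). Blocks tagged only `(H)` sit in the high-atomic reserve, which, as far as I understand the 4.4 allocator, ordinary sleeping allocations such as this one cannot use. The result suggests fragmentation rather than plain exhaustion: about 700 MB is free in these two zones, yet essentially no order-4 block is available outside that reserve.

```python
import re

# Free-list lines copied from the "Node 0 DMA32" and "Node 0 Normal" entries above.
FREE_LISTS = {
    "DMA32": ("504*4kB (UME) 2593*8kB (UME) 2117*16kB (UE) 3322*32kB (UH) 1*64kB (H) "
              "2*128kB (H) 2*256kB (H) 2*512kB (H) 0*1024kB 0*2048kB 0*4096kB"),
    "Normal": ("17501*4kB (UEH) 30839*8kB (UMH) 12373*16kB (UMH) 689*32kB (U) 0*64kB "
               "1*128kB (H) 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB"),
}

PAGE_KB = 4  # assumption: 4 kB base pages (x86-64 default)
# "<count>*<size>kB", optionally followed by "(<migrate-type letters>)"
ENTRY = re.compile(r"(\d+)\*(\d+)kB(?:\s+\(([A-Z]+)\))?")


def order_of(size_kb):
    """An order-n block is 2**n contiguous base pages."""
    return (size_kb // PAGE_KB).bit_length() - 1


def analyse(free_list, min_order):
    """Return (total free kB, free blocks of order >= min_order,
    how many of those are not tagged exclusively 'H')."""
    total_kb = big = big_non_reserve = 0
    for count, size, types in ENTRY.findall(free_list):
        count, size = int(count), int(size)
        total_kb += count * size
        if order_of(size) >= min_order:
            big += count
            if types != "H":  # upper bound: mixed tags like "UH" count as usable
                big_non_reserve += count
    return total_kb, big, big_non_reserve


for zone, free_list in FREE_LISTS.items():
    total_kb, big, usable = analyse(free_list, min_order=4)
    print(f"{zone}: {total_kb} kB free, {big} block(s) of order >= 4, "
          f"{usable} outside the high-atomic reserve")
```

The recomputed totals (164792 kB and 536860 kB) match the `=` sums the kernel printed, which is a quick check that the parsing is right.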
As a possible workaround I increased vm.min_free_kbytes from 60 MB to 256 MB and set vm.vfs_cache_pressure=50. I also decreased zfs_arc_max and zfs_dirty_data_max to 8 GB and 128 MB respectively, but the problem still persists. What system tuning could prevent the freezes? One possibility I see is disabling memory overcommit, so that no allocation larger than physical RAM is granted.
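To make the tuning described above persistent and reproducible, it helps to write it down as config fragments. A small Python sketch follows; the helper names are hypothetical, and it assumes the output goes into a file under `/etc/sysctl.d/` and into `/etc/modprobe.d/zfs.conf` (the usual place for ZFS on Linux module options). Note that `vm.min_free_kbytes` is expressed in kB, so 256 MB becomes 262144, while the ZFS module parameters take bytes. If overcommit is disabled with `vm.overcommit_memory = 2`, the kernel's commit limit becomes swap plus `vm.overcommit_ratio` percent of RAM (50 by default), not all of physical RAM.

```python
MIB = 1024 * 1024
GIB = 1024 * MIB


def sysctl_fragment(min_free_mb, vfs_cache_pressure, overcommit_memory=None):
    """Render sysctl.d lines; min_free_kbytes is in kB, not bytes."""
    lines = [
        f"vm.min_free_kbytes = {min_free_mb * 1024}",
        f"vm.vfs_cache_pressure = {vfs_cache_pressure}",
    ]
    if overcommit_memory is not None:
        # 2 = strict commit accounting (the "disable overcommit" option)
        lines.append(f"vm.overcommit_memory = {overcommit_memory}")
    return "\n".join(lines) + "\n"


def zfs_modprobe_fragment(arc_max_bytes, dirty_data_max_bytes):
    """Render a modprobe.d line; ZFS module parameters are given in bytes."""
    return (f"options zfs zfs_arc_max={arc_max_bytes} "
            f"zfs_dirty_data_max={dirty_data_max_bytes}\n")


# Values taken from the question; overcommit_memory=2 is the proposed, untested idea.
print(sysctl_fragment(256, 50, overcommit_memory=2), end="")
print(zfs_modprobe_fragment(8 * GIB, 128 * MIB), end="")
```

Keeping the values in files rather than applying them by hand makes it easier to confirm which settings were actually active when the next freeze happens.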