I have an aging mailserver....it simply runs courier postfix and smtp. It is a KVM VM and has two drives, one for os and one for data. I have a very full data drive (reports 100% full even though the used and available have a spread of about 24GB). I am not sure why or what is eating up space and then releasing it. A top shows mostly just postfix's imapd doing stuff. I cannot get iotop on this machine. So I figured to start freeing up space in users mailboxes on the server I would do a du -h -d1 to try to get who the biggest offenders are. Well, this command runs SLOW slower than it has ever. So since it ran slow, I figured I would issue a screen command of:
du -h -d1 > mailboxsizes.txt
So I could come to it in the morning and see the usages. It wrote out about 6 mailboxes, largest one being 2.2GB and then nothing. So came to the actual machine to see what the command was doing if it was still running and saw this:
[root@xmail]# du -h -d1 > /root/mailboxsizes.txt
[14280.306953] INFO: task imapd:12559 blocked for more than 120 seconds.
[14280.307710] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[14280.309680] imapd D ffff8800d3d9cd98 0 12559 1 0x00000080
[14280.310591] ffff8800b17bbc20 0000000000000086 ffff880057bbce70 ffff8800b17bbfd8
[14280.310591] ffff8800b17bbfd8 ffff8800b17bbfd8 ffff880057bbce70 ffff8800d3d9cd90
[14280.313532] ffff8800d3d9cd94 ffff880057bbce70 00000000ffffffff ffff8800d3d9cd98
[14280.313532] Call Trace:
[14280.315669] [<ffffffff8168d159>] schedule_preempt_disabled+0x29/0x70
[14280.316637] [<ffffffff8168adb5>] __mutex_lock_slowpath+0xc5/0x1c0
[14280.316637] [<ffffffff81208e17>] ? unlazy_walk+0x87/0x140
[14280.318543] [<ffffffff8168a21f>] mutex_lock+0x1f/0x2f
[14280.319516] [<ffffffff81683c93>] lookup_slow+0x33/0xa7
[14280.320690] [<ffffffff8120c8f3>] path_lookupat+0x773/0x7a0
[14280.321718] [<ffffffff81183775>] ? filemap_fault+0x215/0x410
[14280.321718] [<ffffffff811de5e5>] ? kmem_cache_alloc+0x35/0x1e0
[14280.323363] [<ffffffff8120f23f>] ? getname_flags+0x4f/0x1a0
[14280.324348] [<ffffffff8120c94b>] filename_lookup+0x2b/0xc0
[14280.324348] [<ffffffff81210367>] user_path_at_empty+0x67/0xc0
[14280.325307] [<ffffffff811b1431>] ? handle_mm_fault+0x6b1/0xfe0
[14280.327150] [<ffffffff812103d1>] user_path_at+0x11/0x20
[14280.327965] [<ffffffff81203843>] vfs_fstatat+0x63/0xc0
[14280.328093] [<ffffffff81203dae>] SYSC_newstat+0x2e/0x60
[14280.328093] [<ffffffff81692875>] ? do_page_fault+0x35/0x90
[14280.330895] [<ffffffff8168ea88>] ? page_fault+0x28/0x30
[14280.331790] [<ffffffff8120408e>] SyS_newstat+0xe/0x10
[14280.331857] [<ffffffff81697089>] system_call_fastpath+0x16/0x1b
I am new to sysadmining and have zero idea what any of this is telling me save for something to do with imapd? I have done a reboot on this machine several times and it barely released any hard drives space or seemingly resources. I cannot figure out what is going on and why du failed above like it did. Mostly I am here asking where to even start? While this machine is old and has always had its moments, it has never done this before (even though I do acknowledge the data drive is low on space) but if I clear it, something eats it up.
For completeness:
df -h
Filesystem Size Used Avail Use% Mounted on
devtmpfs 2.9G 0 2.9G 0% /dev
tmpfs 2.9G 0 2.9G 0% /dev/shm
tmpfs 2.9G 41M 2.8G 2% /run
tmpfs 2.9G 0 2.9G 0% /sys/fs/cgroup
/dev/vda3 21G 18G 2.1G 90% /
/dev/vdb 459G 435G 442M 100% /mail
/dev/vda1 976M 119M 790M 14% /boot
tmpfs 581M 0 581M 0% /run/user/0
tmpfs 581M 0 581M 0% /run/user/1000
top - 06:06:53 up 6:52, 3 users, load average: 36.42, 36.64, 31.74
Tasks: 346 total, 8 running, 338 sleeping, 0 stopped, 0 zombie
%Cpu(s): 7.4 us, 1.2 sy, 0.0 ni, 0.0 id, 89.9 wa, 0.0 hi, 0.0 si, 1.5 st
KiB Mem : 5946284 total, 130280 free, 2278016 used, 3537988 buff/cache
KiB Swap: 2516988 total, 1906332 free, 610656 used. 3362528 avail Mem
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
19174 postfix 20 0 27028 5504 1488 R 8.3 0.1 0:38.53 imapd
19586 postfix 20 0 27028 5504 1488 D 7.6 0.1 0:32.18 imapd
19008 postfix 20 0 27028 5504 1488 R 7.0 0.1 0:48.61 imapd
19372 postfix 20 0 27464 5872 1504 D 4.3 0.1 0:30.38 imapd
20087 postfix 20 0 27028 5504 1488 D 4.3 0.1 0:23.27 imapd
20188 postfix 20 0 27028 5504 1488 D 4.3 0.1 0:23.31 imapd
20353 postfix 20 0 27028 5508 1488 D 4.3 0.1 0:23.05 imapd
19963 postfix 20 0 27028 5508 1488 D 4.0 0.1 0:23.85 imapd
20275 postfix 20 0 27028 5508 1488 D 4.0 0.1 0:22.56 imapd
18460 postfix 20 0 29348 5748 1588 R 3.7 0.1 0:38.09 imapd
20236 postfix 20 0 27028 5516 1488 D 3.7 0.1 0:22.86 imapd
32 root 20 0 0 0 0 S 1.7 0.0 5:57.44 kswapd0
20079 postfix 20 0 32728 9152 1520 S 1.7 0.2 0:01.90 imapd
19702 postfix 20 0 27028 5516 1488 D 1.3 0.1 0:27.77 imapd
18575 postfix 20 0 30472 6848 1596 D 1.0 0.1 0:14.86 imapd
19782 postfix 20 0 27028 5508 1488 D 1.0 0.1 0:27.02 imapd
1026 root 20 0 1174028 22616 8992 S 0.7 0.4 2:53.90 fail2ban-s+
Not sure what to look at and try next to figure out where I can du some folders and know whose old inbox's we are keeping around to get rid of in an attempt to free up space and hopefully make the server perform better.
my only idea, is to systemctl stop postfix for a bit and see if du's and ls's work better and double check that top isn't pinged out with that.
also in case it is relevant an iostat:
iostat
Linux 3.10.0-514.16.1.el7.x86_64 (xmail) 11/21/2024 _x86_64_ (3 CPU)
avg-cpu: %user %nice %system %iowait %steal %idle
5.95 0.01 1.29 88.86 0.65 3.24
Device: tps kB_read/s kB_wrtn/s kB_read kB_wrtn
vda 11.92 252.15 61.60 6335865 1547820
vdb 1517.62 62131.62 78.76 1561227117 1979120