I have an aging mail server. It simply runs Courier, Postfix and SMTP. It is a KVM VM with two drives, one for the OS and one for data. The data drive is very full (it reports 100% full even though used and available have a spread of about 24GB). I am not sure what is eating up space and then releasing it. top shows mostly imapd processes (running as the postfix user) doing stuff. I cannot get iotop on this machine. So, to start freeing up space in users' mailboxes on the server, I figured I would do a du -h -d1 to find out who the biggest offenders are. Well, this command runs SLOW, slower than it ever has. Since it ran slow, I figured I would run it inside screen:
du -h -d1 > mailboxsizes.txt
So I could come back to it in the morning and see the usage. It wrote out about 6 mailboxes, the largest one being 2.2GB, and then nothing. So I went to the actual machine to see what the command was doing and whether it was still running, and saw this:
[root@xmail]# du -h -d1 > /root/mailboxsizes.txt
[14280.306953] INFO: task imapd:12559 blocked for more than 120 seconds.
[14280.307710] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[14280.309680] imapd D ffff8800d3d9cd98 0 12559 1 0x00000080
[14280.310591] ffff8800b17bbc20 0000000000000086 ffff880057bbce70 ffff8800b17bbfd8
[14280.310591] ffff8800b17bbfd8 ffff8800b17bbfd8 ffff880057bbce70 ffff8800d3d9cd90
[14280.313532] ffff8800d3d9cd94 ffff880057bbce70 00000000ffffffff ffff8800d3d9cd98
[14280.313532] Call Trace:
[14280.315669] [<ffffffff8168d159>] schedule_preempt_disabled+0x29/0x70
[14280.316637] [<ffffffff8168adb5>] __mutex_lock_slowpath+0xc5/0x1c0
[14280.316637] [<ffffffff81208e17>] ? unlazy_walk+0x87/0x140
[14280.318543] [<ffffffff8168a21f>] mutex_lock+0x1f/0x2f
[14280.319516] [<ffffffff81683c93>] lookup_slow+0x33/0xa7
[14280.320690] [<ffffffff8120c8f3>] path_lookupat+0x773/0x7a0
[14280.321718] [<ffffffff81183775>] ? filemap_fault+0x215/0x410
[14280.321718] [<ffffffff811de5e5>] ? kmem_cache_alloc+0x35/0x1e0
[14280.323363] [<ffffffff8120f23f>] ? getname_flags+0x4f/0x1a0
[14280.324348] [<ffffffff8120c94b>] filename_lookup+0x2b/0xc0
[14280.324348] [<ffffffff81210367>] user_path_at_empty+0x67/0xc0
[14280.325307] [<ffffffff811b1431>] ? handle_mm_fault+0x6b1/0xfe0
[14280.327150] [<ffffffff812103d1>] user_path_at+0x11/0x20
[14280.327965] [<ffffffff81203843>] vfs_fstatat+0x63/0xc0
[14280.328093] [<ffffffff81203dae>] SYSC_newstat+0x2e/0x60
[14280.328093] [<ffffffff81692875>] ? do_page_fault+0x35/0x90
[14280.330895] [<ffffffff8168ea88>] ? page_fault+0x28/0x30
[14280.331790] [<ffffffff8120408e>] SyS_newstat+0xe/0x10
[14280.331857] [<ffffffff81697089>] system_call_fastpath+0x16/0x1b
I am new to sysadmining and have zero idea what any of this is telling me, save for something to do with imapd. I have rebooted this machine several times and it barely released any hard drive space or, seemingly, resources. I cannot figure out what is going on and why du failed like it did above. Mostly I am here asking where to even start. While this machine is old and has always had its moments, it has never done this before (even though I do acknowledge the data drive is low on space), but if I clear space, something eats it up.
For completeness:
df -h
Filesystem Size Used Avail Use% Mounted on
devtmpfs 2.9G 0 2.9G 0% /dev
tmpfs 2.9G 0 2.9G 0% /dev/shm
tmpfs 2.9G 41M 2.8G 2% /run
tmpfs 2.9G 0 2.9G 0% /sys/fs/cgroup
/dev/vda3 21G 18G 2.1G 90% /
/dev/vdb 459G 435G 442M 100% /mail
/dev/vda1 976M 119M 790M 14% /boot
tmpfs 581M 0 581M 0% /run/user/0
tmpfs 581M 0 581M 0% /run/user/1000
top - 06:06:53 up 6:52, 3 users, load average: 36.42, 36.64, 31.74
Tasks: 346 total, 8 running, 338 sleeping, 0 stopped, 0 zombie
%Cpu(s): 7.4 us, 1.2 sy, 0.0 ni, 0.0 id, 89.9 wa, 0.0 hi, 0.0 si, 1.5 st
KiB Mem : 5946284 total, 130280 free, 2278016 used, 3537988 buff/cache
KiB Swap: 2516988 total, 1906332 free, 610656 used. 3362528 avail Mem
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
19174 postfix 20 0 27028 5504 1488 R 8.3 0.1 0:38.53 imapd
19586 postfix 20 0 27028 5504 1488 D 7.6 0.1 0:32.18 imapd
19008 postfix 20 0 27028 5504 1488 R 7.0 0.1 0:48.61 imapd
19372 postfix 20 0 27464 5872 1504 D 4.3 0.1 0:30.38 imapd
20087 postfix 20 0 27028 5504 1488 D 4.3 0.1 0:23.27 imapd
20188 postfix 20 0 27028 5504 1488 D 4.3 0.1 0:23.31 imapd
20353 postfix 20 0 27028 5508 1488 D 4.3 0.1 0:23.05 imapd
19963 postfix 20 0 27028 5508 1488 D 4.0 0.1 0:23.85 imapd
20275 postfix 20 0 27028 5508 1488 D 4.0 0.1 0:22.56 imapd
18460 postfix 20 0 29348 5748 1588 R 3.7 0.1 0:38.09 imapd
20236 postfix 20 0 27028 5516 1488 D 3.7 0.1 0:22.86 imapd
32 root 20 0 0 0 0 S 1.7 0.0 5:57.44 kswapd0
20079 postfix 20 0 32728 9152 1520 S 1.7 0.2 0:01.90 imapd
19702 postfix 20 0 27028 5516 1488 D 1.3 0.1 0:27.77 imapd
18575 postfix 20 0 30472 6848 1596 D 1.0 0.1 0:14.86 imapd
19782 postfix 20 0 27028 5508 1488 D 1.0 0.1 0:27.02 imapd
1026 root 20 0 1174028 22616 8992 S 0.7 0.4 2:53.90 fail2ban-s+
Not sure what to look at and try next to figure out where I can du some folders and learn whose old inboxes we are keeping around, so I can get rid of them in an attempt to free up space and hopefully make the server perform better.
My only idea is to systemctl stop postfix for a bit, see if du and ls work better, and double check that top isn't maxed out with that stopped.
Also, in case it is relevant, an iostat:
iostat
Linux 3.10.0-514.16.1.el7.x86_64 (xmail) 11/21/2024 _x86_64_ (3 CPU)
avg-cpu: %user %nice %system %iowait %steal %idle
5.95 0.01 1.29 88.86 0.65 3.24
Device: tps kB_read/s kB_wrtn/s kB_read kB_wrtn
vda 11.92 252.15 61.60 6335865 1547820
vdb 1517.62 62131.62 78.76 1561227117 1979120
The message in your display is a kernel warning. Such warnings typically get broadcast to the console, recorded in the kernel ring buffer (typically displayed with the dmesg command, with the optional -T switch) and/or copied to log files in /var/log.
Often the command(s) you're running in that console are not the cause of the error that gets displayed.
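As a rough sketch of how to pull just the hung-task reports out of the ring buffer, the filter below keeps only lines like the "blocked for more than 120 seconds" one in your console output. The function name hung_tasks is made up for illustration, and reading dmesg may require root.

```shell
#!/usr/bin/env bash
# Keep only hung-task reports from kernel log text supplied on stdin.
# (hung_tasks is a hypothetical helper name, not a standard tool.)
hung_tasks() {
    grep -E 'task .+ blocked for more than [0-9]+ seconds'
}

# Typical use on the affected server (may need root):
#   dmesg -T | hung_tasks
```

If the same process name keeps showing up there, that is a hint about which workload is stuck waiting on the disk.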
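In this case there might be a correlation though: the call trace shows imapd stuck in a stat() system call (SyS_newstat), waiting for a lock during a path lookup, and du issues that same kind of call for every file it visits.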
Most open-source IMAP implementations store messages in the Maildir format, where each e-mail message is an individual file. On a file system with 450 GB of mail that means a lot of directories, many holding a great number of e-mails and therefore a lot of files. Running du on that /mail file system is likely to be (extremely) inefficient, depending on the file system it uses. You can check which file system is in use with, for example, cat /proc/mounts and/or in /etc/fstab. Some file systems perform better than others, but tens of thousands of files in a single directory (a single Inbox or other mail folder) can already be problematic.
Similarly, it is also taxing when imapd needs to index such an Inbox or other mail folder.
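To make that concrete, here is a minimal sketch that reads the file system type from /proc/mounts and lists the directories holding the most files. The function names fs_type and files_per_dir are invented for this example, and it assumes GNU find (for -printf) and paths without whitespace. It still walks the whole tree, so on a server this loaded expect it to take a while.

```shell
#!/usr/bin/env bash
# Print the file system type of the mount point given as $1, as listed in /proc/mounts.
# (fs_type and files_per_dir are hypothetical helper names.)
fs_type() {
    awk -v mp="$1" '$2 == mp { t = $3 } END { if (t != "") print t }' /proc/mounts
}

# List directories under $1 by number of files directly inside them, largest first.
# $2 limits the number of lines shown (default 20).
files_per_dir() {
    find "$1" -type f -printf '%h\n' | sort | uniq -c | sort -rn | head -n "${2:-20}"
}

# Typical use:
#   fs_type /mail
#   files_per_dir /mail 20
```

Folders near the top of that list are the ones most likely to be slow for both du and imapd.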
A full file system, as already displayed with the df command, aggravates the issue. Your users are probably still receiving, reading and deleting messages, so there is probably quite some load.
It might also be of interest to run df -i to check whether you're also running out of inodes.
In general, (nearly) full file systems suffer from all kinds of performance issues.
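A small sketch of the inode check, assuming GNU df as shipped with an EL7 system like yours. The helper name inode_use is made up here; it simply extracts the IUse% column for one path.

```shell
#!/usr/bin/env bash
# Print the inode usage percentage (the IUse% column) of the file system holding $1.
# (inode_use is a hypothetical helper name; -P keeps each file system on one line.)
inode_use() {
    df -Pi "$1" | awk 'NR == 2 { print $5 }'
}

# Typical use:
#   inode_use /mail
```

A value close to 100% would mean new mail can fail to be written even when some blocks are still free.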
For a VM it is usually possible and trivial to simply assign more storage from the hypervisor to the virtual disk and then grow the /mail file system, after which many of your problems should go away. Both of those are usually online operations that can be safely executed during business hours.
The opposite approach is to find wasted storage and reclaim it by deleting data that is no longer needed.
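Which grow command applies depends on the file system type, so the sketch below only prints the command rather than running it. It assumes the virtual disk has already been enlarged on the hypervisor (on a libvirt host, for example, with virsh blockresize) and that /dev/vdb is used without a partition table, as your df output suggests. The function name grow_command is invented for illustration, and only ext3/ext4 and XFS are covered.

```shell
#!/usr/bin/env bash
# Print (do not run) the command that would grow a file system to fill its enlarged device.
#   $1 = file system type, $2 = block device, $3 = mount point
# (grow_command is a hypothetical helper name.)
grow_command() {
    case "$1" in
        ext3|ext4) echo "resize2fs $2" ;;    # resize2fs takes the block device
        xfs)       echo "xfs_growfs $3" ;;   # xfs_growfs takes the mount point
        *)         echo "unsupported file system type: $1" >&2; return 1 ;;
    esac
}

# Typical use, reviewing the output before running it as root:
#   grow_command ext4 /dev/vdb /mail
```

Take a backup or snapshot first; the resize is normally safe, but it touches a disk that is already under stress.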
As to freeing up space:
Remove the Maildir and data of user accounts that no longer exist.
The IMAP protocol requires that clients mark individual messages as deleted, but that does not actually delete the files yet from a Maildir. That requires a separate EXPUNGE command. Some broken/legacy clients don't issue that expunge and deleted messages will linger indefinitely on your mail servers.
Other mail clients "delete" mail by moving the messages to a Trash or similar named folder where they can also linger indefinitely. Emptying their Trash for them can be useful to safely free up necessary space.
Depending on the mail server software you're using, there may be specific tools to help with such clean-up, like Dovecot's doveadm expunge tool, which can search for e-mail messages with the DELETED flag and properly delete them from your file system.
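Without such a tool, a rough way to see how much space is held by messages that were marked as deleted but never expunged is to look at the Maildir file names: flags are stored after ":2," at the end of each name in cur/, and T is the "trashed" flag. The sketch below only lists and sums those files. The function names are made up, it assumes GNU find, and it deliberately deletes nothing, since letting the IMAP server do the expunge is the safer route.

```shell
#!/usr/bin/env bash
# List messages under $1 whose Maildir flags include T (marked deleted, not yet expunged).
# (list_flagged_deleted and flagged_deleted_bytes are hypothetical helper names.)
list_flagged_deleted() {
    find "$1" -type f -path '*/cur/*' -name '*:2,*T*'
}

# Print the total size in bytes of those messages.
flagged_deleted_bytes() {
    find "$1" -type f -path '*/cur/*' -name '*:2,*T*' -printf '%s\n' |
        awk '{ s += $1 } END { print s + 0 }'
}

# Typical use:
#   flagged_deleted_bytes /mail
```

Running it per user directory instead of on all of /mail keeps each scan short and shows which accounts would benefit most from an expunge.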