We have a Mac OS X 10.5 Leopard Server mail server that recently started getting issues with IMAP mailboxes having "invalid format[s]" this weekend. It turned out that there were some bad block counts on the volume housing IMAP data and the issue has not resurfaced after repairing the volume and the afflicted mailboxes. However, a new issue that is persisting is frequently crashing imaps
processes and ever increasing db4
"lockers" errors like the following:
Apr 13 17:01:12 host lmtpunix[31509]: DBERROR db4: 1134 lockers
The errors for the crashing imaps
processes from /var/log/system.log
are as follows:
Apr 12 13:43:10 host imaps[11792]: starttls: TLSv1 with cipher AES128-SHA (128/128 bits new) no authentication
Apr 12 13:43:12 host imaps[11792]: starttls: TLSv1 with cipher AES128-SHA (128/128 bits new) no authentication
Apr 12 13:43:13 host imaps[11792]: login: pool-72-92-XXX-XXX.burl.east.myfairpoint.net [72.92.XXX.XXX] user3 CRAM-MD5+TLS User logged in
Apr 12 13:43:15 host ReportCrash[14362]: Formulating crash report for process imapd[11792]
Apr 12 13:43:15 host master[94896]: process 11792 exited, signaled to death by 11
Apr 12 13:43:15 host ReportCrash[14362]: Saved crashreport to /Library/Logs/CrashReporter/imapd_2011-04-12-134315_host.crash using uid: 0 gid: 0, euid: 0 egid: 0
And the following from /var/log/mailaccess.log
:
Apr 12 13:43:10 host imaps[11792]: accepted connection
Apr 12 13:43:10 host imaps[11792]: mydelete: starting txn 2147495107
Apr 12 13:43:10 host imaps[11792]: mydelete: committing txn 2147495107
Apr 12 13:43:10 host imaps[11792]: mystore: starting txn 2147495108
Apr 12 13:43:10 host imaps[11792]: mystore: committing txn 2147495108
Apr 12 13:43:10 host imaps[11792]: starttls: TLSv1 with cipher AES128-SHA (128/128 bits new) no authentication
Apr 12 13:43:12 host imaps[11792]: accepted connection
Apr 12 13:43:12 host imaps[11792]: mydelete: starting txn 2147495112
Apr 12 13:43:12 host imaps[11792]: mydelete: committing txn 2147495112
Apr 12 13:43:12 host imaps[11792]: mystore: starting txn 2147495113
Apr 12 13:43:12 host imaps[11792]: mystore: committing txn 2147495113
Apr 12 13:43:12 host imaps[11792]: starttls: TLSv1 with cipher AES128-SHA (128/128 bits new) no authentication
Apr 12 13:43:12 host imaps[11792]: AOD: user options: no lookup required for: user3
Apr 12 13:43:13 host imaps[11792]: login: pool-72-92-XXX-XXX.burl.east.myfairpoint.net [72.92.149.161] user3 CRAM-MD5+TLS User logged in
Apr 12 13:43:13 host imaps[11792]: quota set to "unlimited" for mailbox user.user3
Apr 12 13:43:13 host imaps[11792]: open: user user3 opened Other Users/listmaster
Apr 12 13:43:15 host master[94896]: process 11792 exited, signaled to death by 11
Apr 12 13:43:15 host master[94896]: service imaps pid 11792 in BUSY state: terminated abnormally
Apr 12 13:43:15 host master[94896]: process 11792 exited, signaled to death by 11
Apr 12 13:43:15 host master[94896]: service imaps pid 11792 in BUSY state: terminated abnormally
The crash reports are all like the following:
Process: imapd [39069]
Path: /usr/bin/cyrus/bin/imapd
Identifier: imapd
Version: ??? (???)
Code Type: X86 (Native)
Parent Process: master [38605]
Date/Time: 2011-04-13 18:25:24.068 -0400
OS Version: Mac OS X Server 10.5.7 (9J61)
Report Version: 6
Anonymous UUID: 223C4DD1-2AE2-4381-8A28-DEB9082281A8
Exception Type: EXC_BAD_ACCESS (SIGSEGV)
Exception Codes: KERN_INVALID_ADDRESS at 0x0000000077a0ca64
Crashed Thread: 0
Thread 0 Crashed:
0 imapd 0x0003090c process_records + 588
1 imapd 0x00031362 mailbox_expunge + 2146
2 imapd 0x00006fde cmd_close + 179
3 imapd 0x00018cf8 cmdloop + 2940
4 imapd 0x0001c1b7 service_main + 1498
5 imapd 0x00002e73 main + 3502
6 imapd 0x00002006 start + 54
Thread 0 crashed with X86 Thread State (32-bit):
eax: 0x61766970 ebx: 0x000306cb ecx: 0x00000008 edx: 0x77a0ca64
edi: 0x00bfffa4 esi: 0x162a5fa4 ebp: 0xbfffad48 esp: 0xbfffac90
ss: 0x0000001f efl: 0x00010202 eip: 0x0003090c cs: 0x00000017
ds: 0x0000001f es: 0x0000001f fs: 0x00000000 gs: 0x00000037
cr2: 0x77a0ca64
Yes, they all crash in process_records
in mailbox_expunge
.
I'm not really seeing any other errors in the logs, at least that seem to be related to the crashed processes in any way or are innocuous like SQUAT failed to open index file
and IOERROR: fstating sieve script /usr/sieve/u/user/defaultbc: No such file or directory
.
I must admit, I have not rebuilt the Other Users/listmaster
mailbox nor the user3
mailbox yet. It's not always the same user.
We do have some users that have found that sent email is not getting saved to their 'Sent Messages' mailbox and hasn't been since the date of the original issue. Rebuilding their mailbox (currently using sudo mailbfr -m username
as it does some extra permissions fixed in addition to the sudo /usr/bin/cyrus/bin/reconstruct -r user/username
that I'd normally run) seems to allow newly sent email to be saved to it, but I'm having trouble finding a correlation between that issue and this one (or any other errors in the logs).
Any suggestions would be greatly appreciated. Is is really crashing trying to delete messages? Should I just rebuild all users' mailboxes individually? I really do not want to rebuild the Cyrus database in its entirety and lose all flagged/read status for all messages.
I believe that corrupted blocks went to incorrect db indexes which are causing crashes while storing new data. Except of rebuilding database there is not much you can do. You can backup users .seen files and try to use them, but test this idea on test user. Honestly I think that harrdrive with bad blocks should be removed from the server as soon as possible anyway
I solved this issue long ago.
I don't remember the exact commands, but I found a way to reasonably correlate a specific crash to a specific user, at which point I could then run
mailbfr -m
to rebuild that user's mailbox. Eventually I was able to rebuild all of the problem mailboxes and rid the server of the issue.