I've got a bunch of duplicate messages in my IMAP server's Maildir. What's the best way to remove them?
Some relevant points:
- Shared Message-ID is usually a good enough definition of duplicate. A tiny script that removes all but one of the duplicate messages would work.
- Sometimes it's necessary to find duplicates based on shared message bodies. What's a reasonable definition of shared here? Bitwise equivalent? What about weird differences in line wrapping, escaping, character encoding?
- Sometimes there's some meaningful difference between 'duplicate' messages. What's the best way to review the differences in sets of 'duplicate' messages? Diffs?
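For what it's worth, the first and third points can be sketched with Python's standard mailbox and difflib modules. This is only a rough illustration, not a finished tool: the function names find_duplicates, diff_messages and remove_duplicates are made up for this example, and it assumes that "keep the first key in sort order" is an acceptable way to pick the survivor.

```python
import difflib
import mailbox
from collections import defaultdict


def find_duplicates(box):
    """Return {Message-ID: [keys]} for IDs carried by more than one message."""
    groups = defaultdict(list)
    for key, msg in box.iteritems():
        msg_id = str(msg.get("Message-ID", "")).strip()
        if msg_id:  # messages without a Message-ID are never touched
            groups[msg_id].append(key)
    return {mid: sorted(keys) for mid, keys in groups.items() if len(keys) > 1}


def diff_messages(box, key_a, key_b):
    """Unified diff of two raw messages, for reviewing 'duplicates' by eye."""
    text_a = box.get_bytes(key_a).decode("utf-8", "replace").splitlines()
    text_b = box.get_bytes(key_b).decode("utf-8", "replace").splitlines()
    return list(difflib.unified_diff(text_a, text_b, key_a, key_b, lineterm=""))


def remove_duplicates(maildir_path, dry_run=True):
    """Keep one message per Message-ID; return the keys removed (or that would be)."""
    box = mailbox.Maildir(maildir_path, create=False)
    doomed = []
    for keys in find_duplicates(box).values():
        doomed.extend(keys[1:])  # assumption: keep the first key, drop the rest
    if not dry_run:
        for key in doomed:
            box.remove(key)
    return doomed
```

Running remove_duplicates with the default dry_run=True only lists what would go, and diff_messages can be used on each group first to check whether the copies really are interchangeable.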
I've made some significant improvements to Kevin Deldycke's script mentioned in another answer, and he was kind enough to accept my pull requests. Eventually we split this off into a dedicated project, which you can find here:
https://github.com/kdeldycke/maildir-deduplicate
For generic files on Linux, I use the fdupes utility to remove duplicate files. I found that it also works for Maildir messages.
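To show what this kind of matching catches, here is a rough Python sketch that groups files by a hash of their contents. It is not how fdupes itself is implemented, and find_identical_files is just a name picked for this example. Note that only byte-for-byte identical files match, so two copies of a mail that arrived by different routes (and therefore usually carry different Received headers) would not be grouped.

```python
import hashlib
import os
from collections import defaultdict


def find_identical_files(root):
    """Group regular files under root by the SHA-256 of their contents."""
    by_digest = defaultdict(list)
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(dirpath, name)
            with open(path, "rb") as fh:
                digest = hashlib.sha256(fh.read()).hexdigest()
            by_digest[digest].append(path)
    # Only digests shared by two or more files are duplicates.
    return [sorted(paths) for paths in by_digest.values() if len(paths) > 1]
```

Pointing it at a Maildir's cur and new directories gives lists of candidate files; deciding which copy to delete is left to the caller.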
Best I've found today is Kevin Deldycke's maildir-deduplicate. It ignores the X-MIMETrack header by default and compares headers using the SHA224 digest.
I bet someone could make something fancy from Rick Sanders' delIMAPdups.pl, part of his IMAP Tools.
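The idea of comparing a SHA224 digest of the headers while skipping X-MIMETrack can be sketched roughly as below. This is not the code from maildir-deduplicate: the function name header_digest, the ignored parameter, and the whitespace normalisation are assumptions made for this illustration only.

```python
import hashlib
from email import message_from_bytes

# X-MIMETrack comes from the description above; extend the set as needed.
IGNORED_HEADERS = frozenset({"x-mimetrack"})


def header_digest(raw_message, ignored=IGNORED_HEADERS):
    """SHA224 hex digest of a message's headers, skipping the ignored ones."""
    msg = message_from_bytes(raw_message)
    lines = []
    for name, value in msg.items():
        if name.lower() in ignored:
            continue
        # Collapse folding whitespace so re-wrapped headers hash the same.
        lines.append(f"{name.lower()}: {' '.join(str(value).split())}")
    canonical = "\n".join(lines).encode("utf-8", "replace")
    return hashlib.sha224(canonical).hexdigest()
```

Two messages whose digests match can then be treated as duplicate candidates. In practice, headers that change per delivery (Received, for example) would probably need to go into the ignored set as well.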
GNOME's Evolution (a graphical mail user agent) has a built-in feature to remove duplicate mail. As explained on this help page, it boils down to:
Voilà.
P.S. Evolution can access your messages locally (Maildir, MH, mbox) or over IMAP.
If you use Dovecot for IMAP access, you can use the following command:
It should take care of everything; all duplicate emails should be deleted right away.