I've got a bunch of duplicate messages in my IMAP server's Maildir. What's the best way to remove them?
Some relevant points:
- Shared Message-ID is usually a good enough definition of duplicate. A tiny script that removes all but one of the duplicate messages would work.
- Sometimes it's necessary to find duplicates based on shared message bodies. What's a reasonable definition of shared here? Bitwise equivalent? What about weird differences in line wrapping, escaping, character encoding?
- Sometimes there's some meaningful difference between 'duplicate' messages. What's the best way to review the differences in sets of 'duplicate' messages? Diffs?