I have a mail server (running Exim as MTA/MDA, and Dovecot for IMAP access) with around 50 users and around 100GB of data in total (including some huge accounts, some defunct ones, etc.). The mail is all stored in Maildirs. We have a sudden need to pull out every message whose headers (To, From, Cc, etc.) contain one of a handful of domains, to satisfy a request from our lawyers.
Now, I can hack together an inefficient solution (perhaps grep -R through the mail archive for the domains in question, a touch of cut, sort and uniq to get just the distinct filenames, then copy all those files to a new Maildir and take it from there), but this is going to take a hell of a long time to run on the hardware available. Is there a tool out there that will take away the pain of this process for me?
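For concreteness, the hand-rolled approach could be sketched in Python roughly as follows. This is only an illustration under some assumptions, not a recommendation: it assumes messages sit in `cur` and `new` directories somewhere below a root path, that reading each file only up to the first blank line is enough to see the headers, and that a domain should match either exactly or as a parent of a subdomain. The function names (`read_header_block`, `header_domains`, `find_matches`, `copy_matches`) and the header list are made up for this sketch.

```python
import os
import shutil
from email.parser import BytesParser
from email.utils import getaddresses

# Headers assumed to be of interest; adjust to whatever the request covers.
ADDRESS_HEADERS = ("From", "To", "Cc", "Bcc", "Reply-To", "Sender")


def read_header_block(path):
    """Read a message file only up to the first blank line (end of headers)."""
    lines = []
    with open(path, "rb") as fh:
        for line in fh:
            if line in (b"\n", b"\r\n"):
                break
            lines.append(line)
    return b"".join(lines)


def header_domains(path):
    """Return the lowercased domains found in the address headers of one file."""
    msg = BytesParser().parsebytes(read_header_block(path), headersonly=True)
    values = []
    for name in ADDRESS_HEADERS:
        values.extend(str(v) for v in msg.get_all(name, []))
    domains = set()
    for _display_name, addr in getaddresses(values):
        if "@" in addr:
            domains.add(addr.rsplit("@", 1)[1].lower())
    return domains


def find_matches(root, targets):
    """Yield paths of messages whose address headers mention a target domain."""
    targets = {t.lower() for t in targets}
    for dirpath, _dirs, files in os.walk(root):
        # Maildir keeps delivered messages in "cur" and "new"; skip everything else.
        if os.path.basename(dirpath) not in ("cur", "new"):
            continue
        for name in files:
            path = os.path.join(dirpath, name)
            found = header_domains(path)
            if any(d == t or d.endswith("." + t) for d in found for t in targets):
                yield path


def copy_matches(root, dest, targets):
    """Copy matching messages under dest, keeping their path relative to root."""
    count = 0
    for path in find_matches(root, targets):
        target = os.path.join(dest, os.path.relpath(path, root))
        os.makedirs(os.path.dirname(target), exist_ok=True)
        shutil.copy2(path, target)
        count += 1
    return count
```

Something like `copy_matches("/mnt/snapshot/mail", "/srv/export", ["example.com"])` (both paths hypothetical) would then produce a copy that keeps each message's original account and folder, so filenames from different mailboxes cannot collide. Unlike a plain grep, this ignores domains that only appear in message bodies, but it is still a single-threaded walk over every file.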
Platform isn't a huge issue - the server in question runs Ubuntu 12.04, but I have a sufficiently recent snapshot of the data that I can mount on a machine running anything reasonable. There's no requirement that the solution be FOSS, although the indicative software budget is in the hundreds, not thousands, of pounds.
I suspect there's a really obvious answer to this that Google isn't showing me, probably because I'm using the wrong search terms. Does anyone have experience of this?
Thanks!