I would like to do some statistics on the e-mails sent through my postfix server.
Mostly I need to create a report counting the number of messages that were sent through this machine.
I am running Postfix on Ubuntu 9.10 but have very limited experience managing it, so if someone could outline conceptually how to do the above I would be grateful.
Bonus points if I can filter/group on:
- subject of the message (will mostly be a simple RE)
- eventually, the spam score from SpamAssassin or DSPAM (once I install/configure them)
I have no problem with SQL, so if I could get a CSV of timestamp, message ID, subject, and spam score, I'd know how to proceed; I just need to work out conceptually where to get this from (which options to set and/or which logs to extract from).
EDIT: I also have a requirement for the procedure to be as reliable as possible - I'd like to filter out any bounces and other errors if possible.
Have a look at pflogsumm
I used the info at http://www.packetmischief.ca/network/monitoring/postfix/ to do something like this. It has worked well for me: I included the output in my weekly status mail for a while and, while I was still validating that the server ran correctly, in the daily one as well. I can't give details (intellectual property agreements signed for work), but my setup isn't a big change from what that site describes. In fact, as I recall, just following the info there will give you what you are initially looking for (via SNMP).
If you're good with Perl, you can easily get this to populate a database instead.
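If Perl isn't your thing, the same idea can be sketched in Python. This is only a rough illustration, not what the linked scripts do: it assumes the usual syslog layout of Postfix delivery lines (`postfix/<service>[pid]: <queue id>: to=<...>, ..., status=<status>`) with the default short hexadecimal queue IDs, and the function names are made up for the example.

```python
import csv
import re
from collections import Counter

# Assumption: standard syslog-style Postfix delivery lines with short
# (uppercase hex) queue IDs. Adjust the pattern if your log differs.
DELIVERY_RE = re.compile(
    r"^(?P<timestamp>\w{3}\s+\d+\s[\d:]{8})\s+\S+\s+"
    r"postfix/(?P<service>[\w-]+)\[\d+\]:\s+"
    r"(?P<queue_id>[0-9A-F]+):\s+to=<(?P<recipient>[^>]*)>.*?"
    r"\bstatus=(?P<status>\w+)"
)

FIELDS = ["timestamp", "queue_id", "service", "recipient", "status"]


def delivery_records(lines):
    """Yield one dict per delivery attempt found in the log lines."""
    for line in lines:
        match = DELIVERY_RE.search(line)
        if match:
            yield match.groupdict()


def count_by_status(records):
    """Count delivery attempts per status (sent, bounced, deferred, ...)."""
    return Counter(record["status"] for record in records)


def write_csv(records, out):
    """Write the records as CSV to an open text file object."""
    writer = csv.DictWriter(out, fieldnames=FIELDS)
    writer.writeheader()
    for record in records:
        writer.writerow(record)
```

Keeping only the rows with `status` equal to `sent` drops bounces and deferrals. Note that Postfix logs one such line per recipient delivery, and a message that passes through a content filter is typically logged once going in and once coming back, so these counts are deliveries rather than unique messages unless you deduplicate.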
As for the "bonus points": the subject is not written to Postfix's log (at least not by default). I use amavisd-new, which logs the spam information, so it would be fairly trivial to add some lines to the Perl scripts referenced in the link above and have them pick up the score. An example line looks like this:
Oct 23 01:29:58 hercules amavis[19936]: (19936-08) SPAM, -> , Yes, hits=15.389 tag=0.1 tag2=3.5 kill=3.5 tests=FH_HELO_EQ_D_D_D_D=1.117, HELO_DYNAMIC_IPADDR2=3.888, HTML_MESSAGE=0.001, MIME_QP_LONG_LINE=0.001, RAZOR2_CF_RANGE_51_100=0.365, RAZOR2_CF_RANGE_E4_51_100=0.467, RAZOR2_CHECK=1.729, RCVD_IN_BRBL_LASTEXT=1.644, RCVD_IN_PBL=3.558, RCVD_IN_RP_RNBL=1.284, RDNS_DYNAMIC=0.363, SPF_SOFTFAIL=0.972, quarantine spam-60473 (maia-spam-quarantine)
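To give an idea of how little is needed to pull the score out of a line like that, here is a minimal Python sketch. It assumes the layout shown in the example (`amavis[pid]: ... hits=<score>`); other amavisd-new versions or log templates may format this differently, and `parse_amavis_line` is just a name chosen for the illustration.

```python
import re

# Assumption: the amavisd-new line carries the score as "hits=<number>",
# as in the example line; the log template may differ on other setups.
AMAVIS_RE = re.compile(
    r"^(?P<timestamp>\w{3}\s+\d+\s[\d:]{8})\s+\S+\s+amavis\[\d+\]:.*?"
    r"\bhits=(?P<score>-?\d+(?:\.\d+)?)"
)


def parse_amavis_line(line):
    """Return the timestamp and spam score of an amavis line, or None."""
    match = AMAVIS_RE.search(line)
    if match is None:
        return None
    return {
        "timestamp": match.group("timestamp"),
        "score": float(match.group("score")),
    }
```

The syslog timestamp has no year, so you would have to add that yourself before loading the values into a database.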
I wrote a grep-like script for the Postfix log: https://github.com/brablc/postfix-tools/blob/master/pflogrep
It searches for matching lines and outputs all lines that share their queue ID, which allows pflogsumm to produce correct statistics on the filtered data.
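The idea of filtering by queue ID can be illustrated with a short Python sketch. This is not the actual pflogrep code, only an approximation of the approach described; it assumes the default short hexadecimal queue IDs and the usual `postfix/<service>[pid]: <queue id>:` prefix, and the function name is hypothetical.

```python
import re

# Assumption: default short queue IDs (uppercase hex). Requiring at least
# six hex characters keeps words like "warning:" from matching.
QUEUE_ID_RE = re.compile(r"postfix/[\w-]+\[\d+\]:\s+([0-9A-F]{6,}):")


def lines_for_matching_messages(lines, pattern):
    """Return every log line whose queue ID appears on a matching line."""
    lines = list(lines)  # two passes are needed, so keep the lines around
    wanted = re.compile(pattern)

    # Pass 1: collect the queue IDs of lines that match the search pattern.
    matching_ids = set()
    for line in lines:
        if wanted.search(line):
            found = QUEUE_ID_RE.search(line)
            if found:
                matching_ids.add(found.group(1))

    # Pass 2: keep all lines belonging to those queue IDs, in log order.
    result = []
    for line in lines:
        found = QUEUE_ID_RE.search(line)
        if found and found.group(1) in matching_ids:
            result.append(line)
    return result
```

For example, searching for a particular `from=<...>` address returns the client, queue manager and delivery lines of those messages, and that filtered log can then be handed to pflogsumm.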