I'm looking for a way to let my mail users manage their own spam training completely. Before I get into it, my mail server details:
Debian 7.5, postfix 2.9.6, dovecot 2.1.7, amavisd-new 2.7.1, spamassassin 3.3.2
So, each of my users in each domain has a Junk folder (/var/vmail/domain/user/.Junk) where they can put spam that doesn't get flagged as such. Then I have this script in place:
/etc/cron.daily/learnspam
#!/bin/sh
find /var/vmail -name .Junk -exec echo Examining {}... \; -exec sa-learn --dbpath=/var/lib/amavis/.spamassassin --spam {}/cur \;
Each user also has a folder called False Positives, where they can drag messages that were erroneously marked as spam. I have a daily script for that too, which learns those messages as ham and moves them back to the inbox.
/etc/cron.daily/falsepos
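#!/bin/sh
# Learn every message in each user's 'False Positives' folder as ham, then move it back to INBOX.
# A private temp dir avoids predictable file names in /tmp and is removed on exit.
tmpdir=$(mktemp -d) || exit 1
trap 'rm -rf "$tmpdir"' EXIT
doveadm search -A mailbox 'False Positives' 2>/dev/null | while read -r user guid uid; do
    doveadm fetch -u "$user" text mailbox-guid "$guid" uid "$uid" > "$tmpdir/$guid-$uid.eml"
    doveadm move -u "$user" INBOX mailbox-guid "$guid" uid "$uid"
done
# Only call sa-learn if at least one message was fetched.
if ls "$tmpdir"/*.eml >/dev/null 2>&1; then
    sa-learn --dbpath=/var/lib/amavis/.spamassassin --ham "$tmpdir"/*.eml
fi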
My question is, am I doing this correctly? Is there a better way? Does sa-learn work properly with amavis? I figure as long as I'm using the --dbpath=/var/lib/amavis/.spamassassin option, it should work fine.
You might want to take a look at dspam. It integrates with Dovecot and does basically exactly what you want, but on the fly, as the move operations happen (moving into Junk => spam, moving out of Junk => false positive).
Your approach looks fine; I do something similar.
Two remarks:
--dbpath is good: it prevents a common setup error where SA uses a DB in ~amavis and sa-learn writes to a different DB in ~root.
Dspam does Bayesian filtering better than SpamAssassin. Many other filtering mechanisms, such as RBLs, greylisting and DNS validity checks, can be configured in the MTA (e.g. postfix). With this approach, you only look at email content after the other tests have passed, which makes the system much less resource hungry. You don't get the same weighted combination of tests, but if set up well you can get a very good spam system that uses much less CPU and RAM. Also, the dovecot plugin is triggered by moving mail between folders, which is much nicer than having separate folders for training.
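On the --dbpath remark above: one way to check that the cron jobs are feeding the database you expect is to look at its message counters with sa-learn --dump magic. The following is a minimal sketch, not a tested recipe. bayes_counts is a hypothetical helper name, and the awk patterns assume the usual dump layout in which the third column of the "non-token data" lines holds the value.

```shell
#!/bin/sh
# Sketch: summarise `sa-learn --dump magic` output read from stdin.
# bayes_counts is a hypothetical helper name, not part of SpamAssassin.
# Assumption: the counter value is the third column of each matching line.
bayes_counts() {
    awk '
        /non-token data: nspam$/ { nspam = $3 }
        /non-token data: nham$/  { nham  = $3 }
        END { printf "nspam=%d nham=%d\n", nspam, nham }
    '
}

# DB path taken from the question's scripts; skipped where sa-learn is missing.
if command -v sa-learn >/dev/null 2>&1; then
    sa-learn --dbpath=/var/lib/amavis/.spamassassin --dump magic | bayes_counts
fi
```

If the counts grow after each daily run, the training is reaching that database. With SpamAssassin's default settings (bayes_min_spam_num and bayes_min_ham_num, both 200), Bayes scoring is not applied until both counters have reached 200.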