Question

CaptSaltyJack

Asked: 2014-07-21 12:09:31 +0800 CST

Suggested mechanisms for user-driven spam training?

772

I'm looking at having a way for my mail users to completely manage their own spam training. Before I get into it, my mail server details:

Debian 7.5, postfix 2.9.6, dovecot 2.1.7, amavisd-new 2.7.1, spamassassin 3.3.2

So, each of my users in each domain has a Junk folder (/var/vmail/domain/user/.Junk) where they can put spam that doesn't get flagged as such. Then I have this script in place:

/etc/cron.daily/learnspam

#!/bin/sh

# Feed every user's Junk folder to the shared Bayes database as spam
find /var/vmail -type d -name .Junk -exec echo "Examining {}..." \; -exec sa-learn --dbpath=/var/lib/amavis/.spamassassin --spam {}/cur \;

Each user also has a folder called False Positives, where they can drag messages that were erroneously marked as spam. A second daily script learns those messages as ham and moves them back to the user's inbox.

/etc/cron.daily/falsepos

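#!/bin/sh

# Private scratch directory instead of predictable file names in /tmp
tmpdir=$(mktemp -d) || exit 1
trap 'rm -rf "$tmpdir"' EXIT

doveadm search -A mailbox 'False Positives' 2>/dev/null | while read -r user guid uid; do
    # Move the message back to INBOX only if it was saved for training
    doveadm fetch -u "$user" text mailbox-guid "$guid" uid "$uid" > "$tmpdir/$guid-$uid.eml" &&
        doveadm move -u "$user" INBOX mailbox-guid "$guid" uid "$uid"
done

# Learn everything collected in this run as ham (sa-learn accepts a directory)
sa-learn --dbpath=/var/lib/amavis/.spamassassin --ham "$tmpdir"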

My question is, am I doing this correctly? Is there a better way? Does sa-learn work properly with amavis? I figure as long as I'm using the --dbpath=/var/lib/amavis/.spamassassin option, it should work fine.

3 Answers

moenoel · Answer 1 · 2014-07-23T23:13:30+08:00

Best Answer

You might want to take a look at dspam. It integrates with Dovecot and does basically exactly what you want, but on the fly, as the move operations happen (moving into Junk => spam, moving out of Junk => false positive).
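
To make the move-triggered idea concrete, here is a minimal shell sketch of the decision described above: into Junk means spam, out of Junk means false positive. The function name `class_for_move` and the default folder name `Junk` are assumptions for illustration. The real plugin is configured inside Dovecot rather than scripted, and this is not its code.

```shell
#!/bin/sh
# Hypothetical helper: map a folder move to a dspam retraining class.
# Prints "spam", "innocent", or nothing when no retraining is needed.
class_for_move() {
    src=$1
    dst=$2
    junk=${3:-Junk}          # assumed name of the spam folder

    if [ "$src" = "$dst" ]; then
        return 0             # not a real move
    elif [ "$dst" = "$junk" ]; then
        echo spam            # moved into Junk: missed spam
    elif [ "$src" = "$junk" ]; then
        echo innocent        # moved out of Junk: false positive
    fi
}
```

The printed value is what would go into dspam's `--class=` option, typically together with `--user` and `--source=error` and the message on standard input. Treat that invocation as something to check against your dspam version, not as a recipe.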

3

mschuett · Answer 2 · 2014-07-24T00:40:29+08:00

Your approach looks fine; I do something similar.

Two remarks:

Using --dbpath is good: it prevents a common setup error where SA reads a DB in ~amavis while sa-learn writes to a different DB in ~root.
One design limitation for multi-user operation: in this setup SpamAssassin uses a single global Bayes DB, not one DB per user, so every user's training affects everyone's mail.
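
One way to check the first remark in practice is to look at the counters in the database that --dbpath points to. The sketch below parses the output of `sa-learn --dump magic`. The helper name `bayes_count` is an assumption, and the column layout in the comment is what SpamAssassin 3.x prints as far as I know, so compare it with your own output first.

```shell
#!/bin/sh
# Hypothetical helper: read `sa-learn --dump magic` output on stdin and
# print the counter named by $1 (for example nspam or nham).
bayes_count() {
    awk -v name="$1" '
        # Magic lines look like: 0.000  0  <value>  0  non-token data: nspam
        NF >= 5 && $(NF-1) == "data:" && $NF == name { print $3 }
    '
}

# Intended use, run as root on the mail server:
#   sa-learn --dbpath=/var/lib/amavis/.spamassassin --dump magic | bayes_count nspam
#   sa-learn --dbpath=/var/lib/amavis/.spamassassin --dump magic | bayes_count nham
```

If nspam and nham grow after the cron jobs have run, sa-learn is writing to the database you inspected. Whether amavisd reads that same database still depends on its own configuration.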

2

mc0e · Answer 3 · 2014-07-27T00:54:58+08:00

dspam does Bayesian filtering better than SpamAssassin. Many other filtering mechanisms, such as RBLs, greylisting, and DNS validity checks, can be configured in the MTA (e.g. Postfix). With that approach, you only look at email content after the other tests have passed, which makes the system much less resource-hungry. You don't get SpamAssassin's weighted combination of tests, but if it is set up well you get a very good spam filter that uses much less CPU and RAM. Also, the Dovecot plugin is triggered by moving mail between folders, which is much nicer than having separate folders for training.
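
As a rough sketch of the MTA-side checks mentioned above, the snippet below prints a candidate `smtpd_recipient_restrictions` setting for Postfix's main.cf. The function name, the choice of zen.spamhaus.org as the RBL, and a greylisting policy service listening on 127.0.0.1:10023 are all assumptions, not details of the setup in the question. Review them before merging anything into a live configuration.

```shell
#!/bin/sh
# Hypothetical helper: print a main.cf fragment that rejects mail at SMTP time,
# before any content filter runs. Nothing is written to the Postfix config.
print_mta_checks() {
    rbl=${1:-zen.spamhaus.org}          # assumed RBL zone
    policy=${2:-inet:127.0.0.1:10023}   # assumed greylisting policy service
    cat <<EOF
smtpd_recipient_restrictions =
    permit_mynetworks,
    permit_sasl_authenticated,
    reject_unauth_destination,
    reject_non_fqdn_sender,
    reject_unknown_sender_domain,
    reject_rbl_client $rbl,
    check_policy_service $policy
EOF
}
```

Calling `print_mta_checks` with no arguments prints the fragment with the assumed defaults. The cheap checks come first, so only mail that survives them ever reaches the content filter.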

0
