
Question

jmreicha

Asked: 2011-10-21 13:16:20 +0800 CST

Processing items with SpamAssassin and sa-learn


I have been working on getting SpamAssassin up and running for a while now and am pretty close to being finished. However, there is one last thing nagging at me that I can't seem to figure out. I have searched around a bit but have not found a conclusive answer, so I just want a little clarity so I can sleep better at night.

I have read that SpamAssassin needs at least 200 messages, preferably 1,000, to do an effective job of Bayesian filtering. I have been feeding it spam (at least I think so) by issuing the following command:

sa-learn --showdots --mbox --spam spamfolder

As far as I can tell, SpamAssassin is processing it. So I run:

sa-learn --dump magic

and get the following output:

bruticus@bruticus:~$ sa-learn --dump magic
0.000          0          3          0  non-token data: bayes db version
0.000          0        306          0  non-token data: nspam
0.000          0        210          0  non-token data: nham
0.000          0      68430          0  non-token data: ntokens
0.000          0 1318421928          0  non-token data: oldest atime
0.000          0 1319141693          0  non-token data: newest atime
0.000          0 1319142287          0  non-token data: last journal sync atime
0.000          0 1319142287          0  non-token data: last expiry atime
0.000          0          0          0  non-token data: last expire atime delta
0.000          0          0          0  non-token data: last expire reduction count

Are the values in the nspam and nham rows indicative of the number of messages SpamAssassin has actually learned and is using for its Bayesian analysis?

Do I need to get these two numbers up into the thousands before SpamAssassin really starts doing its job? How do I know when I have fed it enough spam to start working correctly?

1 Answer


mailq · Answer 1 · 2011-10-21T15:34:13+08:00

Best Answer


You always need both spam and ham samples. If you feed it only spam, SpamAssassin refuses to activate the Bayesian spam filter.
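As a minimal sketch of training both classes, the function below mirrors the sa-learn command from the question and adds the matching --ham run. The function name train_bayes and the hamfolder mailbox are assumptions for illustration; substitute your own mbox files.

```shell
# Train the Bayes database from two mbox files: one spam, one ham.
# Both arguments are hypothetical paths; use your own mailboxes.
train_bayes() {
    spam_mbox="$1"
    ham_mbox="$2"
    # Learn the spam mailbox first, then the ham mailbox.
    # The second run is skipped if the first one fails.
    sa-learn --showdots --mbox --spam "$spam_mbox" &&
    sa-learn --showdots --mbox --ham "$ham_mbox"
}
```

It could be called as: train_bayes spamfolder hamfolder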

By running spamassassin -D < /path/to/a/complete.mail you can check whether Bayesian filtering is active or not; the answer is somewhere in the debug output.
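Since the debug output is long, one way to narrow it down is to limit debugging to the bayes area and filter for matching lines. This is a rough sketch: the function names are hypothetical, and the exact wording of the debug messages depends on your SpamAssassin version, so read whatever lines come back rather than relying on a fixed string.

```shell
# Keep only the debug lines that mention the Bayes subsystem.
# Reads SpamAssassin debug output on stdin.
filter_bayes_lines() {
    grep -i 'bayes'
}

# Run one complete message through SpamAssassin with Bayes debugging on.
# Debug output goes to stderr, so stderr is sent into the pipe and the
# rewritten message on stdout is discarded.
check_bayes_debug() {
    spamassassin -D bayes < "$1" 2>&1 >/dev/null | filter_bayes_lines
}
```

It could be called as: check_bayes_debug /path/to/a/complete.mail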

Hopefully you didn't train SpamAssassin with old spam (months old). It only works well with recent spam that you (personally or as a company) actually received. If you don't have ham or spam samples right now, you are better off setting SA to auto-learn (the bayes_auto_learn option). The filter then gets trained over time. This takes longer and you won't see the benefit right away, but the outcome will impress you in the end.

Yes, your numbers show the messages currently learned. Once both nspam and nham are greater than 200, you are finished. Anything above that just makes the results "safer", as in more valid or more accurate. With auto-learning these numbers will increase over time, and they can also decrease as statistics for old mail are dropped.
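The check described above can be sketched as a small filter over the sa-learn --dump magic output. It assumes the output layout shown in the question (count in the third column, counter name as the last word) and the default minimum of 200, which SpamAssassin exposes as bayes_min_spam_num and bayes_min_ham_num. The function name bayes_ready is hypothetical.

```shell
# Read `sa-learn --dump magic` output on stdin and report whether both
# nspam and nham reach the minimum (first argument, default 200).
# Exit status is 0 when both counters are high enough, 1 otherwise.
bayes_ready() {
    awk -v min="${1:-200}" '
        $NF == "nspam" { nspam = $3 }   # learned spam messages
        $NF == "nham"  { nham  = $3 }   # learned ham messages
        END {
            printf "nspam=%d nham=%d min=%d\n", nspam, nham, min
            exit !(nspam >= min && nham >= min)
        }'
}
```

It could be used as: sa-learn --dump magic | bayes_ready && echo "enough samples for Bayes"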
