When my mail setup detects that a message is spam, it puts *SPAM* in the subject. Now I want to improve my Bayes filter by training it on my corpus of spam.
If I feed these thousands of mails to sa-learn, will that work even if they still have the *SPAM* in the subject? Or will it have the effect of telling the filter “something is only spam if it has *SPAM* in the header”, which would be counter-productive?
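According to the man page for sa-learn, this will be okay: if the messages have already been filtered through SpamAssassin, the learner ignores any modifications SpamAssassin made to them.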
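For what it's worth, here is a minimal sketch of how feeding the corpus to sa-learn might look. The --spam option is documented in the man page; the SPAM_DIR path and the train_spam helper are hypothetical names made up for illustration, so adjust them to your own setup.

```shell
# Hypothetical location of the spam corpus (an assumption, not from the question).
SPAM_DIR="${SPAM_DIR:-$HOME/Maildir/.Spam/cur}"

# train_spam DIR: learn every message under DIR as spam.
# The messages are passed as they are, *SPAM* subject tag included.
train_spam() {
    if ! command -v sa-learn >/dev/null 2>&1; then
        echo "sa-learn not found in PATH" >&2
        return 127
    fi
    sa-learn --spam "$1"
}

# Usage (uncomment to run against your own corpus):
# train_spam "$SPAM_DIR"
```

Running sa-learn --dump magic before and after should show whether the spam message count in the Bayes database went up.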