On a Linux box running postfix+amavis+spamassassin, we are thinking of implementing Bayes filtering. This system already does spam filtering (without Bayes) for multiple customer domains.
The question is, how should training be done in this scenario? Would we need to collect spam and ham from each client, or would a single global database be enough?
Thanks.
The Bayes database is global per SA configuration. You can set its location via the bayes_path option in the local.cf configuration file. More detail is here: https://wiki.apache.org/spamassassin/SiteWideBayesSetup
You can perform initial training of the database with your own sets of ham and spam messages, or wait for SA to learn from the messages it receives from postfix.
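As a rough sketch of what a site-wide setup in local.cf might look like: the path below is a placeholder I made up (pick a directory the user running amavis can read and write), and the file mode is only an example value, so treat both as assumptions rather than recommended settings.

```
# local.cf (its location varies by distribution)

# Enable the Bayes classifier
use_bayes        1

# Hypothetical path: SA uses this as the base name for its database
# files (bayes_toks, bayes_seen, ...), so it is a prefix, not a directory
bayes_path       /var/lib/amavis/.spamassassin/bayes

# Example permissions for the database files; adjust to your setup
bayes_file_mode  0770

# Let SA learn on its own from messages that score clearly as spam or ham
bayes_auto_learn 1
```

Initial training would then typically be done with sa-learn --spam and sa-learn --ham against your collected mail, run as the same user amavis runs under so the database files end up with the right owner. sa-learn --dump magic shows how many spam and ham messages have been learned so far.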
You may want to set up separate SA configuration files for different domains if the typical message content differs a lot between those domains and there are many incoming messages with borderline content, that is, messages that should be marked as spam for users of one domain and as ham for users of another.