A lot of spam is getting through the filter on the mail server I run by using a relatively simple trick: each message starts with a few lines of (incredibly obvious) weight-loss or other scam text, followed by a much larger body of text from programming documentation or, most evil of all, text scraped from Stack Exchange. At best, SpamAssassin scores this as BAYES_50, and the rest of each message is constructed carefully enough that it doesn't hit other rules. (For example, the headers are minimal and correct.) Often, the included excerpts align so closely with my legitimate interests that the message as a whole is scored as BAYES_00, because the very spammy tokens are simply overwhelmed by juicy nuggets of sysadmin problem-solving.
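To illustrate the dilution effect I mean, here is a toy sketch. It uses a plain naive Bayes combination of per-token probabilities, which is a simplification and not the combining method SpamAssassin actually implements; the token probabilities and counts are made-up numbers chosen only for illustration.

```python
import math

def combine(probs):
    """Naive Bayes combination of per-token spam probabilities.

    Equivalent to prod(p) / (prod(p) + prod(1 - p)), computed in log
    space so long token lists do not underflow.
    """
    log_spam = sum(math.log(p) for p in probs)
    log_ham = sum(math.log(1.0 - p) for p in probs)
    return 1.0 / (1.0 + math.exp(log_ham - log_spam))

# Hypothetical values: ten strongly spammy tokens at the top of the message,
# followed by forty hammy tokens from the scraped documentation.
spammy_head = [0.99] * 10
hammy_chaff = [0.05] * 40

head_score = combine(spammy_head)                  # very close to 1
whole_score = combine(spammy_head + hammy_chaff)   # very close to 0
print(f"head only: {head_score:.6f}, whole message: {whole_score:.6g}")
```

With these made-up numbers, the head alone looks like certain spam, while the whole message looks like certain ham, which matches what I'm seeing in practice.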
The top part is so obviously spammy (and in fact tends to be very similar to messages I've previously received and trained as spam) that I'm kind of amazed it's getting through, but clearly it is. It seems like a separate pass that scored only the top 25 (or so) lines of the message, and weighted that result heavily, would solve the problem. Is there a way to do this?
Several people have suggested writing custom regular expressions. I do not want to get into this, as this is a constant losing battle. It's what people did before Bayesian spam sorting came into widespread use, and it was generally terrible. No human can keep up. It's not much more effective than just hitting the delete key for each spam message, and a lot more work on my part.
Bayesian spam filtering works. It even works on this spam if I split out the "above the fold" portion and analyze just that part, with the decoy / chaff removed. The question is: how can I get SpamAssassin to do that?
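For reference, the manual "split out the top" experiment can be sketched roughly as follows. This is only a sketch of the truncation step using Python's standard `email` module, not something SpamAssassin does for me; the function name `above_the_fold` and the 25-line default are my own choices, and it only looks at the first `text/plain` part.

```python
from email import message_from_string

def above_the_fold(raw_message: str, max_lines: int = 25) -> str:
    """Return the first max_lines lines of the first text/plain part."""
    msg = message_from_string(raw_message)

    # walk() yields the message itself first, then any MIME sub-parts.
    part = None
    for candidate in msg.walk():
        if not candidate.is_multipart() and candidate.get_content_type() == "text/plain":
            part = candidate
            break
    if part is None:
        return ""

    # decode=True undoes base64 / quoted-printable and returns bytes.
    payload = part.get_payload(decode=True)
    if payload is None:
        return ""

    charset = part.get_content_charset() or "utf-8"
    try:
        text = payload.decode(charset, errors="replace")
    except LookupError:
        # Unknown charset label in the message; fall back to UTF-8.
        text = payload.decode("utf-8", errors="replace")

    return "\n".join(text.splitlines()[:max_lines])
```

The truncated text is what I then classify by hand. What I'm missing is a way to have this happen inside the normal scoring run rather than as a separate manual step.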