LinuxQuestions.org


wpost 07-26-2010 12:02 AM

SpamAssassin training no longer seems to work
 
I keep training SpamAssassin with the spam that slips through, but I don't see any improvement in spam detection. The details are these:

Using ssh I trained SpamAssassin with examples of spam and ham I had saved for that purpose, perhaps 500 of each. I set up email recipes such that spam@example.com would go through sa-learn --spam --mbox, and ham@example.com would go through sa-learn --ham --mbox. Since then I have forwarded (as an attachment) any spam that slips through to spam@... and an equal amount of fresh ham to ham@...

That worked well for about a year. Spam in the inbox went from 150 per day to less than one per week on average, with no false positives. Naturally, spam blooms sometimes appeared, but training dealt with them.

I wrote up my spam setup in greater detail in these notes to myself:

http://my.opera.com/wpost/blog/spamassassin

For the past month or so, however, a steady stream of nearly identical spam has been getting through, and despite training on all of it I see no change in its spam scores. Here are the relevant headers from a typical message:

(snip)
X-Spam-Status: No, hits=-6.6 required=3.5
tests=BAYES_00,RCVD_IN_DNSWL_MED,STOX_REPLY_TYPE autolearn=ham
version=3.002005
(snip)
X-AVES-Antispam: Maybe spam, 17.32 >= 4.00 [as:15.30 cc:0.00 hc:2.02
sa:17.32]
(snip)

Notice that a spam filter on an upstream server correctly flagged it, but my SpamAssassin did not. Nor does the score on these nearly identical messages seem to change with training.
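For anyone who wants to compare these headers across many messages, here is a minimal Python sketch that splits an X-Spam-Status line into its parts. The function name `parse_spam_status` is made up for illustration, and the header string is the one quoted above with its folded continuation lines joined by hand. It only handles the `hits=` form shown in this post.

```python
import re

# Header copied from the post above, with the folded continuation lines joined.
HEADER = (
    "X-Spam-Status: No, hits=-6.6 required=3.5 "
    "tests=BAYES_00,RCVD_IN_DNSWL_MED,STOX_REPLY_TYPE autolearn=ham "
    "version=3.002005"
)


def parse_spam_status(header):
    """Split an X-Spam-Status line into verdict, numeric fields, and rule names."""
    # Drop the header name, then separate the Yes/No verdict from the key=value part.
    _, _, value = header.partition(":")
    verdict, _, rest = value.strip().partition(",")
    fields = dict(re.findall(r"(\w+)=(\S+)", rest))
    return {
        "is_spam": verdict.strip().lower() == "yes",
        "score": float(fields["hits"]),
        "required": float(fields["required"]),
        "tests": fields["tests"].split(","),
        "autolearn": fields.get("autolearn"),
    }


status = parse_spam_status(HEADER)
print(status)
```

Run over a folder of the look-alike messages, something like this would show whether the score, the rules that fired, or the autolearn value ever change after a training pass.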

Another clue: previously my spam folder was receiving fresh spam every hour. Since this problem began the spam folder receives almost nothing.

Sure, it would be easy to cook up a procmail recipe to filter on the upstream server's "maybe spam" header, but I'd rather fix the underlying problem with SpamAssassin, not cover it up.

Prior to coming here I searched the web, consulted the SpamAssassin articles in my web host's knowledge base, and read everything relevant at spamassassin.apache.org, but I remain stumped.

Any thoughts on what I might be doing wrong here, and what I might look at to improve spam training?

Noway2 07-26-2010 05:53 AM

Something doesn't completely add up.
Quote:

X-Spam-Status: No, hits=-6.6 required=3.5
tests=BAYES_00,RCVD_IN_DNSWL_MED,STOX_REPLY_TYPE autolearn=ham
version=3.002005
This says that the email scored -6.6 on the spam meter and that 3.5 is required to take action. Three rules came into play on this message: BAYES_00, RCVD_IN_DNSWL_MED (received via a DNSWL entry, medium level), and STOX_REPLY_TYPE. According to the SpamAssassin documentation, the default scores for these (with Bayes enabled) would range from 0 to -1.9 for the Bayes rule (0%), from 1.89 to 0.1 for the reply type, and from 0 to -2.3 for being in a whitelisted sender category.

This means, by default, at best this could score about -4.2 and you are getting -6.6. Did you alter any of the severity levels with your modifications?
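The arithmetic behind that comparison can be checked with a few lines of Python. This is only a sketch: the ranges are the ones quoted in this reply, not values verified against any particular SpamAssassin release, and the variable names are made up for illustration. Counting the reply-type rule at its lowest quoted value (0.1) gives -4.1, which is close to the "about -4.2" figure above.

```python
# Score ranges as quoted in the reply above (not checked against any
# particular SpamAssassin release); each tuple is (lowest, highest).
QUOTED_RANGES = {
    "BAYES_00": (-1.9, 0.0),
    "RCVD_IN_DNSWL_MED": (-2.3, 0.0),
    "STOX_REPLY_TYPE": (0.1, 1.89),
}
OBSERVED = -6.6  # hits= value from the quoted X-Spam-Status header

# Most negative total these three rules could produce with the quoted ranges.
lowest_total = round(sum(low for low, _ in QUOTED_RANGES.values()), 2)

# Portion of the observed score that the quoted defaults cannot account for.
unexplained = round(OBSERVED - lowest_total, 2)

print(lowest_total, unexplained)
```

Under these assumptions roughly 2.5 points of the negative score are unaccounted for, which is why changed rule scores are worth ruling out.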

Also, your BAYES filter THINKS that this message is NOT spam, declaring a spam percentage of less than 1%. Apparently it has been taught that this type of content is valid.

Perhaps you should 'clear' things out and re-teach it?
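One way to go about that is sketched below in Python. It only prints the sa-learn commands rather than running them. The helper name `retrain_commands` and the file names `spam.mbox` and `ham.mbox` are placeholders, not anything from the original setup. The `--spam`, `--ham`, and `--mbox` options are the ones already used in the first post; `--clear` wipes the Bayes database, and `--dump magic` prints its counters so the number of learned spam and ham messages can be checked afterwards.

```python
import shlex


def retrain_commands(spam_mbox, ham_mbox):
    """Return the sa-learn invocations for a wipe-and-retrain, in order."""
    return [
        ["sa-learn", "--clear"],                        # wipe the existing Bayes database
        ["sa-learn", "--spam", "--mbox", spam_mbox],    # relearn the saved spam corpus
        ["sa-learn", "--ham", "--mbox", ham_mbox],      # relearn the saved ham corpus
        ["sa-learn", "--dump", "magic"],                # show the database counters
    ]


# Placeholder file names; substitute the real saved corpora.
for argv in retrain_commands("spam.mbox", "ham.mbox"):
    print(" ".join(shlex.quote(arg) for arg in argv))
```

To actually execute the steps, each list could be passed to `subprocess.run(argv, check=True)` as the user that owns the Bayes database. Since `--clear` is destructive, it makes sense to confirm the saved corpora are intact first.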


All times are GMT -5.