Article: Mail-Filtering Techniques
Subject: Spammers are NOT beating Bayesian filters
Date: 2004-05-20 19:15:40
From: mgwhit

Spammers may have "learned to work around Bayesian filtering by inserting 'positive' words into their messages", but, if so, they have learned wrong.
Spammers have adopted the tactic of inserting random words, gibberish, fake chitchat, and even real news articles into their emails, presumably to get around Bayesian (and non-Bayesian) filters, but this is a desperation move. It rarely affects the Bayesian analysis of these emails, because Bayesian filters rely only on the tokens they deem statistically significant (i.e. only those with a very high or very low probability of indicating spam). When I run my email through bogofilter (at home, on Linux) or SpamBayes (in Outlook, at work), the extra garbage that spammers hide in their messages inevitably gets tossed aside in favor of the more obvious spam tokens. There are only so many ways you can type something that looks like "Viagra".
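To make the token-selection argument concrete, here is a minimal Python sketch. It assumes a Graham-style naive Bayes combination of the most extreme token probabilities; bogofilter and SpamBayes use more refined combining rules derived from Gary Robinson's work, so this is not their code. The token table, the neutral 0.5 score for unseen words, and the 0.1 minimum-deviation cutoff are hypothetical values chosen for illustration. The point it shows: filler words and gibberish sit near 0.5, never pass the significance cutoff, and so leave the score exactly where the real spam tokens put it.

```python
from math import prod

# Hypothetical per-token spam probabilities, as a trained filter might store them.
# Values near 0.5 mean the token appears about equally often in spam and ham.
TOKEN_PROBS = {
    "viagra": 0.99,
    "meds": 0.95,
    "cheap": 0.92,
    "meeting": 0.08,
    "the": 0.50,
    "weather": 0.50,
    "news": 0.48,
    "report": 0.52,
}
UNKNOWN_PROB = 0.5   # assumption: unseen tokens (e.g. gibberish) are treated as neutral
MIN_DEVIATION = 0.1  # assumption: ignore tokens with |p - 0.5| below this cutoff
MAX_TOKENS = 15      # Graham-style cap on how many tokens are combined


def significant_tokens(words):
    """Probabilities of the most extreme distinct tokens, strongest first."""
    probs = [TOKEN_PROBS.get(w, UNKNOWN_PROB) for w in set(words)]
    strong = [p for p in probs if abs(p - 0.5) >= MIN_DEVIATION]
    strong.sort(key=lambda p: abs(p - 0.5), reverse=True)
    return strong[:MAX_TOKENS]


def spam_score(words):
    """Naive Bayes combination of the significant tokens (Graham's formula)."""
    ps = significant_tokens(words)
    if not ps:
        return 0.5  # nothing informative: undecided
    spamminess = prod(ps)
    hamminess = prod(1 - p for p in ps)
    return spamminess / (spamminess + hamminess)


if __name__ == "__main__":
    spam = ["cheap", "viagra", "meds"]
    padded = spam + ["the", "weather", "news", "report", "xqzv", "blorf"]
    print(f"plain spam:  {spam_score(spam):.5f}")
    print(f"padded spam: {spam_score(padded):.5f}")
    print(f"+ 'meeting': {spam_score(spam + ['meeting']):.5f}")
```

In this toy model, a genuinely "hammy" word (the hypothetical "meeting") does enter the calculation, but one such token is easily outweighed by several strong spam tokens. To shift the verdict, a spammer would need many words that the recipient's own filter has learned are hammy, which random text or a pasted news article rarely supplies.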
I'm getting 98% accuracy with no false positives using Bayesian filters. Just because you see spammers trying to outfox the filters doesn't mean they're succeeding.

Reply: Spammers are NOT beating Bayesian filters (rmcouat, 2004-05-25 17:52:25)