In the arms race of spam prevention, content-based filters, including any Bayesian ones you care to throw at the problem, have been beaten. Until we get truly intelligent recognition, where a computer is smart enough to know that a subject of “She will love you for it” is Viagra spam, and that “I was at the end of my rope until I found this” is some money scam, the spammers will be able to get any content past the filters.


In addition to the tricks discussed in the ActiveState Field Guide To Spam, spammers have already started foiling the filters by throwing in random real words. I regularly get spam through two levels of filtering (SpamAssassin and Eudora) that looks like this:


      Our rates are the lowest!  You can get 3.45% fixed for
      rough pencil final happy
      30-years!  Follow this link to get the best rates
      napkins canine amazed
      in the country, but only for a limited time!

The extra random non-spam text foils the filters. And, since the words are random, tactics that rely on a checksum or signature of the message are, or will be, useless. I suspect it won’t be long before spam comes through with three lines of spam content and a couple K of random words. If we get to where words that are clearly random are somehow caught, then the spammers will turn to pulling random pages off the net for their obscuring text. Maybe they’ll throw in, say, a few pages of Macbeth to foil things.
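
To make the two claims in the paragraph above concrete, here is a minimal sketch. It assumes a textbook naive Bayes combining rule with equal priors, not SpamAssassin's or Eudora's actual scoring, and the per-token probabilities are invented for illustration. It also assumes the filler words are ones the filter has mostly seen in legitimate mail. The second half shows why an exact checksum fails: the same payload with different filler hashes to a different value.

```python
import hashlib
import math


def spam_probability(token_probs):
    """Combine per-token spam probabilities with the naive Bayes rule.

    Equivalent to prod(p) / (prod(p) + prod(1 - p)), computed in log-odds
    form to avoid underflow on long messages.
    """
    log_odds = sum(math.log(p / (1.0 - p)) for p in token_probs)
    return 1.0 / (1.0 + math.exp(-log_odds))


def signature(message):
    """Exact-match signature of a message body (SHA-256 hex digest)."""
    return hashlib.sha256(message.encode("utf-8")).hexdigest()


# Assumed probabilities; a real filter learns these from the user's own mail.
SPAMMY = [0.99] * 5   # payload words such as "rates", "lowest", "fixed"
FILLER = [0.20] * 20  # hypothetical everyday words seen mostly in real mail

payload_only = spam_probability(SPAMMY)      # very close to 1.0
padded = spam_probability(SPAMMY + FILLER)   # under 0.01

# Same payload, different random filler: the signatures do not match.
copy_a = "Our rates are the lowest!\nrough pencil final happy\nFollow this link"
copy_b = "Our rates are the lowest!\nwindow gentle marble orbit\nFollow this link"

print(payload_only, padded)
print(signature(copy_a) == signature(copy_b))  # False
```

With these made-up numbers, twenty innocuous words are enough to drag five strongly spammy ones below any sensible threshold, and a couple K of filler would do far more.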

The answer is to stop the spammers before they get their message in. Content-based filtering only works after the spammer has already delivered the payload to us, instead of checking the sender at the gate. Fixing that will mean replacing SMTP. Until then, SPF seems to have potential, but it has its drawbacks.
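
For a sense of what checking at the gate looks like with SPF, here is a rough sketch of the core idea: the sending domain publishes a record listing the addresses allowed to send its mail, and the receiver compares the connecting IP against it before looking at any content. This is a simplification under stated assumptions. The record is a hypothetical one passed in as a string (a real check fetches it from DNS), and only the `ip4`, `ip6`, and `all` mechanisms are handled. The function name `check_spf` is made up for this example.

```python
import ipaddress

# Qualifier prefixes and the result each one produces on a match.
RESULTS = {"+": "pass", "-": "fail", "~": "softfail", "?": "neutral"}


def check_spf(record, sender_ip):
    """Evaluate only the ip4/ip6 and 'all' mechanisms of an SPF record."""
    terms = record.split()
    if not terms or terms[0].lower() != "v=spf1":
        return "none"  # no usable SPF record
    ip = ipaddress.ip_address(sender_ip)
    for term in terms[1:]:
        qualifier = "+"  # the default qualifier when none is written
        if term[0] in RESULTS:
            qualifier, term = term[0], term[1:]
        name, _, value = term.partition(":")
        name = name.lower()
        if name in ("ip4", "ip6"):
            network = ipaddress.ip_network(value, strict=False)
            if ip.version == network.version and ip in network:
                return RESULTS[qualifier]
        elif name == "all":
            return RESULTS[qualifier]
        # include, a, mx and friends need DNS lookups; skipped in this sketch.
    return "neutral"  # nothing matched


# Hypothetical record using documentation-only address ranges.
RECORD = "v=spf1 ip4:192.0.2.0/24 -all"

print(check_spf(RECORD, "192.0.2.10"))    # pass
print(check_spf(RECORD, "198.51.100.7"))  # fail
```

Note that the decision is made from the connecting address alone, before any message body is accepted, which is exactly what content-based filters cannot do.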

Mind you, I’m not throwing away my SpamAssassin install. It helps stop a significant amount of the spam. Unfortunately, content-based filtering is a Band-Aid on the real problem.

Do you see any solution outside of replacing SMTP?