 Published on O'Reilly Network (http://www.oreillynet.com/)


A Plan for Spam Folders

by Brian McWilliams, author of Spam Kings
01/20/2005

Spam filter developers make no secret that putting spammers out of business is their ultimate goal. Author and programmer Paul Graham threw down that gauntlet in his seminal 2002 paper on Bayesian spam filtering: "If we get good enough at filtering out spam, it will stop working, and the spammers will actually stop sending it."

In a subsequent article entitled "Will filters kill spam?", Graham predicted that spammers would give up if everyone had an effective spam filter.

But even if spam filters were 100 percent accurate and universally used, it seems improbable that they could kill spam completely, given current consumer behavior and a design element indispensable to most junk email filters: the spam folder.

Consumers have shown they can't resist the temptation to buy from spammers. A recent survey by Forrester Research found that 41 percent of U.S. internet users have made purchases in response to spam. According to the Direct Marketing Association, email ads generated more than $32 billion in sales in 2003.

To be sure, those figures might be even higher if it weren't for spam filters. But the reality is that some consumers still respond to spam even after it has been filtered into spam folders.

For proof of this, try Googling the string %40Bulk. That's part of a long URL that shows up in a web site's traffic logs whenever a Yahoo Mail user clicks on a link to the site from within a message in the user's spam folder.

Fortunately for spam researchers, a number of junk emailers leave their web server logs exposed to search engine spiders. As a result, we can get a partial view of how consumers interact with spam folders.
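
To make the idea concrete, here is a minimal sketch of how one might tally referrers from such a log. Only the %40Bulk marker comes from the article (%40 is the URL-encoded form of the @ sign). The assumption that Yahoo Mail referrers come from a host ending in mail.yahoo.com, the function names, and the sample URLs are all hypothetical and invented for illustration; they are not taken from any real log.

```python
from urllib.parse import urlsplit

# Assumption (not from the article): Yahoo Mail referrers use a host
# whose name ends in "mail.yahoo.com".
YAHOO_MAIL_SUFFIX = "mail.yahoo.com"

# From the article: this string appears in the referrer URL when the
# click came from a message in the Yahoo Mail spam (Bulk) folder.
BULK_MARKER = "%40Bulk"


def classify_referrer(referrer: str) -> str:
    """Return 'bulk', 'other_yahoo', or 'non_yahoo' for one referrer URL."""
    host = urlsplit(referrer).hostname or ""
    if not host.endswith(YAHOO_MAIL_SUFFIX):
        return "non_yahoo"
    return "bulk" if BULK_MARKER in referrer else "other_yahoo"


def tally(referrers):
    """Count how many referrers fall into each category."""
    counts = {"bulk": 0, "other_yahoo": 0, "non_yahoo": 0}
    for ref in referrers:
        counts[classify_referrer(ref)] += 1
    return counts


# Invented sample referrers, for illustration only.
SAMPLE = [
    "http://us.f100.mail.yahoo.com/read?folder=%40Bulk&id=1",
    "http://us.f100.mail.yahoo.com/read?folder=Inbox&id=2",
    "http://www.example.com/links.html",
]

if __name__ == "__main__":
    print(tally(SAMPLE))
```

A real log would first need its referrer field extracted from each line, and the format of that line depends on the web server, so that step is left out here.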

Consider the example of LinkToCash.com, a web site that advertises a multilevel marketing (MLM) scheme. According to the site's referrer log file, it received around 965 visits from Yahoo Mail users in January 2003. Around 107, or about 11 percent, of those Yahoo visitors arrived by clicking on an email in their Bulk email folder.

Referrer log files from other advertised sites show lower percentages of click-throughs from the spam folder. WebCashVideos.net, another MLM site, recorded 98 Yahoo Mail visits in an undated log, just 4 of which (about 4 percent) were from the Bulk folder.

In some instances, however, the number of site visitors who arrive via the Yahoo Mail spam folder nearly equals the number of those who come by clicking on messages in their in-box. Referrer logs for a page at DesertPublications.com show that 16 of 34 Yahoo Mail visitors on May 28, 2003, clicked on a link in a message that had been identified as spam.
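
For readers who want to check the arithmetic, the three sets of figures quoted above work out as follows. This is only a quick sketch: the counts are the ones given in the article (the LinkToCash.com numbers are described there as approximate), and the function name is made up for the example.

```python
# Counts quoted in the article:
# (site, Yahoo Mail visits, visits that came from the Bulk folder).
LOGS = [
    ("LinkToCash.com", 965, 107),        # both figures approximate
    ("WebCashVideos.net", 98, 4),
    ("DesertPublications.com", 34, 16),
]


def bulk_share(total: int, bulk: int) -> float:
    """Percentage of Yahoo Mail visits that came from the Bulk folder."""
    if total <= 0:
        raise ValueError("total visits must be positive")
    return 100.0 * bulk / total


for site, total, bulk in LOGS:
    print(f"{site}: {bulk}/{total} = {bulk_share(total, bulk):.1f}%")
```

The shares come to roughly 11, 4, and 47 percent, which shows how widely the spam-folder share varies from one small log to another.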

Now, the unscientific examples above should obviously be regarded with caution. It's impossible to know whether the log files are truly representative of Yahoo user behavior or that of email users overall. (A Yahoo representative said the company didn't have systemwide data to share on its users' click-throughs from their spam folders.)

Furthermore, we don't know whether these visitors actually purchased anything at the destination sites. Many could simply be window-shopping. Some might even be antispammers doing reconnaissance prior to reporting the site for spamming.

But these logs reveal an important fact: simply segregating spam from legitimate email won't stop some users from opening it and visiting the advertised site.

Most big webmail providers seem to recognize this, although they are reluctant to publicly discuss their spam folder strategies. MSN Hotmail, for example, disables links in messages that have been identified as spam. The service requires users to click on a special link to activate URLs in suspected spam messages.

The latest version of AOL (9.0) behaves similarly. Users who attempt to click on links in suspected spam receive a pop-up warning message: "This link has been disabled for your safety. To activate, click 'Show images and enable links' above."

However, some email providers, including Yahoo and Gmail, as well as many client-based spam filters, including the one in Outlook 2003, give users full access to messages that have been filtered into spam folders.

It appears that webmail providers and others who offer spam filters must ask themselves an important philosophical question: just how paternalistic do we want to be?

Clearly, there are dangers to eliminating the spam folder altogether and simply deleting or not delivering messages caught by a spam filter.

After all, content-based spam filters rely heavily on the concept of training, and they need input from users to learn what's junk and what isn't. So, even if spam filters could achieve 100 percent effectiveness, they'd still need training to get there, and that means saving rather than deleting suspected spam.

For this reason, the user guide for SpamAssassin, one of the most popular content-based spam filters, specifically warns administrators against deleting suspected spam.

Furthermore, all the big webmail providers advise users to regularly review the contents of their spam folders, to ensure that legitimate messages haven't erroneously been filed away there. Indeed, avoiding so-called false positives--and the accompanying user wrath--is likely a key reason most email services stick with the spam folder concept.

Even as spam filters approach perfection, spam folders, coupled with consumer behavior, will unavoidably keep a (reduced) number of spammers in business. But the nature of that business may change dramatically.

In time, for some incorrigible online shoppers, spam folders may become akin to Sunday circulars, those pullout ad sections that get inserted into newspapers. (This view is in sharp contrast to that of many internet users, who regard the contents of their spam folder as virtual toxic waste.)

In turn, spammers may stop putting so much effort into disguising their messages to try to fool filters. Rather than using cryptic subject lines, weird HTML, and bizarre language in hopes of landing something in the recipient's inbox, spammers may focus on creating irresistible subject lines. Once they've resigned themselves to being relegated to spam folders, junk emailers may decide that writing good ad copy is just as important to spam success as having access to fresh proxies or bulletproof hosting.

For filter developers, it's only natural to strive for an internet free of junk email. (Paul Graham has admitted to feeling "as if I were playing some kind of competitive game with the spammers.") But for most internet users, filters still provide a vital productivity- and sanity-saving service--even if they don't completely wipe spammers off the face of the earth.

Brian McWilliams is the author of Spam Kings and an investigative journalist who has covered business and technology for web magazines including Wired News and Salon, as well as for the Washington Post and for PC World, Computerworld, and Inc. magazines.


Copyright © 2009 O'Reilly Media, Inc.