How I Learned to Stop Worrying and Love the Panopticon

by Cory Doctorow

How much ass does Google kick? All of it.

Remember when searching the Internet was hard? The dark days when we relied on dumb-as-sand machine intelligences, like those on the back-ends of AltaVista and Lycos, to rank the documents that matched our keywords? The grim era before Google, when searching was a spew of boolean mumbo-jumbo, NEAR this, NOT that, AND the other?

God, that sucked.

Lucky for the Internet, Google figured out the One True Way to make sense of the Internet, to defeat gamers of the system and send info-free brochureware plummeting to number n - 1 out of n results.

They did it with our help. Google's near-magical ordering of the Internet is built around the notion that computers are good at doing repetitive, uncreative things -- fetishistically counting things, for example -- and rotten at understanding why they're being asked to do these boring tasks. By contrast, human beings are great at understanding why they're doing something, but they're woefully deficient in the do-the-same-thing-perfectly-and-forever department.

AltaVista tried to get computers to do both the repetitive parts (capturing billions of documents) and the creative parts (figuring out what the documents are about). This yielded the largest collection of randomly organized documents in the world, a Web-accessible version of a library where all the books have been re-shelved by axe-grinding illiterates who wanted to make sure that no matter what you were looking for, you'd find porn.

Yahoo tried just the opposite, getting human beings to manually identify and describe all the documents comprising what was meant to be an exhaustive index of all the worthwhile pages on the Web. There were "scaling issues" involved in this laudable effort (for "scaling issues" here, substitute "catastrophic failures"), and over time, Yahoo's directory dwindled to an increasingly marginal sliver of the Internet's vastness. At the rate that Yahoo's army of indexers works, and at the rate that the Internet's unwashed horde of writers is adding to the noosphere, it's only a matter of a few years before every human being alive will have to pass his or her every working hour contributing to Yahoo's index, just to keep its sliver from dwindling into utter pointlessness.

Let humans do what they do; let computers do the same.

Google bridges the divide between human-generated indexes and machine-generated analysis.

Y'see, the Web is full of people like you and me, making links between documents; human beings, making decisions about documents, voting with their links. When I link to some arbitrary document, it's an indication that I think that it's in some way authoritative. When you link to a document I wrote, you're indicating that I'm in some way authoritative. The Internet is already structured in a meaningful way, but that structure is obscured. Google teases out the relationship between the URLs, examining the webs of authority: this person is linked to by 50,000 others, and he links to this other person over here, which indicates that person one is a pretty sharp individual, one who's inspired 50,000 human beings to take time out of their busy schedules to link to him; and person one thinks that person two is on the ball, which suggests that person two knows what she's on about.

It's a best-of-both-worlds solution. The computers at Google are asked to tirelessly count and re-count the number and destination of links on every page that the Googlebot can lay its user-agent on. Those links are made by human beings, doing what they do best, link by link, drip by drip, layering a film of order over the Internet.
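
For the code-minded, here is a minimal sketch of that counting and re-counting, assuming a PageRank-style scheme in which every page splits its own authority among the pages it links to. This is not Google's actual formula: the function name, the 0.85 damping factor, the iteration count, and the toy link graph are all assumptions made for illustration.

```python
def authority_scores(links, damping=0.85, iterations=50):
    """Estimate per-page authority from who links to whom.

    `links` maps each page to the list of pages it links to.
    `damping` and `iterations` are illustrative assumptions,
    not values taken from the essay.
    """
    # Every page mentioned anywhere, as a source or as a target.
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    scores = {page: 1.0 / n for page in pages}

    for _ in range(iterations):
        # Each page starts the round with a small baseline score.
        new_scores = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A link is a vote: split this page's score among its targets.
                share = damping * scores[page] / len(targets)
                for target in targets:
                    new_scores[target] += share
            else:
                # A page with no outgoing links spreads its score evenly.
                share = damping * scores[page] / n
                for other in pages:
                    new_scores[other] += share
        scores = new_scores
    return scores


# Hypothetical toy Web: three fans link to person_one,
# and person_one links to person_two. Nobody links to "nobody".
links = {
    "fan1": ["person_one"],
    "fan2": ["person_one"],
    "fan3": ["person_one"],
    "person_one": ["person_two"],
    "person_two": [],
    "nobody": [],
}
scores = authority_scores(links)
for page, score in sorted(scores.items(), key=lambda item: item[1], reverse=True):
    print(f"{page}: {score:.3f}")
```

In this toy graph, person_one outranks the fans who link to him, and person_two outranks the unlinked page purely because person_one vouches for her. The machine does nothing but count; the judgment is all in the links.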

The approach works well. Eerily well. Enter a couple of search terms, and biff-bam, the most authoritative documents containing those keywords are served up in an instant. Nearly every document on the Web has a human decision associated with it for Google to glom onto; that's because nearly every document on the Web has a human author. Human authors don't just put documents onto the Web; they put them into the Web, into the meshed hairball of incoming and outgoing links, indicating not only what keywords the document contains, but also who the document's author believes is authoritative, and vice versa.
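
The lookup side can be sketched just as briefly: find the documents that contain every keyword, then serve them in order of authority. This is a toy under stated assumptions, not a description of Google's real pipeline; the `search` function, the example URLs, and the authority numbers are all hypothetical.

```python
def search(query, documents, authority):
    """Return URLs of documents containing every query term,
    most authoritative first.

    `documents` maps URL -> page text; `authority` maps URL -> a
    link-derived score. Both are hypothetical inputs for illustration.
    """
    terms = query.lower().split()
    matches = [
        url
        for url, text in documents.items()
        if all(term in text.lower().split() for term in terms)
    ]
    # Pages missing from the authority table are treated as having none.
    return sorted(matches, key=lambda url: authority.get(url, 0.0), reverse=True)


# Made-up pages and made-up scores.
documents = {
    "a.example/brochure": "buy widgets now widgets widgets",
    "b.example/guide": "a guide to widgets and how widgets work",
    "c.example/cats": "pictures of cats",
}
authority = {
    "a.example/brochure": 0.01,
    "b.example/guide": 0.60,
    "c.example/cats": 0.30,
}

print(search("widgets", documents, authority))
# prints ['b.example/guide', 'a.example/brochure']
```

The brochure repeats its keyword three times and still lands below the guide, because repetition earns no votes. Only links do.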

It's quite elegant.

