Published on O'Reilly Network (http://www.oreillynet.com/)


Googling Your Email

by Jon Udell
10/07/2002

Someday we'll tell our grandchildren about those moments of epiphany, back in the last century, when we first glimpsed how the Web would change our relationship to the world. For me, one of those moments came when I was looking for an ODBC driver kit that I knew was on a CD somewhere in my office. After rifling through my piles of clutter to no avail, I tried rifling through AltaVista's index. Bingo! Downloading those couple of megabytes over our 56K leased line to the Internet was, to be sure, way slower than my CD-ROM drive's transfer rate would have been, but since I couldn't lay my hands on the CD, it was a moot point. Through AltaVista I could find, and then possess, things that I already possessed but could not find.

There began an odd inversion that continues to the present day. Any data that's public, and that Google can see, is hardly worth storing and organizing. We simply search for what we need, when we need it: just-in-time information management. But since we don't admit Google to our private data stores -- Intranets [1] and mailboxes, for example -- we're still like the shoemaker's barefoot children. Most of us can find all sorts of obscure things more easily than we can find the file that Tom sent Leslie last week.

What would it be like to Google your email? Raphaël Szwarc's ZOË is a clever piece of software that explores this idea. It's written in Java, with source available, so it runs (and can be debugged) on any platform with a Java runtime. ZOË is implemented as a collection of services. Startup is as simple as unpacking the zipped tarball and launching ZOË.jar. The services that fire up include a local Web server that handles the browser-based UI, a text indexing engine, a POP client and server, and an SMTP server.

Because ZOË has a Web-style architecture, you can use it remotely as well as locally. At the moment, for example, I'm running ZOË on a Mac OS X box in my office, but browsing into it from my wirelessly connected laptop outside. I wouldn't recommend this, however, since ZOË's Web server has no access controls in place. By contrast, Radio Userland -- also a local, Web-server-based application, which I'm currently running on a Windows XP box in my office and browsing into remotely -- does offer HTTP basic authentication, though not over SSL. In the WiFi era, you have to be aware of which local services are truly local.
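Why "not over SSL" matters on a wireless network: HTTP Basic authentication only base64-encodes the user name and password; it does not encrypt them. A minimal Python sketch (the credentials are made up) shows that anyone who sniffs the header can reverse it without any key:

```python
import base64


def basic_auth_header(user: str, password: str) -> str:
    """Build the value of an HTTP Basic Authorization header."""
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return f"Basic {token}"


def decode_basic_auth(header: str) -> tuple[str, str]:
    """Recover the credentials from a captured header; no secret is required."""
    scheme, _, token = header.partition(" ")
    if scheme != "Basic":
        raise ValueError("not a Basic Authorization header")
    user, _, password = base64.b64decode(token).decode("utf-8").partition(":")
    return user, password


sniffed = basic_auth_header("jon", "s3cret")  # hypothetical credentials
print(sniffed)
print(decode_basic_auth(sniffed))
```

Wrapping the same exchange in SSL (HTTPS) is what keeps that header unreadable to other machines on the network.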

ZOË doesn't aim to replace your email client, but rather to proxy your mail traffic and build useful search and navigation mechanisms. At the moment, I'm using ZOË together with Outlook (on Windows XP) and Entourage (on Mac OS X). ZOË's POP client sucks down and indexes my incoming mail in parallel with my regular clients. (I leave a cache of messages on the server so the clients don't step on one another.) Because I route my outbound mail through ZOË's SMTP server, it captures and indexes that as well. Here's a typical search result.

ZOË helps by contextualizing the results, then extracting and listing Contributors (the message senders), Attachments, and Links (such as the URL strings found in the messages). These context items are all hyperlinks. Clicking "Doug Dineley" produces the set of messages from Doug, like so:
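ZOË's own extraction code isn't shown here, so the following is only a sketch, using Python's standard `email` package, of how the three kinds of context items could be pulled from a single MIME message. The function name `context_items` and the sample message are hypothetical, and the URL pattern is deliberately simple.

```python
import re
from email import message_from_string, policy
from email.utils import parseaddr

# A deliberately simple URL pattern; real-world URL extraction is messier.
URL_RE = re.compile(r"https?://[^\s<>\"']+")


def context_items(raw: str) -> dict:
    """Return the sender, attachment filenames, and URLs found in one message."""
    msg = message_from_string(raw, policy=policy.default)
    name, addr = parseaddr(str(msg.get("From", "")))
    attachments = [part.get_filename() for part in msg.iter_attachments()
                   if part.get_filename()]
    body = msg.get_body(preferencelist=("plain",))
    text = body.get_content() if body is not None else ""
    return {"contributor": name or addr,
            "attachments": attachments,
            "links": URL_RE.findall(text)}


sample = (
    "From: Doug Dineley <doug@example.com>\n"
    "Subject: Draft\n"
    "MIME-Version: 1.0\n"
    'Content-Type: multipart/mixed; boundary="XYZ"\n'
    "\n"
    "--XYZ\n"
    "Content-Type: text/plain\n"
    "\n"
    "The draft is at http://www.infoworld.com/ now.\n"
    "--XYZ\n"
    "Content-Type: application/pdf\n"
    'Content-Disposition: attachment; filename="draft.pdf"\n'
    "Content-Transfer-Encoding: base64\n"
    "\n"
    "JVBERi0=\n"
    "--XYZ--\n"
)
print(context_items(sample))
```

Indexing these items across every message, rather than one at a time, is what turns them into the clickable Contributors, Attachments, and Links lists.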

Following Weblog convention, the # sign preceding Doug's name is a permalink. It assigns a URL to the query "find all of Doug's messages," so you can bookmark it or save it on the desktop.
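The idea behind the permalink is simply that a query, once encoded in a URL, becomes an addressable resource. ZOË's actual URL scheme isn't documented here, so the host, port, path, and `contributor` parameter below are hypothetical; the sketch only shows the round trip from query to URL and back.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Hypothetical address of a local, ZOË-style search service.
BASE_URL = "http://localhost:8080/search"


def permalink(**criteria: str) -> str:
    """Turn a query into a stable, bookmarkable URL (keys sorted for stability)."""
    return f"{BASE_URL}?{urlencode(sorted(criteria.items()))}"


def criteria_from(url: str) -> dict[str, str]:
    """Reverse the mapping: recover the query from a saved permalink."""
    return {key: values[0] for key, values in parse_qs(urlsplit(url).query).items()}


link = permalink(contributor="Doug Dineley")
print(link)                 # http://localhost:8080/search?contributor=Doug+Dineley
print(criteria_from(link))  # {'contributor': 'Doug Dineley'}
```

Sorting the parameters means the same query always yields the same URL, which is what makes it safe to bookmark.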

Note also the breadcrumb trail that ZOË has built:

ZOË -> Com -> InfoWorld

These are links too, and they lead to directories that ZOË has automatically built. Here's the view after clicking the InfoWorld link:
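The breadcrumb suggests that the directories mirror the domain names in correspondents' addresses, read from the top-level domain down. That is an inference from the Com -> InfoWorld trail, not a description of ZOË's code. A minimal Python sketch of such a roll-up might look like this (the addresses are made up):

```python
from email.utils import parseaddr


def build_directory(addresses: list[str]) -> dict[tuple[str, ...], set[str]]:
    """Group senders under every level of their reversed domain name.

    'doug@infoworld.com' is filed under ('com',) and ('com', 'infoworld').
    """
    directory: dict[tuple[str, ...], set[str]] = {}
    for raw in addresses:
        name, addr = parseaddr(raw)
        if "@" not in addr:
            continue  # skip malformed addresses
        labels = addr.rpartition("@")[2].lower().split(".")
        path = tuple(reversed(labels))
        for depth in range(1, len(path) + 1):
            directory.setdefault(path[:depth], set()).add(name or addr)
    return directory


senders = [
    "Doug Dineley <doug@infoworld.com>",
    "Jon Udell <jon@infoworld.com>",
    "Tim <tim@oreilly.com>",
]
tree = build_directory(senders)
print(sorted(tree[("com", "infoworld")]))  # ['Doug Dineley', 'Jon Udell']
print(len(tree[("com",)]))                 # 3
```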

Nice! Along with the directory of names, ZOË has organized all of the URLs that appear in my InfoWorld-related messages. This would be even more interesting if those URLs were named descriptively, but of course, that's a hard thing to do. Alternatively, ZOË could spider those URLs and produce a view offering contextual summaries of them. We don't normally think of desktop applications doing things like that, but ZOË (like Google) is really a service, working all the time, toiling in ways that computers should and people shouldn't.

When we talk about distributed Web services, we ought not lose sight of the ones that run on our own machines, and have access to our private data. ZOË reminds us how powerful these personal services can be. It also invites us to imagine even richer uses for them.

Fast full-text search, for example, is only part of the value that ZOË adds. Equally useful is the context it supplies. That, of course, relies on the standard metadata items available in email: Subject, Date, From. Like all mail archivers, ZOË tries to group messages into threads, and like all of them, it is limited by the unfortunate failure of mail clients to use the References and In-Reply-To headers consistently. Threading therefore falls back on matching the text of Subject headers, and that sacrifices a lot of useful context.
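To make the trade-off concrete, here is a minimal Python sketch (not ZOË's algorithm) that threads messages by their References and In-Reply-To headers when those are present, and otherwise falls back to comparing Subject lines with "Re:" and "Fwd:" prefixes stripped. The sample messages are hypothetical.

```python
import re
from email import message_from_string

# Reply/forward prefixes such as "Re:", "RE: Fwd:", or "Fw:".
PREFIX_RE = re.compile(r"^(\s*(re|fwd?)\s*:)+", re.IGNORECASE)


def normalize_subject(subject: str) -> str:
    return PREFIX_RE.sub("", subject).strip().lower()


def assign_threads(raw_messages: list[str]) -> list[int]:
    """Return a thread number for each message, in arrival order."""
    thread_of_id: dict[str, int] = {}
    thread_of_subject: dict[str, int] = {}
    threads: list[int] = []
    next_thread = 0
    for raw in raw_messages:
        msg = message_from_string(raw)
        # Preferred: explicit reply headers, when the client bothered to set them.
        parents = (msg.get("References", "") + " " + msg.get("In-Reply-To", "")).split()
        thread = next((thread_of_id[p] for p in parents if p in thread_of_id), None)
        subject = normalize_subject(msg.get("Subject", ""))
        if thread is None and subject:
            # Fallback: match on Subject text alone, the weaker heuristic.
            thread = thread_of_subject.get(subject)
        if thread is None:
            thread, next_thread = next_thread, next_thread + 1
        msg_id = msg.get("Message-ID", "").strip()
        if msg_id:
            thread_of_id[msg_id] = thread
        if subject:
            thread_of_subject.setdefault(subject, thread)
        threads.append(thread)
    return threads


inbox = [
    "Message-ID: <1@example.com>\nSubject: Budget\n\nFirst draft.\n",
    # Reply with a changed subject: only the header links it to message 1.
    "Message-ID: <2@example.com>\nIn-Reply-To: <1@example.com>\n"
    "Subject: Numbers look off\n\nSee row 3.\n",
    # Reply from a client that omits reply headers: only the subject links it.
    "Message-ID: <3@example.com>\nSubject: RE: budget\n\nAgreed.\n",
    "Message-ID: <4@example.com>\nSubject: Lunch?\n\nNoon?\n",
]
print(assign_threads(inbox))  # [0, 0, 0, 1]
```

Without the In-Reply-To header, the second message would have started a thread of its own: exactly the lost context that Subject matching cannot recover.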

For years, I've hoped email clients would begin to support custom metadata tags that would enable more robust contextualization, better even than accurate threading would provide. My working life is organized around projects, and every project has a set of email messages associated with it. In Outlook, I use filtering and folders to organize messages by project. Unfortunately, there's no way to reuse that effort. The structure I impose on my mail store cannot be shared with other software, or with other people. Neither can the filtering rules that help me maintain that structure. This is crazy! We need to start thinking of desktop applications not only as consumers of services, but also as producers of them. If Outlook's filters were Web services, for example, then ZOË -- running on the same or another machine -- could make use of them.
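Outlook doesn't expose its rules this way, so what follows is purely hypothetical: a sketch, in Python, of filter rules kept as plain JSON data that any program (a mail client, a ZOË-like indexer, or a colleague's tools) could load and apply. The rule fields and project names are invented for illustration.

```python
import json
from dataclasses import dataclass


@dataclass
class Rule:
    header: str    # which header to test, e.g. "From" or "Subject"
    contains: str  # case-insensitive substring to look for
    project: str   # project (folder) label to assign on a match


def load_rules(text: str) -> list[Rule]:
    """Parse a shared, tool-neutral JSON rule set."""
    return [Rule(**item) for item in json.loads(text)]


def classify(headers: dict[str, str], rules: list[Rule]) -> list[str]:
    """Return every project label whose rule matches the message headers."""
    labels: list[str] = []
    for rule in rules:
        value = headers.get(rule.header, "")
        if rule.contains.lower() in value.lower() and rule.project not in labels:
            labels.append(rule.project)
    return labels


shared_rules = load_rules("""[
  {"header": "From", "contains": "@infoworld.com", "project": "InfoWorld column"},
  {"header": "Subject", "contains": "web services", "project": "Conference"}
]""")
message = {"From": "Doug Dineley <doug@infoworld.com>",
           "Subject": "Web Services keynote notes"}
print(classify(message, shared_rules))  # ['InfoWorld column', 'Conference']
```

Because the rules are data rather than logic locked inside one client, the same file could be served over HTTP to other software, which is the kind of reuse the paragraph asks for.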

Services could flow in the other direction, too. For example, ZOË spends a lot of time doing textual analysis of email. Most of the correlations I perform manually, using Outlook folders, could be inferred by a hypothetical version of ZOË that would group messages based on matching content in their bodies as well as in their headers, then generate titles for these groups by summarizing them. There should be no need for Outlook to duplicate these structures. ZOË could simply offer them as a metadata feed, just as it currently offers an RSS feed that summarizes the current day's messages.
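ZOË's actual feed format isn't shown here, so the following is only a sketch of the idea under stated assumptions: a Python function that turns the day's messages into a minimal RSS 2.0 document with the standard library's ElementTree. The message fields (`subject`, `sender`, `date`) and the channel link are hypothetical.

```python
import datetime as dt
import xml.etree.ElementTree as ET
from email.utils import format_datetime


def daily_feed(messages: list[dict], day: dt.date,
               link: str = "http://localhost:8080/") -> str:
    """Build a minimal RSS 2.0 feed of the messages received on `day`."""
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = f"Mail for {day.isoformat()}"
    ET.SubElement(channel, "link").text = link
    ET.SubElement(channel, "description").text = "Summary of the day's messages"
    for m in messages:
        if m["date"].date() != day:
            continue  # only the selected day's traffic goes into the feed
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = m["subject"]
        ET.SubElement(item, "description").text = f"From {m['sender']}"
        # RSS 2.0 expects RFC 822-style dates, which format_datetime produces.
        ET.SubElement(item, "pubDate").text = format_datetime(m["date"])
    return ET.tostring(rss, encoding="unicode")


utc = dt.timezone.utc
mail = [
    {"subject": "Keynote notes", "sender": "Doug Dineley",
     "date": dt.datetime(2002, 10, 7, 9, 30, tzinfo=utc)},
    {"subject": "Old thread", "sender": "Tom",
     "date": dt.datetime(2002, 10, 6, 17, 0, tzinfo=utc)},
]
print(daily_feed(mail, dt.date(2002, 10, 7)))
```

A metadata feed of inferred message groups could follow the same pattern, with each item describing a group rather than a single message.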

At InfoWorld's recent Web services conference, Google cofounder Sergey Brin gave a keynote talk. Afterward, somebody asked him to weigh in on RDF and the semantic Web. "Look," he said, "putting angle brackets around everything is not a technology, by itself. I'd rather make progress by having computers understand what humans write, than to force humans to write in ways computers can understand." I've always thought that we need to find more and better ways to capture metadata when we communicate. But I've got to admit that the filtering and folders I use in Outlook require more effort than most people will ever be willing to invest. There may yet turn out to be ways to make writing the semantic Web easy and natural. Meanwhile, Google and, now, ZOË remind us that we can still add plenty of value to the poorly structured stuff that we write every day. It's a brute-force strategy, to be sure, but isn't that why we have these 2GHz personal computers?

Jon Udell is an author, information architect, software developer, and new media innovator.


[1] Users of the Google Search Appliance do, of course, invite Google behind the firewall.



Copyright © 2009 O'Reilly Media, Inc.