Jason Hunter wrote the popular Java Servlets book from O’Reilly. He just announced an interesting service named MarkMail, which is built on a commercial product named MarkLogic Server. Here’s an excerpt from the email announcing the service:
For the last few months I’ve been working on a new project: a web
site for interacting with email archives. We’re using, as the
site’s initial content set, the public Apache mailing list archives
– because Apache is the community I know best and I think people
here will find the site useful. We’ve loaded a bit over 4,000,000
emails across 500 lists.
For some screenshots, read the rest of the entry.
I gave the site a quick look and it looks very interesting. Here’s the page that greets you when you visit http://apache.markmail.org.
Figure 1. Summary of Aggregate Apache Mail Activity in MarkMail
If you change your URL to http://lucene.apache.org you’ll see a histogram of mail volume over time for the Lucene project, and if you then search for Hadoop you’ll see the following screen.
Figure 2. Viewing Search Results in MarkMail
Figure 3. Viewing an Email Conversation in MarkMail
Once you’ve selected a message, you can use the ‘n’ and ‘p’ buttons to navigate between messages.
MarkLogic Server, what is it?
Looks like MarkLogic Server is an XML content repository. The system seems to respond very quickly. After doing some digging, I found this on the MarkLogic site:
How MarkMail works
To provision the MarkMail service we:
* store an archive of all sent email messages
* enrich messages with inferred structure from headers, body content and attachments
* build structured and full text search indices
* dynamically render all results pages, including necessary analytics
To build our indices, we simply subscribe to each mailing list, and as messages arrive the header content (such as the sender, recipient, date and message ID) is parsed and translated into XML. Each email is loaded and stored as an XML document, and accessed using the W3C standard XQuery language.
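To make that header-to-XML step concrete, here is a rough sketch of what it might look like. This is not MarkMail's code: MarkMail runs on MarkLogic Server and XQuery, while this uses only the Python standard library. The sample message, the `email_to_xml` function, and the element names (`message`, `from`, `to`, `date`, `subject`, `body`) are all made up for illustration and are not MarkMail's actual schema.

```python
from email import message_from_string
from email.utils import parsedate_to_datetime
import xml.etree.ElementTree as ET

# A made-up message standing in for one that arrives from a mailing list.
RAW_MESSAGE = """\
From: alice@example.org
To: dev@example.org
Date: Mon, 05 Nov 2007 10:15:00 -0600
Message-ID: <1234@example.org>
Subject: Indexing question

How do I rebuild the index?
"""


def email_to_xml(raw: str) -> ET.Element:
    """Parse a raw email and return it as an XML element (hypothetical schema)."""
    msg = message_from_string(raw)
    # Use the Message-ID header as the document's identifying attribute.
    root = ET.Element("message", {"id": msg["Message-ID"]})
    ET.SubElement(root, "from").text = msg["From"]
    ET.SubElement(root, "to").text = msg["To"]
    # Normalize the RFC 2822 date to ISO 8601 so it sorts and compares easily.
    ET.SubElement(root, "date").text = parsedate_to_datetime(msg["Date"]).isoformat()
    ET.SubElement(root, "subject").text = msg["Subject"]
    # Assumes a simple, non-multipart message whose payload is a string.
    ET.SubElement(root, "body").text = msg.get_payload()
    return root


doc = email_to_xml(RAW_MESSAGE)
print(ET.tostring(doc, encoding="unicode"))
```

Once every message is a document shaped like this, questions such as "all mail from this sender in this date range" become queries over elements rather than text scraping, which is presumably where XQuery comes in.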
Pretty interesting. I’d given up on XQuery after a few very bad experiences with Xindice before it was abandoned earlier this decade, but it looks like someone is using it with good results. I wonder how a solution like this (XQuery plus MarkLogic Server) would compare to something using Jackrabbit and the JCR API? Or am I comparing apples to oranges?
Update Monday @ 1:45 PM CST: Thanks to David for helping fix a glaring URL error.
Update Tuesday @ 9:21 AM CST: I am an idiot.