Prime-Time Hypermedia

by Jon Udell

When Michael Kinsley stepped down from the editorship of Slate, an interviewer asked him an interesting question: what had he expected of the Web, and where had it fallen short? The biggest disappointment, Kinsley said, was the Web's failure to deliver on the promise of hypermedia. Online music and film reviews describe songs and movies in words, he said, but rarely, if ever, quote from audio and video streams. The hypermedia Web that he had imagined was, for the most part, not happening.

Issues of copyright and fair use are doubtless part of the problem, but there are also technical hurdles to overcome. In the course of trying to transform my blog from a hypertext publication into a hypermedia publication, I've run into a bunch of obstacles. In the world of tech blogging they are -- ironically -- almost purely technical. Presentations, demos, and interviews are often freely available for viewing or listening, yet infuriatingly hard to link to. Almost anyone can create and post a snippet of audio or video, but almost no one can do so easily, spontaneously, or routinely.

In a series of columns beginning with this one, I'll review and elaborate on a variety of hypermedia techniques I've been experimenting with. I don't know beans about high-end AV technologies, so don't look for expert guidance or Hollywood production values. I come at this from the bottom up, as a web-savvy blogger frustrated by the opaqueness and intractability of existing hypermedia content. I want to be able to repurpose that stuff on my blog. I want you to be able to do the same on your blog. And I'd like to see all of our blogs enriched with original audio and video content, where appropriate. It's time to take the Web to that next level, and the means to do so are at hand.

Before diving into the gnarly details, I'll use this installment to lay out some guiding principles. In no particular order:

Blogs Will Be a Primary Index into the Hypermedia Web

We normally assume that you can't search video and audio. In fact, there's technology in the pipeline that could turn that assumption on its head. In 2002, I reviewed (registration required) a revolutionary phonetic indexing system from Fast-Talk Communications (now Nexidia). Using a demo application based on this system, I was able to record phone interviews, index them in near-realtime, and then search them phonetically. For example, to find an occurrence of "MySQL" using this application, "my sequel" was an effective search term. I probably could have used "my seek well" instead. Rather than doing something really hard -- converting speech to text, then indexing the text -- the system takes a clever shortcut. It recognizes and indexes raw phonemes (the basic sounds of speech), translates your search term into phonemes, and searches accordingly.

It worked well enough for me to become a practical tool, not just a novelty. I don't know when Nexidia's (or comparable) technology will find its way into Google and its competitors, but sooner or later I expect it to radically transform our use of media content. For example, a Nokia presentation at JavaOne this year included a segment on web services middleware, focusing on a developer framework that simplifies access to a Liberty-based identity service. The whole video is 19 minutes long; this particular segment runs from 11:45 to 13:05. Suppose a search of all the JavaOne videos for "web services Liberty" yielded, near the top of a relevance-ranked list, a pointer to that segment within that stream. That's revolutionary, feasible, and coming soon -- I hope.

Even when we can search audio and video this way, though, I've concluded that text wrapped around segments within AV streams will be a potent way of finding those segments -- maybe the most potent. For the foreseeable future, text will be much more efficiently searchable than AV content. What's more, blogs together with text search engines form the nexus within which interesting bits of content are drawn to our attention. Conferences produce many hours of AV content that no one has time to consume. Buried within those hours of content are highlights that people will want to discuss and share. Bloggers today refer to those highlights, but they rarely link to them. When we do link to them, the Google dynamic can kick in. A few days after this article is published, I won't need to remember where I stashed the rtsp: URL that I used in the previous paragraph. I'll just need to be able to find the article in which I used that URL. So this very article will contribute a small piece of what I hope will become a massive index into the hypermedia Web. Blogs, in aggregate, will provide the bulk of that index.

Free the URLs

When web sites present AV content, the prevailing ethic is to bury the URLs deep within layers of indirection, script, and server-side sleight-of-hand. I'm not sure why this is so. To emulate TV and radio? That's a tragic error. Web AV is a different creature entirely. If I'm watching TV or listening to radio, I can't share that experience with you, unless you're available to watch or listen right now. Even if you are available, I can't share the specific slice of content that piqued my interest and that I thought would pique yours.

URLs are the key to a whole new way of attending to media. The principle is well established in the realm of text. As a blogger and a subscriber to blogs, I participate in an awareness network. If I read something important, I'll react to it on my blog, and you'll find out about it. You provide the same filtering and alerting services to me. Collectively, we process vast flows of information with an efficiency that would have seemed unattainable just a few years ago. URLs fuel the engine that powers this awareness network. And in the realm of AV, the engine is starving for URLs. This has to change.

HTTP Can Do More Than You Think

I owe this insight to Kevin Marks, formerly an Apple QuickTime engineer and now director of engineering for Technorati. I'd been experimenting with streaming servers, and exploring the various syntaxes used to randomly access portions of streams, on the assumption that you need to use a streaming server to support random access. Not so, Kevin pointed out. Streaming protocols are necessary for live broadcast, but otherwise, plain old HTTP is good enough not only for sequential access, but also for random access. HTTP 1.1's little-known and under-exploited byte range feature makes it possible to jump around in large media files.
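To make the byte range idea concrete, here is a minimal Python sketch of a client asking a plain web server for part of a file. The function names and the example URL are made up for illustration; the only thing taken from the HTTP 1.1 specification is the Range request header, whose byte positions are zero-based and inclusive.

```python
import urllib.request


def build_range_request(url: str, start: int, end: int) -> urllib.request.Request:
    """Build a GET request for bytes start..end (inclusive) of the resource at url."""
    if start < 0 or end < start:
        raise ValueError("need 0 <= start <= end")
    request = urllib.request.Request(url)
    # HTTP 1.1 byte ranges are zero-based and inclusive: "bytes=0-99" is the first 100 bytes.
    request.add_header("Range", f"bytes={start}-{end}")
    return request


def fetch_range(url: str, start: int, end: int):
    """Send the request and return (status code, body bytes). Requires network access."""
    with urllib.request.urlopen(build_range_request(url, start, end)) as response:
        # 206 means the server honored the range; 200 means it ignored
        # the header and sent the whole file instead.
        return response.status, response.read()


if __name__ == "__main__":
    # Hypothetical URL, used only to show the header that would be sent.
    req = build_range_request("http://example.com/interview.mp3", 1_000_000, 1_999_999)
    print(req.get_header("Range"))
```

A server that supports byte ranges should answer with status 206 (Partial Content) and just the requested bytes. A server that doesn't will typically answer 200 with the entire file, so a careful client checks the status before assuming it got a slice.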

This has extraordinary implications for multimedia bloggers, few of whom have access to Helix, QuickTime, or Windows Media servers, but many of whom can post files to web servers that support HTTP 1.1. Suppose, for example, you record a one-hour interview and post it to your blog as a 20MB MP3 file. You needn't do anything special to support random access into that file. It's all in the hands of the client. Some clients (including Winamp and RealPlayer) will let you move the slider to any point in the file; others (including QuickTime and Windows Media Player) won't.

It amazes me how few people -- including my geekiest acquaintances -- know about this. Taking things a step further, I've built an experimental web-based service that eliminates the client dependency. Hand it an MP3 URL plus start/stop times, and it hands you back the corresponding slice of the file. My first version of this hack Does The Simplest Thing That Could Possibly Work, and needs refinement, but it convinces me that there's important territory to explore at the intersection of HTTP and media content.
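The core arithmetic behind a slicer like this can be sketched in a few lines of Python. This is an illustration under loudly stated assumptions, not the code of the service described above: it assumes a constant-bitrate MP3 with no leading tag data, ignores frame boundaries, and assumes the file's total size and duration are already known. The function names are hypothetical.

```python
def parse_clock(text: str) -> int:
    """Convert 'MM:SS' or 'HH:MM:SS' into a number of seconds."""
    seconds = 0
    for part in text.split(":"):
        seconds = seconds * 60 + int(part)
    return seconds


def byte_range_for_slice(total_bytes: int, duration_s: int,
                         start_s: int, stop_s: int) -> tuple:
    """Map a start/stop time onto an inclusive byte range.

    Assumes constant bitrate, so bytes are spread evenly over time.
    """
    if not (0 <= start_s < stop_s <= duration_s):
        raise ValueError("need 0 <= start < stop <= duration")
    bytes_per_second = total_bytes / duration_s
    first = int(start_s * bytes_per_second)
    last = min(total_bytes, int(stop_s * bytes_per_second)) - 1
    return first, last


if __name__ == "__main__":
    # The one-hour, 20MB interview from the example above; slice times are arbitrary.
    first, last = byte_range_for_slice(20 * 1024 * 1024, parse_clock("1:00:00"),
                                       parse_clock("11:45"), parse_clock("13:05"))
    print(f"Range: bytes={first}-{last}")
```

The resulting pair can go straight into an HTTP 1.1 Range header. A variable-bitrate file would break the even-spread assumption, and a slice that starts mid-frame relies on the player resynchronizing, which is the kind of refinement a Simplest Thing version leaves for later.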

Applications Can Become Movies

In the software world, we spend a lot of time describing how things work. To echo Michael Kinsley's lament about music and film, why should those descriptions use only text, possibly augmented with screenshots? Why don't we present, and quote from, live experiences?

It's way easier to do that than you might think. Tools that capture screen video, along with voiceover, can produce compelling software demonstrations. It's true that many of these tools are commercial, but some highly capable ones -- including Windows Media Encoder -- are free.

When I review these tools, I'll do so with an eye toward speed, simplicity, and spontaneity. I'm not nearly as interested in highly produced marketing collateral as I am in the vast reservoirs of experiential knowledge that would otherwise go unrecorded and unshared.

Hypermedia Blogging for Everybody

The two-way Web unleashed by the blogging revolution is, and will remain, largely a textual medium. And yet we're clearly at an inflection point. It's increasingly feasible to create and share media content. If you needed special AV skills and instincts in order to do that, it would be a non-starter. But I've never been an AV guy. What motivates me to explore the subject now is a profound sense that it's ready to become part of mainstream communication on the Web. I'm not sure where this series of columns will lead, but let's take it one step at a time.

Jon Udell is an author, information architect, software developer, and new media innovator.
