O'Reilly Network    
 Published on O'Reilly Network (http://www.oreillynet.com/)


Primetime Hypermedia

Marrying Hypertext and Hypermedia

10/13/2004

The first presidential debate was widely republished on the Web, in both text and video formats. But, as usual, the two formats were mostly used separately, not in concert. For example, C-SPAN offered a RealVideo stream here: rtsp://cspanrm.fplive.net/cspan/project/c04/c04093004_debate1.rm and pointed separately to a corresponding HTML transcript here: http://www.debates.org/pages/trans2004a.html.

If you peek at the address of the video stream, you'll see that I've cheated slightly. If you capture the rtsp: URL shown above and load it directly into RealPlayer, it'll work. But if you link to it from a browser, it likely won't. Here are some ways to wrap an rtsp: URL in an http: URL so it'll be more likely to cooperate with a browser.

  1. Place the rtsp: URL in a .ram file and form an http: URL linking to the .ram file, like so.

  2. Use the media server's ramgen virtual directory to provide the necessary wrapper. So http://cspanrm.fplive.net/ramgen/cspan/project/c04/c04093004_debate1.rm produces rtsp://cspanrm.fplive.net/cspan/project/c04/c04093004_debate1.rm.

  3. Place the following SMIL markup in a .smil file:

    
    <smil xmlns="http://www.w3.org/TR/REC-smil">
    <body>
      <video src="rtsp://cspanrm.fplive.net/cspan/project/c04/c04093004_debate1.rm"/>
    </body>
    </smil>
    

    and form an http: URL linking to it, like so.

  4. Use a ramgen-like service--such as Rich Persaud's RPXP/web--to produce the SMIL markup dynamically, like so.
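The first three wrappers are mechanical enough to sketch in a few lines of Python. This is only an illustration, not how C-SPAN or RealServer does it: the function names are my own, and the .ram body assumes the simplest case of one media URL on a line by itself.

```python
from urllib.parse import urlsplit
from xml.sax.saxutils import quoteattr

RTSP_URL = "rtsp://cspanrm.fplive.net/cspan/project/c04/c04093004_debate1.rm"


def ram_file_body(rtsp_url: str) -> str:
    """Option 1: text for a .ram file (assumed: one media URL per line)."""
    return rtsp_url + "\n"


def ramgen_url(rtsp_url: str) -> str:
    """Option 2: rewrite rtsp://host/path as http://host/ramgen/path."""
    parts = urlsplit(rtsp_url)
    if parts.scheme != "rtsp":
        raise ValueError("expected an rtsp: URL")
    return f"http://{parts.netloc}/ramgen{parts.path}"


def smil_wrapper(rtsp_url: str) -> str:
    """Option 3: the minimal SMIL document from step 3, built as a string."""
    return (
        '<smil xmlns="http://www.w3.org/TR/REC-smil">\n'
        "<body>\n"
        f"  <video src={quoteattr(rtsp_url)}/>\n"
        "</body>\n"
        "</smil>\n"
    )


if __name__ == "__main__":
    print(ramgen_url(RTSP_URL))
    print(smil_wrapper(RTSP_URL))
```

Save the output of ram_file_body() or smil_wrapper() to a .ram or .smil file on a web server, and the http: URL of that file becomes the browser-friendly link.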

Suppose, though, that you don't want to simply point people to the hour-and-a-half video of the debate. Suppose, instead, you want to call out each of the question-and-answer segments individually. This problem presents several kinds of challenges. First, there's a flock of video formats to consider, most notably Real, QuickTime, Windows Media, and Flash. Second, there's the distinction between downloading from a web server and streaming from a media server.

For the purposes of this column, let's focus on just one of the cells in that 4-by-2 matrix. C-SPAN's debate video uses the Real format and is served up by RealServer. Even with this tight focus, there are two quite different ways to create a segmented presentation that uses hyperlinks to access segments and hypertext to annotate them. Basically, it's a question of who controls the show--the media player or the browser.

Media Player as Container

Let's look first at how the media player can run the show. Here's a more elaborate SMIL wrapper, which, much like an HTML frameset, carves the RealPlayer's display into a three-pane view:


<smil xmlns="http://www.w3.org/2001/SMIL20/Language">
<head>
<meta name="base" content="http://udell.roninhouse.com/drafts/pm3/" />

<layout>
<root-layout width="560" height="480" />
  <region id="text_region" width="320" height="480" left="0" top="0" />
  <region id="video_region" width="240" height="180" left="320" top="0" />
  <region id="text_region2" width="240" height="300" left="320" top="180" />
</layout>
</head>
<body>
<par>
  <textstream src="debate_index.rt" region="text_region" dur="1:30:00"/>
  <video src="rtsp://cspanrm.fplive.net/cspan/project/c04/c04093004_debate1.rm?start=0:12"
    region="video_region" dur="1:30:00"/>
  <seq>
  <textstream src="debate00.rt" 
    region="text_region2" dur="2:10"/>
  <textstream src="debate01.rt" 
    region="text_region2" dur="3:51"/>
  <textstream src="debate02.rt" 
    region="text_region2" dur="3:46"/>
  <textstream src="debate03.rt" 
    region="text_region2" dur="3:45"/>
  </seq>

</par>
</body>
</smil>

Here's a link to that presentation, and here's a picture of it:

It's a two-column layout. The left column is a single full-height pane, containing links to (just the first three) questions and rebuttals. The right column is divided into a video pane on top and an annotation pane below.

In the main SMIL file, the contents of the left column are defined like so:


<textstream src="debate_index.rt" region="text_region" dur="1:30:00"/>

This says that the RealText markup sourced from debate_index.rt will play in the left column for the whole duration of the video--an hour-and-a-half. Here is that markup:


<window type="generic"
        duration="1:30:00"
        height="480"
        width="320"
        underline_hyperlinks="true">

<font face="arial" size="2">

<ol>
<li><a href="command:seek(0:0)" target="_player">Intro</a></li>

<br/>
<li>
<a href="command:seek(2:10)" target="_player">Q1 to Kerry</a>,
  <a href="command:seek(4:26)" target="_player">Bush rebuttal</a> 
</li>
<br/>
<li><a href="command:seek(6:01)" target="_player">Q2 to Bush</a>,
  <a href="command:seek(8:10)" target="_player">Kerry rebuttal</a>
</li>
<br/>
<li><a href="command:seek(9:47)" target="_player">Q3 to Kerry</a>,
  <a href="command:seek(11:56)" target="_player">Bush rebuttal</a>

</li>
</ol>

</font>

</window>

These links invoke the player's seek command to jump to the indicated times. In parallel, the <video src="..."/> tag [1] sources the video into the top right pane for the same hour-and-a-half duration.

[1] See how the src= attribute contains an rtsp: URL with a start parameter? Here's an interesting twist. It can also contain an HTTP URL pointing to a static .rm file. And in that case, too, you can append start and end parameters. So while a URL like http://server/file.rm?start=0:10&end=1:00 is not meaningful in the browser (you'll just get the whole file, not the specified slice), it is meaningful to RealPlayer and will deliver the slice. The mechanism is the same as the one I demonstrated for MP3 files last month: RealPlayer uses the HTTP Range header to access partial content on a web server.
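To make the footnote concrete, here is a rough Python sketch of the two halves of that trick: reading the start and end parameters off the URL, and asking a web server for part of a file with a Range header. How RealPlayer maps a time offset to a byte offset isn't covered here, so the byte numbers below are placeholders, and the function names are mine. The request is built but never sent.

```python
from urllib.parse import urlsplit, parse_qs
from urllib.request import Request


def clock_to_seconds(clock: str) -> int:
    """Convert 'ss', 'mm:ss' or 'hh:mm:ss' to whole seconds (no fractions)."""
    seconds = 0
    for field in clock.split(":"):
        seconds = seconds * 60 + int(field)
    return seconds


def slice_from_url(url: str) -> tuple:
    """Return (start, end) in seconds; a missing parameter comes back as None."""
    query = parse_qs(urlsplit(url).query)
    start = query.get("start", [None])[0]
    end = query.get("end", [None])[0]
    return (
        clock_to_seconds(start) if start is not None else None,
        clock_to_seconds(end) if end is not None else None,
    )


def ranged_request(url: str, first_byte: int, last_byte: int) -> Request:
    """Build (but do not send) a GET for an inclusive byte range."""
    return Request(url, headers={"Range": f"bytes={first_byte}-{last_byte}"})


if __name__ == "__main__":
    url = "http://server/file.rm?start=0:10&end=1:00"
    print(slice_from_url(url))
    # Placeholder offsets: a real player would have to work these out itself.
    print(ranged_request(url, 0, 65535).get_header("Range"))
```

A server that honors the Range header answers with 206 Partial Content and only the requested bytes, which is what lets a player fetch a slice of a static file.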

Also in parallel, wrapped in a <seq> tag and timed to match the segments accessed by the links in the left column, there's a sequence of annotations--in this case, Jim Lehrer's questions. Each of these is sourced from a RealText file that looks like this:

<window type="generic"
        height="300"
        width="240">

<p>
Q2 to Bush:  Do you believe the election of Senator Kerry on November the
2nd would increase the chances of the U.S. being hit by another 9/11-type
terrorist attack?
</p>

</window>
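The dur values in the <seq> block don't have to be worked out by hand. Each one is just the gap between one question's start time and the next, so a short Python sketch can derive them from the seek times in the index. The helper names are hypothetical, and the file naming simply follows the debate00.rt pattern used above.

```python
def clock_to_seconds(clock: str) -> int:
    """Convert 'mm:ss' or 'hh:mm:ss' to whole seconds."""
    seconds = 0
    for field in clock.split(":"):
        seconds = seconds * 60 + int(field)
    return seconds


def seconds_to_clock(total: int) -> str:
    """Format whole seconds as m:ss."""
    return f"{total // 60}:{total % 60:02d}"


def segment_durations(starts: list) -> list:
    """Duration of each segment is the gap to the next segment's start."""
    secs = [clock_to_seconds(s) for s in starts]
    return [seconds_to_clock(b - a) for a, b in zip(secs, secs[1:])]


def textstream_lines(starts: list, region: str = "text_region2") -> list:
    """Emit one <textstream> per segment, named debate00.rt, debate01.rt, ..."""
    return [
        f'<textstream src="debate{i:02d}.rt" region="{region}" dur="{dur}"/>'
        for i, dur in enumerate(segment_durations(starts))
    ]


if __name__ == "__main__":
    # Start times of the intro and the first three questions, from the index.
    for line in textstream_lines(["0:00", "2:10", "6:01", "9:47"]):
        print(line)
```

Four start times yield the first three durations (2:10, 3:51, and 3:46). The fourth needs the start time of the next question, which the index above doesn't list.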

Let's look at the pros and cons of this approach. On the plus side, it achieves the desired effect. The video is divided into individually accessible segments, and each segment is accompanied by explanatory text. This is a powerful way to consume video. There's also a nice blend of continuous play and interactivity. If you never touch the controls (links) in the left pane, the video will simply play through, but the annotations will change on cue so that each segment is appropriately labeled. At any point, though, you can click a link to access a question or rebuttal. Here too, the annotation will sync to the video and help contextualize it.

The main drawback is that media players are horrible text browsers. Even if RealText were supported in players other than Real's, it would be a bad bargain. The syntax is HTML-like but different from--and vastly less powerful than--HTML. The layout capability is crude. And perhaps most troubling, the links are held captive. If you're writing a blog item and want to link to Bush's second rebuttal, my encapsulation of that segment isn't directly available to you.

Browser as Container

Alternatively, we can run the show from the browser, using a set of parameterized links like so:

Intro

Q1 to Kerry: Do you believe you could do a better job than President Bush in preventing another 9/11-type terrorist attack on the United States? Bush rebuttal

Q2 to Bush: Do you believe the election of Senator Kerry on November the 2nd would increase the chances of the U.S. being hit by another 9/11-type terrorist attack? Kerry rebuttal

Q3 to Kerry: "Colossal misjudgments." What colossal misjudgments, in your opinion, has President Bush made in these areas? Bush rebuttal

This method accomplishes the same goal: to segment and annotate the video. It's strong where the other method is weak. You get to exploit the familiarity of HTML, the power of the browser's layout engine, and the universality of HTTP URLs. But it's also weak where the other method is strong. The text doesn't work as closely with the video. There's little contextualization and no synchronization.
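Links like these are easy to generate. The Python sketch below builds one HTML anchor per segment. It rests on an assumption: that the ramgen URL passes start and end parameters through to the player in the same way the rtsp: URL accepts them. The helper names are mine, and using the next segment's start as an end time is also a guess.

```python
from html import escape

RAMGEN_URL = (
    "http://cspanrm.fplive.net/ramgen/cspan/project/c04/c04093004_debate1.rm"
)


def clock_to_seconds(clock: str) -> int:
    """Convert 'mm:ss' or 'hh:mm:ss' to whole seconds."""
    seconds = 0
    for field in clock.split(":"):
        seconds = seconds * 60 + int(field)
    return seconds


def seconds_to_clock(total: int) -> str:
    """Format whole seconds as m:ss."""
    return f"{total // 60}:{total % 60:02d}"


def segment_url(start: str, end: str = None, offset: int = 0) -> str:
    """Append start (and optionally end) to the ramgen URL, shifted by offset seconds."""
    query = "start=" + seconds_to_clock(clock_to_seconds(start) + offset)
    if end is not None:
        query += "&end=" + seconds_to_clock(clock_to_seconds(end) + offset)
    return f"{RAMGEN_URL}?{query}"


def segment_link(label: str, start: str, end: str = None, offset: int = 0) -> str:
    """Wrap the segment URL in an HTML anchor, escaping & for the attribute."""
    url = segment_url(start, end, offset)
    return f'<a href="{escape(url)}">{escape(label)}</a>'


if __name__ == "__main__":
    print(segment_link("Q1 to Kerry", "2:10", "4:26"))
    print(segment_link("Bush rebuttal", "4:26", "6:01"))
```

The offset parameter is there because the SMIL presentation earlier starts the clip at 0:12. If the seek times in that index are relative to the presentation rather than the clip, passing offset=12 would shift them to match.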

We could argue about the merits of both approaches--and plenty of people have--but instead, I'll step back and make three assertions:

  1. Both methods can be effective.

  2. But neither is widely practiced.

  3. Because it takes too much work.

It's true that we desperately need better integration between media players and browsers. It's also true that we need ways to smooth out the differences between video formats and delivery mechanisms (i.e., streaming versus downloading). But in order to empower regular folks to weave hypertext together with hypermedia in routine conversation (for example, on blogs), we're going to have to solve a much more basic problem. The popular media players are built for an audience of consumers, not producers. They assume that you'll watch and listen, perhaps scanning backward and forward. But if you want to republish and contextualize, it's insanely hard. Before the nasty complexity of video formats and MIME types and segmentation syntaxes can even become relevant, you first have to be able to select your segments, and the players afford no reasonable ways to do that.

What we need are the kinds of features found in video editors: jump to a frame, move forward or backward a frame at a time, and select a range of frames. Substitute the word "character" for "frame" in the previous sentence, and imagine how bizarre it would be for a text player (i.e., browser) to lack such features. Yet, that's just what using a media player is like. If we want to empower people to create finely granular and richly contextualized AV experiences, it's got to change.




Copyright © 2009 O'Reilly Media, Inc.