Published on O'Reilly Network (http://www.oreillynet.com/)


Audio Linkblogging

by Jon Udell
07/18/2005

In my first column on this topic, I showed how little-known features of the HTTP protocol, the MP3 file format, and certain media players conspire to enable random access to MP3 files hosted on standard web servers. And I described a rudimentary service that combines these features to enable you to quote from a web-hosted MP3 file by forming a URL that includes its web address, its duration, and the desired start and stop times.
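
The mechanism can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the code behind the service: it assumes a constant-bit-rate file, so that byte offsets scale linearly with time, and the function names to_seconds and range_header are hypothetical.

```python
def to_seconds(stamp):
    """Convert 'SS', 'MM:SS' or 'HH:MM:SS' into a number of seconds."""
    seconds = 0
    for part in stamp.split(":"):
        seconds = seconds * 60 + int(part)
    return seconds


def range_header(beg, end, duration, content_length):
    """Build an HTTP Range header value covering beg..end of the audio.

    Assumes a constant bit rate, so bytes are spread evenly over time.
    """
    total = to_seconds(duration)
    first = content_length * to_seconds(beg) // total
    last = content_length * to_seconds(end) // total - 1
    return "bytes=%d-%d" % (first, last)


# A 60-minute, 28,800,000-byte file; quote minutes 10 through 12.
print(range_header("10:00", "12:00", "01:00:00", 28800000))
```

A client would send this value in a Range request header; a server that supports byte ranges answers with status 206 and only the requested bytes.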

Back then, in September 2004, podcasting was just starting to take off, and in fact the term "podcast" appears nowhere in that article. Today, in July 2005, the recent addition of podcatching support to iTunes is exposing millions of people to an explosion of new audio content, much of it in spoken-word form and open to fair use.

Over the same period of time, we've seen a surge of interest in social bookmarking services such as del.icio.us. I've been using it for all kinds of things, but what's relevant here is the tag del.icio.us/judell/soundbite, where I've been collecting interesting quotes from shows appearing (mostly) on ITConversations. The RSS feed for that tag, del.icio.us/rss/judell/soundbite, is, in effect, an audio linkblog, albeit one that's a hybrid of a textual wrapper with pointers to audio segments. But it is also implicitly a podcast, and I decided to make it one explicitly.

While I was at it, I decided to clean up the syntax used by the original version of my MP3 clipping service, which used this awkward pattern:


http://udell.infoworld.com:8002/?site=HOST&url=PATH&beg=HH:MM:SS&\
  end=HH:MM:SS&dur=HH:MM:SS

There were two nasty bits here. First, splitting the URL into site and path components was awkward. Second, you had to get your player to tell you the duration of the file and plug that into the URL as well--even more awkward. So here's the new pattern:


http://udell.infoworld.com:8003/?url=URL&beg=MM:SS&end=MM:SS
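
As a rough illustration of the new pattern, a clipping URL could be assembled like this. The helper name clip_url and the example.com address are hypothetical, and percent-encoding the embedded URL is an assumption; the service may equally have accepted it unencoded.

```python
from urllib.parse import urlencode

SERVICE = "http://udell.infoworld.com:8003/"  # address from the pattern above


def clip_url(mp3_url, beg, end):
    """Return a clipping URL for the segment beg..end (MM:SS) of mp3_url."""
    # Keep ':' and '/' readable; characters such as '&' or '?' inside
    # mp3_url are still percent-encoded so they cannot split the query.
    query = urlencode([("url", mp3_url), ("beg", beg), ("end", end)], safe=":/")
    return SERVICE + "?" + query


print(clip_url("http://example.com/show.mp3", "01:30", "02:45"))
```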

To get rid of the duration parameter, I needed a way to find the duration of a remote MP3 file from a small sample of that file. I used a Python module called MP3Info for this purpose. It was last seen at http://shell.lab49.com/~vivake/python/MP3Info.py, but the site seems to have gone dark, so I wound up using the copy in Google's cache.

For future archaeologists, I made two slight modifications that are both evident in this diff (lines marked < are my modified version; lines marked > are the original):


< if ( self.version == 1 ):
<     fudgeFactor = 1
< else:
<     fudgeFactor = 2
<
< self.length = int(round(( self.content_length / self.framelength) *
<   (self.samplesperframe / self.samplerate))) / fudgeFactor
---
> self.length = int(round((self.filesize / self.framelength) *
>   (self.samplesperframe / self.samplerate)))

The module expects a whole MP3 file, but I'm only giving it a short (6K) extract from the middle of the file. So I substitute the full length of the original file, as reported by the remote web server, for the length of the clip reported by the file system.

The fudge factor is just a dreadful hack that makes the reported duration right for the files I've tried so far. There's undoubtedly a more clueful way to do this, and I hope someone will show it to me.
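
For readers who want to see the arithmetic spelled out, here is a minimal sketch of estimating duration from a single frame header plus the full file length. It is not MP3Info's code. It assumes Layer III audio at a constant bit rate, ignores padding bytes and MPEG 2.5, and the name estimate_duration is hypothetical. The tables follow the MPEG audio header layout, in which MPEG-1 Layer III frames carry 1152 samples and MPEG-2 Layer III frames carry 576; whether that difference accounts for the fudge factor above is not something this sketch establishes.

```python
# Layer III tables, indexed by the header's bitrate and sample-rate fields.
BITRATES_KBPS = {
    1: [0, 32, 40, 48, 56, 64, 80, 96, 112, 128, 160, 192, 224, 256, 320],
    2: [0, 8, 16, 24, 32, 40, 48, 56, 64, 80, 96, 112, 128, 144, 160],
}
SAMPLE_RATES = {1: [44100, 48000, 32000], 2: [22050, 24000, 16000]}
SAMPLES_PER_FRAME = {1: 1152, 2: 576}


def estimate_duration(header, content_length):
    """Estimate seconds of audio from one 4-byte Layer III frame header
    and the byte length of the whole file (constant bit rate assumed)."""
    if header[0] != 0xFF or (header[1] & 0xE0) != 0xE0:
        raise ValueError("not an MPEG audio frame header")
    version_bits = (header[1] >> 3) & 0x03
    if version_bits == 3:
        version = 1
    elif version_bits == 2:
        version = 2
    else:
        raise ValueError("MPEG 2.5 and reserved versions are not handled")
    if (header[1] >> 1) & 0x03 != 1:
        raise ValueError("not Layer III")
    bitrate_index = header[2] >> 4
    rate_index = (header[2] >> 2) & 0x03
    if bitrate_index in (0, 15) or rate_index == 3:
        raise ValueError("free-format or reserved header values")

    bitrate = BITRATES_KBPS[version][bitrate_index] * 1000
    samplerate = SAMPLE_RATES[version][rate_index]
    samples = SAMPLES_PER_FRAME[version]
    # Average frame length in bytes, ignoring the optional padding byte.
    framelength = samples / 8 * bitrate / samplerate
    # Same shape as the formula in the diff: frames * seconds per frame.
    return (content_length / framelength) * (samples / samplerate)


# MPEG-1 Layer III, 128 kbps, 44.1 kHz header; 28,800,000-byte file.
print(round(estimate_duration(bytes([0xFF, 0xFB, 0x90, 0x00]), 28800000)))
```

With a constant bit rate the expression reduces to content_length * 8 / bitrate, so a variable-bit-rate file would need a different approach.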

These changes weren't strictly necessary. My audio linkblog could have used the original uglier URL syntax as well. But the simpler, the better.

The first incarnation of the audio linkblog was straightforward. I simply transformed the del.icio.us RSS feed for my sound-bite tag into an RSS 2.0 feed whose enclosures were the referenced clipping URLs. In order to supply the length attribute required by the enclosure tag, I made an HTTP HEAD request to the origin server where the MP3 file was hosted.
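
The two steps described here, a HEAD request for the length and an enclosure built around the clipping URL, might look roughly like this in Python. It is a sketch, not the original transformation script: the function names are hypothetical, the example.com address is a placeholder, and the audio/mpeg type is an assumption.

```python
import urllib.request
import xml.etree.ElementTree as ET


def remote_length(url):
    """Ask the origin server for Content-Length without fetching the body."""
    request = urllib.request.Request(url, method="HEAD")
    with urllib.request.urlopen(request) as response:
        return int(response.headers["Content-Length"])


def make_item(title, description, clip_url, length):
    """Build an RSS 2.0 <item> whose enclosure points at a clipping URL."""
    item = ET.Element("item")
    ET.SubElement(item, "title").text = title
    ET.SubElement(item, "description").text = description
    ET.SubElement(item, "enclosure", url=clip_url, length=str(length),
                  type="audio/mpeg")  # MIME type assumed for MP3
    return item


# The length would normally come from remote_length(); a fixed number is
# used here so the example runs without network access.
item = make_item(
    "Example clip",
    "Two minutes from a longer show",
    "http://udell.infoworld.com:8003/?url=http://example.com/show.mp3"
    "&beg=01:30&end=02:45",
    28800000,
)
print(ET.tostring(item, encoding="unicode"))
```

Note that the serializer escapes each & in the clipping URL as &amp;, which the enclosure attribute needs in order to remain well-formed XML.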

Unfortunately, neither iPodder nor the then newly released iTunes 4.9 was willing to work with the clipping URLs contained in those enclosure tags. iPodder would download the resources but couldn't figure out what to name them; iTunes wouldn't download them at all.

So I resorted to another dreadful hack, and appended a bogus &ext=.mp3 parameter to the URLs. That worked, at least for iTunes. After converting the clips in my sound-bite feed into a playlist, I was able to listen to them in iTunes and on my iPod.

It was a weird experience, though. First Peter Yared would speak for a couple of minutes, then Doug Engelbart, then Kim Polese, and if they hadn't been my clips I wouldn't have known who was speaking or in what context. In theory I could record audio introductions to each clip. But that wouldn't be a sustainable procedure even for me, never mind a less motivated person. Forming the clip URL and bookmarking it to del.icio.us is already asking more effort than most people will be willing to give.

Then I realized that I already had contextual metadata, in the form of the del.icio.us title, extended description, and tags. Why not convert that metadata to audio and use it to introduce the clips?

I tried two different text-to-speech (TTS) solutions. First, on Win32, I used pyTTS, which wraps the Microsoft speech API for Python. The prerequisites--Microsoft's SAPI 5.1 redistributable kit, and Mark Hammond's win32all extensions--were a breeze to install, and the TTS module couldn't be easier to use. Here's all you need, for example, to convert some metadata into a WAV file:


import pyTTS
tts = pyTTS.Create()  # create the SAPI-backed speech engine
text = '%s, %s, Tags, %s' % ( title, descr, tagset )
tts.SpeakToWave( 'tmp.wav', text )

The ever-popular lame can then convert the WAV file to MP3.
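
A minimal sketch of that conversion step, assuming the lame executable is on the PATH and that its default encoding settings are acceptable. The helper names are hypothetical.

```python
import subprocess


def lame_command(wav_path, mp3_path):
    """Return the argument list for a basic lame invocation."""
    return ["lame", wav_path, mp3_path]


def wav_to_mp3(wav_path, mp3_path):
    """Run lame; raises CalledProcessError if the encoder fails."""
    subprocess.run(lame_command(wav_path, mp3_path), check=True)


# Shows the command that wav_to_mp3('tmp.wav', 'tmp.mp3') would execute.
print(" ".join(lame_command("tmp.wav", "tmp.mp3")))
```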

I wasn't wild about the TTS results, though, even after trying the extra voices available for the engine. Then I remembered AT&T's online TTS demo, so I tried that too. It's not packaged as a web service, but it was straightforward to issue a request, receive a reference to a WAV file, and then retrieve that file. And AT&T's Audrey sounded a bit better than Microsoft's MSMary!

If this turns into more than an experiment, I'll look into licensing TTS software. Chris Brooks, who runs talkr.com, a website that converts blogs into podcasts, uses NeoSpeech and recommends it highly.

With these ingredients, I was able to create a sound-bite podcast. So far, I've only gotten it to work in iTunes, and even there only imperfectly. Because the introductions and clips are separate enclosures, they'll play out of order if shuffling is turned on in the player. Clearly, they should be combined, though I'm not yet sure what the right way to do that will be. There's also an iTunes issue. iPodder automatically creates playlists from podcasts, but iTunes doesn't, which creates an extra (and, to me, annoying) step in the process.

Warts notwithstanding, this exercise has given me a sense of the possibilities that will open up once we can link to discrete segments of audio, subscribe to them, and recombine them. In the realm of text these capabilities have radically improved our ability to assimilate and share information. In the audio realm the need is even greater, because it takes so much longer to listen than to read. In tandem with the social bookmarking and tagging services, the blogosphere stands ready to process all the new audio content that's coming online. Everything we need for efficient classification and recommendation is in place, with the exception of things we take for granted in the textual realm: selection, quotation, and linking. All these things are doable, and I hope someday to take them for granted in the audio and video domains too.

Jon Udell is an author, information architect, software developer, and new media innovator.



Copyright © 2009 O'Reilly Media, Inc.