Multimedia, or Why Text Rocks


Nathan Torkington
Oct. 26, 2004 02:21 PM

When I was but a young boy, learning my way in the world, I had a good teacher. Tom talks about the Unix toolkit with near-religious reverence, and is a big believer in doing the simplest thing possible that works. One of the keys to that is using plain text where you can--don't use binary formats unless you have to. Some days it seems my life will be nothing more than relearning the lessons that Tom taught me. Take my media collection as an example ...

I keep all my email. A year's email is 400-500M these days, what with all the humongous book attachments I send around. I've got five or six years archived, with duplicates I've been too weak to merge. End result is about 2G of data that I can grep.

Like everyone else on this planet, I've replaced a bulky CD collection with a bulky file server that hides in a closet. One hour of MP3 is 60-100M depending on the bitrate you use. Now I have MP3s of all my music, and all the associated metadata issues. I found id3lib's id3convert utility useful in turning the ID3 tags that RealJukebox put in the files in 1999 into something iTunes can read in 2004, but I still have a lot of crap data (Bela Fleck vs Béla Fleck, some untagged directories, etc.). Because I end up copying files from the basement server to my Mac to play them, I have to consciously copy them back if I update the tagging in iTunes. Managing 66G of MP3s (about 850 hours) with scp and Apache::MP3 isn't easy.
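For readers who want to see where a figure like 60-100M per hour comes from, here is a minimal Python sketch. It assumes constant-bitrate encoding and decimal megabytes; the bitrates it prints (128 to 224 kbps) are illustrative choices, not figures from the post, and the function names are hypothetical.

```python
def mp3_megabytes_per_hour(bitrate_kbps):
    """Size of one hour of constant-bitrate audio, in megabytes (10**6 bytes)."""
    bits_per_hour = bitrate_kbps * 1000 * 3600
    return bits_per_hour / 8 / 1_000_000


def hours_of_audio(collection_gb, bitrate_kbps):
    """Hours of audio that fit in collection_gb gigabytes at the given bitrate."""
    return collection_gb * 1000 / mp3_megabytes_per_hour(bitrate_kbps)


if __name__ == "__main__":
    # Assumed, commonly used bitrates; the post does not say which ones were used.
    for kbps in (128, 160, 192, 224):
        print(f"{kbps} kbps: {mp3_megabytes_per_hour(kbps):.1f}M per hour")
```

Under those assumptions, 66G for about 850 hours works out to roughly 78M per hour, which sits inside the quoted range.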

When my son William was born, we bought a digital camera and started taking pictures. We only took breaks when we lost or broke our cameras (memo: if you know where my wife left the latest one, please tell me!). We've never been resolution fiends, so our original 2MP camera served us well and our latest was 3MP. We have nearly 11,000 pictures that take up about 6G. When you include the resized images and thumbnails that Gallery wants to add, it's 7.5G. The stock metadata on cameras is pretty crap, though it's slowly getting better. The really interesting stuff, though ("what's this a picture of?") still has to be provided manually and kept in the filename.

Now I have a digital video camera. I have over 60 tapes of my kids and various conferences that I could conceivably put online. An hour of raw video imported into iMovie is 12G. Streaming and copying 12G of video and audio per hour of footage is infeasible, so I'd probably want to compress it. DVD compression fits 2 hours into 4.3G. I've seen good results on Hollywood movies being distributed on BitTorrent, encoded as 3ivx, divx, mpeg4, and VCD. For example, a perfectly watchable 2 VCD Bourne Supremacy runs for 108m and takes 1.5G. A Spanish language version of Scooby Doo 2 is only 750M as a DivX AVI. My 60 hours of video could take anywhere from 45G to 720G. If I had as many hours of video as I do of audio, I'd need between 650G and 10T of disk.
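The range above can be reproduced with a little arithmetic. The Python sketch below is only a rough check: it takes the raw, DVD, and VCD figures from the paragraph, and it assumes the 750M DivX file covers about one hour, which is what the low-end estimate implies but the post does not state. The names `GB_PER_HOUR` and `estimate_range` are hypothetical.

```python
# Gigabytes per hour of video for each format mentioned above.
GB_PER_HOUR = {
    "raw iMovie import": 12.0,            # stated: 12G per hour
    "DVD": 4.3 / 2,                       # stated: 2 hours in 4.3G
    "2-disc VCD": 1.5 / (108 / 60),       # stated: 108 minutes in 1.5G
    "DivX AVI": 0.75,                     # assumption: 750M for about one hour
}


def estimate_range(hours):
    """Return (smallest, largest) archive size in gigabytes for `hours` of video."""
    sizes = [hours * rate for rate in GB_PER_HOUR.values()]
    return min(sizes), max(sizes)


if __name__ == "__main__":
    # 60 hours of tape, then 850 hours (the size of the audio collection).
    for hours in (60, 850):
        low, high = estimate_range(hours)
        print(f"{hours} hours: {low:.0f}G to {high:.0f}G")
```

With those inputs the 60-hour case lands on 45G to 720G, and the 850-hour case on roughly 640G to 10,200G, in line with the rounded "650G and 10T" in the text.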

Video has other issues. To encode the video takes time, varying from a few minutes (with no compression) to longer than the playback time depending on the codec and the speed of my machine. I can only import from my video camera in real time, which means it takes an hour to import each hour of video. I can't get around that with a faster machine. And, finally, the state of metadata for movies is crap. QuickTime has some metadata, AVI may but I haven't seen any in the field, and nothing compares to the richness of metadata in audio. Good luck describing everything that happens in an hour-long movie in a filename.

Why care about metadata? For search. If I want to find the email paging me home because my grandpa died, I can do it: mmap 'print if /\bted.*died\b/i' email/*. If I want to find the MP3 of Bela Fleck playing Eager and Anxious, iTunes makes that easy and even compensates for my lack of consistency with e-acute. If I want to find a photo of my son and his grandfather, it's harder--is the grandfather "Barry", "Grandpa", or even "Gwumpa"? Is William "William", "the boy", or even "Wawoh"? Descriptions and filenames aren't as rigorously encoded as ID3 tags in my house, and that's saying something. For video, it's even worse.
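The mmap command in that example is the author's own tool, and the post doesn't show how it works. As a stand-in, here is a minimal Python sketch of the same idea: run the same regular expression over every file in a directory and print the matching lines. The function name `search_mail` and the one-message-per-file layout are assumptions made for illustration.

```python
import re
from pathlib import Path

# Same expression as the one-liner above, with /i expressed as IGNORECASE.
PATTERN = re.compile(r"\bted.*died\b", re.IGNORECASE)


def search_mail(directory, pattern=PATTERN):
    """Yield (path, line_number, line) for every line that matches pattern."""
    for path in sorted(Path(directory).iterdir()):
        if not path.is_file():
            continue
        # errors="replace" keeps odd encodings in old mail from stopping the scan.
        with path.open(encoding="utf-8", errors="replace") as handle:
            for number, line in enumerate(handle, start=1):
                if pattern.search(line):
                    yield path, number, line.rstrip("\n")


if __name__ == "__main__":
    mail_dir = Path("email")  # hypothetical location, mirroring email/* above
    if mail_dir.is_dir():
        for path, number, line in search_mail(mail_dir):
            print(f"{path}:{number}: {line}")
```

Nothing comparable works on a photo or a video file unless someone has already typed the description in, which is the point of the paragraph.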

All this was triggered by a great set of numbers I found in Jim Gray's address on winning the Turing Award. I clipped it for your pleasure:

Text gives me more content, faster, than audio and video. Performance is what you lose in text, and I'm not suggesting that music and movies be reduced to scripts and scores. But for the technical content that we float around the web, it's the rare event that features performance. Mostly it's the text you care about. Maciej's Audioblogging Manifesto ruthlessly pounds this point home. If you lock up content in audio, video, or even pictures, you lose the ability to search, skim, select, and spread.

There are some promising signs, like searching audio by phonemes and building metadata into the upload process. Jon Udell has been looking into the select and spread problem. For the most part, though, recording a lecture or conversation locks up the information. I can skim text, I can auto-summarize text, I can search text, I can trivially copy and paste text. I can do none of these with audio and video.

What's worse is how immersive the environments are. I can leave a keynote running behind me, perking up whenever something interesting is said, but it's distracting. I tried listening to Jon Stewart on C-SPAN and it was impossible to keep in the background. If it's any good, you can't be programming or writing email. And once you're listening along, you are trapped in real time: it'll take 30 minutes to listen to 30 minutes of audio. That's why audio works really well for the artificially confined: if you have a long commute, you can drive and listen to Jon Stewart at the same time. But if you're at a desk working, like most of us are for most of the day, the immersive nature of audio and video (TV is the worst--look how it dominates a room and inhibits conversation!) makes multitasking difficult.

The only solution I see is to extract the best metadata for the talk: the text of what was said. So consider this a call to arms. Text is what the web works best at. Transcribe your recordings of keynotes. Extract audio from video recordings of talks and offer it as a separate download. Make it easy for us to consume quickly.

On behalf of speed-readers and people without a commute everywhere, I thank you.

--Nat

Nathan Torkington is conference planner for the Open Source Convention, OSCON Europe, and other O'Reilly conferences. He was project manager for Perl 6, is on the board of The Perl Foundation, and is a frequent speaker on open source topics. He cowrote the bestselling Perl Cookbook.




Weblog authors are solely responsible for the content and accuracy of their weblogs, including opinions they express, and O'Reilly Media, Inc., disclaims any and all liability for that content, its accuracy, and opinions it may contain.

This work is licensed under a Creative Commons License.


