What does a modern Java media library need to be? Part 2 of this series looks at the ideas and capabilities of the current crop, including Java Media Framework, QuickTime for Java, IBM’s MPEG-4 Toolkit, and more. Part 1 set up the premise that the read/write nature of Web 2.0 requires media libraries to offer creative capabilities and not just playback, and argued that Desktop Java needs to offer more ambitious functionality to avoid being rendered irrelevant by webapps, Ajax, and Flash. Part 3 will discuss designs for an ideal Java media library and try to figure out who’ll pay for it.

So, I spent a fair amount of time yesterday and today porting a C example of a QuickTime tween to a QuickTime for Java equivalent. I’d had trouble getting the tween tutorial working in Java, so I thought that porting a complete working app was the best option, just to prove it works.

It’s a lot of work getting your head around some of these deeper, darker parts of QuickTime. Not only is it harder to find documentation and sample code for them, but they often lack convenient all-in-one API calls, requiring you instead to build up structures of QuickTime “atoms” by yourself. If you’ve read QuickTime for Java: A Developer’s Notebook, you might recognize some of this AtomContainer.insertChild() and Track.setInputMap() stuff from the material on video effects in chapter 9. It’s hard to understand some of the jargon and the abstract concepts behind it.

But it’s a hell of a lot better than doing without!

Comparing the major Java media frameworks available today (and acknowledging up front that none are suitable to address the ambitions I posed in Part I), there’s a striking difference between QTJ and the usual default, Sun’s Java Media Framework. I often find it difficult to express how profound the difference is, especially since the casual reader would see little difference in simple examples, except that QTJ tends to require more imports and is less approachable (consider the sample code I wrote for the Wikipedia entries on QTJ and JMF).

So let me try to explain this difference by way of analogy. Imagine you have grown up in some hopelessly backwards, soul-crushing environment. Indiana, perhaps. For your entire young life, the only music you have been exposed to is the various installments of the Kidz Bop series of CDs. Then, through an event of remarkable fortune and providence, you come across the complete recorded works of The Beatles, The Rolling Stones, The Kinks, and The Who.

This is how I felt when I started to really get QuickTime.

Part of it, for me, comes from my television production background. Looking at the APIs, I have a wonderful sense that the original designers, and their successors, understand the needs of media application developers. It’s not just the APIs, but also the crucial abstractions behind them, which are essential to getting anywhere in this problem domain.

I’ll give you another example. One thing I do to keep my mind in the production game, and to indulge my love of anime, is a podcast called The Annotated Alchemist, which is all about the anime series Fullmetal Alchemist. I did the first episode in Final Cut Express, the next 15 or so in GarageBand, and recently switched to Soundtrack. One thing I’m doing is adding a music or ambient “bed” under the longer stretches of my talking, since just the sound of my voice must get pretty grating after a while, and there’s only so much I can do with TV excerpts and segment stings to break it up. Here’s a look at a typical episode:

[Figure: The Soundtrack timeline for a typical episode of The Annotated Alchemist]

The thing to notice in the timeline is how I mix multiple tracks for the show: ANNC is my vocal track; “TV Audio” is AIFF audio captured from DVD; the two music tracks hold the music or ambient beds (I use two tracks in case I want to cross-fade, like when I’m going from serious to funny); and the effects are mostly stingers (like a telephone ring to kick off the feedback segment). If you look at the envelopes, you can see how precisely I use them to duck audio in and out and to cross-fade things. I want this level of manual control because sometimes, with the TV show audio I use, I need to get out quickly from a line of dialogue (so that, for example, I don’t pick up a line that starts right after it), and other times I’ll let it linger under my own vocal track if there’s good NATSOT (“natural sound on tape”) or if it has its own good music bed.
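In code terms, the ducking and cross-fading I do with those envelopes boils down to a list of (time, level) breakpoints, with the level interpolated between neighboring points. Here’s a minimal sketch of that idea, my own illustration rather than anything from Soundtrack or QuickTime: it assumes linear interpolation, times in seconds, and decoded mono PCM samples as floats, and the names VolumeEnvelope, levelAt, and apply are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: a volume envelope as (time, level) breakpoints with
// linear interpolation between neighbors. Not a QuickTime or Soundtrack API.
class VolumeEnvelope {
    // One breakpoint: time in seconds, level from 0.0 (silent) to 1.0 (full).
    static final class Point {
        final double time;
        final double level;
        Point(double time, double level) { this.time = time; this.level = level; }
    }

    private final List<Point> points = new ArrayList<>();

    // Adds a breakpoint; breakpoints must be added in increasing time order.
    VolumeEnvelope add(double time, double level) {
        if (!points.isEmpty() && time <= points.get(points.size() - 1).time) {
            throw new IllegalArgumentException("breakpoints must be in increasing time order");
        }
        points.add(new Point(time, level));
        return this;
    }

    // Level at time t; held flat before the first and after the last breakpoint.
    double levelAt(double t) {
        if (points.isEmpty()) return 1.0; // no breakpoints: leave the audio at unity gain
        if (t <= points.get(0).time) return points.get(0).level;
        for (int i = 1; i < points.size(); i++) {
            Point a = points.get(i - 1);
            Point b = points.get(i);
            if (t <= b.time) {
                double fraction = (t - a.time) / (b.time - a.time);
                return a.level + fraction * (b.level - a.level);
            }
        }
        return points.get(points.size() - 1).level;
    }

    // Scales mono PCM samples in place by the envelope level at each sample's time.
    void apply(float[] samples, double sampleRate) {
        for (int i = 0; i < samples.length; i++) {
            samples[i] *= (float) levelAt(i / sampleRate);
        }
    }

    public static void main(String[] args) {
        // Duck the TV audio to 20% over half a second starting at 10 s,
        // then bring it back up to full level at 25 s.
        VolumeEnvelope tvAudio = new VolumeEnvelope()
                .add(10.0, 1.0).add(10.5, 0.2)
                .add(25.0, 0.2).add(25.5, 1.0);
        System.out.println("Level at 10.25 s: " + tvAudio.levelAt(10.25));
    }
}
```

The point is that the envelope is data sitting alongside the audio, not a change baked into it. A real mixer would walk the breakpoints incrementally instead of searching them for every sample, but the model is the same.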

Given how QuickTime works, I can easily tell you how this could be done in QuickTime. A QuickTime movie supports an arbitrary number of tracks; in this case they would all be audio. The audio levels (the green sections) could be imaged by getting to the compressed sample data, decompressing it to scratch (presumably in a separate thread), and then running an averaging routine appropriate to the zoom level. The envelopes are done with volume tweens, whereby the volume is interpolated between two values over the span between two moments in time (these tweens are represented by the points and the lines between them in the purple envelope views). The individual sound files can be scattered all over the disk or the network and QuickTime will find them, because it works with a system of media references, which by default free the developer from caring where any of the actual media resides.
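The “averaging routine appropriate to the zoom level” might look something like the following minimal sketch. It leaves QuickTime out entirely and assumes the samples have already been decompressed to mono floats in the range -1 to 1; each pixel column’s worth of samples is reduced to a peak and an RMS (root-mean-square) value. WaveformOverview and summarize are hypothetical names, and RMS is just one reasonable choice of average.

```java
// Hypothetical sketch of waveform "level" imaging: decoded mono PCM samples
// are split into one bucket per pixel column, and each bucket is reduced to
// a peak and an RMS value. samplesPerPixel plays the role of the zoom level.
class WaveformOverview {
    // Per-column summary; both values lie in [0, 1] for samples in [-1, 1].
    static final class Column {
        final float peak;
        final float rms;
        Column(float peak, float rms) { this.peak = peak; this.rms = rms; }
    }

    static Column[] summarize(float[] samples, int samplesPerPixel) {
        if (samplesPerPixel <= 0) {
            throw new IllegalArgumentException("samplesPerPixel must be positive");
        }
        // Round up so a partial bucket at the end still gets a column.
        int columns = (samples.length + samplesPerPixel - 1) / samplesPerPixel;
        Column[] result = new Column[columns];
        for (int c = 0; c < columns; c++) {
            int start = c * samplesPerPixel;
            int end = Math.min(start + samplesPerPixel, samples.length);
            float peak = 0f;
            double sumOfSquares = 0.0;
            for (int i = start; i < end; i++) {
                peak = Math.max(peak, Math.abs(samples[i]));
                sumOfSquares += (double) samples[i] * samples[i];
            }
            result[c] = new Column(peak, (float) Math.sqrt(sumOfSquares / (end - start)));
        }
        return result;
    }

    public static void main(String[] args) {
        // One second of a 440 Hz tone at half amplitude, 44.1 kHz mono.
        int rate = 44_100;
        float[] tone = new float[rate];
        for (int i = 0; i < tone.length; i++) {
            tone[i] = (float) (0.5 * Math.sin(2 * Math.PI * 440 * i / rate));
        }
        // Zoomed out so the whole second fits in about 300 pixel columns.
        Column[] cols = summarize(tone, (tone.length + 299) / 300);
        System.out.printf("%d columns, first peak %.3f, first rms %.3f%n",
                cols.length, cols[0].peak, cols[0].rms);
    }
}
```

Zooming in just means calling summarize again with a smaller samplesPerPixel, which is why you’d want the decompressed scratch data kept around rather than redecoding for every redraw.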

Oh, and as a bonus, all the edits are non-destructive. If I discover one of my edits has cut off the beginning or end of a piece of audio, I can just grab the edge of the region and drag it out. That’s because media references are really just pointers to media, and it’s a simple matter to change a pointer to indicate that you want to use more (or less) of some source.
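To make the pointer idea concrete, here’s a hypothetical sketch of a timeline region as nothing more than a reference into source media: the source file is never touched, and trimming or extending the edit just moves the in and out points, clamped to the length of the source. The class and field names are my own invention, not QuickTime’s, and times are assumed to be in seconds.

```java
// Hypothetical sketch: a timeline region that is only a reference (source
// location plus in/out points) into media that is never modified.
// Names are illustrative, not QuickTime's API. Times are in seconds.
class MediaReferenceDemo {
    static final class Region {
        final String source;          // where the media lives; never modified
        final double sourceDuration;  // total length of the referenced media
        final double timelineStart;   // where the region begins in the mix
        final double inPoint;         // first second of the source that is used
        final double outPoint;        // second of the source where use stops

        Region(String source, double sourceDuration, double timelineStart,
               double inPoint, double outPoint) {
            if (inPoint < 0 || outPoint > sourceDuration || inPoint >= outPoint) {
                throw new IllegalArgumentException("in/out points must lie within the source");
            }
            this.source = source;
            this.sourceDuration = sourceDuration;
            this.timelineStart = timelineStart;
            this.inPoint = inPoint;
            this.outPoint = outPoint;
        }

        double length() { return outPoint - inPoint; }

        // Drags the left edge: positive delta reveals earlier source material
        // (clamped at the start of the source), negative delta trims it.
        // The region shifts on the timeline so already-placed material stays put.
        Region extendStart(double delta) {
            double newIn = Math.max(0.0, inPoint - delta);
            double shift = inPoint - newIn;
            return new Region(source, sourceDuration, timelineStart - shift, newIn, outPoint);
        }

        // Drags the right edge: positive delta reveals later source material
        // (clamped at the end of the source), negative delta trims it.
        Region extendEnd(double delta) {
            double newOut = Math.min(sourceDuration, outPoint + delta);
            return new Region(source, sourceDuration, timelineStart, inPoint, newOut);
        }
    }

    public static void main(String[] args) {
        // 12 s of captured TV audio, of which only 3.0 s to 7.5 s is in the mix.
        Region clip = new Region("tv-audio.aiff", 12.0, 60.0, 3.0, 7.5);
        // The edit cut off the end of a line, so drag the right edge out by a second.
        Region fixed = clip.extendEnd(1.0);
        System.out.println(clip.length() + " s -> " + fixed.length()
                + " s, still pointing at " + fixed.source);
    }
}
```

Each drag returns a new Region and leaves the old one alone, so undo is just a matter of keeping the previous reference around.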

So that’s QuickTime and equivalent media libraries (Soundtrack may well be written with Core Audio, to be honest, but the same concepts apply). And most of QuickTime is exposed to QuickTime for Java, so if anyone’s up for writing this kind of podcast editor in QTJ, e-mail me. I also think that within five years, we’re going to be doing this kind of thing, collaboratively, on the web (Odeo, which lets you record and distribute podcasts in a web interface, is already part way there).

But if you wanted to do this with JMF… I honestly don’t think it’s possible. All the abstractions I talked about — pointer-like media references, tweens, arrangement of tracks — they don’t exist in JMF. In fact, JMF doesn’t have a useful abstraction of media in a stopped state. To JMF, media is something you play, and optionally “process” (e.g., transcode to another format), and that’s about it. The library offers no deep knowledge of media, and no support for the kinds of tasks needed to create and distribute media.

“Rip, mix, burn” is a great metaphor for what people want to do with their media in the Web 2.0 era. And when Apple says “rip, mix, burn”, what you should hear in the abstract is “capture, edit, export”. JMF’s biggest problem is that it can barely do the first (and pretty much only on Windows), and can’t do the second at all.

Many people will tell you that JMF’s key failing is its poor support for codecs in real-world use. Granted, its list of supported formats was mediocre in 1999 and is a complete joke today. But I don’t think that throwing a bunch of codecs at JMF will fix it. Consider: in Part I, I discussed how Flash has come to dominate cross-platform media playback on the web. Well, Flash supports a hell of a lot fewer media formats than JMF does! Flash can’t be used as a general-purpose media player, because what it supports is a grab-bag of semi-proprietary formats, such as the FLV container carrying Sorenson Spark video (an H.263 derivative) or On2’s VP6. Yet it is these formats, wrapped by Flash, that are taking over the web.

And here’s a little irony to consider: Flash is taking over web video with H.263 knock-offs… and JMF offered all-Java H.263 playback in 1999! So, dot-com crash notwithstanding, why wasn’t YouTube delivered as an applet seven years ago?! I’d submit to you that, once again, distribution problems are the deal-killer for client-side Java.

So JMF’s problems aren’t really about its poor media support (it doesn’t help, mind you, but that’s simply not the worst of its problems in my mind). Its most obvious problem is its abandonware status, with an API that has not been further developed since late 1999, only minor bug-fix maintenance since then, and even a web page that has not been updated in over two years. Nobody who needs to do significant media work can take JMF or Sun seriously at this point, given the pitiful state JMF is in.

But even with a huge influx of attention, I don’t think JMF would succeed. I think JMF suffers badly from what I’ll call “concretism”, a lack of abstract thinking in its designs. JMF sees the media world only in the form of fully-produced, ready-to-play media files and streams. It can’t tolerate the states that media would be in during the production process: references to many external files, the use of effects and tween tracks, extraneous data (capture or other source data of which the user only needs some parts), etc.

And, since my premise in Part I is that the era of the read/write web will require media libraries that are optimized for creative tasks, JMF is completely useless for the kinds of media tasks we’ll want to perform over the next few years.

As much as I’ve praised QTJ in this article (and given that I’ve written a book on the topic), you might think I’m going to suggest it is the media library of the future. It is most certainly not. While I enjoy the deep power of QTJ, I’d be blind not to see its limitations. First and most importantly, it only runs on Mac and Windows, and since it is a wrapper around the proprietary, native QuickTime library, it can never exist on any other platform until and unless Apple ports QuickTime to said platform (which isn’t going to happen).

There’s also the matter of Apple’s disinterest in QTJ, which comes from its curious origins. QTJ was pulled together in the late ’90s, when it briefly looked like desktop developers were going to bolt en masse to Java. Wanting to protect one of its crown jewels, Apple intended QTJ to be used by existing QuickTime developers who were moving to Java. Apple apparently never intended to deal with the opposite migration: Java developers adopting QuickTime as a media library. This is one reason it’s so hard to get into QTJ: the docs are all written for someone who knows and has already used QuickTime’s various abstractions in C. It doesn’t help that there’s a lot of irrelevant legacy stuff in QTJ’s APIs that gets in the way of finding the “good stuff”. At any rate, the desktop migration to Java stalled for a number of reasons (performance and appearance challenges, high-profile failures like the ports of Corel Office and Netscape Navigator to Java, etc.), and Apple’s original strategic rationale for QTJ faded with time.

Apple gives QTJ very little attention these days, though you can still get support with ADC incidents, they still fix bad bugs, and it still picks up functionality from the ongoing development of the underlying QuickTime platform. But QTJ depends heavily on deprecated APIs such as the Sound Manager and QuickDraw, and with no obvious effort underway to remove those dependencies, the writing is pretty much on the wall for this library: Apple has destined QTJ for obsolescence over the next few years.

If anything, Apple seems to want to get developers off the straight-C QuickTime API and onto QTKit, the Cocoa-friendly Objective-C API for making QuickTime calls. QTKit is small now, but Apple’s evangelism can only mean that it will become far more prominent in Leopard and beyond. One problem, of course, is that while the straight-C API could be ported between Mac and Windows with a little work, QTKit is Mac-only (there’s also a QuickTime COM API that I know little about, but which has some of the same goals as QTKit, offering an object-oriented view of QuickTime to C++ developers on Windows). Either way, there’s not much prospect for a Java wrapper approach around QuickTime anymore, there’ll be even less in the future, and it still only works on two platforms anyway.

Let me note a few other Java media libraries in passing. The IBM Toolkit for MPEG-4 is interesting because it provides MPEG-4 audio and video playback in an all-Java form with remarkably good performance. I can play MPEG-4 home movies with the toolkit and, looking at Activity Monitor, the hit on the CPU is pretty modest, as seen here on a two-year-old dual 1.8 GHz Power Mac G5, using less than 30% of the CPU’s capacity:

[Figure: Activity Monitor showing CPU usage while the IBM toolkit plays an MPEG-4 movie]

It’s a given in many circles that decoding and playing compressed media in Java would be a performance nightmare that would peg the CPU at 100% and drop frames. The IBM toolkit proves that’s simply not true. Whether it’s Moore’s Law speeding up CPUs or better JVMs speeding up Java performance, there’s really no need to be afraid of an all-Java media engine anymore.

IBM’s toolkit is also interesting because while it doesn’t offer an API for producing MPEG-4 content (it can transcode into MPEG-4, but doesn’t have the various creative abilities of QuickTime or similar libraries), it supports more of the MPEG-4 spec than you usually see, including XMT-Ω, a SMIL-like markup for describing the layout of multiple elements of an MPEG-4 scene (most implementations, like QuickTime, only support a single video rectangle covering the entire window).

Aside from its lack of support for capture and editing, the other big downside of IBM’s toolkit is that it’s a proprietary technology with a fairly expensive license, particularly considering that for a US$7,000 unlimited distribution license, “the technology is licensed as is, unsupported, and without any warranty”.

One other approach to consider in this roundup of Java media libraries is a number of projects to put Java wrappers around open-source media player libraries. Some of the best known include ffmpeg wrappers like Jffmpeg (Windows/Linux) and Fobs4JMF (Windows/Linux/Mac). There’s also an up-and-coming JMF replacement called FMJ, which aims to support more formats and codecs than JMF does. A new project in this space is JVLC (Windows/Linux), which seeks to put a Java wrapper around the play-anything VLC library.

These projects could quite reasonably be expected to achieve what JMF didn’t: provide playback support for a wide range of real-world media formats to Java applications. Obviously, my argument above is that playback alone isn’t nearly interesting or useful enough, and I think the native dependencies will be a severe limitation, but I don’t want to begrudge these projects for picking up the ball after Sun so egregiously dropped it. For some apps, playback of a specific format is enough, and these libraries may enable a number of applications to stick with Java where they might otherwise have had to go native.

Still, I think the future of Java media needs to be all-Java, and needs to support creative “-ilities”. In Part III, I’ll look at how to get there, and consider who’s going to pay for it.