So, it’s been just over a week since my BoF on Java Media. But let’s start with some context about Desktop Java as a whole…

The big news out of JavaOne is JavaFX, a seemingly radical desktop initiative that has drawn cautious praise and scathing derision. Now that we’ve had a week to make sense of it, we’re seeing more sensible analyses, as well as first steps towards understanding and using it.

My first inclination is to be delighted with JavaFX, first and foremost because it suggests that Sun has correctly surmised the true nature of Apollo and Flex (nuclear-tipped missiles lobbed towards the gray-and-purple parts of Santa Clara) and has bravely decided to take the fight back to Adobe, before the growing Flash ubiquity can finish off Desktop Java and start selling Flex servers to EE shops. Now let’s see if they really have the wherewithal to stay in the fight for more than six months, or to do more than issue press releases about how great their stuff is.

But think about this: when did Flash become ubiquitous, and when did we all stop hating it? Just two years ago, Flash was largely known for being the technology behind all the goddamned annoying interactive banner ads: Club the baby panda and win a PlayStation 3!!! There were good uses of the technology of course, but it certainly feels more respectable today than it did then. And that’s funny, if you think about it, because the rise of Flash’s acceptance as a rich client-side technology runs concurrent with the rise of Ajax — if Flash was loathed then, and faced a challenge from the hip and trendy Ajax, then what made it succeed?

Oh that’s right, people like media

Obviously, video happened. More specifically, YouTube happened. It legitimized Flash and made it palatable for many other kinds of applications, even those that don’t use multimedia.

As Bryan Young points out in JavaFX in Perspective, JavaFX isn’t strictly a competitor to Flash, because it doesn’t do the same things, multimedia being one of the most egregious examples: “multimedia support has always been a weakness in Java. JavaFX is the neon sign pointing out one of Sun’s often ignored weaknesses.”

A Flash presentation isn’t that much different from a Java applet, except that:

  1. It uses a different runtime
  2. It loads a lot faster
  3. It has rich multimedia support
  4. People like it

I’m going to take 1 as irrelevant to our discussion — if anything, Java bytecode should be faster than interpreted ActionScript — and 2 is getting genuine attention and help. Assuming that 4 is the goal state, where does that leave number 3, multimedia support, in the Java world?

That’s kind of my old warhorse, of course, as you might remember my four-part three-act bloggage on the matter (parts 1, 2, 3, and followup). So I went to JavaOne particularly interested in the state of multimedia support for the platform, in the desktop and set-top configurations. In fact, the BoF proposal that I sent in, thinking it was far too hostile and ambitious to get approved, got approved.

Spoiler Alert: the fact that an outside critic like me was speaking on multimedia and not, you know, anybody from Sun, is a pretty clear hint that Java media is not going anywhere anytime soon.

BOF-0904

So, on Wednesday night, I offered up 35 minutes of my crazy ideas in a BoF called “Java SE Media: Take 2”. I’ve uploaded my slides to java.net (PDF, PPT), and re-recorded the audio and made a movie of the slides, compressing it all into an MPEG-4 (H.264 video, tested with QuickTime Player and VLC). The funny thing is, I don’t remember speaking to some of these slides; maybe it was the heat of the moment, or maybe my clicker slipped. But here, in summary, is the gist of what I presented:

Media support in Java is dire; everybody knows it, and a surprising number of people want it fixed. But what’s the point? There’s no value in throwing money into APIs whose only purpose is to let current Java developers “screw around” with them; we need to leverage Java’s strengths to the degree that people who are developing media applications will choose Java because they perceive it to be the best option. So, is there an argument for doing media apps in Java, if the libraries can be brought up to snuff? I offered three:

  1. Java is well suited to a certain level of complexity, greater than that of simple players, but not as sophisticated as high-end tools (Final Cut Pro, Avid, etc.). There are a lot of apps (screencast makers, podcast producers, capture-and-go uploaders, transcoders) for which the scripting languages and their APIs are inadequate, but for which the giant APIs (QuickTime, etc.) and the star-C-star languages are more hassle and power than we need.

  2. If we see Web 2.0 as “the read/write web”, then it is inevitable that we will want to embed media creation and manipulation features in web pages. Web 2.0 currently facilitates user-contributed data and text, and higher-bandwidth data like audio and video is sure to follow. Several sites, like Elf Yourself and Odeo, are already starting to do this with Flash. Java, being cross-platform and web-embeddable, could have much to offer in this space.

  3. An emerging Flash hegemony in web multimedia is counter to the interests of many parties, including backers of other formats (Microsoft, Apple, the MPEG licensing authority), and possibly Google, whose need to amass and catalog information is not served by a ubiquitous, proprietary format.

If we buy that there is a good reason for Java to get into this space, then there’s the question of what a multimedia library should provide, and how to get there. Going back to the “read/write web” point, I reiterated that I think any such library must support capture, editing, and sample-level access. Without capture, you don’t get podcast creation, video chat, capture-based games (à la Karaoke Revolution or the PS2’s EyeToy games), or barcode reading. Without editing or effects, there’s almost nothing practical you can do with captured data. And without sample-level access, I’ll still get a question every week on the java.net forums about making movies from a series of images, usually a JOGL animation. Writability, to me, is the most important thing to get right; as we’ll see later, almost nobody agrees with me on this point.
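To make “sample-level access” concrete, here is a minimal sketch using the one place Java SE already offers it: uncompressed audio via Java Sound (`javax.sound.sampled`). It computes every sample of a sine tone by hand and wraps the result as WAV data in memory. The class name `ToneWriter`, its method names, and the 440 Hz tone are made up for illustration; nothing comparable for video frames is assumed to exist.

```java
import javax.sound.sampled.AudioFileFormat;
import javax.sound.sampled.AudioFormat;
import javax.sound.sampled.AudioInputStream;
import javax.sound.sampled.AudioSystem;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;

// Hypothetical helper, for illustration only.
class ToneWriter {
    static final float SAMPLE_RATE = 44100f;

    /** Builds 16-bit signed, little-endian, mono PCM for a sine tone at half volume. */
    static byte[] sinePcm(double hz, double seconds) {
        int frames = (int) Math.round(SAMPLE_RATE * seconds);
        byte[] pcm = new byte[frames * 2];
        for (int i = 0; i < frames; i++) {
            double angle = 2 * Math.PI * hz * i / SAMPLE_RATE;
            short sample = (short) Math.round(Math.sin(angle) * 0.5 * Short.MAX_VALUE);
            pcm[2 * i] = (byte) (sample & 0xff);            // low byte first
            pcm[2 * i + 1] = (byte) ((sample >> 8) & 0xff); // then high byte
        }
        return pcm;
    }

    /** Wraps raw PCM in a WAV container using Java Sound's file writer. */
    static byte[] toWav(byte[] pcm) throws IOException {
        AudioFormat format = new AudioFormat(SAMPLE_RATE, 16, 1, true, false);
        long frameCount = pcm.length / format.getFrameSize();
        AudioInputStream stream =
                new AudioInputStream(new ByteArrayInputStream(pcm), format, frameCount);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        AudioSystem.write(stream, AudioFileFormat.Type.WAVE, out);
        return out.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] wav = toWav(sinePcm(440.0, 1.0));
        System.out.println("Wrote " + wav.length + " bytes of WAV data");
    }
}
```

The “movie from a series of images” question is the video version of the same loop: produce each frame yourself, then hand the frames to something that can write a container.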

There are two approaches for developing such a library: write it all in Java, or wrap some native library. Each has much to recommend against it. For the Java library, you’re talking about a massive architecture, design, and coding effort, outside the core competencies of pretty much every major player in the Java world. There’s also a problem in that there is no cross-platform standard (à la TWAIN) for media capture devices. They are tightly coupled to the native media frameworks for which their manufacturers write drivers, and are paperweights on unsupported platforms. The USB standards for media devices (like the “USB video class” that some webcams implement) help somewhat, but nobody wants to depend on a single I/O bus, and thereby write off FireWire, PCI capture cards, etc. So an all-Java approach might entail launching a new JSR to create a Java spec for capture devices that any Java media library could then use. Theoretically, with JSR-80 (USB) support, vendors could ship all-Java drivers for USB capture devices. That said, this is a hard row to hoe.

The alternative is to put Java wrappers around native frameworks. The problem here is that the dominant library is different for every platform, so you have to write one binding for DirectShow on Windows (or whatever Microsoft’s media library is called this year), another for QuickTime on Mac, something else for Linux, etc. But at the Java Posse Roundup, Dick Wall made an interesting counter-proposal: build a Java wrapper around GStreamer, the open-source multimedia framework that is popular on Linux and has been ported to Windows and Mac. As Dick argues, this is not nearly as difficult as starting over on an all-Java solution, works on the major desktop platforms, and gets Java back in the game faster.
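The difference between the two wrapping strategies can be sketched in a few lines. Everything here is hypothetical (the `NativeBackends` class, the `Backend` enum, both method names), and no real native binding is called; the only point is how many bindings each strategy obliges somebody to write and maintain.

```java
import java.util.Locale;

// Hypothetical sketch: names are illustrative, no native code is involved.
class NativeBackends {
    enum Backend { DIRECTSHOW, QUICKTIME, GSTREAMER }

    /** Per-platform strategy: each OS family needs its own binding. */
    static Backend perPlatform(String osName) {
        String os = osName.toLowerCase(Locale.ROOT);
        if (os.contains("mac")) return Backend.QUICKTIME;
        if (os.contains("win")) return Backend.DIRECTSHOW;
        return Backend.GSTREAMER; // assumed fallback for Linux and everything else
    }

    /** Single-wrapper strategy: one GStreamer binding, whatever the OS. */
    static Backend singleWrapper(String osName) {
        return Backend.GSTREAMER;
    }

    public static void main(String[] args) {
        String os = System.getProperty("os.name");
        System.out.println(os + ": per-platform=" + perPlatform(os)
                + ", single-wrapper=" + singleWrapper(os));
    }
}
```

With the first method, every new platform means another branch and another binding behind it; with the second, the porting burden sits with the native framework rather than with the Java wrapper.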

Hey, wait a minute, you suck!

I left things open in my conclusions, and intentionally left a lot of time for Q&A, to get different people in front of the mic to offer their opinions, challenge my arguments, etc. Fortunately, I got exactly what I asked for, because the audience’s questions proved that they cared deeply about this issue, and had considered my presentation carefully.

They also thought that the idea of supporting capture and editing was batshit crazy, and by an overwhelming show of hands when I put it up to a poll, thought that just offering playback was plenty good for a Java media library. I’ve also heard through the back-channel that while higher-ups at Sun want to get back into media, they generally believe that playback-only functionality is sufficient.

But I am not giving up on this. I fully believe that creative multimedia capabilities are not only important, they are the only thing that can get Java back into this race. A “me too” approach to media support is going to accomplish absolutely nothing — app developers will look at Java and say “oh, so it doesn’t do as much as Flash, isn’t as well distributed, takes longer to start up, and is harder to maintain. Um, thanks anyways.”

A playback-only strategy isn’t just “me too”. It’s “me too, but not nearly as good,” and is doomed to failure.

OK, seriously, who would need this?

I’ve tried to show by theory that read/write media access inside webapps is an inevitable evolution of the Web 2.0 trend. But I don’t think that most people buy that. So let me give you an example of a practical application (my second in this blog, if you’re counting), that could exploit this kind of functionality in a unique way not possible with Ajax, Flash, or even native apps.

Consider the humble interview podcast. If the participants are not co-located, there are two options for mixing the sound, neither pleasant. Either the host uses an audio grabber like Audio Hijack Pro to capture the other participants over Skype (at typical low Skype voice-chat quality), or all the participants need to record themselves, then have the host collect these recordings and edit them together with a multi-track editing application. When the host is done, he or she needs to compress the mix, upload it to a server, and publish it. It is a tricky process that requires using many unrelated applications and services, and isn’t practical for the general public.

Here’s an alternative: we have a website with a Java-based chat applet (perhaps using the Mobicents guys’ SIP support). This is already a compelling application, because it gets us multi-party chat without downloading and installing a native app, meaning it can be used from public terminals, unprivileged logins, etc. But we’re just getting started. Let’s imagine this chat is destined to live forever as a podcast. The chat GUI, if used by our friends at the Java Posse, might look a little something like this:

[Image: relena-mockup.png]

The applet records each participant locally to get high sound quality, and at the end of the chat, it uploads the file to the server, which knows exactly where to put it, and offers the host some editing functionality to remove dead space, adjust volumes, cut profanity, etc. The mix and MP3 transcode can happen on the server side, and without ever leaving the page, the host can bang in some show notes and send out the podcast.
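The editing operations mentioned above are not exotic once you can get at the samples. As a minimal sketch, assuming each recording has already been decoded to 16-bit PCM in a `short[]`, here is what “remove dead space” and “adjust volumes” might boil down to. The `PodcastEdits` class, its method names, and the fixed amplitude threshold are all illustrative assumptions; a real editor would do something smarter than trimming only the ends of a clip.

```java
import java.util.Arrays;

// Hypothetical helper operating on decoded 16-bit PCM samples.
class PodcastEdits {
    /** Drops leading and trailing samples whose magnitude is below the threshold. */
    static short[] trimSilence(short[] samples, int threshold) {
        int start = 0;
        while (start < samples.length && Math.abs(samples[start]) < threshold) {
            start++;
        }
        int end = samples.length;
        while (end > start && Math.abs(samples[end - 1]) < threshold) {
            end--;
        }
        return Arrays.copyOfRange(samples, start, end);
    }

    /** Scales every sample by the gain, clamping to the 16-bit range instead of wrapping. */
    static short[] applyGain(short[] samples, double gain) {
        short[] out = new short[samples.length];
        for (int i = 0; i < samples.length; i++) {
            long scaled = Math.round(samples[i] * gain);
            out[i] = (short) Math.max(Short.MIN_VALUE, Math.min(Short.MAX_VALUE, scaled));
        }
        return out;
    }

    public static void main(String[] args) {
        short[] clip = {0, 1, -2, 500, 0, -700, 3, 0};
        short[] edited = applyGain(trimSilence(clip, 10), 2.0);
        System.out.println(Arrays.toString(edited));
    }
}
```

Whether code like this runs in the applet or on the server is a design choice; the point is that it needs sample-level access somewhere.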

We’ve just replaced at least five native programs, some of which cost cash money. For all the hassle we’ve eliminated, we can offer this as a service aimed at podcasters, businesses, families, social groups, etc. Would you pay $15/month to combine Skype (conferencing), Audio Hijack Pro (capture), Soundtrack (audio editing), iTunes (tagging and compression), sftp (uploading), and Libsyn (hosting)? This makes podcast production available to a much wider audience, those who couldn’t afford or learn all the currently necessary tools. Oh, and did I mention that it runs on every platform? I bet I could find a lot of takers for this.

Death or glory

In fact, this reminds me of an idea I once had in this space. Instead of arguing and speculating about what Java media developers need, why not just take it directly to them? My idea went something like this: take all the money you’d spend on developers for a media library and instead skip directly to the applications, figuring that they’ll write their own libraries, and in so doing include the functionality that really matters. The rules for my scheme work like this:

  1. We (Sun, Google, some new foundation, whatever) fund your team
  2. You develop at least one Java Media application or service
  3. You open-source the libraries you create
  4. You’re allowed to collaborate with other teams
  5. We cut off the money after 18 months, so whatever you do had better be able to pay your salary by then

We’re dying for some entrepreneurial spirit in the Java world right now, and maybe us media types are just desperate/crazy/motivated enough to provide it. The fuddy-duddy Old Java Developer crowd can keep their zero-risk lifestyle of hiding behind requirements documents and never dealing with end-users; I’ll take the “death or glory” scenario any day of the week.

Seriously, though, with this kind of application-focused approach, we could cut through all the speculation and focus very directly on the questions of what about Java media has value and how do we realize it. The worst thing that could happen is that we’d end up with some semi-capable media libraries, which was the original idea anyways. In the best case, one or more of the apps/services succeeds and gives Java a big client-side boost.

One more crazy idea

One other idea I thought of on the plane on the way home. In the BoF, I presented two strategies for getting a capable Java media framework: writing it from scratch in Java, or wrapping some native library, probably GStreamer. But what about a radical third option: what about porting GStreamer to Java? We’d get all the advantages of a pure Java library, without having to re-architect and rewrite from scratch.

Obviously, this has a high potential to piss off a lot of people (potentially the GStreamer developers, whom we should like and admire), as it is effectively a big-ass one-time-only fork. But on the other hand, the benefits could be immense. We’d have a super-capable media library that wouldn’t be dependent on a native implementation. Any Java-GStreamer app could be provided as an applet or Web Start without having to install and maintain native libraries; the all-Java version would also be easier to maintain once the Java Module System comes together.

There’s also the matter of Java’s reach being somewhat broader than GStreamer’s. As phones seem to be moving towards beefier ME support, even CDC support in the case of the SavaJe, it’s entirely possible we’ll soon see a Java SE phone become popular. What if such a phone (or other consumer device) had SE, but didn’t offer a means of writing and installing native libraries? Wrapping a native library wouldn’t work in this case, and going all-Java would be the only practical approach.

Pause for effect

I had intended to cover some of the sessions and implications from the consumer-oriented JavaOne “TV Day” in this blog, but this idea dump is already long and dense, so I’ll defer that stuff to a later blog. A teaser: I’m less pissed at Blu-ray than I was last year, but the prospects for developing BD-J apps and the platform’s viability and usefulness are kind of murky at the moment.

Anyways, surely I’ve said something about Java Media in this blog to either annoy you or prove to you that I’m a complete jackass, so do make use of the comment section below…