[Editor's Note: Once again, we've invited Danger, Inc., sound designer Peter Drescher to predict the future of mobile audio. We think you'll agree that his latest prophecies are both tantalizing and frighteningly plausible. This essay is based on his October 2007 Audio Engineering Society presentation "Game Audio for Broadband Phones."]
I recently got an iPhone, in part because it reminds me so much of the Star Trek data PADD, a fictional technology consisting of a thin slab of glass and plastic, which is held in one hand and tapped on with the other (see Figure 1).
Fig. 1: The iPhone is practically a prototype of the Star Trek PADD.
The PADD (Personal Access Display Device) rendered any kind of information, in a variety of formats, via a subspace connection to the central computer. The interface was gestural, multi-touch, and self-configuring. Of course, there was no keyboard and, in fact, no typing was ever required. Entering text by hand is pointless when perfectly accurate voice recognition and transcription services are built into the device.
While the iPhone may not be up to 24th-century standards, the technology is obviously heading in that direction, and I love it when science fiction invents reality. However, the laws of physics currently prevent even the most futuristic phones from generating loud, high-fidelity audio. Despite sophisticated techniques and materials, no cell phone speakers will ever be good enough for anything except producing annoying ringtones.
This is because sound production is all about pushing air, and the tiny bits of vibrating metal and plastic that pass for loudspeakers in mobile devices are incapable of pushing very much of it. Nor can they produce frequencies below about 200 Hz, meaning no bass or kick drum in your music mix, nor engine rumble or explosive thud in your game soundtrack. Think of it this way: a speaker the size of a dime simply cannot produce a six-foot-long wave.
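For the curious, the "six-foot" figure is simple arithmetic: wavelength equals the speed of sound divided by frequency. Here's a minimal sketch, assuming sound travels at roughly 343 m/s in room-temperature air; the function and constant names are mine, made up for illustration.

```python
# Rough wavelength arithmetic: wavelength = speed of sound / frequency.
# Assumption: 343 m/s, the usual figure for dry air at about 20 degrees C.
SPEED_OF_SOUND_M_S = 343.0
METERS_PER_FOOT = 0.3048


def wavelength_feet(frequency_hz: float) -> float:
    """Return the wavelength in feet of a tone at the given frequency."""
    if frequency_hz <= 0:
        raise ValueError("frequency must be positive")
    wavelength_m = SPEED_OF_SOUND_M_S / frequency_hz
    return wavelength_m / METERS_PER_FOOT


if __name__ == "__main__":
    # 200 Hz is the approximate low-end limit mentioned in the text.
    for freq in (60, 200, 1000):
        print(f"{freq} Hz -> {wavelength_feet(freq):.1f} ft")
```

At 200 Hz the wave comes out a bit under six feet long, and a 60 Hz bass note stretches to nearly 19 feet, which is why a dime-sized driver doesn't stand a chance.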
Headphones are great when you want to block out the world and listen to music. But the problem with headphones is that they are exclusive, insulating, and antisocial. That's fine for many situations, but people also like to share their music with their friends. Sure, you can send them a link or put the tune on your MySpace page, but it's much more fun to play it for them in person and see their reaction. One of the reasons portable multimedia devices are so popular is that they provide a neat solution to a social problem that everybody has, namely: "how do I show off how cool I am?"
To paraphrase H.L. Mencken, nobody ever went broke overestimating the vanity of the American public, and broadband phones are the perfect attention-getters. Here, look at pictures of my dog, isn't he cute? Hey, listen to these grooves I downloaded from iTunes! C'mere, check out this Weird Al Yankovic video on YouTube! [Note the O’Reilly shoutout at 1:12. —Ed.]
But there is an obvious problem with sharing media like this, which is: You Can't Hear for Squat on cell phone speakers. To really grab your attention, the audio has to be loud enough to annoy the living crap out of the guy sitting next to you. Lucky for him, trying to produce really annoying volume levels using tiny speakers is usually an exercise in futility. Some might consider this a good thing for polite society, yet the desire to share our noise remains.
Fig. 2: Back to mono(nucleosis).
That's lovely and romantic if you want to get your head next to some hot girl, but it's not something you want to do with everyone. There's a television commercial featuring a big sweaty meathead jock, at the gym, pumping iron, endorphin grin on his face, a wild look in his eye, yelling at the camera, "Hey dude! Check out this groove I just downloaded, it totally rawks!" Then he pulls the sweaty earbuds out of his head and moves to stick them into yours — and I'm thinking, "Dude, I use my ears for a living. Get those disgusting things away from me!"
Sharing earbuds does not solve the problem of how to make a cell phone act more like a boombox. Some manufacturers try to make the phones louder by increasing the speaker size. Others install two speakers for stereo playback, though given the palm-width speaker separation, you don't get true stereo, you just get double loud. Some models apply "3D" audio processing to provide a (slightly) increased sense of presence and space.
But it's all ultimately pointless, because if you really want to listen to your music, movies, or game soundtracks, you're gonna have to use...
There are about 18 different kinds of these things. There are the "instantly twisted, horribly uncomfortable, proprietary plug" headsets that can be used only with specific models. I always figured this was the manufacturer's way of telling its customers: "Do not listen to this device using headphones." (See Figure 3.)
Fig. 3: Proprietary plug headsets are more trouble than they're worth.
Then there are the 2.5mm (3/32-inch) "miniplug mono headsets," like those dorky Plantronics things telemarketers wear. I remember the first time I saw one of these: I looked at the plug, saw three conductors, and thought, "Uh, stereo?" No, that's mono plus mic, but it looks just like a stereo miniplug.
Fig. 4: The mono headset connector looks exactly like a stereo mini-plug, but don't try using it for music playback.
Ironically, some music phones are equipped with 2.5mm minijacks, so you have to use an adapter if you want to listen to them using regular headphones, like those in Figure 5.
Fig. 5: The standard Walkman-style, 3.5mm stereo miniphone plug is found on everything from iPod headphones to high-end studio cans, but implemented inconsistently on cell phones.
Further, the plug on the 2.5mm-to-3.5mm RadioShack adapter in Figure 6 looks exactly like a normal headphone plug — three conductors — but it doesn't work, because you need four conductors, for left, right, ground, and mic. The gray adapter in Figure 6, sold by T-Mobile, will route the audio correctly for a T-Mobile Sidekick, but it may not work on other cell phones, because there's no standard for this stuff.
Fig. 6: Adapters are ultimately pointless for mobile devices.
The iPhone, for example, uses a four-conductor jack that is compatible with standard, three-conductor headphone plugs (you use the mic on the iPhone itself), but the jack housing is so recessed into the case that standard plugs don't fit in without surgery.
There are adapters to solve that, too, but it doesn't really matter, because adapters are so confusing and aggravating for customers that nobody uses them anyway. Show me a phone without a standard headphone jack on it, and I'll show you a phone nobody's using for listening to music.
But there's a basic problem with all of these units: the damn wires! They get tangled, they get unplugged, they pull on your ears, they limit your movement, and they're just a huge pain in the neck, not to mention completely anachronistic. (See Figure 7.) I mean, really, what's the point of having a futuristic wireless device if you have to plug it in to hear the frackin' thing?
Fig. 7: Another good argument for going wireless.
That's why Bluetooth was invented. It's a short-range radio network for mobile devices that works exactly like invisible wires. You pair one device to another, then transfer data back and forth as if the two devices were connected by cable...except there's no cable (and no tangled wires).
Bluetooth devices correspond almost exactly to their wired counterparts. There's the standard mono Bluetooth headset: you see them in people's ears everywhere (see Figure 8). Despite Apple's attempt to make them more stylish and comfortable, they still have a dorky reputation and remain the wireless equivalent of the telemarketer's headset. Still, they're fine if all you ever do is talk on the phone.
Fig. 8: Monophonic Bluetooth headsets.
But for music, you gotta have stereo, and for this you can get Bluetooth headphones. These are more for dancing around your living room than walking around outside, and are for some reason designed to be as uncomfortable and oddly shaped as possible.
Maybe manufacturers don't want you to wear them too long, because the battery life sucks. Plus there's no mic, so they don't even accept phone audio. If you're listening to music on your music phone and the phone rings, you need to take the headphones off to answer the phone. This seems fairly pointless.
Fig. 9: The Jabra 8010 is a stereo Bluetooth headset with mic.
Obviously, the answer is a Bluetooth stereo headset with mic (see Figure 9). This way you can listen to your music, play your kick-ass game, and still answer the phone when it rings. This configuration is becoming more popular, but it's still a fairly new technology and the physical designs are evolving rapidly.
Personally, I like the two-piece concept, but it makes me wonder how long it will be before miniaturization simply turns them into earbuds. In fact, since Bluetooth is all about eliminating wires, how about a headset that consists of a wireless left earbud, plus a wireless right earbud, plus a wireless "mic and control" unit (possibly worn as a pin on your left shoulder)? Now we really are talking about Star Trek technology! :)
Until recently, talking on the phone was, without exception, a monaural experience. Even now, I almost always pull one earbud out when I'm on a call. But the case of "listening to music, then the phone rings" is so common you quickly get used to the schizophrenic feeling of the voice in your head. In fact, it can even make you feel more connected to your caller, and facilitate communications in high-noise environments, like, say, every street-corner call you've ever made.
Stereo headphones create an audio barrier around your head. The world goes silent (or at least gets a lot quieter), and you navigate through the environment with your own soundtrack. But with stereo headsets, people who have your phone number can now pierce that barrier and join you inside it (and in the exact center of it). If your caller is also wearing a stereo headset, it's as if your bubbles are connected, like a yin-yang. You're inside of their head, and they're inside of yours (see Figure 10).
Fig. 10: Stereo headphone conversations put you and your caller inside a yin-yang bubble of communication.
Which makes me wonder: how long will it be before voice data is transferred at the same rate as everything else? If I can stream high-resolution video to my cell phone, then surely, eventually, scratchy, noisy, band-limited phone calls will be a thing of the past.
There's another good reason why high-definition voice data via broadband connection is a Really Good Idea™ — conference calls. Right now, when you're on a conference call, you get multiple streams of crappy audio, all mixed together crappily by the phone network. In a mobile broadband world, you could receive multiple streams of conferenced calls and position them in the stereo field for increased intelligibility. If you wanted to get really fancy, you could use 3D audio processing to put the boss at the front of the room and your colleagues on either side.
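To make the "position them in the stereo field" idea concrete, here's a minimal sketch of how a phone might spread conference callers from left to right. It uses plain constant-power panning, a standard mixing technique I'm substituting for anything fancier (no real 3D processing here), and it assumes each caller arrives as a separate mono list of samples. All the function names are hypothetical.

```python
import math


def pan_gains(position):
    """Constant-power pan: position runs from -1.0 (left) to +1.0 (right).

    Returns (left_gain, right_gain); the squared gains always sum to 1.
    """
    if not -1.0 <= position <= 1.0:
        raise ValueError("position must be between -1.0 and 1.0")
    angle = (position + 1.0) * math.pi / 4.0  # 0 .. pi/2
    return math.cos(angle), math.sin(angle)


def spread_positions(count):
    """Spread `count` callers evenly across the stereo field."""
    if count < 1:
        raise ValueError("need at least one caller")
    if count == 1:
        return [0.0]  # a lone caller sits dead center
    return [-1.0 + 2.0 * i / (count - 1) for i in range(count)]


def mix_conference(streams):
    """Mix mono caller streams into a list of (left, right) sample pairs."""
    if not streams:
        return []
    gains = [pan_gains(p) for p in spread_positions(len(streams))]
    length = max(len(s) for s in streams)
    mixed = []
    for n in range(length):
        left = right = 0.0
        for stream, (gain_l, gain_r) in zip(streams, gains):
            if n < len(stream):  # shorter streams simply go silent
                left += stream[n] * gain_l
                right += stream[n] * gain_r
        mixed.append((left, right))
    return mixed
```

With three callers, the first lands hard left, the second in the center, and the third hard right, so each voice gets its own spot instead of piling up in the middle of your head.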
Currently, the technology doesn't work that way. There's only ever a single pairing between Bluetooth devices, for obvious security reasons. But the range of these things is only a few feet, so sharing audio streams would be an up-close-and-personal experience anyway. All I'm really talking about is connecting an additional virtual cable to my phone, the equivalent of using a wireless Y-jack.
So now I'm wearing headphones, and you're wearing headphones, and we can both hear the music...but not each other. It's like being under water — or not! These headsets have built-in microphones, so there's no reason why you couldn't mix your voice into the shared music stream. Then I can talk to you, you can talk to me, and we can both still hear the music.
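As a rough illustration of mixing voices into a shared music stream, here's a minimal sketch. It assumes mono samples in the range -1.0 to 1.0, and it gives each listener a "mix-minus" (everyone's voice except their own, so you don't hear yourself echoed back). That design choice, the gain values, and the names are all my own assumptions, not a description of any existing headset.

```python
def clamp(sample):
    """Keep a sample inside the legal -1.0 .. 1.0 range."""
    return max(-1.0, min(1.0, sample))


def mix_for_listener(music, voices, listener, music_gain=0.5, voice_gain=1.0):
    """Build one listener's feed: shared music plus everyone else's mic.

    music    -- list of mono music samples
    voices   -- dict mapping a person's name to their mic samples
    listener -- name of the person who will hear this mix
    """
    mixed = []
    for n, music_sample in enumerate(music):
        sample = music_sample * music_gain  # music sits under the voices
        for speaker, mic in voices.items():
            # Skip the listener's own mic (mix-minus) and exhausted streams.
            if speaker != listener and n < len(mic):
                sample += mic[n] * voice_gain
        mixed.append(clamp(sample))
    return mixed


if __name__ == "__main__":
    music = [0.5, 0.5]
    voices = {"me": [0.2, 0.0], "you": [0.0, 0.4]}
    print(mix_for_listener(music, voices, "me"))   # music plus your voice
    print(mix_for_listener(music, voices, "you"))  # music plus my voice
```

Turning the music down whenever someone talks would be an obvious refinement, but even this bare-bones version gets you music and conversation in the same pair of ears.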
The network then becomes like a virtual boombox that only those in close proximity can hear. When you move away, the virtual cable is pulled and the music drops out of your headset. But since your local network is also connected to phone/data networks, you don't even need proximity for this feature.
Given a high-speed, high-resolution, phone audio network, you and your friend could conference-call into a music server or live performance and chat with each other while the music plays in the background. Since you're both on stereo headsets, you could also use 3D audio processing to position yourselves in the best seats in the house, with your friend on your right (who, of course, would hear you on the left).
To be honest, I'm not sure what effect broadband phones will have on multiplayer gaming, but I'm pretty sure it'll be profound. Social networking and mobile technology go together like apple pie and ice cream, and Mobile Web 2.0 is what all the cool kids are into these days. That trend will only grow, and I can easily imagine mobile multiplayer games where everyone in the group shares a common audio experience. It could be battlefield bullets, concert footage, proximity alerts, or who knows what!
That's the wild card, the "who knew?" factor. You can track trends, look at the hardware, and make all the predictions you like, but there will always be that one new idea, that unforeseeable circumstance or confluence, that turns things around in ways you hadn't even considered before.
They will become like acoustic contact lenses, or a heads-up display for your ears. They'll let you access and control a virtual audio reality that streams in from wireless networks all around you and is mixed with voice data from your phone and from everybody's phone. And although the ubiquitous audio network I'm describing does not yet exist, you can actually listen to what it might sound like today.
It's completely analogous to being in a recording studio, isolated by big headphones, auditioning multiple tracks, and talking to the control room via live mic. I remember my first time in a real studio: I put on the cans and was astounded by the sense of space, the detailed audio field, and the sound of my own voice — in my head, through the mixing board. Now imagine that feeling as a mobile experience, but instead of talking to the engineer on the other side of the glass, you're walking down Broadway, talking to someone on the other side of the world.
First thing, of course, is coffee, and as Joe enjoys his morning brew, he unplugs his mobile device from the charger, puts on the headset, and checks news, weather, and sports, before getting his email. It's such a gorgeous morning that he does it all from his front porch, since he's got broadband connectivity everywhere he goes. He takes an extra moment to watch his favorite video blogger rant about President Obama's reelection.
During the commute to work, Joe checks the online catalog and notices that the new Spider-Man game is available. A few button presses later, he's web-slinging his way uptown, enjoying the way the "thwip" sound seems to shoot out and away from his mobile device. But then the game pauses, and the Darth Vader theme plays incongruously, with a screen indicating an incoming call from his boss. Joe sends it to voice mail; he'll listen to it later.
While the game is paused, he selects the "gameplay music" menu item, which takes him to a submenu of his iTunes playlists. He notices a "recommended songs" option, and clicks it out of curiosity. An iTunes screen appears, displaying various playlists intended for use as background music during levels. The first one, of course, is the official movie soundtrack album, remixed for gameplay. Then there are popular DJ mixes of songs from the movie, a user-compiled collection of Swedish death metal and industrial goth, and some music written specifically for the game by a well-known composer.
Joe, being a purist, wirelessly downloads the movie score, and starts webbing up bad guys while grooving to the Danny Elfman theme. But only for a few more minutes, because now he's at the office, and logged into the corporate network. He works at his computer, listening to some deep house grooves, and talks on the phone, switching back and forth easily.
A friend stops by to gossip, so Joe turns his music off and turns on the external mic. His friend does the same thing with his headset, because he wants to show off the outrageous YouTube video everybody's talking about. The friend pulls out his phone, taps it a few times, and plays the video. The audio is streamed to both headsets, and about halfway through, Joe can clearly hear his friend say, "Here it comes!" An office worker passing by is startled when Joe and his friend suddenly, and for no apparent reason, laugh simultaneously.
After work, Joe gets a MySpace alert on his phone, telling him about a party his friends are going to. He uses the phone's built-in GPS locator to navigate to the venue. On the way, he passes by a group of kids, sitting on a stoop, all wearing matching headsets, all nodding in unison to a pounding beat only they can hear. It's a little surreal, but a common enough occurrence these days.
When Joe gets to the party, he finds a group of people playing an MMO tournament with folks in Saskatchewan, Seoul, and Stockholm. Each player is looking at his own device, but they (mostly) share the same audio experience. Joe joins the game, and when he makes a winning move, shouts, "Yeah!" — and opposing players all over the planet moan in dismay.
Later, he chats up a cute girl by noticing the illuminated quicksilver headset she's wearing. It reminds him of the in-ear monitors stage musicians wear, and she shows him how they glow and pulsate in response to the music she's listening to. Joe tunes his own headset to her frequency (with her permission, of course), and together they dance to a song only they can hear. Before he leaves, he takes her picture, enters her email and phone number into his phone's address book, and assigns her a ringtone of the song they were dancing to.
When he finally gets home, he watches a little late-night TV (streaming the sound to his headset, of course) before removing the earbuds to go to sleep. As he plugs the phone into the charger, he realizes he hadn't taken his headset off once, all day.
Thank you for listening to my speculations about cell phone networks, hardware, and audio. I hope you found it informative (or at least entertaining). For more rants on interactive audio and mobile technology, check out the "Annoying Audio" blog.
Peter Drescher ("pdx") is a musician and composer with more than 25 years of performance experience. He has produced audio for games, the Web, and mobile devices, using his "Twittering Machine" project studio.
Copyright © 2009 O'Reilly Media, Inc.