[Editor's Note: This article is a lightly edited version of Peter Drescher's presentation at the 2006 Austin Game Audio Conference, "Audio UI as Interactive Music." It features more than 30 sounds he played. Here we've highlighted the sound links in yellow brackets, [like this].]
Good afternoon. My name is Peter Drescher, and I'm currently Sound Designer at Danger, Inc., makers of the Hiptop mobile internet device, also known as the T-Mobile Sidekick. I used to be a road dog bluesman piano player until about 15 years ago, when I got into multimedia audio, and since then, through no planning on my part, I've become something of an expert on creating user interface sounds for mobile devices. My first audio user interface (UI) was for the General Magic PDA in 1994, and more recently, I've produced system sounds and game soundtracks for all versions of the Sidekick. Today I'm going to explore how the music and user-interface worlds can play together.
Here's the idea: you're using a mobile device, pushing buttons, scrolling web pages, typing instant messages, and while you work, the device emits various sounds to tell you, "Yes, you hit that button," and "No, don't do that," and "Excuse me, you've got mail," and "YO, MAN, PICK UP THE PHONE!" Many of the sounds (like ringtones) can be customized by users and can generate huge revenue flows, but the audio UI is usually a set of related sounds built into the operating system. They're intended to convey information, confirm actions, and issue warnings to the user during device operation.
System sounds have traditionally been quite limited, consisting of the occasional annoying beep. But given today's portable computing power, many devices feature sophisticated interactive audio engines like Beatnik, which allow the production of complex sounds played in multiple ways in response to user commands and data input. The question then becomes, "What sounds do you play?" Here's what I think.
The next time you see a rerun of Star Trek: The Next Generation, look in the credits for supervising sound editor Bill Wistrom. I've never met the man, but I'd really like to shake his hand because he was responsible for the incredibly great [user interface sounds] on the starship Enterprise. Watch the show and listen to Commander Data working at his console, typing rapidly, receiving "confirmed" responses, error alerts, and warning voiceovers by Majel Barrett. I'm tellin' ya, that was by far the best audio UI ever.
Of course, they had a distinct advantage being a television show, not reality, and so were able to create long sequences of very cool sounds specifically crafted to convey emotional messages like "Shields are overloading. We're all gonna die!" or "Holodeck program initiated. Let's play a game." Actual devices can't follow a script that way, but as an inspiration for what can be done, Star Trek rules!
Author Peter Drescher tries to capture the Star Trek vibe in his Twittering Machine studio.
An audio UI should provide interactive feedback for device use without being completely annoying. That's a tough trick, since any sound heard over and over again is going to get annoying, really quickly, no matter what you do. That's why so many customers set their devices to buzz only, and why there's usually less work for audio than for graphics. (Nobody ever uses the device with the screen turned off, right?)
But let's say you design a set of sounds that are short, simple, and convey information by their form alone, so that the audio contains some sort of message or meaning, built into the sound itself. A cliché example is using intervals like fifths and thirds to denote "good," and tritones and half-steps to denote "bad." Make them soft and simple, and they won't tire your ear as quickly. They may even blend into the background like a good movie score—you hear it, but you don't listen to it.
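As a rough illustration of the interval idea, here is a minimal Python sketch. It assumes twelve-tone equal temperament, and the names `INTERVALS`, `interval_freq`, and `two_note_cue` are hypothetical helpers invented for this example; nothing here reflects how the actual device sounds were built.

```python
# Semitone counts for the intervals named in the text.
# The "good"/"bad" labels follow the cliche described above.
INTERVALS = {
    "major third": 4,    # "good"
    "perfect fifth": 7,  # "good"
    "tritone": 6,        # "bad"
    "half-step": 1,      # "bad"
}


def interval_freq(root_hz, semitones):
    """Frequency of the note `semitones` above root_hz in equal temperament."""
    return root_hz * 2 ** (semitones / 12)


def two_note_cue(root_hz, interval_name):
    """Return the (root, upper note) frequencies for a two-note UI cue."""
    return (root_hz, interval_freq(root_hz, INTERVALS[interval_name]))


if __name__ == "__main__":
    # 880 Hz is an arbitrary root chosen for the example.
    for name in INTERVALS:
        low, high = two_note_cue(880.0, name)
        print(f"{name}: {low:.1f} Hz -> {high:.1f} Hz")
```

Playing the two frequencies in sequence gives a short rising cue. Which pairs read as "good" or "bad" on a real speaker is still a matter for the ear, not the arithmetic.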
Does this sound familiar to anyone in the audience doing game soundtracks? It should. It's all the same stuff. Now that I've created audio UIs for numerous devices, I've come to realize that it's exactly like doing game soundtracks! In fact, these days, I think of audio UIs as a form of interactive music.
Interactive music is a new art form; a way of thinking about video game music; a production technique required by hardware and bandwidth limitations; a talking point for sales pitches; a compositional tool; a parlor trick; a godsend; "The Most Important Innovation in Music Since Equal Temperament!"; and a giant steaming pile of horse manure. Interactive music is many things, but there's only one thing it's not.
Interactive music is not linear. There's no beginning, middle, or end—unlike, oh, I don't know, every other piece of music ever written in the history of mankind! All compositions, from the most complex symphonies to the simplest nursery rhymes, have a beginning, a middle, and an end. And they always play in exactly that order. Completely sequential. Linear in one direction, like life, time, and entropy.
So, let's take this concept of linearity and throw it right out the window. Now write interesting music! Of course, game composers know exactly what I'm talking about, because that's how game soundtracks are created. You never know when the player will finish a level, or when an enemy will jump out from behind a tree, or when the puzzle will be solved. There's no "sync to picture" like in the movies. In fact, since the music changes according to how the game is played, a game score will never be heard the same way twice. And that's kind of the point. Because users play games far longer than a movie's duration, you want to avoid repetition as much as possible.
When you design an audio UI, you want numerous elements working together in multiple ways, playing at unpredictable times. And you want to avoid repetition as much as possible. The audio should be a background soundtrack for navigating the user interface, something you hear but don't really listen to, just like a good game score.
Now, audio UIs aren't going to win any Grammys, but they can be interesting, entertaining, and even useful. I like to think of all the various beeps and boops a device makes as the notes, and the audio UI as an interactive song playing those notes. Each song uses a set of thematically related material designed to produce a certain attitude or mood. Think of it this way. If the device is a superhero, the audio UI is the superhero's theme song.
The trick when writing an interactive song is that you can't predict when any particular button—or combination of buttons—will be pressed, or how the device will react to those commands. Therefore, you must design sounds that will work together in many different ways.
However, the matrix of sounds that will be heard either simultaneously, or sequentially, or in related pairs, is not a mathematical factoring of all possible combinations. For example, you'll never hear "command accepted" and "command rejected" at the same time, nor will you ever hear a "menu open" without a "menu close" soon afterwards. Therefore, some sounds will be connected either sonically or functionally, while others will be totally separate.
Most important, the sequence of generated notes will not be completely random, because it reflects a pattern of input by the user.
There's a word for this kind of "not completely but only mostly" random process: stochastic. When applied to music, a stochastic algorithm can generate notes that are astonishingly more musical than a simple series of random frequencies (which just sounds like noise—and in fact is noise). A series of button presses, command entries, and device responses can be considered a stochastic process. While device use is completely unpredictable, the series of command entries and responses will be related, though never with exactly the same sequence or timing.
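One simple way to model this kind of "mostly but not completely random" sequence is a first-order Markov chain, where each event depends only on the one before it. The sketch below is an illustration under that assumption. The event names and weights in `TRANSITIONS` are made up for the example and are not taken from any real device.

```python
import random

# Hypothetical UI events and transition weights, for illustration only.
# Each key maps to the events that may follow it and how likely each is.
TRANSITIONS = {
    "idle":        {"key_click": 5, "menu_open": 3, "scroll": 2},
    "key_click":   {"key_click": 7, "idle": 2, "menu_open": 1},
    "scroll":      {"scroll": 5, "idle": 3, "menu_open": 2},
    "menu_open":   {"menu_select": 6, "menu_close": 4},
    "menu_select": {"menu_close": 1},
    "menu_close":  {"idle": 1},
}


def simulate(n_events, seed=None, start="idle"):
    """Generate n_events UI events by walking the weighted transition table."""
    rng = random.Random(seed)
    state = start
    events = []
    for _ in range(n_events):
        choices = TRANSITIONS[state]
        state = rng.choices(list(choices), weights=list(choices.values()))[0]
        events.append(state)
    return events


if __name__ == "__main__":
    print(" ".join(simulate(20, seed=2006)))
```

No two runs with different seeds produce the same sequence, yet a "menu_open" is always followed by a selection or a "menu_close", which is the related-but-unpredictable behavior described above.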
When writing an audio UI, it helps to choose sounds that can work like the notes in a song. These would be mostly diatonic (meaning closely related by sonic character), plus a seasoning of chromatic notes (meaning anything else). It also helps to have a theme to write about. Again, we're not talking about love songs or teen angst, but some way of choosing sounds to help create a mood. When you have an infinite variety of content available, limiting your scope and making your choices becomes the most important task.
Audio UI themes can come from the design or marketing departments, and can be an aesthetic concept or branding opportunity. It helps if the theme is related to the hardware, so that the sounds are somehow "appropriate" to the physical device producing them. But they don't have to be, and can actually be completely contradictory, like a sleek plastic cell phone emitting a Bell telephone ring.
Sometimes thematic material is defined by the technical limitations of the device. For a while in the '90s, whenever you turned on a Sprint PCS phone, it went [beedeebeep]. Yup, that was one of mine. It was originally supposed to be a full-blown, heavily synthesized, digital audio branding sound, but when they tried to implement it on the phone, they discovered they couldn't make it loud enough to be heard without blowing up the phone speaker.
So I "rearranged" it for the piezo ringer, which was only able to produce one-voice polyphony and square waves. (Hard to get more limited than that!) Other limitations, like output sample rate, speaker size, and audio engine capabilities, will also narrow your focus. But you can use these technical constraints to your advantage, by designing sounds you know can be rendered clearly on the target platform. I call this "learning to love your limitations," a useful skill in many situations.
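To give a feel for how little a one-voice, square-wave ringer has to work with, here is a minimal Python sketch that renders a monophonic square-wave melody as 8-bit samples. The sample rate, the two output levels, and the note values are all assumptions chosen for the example; this is not the actual Sprint PCS sound or the phone's real signal path.

```python
import wave

SAMPLE_RATE = 8000  # assumed rate for the example, not the real phone's


def square_wave(freq_hz, ms, rate=SAMPLE_RATE):
    """8-bit unsigned samples of a square wave: only two levels, high and low."""
    n = rate * ms // 1000
    period = rate / freq_hz  # samples per cycle
    return bytes(224 if (i % period) < period / 2 else 32 for i in range(n))


def render_melody(notes, rate=SAMPLE_RATE):
    """Play (freq_hz, ms) notes back to back, so only one voice ever sounds."""
    return b"".join(square_wave(freq, ms, rate) for freq, ms in notes)


def write_wav(path, pcm, rate=SAMPLE_RATE):
    """Save the samples as an 8-bit mono WAV file for listening."""
    with wave.open(path, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(1)
        w.setframerate(rate)
        w.writeframes(pcm)


if __name__ == "__main__":
    # A made-up three-note figure; the pitches are placeholders.
    melody = [(880, 100), (1320, 100), (880, 100)]
    write_wav("beedeebeep_sketch.wav", render_melody(melody))
```

With one voice and one waveform, the only things left to compose with are pitch, duration, and silence, which is the constraint the paragraph above describes.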
The first audio UI I produced was for General Magic's MagicLink in 1994. A software synthesizer, then called SoundMusicSys (now known as the Beatnik Audio Engine) had been licensed to Apple for the QuickTime Musical Instrument Set, and a couple of former Apple guys wanted to use that technology to produce sound for a new kind of handheld device.
The MagicLink had a touch screen, a plug-in phone connection, and a brand new operating system.
Many of the desktop actions had sounds associated with them, and I contributed a number of audio UI effects, instrument samples, and "music stamps." These were MIDI files associated with icons that you could select from a "drawer" and "stamp" on your postcard (now called an email attachment). When you received the postcard on your MagicLink and opened it, the MIDI would play automagically.
The challenge was to produce all the built-in audio using sample and MIDI data totalling not more than 128KB...uncompressed! Even IMA 4:1 wasn't available then, so all the sounds had to be 8-bit, and at very low sample rates. Nonetheless, we were able to squeeze in about 15 system sounds, a dozen musical instruments, a dozen music stamps, and a few extras.
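A quick back-of-the-envelope calculation shows how tight that budget is. The sketch below assumes the whole 128KB went to 8-bit mono sample data (in reality the MIDI data shared the same space), and the sample rates tried are hypothetical, since the text only says they were very low.

```python
BUDGET_BYTES = 128 * 1024  # the 128KB ceiling quoted above


def seconds_of_audio(num_bytes, sample_rate_hz, bytes_per_sample=1, channels=1):
    """Playing time of uncompressed PCM audio that fits in num_bytes."""
    return num_bytes / (sample_rate_hz * bytes_per_sample * channels)


# Hypothetical sample rates; the real ones are not given in the text.
for rate in (5512, 8000, 11025):
    secs = seconds_of_audio(BUDGET_BYTES, rate)
    print(f"{rate} Hz: {secs:.1f} s of 8-bit mono audio")
```

Even at 11.025kHz, the entire budget holds just under 12 seconds of sound, which makes it easier to see why short samples and MIDI did most of the work.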
When doing audio UI design, the first thing I like to think about is which sound is going to be heard the most. With the General Magic device, every time you touch the screen on a selectable item, you get the [touch]. I derived that sound from tapping a pencil eraser on, well, pretty much every surface I could find. I recorded a bunch of pencil eraser taps on wood, metal, concrete, glass, and the actual device to find something that worked. Other sounds came from effects libraries, like the [door], the [type], and the [switch], while others were more musical, like [ba-ding] and [magic].
Given the cartoon nature of the interface, it's not surprising that some of the sounds are very "mickey mouse." I don't mean that as a put-down, but as a way to describe an audio cliché, like movie music that follows a character's footsteps exactly. That can work great for a Disney cartoon, but it becomes annoying after a while. The General Magic UI received some negative press as being "too cartoony," and the very literal sounds like [slurp] to put something in a folder, and [dismiss] to dismiss a dialog did nothing to curtail that impression.
One last point. This device has a phone jack, and can make and receive phone calls, but the only sound it makes when the phone rings is [ring]. Ringtones as we know them today had yet to be invented. But you could attach songs to email "postcards," which would play for the recipient when opened—kind of a reverse ringtone. ["Bogie's Boogie"] and ["Let's Go!"] were apparently the most popular.
Eight years later, in the fall of 2002, Danger released the first version of the T-Mobile Sidekick, a wireless internet device with a brand new operating system using the Beatnik technology as the audio engine. Sound familiar? Since we were building this thing from the ground up—and, well, because we could—we went "all in" with the audio UI. All the buttons make different sounds. All the actions, like flipping the lid, and opening a menu, and getting a system alert, have their own audio signatures, not to mention multiple built-in alerts for email and instant messaging. And don't forget ringtones for the phone!
The first Sidekick contained about seven minutes of audio.
It even has voiceover alerts, like [attention] and [new message], and for good reason: the design director specifically requested a "retro sci-fi" audio UI, and nothing says retro sci-fi like voiceover alerts. (Remember Star Trek.) I'm tellin' ya, everything I know, I learned on Star Trek. Not being able to afford Majel Barrett, I recorded the voice of my good friend Pino the Clown.
The first version of the device was extremely limited, both in power and in memory. My entire audio capacity was about 200KB, the engine output sample rate was only 11.025kHz, and the speaker was the size of a dime. That severely limits the frequency range available for making sound. Basically you got no bass, no high end. Just mids, and a narrow band of mids at that. Other limitations included a slow CPU that required system sounds to be WAV audio to reduce latency, and very tight RAM space, which required that MIDI be used for pretty much everything else.
Of course, the flip screen is the coolest thing about the device, and it quickly became clear that (A) flip open would always be followed by flip close, and (B) you'd be hearing those two sounds a lot, because people would be constantly opening and closing the device.
On my first attempt, I hope I can be forgiven for mickey-mousing it a little. The prototype sound was, you guessed it, [star trek communicator]. But I went with an open chord mixed with a little clap [open, close]. When played on the tiny speaker at the low sample rate, the sound reminded some people of a switchblade snap, which had a kinda "dangerous" feel to it.
Because the device had one of the best thumb keyboards around, another sound we thought would be heard a lot was the key clicks, the sound of typing. In this case, we really didn't want to mickey-mouse it with Underwood manual typewriter keys (like on the MagicLink). Finding the right tone was surprisingly difficult, though the solution was surprisingly simple. I took a sine wave [blip], and drew a click at the beginning of the sound with BIAS Peak's pencil tool, one of the few times I've hand-drawn a digital waveform.
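The same idea can be approximated in code: synthesize a short sine blip, then overwrite its first few samples with a sharp spike. The Python sketch below is only an approximation of a hand-drawn click. The frequency, length, levels, and click shape are all assumptions, and the helper names `blip` and `add_click` are invented for the example.

```python
import math

RATE = 11025  # output sample rate of the first Sidekick, per the text


def blip(freq_hz=1000.0, ms=30, rate=RATE):
    """A short sine tone with a linear fade-out, as floats in [-1, 1]."""
    n = rate * ms // 1000
    out = []
    for i in range(n):
        fade = 1.0 - i / n
        out.append(0.6 * fade * math.sin(2 * math.pi * freq_hz * i / rate))
    return out


def add_click(samples, click_len=6):
    """Overwrite the first few samples with a decaying, alternating spike."""
    out = list(samples)
    for i in range(min(click_len, len(out))):
        level = 1.0 - i / click_len
        out[i] = level if i % 2 == 0 else -level
    return out


if __name__ == "__main__":
    key_click = add_click(blip())
    print(len(key_click), "samples; first six:", key_click[:6])
```

The spike lasts well under a millisecond, so it reads as an attack transient on the tone and not as a separate sound.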
But of course, the sounds that would be heard most often of all were the four buttons: [Menu], [Jump], [Back], and [Wheel], which has one sound for pushing it, and others for scrolling [up] and [down]. Notice how the buttons are all sonically related—in this case by overtone series—fifths and octaves. Notice not only how the Menu button is always followed by a Menu Open, but also how Menu Close can follow any button. It's a complex set of variables, and they all have to work together in multiple ways.
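Sounds related by the overtone series sit at whole-number multiples of a shared fundamental, which is where the fifths and octaves come from. The sketch below shows the arithmetic. The fundamental and the assignment of buttons to harmonics in `BUTTON_HARMONIC` are hypothetical and chosen only to illustrate the relationships, not to document the real Sidekick sounds.

```python
# Hypothetical mapping of buttons to harmonic numbers of one fundamental.
BUTTON_HARMONIC = {
    "Menu": 2,   # one octave above the fundamental
    "Jump": 3,   # a fifth above Menu (ratio 3:2)
    "Back": 4,   # an octave above Menu (ratio 2:1)
    "Wheel": 6,  # an octave above Jump (ratio 2:1)
}


def button_freqs(fundamental_hz):
    """Frequency of each button sound as a multiple of the fundamental."""
    return {name: fundamental_hz * k for name, k in BUTTON_HARMONIC.items()}


if __name__ == "__main__":
    # 220 Hz is an arbitrary fundamental chosen for the example.
    for name, hz in button_freqs(220).items():
        print(f"{name}: {hz} Hz")
```

Because every pair of these frequencies forms a simple ratio, any two buttons pressed in quick succession will sound related, whatever order the user happens to hit them in.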
The [sounds] are awfully noisy, mid-rangey, and lo-res, but you can hear why I started thinking about audio UI as interactive music. It's no Top 40 hit, but it's definitely musical, certainly interactive, and each set of sounds has its own character. To produce that character, here's what you do.
First, gather source material that somehow fits the desired theme. This can consist of original recordings, synthesized noises, CD libraries, or movie soundtracks. Use whatever inspires you or fits your aesthetic. For example, I recorded this [sound] as source material for the buttons.
From there, it's a whole lot of trial and error. First you take clips from various sources, edit them, twist them around, and do your audio voodoo until you come up with interesting and hopefully useful sounds. Then you compress them and play them on the device—and of course, they don't sound anything like they did on the headphones, so then it's back to the drawing board.
After a while, you get a rough draft and start playing it for people to gauge their reactions, which will range from "Cool, man" to "What is that horrible noise?" So then it's back to the drawing board again.
And right before you ship, the decision comes down that the system sounds use too much power, so they're turned off by default! That's okay, battery life is way more important than audio, so we'll just have to see what happens.
We sold every single product we made! Suddenly Jay-Z is prominently displaying his Sidekick in a video, and boom, every rapper's gotta have one. Then Demi Moore is instant-messaging Ashton Kutcher during the Letterman show, and bam!—everybody in Hollywood's gotta have one too. Suddenly, we're under the gun to produce the sequel.
From an audio point of view, there have been a few improvements, most importantly with the speaker. It's big, the size of a quarter. Even better, this sucker is LOUD! The output sample rate is now 16kHz for a clearer high end, but we're still using 11.025kHz samples and IMA 4:1 compression. The audio capacity has been more than doubled to 500KB, which enabled dramatically improved audio quality for the built-in ringtones.
The audio UI theme for the Sidekick II was called Abstract Technical.
We wanted the [UI sounds] to be subtle, clean, synthesized, and, most important, not mickey-moused. I'd realized that it almost doesn't matter what sound you play in response to a button push or menu option, as long as it's consistent. Your ear will associate this sound with that action quite rapidly, almost regardless of what the sound actually is. As long as it's not jarringly wrong (like an explosion for a button press), or too "on the money" (like AOL's "You've got mail!"), practically any abstract synthesized sound will do.
The trick comes when they all have to work together without clashing. And again with the Sidekick II, I ended up being fairly diatonic with the occasional chromatic accent, almost against my better judgement. That's just what seemed to work best.
Sometimes, you'd like the form of the sound alone to convey the intended message. That's useful when you need to alert the customer to a specific condition, like low battery. On any cell phone, running out of power's bad, but with the Sidekick, it was double bad because all your stored data had to be redownloaded from the service. God forbid you should run out of power in an area with no service; your device became a useless brick. Better to turn it off rather than let the battery drain.
In the first version, Joe Britt, one of our founders, told me to make the low battery sound "as horrifying as possible," which I did, all tritones and nasty harmonics. But people were actually frightened by the abrupt noise of the [alert], and so didn't understand it meant "plug me in!" The second time around, I wanted to make the alert more onomatopoetic and came up with this, now universally known as [sad clown]. I have been told that new users have associated "sad clown" with "low battery" on the very first listen, which I count as a personal triumph.
Well, we sold every one of those devices we could make, too! The day after the "Paris Hilton got hacked" story, we were sold out in New York and L.A.; apparently, there really is no such thing as bad publicity. Ringtone sales went through the roof, and Sidekicks started popping up everywhere, prominently featured in movies and on TV, and most especially in music videos. We even had [Snoop Dogg] doing catch phrases. And so, we began work on the next version....
The Sidekick 3 supports "CD-quality" audio because of the new MP3 player application.
The new device is a big departure from the previous one, and turned out to be a good news/bad news joke for the audio. Good news: MPEG compression is now available, yay! All that high-end "frying bacon" noise and crunchy audio you get with IMA 4:1 is gone, replaced by lo-res MP3. Output sample rate was quadrupled to 44.1kHz, more than doubling the frequency response.
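The link between sample rate and frequency response can be checked with the Nyquist limit: a digital signal can only represent frequencies up to half its sample rate. The short Python sketch below applies that to the output rates quoted in this article for the three devices. It gives theoretical ceilings only; the tiny speakers would have cut the usable range further.

```python
# Output sample rates quoted in the text for each device generation.
OUTPUT_RATES_HZ = {
    "Sidekick": 11025,
    "Sidekick II": 16000,
    "Sidekick 3": 44100,
}


def nyquist_hz(sample_rate_hz):
    """Highest frequency a given sample rate can represent."""
    return sample_rate_hz / 2


def bandwidth_gain(old_rate_hz, new_rate_hz):
    """How many times wider the representable range is at the new rate."""
    return nyquist_hz(new_rate_hz) / nyquist_hz(old_rate_hz)


if __name__ == "__main__":
    for name, rate in OUTPUT_RATES_HZ.items():
        print(f"{name}: up to {nyquist_hz(rate):.1f} Hz")
```

By this measure, 44.1kHz is four times the 11.025kHz of the first device, and a bit under three times the 16kHz output of the Sidekick II, so "more than doubling" holds against either baseline.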
More RAM was available, so my audio budget doubled again, up to 1MB. With an almost 20:1 MPEG compression rate, I could make big fat audio ringtones without taking up a lot of space. As you might imagine, I was fairly excited to get to work.
Until the new hardware came in, and then I discovered the bad news: the speaker was the size of a penny...and there was only one! In previous versions, there'd been two speakers, one on the front, wired into the phone (small and quiet, for your ear) and another on the back, wired into the OS (big and loud, for open air). This time, to make the device smaller, there was just the one speaker to pull double duty—and it suuuucked! But as with any multimedia product, you simply deal with what you got.
Because the new hardware is all shiny and smooth, with transparent keys and a glowing trackball, the audio UI came to be called ["crystalline"]. The design director said he wanted it to sound "like drops of water in a cave" or "a shimmering crystal ball"; he wanted people to go, "Ooooooo." And every sound designer knows what that means: more reverb!
Reverb on a mobile device seems conceptually awkward: this tiny thing emitting gigantic sounds like you're in a cathedral. And of course, it can't make giant sounds. The device has no bottom end, no stereo (no separation even if it did have stereo), no clarity, and no volume. In short, it has none of the things necessary to fool the ear into thinking you're in a cathedral. But it turns out that keeping the reverb tails on many of the sounds gave them an increased sense of space.
Gathering source material was fun. I recorded all kinds of things that went [bing!]: glass bowls, agogo bells, pipe clanks, you name it. Wine goblets and drinking glasses were the most useful, and I recorded a lot of them, because you never know how the file will sound when it's compressed and played on the device.
On the other hand, knowing that low MPEG bit rates create a characteristic blurring of transients, I purposefully encoded sounds with extremely sharp attacks, to be smoothed out by the algorithm. That worked surprisingly well, saved me a lot of space, and is an excellent example of learning to love your limitations.
And yet again, even though I set out to use non-musical tones, the sounds that worked the best were in tune with each other, and the whole thing became a kind of glass harmonica after all.
Audio UIs may currently be considered an esoteric corner of the interactive music world, but that may change, and for a very good reason: money, and lots of it! Given the wild popularity of ringtones, it seems likely that customizable system sounds represent a sizable revenue stream for carriers. How much would you pay to download a package of Simpsons wallpapers, ringtones, and "annoyed grunt" button effects, or even cooler, make your iPhone not only look like a Star Trek datapad, but sound like one too?
Bootup sounds can also represent a valuable branding opportunity. Remember that the next time you land safely at your chosen destination, the captain turns off the Fasten Seatbelt sign, and a planeload of passengers turn on their cell phones. Compare and contrast the various sounds and audio technologies demonstrated. Do the phones just beep, or do they do a little song and dance? Hear anything you [recognize]? If yes, then you have been successfully indoctrinated by T-Mobile's marketing department.
I'd like to give special thanks to the Game Audio Conference advisory board for letting me rant about this topic. For more rants about interactive audio, mobile games, and other things that annoy the living crap outta me, check out the Annoying Audio blog, published periodically by the kind (and brave) folks at O'Reilly Digital Media. Thank you.
Peter Drescher ("pdx") is a musician and composer with more than 25 years of performance experience. He has produced audio for games, the Web, and mobile devices, using his "Twittering Machine" project studio.
Copyright © 2009 O'Reilly Media, Inc.