[Editor's Note: This article is a lightly edited version of Peter Drescher's presentation at the 2006 Austin Game Audio Conference, "Audio UI as Interactive Music." It features more than 30 sounds he played. Here we've highlighted the sound links in yellow brackets, [like this]. Clicking the links will open the sounds in a pop-up window. There's no need to close the pop-up after each sound plays; the next sound you click will replace it.]
Good afternoon. My name is Peter Drescher, and I'm currently Sound Designer at Danger, Inc., makers of the Hiptop mobile internet device, also known as the T-Mobile Sidekick. I used to be a road dog bluesman piano player until about 15 years ago, when I got into multimedia audio, and since then, through no planning on my part, I've become something of an expert on creating user interface sounds for mobile devices. My first audio user interface (UI) was for the General Magic PDA in 1994, and more recently, I've produced system sounds and game soundtracks for all versions of the Sidekick. Today I'm going to explore how the music and user-interface worlds can play together.
Here's the idea: you're using a mobile device, pushing buttons, scrolling web pages, typing instant messages, and while you work, the device emits various sounds to tell you, "Yes, you hit that button," and "No, don't do that," and "Excuse me, you've got mail," and "YO, MAN, PICK UP THE PHONE!" Many of the sounds (like ringtones) can be customized by users and can generate huge revenue flows, but the audio UI is usually a set of related sounds built into the operating system. They're intended to convey information, confirm actions, and issue warnings to the user during device operation.
System sounds have traditionally been quite limited, consisting of the occasional annoying beep. But given today's portable computing power, many devices feature sophisticated interactive audio engines like Beatnik, which allow the production of complex sounds played in multiple ways in response to user commands and data input. The question then becomes, "What sounds do you play?" Here's what I think.
The next time you see a rerun of Star Trek: The Next Generation, look in the credits for supervising sound editor Bill Wistrom. I've never met the man, but I'd really like to shake his hand because he was responsible for the incredibly great [user interface sounds] on the starship Enterprise. Watch the show and listen to Commander Data working at his console, typing rapidly, receiving "confirmed" responses, error alerts, and warning voiceovers by Majel Barrett. I'm tellin' ya, that was by far the best audio UI ever.
Of course, they had a distinct advantage being a television show, not reality, and so were able to create long sequences of very cool sounds specifically crafted to convey emotional messages like "Shields are overloading. We're all gonna die!" or "Holodeck program initiated. Let's play a game." Actual devices can't follow a script that way, but as an inspiration for what can be done, Star Trek rules!
Author Peter Drescher tries to capture the Star Trek vibe in his Twittering Machine studio.
An audio UI should provide interactive feedback for device use without being completely annoying. That's a tough trick, since any sound heard over and over again is going to get annoying, really quickly, no matter what you do. That's why so many customers set their devices to buzz only, and why there's usually less work for audio than for graphics. (Nobody ever uses the device with the screen turned off, right?)
But let's say you design a set of sounds that are short, simple, and convey information by their form alone, so that the audio contains some sort of message or meaning, built into the sound itself. A cliché example is using intervals like fifths and thirds to denote "good," and tritones and half-steps to denote "bad." Make them soft and simple, and they won't tire your ear as quickly. They may even blend into the background like a good movie score—you hear it, but you don't listen to it.
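The interval cliché above can be made concrete. Here's a minimal sketch, in Python, of mapping UI events to consonant ("good") and dissonant ("bad") intervals above a root pitch. The event names, the 440 Hz root, and the `event_tone` helper are all illustrative assumptions, not anything from an actual device:

```python
# Hypothetical mapping of UI events to equal-tempered intervals.
# Consonant intervals (fifths, thirds) signal "good"; dissonant
# intervals (tritones, half-steps) signal "bad".

ROOT_HZ = 440.0  # A4, chosen arbitrarily as the UI's "home" pitch

# Equal-temperament frequency ratios: 2 ** (semitones / 12)
INTERVALS = {
    "confirm": 2 ** (7 / 12),   # perfect fifth: consonant, "good"
    "select":  2 ** (4 / 12),   # major third: consonant, "good"
    "error":   2 ** (6 / 12),   # tritone: dissonant, "bad"
    "warning": 2 ** (1 / 12),   # half-step: dissonant, "bad"
}

def event_tone(event: str) -> tuple[float, float]:
    """Return (root, interval) frequencies in Hz for a UI event."""
    return ROOT_HZ, ROOT_HZ * INTERVALS[event]

root, fifth = event_tone("confirm")
print(f"confirm plays {root:.1f} Hz + {fifth:.1f} Hz")
```

A synth or audio engine would then render those two frequencies as a short, soft dyad; the point is only that the "meaning" lives in the interval ratio itself.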
Does this sound familiar to anyone in the audience doing game soundtracks? It should. It's all the same stuff. Now that I've created audio UIs for numerous devices, I've come to realize that it's exactly like doing game soundtracks! In fact, these days, I think of audio UIs as a form of interactive music.
Interactive music is a new art form; a way of thinking about video game music; a production technique required by hardware and bandwidth limitations; a talking point for sales pitches; a compositional tool; a parlor trick; a godsend; "The Most Important Innovation in Music Since Equal Temperament!"; and a giant steaming pile of horse manure. Interactive music is many things, but there's only one thing it's not.
Interactive music is not linear. There's no beginning, middle, or end—unlike, oh, I don't know, every other piece of music ever written in the history of mankind! All compositions, from the most complex symphonies to the simplest nursery rhymes, have a beginning, a middle, and an end. And they always play in exactly that order. Completely sequential. Linear in one direction, like life, time, and entropy.
So, let's take this concept of linearity and throw it right out the window. Now write interesting music! Of course, game composers know exactly what I'm talking about, because that's how game soundtracks are created. You never know when the player will finish a level, or when an enemy will jump out from behind a tree, or when the puzzle will be solved. There's no "sync to picture" like in the movies. In fact, since the music changes according to how the game is played, a game score will never be heard the same way twice. That's kind of the point: because players spend far more hours in a game than the length of any movie, you want to avoid repetition as much as possible.
When you design an audio UI, you want numerous elements working together in multiple ways, playing at unpredictable times. And you want to avoid repetition as much as possible. The audio should be a background soundtrack for navigating the user interface, something you hear but don't really listen to, just like a good game score.
Now, audio UIs aren't going to win any Grammys, but they can be interesting, entertaining, and even useful. I like to think of all the various beeps and boops a device makes as the notes, and the audio UI as an interactive song playing those notes. Each song uses a set of thematically related material designed to produce a certain attitude or mood. Think of it this way. If the device is a superhero, the audio UI is the superhero's theme song.
The trick when writing an interactive song is that you can't predict when any particular button—or combination of buttons—will be pressed, or how the device will react to those commands. Therefore, you must design sounds that will work together in many different ways.
However, the matrix of sounds that will be heard either simultaneously, or sequentially, or in related pairs, is not a mathematical factoring of all possible combinations. For example, you'll never hear "command accepted" and "command rejected" at the same time, nor will you ever hear a "menu open" without a "menu close" soon afterwards. Therefore, some sounds will be connected either sonically or functionally, while others will be totally separate.
Most important, the sequence of generated notes will not be completely random, because it reflects a pattern of input by the user.
There's a word for this kind of "not completely but only mostly" random process: stochastic. When applied to music, a stochastic algorithm can generate notes that are astonishingly more musical than a simple series of random frequencies (which just sounds like noise—and in fact is noise). A series of button presses, command entries, and device responses can be considered a stochastic process. While any individual session is unpredictable, the series of command entries and responses will follow related patterns, though never with exactly the same sequence or timing.
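To hear the difference, compare uniform randomness with a weighted random walk. The following Python sketch is purely illustrative (the scale, step weights, and function names are my assumptions): the uniform version picks any pitch with equal probability, while the stochastic version favors small steps within one scale, much as a pattern of related UI events weights which sounds tend to follow which:

```python
# Uniform randomness vs. a stochastic (weighted) note generator.
import random

C_MAJOR = [60, 62, 64, 65, 67, 69, 71, 72]  # MIDI note numbers, C4-C5

def random_notes(n: int) -> list[int]:
    """Any MIDI pitch equally likely: this is the 'just noise' case."""
    return [random.randint(36, 96) for _ in range(n)]

def stochastic_notes(n: int) -> list[int]:
    """Mostly-random walk: small scale steps are much likelier than leaps."""
    idx = len(C_MAJOR) // 2  # start mid-scale
    melody = []
    for _ in range(n):
        # Weighted choice: repeats and stepwise motion dominate.
        step = random.choices([-2, -1, 0, 1, 2], weights=[1, 4, 2, 4, 1])[0]
        idx = max(0, min(len(C_MAJOR) - 1, idx + step))
        melody.append(C_MAJOR[idx])
    return melody

print(random_notes(8))      # scattered, unrelated pitches
print(stochastic_notes(8))  # a coherent, melody-like contour
```

Even this crude weighting produces contours a listener parses as melody rather than static, which is exactly the property you want button presses and device responses to exhibit.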
When writing an audio UI, it helps to choose sounds that can work like the notes in a song. These would be mostly diatonic (meaning closely related by sonic character), plus a seasoning of chromatic notes (meaning anything else). It also helps to have a theme to write about. Again, we're not talking about love songs or teen angst, but some way of choosing sounds to help create a mood. When you have an infinite variety of content available, limiting your scope and making your choices becomes the most important task.
Audio UI themes can come from the design or marketing departments, and can be an aesthetic concept or branding opportunity. It helps if the theme is related to the hardware, so that the sounds are somehow "appropriate" to the physical device producing them. But they don't have to be, and can actually be completely contradictory, like a sleek plastic cell phone emitting a Bell telephone ring.
Sometimes thematic material is defined by the technical limitations of the device. For a while in the '90s, whenever you turned on a Sprint PCS phone, it went [beedeebeep]. Yup, that was one of mine. It was originally supposed to be a full-blown, heavily synthesized, digital audio branding sound, but when they tried to implement it on the phone, they discovered they couldn't make it loud enough to be heard without blowing up the phone speaker.
So I "rearranged" it for the piezo ringer, which was only able to produce one-voice polyphony and square waves. (Hard to get more limited than that!) Other limitations, like output sample rate, speaker size, and audio engine capabilities, will also narrow your focus. But you can use these technical constraints to your advantage, by designing sounds you know can be rendered clearly on the target platform. I call this "learning to love your limitations," a useful skill in many situations.
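Those piezo constraints are easy to simulate. Here's a rough Python sketch of rendering a monophonic square-wave melody; the 8 kHz sample rate, the note frequencies, and the three-note tune are all assumptions for illustration, not the actual Sprint sound:

```python
# Simulating a '90s piezo ringer: one voice, square waves only.

SAMPLE_RATE = 8000  # low output rate typical of early phone hardware

def square_wave(freq_hz: float, dur_s: float) -> list[float]:
    """Render a square wave: every sample is either +1 or -1."""
    samples = []
    for i in range(int(SAMPLE_RATE * dur_s)):
        phase = (i * freq_hz / SAMPLE_RATE) % 1.0
        samples.append(1.0 if phase < 0.5 else -1.0)
    return samples

# Monophonic melody: notes play strictly one at a time, never layered.
melody = [(880, 0.1), (1175, 0.1), (880, 0.2)]  # (Hz, seconds)
audio = []
for freq, dur in melody:
    audio.extend(square_wave(freq, dur))
print(len(audio), "samples")  # 0.4 s at 8 kHz = 3200 samples
```

Working inside a palette this small forces exactly the discipline described above: when every sound is a sequence of square-wave pitches, melody, rhythm, and register are the only expressive tools left.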