The first audio UI I produced was for General Magic's MagicLink in 1994. A software synthesizer then called SoundMusicSys (now known as the Beatnik Audio Engine) had been licensed to Apple for the QuickTime Musical Instrument Set, and a couple of former Apple guys wanted to use that technology to produce sound for a new kind of handheld device.
The MagicLink had a touch screen, a plug-in phone connection, and a brand new operating system.
Many of the desktop actions had sounds associated with them, and I contributed a number of audio UI effects, instrument samples, and "music stamps." These were MIDI files associated with icons that you could select from a "drawer" and "stamp" on your postcard (now called an email attachment). When you received the postcard on your MagicLink and opened it, the MIDI would play automagically.
The challenge was to produce all the built-in audio using sample and MIDI data totalling not more than 128KB...uncompressed! Even IMA 4:1 wasn't available then, so all the sounds had to be 8-bit, and at very low sample rates. Nonetheless, we were able to squeeze in about 15 system sounds, a dozen musical instruments, a dozen music stamps, and a few extras.
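To get a feel for how tight that budget is, here is a rough back-of-the-envelope sketch. It assumes mono 8-bit PCM and, unrealistically, that the whole 128KB goes to sample data with nothing reserved for MIDI. The sample rates are hypothetical examples of "very low" rates, not figures from the actual device.

```python
BUDGET_BYTES = 128 * 1024  # the 128KB ceiling mentioned above


def seconds_of_audio(budget_bytes, sample_rate_hz, bits_per_sample=8, channels=1):
    """Return how many seconds of uncompressed PCM fit in budget_bytes."""
    bytes_per_second = sample_rate_hz * bits_per_sample // 8 * channels
    return budget_bytes / bytes_per_second


# Hypothetical low sample rates, chosen only for illustration.
for rate in (5512, 8000, 11025):
    print(f"{rate} Hz: {seconds_of_audio(BUDGET_BYTES, rate):.1f} s")
```

Even at 8 kHz, the entire budget holds only about 16 seconds of raw 8-bit audio, which is why short samples driven by MIDI data go so much further than recorded clips.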
When doing audio UI design, the first thing I like to think about is which sound is going to be heard the most. With the General Magic device, every time you touch the screen on a selectable item, you get the [touch]. I derived that sound from tapping a pencil eraser on, well, pretty much every surface I could find. I recorded a bunch of pencil eraser taps on wood, metal, concrete, glass, and the actual device to find something that worked. Other sounds came from effects libraries, like the [door], the [type], and the [switch], while others were more musical, like [ba-ding] and [magic].
Given the cartoon nature of the interface, it's not surprising that some of the sounds are very "mickey mouse." I don't mean that as a put-down, but as a way to describe an audio cliché, like movie music that follows a character's footsteps exactly. That can work great for a Disney cartoon, but it becomes annoying after a while. The General Magic UI received some negative press as being "too cartoony," and the very literal sounds like [slurp] to put something in a folder, and [dismiss] to dismiss a dialog did nothing to curtail that impression.
One last point. This device has a phone jack, and can make and receive phone calls, but the only sound it makes when the phone rings is [ring]. Ringtones as we know them today had yet to be invented. But you could attach songs to email "postcards," which would play for the recipient when opened—kind of a reverse ringtone. ["Bogie's Boogie"] and ["Let's Go!"] were apparently the most popular.
Eight years later, in the fall of 2002, Danger released the first version of the T-Mobile Sidekick, a wireless internet device with a brand new operating system using the Beatnik technology as the audio engine. Sound familiar? Since we were building this thing from the ground up—and, well, because we could—we went "all in" with the audio UI. All the buttons make different sounds. All the actions, like flipping the lid, and opening a menu, and getting a system alert, have their own audio signatures, not to mention multiple built-in alerts for email and instant messaging. And don't forget ringtones for the phone!
The first Sidekick contained about seven minutes of audio.
It even has voice-over alerts, such as [attention] and [new message], and for good reason: the design director specifically requested a "retro sci-fi" audio UI, and nothing says retro sci-fi like voice-over alerts. (Remember Star Trek?) I'm tellin' ya, everything I know, I learned on Star Trek. Not being able to afford Majel Barrett, I recorded the voice of my good friend Pino the Clown.
The first version of the device was extremely limited, both in power and in memory. My entire audio capacity was about 200KB, the engine output sample rate was only 11.025kHz, and the speaker was the size of a dime. That severely limits the frequency range available for making sound. Basically you got no bass, no high end. Just mids, and a narrow band of mids at that. Other limitations included a slow CPU that required system sounds to be WAV audio to reduce latency, and very tight RAM space, which required that MIDI be used for pretty much everything else.
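The sample-rate ceiling can be made concrete with a small sketch. By the sampling theorem, an output rate of 11.025kHz can only represent frequencies up to half that rate, and anything higher folds back down as an alias. The function names below are made up for illustration, and the sketch models only the sample rate, not the additional losses from the tiny speaker.

```python
ENGINE_RATE_HZ = 11025  # the 11.025kHz output rate mentioned above


def nyquist_hz(sample_rate_hz):
    """Highest frequency a given sample rate can represent."""
    return sample_rate_hz / 2


def alias_frequency(tone_hz, sample_rate_hz):
    """Frequency actually heard if tone_hz is sampled without filtering."""
    nearest_multiple = sample_rate_hz * round(tone_hz / sample_rate_hz)
    return abs(tone_hz - nearest_multiple)


print(nyquist_hz(ENGINE_RATE_HZ))             # upper limit of the engine output
print(alias_frequency(3000, ENGINE_RATE_HZ))  # below the limit: unchanged
print(alias_frequency(8000, ENGINE_RATE_HZ))  # above the limit: folds down
```

So the high end stops at about 5.5kHz before the speaker is even considered, and the speaker's size then removes most of the bass.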
Of course, the flip screen is the coolest thing about the device, and it quickly became clear that (A) flip open would always be followed by flip close, and (B) you'd be hearing those two sounds a lot, because people would be constantly opening and closing the device.
On my first attempt, I hope I can be forgiven for mickey-mousing it a little. The prototype sound was, you guessed it, [star trek communicator]. But I went with an open chord mixed with a little clap [open, close]. When played on the tiny speaker at the low sample rate, the sound reminded some people of a switchblade snap, which had a kinda "dangerous" feel to it.
Because the device had one of the best thumb keyboards around, another sound we thought would be heard a lot was the key clicks, the sound of typing. In this case, we really didn't want to mickey-mouse it with Underwood manual typewriter keys (like on the Magic Link). Finding the right tone was surprisingly difficult, though the solution was surprisingly simple. I took a sine wave [blip], and drew a click at the beginning of the sound with BIAS Peak's pencil tool—one of the few times I've hand-drawn a digital waveform.
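A programmatic approximation of that idea might look like the sketch below. It is not the actual waveform, which was drawn by hand in an editor. The frequency, duration, fade-out, and click length are all assumed values chosen for illustration, and the "click" is simply a few full-scale alternating samples overwriting the start of a short sine tone.

```python
import math


def sine_blip_with_click(freq_hz=1000.0, duration_s=0.05,
                         sample_rate_hz=11025, click_samples=4):
    """Return unsigned 8-bit PCM: a fading sine blip with a sharp onset."""
    n = int(duration_s * sample_rate_hz)
    samples = []
    for i in range(n):
        envelope = 1.0 - i / n  # linear fade-out (an assumption)
        phase = 2 * math.pi * freq_hz * i / sample_rate_hz
        samples.append(envelope * math.sin(phase))

    # "Draw" the click: overwrite the first few samples with
    # full-scale values that alternate between the extremes.
    for i in range(min(click_samples, n)):
        samples[i] = 1.0 if i % 2 == 0 else -1.0

    # Map the -1.0..1.0 range onto 0..255 (unsigned 8-bit).
    return bytes(int(round((s + 1.0) * 127.5)) for s in samples)


pcm = sine_blip_with_click()
print(len(pcm), "samples; first four:", list(pcm[:4]))
```

The resulting bytes could be written out with Python's standard `wave` module using a sample width of 1 to audition the result.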
But of course, the sounds that would be heard most often of all were the four buttons: [Menu], [Jump], [Back], and [Wheel], which has one sound for pushing it, and others for scrolling [up] and [down]. Notice how the buttons are all sonically related—in this case by overtone series—fifths and octaves. Notice not only how the Menu button is always followed by a Menu Open, but also how Menu Close can follow any button. It's a complex set of variables, and they all have to work together in multiple ways.
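For readers unfamiliar with the overtone series, the sketch below shows where fifths and octaves come from: the first few whole-number multiples of a base frequency are separated by exactly those intervals. The 220 Hz base is an arbitrary assumption, and no mapping from these pitches to particular Sidekick buttons is implied.

```python
from fractions import Fraction

# Frequency ratios for the two intervals named above.
INTERVAL_NAMES = {
    Fraction(2, 1): "octave",
    Fraction(3, 2): "perfect fifth",
}


def harmonics(base_hz, count):
    """First `count` members of the overtone series on base_hz."""
    return [base_hz * n for n in range(1, count + 1)]


def interval_name(lower_hz, upper_hz):
    """Name the interval between two pitches, if it is a fifth or octave."""
    return INTERVAL_NAMES.get(Fraction(upper_hz, lower_hz), "other")


series = harmonics(220, 4)  # hypothetical base pitch
print(series)
print(interval_name(series[0], series[1]))  # 1st to 2nd harmonic
print(interval_name(series[1], series[2]))  # 2nd to 3rd harmonic
print(interval_name(series[1], series[3]))  # 2nd to 4th harmonic
```

Pitches drawn from the same series blend when they overlap or follow one another, which is one plausible reason such a set of button sounds can be triggered in any order without clashing.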
The [sounds] are awfully noisy, mid-rangey, and lo-res, but you can hear why I started thinking about audio UI as interactive music. It's no Top 40 hit, but it's definitely musical, certainly interactive, and each set of sounds has its own character. To produce that character, here's what you do.
First, gather source material that somehow fits the desired theme. This can consist of original recordings, synthesized noises, CD libraries, or movie soundtracks. Use whatever inspires you or fits your aesthetic. For example, I recorded this [sound] as source material for the buttons.
From there, it's a whole lot of trial and error. First you take clips from various sources, edit them, twist them around, and do your audio voodoo until you come up with interesting and hopefully useful sounds. Then you compress them and play them on the device—and of course, they don't sound anything like they did on the headphones, so then it's back to the drawing board.
After a while, you get a rough draft and start playing it for people to gauge their reactions, which will range from "Cool, man" to "What is that horrible noise?" So then it's back to the drawing board again.
And right before you ship, the decision comes down that the system sounds use too much power, so they're turned off by default! That's okay; battery life is way more important than audio, so we'll just have to see what happens.