I am a musician. I spend a reasonably large proportion of my life creating and recording music. My home studio has all the marks of a musician - guitars, drums, mics, mixers and of course a computer. Despite the fact that pretty much all of my computers run Linux, the studio box is running Windows 2000 so I can use my sound recording tool of choice: Cubase.

From an Open Source consultant and advocate’s perspective, that computer is obviously a chink in the armour. To replace it with a Linux box and achieve the same results is a real challenge though. There are simply no multi-tracking applications on Linux that provide a comparable experience in terms of functionality and integration. Don’t get me wrong, there are certainly efforts going into this area, with applications such as Ardour, Wired and Rosegarden, but these tools face a number of uphill battles in winning me over. The interesting point is that the challenge is not focused so much on features but on usability and integration.

It is fair to say that the requirements for audio engineering are fairly complex. Recording audio at different levels of quality, layering on further tracks, mixing them, applying effects, editing waves, performing overdubs and mixing down are all essential requirements for the sound engineer and musician. Each of these features is no more or less important than the others, and they all play a key role in creating a quality recording. When you read through the feature list for many of these tools, they offer the kind of features that I am talking about above. They certainly allow you to record tracks, cut them, adjust their volume and EQ in the mixer, apply some effects and mix them down. On a hard technical level, the feature list is largely satisfied - it is the ’soft’ requirements that are the issue.

When you are identifying the requirements in any kind of software development, it is always essential to prioritise both the hard technical issues and also the ’soft’ social issues. As much as supporting the above features is essential, it is also essential to match the mental mode of operation that the user operates in. When people are recording music, this mode is creative, and technology is typically relegated to unimportance - it should just work. When I am making music, I don’t care for technology. I don’t care about the spec of amps and guitars, I don’t care about the technical characteristics of the mixer; I just want to plug in and record. The time between the birth of a song and getting it down on disk must be short - the creative mind is hampered by tiny technical issues, and these issues are unacceptable. As such, any technical barrier in front of creativity is a real issue, and this is where the Open Source solution really needs focus. The problems here are not just for those who create the multi-track software applications, but for the entire software stack from the kernel up.

The two flaws: integration and usability

Integration is a key problem in the current Open Source offering, and this is a responsibility of both the application developers and the distributors. If you try to run one of the many multi-track applications, it will need to talk to one of any number of sound systems. This not only requires me to understand what a sound system is, but I also need to dig through the documentation and determine which one it is, how I run it and in which mode. As I am sat there with my guitar resting on my lap, this is one of those frustrating technical barriers.

The integration issue runs throughout the entire system. If I plug in a USB sound card, I want all of my applications to make use of it. Not only that, but I want to be able to configure the sound card from within my application. If you have a simple single channel sound card, the audio mixer will suffice, but if you have a complex card with 10 ins and 10 outs and multiple recording modes, you need an application to manage this. This is where cards such as the M-Audio Delta range fly on Windows - they come with a little control application to manage these parameters. You can certainly control levels with the ALSA mixer, but it will not allow you to deal with the many other options for the card.

The components in the system that do not affect the production of a recording from an interaction perspective need to be fundamentally invisible to the user.

Usability

The second issue is usability. Multi-track tools are renowned for being complex to use. This complexity is not necessarily an issue with the concept of recording audio into tracks, but the issue of having the requisite knowledge to spit shine the track with EQ, dynamics and effects to get the best out of it. This knowledge sits outside of the application. The same can be said for IDEs - creating a project in an IDE is fairly straightforward; the challenge lies with understanding the code - an entirely separate issue.

The solution to this problem is presets. The vast majority of users who record music are recording within the established remit of a genre. As an example, I record a lot of metal. This genre has some common traits when recording - the guitars are present and fairly scooped, the vocals are up front but slightly recessed in the mix, the bass drum plays a prominent role and requires a ‘clicky’ tone with high mid-range. I also record the entirely opposite ambient/classical style, which also has modes of practice - warm acoustic stringed instruments that are layered and panned throughout the stereo field, very present and up front vocals, plenty of reverb and delay etc.

Each of these modes of practice can be reasonably implemented in sensible defaults throughout the entire application. This not only applies to effects, but to other areas. Some ideas:

  • You could create a new song based on genre. As an example, for a rock band there are typically two guitars, a bass, vocals and drums. This feature would create the tracks, name them and apply the default effects and panning.
  • All effects need sensible defaults. The common effects such as chorus, reverb, compression, limiting, wah, flanger and others can all have reasonable defaults, and tools such as Cubase do include some impressive defaults.
  • Mixing can also have reasonable defaults. EQ is a science that many don’t understand, and a solid set of defaults can satisfy both common mixing needs and special effects such as simulating AM radio and phone lines.
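To make the first idea a little more concrete, here is a rough sketch in Python of what genre-based song creation could look like. Everything in it is an assumption for illustration: the Track type, the GENRE_TEMPLATES table, the new_song function, the pan positions and the effect names are all hypothetical and not taken from Cubase, Ardour or any other tool.

```python
from dataclasses import dataclass, field


@dataclass
class Track:
    """A hypothetical track with a name, a pan position and default effects."""
    name: str
    pan: float = 0.0  # -1.0 = hard left, 0.0 = centre, 1.0 = hard right
    effects: list[str] = field(default_factory=list)


# Illustrative templates only: (track name, pan, default effects).
GENRE_TEMPLATES = {
    "rock": [
        ("Guitar 1", -0.6, ["compression"]),
        ("Guitar 2", 0.6, ["compression"]),
        ("Bass", 0.0, ["compression"]),
        ("Vocals", 0.0, ["compression", "reverb"]),
        ("Drums", 0.0, ["limiting"]),
    ],
}


def new_song(genre: str) -> list[Track]:
    """Create the tracks for a genre, named and with default panning and effects."""
    try:
        template = GENRE_TEMPLATES[genre]
    except KeyError:
        raise ValueError(f"no template for genre {genre!r}") from None
    # Copy each effects list so editing one song never alters the template.
    return [Track(name, pan, list(effects)) for name, pan, effects in template]


if __name__ == "__main__":
    for track in new_song("rock"):
        print(track.name, track.pan, track.effects)
```

The point of the sketch is that the musician picks a genre and starts playing; the defaults stay editable afterwards, so nothing is taken away from the engineer who wants to tweak.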

Many of the issues of usability can be easily solved by identifying the kind of steps required to achieve a common goal. Many people who record are stood up holding an instrument in a small room filled by a band. Interactions with the computer need to be kept to a minimum. The kind of visual interface requirements for recording and the requirements for mixing are entirely different. Recording is simple - you need to manage the stream of audio coming into the computer and assign it to a track, with some minimum level management. Mixing is an entirely different beast in which the entire range of features in the tool needs to be readily at hand. Mixing is a process that you conduct on your own with a beer, recording is a process you conduct with amps, guitars and band members to contend with.

The application should also hook into the desktop and be intuitive. Although Ardour has been touted as one of the tools with the greatest potential, a real sticking issue is the fact that it looks so drastically different from the rest of my GNOME driven desktop (Ardour uses GTK) and is rather unintuitive. With some experience behind me of using Cubase, Cakewalk and Magix Audio Studio, I suspected Ardour would be a cinch to pick up - unfortunately I found it impossible to be productive straight away. If I can’t use it, how is someone with no knowledge of audio recording supposed to use it? Ardour is certainly not the only offender here, and this seems to apply to a number of tools.

Finding a solution

The solution to the problem is integrating key, predictable components and making them work flawlessly. In all honesty, if I cannot download the software and make it work straight away without tinkering around with sound servers and such, it will not get a look in. Period. When you download and use Firefox it just works, when you use OpenOffice.org it just works, when you use The GIMP it just works - when you use Cubase it just works.

Part of this challenge is using comprehensive frameworks for building applications. It seems that GStreamer is becoming a very prominent framework with good support from a range of different applications for different desktops. In addition to this, HAL and DBUS are becoming the de facto solution for managing hardware. With this in mind, hardware specific issues should really be directed to the kernel and HAL/DBUS teams. This will ensure that changes will propagate upwards through the stack and ease integration. From some discussions with the GStreamer and HAL teams, it seems that the kind of plug and play philosophy regarding hardware and software is becoming a reality. With GStreamer and HAL shipping with all distributions, there is the opportunity for the application to just work. The work can then concentrate on being a great multi-tracker.

I am convinced that the problems discussed here have readily available solutions, but I think opening some dialog with the providers of different parts of the stack needs to happen to allow the solution to develop. Creating an integrated and usable system for audio engineering is something that will require cooperation from different parts of the community. This has worked elsewhere with other problem domains, and I see no reason why it cannot work here. Let’s see how the story pans out…

What are your thoughts and experiences? Can audio production on Linux get easier? Can we achieve the simplicity experienced on other systems? Are there any interesting developments occurring that will solve these problems? Share your thoughts below…