Linux DevCenter    
 Published on Linux DevCenter (http://www.linuxdevcenter.com/)


Achieving Low-Latency Response Times Under Linux

by Dave Phillips
11/17/2000

Latency can be defined as the elapsed time (delay) between the generation of an event and its realization. If the delay is great enough to be perceptible, you have a latency problem.

For instance, when you play an interactive game, you expect no delay between the initiation of an event (turning a steering wheel or firing a weapon) and its realization (the vehicle turns when you turn the wheel, the weapon fires when you pull the trigger). You expect the event to occur instantaneously, or nearly so. The MIDI keyboard to synthesizer (or soundcard) connection is another simple example: You press the key, you expect the sound to occur immediately. If system latency is great enough, there will be a perceptible delay between the keypress and the resulting sound.

In computer systems, multimedia software and other real-time applications are severely affected by long latency times, but system performance at all levels can be compromised. Video and animations suffer frame-loss, and sound applications suffer audio dropouts. Extremely high latency may endanger data integrity during disk read/write/copy activity. Obviously, lowering the latency response time in a system enhances I/O through the entire system. This is good.

Any real-time or interactive software hopes for zero perceptible delay, but short of quantum connections, it's impossible to completely eradicate latency in a system. However, latencies below a certain threshold may be referred to as measurable but not perceptible. So when does measurable latency become perceptible?

In the audio domain, studies have indicated that the ear is sensitive to timing differences at the millisecond level, perhaps even down to a single millisecond. However, latencies under 7 msec are not typically perceptible and are considered acceptable for desktop and semiprofessional applications. Systems achieving average latencies of less than 5 msec should be considered ideal platforms for professional latency-critical applications.

Frustrated expectations lead to dissatisfied and unhappy users. Unhappy users lead to unhappy programmers. I have some good news for users and programmers alike: A simple user-applied patch to the Linux kernel sources (and an uncomplicated disk tune-up) can reduce latency times to under 4 msec. Programmers can easily exploit this new low-latency condition by setting scheduler priority from within their applications, achieving performance with latency well within professionally acceptable limits. This article will show you what's involved and how you can do it yourself.

What causes latency in a system?

Latency is introduced into a computer system by a variety of factors. Some are hardware-related and can be improved by simply adding a faster disk, speedier CPU, more RAM, or a better video card. Other factors are found within your software. At the application level, programmers may miss opportunities for more efficient data input/output. At the system level, overall load will certainly have an impact on latency. And within the operating system itself, latency may be adversely affected by an accumulation of untuned or poorly tuned scheduling requests.

In the MIDI keyboard example noted above, several possible sources of latency exist: the keyboard's physical response time, the type and density of the MIDI message stream (coupled with the fact that MIDI is a serial protocol), and delays introduced by the synthesizer's processor scheduling factors. Engineers devote considerable efforts to minimizing latencies at every point in a time-sensitive system, and the typical acceptable latency on a professional MIDI synthesizer ranges between 2 and 5 msec.

There are patches designed to reduce latencies introduced by various activities of the Linux kernel. Although I'll be focusing on using these patches to improve audio software, these enhancements also apply to other forms of streaming and multimedia.

How can latency be minimized?

You can easily enhance hardware performance by installing faster disks, memory, processors, and video cards. But hardware upgrades can't defeat the effects of latencies introduced at the software level, so what can you do with software in order to reduce latency times?

Some obvious solutions include eliminating all unnecessary activity, particularly any disk-intensive or CPU-intensive work. Staying off any networks is probably a good idea too, especially if you intend to run an application as root.

Under certain circumstances it may be possible to run an application in a condition known as setuid root. As the name implies, this condition sets the application's effective user ID to root, which gives it the privilege it needs to raise its own execution priority to real-time scheduling. Unfortunately, setuid applications can be a security hazard on networked machines, so check with your system administrator for advice on running setuid root applications on a network. Standalone and single-user machines can freely ignore the security risk (off-line, of course) of setuid root, giving multimedia applications performance priority not available otherwise. Some Linux audio applications such as XMMS and ecasound already provide a switch for setting execution priority to real-time. Other applications such as SoundTracker and the "unofficial" Linux version of Csound can be compiled with setuid enabled for better scheduling.
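One common way to limit the exposure of a setuid root program is to ask for real-time scheduling first and then give up root for the rest of the run. The following is only a sketch of that idea, not code taken from any of the applications named above; the function name acquire_realtime_then_drop_root is my own invention, and the choice of the lowest real-time priority is an assumption made for illustration.

```c
#include <sched.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/*
 * Hypothetical helper: request SCHED_FIFO while the process may still
 * hold root privileges, then return to the invoking user's identity.
 * Returns 1 if real-time scheduling was granted, 0 if it was refused,
 * and -1 if the privileges could not be dropped.
 */
int acquire_realtime_then_drop_root(void)
{
    struct sched_param schp;
    int got_realtime = 1;

    memset(&schp, 0, sizeof(schp));
    /* Assumption: the lowest real-time priority is enough, since any
     * SCHED_FIFO process already runs ahead of ordinary processes. */
    schp.sched_priority = sched_get_priority_min(SCHED_FIFO);

    if (sched_setscheduler(0, SCHED_FIFO, &schp) != 0) {
        perror("sched_setscheduler");
        got_realtime = 0;   /* carry on with normal scheduling */
    }

    /* Drop group privileges first, then user privileges. */
    if (setgid(getgid()) != 0 || setuid(getuid()) != 0) {
        perror("dropping privileges");
        return -1;
    }

    return got_realtime;
}
```

Called once at startup, before any files or network connections are opened, a routine along these lines keeps the scheduling policy the process was granted while the remainder of the program runs with the invoking user's permissions.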

SCSI disks are often preferred for multimedia work, and conventional wisdom advises the purchase of SCSI disks for best performance in audio/video-intensive applications. However, in an informative on-line article at Prorec.com, D. Glen Cardenas states that "... there is enough evidence to say that either SCSI or IDE can offer the full level of drive performance necessary to take your system to its full potential. Although there are places for SCSI where IDE dare not go, one of them is NOT the digital audio workstation. Here, IDE can outperform SCSI just as often as SCSI can outperform IDE ..."

Whether you own SCSI or IDE/EIDE drives, Mark Lord's hdparm utility is a safe and simple way to increase your disk's performance, turning on support for such enhancements as DMA (direct memory access) and 32-bit I/O. For instance, these settings

hdparm -m 8 -d 1 -u 1 -c 1 /dev/hda

will enable the first IDE/EIDE drive (/dev/hda) with support for DMA, multicount (multiple data blocks transferred per single interrupt), 32-bit mode, and IRQ unmasking. I must stress that running hdparm is essential if you wish to achieve low latency from IDE/EIDE disks. Latency is dramatically increased if you don't run hdparm, so tune that drive! If your disks are IDE/EIDE, you owe it to yourself to use hdparm. It is included with most mainstream Linux distributions, and it even has a very nice manual page (man hdparm) that contains far more information about the utility than I can present here.
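If you want a program to check whether a drive has actually been tuned, the setting that hdparm -d controls can be read back through the Linux HDIO_GET_DMA ioctl. This is a minimal sketch under stated assumptions: the function name ide_dma_enabled is hypothetical, the device path is whatever you pass in, and the call only makes sense for drives handled by a driver that implements this ioctl.

```c
#include <fcntl.h>
#include <linux/hdreg.h>   /* HDIO_GET_DMA */
#include <stdio.h>
#include <sys/ioctl.h>
#include <unistd.h>

/*
 * Hypothetical helper: returns 1 if the drive reports DMA enabled,
 * 0 if it reports DMA disabled, and -1 if the device cannot be
 * opened or does not answer the query.
 */
int ide_dma_enabled(const char *device)
{
    long dma = 0;
    int fd = open(device, O_RDONLY | O_NONBLOCK);

    if (fd < 0) {
        perror(device);
        return -1;
    }
    if (ioctl(fd, HDIO_GET_DMA, &dma) != 0) {
        perror("HDIO_GET_DMA");
        close(fd);
        return -1;
    }
    close(fd);
    return dma != 0;
}
```

An audio application could call ide_dma_enabled("/dev/hda") at startup and warn the user when the answer is 0, rather than leaving them to discover the problem as dropouts in a recording.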

Note: Some very old IDE drives may not like hdparm. Be sure to read the hdparm manual page before using it on such a drive. Also, you might want to revisit Rob Flickenger's Speeding Up Linux Using hdparm, published earlier this year on the O'Reilly Network.

After upgrading your hardware and optimizing your system usage, you will still be left with latencies caused by factors within the Linux kernel itself. But thanks to the wonder of open source software, you can even fix your operating system kernel. Roger Larsson and Benno Senoner have designed tools to measure and represent latencies in the Linux kernel. Using these and other tools, Ingo Molnar and Andrew Morton have created simple patches that focus on points in the kernel that most critically affect latency figures. We will take a closer look at one of those patches, but first let's look at the utilities that are used to measure and identify sources of latency in the kernel (and other applications).

Identifying latency sources in the Linux kernel

Ingo Molnar identified six sources of latency in the Linux kernel:

Each of these routines delays returning control to the scheduler for several milliseconds, and with enough delay we get audio dropouts and video frame-loss. Fortunately for the Linux community, Ingo didn't stop with merely identifying the sources of latency in the kernel. He created a series of patches designed to remedy the situation, and those patches yielded impressive results. On an unpatched 2.2.10 kernel, latencies measured as high as 150 msec; with Ingo's patch, latency dropped to less than 3 msec, a truly dramatic reduction and definitely within the range needed for professional applications.

Ingo's achievement is especially remarkable with regard to the efforts required to gain low latency on other operating systems, most notably the Microsoft Windows family. In a presentation of the results from a roundtable conference at the February 2000 NAMM (National Association of Music Merchants) show, Ron Kuper (CTO of Cakewalk) stated that "... an obtainable target for audio latency under Win2k is 5 msec, even under heavy system loads" (from Audio I/O, Today and Tomorrow). However, in order to hit that target, it is necessary to bypass Microsoft's KMixer (kernel mode audio mixer) in their Win32 driver model, because (quoting Mr. Kuper again) "... KMixer nominally adds 30 msec of latency to audio playback streams. (At present, Microsoft does not provide a method to allow host applications to bypass KMixer.)" (also from Audio I/O, Today and Tomorrow). A plan for a set of IOCTLs (input/output controls) to bypass KMixer was presented at the Windows Professional Audio Roundtable, sponsored by Cakewalk, intended for adoption by the professional audio software industry. Clearly, latency on Windows is a problem commanding the attention of the most serious audio software manufacturers. The comparative ease with which Linux achieves latency even lower than 5 msec ought to be of compelling interest to these same manufacturers and organizations.

Bennomarks: measuring latency in the Linux kernel

Benno Senoner has designed a series of stress tests to measure latency at critical points in Linux performance, including the impact of the X window system, calls to the /proc filesystem, and read/write/copy disk activity. A test sound is played, the program measures the time taken by the write() call to /dev/dsp for each iteration of the sound, and the resulting data is represented in graphs with the ideal latency (the time it takes to play a single audio fragment) and the real measured latency superimposed on each other. The test results ("bennomarks") are also directed to the console, but the graphs are an impressive visual display of the efficiency gained by the low-latency patches.
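The heart of such a measurement is simply a timestamp on either side of a write() call. The sketch below is not Benno's code; it is a minimal illustration of the technique, with a hypothetical function name (timed_write_usec) and a modern monotonic clock standing in for whatever timer the real test suite uses.

```c
#define _POSIX_C_SOURCE 200809L
#include <stddef.h>
#include <time.h>
#include <unistd.h>

/*
 * Hypothetical helper: time a single write() call.
 * Returns the elapsed time in microseconds, or -1 on failure.
 */
long timed_write_usec(int fd, const void *buf, size_t len)
{
    struct timespec start, end;
    long long nsec;

    if (clock_gettime(CLOCK_MONOTONIC, &start) != 0)
        return -1;
    if (write(fd, buf, len) < 0)
        return -1;
    if (clock_gettime(CLOCK_MONOTONIC, &end) != 0)
        return -1;

    /* Work in nanoseconds so a negative tv_nsec difference is harmless. */
    nsec = (long long)(end.tv_sec - start.tv_sec) * 1000000000LL
         + (end.tv_nsec - start.tv_nsec);
    return (long)(nsec / 1000);
}
```

When the descriptor is a sound device whose buffer is already full, write() blocks until one fragment has been played, so the number returned for each call can be compared with the time that fragment should take. A call that takes much longer means the process was kept off the CPU.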

Follow these links to view the performance graphs for the following kernels:

The graphs tell the tale: Average latencies under 2 msec with Ingo Molnar's original patch for the 2.2.10 kernel, and less than 4 msec with Andrew Morton's patch for kernel version 2.4.0-test9. The comparative views of the 2.4.0-test9 graphs are most instructive: Latencies decreased from almost 400 msec (unpatched kernel, untuned drives) to less than 4 msec (patched kernel, tuned drives).

Note: In all cases the test platform was a 550 MHz Pentium III with 256 MB RAM and two 15-GB Maxtor EIDE hard disks.

I urge you to run Benno's latency test suite. Two versions of the software are available (one for creating GIF-format graphs and one for PNG graphs); the build is uncomplicated, and it is easy to run. Simply become root on your system, tune your hard drives with Benno's tunedisk script, turn off any screensaver you have installed (you must run latencytest in X), and then start the program with this command sequence:

./do_tests none 3 256 0 350000000

The do_tests script will perform five successive latency tests (stress_x11, stress_proc, stress_diskwrite, stress_diskcopy, stress_diskread) on a simple cyclic waveform (the none in the command sequence; a WAV file may be named instead), using three audio fragments of 256 bytes each, with 0 syncing, and with a test file size of 350 MB (for the disk I/O tests). Console output should resemble this report from a run on the 2.4.0-test9 kernel.
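The "ideal latency" that the measured figures are compared against is plain arithmetic on the fragment settings. The sketch below works it out; the function name is hypothetical, and the audio format in the usage note (16-bit stereo at 44.1 kHz) is my assumption for illustration, so substitute the format your own test actually uses.

```c
/*
 * Hypothetical helper: time, in milliseconds, needed to play
 * `fragments` audio fragments of `fragment_bytes` bytes each.
 */
double fragment_latency_msec(int fragments, int fragment_bytes,
                             int sample_rate_hz, int channels,
                             int bytes_per_sample)
{
    /* Bytes consumed by the soundcard per second of audio. */
    double bytes_per_second =
        (double)sample_rate_hz * channels * bytes_per_sample;

    return 1000.0 * fragments * fragment_bytes / bytes_per_second;
}
```

Under the assumed format, fragment_latency_msec(1, 256, 44100, 2, 2) comes to roughly 1.45 msec for a single 256-byte fragment, and fragment_latency_msec(3, 256, 44100, 2, 2) to roughly 4.35 msec for all three. That gives a sense of how little slack the scheduler has before a buffer runs dry.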

Although Benno's program is generally safe and easy to use, you will be running it as root, and I do have a warning for certain users: If you have an SBLive soundcard and are using the ALSA driver, do_tests may freeze your system completely, requiring a hard system reset. Version 3.9.3q of the OSS/Linux driver for the SBLive gave me no troubles, and I assume the kernel modules will also work without problems. I must emphasize that the problem with ALSA occurs only with the SBLive: I ran the tests with the ALSA 0.5.9d driver for my SB PCI128 and experienced no problems.

Patching your own Linux kernel for low latency

You can easily prepare your own Linux system for low-latency performance. Simply apply one of the patches available from Andrew Morton's scheduling latency page or Ingo Molnar's low-latency patches page to a specific Linux kernel source package, then build and install the patched kernel. Patches are available for kernel versions from 2.2.10 through the 2.4.0-testN series: Be sure to choose the correct patch for your selected kernel version!

The patches address latencies introduced at a variety of points within the Linux kernel, including scheduling for interrupt requests (IRQ), the virtual memory system, and TCP socket connections. Versions up to 2.4.0-test9 are for uniprocessor systems only. Versions from 2.4.0-test10-pre3 onwards support multiprocessor systems, but I was unable to test those patches.

Patching a 2.2.x kernel should present no special problems. Simply apply the patch to the source package as detailed below, then rebuild and install the new kernel. Most mainstream distributions (e.g., Red Hat, Mandrake, Debian, SuSE) will have the required versions of the various support packages for compiling and installing the 2.2.x kernels. The situation is quite different for the 2.4.0-test kernels.

If you decide to patch a 2.4 kernel, I strongly advise reading Paul Winkler's excellent mini-HOWTO on upgrading to 2.4 with the low-latency patch and support for the ALSA driver. Building a 2.4 kernel is not especially difficult, but it requires some updated packages (described in /usr/src/linux/Documentation/Changes) and some special treatment with regard to the new modutils package (see below).

To those of you who have never built a Linux kernel, I say: Have no fear, the procedure is simplicity itself, especially with the graphic configuration utilities (the curses-based menuconfig and the Tk-based xconfig). But before you can build your kernel, you have to get its source package. Go to the Linux Kernel Archive and follow the instructions there on retrieving your desired sources. Once you have the source package on your local disk, become the root user (with su root), move it to /usr/src, and unpack it.

Note: You must be root to successfully execute the rest of the commands indicated in this section!

Here's how to unpack a gzipped source tarball:

cd /usr/src
tar xzvf linux-2.4.0-test9.tar.gz

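And here's how to unpack a bzipped tarball (the I option selects bzip2 in the GNU tar shipped with current distributions; later releases of tar use j instead):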

cd /usr/src
tar xIvf linux-2.4.0-test9.tar.bz2

RPM users need only run this command:

rpm -ivh linux-2.4.0-test9.src.rpm

Whatever package you used, after unpacking you should have a new directory named /usr/src/linux-2.4.0-test9. The next two commands will create a link to that directory simply named /usr/src/linux:

rm /usr/src/linux
ln -s /usr/src/linux-2.4.0-test9 /usr/src/linux

Now you can apply Andrew Morton's patch to the 2.4.0-test9 kernel sources. First, download the patch from Andrew's site and move it to /usr/src. Then follow these commands (while still root):

cd /usr/src/linux
patch -p1 < ../2.4.0-test9-low-latency.patch

Now change to the source directory, then build and install your patched kernel:

cd /usr/src/linux
make mrproper
make config

You'll need to make a few specific option selections during the configuration process (make config, make menuconfig, or make xconfig). If you have an IDE/EIDE hard-disk, go to "ATA/IDE/MFM/RLL support" and select two options in the "IDE, ATA and ATAPI Block devices" section: Say "Yes" to "Generic PCI Bus-master DMA Support" and "Use PCI DMA by default when available." You must enable these options in order to use the hdparm utility to turn on DMA for your hard disks. Next, go to the "Character devices" section and enable "Enhanced Real-time Clock Support" for access to your computer's hardware clock. Finally, set up sound support. If you want to use the ALSA drivers, enable sound support as a module but don't select anything else; otherwise, select the modules that work for your particular soundcard.

Now you can proceed to build and install the kernel. While still in the /usr/src/linux directory, run these commands:

make depend
make bzImage
make install

At this point you may want to reconfigure LILO. Run vi /etc/lilo.conf, make whatever changes you prefer, and then run /sbin/lilo to update the loader.

The new modutils package must be compiled under the 2.4 kernel itself, so before you can build and install the new kernel modules you must reboot, then compile and install the new modutils. Return to /usr/src/linux and run the following commands:

make modules
make modules_install

Now reboot your system.

That's it, except for tuning your disks and heeding Andrew Morton's advice regarding proscribed activities (don't scroll the framebuffer console, don't switch consoles, don't run a server with hundreds of TCP connections per second, and so forth). As Andrew says, none are particularly "show-stoppers", but see his web site for more details.

Low-latency in the real world: performance & programming examples

The POSIX specification provides a "soft" real-time API through the SCHED_FIFO scheduling policy. Programmers can add real-time scheduling to their applications by adding a very small piece of code to raise the application's performance priority. Benno Senoner has suggested this fragment:

#include <sched.h>
#include <stdio.h>   /* perror() */
#include <string.h>  /* memset() */

int set_realtime_priority(void)
{
    struct sched_param schp;

    /*
     * set the process to real-time privs
     */
    memset(&schp, 0, sizeof(schp));
    schp.sched_priority = sched_get_priority_max(SCHED_FIFO);

    /* pid 0 means the calling process; the call fails without root privileges */
    if (sched_setscheduler(0, SCHED_FIFO, &schp) != 0) {
        perror("sched_setscheduler");
        return -1;
    }

    return 0;
}

Benno also advises programmers that "... A low latency kernel alone is not enough to get dropout-free real-time audio I/O: the application must be written in such a way that the audio thread never blocks (except when doing audio I/O), and all communication with the outside world must be done by using lock-free datastructures like shared-mem and lock-free FIFOs."
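To make the idea of a lock-free FIFO concrete, here is a minimal single-producer, single-consumer ring buffer. It is a sketch under stated assumptions rather than code from Benno or from any application discussed here: the names (spsc_fifo, fifo_push, fifo_pop), the 16-bit sample type, and the buffer size are all hypothetical, and it relies on C11 atomics, which require a far newer compiler than the ones in use when this article was written.

```c
#include <stdatomic.h>

#define FIFO_SLOTS 1024u   /* must be a power of two */

struct spsc_fifo {
    short samples[FIFO_SLOTS];
    atomic_uint head;   /* total items written; touched only by the producer */
    atomic_uint tail;   /* total items read; touched only by the consumer */
};

void fifo_init(struct spsc_fifo *f)
{
    atomic_init(&f->head, 0);
    atomic_init(&f->tail, 0);
}

/* Producer side: returns 1 if stored, 0 if the FIFO is full. Never blocks. */
int fifo_push(struct spsc_fifo *f, short sample)
{
    unsigned head = atomic_load_explicit(&f->head, memory_order_relaxed);
    unsigned tail = atomic_load_explicit(&f->tail, memory_order_acquire);

    if (head - tail == FIFO_SLOTS)
        return 0;
    f->samples[head & (FIFO_SLOTS - 1)] = sample;
    /* Publish the sample only after it has been written. */
    atomic_store_explicit(&f->head, head + 1, memory_order_release);
    return 1;
}

/* Consumer side: returns 1 if a sample was read, 0 if the FIFO is empty. */
int fifo_pop(struct spsc_fifo *f, short *sample)
{
    unsigned tail = atomic_load_explicit(&f->tail, memory_order_relaxed);
    unsigned head = atomic_load_explicit(&f->head, memory_order_acquire);

    if (head == tail)
        return 0;
    *sample = f->samples[tail & (FIFO_SLOTS - 1)];
    /* Release the slot only after the sample has been copied out. */
    atomic_store_explicit(&f->tail, tail + 1, memory_order_release);
    return 1;
}
```

Because each index is written by exactly one thread, neither side ever waits for the other. A disk or GUI thread can call fifo_push while the real-time audio thread calls fifo_pop, and when the FIFO is empty the audio thread can play silence rather than block.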

Figure 1. Ecasound in interactive mode.

For a practical demonstration of an application running with real-time scheduling on a low-latency Linux kernel, I decided to employ Kai Vehmanen's ecasound [Figure 1] in its capacity as a hard-disk recorder. The project included recording from four-track and stereo tapes to the EIDE drive on my computer. Sessions in ecasound can be prioritized to real-time scheduling by setting the -r flag, calling this code from eca-main.cpp.

Here is the full command sequence I used to launch ecasound:

ecasound -c -b:4096 -r -i:alsa,0,0,0 -o:my_foo.wav

The -c flag starts ecasound in interactive mode, the -b setting declares the audio buffer size, the -r flag raises priority to real-time scheduling, the -i settings declare the input device (ALSA here, with card, device, and subdevice numbers), and the -o flag names the output file (by default a 16-bit stereo recording at 44.1 kHz).

My project has proceeded smoothly. Ecasound performs flawlessly in console mode and in X, recording multitrack input to stereo WAV files that I burn to CDs using the excellent gcombust (a GTK-based front-end for Joerg Schilling's cdrecord and Eric Youngdale's mkisofs, both standard items on most mainstream Linux distributions).

Figure 2. SoundTracker.

In the source code to Michael Krause's SoundTracker [Figure 2], you can find a switch to compile the tracker so that it runs as a real-time process. (For those who don't know: A tracker is basically a sequencer for sound samples, typically handling dozens of samples within a composition. A tracker is thus a prime candidate for enjoying the benefits of lowered latency times.) In SoundTracker's audio handling thread (audio.c), once again we see this priority scheduling code.

Michael's comment about his kernel crash was made regarding an unpatched 2.2 kernel. I ran SoundTracker 0.6.0 with no problem under the patched 2.4.0-test9 kernel, so I decided to do a little stress-testing of my own. I played a series of mods (music modules created by trackers) in SoundTracker while browsing the Web and catching up on my e-mail (in Netscape, a well-known resource hog). I also opened a few extra terminal windows in the five workspaces I had open in Blackbox (my window manager of choice). Even switching between workspaces caused no skipping or dropouts while the modules played.

Figure 3. MusE.

Werner Schweer's MusE [Figure 3] is a fine audio/MIDI sequencer in development. It already has an excellent feature set including synchronization with external MIDI devices and real-time editing. It is also one of the few Linux sound applications specifically requesting a kernel built with low-latency and real-time clock support. I built and ran version 0.2.7 using the ALSA 0.5.9d drivers and subjected it to one of my denser MIDI files. Performance was very good when using the external MIDI port of my SBLive, less so when accessing the SBLive's onboard synth, but I believe the problem to be due more to the card than to MusE. Many of my own MIDI compositions utilize particularly dense tempo tracks, often with tempo changes occurring continuously at the 16th-note triplet level throughout an entire piece. I downloaded other MIDI files from various sites on the Web and had no performance problems when playing them through MusE.

Figure 4. XMMS with DeFX and QuiXound plugins.

XMMS [Figure 4] is a wonderful media player for X, with support for a variety of audio and video formats, thanks largely to its plug-in structure. Plug-ins are also available for a wide range of audio processors, including echo generators, sound spatializers, and multi-effects panels. So as a final test I set XMMS to run with real-time priority under the 2.4.0-test9 low-latency kernel. I then activated the Effects Output plug-ins to test various effects in real-time. The DeFX plug-in (seen in Figure 4) does not pretend to replace a professional audio DSP box, but it was great fun to adjust its parameters in real time. Response was immediate and smooth, even when switching between effects. The QuiXound 3D Surround plug-in (also seen in Figure 4) was equally responsive, creating some striking localization and spatialization effects in real time.

Andrew Morton suggested that I repeat these tests using an unpatched 2.4.0-test9 kernel with untuned drives. I did, and the results were quite interesting. SoundTracker ran well even without the patch and tuning, probably because it runs setuid and I compiled it with SCHED_FIFO support. XMMS also worked well without real-time scheduling, but with somewhat sluggish response during the real-time effects parameter adjustments. MusE was audibly affected when run without real-time priority: Timing dragged during especially heavy MIDI data streams, and playback skipped when I switched workspaces in X. Ecasound was the most severely affected, even with the real-time priority flag set, probably due to the nature of the test (CD-quality stereo recording). The program reported multiple under-runs through the ALSA PCM device (/dev/pcmC0D0), and the resulting WAV file was ruined by sixteen dropouts in a three-minute recording. As you have already learned, none of these problems occurred when running the same tests under the kernel patched for low latency on a machine with tuned drives.

Status of the low-latency patches and Linux kernel development

In June 2000, a semi-official "expected features" list was posted to the Linux kernel mailing list in preparation for the 2.5/2.6 kernel tracks. The Linux audio community was dismayed to see that the status of the ALSA drivers had dropped to "wish" instead of "needed," while Ingo's low-latency patches were not even mentioned.

A short time later, a message was drafted by the members of the Linux Audio Development group and sent to Linus Torvalds and the kernel development group. The message explained the desirability of low latency in audio and multimedia applications and the importance of Ingo's patches.

The resulting "tempest in a teapot" eventually settled on a few critical points: First, Linus rejected Ingo's patches because of their inelegance (an opinion shared by Ingo himself) and their effect on kernel maintenance, not because he necessarily believed that optimizing Linux for multimedia is a bad idea. Next, recommending RTLinux (a "hard real-time" kernel especially suited for industrial and embedded applications) was itself an inelegant solution to the problems facing Linux audio and MIDI developers. Finally, work began on cleaning up Ingo's patches, and the results from Andrew Morton bode well for eventual inclusion into the kernel.

Of course, nothing stops you from applying one of the available patches yourself, just as you are free to replace the kernel sound modules with the ALSA drivers (which are now likely to be included in the 2.5 kernel development cycle). I urge you, if you're interested in this, to try installing one of the supported kernels and testing the appropriate low-latency patch. And I only ask that when you've been duly impressed, maybe you could drop a little hint to Linus and the kernel development team, just to let them know how much better all your audio and video applications ran with the low-latency patch to the kernel of their otherwise most excellent operating system.

On-line Resources

Benno Senoner's Web sites are at gardena.net and linuxdj.com

Andrew Morton's patches

Ingo Molnar's patches

Paul Winkler's low-latency mini-HOWTO

Cakewalk's very interesting NAMM presentation

D. Glen Cardenas's in-depth research on SCSI vs. IDE disks

An excellent collection of technical and explanatory material on low latency can be found at John Littler's M-station

Information about the Linux Audio Development group can be found on the LAD Home Page

Most of the software mentioned in this article can be found linked on the Linux Sound & Music Applications Pages

Acknowledgments

The author would like to thank Benno Senoner and Andrew Morton for their extensive assistance in the preparation of this article. They also deserve, along with Ingo Molnar and Roger Larsson, great thanks for their work to make Linux an exceptionally capable multimedia platform.

Dave Phillips maintains the Linux Music & Sound Applications Web site and has been a performing musician for more than 30 years.


Copyright © 2009 O'Reilly Media, Inc.