Recent advances in Linux's threading implementation are expected to continue to ease migration from other Unix-like operating systems. These advancements have arrived with intense activity on two fronts. First, thread-handling improvements have greatly enhanced the kernel's scalability even to thousands of threads. Second, there are now two fresh, competing implementations of the POSIX pthreads standard (NGPT and NPTL) set to replace the aging LinuxThreads library.
In typical open source fashion, only time will tell exactly who wins out in which arena. However, both new library implementations should be API-compatible with the standard, so the choice will depend on performance and stability. The required changes will appear in the upcoming Linux 2.6 (or 3.0) kernel and can already be tested in late development versions.
Threading implementations typically have components in both user and kernel space. It is possible to do everything from the one side or the other, but each approach has problems. With everything on the user side, all related threads are part of one single process (which can only run on one CPU at a time), and multi-processor systems are underutilized. With everything on the kernel side, the kernel scheduler must bear a heavy load.
Approaches have ranged from the 1:1 pure kernel thread model in which each user thread has its own kernel thread, to the M:1 model in which the kernel sees only one normal process, with an arbitrary number of threads with which to schedule in user space. The M:N model falls in between, associating M user threads with each of N kernel threads.
The Linux kernel uses the
clone() function to create new
processes. Flags control parent/child resource sharing, where resources range
from everything (memory, signal handlers, file descriptors, etc.) to nothing.
While the usual
fork() inherits resources from the parent, it may
share nothing. Copy-on-write techniques ensure each process gets its own copy
as soon as either one tries to modify a shared resource.
Programs can call the
clone() function as a system call, using
it directly to produce multithreaded programs. However, it is completely
Linux-specific and non-portable. Since there is no external standard, there is
no guarantee that its interface will be stable. Threading library
implementations do in fact use the
clone() system call, and it is
the job of library maintainers to keep up with kernel changes.
The LinuxThreads implementation of the POSIX threads standard (pthreads),
originally written by Xavier Leroy, has been the dominant one for years and is now
incorporated and maintained in
glibc. It has two problems on Linux:
Virtually all compliance problems can be traced to the decision to use
lightweight processes (LWPs, or the 1:1 model described above) as the basis of
the implementation. New processes are created by
clone(), with a
shared-everything approach. While the new process is lighter due to the
sharing, fundamentally it is still a process in its own right, with its own
process identifier (pid), process descriptor, etc.
This led to the following standards compatibility problems:
psshows all threads in a process.
getpid()returns a different result for each thread.
If a pthreads application were written for Linux, one could expect easy portability. However, the inverse process, porting to Linux, was more difficult and slowed Linux deployment since important applications were now broken.
Some problems were resolvable by relatively minor kernel adjustments. For
example, by modifying the basic data structure describing each process,
struct task_struct) to store a thread group identifier and some
other bookkeeping, and then modifying the
getpid() system call to
return this identifier rather than the process identifier, one problem could be
However, many key kernel developers resisted attempts to modify the kernel for compliance sake. On one hand, their taste runs to technically superior solutions rather than to "cut the toes to fit the shoes" to comply with standards. On the other hand, there was an aversion to creating many threads. Sentiments like "there is no need to create more threads than there are processors" were common. Thread-prolific languages such as Java were looked at with contempt for many reasons.
The main performance problem has been scalability with growing numbers of threads. These difficulties are not unique to threads, but apply in all cases where the number of processes grows large.
Consider the process of obtaining a new
pid. In the 2.4
kernel, Linux has to loop through all processes to ensure a candidate
pid is not already assigned. With an outer loop on possible
candidates, the time spent may scale quadratically; if there are thousands of
processes, the system can slow down to a crawl. Since each thread has its own
pid, creating zillions of threads is poisonous.
A group at IBM and Intel, led by Bill Abt at IBM, released the first version of the New Generation POSIX Threads (NGPT) library in May 2001. This consisted of a drop-in replacement for LinuxThreads, together with patches for kernels beginning with 2.4.0.
To ease acceptance, the group made a conscious effort to keep the impact on the kernel small. They worked to get the kernel modifications they needed through patient, piece-by-piece promotion and expected to have NGPT eventually replace LinuxThreads in the glibc system.
NGPT is a derivative of the GNU Pth (GNU Portable Threads) package, which up to now is based on an M:1 model. A user space priority and event-based, non-preemptive scheduler manages the M user threads. This was seen as an improvement over the 1:1 pure kernel thread model used by LinuxThreads where the kernel has to do a lot of scheduling work.
NGPT adopted the M:N hybrid model. Many developers saw this as the best path to good performance: keep all CPU's humming, minimize context switching between kernel threads, and switch mostly between user space threads. However, the M:N model is complex. It requires two cooperating schedulers, one each in user and kernel space. Signal handling is difficult and much work has to be done in user space. It takes fancy footwork to prevent one blocked thread from blocking other threads running in the same process.
Nonetheless, the NGPT team succeeded in implementing the full pthreads standard, and the kernel changes they needed were accepted in the mainline kernel early in the 2.5 development process (at kernel 2.5.4). They were also back-ported into the 2.4.19 kernel. Depending on the metric used, performance gains were claimed of up to 100 percent, and work continues on improvements.
On March 26-27, 2002, Compaq hosted a meeting to discuss the future replacement for the LinuxThreads library. In attendance were members of the NGPT team, some employees of (then distinct) Compaq and Hewlett-Packard, and representatives of the glibc team, including the head maintainer, Ulrich Drepper (a Red Hat employee), who wrote a summary of the meeting.
Pursuing the M:N approach, the report said:
"This is one of the reasons why it is absolutely necessary to think about two-level scheduling for the threads. I.e., the actual user threads are different from the kernel threads (or light-weighted process, or what ever one wants to call them) and scheduled separately. This is generally called the M-on-N model for a thread implementation. The 1-on-1 model dedicates a unique kernel thread for each user-level thread; this is the model used by the current, inadequate thread library implementation."
The report contains detailed analysis of how to get kernel and user-space schedulers to cooperate using the scheduler activations method.
It seemed the replacement for LinuxThreads would be based on NGPT.
On September 19, 2002, Ulrich Drepper and Ingo Molnar (also of Red Hat) released an alternative to NGPT called the Native POSIX Thread Library (NPTL). The project included a new user space library, changes to glibc, and kernel modifications. The initial announcement said in part:
"Unless major flaws in the design are found this code is intended to become the standard POSIX thread library on Linux system, and it will be included in the GNU C library distribution."
NPTL is based squarely on the 1:1 pure kernel thread model. A white paper explains why in detail.
Recent changes to kernel thread handling (mostly due to Ingo Molnar) had vastly improved thread performance. With these changes in place, the relative simplicity of the 1:1 model became very attractive.
There is only one scheduler. Signal handling remains in the kernel's hands. Blocking problems are handled naturally because each kernel thread schedules independently. In addition, the user space implementation becomes fundamentally simpler.
In some sense, one has come full circle; developers who wanted to ensure full Posix compliance were frustrated by the kernel maintainers' unwillingness to adapt the Linux kernel to fit their needs, and thus NGPT was developed in part as a polite end run requiring minimal kernel changes. Then a programming tour de force, mostly by one key kernel programmer, is now claimed to enable reversion to a much simpler approach.
What changes have been made in the Linux kernel to make threads perform and scale better?
Consider the previous example of obtaining a new
potentially quadratic search is gone. Instead, the kernel sets aside a small
but dynamic number of memory pages as bitmaps for process identifiers.
Obtaining a new
pid means finding a page with free entries and
then finding and clearing the first set bit. No locking is required, and the
search time is very short and almost independent of the number of running
Another key improvement is the
O(1) scheduler, which no longer
has to cycle through all processes to find the most deserving one. Each CPU has
its own queue, a simple priority-sorted bitmask. Once again finding a new
process is very fast and scales fantastically.
The NPTL team posted some benchmarks, such as this display of the minimum time needed to create a number of top-level threads.
In general, while NGPT beat the old methods by a factor of two, NPTL could do better by another factor of two.
It remains to be seen exactly how the two implementations will rank against each other. NGPT may not yet be tuned to take advantage of recent kernel improvements the way NPTL has. Furthermore, benchmarks are often used to misrepresent. It will take further development by both teams, independent benchmarks, and real life comparisons to see who really beats whom.
You can test drive NGPT by simply downloading the library and installing it, as long as you have kernel 2.4.19 or later. For NPTL,
you can download the library, but you will need a very recent development
kernel as well as bleeding edge
gcc. The announcement contains
While there may be some hard feelings on the socio-political side about how NPTL seemed to come out of the blue, the maintainers of NGTP have not griped in public. It seems that any battle between the two implementations will now be played out in public, in good open source fashion. Either one library will win out over the other, or each will become the preferred tool in some universe for some load. At any rate it will be fun to see what happens. Linux will benefit by having a standards-compliant, and well-performing threads implementation(s).
Jerry Cooperstein is a senior consultant and Linux training specialist at Axian Inc., in Beaverton Oregon, and lives in Corvallis, Oregon.
Return to the Linux DevCenter.
Copyright © 2009 O'Reilly Media, Inc.