 Published on O'Reilly (http://oreilly.com/)


Demystifying LDAP

by Brian K. Jones
07/27/2006

If you've been struggling to understand what LDAP is and how it can be useful to you without picking up a 1,000-page tome, look no further. LDAP is great for some problems, pretty good for some others, and completely inappropriate for yet another batch of problems. In this first part of a series on understanding just what LDAP is, I hope I can help make LDAP easier to deal with by explaining, in English, what LDAP is and what it is good at. After that, looking at the data and writing code will be much easier.

Definition and Components

LDAP stands for Lightweight Directory Access Protocol, which is to say that, by definition, LDAP is a protocol, and nothing else. However, the protocol exists to perform operations on data, and is really pretty useless without it. This brings up the components that make up an LDAP deployment: client software used to send LDAP requests, the server daemon that handles incoming LDAP requests, and the back-end data store. I will refer to the last two collectively as a "directory service."

Back-end Data Storage

Of these components, the back-end data storage mechanism is the least relevant to you unless you're administering a production LDAP deployment. Developers writing code that accesses an LDAP server and end users who access a directory service via some client utility should be happy to let the protocol do the job of getting data to them without knowing anything about the back end. Adding, updating, deleting, and fetching data in a directory service all happen through the LDAP protocol.

LDAP Server Daemon

The LDAP server daemon handles incoming requests for data. By default, it listens on port 389 for unencrypted requests and for connections that are upgraded to TLS, and on port 636 for SSL connections. In every LDAP product I've ever seen, open source or not, the server daemon is also responsible for determining who can perform what operations on which data, which clients can connect, and other security measures. In other words, the data is the castle, and the server daemon is the gatekeeper and the moat.
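
As a rough illustration of those defaults, here is a minimal Python sketch that works out which host and port a client would contact for a given server URL. It assumes the common convention that ldap:// URLs imply port 389 and ldaps:// URLs imply port 636; the function name and the host ldap.example.com are made up for this example.

```python
from urllib.parse import urlsplit

# Conventional default ports, as described above.
DEFAULT_PORTS = {"ldap": 389, "ldaps": 636}


def server_endpoint(url: str) -> tuple[str, int]:
    """Return (host, port) for an ldap:// or ldaps:// URL."""
    parts = urlsplit(url)
    if parts.scheme not in DEFAULT_PORTS:
        raise ValueError(f"not an LDAP URL: {url!r}")
    if not parts.hostname:
        raise ValueError(f"no host in URL: {url!r}")
    # An explicit port in the URL wins; otherwise use the default.
    port = parts.port if parts.port is not None else DEFAULT_PORTS[parts.scheme]
    return parts.hostname, port


# ldap.example.com is a placeholder host, not a real server.
print(server_endpoint("ldap://ldap.example.com"))
print(server_endpoint("ldaps://ldap.example.com"))
print(server_endpoint("ldap://ldap.example.com:1389"))
```

A deployment can of course listen on other ports, which is why an explicit port in the URL takes precedence here.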

In addition to these duties, the server daemon also keeps a wealth of logging and accounting information. For example, most daemons will record the user who committed the last change to a piece of data and the date of that change.

On top of all of this, some daemons even store their configuration in the back-end data store, meaning that you can configure and manage them almost completely using LDAP itself!

The good news, if you need to deploy an LDAP service, is that the server daemon and back end almost always come in a single package from software vendors and open source projects alike. The more popular open source LDAP servers are OpenLDAP and Fedora Directory Server (FDS). On the commercial side, products include Novell's eDirectory, Sun's Java System Directory, and solutions from IBM, Computer Associates, and other heavyweights in the software arena.

What Is LDAP Used For?

An LDAP directory service stores information for use by systems as well as end users (and their various applications). Probably the most common use of LDAP is for replacing either flat-file authentication (think /etc/passwd) or legacy networked authentication (think NIS). The benefit of any networked authentication mechanism over a flat file system is clearly that it lifts the burden of having to keep files on all of your systems in sync. The benefit of LDAP over, say, NIS is (among other things) a finer-grained control over the data and how it is accessed (and by whom). You can also make encrypted connections to LDAP servers using TLS or SSL, and you never have to muck with flat file "maps" or complicated Makefiles to change the data.

Because LDAP is a transaction-based system, operations that complete successfully are immediately "live." Modern Unix-based systems (including Linux, BSD, and OS X) can rely on LDAP to get just about any information they could store in flat files or NIS, including hosts, automounter configuration, users, groups, and more. Add to that the ability to have Samba, Apache, PAM, tcpwrappers, Sendmail, and other applications talk to LDAP for authentication, aliases, and other tidbits of useful information, and you have the beginnings of a very well-integrated, easily maintained, authoritative data source for your entire infrastructure.

LDAP is also popular for use as a "white pages" directory for a department or corporation. For example, most email applications, from Mutt and Pine to Outlook, Evolution, and KMail, know how to talk to an LDAP server. This makes it easy to tell KMail, for instance, to autocomplete addresses as you type, using an LDAP directory as its address book source instead of (or in addition to) local files.

Why Yet Another Data Store? Why Not Just Use a Database?

Why not just use a relational database? There are multiple reasons to consider LDAP instead of a relational database for certain subsets of your data. One main issue here is standardization: LDAP is a standard protocol, with published, standardized schemas available to manage data in LDAP directory services.

For example, if I were to write an authentication module for a corporate website, LDAP does away with the need to schedule a meeting with a database guru to have him explain the layout of the user database. Instead, I can just ask the LDAP administrator for a link to the published schema he uses to store user information, and code for that schema. Even better, the schema is likely to be widely used (because it is published), so there might already be some code out there that does exactly what I need, which means less time coding, and a quicker delivery time for my project.

The benefits to an open source project are even greater. When you write, for example, a portal application that you want everyone in the world to use, you have the design option of requiring users to register, storing the resulting user information in a database. Why not simply allow users to point the application at a directory server for the user information? Decent models are available from Moodle and Horde.

By using a standard schema for storing user data, you don't have to ask the user, "What does your data look like?" Instead, just have the user type in the location of the LDAP server, and the name of the subtree to search under for user information, and you're practically finished! What makes it even easier is that you don't have much of a data access API to write, and most programming languages (including Perl, PHP, Python, Ruby, and plenty of others) already have modules for talking to LDAP servers no matter what brand of LDAP server you choose. Further, because LDAP is a standard protocol, you can use one interface to talk to just about any LDAP server product on the market. No goofy database abstraction layers required!
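
To make that concrete, here is a small Python sketch of the only settings such an application might need from its administrator: a server location, a subtree to search under, and the attribute that holds the login name. The class name, field names, and example values are assumptions for illustration. The escaping follows the rules for search filter values in RFC 4515, so a username containing characters such as * or ( cannot change the meaning of the filter.

```python
from dataclasses import dataclass

# Characters that must be escaped inside a search filter value (RFC 4515).
_FILTER_ESCAPES = {
    "\\": r"\5c",
    "*": r"\2a",
    "(": r"\28",
    ")": r"\29",
    "\0": r"\00",
}


def escape_filter_value(value: str) -> str:
    """Escape a string so it is safe to embed in an LDAP search filter."""
    return "".join(_FILTER_ESCAPES.get(ch, ch) for ch in value)


@dataclass(frozen=True)
class DirectorySettings:
    """Hypothetical settings an application would ask the user for."""
    server_url: str               # location of the LDAP server
    search_base: str              # subtree that holds the user entries
    login_attribute: str = "uid"  # attribute compared with the login name

    def user_filter(self, username: str) -> str:
        """Build the search filter that finds one user by login name."""
        return f"({self.login_attribute}={escape_filter_value(username)})"


# Placeholder values; substitute whatever your directory administrator gives you.
settings = DirectorySettings(
    server_url="ldap://ldap.example.com",
    search_base="ou=People,dc=example,dc=com",
)
print(settings.user_filter("ajonesy"))
```

The resulting filter and the search base are what you would hand to whichever LDAP module your language provides.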

A Closer Look at LDAP Data

It's extremely important when learning about LDAP and how it deals with data to separate the structure (or topology) of the data from the definitions of the objects themselves.

Simply, the structure of LDAP data is a hierarchical collection of objects. Objects can represent anything from people to printers and take their places within the hierarchy using whatever logic you like.

Objects?

Yes, objects. Each object has a list of attributes associated with it that describe that particular object. When you add or delete an object, make a request for an object, or change the value of an object's attribute, you do so solely using the LDAP protocol. In short, LDAP exists to manipulate or fetch data about objects.

Hierarchical?

The layout of the data in an LDAP directory is the Directory Information Tree (DIT). You can customize it to the needs of your organization, but it's still a hierarchical tree structure. This tree is not dissimilar to a typical filesystem; there's a "top" or "root" directory, under which are high-level objects (directories in a filesystem). Those help you to categorize the lower level objects that you're really interested in (in a filesystem, these are the files themselves).

Suppose you want to store information about people using a hierarchical collection of objects. Viewing things as a filesystem, you could create a /People directory, and under that, create a file--/People/jonesy. That file contains attribute name and value pairs to describe "jonesy." One attribute might be "firstname," with a value of "Brian." Save the file, and create a new one for each person. Eventually, you have a filesystem that looks something like:

/People
    jonesy
    mary
    tom
    jane

Of course, your organization might have slightly different needs. Because the structure of the DIT is completely up to you, you might choose something more like:

/People
    /Engineering
        jonesy
        mary
    /Accounting
        tom
        jane
    /IT
        fred
        mark
/Groups
    mygroup
    yourgroup

Notice that, of course, I can have more than one high-level object, representing a different type of object I want to store data about. In this case, I've stored information about People and Groups. I can also have as many subtrees as I wish, arranged in any order I want.
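
The filesystem analogy can be sketched in a few lines of Python. The nested dictionary below mirrors the layout just described, and the function walks it to produce a name for every object in the style LDAP uses, with the most specific part first. The base dc=example,dc=com, the use of ou for containers, and the use of cn for the leaf objects are assumptions made for this sketch; a real directory can name its objects differently, and the sketch does not escape special characters in names.

```python
# A dictionary is a container of further objects; None marks a leaf.
DIT = {
    "People": {
        "Engineering": {"jonesy": None, "mary": None},
        "Accounting": {"tom": None, "jane": None},
        "IT": {"fred": None, "mark": None},
    },
    "Groups": {"mygroup": None, "yourgroup": None},
}


def walk(tree, parent_dn):
    """Yield the name of every object beneath parent_dn, containers first."""
    for name, children in tree.items():
        if children is None:
            # Leaf object: assumed here to be named by its cn attribute.
            yield f"cn={name},{parent_dn}"
        else:
            # Container: assumed here to be an organizational unit (ou).
            dn = f"ou={name},{parent_dn}"
            yield dn
            yield from walk(children, dn)


for dn in walk(DIT, "dc=example,dc=com"):
    print(dn)
```

Where a filesystem path reads from the root down (/People/Engineering/jonesy), the generated names read from the object back up to the root.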

A Peek at a Person

Just to whet your appetite, here's a quick look at a request for a "person" object. It's actually a fake account entry from our departmental LDAP server that I've changed to look like a typical "person" object that systems use to authenticate users. The request comes from a Linux workstation running the OpenLDAP client utility ldapsearch.

$ ldapsearch -x '(uid=ajonesy)'

dn: cn=ajonesy,ou=People,dc=cs,dc=princeton,dc=edu
objectClass: top
objectClass: person
objectClass: organizationalPerson
objectClass: inetOrgPerson
objectClass: posixAccount
objectClass: inetLocalMailRecipient
objectClass: shadowAccount
objectClass: jabberUser
cn: ajonesy
uid: ajonesy
uidNumber: 30406
homeDirectory: /u/ajonesy
sn: Jones
givenName: Brian
roomNumber: 101B
displayName: Brian Jones
facsimileTelephoneNumber: 1113
gecos: Brian K. Jones, GUEST, 6080
mail: ajonesy@cs.princeton.edu
telephoneNumber: 6080
loginShell: /bin/bash
gidNumber: 931
description: GUEST
labeledUri: http://www.cs.princeton.edu/~ajonesy

The output format is LDAP Data Interchange Format (LDIF). As you can see, LDIF is a way to make pointy-haired bosses recoil in fear at what amounts to key-value pairs. The very first line in the entry is the dn (distinguished name). Its value uniquely identifies this object among all of the objects in the directory. After that is a list of objectClass attributes, each with a different value. Those values refer to objectClass definitions, which live in human-readable schemas that the server daemon has read. An objectClass definition consists mainly of a list of attribute definitions, followed by a statement saying "if an object is of this objectClass, it MUST have these attributes, and it MAY also have these attributes."
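
To show that this really is just key-value pairs, here is a minimal Python sketch that reads one entry into a dn plus a dictionary of attributes. It handles only the simple "name: value" lines seen above. Real LDIF also has folded continuation lines and base64-encoded values, which this sketch deliberately ignores. The function name and the shortened sample entry are made up for the example.

```python
from collections import defaultdict

# A shortened copy of the entry above.
SAMPLE = """\
dn: cn=ajonesy,ou=People,dc=cs,dc=princeton,dc=edu
objectClass: person
objectClass: posixAccount
cn: ajonesy
uid: ajonesy
sn: Jones
roomNumber: 101B
labeledUri: http://www.cs.princeton.edu/~ajonesy
"""


def parse_entry(text: str) -> tuple[str, dict[str, list[str]]]:
    """Split one simple LDIF entry into (dn, {attribute: [values]})."""
    dn = None
    attributes = defaultdict(list)
    for raw in text.splitlines():
        # Leading whitespace is stripped, so folded lines are not supported.
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        # Split on the first colon only; values may contain colons (URLs).
        name, _, value = line.partition(":")
        value = value.strip()
        if name.lower() == "dn":
            dn = value
        else:
            # An attribute can appear several times, so collect a list.
            attributes[name].append(value)
    if dn is None:
        raise ValueError("entry has no dn line")
    return dn, dict(attributes)


dn, attrs = parse_entry(SAMPLE)
print(dn)
print(attrs["objectClass"])
```

Storing every attribute as a list reflects the fact that attributes such as objectClass carry more than one value.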

Schemas get only slightly more complicated than that if you want to create your own objects or entire schemas. For most, just understanding that using an objectclass implies the use of certain attributes will do just fine.
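
The MUST and MAY rule can be sketched as a simple check in Python. The two definitions below are abridged from the published person and posixAccount schemas (RFC 4519 and RFC 2307). The sketch is a simplification rather than how a server actually validates entries: it ignores inheritance between objectClasses, treats attribute names as case-sensitive, and the function name is made up.

```python
# Abridged objectClass definitions: required and optional attributes.
OBJECT_CLASSES = {
    "person": {
        "must": {"sn", "cn"},
        "may": {"userPassword", "telephoneNumber", "seeAlso", "description"},
    },
    "posixAccount": {
        "must": {"cn", "uid", "uidNumber", "gidNumber", "homeDirectory"},
        "may": {"userPassword", "loginShell", "gecos", "description"},
    },
}


def check_entry(attributes):
    """Return (missing, not_allowed) attribute names for an entry.

    attributes maps attribute names to lists of values and must include
    the objectClass attribute itself.
    """
    required = set()
    allowed = {"objectClass"}
    for object_class in attributes.get("objectClass", []):
        definition = OBJECT_CLASSES[object_class]  # KeyError if unknown
        required |= definition["must"]
        allowed |= definition["must"] | definition["may"]
    present = set(attributes)
    return sorted(required - present), sorted(present - allowed)


entry = {
    "objectClass": ["person", "posixAccount"],
    "cn": ["ajonesy"],
    "sn": ["Jones"],
    "uid": ["ajonesy"],
    "roomNumber": ["101B"],
}
missing, not_allowed = check_entry(entry)
print("missing:", missing)
print("not allowed:", not_allowed)
```

In this sample, roomNumber is reported as not allowed only because the sketch leaves out the objectClasses that permit it in the full entry shown earlier.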

Lest you think that all of this data must necessarily come at you in the form of a whole object and nothing else, there are a couple of quick notes to make. First, notice that there's no userPassword attribute in the above output. That isn't because I've sanitized the output; it's because the user account used to perform the search that returned this object doesn't have sufficient rights to see this user's password. Second, I could alter my search command to return, say, only ajonesy's roomNumber attribute:

$ ldapsearch -x '(uid=ajonesy)' roomNumber
dn: cn=ajonesy,ou=People,dc=cs,dc=princeton,dc=edu
roomNumber: 101B

You can see how this is likely to dramatically reduce the amount of time needed to retrieve, say, the room numbers for the 3,000 people in your Los Angeles headquarters.
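
Both notes, attributes withheld for lack of rights and attributes left out because nobody asked for them, can be modeled with a tiny in-memory Python sketch. It only illustrates what the client ends up seeing and is not how any particular server implements access control. The function name is made up, and the userPassword value is a placeholder rather than real data.

```python
# Attributes of one entry; every attribute maps to a list of values.
# The userPassword value is a made-up placeholder.
AJONESY = {
    "cn": ["ajonesy"],
    "uid": ["ajonesy"],
    "roomNumber": ["101B"],
    "telephoneNumber": ["6080"],
    "userPassword": ["{SSHA}placeholder"],
}


def visible_attributes(entry, requested=(), hidden=()):
    """Return the part of an entry that a client would receive.

    requested: attribute names the client asked for (empty means all).
    hidden:    attribute names the client has no right to read.
    """
    wanted = set(requested)
    denied = set(hidden)
    return {
        name: list(values)
        for name, values in entry.items()
        if name not in denied and (not wanted or name in wanted)
    }


# Whole entry, minus what this client may not see.
print(visible_attributes(AJONESY, hidden={"userPassword"}))
# Only the room number.
print(visible_attributes(AJONESY, requested=["roomNumber"], hidden={"userPassword"}))
```

Note that asking for a hidden attribute by name still returns nothing for it, which matches the behavior described above.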

Stay Tuned!

This article has just scratched the surface of LDAP, but hopefully by now you're better able to visualize your directory service and its data. The next part will take a closer look at the definition of objects that live in the directory themselves, and how to create and access objects and their attributes with code.

Brian K. Jones is an infrastructure architect, a system/network/database administrator, and co-author of Linux Server Hacks, Volume Two.



Copyright © 2009 O'Reilly Media, Inc.