 Published on ONJava.com (http://www.onjava.com/)


Managing Component Dependencies Using ClassLoaders

by Don Schwarz
04/13/2005

Java's class loading mechanism is incredibly powerful. It allows you to leverage external third-party components without the need for header files or static linking. You simply drop the JAR files for the components into a directory and arrange for them to be added to your classpath. Run-time references are all resolved dynamically. But what happens when these third-party components have their own dependencies? Generally, it is left up to each developer to determine the full set of required components, acquire the correct version of each, and ensure that they are all added to the classpath properly.

JAR Manifest Files

But it doesn't have to be like this; Java's class loading mechanism allows for more elegant solutions to this problem. One such solution is for the authors of each component to specify its dependencies inside its JAR manifest. A manifest is a text file (META-INF/MANIFEST.MF) that can be included inside a JAR to specify metadata about the archive. The most popular attribute, Main-Class, names the class whose main method java -jar should invoke. However, there is a related but much less well-known attribute, Class-Path, that lets a JAR declare that it depends on other JARs. Java's default application class loader checks for this attribute and automatically appends the listed JARs to its internal classpath.
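As a quick illustration of these two attributes, the sketch below parses manifest text with the JDK's java.util.jar.Manifest class and reads Main-Class and Class-Path from it. The class ManifestPeek and its helper readMainAttributes are hypothetical names; a real program would normally obtain the Manifest from JarFile.getManifest() rather than from a string.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.jar.Attributes;
import java.util.jar.Manifest;

// Hypothetical helper: reads the two manifest attributes discussed above.
public class ManifestPeek {

    // Returns {Main-Class, Class-Path}; either entry is null if the attribute is absent.
    static String[] readMainAttributes(String manifestText) throws IOException {
        Manifest man = new Manifest(
                new ByteArrayInputStream(manifestText.getBytes(StandardCharsets.UTF_8)));
        Attributes attr = man.getMainAttributes();
        return new String[] {
            attr.getValue(Attributes.Name.MAIN_CLASS),
            attr.getValue(Attributes.Name.CLASS_PATH)
        };
    }

    public static void main(String[] args) throws IOException {
        // Same content as simulator-ui's manifest; terminate every line,
        // including the last, with a newline.
        String text = "Manifest-Version: 1.0\n"
                    + "Main-Class: com.oreilly.simulator.ui.Main\n"
                    + "Class-Path: simulator.jar\n";
        String[] values = readMainAttributes(text);
        System.out.println(values[0]); // prints com.oreilly.simulator.ui.Main
        System.out.println(values[1]); // prints simulator.jar
    }
}
```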

Let's look at an example. Consider a Java application that implements a traffic simulation. This application is composed of three individual JARs:

simulator-ui.jar depends upon simulator.jar, which in turn depends upon rule-engine.jar.

The naive way to execute this application is:

$ java -classpath \
   simulator-ui.jar:simulator.jar:rule-engine.jar \
   com.oreilly.simulator.ui.Main

But we could also specify this information in JAR manifest files. simulator-ui's MANIFEST.MF file looks like this:

Main-Class: com.oreilly.simulator.ui.Main
Class-Path: simulator.jar

While simulator's MANIFEST.MF simply contains:

Class-Path: rule-engine.jar

rule-engine.jar either has no manifest or has an empty one.

Now we can just do:

$ java -jar simulator-ui.jar

Java will automatically parse the manifest entries to extract the main class and modify the classpath accordingly. It will even determine the path of simulator-ui.jar and interpret all Class-Path attributes relative to this path, so we could just as easily have done one of the following:

$ java -jar ../simulator-ui.jar
$ java -jar /home/don/build/simulator-ui.jar
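The relative-path behavior can be imitated with java.net.URI.resolve. The hypothetical helper ClassPathResolver below splits a Class-Path value on whitespace and resolves each entry against the location of the JAR that declared it. It is meant to mirror the behavior described above, not to reproduce the JDK's internal code.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: resolves Class-Path entries relative to the declaring JAR.
public class ClassPathResolver {

    static List<URI> resolve(URI jarUri, String classPathAttr) {
        List<URI> result = new ArrayList<>();
        for (String entry : classPathAttr.trim().split("\\s+")) {
            if (!entry.isEmpty()) {
                // URI.resolve replaces the JAR's file name with the entry
                // and normalizes away "." and ".." segments.
                result.add(jarUri.resolve(entry));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        URI jar = URI.create("file:/home/don/build/simulator-ui.jar");
        System.out.println(resolve(jar, "simulator.jar"));
        // prints [file:/home/don/build/simulator.jar]
    }
}
```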

Dependency Conflicts

Java's implementation of the Class-Path attribute is a big improvement over specifying the entire classpath manually. However, both approaches share some important limitations. One of the biggest, which may not even have crossed your mind, is that you can load only one version of each component. This may seem obvious, because most programming environments have the same limitation. However, it is not uncommon for large multi-JAR projects with many third-party dependencies to run into conflicts among those dependencies.

For example, let's say that you're developing a meta-search engine that queries multiple search engines and collates the results. Google and Amazon's Alexa both support web services APIs that use SOAP as a communication mechanism, and both provide Java libraries that can be used to conveniently access these APIs. This is a bit contrived, but for the sake of argument, let's assume that your JAR, metasearch.jar, depends upon google.jar and amazon.jar, each of which depends upon a common soap.jar.

This is fine for now, but what happens in the future when the SOAP protocol or API changes in some way? It's quite likely that these two search engines will not choose to upgrade at exactly the same time. There may come a day when accessing Amazon requires SOAP v1.x and accessing Google requires SOAP v2.x, and the two versions of SOAP were not designed to co-exist in the same process. In this case, we might have the following JAR dependencies specified:

$ cat metasearch/META-INF/MANIFEST.MF
Main-Class: com.onjava.metasearch.Main
Class-Path: amazon.jar google.jar

$ cat amazon/META-INF/MANIFEST.MF
Class-Path: soap-v1.jar

$ cat google/META-INF/MANIFEST.MF
Class-Path: soap-v2.jar

This captures the dependencies correctly, but there's no magic here--this won't do what we want. If soap-v1.jar and soap-v2.jar define many of the same classes, we're almost certainly going to have problems.

$ java -jar metasearch.jar
SOAP v1: remotely invoking searchAmazon
SOAP v1: remotely invoking searchGoogle

As you can see, soap-v1.jar was added to the classpath first, so it is used in both cases. Just as in the previous example, this is equivalent to:

$ java -classpath \
   metasearch.jar:amazon.jar:google.jar:soap-v1.jar:soap-v2.jar \
   com.onjava.metasearch.Main    # WRONG!
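The "first entry wins" rule is easy to observe without building any JARs. In the hypothetical sketch below, two temporary directories stand in for soap-v1.jar and soap-v2.jar, each holding a soap.properties file with a version number. A URLClassLoader searches its URLs in order for resources just as it does for .class files, so a single flat loader always answers with whichever copy is listed first.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Properties;

// Hypothetical demo: one flat classpath can only ever expose one "soap" version.
public class FirstMatchWins {

    // Creates a directory that stands in for one SOAP JAR, with a version marker.
    static Path fakeJar(String version) throws IOException {
        Path dir = Files.createTempDirectory("soap-v" + version);
        Files.write(dir.resolve("soap.properties"),
                ("version=" + version + "\n").getBytes(StandardCharsets.UTF_8));
        return dir;
    }

    // Reads soap.properties through the given loader and returns its version.
    static String versionSeenBy(ClassLoader loader) throws IOException {
        Properties p = new Properties();
        try (InputStream in = loader.getResourceAsStream("soap.properties")) {
            p.load(in);
        }
        return p.getProperty("version");
    }

    public static void main(String[] args) throws IOException {
        URL v1 = fakeJar("1").toUri().toURL();
        URL v2 = fakeJar("2").toUri().toURL();
        // v1 is listed first; the parent is the bootstrap loader (null).
        try (URLClassLoader flat = new URLClassLoader(new URL[] {v1, v2}, null)) {
            System.out.println(versionSeenBy(flat)); // prints 1
        }
    }
}
```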

It's interesting to note that Yahoo has also released a web services API, and they do not seem to have introduced a dependency on an existing SOAP/XML-RPC library. On smaller projects, conflicting component dependencies are often cited as a reason not to use a full-scale component (such as a collections library) when you can get by with a small, hand-rolled solution or with including just the one or two classes needed. Hand-rolled solutions have their place, but it is almost always better to use a real component if one is available. And copying other components' classes into your own codebase is never a good idea; in effect, you've just forked the development of that component and no one is ever going to merge in bug fixes or security updates.

Many larger projects, primarily commercial components, have even adopted the disturbing practice of absorbing entire third-party components and bundling them inside their own JARs. To do this, they mangle the package name to make it unique (e.g., com/acme/foobar/org/freeware/utility) and include the classes directly in their JAR. This prevents any clashes between multiple versions of these component JARs, but at considerable cost. It completely hides the third-party dependencies from developers. If the practice became widespread, it would lead to extreme inefficiencies, both in the size of JAR files and in loading multiple copies of each component into one process. The underlying problem is that when two components depend on the same version of a third component (or can be made to do so), there is no central mediator that can detect this and ensure that the shared component is loaded only once. This is something that we'll be investigating in the next section. Beyond these inefficiencies, the license under which third-party software is released may also restrict your legal right to bundle it with your own project.

Another approach to this problem is for each component's developers to encode a version number explicitly in their package names. Sun's javac code takes this approach: there is a com.sun.tools.javac.Main class that simply forwards calls to com.sun.tools.javac.v8.Main. Each time a new Java version is released, the package of this code changes. This allows multiple releases of a component to live in a single class loader, and it makes the choice of version explicit; overall, however, it is not a very good solution. Either clients need to know exactly which version they plan to use and must change their code to switch to a new version, or they must rely on wrapper classes that forward method calls to the latest version (in which case, these wrapper classes suffer from the same problems that we highlighted above).


Loading Multiple Releases

The problem that we're facing here is that in most projects, there is a single global namespace into which all classes are loaded. What if, instead, each component had its own namespace and could load all of its dependent components into that namespace without affecting the rest of the process? We can actually do this in Java! Class names do not need to be unique; only the combination of a class name and its defining ClassLoader must be unique. This means that each ClassLoader acts like a namespace, and that if we load each component with its own ClassLoader, it will have full control over how its dependencies are satisfied. It can delegate class lookups to another ClassLoader that contains only the specific version of each of its dependent components. For example, see Figure 1.
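To see the namespace rule in action, the sketch below defines the same class twice. The hypothetical IsolatingLoader reads the bytes of a nested Probe class from the application class path and calls defineClass itself instead of delegating, so each loader instance becomes Probe's defining loader. The two resulting Class objects share a name but are distinct types. The sketch assumes the program was compiled to .class files on the class path, so that the bytes can be found with getResourceAsStream.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical demo: same class name, two defining loaders, two distinct classes.
public class TwoNamespaces {

    // A tiny class that we will define twice.
    public static class Probe {}

    // Defines `target` itself; every other class uses normal parent-first delegation.
    static class IsolatingLoader extends ClassLoader {
        private final String target;

        IsolatingLoader(ClassLoader parent, String target) {
            super(parent);
            this.target = target;
        }

        @Override
        protected Class<?> loadClass(String name, boolean resolve) throws ClassNotFoundException {
            if (!name.equals(target)) {
                return super.loadClass(name, resolve);
            }
            synchronized (getClassLoadingLock(name)) {
                Class<?> c = findLoadedClass(name);
                if (c == null) {
                    byte[] bytes = readClassBytes(name);
                    c = defineClass(name, bytes, 0, bytes.length);
                }
                if (resolve) {
                    resolveClass(c);
                }
                return c;
            }
        }

        // Copies the raw .class bytes out of the parent's class path.
        private byte[] readClassBytes(String name) throws ClassNotFoundException {
            String path = name.replace('.', '/') + ".class";
            try (InputStream in = getParent().getResourceAsStream(path)) {
                if (in == null) {
                    throw new ClassNotFoundException(name);
                }
                ByteArrayOutputStream out = new ByteArrayOutputStream();
                byte[] buf = new byte[4096];
                for (int n; (n = in.read(buf)) != -1; ) {
                    out.write(buf, 0, n);
                }
                return out.toByteArray();
            } catch (IOException e) {
                throw new ClassNotFoundException(name, e);
            }
        }
    }

    public static void main(String[] args) throws Exception {
        ClassLoader app = TwoNamespaces.class.getClassLoader();
        String name = Probe.class.getName();
        Class<?> a = new IsolatingLoader(app, name).loadClass(name);
        Class<?> b = new IsolatingLoader(app, name).loadClass(name);
        System.out.println(a.getName().equals(b.getName())); // prints true
        System.out.println(a == b);                          // prints false
    }
}
```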

Figure 1. Decentralized class loaders

However, this architecture is not much better than the approach of bundling each dependent JAR with your own. What we need is a central authority that can ensure that each component version is only loaded by a single class loader. The architecture in Figure 2 will ensure that each component version is only loaded once.

Figure 2. Class loaders with mediator

To implement this, we'll need to create two different kinds of class loaders. Each ComponentClassLoader will extend Java's URLClassLoader to provide the logic needed to extract .class files from a single JAR. However, it will also perform two other tasks. When created, it will retrieve the JAR manifest and look for a new attribute, Restricted-Class-Path. Unlike Sun's Class-Path attribute, this one implies that the specified JARs should be available only to this component and no others.

public class ComponentClassLoader extends URLClassLoader {
  // ...

  public ComponentClassLoader (MasterClassLoader master, File file)
    throws IOException
  {
    // ...
    List<File> l = new ArrayList<File>();

    // Read the Restricted-Class-Path attribute, if the JAR has one.
    JarFile jar = new JarFile(file);
    try {
      Manifest man = jar.getManifest();  // null if there is no manifest
      String str = (man == null) ? null
          : man.getMainAttributes().getValue("Restricted-Class-Path");
      if (str != null) {
        // Entries are space-separated and relative to this JAR's directory.
        StringTokenizer tok = new StringTokenizer(str);
        while (tok.hasMoreTokens()) {
          l.add(new File(file.getParentFile(), tok.nextToken()));
        }
      }
    } finally {
      jar.close();
    }

    this.dependencies = l;
  }

  public Class<?> loadClass (String name, boolean resolve)
    throws ClassNotFoundException
  {
    try {
      // Try to load the class from our JAR.
      return loadClassForComponent(name, resolve);
    } catch (ClassNotFoundException ex) {
      // Not in our JAR; fall through to the master.
    }

    // Couldn't find it; let the master look for it
    // in other components.
    return master.loadClassForComponent(name,
                           resolve, dependencies);
  }

  public Class<?> loadClassForComponent (String name,
                                   boolean resolve)
    throws ClassNotFoundException
  {
    Class<?> c = findLoadedClass(name);

    // Even if findLoadedClass returns a real class,
    // we might simply be its initiating ClassLoader.
    // Only return it if we're actually its defining
    // ClassLoader (as determined by Class.getClassLoader).
    //
    if (c == null || c.getClassLoader() != this) {
        c = findClass(name);

        if (resolve) {
            resolveClass(c);
        }
    }
    return c;
  }
}

When a request is made to load a class that does not exist in the component's own JAR, rather than simply forwarding it to the parent class loader, the ComponentClassLoader explicitly calls the MasterClassLoader and passes in its list of JAR dependencies. The MasterClassLoader first tries its own loadClass, then asks the ComponentClassLoader for each listed dependency in turn and returns the first class found.

public class MasterClassLoader extends ClassLoader {
  // ...

  public Class<?> loadClassForComponent (String name,
                      boolean resolve, List<File> files)
    throws ClassNotFoundException
  {
    // First let the master's own loadClass (and thus
    // its parents) try to find the class.
    try {
      return loadClass(name, resolve);
    } catch (ClassNotFoundException ex) {}

    // Then ask each listed dependency's loader in turn.
    for (File f : files) {
      try {
        ComponentClassLoader ccl =
            getComponentClassLoader(f);
        return ccl.loadClassForComponent(name, resolve);
      } catch (Exception ex) {
        // simplified for clarity
      }
    }

    throw new ClassNotFoundException(name);
  }
}
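getComponentClassLoader is elided above, but it is where the "load each component version only once" guarantee has to live. One plausible shape, sketched here as a hypothetical LoaderRegistry rather than the author's code, keys loaders by canonical file path so that two spellings of the same JAR path share a single loader.

```java
import java.io.File;
import java.io.IOException;
import java.net.URL;
import java.net.URLClassLoader;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical registry: at most one loader per component JAR.
public class LoaderRegistry {
    private final ConcurrentMap<File, URLClassLoader> loaders = new ConcurrentHashMap<>();
    private final ClassLoader parent;

    public LoaderRegistry(ClassLoader parent) {
        this.parent = parent;
    }

    public URLClassLoader loaderFor(File jar) throws IOException {
        // For existing files, canonical paths make "lib/soap.jar" and
        // "lib/x/../soap.jar" the same key.
        File key = jar.getCanonicalFile();
        URLClassLoader existing = loaders.get(key);
        if (existing != null) {
            return existing;
        }
        URLClassLoader created =
                new URLClassLoader(new URL[] { key.toURI().toURL() }, parent);
        // If another thread registered a loader first, keep theirs and discard ours.
        URLClassLoader winner = loaders.putIfAbsent(key, created);
        if (winner != null) {
            created.close();
            return winner;
        }
        return created;
    }
}
```

The check-then-putIfAbsent pattern is used instead of computeIfAbsent only because toURL throws a checked exception, which a lambda cannot propagate directly.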

This approach has a number of beneficial properties. The most important is that we can now satisfy the original dependency diagram without changing the code of any of the components (in theory; see the caveats given below). This decreases the coupling between components, since each can depend on whichever version of another component it needs, without forcing other components to upgrade or downgrade to match it.

Another advantage of this technique is increased transparency. Each component's runtime dependencies are listed explicitly, and they are enforced. Even when using the Class-Path manifest attribute, you can never be quite sure that you haven't missed a dependency that is fulfilled accidentally. Consider the case where your component uses the commons-logging component, which in turn uses log4j to do logging. You may have another component that depends upon log4j but does not declare it as a dependency. Because log4j is already on the classpath, you wouldn't detect this, and if it came time to replace log4j with a competitor, you'd have a problem. With Restricted-Class-Path, by contrast, forgetting to list log4j as a dependency would produce a ClassNotFoundException (which the JVM reports as a NoClassDefFoundError when the missing class is referenced directly from compiled code).
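The enforcement comes from the delegation chain: a loader can see only the classes reachable through its own URLs and its parent. The hypothetical sketch below uses a URLClassLoader with no URLs and the bootstrap loader (null) as its parent, standing in for a component that declared no dependencies. The JDK's core classes are still visible through it, but the application's own classes are not.

```java
import java.net.URL;
import java.net.URLClassLoader;

// Hypothetical demo: an undeclared dependency is simply not visible.
public class UndeclaredDependency {

    // Returns true if `name` can be loaded through `loader`.
    static boolean canSee(ClassLoader loader, String name) {
        try {
            Class.forName(name, false, loader);
            return true;
        } catch (ClassNotFoundException e) {
            return false;
        }
    }

    public static void main(String[] args) throws Exception {
        String self = UndeclaredDependency.class.getName();
        try (URLClassLoader isolated = new URLClassLoader(new URL[0], null)) {
            System.out.println(canSee(isolated, "java.lang.String")); // prints true
            System.out.println(canSee(isolated, self));               // prints false
        }
        // The application class loader sees everything on the classpath.
        System.out.println(canSee(UndeclaredDependency.class.getClassLoader(), self)); // prints true
    }
}
```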

Overriding the System Class Loader

Now that we have a class loader capable of implementing our new versioning policy, we need some way to install it. If our code were going to be embedded in an application server or some other kind of shell, that shell could create the new class loader programmatically and use it to load our code. This way, a single server process could execute multiple versions of our code, selecting the desired version from a field of the request. But what if we just want to use this with an ordinary Java application?

An ideal way to do this would be with the -javaagent command-line argument added in Java 1.5. This would let us tell Java to initialize a specific JAR (called an agent) before loading the main class of our application. Unfortunately, agent classes are loaded by the same class loader that loads your main class (the system class loader), so it's already too late to install our custom class loader when our agent's premain method is executed.

Another approach is to create a "bootstrap" main class that simply sets up the class loader and uses it to locate our real main class and invoke its main method. This approach is very simple, but removes some of the elegance of using Java's -classpath and -jar options and requires that we invoke the main method ourselves.
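For comparison, the bootstrap approach needs only a few lines of reflection. In the hypothetical sketch below (BootstrapMain, launch, and the nested Hello class are illustrative names), the real main class is loaded through a supplied class loader, that loader is set as the thread's context class loader, and main(String[]) is invoked.

```java
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;

// Hypothetical launcher: install a loader, then call the real main method.
public class BootstrapMain {

    static void launch(ClassLoader loader, String mainClassName, String[] args) throws Exception {
        // Code that consults the context class loader, such as
        // ServiceLoader.load(Class), should see our loader too.
        Thread.currentThread().setContextClassLoader(loader);
        Class<?> mainClass = Class.forName(mainClassName, true, loader);
        Method main = mainClass.getMethod("main", String[].class);
        if (!Modifier.isStatic(main.getModifiers())) {
            throw new NoSuchMethodException(mainClassName + ".main is not static");
        }
        main.invoke(null, (Object) args); // cast so the array is passed as one argument
    }

    // Stand-in for the application's real main class.
    public static class Hello {
        static String lastArg;

        public static void main(String[] args) {
            lastArg = args[0];
        }
    }

    public static void main(String[] args) throws Exception {
        // In the article's design this would be a ComponentClassLoader;
        // here the application class loader stands in for it.
        launch(BootstrapMain.class.getClassLoader(), Hello.class.getName(), new String[] {"demo"});
        System.out.println(Hello.lastArg); // prints demo
    }
}
```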

Instead, we will override the java.system.class.loader system property so that our class loader is initialized as the system class loader. To do this, we'll create a third class loader, WrapperClassLoader, to serve as our replacement for the system class loader. Its parent will be the bootstrap class loader, which contains the Java runtime library (rt.jar) as well as our classloader.jar. When initialized, it will read the java.class.path system property and create a ComponentClassLoader for each JAR specified:

public static List initClassLoaders (MasterClassLoader master)
  throws MalformedURLException, IOException
{
  List loaders = new ArrayList();

  String classpath =
                System.getProperty("java.class.path");

  StringTokenizer tok = new StringTokenizer(classpath,
                                  File.pathSeparator);

  while (tok.hasMoreTokens()) {
    File file = new File(tok.nextToken());
    loaders.add(master.getComponentClassLoader(file));
  }

  return loaders;
}
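The JDK's documentation for ClassLoader.getSystemClassLoader states that a class named by java.system.class.loader must define a public constructor taking a single ClassLoader argument, through which the default system class loader is passed as the delegation parent. The hypothetical WrapperLoaderSketch below shows only that installation hook. It passes null to super so that the bootstrap loader becomes its parent, as described above, and, as a simplification, it searches one flat URLClassLoader built from java.class.path instead of the article's per-component loaders.

```java
import java.io.File;
import java.net.MalformedURLException;
import java.net.URL;
import java.net.URLClassLoader;
import java.util.ArrayList;
import java.util.List;

// Hypothetical replacement system class loader (installation hook only).
public class WrapperLoaderSketch extends ClassLoader {
    private final URLClassLoader delegate;

    // Required shape: public, one ClassLoader parameter. The JDK passes in its
    // default system class loader; this sketch ignores it so that the
    // bootstrap loader (null) becomes the parent.
    public WrapperLoaderSketch(ClassLoader defaultSystemLoader) throws MalformedURLException {
        super(null);
        List<URL> urls = new ArrayList<>();
        for (String entry : System.getProperty("java.class.path").split(File.pathSeparator)) {
            if (!entry.isEmpty()) {
                urls.add(new File(entry).toURI().toURL());
            }
        }
        // Simplification: one flat loader, where the article creates a
        // ComponentClassLoader per JAR.
        this.delegate = new URLClassLoader(urls.toArray(new URL[0]), null);
    }

    // Called only after the bootstrap parent has failed to find the class.
    @Override
    protected Class<?> findClass(String name) throws ClassNotFoundException {
        return delegate.loadClass(name);
    }

    @Override
    protected URL findResource(String name) {
        return delegate.findResource(name);
    }
}
```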

We can now run our meta-search engine like this:

$ java -Xbootclasspath/a:classloader.jar \
    -Djava.system.class.loader=com.onjava.classloader.WrapperClassLoader \
    -jar metasearch.jar
SOAP v1: remotely invoking searchAmazon
SOAP v2: remotely invoking searchGoogle (with newFlag = true)

Conclusion

In this final version, we actually went a few steps beyond the original requirements. Instead of embedding the version number for the SOAP component in a static field, we now read it from a properties file. This means that our class loaders must also support resource loading, which requires logic very similar to that for class loading. We also changed the API in soap-v2.jar a bit, from

 public Object invokeMethod (String name, Object[] args)

to:

 public Object invokeMethod (String name, Object[] args,
                              boolean newFlag)

It may seem strange, but this means that if we put the source code for what we just ran into a single directory, we couldn't compile it together! If we tried to build both google and amazon with the same version of soap.jar, the method signatures of one would not match. If we tried to build with both versions of soap.jar, we would get duplicate class errors. However, we can compile google.jar and amazon.jar separately--without any thought to whether they are using compatible versions of soap.jar--and then we can run them in separate class loaders within the same process.
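Resource lookups, such as reading the SOAP version from a properties file, need the same component-first policy as class lookups. In the hypothetical sketch below, ComponentResources overrides getResource to search its own location first and only then a fallback loader, which plays the master's role. URLClassLoader.getResourceAsStream goes through getResource, so loading the properties file picks up the override. Temporary directories stand in for JARs.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Properties;

// Hypothetical component loader: own resources first, then a fallback loader.
public class ComponentResources extends URLClassLoader {
    private final ClassLoader fallback; // stands in for the master loader

    public ComponentResources(URL location, ClassLoader fallback) {
        super(new URL[] { location }, null);
        this.fallback = fallback;
    }

    // Mirrors the class-loading policy: our own JAR first, then the fallback.
    @Override
    public URL getResource(String name) {
        URL local = findResource(name);
        if (local != null) {
            return local;
        }
        return fallback == null ? null : fallback.getResource(name);
    }

    // Returns the version recorded in soap.properties, or null if none is visible.
    static String soapVersion(ClassLoader loader) throws IOException {
        Properties p = new Properties();
        try (InputStream in = loader.getResourceAsStream("soap.properties")) {
            if (in == null) {
                return null;
            }
            p.load(in);
        }
        return p.getProperty("version");
    }

    // Creates a directory containing soap.properties with the given version.
    static URL dirWithVersion(String version) throws IOException {
        Path dir = Files.createTempDirectory("soap-v" + version);
        Files.write(dir.resolve("soap.properties"),
                ("version=" + version + "\n").getBytes(StandardCharsets.UTF_8));
        return dir.toUri().toURL();
    }

    public static void main(String[] args) throws IOException {
        try (ComponentResources v1 = new ComponentResources(dirWithVersion("1"), null);
             ComponentResources v2 = new ComponentResources(dirWithVersion("2"), null)) {
            System.out.println(soapVersion(v1) + " " + soapVersion(v2)); // prints 1 2
        }
    }
}
```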

Think about it. If you paired this technique with a build tool such as Maven that manages component dependencies at build time, you might never run into missing dependencies or conflicting JARs again.


Don Schwarz is a Java developer for a large investment bank who specializes in metaprogramming and language integration.



Copyright © 2009 O'Reilly Media, Inc.