ONJava.com    
 Published on ONJava.com (http://www.onjava.com/)
 See this if you're having trouble printing code examples


Enhance Collection Performance with this Treasure Trove

by Dion Almaer
06/12/2002

Ever use the Collections API? How can you say no?! The Collections framework is probably one of the best API sets available to a Java programmer. Since many parts of our applications use a HashMap, ArrayList, or LinkedList at some point, enhancing the performance of these guys can do a lot for us.

Eric D. Friedman wrote a high performance set of collections called Trove. Trove allows you to plug in their versions of certain containers (HashMap, HashSet, LinkedList), and use them just like you did with the standard versions. There are also ways to utilize primitive collections to gain even more performance. Don't you love open source?

In this article, we will discuss:

Using Trove Classes

Trove is ridiculously simple to work with. You can simply download the code and set up your CLASSPATH to see the lib/trove.jar file.

To use the Trove classes, you need to do the following:

  1. Import the Trove classes:

    import gnu.trove.*;
  2. Change collection creation:

    // Map map = new HashMap(); // standard hashmap
    Map map = new THashMap();   // trove version of hashmap
  3. You can see the leading T in the Trove classes. This is there to prevent name collisions, and means you don't have to fully qualify all collections (e.g. java.util.HashMap vs. gnu.trove.HashMap).

How is Trove Faster?

Related Reading

Learning Java
By Patrick Niemeyer, Jonathan Knudsen

The THashMap class is a fair amount faster than java.util.HashMap. How is this so? The standard HashMap holds a Map.Entry object for each of its elements. Trove uses open addressing, and hence gets around this problem with chaining. You can still use Map.entrySet on a Trove Map, but it is a Bad Thing. This will cause Trove to create the Map.Entry objects that it tries to get away from, so if possible, don't use these operations. If you buy into Trove, you can go down a path that couples you tightly to the Trove collections. You can implement Trove specific APIs (forEachEntry, forEachKey, forEachValue, and transformValues) to potentially gain higher performance.

Hash tables have a size, and have to grow when the data set exceeds its current capacity. It has been shown that if the size is a prime number, the performance of the hash table increases. Trove makes sure that its capacity is always a prime number, which is something that the JDK doesn't do.

When I run a benchmark on my system, I get the following results for putting values into a hash table:

Comparing 100000 Map.put() operations
--------------------------------------------------------------
JDK   average (msec): 321
Trove average (msec): 93

Difference: Trove is 3.45 times faster!

How is Trove Smaller?

Another nice side effect of not containing the Map.Entry objects is that the resulting data structures take up a lot less space. This is particularly noticeable with the Set implementations, as they are not backed by Maps like the JDK.

Comparing size of Set implementation: 1,000 Integer's
--------------------------------------------------------------

JDK   size: 46216 bytes
Trove size: 28856 bytes

Trove's collection requires 62% of the memory needed by the JDK collection.

Using a Factory to Allow You to Switch Between Trove and JDK Collections

So, we can add Trove collections to our code fairly easily, but wouldn't it be nice to be able to "flick a switch" to have the program use Trove or JDK collections? Let's create a CollectionFactory to allow this to happen.

The trigger that lets the factory know what type of collection to create is a Java System property. If the user sets the property usetrove to true, the Trove classes will be used. If the property isn't set, or doesn't have the value of true, then the JDK classes will be created instead. The user configures this parameter in two ways:

How do we code a factory that gives us this functionality? Let's walk through it.

Collection Factory Code

The core of the code is a static {} area that looks for the System property (from the properties file or passed in). This check happens only once (hence, it is static), and sets the boolean value useTrove if usetrove is set to true:

public class CollectionFactory {
	static boolean useTrove = false;

	/**
	 *  Try to find the usetrove property on command line, or  
         *  in a trove.properties in the CLASSPATH
	 *  If usetrove is set to true, then set the useTrove 
         *  boolean, else leave it as false
	 */
	static {
	  try {
	    System.getProperties().load( 
      Thread.currentThread().
      getContextClassLoader().
      getResourceAsStream("trove.properties") );

	    useTrove = "true".equalsIgnoreCase( 
         ((String) System.getProperty("usetrove") ).trim() ) 
         ? true : false;
	  } catch (Exception e) {
	    // do nothing.  keep the default: false
	  }
	}

Now, for the base types, we simple check the useTrove boolean, and return the correct class:


        /**
	 *  Return a hashmap based on the properties
	 */
	public static Map getHashMap() {
		if ( useTrove ) return new THashMap();
		else            return new HashMap();
	}

	/**
	 *  Return a hashset based on the properties
	 */
	public static Set getHashSet() {
		if ( useTrove ) return new THashSet();
		else            return new HashSet();
	}

	/**
	 *  Return a linkedlist based on the properties
	 */
	public static List getLinkedList() {
		if ( useTrove ) return new TLinkedList();
		else            return new LinkedList();
	}

And there we have it, a clean way to get access to either a Trove or JDK class. To use this, you would simply do something like:

 Map map = CollectionFactory.getHashMap();
 map.put("name", "dion");
 System.out.println("name = " + map.get("name"));
 System.out.println("Class is: " + map.getClass());

If I put this in a program, I could run it in two ways to test that it is working OK (assuming no trove.properties file exists):

dion@frag [~]> java CollectionFactory
name = dion
Class is: class java.util.HashMap

--------------------------------------------------------------

dion@frag [~]> java -Dusetrove=true CollectionFactory
name = dion
Class is: class gnu.trove.THashMap

Troves Primitive Collections

So far, we are living in a world that is compatible between Trove and the JDK classes. If performance really becomes an issue (based on profiling the code!) we can jump into Trove's world and enhance the collections even more.

When we store primitives in our collections what do we end up doing? We have to wrap them in their Object wrappers (int -> Integer, float -> Float, etc.). What if we know that either the key or value (or both) should always be integers? If this is the case, we can use a special case collection from Trove.

Imagine if we wanted a map containing objects as keys and integers as values. We would use the following code:

TObjectIntHashMap intmap = new TObjectIntHashMap();
intmap.put("NumInStock", 2);
int numInStock = intmap.get("NumInStock");

Let's look at the performance, comparing a JDK map to the ObjectInt map that we see above:

Comparing TObjectIntHashMap to HashMap:
--------------------------------------------------------------
JDK   average (msec): 502
Trove average (msec): 183

Difference: Trove is 2.74 times faster!

If you compare a HashMap that takes two primitives (e.g., key = int, value = int) then you see a really large performance improvement:

Comparing TIntIntHashMap to HashMap:
--------------------------------------------------------------
JDK   average (msec): 502
Trove average (msec): 67

Difference: Trove is 7.5 times faster!

Another side effect of using these primitive collections is that it enforces rules on that collection. HashMaps work with generic java.lang.Object, so people can put in whatever they want. The TintIntHashMap is going to give you a compiler warning if you try to put something naughty in it.

Note: In future versions of Java we will see "Generics" (kind of like C++ templates), where we will be able to create collections that are tied to a given type, and we won't have to do the nasty casting of objects when we take them out, and we will have checking of values that we put in.

Conclusion

If we need to increase performance with our collections, GNU Trove offers a simple solution. We have seen how we can simply use Trove's classes without even really know they are being used. We have also seen how we can use Trove's primitive collections to get another performance boost if really needed. Check it out!

Dion Almaer is a Principal Technologist for The Middleware Company, and Chief Architect of TheServerSide.Com J2EE Community.


Return to ONJava.com.

Copyright © 2009 O'Reilly Media, Inc.