 Published on ONJava.com (http://www.onjava.com/)


Analyze Your Classes

by Vikram Goyal
10/22/2003

Most of us never need to go beyond the basics of coding and compiling our classes. The Java Virtual Machine (JVM) is a highly efficient engine that executes our classes, and for the most part we are happy with the way it runs. However, to extend and enhance the JVM, for example to improve runtime performance, we need to take a deeper look inside this engine and at the structure of the class files that it loads and executes. The Byte Code Engineering Library (BCEL) from the Apache Jakarta stable helps developers analyze and manipulate the structure of class files.

This article gives an introduction to this API. I will start with an introduction to the Java class file format, which is important to understand in order to manipulate it. I will follow it up with the basics of BCEL, the BCEL Application Programming Interface (API), and some examples of how to use it. Finally, I will round the article off with pointers to where you can get more information and help.

Java Class File Format

When the Java compiler compiles your source code, it creates machine- and operating-system-independent byte code that gets stored in a class file. This file is binary and contains the instruction set and data of your class for the JVM to execute. This file defines your class to the JVM according to the Java class file format. Basically, this means that all class files have a predefined structure.

Each class file contains the definition of a single class or interface. This file, as I said earlier, consists of binary data represented as a stream of 8-bit bytes. This means that if you have a data type that is 16-bit, 32-bit, or 64-bit, it will be read in chunks of 2, 4, or 8 consecutive 8-bit bytes, respectively.
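As a small illustration of reading multi-byte items: the class file format stores them high byte first (big-endian). The sketch below assembles 16-bit and 32-bit values by hand from a byte array. The class and method names are made up for this example and are not part of any API.

```java
public class BigEndian {
    // Combine 2 consecutive bytes (high byte first) into an unsigned 16-bit value.
    public static int u2(byte[] data, int offset) {
        return ((data[offset] & 0xFF) << 8) | (data[offset + 1] & 0xFF);
    }

    // Combine 4 consecutive bytes into an unsigned 32-bit value, held in a long.
    public static long u4(byte[] data, int offset) {
        return ((long) u2(data, offset) << 16) | u2(data, offset + 2);
    }

    public static void main(String[] args) {
        // The first eight bytes of a class file: magic, minor version, major version.
        byte[] header = {(byte) 0xCA, (byte) 0xFE, (byte) 0xBA, (byte) 0xBE,
                         0x00, 0x03, 0x00, 0x2D};
        System.out.println(Long.toHexString(u4(header, 0))); // prints "cafebabe"
        System.out.println(u2(header, 4));                   // prints 3
        System.out.println(u2(header, 6));                   // prints 45
    }
}
```

In practice, java.io.DataInputStream does this assembly for you through readUnsignedShort() and readInt().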

Each class file, to be a valid Java class file, must contain the following elements that completely describe your class and form the right Java class file structure. The list below enumerates these elements and their valid data types. Following this list is a description of each of these elements.

  1. Magic Number: Unsigned 4 bytes.
  2. Minor Version Number: Unsigned 2 bytes.
  3. Major Version Number: Unsigned 2 bytes.
  4. Constant Pool Count: Unsigned 2 bytes.
  5. Constant Pool: A table of structures.
  6. Access Flags: Unsigned 2 bytes.
  7. This Class: Unsigned 2 bytes.
  8. Super Class: Unsigned 2 bytes.
  9. Interfaces Count: Unsigned 2 bytes.
  10. Interfaces: An array of Unsigned 2 bytes.
  11. Fields Count: Unsigned 2 bytes.
  12. Fields: An array of type field_info, which in itself is a structure, described later.
  13. Methods Count: Unsigned 2 bytes.
  14. Methods: An array of type method_info, which in itself is a structure, described later.
  15. Attributes Count: Unsigned 2 bytes.
  16. Attributes: An array of type attribute_info, which in itself is a structure, described later.

These elements must be in the order specified and all of these elements must be present.

1. Magic Number

A magic number is an identifier that marks each class file as a Java class file. Magic numbers are not unique to Java; other file formats, like GIF or JPEG, have their own magic numbers that identify files of that type. So what is a Java class file's magic number? Well, expressed in hex, it spells CAFE BABE! If a Java class file does not start with this number, the JVM refuses to load it and reports a class format error complaining about a bad magic number.

2. Minor and Major Version Numbers

Together with the magic number, you can think of the minor and major version numbers as the header of the Java class file. The version numbers are not just informational, though. A JVM can run a class file only if the file's version falls within the range that the JVM supports. This range is specified as:

MinMajor.0 <= Version <= MaxMajor.MaxMinor

For example, a JVM from JDK 1.2 accepts class file versions anywhere between 45.0 and 46.0.
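The header described so far can be read with nothing more than java.io.DataInputStream. The following is a minimal sketch, not part of BCEL; the class name ClassHeader and the method readVersion are invented for this illustration.

```java
import java.io.DataInputStream;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;

public class ClassHeader {
    public static final int MAGIC = 0xCAFEBABE;

    // Returns {minor, major}; fails if the stream does not start with the magic number.
    public static int[] readVersion(InputStream raw) throws IOException {
        DataInputStream in = new DataInputStream(raw);
        int magic = in.readInt();                 // 4 bytes
        if (magic != MAGIC) {
            throw new IOException("Not a class file, magic = 0x"
                + Integer.toHexString(magic));
        }
        int minor = in.readUnsignedShort();       // 2 bytes, minor comes first
        int major = in.readUnsignedShort();       // 2 bytes
        return new int[] {minor, major};
    }

    public static void main(String[] args) throws IOException {
        // args[0] is the path to a .class file
        try (InputStream in = new FileInputStream(args[0])) {
            int[] version = readVersion(in);
            System.out.println("class file version " + version[1] + "." + version[0]);
        }
    }
}
```

Note that the minor version is stored before the major version, even though the version is written as major.minor.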

3. Constant Pool Count and Constant Pool

The Constant Pool holds the various String Constants, class and interface names, field names, and other Constants that are used within the Java class being represented. The Constant Pool Count equals the number of entries in the Constant Pool plus one. Each Constant is represented using a specialized data structure relevant to the type of the Constant. However, all of the Constants have a tag that identifies the structure and type of the Constant. This tag is an unsigned byte. Thus, each entry in the Constant Pool begins with this unsigned byte, which allows the rest of the entry to be read accordingly. For example, if the first unsigned byte of an entry has a value of 8, the entry is of type CONSTANT_String_info. As you can imagine, the Constant Pool grows to be quite large, because it contains not only the various String Constants, but also the symbolic references to class, interface, method, and field names, which are resolved at runtime using String Constants and hence end up in the Constant Pool.
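To make the tag-driven layout concrete, here is a rough sketch that walks a constant pool and collects the tag of every entry. It is plain Java, not BCEL, and the class name ConstantPoolTags is made up. It handles only the tags 1 through 12 that the class file format of this era defines; newer class files can contain additional tags, which this sketch rejects.

```java
import java.io.DataInputStream;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;

public class ConstantPoolTags {
    // Reads the Constant Pool Count and then every entry, returning each entry's tag.
    // The stream must be positioned just after the two version numbers.
    public static List<Integer> readTags(InputStream raw) throws IOException {
        DataInputStream in = new DataInputStream(raw);
        int count = in.readUnsignedShort();
        List<Integer> tags = new ArrayList<>();
        for (int i = 1; i < count; i++) {          // valid indices are 1 .. count - 1
            int tag = in.readUnsignedByte();
            tags.add(tag);
            switch (tag) {
                case 1:  skip(in, in.readUnsignedShort()); break; // Utf8: length + bytes
                case 3: case 4: skip(in, 4); break;               // Integer, Float
                case 5: case 6: skip(in, 8); i++; break;          // Long, Double: two slots
                case 7: case 8: skip(in, 2); break;               // Class, String
                case 9: case 10: case 11: case 12:
                    skip(in, 4); break;           // Fieldref, Methodref, InterfaceMethodref, NameAndType
                default:
                    throw new IOException("Tag " + tag + " is not handled in this sketch");
            }
        }
        return tags;
    }

    private static void skip(DataInputStream in, int n) throws IOException {
        in.readFully(new byte[n]);
    }

    public static void main(String[] args) throws IOException {
        try (InputStream in = new FileInputStream(args[0])) {
            skip(new DataInputStream(in), 8);      // magic + minor + major
            System.out.println(readTags(in));
        }
    }
}
```

The extra increment for long and double entries reflects the rule that each of them occupies two slots in the pool.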

4. Access Flags

The access flags are bit mask flags that define the access rights of this class file. These flags inform the JVM of the visibility and other properties of this class or interface. They include ACC_PUBLIC (0x0001), ACC_FINAL (0x0010), ACC_SUPER (0x0020), ACC_INTERFACE (0x0200), and ACC_ABSTRACT (0x0400).
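Decoding such a bit mask is a matter of testing each bit in turn. The sketch below uses the class-level flag values from the JVM specification; the class name ClassAccessFlags and its decode method are made up for this illustration.

```java
import java.util.ArrayList;
import java.util.List;

public class ClassAccessFlags {
    // Class-level access flags as defined in the JVM specification.
    public static final int ACC_PUBLIC    = 0x0001;
    public static final int ACC_FINAL     = 0x0010;
    public static final int ACC_SUPER     = 0x0020;
    public static final int ACC_INTERFACE = 0x0200;
    public static final int ACC_ABSTRACT  = 0x0400;

    // Returns the names of all flags that are set in the given 2-byte mask.
    public static List<String> decode(int flags) {
        List<String> names = new ArrayList<>();
        if ((flags & ACC_PUBLIC) != 0)    names.add("ACC_PUBLIC");
        if ((flags & ACC_FINAL) != 0)     names.add("ACC_FINAL");
        if ((flags & ACC_SUPER) != 0)     names.add("ACC_SUPER");
        if ((flags & ACC_INTERFACE) != 0) names.add("ACC_INTERFACE");
        if ((flags & ACC_ABSTRACT) != 0)  names.add("ACC_ABSTRACT");
        return names;
    }

    public static void main(String[] args) {
        // 0x0021 is a typical value for an ordinary public class.
        System.out.println(decode(0x0021)); // prints [ACC_PUBLIC, ACC_SUPER]
    }
}
```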

5. This Class and Super Class

Each of these is a valid index into the Constant Pool table or, in the case of Super Class, possibly a value of 0. The index points to a structure within the Constant Pool that is of the type CONSTANT_Class_info (an unsigned 1-byte tag and an unsigned 2-byte index within the Constant Pool for the name) and that represents the name of this class or the super class, respectively. If the Super Class index is 0, then this class file must represent the class java.lang.Object, the only class without a super class.

6. Interfaces Count and Interfaces

The Interfaces array contains indices into the Constant Pool, where each referenced entry is of the type CONSTANT_Class_info. The Interfaces Count specifies the number of implemented interfaces.

7. Fields Count and Fields

The Fields array contains items of the type field_info, each of which completely describes a field. The Fields Count specifies the number of items in this array. The fields represented are both class and instance variables, but not fields inherited from a superclass. The field_info structure consists of access flags, a name index, a descriptor index, an attributes count, and an array of attributes.

8. Methods Count and Methods

Similar to Fields, the Methods array contains items of type method_info, each of which completely describes a method. The Methods Count specifies the number of items in the Methods array. As with Fields, no methods inherited from a superclass or superinterfaces are included. If the method is native or abstract, the JVM instructions are not supplied.

9. Attributes Count and Attributes

As with Fields and Methods, this is a counted array, here of class-level attributes that carry extra information about the class. The attributes discussed here for classes are the SourceFile attribute and the Deprecated attribute.

Since the symbolic references to classes, fields, and methods are coded with String constants, the Constant Pool, which contains these String constants in the Java class file, is the biggest portion of a class file. This makes it an easy target for APIs like BCEL to analyze and manipulate.

BCEL Basics

BCEL was formerly known as JavaClass. It was incorporated as an Apache Jakarta project in October 2001. The original JavaClass was written by Markus Dahm. The main site is hosted at jakarta.apache.org, from which you can access binaries and source code.

At the heart of BCEL is the JavaClass. A JavaClass represents a Java class file as described above, with all of its elements. There is a one-to-one mapping between the elements of a Java class file as described in the JVM specification and the JavaClass. BCEL thus allows you to read a normal class file in your program and treat it like any other object. The properties of this object are the Java class file elements. Furthermore, a JavaClass that has been created on the fly within your program represents an actual class file. If you serialize this object to a file, you will be able to run the resulting class file in a JVM, just as you would a class compiled from source.

BCEL allows you, at a micro level, to model the instructions contained within the Java class file. This way, you can navigate and manipulate this instruction set programmatically, allowing you to introduce enhancements and improvements in the runtime of your class. However, this is not the only way to introduce such enhancements. Better compilers and source code optimization can also do the same trick. Furthermore, it is easier to manipulate source code than it is to manipulate raw bytes. Having said all this, direct byte code manipulation has the advantage of being faster than any enhancement that you can do via compiler or source-code manipulation. This comes at the price of extra complexity. BCEL alleviates this complexity to a certain degree by allowing you to manipulate class files via source code.

Another feature of BCEL is what is called load-time reflection. As opposed to run-time reflection, which is implemented by the Reflection API built into the Java platform, load-time reflection refers to the ability to modify the byte code instruction set at the time it is loaded. This involves writing a custom classloader that, instead of passing the byte code directly to the JVM, first passes it through your own system written using the BCEL API. Your system can then access the meta-level objects created at load time and manipulate them. This process can even create these objects without source code present. The result then continues along the normal path: it is passed to the byte code verifier and then executed in the JVM.
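The custom classloader idea can be sketched without any BCEL code at all. The loader below reads the raw bytes of a class, hands them to a transformation hook, and defines the class from whatever the hook returns. The class name TransformingLoader and the UnaryOperator hook are assumptions made for this illustration; in a BCEL-based system, the hook is where the bytes would be parsed, modified, and turned back into a byte array.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.function.UnaryOperator;

public class TransformingLoader extends ClassLoader {
    // Hypothetical hook: receives the original class bytes, returns the bytes to define.
    private final UnaryOperator<byte[]> transform;

    public TransformingLoader(ClassLoader parent, UnaryOperator<byte[]> transform) {
        super(parent);
        this.transform = transform;
    }

    @Override
    protected Class<?> loadClass(String name, boolean resolve)
            throws ClassNotFoundException {
        // Core classes cannot be redefined, so leave them to the parent loader.
        if (name.startsWith("java.") || name.startsWith("javax.")) {
            return super.loadClass(name, resolve);
        }
        synchronized (getClassLoadingLock(name)) {
            Class<?> c = findLoadedClass(name);
            if (c == null) {
                byte[] raw = readClassBytes(name);
                if (raw == null) {
                    return super.loadClass(name, resolve); // normal delegation
                }
                byte[] changed = transform.apply(raw);      // load-time modification
                c = defineClass(name, changed, 0, changed.length);
            }
            if (resolve) {
                resolveClass(c);
            }
            return c;
        }
    }

    // Locates the .class resource for the given class name; returns null if absent.
    private byte[] readClassBytes(String name) {
        String path = name.replace('.', '/') + ".class";
        try (InputStream in = getResourceAsStream(path)) {
            if (in == null) {
                return null;
            }
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buffer = new byte[4096];
            for (int n = in.read(buffer); n != -1; n = in.read(buffer)) {
                out.write(buffer, 0, n);
            }
            return out.toByteArray();
        } catch (IOException e) {
            return null;
        }
    }
}
```

A program would create the loader with something like new TransformingLoader(parent, bytes -> bytes) and then call loadClass with the name of the class to intercept. The identity hook shown here changes nothing; it only marks the spot where modification would happen.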

BCEL API

The BCEL API is roughly divided into two parts:

  1. Static API
    This is the part of the API that deals with mapping the data structures and binary components described in the JVM specification. You would use this part if you were analyzing existing classes without access to the source code.

    The main class in this part is called JavaClass, which represents a Java class and includes all the data structures, constant pool, fields, methods, and instructions contained in a class file. It supports the Visitor design pattern, which allows developers to write their own visitor code to traverse and analyze the contents of a class file. The JavaClass itself derives from the AccessFlags class, which is extended by all classes that have access flags. This applies not only to JavaClass, but also to the FieldOrMethod class, the super class for Field and Method.

    The Constant Pool is represented by the ConstantPool class. It contains an array of type Constant that represents the different constant types in the constant pool of a parsed class file. For example, it may contain a ConstantInteger, which represents an int constant. You can access the constants using an index and by calling the method getConstant(int index). Note that this internal array may contain null references. This happens for long and double constants, which, per the JVM specification, occupy two entries in the constant pool, so the slot following each of them is unused.

    Another interesting class in the API is the Repository class. This class is used to read existing class files into the system and for resolving class interdependencies.

  2. Generic API
    This part of the API deals with creating or transforming class files dynamically. It allows you to create a class file from scratch or read an existing class file and dynamically modify it.

    The central class in this API is the ClassGen class. This class allows you to create a new class file and to add methods, attributes, and fields to it dynamically. You can load an existing class file by passing in a JavaClass that represents a file loaded into memory as described in the Static API. This class also contains methods to search the class for particular methods and fields, to replace existing methods and fields, and to remove existing methods and fields. You can also directly use the MethodGen and FieldGen classes for generating methods and fields, respectively.

    Corresponding to the ClassGen class is the ConstantPoolGen class. This class allows you to add different types of constants and retrieve the ConstantPool once you are done adding the constants by calling getFinalConstantPool(). Constants are added using methods like addString(String str), addInteger(int n), etc. These methods return the index at which the constant was added. If you are not done adding constants to the pool and yet want to access the ConstantPool in the state that it is in, you can call getConstantPool(). This class also allows you to look up existing entries in the pool with corresponding lookupString(String str) and lookupInteger(int n) methods.
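The ConstantPoolGen methods just described can be tried out in isolation. The following is a small sketch that assumes bcel.jar is on the CLASSPATH; the class name PoolDemo is made up, and the values added to the pool are arbitrary.

```java
import org.apache.bcel.classfile.ConstantPool;
import org.apache.bcel.generic.ConstantPoolGen;

public class PoolDemo {
    public static void main(String[] args) {
        ConstantPoolGen cpg = new ConstantPoolGen();

        // Each add method returns the index at which the constant lives.
        int hello = cpg.addString("Hello");
        int answer = cpg.addInteger(42);

        // Adding a constant that is already present returns the existing index.
        int again = cpg.addString("Hello");
        System.out.println(hello == again);                    // prints "true"

        // Lookups return the index of an existing entry, or -1 if there is none.
        System.out.println(cpg.lookupInteger(42) == answer);   // prints "true"
        System.out.println(cpg.lookupString("missing"));       // prints "-1"

        // A view of the pool while still adding, and the final pool when done.
        ConstantPool current = cpg.getConstantPool();
        ConstantPool finished = cpg.getFinalConstantPool();
        System.out.println(current.getLength() + " / " + finished.getLength());
    }
}
```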

The BCEL API contains a stack of utility classes that allow you to get started with the API without worrying too much about the semantics or getting involved in the complexities. These include Class2HTML, a utility to transform class files into HTML, JavaWrapper, a utility that acts as a wrapper to modify and generate classes as they are requested using its own class loader, and BCELifier, which takes a JavaClass object and generates BCEL Java source code to build that class.

The following examples start with the utility classes and follow up with simple examples of using the static and dynamic API.

Source Code

Download the source code for the examples.

Examples

Important: Make sure that before you run these examples, you have set the CLASSPATH to include bcel.jar.

BCELifier

The easiest way to start with BCEL is to use the BCELifier, because it generates the BCEL source code used to generate the class file itself and is a very handy way of learning how BCEL works. It generates the code that you would have to write yourself if you were to write the BCEL code for generating the class file dynamically.

The following is a simple HelloWorld source file that I will use for this example.

public class HelloWorld{
  public static void main(String args[]){
   System.err.println("Hello World through BCEL!");
  }
}

Compile this class and produce the HelloWorld.class file.

Next, run the following command (all on one line) in the directory where you compiled HelloWorld.class:

java org.apache.bcel.util.BCELifier HelloWorld.class > HelloWorldCreator.java

Because the BCELifier class writes its result to standard out, I have redirected the output to the resulting source file. Note that BCELifier names the generated class "ClassFileName" + "Creator". Hence, the BCELified HelloWorld.class must be saved as HelloWorldCreator.java.

Compile and run HelloWorldCreator.java. Running it regenerates HelloWorld.class from the BCEL code. Run that generated class and you will see the output on the console: "Hello World through BCEL!".

Open HelloWorldCreator.java and examine it. You will see that creating such a simple class is quite a complex process, even though BCEL abstracts most of the functionality of the Java class file.

Class2HTML

This is the utility that traverses a class file and creates five HTML files. These HTML files completely describe the class file by dividing it into constant pool, attributes, byte code, and methods. The fifth file combines all of these into one easy-to-use HTML file.

Run Class2HTML on HelloWorld.class as shown below:

java org.apache.bcel.util.Class2HTML HelloWorld.class

This will create five files in the current directory. Open HelloWorld.html in your browser to see the class contents as illustrated in Figure 1:

Figure 1. HelloWorld.html, as generated by Class2HTML

In the figure, I have labeled the frames with the corresponding class file elements. The Class2HTML utility is thus a quick and easy way to map out a class file.

Static API

Using the static API, let's implement a simple class viewer. This is a simple example that exercises the JavaClass class and is similar to how Class2HTML operates.

import org.apache.bcel.Repository;
import org.apache.bcel.classfile.Code;
import org.apache.bcel.classfile.JavaClass;
import org.apache.bcel.classfile.Method;

public class ClassViewer {
    private final JavaClass clazz;

    // Newer BCEL releases declare ClassNotFoundException on
    // Repository.lookupClass, so it is declared here as well.
    public ClassViewer(String className) throws ClassNotFoundException {
        this.clazz = Repository.lookupClass(className);
    }

    public static void main(String[] args) throws ClassNotFoundException {
        if (args.length != 1)
            throw new IllegalArgumentException(
                "One and only one class at a time!");
        ClassViewer viewer = new ClassViewer(args[0]);
        viewer.start();
    }

    private void start() {
        if (this.clazz == null)
            throw new RuntimeException("Class file is null!");
        // first print the structure of the class file
        System.err.println(clazz);
        // next print the methods
        Method[] methods = clazz.getMethods();
        for (int i = 0; i < methods.length; i++) {
            System.err.println(methods[i]);
            // now print the actual byte code for each method;
            // abstract and native methods have none
            Code code = methods[i].getCode();
            if (code != null)
                System.err.println(code);
        }
    }
}

The first thing that this example does is to look up the class that is to be mapped by requesting that the Repository load it. This is done by the Repository.lookupClass(String classname) method. The repository loads this class as a JavaClass that contains all of the information in the Java Class file format. From then on, it is a simple matter of printing the class file structure using the toString conversion on the JavaClass file and the methods and code.
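The Visitor support mentioned in the API overview offers another way to traverse a JavaClass. The sketch below assumes the EmptyVisitor and DescendingVisitor classes from the org.apache.bcel.classfile package and overrides only the callbacks for fields and methods. The class name MemberLister is made up for this example.

```java
import java.util.ArrayList;
import java.util.List;

import org.apache.bcel.Repository;
import org.apache.bcel.classfile.DescendingVisitor;
import org.apache.bcel.classfile.EmptyVisitor;
import org.apache.bcel.classfile.Field;
import org.apache.bcel.classfile.JavaClass;
import org.apache.bcel.classfile.Method;

public class MemberLister extends EmptyVisitor {
    private final List<String> names = new ArrayList<>();

    // Called once for every field declared in the class.
    @Override
    public void visitField(Field field) {
        names.add("field " + field.getName());
    }

    // Called once for every method declared in the class.
    @Override
    public void visitMethod(Method method) {
        names.add("method " + method.getName() + method.getSignature());
    }

    // Walks the whole class structure and returns what the callbacks collected.
    public static List<String> list(JavaClass clazz) {
        MemberLister lister = new MemberLister();
        new DescendingVisitor(clazz, lister).visit();
        return lister.names;
    }

    public static void main(String[] args) throws ClassNotFoundException {
        // args[0] is a class name that the Repository can find, e.g. HelloWorld
        for (String line : list(Repository.lookupClass(args[0]))) {
            System.out.println(line);
        }
    }
}
```

The visitor approach keeps the traversal logic inside BCEL, so the analysis code only has to say which elements it cares about.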

Dynamic API

We have already seen the code required to create a dynamic class when we visited the BCELifier example. HelloWorldCreator.java creates a dynamic class on the fly. Let us see how we can modify this class dynamically by adding a new method.

import org.apache.bcel.*;
import org.apache.bcel.generic.*;
import org.apache.bcel.classfile.*;

public class ClassModifier implements Constants {
    private JavaClass clazz;
    private ClassGen classGen;
    private ConstantPoolGen cp;

    // Newer BCEL releases declare ClassNotFoundException on
    // Repository.lookupClass, so it is declared here as well.
    public ClassModifier(String clazz) throws ClassNotFoundException {
        this.clazz = Repository.lookupClass(clazz);
        this.classGen = new ClassGen(this.clazz);
        this.cp = this.classGen.getConstantPool();
    }

    public static void main(String[] args) throws ClassNotFoundException {
        if (args.length != 1)
            throw new IllegalArgumentException(
                "One and only one class at a time!");
        ClassModifier modifier = new ClassModifier(args[0]);
        modifier.start();
    }

    private void start() {
        if (this.clazz != null) {
            // print the methods BEFORE adding the new one
            Method[] methods =
                classGen.getJavaClass().getMethods();
            System.err.println(
                "++++ Before adding new method ++++");
            for (int i = 0; i < methods.length; i++) {
                System.err.println(methods[i]);
            }
            // the body of the new method: a single return
            // instruction, so that the method is well formed
            InstructionList il = new InstructionList();
            il.append(new RETURN());
            classGen.addMethod(
                new MethodGen(ACC_PUBLIC | ACC_STATIC,
                              Type.VOID,
                              Type.NO_ARGS,
                              new String[] { },
                              "newMethod",
                              clazz.getClassName(),
                              il,
                              cp).getMethod());
            // print the methods AFTER adding the new one
            methods = classGen.getJavaClass().getMethods();
            System.err.println(
                "\n++++ After adding new method ++++");
            for (int i = 0; i < methods.length; i++) {
                System.err.println(methods[i]);
            }
        } else
            throw new RuntimeException("Class file is null!");
    }
}

This class loads a class file and represents it in the memory using JavaClass. This part is similar to what I did in the static ClassViewer example. Having created this representation in memory of the input class, the code above creates an instance of the ClassGen class, using this representation as the base:

this.classGen = new ClassGen(this.clazz);
this.cp = this.classGen.getConstantPool();

A new method is then added to this instance of classGen by using the MethodGen constructor. As you can see, this new method has the access flags of public and static and is called newMethod. The rest of the code simply prints a list of methods in the class before and after the method is added.

Run this code using HelloWorld.class as the input class file. You will see the following output:

++++ Before adding new method ++++
public void <init>()
public static void main(String[] arg0)
++++ After adding new method ++++
public void <init>()
public static void main(String[] arg0)
public static void newMethod()

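As you can see, newMethod is added to the in-memory representation of our class dynamically, without having to touch the source code. Note that the HelloWorld.class file on disk is unchanged, because the example never writes the modified class back out.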
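The example prints the modified method list but keeps the result in memory. As a rough sketch of the remaining step, the code below builds a tiny class from scratch, gives it the same kind of newMethod, and writes it out with JavaClass.dump(String). The class name ClassWriterSketch, the generated class name Generated, and the output file name are all assumptions for this illustration, and bcel.jar is again assumed to be on the CLASSPATH.

```java
import java.io.IOException;

import org.apache.bcel.Constants;
import org.apache.bcel.classfile.JavaClass;
import org.apache.bcel.generic.ClassGen;
import org.apache.bcel.generic.ConstantPoolGen;
import org.apache.bcel.generic.InstructionList;
import org.apache.bcel.generic.MethodGen;
import org.apache.bcel.generic.RETURN;
import org.apache.bcel.generic.Type;

public class ClassWriterSketch {
    // Builds a public class named "Generated" with one empty static method.
    public static JavaClass build() {
        ClassGen classGen = new ClassGen("Generated", "java.lang.Object",
            "<generated>", Constants.ACC_PUBLIC | Constants.ACC_SUPER,
            new String[0]);
        ConstantPoolGen cp = classGen.getConstantPool();

        // The method body consists of a single return instruction.
        InstructionList il = new InstructionList();
        il.append(new RETURN());

        MethodGen method = new MethodGen(
            Constants.ACC_PUBLIC | Constants.ACC_STATIC,
            Type.VOID, Type.NO_ARGS, new String[0],
            "newMethod", "Generated", il, cp);
        method.setMaxStack();   // let BCEL compute the stack size
        method.setMaxLocals();  // and the number of local variables
        classGen.addMethod(method.getMethod());
        il.dispose();           // release the instruction handles

        return classGen.getJavaClass();
    }

    public static void main(String[] args) throws IOException {
        // Writes Generated.class into the current directory.
        build().dump("Generated.class");
    }
}
```

The same dump call could be applied to classGen.getJavaClass() in ClassModifier to save the modified HelloWorld class, ideally under a different file name so that the original is not overwritten.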

This has been a superficial treatment of the dynamic part of the API. The idea is to improve performance and add enhancements by being able to dynamically modify class files and not just add trivial methods.

Resources

  1. The Java Virtual Machine Specification is a must-read if you want to know more about the internal workings of the JVM and understand the class file structure.
  2. The BCEL home page contains a manual that gives a general introduction to the API.
  3. The mailing lists are a great place to ask experienced developers questions about the API.
  4. Finally, to really understand how BCEL operates, download the source code and start with a look at the utilities package. The utilities contain a good amount of code that exercises this API.

Vikram Goyal is the author of Pro Java ME MMAPI.



Copyright © 2009 O'Reilly Media, Inc.