O'Reilly Network    


 Published on The O'Reilly Network (http://www.oreillynet.com/)
 http://www.oreillynet.com/pub/wlg/8096

DBUnit Made Easy

by Bill Siggelkow
Oct. 13, 2005
URL: http://dbunit.sourceforge.net

If you develop applications that access relational data, you need to verify that your application code works with the data, as designed. When using Hibernate, for example, you should confirm that your mappings direct Hibernate to store and retrieve data correctly. I implement these kind of tests using JUnit. Some testing purists argue that testing against a database belies unit-testing, but no one can refute that testing against a live database is critical -- there is no better way to verify your mappings.

But testing against the database requires that the data be in a known initial state. You can load the database manually before you run your test, but that becomes monotonous and error-prone. DBUnit solves this problem. It can pre-load the database for each test based on an XML data set. The data set format is easy to understand; here's a small data set taken from three tables of a MySQL database. Element names match table names and the attribute names match columns.

<?xml version='1.0' encoding='UTF-8'?>
<dataset>
	<recipe id="2" name="Bing Cherry Salad"
		description="Jello-like salad with bing cherries" recipe_type_id="2"
		staff_xref_id="1" is_deleted="N" />
	<recipe id="3" name="Chocolate Chip Pound Cake" description="Yummy cake"
		recipe_type_id="2" staff_xref_id="1" is_deleted="Y" />
	<recipe_ext_xref id="1" person_id="4" xref_type="P" />
	<bon_app_ext.person id="2" first_name="Bill" last_name="Siggelkow" />
</dataset>
For a small database where you only need to load a few tables with a few rows, you can hand code the data set. But you wouldn't want to do that with a complex enterprise databases containing hundreds of tables with cryptic column names. It's easier if you let DBUnit extract the data for you.

Here's the typical process you follow when using DBUnit:

  1. Populate your database tables as needed for the test.
  2. Extract data from the those tables into an XML data set.
  3. In your unit test's setUp method, call DBUnit to load the database from the data set.
You can extract the data using a class like this:
import java.io.FileOutputStream;
import java.sql.Connection;
import java.sql.DriverManager;

import org.dbunit.database.DatabaseConnection;
import org.dbunit.database.IDatabaseConnection;
import org.dbunit.database.QueryDataSet;
import org.dbunit.dataset.IDataSet;
import org.dbunit.dataset.xml.FlatXmlDataSet;

public class ExtractTestDataSet {
    public static void main(String[] args) throws Exception {
        // database connection
        Class driverClass = Class.forName("com.mysql.jdbc.Driver");
        Connection jdbcConnection = DriverManager.getConnection(
                "jdbc:mysql://localhost/bon_app", "scott", "tiger");
        IDatabaseConnection connection = new DatabaseConnection(jdbcConnection);

        // partial database export
        QueryDataSet partialDataSet = new QueryDataSet(connection);
        partialDataSet.addTable("recipe", "SELECT * FROM recipe where is_deleted != 'Y'");
        partialDataSet.addTable("recipe_ext_xref");
        FlatXmlDataSet.write(partialDataSet,
                new FileOutputStream("partial-dataset.xml"));

        // full database export
        IDataSet fullDataSet = connection.createDataSet();
        FlatXmlDataSet.write(fullDataSet, new FileOutputStream("full-dataset.xml"));
    }
}
DBUnit can create a data set containing all the data in your database given only an open JDBC connection. In practice, you'll probably want to limit the dataset to the specific tables and data used by your unit test. In the ExtractTestDataSetexample above, I am actually creating two dataset files. The first one, partial-dataset.xml, represents only a portion of the database. DBUnit populates the dataset based on specified tables and queries. If you give just a table name, the dataset will contain all rows and columns in that table. Specify a table and query, and DBUnit will only extract data returned by the query. If you want the entire database in the data set, simply call the IDatabaseConnection.createDataSet() method.

In your unit test's setUp method(), you use the data set as input for a DBUnit DatabaseOperation. DBUnit supports various combinations of operations for deleting, inserting, and updating the database using the data set. The most common operations are CLEAN_INSERT and REFRESH. CLEAN_INSERT deletes all rows from tables in the data set, then inserts the data in the data set. This operation is the option to use if you have your own database instance and you don't need to worry about stepping on someone else's shoes. If your DBA is not so gracious, you're probably sharing a database. In this case, REFRESH is your friend. It only performs updates and inserts based on the data set.

In the following unit test snippet, I use CLEAN_INSERT to re-initialize the database before each test method.

    protected void setUp() throws Exception {
        setUpDatabase();
    }

    private void setUpDatabase() throws Exception {
        // initialize your database connection here
        Class driverClass = Class.forName("com.mysql.jdbc.Driver");
        Connection jdbcConnection = DriverManager.getConnection(
                "jdbc:mysql://localhost/bon_app", "scott", "tiger");
        IDatabaseConnection connection = new DatabaseConnection(jdbcConnection);

        // initialize your dataset here
        IDataSet dataSet = new FlatXmlDataSet(new File("full-dataset.xml"));

        try {
            DatabaseOperation.CLEAN_INSERT.execute(connection, dataSet);
        } finally {
            connection.close();
        }
    }

This approach worked great for me, but I was annoyed that I had to hand code and recompile the extraction class whenever I needed to change how DBUnit creates the dataset. Since I am using Spring, and I already have this connection configured in my Spring configuration file. If I could leverage Spring to provide a configurable class for extracting data it would save a lot of hand-coding. The result of scratching this itch is the following class:

package com.jadecove.bonappetit.test;

import java.io.FileOutputStream;
import java.sql.Connection;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

import javax.sql.DataSource;

import org.apache.log4j.Logger;
import org.dbunit.database.DatabaseConfig;
import org.dbunit.database.DatabaseConnection;
import org.dbunit.database.IDatabaseConnection;
import org.dbunit.database.QueryDataSet;
import org.dbunit.dataset.IDataSet;
import org.dbunit.dataset.xml.FlatXmlDataSet;

/**
 * Extracts a DBUnit flat XML dataset from a database. 
 * 
 * @author Bill Siggelkow
 */
public class DBUnitDataExtractor {

    private DataSource dataSource;
    private String dataSetName = "dbunit-dataset.xml";
    private List queryList;
    private List tableList;
    private Map dbUnitProperties;
    private Map dbUnitFeatures;
    private String schema;
    
    /**
     * A regular expression that is used to get the table name
     * from a SQL 'select' statement.
     * This  pattern matches a string that starts with any characters, 
     * followed by the case-insensitive word 'from', 
     * followed by a table name of the form 'foo' or 'schema.foo',
     * followed by any number of remaining characters.
     */
    private static final Pattern TABLE_MATCH_PATTERN = Pattern.compile(".*\\s+from\\s+(\\w+(\\.\\w+)?).*", 
            Pattern.CASE_INSENSITIVE);
    private static final Logger log = Logger.getLogger(DBUnitDataExtractor.class);

    /**
     * The data source of the database from which the data will be extracted. This property
     * is required.
     * 
     * @param ds
     */
    public void setDataSource(DataSource ds) {
        dataSource = ds;
    }
    
    /**
     * Set the schema.
     * @param schema
     */
    public void setSchema(String schema) {
        this.schema = schema;
    }

    /**
     * Name of the XML file that will be created. Defaults to <code>dbunit-dataset.xml</code>.
     * 
     * @param name file name.
     */
    public void setDataSetName(String name) {
        dataSetName = name;
    }
    
    /**
     * Performs the extraction. If no tables or queries are specified, data from entire database
     * will be extracted. Otherwise, a partial extraction will be performed.
     * @throws Exception
     */
    public void extract() throws Exception {
        Connection conn = null;
        try {
            conn = dataSource.getConnection();
            log.info("Beginning extraction from '"+conn.toString()+"'.");
            IDatabaseConnection connection = new DatabaseConnection(conn, schema);
            configConnection((DatabaseConnection) connection);
            if (tableList != null || queryList != null) {
                // partial database export
                QueryDataSet partialDataSet = new QueryDataSet(connection);
                addTables(partialDataSet);
                addQueries(partialDataSet);
                FlatXmlDataSet.write(partialDataSet,
                        new FileOutputStream(dataSetName));
            } else {
                // full database export
                IDataSet fullDataSet = connection.createDataSet();
                FlatXmlDataSet.write(fullDataSet, new FileOutputStream(dataSetName));
            }
        }
        finally {
            if (conn != null) conn.close();
        }
        log.info("Completed extraction to '"+dataSetName+"'.");
    }

    /**
     * List of table names to extract data from.
     * 
     * @param list of table names.
     */
    public void setTableList(List list) {
        tableList = list;
    }
    
    /**
     * List of SQL queries (i.e. 'select' statements) that will be used executed to retrieve
     * the data to be extracted. If the table being queried is also specified in the <code>tableList</code>
     * property, the query will be ignored and all rows will be extracted from that table.
     * 
     * @param list of SQL queries.
     */
    public void setQueryList(List list) {
        queryList = list;
    }

    public void setDbUnitFeatures(Map dbUnitFeatures) {
        this.dbUnitFeatures = dbUnitFeatures;
    }

    public void setDbUnitProperties(Map dbUnitProperties) {
        this.dbUnitProperties = dbUnitProperties;
    }
    
    private void configConnection(DatabaseConnection conn) {
        DatabaseConfig config = conn.getConfig();
        if (dbUnitProperties != null) {
            for (Iterator k=dbUnitProperties.entrySet().iterator(); k.hasNext(); ) {
                Map.Entry entry = (Map.Entry) k.next();
                String name = (String) entry.getKey();
                Object value = entry.getValue();
                config.setProperty(name, value);
            }
        }
        if (dbUnitFeatures != null) {
            for (Iterator k=dbUnitFeatures.entrySet().iterator(); k.hasNext(); ) {
                Map.Entry entry = (Map.Entry) k.next();
                String name = (String) entry.getKey();
                boolean value = Boolean.valueOf((String)entry.getValue()).booleanValue();
                config.setFeature(name, value);
            }
        }
    }

    private void addTables(QueryDataSet dataSet) {
        if (tableList == null) return; 
        for (Iterator k = tableList.iterator(); k.hasNext(); ) {
            String table = (String) k.next();
            dataSet.addTable(table);
        }
    }

    private void addQueries(QueryDataSet dataSet) {
        if (queryList == null) return; 
        for (Iterator k = queryList.iterator(); k.hasNext(); ) {
            String query = (String) k.next();
            Matcher m = TABLE_MATCH_PATTERN.matcher(query);
            if (!m.matches()) {
                log.warn("Unable to parse query. Ignoring '" + query +"'.");
            }
            else {
                String table = m.group(1);
                // only add if the table has not been added
                if (tableList != null && tableList.contains(table)) {
                    log.warn("Table '"+table+"' already added. Ignoring '" + query + "'.");
                } else {
                    dataSet.addTable(table, query);
                }
            }
        }
    }
}
This class serves as a Spring-friendly wrapper around the DBUnit extraction process, though it has no dependency on Spring (the beauty of Spring). Here's how you can configure the extractor in a Spring beans file.
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE beans PUBLIC "-//SPRING//DTD BEAN//EN" "http://www.springframework.org/dtd/spring-beans.dtd">

<beans>
    <bean id="dataSource" class="org.springframework.jdbc.datasource.DriverManagerDataSource">
        <property name="driverClassName"><value>com.mysql.jdbc.Driver</value></property>
        <property name="url"><value>jdbc:mysql://localhost/bon_app</value></property>
        <property name="username"><value>scott</value></property>
        <property name="password"><value>tiger</value></property>
    </bean>

	<bean id="dbUnitDataExtractor" class="com.jadecove.bonappetit.test.DBUnitDataExtractor">
		<property name="dataSource" ref="dataSource"/>
		<property name="dataSetName" value="dbunit-dataset.xml"/>
		<property name="tableList">
			<list>
				<value>recipe</value>
				<value>recipe_ext_xref</value>
			</list>
		</property>
		<property name="queryList">
			<list>
				<value>select * from bon_app_ext.person where first_name = 'Bill'</value>
			</list>
		</property>
	</bean>
</beans>
You can now extract the DBUnit dataset by acquiring a configured extractor via Spring, and calling the extract() method. The only required extractor property is the data source. Doing so will create a data set for the entire database. Optionally, you can list specific tables and queries to base the extraction on. In the example above, I am selecting all rows from the recipe and recipe_ext_xref tables. In addition, I am selecting only those rows from the person table, in the bon_app_ext schema, where the first_name equals 'Bill'.

When using queries, DBUnit requires you to specify both the table name and the query. To make configuration easier, my extractor uses a regular expression to determine the table name. If you happen to specify a query for a table that's also specifed in the tableList property, the extractor will ignore the query and fetch all rows for the given table.

The extractor also supports additional DBUnit configuration properties and features. DBUnit documents these settings in the DBUnit API and user documents.

I have used this extractor with simple local MySQL databases and with complex legacy enterprise databases. Through a combination of the connection settings, schema, and DBUnit features you can extract the data from most any database. In my current consulting job, I used this extractor to fetch data from some complex DB2 databases running on an IBM AS-400.

This exercise not only helped make life with DBUnit easier, it helped me understand how easy it is to leverage Spring for these mundane assembly tasks. The next time you need to automate tasks like these and you are Spring, don't forget that it's well-suited for jobs like this.

Bill Siggelkow is an independent consultant specializing in software design, development, and technical training.

oreillynet.com Copyright © 2006 O'Reilly Media, Inc.