I’ve been thinking about the Kudzu principle a lot lately. This particular “rule” was something I first observed in about 1999, and it goes something like this - XML, once introduced into a system, will over time continue to expand into that system. Kudzu was originally introduced into the American Southeast in the 1800s, in order to provide a way to more readily secure the loose clay soil that’s so predominant there. Unfortunately, like many invasive plants it very quickly expanded beyond its original boundaries and became one of the most aggressive weeds in the region.

XML is a mechanism for abstraction. Unlike OOP typed objects, however, the abstraction does not place specific requirements upon the local mechanism for implementation - instead XML forms a document object model where each particular element represents what in other languages would be considered classes or class properties (either implicitly declared - i.e., a string object, or explicitly declared). Because of this, it becomes possible to manipulate that particular object model using a generic set of commands that in general are unaware of the underlying semantics of the given class or property.
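
As a rough illustration of that genericity, here is a minimal XQuery sketch that takes an inventory of whatever elements it is handed, without knowing anything about what they mean. The sample data bound to $doc is made up for the example; any XML would be treated the same way.

```xquery
(: Sample input; the query below never refers to these names directly. :)
let $doc :=
    <newsfeeds>
        <newsfeed title="Slashdot"/>
        <newsfeed title="ScienceDaily"/>
    </newsfeeds>
(: Collect each distinct element name found anywhere in the tree. :)
for $name in distinct-values(
    for $e in $doc/descendant-or-self::* return local-name($e))
order by $name
(: Report the name alongside how many elements carry it. :)
return concat($name, ": ", count($doc/descendant-or-self::*[local-name() = $name]))
```

For the sample above this yields the two strings "newsfeed: 2" and "newsfeeds: 1"; swap in a completely different vocabulary and the query itself does not need to change.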

This is a part of the reason that languages such as XSLT are themselves so powerful. With XSLT, you can manipulate not just a given class but the underlying relational models inherent in the object model. Cross-cutting concepts such as aspect programming provide a rough analog to this process, but what is an absurdly simple idea in XML (it’s what any good SAX parser does, after all) has proven to be considerably more complicated to implement in languages such as Java or C#. The reason why this happens should be readily obvious - cross-cutting and other aspect-related principles deal with the abstraction of a program, not the program itself … in other words, they deal with the domain at the level where XML is largely native but where most procedural languages aren’t.

Feeling eXist-ential

All this leads up to work that I’ve been doing with an AJAX/XForms application that I designed for a company recently and am in the midst of implementing. I will not go into detail about the venture just yet (soon, perhaps), but in the process of doing this I did manage to spend some time playing with XML databases, settling eventually on the eXist XML database (http://exist.sourceforge.net/). It is a pure XML database, open-source, with support for the current XQuery standards.

I have to admit, I’ve fallen in love with it.

Installation was remarkably simple - executing a JAR file, and setting a couple of optional (and easily understandable) parameters was all it took. The service runs as its own web service, and takes the practical approach of using a SOAP based interface for binding with other languages, including PHP5, the one I’m working with. Since PHP’s approach to SOAP/WSDL is remarkably transparent, once you create the object under its own WSDL (a single line operation) you have the core interfaces you need as PHP objects to do everything. While it is perhaps not quite as efficient as having the access under a local process it makes a great deal of sense for those kinds of applications that don’t require ultra-high performance data-access, probably the majority of all websites in existence, if it comes to that.

eXist uses a model that’s becoming increasingly common in the XML database space - the collection. A collection can be thought of as either a database in the relational database sense or as the root node for a number of different XML objects that are used in conjunction with one another. What I find rather delightful here is that creating such collections (and adding XML to them) can be done in any number of ways: from the localhost-run website that eXist installs, via the eXist Java API (or its corresponding SOAP equivalents elsewhere), or, perhaps coolest of all, via WebDAV.
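
To make the "root node for a number of XML objects" idea a bit more concrete, the sketch below walks every document in one collection and reports the name of its root element. The collection path /db/myapp/newsfeeds is purely hypothetical; substitute whatever collection you have actually created.

```xquery
(: "/db/myapp/newsfeeds" is a hypothetical collection path, not one from eXist itself. :)
for $resource in collection("/db/myapp/newsfeeds")
(: Each item is a document node; its single child element is the document's root. :)
return local-name($resource/*)
```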

To explain the last in a little more detail - eXist exposes a WebDAV interface that any WebDAV-enabled software can use to both retrieve content from and write content to the database. For instance, from Oxygen (my all-time favorite XML editor) I can select Open URL, pass the appropriate URL (http://localhost:8080/exist/webdav/db/ by default), and then open XML “files” from the database, work on them, and save them, all without realizing that these are not files at all. From both Internet Explorer and Konqueror, you can also create folders and drag XML content into them, creating collections and populating them very simply (under IE, you do need to open it as a web folder). The system also pre-validates any entries, ensuring that only valid XML content can be loaded into the XML database. Internally, these are not files at all, but are instead well-indexed data resources. The beauty of databases like eXist is that you will likely only see this if you look fairly deep under the covers. For the most part, the information looks and acts like XML, can be queried as XML, and can be retrieved via a URL. This in turn makes for incredibly transparent code, especially given that you can additionally create XQuery files, referenced directly via the web, that work against the XML database.

Those who’ve read my commentary in the past may be aware of my feelings towards XQuery, though I believe a part of this came about because I co-wrote two books very early on about XQuery that basically were too far in advance of the curve. XQuery is not XSLT (although with eXist, this isn’t that big a deal, because you CAN call XSLT transformations from within your XQuery code — more in a bit), but it is a reasonably powerful language for querying data, and eXist makes exquisite use of the standard.

While the XQuery language has gone through a number of fluctuations over the years, the eXist implementation seems (at least upon first inspection) close to, if not identical to, the XQuery 1.0 Candidate Recommendation.

Those who are long time readers of my columns are probably aware of my opinions concerning XQuery (generally negative), though I should point out that I have held those opinions for very definite reasons:

  • A few companies have tried to view XQuery and XSLT2 as being competing technologies. I see them rather as complementary technologies that reinforce the strengths of each other, and treating the choice of which to develop with as an either/or decision fails to take this into account.
  • XQuery implementations to date have had to play catch-up with a rapidly evolving standard. Now that XQuery has reached Candidate Recommendation status (after only five years of development … sigh) it has become stable enough to actually build product on, making it much easier to review and get into products.
  • XQuery’s initial syntax and structure had some real problems. With a few minor nits (mostly having to do with the belief that there’s no need for a thousand date functions polluting the initial functional namespace) I’ve actually come to feel that XQuery has a lot of benefits, especially if it can be tied into an XML Database.
  • XQuery is a read-only technology. Most database programming involves more than just reading data - it involves modifying the state of the database in response to countless user interactions. If XQuery can be extended to include an update mechanism, this makes XQuery a very nice replacement for SQL, rather than just an awkward adjunct.

eXist is one of the first XML databases I’ve run into with a more or less contemporary XQuery implementation. It includes both system and database extensions (in their own namespaces, properly enough) to permit not only system access but database updates; it recognizes the importance of transformations and makes it possible to invoke and use them from within an XQuery (with a few additional surprises that I’ll discuss later); and it makes XQuery debugging easy and relatively safe. It’s also surprisingly fast - a redesign of eXist’s indexing engine paid for itself quite handsomely - and it is even possible to use eXist-based XQueries directly as web server pages, though this may not be preferable in all cases.

For instance, I recently created a simple server application where I wanted to create a server side set of newsfeed lists. I started the process out by creating in Oxygen an XML file (called newsfeeds.xml) and saved it to a WebDAV folder “newsfeeds” under an application folder I’d created.

<newsfeeds>
    <newsfeed title="Slashdot" src="http://rss.slashdot.org/Slashdot/slashdot"/>
    <newsfeed title="ScienceDaily" src="http://www.sciencedaily.com/newsfeed.xml"/>
</newsfeeds>            
        

I then wrote a (somewhat stripped down) version of the XQuery that I used to both display and add new newsfeeds, calling it updateNewsfeeds.xq:

declare namespace h ="http://www.w3.org/1999/xhtml";
let $newsfeeds := //newsfeeds
let $title := request:get-parameter("title","")
let $src := request:get-parameter("src","")
let $newFeed := <newsfeed src="{$src}" title="{$title}"/>
return (
if (($title != "") and ($src != "")) then update insert <newsfeed src="{$src}" title="{$title}"/> into $newsfeeds else "", 
response:set-header("Content-Type","text/xml"),
<h:html>
    <h:head>
        <h:title>Newsfeed Manager</h:title>
    </h:head>
    <h:body>
        {$newsfeeds}
        <h:h1>Newsfeed Manager</h:h1>            
        <h:table border="1">
            <h:tr>
                <h:th>Active</h:th>
                <h:th>Title</h:th>
                <h:th>URL</h:th>
            </h:tr>
            {for $newsfeed in $newsfeeds/* order by $newsfeed/@title ascending return
            <h:tr>
                <h:td><h:input type="checkbox"/></h:td>
                <h:td>{string($newsfeed/@title)}</h:td>
                <h:td>{string($newsfeed/@src)}</h:td>
            </h:tr>
            }               
        </h:table>
        <h:form method="GET" action="http://localhost:8080/exist/xquery/updateNewsfeeds.xq">
            <h:span class="label">Feed Name:</h:span><h:input type="text" name="title"/><h:br/>
            <h:span class="label">Feed URL: </h:span><h:input type="text" name="src"/><h:br/>
            <h:input type="submit"/>
        </h:form>
    </h:body>
</h:html>)            
        

eXist works by indexing individual elements by name, so if you have a unique parent node such as //newsfeeds, this query takes almost no time at all (the downside is that namespaces can become necessary if you have common elements with different semantics in different schemas). Once you have this node context, you can use it to retrieve all subordinate elements.
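
If two schemas in the same database both used an element called newsfeeds, a namespace is what would keep a lookup by name from mixing them up. A minimal sketch, assuming a made-up namespace URI (the sample document in this article does not actually declare one):

```xquery
(: Hypothetical namespace URI, used only to illustrate the idea. :)
declare namespace nf = "http://example.com/ns/newsfeeds";

(: Matches only newsfeeds elements in that namespace, leaving alone
   same-named elements that belong to other vocabularies. :)
//nf:newsfeeds
```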

The XQuery extensions that eXist adds are visible early on as well:

    let $title := request:get-parameter("title","")
    let $src := request:get-parameter("src","")    

These retrieve parameters from either an HTTP GET or an HTTP POST to the server. The implications here should be pretty obvious - since such server calls are often the province of web page servers such as PHP, Java or ASP.NET, this implies that, for at least some operations, you can use an XQuery as a web page server. Indeed, this is exactly how this page is used - by placing this in the eXist webApp/xquery folder, this actually generates a web page. The query, incidentally, is compiled by default, which means that after the first time running the page, the performance goes way up.

Similarly, the response:set-header() method in the example illustrates how you can communicate back to the client - in this particular case informing it that the content coming down is XML.

If you’re not that familiar with XQuery, the content after the return statement can appear a wee bit confusing. XQuery uses the XPath 2.0 standard, and as such provides a data structure that may be unfamiliar to many readers - the sequence. A sequence is, as one might expect, a list of items, and it is the successor of the node-set of XSLT 1/XPath 1. However, a sequence can hold any collection of values, including void objects, which do not render into the final stream. This means that you can place multiple “instructions” in a sequence and have them execute in the order that they are encountered, without violating the integrity of the XQuery expression.
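
A small standalone sketch of how sequences behave may help here. It uses only standard XQuery, and the values are arbitrary:

```xquery
(: A sequence mixing atomic values, an element, an empty sequence,
   and a nested sequence. :)
let $items := (1, "two", <three/>, (), (4, 5))
return (
    count($items),   (: 5 - the empty sequence vanishes and (4, 5) is flattened :)
    $items[2]        (: "two" :)
)
```

The empty sequence contributes nothing to the result, which is the same property that lets a "void" instruction sit in the return sequence without showing up in the output.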

For instance, the XQuery’s return block begins:

(
if (($title != "") and ($src != "")) then update insert <newsfeed src="{$src}" title="{$title}"/> into $newsfeeds else "", 
response:set-header("Content-Type","text/xml"), ...
        

The first “item” is a basic validation (testing to see whether all of the data was entered correctly) which, if true, calls the eXist update insert extension, placing a newly created <newsfeed> object into the database as a child of the <newsfeeds> object, with the previously defined $src and $title evaluated inside the appropriate attributes. Since all of the update commands (update insert, update delete, update replace, update value, and update rename) return a void, even when placed in a sequence they can perform their actions without it reflecting in the final output.
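
For a feel of the other update commands, here is a hedged sketch of two of them run against the same newsfeeds data. The syntax follows eXist's update extension as I understand it (it is not standard XQuery), and the replacement URL is a placeholder rather than a real feed address:

```xquery
let $newsfeeds := //newsfeeds
return (
    (: Change an attribute value in place; the new URL is a made-up placeholder. :)
    update value $newsfeeds/newsfeed[@title = "ScienceDaily"]/@src
        with "http://example.com/sciencedaily.xml",
    (: Remove an entry altogether. :)
    update delete $newsfeeds/newsfeed[@title = "Slashdot"]
)
```

Both expressions sit in a single sequence, just as update insert does in the newsfeed manager, and neither adds anything to the output.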

The remainder of the application performs the process of building the XHTML page. The first section, a table showing the existing entries, is very straightforward XQuery, with the most relevant part:

            {for $newsfeed in $newsfeeds/* order by $newsfeed/@title ascending return
            <h:tr>
            <h:td><h:input type="checkbox"/></h:td>
            <h:td>{string($newsfeed/@title)}</h:td>
            <h:td>{string($newsfeed/@src)}</h:td>
            </h:tr>
            }                           
        

generating the table, ordered alphabetically by title. Note that including straight references to elements within XQuery in the output (such as {$newsfeed/title}) will actually generate a block that looks like:

<title>My Title</title>

so you should generally retrieve the string content of the element or attribute instead, as shown above.
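
The difference is easy to see in a standalone sketch (standard XQuery only; the sample element is invented for the illustration):

```xquery
let $newsfeed := <newsfeed title="Slashdot"><title>My Title</title></newsfeed>
return (
    <td>{$newsfeed/title}</td>,          (: <td><title>My Title</title></td> :)
    <td>{string($newsfeed/title)}</td>,  (: <td>My Title</td> :)
    <td>{$newsfeed/@title}</td>          (: <td title="Slashdot"/> :)
)
```

The third case is the one to watch with attributes: a bare attribute reference inside an element constructor is copied onto the new element as an attribute rather than written out as text.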

The final section is a simple form that passes the src and title values back to the server, completing the loop. A screenshot of this “app” looks as follows:

[Screenshot: localhost_exist_xquery_updateNewsfeeds.xq.png]

Obviously, this is a very simple application, though it illustrates the basic roundtripping that eXist enables.

While there are many other features to eXist that I could highlight, I think one that I personally feel very excited about is its ability to incorporate XSLT transformations directly into XQuery. For instance, if I had a transformation contained in the database as <xsl:stylesheet id="mytransform1" ...>, I could perform a transformation on a node $data, complete with parameterization, as simply as saying:

let $result := transform:transform($data,//xsl:stylesheet[@id="mytransform1"],<parameters><param name="param1">Foo</param></parameters>)
        

Now, the question that may come up is why you would want to perform a transformation inside XQuery. Actually there are several good reasons to do so:

  • You can create conditional pipelines of transformations within XQuery, analysing XPath from the results of one transformation in order to determine which secondary transformation to pass it to.
  • You can do additional processing of data from eXist using XQuery, then when the content is in suitable form you can pass the results to a transformation (for instance, transforming only those nodes which satisfy a given search criterion).
  • You can use the built-in ACL support to ensure that you have the correct user context for transformations.
  • You can keep all of your transformations within the database proper, providing an additional layer of security (you’d only be able to access the transformations, and hence the business logic, via predefined XQueries).
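
The first of these, the conditional pipeline, might be sketched roughly as follows. The three stylesheet ids and the <error> element are hypothetical, and I am assuming that transform:transform() accepts an empty sequence when there are no parameters to pass:

```xquery
declare namespace xsl = "http://www.w3.org/1999/XSL/Transform";

(: The stylesheet ids and the <error> element below are hypothetical. :)
let $data := //newsfeeds
let $first := transform:transform($data, //xsl:stylesheet[@id = "normalize"], ())
(: Inspect the intermediate result with XPath to pick the second stage. :)
let $next :=
    if ($first/descendant-or-self::error)
    then //xsl:stylesheet[@id = "report-errors"]
    else //xsl:stylesheet[@id = "render-html"]
return transform:transform($first, $next, ())
```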

As if these weren’t compelling enough reasons to use XQuery to support transformations, there’s another, more subtle one. Within eXist, the default XSLT processor is the one that ships with Java - Xalan. However, it’s the work of about ten minutes to switch over to using Saxon 8 as the XSLT processor (http://wiki.exist-db.org/comments/Howtos/Adding+XSLT+2.0+(Saxon)), and thus open up the capability of working with both XQuery and XSLT 2 in the same application. It’s also conceivable (though I haven’t tested this yet) that it may be possible to use the extensive eXist extensions within Saxon - I’ll find out more about that when I get the chance.

Overall, there’s a lot to like about eXist. I’ll admit, I haven’t stress-tested it yet, and while I’ve put together a few fairly solid applications at this point, the (re-)learning curve for XQuery has taken some time. However, I think that if you are looking for a good, open source, fast, solid XML Database, you could do far worse than eXist - it’s coming very close to fulfilling my ideal web development environment - one where the entire pipeline, from back-end database to components and AJAX feeds on the client, is ultimately just the flow of XML back and forth. Yup, definitely in love.

Kurt Cagle is an author, software consultant (CTO of Metaphorical Web) and commentator on issues XML related. He lives in Victoria, British Columbia with his wife and kids, and plans to go swimming this afternoon.