I’ve been thinking about the Kudzu principle a lot lately. This particular “rule” was something I first observed in about 1999, and it goes something like this - XML, once introduced into a system, will over time continue to expand into that system. Kudzu was originally introduced into the United States in the late 1800s and was later planted widely across the Southeast in order to secure the loose clay soil that’s so predominant there. Unfortunately, like many invasive plants, it very quickly expanded beyond its original boundaries and became one of the most aggressive weeds in the region.
XML is a mechanism for abstraction. Unlike OOP typed objects, however, the abstraction does not place specific requirements upon the local mechanism for implementation - instead XML forms a document object model where each particular element represents what in other languages would be considered classes or class properties (either implicitly declared - i.e., a string object, or explicitly declared). Because of this, it becomes possible to manipulate that particular object model using a generic set of commands that in general are unaware of the underlying semantics of the given class or property.
This is a part of the reason that languages such as XSLT are themselves so powerful. With XSLT, you can manipulate not just a given class but the underlying relational models inherent in the object model. Cross-cutting concepts such as aspect-oriented programming provide a rough analog to this process, but what is an absurdly simple idea in XML (it’s what any good SAX parser does, after all) has proven to be considerably more complicated to implement in languages such as Java or C#. The reason why this happens should be readily obvious - cross-cutting and other aspect-related principles deal with the abstraction of a program, not the program itself … in other words, they deal with the domain at a level where XML is largely native but where most procedural languages aren’t.
Feeling eXist-ential
All this leads up to work that I’ve been doing with an AJAX/XForms application that I designed for a company recently and am in the midst of implementing. I will not go into detail about the venture just yet (soon, perhaps), but in the process of doing this I did manage to spend some time playing with XML databases, settling eventually on the eXist XML database (http://exist.sourceforge.net/). It is a pure XML database, open-source, with support for the current XQuery standards.
I have to admit, I’ve fallen in love with it.
Installation was remarkably simple - executing a JAR file, and setting a couple of optional (and easily understandable) parameters was all it took. The service runs as its own web service, and takes the practical approach of using a SOAP based interface for binding with other languages, including PHP5, the one I’m working with. Since PHP’s approach to SOAP/WSDL is remarkably transparent, once you create the object under its own WSDL (a single line operation) you have the core interfaces you need as PHP objects to do everything. While it is perhaps not quite as efficient as having the access under a local process it makes a great deal of sense for those kinds of applications that don’t require ultra-high performance data-access, probably the majority of all websites in existence, if it comes to that.
eXist uses a model that’s becoming increasingly common in the XML database space - the collection. A collection can be thought of as either a database in the relational database sense or as the root node for a number of different XML objects that are used in conjunction with one another. What I find rather delightful here is that creating such collections (and adding XML to them) can be done in any number of ways: from the localhost-run website that eXist installs, via the eXist Java API (or its corresponding SOAP equivalents elsewhere), or, perhaps coolest of all, via WebDAV.
To explain the last in a little more detail - eXist exposes a WebDAV interface that any WebDAV-enabled software can use to both retrieve content from and write content to the database. For instance, from Oxygen (my all time favorite XML editor) I can select Open URL, pass the appropriate URL (http://localhost:8080/exist/webdav/db/ by default), and then open XML “files” from the database, work on them, and save them, all without realizing that these are not files at all. From both Internet Explorer and Konqueror, you can also create folders and drag XML content into them, creating collections and populating them very simply (under IE, you do need to open it as a web folder).
The system also pre-validates any entries, ensuring that only valid XML content can be loaded into the XML database. Internally, these are not files at all, but are instead well indexed data resources. The beauty of databases like eXist is that you will likely only see this if you look fairly deep under the covers. For the most part, the information looks and acts like XML, can be queried as XML, and can be retrieved via a URL. This in turn makes for incredibly transparent code, especially given that you can additionally create XQuery files, referenced directly via the web, that work against the XML database.
Those who’ve read my commentary in the past may be aware of my feelings towards XQuery, though I believe a part of this came about because I co-wrote two books very early on about XQuery that basically were too far in advance of the curve. XQuery is not XSLT (although with eXist, this isn’t that big a deal, because you CAN call XSLT transformations from within your XQuery code — more in a bit), but it is a reasonably powerful language for querying data, and eXist makes exquisite use of the standard.
While the XQuery language has gone through a number of fluctuations over the years, eXist’s implementation seems (at least upon first inspection) close, if not identical, to the XQuery 1.0 Candidate Recommendation.
Long-time readers of my columns are probably aware that my opinions concerning XQuery have been generally negative, though I should point out that I have held those opinions for very definite reasons:
- A few companies have tried to view XQuery and XSLT2 as competing technologies. I see them rather as complementary technologies that reinforce each other’s strengths, and treating the choice of which to develop as an either/or decision fails to take this into account.
- XQuery implementations to date have had to play catch up with a rapidly evolving standard. Now that XQuery has reached Candidate Recommendation status (after only five years of development … sigh) it has become stable enough to actually build product on, making it much easier to review and get into products.
- XQuery’s initial syntax and structure had some real problems. With a few minor nits (mostly having to do with the belief that there’s no need for a thousand date functions polluting the initial functional namespace) I’ve actually come to feel that XQuery has a lot of benefits, especially if it can be tied into an XML Database.
- XQuery is a read-only technology. Most database programming involves more than just reading data - it involves modifying the state of the database in response to countless user interactions. If XQuery can be extended to include an update mechanism, this makes XQuery a very nice replacement for SQL, rather than just an awkward adjunct.
eXist is one of the first XML databases that I’ve run into that has a more or less contemporary XQuery implementation. It includes both system and database extensions (in their own namespaces, properly enough) to permit not only system access but database updates, it recognizes the importance of transformations and makes it possible to invoke them from within an XQuery (with a few additional surprises that I’ll discuss later), and it makes XQuery debugging easy and relatively safe. It’s also surprisingly fast - a re-design of eXist’s indexing engine paid for itself quite handsomely - and it is even possible to use eXist-based XQueries directly as web server pages, though this may not be preferable in all cases.
For instance, I recently created a simple server application where I wanted to create a server side set of newsfeed lists. I started the process out by creating in Oxygen an XML file (called newsfeeds.xml) and saved it to a WebDAV folder “newsfeeds” under an application folder I’d created.
<newsfeeds>
<newsfeed title="Slashdot" src="http://rss.slashdot.org/Slashdot/slashdot"/>
<newsfeed title="ScienceDaily" src="http://www.sciencedaily.com/newsfeed.xml"/>
</newsfeeds>
I then wrote a (somewhat stripped down) version of the XQuery that I used to both display and add new newsfeeds, calling it updateNewsfeeds.xq:
declare namespace h = "http://www.w3.org/1999/xhtml";
(: the unique parent element that holds all of the feeds :)
let $newsfeeds := //newsfeeds
(: form values sent back by the page; each defaults to "" when absent :)
let $title := request:get-parameter("title","")
let $src := request:get-parameter("src","")
(: not used below; the update builds the same element inline :)
let $newFeed := <newsfeed src="{$src}" title="{$title}"/>
return (
    (: add the new feed only when both fields were filled in :)
    if (($title != "") and ($src != "")) then update insert <newsfeed src="{$src}" title="{$title}"/> into $newsfeeds else "",
    response:set-header("Content-Type","text/xml"),
    <h:html>
        <h:head>
            <h:title>Newsfeed Manager</h:title>
        </h:head>
        <h:body>
            {$newsfeeds}
            <h:h1>Newsfeed Manager</h:h1>
            <h:table border="1">
                <h:tr>
                    <h:th>Active</h:th>
                    <h:th>Title</h:th>
                    <h:th>URL</h:th>
                </h:tr>
                {for $newsfeed in $newsfeeds/* order by $newsfeed/@title ascending return
                    <h:tr>
                        <h:td><h:input type="checkbox"/></h:td>
                        <h:td>{string($newsfeed/@title)}</h:td>
                        <h:td>{string($newsfeed/@src)}</h:td>
                    </h:tr>
                }
            </h:table>
            <h:form method="GET" action="http://localhost:8080/exist/xquery/updateNewsfeeds.xq">
                <h:span class="label">Feed Name:</h:span><h:input type="text" name="title"/><h:br/>
                <h:span class="label">Feed URL: </h:span><h:input type="text" name="src"/><h:br/>
                <h:input type="submit"/>
            </h:form>
        </h:body>
    </h:html>)
eXist works by indexing individual elements by name, so if you have a unique parent node such as //newsfeeds, this query takes almost no time at all (the downside is that namespaces can become necessary if you have common elements with different semantics in different schemas). Once you have this node context, you can use it to retrieve all subordinate elements.
The XQuery extensions that eXist adds are visible early on as well:
let $title := request:get-parameter("title","")
let $src := request:get-parameter("src","")
These retrieve parameters from either an HTTP GET or an HTTP POST to the server. The implications here should be pretty obvious - since such server calls are often the province of web page servers such as PHP, Java or ASP.NET, this implies that, for at least some operations, you can use an XQuery as a web page server. Indeed, this is exactly how this page is used - by placing it in the eXist webapp/xquery folder, it actually generates a web page. The query, incidentally, is compiled by default, which means that after the first time running the page, the performance goes way up.
Similarly, the response:set-header() function in the example illustrates how you can communicate back to the client - in this particular case informing it that the content coming down is XML.
If you’re not that familiar with XQuery, the content after the return keyword can appear a wee bit confusing. XQuery builds on the XPath 2.0 standard, and as such provides a data structure that may be unfamiliar to many readers - the sequence. A sequence is, as one might expect, an ordered list of items, and it is the successor of the node-set of XSLT 1/XPath 1. However, a sequence can hold any mix of values, including “void” (empty) results, which do not render into the final stream. This means that you can place multiple “instructions” in a sequence and have them execute in the order that they are encountered, without violating the integrity of the XQuery expression.
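A small sketch may make the sequence idea more concrete. It uses only standard XQuery 1.0, nothing specific to eXist, and the values are made up purely for illustration:

```xquery
(: A sequence can mix atomic values and nodes. The empty sequence ()
   contributes nothing, and nested sequences are flattened. :)
let $items := (1, "two", <three/>, (), (4, 5))
return
  <summary count="{count($items)}">{
    for $item at $pos in $items
    return <item position="{$pos}">{$item}</item>
  }</summary>
```

The count attribute comes out as 5: the empty sequence vanishes and the nested (4, 5) is folded into its parent. That disappearing act is what lets an expression that returns nothing sit in a sequence without leaving a trace in the output.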
Returning to the newsfeed query, its return block begins:
(
if (($title != "") and ($src != "")) then update insert <newsfeed src="{$src}" title="{$title}"/> into $newsfeeds else "",
response:set-header("Content-Type","text/xml"), ...
The first “item” is a basic validation (testing to see whether all of the data was entered) which, if true, calls the eXist update insert extension, placing a newly created <newsfeed> object into the database as a child of the <newsfeeds> object, with the previously defined $src and $title evaluated inside the appropriate attributes. Since all of the update commands (update insert, update delete, update replace, update value, and update rename) return a void (an empty sequence), they can perform their actions even when placed in a sequence without being reflected in the final output.
The remainder of the application performs the process of building the XHTML page. The first section, a table showing the existing entries, is very straightforward XQuery, with the most relevant part:
{for $newsfeed in $newsfeeds/* order by $newsfeed/@title ascending return
<h:tr>
<h:td><h:input type="checkbox"/></h:td>
<h:td>{string($newsfeed/@title)}</h:td>
<h:td>{string($newsfeed/@src)}</h:td>
</h:tr>
}
generating the table, ordered alphabetically by title. Note that including straight references to elements within XQuery in the output (such as {$newsfeed/title}) will actually generate a block that looks like:
<title>My Title</title>
so you should generally retrieve the string content of the element or attribute instead, as shown above.
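To see the difference side by side, here is a minimal sketch in standard XQuery 1.0. The $feed element and the wrapper element names are invented for the example and are not part of the newsfeed application:

```xquery
(: hypothetical sample data, for illustration only :)
let $feed :=
  <newsfeed src="http://example.org/feed.xml"><title>My Title</title></newsfeed>
return
  <result>
    <!-- copies the whole title element into the output -->
    <as-node>{$feed/title}</as-node>
    <!-- emits only the text "My Title" -->
    <as-string>{string($feed/title)}</as-string>
    <!-- an attribute node is attached to the enclosing element -->
    <attr-copied>{$feed/@src}</attr-copied>
    <!-- emits the attribute value as text -->
    <attr-string>{string($feed/@src)}</attr-string>
  </result>
```

The attribute case is the one that bites most often: a bare {$feed/@src} does not print the URL, it turns into a src attribute on the element that contains it, which is why the table cells in the query wrap each attribute in string().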
The final section is a simple form that passes the src and title values back to the server, completing the loop. A screenshot of this “app” looks as follows:
[Screenshot: the Newsfeed Manager page]
Obviously, this is a very simple application, though it illustrates the basic roundtripping that eXist enables.
While there are many other features to eXist that I could highlight, I think one that I
personally feel very excited about is its ability to incorporate XSLT transformations
directly into XQuery. For instance, if I had a transformation contained in the database
as <xsl:stylesheet id="mytransform1" ...>, I could perform a
transformation on a node $data, complete with parameterization, as simply as saying:
let $result := transform:transform($data,//xsl:stylesheet[@id="mytransform1"],<parameters><param name="param1">Foo</param></parameters>)
Now, the question that may come up is why you would want to perform a transformation inside XQuery. Actually there are several good reasons to do so:
- You can create conditional pipelines of transformations within XQuery, analysing XPath from the results of one transformation in order to determine which secondary transformation to pass it to.
- You can do additional processing of data from eXist using XQuery, then when the content is in suitable form you can pass the results to a transformation (for instance, transforming only those nodes which satisfy a given search criterion).
- You can use the built-in ACL support to ensure that you have the correct user context for transformations.
- You can keep all of your transformations within the database proper, providing an additional layer of security (you’d only be able to access the transformations, and hence the business logic, via predefined XQueries).
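The first of these reasons, the conditional pipeline, might be sketched roughly as follows. This is only an outline built on the transform:transform() call shown earlier: the stylesheet ids, the <error> element used as the test, and the assumption that an empty <parameters/> element is accepted are all hypothetical and would need to be checked against a real eXist installation.

```xquery
declare namespace xsl = "http://www.w3.org/1999/XSL/Transform";

(: hypothetical input and stylesheet ids, for illustration only :)
let $data := //newsfeeds
let $normalize := //xsl:stylesheet[@id = "normalize"]
let $showErrors := //xsl:stylesheet[@id = "show-errors"]
let $showPage := //xsl:stylesheet[@id = "show-page"]
(: first pass over the raw data :)
let $first := transform:transform($data, $normalize, <parameters/>)
(: inspect the intermediate result with XPath to pick the second pass :)
let $second := if (exists($first//error)) then $showErrors else $showPage
return transform:transform($first, $second, <parameters/>)
```

The point is that the branching logic lives in XQuery while each stylesheet stays small and single-purpose.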
As if these weren’t compelling enough reasons to use XQuery to support transformations, there’s another, more subtle one. Within eXist, the default XSLT processor is that which ships with Java - Xalan. However, it’s the work of about ten minutes to switch over to using Saxon 8 as the XSLT processor ( http://wiki.exist-db.org/comments/Howtos/Adding+XSLT+2.0+(Saxon)), and thus open up the capability of working with both XQuery and XSLT 2 in the same application. It’s also conceivable (though I haven’t tested this yet) that it may be possible to use the extensive eXist extensions within Saxon - I’ll find out more about that when I get the chance.
Overall, there’s a lot to like about eXist. I’ll admit, I haven’t stress-tested it yet, and while I’ve put together a few fairly solid applications at this point, the (re-)learning curve for XQuery has taken some time. However, I think that if you are looking for a good, open source, fast, solid XML Database, you could do far worse than eXist - it’s coming very close to fulfilling my ideal web development environment - one where the entire pipeline, from back-end database to components and AJAX feeds on the client, is ultimately just the flow of XML back and forth. Yup, definitely in love.
Kurt Cagle is an author, software consultant (CTO of Metaphorical Web) and commentator on issues XML related. He lives in Victoria, British Columbia with his wife and kids, and plans to go swimming this afternoon.


Glad to see you've spent the time to play with eXist, Kurt... There are a few others that kinda/sorta fill the XQuery/XSLT Application space, but the only one with any real credibility (as an application server. Plenty of XQuery DB's that are really quite nice) is SQL Server 2005/Express which fills things nicely from an XQuery/XSLT 1.0 space, and its FAST AS... + you can code in any other language supported on the .NET platform.
The downside to SS2005/E is that it's not cross-platform and not open-source, though the OSS side of things is not something I see as a problem... I can build and extend from it all I want, which fills my needs just fine.
Hi Kurt,
"a re-design of eXist's indexing engine paid for itself quite handsomely" Was that in the transition from 1.0 to 1.1? I found 1.0 to be a bit slow, and was encouraged by what I saw at http://exist.sourceforge.net/index.html#download.
People interested in starting with eXist from scratch may find this article useful.
http://mailman.ic.ac.uk/pipermail/xml-dev/1998-August/005720.html
http://aspn.activestate.com/ASPN/Mail/Message/xml-dev/1397752
http://lists.xml.org/archives/xml-dev/200602/msg00130.html
Three of a number of hits in order of idea emergence.
Credit isn't just about 'showing up', as Tim says. It is about
checking sources. This is why PageRanking works and why it doesn't.
Sustainability trumps authority.
Bob,
The transition was 1.0 to 1.1 - I agree with you that 1.0 seemed slow to me, and the indexing rework has definitely been worth it. I'll edit that to make it clearer.
-- Kurt
Mark,
I've played a fair amount with SQL Server Express, but in this particular case the cross platform aspect was a major part of the requirements, both from a dev and a deployment standpoint. The customer is running a Linux server, having been burned more than once by bad Windows developers - not necessarily a fault of the OS, of course, but it makes it harder to propose a solution in that space when the last project the CEO did on Windows went belly-up.
Microsoft applications need to get out of this rut of supporting only the Windows OS. I suspect that SQL Server for Linux would actually prove to be quite profitable for them, giving them a solid heterogeneous base by which to embrace and extend into the Linux space itself.
-- Kurt
Bob,
BTW, thanks for the article you wrote earlier - when I was evaluating XQuery databases, it was definitely the best article I found on the topic, and proved instrumental in helping me decide to go with eXist as my database of choice for a client.
-- Kurt
Len,
I suspect only those of us who have spent serious time in the south (I lived in Montgomery for four years as a kid) really understand the kudzu metaphor - we had a hill in the back of our house where the developers had put down kudzu in order to stabilize the red clay soil from eroding in the frequent thunderstorms, and I remember at age ten hacking away at it (and the rhubarb that was infesting our garden, yet another opportunistic import) trying to keep it from taking over the garden at the base of the hill.
The funny thing about the "XML as kudzu" argument is that I find it only gains more relevance over time. We need the structure that XML imposes, so long as that structure does not become so restrictive as to be constraining. I liken it to the order/chaos balance that L. E. Modesitt has explored in great depth in his Recluce books - you need the order that XML provides in order to generate stability amidst the quickfire of languages such as JavaScript, but stability by itself is only sterility. On the other hand, using XML inappropriately is likely to result in headaches and blindness ... hmmm, not going there..
-- Kurt
Tim Bray posts an article today about the authority of WikiPedia references. Some of us have been worrying for a long time about search system ranking. At the same time there are discussions of using published ideas for patent validation. Early hypertext pioneers worried about citations and their effect on the authority of crediting ideas. Xanadu and other attempts become as complex as they do in part trying to satisfy this requirement.
Simple searches reveal timeliness if not quotation lineage. Berners-Lee talked about independent invention as being a primary quality of the worthiness of the idea. Copyright is not about the ability of the government to defend the right, but is about using the government as a witness. The holder has to defend the right.
The Kudzu is both exemplar and metaphor for the tangled weave of ideas illustrating the single problem of 'who says it first'. In a world of assertion of reputation and reputation management, we either give up the right or we begin to develop even more complex technologies to defend it.
Who invented XML? No one? Every one? I can point to the citations but are they right or just the artifact of who was the most popular person at the time?
This is tangential to what your article is about, yet your article uses the idea and while we share, whose is it? No one's really, but an interesting example of an elephant in the room of web publishing because as I point out repeatedly here and elsewhere, the Long Tail is inverting and that makes me wonder about the future of the ad-sponsored sites and search engines. The need to corral and control the sources of resources continues to grow, but the very act of doing it causes those sources to fade in value as long as they use the very system they are supporting.
Like the Kudzu, it kills the soil it was supposed to protect. That makes for a neatly recursive idea. Now which one of us owns it? ;-)
Birmingham eh? How are your lungs? Still red dust occluded?
Oops. Montgomery. Wow. Now that is occlusion at its worst.
All you suffered was severe boredom. Montgomery is a company town if ever there was one.
Thanks Kurt. I've added a pointer to this piece at the end of part 1 of that article.
Ha Kurt, I felt the same when I first tried eXist a year or so ago. So far I haven't found the time to include it in my projects, but I think I will try to set it up as a backend for my Python implementation of the Atom Pub. Protocol.
Constraints create structure. Chaos is the engine of evolution.
Targeted selection?
Kudzu spreads the same way pines do: they eliminate opportunities for competitors to thrive. XML is not actually kudzu except at the level of the syntax. HTML is kudzu at the level of weak semantics but all naming systems have some degree of that quality. What you want to know is where a technology falls on the scale from the pine (toxic to competitors because of what needles do to soil) to kudzu, which is not toxic to the soil but proliferates so fast that a competitor only survives proportionally to its strength (think kudzu covered trees vs grass).
Once again, it comes down to the force vector qualities of the object in the environment over the ease with which it can proliferate (eg, HTML isn't the winner because it is the best but because it is easy to remember and therefore author for a tipping point percentage of applications). Then it comes down to awareness or buzz and that is why I come back to the PageRank over Wikipedia effect: sustainability over authority.
If I were Tim I wouldn't worry too much. If over time Wikipedia proves to be poisoned soil, the only thing that will grow there is weeds. It takes time but feedback-based systems either equalize at some optima or they collapse. See PID controllers.