XML has changed the Web dramatically over the last decade, though not at all as originally planned. XQuery, though, is gathering steam to drive a new round of potentially invigorating changes, even as Ajax heads down the JSON path.
For the last three days I’ve sat through the XML on the Web track at XML 2006. I’d chaired the track, laying out a lot of the sessions, so it wasn’t likely to be surprising - but it still was surprising.
The conference was celebrating ten years since the initial announcement of XML, its first presentation to the world at SGML ’96 here in Boston. Jon Bosak told the story of that presentation, and the concerns the group had had in making it, at the close of the conference.
I arrived in XML about a year after that successful announcement, trying to figure out how to explain XML to people who weren’t SGML experts. I was a web developer, not an SGML guru, but XML seemed to me to offer unlimited promise for improving the Web. The pieces weren’t all clear, but clean structures and better labeling sounded to me like an excellent recipe for sharing and storing information.
Unfortunately, it hasn’t worked out quite as planned. XML was originally “SGML on the Web”, planned as a drastic change. Instead of tag soup HTML and just-implemented CSS, XML would provide a solid base format for information, XLink would offer much improved hypertext capabilities, and XSL would render all that information into something much prettier and more controllable.
XML pretty completely missed its original target market. SGML culture and web developer culture seemed like a poor fit on many levels, and I can’t say I even remember a concerted effort to explain what XML might mean to web developers, or to ask them whether this new vision of the Web had much relationship to what they were doing or what they had planned. SGML/XML culture and web culture never really meshed.
Instead, XML took over much of SGML’s role in the publishing world, offered itself as a common format for data interchange, and became the foundation for a whole set of projects mistakenly called ‘web services’. That work on web services has proven to be about as successful on the Web as the original “SGML for the Web” vision.
Once again, no one paused to ask the web folk what they might actually want, and programmers’ visions yielded a replacement for CORBA and mainframe messaging rather than lightweight tools that web developers might find useful. No matter, though - those web developers eventually figured out that HTTP could carry bits of XML well enough to solve many of their problems, without requiring them to dive deeply into either XML or protocol design.
XML’s use in simple HTTP transactions combined with Dynamic HTML to yield Ajax, a technology that has caught fire since it was given a name. As it turns out, however, while XML helped that technology emerge, its developers aren’t particularly grateful, or particularly attached to XML. JSON, a much more compact notation based on JavaScript, turns out to be much more efficient for passing data among computers, so long as that data isn’t complex enough to require things like mixed content. JSON seems likely to keep XML’s future in web clients small, but XML did help create an environment where JSON can thrive.
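To make the compactness point concrete, here is a minimal sketch in Python that serializes the same small record both ways. The record, its field names, and the `conference` root element are invented for illustration, not taken from any talk at the conference. The last few lines show the kind of mixed content (text interleaved with child elements) that XML handles naturally and that JSON has no direct notation for.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical record; the field names are made up for this example.
record = {"event": "XML 2006", "city": "Boston", "days": 3}

def to_json(rec):
    """Serialize with no extra whitespace, as a script would send it over HTTP."""
    return json.dumps(rec, separators=(",", ":"))

def to_xml(rec, root_name="conference"):
    """Serialize the same record as one child element per field."""
    root = ET.Element(root_name)
    for key, value in rec.items():
        ET.SubElement(root, key).text = str(value)
    return ET.tostring(root, encoding="unicode")

json_text = to_json(record)
xml_text = to_xml(record)
print(json_text)
print(xml_text)
print(len(json_text), "characters of JSON vs", len(xml_text), "of XML")

# Mixed content: text and child elements interleaved inside one parent.
mixed = ET.fromstring("<p>XQuery is <em>read-only</em> for now.</p>")
pieces = [mixed.text, mixed[0].text, mixed[0].tail]
print(pieces)
```

For a flat record like this one, the JSON form is shorter because it never repeats field names as closing tags. The paragraph at the end is where a plain key/value notation starts to struggle.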
Douglas Crockford’s talk on JSON irked some attendees who’d like XML to be the only answer for data interchange, but demonstrated once again that it isn’t XML that excites people: it’s exchanging data. JSON seems well-positioned (with support in probably as many environments as XML) to take over a huge subset of transactions that could have been XML but don’t actually need to be XML. I wish we’d realized back in 1999 that JSON could carry that load - it might have kept some of the worst disasters of the “XML is for data! No, it’s for documents!” conflict from doing the huge damage they did, especially in the enormous, brittle, and broken beast that is W3C XML Schema.
Another appealing, if dangerous, feature of JSON is its ability to let developers break out of the browser sandbox. As Noah Mendelsohn of IBM was wondering at the closing dinner, the security geniuses who designed browser scripting perversely decided to leave the door open for scripts to come from anywhere, but those scripts could only call back to the source of the page for the (hopefully safer) data. Since JSON bridges the script/data border, it offers much more flexibility.
It’s a dangerous field to enter, but it seems like it may be time that browser security models move beyond the sandbox. Ajax is pushing the boundaries further and further, banging the sides of the sandbox harder and harder as users ask for more features, and developers strain to provide them. Fabrice Desre, the first presenter in the track, showed off an approach using XUL and XPCOM that let him put an XQuery-based XML database into the browser itself for local storage, using it as a cache in ways that just can’t happen in most browser applications today. There’s huge potential in that simple innovation - but today, you have to write using XUL to make it work in a single browser, not just Ajax across multiple browsers.
There were echoes of the HTML vs. XHTML, XForms vs. Ajax, SVG vs. Flash, and other Web-related battles, but for the most part, those battles don’t seem to matter much to developers. Vendors are arguing over their details, and there was a lot of griping about software vendors, but developers seem to be figuring out all kinds of workarounds, even if they aren’t pretty.
Different technologies are finding homes in a variety of niches, mixing and matching with sometimes perverse degrees of XML-ness. One of the more intriguing technologies was GRDDL, which hopes to use a bit of XSLT to extract RDF from HTML, especially microformatted HTML. Not a lot of people outside the core Semantic Web community actually want to create RDF, but extracting it from what’s already there can be useful for a wide variety of projects. (RSS and Atom are first and relatively easy steps in that direction.)
The one XML technology that seems likely to change the Web is XQuery. I won’t claim that it’s because of any intrinsic elegance - it’s pretty ungainly, and even read-only in the spec’s current state. Nonetheless, XQuery was unquestionably the hit technology of XML 2006, after years of growing impatience about whether it would ever actually reach completion.
Why are people excited about XQuery? A lot of them (like O’Reilly) have large piles of XML that they created and would now really like to search, manipulate, and repurpose. Publishers, at least those who’ve been moving to XML, are starting to realize that there’s a huge business bonanza awaiting them as soon as they can figure out how to sell the results of their queries. Presenters were able to show real projects with substantial results, both technical and financial. (You can get a small sense of what XQuery can do with existing data over at labs.oreilly.com, which is built by querying XML versions of most of O’Reilly’s books.)
It’s not just publishers, though - XQuery offers developers a chance to escape from relational databases and the mashes of tables and BLOBs that have been ugly to manage and unpleasant to use. We’re coming to a point where it will be fair to say that data which fits in tables can go in tables, and data which doesn’t fit easily in tables can be stored and queried as XML. XQuery can deal with both the relational data and the hierarchical XML. (The missing piece there is graphs, though the RDF folks are working on that as well.)
XQuery also promises to flatten the tasks of collecting data and arranging it for presentation, because it’s a transformation language (with the same capabilities as XSLT) as well as a query language. Instead of having to write SQL to get your data and code in another language that formats it, you can perform both steps in XQuery, if you want.
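This is not XQuery, but a rough Python analogy may help show what "both steps in one place" means: a single function selects matching elements from an XML document and builds the presentation markup in the same pass. The catalog, its element names, and the `titles_as_html` function are all hypothetical, and ElementTree's limited XPath support stands in for a real query engine.

```python
import xml.etree.ElementTree as ET

# Hypothetical catalog; element names and titles are made up for illustration.
CATALOG = """
<books>
  <book year="2005"><title>Alpha</title></book>
  <book year="2006"><title>Beta</title></book>
  <book year="2006"><title>Gamma</title></book>
</books>
"""

def titles_as_html(xml_text, year):
    """Select the books from one year and build an HTML list in the same pass."""
    books = ET.fromstring(xml_text)
    ul = ET.Element("ul")
    # The path expression does the querying ...
    for book in books.findall(f"book[@year='{year}']"):
        # ... and the loop body does the formatting.
        ET.SubElement(ul, "li").text = book.findtext("title")
    return ET.tostring(ul, encoding="unicode")

html = titles_as_html(CATALOG, "2006")
print(html)
```

An XQuery expression would say the same thing more directly, with the selection and the constructed output elements written together in one query.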
Apart from Desre’s example of using XQuery within a browser, it seems like XQuery will be staying mostly on the server side, supplementing and sometimes replacing scripting languages. I like the idea of a little XML database inside of a web browser for local storage, but it’s overkill for a lot of cases and not likely to happen soon, especially not in a way that’s compatible across browsers. (Though maybe it could be done through Flash somehow…)
It seems like the Web is reaching the point where there are simple tools available for simple tasks, and powerful tools available for complicated tasks. XML’s arrival after the main burst in browser development in the late 1990s meant that its support has come slowly, when its support has come at all. The meandering progress - when it’s been progress - of standards bodies hasn’t helped either.
Web developers today work in an environment strewn with the rubble of past efforts to improve the Web. Fortunately, a lot of that rubble is useful, once you figure out how to assemble it. Developers, at this show and across the world, are showing more and more creativity in using what we have, and pushing the boundaries of what users expect further and further. The web is an exciting place to build again, as old pieces come together in new projects.
Will XQuery open up vast new horizons? Time will tell. I’m feeling optimistic, just this once.


Hey Simon,
Nice write up! There's only one thing that I disagree with, though only partially,
> XQuery also promises to flatten the tasks of collecting data and arranging it for presentation, because it's a transformation language (with the same capabilities as XSLT) as well as a query language. Instead of having to write SQL to get your data and code in another language that formats it, you can perform both steps in XQuery, if you want.
Unfortunately the transformation portion of XQuery doesn't work from a template perspective. WAY TOO MUCH code rewrite, and/or awkward imports are required to accomplish the reusable aspect that XSLT offers as its de facto foundation.
The solution?
Use XQuery to gain pinpoint precision access to the data you want, outputting the result in a "Lingua Franca" XML format of your personal choice. I personally think Atom provides a nice envelope format for both the over-the-wire and storage piece of the puzzle, but there are certainly others. In regards to the XML format of your choice, obviously this will be determined by the output needs -- If your documents are all web-based, then XHTML makes a lot of sense. If they're more document-oriented, then ODF or (possibly) OpenXML seem to be good candidates, ODF being the simpler of the two to grok.
With these in place, then converting from one format to another becomes nothing more than using reusable transformation files to convert from the storage format to the desired viewing format.
XSLT 2.0 offers some nice benefits in regards to managing the output into multiple document formats, as well as simplifying various pieces of XSLT (like grouping) that are EXTREMELY complex using XSLT 1.0. But EXSLT offers a lot of the same benefits, excluding multiple outputs as part of the same transformation process (though it wouldn't surprise me to see that change in EXSLT 2.0, if that ever becomes a reality, and I think it will). EXSLT also has quite a few more implementations out there than XSLT 2.0 does. Then again, with Saxon/Saxon on .NET you gain the benefit of both XQuery and XSLT 2.0 as part of the same engine, so I'm not sure implementation count can be viewed as any sort of dividing line between whether XSLT 2.0 or EXSLT is the more practical of the potential transformation solutions.
Of course, with all of this, this is only my own opinion. There are certainly other opinions, each of which has just as much validity, so for what it's worth, there ya have it.
Thanks for the write up, Simon! Tons of great info in here to reference/learn from!
One thing I should point out: I started writing this as a direct response, and then immediately turned it into a generalized response for follow-up readers. I probably came across as if I thought I was telling you (Simon) things you were obviously more than fully aware of. My apologies if it came across this way!
Simon optimistic about XML? Check the weather report in Hades! Watch out for flying pigs!
But seriously ... there seem to be two somewhat contradictory points here -- the rise of JSON and the maturation of XQuery. Doesn't the former undermine the importance of the latter? How can XQuery change the web if the Web consists of tag soup for humans and JSON for machines? Or will the availability of XQuery breathe new life into the "SGML for the Web" vision and we'll see a lot of XML that's worth querying and mashing up with XQuery?
I guess I foresee a world where XML and XQuery stand beside RDBMS and SQL behind the firewall, but it is still mostly HTML, JavaScript, and JSON exchanged over the public web. Despite Roger Bamford's prediction at XML 2006 http://2006.xmlconference.org/programme/presentations/163.html I don't think "XML + XQuery(P) + Apache + REST == no middle tier". Instead, we'll still have that middle tier speaking XQuery and SQL to the back end and HTML+JSON (and some XML like RSS/Atom) to the web clients. Sane people won't expose their XQuery stores over the web any more than they would expose their SQL stores over the Web.
Mike has it right. Simple things tend to become complex when one of the fundamentals has to be optimized. I blogged that topic comparing the evolution of internal combustion engines and the demise of the shade tree mechanic along the way. The surprise for some will be the demise of the 'it must be simple' web developers for whom so much has been sacrificed.
For a decade, we spent our time trying to get SGML accepted by programmers. For our sins, we succeeded, perhaps, and the bewildering families of reams of specifications were piled on top of that thin paper that was tossed onto the floor in '96. The secret was that nothing of much importance can be done with something that simple unless the other piles are created as well, or some practical applications are sacrificed.
At this point, I think it's easy to see that the struggle has shifted from web developers vs markup technologies to web developers vs programmers, and that is a sea change most have yet to understand or talk about. When the internal combustion engine began to be optimized for weight and fuel consumption, none of the principles (fuel + fire + air in the right combination = power) really changed except 'the right combination' and 'the complexity of the engine needed to make it work'. But the knowledge and skills required, the tools required, and the learning curve changed incredibly, and the shade tree mechanic became an evolutionary dead end.
I think the big change coming is XAML, not XQuery. What is in play now is where Joe WebPage goes, what tools/objects are required, and if the web as the haven of 'web developers over professional programmers' is a vanishing niche along with the XML conferences. Note the SGML conferences experienced a similar decline. If it follows the same path, it breaks up into conferences for each middle tier object required for each class of application language. XML knowledge is assumed the same way plumbers assume one can solder; just another tool and technique.
David: Your suggestion (quite a good one) of using XQuery to assemble data and handing off that data for presentation transformation with XSLT was one of the lessons highlighted by Michael Kay's presentation on Meta-Stylesheets, which I described here.
@Keith,
Thanks! (left additional comment at post)