XML has changed the Web dramatically over the last decade, though not at all as originally planned. XQuery, though, is gathering steam to drive a new round of potentially invigorating changes, even as Ajax heads down the JSON path.

For the last three days I’ve sat through the XML on the Web track at XML 2006. I’d chaired the track, laying out a lot of the sessions, so it wasn’t likely to be surprising - but it still was surprising.

The conference was celebrating ten years since the initial announcement of XML, its first presentation to the world at SGML ’96 here in Boston. Jon Bosak told the story of that presentation, and the concerns the group had had in making it, at the close of the conference.

I arrived in XML about a year after that successful announcement, trying to figure out how to explain XML to people who weren’t SGML experts. I was a web developer, not an SGML guru, but XML seemed to me to offer unlimited promise for improving the Web. The pieces weren’t all clear, but clean structures and better labeling sounded to me like an excellent recipe for sharing and storing information.

Unfortunately, it hasn’t worked out quite as planned. XML was originally “SGML on the Web”, planned as a drastic change. Instead of tag soup HTML and just-implemented CSS, XML would provide a solid base format for information, XLink would offer much improved hypertext capabilities, and XSL would render all that information into something much prettier and more controllable.

XML pretty completely missed its original target market. SGML culture and web developer culture seemed like a poor fit on many levels, and I can’t say I even remember a concerted effort to explain what XML might mean to web developers, or to ask them whether this new vision of the Web had much relationship to what they were doing or what they had planned. SGML/XML culture and web culture never really meshed.

Instead, XML took over much of SGML’s role in the publishing world, offered itself as a common format for data interchange, and became the foundation for a whole set of projects mistakenly called ‘web services’. That work on web services has proven to be about as successful on the Web as the original “SGML for the Web” vision.

Once again, no one paused to ask the web folk what they might actually want, and programmers’ visions yielded a replacement for CORBA and mainframe messaging rather than lightweight tools that web developers might find useful. No matter, though - those web developers eventually figured out that HTTP could carry bits of XML well enough to solve many of their problems, without requiring them to dive deeply into either XML or protocol design.

XML’s use in simple HTTP transactions combined with Dynamic HTML to yield Ajax, a technology that has caught fire since it was given a name. As it turns out, however, while XML helped that technology emerge, its developers aren’t particularly grateful, or particularly attached to XML. JSON, a much more compact notation based on JavaScript, turns out to be much more efficient for passing data among computers, so long as that data isn’t complex enough to require things like mixed content. JSON seems likely to keep XML’s future in web clients small, but XML did help create an environment where JSON can thrive.
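To make the size difference and the mixed-content caveat concrete, here is a minimal Python sketch. The record, its field names, and the element names are all invented for illustration; it compares only the length of the serialized text, not parsing speed or anything else, so treat it as a rough picture rather than a benchmark.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical record, invented for illustration only.
book = {"title": "Example Book", "year": 2006, "tags": ["xml", "markup"]}


def to_json(record):
    """Serialize the record as compact JSON (no extra whitespace)."""
    return json.dumps(record, separators=(",", ":"))


def to_xml(record):
    """Serialize the same record as element-only XML."""
    root = ET.Element("book")
    ET.SubElement(root, "title").text = record["title"]
    ET.SubElement(root, "year").text = str(record["year"])
    tags = ET.SubElement(root, "tags")
    for tag in record["tags"]:
        ET.SubElement(tags, "tag").text = tag
    return ET.tostring(root, encoding="unicode")


json_text = to_json(book)
xml_text = to_xml(book)
print(len(json_text), "characters of JSON")
print(len(xml_text), "characters of XML")

# Mixed content: text interleaved with child elements. XML models this
# directly; JSON has no built-in equivalent.
mixed = ET.fromstring("<p>XML is <em>not</em> going away.</p>")
mixed_text = "".join(mixed.itertext())
print(mixed_text)
```

For plain records like this one the JSON form comes out shorter because it has no closing tags. The `<p>` example is the kind of document-style data where XML still earns its keep.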

Douglas Crockford’s talk on JSON irked some attendees who’d like XML to be the only answer for data interchange, but demonstrated once again that it isn’t XML that excites people: it’s exchanging data. JSON seems well-positioned (with support in probably as many environments as XML) to take over a huge subset of transactions that could have been XML but don’t actually need to be XML. I wish we’d realized back in 1999 that JSON could carry that load - it might have kept some of the worst disasters of the “XML is for data! No, it’s for documents!” conflict from doing the huge damage they did, especially in the enormous, brittle, and broken beast that is W3C XML Schema.

Another appealing, if dangerous, feature of JSON is its ability to let developers break out of the browser sandbox. As Noah Mendelsohn of IBM was wondering at the closing dinner, the security geniuses who designed browser scripting perversely decided to leave the door open for scripts to come from anywhere, but those scripts could only call back to the source of the page for the (hopefully safer) data. Since JSON bridges the script/data border, it offers much more flexibility.

It’s a dangerous field to enter, but it seems like it may be time that browser security models move beyond the sandbox. Ajax is pushing the boundaries further and further, banging the sides of the sandbox harder and harder as users ask for more features, and developers strain to provide them. Fabrice Desre, the first presenter in the track, showed off an approach using XUL and XPCOM that let him put an XQuery-based XML database into the browser itself for local storage, using it as a cache in ways that just can’t happen in most browser applications today. There’s huge potential in that simple innovation - but today, you have to write using XUL to make it work in a single browser, not just Ajax across multiple browsers.

There were echoes of the HTML vs. XHTML, XForms vs. Ajax, SVG vs. Flash, and other Web-related battles, but for the most part, those battles don’t seem to matter much to developers. Vendors are arguing over their details, and there was a lot of griping about software vendors, but developers seem to be figuring out all kinds of workarounds, even if they aren’t pretty.

Different technologies are finding homes in a variety of niches, mixing and matching with sometimes perverse degrees of XML-ness. One of the more intriguing technologies was GRDDL, which hopes to use a bit of XSLT to extract RDF from HTML, especially microformatted HTML. Not a lot of people outside the core Semantic Web community actually want to create RDF, but extracting it from what’s already there can be useful for a wide variety of projects. (RSS and Atom are first and relatively easy steps in that direction.)

The one XML technology that seems likely to change the Web is XQuery. I won’t claim that it’s because of any intrinsic elegance - it’s pretty ungainly, and even read-only in the spec’s current state. Nonetheless, XQuery was unquestionably the hit technology of XML 2006, after years of growing impatience about whether it would ever actually reach completion.

Why are people excited about XQuery? A lot of them (like O’Reilly) have large piles of XML that they created and would now really like to search, manipulate, and repurpose. Publishers, at least those who’ve been moving to XML, are starting to realize that there’s a huge business bonanza awaiting them as soon as they can figure out how to sell the results of their queries. Presenters were able to point to real projects with substantial results, both technical and financial. (You can get a small sense of what XQuery can do with existing data over at labs.oreilly.com, which is built by querying XML versions of most of O’Reilly’s books.)

It’s not just publishers, though - XQuery offers developers a chance to escape from relational databases and the mashes of tables and BLOBs that have been ugly to manage and unpleasant to use. We’re coming to a point where it will be fair to say that data which fits in tables can go in tables, and data which doesn’t fit easily in tables can be stored and queried as XML. XQuery can deal with both the relational data and the hierarchical XML. (The missing piece there is graphs, though the RDF folks are working on that as well.)

XQuery also promises to flatten the tasks of collecting data and arranging it for presentation, because it’s a transformation language (with the same capabilities as XSLT) as well as a query language. Instead of having to write SQL to get your data and code in another language that formats it, you can perform both steps in XQuery, if you want.
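The idea of querying and formatting in one step can be sketched in Python. This is an analogy using the standard library’s ElementTree, not XQuery itself; the catalog, element names, and the `recent_titles_as_html` function are hypothetical, and the XQuery expression in the comment is only a rough guess at the equivalent.

```python
import xml.etree.ElementTree as ET

# Hypothetical catalog, invented for illustration only.
CATALOG = """
<catalog>
  <book><title>Beta Guide</title><year>2006</year></book>
  <book><title>Old Manual</title><year>1999</year></book>
  <book><title>Alpha Primer</title><year>2005</year></book>
</catalog>
"""

# Roughly the single XQuery expression this imitates (an assumption,
# not taken from any real project):
#   <ul>{
#     for $b in doc("books.xml")//book
#     where $b/year > 2004
#     order by $b/title
#     return <li>{ data($b/title) }</li>
#   }</ul>


def recent_titles_as_html(xml_text, after_year):
    """Select books newer than after_year and return them as an HTML list."""
    catalog = ET.fromstring(xml_text)
    # Query step: filter by year, then sort by title.
    recent = sorted(
        (b for b in catalog.iter("book") if int(b.findtext("year")) > after_year),
        key=lambda b: b.findtext("title"),
    )
    # Transform step: build the output markup from the selected nodes.
    ul = ET.Element("ul")
    for b in recent:
        ET.SubElement(ul, "li").text = b.findtext("title")
    return ET.tostring(ul, encoding="unicode")


html = recent_titles_as_html(CATALOG, 2004)
print(html)  # <ul><li>Alpha Primer</li><li>Beta Guide</li></ul>
```

In Python the query and the transform are still two visibly separate steps inside one function. The appeal of XQuery is that both collapse into a single expression that returns markup directly.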

Apart from Desre’s example of using XQuery within a browser, it seems like XQuery will be staying mostly on the server side, supplementing and sometimes replacing scripting languages. I like the idea of a little XML database inside of a web browser for local storage, but it’s overkill for a lot of cases and not likely to happen soon, especially not in a way that’s compatible across browsers. (Though maybe it could be done through Flash somehow…)

It seems like the Web is reaching the point where there are simple tools available for simple tasks, and powerful tools available for complicated tasks. XML’s arrival after the main burst in browser development in the late 1990s meant that its support has come slowly, when its support has come at all. The meandering progress - when it’s been progress - of standards bodies hasn’t helped either.

Web developers today work in an environment strewn with the rubble of past efforts to improve the Web. Fortunately, a lot of that rubble is useful, once you figure out how to assemble it. Developers, at this show and across the world, are showing more and more creativity in using what we have, and pushing the boundaries of what users expect further and further. The Web is an exciting place to build again, as old pieces come together in new projects.

Will XQuery open up vast new horizons? Time will tell. I’m feeling optimistic, just this once.