I was impressed with the size of the <XML2006> conference (a name that must have been chosen by a designer, not by anyone who has to write Web pages), which took place in Boston this week. Rooms were overflowing for many topics, indicating that several large industries and government bodies have settled on XML for document storage and automated text manipulation. Microsoft Office still seems the editor of choice for most people writing and editing documents. The poor XML output of Office is a major irritant.

Self-processing data

No one is more interested in XML processing than the XML specifications committees. Over the years they’ve entered ever narrower corridors in the specification maze, forging specifications that determine not only what XML stores, but how the stored data is processed. The most salient example is XSL, but I found two new ones at the conference.

I’ve never quite seen the need for XML-driven processing. Programmers have been quick to use their favorite procedural languages to parse and manipulate XML. When the processing moves to XSL, procedures are replaced by declarations, and expressing their needs as declarations is a major shift in thinking for programmers. The XSL goal of self-processing data (data that tells you how it’s meant to be processed) is a complex one to achieve.
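For comparison, here is a minimal sketch of the procedural style described above, using Python’s standard xml.etree.ElementTree module. The element names (talks, talk, title) and the function name are invented for illustration and don’t come from any conference material. In XSL, roughly the same job would be stated as a template rule that matches each talk, rather than as a loop.

```python
import xml.etree.ElementTree as ET

def titles_to_html(xml_text):
    """Walk the document step by step and build an HTML list of talk titles."""
    root = ET.fromstring(xml_text)
    ul = ET.Element("ul")
    # Procedural style: the program decides the order of traversal.
    for talk in root.iter("talk"):
        title = talk.findtext("title")
        if title:  # skip talks that have no title
            li = ET.SubElement(ul, "li")
            li.text = title
    return ET.tostring(ul, encoding="unicode")
```

The loop, the test, and the order of operations are all spelled out by the programmer, which is exactly what a declarative stylesheet leaves to the processor.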

The new efforts I heard of along these lines at <XML2006> were Remote Events for XML (REX) and the XML Processing Model (XProc). Like XSL, they take on tasks that have been done by languages outside XML for years, and put them between angle brackets.

REX deals with the activities that current Web pages perform through onclick and onsubmit functions, along with other activities that (from the point of view of the programmer) look like events. If this effort at standardization works, the JavaScript functions that respond to such events will be replaced by REX XML documents.

XProc is meant to replace build procedures, which are currently done through techniques ranging in sophistication from manual activities and Unix shell scripts to Ant or commercial business processing tools. XProc looks more procedural than declarative. One defines the types of processing (such as the use of XSLT) and then strings them together in an XML document.
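The idea of defining steps and stringing them together can be sketched in a few lines of Python. This is not XProc syntax and makes no claim about how the specification works internally; the step names, the sample document, and the run_pipeline helper are all hypothetical, chosen only to show the shape of a pipeline.

```python
import xml.etree.ElementTree as ET

def drop_drafts(root):
    """Step 1 (hypothetical): remove <section> children marked status="draft"."""
    for section in list(root.findall("section")):
        if section.get("status") == "draft":
            root.remove(section)
    return root

def number_sections(root):
    """Step 2 (hypothetical): number the remaining sections starting at 1."""
    for i, section in enumerate(root.findall("section"), start=1):
        section.set("n", str(i))
    return root

def run_pipeline(root, steps):
    """Feed the output of each step into the next, in order."""
    for step in steps:
        root = step(root)
    return root

doc = ET.fromstring('<doc><section status="draft"/><section/><section/></doc>')
result = run_pipeline(doc, [drop_drafts, number_sections])
```

In XProc the list of steps would itself be an XML document instead of a Python list, which is what makes the pipeline portable between processors, at least in principle.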

Norm Walsh, leader of the working group, was questioned heavily at his XProc presentation on why they want to duplicate so many other tools. As with other such specifications, the hope is to achieve more portability and predictability.

Skeptics about these new bricks in the XML edifice would do well to quote Douglas Crockford, the JavaScript expert who standardized JSON. In one of the deepest statements I heard at the conference, Doug said, “Being well-formed and valid is not the same as being correct and relevant.” In context, this meant that the automated checks performed on XML documents do not cover what programmers really need to worry about. Doug followed up by saying that ultimately, every application is responsible for validating its own data.
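A small sketch shows what that responsibility looks like in practice. The order format, the quantity field, and the parse_order function are made up for this example. The parser only confirms that the text is well-formed JSON; whether the values make sense is a separate check that the application has to write for itself.

```python
import json

def parse_order(text):
    """Parse a JSON order, then apply checks only the application can know."""
    order = json.loads(text)  # raises ValueError if the text is not well-formed JSON
    if not isinstance(order, dict):
        raise ValueError("order must be a JSON object")
    quantity = order.get("quantity")
    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(quantity, bool) or not isinstance(quantity, int) or quantity <= 0:
        raise ValueError("quantity must be a positive integer")
    return order
```

An input such as {"quantity": -3} passes the parser without complaint and fails only the application’s own check.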

JSON

We are now no longer talking about XML in this blog; we have switched to the entirely different subject of JSON. JSON is a structured plain-text format that has no pretensions to being anything but a structured plain-text format; one couldn’t imagine a three-day conference about JSON.

Since JSON is in widespread use, stable (Doug hasn’t even assigned a version number to the standard, because he doesn’t expect it to change), and thriving, the most urgent issue for Doug is getting his JSONRequest protocol implemented by browsers. He pointed out that mash-ups, because by definition they bypass the normal “Same Origin Policy” for web pages, are currently using insecure data transfer methods that leave browsers open to activities by untrusted sites. A JSONRequest would limit the data sent and received, notably by omitting client cookies.

It was gratifying to see that the JSON talk was one of those bringing an overflow crowd, who definitely had challenging questions but seemed highly interested.