A couple of weeks ago, the W3C made an announcement that caught a great number of people by surprise. After nearly a decade of inactivity, the HTML working group was being restarted, in order to handle the fairly significant amount of development that has occurred on top of the HTML standard since HTML 4.01 became the last formal HTML standard prior to the introduction of XHTML.

I have to admit to some qualms about seeing this. I’m not doubting that it is needed - XHTML’s adoption has been comparatively slow because of the legacy base of HTML out there, the introduction of AJAX has shifted the balance of power to imperative scripting, and there is a growing realization that the namespace issues dividing HTML and XHTML are beginning to tear the standards apart.

The question, however, comes back to the role that XML ends up playing in all of this. HTML has its own DOM and can in fact be treated as quasi-XML, but in most cases it demonstrably isn’t well-formed XML. For those of us who have been pushing XHTML adoption in industry, this is going to be seen as a fairly major step backwards, as it has the potential to make browser developers decide that perhaps incorporating XHTML support isn’t that big of an issue, and can be pushed off for a release or ten.

However, there is a silver lining in all of this, and that comes from statements that Tim Berners-Lee made when he announced the new working groups. The goal here is the long term harmonization of HTML and XHTML. This will likely occur first at the molecular level - rewriting the HTML DTDs to remove the inconsistencies between what is considered valid HTML and valid XHTML. To explain that in a little more detail - there are a number of elements in the HTML 4.01 DTD that prevent the expression of that language as XHTML: singleton attributes with no corresponding value, tags that have no explicit closure, the lack of support for namespaces, the distinction between the formal upper case notation of HTML elements and the lower case preferred notation of XHTML, and so on. It is likely that an HTML 5.0 DTD would in fact allow for a valid XHTML1 or XHTML2 schema as one potential use case. It still becomes the responsibility of browser creators to update their DTDs and conformance engines, but this should in general not be as difficult a proposition as it may sound.
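To make those inconsistencies concrete, here is a minimal sketch in Python that feeds two versions of the same fragment to a strict XML parser. The fragments and the helper name `is_well_formed` are my own illustrations, not anything taken from a W3C document; the first uses upper case tags, a valueless attribute and unclosed tags, and the second is the XHTML-style equivalent.

```python
import xml.etree.ElementTree as ET

# HTML 4 style: upper-case tags, a singleton attribute (CHECKED) with no
# value, and a <BR> and <INPUT> that are never explicitly closed.
html_fragment = '<P>Pick one:<BR><INPUT TYPE="checkbox" CHECKED></P>'

# XHTML style: lower-case tags, every attribute has a value, and empty
# elements are closed with "/>".
xhtml_fragment = '<p>Pick one:<br/><input type="checkbox" checked="checked"/></p>'


def is_well_formed(markup):
    """Return True if a strict XML parser accepts the markup (hypothetical helper)."""
    try:
        ET.fromstring(markup)
    except ET.ParseError:
        return False
    return True


for name, fragment in [("HTML 4", html_fragment), ("XHTML", xhtml_fragment)]:
    print(name, "well-formed XML:", is_well_formed(fragment))
```

Note that this only checks well-formedness, not validity against a DTD; it is meant to show why an XML toolchain cannot consume typical HTML 4 markup as it stands.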

Namespaces are perhaps the next major thorn. Namespaces should not necessarily be unique to XML - there’s nothing in fact in the original 1.0 specification that prevents them from being applied to HTML5, but obviously HTML 4.01 as it stands does not have any notion of them. I suspect that such namespace issues will likely end up becoming a much more complicated nut to crack, however, because one of the key arguments against the adoption of namespaces has been the complexity (sic) that namespaces introduce into XHTML documents. Again, I would suspect that what may end up emerging out of this is some kind of formal detente in which it becomes possible to create umbrella namespaces that can in fact act as proxies to subordinate namespaces, where collisions would be resolved by antecedents in the HTML/XHTML structure (or perhaps by formal declaration of namespaces JUST for those elements), but I worry that in the end the solution may end up becoming more of a problem than the problem.
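For readers who have not run into the collision problem directly, the sketch below shows how things work today on the XML side, where an XHTML `input` and an XForms `input` can sit in the same document because each is qualified by its namespace. The sample document is invented for illustration; the two namespace URIs are the standard XHTML and XForms ones.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"
XFORMS_NS = "http://www.w3.org/2002/xforms"

# Illustrative document: the default namespace is XHTML, and the "xf"
# prefix is declared for XForms on the root element.
doc = f'''<html xmlns="{XHTML_NS}" xmlns:xf="{XFORMS_NS}">
  <body>
    <input name="plain"/>
    <xf:input ref="foo"/>
  </body>
</html>'''

root = ET.fromstring(doc)

# ElementTree expands every tag to "{namespace-uri}localname", so the two
# elements that share the local name "input" remain distinguishable.
tags = [el.tag for el in root.iter() if el.tag.endswith("input")]
for tag in tags:
    print(tag)
```

The cost the critics point to is visible here as well: every lookup has to carry the namespace URI around, which is exactly the complexity an HTML5 solution would need to hide.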

Namespaces in turn play fairly heavily into three distinct areas: XForms, SVG, and XBL2. XForms may perhaps be the biggest winner in all of this. If you assume that XForms elements such as <xf:model> and <xf:instance> end up becoming incorporated into HTML5 as <model> and <instance> respectively, then you have the ability to have multiple data models sitting within a given HTML page that can be referenced via an input element (i.e., the HTML5 <input> element recognizes <input ref="foo"> and <input bind="foo"> as syntactically valid). This has already been happening with a number of XForms implementations (such as Orbeon, I believe) that can work within an HTML context; this becomes just a matter of making such changes canonical.

Admittedly, if you have both bound and unbound <input> elements, I see the potential for coding disasters to increase somewhat, but probably not dramatically, and it has the potential of making the XForms data model available as part of the HTML DOM (and hence consistently manipulatable from JavaScript).
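As a rough sketch of what such binding might look like in practice, the Python below resolves `ref` attributes on `<input>` elements against an instance embedded in the page. Everything here is hypothetical: the un-namespaced `<model>` and `<instance>` elements follow the assumption made above, and the page markup, the `<data>` wrapper and the `bound_values` function are invented for illustration, with only simple child-name references supported rather than full XPath.

```python
import xml.etree.ElementTree as ET

# Hypothetical HTML5-style page with an embedded XForms-like model.
page = '''<html>
  <head>
    <model id="order">
      <instance>
        <data><foo>42</foo><bar>hello</bar></data>
      </instance>
    </model>
  </head>
  <body>
    <input ref="foo"/>
    <input name="unbound"/>
  </body>
</html>'''


def bound_values(markup):
    """Map each input's ref to the text of the instance node it points at."""
    root = ET.fromstring(markup)
    data = root.find("./head/model/instance/data")
    values = {}
    for control in root.iter("input"):
        ref = control.get("ref")
        if ref is None:
            continue  # an ordinary, unbound <input>: nothing to resolve
        node = data.find(ref)  # ref is treated as a simple child path
        values[ref] = node.text if node is not None else None
    return values


print(bound_values(page))
```

The bound and unbound inputs are told apart purely by the presence of `ref`, which is where the potential for mix-ups mentioned above comes from.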

SVG may also end up being a big winner under an HTML upgrade. Mozilla, Opera, Safari and Konqueror all now have some SVG implementation. With Chris Wilson heading up the working group (yeah, Chris!!), Microsoft will also be in a position where it makes a great deal of sense for them to be seen in a leadership role here, and I think that an SVG implementation from Microsoft, which was looking increasingly unlikely, now has some real legs again. The SVG would, of course, still be XML - this would be guaranteed by the changes in the DTD that would make it possible for XML and HTML to legally coexist in the same document. Apple’s recent joining of the SVG working group also indicates to me that the graphics language is here to stay on the browser (there’s enough here for another blog on the fortunes and vicissitudes of SVG which I will defer to a later post).

The final aspect here that I think will prove huge is the impending publication of XBL2 as a formal recommendation. I’ve been critical of Ian Hickson in the past, but I believe he deserves loud kudos for pushing this particular standard through the W3C. The purpose of XBL2 is to enable behaviors - letting users create “custom” extensions to the element namespace that have additional functionality beyond simply being a DOM element.

Such XBL serves three purposes. It makes it possible to reduce or even eliminate the need for “naked” JavaScript within XML documents, which reduces the security risk that such coding opens up. It encourages significant code reuse and the decoupling of the designers from the developers (which in turn makes it much easier to create tools to automate both sides of the equation). And it makes the code itself more declarative and validatable at a local level, which I believe strengthens the case for validation as a critical part of web development.

If, as seems likely, XBL2 becomes an integral part of the discussion for an HTML5 target, then the groundwork will be laid to push the web back into a form that has all the functionality that the AJAX folk want while at the same time remaining nominally declarative. It doesn’t solve the security problems associated with AJAX, but it does tend to encourage modularization and generally good programming practices that can go a long way to mitigating those problems.

Overall, I think the central challenge for such a working group is to stay on task and to “fast-track” as much as possible. If the WG becomes a forum for people airing their grievances about competitors, then I suspect the talks will collapse fairly quickly. On the other hand, I think re-establishing the working group has the potential to help resolve many of the minor nits and inconsistencies that have arisen over the years between the various standards. It may represent a chance for certain organizations to move towards where the consensus is on web technologies without losing face, and it has the potential to push many worthwhile but “blocked” technologies back into the larger sphere of HTML in a consistent and hopefully non-acrimonious manner.

Kurt Cagle is an author and systems architect concentrating on XML and Web 2.0 related technologies. He is also the webmaster for The XForms Portal, a news portal and code resource base for XForms related code, and The XForms Community Forum, a forum for exploring questions about XForms technologies. This article was first published to UnderstandingXML.com, Kurt’s blog on the XML industry.