Opera’s Anne van Kesteren has blogged that HTML browser makers should “define graceful error handling for XML, put some IETF, W3C or WHATWG sticker on it, label it XML 2.0 and ship it.” It has been widely reported.
My company, Topologi, has been using exactly such a grammar for five years in some of our products. We use it as our HTML and SGML editing mode; it's not perfect, but it is perfectly workable in most cases. It seems to be in Anne’s ballpark for an “XML 2.0”.
ECS (“Editor’s Concrete Syntax”) takes XML and puts back the forms of end-tag minimization and close-delimiter omission (which is what Anne is calling “graceful error handling”, AFAICS) that XML removed from SGML under the mantra “terseness is of minimal importance.” This moves it much closer to idiomatic HTML; it is Forgiving XML rather than Superbitch XML, and this is far more suitable for just folks to use.
For example, the ECS
<p id=t1><b<i>Bill & Ben</> &mdash Flowerpot Men</p>
has the same information set as the XML:
<p id="t1"><b><i>Bill &amp; Ben</i> — Flowerpot Men</b></p>
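To make the mapping between the two forms concrete, here is a minimal Python sketch of a forgiving normaliser. It is not Topologi's ECS grammar or scanner; the function name `ecs_like_to_xml` and the token regex are hypothetical, and the only rules it applies are the ones visible in the example: unquoted attribute values, a start-tag whose `>` is omitted before the next `<`, the empty end-tag `</>`, an end-tag that closes still-open descendants, an entity reference without its semicolon, and a bare ampersand. Resolving entity names through Python's HTML entity table is also an assumption of the sketch.

```python
import re
from html.entities import name2codepoint
from xml.sax.saxutils import escape, quoteattr

# Hypothetical token grammar for this sketch only (not the real ECS definition).
TOKEN = re.compile(r"""
    </(?P<end>[A-Za-z][\w.-]*)?\s*>                          # end-tag, name may be empty
  | <(?P<start>[A-Za-z][\w.-]*)(?P<attrs>[^<>]*)(?:>|(?=<))  # start-tag, closing delimiter optional
  | &(?P<ent>[A-Za-z][A-Za-z0-9]*);?                         # entity reference, semicolon optional
  | (?P<text>[^<&]+|[<&])                                    # character data, or a stray delimiter
""", re.VERBOSE)

ATTR = re.compile(r"""([\w:.-]+)\s*=\s*(?:"([^"]*)"|'([^']*)'|([^\s"'>]+))""")


def ecs_like_to_xml(src):
    """Rewrite a forgiving, ECS-like string as well-formed XML (toy rules only)."""
    out, stack = [], []
    for m in TOKEN.finditer(src):
        if m.group("start") is not None:
            name = m.group("start")
            attrs = ""
            for a in ATTR.finditer(m.group("attrs")):
                value = next(g for g in a.groups()[1:] if g is not None)
                attrs += f" {a.group(1)}={quoteattr(value)}"
            stack.append(name)
            out.append(f"<{name}{attrs}>")
        elif m.group("ent") is not None:
            name = m.group("ent")
            if name in name2codepoint:
                out.append(escape(chr(name2codepoint[name])))
            else:
                out.append(escape(m.group(0)))  # unknown name: keep as literal text
        elif m.group("text") is not None:
            out.append(escape(m.group("text")))
        else:
            # End-tag: an empty name means "close the most recently opened element".
            name = m.group("end") or (stack[-1] if stack else None)
            if name not in stack:
                continue  # assumption: silently drop an end-tag with nothing to close
            while stack:
                top = stack.pop()
                out.append(f"</{top}>")
                if top == name:
                    break
    while stack:  # assumption: close whatever is still open at end of input
        out.append(f"</{stack.pop()}>")
    return "".join(out)


print(ecs_like_to_xml("<p id=t1><b<i>Bill & Ben</> &mdash Flowerpot Men</p>"))
```

Run on the ECS example, this should give back the XML form, with the bare ampersand written as `&amp;`.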
Like XML, ECS can be formally defined in terms of ISO IS8879 SGML, so it is immediately suitable for situations where ISO standards are required. When IS8879 was revised to allow XML, another change was made, introducing the idea of “amply tagged”, a less restrictive form of “well-formed”: this has been sitting there waiting to be used if industry ever wanted an “XML with error handling.”
Amply-tagged documents allow various forms of end-tag minimisation. In fact, you can even leave out the end-tag for an element completely if it is the last element in its parent or if it is followed by another element of the same type. How does it do this? By making the tiny restriction that an element cannot contain itself directly as a child (containing itself as a more distant descendant is OK). This allows you to have, for example, runs of paragraphs that have p start tags but no end tags, HTML style.
An ECS scanner is in fact no more complicated than an XML scanner: just a handful of transitions in some states are different. Like XML, ECS does not use a DTD or built-in knowledge of special elements to change parse mode in different elements.
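The end-tag inference described above can be sketched over an already-tokenised stream. This is an illustration of the rule as stated here, not the text of IS8879; the function name `infer_end_tags` and the `(kind, value)` token tuples are hypothetical, and closing every open element at end of input is an assumption of the sketch.

```python
def infer_end_tags(tokens):
    """Insert the end-tags that the tokens leave out.

    tokens is a list of ("start", name), ("end", name) or ("text", data) tuples.
    """
    out, stack = [], []
    for kind, value in tokens:
        if kind == "start":
            # An element cannot directly contain itself, so a start-tag with the
            # same name as the innermost open element implies that element's end-tag.
            if stack and stack[-1] == value:
                out.append(("end", stack.pop()))
            stack.append(value)
            out.append((kind, value))
        elif kind == "end":
            if value not in stack:
                raise ValueError(f"end-tag for {value!r} has no open element")
            # Each open descendant was the last element in its parent: close it first.
            while stack[-1] != value:
                out.append(("end", stack.pop()))
            stack.pop()
            out.append((kind, value))
        else:
            out.append((kind, value))
    while stack:  # assumption: close whatever is still open at end of input
        out.append(("end", stack.pop()))
    return out


# A run of paragraphs with p start-tags but no p end-tags, HTML style.
tokens = [("start", "div"), ("start", "p"), ("text", "one"),
          ("start", "p"), ("text", "two"), ("end", "div")]
for token in infer_end_tags(tokens):
    print(token)
```

Because only the innermost open element is compared, a p inside a b inside a p is left nested, which matches the rule that a more distant ancestor of the same type is allowed.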