Sun meistergeek Jonathan Schwartz has a good blog entry, Why Open Standards Matter; its permalink name is "why odf matters", which gives the flavour.
But the question comes up: does adopting or supporting a standard entail tracking changes to, or evolutions of, the standard? I say yes, it must. The world evolves, and as it does, the value of the original standard diminishes. That is because the benefit of a standard comes largely from network effects: easier documentability (the standard acts as base-level documentation), multiple implementations, interoperability, ubiquity, and so on. In ISO terms, the benefit comes from agreement.
But a company that keeps to an old standard while the rest of the world moves on, evolving and improving the standard, is in fact engaged in disagreement! Through inactivity, it dilutes the usefulness of the evolved standard and also the usefulness of its own implementation.
HTML is a pretty moribund open standard: after HTML 4 came out, almost all the effort was diverted to XML languages and XHTML, and only in the last couple of years have the browser makers started to innovate again, with <canvas> and other new elements. But HTML 4 is an ISO standard too, and it meets most criteria of openness, whether those criteria are based on IPR or on multiple implementations.
So I hope Jonathan Schwartz’s espousal of open standards trickles down to the Java Swing gurus. Swing’s HTML support has not even kept up with HTML 4. And despite the great codebase (it is so nice to see code that is literate about SGML), a bog broom is needed. I suppose this shows a flaw in the Bug Parade system: it is easy to use for reporting individual flaws but difficult to use for larger-scale critiques. Such critiques are best addressed by systematic code audits, in which a package or module is analysed against a set of concerns.
What kinds of things? Well, if I were the King of Swing I would conduct two audits. I don’t expect either would result in refactoring or new APIs, just a list of fixes.
The first would be based on the nature of HTML as, above all, a medium of sloppy communication. XML has a strict syntax. SGML allowed a lot of freedom, but provided no error recovery. HTML is all error recovery: the show must go on! In an HTML browser, a syntax error has no business generating an (uncaught, unhandled) exception. Indeed, the expectation that HTML will contain errors is so strong that perhaps the exception should be generated if there are no errors! But Swing’s HTML parser code takes the SGML tack rather than the HTML tack. Given that the original code was written to cope with other kinds of SGML (though it was never used for that purpose), this was an understandable design decision, but it is not needed now. So the first audit would go through javax.swing.text.html.* and remove every exception that is thrown because of a syntax issue.
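To make the HTML tack concrete, here is a minimal sketch of error recovery in place of exceptions. The class `LenientTagBalancer` is hypothetical and not part of Swing: it records each syntax problem as a diagnostic string and repairs the markup (stray end tags are dropped, elements left open are closed) instead of throwing. It tokenizes with a regular expression, so treat it as an illustration of the policy rather than a real parser.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical sketch: repair tag nesting and report problems, never throw. */
final class LenientTagBalancer {
    // Start or end tags such as <b>, <a href="x">, </b>. Comments, doctypes and
    // attribute values containing '>' are not handled in this sketch.
    private static final Pattern TAG =
        Pattern.compile("<(/?)([a-zA-Z][a-zA-Z0-9]*)[^>]*>");

    // A few HTML elements that never take an end tag (not a complete list).
    private static final Set<String> VOID =
        new HashSet<>(Arrays.asList("br", "hr", "img", "input", "meta", "link"));

    /** Returns a balanced copy of html; each syntax problem is appended to errors. */
    static String balance(String html, List<String> errors) {
        Deque<String> open = new ArrayDeque<>();
        StringBuilder out = new StringBuilder();
        Matcher m = TAG.matcher(html);
        int last = 0;
        while (m.find()) {
            out.append(html, last, m.start());
            last = m.end();
            String name = m.group(2).toLowerCase(Locale.ROOT);
            if (m.group(1).isEmpty()) {
                out.append(m.group());
                // Void elements and empty-element tags like <x /> do not stay open.
                if (!VOID.contains(name) && !m.group().endsWith("/>")) {
                    open.push(name);
                }
            } else if (open.contains(name)) {
                // Close anything left open inside this element, reporting each one.
                while (!open.peek().equals(name)) {
                    String inner = open.pop();
                    errors.add("implicitly closed <" + inner + "> at offset " + m.start());
                    out.append("</").append(inner).append('>');
                }
                open.pop();
                out.append("</").append(name).append('>');
            } else {
                // An end tag with no matching start tag: drop it and carry on.
                errors.add("ignored stray </" + name + "> at offset " + m.start());
            }
        }
        out.append(html, last, html.length());
        // Close whatever is still open at the end of the input.
        while (!open.isEmpty()) {
            String name = open.pop();
            errors.add("closed <" + name + "> at end of input");
            out.append("</").append(name).append('>');
        }
        return out.toString();
    }

    public static void main(String[] args) {
        List<String> errors = new ArrayList<>();
        System.out.println(balance("<b><i>text</b></p> more", errors));
        errors.forEach(System.out::println);
    }
}
```

The caller decides what to do with the diagnostics (log them, show them in a debug view, or ignore them), while rendering always gets a usable result.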
The second audit would be a checklist to systematically modernize the HTML infrastructure and clear the way for future support of HTML 5 semantics. This checklist would be:
- Encoding (bytes->characters): Are encodings handled correctly? Are all the IANA encodings allowed? What about encodings for CSS?
- Lexical (characters->delimiters->tags): Are hexadecimal numeric character references allowed and resolved against Unicode? Are the complete ISO entity sets and MathML’s sets accepted? Are empty-element tags like <x /> accepted? Is the XML header accepted? Can simple XHTML be accepted? Is there any way to register a pre-processor like HTML Tidy as part of entity management, to offload processing of particularly crapulous HTML? Is full CSS syntax allowed, even if not acted on?
- Syntactical (tags->information set): Is tag unminimization correct? For example, are two open tags for paragraphs treated as nesting or as siblings?
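For the encoding item, one possible check is whether a charset label from an HTTP header, a <meta> tag or a CSS @charset rule degrades gracefully when it is unknown. The sketch below uses the real `java.nio.charset.Charset.forName`, which recognizes the canonical names and aliases the JDK ships with (covering the common IANA registry entries). The class `LenientCharset` is hypothetical, and the choice of ISO-8859-1 as the fallback is an assumption made because every Java platform must support it; browsers often fall back to windows-1252 instead.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

/** Hypothetical sketch: resolve a charset label without ever failing the page. */
final class LenientCharset {
    // Assumption: ISO-8859-1 as the fallback, since it is guaranteed to exist.
    static final Charset FALLBACK = StandardCharsets.ISO_8859_1;

    static Charset resolve(String label) {
        if (label == null) {
            return FALLBACK;
        }
        // Labels in the wild often carry stray whitespace or quotes: charset="UTF-8".
        String name = label.trim().replace("\"", "").replace("'", "");
        try {
            return Charset.forName(name);
        } catch (IllegalArgumentException e) {
            // Covers IllegalCharsetNameException and UnsupportedCharsetException:
            // an unknown encoding is no reason to stop rendering.
            return FALLBACK;
        }
    }

    public static void main(String[] args) {
        for (String label : new String[] {"utf-8", " \"Shift_JIS\" ", "x-made-up", null}) {
            System.out.println(label + " -> " + resolve(label));
        }
    }
}
```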
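For the lexical item, here is a sketch of resolving decimal and hexadecimal numeric character references against Unicode, including characters outside the Basic Multilingual Plane (which need a surrogate pair in a Java `String`). The class `NumericCharRefs` is hypothetical, and it assumes the references end with a semicolon. As an assumption in the HTML 5 spirit, references that do not name a usable code point become U+FFFD instead of raising an exception.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical sketch: resolve &#233; and &#xE9; style references. */
final class NumericCharRefs {
    // Decimal (&#233;) or hexadecimal (&#xE9; or &#XE9;) references ending in ';'.
    private static final Pattern NCR = Pattern.compile("&#([xX][0-9a-fA-F]+|[0-9]+);");

    static String resolve(String text) {
        Matcher m = NCR.matcher(text);
        StringBuilder out = new StringBuilder();
        int last = 0;
        while (m.find()) {
            out.append(text, last, m.start());
            String body = m.group(1);
            int cp;
            try {
                boolean hex = body.charAt(0) == 'x' || body.charAt(0) == 'X';
                cp = hex ? Integer.parseInt(body.substring(1), 16) : Integer.parseInt(body);
            } catch (NumberFormatException e) {
                cp = -1; // too large for an int, so certainly not a code point
            }
            boolean surrogate = cp >= 0xD800 && cp <= 0xDFFF;
            if (cp > 0 && Character.isValidCodePoint(cp) && !surrogate) {
                out.appendCodePoint(cp); // astral characters become a surrogate pair
            } else {
                out.append('\uFFFD'); // assumption: recover with the replacement character
            }
            last = m.end();
        }
        out.append(text, last, text.length());
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(resolve("caf&#xE9; &#233; &#x1D11E; &#xD800; &#;"));
    }
}
```

Malformed text such as `&#;` does not match the pattern at all and is passed through unchanged, which is another small instance of "the show must go on".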
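For the syntactical item, the paragraph question has a definite answer in HTML: a <p> start tag implies the end of any paragraph that is still open, so the two paragraphs are siblings, never nested. The hypothetical `ParagraphUnminimizer` below sketches just that one rule. Real HTML has many more implied end tags (block elements such as <div> or <ul> also close an open paragraph), which this sketch deliberately ignores.

```java
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical sketch of one unminimization rule: <p> closes an open <p>. */
final class ParagraphUnminimizer {
    private static final Pattern TAG =
        Pattern.compile("<(/?)([a-zA-Z][a-zA-Z0-9]*)[^>]*>");

    static String unminimize(String html) {
        StringBuilder out = new StringBuilder();
        Matcher m = TAG.matcher(html);
        boolean inParagraph = false;
        int last = 0;
        while (m.find()) {
            out.append(html, last, m.start());
            last = m.end();
            boolean isEnd = !m.group(1).isEmpty();
            String name = m.group(2).toLowerCase(Locale.ROOT);
            if (name.equals("p")) {
                if (!isEnd && inParagraph) {
                    out.append("</p>"); // implied end tag: siblings, not nesting
                }
                // A start tag opens a paragraph; any </p> (even a stray one) ends it.
                inParagraph = !isEnd;
            }
            out.append(m.group());
        }
        out.append(html, last, html.length());
        if (inParagraph) {
            out.append("</p>"); // the end tag may be omitted at end of input too
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(unminimize("<p>one<p>two"));
    }
}
```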
Now, none of these relate to esoteric display features, though some relate to design bugs (which therefore get thrown into the RFE basket). But the trouble with the Bug Parade is that it is easy to miss the wood for the trees: rather than treating the HTML issues as individual problems to be prioritized independently, an audit allows reported issues to be grouped together and fixed in concert.
By the way, I should note that in the Java 6 source code, many of the javax.swing.text.html classes have a recent datestamp, which is a good sign.