Håkon Wium Lie’s recent CNET column is entirely dodgy in its details, but solid in its ultimate premise (can a premise be ultimate?): in all the talk about ODF and OOXML, it is important not to lose track of HTML’s potential and actual suitability for much document interchange.

I’ve endorsed this position many times, but it is worth stating again. For simple word-processing-style documents, if you need interoperability (and you want to get it by restricting the kinds of structures in the document, so that the documents can be read by many different applications and be easily repurposed), then HTML is the format to consider first: validated, standards-compliant XHTML in particular. Think of it in terms of a continuum, with HTML at one end (simple WP documents) and PDF at the other (full page fidelity but read-only): HTML, ODF, OOXML, PDF. And we should certainly not forget the ultimate premise(!) of markup: to rigorously label the important information in your documents according to its rhetorical and semantic structures, which sometimes simply requires custom schemas and microformats, extending or augmenting or even replacing the standard formats.

On this last point, Lie has a great line: “speaking of ODF and OOXML … I’m no fan of either specification. Both are basically memory dumps with angle brackets around them.” Lie thinks this is a bad thing; I think it is a necessary thing: sometimes you want to save only what can be read everywhere (the case with HTML save), but usually you want to save everything that is in your document so that when you next open it, it is exactly the same publication you saved.

W3C versus ISO

When looking at any writer on standards at the moment, it is good to establish point of view. Lie is employed by a competitor of Microsoft, Opera Software, whose business is based on standards from W3C, not ISO. It is not shocking that his response to the issue of Office formats at ISO revolves around urging that applications follow W3C standards. It is not as much of a non sequitur as it seems. Lie is the inventor of CSS: it is hardly surprising he does not want it sidelined, especially if organizations or governments adopt ISO formats ahead of W3C ones.

Perhaps it is time for W3C to take ISO seriously, befriend it, snuggle up to it, and put the core Web standards through some kind of fast-track procedure as well? Deal yourselves a better hand! The world of fast-track that ISO JTC1 has, in its wisdom, launched us into is intended to allow specifications from consortia that corporations can participate in and dominate (W3C, OASIS, Ecma, etc.) to get promoted up to ISO standard level, if national bodies vote to accept them. This is because ISO sees its core activity as enabling agreement, not sniping at rival brands.

ISO has taken a more constructive approach than denigrating the boutique standards consortia: it does not denigrate Ecma merely because it is designed to help companies make their technologies public, fast, with copyright-free specs; nor W3C because it may be extra-accommodating to the larger fee-payers (such as the ‘phone calls’ that Lie mentions); nor OASIS because its procedures made it susceptible to ‘branch stacking’ to favour one group’s technology. This is because, at the end of the day, ISO votes are very hard for commercial organizations to manipulate: there are simply too many countries. Where manipulation is possible, of course, is when there is the appearance of a grassroots movement in many countries. However, at least some national standards bodies (and their committee members) take a dim view of lobbying on non-technical issues, and certainly of single-issue committee people with no real interest in promoting standards.

Malthus

Lie adopts an extreme view towards overlap of standards: any overlap at all brings nothing but misery and bloat. I doubt there is much sympathy for his view that XSL was unnecessary in 1999 because of CSS. (CSS now has a better set of formatting properties and selectors, but where are the transformations?) And to blame MS for XSL rather ignores James Clark’s involvement. [Lie thinks ODA and SGML were competitors, even though they took extremely different approaches (binary, fixed, application-dependent format, versus text-based, free, universal). Unfortunately, he does not burden us with any facts behind his mention that 1986’s ISO SGML was more complicated because of ODA, something I hadn’t heard before.]

But what about the dodgy details?

Borat!

Well, the first one is personal, and the reason I am bothering to write this. Lie says: “In the past, consultants paid by Microsoft have joined standardization groups and have become sympathetic voices. Are they buying countries this time?” Is Lie saying that small nations have no interest in documents and document processing languages? Again, no detail, just slurs. (I am sensitive to this, of course; when I first read it, I wondered if Lie was referring to me obliquely, which would be fairly hypocritical considering Lie himself is undoubtedly a sympathetic vote for his company at W3C. Companies or their representatives are allowed to participate in standards work, indeed encouraged to, but the difference with ISO is that they don’t get a final vote. But reading it carefully, it is clear that Lie knows that individuals don’t vote on ISO standards, and that he was saying something uncertain about Kazakhstan.) (I worked with two brilliant programmers from Kazakhstan, via Russia, Israel and Australia, for many years, one of whom is now at the highest technical level of a government department here; the distance between their excellence and Borat only makes Borat all the funnier!)

Ambition

The next dodgy detail is to make blanket comparisons between HTML and ODF/OOXML. ODF and OOXML deal with many issues that HTML/CSS simply does not. What is HTML/CSS’s story for spreadsheets? What is HTML/CSS’s story for ZIP packaging? (Well, I think the W3C argument might be to say that every part should have a URL and be available on the web. W3C’s worldview is bounded by the web.)
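To make the packaging point concrete: both ODF and OOXML files are ZIP archives holding several XML parts, which is something plain HTML/CSS has no counterpart for. The following is only a rough Python sketch of that idea. The part names and content are made up for illustration, and a real ODF or OOXML package also carries manifest, content-type and relationship information that is ignored here.

```python
import io
import zipfile
import xml.etree.ElementTree as ET


def build_package(parts):
    """Pack a dict of {part name: XML text} into an in-memory ZIP archive."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        for name, xml_text in parts.items():
            archive.writestr(name, xml_text)
    return buffer.getvalue()


def list_xml_parts(package_bytes):
    """Return {part name: root element tag} for every part in the package."""
    roots = {}
    with zipfile.ZipFile(io.BytesIO(package_bytes)) as archive:
        for name in archive.namelist():
            # Each part is parsed on its own; a parse error here would mean
            # the part is not well-formed XML.
            roots[name] = ET.fromstring(archive.read(name)).tag
    return roots


# Hypothetical part names and content, not taken from either standard.
demo_parts = {
    "content.xml": "<document><p>Hello</p></document>",
    "styles.xml": "<styles/>",
}
package = build_package(demo_parts)
summary = list_xml_parts(package)
print(summary)
```

The point of the sketch is only that one file on disk bundles several separately parseable parts; the web equivalent would be giving each part its own URL.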

Don’t laugh at the fat girl

Lie repeats the 6000-page claim, which I thought had been retired from debate along with the more embarrassingly bad Groklaw material. But mud sticks, we say here in Australia. When you add SVG and all the other standards that ODF or HTML/CSS invoke or require to get similar general capabilities, and typeset them similarly, the numbers are not so different (of the order of 3000 to 5000 pages, excluding primer material). Probably I should repeat my complexity metrics analysis now that we have the final schemas out (assuming the schemas correspond to what is in the standards… the lack of SC34 checking on all fast-track standards means that national bodies vote(d) for ODF and OOXML without the most basic quality check from SC34 WG1, grrr).

Deathwish?

Lie has a strange theory that MS wants ODF and OOXML to both fail; I am not sure what possible mechanism could be used for this. “If both specifications fail, the most likely result is that the world continues to use Microsoft’s proprietary ‘doc,’ ‘xls’ and ‘ppt’ formats. This is consistent with Microsoft’s attitude in other areas in which the company is pushing closed formats. For example, the MSN Messenger protocol is not public.” Even apart from the basic problem that those formats are no longer the default save formats, I don’t get it: is MS not supposed to finally do the right thing with one product (open up and standardise Office) because it would be inconsistent with them being bad elsewhere? With respect to Lie, this is complete crap. (We consider Hitler a monster, we consider him a monster even when he pats his dog on the head, but patting dogs on the head is a good thing even when done by monsters, and it is better for monsters to be patting dogs on the head than going around being monstrous. You get the idea. Just as in negotiation you need to leave a position in which the opposition can consider itself to have advanced, or at least not lost face, we cannot expect MS to embrace standards if we block them out of the process.)

Infidelity

But Lie is right, I think, to be alarmed by the prospect that if OOXML fails MS will revert away from open formats. I don’t see them adopting ODF as the default format for general sale. For a start, current ODF simply does not have matching capabilities. This issue of fit is strong enough that we don’t even need to get to the issue of control. We have this nice little window now where MS is inclined to open up its formats, something that the document processing community has been pleading for for years. The ODF sideshow runs the risk of screwing this up; I’ve said it before, but I say it again: being pro-ODF does not mean you have to be anti-OOXML. ODF has not been designed to be a satisfactory dump format for MS Office; OOXML has not been designed to be a suitable format for Sun’s StarOffice or OpenOffice or IBM’s products. HTML is the format of choice for interchange of simple documents; ODF will evolve to be the format of choice for more complicated documents; OOXML is the format of choice for full-fidelity dumps from MS Office; PDF is the format of choice for non-editable page-faithful documents. All of them are good candidates for standardization; all have overlap but are worthwhile to have as cards in the deck of standards. But systems for custom markup trump all.

Brave New World

I guess, behind it all, there is this idea that there will be one true document format. The future will be beautiful because we will have full-potential HTML. Or the future will be beautiful because we have full-potential ODF. Or the future will be beautiful when we adopt the same common microformats regardless of the framework. I tend to the view that the future will never be beautiful in that kind of monochromatic (‘Stalinist’ is entirely too dramatic) way, but that we need to encourage a rich library of standard technologies, widely deployed, free, unencumbered, explicit, together with the awareness of when each is appropriate and with an adequate set of profiles and profile validators (using ISO Schematron!). Plurality. (HTML browsers are not weaker because there is GIF, JPEG and PNG, let alone TIFF, even though there is almost complete overlap.)