There has been so much disinformation put out about the limited review time for Open XML that it might be salutary for people to revisit a review of the Open XML draft I put on this blog dated Thursday May 25, 2006.

You read it: May 2006. That is 22 months ago! Not “5 months”, not even 9 months as the claptrappists say. June, July, August, September, October, November, December, January 2007, February, March, April, May, June, July, August, September, October, November, December, January 2008, February, March.

To the people who are saying they have not had enough time in 22 months, I have no sympathy: you should have been reading my blog! :-)

I want to go through all of what I said then. I think it holds up really well.

A new draft of Open XML came out on my birthday. 4081 pages of PDF, and very impressive for anyone who has worked on specification and standards. Two things stick out: first how horrible XML Schema fragments are when stuck inline to document structure; second, how the implementation-neutral tone of the introduction is at odds with the elements for various kinds of Active X embedded objects. I suspect people would be a lot more comfortable if the elements for Active X embedded objects were in a different namespace, and gathered into an appendix of some kind. Antiques and curios. It will be interesting to see what the extensibility strategy will be (it hasn’t been released in this draft).

By halfway through the Ecma period, extra material had doubled the size of the spec from Microsoft’s original submission of about 2000 pages. In the subsequent six months it increased by the same amount. So much for the idea that Ecma TC45 merely rubberstamped the original submission from Microsoft.

The comment about the horribleness of XML Schema fragments is still one I’d make. The BRM at least made them non-normative, but it did not agree to remove them entirely. I expect when people see the new generation of multi-format standards that some SC34 people are championing, where you can turn on and off normative sections, we can see the end of this clutter at reader request, which is perhaps the sweet spot.

The comment about Active X of course later became a mantra, with various demands that either DIS 29500 should have no normative reference to proprietary binaries or that it should have more (to bring them under the OSP). But it was an important issue that was addressed during the BRM and can benefit from continuing vigilance. The idea of gathering legacy proprietary elements into some kind of appendix is exactly what happened, at least for the compatibility elements, at the BRM. (I don’t know that many of the participants at the BRM would have been comfortable with namespace-based notions of conformance. I didn’t get the impression that using namespaces or schemas as tools was on many delegates’ radars, no disrespect intended.)

The extensibility strategy came out as a separate part, with no significant trouble as a technology. Though some people have subsequently discovered that extensibility and “openness” (meaning guaranteed receipt) do conflict: this is something I have repeatedly talked about, the need for profiles. On the general subject of extensibility and interoperability, Joel Spolsky has another good article this week: Martian Headsets.

On the technical merits, well actually I don’t know if they matter much. I say potato. Exporting to HTML or XHTML gives people base-level interoperability for most documents, which neither ODF nor Open XML will challenge; at the high end the solution is exporting to XML using a domain-specific schema (e.g. S1000D for military & aerospace) and not ODF or Open XML at all; in the casual middle we will have ISO ODF available, perhaps as the interchange format of choice, as well as ISO Open XML (if it is accepted) for when you need to track MS Office’s capabilities closely. I think there is substantial value in a standard XML format for MS Office documents even within organizations that will mandate ODF for interchange and archiving. The availability of the alternatives reduces the need for ODF or Open XML to be the one true interchange format.

I think I still agree with everything there. (By technical merits, my point is not to do with the state of the draft, but about doctrinaire views on optimal technology which are ultimately subjective, and the benefits of plurality.)

I still think ODF is the appropriate format of choice for level-playing-field document interchange, especially for governments, though it seems ODF 1.2 and 2009 are the more realistic time-frames for this. And don’t forget about HTML!

Probably coming from an industrial publishing background biases me here: the need for dumbed-down interchange formats is real, sure enough, but the need for intricate, close-to-the-metal access to typesetting features is also important in different contexts. Word’s binary formats and RTF’s weaknesses have long held Microsoft’s applications back from being happily usable in serious industrial publishing systems (or, at least, have often held back the people who adopted them).

+1

UPDATE: Fzzzzz Fzzzzzz Claws out. Please note that I wrote the above 8 months before my Wikipedia job for MS, so I was not writing with any kind of relationship to Microsoft. And as I write now, I have no financial relationship with them. I expect this is pretty embarrassing to the people name-calling me a paid Microsoft shill whose opinion was bought: especially those who actually are full-time paid shills for rival multinationals! And guess what: consistency… Don’t you guys have better things to do than call people names and threaten them?