July 2007 Archives

Timothy Appnel

Last week the IETF announced that it had approved the Atom Publishing Protocol as a Proposed Standard. If you’ve been waiting for things to get finalized, this is it. In my somewhat limited understanding of the standards process, the last step is just a formality that will assign an RFC number and perhaps adjust the formatting of the specification document itself.

The Atom Publishing Protocol, or simply APP, is the web services part of the Atom Working Group’s work. In summary, it’s a more advanced (and standardized) version of the Blogger/MetaWeblog APIs and its forms. It’s also a sterling example of RESTful API design.

The counterpart to APP is the Atom Syndication Format (ASF), which was approved last year and is now an official standard, RFC 4287. The Atom Syndication Format is similar in many respects to RSS, but shares the same semantics as APP and includes many enhancements and clarifications that an international standards process like the IETF’s demands.

Congratulations and thanks go out to the working group that initiated and ushered this vital work along. That’s 5 years of work. An eternity in Internet time.

More from working group chair Tim Bray here, and from Sam Ruby (the man who made it all happen) here.

In other news, glutton for punishment and looking for his next standards body process fix, APP specification author Joe Gregorio has submitted a draft for URI Templates that is based on the system implemented by the OpenSearch API. [via DeWitt Clinton]

Simon St. Laurent

After many years where it wasn’t entirely clear what XML had to contribute to the Web, XML is finally becoming a key part of the Web’s infrastructure. I’m looking for stories to tell about this technical mixture, at XML 2007 and beyond.

M. David Peterson

Update: As I pointed out in a follow-up comment to Woof, one of the things I absolutely love about blogging is that it encourages interaction and communication on important subject matter that would otherwise not take place if the medium did not exist. Oftentimes I find myself having to reevaluate my position on any given subject matter because someone has forced me to do just that via a blog post they’ve written or via a follow-up comment to one of my own blog posts. Such is the case I am currently faced with,

But I groan over this particular post because you generally attack rules of clear writing, which is all S&W (and others like them) are trying to promote, with a class of “hey, man, WHO’S TO TELL ME” middle-fingered hubris.

Of course, as I pointed out in another follow-up,

Put this way your position becomes quite a bit more clear. And I can’t help but agree with your point.

… which is absolutely the situation I am currently left pondering. That doesn’t mean I believe the content of this post is no longer relevant, but rather that there is certainly more to this than meets the eye, a fact which is forcing me to reevaluate my overall position.

Of course, life could be worse. I could go around thinking that my viewpoints were always and without a doubt the correct viewpoints and that everyone else who disagreed was, in fact, wrong. If there is one thing I have learned in life it’s that you don’t *ever* want to be “that guy.”

Thanks for helping me realize the flaws in my argument, Woof! Still thinking through this a bit, but once I have I’ll provide a follow-up comment with the results of my reevaluation.

Update: via Piers Hollot we have ourselves a new quote-of-the-day, week, month, and possibly even year,

To their credit, messrs Strunk and White had no way of knowing that semicolons, hyphens and parentheses could also function as winking faces.

*YES*! :D Thanks for the laugh, Piers! :D

[Original Post]
Coding Horror: Google’s Number One UI Mistake

Strunk and White urged us to Omit Needless Words:

Vigorous writing is concise. A sentence should contain no unnecessary words, a paragraph no unnecessary sentences, for the same reason that a drawing should have no unnecessary lines and a machine no unnecessary parts. This requires not that the writer make all his sentences short, or that he avoid all detail and treat his subjects only in outline, but that every word tell.

If you were to ask me who I believe to be the greatest writer of our era, that answer would be immediate and definitive,

Tom Robbins

Of course I doubt Strunk and White would agree. “Too wordy. Too much personal expression. Too much social and political undercurrent. Too much. Too much. TOO MUCH!” would of course be five too many too’s for Strunk and White’s liking.

A sentence should contain no unnecessary words, a paragraph no unnecessary sentences, for the same reason that a drawing should have no unnecessary lines and a machine no unnecessary parts.

Of course if this were truly the case, there would be no need for erasers. In fact, there would be no need for pencils. Everything would simply be written in permanent ink.

In the same sense, there would be no need for code refactoring tools and there would only be one programming language. I mean, why mess around with Ruby or Python when you can write everything in assembly, or better yet, machine!

Hell, for that matter there would probably only be one spoken language if Strunk and White had things their way. Why waste our time describing things differently than someone else when it’s much more efficient to describe them exactly the same way as everyone else? With such efficiency we could spend less of our time communicating and more of our time…

– doing —

… Hmmm, I don’t know… Staring at the wall? Watching the paint dry? Or would the paint already be dry in such a world? For that matter, why even paint the wall! What’s wrong with the wall the way it was in the first place?! In fact, why do we even have walls! It’s just wasted space!

Of course, in an efficient world such as this that has no need for walls and with such efficiency in our written and spoken languages I’m not exactly sure what we do with our time. But we’d have plenty of time to do whatever it is we wouldn’t be doing, that’s for certain! ;-)

Hey Strunk and White**, here’s some elements for your style: As hard as it would obviously be to actually submit to, go to your local library or favorite offline or online retailer and pick up a Tom Robbins novel. I’d personally recommend Half Asleep in Frog Pajamas or Fierce Invalids Home from Hot Climates. Then again, Jitterbug Perfume or Skinny Legs and All are both excellent works of literary art, as are each and every one of his other titles.

Oh, and while you’re at it, take a week off and go visit the Louvre. It’s a beautiful, wonderful, and thought provoking place filled with LOTS and LOTS of lines. No, not those kinds of lines. These kind,

** Yes, I’m aware of the fact that William Strunk Jr. and E.B. White are both dead.

NOTE: I should also point out that I have always found,

This requires not that the writer make all his sentences short, or that he avoid all detail and treat his subjects only in outline, but that every word tell.

… to be a pathetic attempt at saving face. Whose definition determines which words are necessary to *tell* a story, and which words are not? Theirs? Yours? Mine? Which, of course, in and of itself is the entire point,

One man’s trash is another woman’s treasure. This same rule can be applied to *ANYTHING* and *EVERYTHING*. If you don’t like it, don’t read it. But don’t tell others how to define what is trash and what is treasure…

*They* can make that determination on *their* own.

Kurt Cagle

I’m sitting in a quiet coffee shop on the mist-shrouded Oregon coast, taking a much needed break from family in the wee morning hours to put down some thoughts on the recent O’Reilly Open Source Conference in Portland. I’ll be heading back to Victoria over the next couple of days, nursing my poor, ailing Saturn back to the island and no doubt stopping in Seattle to indulge my daughter’s mania for all things Japanese anime related. She tells me that she’s a dedicated Otaku, regaling me with the plot-lines from half a dozen Japanese comics, many of which she’s now reading (more or less) in the original Japanese (“They always get the translations wrong, Dad!” she says with the conviction that only a fourteen-year-old can have).

The conference itself was immensely enjoyable, and very eye-opening. I did get a chance to meet with Simon St. Laurent (an old friend and acquaintance that I haven’t seen in nearly a decade) and hung out with M. David Peterson, Kevin Farnham and James Turner, all of O’Reilly. I also spent some time talking business with Jason Gilmore and Terry O’Donnell, Managing Editors of Apress and DevX respectively, and sat in on some very good presentations (and hopefully gave a good one, though it’s always hard to tell when you’re on the stage side of the presentation).

Rick Jelliffe

The issue of handling legacy binary formats is one that impacts much more than old Word documents, especially for governments who have long-term archiving requirements.

I think governments should simply legislate “After 20 years, the documentation for all formats used in government data should be made public for access on government archival websites and is deemed unencumbered by IP considerations for the purposes of information retrieval of government data” as a matter of public policy. Hand them over or get a fine for obstruction or bad record keeping!

Of course, the regulations would need to say more than that to cope with industry churn and the ravages of time. For example, what if the vendor or product has been onsold and no-one knows where the documentation is now? What if the local sales body is no longer the sales body for that product, or the development organization is defunct? But that need not stop the general case.

Of course, for contemporary and future data, standard open formats are the thing.

Rick Jelliffe

By now XSD users are pretty aware of the severe limitations in the complex type derivation mechanisms provided by XML Schemas. Apart from the issue of whether they should be there at all, rather than being treated as a kind of validation issue as they are in RELAX NG, the problems are basically that “derivation by extension” only allows new elements at the end of the content model (“extension by suffixation”), so that I cannot extend <name><first>Rick</first><last>Jelliffe</last></name> to be <name><first>Rick</first><middle>Alan</middle><last>Jelliffe</last></name> using derivation by extension (I need to change the base schema), and that I cannot use derivation by restriction to remove or optionalize an element that is required in the base (I need to change the base if I want to remove a required middle name, for example).

Now this is not to say that the definitions of complex type derivation by extension and restriction are not logical. It is just that they are not useful, or are too strong, in many important situations. The W3C XML Schemas Working Group has indeed worked on finding better definitions for them, while maintaining the core concept that a type derived by restriction is valid against the base type.

But I suggest that there may be other kinds of derivation which are useful. One that I would suggest might be called “Derivation by Implied Restriction”. This is where there are two complex types and neither is the base type for the other, but there is clearly some family resemblance. Rather than creating an explicit base type, I wonder whether it would be useful to ask a lesser question of them: could there be a base type created (automagically or notionally) against which both content models were valid by restriction, and in which there was only a single particle for each duplicated particle in the source content models? So the implied base type could be given a name that derived types could refer to, but it would not be specified (declared, defined) explicitly anywhere.

So if one content model said (first, last, gender?) and the second said (first, middle, last), the implied base content model would be (first, middle?, last, gender?). However, if one content model said (first, middle, last) and the second said (first, last, middle), there would be no implied base type, because (first, ((middle, last) | (last, middle))) has duplicated particles. (I haven’t thought wildcards through.)
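A minimal Python sketch of how such an implied base might be computed, under some loud assumptions: content models are flat sequences with no choice groups, wildcards or repetition, each element name appears at most once per model, and the second model in the worked example is read as (first, middle, last). The function name `implied_base` and the (name, optional) tuple representation are hypothetical, invented only for this illustration.

```python
from typing import List, Optional, Tuple

# A particle is (element name, is_optional). Flat sequences only.
Particle = Tuple[str, bool]


def implied_base(a: List[Particle], b: List[Particle]) -> Optional[List[Particle]]:
    """Return a sequence that both models restrict, or None if the
    shared elements appear in a different order in the two models."""
    names_a = [name for name, _ in a]
    names_b = [name for name, _ in b]
    if len(set(names_a)) != len(names_a) or len(set(names_b)) != len(names_b):
        raise ValueError("each element name may appear only once per model")

    # Elements shared by both models must keep the same relative order.
    common_a = [n for n in names_a if n in set(names_b)]
    common_b = [n for n in names_b if n in set(names_a)]
    if common_a != common_b:
        return None

    opt_a, opt_b = dict(a), dict(b)
    merged: List[Particle] = []
    i = j = 0
    # Walk both models, using the shared elements as anchors. Anything
    # found in only one model becomes optional in the implied base.
    # The final None anchor drains whatever is left after the last one.
    for anchor in common_a + [None]:
        while i < len(a) and a[i][0] != anchor:
            merged.append((a[i][0], True))
            i += 1
        while j < len(b) and b[j][0] != anchor:
            merged.append((b[j][0], True))
            j += 1
        if anchor is not None:
            merged.append((anchor, opt_a[anchor] or opt_b[anchor]))
            i += 1
            j += 1
    return merged


def show(model: List[Particle]) -> str:
    """Render a model in the (first, middle?, last) notation."""
    return "(" + ", ".join(n + ("?" if opt else "") for n, opt in model) + ")"
```

Where elements unique to each model fall between the same two anchors, this sketch arbitrarily emits the first model’s elements first; a real definition would have to say what happens there.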

In other words, really the type is being derived from the instances, backwards, and if no derivation is possible then the instances are not related by an implied complex type.

I suspect this derivation type (and I am sure there are more) would reduce the complexity of XSD development from the user’s POV. Something more constrained than ALL but less constrained than current type derivation.

Rick Jelliffe

The subject of the quality and completeness of standards is very much on people’s minds. I suggest that the best objective evidence does not come from armchair review, necessary as that is in any case, but from implementation experience.

The document Known Issues from the OpenXML/ODF Translator Add-ins for Office project at SourceForge is currently the best source of objective evidence.

Now first a caveat. The translator project is concerned with issues for doing a direct conversion between formats. It is not interested in issues of information preservation as such: so we should not expect to see comments related to round-tripping, where going from one format to another and back again results in the same document structures. So the list misses out on features that can be faked or worked around.

So what do we find in the sections Undocumented Features in the ODF Specification and Undocumented Features in the Open XML Specification which relate to Word Processing documents?

In the case of ODF, they identify three places where Open Office does things differently to the ODF spec. Now of course here they are saying that the standard should support the product, and I don’t know where these issues fit in with ODF development. But nevertheless people do take Open Office to be some kind of primary source or even reference implementation for ODF. But three little issues sounds like something to be proud of.

In the case of Open XML, we find ….(drum roll)…four issues. Two of these are editorial problems in the draft specification and two are where the draft does not quite capture all the constraints of Office 2007.

Now I don’t know that the issues lists for the converters for Spreadsheets and Presentations are complete enough to be usable in the same way. (There are five documentation issues so far for spreadsheets for Open XML and one for presentations in Open XML.) Nor do I know that these lists were systematically prepared and so include all the issues they found. Nor do I know the backgrounds and skillsets of the developers, or what external help they had in reading the various specs.

That notwithstanding, I think that the converter project is mature enough to provide some good objective evidence of the quality and completeness of the specs for the particular important issue of conversion. And my interpretation of that evidence is that the ODF and Open XML specifications are both surprisingly good: I would have expected dozens of issues.

Strictly, what this issues list means is that Open Office and Office 2007 do generate documents that follow their respective standards, and that is evidence of the completeness of the standards for working with their major existing implementations.

Rick Jelliffe

We are used to thinking in terms of formats as rivals (either X or Y) or adjuncts (X for this use; and Y for that use) but what if there is an entirely different way of approaching office file formats? What if we learn from the success of Apple’s Fat Binary system and progress to a ZIP-based system where different standards can co-exist and share media files in the same package?

For background on this, see my 1999 paper How to Promote Organic Plurality on the WWW, which introduces three ideas (data kidnap, workflow kidnap and data lockout) as more useful specific concepts than the usual data lock-in (which, being an over-broad concept, tends to generate over-broad solutions). The basic idea is that technology needs to be layered so that each layer can allow a multiplicity of alternatives, rather than monolithic solutions. Think of the Internet stack and all the RFCs giving alternatives. The paper was made as part of thinking about XML Schemas at the time of its development, and I think our later experience with XSD has entirely borne out its conclusions. XSD has succeeded where it is modular (e.g. data types) and had trouble where it is monolithic.

Exploiting ZIP

ODF, Open XML and Java Web Applications (.WAR files) all are based on ZIP archives. Change the extension to ZIP, and you can poke about inside with just COTS ZIP utilities. So at the moment, it seems that we can actually, say, save a simple word processing document as HTML, ODF and Open XML, then merge them into a single file after paying attention to various paths and metadata issues, and removing duplicated media files. If we give that file the .odt extension, it will open using ODF; .docx, it will open as Open XML; .war, it can be installed as a servlet serving HTML pages.

Does this give us a file that is three times the size of the single file? Probably not, because not only will media files (which can easily dwarf the text components) be shared, but there will also be fewer unique strings to compress: a unique string in the original document’s text will appear in all three formats. Also, the modern formats allow embedded XML for forms or spreadsheet data sources, which can also be shared. So I don’t see it as impractical from the size point of view.
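To get a feel for how much could be shared, here is a rough Python sketch that hashes every member of several ZIP packages and reports payloads that turn up more than once. It only finds candidates for sharing; it does not rewrite any references or manifests, which a real merge would need. The function name `shared_payloads` and the package labels are made up for illustration; only the standard library `zipfile`, `hashlib` and `io` modules are used.

```python
import hashlib
import io
import zipfile
from collections import defaultdict
from typing import Dict, List, Tuple


def shared_payloads(packages: Dict[str, bytes]) -> Dict[str, List[Tuple[str, str]]]:
    """Map a SHA-256 digest to every (package label, member name) holding
    that exact content, keeping only content that appears more than once."""
    seen: Dict[str, List[Tuple[str, str]]] = defaultdict(list)
    for label, data in packages.items():
        with zipfile.ZipFile(io.BytesIO(data)) as zf:
            for info in zf.infolist():
                if info.is_dir():
                    continue  # directory entries carry no payload
                digest = hashlib.sha256(zf.read(info)).hexdigest()
                seen[digest].append((label, info.filename))
    return {d: places for d, places in seen.items() if len(places) > 1}
```

Feeding it the bytes of the same document saved as .odt and .docx should list any image that both packages store byte-for-byte identically, even though each format keeps it under a different path.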

The kinds of adjustments that would need to be made include adding the appropriate MIME-based content type information to the ODF manifest metadata and the Open Packaging Convention content types file. But the result would be a single file that could, with the appropriate extension change, be read by any of the other systems.

And, more interestingly, if we made up a new extension (.superXML?) an application could open up the file and then select which format it was happiest with. For example, an application might only cope with HTML and ODF, and so would choose one of those. Or an application might decide to open the file using whatever was the native format of the application that created the document: for example, if the document was created by Open Office, the receiving application might decide that ODF matched the feature set of Open Office more than Open XML does, and so import using that.
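A small Python sketch of that selection step, assuming the formats present can be recognised from well-known marker files: META-INF/manifest.xml for an ODF package, [Content_Types].xml for an OPC (Open XML) package, and any .html member for HTML. This is a heuristic for illustration, not something either standard defines for a combined package, and the names `available_formats` and `choose_format` are hypothetical.

```python
from typing import Iterable, List, Optional, Set

# Marker files used as a heuristic for spotting each format in the archive.
MARKERS = {
    "odf": "META-INF/manifest.xml",
    "ooxml": "[Content_Types].xml",
}


def available_formats(member_names: Iterable[str]) -> Set[str]:
    """Guess which formats a combined package contains from its member names."""
    names = set(member_names)
    found = {fmt for fmt, marker in MARKERS.items() if marker in names}
    if any(name.lower().endswith(".html") for name in names):
        found.add("html")
    return found


def choose_format(member_names: Iterable[str], preferences: List[str]) -> Optional[str]:
    """Return the first format in the application's preference order that
    the package actually contains, or None if none of them is present."""
    found = available_formats(member_names)
    for fmt in preferences:
        if fmt in found:
            return fmt
    return None
```

An application that only copes with HTML and ODF would pass `["html", "odf"]`, while one that wants to follow the creating application’s native format would put that format first in its list.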

A different road to harmonization

With this kind of framework in place, the road to harmonization becomes clear, because harmonization doesn’t become a question of “Which format do we choose as the round hole, and which formats have to become square pegs?” but becomes a question of “What modules do they have in common now? What modules can be split out of one to help the other?”. So by supporting plurality and modularity, we can actually find out the points of similarity and quarantine differences to ever-smaller alternative fragments.

Let’s give a practical example. Font matching is the feature where an application opens up a file and, upon discovering that some font needed by the document is missing, tries to find a near match. Various mechanisms can be used, but the most basic matching criterion is of course whether the font contains the characters for the language (I mean “script” of course) being used. It is no good using a Russian font for a Thai document.

ODF betrays its pre-Unicode and UNIX roots here, and uses a non-Unicode based system where it uses the locale character set of the original document (or of the font) and matches that. So it will say “This font has an ISO 8859-1 mapping table, therefore we will look for another font with an ISO 8859-1 mapping table.” This is pretty crappy in theory, actually, because Unicode extends so many of the locale-based character sets, but ultimately OK, because these things are only optional hints and the more hints the better.

Open XML uses the more modern Open Fonts standard ISO/IEC 14496-22 for font mapping, which allows mapping both by Unicode block and by major script family. Open Fonts comes from Open Type, which in turn is a container for including both Adobe PostScript fonts and Microsoft TrueType fonts: in fact, it is another example of this kind of containment mechanism.

Interestingly, it is this use of IS 14496-22 that shows one of the problems with ISO DIS 29500 (i.e. Open XML). You may remember that anti-Open XML people have raised the issue of bitmasks in Open XML, with the lunatic fringe going as far as saying that Open XML was riddled with bitmasks and that these were impossible to validate or manipulate in XSLT; and me then rushing to Schematron’s defence and showing how it was entirely possible, if not trivial, in Schematron and XSLT. Well, the main place that bitmasks are found in Open XML is actually the font/sig element that is used for font matching, and the bitmasks are the values specified by ISO 14496-22. As far as I can see, there is no reason for an application to tease apart the bitmask numbers, and certainly not to add 96 separate attributes for something that humans will not be interested in, because the numbers are just magic numbers that come from the original font and are matched against the prospective substitute fonts. In the same way, you don’t want to have separate values for R, G and B, because having combined RGB values is more convenient for manipulation. (So the problem with DIS 29500 is not that it uses bitmasks in this element, but that it only gives a vague reference to the standard that the bitmasks are based on, when it should have a clear normative reference. I don’t think anyone else has picked up the implicit ISO Open Font reference, hooray for me. Yet again, this requires just an editorial fix rather than a technical fix.)
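To illustrate why the bitmasks never need teasing apart, here is a hedged Python sketch of matching a font signature against candidate substitutes using whole-word bitwise AND. The matching rule (a candidate is acceptable if it has every bit the original has) is an assumption made for this example, not a rule quoted from ISO/IEC 14496-22 or DIS 29500, and the hex values and font names in the usage note are invented.

```python
from typing import Dict, Optional, Sequence, Tuple

Signature = Tuple[int, ...]


def parse_sig(hex_words: Sequence[str]) -> Signature:
    """Turn hex strings such as '000000FF' into a tuple of integers."""
    return tuple(int(word, 16) for word in hex_words)


def covers(candidate: Signature, required: Signature) -> bool:
    """True if the candidate has every bit that the required signature has."""
    if len(candidate) != len(required):
        raise ValueError("signatures must have the same number of words")
    return all((c & r) == r for c, r in zip(candidate, required))


def best_substitute(required: Signature, candidates: Dict[str, Signature]) -> Optional[str]:
    """Prefer a candidate that covers every required bit; otherwise fall
    back to the candidate sharing the most bits with the original."""
    for name, sig in candidates.items():
        if covers(sig, required):
            return name

    def overlap(name: str) -> int:
        return sum(bin(c & r).count("1") for c, r in zip(candidates[name], required))

    return max(candidates, key=overlap, default=None)
```

For instance, with a required signature of `parse_sig(["00000003", "00000000"])`, a hypothetical font whose first word is `000000FF` covers it, while one whose first word is `00000001` does not. At no point does the code need to know what any individual bit means.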

So what should be done? Should Unicode people say to ODF “You need to replace your antique system with something better”? Should Linux people say to Open XML “You need to replace your cross-platform system with something that handles Linux-only legacy fonts better”?

With a common system based on plurality, we can say “Well, why not modularize both out as separate resources in the ZIP archive, so that each application has more resources to use?” Now Open XML is probably ahead of ODF here, because it tries to split up the document into many different files in the archive and is already divided into multiple namespaces. So it would be great if ODF adopted the same kind of modularity too. Then an ODF application could also, if it chooses, look in the Open XML font tables for better information. And a Linux system that is using the Open XML format could include information that will help with legacy documents on Linux.

Practical issues that need to be addressed to get to plurality

The overarching idea is not so much that each document will have a grab-bag selection of different formats, but that each document will have at least one complete version in a standard format *plus* any alternative and additional information from other formats that the application can provide. So that a receiving application can choose the best modules it can, and so that information interchange becomes less dependent on limits of one particular standard.

I have mentioned before that no serious application suite can afford to ignore any common standard format. So in a couple of years’ time I am sure we will see Open XML and ODF import/export as part of the base packages for all the suites. Indeed, governments and power buyers should demand this from vendors for the distros they buy. (I suspect this will indeed become a purchasing requirement: see the European Open Documents Exchange Formats workshop in Feb 2007 where (p.12) “Representatives from public administrations requested over and over again that industry take steps to overcome interoperability problems between ISO 26300 (ODF) and Office Open XML and to implement both standards in their products.” The writing is on the wall.) But my idea goes beyond mere transformation to a model of enabling selective augmentation.

Now even though it seems we can probably make an archive with these different formats now, the difficulty is with writing them. Applications currently won’t update the format parts they don’t understand of course. So if you update an ODF document that also has an embedded Open XML document, the Open XML document will be out-of-sync. This is an area for standards, and in particular an area for the maintenance of OPC and ODF: should the extra parts be removed and how do we signal it in markup?

Adopting the multi-format approach then has feedthrough for other formats, such as PDF. PDF would need to be unshelled, so that the various pages and resources were exposed as different files in the ZIP archive.

Of course, adopting this approach would not preclude different formats from cross-pollinating and converging where possible. But sometimes there are differences that cannot be reconciled, and supporting plurality means that no solutions are gratuitously ruled out by bureaucratic dictates for single standards. (Obviously I think the “Highlander” principle expressed here p.6 is in danger of being terribly simplistic and impractical, unless the one-true-format itself allows plurality at subsequent layers.)

Already Open XML has some capability for allowing alternative chunks within a file, and ODF of course allows foreign elements so you could poke some alternative or extra information in there. But my view is that this is something that needs to be engineered at the standards level, with vendor buy-in, to push competition between standards bodies and their stakeholders one level up the protocol stack. Every level is a victory, and I think this is a race where we need to win one step at a time. The hare and the tortoise.

What steps might this involve? Well, for a start I think that most of Open Packaging Conventions (OPC) should be adopted. There could be an on-ramp made for it, to allow current ISO ODF documents to fit in. The big difference is that ODF uses direct references to entities in the package, while Open XML uses OPC, which uses indirect references. So the idea would be an identifier resolution system where ODF applications first treat the reference as a local relative URL, then if that fails look it up in the OPC package, then if that fails treat it as an external URL (of course, delimiters will provide extra hints to speed this up). Furthermore, Open Office rewrites the identifiers used to be GUIDs rather than human names, so it would be nice to mirror SGML’s PUBLIC/SYSTEM identifier distinction here; SGML got it right.
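The three-step fallback could be sketched in Python roughly as follows. The `members` set stands in for the part names in the ZIP package and the `relationships` dictionary is a simplified stand-in for OPC relationship lookups (id to target); both, along with the function name `resolve`, are assumptions for illustration rather than anything defined by ODF or OPC. The final "unresolved" outcome for references without a URL scheme is also my addition.

```python
from typing import Dict, Set, Tuple
from urllib.parse import urlparse


def resolve(ref: str, members: Set[str], relationships: Dict[str, str]) -> Tuple[str, str]:
    """Resolve a reference by trying, in order: a local path inside the
    package, an indirect OPC-style relationship, then an external URL."""
    # 1. Direct (ODF-style) reference to a part inside the package.
    if ref in members:
        return ("local", ref)
    # 2. Indirect (OPC-style) reference looked up through relationships.
    if ref in relationships:
        return ("opc", relationships[ref])
    # 3. Anything with a URL scheme is treated as external.
    if urlparse(ref).scheme:
        return ("external", ref)
    return ("unresolved", ref)
```

With `members = {"Pictures/logo.png"}` and `relationships = {"rId1": "word/media/image1.png"}` (hypothetical values), `"Pictures/logo.png"` resolves locally, `"rId1"` resolves through the relationship table, and `"http://example.org/logo.png"` falls through to the external case.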

But I don’t think these issues are insurmountable. The question we need to ask is not How do we enforce monolithic technologies? but How do we take the sting out of multiplicity? It is not a question of trying to have the cake and eating it too, but rather that it is foolish and unworkable to merely throw half the cake out. Oh, that is getting far too aphoristical.

The pluralistic approach of this .superXML format also makes it easy to address issues such as equations, bibliographic citations and metadata where the needs of laymen are entirely different from the needs of professionals. The primary standard formats can adopt simple, layman-oriented structures (Dublin Core, etc) while encouraging specialist formats with higher quality requirements.

Rick Jelliffe

I’ve just returned from a super-interesting week in Wellington, the capital of New Zealand. The government there has a very forward-thinking e-government program at the State Services Commission (SSC) that has to finesse its way around a severely fragmented organizational structure (they made several friendly digs at Australia’s three levels of government, but then someone mentioned they had 45 different government departments, or did I hear wrong? :-) and the emergence of powerful blocks of topical industry standards (Health, Education) which were not designed with inter-department data sharing in mind.

The e-GIF program has quite a few topics, including an advanced activity in authentication. I think Colin Wallis’ RFC 4350 A Uniform Resource Name (URN) Formal Namespace for the New Zealand Government gives a pretty good indication that they are asking the practical questions about providing the mortar between international standards and locally-useful standards.

They are putting in various services as part of a second generation of pilots. Of most interest to me is that their first set of pilots indicated to them that there was a real practical problem with validation, with XSD validation being too difficult to set up, too difficult to interpret, not powerful enough, and not flexible enough to cope with the variants and derivations of real life schemas. So they are putting in a Schematron online validator, among several other measures. SSC’s Liz Kolster seems to be approaching it with a pretty hard nose.

My main task was to address a series of meetings on the topic of XML Governance: getting everyone’s mind on the same page was the goal. The first meeting was internal for the immediate staff. The second was for government department representatives (“stakeholders”). The third was for commercial integrators, information architects and so on (“vendors”), and was moderated by a really good professional facilitator, who used something called World Cafe that worked very well and efficiently, I thought.

What surprised me was that vendors really wanted more direction from government: more use of XML, more standardization, more best practices, more forums, more dialog, and so on. Terribly positive and encouraging. It seems to me that one reason why XML projects tend to succeed is that people adopt XML because they are asking the question “How can I make it easier for the other people in this ecosystem?”, and that attitude (even more than the technology) is the big win. Where projects are conducted where the ease of implementation of the other blokes in the project is ignored, you have a disaster waiting to happen.

As I was in town at SSC’s behest, on Thursday night I was invited by the New Zealand Open Source Society to speak on Wikigate and the Open XML process. They were so welcoming, sweet and smart, not to mention alcoholic, that I had a really enjoyable time. Plus I met up with an old friend from SGML days (Hi Richard) which was a real bonus. Andrew MacMillan has blogged about the talk Chinese Whimpers and kindly added my pic to my Wikipedia entry. (The reference to “bribery” is a joke based on comments in the talk btw.)

The talk was a run-through for a keynote I’ll be giving next week at the Open Publish 2007 conference in Sydney, which will be called “The True Saga of Wikigate”. That should be fun.

M. David Peterson

So I’m sitting here at the O’Reilly booth @ OSCON with James Turner of The Watering Hole fame and have convinced him to allow me to post a sneak preview of the strip set for publication in three weeks. So without further ado I present the first pane of “A Little Knowledge”,

mail.png

If you’re at OSCON, stop by the O’Reilly booth to see the rest. Otherwise, see ya in three weeks ;-)

Thanks for the preview, James!

M. David Peterson

Sarah McLachlan - World On Fire


Update: I should point out the above video is several years old, but I was reminded of it while visiting Swivel recently and felt like this was just the kind of information that needed to be broadcast on a more regular basis.

CHARITY / FOR / AMOUNT / TOTAL DONATION

Carolina for Kibera
  • 12 room clinic and land deeds: $22,500
  • Medicine for 5000 people for 6 months in Nairobi Kenya: $7,500
  Total donation: $30,000

Comic Relief
  • Running street children’s hospital in India for a year: $11,050
  • Feeding 10 street children in Calcutta 3 meals daily for 1 year: $3,000
  • Schooling for 100 street children in Tanzania: $2,500
  • Education for 200 students in Ethiopia: $400
  Total donation: $16,950

CARE USA
  • Building of 6 wells in S.E Asia, Latin America & Africa: $10,200
  • Helping 100 widows to develop income generating activities in Afghanistan: $5,400
  • Sending 145 girls to school for one year in Afghanistan: $5,000
  • Equipping 10 classrooms in Afghanistan: $480
  • Training 10 teachers in Afghanistan: $400
  Total donation: $21,480

DORCAS
  • Total running costs of orphanage in South Africa: $16,500
  • Improving the lives of 10 elderly people in Eastern Europe: $3,500
  Total donation: $20,000

Engineers Without Borders - Canada
  • To purchase and implement a Multi-Function Platform in Ghana: $15,000
  • Christy Yaa: scholarships: $1,000
  • Nana Yaa: scholarships: $1,000
  Total donation: $17,000

Help the Aged
  • Mobile Medical Unit (MMU) vehicle providing medical treatments: $15,000
  Total donation: $15,000

Film Aid
  • Entertainment & escapism for refugees: $9,500
  Total donation: $9,500

War Child
  • 70 former child soldiers to receive schooling & psychosocial support: $3,500
  • 7 young people in Sierra Leone to receive job training: $1,500
  • Education, shelter & food for orphans in Ethiopia: $500
  Total donation: $5,500

Heifer International
  • For: 1 heifer, 2 goats, 1 buffalo; 2 sheep, 4 goats, 2 llamas and 1 heifer; a pig; chicks; ducks
  • Amounts: $1,000; $1,500; $120; $20; $20; $20
  Total donation: $2,680

ITDG
  • Scheme which would allow 300 families to remove smoke from their homes: $1,925
  • 10 smoke hoods: $250
  • 5 bicycle ambulances: $1,300
  • Nuts & bolts to secure houses of monsoon victims: $500
  • Sudanese irrigation: $1,025
  Total donation: $5,000

Action Aid
  • To aid and implement programs in Khlaipathar village, Orissa, India to encourage families to be able to stay together: $5,000
  • 5000lbs potato seeds for planting vegetable gardens: $160
  Total donation: $5,160

TOTAL: $148,270

So what’s this have to do with XML?

XML frees information. The above information is free.

Either that, or not a damn thing. And that’s okay.

On a related note, I sure wish there were more rock stars on this planet like Sarah McLachlan, don’t you?

Rick Jelliffe

I’ve been making some presentations this week on XML Governance. The particular aspect of governance I have in mind is the promotion of evidence-based management, with governance involving higher-level management asking lower-level management “What objective evidence do you have that you are taking care of issue X?” The trouble is that it is very difficult to come up with a good list of Xs.

So the approach I am suggesting is that as well as the top-down approach, there also can be a bottom-up approach where you invert the question so that we ask “Given that we have these technical artifacts (e.g. XML), what information can we extract from them and what issues can it be used as evidence for?” In this way, we come up with a list of the issues for which there can be objective evidence, and management can cherry pick the issues which are useful.

One case-history I gave was of a markup operation that installed a context-based full-text indexing system. They started using it in an entirely different way to the way we had expected: they went through the list of words that had been marked up as keywords and then looked for every instance of each word where it wasn’t marked up as a keyword. This gave them much better consistency.

But taking my bottom-up suggestion, it also can be used as a governance input: the technologists first report “It is possible to get evidence (a measure or metric) of the words that have not been marked up correctly” which then allows the managers to trace to or add the business requirement “All keywords should be marked up” which in turn leads us to the governance requirement “How do you prove that this business requirement that all keywords should be marked up is being met?”
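To make that kind of evidence concrete, here is a minimal sketch in Python. It is not the indexing system from the case history: the element name `keyword`, the function name and the single-word matching are all assumptions made for illustration. It collects every word marked up as a keyword and then counts the places where the same word appears outside keyword markup.

```python
import re
import xml.etree.ElementTree as ET


def unmarked_keyword_hits(xml_text, keyword_tag="keyword"):
    """Count occurrences of known keywords that are NOT marked up.

    Assumes (hypothetically) that keywords are single words wrapped in
    <keyword> elements; matching is case-insensitive.
    """
    root = ET.fromstring(xml_text)

    # Pass 1: every word that has been marked up as a keyword somewhere.
    keywords = {(el.text or "").strip().lower() for el in root.iter(keyword_tag)}
    keywords.discard("")

    counts = {}

    def scan(text):
        # Tally keyword words found in a run of ordinary text.
        for word in re.findall(r"\w+", text or ""):
            w = word.lower()
            if w in keywords:
                counts[w] = counts.get(w, 0) + 1

    def walk(el):
        # Pass 2: scan text outside keyword markup only.
        if el.tag != keyword_tag:
            scan(el.text)
            for child in el:
                walk(child)
        # Tail text follows the element's end tag, so it is always "outside".
        scan(el.tail)

    walk(root)
    return counts
```

The resulting counts are the metric: a manager can ask for the number to trend toward zero without needing to know anything about the markup itself.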

Things like the Extensibility Manifesto may help formulate useful issues for governance, but it is top-down. And the trouble with top-down is that sometimes tracing from issue to evidence peters out or stalls on the way down: a fine sounding abstract requirement that is unmeasurable. Now this is, of course, the basis of many of my company’s tools and my work on validation and metrics: concentrating on the possibilities that XML allows for evidence gathering, and then trying to progress this upstream to management questions and the governance issues.

Roger Costello has been making a series of “best practice” papers on Schematron over at XML-DEV recently. While these are very important to think about and to gather intelligence about, in a sense best practices represent a kind of middle-out or context-free approach, which I think can be criticized because abstract statements of principle move away from the worlds of evidence (at the low level) and governance (at the high level). For example, at the moment there is a discussion on whether it is better to embed Schematron schemas in XSD schemas or to have separate documents: a good question. (I have posted to say “well what about XSD types inside Schematron rules too?”)

But my main comment was that whether a constraint is bundled with the grammar or kept independently should perhaps follow organizational lines: database people can look after static kinds of storage requirements, and analyst people can look after the business-rules checking. It may be that dividing constraints between schema documents should be based on who is looking after them. Now this would correspond to a management requirement “A separation of concerns should be implemented to reduce the intra-organization dependencies on data and applications.” And the relevant governance question would be “How do you prove that you have a separation of concerns in your data and schemas?” And the evidence would be to trace from each constraint in a schema to the driver for that requirement, and to show that a particular schema only has traces to requirements set by a single organizational entity.
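As a rough sketch of what that evidence could look like, suppose each constraint has already been traced to the organizational entity that set its requirement. The tuple layout, the schema names and the function name below are hypothetical, invented only to illustrate the check; the function reports any schema whose constraints trace to more than one entity.

```python
from collections import defaultdict


def schemas_with_mixed_owners(traces):
    """Report schemas whose constraints trace to more than one owner.

    `traces` is an iterable of (schema, constraint_id, owner) tuples,
    a hypothetical record of which organizational entity set the
    requirement behind each constraint.
    """
    owners = defaultdict(set)
    for schema, _constraint_id, owner in traces:
        owners[schema].add(owner)
    # A schema passes the separation-of-concerns check only if all of
    # its constraints trace back to a single organizational entity.
    return {s: sorted(o) for s, o in owners.items() if len(o) > 1}
```

An empty result is the evidence that the separation of concerns holds; anything else names the schemas to split or reassign.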

So in summary the bottom-up approach starts with a technical artifact (e.g. XML), then finds out what it can evidence (“What can I measure?”), then extracts potential management requirements which could be analyzed using that evidence, which then suggests possible governance questions. The bottom-up approach never descends into airy-fairy handwaving or impossible-to-implement abstractions: it seems a practical approach. The result will be partial: if you start with XML as the artifact you will get “XML Governance” issues raised. And the issue of “What can I measure?” leads directly into the worlds of complex validation (e.g. Schematron) and the need to develop good metrics in general.

Rick Jelliffe

DonationCoder.com has a very good Word Processor Review by Zaine Ridling, divided into three tiers: Major Word Processors (Open Office, Office 2007, Word Perfect), Second Tier Word Processors (AbiWord, EIOffice, etc.) and Online Word Processors (Google Docs, etc.) that is well worth reading for an idea of the capabilities of each. The final Pro and Con tables are handy.

The predictable quibble I have is that the reviewer apparently believes that application features are disconnected from save formats. So while he opens with “If ever a maxim fit, one size does not fit all applies accurately to word processors” and diligently mentions the different feature sets of the different applications, these different features never need to save any information that ODF cannot handle, it seems.

I think the best resolution is that if a document does use some features that a format cannot handle, the application should alert the user, who can then choose the appropriate format. For Office 2010, for example, a user could set ODF to be the default default, and OpenXML can be the fidelity default. I think that is one good way to reconcile the basic ODF-wasn’t-designed-for-our-feature-set issue with the we-want-ODF-as-our-default-format issue. It is better than panicking “It is impossible to use ODF because it doesn’t support all these things” (which is clearly true for many, but hopefully not for most Office documents, presumably following one of the standard statistical patterns) on the one hand, or chanting “ODF gives you everything you need” on the other hand (which similarly is hopefully true for most, but certainly not all Office documents).

It would be interesting to include the word processors from Adobe (FrameMaker), IBM and Lotus as well. And it would be interesting to include validation reports where the XML-in-ZIP save formats were validated against their standard schemas, since validity is a great tool for determining whether an application is doing the right thing.
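A first step toward such a report can be sketched with nothing but the Python standard library. Note the limits: this only checks that each XML part inside the ZIP package is well-formed, which is weaker than validating against the standard schemas (that would need a validating parser and the schema files). The function name and the choice of `.xml` and `.rels` as the part extensions to check are assumptions for illustration.

```python
import zipfile
import xml.etree.ElementTree as ET


def malformed_xml_parts(package):
    """Return the names of XML parts in a ZIP package that fail to parse.

    `package` is a file path or a binary file-like object. This is a
    well-formedness check only, not schema validation.
    """
    bad = []
    with zipfile.ZipFile(package) as zf:
        for name in zf.namelist():
            # Assumed convention: XML parts end in .xml or .rels.
            if not name.lower().endswith((".xml", ".rels")):
                continue
            try:
                ET.fromstring(zf.read(name))
            except ET.ParseError:
                bad.append(name)
    return bad
```

Run over the files each application saves, even this crude check separates applications that write broken packages from those that at least produce parseable XML; schema validity would then be the next column in the report.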

Rick Jelliffe

In the spirit of truth and reconciliation, and to calm the situation down, I thought I’d keep a running list of new accusations on the Web of bribery and corrupt procedures in the document standards world, for the instruction of readers. Of course, follow the links for the context behind the money quotes.

I’ll add notable or novel examples I find for the next week, hopefully none! Sorry, no comments on this one; the quotes and links can speak for themselves. And I want to plainly emphasize that there is a big difference between saying that a process is bad (see this interesting Karl Best blog and the interesting post by Marbux referenced below) and saying that some person or group is corrupt.

Here is a little map with the countries mentioned in red.





(Added) From NOOOXML.ORG

After the swiss cheese, you can taste the smell of the bitter cacao from Cote d’Ivoire. The Chairman of the Technical Committee in Cote d’Ivoire is Roger Kouadio, from the company Inova Formations. I let you guess from which vendor he is a business partner.

which leads to

Microsoft paying National Bodies to suddenly become P?


(Added) From “Anonymous Coward” on Slashdot

Just yesterday I was sitting in the relevant meeting of SNV/UK14 (http://www.snv.ch/), that decides how Switzerland will vote. The chairman (Hans-Rudolf Thomann) explained the following rules:

- we are here to create standards, not to reject them

- if we reach consensus (>=75%) to vote for Microsoft, we will vote for Microsoft

- if we only reach a majority (>=50%) to vote for Microsoft, we will vote for Microsoft

- if we reach a majority to vote against Microsoft, we will vote for Microsoft

- if we reach consensus to vote against Microsoft, we will abstain

The present spin doctors of Microsoft and ECMA managed to convince Mr. Thomann to reject every serious technical and general concern we had regarding OOMXL by pointing to compatibility reasons. At the end we had a majority _against_ Microsoft but which (giving the unfair rules) results in a Swiss vote _for_ Microsoft. “


(Added) From NoOOXML website from FFII

Brazil also in the way to be hijacked by Microsoft

ISO and ABNT are being hijacked by Microsoft and its partners.


(Added) From NoOOXML website from FFII

A generous contributor gave us the list of members of the Technical Committee in Colombia. The vendor is also invading Colombia.


(Added) From NoOOXML website from FFII

Romania votes “yes” for OOXML

Our correspondent writes, “…our politicians here are corrupt as hell, the general view around is that they are “yes-men” when it comes to lining their pockets or lack of knowledge of the technology…


(Added) From NoOOXML website from FFII

Microsoft 0wns Azerbaijan

Our correspondent from the Caucasus reports on his efforts to shed some light on the OOXML vote in Azerbaijan, a P-Member. Seems Microsoft ‘arranged’ the vote already


(Added) From NoOOXML website from FFII

Rumors of Microsoft blackmail in New Zealand?

Some rumors are saying that Microsoft’s representative has blackmailed the Government and the Standardisation body in New Zealand with something like “If you don’t say Yes to OOXML, we will revise our license terms with the Government for all Microsoft products you use. There will be sanctions”. We are looking for some people from New Zealand to investigate this.


(Added) From NoOOXML website from FFII

Microsoft recruiting puppets in Australia

Microsoft is looking for puppets to represent themselves in the Australian Standardization Body: “We need more Aussie companies supporting Office Open XML”.


(Added) From Roy Schestowitz

Has OOXML ‘Funny Business’ Already Reached Australia and India?


(Added) From Roy Schestowitz

We have seen it in Italy, we have seen it in Portugal, we have seen it in Spain, we have seen it in Denmark, we have seen it in Colombia, and we have seen it in the United States. Microsoft has no shame about abusing any system that is open to abuse or where abuse will go unnoticed. Microsoft even admitted this. The latest report comes from Switzerland (although its source and credibility cannot yet be verified).

Is China next?


(Added) From Roy Schestowitz at Boycott Novell

The Microsoft OOXML Corruption Train hits Denmark


From Charles-H Schutz’s blog The Price is Not Right

“Hungary voted with a yes. The price was right for whoever was in charge”…


From Sam Hiser’s blog Sacrebleu

“The ISO committee in France voting on OOXML is bent, too!”


(Added) From “Stephane Rodriguez” comment on Sam Hiser’s blog above

“The guy from CleverAge has posted the list of 14 voters in AFNOR. 5 of them are Microsoft bitches. 3 of them I am not sure

then

WYGWAM : Microsoft bitch. One of the employees has been bribed by Doug Mahugh for months (in what is now a classic…)


Paoli Pires’ blog OOXML vs ODF - The war is still ongoing in Portugal on the conduct of the standards committee

“Apparently this bunch-of-impartial-tech-companies has rejected a proposal from Sun Microsystems (Portugal) to become an active part, arguing that “there are no empty chairs in the room used for the meetings”.

UPDATE: More material at Groklaw, including some notes by an open source representative, Rui Seabra.

UPDATE #2: “The person that answered “no room left” is from IIMF (IT Institute of the Ministry of Finance), not from Microsoft.” says translator

UPDATE #3: MS attendee Stephen McGibbon’s Much Ado about Nothing has some responses to other parts of the Groklaw story:

My recollection was that Rui himself was amusingly irreverent and was chastised at one point by one of the other members for lack of respect for others on the committee.

UPDATE #4: From IBM’s Bob Sutor

In spite of various communications, we are still locked out and will not be allowed to participate.

(Watch this space: I note that Sutor does not state the reason given by the Portugal NB committee to exclude IBM, but it seems to be that they have adopted a jury-style system with a fixed number of committee members.)

(Added) From Zaine Ridling in a comment to the same post

Since the US tech press isn’t writing about this (nor about the weird situation in Chile), then I guess it will be filed under “Who cares?”

UPDATE #5a: From several “Rui Seabra” comments on Ed Brill’s blog (which I would summarize as “The problem is that IBM and Sun were too late for seats on the committee, not chairs in the room”)

Ed: «According to Groklaw, he was Microsoft’s representative at a standards committee meeting in Portugal…where the committee chair, a Microsoft employee, »
This is false, of course. It was the National Body sectorial organization who refused their presence.”

UPDATE #5b: From several “Rui Seabra” comments on Ed Brill’s blog

0. There was not a due and fair process of invitation. When the real rules were known, people had less than 48h to propose members to the TC.

1. The National Body sectorial organization (NBso) refused the entry to IBM and SUN *before* the meeting had occurred.

2. The NBso claimed that a) there was a maximum of 20, and b) representativity was achieved.

3. There was an average of 24 people seated in the room. There were at maximum 25 people in the room. The room could handle almost 30 people seated. The NBso had an auditorium and chose not to use it. Microsoft occupied 3 seats. NBso occupied 2 seats. ASSOFT (BSA alike) occupied 2 seats. Microsoft business partners occupied a few seats more (at least 5, maybe more).

4. Some people more appeared to the TC meeting who thought they had been accepted (invitation process was broken). They got inside because some TC members (noticeably not Microsoft nor its business partners) invited them to stay as experts.

UPDATE #5c: From several “Rui Seabra” comments on Ed Brill’s blog

Nathan: «@12 - RS claims that the issue wasn’t that Sun and IBM were denied space on the committee. It’s that they WEREN’T ALLOWED ENTRANCE INTO THE ROOM. “Applied after the deadline” is not the same as “showed up at a particular time at the meeting place.”?»

Your understanding is wrong, there, Nathan. They were denied space on the committee.


(Added) From the blog post Voters on OOXML Up for ‘Hire’ in Italy by Roy Schestowitz (who says he is a maintainer of Groklaw’s News Picks)

“It’s not just Britain and it isn’t just Portugal, either. Watch the following observation which comes from Italy. Voting on OOXML seems like a rather iffy business, not just in the United States.


From Groklaw

It seems there may have been more games played by Microsoft in the OOXML saga, or at the very least some confusion spread, and this time our story comes from Spain…


From Groklaw on the standards process in South Africa (he is MS’ Jason Matusow):

Normally, bodies do follow the recommendations of the technical committee. And why wouldn’t they? How else do you decide if an offering should be a standard? Stacking the decks to get the votes you want despite the technical concerns? I’m not sure I’ve understood him precisely, so perhaps he’ll clarify.


From “Marbux” in a comment on Rob Weir’s blog

It is the ANSI/INCITS process and procedures that are corrupt, not its TC members.

Allowing people with vested financial interests to cast ballots on federal government decisions also violates federal conflict of interest laws.


From “Anonymous” on the same comments list


How much do executive boards cost these days? Microsoft seems to have nearly unlimited pockets when it comes to bribing officials and paying off executives, so the OOXML standard is a merely a technicality.


(Added) From “anonymous” on Groklaw Newspicks comments responding to the following

A source close to the voting process speculated that Microsoft might still attempt to cripple the process bureaucratically before the vote is taken internationally in September. (James Archibald)

Yes, they have done that in Malaysia already when it was clear that OOXML would not be accepted. They had the TC4 committee suspended.


From “mcrbids” on Slashdot

But just look at that graph! The lengths that Microsoft will go to in order to prevent people from being free of the vendor lock-in… Cash is king, and Microsoft has more available cash than many countries’s GNP. How far can they corrupt the process? Probably far enough, with enough time and money, and the only holdback is the time.


(Added) From John Scholes’ blog Current OpenXML ballot is unlawful

So now that we have dealt with how to vote, we should perhaps turn a more important issue: the ballot is unlawful. … Why? Because the JTC1 Secretariat and ITTF (Information Technology Task Force) failed to follow the JTC1 Directives and ordinary principles of international law in dealing with the “contradictions” that were raised.


From “Stephane Rodriguez” on Doug Mahugh’s blog, first about hAl

If you are not being paid by Microsoft for the FUD you are throwing, then in addition to being full of shit, you are really a lame bastard.

then

I am actually incorrect in stating that hAL is one of those poor schmucks going around the internet for gratuitous comments to make. The other one is Rick Jetliffe. After a quick scan of related blogs today, I have noticed he’s in just about every comment area, with the exact same argument : OOXML cures cancer.

Whatever bribery buys…

UPDATE: I also get a rather milder mention this week from Roy Schestowitz

I’m sure you know this, but just to be 100% certain, Rick Jelliffe is consulting for Microsoft, he was paid by Microsoft to edit Wikipedia, and articles with/about him are anti-open formats.


Feel the love!

Kurt Cagle

Recently I passed both my 44th birthday and my 15th wedding anniversary, just signed my daughter up for high school and was told by my doctor that my HDL was soundly thrashing my LDL. My beard, which I’ve worn since my early twenties, is now streaked with gray (a curse of red hair, I fear), and I notice that lately the stairs seem to have mysteriously begun to grow from one trip to the next. T.S. Eliot is beginning to become … relevant … to me. All signs, perhaps, that I am no longer the young spring chicken I once was.

As I was thinking about things to write for this particular column, this realization about age began to sink in about the standard that I’ve spent the last decade writing about. A decade is a long time in computer circles, especially when you figure that there’s only been five or six of them in the whole history of computing. XML has gone from being a “standard” that perhaps a couple dozen people worldwide knew about to a pervasive technology that is so well entrenched that many people don’t really even think much about it any more. We argue about the XMLification of word processing and spreadsheet programs, we debate whether Atom or RSS 2.0 will predominate, we shake our heads at the whole notion of web services and how the dominant web services protocol was designed largely by bloggers to let people know about their websites.

In short, while XML is not exactly doddering off to the rest home, its angle-bracket knees are no longer as flexible as they used to be. If it were a person you’d expect it to be muttering about those damn JSON punks and how property taxes and inflation are eating up its standard of living. It no longer is as flashy a technology as it used to be (even as Flash has been migrating to an XML format), and more than once I’ve run into twenty-something AJAX hot-shots who declare XML so yesterday (even as they write applications that bind AJAX objects to XML structures). It’s become the establishment, though in many respects I suspect that while its glory days are behind it, XML is becoming more integrated into the fabric of computing.

To that end, I wanted to offer up an assessment of where XML itself is going. As always, this is written by a guy in a coffee-shop, so take it with the usual assortment of saline condiments:

M. David Peterson

memcache.c:45:2: warning: #warning “Working around busted-ass Linux header include problems: use FreeBSD instead”

As seen in the libmemcache 1.4.0.rc2 ./configure script. :D

Rick Jelliffe

I’d long ago told my Microsoft contact that I thought ultimately ANSI would abstain on the Open XML vote at ISO, due to an inability to achieve consensus, so I was quite surprised that they only missed out by maybe one vote in the end in the V1 technical committee. From emails from my friends on both sides of the table, it seems discussions at the V1 committee meeting became, if not acrimonious, then whatever the step beyond “robust” is. It will be interesting to see what INCITS/ANSI does to proceed, but it is delightful to see that the opponents of Open XML have started protesting that they really wanted Open XML all along!

(NOTE: I am withdrawing the rest of the blog, for now, while I think about whether it just perpetuates tit-for-tat sniping. Consequently, some of the comments no longer have their context. Apologies. The missing parts express surprise at Sun’s recent statement of support for Open XML, bring up what anti-trust means in the context of standards, point out the impracticality of adding thousands of pages of binary mapping documentation to the Open XML standard, look at the bad logic used to justify it, bring up the notion of a “poison pill”, which is a trick where an impossible-to-fulfill clause is added knowing it will cause refusal, and also think about voting procedures when there are multiple choices.)

Rick Jelliffe

This is a simple list of 10 general corrections to Open XML. There have been comments recently that pro-Open XML people are not contributing any fixes, so here are my big ticket items. They flow out of the principles that I mentioned in my blog before, and other discussions, and my distaste for non-verifiable specifications.

(If Australia decided to become a P-country and vote for ISO Open XML, these are the general corrections that I would submit to Standards Australia, apart from specific typos and unclear sentences. Whether they formed part of a “Yes with comments” or a “No with comments” wouldn’t bother me, since they all are fixable.)

It is here as PDF and some blog feeds will have it below in the extended entry. Download PDF file

Rick Jelliffe

Why would Open Source developers want to support Open XML becoming an ISO standard? Isn’t it from Microsoft, the great Satan? Isn’t it some kind of trick or trap to stop nice Open Standard ODF? Are we going to let this chance to overthrow the monopoly escape?

Now there have long been two different camps in the Open Source movement: those who think that it is important to have independent APIs and those who think that it is important to have Open Source clones of the most important proprietary APIs. This latter group is of course associated with Novell, and the Mono effort is a good example: on their history, I don’t think they have much problem with Open XML going through ISO. (Gnumeric’s Miguel de Icaza fits into this latter camp.) So this blog is more addressed at the first camp.

First, I would like to set the scene. I think the reasons for supporting Open XML at ISO become a lot clearer if we take a fairly hard-headed view of what is possible. Which is perhaps a nicer way of saying that I think some of the anti-Open XML case has been built on naively faulty assumptions about the miraculous power of ODF to disrupt Microsoft.

  • No office suite or utility can afford to ignore any important format for long. So in a year’s time, every major office suite and utility (whether open source or not) will support ODF and Open XML, whether or not Open XML becomes an ISO standard.
  • No vendor will adopt, as their default save file format, a format which does not support their particular feature set. So the only way that MS would make ODF its default file format would be if (when) it is improved to support Office’s feature set. (Support for Office’s feature set was not a design goal or activity of ODF’s development. See the second item in the minutes of the first ODF meeting for example. Or see the comment by ODF’s Gary Edwards (http://about.diigo.com/about/show?url=http%3A%2F%2Fwww.consortiuminfo.org%2Fstandardsblog%2Farticle.php%3Fstory%3D20070629070544217) that “There is no possible way anyone can claim that today’s OASIS ODF TC would welcome Microsoft and make accommodating changes to the specification!”)
  • The poor state of ODF implementation by Open Source applications means that a too-fast adoption of ODF will backfire for Open Source developers. So paradoxically, supporting mandatory ODF too soon would be the kiss of death for open source ODF: bureaucrats will test applications and, finding them lacking, be forced to buy into a new generation of closed source tools (MS Office, IBM Lotus, Word Perfect, Sun Star Office). If governments mandate ODF, it won’t exclude MS Office, in particular, and MS thinks their new GUI and features are competitive against other implementations, of course.
  • No matter what format you use, the only way to get 100% page fidelity (apart from some good-as-read-only format like PDF) is to save in the native save format and re-open the same file using exactly the same application on exactly the same operating system configured exactly the same. ODF won’t give instant interoperability in the sense of full page fidelity; I’d expect the same would be true of Open XML on different platforms too.

Putting it all together, it means that there was never a chance that Microsoft Office would or could adopt ISO ODF 1.0 as its native and default format. So the real choice that faces us is whether we want Office to generate files in a format that MS controls with very few external checks (with all respect to Ecma) or to generate files in a format that MS instigated but which has the extra checks and balances that come from being an ISO standard. ISO standardization is not an Aladdin’s cave of democratic rights, and it is not a Pandora’s box for Microsoft, but it is way better than nothing. Because that is what the anti-Open XML people would achieve: no controls on Microsoft. Under the guise of supporting ODF.

If you, like me, are in the position where you don’t use MS products in your normal work lives, then you may not feel any urgency to support Open XML, but I think we at least should not oppose it. It is a good step forward.

We often read that Microsoft is doing this as some kind of sinister ODF spoiler. However, Open XML is a path they have largely been forced to take (though obviously they will try to make the path as beneficial to them as possible) in order to fend off continuing anti-trust problems in Europe. Microsoft went down this path after a very strong hint from the European Union: in the same report that recommends that ODF be submitted to ISO, the EU’s ‘Telematics between Administrations Committee’ recommended that Microsoft should consider the merits of submitting XML formats to an international standards body of their choice, as well as improving the documentation, IPR issues and going all the way with XML.

I certainly support governments mandating that public documents should use standard formats: HTML and PDF being the primary two, and ODF after that, but also Open XML as a second source. However, having an ISO Open XML does not prevent any government from preferring ISO ODF. ISO standards are what are called “voluntary”, which means that they are not like laws where you have to adopt them. In my view, the drivers for ODF will continue unabated even after/if Open XML becomes a standard.

So, in my jaded view, ODF will not make Office go away, ISO ODF will not make Ecma Open XML go away, and ISO Open XML will not make ISO ODF go away. So I see no downside in Open XML becoming an ISO standard: it ropes Microsoft into a more open development process, it forces them to document their formats to a degree they have not been accustomed to (indeed, the most satisfactory aspect of the process at ISO has been the amount of attention and review that Open XML has been given), and it gives us in the standards movement the thing that we have been calling for for decades (see my blog last week that compared what Slashdotters were calling for in 2004 with the path that MS has taken).

In the 80s, there was a hilarious wrestler called George the Animal Steele. He was incredibly hairy, especially on his back, which was supposed to be emblematic of his sub-human state. His great flaw as a wrestler was that just as he was winning he would be distracted by the turnbuckle, often trying to eat its stuffing while his opponent recovered. I guess this is how I feel about the attempts to stymie Open XML at ISO: just as we have victory in our hands, with MS prepared to go XML and standards, along comes this distraction, ODF, which is great in its place but dumb, unworkable and counter-productive as a Microsoft buster.

M. David Peterson

Signs on the Sand: Saxon, NET and XInclude

Saxon, famous XSLT 2.0 and XQuery processor, supports XInclude since version 8.9. But in Java version only! When I first heard about it I thought “I have good XInclude implementation for .NET in Mvp.Xml library, let’s check out if Saxon on .NET works with XInclude.NET”. I did some testing only to find out that they didn’t play well together.

Turned out Saxon (or JAXP port to .NET, don’t remember) relies on somewhat rarely used in .NET XmlReader.GetAttribute(int) method (yes, accessing attribute by index), and XIncludingReader had a bug in this method.

Finally I fixed it and so XIncludingReader from recently released Mvp.Xml library v2.3 works fine with Saxon on .NET.

More goodness at the above linked post. Thanks, Oleg!

Rick Jelliffe

Some reference material and commentary for people interested in getting to grips with the process. (This is a much revised version of the original page, which was confused. I think I have it right now, but corrections are welcome and I’ll edit them back into the page.)

Open XML Material

The text of ECMA 376 can be found at

ISO Procedures

At the highest level are the ISO/IEC Directives. These have two parts, which can be downloaded from the ISO site:

  • ISO/IEC Directives, Part 1 — Procedures for the technical work (2004, 5th ed.)
  • ISO/IEC Directives, Part 2 — Rules for the structure and drafting of international standards (2004, 5th ed.)

These are the umbrella standards. ISO and IEC are two different organizations, but they share various common procedures. The branch of ISO/IEC that looks after Information Technology is called JTC1 (Joint Technical Committee 1). It has its own directives

  • ISO/IEC JTC1 Directives (2007, 5th Ed. 3rd version)

which I found at the ISO/IEC JTC1 SC34 website. This version three from March this year has a few important changes since the previous version, so it is not enough just to find the 5th edition: I have been caught up on this. Similarly, you have to be careful that the general procedures given by the ISO/IEC Directives may be implemented or supplemented in the JTC1 Directives in a different way than you might expect.

There are two procedures for fast-tracking standards. ODF used the one called PAS (Publicly Available Specification), see JTC1 Directives Section 14. Open XML is using the one called Fast-Track Processing, see JTC1 Directives Section 13. (The earlier version of this was confused on this issue.)

PAS

The main place to find information on the “PAS” Fast-Track procedure is Appendix M of the JTC1 Directives, The Transposition of Publicly Available Specifications into International Standards — A Management Guide. This gives procedures for an organization to be accredited as a submitter and the various procedures to be used. Because of the recent changes, there is still a little uncertainty as to how some of the material is to be interpreted in practice. In my opinion, the JTC1 directives cannot be interpreted in any way that goes against a plain reading of the ISO/IEC Directives, which are the controlling documents: when there are disputes about the particulars of the JTC1 directives, the first place to look is at what the ISO/IEC Directives have to say.

When a PAS is submitted, it gets submitted with an Explanatory Report from the submitter.

Criteria for PAS Voting

The JTC1 Directives give detailed criteria, noting that it is not a matter of pass/fail for each. Some are mandatory:

  • Organization-related criteria
    • Co-operative stance
    • Characteristics of the organization
    • Intellectual property rights
  • Document-related criteria
    • Completeness
    • Testability
    • Suitability
    • Availability
    • Consensus

and the JTC1 directives give specific questions in each case. National bodies should ask themselves whether these are the questions they are in fact asking, one would presume. Then come lots of other possible questions, including market acceptance.

Fast-Track

Compared to PAS, there is little explicit help on how a Fast-Track draft should be evaluated.

  • Before it is submitted, a draft can be submitted informally for comments to the appropriate committee (SC34) in JTC1 that looks after Document description and processing languages. Now Ecma did in fact approach several participants in SC34 individually for informal comments, asking “What do we need to have in the spec to make it acceptable?”, but not as far as I know through the formal “informal” channel spoken of here.
  • During the one month “administrative review”, evident contradictions with ISO standards can be found. See my What is “Contradiction” of an ISO Standard and the followup No (showstopping) contradictions in Open XML?.
  • National Bodies then get five months to prepare their votes and comments on the Draft of the specification. It is important to note, however, that section 13.1 twice describes the draft ballot as a vote on technical issues.
  • If there was not a clear result, and yes and no votes with comments (see below for the possibility that a “no” vote must have comments) then a ballot resolution meeting is scheduled. The comments from different national bodies are sent to all bodies, who are supposed to form opinions about their acceptability in order to get the consensus at the ballot resolution meeting. After a few months study time, the meeting occurs, with no particular time schedule given. (I expect it may take two weeks, and the Convenor for the meeting is going to have to be fairly tough on time-wasting and long-windedness.)

Simple No Votes thrown away?

In the JTC1 Directives annex on PAS transposition there is an interesting note at section M6.1.5 Ballot that says

Note to JTC 1: The ballot follows normal JTC 1 Fast Track rules in the case of an IS transposition, and normal ISP rules in the case of an ISP transposition. Negative votes by National Bodies have to be accompanied by comments giving the reasoning for the vote.

What is also interesting from the above note is the idea that negative votes have to be accompanied by comments. So simple “no” votes are actually not allowed. In fact, they must be disallowed! This aligns with ISO/IEC Directives Part 1 2.6.3, which says that for Enquiry Drafts, abstentions are excluded when the votes are counted, as well as negative votes not accompanied by technical reasons. This is definitely an area where the people involved in the process need to get a good grip on which procedures apply!

I am not sure that there is any reason to expect that this does not also hold for the Fast-Track procedure. (Corrections welcome)
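To make the counting rule concrete, here is a minimal Python sketch. The `Vote` record, the `countable_votes` helper and the sample ballot are all hypothetical names and data invented for this illustration; only the exclusion rule itself (abstentions and "no" votes without reasons are left out of the count) comes from the text above, and the approval thresholds are deliberately not modeled.

```python
from dataclasses import dataclass


@dataclass
class Vote:
    body: str           # national body (made-up labels below)
    position: str       # "yes", "no" or "abstain"
    comments: str = ""  # technical reasons, if any were supplied


def countable_votes(votes):
    """Keep only the votes that enter the count under the rule described above."""
    kept = []
    for vote in votes:
        if vote.position == "abstain":
            continue  # abstentions are excluded from the count
        if vote.position == "no" and not vote.comments.strip():
            continue  # a bare "no" with no reasons is excluded too
        kept.append(vote)
    return kept


# Invented ballot, purely for illustration.
ballot = [
    Vote("A", "yes"),
    Vote("B", "no", "Clause 4 conflicts with an existing standard"),
    Vote("C", "no"),
    Vote("D", "abstain"),
]
counted = countable_votes(ballot)
print([vote.body for vote in counted])
```

Whether the same filter applies to a Fast-Track ballot is exactly the open question here, so treat this as a reading aid rather than a statement of procedure.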

(Edits: Updated link to JTC1 Directives. Add paragraph on two different methods. Put PAS material in section. Remove or revise commentary on Open XML as a PAS. Added material on Fast-track. Re-order. )

Rick Jelliffe

Open standards are clearly a good thing. Hurrah for open standards, etc. Nail my hat to the ceiling!

But anyone who has been involved in community and consortium committees where there are commercial rivalries engaged knows that the thing that kills or corrupts a standard is when the spirit of mutual accommodation is overtaken by the spirit of competition. When I look over the standards that I have been to one extent or another involved with, at ISO, W3C and tangentially at IETF and OASIS, the golden rule is that the standards that come out of a nasty process have problems. The rancour during the Open XML debates does not augur well for either ODF or Open XML, in this respect, but I am an optimist.

The trouble with the ideal that people seem to have of “open standards” is the extremely pragmatic one: how do we trust the committee? Who appoints these elders? Now this is something that MS have brought up about OASIS ODF, that ODF people have brought up about ECMA TC45, and which will undoubtedly be brought up about ISO (though ever more tenuously) by one side or the other no matter what the result is at ISO, sooner or later.

I think the problem is that rather than talking “open standards” we need to be talking as much of “verifiable vendor-neutrality”, if that is the goal for our public policy. It is nice that ODF and Open XML are open standards by the academic definitions, but it does not get us to where we need to be, and legislation based on mere “open standards” tickboxing will not succeed in getting vendor-neutral formats, if that is indeed the underlying aim. A standard may be as open as the grave yet not be good enough to be the native format for an application, to bring up the current instance.

I have talked before about the need for profiles (to restrict extensible standards), and others are bringing up the natural progression from validation to test suites, but recently I am coming to believe that there has been a fundamental incorrect emphasis which self-defeats the open standards movements: the lack of scrupulous attention to the need for verifiable inclusiveness and fairness of process.

In other words (while not disagreeing that requiring “open standards” based on XML and ZIP is the best option now) the way forward for the EU and other governments is to direct and require that their application-suppliers participate in fair, mediated, format-harmonization standards processes (which is not the same thing as unification, and not the same thing as feature-leveling.) The boutique standards bodies, such as Ecma, OASIS, W3C are simply not constituted to be reliable here: they are democratic and two minor players will outvote one major one, which if done often enough will cause the major one to take off in a huff.

A company like Microsoft is famous for trying to keep effective control of its API. Some see Sun’s JCP as the same thing; it is a rational approach. So it is simply futile to imagine it is feasible that a company will give up control of an important asset to its business rivals: this is an issue that we have seen time and time again in the W3C, and is a tricky one in general, because it is not government’s role to solve this problem every time. Some issues cannot be solved neatly or optimally or instantly, because of market forces (balanced markets or unbalanced ones!).

But the issue of public and archival formats for government and agency documents is clearly one where governments have a vital interest: the customer is always right. This is why I believe governments need to look beyond the current academic definitions of “open standards” and re-frame the issue as “How do we achieve verifiably vendor-neutral standards?”

Verifiable here means that there is a check in place that the committee proceedings did not discriminate against any player. Mere quorums and absolute votes are not enough. Vendor-neutral here means that the standard does not discriminate against any realistic players, either by making basic implementation too hard or by disallowing vendor-specific features or innovations or experiments, where appropriate. The only forum that I see that is set up for this kind of thing currently is ISO, where vendors can have committee input but only national bodies ultimately vote, but there may be some other approaches possible.

Rick Jelliffe

Let’s imagine that we are transitioning into a “Document Engineering” style of architecture, so that we can model our entire business using old but not-as-outmoded-as-we-first-thought Data Flow Diagrams. At each data flow we need to ask the Exception Question: Does an exceptional document need human intervention or can it be dealt with automatically? Indeed, the expected answer to this question is probably what distinguishes the document community from the database community: the docheads would expect exceptions to be dealt with by humans who can monitor, fix and reset the production flow at all stages, while the dataheads would expect exceptions to be dealt with by automated processes, since human involvement is at the input/output periphery of systems.

Obviously, the most “exceptional” kind of document is the invalid-against-a-schema document. However, Schematron allows a much milder (or tougher, depending on how one looks at it) bar: the presence or absence of any arbitrary pattern in the document can allow it to be marked as exceptional. (Schematron not only defines valid/invalid, but it also allows complex dynamic diagnostic messages, and it also allows various flags to be set by assertions that fail.)
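For readers who have not met this style of checking, here is a rough Python imitation of the idea. It is not Schematron and uses none of its syntax: the invoice vocabulary, the `RULES` table, the flag names and the `check` function are all invented for this sketch. What it tries to show is a failed assertion producing a message in domain terms and setting a flag that a downstream step could route on.

```python
import xml.etree.ElementTree as ET

# Each rule: (context path, test function, domain-level message, flag to set).
# The rules and the invoice markup are made up purely for illustration.
RULES = [
    ("./line",
     lambda element: float(element.get("qty", "0")) > 0,
     "An invoice line must order at least one item.",
     "needs-human"),
    (".",
     lambda element: element.find("buyer") is not None,
     "An invoice must name a buyer.",
     "reject"),
]


def check(doc_text):
    """Return (messages, flags) for every assertion that fails."""
    root = ET.fromstring(doc_text)
    messages, flags = [], set()
    for path, test, message, flag in RULES:
        for element in root.findall(path):
            if not test(element):
                messages.append(message)
                flags.add(flag)
    return messages, flags


doc = '<invoice><line qty="0"/><line qty="3"/></invoice>'
messages, flags = check(doc)
for message in messages:
    print(message)
print(sorted(flags))
```

The point of the design is that the messages talk about invoices and buyers, not about elements and attributes, and the flags let a workflow decide whether a person or a service should pick the document up.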

So the Exception Question then becomes a criterion for evaluating schema or constraint languages: when exceptional documents are to be sent to humans for intervention, does schema language A provide clear enough information to be usable by those humans? Similarly, when exceptional documents are to be sent to software (services) for intervention, does schema language A provide clear enough information to be usable by that software? Looked at in those terms, grammar-based systems do not shine. Grammar-based systems excel in all-or-nothing Great-Wall-of-China exclusion uses, but then throw the users (systems and humans) at the mercy of the validator-developer for the kinds of feedback and information possible, who has of course absolutely no idea of the problem domain of the schema. XSD is perhaps a little more organized in this regard compared to the other schema languages, because it defines a specific list of outcomes that can be found in the notional Post Schema Validation Infoset after validation.

But the trouble is that, whether for humans or systems, the more that problems are diagnosed in generic terms (i.e. in terms of the markup) rather than in domain terms (i.e. in terms of the intended patterns, or dare I say semantics), the less chance that the diagnostic can serve any practical purpose for downstream systems. Notoriously this is true for systems which “hide the markup” from the user: the grammatical errors are unavailable and incomprehensible to the users. Grammars have shown themselves over the last 20 years to make programmers more productive but to stupefy end-users: the traceability issue I raised this week on XML-DEV in response to one of Roger Costello’s excellent fishing expeditions is another head of the same Hydra.

Rick Jelliffe

I find there is little awareness of the health dangers of XML Conferences in our community. As indeed I was until it happened to me. It had me in hospital near death within a few days of returning from the XML Europe 2005 conference, and it took more than a year to recover, and I just heard of a friend hospitalized after a trip to another XML-related seminar.

So I am not being flippant here, but literally deadly serious. The problem is not dull speakers or conference food or sponsored talks on products or the body’s natural spasms reacting against the evils of WS-*, Open *, or * 2.0 as you might expect. The trouble is prolonged airflights, in particular intercontinental and interhemispherical flights.

Many people are now aware of deep vein thrombosis and take good measures against it. (It is a real problem outside flight too: a friend of mine died of it last year during radiation treatment for cancer because he was too debilitated to move his limbs.) Dick Cheney recently had one for example.

But there is a more general risk in air travel: you are sitting in a box full of people at all sorts of stages of health, and you are making yourself fatigued. The excellently named Dr Dement from Stanford says that fatigue reduces mucus production in the throat (and dry airplane air will accentuate this), which makes you more susceptible to coughs and colds and other airborne viruses. For example, I developed pericarditis, which “does not seem to be directly contagious; however, many of the organisms that cause infections that can lead to … pericarditis are spread from person to person by coughing or sneezing.”

Then there is of course the danger (or likelihood for some places) of food or water problems. Just because we know about it doesn’t mean it won’t happen. For example, on my last trip I got sick before India, in Europe or Canada, while a friend became sick immediately after.

So here are my travel tips:

  • Never eat seafood when traveling. Actually, this advice comes from the aging queen, Elizabeth II, but it seems to have served her well. The trouble is that seafood poisoning is fairly catastrophic.
  • Never travel for more than one intercontinental leg. My recent trip required me to go from Sydney to Toronto to Delhi to Sydney. So I scheduled 24-hour stopovers in San Francisco, Frankfurt and Bangkok. These cut down the flying times to more reasonable levels (Sydney to San Francisco is only about 14 hours’ flight; the others are between 4 and 9 hours). Now, of course, that blows out the total time of a trip, and adds expenses for accommodation, etc. But knowing what I know now, you are crazy not to do something like this.
  • Never travel overnight when there are day trips available. Sleeping on a plane is a complete gamble, even with the unpredictable delights of Stilnox (Ambien), plus you may be sleeping in a cramped position. And if you arrive too early, you then may have to wait till check-in time. Or, if you are somewhere dangerous or with an extreme climate you may have to book extra half-day hotel time at each end anyway (if they allow such a thing), so you may not actually save money either.
  • Never travel on a Friday. Homecoming businessmen saturate the skies on Fridays. The more people on a plane, the more chance of contact with a vector, and the less chance of space to stretch.
  • Avoid North American carriers. This goes without saying. North America is seemingly populated entirely by dwarves. Asian airlines are the best for legroom: Thai, Singapore, QANTAS.
  • If I do have to take an overnight flight, I take a quarter of a sleeping pill, just enough to push me in the right direction but not enough to knock me out. I find traveling without my contact lenses also helps rest, because otherwise the in-flight movies are too attention-grabbing.
  • At the stopover legs, don’t do anything taxing. Pick low-hanging fruit and concentrate on rest and relaxation.
  • Take extra care on West-to-East trips. It sounds odd, but it is very real. Traveling West-to-East makes you lose time: you only get a shorter sleep period. Traveling East-to-West stretches things out, so you get more time to rest. (This is the same effect as with daylight saving, of course.) If I have to do an around-the-world trip, I’d always try to go Australia-Asia-Europe-America-Australia, not the other way around.
  • Sit by the window for snooze room or in the emergency row, for more legroom.

Now it is difficult to get agents to understand all these kinds of criteria. My last trip probably took about 20 phone calls to get right. Am I obsessed? Well, so should you be! If you have to take prolonged trips, be careful.

I am interested in any other tips people have.

Rick Jelliffe

Bob is a really clear-thinking and enthusiastic guy, and one of the most interesting people to wine and dine with. His book Document Engineering is important for anyone who wants a better vision of where XML is leading us. I’ve just discovered the IT Conversations website, which has podcasts of various people of interest to me: Miguel de Icaza for example.

Bob’s podcast has much of interest. An idea that hadn’t registered with me before is that one of the drivers for (larger) business to adopt a document-engineering approach is because they need to componentize their business functions: a document doesn’t care whether it goes to Florence, Bangalore or Kinshasa. Globalization as a driver for XML: that’s a pretty strong driver.

Bob also has a blog with co-author Tim McGrath: Doc or Die.

M. David Peterson

Update: Problem seems to be fixed. Thanks to whomever@O’Reilly did the fixing!

Update: So this is *DEFINITELY NOT* an issue with Google Reader, and instead an internal issue with the feed generation,

Atom 1.0:

Parse error: syntax error, unexpected '{', expecting ')' in /title/oreillynet/htdocs/blogs/xml/templates_c/%%0A^0A4^0A404241%%mt%3A225.php on line 6

RSS “2.0”:

Parse error: syntax error, unexpected '{', expecting ')' in /title/oreillynet/htdocs/blogs/xml/templates_c/%%0A^0A4^0A404241%%mt%3A225.php on line 6

My sincere apologies to the Google Reader team for suggesting in the title that this was a problem on their end. Quite obviously that is not the case. Title has been adjusted accordingly.

Also, I should emphasize the point that this is a problem I see *ALL TOO OFTEN* with MovableType. One *REALLY* strange issue I kept running into when working on Lawrence Lessig’s new blog was random characters (the %, &, and { characters were the most common) being saved in the template files for no obvious or apparent reason.

Any other MT users notice this as of late?

[Original Post]
via a recent tip from W^L+ (thanks, W^L+!),

Google Reader hasn’t shown any XML Blog posts since June 26th. I thought you were all on vacation.

Anyone else experiencing similar problems with Google Reader? Or is the problem (potentially) with one of the feeds generated by MovableType (the blogging software we use on O’ReillyNet Blogs)?

Thanks in advance for anyone’s and everyone’s help in tracking down the potential problem!

M. David Peterson

Update: It’s official,

Safari on Mac
Safari on Windows

Thanks again (everyone@)Apple!

[Original Post]
So a few weeks back Todd Ditchendorf brought to the surface, and I followed up with a report, the fact that Apple had added scripted transformations to the WebKit mix. At the bottom of that same follow-up report,

** Though I wonder if Safari has migrated any of the EXSLT functionality from libxslt, in particular the node-set() function? Anyone know off hand? If no, then Opera still has one leg up on Safari. Of course they still have one leg down on Safari as well. ;-)

Extending from a “request for support” from Romain Brestac, the above question then led to an open discussion in the WebKit Bugzilla interface regarding adding support for exslt:node-set(). Less than an hour later, Dave Hyatt (yes, *THAT* Dave Hyatt), followed-up with,

Rick Jelliffe

There is a running gag in the Simpsons where, in a flashback, Homer or someone will boldly predict something wildly wrong: 8 track tapes will never die, that kind of thing. I was reminded of this type of gag today when reading a Slashdot thread from 2004 entitled “Is the new Microsoft Office really open?” It relates to Office 11, but it is interesting to note what rabidly anti-Microsoft people were demanding at that time, and their expectations of getting it.

Some highlights:

  • In summary ‘Microsoft says it’s opening its Office desktop software by adding support for XML–a move that should help companies free up access to shared information. But there’s a catch: It has yet to disclose the underlying XML dialect.’ Could this be grounds for another anti-trust suit against Microsoft?”
  • Of course it isn’t open. It’s a silly question. Open is EVIL. Actually open would eliminate advantages. People would be able to create their own tools to interact with documents, instead of with MS tools. Where’s the money in that?
  • The move to XML has the potential to eliminate that sort of brain damage once and for all provided they actually open their file formats.
  • But they can make it so massively complex that it is very difficult to implement interoperability with foreign tools, but that it is somehow much easier to implement with MS-centric tools. … So maybe the XML format will be like that. If you’re Linux-centric, for instance, the threshold of pain for accessing Word XML docs will be fairly high, but if you’re Microsoft-centric, with all of their tools, code-snippets, documents, etc., then it won’t be nearly as painful.
  • …even if they use absurdly complex element names,…
  • As long as MS is effectively a monopoly XML will be whatever they say it is, for the majority of people.
  • Of course, most people won’t use the XML format at all, since it won’t be the default.
  • Even if a language is in XML, you still need to *document it* to be able to *understand* it.
  • Well if the way Microsoft Word saves out as HTML is anything to go by, then concise it most definitely will not be.
  • I suppose they could put some weird binary or encrypted data in the files, but that would defeat the purpose of XML.
  • If the XML files office produce are not made the default save types or if the XML merely encapsulates large portions of binary code, it will not matter one lick that office can save these xml documents because the majority of people will be stuck on the default, unreadable formats.

  • That being said, it’s hard to see what business the government has engineering document formats. They could, on the other hand, specify disclosure of formats as a remedy in an anti-trust case, but they generally fall into one of two categories which precludes this: stupid or bought.
  • It’s not “obfuscated” so much as it’s “optimized.” The whole idea seems to be for Word to save as quickly as possible–which the doc file is best at for Word for some reason, probably becuase it’s derived from how the program structures documents, and not how some document spec says documents should be handled.
    In an era of 2+ GHz computers with 7200+ rpm hard drives, it seems odd that Microsoft would be unable to write an application than can quickly save and open text files that, on average, run well under 50 kilobytes.
  • The big question (to me) is whether Microsoft can put a legal encumbrance on the XML schema they use for a new file format. Could you publish a schema but have it so wrapped in legalese that (for example) open source projects could not be allowed to use it ?
  • So, is this going to be XML like the rest of the world knows it, or is it going to be an embrace and extend XML? Or, could it be a mutant XML? How about an XML that makes reference to Windows specific resources IDs?
  • Adoption of a “standard” is no guarantee of interoperability. Understanding the conceptual underpinnings of the standard is just as important. The question is, when Microsoft says they are using XML as a document format, are they doing it because they believe in the principles underlying it, or solely for the cynical “this is what is selling now” aspect?
  • The Office file formats will be open if M$ decides to:

    * Document them, and
    * Not change them with every update.

    I doubt they will do either of those things.
  • They simply have no reason to play nice in an industry consortium to agree on a DTD/Schema when they have 90% market share. But as long as they publish the details of their Schema and don’t leave chunks of encoded COM schwag lying all over the place it doesn’t matter. Of course, we all know the likelihood of that happening.
  • Something in my gut tells me that beyond all the extraneous tags, attributes and data types, the XML is going to have a hash code built into it.
  • For an XML language to be open, you need a full description of what each possible construct in that language means.
  • It would be quite easy to make the M$ document xml format propriatry. Make all default generated documents have linked in components like some ActiveX HTML pages. You might be able to read the base parts of the document but that won’t make it very userful without M$.

And so on…ratbags and survivalists: a bitter cup but with a few drops of sweetness from such as Liam Quinn plonked in. So let’s suppose the Masters of the Four-Paned Beast capitulated and gave these Knights of the Slashed Dot pretty much everything they were calling for: we would get a requirements list along the lines of:

  • Open up: use XML with no extensions
  • Open up: provide copious documentation
  • Open up: don’t require MS-specific tools for processing; allow people to make their own tools
  • Open up: don’t use long element names
  • Open up: make it the default save format for Office
  • Open up: use real XML, don’t just wrap binary or encrypted sections in XML tags
  • Open up: don’t have legal encumbrances on use
  • Open up: don’t have defaults that use (or require) COM or embedded objects or anything MS platform-proprietary.
  • Do have some stability to the format: don’t have a lot of updates

however

  • optimization is OK
  • integration with the deeper “conceptual underpinnings” of XML is important

It strikes me that this is almost exactly a description of the direction that MS then took with Open XML. The Slashdotters don’t know they won (or at least that in 2007 they would get what they were asking for in 2004). …Perhaps in 2010 they may get what they are demanding in 2007! “Doh!”

Rick Jelliffe

The more I think about it, the more that I think that the reason why Schematron (or something like it) will ultimately win (i.e. consolidate into a mainstream place as a schema language with broad vendor support) is that it fundamentally asks a different, and more important, question about documents than the grammar-based validators, such as DTDs and XSD. The question it asks is “What information does a schema need to provide to the user and application?”. And a related question “How do we group that information so that it can be sequenced to the user and application in some useful order?”, which is just as important when there is a large amount of information.

Contrast this with, say XSD, where the question gets morphed into “What canned outcomes can validation have, independent of the schema’s actual semantics and domain?” XSD is crippled in this regard because there is not enough guidance about the appinfo and documentation annotation elements: are they for human end-users, for schema management or what? They are wasted elements.

Now I guess underlying this is the idea that the central issue with building computer systems is ultimately how to relate information to humans in ways they will understand. (I want to use the dreaded word “empower” here, to my shame.) That includes developers, but developers are only the initial target group, not the only ones.

Rick Jelliffe

When you see a data field with text like 2007-07-05 you are probably looking at a date in ISO 8601 date format. Year, month, day: YYYY-MM-DD

IS 8601 is an international standard which gives several standard syntaxes for representing Gregorian dates and times. The full English title is ISO 8601:2004 Data elements and interchange formats — Information interchange — Representation of dates and times. It is only about 33 pages long; you can purchase it from your local standards body or from ISO, and as is common practice for ISO standards, there are final drafts available for free on the Internet. It is maintained by ISO TC 154 who have the Dr Who-ish name of Time Task Force.

Before IS 8601 there were multiple other standards for dates and times. For example, IS 2711 which allowed formats like 5 Jan 2000 and dates using ordinals. 2711 was withdrawn as an ISO standard in 1988, superseded by 8601. However, other nations and bodies have continued to use many of the other ex-standard formats, because it is convenient to have time written according to local conventions. The difficulty, you see, with IS 8601 is that they managed to get a nice unambiguous format for dates by adopting a format that no-one non-technical used: year, month, day. (I won’t be dealing with times in this blog.)

The second rub with IS 8601 is that it defines multiple formats. So as well as YYYY-MM-DD for calendar dates, such as 2000-01-01, you can also have the basic form 20000101. And the same date could also be represented as YYYY-DDD as 2000-001 with a basic form version of 2000001, where the DDD is an ordinal counting the number of days into the year. And it could be specified as a relative date too: you could specify it as a duration (from a base of, say, 1999-01-01) using the syntax PdddD where P means period and D means days: so P365D or even P12M where M is month. You can do the same with an explicit base and get the notation 19990101/P365D for example. The more exotic you get, the more chance that you have a need for what the standard calls “a mutual agreement” where the exchanging parties agree on what the notation means, because it could have several different meanings under the standard.
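To see how these notations all land on the same day, here is a small Python sketch using only the standard library. The function names `parse_date` and `parse_start_and_days` are my own for this example, and the code covers only the handful of forms mentioned above (calendar and ordinal dates in extended and basic form, plus a start date followed by a whole number of days). It is nowhere near a full implementation of the standard.

```python
import re
from datetime import date, datetime, timedelta

# Each entry pairs the exact shape of the input with a strptime format.
FORMS = [
    (re.compile(r"\d{4}-\d{2}-\d{2}"), "%Y-%m-%d"),  # 2000-01-01 (extended)
    (re.compile(r"\d{8}"), "%Y%m%d"),                # 20000101   (basic)
    (re.compile(r"\d{4}-\d{3}"), "%Y-%j"),           # 2000-001   (ordinal)
    (re.compile(r"\d{7}"), "%Y%j"),                  # 2000001    (ordinal, basic)
]


def parse_date(text):
    """Parse one of the four date forms listed in FORMS."""
    for shape, fmt in FORMS:
        if shape.fullmatch(text):
            return datetime.strptime(text, fmt).date()
    raise ValueError(f"unrecognised date form: {text!r}")


def parse_start_and_days(text):
    """Handle only the start/PnD shape, for example 19990101/P365D."""
    start, period = text.split("/")
    match = re.fullmatch(r"P(\d+)D", period)
    if match is None:
        raise ValueError(f"only whole-day periods are handled: {period!r}")
    return parse_date(start) + timedelta(days=int(match.group(1)))


for sample in ["2000-01-01", "20000101", "2000-001", "2000001"]:
    print(sample, "->", parse_date(sample).isoformat())
print(parse_start_and_days("19990101/P365D").isoformat())
```

Month-based periods such as P12M are left out on purpose: how long a "month" is depends on where you start, which is a good example of where the "mutual agreement" comes in.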

The third rub with IS 8601 is that it is based on the Gregorian calendar. This is unsurprising, in view of the dominance of the West and its ex-colonies in trade and standards adoption. However, it imposes a conceptual, processing and formatting burden in places where other date formats are used. This is not as uncommon as people think: I have lived in Taiwan and Japan, for example, where non-Gregorian calendars are used. And obviously the Islamic calendar is in wide use.

XML Schema Standards and Dates

XML Schemas (XSD) is a Technical Recommendation made by the W3C, an industry consortium which allows direct participation by representatives of fee-paying members. XML Schemas supports a wide range of ISO 8601-ish date formats. For most formats, there is no difference, and there is even an explicit appendix, ISO 8601 Date and Time Formats, which gives clear information.

XSD provides many different datatypes for dates, times and durations. However, it does not allow all ISO 8601 syntaxes, and it does alter others. For example, ISO 8601 allows a year 0000. This is not allowed under the AC/DC calendar system, where you go straight from 1BC to 1AD. XSD’s date types disallow the year 0. Most importantly, the date notation used is the extended one with the minuses, so 2000-01-01 not 20000101. XSD allows you to “derive types” to restrict dates to certain values or ranges.
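As a rough illustration of those two restrictions (extended notation only, no year 0000), here is a simplified Python check. The function name and the regular expression are my own sketch, not text from the XSD Recommendation, and a real schema validator does considerably more.

```python
import re
from datetime import date

# Simplified pattern for an XSD-style date: extended notation only, an
# optional leading minus, four or more year digits, an optional timezone.
XSD_DATE = re.compile(r"(-?\d{4,})-(\d{2})-(\d{2})(Z|[+-]\d{2}:\d{2})?")

def looks_like_xsd_date(text: str) -> bool:
    """Hypothetical helper: rough lexical check, not a full validator."""
    m = XSD_DATE.fullmatch(text)
    if m is None:
        return False
    year, month, day = int(m.group(1)), int(m.group(2)), int(m.group(3))
    if year == 0:
        return False  # the year 0 is disallowed, as described above
    if not 1 <= year <= 9999:
        # Outside the range Python's date type supports: loose check only.
        return 1 <= month <= 12 and 1 <= day <= 31
    try:
        date(year, month, day)  # rejects impossible days such as 30 February
    except ValueError:
        return False
    return True

print(looks_like_xsd_date("2000-01-01"))  # extended form
print(looks_like_xsd_date("20000101"))    # basic form
print(looks_like_xsd_date("0000-01-01"))  # year zero
```

The extended form passes, while the basic form and the year-zero date are both rejected.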

When we were discussing date formats in the W3C XML Schemas Working Group, I tried to get localized date formats allowed: I think it is the same principle as IRIs: it is good if a human can author directly (or generate directly) in the form or notation that they use to think about the data. However, my brilliant idea was rejected by the XML Schema Working Group (with the MS representative taking quite a strong stand that neutral/standard formats should be used, not localized ones), probably because I did not have a proof of concept. Since then, Jeni Tennison’s DTLL datatype library language has come along, and is being adopted as part of the ISO DSDL multi-part standard. It is, I believe, exactly the right way to go: allow notations in the format that makes most sense to the stakeholders and application requirements, but provide a mapping to neutral/fixed-syntax formats.

In that sense, my personal belief is that ISO 8601 is a relic of a pre-markup and pre-schema mentality. That does not mean it is not valuable, nor that it should not be maintained, nor indeed that it shouldn’t be the first port of call when looking at date formats. But it pushes localization to be an application consideration, whereas I think it is just as legitimate and feasible to make it a markup/parsing (i.e. schema) level issue. This is not only because localized formats (rigorously described with an appropriate declarative schema language) are easier for humans to read and write, but also because where the consumers and generators of data are computers, and humans are relatively unimportant in the pipeline or critical path, data field notations localized (again, rigorously described) for optimal computer performance are entirely appropriate and smart.

Office Document Formats

ODF and Open XML both use ISO 8601 dates in the YYYY-MM-DD form throughout for all dates. (The ODF spec uses US MM/DD/YYYY date formatting in places in its text, but don’t let that confuse you.)

ODF has quite a nice, basic and consistent approach to dates in spreadsheets: read and store them in a kind of ISO 8601 format but also allow a “null date” (such as 1899-12-31) to be specified to allow conversion of dates into numbers. Spreadsheets very often actually store, manipulate or transfer dates as ordinal values from an index point: this makes calculations with dates very straightforward. Representing dates as ordinals is also done in other ISO standards: for example, the SQL_DATE data type gives the number of days since January 1, 1841. (It gives this count as a simple integer.) See section 8.5.2 Calculation settings in ISO ODF for more information.
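The arithmetic behind a null date is simple enough to sketch in a few lines of Python. The function names here are hypothetical, and the null date is the 1899-12-31 example mentioned above; a real application would read it from the document's calculation settings.

```python
from datetime import date, timedelta

def date_to_ordinal(d: date, null_date: date) -> int:
    """Number of days from the null date to d."""
    return (d - null_date).days

def ordinal_to_date(n: int, null_date: date) -> date:
    """Inverse mapping: the date n days after the null date."""
    return null_date + timedelta(days=n)

null_date = date(1899, 12, 31)

start = date_to_ordinal(date(2000, 1, 1), null_date)
end = date_to_ordinal(date(2000, 3, 1), null_date)

# With ordinals, the gap between two dates is plain integer subtraction.
print(end - start)  # prints: 60 (2000 is a leap year)
print(ordinal_to_date(start, null_date))  # prints: 2000-01-01
```

This is the whole attraction of ordinals for a spreadsheet: once dates are integers, date calculations are ordinary arithmetic.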

The draft specification for ISO Open XML, from Ecma, does have one oddity, which has attracted much controversy. In SpreadsheetML table cells only, dates are actually saved as durations, as ordinals. The base is set by an attribute on the workbook, and reflects the supported ranges of Excel on different platforms (on Windows, Excel does not support dates before 1900; on the Macintosh, Excel does not support dates before 1904; putting in such a date will be serialized out as 0 into SpreadsheetML.)

The reasons for saving as a duration rather than a date are obvious: it reflects the internal format directly, allows faster loading and saving times to the XML, and allows faster loading and saving times when interfacing with an SQL system that uses SQL_DATE etc. The economic value of load/store times for Office documents is enormous, and it would be quite inappropriate to apply the criteria that one might use, say, for DOCBOOK documents, to standard office formats: I actually think that ODF gets it quite wrong here, and that best practice should dictate that optimized formats should be available. However, by the same token, I think that SpreadsheetML gets it wrong, and that it also should allow reading of data in ISO 8601 format as well as in its optimized notation.

The logical question that comes up is: should SpreadsheetML use the ISO 8601 duration format rather than just raw ordinal integers? If the ISO 8601 standard notation were used, SpreadsheetML would use <v>P1D</v> to mark up the first day in the range, rather than <v>1</v>. However, the P and D are redundant, because the notation is clearly marked up by attributes (and documentation). This is the old issue of where the barriers should be between information in markup and information in embedded formats. I don’t see that <v>P1D</v> has any benefits over <v>1</v>, frankly: it would seem to be an exercise in nominalism and pointless compliance.
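A small Python sketch shows how little the extra letters add. The function below is hypothetical and handles only whole-day periods; it reads either notation and returns the same integer.

```python
import re

def parse_day_count(text: str) -> int:
    """Hypothetical helper: accept a raw ordinal ("1") or a whole-day
    ISO 8601 period ("P1D") and return the number of days."""
    m = re.fullmatch(r"P(\d+)D", text)
    if m:
        return int(m.group(1))
    return int(text)

print(parse_day_count("1"))    # prints: 1
print(parse_day_count("P1D"))  # prints: 1
```

Either way the consumer ends up with the same number, and still needs the workbook's base date to turn it into a calendar date.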

<digression>The additional difficulty is that we are let down by XSD here, again: XSD doesn’t allow the type of an element to be selected in part or whole by an attribute value on an ancestor, unlike ISO Schematron and ISO RELAX NG. XSD is completely deficient in support for these kinds of idioms, because the database mindset of its developers led them to conceive of attributes as merely funny kinds of elements rather than as metadata on an element, of the same importance and character as the element name. So XSD doesn’t allow attributes to select type; therefore Open XML would have to compromise its design, where elements are highly generic (i.e. data values in spreadsheets are in a “v” for value element), in order to allow values to be typed; however, then Open XML could declare the value to be an xsd:duration, which would then require the P1D notation. Another approach in XSD would be to use xsi:type where the v element is a union of integers, durations, strings etc. However, then we would need to consider how to fit shared string references into the datatyping framework. Too much work! </digression>

The second reason why the ordinal values for dates in SpreadsheetML are controversial is an out-by-one adjustment that is needed for some functions for the first two months in 1900. To me, this is just a silly edge case. Remember that spreadsheets from Mac Excel don’t even get back to 1900, and on Windows they don’t go before 1900: it is hardly the wholesale subversion of the Gregorian calendar that you might suspect from various comments on the Web. ODF perhaps punts the issue, by allowing date indexes to start on 1899-12-31 or on 1900-01-01 (examples they give), and so leaves it up to the application developer or document generator to figure out which one is appropriate.
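For readers who want to see the size of the edge case, here is a hedged Python sketch. It assumes the adjustment is the widely documented one in which the 1900-based series counts a 29 February 1900 that never existed (as serial 60); the function name is my own and nothing here is taken from the Open XML draft text.

```python
from datetime import date, timedelta

def serial_1900_to_date(serial: int) -> date:
    """Illustrative conversion of a 1900-based day serial to a Gregorian
    date, assuming serial 60 stands for the non-existent 1900-02-29."""
    if serial < 1:
        raise ValueError("serials below 1 are not dates in this series")
    if serial == 60:
        raise ValueError("serial 60 is the non-existent 1900-02-29")
    if serial < 60:
        # January and February 1900: count straight from 1899-12-31.
        return date(1899, 12, 31) + timedelta(days=serial)
    # From March 1900 onwards the phantom leap day shifts everything by one.
    return date(1899, 12, 30) + timedelta(days=serial)

print(serial_1900_to_date(59))     # prints: 1900-02-28
print(serial_1900_to_date(61))     # prints: 1900-03-01
print(serial_1900_to_date(36526))  # prints: 2000-01-01
```

Under that assumption the whole adjustment is one branch that only matters for 59 days in early 1900.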

In my blog last month on Principles for reviewing standards, I took the position, which I think is the most reasonable one, that for embedded data fields the standard forms should be provided and optimized forms may be provided. From that POV, Open XML should also allow ISO 8601 durations and/or dates as well as the simple duration ordinal. And ODF should allow duration ordinals as a matter of best practice.

Rick Jelliffe

Here’s a quick tip for the interested. When someone says “Standard A violates standard B” ask “Which clause?” No clause, no violation. (A cynic might think that if no clause from standard B is mentioned, it may suggest that standard B has not even been sighted.)

Kurt Cagle

Tim Bray recently announced his publication of a new Apache module, mod_atom, which will make it possible to use the Atom Publishing Protocol (APP) directly with the Apache HTTPD server. This is a pivotal achievement, and one that will rocket APP into daily use. APP uses Atom feed content and HTTP headers to build a publishing “blog” system, though its uses extend considerably beyond the normal scope for blogging and could very well be a staple of most data publishing systems within the next few years.

M. David Peterson

You know, there was a time in the not-too-distant past when an effort to standardize something Microsoft created was seen as a *GOOD* thing:

ECMA to create standard out of Microsoft rival to PDF

July 01, 2007 (IDG News Service) Standards body ECMA International has formed a technical committee to develop a standard built on Microsoft Corp.’s XML Paper Specification (XPS), a rival file format to Adobe Systems Inc.’s Portable Document Format (PDF).

According to ECMA’s Web site, the goal of the TC46-XPS Technical Committee is to create “a formal standard for an XML-based electronic paper format and XML-based page description language which is consistent with existing implementations of the format called the XML Paper Specification.”

When and why did that change?

Well, either way, good on ya Microsoft! I’m one of your *many* supporters in regards to keeping the transparency and openness rolling forward. Please do just that.

Oh, and thanks!

Jennifer Golbeck

The central idea of the Semantic Web is to extend the current human-readable web by encoding some of the semantics of resources in a machine-processable form. Moving beyond syntax opens the door to more advanced applications and functionality on the Web. The Semantic Web Challenge offers participants the chance to show the best of the Semantic Web.

This year, I am one of the co-Chairs of the 2007 Semantic Web Challenge. Do you have a cool application you’ve developed? Enter it! We’d love to see what you’ve been doing.

http://challenge.semanticweb.org/