Articles Archives

Rick Jelliffe

Now that ODF and OOXML are both set to be on the ISO/IEC books, it is useful to consider what the next productive steps are.

For genuine ODF Supporters who are concerned that ODF has languished a little out of the limelight during 2007, there are a lot of useful things to be done. You don’t even need to join the OASIS groups or your local National Body or SC34 to begin.

Here are some things that I suggest will help the ODF effort coming into ODF 1.2.

  • Lobby the component standards groups, notably the W3C, to have official RELAX NG schemas available. Without schemas, there is no validation, and without validation there is no conformance testing, and without conformance testing there is no interoperability. (Or, at least, it becomes significantly more difficult in each case.) I believe SMIL is an example of this. If possible, actually have the schema ready and waiting, to make it easy: you will feel a greater sense of achievement if there is a part of the standard of which you can say “I contributed that”!
  • In a similar vein, lobby the component standards groups to harmonize their standards with ODF. SVG is the one in particular that seems needed. It would be great if the W3C SVG group would not only add the few missing attributes and so on, but perhaps also make a profile of SVG to match ODF better (this is not a concrete suggestion, just something whose usefulness could be checked by someone wanting to get involved).
  • Speaking of SVG, some open source XSLT transforms for going from ODF’s “SVG” to standard SVG would be good.
  • Join in the KOffice and Open Office efforts, especially in areas that affect you or for which you have expertise. Maths is a good area, for example.
  • Check through the parts of the IS29500 spec that are of interest, when it comes out, and figure out whether they are decorations (which can be handled merely by foreign elements in ODF, with the current ODF behaviour an adequate fallback) or features that are currently unsupported in ODF and will need attention. Share your results with the SC34 committee and with the OASIS and ECMA committees.
  • Patrick Durusau has said that it would be really helpful to check how well some of the detailed descriptions of formula functions in IS29500 accord with the reality of Office as currently implemented. This would both help IS29500 get improved and provide better information for IS26300.
  • Join in a conformance testing group: make up test documents. Ideally a test library will have some tests that test one thing per document, which makes a very large number of documents, and others that test cascaded errors. So I wonder if algorithmically generating test documents from schemas is viable.
  • Get your National Body to submit more Defect Reports, so that SC34 does not lose impetus. Remember that when something becomes a standard, maintenance becomes a community job, not “their” job.
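On the question of generating test documents algorithmically, here is a minimal sketch of the "one thing per document" idea. The toy schema, its element and attribute names, and the helper function are all hypothetical and stand in for whatever a real RELAX NG schema would supply; nothing here reads an actual ODF schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical toy "schema": element name -> {attribute name: allowed values}.
# A real generator would derive this from the RELAX NG schema instead.
TOY_SCHEMA = {
    "shape": {"kind": ["rect", "circle"], "fill": ["none", "solid"]},
}


def single_feature_documents(schema):
    """Yield (label, xml_text) pairs, each exercising exactly one attribute value."""
    for element, attributes in schema.items():
        for attribute, values in attributes.items():
            for value in values:
                root = ET.Element("test")
                ET.SubElement(root, element, {attribute: value})
                label = f"{element}-{attribute}-{value}"
                yield label, ET.tostring(root, encoding="unicode")


docs = dict(single_feature_documents(TOY_SCHEMA))
```

Even this toy case shows how quickly the document count grows: it is the sum of the allowed values over every attribute, before any combinations or cascaded errors are considered.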

Of course, if you were not interested in being constructive but in trying to frustrate yourself, there are other things you could do. You could, for example, mount a court action asking for something that you know to be impossible (e.g. withdrawing a vote on a ballot that has been closed), with reasons that you know won’t stand up (e.g. that a committee of long-term experts changed its vote after being satisfied that there had been enough changes to proceed with a standard), on odd legal grounds (if the voluntary standards group is not subject to administrative law, not being under the government), and where you know that your standards body’s final vote is a credible one (because it was shared by more than an absolute majority of other National Bodies around the world). Why would someone do that, my readers might be asking themselves? Embarrassment? Sour grapes? Vindictiveness? Marketing?

I certainly hope that national standards bodies will stand by their committee members and provide financial support during court cases, for the time and expenses incurred when private individuals are dragged away from their work. This kind of intimidation, to use courts and the threat of legal action to force a result after you have lost the technical argument, should be seen for what it is.

Now please, I am not saying that I have confidence in every NB’s vote. While I believe that every NB acted intra vires and therefore legal overturns are futile, I was not pleased with the Norwegian national vote (for just the same reason as I was not pleased that several NBs voted for ODF bypassing their technical committees too) and the Brazilian vote (after an IBM representative blogged that he had convinced them that if they had *any* outstanding technical problems they should vote no: if what he says is true, the NB secretariat should have picked this up in committee and told its members that perfection is not a requirement for a standard). But I don’t see them as acting outside their powers.

And, most importantly, it is a different class of problem to have a standard accepted than to have it blocked.

National and international standards bodies are highly aware that their activities and importance are tolerated and encouraged only because they create markets. The minute a national or international standards effort becomes a servant of some clique or cartel, to the exclusion of others, it loses its fundamental justification. (I say “effort” because a body may have thousands of efforts on the boil at any time.) For standards bodies, exclusive behaviour is a mortal sin; in comparison, too much inclusiveness (i.e. by having multiple standards where in a perfect world we could imagine having only one) is only a mild (and bearable) fault. (And, indeed, in most cases I consider support of plurality, to allow the market to choose, a positive virtue.)

Rick Jelliffe

Patrick Durusau has fun on his site with a posting satirizing the strategies of some opponents and proponents of OOXML at ISO as Beavis (for the former) and Butt-head (for the latter). Wikipedia has a good explanation of the Great Cornholio for people who don’t get the reference.

His key passage is probably

I think the Butt-head side seriously abused a process that had been designed by the Beavis side for their own abuse but that hardly qualifies as an objection to OpenXML.

The notional trigger for the commentary is a worthwhile article by IBM’s Arnaud Le Hors A Standards Quality Case Study: W3C which asks many good questions. I don’t know that he is correct about Candidate Recommendation, however: From what I have seen, OOXML would have passed the Candidate Recommendation requirements from W3C: it clearly has an implementation and the differences between what Office 2007 does and IS29500 says are largely cosmetic; my understanding of the W3C CR regime was that it proved that a technology was implementable, not that every part had been implemented: consider SVG or XSL-FO for example. And I was a little puzzled that Ecma’s process should be considered lacking because of its emphasis on timeliness, but later on OpenId was lauded because it is by a group of interested individuals that share a common interest and decide to solve it swiftly in a somewhat informal way using the internet to its full advantage. (emphasis added)

My angle is this: there needs to be a marketplace for standards bodies (plurality), so that stakeholders can choose the one that matches their requirements. And this in turn allows a marketplace for standards (plurality), so that users can choose the one that matches their requirements. And it is only to be expected that when there are competitive standards, vendors will attempt to use the standards process for marketing, to differentiate why their doovalakey is better than their rival’s thingamajig. Caveat emptor: competition entails sorting through rival claims.

David A. Chappell

For the past several years, I have been involved in many healthy discussions centered around the benefits of adopting technology and its supporting tools and infrastructure. Never once had I ever thought of measuring the benefits in terms of tonnage of hardware or in kilowatts per hour (kW/h).

Until now….

Rick Jelliffe

Open Geospatial Consortium has put out Google’s KML as one of their industrial standards. Congratulations to all concerned!

KML is an XML language focused on geographic visualization, including annotation of maps and images. Geographic visualization includes not only the presentation of graphical data on the globe, but also the control of the user’s navigation in the sense of where to go and where to look.

OGC seems like a thriving ecosystem, under their OpenGIS® brand. OGC has a strong liaison with ISO TC 211 (Geographic information/Geomatics) and has been transposing their standards across when stable. OGC has a strong government and specialty-vendor influence. (Geomatics is not a word I had come across until today.) The OpenGIS® Reference Model (PDF) seems a good place to start figuring the GIS standards out. I liked the statement in their FAQ answering Q: How do you compare the action of the OGC with that of the ISO and other standards organizations?:

A: The standards tracks of OGC and ISO are fully coordinated through shared personnel and through various resolutions of ISO TC211 and OGC. They are often complementary and where they overlap, there is no competition, but common action (e.g. in the geometry model). OGC provides fast-paced specification development and promotion of standards adoption, similar to other industry standards consortia such as W3C, IETF, and OMG. ISO is the dominant de jure international standards development organization (SDO), providing international government authority important to institutions and stockholders. Through OGC’s cooperative relationship with ISO, many of OGC’s OpenGIS Specifications either have become ISO standards or are on track to become ISO standards.

The OGC process includes a 30-day window for public comment. (The ISO process has at least 6 months period for National Body comment, and sometimes much more.)

What I found interesting about the KML spec was that it represents a connecting point between two different standards ecosystems: in particular with the Khronos Group, which is

a member-funded industry consortium focused on the creation of open standard, royalty-free APIs to enable the authoring and accelerated playback of dynamic media on a wide variety of platforms and devices.

The Khronos Group is the body that now maintains Silicon Graphics’ OpenGL graphics system, and is strongly influenced by the mobile device industry. For my own research, I made a little diagram to try to express the Khronos ecosystem (and to experiment whether there was much point in using UML for this kind of thing).

To summarize the diagram: let’s concentrate on three kinds of standards: XML file formats, Rendering APIs for 2D or 3D graphics, and Delivery or Codec APIs which provide session or lower-level services.

2D3DStandards.png

From the bottom left, we see that OGC KML uses Khronos Collada Digital Asset and FX Exchange Schema. This is a Sony-derived technology for transporting 3D assets between applications. It is used by 3D modeling software. In turn, Collada can be considered to some extent to be a serialization of OpenGL data. Of particular interest to XML-ers is OpenVG, which has support for SVG Tiny 1.2 as a design goal:

OpenVG™ is a royalty-free, cross-platform API that provides a low-level hardware acceleration interface for vector graphics libraries such as Flash and SVG. OpenVG is targeted primarily at handheld devices

The diagram has a cluster of various vector-related XML (or soon to be XML) formats: the SVG family of SVG and its two profiles SVG Basic and SVG Tiny, and SVG’s near relative the ODF drawing language. What struck me is that the ease of implementation of a standard is really related to the function points (and tooling up, etc) of a complete implementation, and this in turn is directly related to how close the underlying libraries are to the markup language.

I would call OpenVG a glue standard, which is where you have one or more existing underlying standard APIs which have grown under their own steam, and one or more standard formats, and to ease implementation you make an intermediate API based on abstracting the various standards’ features. In industries where there are multiple entrenched document formats which have a lot of similarity, these kinds of glue standards are one practical way forward.

It is often the case that where you have a problem with obstinate multiplicity of standards, the way forward is not by insisting on a single standard, but in aggressively supporting plurality in such a way as to neutralize the problem. I’d see the XML encoding header (and, indeed, Unicode itself and perhaps SGML/XML too) as an example of exactly the same strategy.
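To make the XML encoding header example concrete, here is a minimal sketch using Python's standard ElementTree parser. The element name and sample word are invented for illustration. The two documents differ at the byte level, yet because each declares its own encoding, a conforming parser recovers the same text from both.

```python
import xml.etree.ElementTree as ET

text = "caf\u00e9"  # sample content containing one non-ASCII character

# The same logical document serialized in two different encodings,
# each announcing its choice in the XML declaration.
latin1_doc = (
    '<?xml version="1.0" encoding="ISO-8859-1"?><word>%s</word>' % text
).encode("iso-8859-1")
utf8_doc = (
    '<?xml version="1.0" encoding="UTF-8"?><word>%s</word>' % text
).encode("utf-8")

# The parser reads the declaration and decodes accordingly.
parsed = [ET.fromstring(doc).text for doc in (latin1_doc, utf8_doc)]
```

Neither encoding had to win: the declaration lets both coexist, which is the plurality strategy in miniature.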

I don’t expect to see it, but it is interesting to consider whether the same approach would make sense for ODF/OOXML harmonization. To what extent is “harmonization” an assumption about how to achieve a particular desired result (in particular, more guaranteed interoperability or less gratuitous non-interoperability) rather than being an outcome in its own right? If the outcome desired is interoperability, then that could also be addressed by having everyone support everything (to reduce my point to the absurd): and that tactic can only be prosecuted by making implementation as easy as possible (by having APIs that are as close as possible to the various file formats.)

Eric Larson

Test Driven Development is a relatively popular methodology nowadays and I think XML tools can play a crucial part in better testing. Testing frameworks are more than capable of using and testing XML-based applications, but just in case you have ever had trouble, here are a few tips.

Kurt Cagle

XML.COM Newsletter

There’s a problem with living life on the bleeding edge. For all of the exhilaration of being one of the first to play with a new technology (or in some cases even to create that new technology), it’s also very lonely there - by definition, most people will be encountering the same technology about the time that you’ve come to think of that technology as old hat or even (gods forfend) passe. This means that sometimes it’s easy to lose sight of what happens when these tools and techniques hit the real world of the workaday developer.

XML is a case in point. It’s ten-year-old technology and has become the data lifeblood of the Internet (along with its younger sibling JSON). Every so often someone on the xml-dev mailing list will pop up and say “Is XML Dead?” - to which the rest of the old guard will pipe up in defense of XML or agree that, yes, XML’s existential crisis is upon it, and it will soon be pushing up the daisies, if it’s not there already. As one colleague of mine put it - XML’s just not interesting anymore.


Read the full newsletter.

Rick Jelliffe

Three programmers gathered at the next cubicle to mine yesterday, clucking and snorting as is their wont. I looked over to ask what was going on. “A bug in Java” they said. The problem was with ZIP files, specifically some differences between ZIP files made by different methods.

They had some files with non-breaking spaces (U+00A0) in the file name. Not something that I would do myself, but the number of people who want to use non-ASCII characters in their filenames is surely now much greater than the number of people just content with ASCII-only names. Aha, so file this under internationalization (I18n)!

The problem was, it seems, that WinZIP stored the filenames using the system default encoding. But Java would read the filename using UTF-8. So sometimes a ZIP file part would have the non-breaking space in its name, and other times the same file saved by a different route would have 0xFF at that position. Now this is the kind of behaviour and problem that you would expect a decade ago, but I was surprised it still occurred.
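A minimal sketch of the mismatch, in Python rather than Java for brevity. It assumes the "system default encoding" in question was the legacy DOS code page CP437, in which the no-break space happens to be the single byte 0xFF; that would match the byte the programmers saw, but the actual code page on their machines is not stated. The file name is invented.

```python
# The same character, U+00A0 (no-break space), under two encodings.
name = "my\u00a0file.txt"  # hypothetical file name

as_cp437 = name.encode("cp437")  # assumed legacy code page: NBSP becomes 0xFF
as_utf8 = name.encode("utf-8")   # NBSP becomes the two bytes 0xC2 0xA0


def try_utf8(raw):
    """Decode raw bytes as UTF-8, returning None when they are not valid UTF-8."""
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        return None


roundtrip_ok = try_utf8(as_utf8) == name  # UTF-8 in, UTF-8 out: fine
legacy_as_utf8 = try_utf8(as_cp437)       # a lone 0xFF is never valid UTF-8
```

A reader that insists on UTF-8 cannot recover the name written by the legacy tool, while a reader that knows the writer's code page can.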

Checking through Sun’s bug database, we find that this bug (or its clone) is actually the second most requested (2008-13-28). The engineer who evaluates the problem gives the excuse that Sun decided to use UTF-8 for JAR files (which use ZIP) and seems a little surprised to discover that ZIP files may actually be created by other systems too.

Looking at the bug report, we also find it was first reported 07-JUN-1999. Almost nine years ago. The bug report says it is only reported up to Java 1.4.2, however I cannot see anything in Java 1.6 that addresses it.

So what has happened? Several things:

  • Apache put out a zip implementation as part of Ant that supports different encodings. So people who needed it can use that.
  • Since September 2006 the ZIP spec has formally included a bit to state that the file name is stored using UTF-8.
  • It seems other manufacturers have increasingly used UTF-8.
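The UTF-8 bit mentioned in the list above is bit 11 (0x800) of each entry's general purpose flags. As a hedged illustration, the sketch below uses Python's standard zipfile module, which sets that bit when it writes a name that is not plain ASCII; the helper function and file names are invented, and this says nothing about what any particular Java release does.

```python
import io
import zipfile

UTF8_NAME_FLAG = 0x800  # general purpose bit 11: file name is stored as UTF-8


def name_flags(filename):
    """Write one entry to an in-memory ZIP, read it back, and report the UTF-8 flag."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr(filename, b"payload")
    buffer.seek(0)
    with zipfile.ZipFile(buffer) as archive:
        info = archive.infolist()[0]
        return info.filename, bool(info.flag_bits & UTF8_NAME_FLAG)


ascii_result = name_flags("plain.txt")
nbsp_result = name_flags("my\u00a0file.txt")
```

With the flag present, a reader no longer has to guess the encoding of the name; without it, the historical default still applies.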

So for almost 10 years the Java version of ZIP has been broken for internationalization purposes, the fix seems to be caught in limbo (are they waiting for non-UTF-8 encodings to go away, perhaps?), and so people are forced to go to other implementations. WORA undermined! Indeed, this seems another example where Java is simply too large for Sun to maintain adequately.

But what about this angle: the current ZIP spec has an appendix on file names and encoding. It says

The ZIP format has historically supported only the original IBM PC character encoding set, commonly referred to as IBM Code Page 437.

Which means that Sun’s policy of merely writing UTF-8 is now going against what the ZIP spec says.

Software maintenance and juggling issues on a budget are not easy. However, I think it is more than plausible that had Sun gone ahead and submitted Java to ISO for standardization a decade ago, this issue would have been fixed long ago, because ISO National Bodies give very high precedence to issues such as internationalization, accessibility, modularity, and conformance. So the lack of proper encoding support in the ZipEntry API would undoubtedly have come to the fore in the very first round: Japan never lets this kind of thing slip, for example.

By exactly the same token, if the ZIP format had been put through as a standard, proper encoding support would undoubtedly have been raised as part of the first review. Standardizing either would have been good enough to have a technical fix agreed on and published, and pressure applied for a fix ahead of the demands of corporate featuritis. But standardizing both would still be best.

After Sun backed off last time, leaving so many people who had participated feeling burnt, it is hard to see that standards people won’t be deeply suspicious of them. And Sun people may not be keen to submit even to a “bullshit process” based on pragmatism and incrementalism. But Java would clearly, IMHO, be in a much better position today if it had been standardized. And so would ZIP.

Standardization as a kind of audit

What standardization of a living technology gives stakeholder companies is more than just bragging rights and ammunition to shoot their rivals with and to confuse procurement people with, tempting as those things may be: it also gives an objective audit program dictated not from the corporate POV but from (to a greater or lesser extent, depending on interest) the market and relatively disinterested third parties. Any long-term software project gets encrusted in the personal politics and idiosyncrasies of the development team, and needs a circuit-breaker. This is a view of standardization as a kind of major technical audit, particularly of the documentation but also of areas that are becoming more market-critical: standards use and compliance, openness, responsiveness, accessibility, internationalization, integratability, testability, and so on.

These are all things that established technologies need. Now of course you can get audits in each of these areas by hiring experts. That is good, but you don’t get the breadth or provable transparency that National Body participation can bring. And expert opinions still have to be evaluated in the context of the power relationships of the company, the very same relationships that allowed the problem to arise (these might be as simple as CJK requirements not having an adequate champion or I18n not being a profit center that can demand changes). And you can get benefits from using boutique standards bodies in which vendors or their representatives can have voting rights: W3C, Ecma, OASIS, and so on. That is good too, but it is open to domination by one side or the other.

Which leaves the ISO family (e.g. ISO/IEC JTC1) as being effective forums for this kind of audit. People who think that ISO standardization is always a pushover should consider the current OOXML debate: you have MS and friends on one hand and IBM and friends on the other both pushing as hard as they can, and yet as I write neither can establish clear dominance. And these are the largest players in the world. Whether DIS 29500 mark II passes or fails it will be because national bodies decided on technical issues, not pack alliances, as far as I can tell. I am sure that neither MS nor IBM is feeling comfortable at the moment: and this is the strength of the ISO kind of procedure, regardless of the outcome.

We have all had enough experience of open source to be aware of its strengths and weaknesses now. Making something open source does not automatically mean that bugs and so on will be fixed. No silver bullet. As I wrote in this blog a couple of years ago in Sun should open source Swing

it is not enough to Open Source something: the mechanism for speedy response to bug fixes and releases is crucial too.

And neither will auditing a technology by making it a standard. Nothing is automatic. But error-full systems emerge from single-strategy maintenance regimes, and the dinosaur systems such as Java and Office are full of examples of this. The ISO standardization process has many qualities to commend it to large companies as a tool for shaking things up and circuit-breaking. And we still need an ISO standard for ZIP too.

Rick Jelliffe

I was told recently that of the 250 or so fast-tracked standards that Ecma has successfully had accepted by National Bodies at ISO/IEC, only three of them have failed. I thought it would be interesting to read up a little more on them.

Ecma (shooting the messenger)

Ecma makes standards on a wide variety of subjects, and has particularly strong involvement with the European and Japanese computer hardware industry. In a response to a comment on another item, I posted this list, which is of the current groups and chairman’s affiliations, to give an idea of its scope:

  • C# (Chairman from Microsoft)
  • ECMAScript (Chairman from Mozilla)
  • Business Communications (Chairman from Siemens)
  • Near Field Communications (Chairman from Sony)
  • High Rate Short Range Wireless Communications (Chairman from Sony)
  • Environmental Design Considerations (Chairman from IBM)
  • Acoustics (Chairman from HP)
  • Electromagnetic Compatibility (Chairman from Intel)
  • Optical disks and disk cartridges (Chairman from Toshiba)
  • Universal 3D (I3D) (Chairman from Boeing)
  • Holographic Information Systems (Chairman from Fujifilm)
  • OOXML (Chairman from Microsoft)
  • XPS (Chairman from Global Graphics)

Now I knew that the C++/CLI effort had failed (for what seems good reasons to me.) But I was not so sure of other efforts.

I found this article, from 10 years ago: Sun Uses ECMA as Path to ISO Java Standardization which I will look at in more detail in a moment. But there is an interesting passage halfway down the page:

In 1996 Microsoft Corp was able to shoot down another ECMA standard, the Public Windows Initiative, at this stage, thus preventing it from becoming an ISO standard. The PWI was a Sun effort to get Windows APIs put into the public domain. … Microsoft was able to mount a successful campaign against PWI at ISO on this issue.

What do we learn from that? That Ecma was happy to serve as a neutral forum. That Sun was happy to try to make use of the Fast-Track procedure when it suited them, for competitive reasons. That in fact IP buy-in from the critical stakeholder is necessary. And that MS has made a 179 degree turn on standards since a decade ago. (I am always amused at how often anti-OOXML material will, when it fails in a current objection, resort to decade-old material as if it were fresh and compelling. The company then was fleeing standardization; now they are participating and allowing significant changes. You do not have to trust or like them to acknowledge that.)

Control of the API

ISO standards are a very scary proposition for large companies. Many of them are not comfortable with any position other than dominance and stability. The control of the API is terribly important to them, and they regard loss of control of the API as a risk (whereas it can be a circuit-breaker and new-market enabler.) This is one reason why all the large companies try to favour the member-based boutique standards bodies: W3C, OASIS, Ecma, because there is more chance that they can establish a beachhead and make participation at those bodies unattractive or futile for their competitors. The need for stability is sometimes stronger than the need for dominance: when you see calls for “equilibrium” to be maintained in a market, you know that is a buzzword for maintaining the status quo. (And it is not always the market leader: it can be a smaller player in fear of losing their share just as much.)

It goes in cycles. The wheel turns and sooner or later the big companies are forced to deal with ISO and national bodies, and they find this lack of control very unpleasant. Sooner or later they find some reason to split back to more dominatable bodies, and they jump ship.

It is not all venal (or even venial) or negative though: for example, look at SGML: Sun’s Jon Bosak (and many others) were unhappy with the way and speed that SGML maintenance was proceeding and we went to W3C as a forum for making a simple profile and addressing a lot of peripheral issues, and XML in turn became the foundation for the update of SGML. There is always an interplay between what the boutique, specialist bodies are interested in, and what the national-body-based regimes such as ISO are interested in: industry activity is actually really important, because it clarifies what the ISO groups should be doing.

The downside is that when these large, usually-US-based multinationals hop over to their boutique bodies, they have to try to justify their jump by slagging off at ISO/IEC. This is a predictable behaviour: it has happened in the past, it is happening now, and it will happen in the future. Some parts of the complaints are often reasonable, some parts are often merely self-serving, but it is not a new behaviour.

Ecma and Java

Now back to Java. Originally Sun put up Java to become an ISO standard using the PAS process (the fast-track process that ODF used), with the Object Management Group (another boutique group) as the submitter. Then Sun changed its mind and decided to submit it to become an Ecma standard (and thence to ISO on Fast-Track) because

“In examining our standardization options, our primary goal always has been to preserve the industry’s substantial investment in evolving and using the Java technology,” said Dr. Baratz. “By pairing the collaborative Java Community Process with ECMA’s proven standards process, we can achieve international standardization while preserving rapid innovation and cross-platform compatibility.”

According to this article Sun chose to go with Ecma, because it was flexible enough to allow maintenance to continue on through the Java community process as it stood then. Other articles suggest that one reason for Sun’s reluctance to be involved at ISO was their strong desire to keep effective control. One particularly interesting aspect of the article is that it mentions the potential danger from Sun’s point of view of HP, Microsoft and so on doing exactly what Sun had attempted to do with PWI: make up their own version of the standard and submit it to ISO!

Of course, what Sun was concerned about was Microsoft’s attempts to destroy Java’s Write-Once, Run Anywhere promise by grafting on their own graphics primitives into J++ and splitting the market. This is of course how IBM put a nail in Java’s coffin for the desktop, by doing exactly the same thing with their SWT graphics library, as used in Eclipse: it is not a part of standard Java and Java applications that use it are not WORA applications.

The fight between Sun, IBM and Microsoft over their effective graphics libraries shows a couple of things that are very instructive. For a start, it shows that they all try to use standards for their own competitive purposes. It is no news: the challenge is to try to use the standards process to channel them into behaviours that benefit society and the market.

It also shows the futility of non-layered standards. The WORA spiel is really compelling, and it is something that I bought into with my company Topologi, but all systems that have to grow need to support what I call Organic Plurality. Systems with modularity in the wrong spots die but can cause problems in their death throes: it seems that with Java, the graphics interface was exactly such a spot, unfortunately for the vision. (For another aspect of this, see The Software World of 2010: Its about the Suite.)

But thirdly it shows that the big players have been involved in these kinds of standards games for years. For a while, and under the noxious impact of the MPEG group, the large companies got excited by the idea that they could use standards bodies to become revenue-generators by standardizing on Royalty-bearing technologies.

Pigs at the trough

In the middle part of this decade, there were attempts at OASIS for this, and many of us spoke out against the large companies trying to do this, and we were successful. For people with short memories, the background of this was the attempts to get non RAND-z technologies adopted for DRM proposals: the major pigs with their snouts in the trough at that time were ContentGuard (ex Xerox), Microsoft and IBM, all the usual suspects. (Readers may also be interested to note that Patrick Durusau got involved in the OASIS DRM effort, on the side of the angels: he has a very hard-headed attitude to all the large companies, and not one that endeared him to Microsoft or IBM.) By 2004, the OASIS DRM group wound up without getting this endorsement for the non-RAND-z technologies. RAND-z won!

David Berlind has quite a good article on why a non-RAND-z standards organization is a “patent shelter” and not open: it is great that OASIS has straightened up here, and I hope SC34 continues its long-standing RAND-z policy. But it is especially great that companies like Microsoft, IBM and Sun, which a few short years ago were all excessively concerned with trying to keep control and use standards as patent-shelters are behaving well now. However, just because Microsoft, IBM and Sun have little credibility in the world of standards for altruism’s sake, it does not mean that they should be blocked from participating legitimately in standards. To the contrary, we need to have institutions to allow these behemoths to act as good citizens: RAND-z standardization is a great vehicle for a behemoth!

The futility of monocultures

Back to our Java story. In late 1997, SC32’s Java study group had recommended that Sun should submit Java through the “more traditional” processes. Sun eventually did shift to use the Ecma route, but apparantly out of fears it would lose control. Then

In another effort to block other companies and interests from developing Java platforms that do not meet its strict guidelines, Sun Microsystems on March 1, 2000, declined an offer from ECMA to standardise Java. ECMA, which is a standards organisation in Geneva, Switzerland, denounced Sun because the company refused the standardisation proposal. TechRepublic

Industry gossip was that Sun wanted to make their source code a normative part of the standard and they withdrew when they found it would not be possible through Ecma (or ISO or anywhere!): nice try fellows! I’d love to get some confirmation or another angle on this. But clearly the issue is one of control: integrity, interoperability are all nice side-effects. The trains always ran on time under Mussolini: we should not pretend that centralized control and monocultures do not have some benefits.

However, when we look at the way large companies act with respect to standards bodies, one very large question should arise: it is a variant on Adam Smith’s aphorism (or was it G.B. Shaw?) that every profession turns into a conspiracy against the public interest. Monopolistic, cartel and collusive behaviour is undesirable (I don’t use “wrong” here because it carries a moral implication which distracts people from the point and lets them drink from the waters of Lethe from the sweet cup of self-righteousness) because it results in sub-optimal market operation.

So why are standards allowed: surely they are collusive, and interfere with the market?

Public policy

The traditional answer is that public policy encourages standards because and as far as they create markets. When the Torx screwdriver company got its hexagonal screwdriver heads adopted as a standard, they may have been wanting to encourage a market in screws not competitors in screwdrivers, but they were creating a market none-the-less. OASIS lawyer Andy Updegrove, who I criticize a lot for his flakey reporting and bias, has really good legal material at his website which quotes the (U.S.) Fifth Circuit Court of Appeals decision in Consolidated Metal Products v. American Petroleum Institute in 1988:

A trade association by its nature involves collective action by competitors. Nonetheless, a trade association is not by its nature a “walking conspiracy”, its every denial of some benefit amounting to an unreasonable restraint of trade. In particular, it has long been recognized that the establishment and monitoring of trade standards is a legitimate and beneficial function of trade associations.

One key aspect of the setting of standards is that they cannot be needlessly exclusionary: this is why there is always the need for multiple boutique bodies, because when a company is unable to get satisfactory inclusion of its technologies or requirements because existing members have “stacked” the process against it (and it should be noted that this is a negative stacking aimed at blocking: there seems to be no such thing as stacking a standards body in favour of a legitimate technology, quite the reverse: a standards body is there to foster agreements) then that company can go elsewhere. The need for a market in standard technologies requires a slew of supporting markets, including a competitive market for member-based standards organizations. (It’s turtles all the way down, as the joke says!)

When we get to ISO/IEC JTC1 we run out of competitive standards bodies. At the international level, there is quite a clear difference between the kinds of work that, for example, IEEE takes on and the work that ISO takes on. So if allowing plurality rather than blocking is at the very core for justifying standards (I mean voluntary technical standards used by industry, not regulations or which side of the road to drive on) as market-creators and preventing standards from being feet-in-the-door for cartels, what happens at the apex, at ISO/IEC JTC1 for example, when there are no competitor bodies?

The answer is simple: plurality. ISO/IEC cannot be in the business of allowing cartelization, since the only justification for standards is because they actually prevent cartelization by creating markets.

Trapping a bear

From this light, I hope my support for OOXML getting standardized, even though I recommend ODF for public government documents, becomes clearer. The need to support plurality goes to the very heart of the mission of international standards bodies. It is one thing to speak of technical issues, it is another to blanket state “We already have a standard that is good enough for us, therefore you don’t need the standard that you think would meet your needs”. Because that is just code for “We want to prevent your technology from operating in its market by limiting the market to our favoured technology”. That kind of blocking behaviour needs to be exposed and rejected.

The large US multinationals have always been trying to use standards bodies to compete, and they have always shopped around, and none of them like giving up control. The recent defection of some of the leading lights of the Open Document Foundation away from ODF springs out of exactly this issue: the charge that Sun has tried to keep too much control. They all try to play this game, it is not new.

So what can we do? We have to be like bear trappers. The bear is bigger than us, has an off-putting odour, and a taste for honey. But when the bear wanders into a cage, you don’t say “Oh, Mr Bear, you are too big” or “Oh, Mr Bear, you stink” or “Oh, Mr Bear, all you want is to raid the honeypot, such a naughty and greedy animal does not deserve to be trapped!” You close the trapdoor and jubilate. The history of these large companies is that they all try to find the route where they can maintain the maximum control, and very often they will get skittish at the amount of control they have to give up. Even Ecma, which is pilloried at the moment for being some kind of a rubber-stamp, would have required giving up too much control for Sun with their Java effort: and you would not want to think that Ecma were necessarily the most accommodating here.

A lot of the anti-OOXML material over the last year has been along the lines “Don’t you know how bad MS is” spouted by companies who have been playing exactly the same kinds of games. Think SWT, think DRM, and so on. But standardization can be a real game changer: one of the few game-changers on the horizon. The chance to capture a large mass technology into the review and influence of the international standards organizations comes very rarely and IMHO is not a chance that should be squandered on petty ideological or competitive points. Open Source millionaires and closed source search engine companies, all of them are in the same boat as the rival office suite developers: competitors with vested interests to block the development of multiple markets.

The thing is that competition between these kinds of standards is not just good, it is essential. I have just been looking at the new feature list for OpenOffice 3.0, due mid year, and it finally includes tables in Spreadsheets. Now it has been incredible to me that this has not been there before: I don’t know how you can make a presentation without tables. But tables in spreadsheets was not something encouraged by ODF before OOXML came on the scene. (It is not a feature suggested for spreadsheet applications in the informative feature table in ISO ODF, in particular.) And the recent changes in OOXML have surely occurred in part to catch up with ODF: it is not one sided. The competition is forcing each technology to be improved in places that their original champions did not consider important.

Given the utterly toxic relations between the various players at the moment, which makes any talk of sitting down at the same standards body ludicrous, what we need is a frog race. Rival technologies whose stakeholders are attempting to leapfrog each other, but with each jump taking them closer to the goals we have set: open standards, with better QA, harmonized and mappable where possible, supporting plurality, extension and adequate profiles, with decent validation and test suites. The anti-OOXML side tries to claim that the best way to openness is through enforcing a monoculture, but the experience of the last two years, and the substantial improvements in the ODF and OOXML technologies that have occurred and are pending, are clear indications that standards need to harness the competitive energies of the stakeholders rather than dissipate them in prolonged committee-room chicanery aimed at maintaining the current “equilibrium”.

Kurt Cagle

I have recently accepted the position as Site Editor for the XML.com site, becoming responsible for the content appearing throughout the site as well as helping to guide functionality and look and feel for this particular portion (and to a certain extent the other sites in the O’Reilly Network). Having contributed to xml.com for several years, I feel honored to get a chance now to steer the editorial direction of the site, but I also need help doing it.

What I’m looking for right now, more than anything, are bloggers interested and passionate about XML and who would like the forum of XML.com to share these ideas. Given the breadth of the XML field at this point, what I’m looking for in terms of skills or expertise is equally broad; specialists (and generalists) in:

  • XML Data Technologies (XQuery, LINQ, XForms, etc.)
  • Semantic Web, both formal (RDF Stack) and informal (microformats, folksonomies, and so forth)
  • User Interface, User Experience and RIA Components (AJAX, XUL, Silverlight, Flex, CDF/WICD, etc.)
  • Publishing and Syndication (AtomPub, Office Formats, DocBook, DITA)
  • SOA Services (SOAP, WSDL, Messaging and Marshalling, ESB, etc.)
  • XML Data Modeling (Schema design, taxonomies, methodologies)

These are currently unpaid positions, though we’re working on plans to change that, but the site is widely recognized as being one of the pre-eminent authorities on XML technologies on the web, and we hope to provide as much editorial freedom as possible to all of our bloggers.

So if you are interested in writing a regular blog on the hottest trends in XML, give me a shout at kurt@oreilly.com with what you’d like to do and, if you have any, some samples of writings on the web.

Rick Jelliffe

IBM/Lotus’ Rob Weir has a timely blog up entitled How many defects remain in OOXML? Timely, because of course, the clock is ticking on the OOXML vote, so this is coming up to his last chance to throw some mud. This is a subject I am interested in, and have blogged on before, so I think it might be useful to make a comment.

The Set Up

First let’s look at the set-up material:

DIS 29500, Office Open XML, was submitted for Fast Track review by Ecma as 6,045 page specification. (After the BRM, it is now longer, maybe 7,500 pages or so. We don’t know for sure, since the post-BRM text is not yet available for inspection.)

Longer? Well what has happened is that

  1. Normative schemas (with structural improvements to run better on the open source XSD validators) that were in external files are now included in the text: there is no change in the amount of information in the standard despite the extra pages! In fact, because at the same time the schema fragments in the draft are now (post-BRM) informative, there has actually been a decrease in the amount of normative text.
  2. Non-normative material on accessibility has been added, again not requiring the kind of review of thought that normative text requires.
  3. Extra explanatory material requested by NBs has been added, but this text was specified in the Editor’s responses or explicitly by the BRM, it simply isn’t the case that NBs don’t know what this text is: see the BRM outcome documents.

I have blogged before against the simplistic use of page length: That diagram (Let me ring your bell), and I refer interested readers to that.

Next, comes:

Based on the original 6,045 page length, a 5-month review by JTC1 NB’s lead to 48 defect reports by NB’s, reporting a total of 3,522 defects.

Now what you might not realize from this is that the 5-month review is actually a title or nickname for one phase of the review, not the actual time limit. The initial text was released in December 2006, and national bodies didn’t actually submit their ballots until September 2007. So National Bodies had 9 months, not 5. (And interested parties could have participated for the prior year-long process at ECMA, which included a public draft.)

The total of 3,522 defects sounds impressive, except that most of them were duplicates, many just cut-and-paste duplicates by lazy or novice reviewers who somehow were under the misapprehension that in ISO process the squeaky wheels would get the most oil. ECMA grouped them into 1027 unique issues; however, my estimate was that many more could be grouped together (this is borne out by the repetition of answers within the Editor’s disposition of comments) to about 750 really unique issues.

Next comes the material on a defect count per page. (To give an idea of why this is an area where simplistic use of numbers will be actively misleading: adding the extra pages of schema material will actually cause a reduction in the average number of errors per page, without decreasing the absolute number of problems.)
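
A minimal sketch of that dilution effect in Python, using the figures quoted above (6,045 pages as submitted, "maybe 7,500 pages or so" after the BRM, and 3,522 reported defects). The function name is only illustrative, and the post-BRM page count is the rough guess from the quoted text, not a confirmed number.

```python
def defects_per_page(defects: int, pages: int) -> float:
    """Naive density metric: reported defects divided by raw page count."""
    return defects / pages


reported_defects = 3522   # total reported by the NBs, duplicates included
pages_submitted = 6045    # DIS 29500 as submitted
pages_post_brm = 7500     # rough post-BRM guess from the quoted text

before = defects_per_page(reported_defects, pages_submitted)
after = defects_per_page(reported_defects, pages_post_brm)

# The defect count is unchanged; only the denominator grew.
print(f"as submitted: {before:.3f} defects per page")
print(f"post-BRM:     {after:.3f} defects per page")
```

The number of problems is identical in both cases, yet the per-page rate drops simply because schema pages were moved into the text.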

I have blogged before On error rates in drafts of standards and I refer interested readers to that. Note that I give an estimate of the number of errors that you would expect to be caught (in one pass) at about 1,000, which is exactly what we have. In particular, note (ISO SQL Editor’s) Jim Melton’s comments, which I will repeat:

Or perhaps most people were somewhat intimidated by the prospect of (thoroughly) reviewing a 6,000 page document. To put this in perspective for those who know SQL’s size and complexity, the sum of all nine parts of SQL is about 3950 pages. A ballot on SQL frequently receives several thousand comments, and we’ve been balloting versions of SQL for 20 years!

In fact, virtually every large spec I’ve ever had the “pleasure” to review leads to “thread-pulling”, in which every page yields at least “one more” bug, and following up on that one leads to more, and following up on those leads to still more, etc. I would personally be stunned if 30 dedicated, knowledgeable reviewers of a 6,000 page spec on its first public review were unable to find at least 3,000 unique significant problems and at least 40,000 minor and editorial problems. But that’s just me…

Under the kind of criteria that our Big Blue friend is proposing, the ISO SQL standard, which is one of the most widely implemented, important and mission-critical of all ISO IT standards, would not be of high enough quality to make the grade! Next Mr Weir says:

If we believed that the 5-month review represented a complete review of the text of DIS 29500, by those with relevant subject matter expertise, then we would have some confidence that all, or at least most, defects were detected, reported and repaired.

Did you see the sleight-of-hand there? The outcome “repaired” is not the only possible outcome! The big possibility that Weir misses is that a defect can be allocated to maintenance: the ballot to become a standard is not the end of the process but merely the start! But absolutely no reference to this. Why? To panic people into assuming this is the last and only chance to get things perfect.

(Weir does have another post Contra Durusau, notable for a really sleazy reference to Seattle. He takes an unrelenting anti-maintenance line, rather surprising in the light that the same arguments can apply to ODF which is his alternative. It does not suit his argument that there are many standards with successful maintenance.)

The Trick

One of the constant themes over the last year has been the theme of panic. QUICK: You only have one month to find contradictions. QUICK: You only have five months to find defects. You only have a few weeks to evaluate the Editor’s comments. Every person has to read or review the whole standard. Every national body needs to have an explicit detailed position on every issue. And so on. Always under the assumption that the current stage is the last and only chance for change.

In every case this panic has been unnecessary FUD-mongering, because at ISO there is always the scope for improving a standard. [The normal caveat that you want to get it as right as possible first time because you cannot bolt the stable door once the horse has bolted does not apply with the same strength as with a from-scratch standard because the horse has already bolted. In fact the horse has been off and running for the last 20 years! So “getting it right” relates to documentation and harmonization rather than the general shape.]

What happens when a draft gets accepted as a standard? It gets subjected to the normal committee maintenance procedures. There is indeed a special step which can be taken where a standard gets deemed stabilized and so not subject to maintenance, but there is absolutely no way that IS29500 (or IS26300) are candidates for that yet!

Maintenance sounds a dreary word, but what it means is that National Bodies (and liaison bodies) can submit defect reports to SC34. And I would hope there is a backlog of these issues: a trouble with a stretched out Fast-track such as we have had is that it means there is in effect six months where Defect Reports have to sit on the shelf waiting until the standard is accepted before being processed. That there have been more defects or improvements discovered since the ballot was taken is not a source of wonder or horror: of course there will be more issues discovered: how could it be otherwise?

But it is a complete mistake, and at worst disinformation, to think that defects remain outstanding, that the standard is set in stone at the time of voting. Indeed, ISO ODF is largely predicated on there being ongoing maintenance to fill in the gaps and fix problems that are found. The thing is that standards based on deployed technologies do not need reviews based on “is this technology bogus and unimplementable” in the way that blue-sky standards do: in the case of Open XML and ODF and PDF you can open up a file and look at it and see whether the big and middle picture is workable. (And you can go further and validate the XML with the schemas, for fine-grained and objective compliance testing, of course.)

At ISO/IEC JTC1, the rule is that the Editor has to handle defect reports “promptly”. (“Promptly” needs to be measured in quarters of years, it won’t be weeks. But it won’t be years or decades, which is how long some bugs have persisted in Office without the circuit-breaking of National Body scrutiny.) SC34 participants have been discussing many issues relating to getting maintenance agile and pro-active, and National Bodies who are interested in document standards need to get involved.

What you have in the ISO process is equivalent, if the NBs want it, to a Ballot Resolution Meeting every six months in perpetuity. Defect Reports can include detailed suggestions for change, and it is even possible to bundle them as Draft Amendments and get that fast-tracked.

There is a lot of talk about “ECMA should resubmit it for another fast-track” or “ECMA should resubmit it for slow-track” and so on. I regard a lot of this talk as disingenuous, because it is frequently suggested by commentators who you know are not interested in corralling OOXML into a standard no matter how technically excellent it can become. It looks like a compromise but it is intended to block progress not help it. Now I have no general objection to standards taking years to complete, but for a deployed technology the correct process is the maintenance process not the committee draft process.

Every standard that gets adequate review will have reams of defects reported. That is just as much a function of the intensity of review as the underlying quality of the standard. Indeed, you could use a reverse metric: any standard which does not have at least one defect per 6 pages reported (for example) should be suspected of having inadequate review. DIS 29500 has had thousands of people reading it and reviewing it. Thousands, not hundreds. A big swathe have been dealt with, a big swathe has been dealt with partially and can be improved further; and there is a big swathe of issues that are not defects at all but extra features which clearly belong to maintenance not initial review.

But the idea that this is it, this is Microsoft’s only accountability moment where they get a pass or fail is propaganda, not the ISO process. It is completely true that the maintenance procedure needs continued interest and continued pressure, but it is not true that this is the last chance to improve the standard as if it will be frozen for all time.



Update

In comments below, ISO SQL editor Jim Melton has clarified his comments. I was glad to see him say Please note also that I have taken no position at all on the merits of standardizing the technology in the spec, nor even the merits of the technology itself. What Jim says, however, is that he would expect a full multi-year review of a new 6,000 page spec to almost certainly reveal upwards of 5000 unique issues.

I have three responses to that. First, that Ecma 376 already had a year of review before ISO, so it is inappropriate to count the number of issues as from a de novo standard: we should be open to the possibility that in fact we did not find thousands more problems because they are not there. (However, Jim’s original comment about pulling threads is really appropriate.)

Second, that the error rates in a standard have to be tied to the number of normative pages not just the raw page count: OOXML is unusual as a standard in having so much repeated and non-normative material: indeed, Patrick Durusau in 20 hours was able to condense the WordProcessingML material by 74% to 452 pages: assuming that the other parts have similar rates that gives us about 1500 normative pages, which by Jim’s metric should reveal only 1250 unique issues. Compare this to the approx 1,000 issues that were dealt with (and the large number of issues dealt with en masse such as fixing ISO-ese shalls and shoulds and fixing examples) and the review is actually looking pretty good even on Jim’s metrics, isn’t it!
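
The back-of-the-envelope reasoning in that paragraph can be sketched in Python as follows. The numbers are the ones given above; extending the 74% condensation rate from WordProcessingML to the whole draft is the assumption already flagged in the text, and the variable names are only illustrative.

```python
total_pages = 6045               # raw page count of the draft
remaining_fraction = 1 - 0.74    # what is left if 74% can be condensed away
issues_per_page = 5000 / 6000    # Jim's expectation: ~5000 issues per 6,000 pages

# Assumption: the WordProcessingML condensation rate holds for every part.
estimated_normative_pages = total_pages * remaining_fraction

# The text rounds the estimate down to "about 1500" normative pages.
rounded_normative_pages = 1500
expected_unique_issues = rounded_normative_pages * issues_per_page

print(f"estimated normative pages: {estimated_normative_pages:.0f}")
print(f"expected unique issues:    {expected_unique_issues:.0f}")
```

On these assumptions the expected count lands in the same range as the roughly 1,000 issues that were actually dealt with.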

And my third point is the same one I have said elsewhere. The maintenance process is the best place to deal with remaining issues. If you look at some of the FUD lists floating around of new issues, you see an indiscriminate grab-bag of new feature requests, denials of the scope of OOXML which emphasizes legacy features, function changes, as well as (hopefully) some errors proper. These are not showstoppers, but they all should be dealt with sooner rather than later because of their importance. And sooner means by maintenance of the standard, not by pre-standardization faffing around and filibustering.

Update 2

A website picked up on this exchange and quoted Jim’s

You’ve written 6000 pages of specification largely in secret (and, I understand, recently added over 1500 more pages) and given the world five months to read, absorb, understand, review, critique, and establish informed positions on it.

So I think it is useful to restate the problems with this.

  • 6,000 pages The pre-BRM draft standard (DIS 29500 mark I) had over 6,000 pages plus several hundred more for schema files that were not printed in the text. However, the text of a standard has normative parts which state actual requirements and informative parts which give extra information to help users. The estimate from the editor of a “rival” standard is that about 75% of the content of DIS 29500 mark I was informative or could be condensed away without loss. The additional pages (and I have seen no reliable count that it is 1,500 pages: that seems just puff) are mainly due to taking schemas that currently are normative and putting them into the standard; however, at the same time repeated fragments of schemas in the draft text are being made informative, so actually there is a net decrease in normative material.

    So really what we have is a standard of about 1500 normative pages (perhaps 2,000 pages including schemas) with about 4500 pages of additional information to help explain it. The blanket figure of 6,000 both disguises that the text has an enormous amount of material to aid understanding and allows inflated views of the amount of work needed to find errors in the normative sections. Furthermore, there is an enormous amount of repetition, so review comments from one section often apply without change to other sections.
  • Secret Actually, Ecma put out a public draft for comment.
  • Five months No, the “five month period” is the nickname, and it actually took nine months until the ballot. So not 5 months to review 6,000 normative pages, but 9 months to review effectively 1,500 normative pages. What is the difference? Well, if we remove 1 month for administrative palaver, it is the difference between 6,000/4 = 1,500 pages per month and 1,500/8 = 187 pages per month.
  • Five months No, actually there was an additional period after the ballot where National Bodies could look at each other’s comments and participate in the Ballot Resolution Meeting: which takes it to over a year in total, not including the previous year of development at Ecma.
  • read, absorb, understand, review, critique, and establish informed positions But every individual National Body does not need to have a definite opinion on each individual issue: abstain is fine on issues that are not of interest or are outside the expertise. I don’t know how the ISO SQL Steering Committee works, but in SC34 national bodies try hard not to act outside their competence and are careful to abstain rather than spoil the process: they find the best experts they can and encourage development of national expertise and awareness of their particular national interests: Japan on internationalization, fonts and formal schemas for example. The review happens not because everyone involved knows everything, but because collectively and cooperatively all the issues get adequate coverage. For example, there may only be three or four National Bodies with deep experts on maths, and several more with general experts who can get the drift pretty well, and a few more with industry contacts and other liaisons, and that is more than adequate for review.
  • Given the world SC34 has been operational in one form or another for almost 25 years. People who are interested in this area have had a long time to get involved, learn the procedures, get national committees going, participate in various standards to learn the ropes and make networks. Both when ODF and OOXML were first proposed for fast-tracking there were good signs for people who were interested to get involved. The idea that somehow DIS 29500 has been foisted on an unsuspecting and unready public shifts the responsibility away from the people who should have been participating and up-to-speed. If a National Body (or government or other stakeholder) ignores developing skills and experts who will be ready to participate when the time comes, of course they will not have enough time: but it is their fault! If you are running in a race, arrive late, and the starter’s gun goes off while you are still putting on your shoes, you cannot complain “I didn’t have enough time!”
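
The review-rate comparison in the "Five months" point above can be checked with a few lines of Python. This is only a sketch: the one month set aside for administration is the allowance made in the text, and the function name is illustrative.

```python
def pages_per_month(pages: int, months: int, admin_months: int = 1) -> float:
    """Pages to review per working month after removing administrative time."""
    return pages / (months - admin_months)


# The claim being rebutted: 6,000 pages in a five month window.
claimed_rate = pages_per_month(6000, 5)

# The argument made here: about 1,500 normative pages in nine months.
effective_rate = pages_per_month(1500, 9)

print(f"claimed:   {claimed_rate:.1f} pages per month")
print(f"effective: {effective_rate:.1f} pages per month")
print(f"ratio:     {claimed_rate / effective_rate:.0f}x")
```

The two readings of the same review differ by a factor of eight in workload per month.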

Rick Jelliffe

This is an open letter to all companies who achieved market success in the 1980s and 1990s with PC-based applications.

The recent controversy over ODF and Office Open XML at ISO shows both that there is substantial interest in document formats, and that there is also substantial commercial rivalry. I do not believe I am on my own in thinking that the writing is on the wall: the days of private proprietary formats, especially binary formats, are numbered and perhaps have already expired.

There are of course many millions of documents archived in these older formats, and it will be a major challenge for archivists to figure out workable and cost-effective strategies for maintaining or grandfathering these documents into newer formats, especially more-or-less lossy standard formats.

Corporations who were market leaders in the 1980s and 1990s for PC applications have a responsibility to make sure that documentation on their old formats is not lost. Especially for document formats before 1990, the benefits of the format as some kind of IP-embodying revenue generator will have lapsed now in 2008. However the responsibility for archiving remains.

So I call on companies in this situation, in particular Microsoft, IBM/Lotus, Corel, Computer Associates, Fujitsu, Philips, as well as the current owners of past names such as Wang, and so on, to submit your legacy binary format documentation for documents (particularly home and office documents) and media, to ISO/IEC JTC1 for acceptance as Technical Specifications.* Handing over the documentation to ISO care can shift the responsibility for archiving and making available old documentation from individual companies, provide good public relations, and allow old projects to be tidied up and closed.

The recent controversy over Office Open XML and ODF has occurred in part because both were submitted to become International Standards, which is appropriate for living formats. However, there is still a substantial public interest that would be served by existing documentation of legacy formats being submitted as Technical Specifications or Technical Reports, which, as classes of documents that are less than a standard, will be less controversial but still useful for putting this valuable information onto the public arena. As publicly available specifications, ISO/IEC would make the material available free on their website: free access is a very important outcome.

For nations where the 17 year patent time applies, there seems little reason why formats from 1990 and before could not be quickly submitted and dealt with in this way. However, given the enormous benefits that openness brings in increasing the size of the pie, I suggest that even recent formats, for example formats before 2001, should also be submitted to ISO as Technical Specifications in this way with some appropriate RAND-z IP covenant or license.

Examples of these formats that spring to mind include:

  • All Microsoft Office binary and text and media formats, including RTF and Visio
  • All IBM/Lotus binary and text and media formats, including Visicalc
  • All Corel formats, including WordPerfect

Furthermore, I call on archiving and regulatory bodies to investigate encouraging and supporting this kind of activity. As well as office document formats, there are substantial legacy collections of financial and engineering documents which would also benefit from the same treatment. It should go without saying, but the Macintosh, Amiga, OS/2, and applications on the many different versions of UNIX may also have hosted popular applications whose documentation may be in danger of being lost unless it is lodged with a suitable formal international technical library, such as ISO/IEC.

The ISO/IEC Technical Specification is a good, low-fuss medium for making sure that older formats do not disappear, and without requiring costly rewrites or changes.

*Contact your local national standards body for advice on this, or your local SC34 committee member. Do not get too caught up in whether the document is a Technical Report or Technical Specification.

Rick Jelliffe

In the markup world, the jargon is that inline markup is the tags that delimit ranges of text in a document (e.g., Plain Old XML), while out-of-line markup is where the structures and labels are in one place but the subjects of the structures and labels are in another place (e.g., XLinks). Of course, you can have XPaths which drill down to some piece or bundle of information with inline markup, but where there is out-of-line markup there is potentially another XPath that can drill down through the out-of-line markup and end up labelling the same information.

What may not be obvious is that a web system that uses the PRESTO approach is in effect using URLs that act like XPaths on virtual out-of-line markup. “Virtual” because no actual tree is ever explicated (necessarily): notionally PRESTO uses resolver rewriting.

That good markup practice is to mark up the information directly, without fluff and tricks and in as pleasant a way as possible, is universally acknowledged; and that there are many kinds of information structure where the markup cannot be a neat model of the data, such that all elements represent objects of the same analytical importance, is also widely known and regretted. (Think of the distinction in XSD between the components (the objects of the schemas) and the tags used for each component, for example. Or the *Pr containers in OOXML.)

A PRESTO URL should give the view in terms of the (conceptual) components, not the specific tags used if the resource is stored as an XML document. And not necessarily every tag, certainly. But every concept (every significant concept) should have a URL, even if there is no representation available or only a pretty crappy one.

So if in PRESTO a URL represents a kind of XPath to a virtual out-of-line markup view of some data, then it is possible to have a virtual schema for that virtual markup: in effect, you could have a schema for the URL. For example, given the virtual schema (as RELAX NG compact syntax here):

  element address {
     element tent { text },
     element oasis  { text },
     element wadi { text },
     element desert { text }
  }

which would allow PRESTO URLs like

   http://www.eg.com/address
   http://www.eg.com/address/tent
   http://www.eg.com/address/oasis
   http://www.eg.com/address/wadi
   http://www.eg.com/address/desert

In PRESTO, these should be available regardless of how the data is stored, because the idea is to model the user’s conceptions. (And if an exact match is not available, to provide the best fit. This certainly creates a task allocation between front-end and back-end systems that may not be workable for some organizations or tasks. No sweat.)
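As a rough illustration of the idea that a virtual schema implies a set of URLs, here is a minimal Python sketch. The nested dict merely stands in for the RELAX NG schema above, and the names `VIRTUAL_SCHEMA`, `BASE` and `presto_urls` are hypothetical, invented for this example rather than being part of PRESTO.

```python
# Hypothetical sketch: a nested dict stands in for the virtual schema.
# An empty dict marks a leaf (text-only) element.
VIRTUAL_SCHEMA = {
    "address": {
        "tent": {},
        "oasis": {},
        "wadi": {},
        "desert": {},
    }
}

BASE = "http://www.eg.com"  # example host taken from the text


def presto_urls(schema, prefix=BASE):
    """Return one URL per element in the virtual schema, parents first."""
    urls = []
    for name, children in schema.items():
        url = f"{prefix}/{name}"
        urls.append(url)
        # Recurse so that every nested concept also gets its own URL.
        urls.extend(presto_urls(children, url))
    return urls


if __name__ == "__main__":
    for url in presto_urls(VIRTUAL_SCHEMA):
        print(url)
```

The point of the sketch is only that the URL set is derived from the conceptual schema, not from how the data happens to be stored.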

But what about cardinality? Here is a schema more typical of literature:

   element law {
       element title { text },
       element part {
            element title { text },
            ( element p { text } |
              element list {
                  element item { text }+
              }
            )*
       }*
   }

The XPath for accessing a particular part’s title would be /law/part[2]/title so the PRESTO URLs would need some kind of convention.

In PRESTO we *might* have URLs for

     http://www.eg.com/law/
     http://www.eg.com/law/title
     http://www.eg.com/law/part
     http://www.eg.com/law/part2/title
     http://www.eg.com/law/part2/p3
     http://www.eg.com/law/part2/list4
     http://www.eg.com/law/part2/list3/item4

Now, I am not sure I understand the issues well enough to say which system for indexing is absolutely best. But I think the advantage of http://www.eg.com/law/part2/title over http://www.eg.com/law/part2/title is that it is probably a more common case that your system is interested in /law/part[2]/title rather than all titles of parts /law/part/title. But it is a matter of the particular use case and the consequent virtual schema.
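One possible reading of the numbered-segment convention can be sketched in Python. This assumes that a trailing number on a path segment is a 1-based position among same-named siblings (so part2 means part[2]); that is my assumption for illustration, and `presto_path_to_xpath` is a hypothetical helper, not an established PRESTO function.

```python
import re
from urllib.parse import urlparse

# Assumption: a trailing run of digits on a path segment is a 1-based position,
# so "part2" means part[2]. Element names that themselves end in a digit
# (e.g. "h1") would need a different convention.
SEGMENT = re.compile(r"^(?P<name>.*?[^\d])(?P<index>\d+)?$")


def presto_path_to_xpath(url):
    """Translate a PRESTO-style URL path into an XPath-like string."""
    path = urlparse(url).path
    steps = []
    for segment in filter(None, path.split("/")):
        match = SEGMENT.match(segment)
        if match is None:
            raise ValueError(f"unsupported segment: {segment!r}")
        name, index = match.group("name"), match.group("index")
        steps.append(f"{name}[{index}]" if index else name)
    return "/" + "/".join(steps)
```

Whether such a translation is ever performed literally depends on the back end; the virtual markup need not exist as a real tree.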

(Another possibility is just to bite the bullet and allow XPath syntax directly in the URLs, with appropriate percent escaping. For example http://www.eg.com/l/law/part%5B2%5D/title. Is this reinventing XPointer? Well, in a way, except that in XPointer you are locating a file then drilling down according to the actual markup: in PRESTO the information is merely hierarchically accessible and you are using the Use Case concepts to zero in on the information.)
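The percent escaping itself is mechanical. A minimal Python sketch using the standard library follows; the function names are hypothetical, and the sketch leaves out the /l/ prefix used in the example URL above.

```python
from urllib.parse import quote, unquote

BASE = "http://www.eg.com"  # example host taken from the text


def xpath_to_url(xpath, base=BASE):
    """Percent-escape an XPath so it can be used as a URL path."""
    # Keep "/" literal so the XPath steps stay visible as path segments;
    # characters such as "[" and "]" are escaped.
    return base + quote(xpath, safe="/")


def url_path_to_xpath(path):
    """Reverse the escaping on the path part of such a URL."""
    return unquote(path)
```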

Rick Jelliffe

One question that comes up really regularly when I have been yacking about the PRESTO approach with people over the last month is that people don’t see how Objects fit into it. They get Persistent URIs, they get REST, but the Object part is not so obvious. (Actually, I have had several people email me that the approach is one they have been tending towards in their work too.)

One reason, of course, is that the term Object-Oriented is generic and used for a family of related ideas, rather than being a single neat idea. But the PRESTO idea is that the public URLs should reflect an object-oriented modeling of the data and systems, and that you should have URLs for every object in your system even if there is no satisfactory representation of that resource.

Wikipedia says that an object can be viewed as an independent little machine with a distinct role or responsibility, which is a good start, but I have always thought a key value of objects was that they can help model the system according to the concepts in the user’s/developer’s/domain’s mind or usage. The aspects of being an object that PRESTO is interested in are encapsulation (the idea that entities should be self contained, with data and methods tightly coupled) and introspection (the idea that you can ask an object about its contents: methods, children, etc.). [UPDATE: Oi! NOT INHERITANCE, NOT RPC, NOT INTERFACES, NOT COUPLING STATE, NOT POLYMORPHISM] Bjarne Stroustrup has commented recently that problems which can be composed into a hierarchy are good candidates for Object-Oriented solutions (sorry, no reference here: it was in a Linux magazine I was reading today, maybe Linux Developer…has a Sun Solaris distro on the DVD.)

In pattern terms, PRESTO is a Facade pattern applied to URLs. In terms of UML, we might see PRESTO as saying that public-facing URLs should be constructed based on some entity analysis such as Use Cases or Package Models.

But the key way to think about it is just basic object concepts. The PRESTO approach says to form URLs so that each “directory” in the URL is an object, and its contents are sub-objects, data or other resources. Methods are not expressed as queries, but declaratively by identifying their result: so you don’t say http://www.eg.com/document/?getGraphic but http://www.eg.com/documents/graphic which then allows you to say http://www.eg.com/documents/graphic/title and so on.

Of course there are often many alternative ways of organizing or categorizing data, which is why you appeal to use cases to guide you to the best form. Indeed, you might have alternative PRESTO URLs for the same data resource.

One piece of software that is highly useful for implementing a PRESTO system is the Tuckey UrlRewriteFilter, which is good for Java-based web servers. We are finding that regex-based URL mapping makes the whole thing quite easy and painless, in particular when retrofitting a PRESTO facade on top of an existing web site. The difficulty is largely where it belongs: in figuring out which objects are most interesting or obvious to the users. This is where modeling the particular Use Cases or even Configuration Items comes in.
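To give a feel for regex-based URL mapping, here is a minimal Python sketch of a facade that rewrites public PRESTO-style paths to internal ones. It is not UrlRewriteFilter's configuration format; the rule table, the `rewrite` function and all the /legacy/... targets are invented for illustration.

```python
import re

# Hypothetical rules: each pairs a public PRESTO-style path pattern with the
# internal (legacy) URL it is rewritten to. The internal URLs are invented.
REWRITE_RULES = [
    (re.compile(r"^/documents/graphic/title$"), "/legacy/app?object=graphic&field=title"),
    (re.compile(r"^/documents/graphic$"), "/legacy/app?object=graphic"),
    (re.compile(r"^/law/part(\d+)/title$"), r"/legacy/law?part=\1&field=title"),
]


def rewrite(path):
    """Return the internal URL for a public path, or None if no rule matches."""
    for pattern, target in REWRITE_RULES:
        match = pattern.match(path)
        if match:
            # Substitute captured groups (e.g. the part number) into the target.
            return match.expand(target)
    return None
```

The public URLs name objects and their results, while any query-style plumbing stays hidden behind the facade.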

Rick Jelliffe

The story so far

  • In the 1990s and earlier, Microsoft was notoriously prominent in its desire to keep its binary formats proprietary: it provided RTF for text-based interoperability but RTF did not allow full round-tripping of data.
  • In 2000, Microsoft started providing XML data dumps for spreadsheet data, and each subsequent version of MS Office has used XML more, with Office 2003 providing quite full support, to the extent where now the default save formats, on the Windows platform at least, are all XML-in-ZIP files, the latest generation with the name Office Open XML (which people often write as OOXML).
  • In 2004 a European Union agency recommended to MS that it should continue down the XML route and open up its formats by submitting them to some international standards body. (At the same time, a recommendation was issued for OASIS to submit ODF to ISO.)
  • In December 2005 Microsoft founded a technical committee at the ECMA standards body, TC45, which worked for a year and released ECMA 376 in December 2006; during this time the specification, which included much text based on documentation for the older binary formats, grew from about 2,000 pages to over 6,000 pages. A public draft was issued in mid 2006. (At the same time, around December 2005, OASIS submitted ODF 1.0 to ISO for consideration using a variant fast-track procedure: it was accepted with scant National Body review in mid 2006.)
  • At this time (December 2006) ECMA 376 was submitted to ISO/IEC JTC1, the international standards organization, for “Fast-Track” adoption as a standard: the fast-track process is used for standards which have been drafted at other organizations, and enter the process as Final Draft International Standards. At this stage, National Bodies had about eight months to review the standard and come to an initial position. Many National Bodies invested significant effort in attempting various reviews, however this period was also characterized by the raising of many spurious issues. (In early 2007, an update to ODF called ODF 1.1 was released at OASIS but not resubmitted to ISO, with improved accessibility features.)
  • In September 2007, the initial ballot of National Bodies resulted in a significant number of “No with comment” votes, which triggered a Ballot Resolution Meeting (BRM). The BRM had been widely expected, due to the expected large number of comments. In the ISO process, a “No with comment” has also been called “Conditional Yes”, but many journalists and commentators at this stage preferred oversimplification to reality. Over 3,000 individual comments were received; however, the majority of these were repeated form-letter comments, part of an organized campaign, rather than coming from fresh National Body reviews.
  • In mid January 2008, the Editor for DIS 29500 released a promised Disposition of Comments document, containing suggested fixes from ECMA for addressing the National Bodies’ issues: these ranged from simple acceptance, to alternative approaches, to rejection of the issue, with their justification for these. ECMA had bundled the issues into about 1000 different responses. I wrote earlier, “The Editor’s Disposition of Comments …is usually the starting point for comment resolution, and, given that most comments are uncontroversial, is often the end-point too.”
  • In early 2008 Microsoft released the binary format documentation under its OSP covenant, and promised the mappings between the binaries and OOXML: this seems to be in direct response to requests for this from NBs, though the mappings are not in-scope for DIS29500’s text.
  • In late February 2008, a week-long Ballot Resolution Meeting was held in Geneva, Switzerland. It was attended by 120 individual delegates from about 34 different National Standards Bodies. The outcome of the meeting was a series of editor’s instructions to allow a new draft of the standard to be created: usually these instructions are completely specific though there may be some general ones, for example to use one term rather than another globally. (At time of writing, March 2008, OASIS has been working on ODF 1.2 which is slated to improve several important ODF weak spots, in particular relating to formulas and metadata. It is mooted for re-submission to ISO during 2008.)
  • The results of the BRM are available online, and National Bodies now have one month (end of March 2008) to decide if the changed draft meets their requirements. For the new draft to pass, it will require 5 National Bodies (of the “P” class) to switch from Abstain or No votes (remembering that No with Comments may mean “Conditional Yes”).
  • Of the 1027 Editor’s responses, the BRM addressed 189 responses by specific resolutions and discussions of the BRM, and the rest using a paper ballot where each National Body in attendance voted: this accepted 825 of the Editor’s recommendations and rejected 13. (The issue of a paper ballot had been abstain on issues of lesser interest to them.
  • If the new draft is adopted as a standard, it does not remain static but can be “maintained” by the relevant ISO/IEC JTC1 committee, SC34, Document Processing and Description Languages. Procedures exist for National Bodies to submit Defect Reports, which again attract the Editor’s attention and National Body voting acceptance, so the kind of process seen at the BRM becomes an ongoing effort, if there is enough interest by National Bodies.

The upshot is that, if DIS29500 mark II and ODF 1.2 both get accepted as standards, by the end of 2008 we should have two standards which together can thoroughly cover the field of representing current and legacy office documents, each representing one of the two dominant commercial traditions, with both under active and significantly open maintenance to fill in the remaining gaps and to repair pending broken parts, with clear cross-mapping to allow interconversion, with an increasing level of modularity so that they can share their component parts, and at least with a feasible agenda of co-evolution and other kinds of convergence.

And if we play our cards well, both traditions will have significant competitive motivation to accommodate the technical requirements of their competitors. Viola, harmonization? (Violà, harmonisation?)

The big picture changes

The “big picture” changes very often concern issues of conformance and modularity.

  • The draft is being split into 4 Standards:
    1. Fundamentals: a large standard for the core of OOXML
    2. OPC (Open Packaging Conventions): the details on using ZIP and referencing
    3. Markup Compatibility and Extensibility
    4. Transitional Migration Features: contains VML and features not recommended for new documents. Problematic terms like “legacy” and “deprecated” have now been avoided.
  • Six document conformance classes have been created: Core and Transitional classes for WordProcessing documents, Spreadsheet documents and Presentation documents.
  • Six application conformance classes have been created: Base and Full classes for word processors, spreadsheet and presentation applications.
  • The scope sections have been clarified.
  • Normative references are to be complete.
  • Use of standard formats for syntax: BNF
  • Use of standard measures for typesetting lengths
  • Use of standard format for dates
  • Use of IANA/ISO names for language and country codes
  • Development of a prefix mechanism for spreadsheet formulas, presaging a full namespace modularity system like Open Formula’s.
  • Encouragement for applications to save equations as MathML even if they also save in the OMML maths.
  • Many casual references to MS-tradition technology removed and replaced by references encouraging W3C technologies for interchange

The small picture changes

The small-picture changes are frequently aimed at making the draft more “ISO-ish” and therefore making maintenance and future development at ISO/IEC JTC1 easier.

  • All known typos will be fixed
  • All known errors in examples will be fixed
  • All schema fragments will be marked informative to prevent clashing
  • ISO standard conformance language will be used: shalls and shoulds

The middle picture changes

The changes from the BRM usually relate to either correcting bugs or better documentation. Additions to functionality tended to be limited to providing better accessibility and better internationalization, rather than completing or expanding the general feature set. The Editor’s Disposition of Comments clearly tried to reduce the amount of gratuitous breakage of documents or applications, and the explicit resolutions of the BRM continued this policy IMHO.

  • Accessibility features to support better tabbing (in the fashion of HTML’s tabindex) and table labelling. An informative reference to guide developers in accessibility features is being added.
  • Multiple changes to support right-to-left writing, half-width character terminology and less US-centric artwork and measures
  • The schemas have been re-written to be more compatible with the frailties of various XSD implementations. The XSD schemas will be included in the text as annexes with line numbers. There will be both Strict and Transitional schemas, following the model of HTML. The RELAX NG schemas have been regenerated accordingly and much improved: many people may find them preferable to the XSD schemas.
  • Hundreds of clearer explanations of multiple elements and functions.
  • Almost all bitfields will be replaced by specific attributes. (The bitfield which accords with ISO Open Font remains.)
  • Fixes to the CONVERT() function and a mathematically proper ceiling function, ISO.CEILING() for spreadsheets
  • A mechanism to prevent applications from executing files with incorrect types, to prevent viruses
  • Strings may not have non-XML graphical characters in them
  • Different hashing algorithms

Plus hundreds more.

Other Issues

Many other related issues were also discussed in the hallways at Geneva. For example, the German DIN standards body is preparing a cross-mapping list to match features in OOXML and ODF: there really is very little information on this currently, despite the confident assertions that ODF can/cannot handle everything that OOXML does and vice versa. The Italian standards body is seeking to work on conformance suites for testing: obviously the schemas and BNF grammars allow validation testing of instances for document conformance, so I presume the test suites will be more concerned with application conformance. ISO/IEC JTC1 SC34 has been making various preparations to establish an effective and responsive maintenance regime: ODF could also benefit from this effort.

With over 1,000 changes, I certainly will have missed out some items of interest. Will these be enough to sway the necessary five National Bodies? The changes certainly provide objective extra information favourable to DIS29500 supporters, and the sheer number of changes suggests that ECMA is not going for a first-past-the-post strategy but trying to demonstrate a broader commitment to improvements even from antagonistic National Bodies. But though the anti-OOXML faction doesn’t have any new information to provide a counterbalance (discarding the frantic and self-justifying posturings over the BRM) I expect that they will try to explain their longstanding objections more carefully and acutely, since they do raise many good points.

Impressions

I thought the BRM went very smoothly, for a large high-stakes meeting, and I was happy to make some old and new friendships. In substance, the BRM was a typical ISO meeting of this kind: collegiality, druthers, voting, discussion, corridor meetings, rounding up supporters for measures, trying to track down definitive answers on technical issues, and so on. In accidents, it was very unusual due to size, content and ramifications not to mention the new blood pool.

I think we did pretty well in the Australian delegation, in getting many of our issues addressed completely and most of our issues addressed in part, but (like any standard!) the more you look the more holes you see. There are so many improvements that can and should be made by pro-active maintenance. At various times we had particular help from CA, MY, JP, UK, CZ, FI, US, and several others, so an unofficial thanks to those delegates from