August 2007 Archives

Simon St. Laurent


I mentioned this a month ago, but that was, well, a month ago, and the deadline is tomorrow. The XML 2007 Call for Papers ends tomorrow.

Proposals need to include speaker information, a short abstract, and a suggestion for its track. We have four tracks this year:

  • Documents and Publishing

  • XML on the Web (I’m chairing that track.)

  • Enterprise XML

  • XML Training

Lauren Wood (the previous chair of this conference) has posted advice for proposal submissions that I heartily recommend.

Rick Jelliffe


Here is my free advice to headline writers: please use “Maybe” for the countries that vote “No with comments” on DIS 29500 (Office Open XML).

Those are effectively the four major votes that can be given on an ISO standard by a national body. As always, the best place for disinformation on votes is headlines.

A vote by a national body of “No with comments” is a “Maybe”, and not an absolute “No”. Looking at it more, I wouldn’t now go as far as Jon Bosak’s comment that “No with comments” is the same as “Conditional approval”, however. What really matters is the particular comments: if they are doable or reasonable and in line with the goals of the standard and the proposer’s conception of the standard (and if no-one’s hair is on fire), then No means Yes. But if the comments are undoable or unreasonable or out-of-scope for the standard’s goals or depart from what is acceptable to the proposer, then No means No.

As in “New Zealand says Maybe!”, “India says Maybe!”, “Japan says Maybe”, “China says Maybe”, “Brazil says Maybe”, and so on. It is not so difficult, is it? (Now even then there is scope for variation: “New Zealand says Maybe but probably not” or “Japan says Maybe, but probably” for example. But that would require actual research.)

And for journalists struggling to write the story well, here is another big tip: the votes are on particular drafts and the technical and editorial issues in them. So when there is a “No with comments” vote, that is a vote on the particular draft — a book in progress — not on the underlying technology. A careful writer will distinguish between DIS 29500 (the book being voted on) and Office Open XML (the technology.) Sometimes this distinction does not make a difference, but sometimes it really does, especially in the case of “No with comments” where you may be in favour of having a standard for the technology but want some improvements in the draft. In that situation, treating “No with comments” as the same as “No” misrepresents the process.

Rick Jelliffe


You’ve probably seen it: IBM’s Rob Weir’s 2006 diagram comparing the number of pages of various standards with the time they spent in committee. It regularly makes its appearance unchallenged: indeed Bob Sutor of IBM (a business rival of Microsoft) gave the diagram a prominent place in his blog this week, in a post that, presumably at this last stage, contains the essence of IBM’s argument against DIS 29500 and Office Open XML.

At the Standards Australia meeting, the diagram was brought out again, and I protested that it was misleading, but seeing Bob’s blog makes me want to explain my criticism more. Here is the scary diagram:

spec-speed2.jpg

Digression

The issue of page count and book size is prone to publicity stunts. If you look at this web page, for example, you can see two different printouts of the Open XML spec. The first manages to fit in boxes under a man’s arms (and we don’t know how full the boxes are) while the second manages to be taller than a man! What can account for this doubling of size? Perhaps it is the magic of single sided printing and thick paper :-) (In the 1990s I was discussing a book with a publisher who said “it has to be 1.5″ thick, but if you don’t have enough material we will use thicker stock”!) Say we have 6500 pages, and we print it at the maximum common paper weight of 105 weight Bond ledger: that gives us almost 3 metres of printout (10′)! But if we print it at the minimum common weight of 16 weight, that gives us a tad over 50 cm (20″). On average paper weights, this should give about 64 cm (25″).
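As a rough sanity check on those figures, the stack heights can be turned back into an implied thickness per sheet. The sketch below assumes single-sided printing (one page per sheet) and reads “almost 3 metres” as 300 cm and “a tad over 50 cm” as 50 cm; the function names are invented for illustration.

```python
# Back-of-envelope check of the printout heights quoted above.
# Assumptions (mine, for illustration): single-sided printing, so 6500 pages
# means 6500 sheets; "almost 3 metres" is taken as 300 cm and
# "a tad over 50 cm" as 50 cm.

PAGES = 6500
CM_PER_INCH = 2.54


def implied_sheet_thickness_mm(stack_cm: float, sheets: int = PAGES) -> float:
    """Thickness of one sheet, in millimetres, implied by a stack height."""
    return stack_cm * 10 / sheets


def cm_to_inches(cm: float) -> float:
    """Convert centimetres to inches."""
    return cm / CM_PER_INCH


stacks_cm = {"heaviest stock": 300, "average stock": 64, "lightest stock": 50}

for label, height in stacks_cm.items():
    print(
        f"{label}: {height} cm = {cm_to_inches(height):.0f} in, "
        f"about {implied_sheet_thickness_mm(height):.2f} mm per sheet"
    )
```

On these assumptions the heaviest stock makes a stack six times the height of the lightest for the same 6500 pages, which is the whole point: the height of the pile says more about the paper than about the spec.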

But back to the main story. I’ll deal with the issues I have with it in reverse order of their seriousness.

Apples and Oranges

If you are using page size to compare documents, you really should make sure the documents are typeset the same. I moved the Open XML spec down from its extravagant 11pt body font and large heading spacing to follow the ISO standard 10pt.

Voilà, I estimate that the page count can be reduced by about 1,000 this way. (Added: I estimate this because I tried it. I saved 800 pages on part 4 alone just by moving to 10pt and more typical ISO clause spacing. Technically, this is because there is so much display content and so many two-line paragraphs that get pushed to the next page, cascading with many paragraphs taking one line fewer.)

spec-speed3.jpg

Difficulty of Review

The diagram uses page size as a unit of preparation and review. However, not all pages are equal. A page that contains normative text requires much more review than a page of informative text. A page that contains auto-generated text requires almost no review at all: you sample enough instances to have confidence in the autogeneration and then skip the rest.

Now this is especially relevant for DIS 29500, because it contains enormous amounts of non-normative/tutorial text and of autogenerated boilerplate. ODF editor Patrick Durusau this week tried an experiment where he removed this fluff, and he reduced the WordprocessingML specification from about 1880 pages to about 600 pages (and he thought it could go a few hundred pages more!) Most standards avoid tutorial and non-normative material because it increases the tedium of the review process and confuses readers. A good tutorial is usually a bad standard, and vice versa. DIS 29500 is a really extreme example of this.

So let’s say that only a quarter of the text is normative and non-autogenerated (based on Patrick’s results, and considering the impact of the normative Part 3 and so on), and that the non-normative text and autogenerated text takes about 1/3 of the review effort. That means that, effectively for review purposes, the document requires only half the effort for the number of pages.

So divide the effective page size in half. (The legend “Number of pages” becomes “Review effort expressed in terms of equivalent number of normative pages”)
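The arithmetic behind that “half” can be written out explicitly. This is only a sketch of the estimate as stated: a quarter of the pages get full review effort and the remaining three quarters get a third of it. The function names are hypothetical, and the 6500-page input is just the round figure from the digression above, not a measured count.

```python
from fractions import Fraction

# Shares taken from the estimate in the text.
NORMATIVE_SHARE = Fraction(1, 4)  # pages needing full review effort
FLUFF_EFFORT = Fraction(1, 3)     # relative effort per non-normative/autogenerated page


def effective_fraction(normative_share=NORMATIVE_SHARE, fluff_effort=FLUFF_EFFORT):
    """Review effort per raw page, in units of 'normative page equivalents'."""
    fluff_share = 1 - normative_share
    return normative_share * 1 + fluff_share * fluff_effort


def effective_pages(raw_pages, normative_share=NORMATIVE_SHARE, fluff_effort=FLUFF_EFFORT):
    """Raw page count rescaled to equivalent normative pages."""
    return raw_pages * effective_fraction(normative_share, fluff_effort)


print(effective_fraction())   # 1/4 + 3/4 * 1/3
print(effective_pages(6500))  # illustrative raw page count only
```

Changing either share shows how sensitive the result is: if the fluff took half the effort of normative text rather than a third, the effective size would be five eighths of the raw count instead of one half.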

spec-speed4.jpg

Time spent in Review

Now let’s look at the other axis. Weir’s numbers here seem to be based on the time spent in committee before coming up for a vote. That might have been interesting a year ago, but it is positively misleading a year later. Why is it still being bandied about like this?

In the case of ISO fast track standards, there is the whole review process by ISO that is omitted: the informal discussions with SC34 before submission, the 1 month administrative review period, the 1 month contradictions response period, the 5 month technical review period just coming to an end, and the ongoing review where each national body looks at each other’s comments over the next five and a half months before the Ballot Resolution Meeting in Geneva, which I expect to happen. That is a full year.

So add an extra 370 days there.

spec-speed5.jpg

Nature of Review

The work that a committee does in compiling or creating a standard for a pre-existing technology is very different from the work that a committee does in creating or augmenting a standard. When the proprietary Torx screws became an ISO standard, one can imagine that the committee had little to do. By contrast, the committee that produced the ISO PDF/X standard had a bit more to do, but still nowhere near what they would have to do if they were developing a standard from scratch.

The work is review and discussions of policy, relieved of what-ifs and who-needs-this? As a completely conservative estimate, let’s say that development of new material takes half the time, and review takes half the time.

Since we are measuring this in pages, let’s be conservative and say that this relieves the committee process of 25% of its workload, and express that in effective pages.

spec-speed6.jpg

Since we are looking at the workload of a committee, what about where a committee doesn’t have to author much, but is presented with a selection of workable drafts from the pre-existing documentation of a product? That is obviously a lot less work than writing from scratch, especially for the editor.

So let’s say that this makes a committee 25% more effective, and express it in effective pages as before.

spec-speed7.jpg

The other standards

Now, of course, to compare apples with apples, we would have to do the same procedure to the other standards, and they would move in the same kind of direction to a greater or lesser extent. But none of their shifts would be anywhere near as much as Open XML’s because it has the quintuple whammy of typesetting, fluff, the BRM, the lack of need of development, and pre-existing editorial material.

Furthermore, these other standards are not standing still. ODF has moved to ODF 1.1 with 1.2 in the works.

I have two additional reasons why I think the diagram (or, at least, the way it is used) is misleading.

Ex nihilo?

The first reason is related to the last segments above. It is really not fair to compare a markup language for an old technology with a markup language for a new technology merely on the basis of the committee time. Microsoft moved into documenting text formats for its standards when it purchased RTF from DEC around 1990. A lot of the documentation in Open XML is adapted directly from the RTF and DOC documentation. Its basic strengths and weaknesses are well-known and long documented.

There have been perhaps fifty different versions of the .DOC format, on six different operating systems over the last twenty or more years. To ignore this history and just use committee time as the metric seems to me to miss out something important. A new standard does not come with all this prior work (and baggage).

I am not sure how to diagram this. Perhaps a line indicating the time the technology and documentation was in development before the start of the committee process? Let’s date that from the advent of RTF rather than from the first .DOC format.

spec-speed8.jpg

VML is a particular issue here: it was introduced into IE 5.5 and presented to the W3C committee. To ignore that early development and attempted standardization work seems to miss something important, which again is why I think we have to be careful not to be misled by the diagram.

Separate Technologies

Finally, my other problem with the diagram is that people use it to say “this is so big it cannot be reviewed”. However, Open XML is made from five or more completely distinct sublanguages: OPC, WordprocessingML, SpreadsheetML, PresentationML, DrawingML, VML, and then the extensions mechanism of Part 5. One person is not expected to review a whole standard: it is done in co-operation with a committee. India is a good example here: they had separate task forces working on each of the three major application schemas.

So while the size of the draft in total is large, it can be decomposed into smaller sections and reviewed. There have been over 2200 people involved in national standards bodies reviews, I am told: that is a lot. If I was being as free with numbers as some people are, I would say that this represents about three pages per person! But of course, that would be just as flawed logic as accepting Rob’s diagram at face value.
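For what it is worth, the tongue-in-cheek “three pages per person” figure is just the naive division below. The 6500-page total is the round number used in the digression earlier in this post, and 2200 is the reviewer count quoted above; neither is a measured value, and the variable names are made up.

```python
# Naive "pages per reviewer" figure, taking the numbers at face value.
total_pages = 6500  # round figure used earlier in this post (an assumption)
reviewers = 2200    # "over 2200 people", as I am told

pages_per_reviewer = total_pages / reviewers
print(f"about {pages_per_reviewer:.2f} pages per reviewer")
```

The division assumes every reviewer reads a disjoint slice and nobody reads the same page twice, which is exactly why it is as flawed as reading the raw page count on its own.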

So let’s divide up the specification into its parts, and see where they fit on the chart. I’ll take into account the extra time for review, but just use the current raw page count for OPC (part 2), and the individual languages of Part 4 and 5. We get a diagram showing the size of each distinct (and therefore separately reviewable) sublanguage in page size of the current draft.

(If you select “View Image” or the equivalent in your browser, you will be able to see this a bit more clearly: the O’Reilly formatting system may get in the way here.)

spec-speed8.jpg

And finally, let’s have a look at what happens when we look at these separate languages, but get rid of the fluff as I suggested in the submission I sent to my national body for their consideration on the Australian vote. For WordprocessingML we will use the number that Patrick Durusau found when he stripped out the fluff: about 800. For the other largest four, we will just say that half is fluff, being conservative. (Actually, in my submission I want to remove some lists of examples such as border art to another part, but border art is hardly taxing on the reader.)

So this is a diagram of the estimated page count of normative pages in the component language standards of Open XML, against the time spent in Ecma and ISO development and review (and assuming a Ballot Resolution Meeting).

spec-speed11.jpg

Note that this diagram does not include the “effective size” considerations above, so the position of the new items can be compared directly with the other pieces of data on the page, as apples to apples. To the extent that the other issues raised above apply to each language, their star would move left (and up). For a good comparison, however, the other standards mentioned would also have to have their position adjusted in accordance with the same factors; as I mentioned, because the other technologies consist largely of normative material, the adjustment would not be as great. The other technologies might also need to have ISO process time added too: I don’t know whether Rob’s numbers include that or not (the effect would be to add six to twelve months in an upward direction to some of the blue points).

Bottom Line

So that is seven reasons why I think the diagram is misleading. Or, at least, why the diagram itself does not give data that is particularly useful for anything other than mindless sloganeering.

What I don’t understand is why people are not on to these kinds of tricks. Big standard, ooh scary. Have people never heard of Adam Smith and the division of labour? Have people never changed font size and had a different sized document as a result? Do people think that all text is equally taxing for review? Do people think that adapting a standard from pre-existing text is not easier than writing (and indeed developing) the standard from scratch? I suspect that many people see that on the original graph the OOXML point lies so far to the right, and because pages are easily countable, they don’t have any alarm bells ring.

So let me ring your bell, if I may: what the original diagram tells us is that the standard has a lot of text, and that one stage of its life in a committee took about a year in 2006. Both those things are such a partial piece of the picture (where is 2007?) that while they are of some sensational value, the diagram can be misleading.

M. David Peterson


Fedora Commons - About - News

Fedora Commons today announced the award of a four year, $4.9M grant from the Gordon and Betty Moore Foundation to develop the organizational and technical frameworks necessary to effect revolutionary change in how scientists, scholars, museums, libraries, and educators collaborate to produce, share, and preserve their digital intellectual creations. Fedora Commons is a new non-profit organization that will continue the mission of the Fedora Project, the successful open-source software collaboration between Cornell University and the University of Virginia. The Fedora Project evolved from the Flexible Extensible Digital Object Repository Architecture (Fedora) developed by researchers at Cornell Computing and Information Science.

Nice! Congratulations, Fedora Commons!

The press release continues,

Rick Jelliffe


Vote “No”? But aren’t I supposed to be Microsoft’s biggest fanboy? Well, what I mean is a conditional approval, not a rejection. There are some things that can be fixed and should be fixed, and an ISO Ballot Resolution Meeting is the best forum to make sure it happens.

I’ve been quite active in the debate on adopting Office Open XML as a standard,* and this blog has frittered away many bits on explaining why (because it would be useful in my industry, which is industrial publishing and markup, and we have been demanding it for a long time) and why many of the specific reasons given against OOXML are flimsy (how many self-assured people have raised “autoSpaceLikeWord95” who have no idea what a fullwidth character is, for example?) But not all was plain sailing: along the way I have pointed out several flaws that I thought needed to be corrected. A mild diversion has been to look at the various claims of bribery or faulty procedure bandied about.

On my travels, when I have been asked about how National Bodies should vote, I have always said that there is nothing wrong with a “No with Comments” vote, if the comments were doable. Indeed, this is exactly the vote that I have recommended to my national body, Standards Australia.

The actual list of comments I sent is here. Please note that these are just one person’s comments, not the official position. I have no idea how Standards Australia will vote, but I strongly urge them to vote “No with Comments”, specifically with my comments. I have tried in the comments to address many of the issues that people raised, and to limit the comments to issues that are relevant to Australia (which Standards Australia is quite keen on.)

Now when reading these comments, please realize that the intent is to state the technical and editorial position as clearly as possible. (When I say something is unacceptable, that is only in the context of the suggested fix to make it acceptable, not any claim that something cannot be fixed by the normal BRM process.) The whole point of these comments is that IMHO the big flaws in the standards are fixable (and fixable by the current processes) and that the edge-cases are not critical and can be left to maintenance.

In my comments I have attempted to expose the principles behind the comment, and to limit them to comments relevant to Australian industry. I definitely concentrate on getting the high-level issues right: the name of the standard, the organization of it, the conformance section, the over-abundance of non-normative text, the need to allow standard notations, and a future-proofing issue. My view is that getting these high-level issues right takes the sting out of the tail of many individual problems and edge-cases, and addresses many of the technical issues that people have raised piecemeal.

Rick Jelliffe


The deadline for the National Bodies to vote on DIS 29500 Office Open XML (fast-tracked from Ecma 376) is coming up on September 2: this is the vote that comes at the end of the 5 month review, which is 7 months since the draft was submitted, and probably not the end of the line. Here is some of the news, as a companion to my next blog item, which is about what I recommended to Standards Australia for our vote.

There are not many informed ideas of where the votes will go. The US body looks like it will vote Yes (I predicted an abstention there) as seemingly will Germany (I would have predicted a “No”). India looks to be voting “No with Comments”, which I am pretty happy about since I commended that vote to them when I was there. (Note that that news story gets it wrong about the impact of a “No” vote: a simple “No” and “No with Comments” are utterly different beasts, as Jon Bosak has prudently pointed out. But even a “No with Comments” may not, in effect, be a conditional yes if the comments are impossible to fulfill. Obviously no National Body will put in merely vexatious comments, since they don’t want to waste time; rather, they will state their requirements in a clear way that allows rapid resolution of issues.)

It seems likely that there will be a Ballot Resolution Meeting: if there are not enough “Yes” votes and enough “No with Comments” (i.e. ‘conditional yes’) votes, then a meeting is scheduled to see what technical changes need to be made to satisfy enough national bodies’ requirements. A meeting has been scheduled for Feb 25-29 in Geneva, with UK’s Alex Brown appointed as convenor.

There is a last minute frenzy on the contra side: IBM’s spokesman is claiming lots of alarming shenanigans without actually giving us the benefit of any details: names of countries, parties, dates, anything tangible. Stephane Rodriguez is complaining that he has to look at the schema and documentation when editing Open XML files; his new blog is notable for the number of times it says that there is a problem with OOXML but actually refers to some implementation issue in Office 2007.

Behind the scenes, Patrick Durusau (the ISO ODF editor) has been working on a really interesting and useful project. While he is not keen that people use ISO Open XML, he is keen that the quality of ISO standards should be maintained and he sees OOXML as a way to get MS’ technical requirements on the table to help future ODF improvement (whether by cherry-picking, mix-n-match or knowing what to avoid.) I suggested to him some time ago that one approach to fixing DIS 29500 would be radical surgery: removing all the explanatory and non-normative material. At the moment it is far too tutorial. That is fine for the Ecma version, but gets in the way of an ISO-quality standard. I had also suggested that the schema fragments were otiose too, and that the 11pt body text should be 10pt. So Patrick has gone ahead and stripped out the fluff from the WordprocessingML chapter and with tighter formatting he was able to go from 1874 pages to 607 pages without altering the technical content!

Knowing Patrick, I expect he would not release his Open-XML-Lite because having extra drafts floating about in public just provides more fodder for the lunatic fringe, however he sent me a copy and I think it is great. It is a real proof of concept that there is indeed a workable spec lurking inside DIS 29500. I would really urge National Bodies to include comments that request or require that the non-normative material in DIS 29500 be removed. That will make maintenance, editing and use much clearer. The Ecma TC45 got it wrong here; or, at least, they went with the “friendly” view of a standard, where it is best that a large ISO standard avoids being too tutorial.

I am hoping that once the vote is over, the PR considerations of the big boys will take a back seat: with a couple of “Yes” votes from some large countries, MS has its marketing material to say that Open XML is credible; with a couple of “No” votes IBM has its marketing material to say that ODF is the way forward; and with enough good comments and a sharply-run Ballot Resolution Meeting, the baying mobs will lose interest; and the nerds can get down to improving the shortcomings exposed in both OOXML and ODF. If things go as the process is geared to make them go, IS 29500 OOXML and IS 26300 ODF should ultimately provide a really useful pair of technologies.

Kurt Cagle


It was perhaps inevitable - having turned the geospatial Earth into an animated, zoomable extravaganza, Google has turned its gaze skyward. With Google Sky, the tens of thousands of Hubble-based images (as well as those of more prosaic Earth-bound telescopes) have been knitted into a seamless fabric that lets you explore the universe in myriads of ways - from zooming in on the Pinwheel nebula to charting the luminescent clouds of the Eagle hatchery.

Rick Jelliffe


This decade has seen a tectonic shift in technology: the new information applications which are succeeding are those in which information is based on simple topics; the new major document formats are those which allow the packaging of a topic.

The organization of information into simple interlinked topics, typically something that can be described in a single phrase, is the common factor between such seemingly disparate but succeeding technologies as the web-based Wikipedia, Amazon, Google, Ebay, Flickr, MySpace, YouTube, blogs, RSS, but also has had strong impact in non-WWW areas: the ITIL Configuration Item, the SCORM Learning Object, the S1000D Descriptive Module, integrated UML systems, for example.

The difference from the WWW in general is that though web technologies indeed encourage small pages, there is no necessity that pages are about one topic in particular. So the WWW is an excellent basis for implementing topic-based systems, but not itself one. Similarly, RDF may allow resources to be linked, but these are not necessarily at the level of topics. Another way of looking at topics is that a lack of topicality is what makes a poor index item poor.

There has been a decade-long process at ISO SC34 to make and develop a series of standards based on topics, for example the Topic Map standard, IS 13250. This is good technology (like XLink and RDF) to look at when considering how to implement a topic-based system.

The rise of Topics represents a great challenge to operating system and desktop suite vendors. When we look at Windows, or Mac or Linux window managers, we see that they really interact with the user at the wrong level. They say that the topic the user is interested in is applications and files. But how many people nowadays start their computer interaction with a web browser pointed to Google? There are still people whose organizing topic of interest in their computer interaction is the file or application, of course, but they have been swamped by people who are interested in the topic.

There are interfaces which organize the user’s interaction around different topics: most notably the Sugar interface of the One Laptop Per Child ($100 computers) in which the primary metaphors are the person (and their private activities and journal), the neighborhood, and the group (and group activities and bulletin board.) The interaction topics are “people, places, objects, actions”. But as with the desktop, these are not topics in general, just the topics of one domain (a fairly compelling domain, that of children and communities).

Indeed, we can see the large successful web applications as being topic-based interfaces each for particular domains and scopes. A lot of the Web 2.0 or Social Interface systems talk focuses on the human or social or write-able web aspects; my question is this: should we think of Topics as the “how” and the social aspect as the “why”, or should we think of Topics as the “why” and the social aspects as the “how”?

Moreover, should Linux, Windows, Mac and all seriously respond to the rise of Topical Interfaces by ditching the desktop metaphor? I tend to think yes: in terms of my support/runner/plug-in model topic interfaces belong at the “suite” level, and a desktop interface is just another suite.

One reason I found (and still do find) the Windows desktop so cumbersome to use compared to a UNIX shell or the old Mac desktop was that it never seemed to provide me with the topics I was interested in. When the topic was “Installed programs” it lets me look at a menu from the start button, but not all programs are there; I have to switch to a completely different system, the file explorer, and look in Program Files and figure out from the files and directories what applications are there. We have to fight with the army we have, not the army we want, but we won’t win unless we have the army we need.

Topical Interfaces have eclipsed the Desktop Interface and are severely challenging the central position of the file, because increasingly the value of some information is in its linked-in-ness to some larger system. From this point of view, the recent trend (JAR, WAR, EAR, ODF, Open XML, SCORM, etc) to use ZIP and therefore package together all the files needed for one application session can be seen as an attempt to turn documents themselves into a container for a bounded topic. OOXML’s Open Packaging Convention (OPC) represents the high-point (though not the state of the art, for which see RDF and ISO Topic Maps) in this trend, adding a linking and typing mechanism (relationships) within the ZIP package. However, the moves to make a platform out of the office suite and out of the Web browser (and the various Java Rich Client Platforms such as Eclipse, NetBeans, and so on) fall short of providing the integrated, topic-based interfaces.

The two worlds need to converge: we need Topical Interfaces which let us navigate between and within topics and perform transactions, but which also allow each Topic to be bundled and shipped around as a document.

Kurt Cagle


I don’t normally like using this column for promoting my other projects, but I’m weighing this against the fact that I actually have some interesting news to pass on. Thus, my apologies for the self-aggrandizements - I think you may find it worth it.

First, I have recently significantly upgraded the XForms.org portal. While I still support the forum, the role of the portal has expanded to become a general resource for anyone working within the XSLT, XForms, or XQuery space, and I’m expanding this into the Semantic Web realm as well. From XForms.org, you can find relevant blogs from the web, news articles, job listings, and linked resources, and I shall soon be adding calendar listings of conferences and other events. I’ve also simplified the interface, such that commonly requested features such as the most recent aggregate blogs are available with one click in a simple interface, and specialized listings are no more than two clicks away.

Uche Ogbuji


I’ve heard it 1000 times since ‘97. “XML, it’s just plumbing”. Maybe, but it hasn’t really felt that way in past years. Too much was still unsettled, and there were too many people who were not interested in letting things settle (including me). On the rebound from 2 very enjoyable XML conferences, XML Prague and Extreme Markup Languages, it does finally feel to me that the era of absent-minded XML pipe-laying is upon us. I think that’s a good thing, especially now that XML is well enough established that few people choose to build edifices without it. This does mean that we have established what I’ve always characterized as a basic writing system for data integration, and now the really fun stuff can begin as the philosophers and politicians work on libraries to suit their schools (yeah, I know I’m starting to pile up the metaphors, and why not?)

Mike Hendrickson


Boston and Cambridge


Summer is flying by and as we usher in fall, we wanted to give all New Englanders a heads-up that we are having a second Ignite Boston. The second Ignite Boston will take place on Thursday, September 6, from 6 to 10pm at Hurricane O’Reillys. Yes that is right, Hurricane O’Reillys. No, it’s not Tim’s office after FOO Camp. We’ve picked a venue that is more acoustically-oriented and should allow everyone to hear what is going on.

And we are planning to mix up the format a little bit. There will be some short “launches,” followed by lightning talks, and a couple of other ideas that we will inform you of in the coming weeks. Let’s show our tech colleagues around the country that Boston/Cambridge have a vibrant tech community that gets involved in talking about cool new technologies and ideas. Not to mention that it is a social event to get to know other developers in the area.

If you plan to attend, email IgniteBoston at oreilly dot com for the chance to win $300 worth of O’Reilly books of your choosing. You must be present to win.

If you are interested in connecting with some of the folks who attended the first Ignite Boston, we have a social network set up for this purpose. You can reach our Crowdvine network here.

Another reason we wanted to announce this event this early is so that those of you who would like to speak for five minutes on something cool, new, or exciting can get into the queue sooner rather than later. Please submit your idea/s here:

Presentation Guidelines

  • Be no longer than 5 minutes.
  • Be on an innovative topic (no sales pitches, please!).
  • Be viewable on a PC [a MacBook Pro with PowerPoint, Keynote/has remote control, and PDF] with standard AV equipment.

To submit a proposal.

For anyone that’s never been to Ignite, you may find it useful to see a talk or two. Here’s a link to a good example [but poor audio quality] from the first Ignite Boston talks.


M. David Peterson

Update: *EXCELLENT* follow-up post from Wladimir in which he closes with the following,

I guess I need to thank Danny for so many great articles in such a short time. On the other hand, maybe instead I should remind him that denial-of-service attacks are illegal, even in the USA.

I’ll let you come to your own conclusions as to what that last sentence is referring to, though I will point out the fact that no matter who you are or what you believe justifies your actions, while blocking ads is not a crime, DOS attacks and other forms of Internet harassment and vandalism most certainly are.

If you are guilty of any such crimes, please don’t turn yourself in to the authorities (our prisons are filled with too many people who shouldn’t be there in the first place), but please stop, think, and then find ways to get over whatever it is you are hung up on in a peaceful manner.

Thanks! Our Internet will be a better place if you are willing to consider the above request.

Update: Wladimir Palant, the *WONDERFUL* developer behind the *WONDERFUL* tool AdBlock Plus recently left the following comment that I thought the rest of you would find interesting,

Thank you for this article, it is real fun to read it. Btw, the numbers you were asking about - I don’t have exact numbers either but it seems that no more than 2% of Firefox users have Adblock Plus installed. Which makes this campaign as ridiculous as ever.

Of course one can only assume that after all of this attention, the number of AdBlock Plus users has increased, but not so much as to drastically change the above percentage to the point where any of the legitimate sites on the net that use ad revenue as their primary support are going to be noticeably affected. In fact if you think about it, it’s quite possible that, while ever-so-slightly, the bandwidth savings from those who have no interest in the ads being displayed will *more* than offset any potential loss in ad revenue.

In fact, if you *really* think about it, if all of the people who had neither the desire nor the willingness to click on the ads presented on your site were to install AdBlock Plus, there’s an ever-so-slighter (is slighter a word? Probably not, but today let’s make it an honorary word just for fun ;-) possibility that the net result will be an increase in your cash flow instead of a decrease.

Okay, maybe that’s a bit of a stretch, but if nothing else it’s definitely something to consider. Of course if this theory were to actually hold any water you would have none other than Wladimir Palant to thank for your decreased cost structure and therefore increased monthly revenue. And according to the following forum entry from about this time last year (which was in response to a question regarding Wladimir’s preferred charity), here’s how you can thank him for your newfound cash cow, ;-)

I don’t favor any organization, feel free to choose the one you like

Edit: On the other hand… I do favor one organization: http://www.mozilla.org/foundation/donate.html

Seems reasonable to me. :D

Thanks, Wladimir!

Update: NOTE: For those of you who first read this update at the top of my last post, here it is again but this time at the top of the correct post! ;-)


I *LOVE* this comment from an article linked to from Yours Truly (a handle, not a self reference ;-),

Upon clicking the link to http://whyfirefoxisblocked.com/ I was met with a blank page. Interesting, I thought to myself. Let’s check this out in more detail… I bet they want me to wipe the dust off my Internet Explorer and access their site that way. Admit defeat? Go back to using Internet Explorer? Hardly. I simply opened a new tab in Firefox and went to Google. In the Google search field I entered the search term: site:whyfirefoxisblocked.com and then loaded the conveniently offered “cached” version of the page in question. It loaded smoothly in my AdBlockPlus-enabled copy of Firefox.

Absolutely *CLASSIC*! :D Thanks for the laugh, Yours Truly! Of course the real test would be to do the same for the site that you would have been redirected from, but two things,

1) Why waste any more of your valuable time?
2) The spirit of your hack is most certainly in place, which leads to one very important observation,

As mentioned already: Don’t Fight the Internet! There’s fame (the good kind) and fortune and good times for all who find ways to embrace the way the web *truly* works, not the way you think it should work. And if anything this is the point of the entire post.

Update: Based on the evidence that has been mounting up in my inbox and in comments I’ve done a quick research project and have come to the same obvious conclusion that everyone else has: That the content that follows that now has a strike through is more than likely a completely bogus attempt at justification. My apologies to each of you that were simply following Digg, Slashdot, Reddit, and other links for proliferating the garbage that is being fed from this guy.

Oh, and Danny, (AKA Jack Lewis),

You know what, never mind. Why even waste any more of my time?

No wait, I’m sorry, I do have something else to say: You are not a victim of terrorism. You’re a victim of yourself.

Best of luck to you.

Oh, and one other thing: If you are bothered by the ads on this or any other site and would rather read this or any other *FREE* content without being bothered by ads you find annoying: I’ve heard that Ad Block Plus is pretty good. Of course you’ll need Firefox if you don’t already have it, but if you’re interested in my opinion, Firefox is as good as a browser gets.

In fact, maybe even better.

Enjoy your ad free Firefox browsing days, everyone! The content here on O’ReillyNet is free to read however you might choose in whatever browser you might choose. If you choose to reprint it (beyond that which can be considered fair use) please do so under the terms of the Creative Commons by-nc-sa. Otherwise, do what you want. That’s your right.

And as always, thanks for reading! :D

Update: via a comment from Danny Carlton,

It’s my site, and if i want to control how people view it, I’m not letting a bunch of terrorists force me into changing that–and when you attempt to change someone’s behavior by threat of harm, you are a terrorist. The vile, obscene emails and phone calls, they attempts to shut down my server with DOS attacks and bandwidth eating programs, are all acts of terrorism, and it’s really interesting how many people who seem to get offended at being called “thieves” have no problems acting like terrorists.

Folks, I don’t care who you are or what it is you think you’re accomplishing, as far as I’m concerned anyone who involves themselves in this type of activity is absolutely as Danny specifies,

A criminal.

That’s absolutely shameful to do that kind of crap. You might not be a criminal for blocking ads placed in the content you read, but you’re certainly a criminal if you take part in any of the crimes mentioned above.

Whoever is involved with the above: STOP!

It’s not funny. It’s not cool. And it certainly isn’t justified. It’s stupid. It’s illegal. And it needs to stop.

[Original Post]

Don’t fight the Internet! I promise, you’ll lose.

Why FireFox is Blocked

The Mozilla Foundation and its Commercial arm, the Mozilla Corporation, has allowed and endorsed Ad Block Plus, a plug-in that blocks advertisement on web sites and also prevents site owners from blocking people using it. Software that blocks all advertisement is an infringement of the rights of web site owners and developers. Numerous web sites exist in order to provide quality content in exchange for displaying ads. Accessing the content while blocking the ads, therefore would be no less than stealing. Millions of hard working people are being robbed of their time and effort by this type of software. Many site owners therefore install scripts that prevent people using ad blocking software from accessing their site. That is their right as the site owner to insist that the use of their resources accompanies the presence of the ads.

Here’s the thing: If people are going out of their way to block ads via Ad Block Plus do you honestly believe they represent a significant percentage of the +/-2.5% of the people who actually ever click on web ads in the first place? Wait, hold up, I think you answer your own question in the next paragraph down, but first let me take a quick moment to point something out,

M. David Peterson

Don’t you just love Jeffrey Zeldman? I know I do for the simple fact that he has no problem saying it like it is and in many cases he’s right on the money,

Jeffrey Zeldman Presents : What crisis?

The glacial pace of the W3C has given browser makers time to understand and more correctly implement existing standards. It has also given designers and developers time to understand, fall in love with, and add new abilities to existing standards.

So the glacial pace can’t be the crisis. Maybe the problem is lack of leadership. One worries about the declining relevance of The Web Standards Project. (Note the capital “T” in “The”–people who believe in standards should also believe in and follow style guides.) One has worried about the declining relevance of The Web Standards Project since 2002.

Nicely stated! Of course, just a paragraph or two above Jeffrey asks the question,

M. David Peterson

As per a comment I made to a post from Eric Larson to the internal Vibe* mailing list regarding the usage of Mercurial instead of Subversion for our RCS,

Of course maybe someone will come along and create a BitTorrent-based Darcs or Mercurial plug-in. Now *THAT* would be cool! :D

My point was in relation to the fact that with a decentralized RCS (which in most cases creates an exact copy of the repository with each checkout), as the size of the repository increases so does the cost of hosting that repository with each new checkout. But if a BitTorrent plugin were to suddenly surface?

Like I said, “Now *THAT* would be cool! :D”

Anybody care to become the *WORLDS BIGGEST ROCKSTAR CODER*? This would certainly be one way of becoming just that. :D

M. David Peterson

Dare Obasanjo aka Carnage4Life - Google Working on Social Network Aggregator

What I find more interesting is being able to bridge these communities instead of worrying about the 1% of users who hop from community to community like crack addled humming birds skipping from flower to flower.

Rick Jelliffe

Schematron is an ISO standard (ISO/IEC IS 19757-3) schema language for expressing assertions about the presence or absence of patterns in a document, usually using XPath. ISO standards are supposed to contain verifiable statements about some technology. And there is a schema for ISO standards (refer to How to write your own ISO Standard). So why not combine them? Executable specifications may provide the best form of verifiability!

I’ve made a little stylesheet that converts Schematron schemas into ISO Standard annexes. Each pattern becomes a separate clause; assertions are treated as constraints, and report statements are treated as errors that must be reported. The stylesheet handles abstract rules and abstract patterns (though these are starting to go into XPath territory and so are borderline ugly), and the @see attribute. Phases are treated as conformance profiles. Diagnostics are stripped out; they might perhaps have some use in application standards rather than document standards.
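The mapping described above (pattern to clause, assert to constraint, report to reportable error) can be sketched in a few lines. This is only an illustration in Python, not the XSLT stylesheet itself: the sample schema, the `A.n` clause numbering and the output wording are all assumptions made for the example, and abstract patterns, phases and @see are ignored.

```python
import xml.etree.ElementTree as ET

SCH_NS = "http://purl.oclc.org/dsdl/schematron"  # ISO Schematron namespace

# Hypothetical sample schema, invented for this sketch.
SAMPLE_SCHEMA = """\
<schema xmlns="http://purl.oclc.org/dsdl/schematron">
  <pattern id="titles">
    <title>Titles</title>
    <rule context="chapter">
      <assert test="title">A chapter shall have a title.</assert>
      <report test="count(title) &gt; 1">A chapter has more than one title.</report>
    </rule>
  </pattern>
</schema>
"""


def text_of(element):
    """Collapse an element's text (including inline children) onto one line."""
    return " ".join("".join(element.itertext()).split())


def schema_to_clauses(schema_xml):
    """Return one heading line per pattern, then one line per assert/report."""
    root = ET.fromstring(schema_xml)
    ns = {"sch": SCH_NS}
    lines = []
    for number, pattern in enumerate(root.findall("sch:pattern", ns), start=1):
        title = pattern.find("sch:title", ns)
        heading = text_of(title) if title is not None else pattern.get("id", "untitled")
        lines.append(f"A.{number} {heading}")  # clause numbering is an assumption
        for rule in pattern.findall("sch:rule", ns):
            context = rule.get("context")
            for child in rule:
                if child.tag == f"{{{SCH_NS}}}assert":
                    # Assertions read as constraints on the document.
                    lines.append(f"  Constraint (context {context}): {text_of(child)}")
                elif child.tag == f"{{{SCH_NS}}}report":
                    # Reports read as errors that must be reported.
                    lines.append(f"  Error to report (context {context}): {text_of(child)}")
    return lines


if __name__ == "__main__":
    print("\n".join(schema_to_clauses(SAMPLE_SCHEMA)))
```

Because the natural-language text of each assertion carries the meaning, the generated clause stays readable even to someone who never looks at the XPath in the test attribute.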

As well as its assertions, Schematron allows quite a bit of rich text and titles. The stylesheet handles bullet and numbered lists, and most kinds of inline styling. The output is validated against the RELAX NG Compact schema from the draft TR that I was using. (I had to clean up numbered lists a little: the draft stylesheet provided its own autonumbering when using <ol>.)

So is this a serious idea? Actually, yes. Schematron was developed with the human aspect of schemas as a very high priority, unlike any other schema language that I am aware of. By design, it is intended to be useful for generating documentation suitable for domain experts rather than XPath developers. (I am working on a commercial product that provides this as part of a collaborative schema development environment; the betas look good.)

So I hope that as more organizations take up Schematron to specify part or all of their standards, they will adopt this kind of approach, so that they end up with standards with no gaps between what is required and what is validatable. Note that you can still make Schematron assertions even when there is no XPath to check it: so Schematron does not back you into the corner that other schema languages do, where you have no high level constructs to document constraints beyond the capability of the validation expression language: refer to Expressing untested and untestable constraints in Schematron.

The stylesheet and an example

Schematron Validation Reporting Language is a small language specified as part of ISO Schematron for representing the output of a validation. It can then be transformed for lots of other uses.
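As a rough idea of what "transformed for other uses" can mean, here is a small Python sketch that reads an SVRL report and pulls out the failed assertions and successful reports as plain tuples. The sample report is invented for the example and the `summarize` function is a hypothetical helper, not part of any Schematron tool.

```python
import xml.etree.ElementTree as ET

SVRL_NS = "http://purl.oclc.org/dsdl/svrl"  # SVRL namespace from ISO Schematron

# Hypothetical SVRL output, invented for this sketch.
SAMPLE_SVRL = """\
<svrl:schematron-output xmlns:svrl="http://purl.oclc.org/dsdl/svrl">
  <svrl:active-pattern id="titles"/>
  <svrl:fired-rule context="chapter"/>
  <svrl:failed-assert test="title" location="/book/chapter[2]">
    <svrl:text>A chapter shall have a title.</svrl:text>
  </svrl:failed-assert>
</svrl:schematron-output>
"""


def summarize(svrl_xml):
    """Return (kind, location, message) for each failed assert or successful report."""
    root = ET.fromstring(svrl_xml)
    ns = {"svrl": SVRL_NS}
    messages = []
    for kind in ("failed-assert", "successful-report"):
        for item in root.iter(f"{{{SVRL_NS}}}{kind}"):
            text = item.find("svrl:text", ns)
            # Normalize whitespace; tolerate a missing or empty svrl:text.
            message = " ".join(text.text.split()) if text is not None and text.text else ""
            messages.append((kind, item.get("location"), message))
    return messages


if __name__ == "__main__":
    for kind, location, message in summarize(SAMPLE_SVRL):
        print(f"{kind} at {location}: {message}")
```

The same tuples could just as easily feed an HTML report, a build log, or an editor's error list.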

First: here is the Schematron schema for SVRL, unchanged from the ISO standard except I added three IDs that were missing (the XSLT expects patterns to have IDs): Download file

Next, here is the XSLT script: Download file

Here is the output from the script, using the SC34 schema: Download file

And here is that output then converted to HTML, using the draft previewing script from ISO. (The SourceForge project has an XSL-FO generator): Download file

As a bonus, here is a blank XSLT template with all the Schematron elements exposed, for anyone who wants to make their own complex pretty-printer/transformer for Schematron schemas:
Download file

The annex generated is, I think, pretty acceptable as a draft standard, especially since the schema was written as a real schema and not as text in a standard per se. Obviously some things can be improved, such as being consistent with ‘should’ and ‘is’, but I think this is a viable, useful and efficient approach to improving the quality of standards for XML vocabularies and document types.

Uche Ogbuji

One reason I’m looking forward to Leopard is that unfortunately I’m a victim of the bug where my MacBook Pro 17″ occasionally reboots when I close the lid. Most of the time things are OK, but once a month or so I close the lid and I hear the “bong” chime of the computer restarting. When I open it back up (either right away or after a while) it starts back up as if I’d powered it on. Needless to say I lose any unsaved work, which has caused me to be even more annoyed at software that does not auto-save, such as TextMate. It seems to happen in clusters, a few times in a few days, then fine again for another few weeks or so. Anyway here’s hoping Apple has a handle on this one either in Leopard, or in the hardware update to the MBP line that came out a couple of months ago. I’m provisionally happy enough with mine that I’m irrationally eyeing the 1920×1200 and 4GB RAM options in the latest (though the high res is apparently not available with the glossy screen. What’s up with that?).

Anyway, other references to the closed lid reboot bug:

* MacBook restarts when closing the lid
* MacBook Restarts when put to sleep

Update: s/Tiger/Leopard/g. Can’t keep the big cats straight.

Rick Jelliffe

You too can write your own ISO standard! Here are the steps:

1) Download the ISO/IEC Directives Part 2 Rules for the structure and drafting of International Standards. These give the general editorial guidelines. Read it all.

2) Download the documentation for the XML schema for ISO Standards, which is in Technical Report 9573-11. A good draft is available from SC34 Website. Read it all.

3) Download the Open Source schemas and stylesheets, which are available at SourceForge and embody a lot of the rules of the ISO/IEC Directives Part 2. They have been contributed to over the years by such people as Murata Makoto, Martin Bryan, Ken Holman and James Clark and used in many standards: I used them for ISO Schematron for example. (If you want to use Word templates or whatever, these are available from ISO, but this is an XML list so it doesn’t deal with that.) Install and configure your production environment to use them.

4) Try to follow these writing guidelines:

  • When writing, think about clarity. A good rule of thumb is “Will this sentence be easily translatable into a language that does not have the words “the”, “a” and “it” or which does not have the future or past tense available?” and “Can a recent graduate understand this?” Note in particular that you must use “shall”, “should”, “must” in very particular ways, that you need to use the definitions section as much as possible, that you need to clearly distinguish normative text from informative text (which is not the same as required and optional/discretionary, and different again from the legal “Required Parts”), you need to be clear about different levels of conformance, and that you need to be careful with normative and non-normative references (see the Directives!)
  • Download any other standards in a similar domain, and try to re-use the phrasing and declarations from them. When writing, try to use the standard vocabulary that ISO suggests in standards such as IS 2382. If you use terminology that differs from these, make sure it is in your definitions section. Note that there are some trick words that have specialized meanings: so “define” is what you do, but “declare” is how you do it (loosely).
  • A standard should only contain verifiable statements. That rules out most adjectives, unless they are defined, and is why standards tend to have Germanic agglomerations of nouns. Where possible, try to specify the requirement in an executable form, such as a schema language, then use the text to fill in the gaps. Where possible, try to specify the requirement using a formalism, such as predicate logic or BNF or UML, especially if there is an unambiguous notation or a standard for these. Where possible use diagrams, but only use them if there is a common or standard diagramming type for which a reference is available.
  • When writing, avoid dependencies on other standards. Reference the most general version of other standards possible. Unless there is a good reason, allow the other standards to be maintained without this then making your standard outdated. Avoid specifying or summarizing other standards: completely in normative text, and as little as possible in informative text unless the other standard is not freely available.

5) Write your draft

6) Track down IP issues to the best of your ability. Also, try to have it reviewed for Internationalization, Security and Accessibility issues: the more that these are designed in from the beginning, the smoother things will be downstream. Most importantly, you need to show that there is some market (users) for this standard, that it is not some crackpot technology. One important thing that will influence reviewers is whether there is developer buy-in: is there an open-source implementation, is there some company willing to produce products that use the specification, and so on. If you want commercial buy-in, think about the carrots (an economic case why it would benefit vendors) and sticks (getting regulators or procurement departments to require it.)

7) Decide whether it should be an ISO/IEC International Standard, an ISO/IEC International Standard through fast-track, a Publicly Available Specification, an ISO/IEC Technical Report, a National Standard, a Consortium Standard, or just something on your own website. If you decide to take it through ISO you have to find or become a champion: you can go to your local national standards body and get them to propose it (or adopt it as a national standard first), you can find a friendly committee person on the relevant committee and get them to propose it from their Working Group, or you can find some boutique standards body that has liaison with ISO (such as OASIS or W3C) and put it through their processes. You need to find an editor who is participating on the committees and can travel to enough meetings. (See if your national body offers any travel subsidies; demand that the ISO working group use teleconferencing.) You should expect that your draft may be substantially changed, especially if you have not written it according to step 4). At this stage, remember that you are not alone: there will be other committee people and interested people around the world who can provide advice, only rarely crazy, and you cannot be too proprietorial: some parts of the standard will improve in your eyes, some parts will get worse in your eyes, but that is all OK because it becomes a collective effort. Especially remember that a really stupid comment from someone is undoubtedly a sign that your deathless prose is crap and needs to be fixed. Don’t take criticisms of the draft personally, and learn committee skills: how to challenge clearly, take the stated requirements of others seriously, and acquiesce gracefully. Not understanding something or losing an argument does not involve a loss of face, but you have to give face when winning on an issue too. Don’t “play to win”; instead “play to win/win”. (I am embarrassed to write that!)

8) When a draft is produced, contact the various technical committees around the world to help answer questions. Actually, the ISO committee process itself provides a good forum for this; if you are fast-tracking you may need to do extra work to explain the draft.

9) Ask the committee to ask ISO to get the standard added to ISO’s free list. A standard that is not on the WWW is at a total disadvantage.

10) Assuming the vote on the Final Draft was “yes”, you now have your standard! Congratulations, that has only taken three years or so. Now you have to commit a little time over the next few years to maintain it and make corrections as they come up, and to try to get buy-in from the public. If you have a “grass-roots” standard like ISO DSDL (RELAX NG, Schematron etc), which does not fit into the plans of the military-industrial complex, then your expectations need to be modest and you need to think about how to encourage activity in the Open Source eco-system. Remember a good standard is one that meets its particular users’ needs, not one that takes over the world.

However, your name won’t be in the standard (unlike W3C or OASIS), or in the bibliographic entries. So don’t do it, or participate on committees, if you want to see your name on Amazon.

Rick Jelliffe

Over the last month I have been collecting, for fun, examples from the web where scuttlebutt on the websites of well-known commentators has claimed procedural or other irregularities at standards bodies or among participants. I started this off on the luridly titled “Bribery Watch” page, but it is more “Innuendo Watch.”

Here is a little map (drawn dynamically) with the countries mentioned in red.



Some of the claims have a French farce aspect. For example a mistranslation of “seat” and “chair” caused a great flurry.

However, one persistent theme is the idea that the industry people who actually want a standard should not participate in the standards process. Sometimes there seems to be some idea of neutrality floated, sometimes some idea that people who come late have less legitimate opinions than people who come early, other times that the process is flawed unless people are allowed in late. But the basic idea is that if you agree with MS on anything or have had any business connection with them, they own you, perhaps even bribed you, and your every opinion is inappropriate. But there is never an acknowledgment that standards are community self-help efforts participated in, for the most part, by the parties who want to use the standard; and that the standards process is not a tool for cartelization.

Jim Alateras

James Snell has just published an article on developerWorks that illustrates how to use the Atom Publishing Protocol to publish Common Alerting Protocol (CAP) alerts. CAP defines an XML data model for specifying hazard alerts and notifications. The article uses the Apache Abdera implementation of APP to show how to publish, modify and delete CAP alert documents.

Rick Jelliffe

The licensing of IP for standards has four aspects: what the (case and statute) law says, what the standards bodies require, what the IP owner grants, and how the developer (adopter) is acting. Standards themselves never seem to have useful information about patent IP, and even their copyright boilerplate needs to be checked against licenses given by the copyright holder: W3C and ISO don’t like you copying their standards, Ecma does, for example.

For an introduction to the legal aspects, see ConsortiumInfo.org, which is by a lawyer for OASIS. The Dell case is pertinent.

For an introduction to the standards body aspects, see Standards Law, which is by a lawyer for Microsoft. It has a reference to the ISO requirements. For the boutique standards bodies: OASIS, Ecma, W3C

For examples of the kind of grants that companies make see Microsoft Open Specification Promise, IBM Open Source Portal, Sun’s OpenDocument Patent Statement. Adobe has not put their equivalent online if it has been finalized, as far as I can see. (Microsoft also has a “Covenant not to sue”, however this seems to have disappeared from its website in a rearrangement of links. They need to get it put back online.)

So what does the user have to do with it? Some licenses provide particular conditions relating to private or not-for-sale use: the GNU licenses for example. Other times licenses are revoked if you try to sue the IP owner: these defensive patents are bargaining chips in legal wrangling.

One key term to understand is RAND: Reasonable and Non-Discriminatory Licensing. It is pretty much the bottom line for standards organizations. However, RAND licenses are controversial, and in the views of many of us, something that should be avoided by modern standards bodies in the age of Open Source and Free Software which, like standards, have strong counter-monopolistic and even communitarian aspects.

Another concept to understand is the Open Standard. Not all standards from standards organizations are Open Standards under anyone’s definition, especially older standards and standards which involve semi-scientific research and development (compression patents, for example) where the IP holder would only license a vital technology under RAND or not at all. (There is some creep on what an Open Standard is, to conflate it with Open Source or free implementations.)

And it should go without saying that someone cannot grant a license to IP they do not themselves hold. So all covenants and licenses only extend as far as the material in question. This is important for extensible formats such as ODF and Open XML, because the ZIP container allows any kind of media or binary file.

See the IBM material for a definition of Necessary Claims and Required Portions.

Uche Ogbuji

I’ll use this entry as an anchor for my observations on the final day of Extreme Markup Languages. I’ll update it with a note each time a new talk begins, but I’ll add my comments on the talk in the comments section. There is a numbering scheme for the talks, to correlate to comments.

If you happen to be reading this in an aggregator, much of the meat is in the comments, so you might want to click through.

D4.1. “Lessons from monitoring the hedge funds: Markup identifies and delineates. Does it give your position to the enemy?”, Walter Perry

D4.2. “Declarative specification of XML document fixup”, Henry S. Thompson

D4.3. “Topic maps, RDF, and mushroom lasagne”, C.M. Sperberg-McQueen

M. David Peterson

Via a recent link sent to the Vibe* internal mailing list from Russ, it seems Universal is going all retro on us with plans to “test” the DRM-free digital media business. Interestingly enough, as Russ points out,

… although not on iTunes strangely enough, could just be a case of catchup.

That or a political move in an attempt to break the lock iTunes currently has on the digital market.

From the same BBC News article linked to above,

Rick Jelliffe

Just when I thought I had escaped, I had a request yesterday from Microsoft to join in a call with a journalist from ZDNET Asia about a blog An open document standard for China. Preparing for this gave me a good chance to review the use of Native Language Markup in Open XML: the area is quite arcane so it is a good topic for a blog (good because you probably won’t get the information elsewhere and good because your feedback can help if I have missed something.) I have included some asides and personal background, probably not even of interest to my mother, in small print that you can skip.


The Peter Junge blog basically warms up Rob Weir’s Swiss cheese (hmm, something wrong with that phrase): impossible in its thrust (a single file format that can cope with all cases?), alarmist in general (“This kind of legacy is full of pitfalls for the open source developer.”), over-reaching in its analogies (see my Power plugs and low-hanging fruit), too strong in its conclusions (look at how “may” and “might” are used to say “will”) and misleading in its use of details (what has footnoteLayoutLikeWW8 (Emulate Word 6.x/95/97 Footnote Placement etc) to do with open-source developers in particular, especially since the spec gives the advice that “Typically, applications shall not perform this compatibility”? It is a flag, not a requirement, for goodness’ sake.)

Native Language Markup

Native Language Markup is the use of names and symbols in markup of the user’s native language. This implies the use of the user’s native script (characters). It is different from “natural language” because names in markup may still have artificial limitations (such as no spaces or apostrophes) or use contracted forms that would not appear in natural language.


Native Language Markup was a term I developed in the early 1990s, when Allette Systems gave me a project to figure out why SGML was not popular in Asian countries. I came back with various items, and collected them into the ERCS (Extended Reference Concrete Syntax): these included things like allowing native characters in tag names (SGML had large character set limitations for names then), hexadecimal numeric character references, the ability to reference any character by its Unicode number, and an initial set of the characters in Unicode that were suitable for use in markup. These were endorsed by a standards-related expert group, the CJK DOCP group, and when XML development started, were adopted into XML. This was recognized by a kind comment of Gavin Nicol in the 1999 Journal of Markup Theory and Practice: “The importance of native language markup, and the role the SGML declaration plays in an SGML system, are fairly well understood these days, partly due to the tireless efforts of Rick Jelliffe on the ERCS, and partly due to a lot of work done on HTML I18N (Internationalization)”. Now, of course, I am not saying that I invented the idea that words you can read are more useful than words you cannot read! ERCS was a set of concrete technical proposals, and Native Language Markup is a name for the issue. Anyway, the bottom line is that this is a subject that I think is really important.
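To make those items concrete, here is a minimal Python sketch showing an XML document that uses native-script element names together with hexadecimal numeric character references. The element names 文書 and 題名 and the `parse_title` helper are made up for the illustration; any conforming XML parser should accept the same input.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: Japanese element names, with the title text written
# as hexadecimal numeric character references (U+6771 and U+4EAC).
DOCUMENT = "<文書><題名>&#x6771;&#x4EAC;</題名></文書>"


def parse_title(document):
    """Parse the document and return (root element name, title text)."""
    root = ET.fromstring(document)
    title = root.find("題名")  # native-script names work like any other name
    return root.tag, title.text


if __name__ == "__main__":
    root_name, title_text = parse_title(DOCUMENT)
    print(root_name, title_text)
```

The parser resolves each reference to the character with that Unicode number, so the application never sees the difference between a reference and the literal character.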

Native language markup has proved itself. Murata Makoto demonstrated at a conference last year how Japanese government XML was using it, as does China’s UOF format. It is not just an issue of translation: many languages have terms which do not have a satisfactory English equivalent. Nor is transliteration a useful approach: many languages require a romanization system with accents or tone marks to be useful. A technology that does not allow non-ASCII characters imposes a burden on non-ASCII users and limits acceptance to the highly educated and foreign-literate.

However, native language markup becomes inappropriate whenever there is a cross-over between language groups. Most Australians stop learning new characters about the age of 5 or 6; Chinese language markup is not easy for us! So for international standards for fixed schemas, there is no practical alternative but adopting ASCII and English wordings.

ISO/IEC JTC1 SC34 has recognized this, so as part of the ISO/IEC 19757 Document Schema Definition Languages standard, there is a technology spearheaded by UK’s Martin Bryan called the Document Schema Renaming Language (DSRL or Dis-rule). This is a convenient language (Martin has an XSLT implementation) that allows conversion of the markup in documents (or schemas, potentially) to and from different languages (as well as other uses). Non-ASCII-using nations looking at adopting ISO standards should look at whether they should also adopt a DSRL mapping into native formats, so that developers could work on the document using native language markup, then convert the document to the ISO standard form before shipping, for example. Or, internally in a country, the localized form could be used and then translated to the international form for shipping. My belief is that DSRL should become a standard part of the XML processing chain, because it addresses all sorts of versioning and localization issues.
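To make the renaming idea concrete, here is a minimal sketch in Python. It is not DSRL syntax and it is not Martin Bryan’s XSLT implementation: the mapping table, the Japanese element names and the `rename` helper are all hypothetical, chosen only to show a document moving from a native-language form to an English/ASCII form and back again.

```python
import xml.etree.ElementTree as ET

# Hypothetical mapping from native-language names to international names.
NATIVE_TO_INTL = {"書類": "document", "題名": "title", "段落": "para"}
INTL_TO_NATIVE = {v: k for k, v in NATIVE_TO_INTL.items()}

def rename(xml_text, mapping):
    """Return xml_text with every element name found in mapping replaced."""
    root = ET.fromstring(xml_text)
    for element in root.iter():  # iter() visits the root element as well
        element.tag = mapping.get(element.tag, element.tag)
    return ET.tostring(root, encoding="unicode")

native = "<書類><題名>報告</題名><段落>本文</段落></書類>"
international = rename(native, NATIVE_TO_INTL)
print(international)
print(rename(international, INTL_TO_NATIVE))
```

Only element names are touched here; a real renaming language would presumably also need to deal with attributes, namespaces and values, which this sketch ignores.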

The evolution of standards

Character sets have posed a big problem for standards makers.

  • In the 1960s/1970s generation of technologies, 7-bit character sets were used: the ASCII/EBCDIC generation. Technology standards rooted in the 60s had to cope with 7-bit data transmission.
  • In the 1970s/1980s generation, communications systems moved from 7-bit to 8-bit clean systems. Typically with this generation and under the influence of the C programming language, systems adopted a byte mentality instead of a character mentality: a string was a sequence of bytes. Standards from this period naturally followed. However, because international data exchange was not important, the standards from this time pay no attention to identifying which character encoding was in use.
  • In the 1980s/1990s generation, attempts were made to extend the existing systems to cope with extra characters. This would involve adding overloaded character escape mechanisms to allow character references to the local character set, or variable-width character sets which are ASCII compatible for single bytes but which allow multiple bytes for non-ASCII characters: UTF-8, Big 5, Shift-JIS are examples of standards that reflect this. These fitted into the constraints of 7-bit and 8-bit clean systems. However, the standards infrastructure was aimed at localization not internationalization: the advent of the PC initially retarded the reach of the internet, but with the advent of the WWW there was suddenly a world-wide data incompatibility problem: the standards and systems did not adequately support resources saying which character set was used. An example of this was HTML forms: for a long time, there was no definition of which character encoding should be used when sending forms data. In the standards world of the time, there was a real split between the internationalists, who said that everyone should adopt Unicode, and the nationalists, who said that every country should adopt locally-optimized formats.
  • The 1990s/2000s generation is the XML generation. It has been recognized that internationalization needs to be pervasive (standards should have first class support), systematic (based on Unicode), and friendly (allow people to be conservative in what they send but generous in what they receive). With XML, we defined XML in terms of Unicode characters, but allowed the user to use any encoding they wished: this was safe for data, because XML allows character references in terms of Unicode character numbers, and because the XML encoding header provided an effective in-band way to make sure that the character encoding of a document could be maintained. This effectively satisfied the requirements of both the nationalists and internationalists. Another example of this approach is the XML approach to URLs: in this case we deliberately went against the standard URL syntax to allow non-ASCII characters in system identifiers and namespace names, because native language markup is more important than compliance with that standard (or, at least, because conversion to ASCII-only transfer syntaxes should be a library function, not a document-writer’s job). Sometimes the existing standards are sub-optimal and have to be ignored, even by other standards!
  • One of the last links in the chain for documents came with the much delayed release of the IRI specification. This officially standardized the system that XML adopted (and the address bar of browsers naturally had been using) of extending URLs to allow any characters. Protocols that used URLs would still use percent delimited ASCII, and address bars would still display any character, but with IRIs it becomes easier for the standards world to specify exactly what is needed. Nevertheless, the terminology IRI has not become common yet, with the result that people often say URL when they actually mean IRI, and with the subsequent result that sometimes standards drafters write URL when they mean IRI.
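The XML-generation point above, that a document is defined in terms of Unicode characters whatever encoding carries it, can be sketched with Python’s standard library. The sample text and the element name `p` are arbitrary choices for illustration. The same content is written once as raw UTF-8 with an encoding declaration and once as pure ASCII with numeric character references, and a parser should hand back the same characters either way.

```python
import xml.etree.ElementTree as ET

text = "日本語"

# The same content, once as raw UTF-8 bytes with an encoding declaration...
utf8_doc = ('<?xml version="1.0" encoding="UTF-8"?><p>' + text + "</p>").encode("utf-8")

# ...and once as pure ASCII, using numeric character references.
refs = text.encode("ascii", errors="xmlcharrefreplace").decode("ascii")
ascii_doc = ('<?xml version="1.0" encoding="US-ASCII"?><p>' + refs + "</p>").encode("ascii")

print(refs)
print(ET.fromstring(utf8_doc).text == ET.fromstring(ascii_doc).text == text)
```

Note that character references only help with data: element and attribute names cannot be written as references, so native language names still need an encoding that can carry them.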

Native Language Markup in Open XML

The Open XML schemas use ASCII and English wording. Anything else would be rejected at ISO of course. Data values, for content and attribute values, allow non-English characters. Typically this is formalized so that things that may appear on user interfaces (such as style names) have both a print name and an internal identifier: this allows documents to be localized as far as their user interface information but international as far as their internal identifiers: good for off-shore document processing for example.

Formulas in spreadsheets are an interesting area. In order to be user friendly, the function names of course need to be meaningful to users. However, a standard cannot contain every language variant (I am told that Word 2007 has over 100 different localized versions.) So Open XML takes the view that this is the application’s responsibility: the Spanish version of a spreadsheet can present the formula to the user using Spanish words for example, but the markup is generated with the common form. (This is a respectable option: along the same lines, dates in international standards are usually handled in ISO 8601 format rather than in localized forms.)
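As a rough illustration of that division of labour, the sketch below keeps one common form for storage and swaps in display names only at the user interface. Everything in it is an assumption made for the example: the two-entry name table, the `translate` helper, and the simplification that a function name is any run of capital letters directly before an opening parenthesis. It says nothing about how any real spreadsheet application does this.

```python
import re

# Hypothetical table: Spanish display names -> names stored in the markup.
DISPLAY_TO_STORED = {"SUMA": "SUM", "PROMEDIO": "AVERAGE"}
STORED_TO_DISPLAY = {v: k for k, v in DISPLAY_TO_STORED.items()}

def translate(formula, table):
    """Swap function names (capital letters directly before '(') using table."""
    return re.sub(r"[A-Z]+(?=\()",
                  lambda m: table.get(m.group(0), m.group(0)),
                  formula)

shown = "=SUMA(A1:A3)+PROMEDIO(B1:B3)"          # what the user sees
stored = translate(shown, DISPLAY_TO_STORED)    # what goes into the file
print(stored)
print(translate(stored, STORED_TO_DISPLAY))
```

A real application would also have to handle localized argument separators, decimal marks and text inside string literals, none of which this sketch attempts.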

IRIs

One area that deserves special attention, because it is so intricate, is the availability of IRIs in Open XML. This is an issue that has received a bit of attention, and was the issue in Peter Junge’s blog that has triggered this blog. The bottom line is that in the current draft text you can use any character for a relative IRI inside the package or to your file system (relative references) but for external references the current spec says the markup should use URL syntax.

Note that this does not mean that a URL on a user interface cannot use Chinese characters. Nor does it mean that Chinese characters cannot be percent encoded into a URL. This is an issue of Native Language Markup, at the software developer level.

I suspect this is just a drafting error, and I am pretty certain that JIS (Japanese Industrial Standards) at least will call for its correction in the final text. It is a strong enough requirement to force a “conditional yes” vote. The current datatypes use anyURI.

There are more details on Open Packaging Convention below.

Let’s look at what Peter Junge’s blog said:

Another standard that Microsoft does not support, is the RFC 3987 specification, which defines UTF-8 capable Internet addresses. Consequently, OOXML does not support the use of Chinese characters within a Web address.

It is a textbook example of what is wrong with so much of the anti-Open XML material.

Let’s look at the first sentence. Now RFC 3987 is the IRI spec (which was co-authored by Michel Suignard of Microsoft). If you look at DIS 29500, Part 1, Annex A, Resolving Unicode Strings to Part Names, it has clauses such as “Creating an IRI from a Unicode string” and “Creating a URL from an IRI”. If you look at Part 1 Section 8.2.1, that annex is invoked. So the simple statement that Microsoft does not support it is incorrect. (The explanation of IRIs is technically pretty garbled, but it is not easy to express in a single phrase.) The second sentence is incorrect too: you can have Chinese characters in a web address, as long as they use URL syntax and are percent encoded.

Now there may be some way to weasel word this, so that it really says

Another standard that (DIS 29500) Microsoft does not support (in one case), is the RFC 3987 specification, which defines UTF-8 capable Internet addresses (internationalized WWW resource identifiers which map to ASCII based standard URLs using percent-encoded UTF-8). Consequently, (draft) OOXML does not support the (indirect) use of Chinese characters within a (external) Web address (in markup).

But an ordinary reader of a blog simply is not technically equipped to understand this. How anyone would write this if they ever had read the draft is beyond me. I mean that seriously. If you are writing a blog, and making comments about IRIs, how simple is it to download Part 2 of the spec, open it up in Acrobat or your PDF reader, and search for IRI?

Chinese Native Language Markup

I think the main trouble with Peter Junge’s blog comes from a misunderstanding of the ISO process and the position of voluntary standards. I don’t think he knows what a standard is.

When he says “I hope China will not support OOXML in its ISO voting, but force Microsoft to consider talks for one harmonized office document standard for the whole world.” it sounds nice and tough, but the ISO process is not geared to that kind of win/lose approach. In the ISO situation, when you find an error that can be fixed (such as this IRI mistake) you don’t throw the whole thing out, you point out the problem, propose a fix, and work together. Just because Open XML gets added to the library of voluntary standards at ISO, it does not mean that the Chinese national body is thereby forced to adopt Open XML in preference to UOF in any circumstance. Chinese businesses will be sending and receiving documents from overseas in formats outside the control of the standards bodies, and governments have little interest in making arbitrary restrictions on world trade nowadays; it is better for that data to be in a standard format than a non-standard one. Non-Chinese countries are not going to adopt UOF but still they will produce and receive documents: I am sure that the UOF people are entirely aware of this.

What the current generation of document standards (ODF, Open XML, UOF) does is expose all the different functionalities required. This is a great pre-requisite for getting Chinese and other requirements publicized.


Now the area of East Asian native language markup is one that is particularly important to me. I started off in SGML while working in Japan, I had a lot of contact with really wonderful East Asian experts because of my involvement in ERCS and CJK DOCP, and I ran the “Chinese XML Now!” project at Academia Sinica, Taipei in 1999/2000. This was a project (academic/practical, *not* political !!!) to try to work through issues relating to XML and Chinese. (The Chinese XML Now page is now old, and I hope there are much better sites now, but it did have a few million real hits as far as I could work out.) Schematron, now an ISO standard, came out of this work, because I wanted to develop a schema language that did not depend on tokenized grammar rules, since lay Chinese understand their language in terms of characters, not words per se.


One part of this project was for me to represent Academia Sinica (*not* Taiwan) at various non-national level standards groups. One outcome was that in the XML Schema Working Group (representing Academia Sinica) I championed and suggested the name for anyURI: to allow better native language markup than URIs; the IRI standard was not available then.


ASCC’s reason for hiring me was a little shocking: my boss, a really incisive and surprising man, told me that Westerners on standards organizations do not listen to Asians (from Asia articulating Asian-only requirements), and so they wanted me to advocate for them (and for Chinese language requirements in general) because a white person would be more acceptable.


Now this is not so much a claim of personal racism at all: in part it is due to the language barriers, partly due to time zone and travel problems, partly due to the difficulty that people from respect-based cultures have in contention-based committee systems, the problem that people from seniority-based cultures have in expert committees, the problem that people from face-based cultures have in ad hoc discussions, and also the difficulty in getting up to speed with issues and procedures as a newcomer.


In the ISO SC34 committee on Document Processing and Description Languages, there has been an effort to schedule meetings in Asia (Korea last year), issues have to be tabled six weeks before meetings in order to prevent surprises and give people a chance to translate and discuss, and in general votes on important issues are not taken on the same day they are proposed, in order to allow consultation with national technical committees in different time zones. But nevertheless, learning how to operate effectively in committees dominated by Western-style relationships is a real difficulty. (From my recent trip to India, this clearly doesn’t apply to Indians! So I mean “Asian” in the Australian sense of mainly East and South East Asians, not in the UK sense of “Indic”.)


These kinds of things are good, yet there will, in my opinion, always be a difficulty there. Of course, Westerners will learn how to interact with Asians better, and Asians will learn how to participate with Westerners better. But it is up to the nations that use a particular language or script to work out their requirements and communicate them effectively. UOF is at minimum a good exemplar of this. The Japanese kinsoku rules are perhaps another example.

Open Packaging Conventions, ZIP and IRIs

Open XML Part 2 Open Packaging Conventions sets out all details of packaging in Open XML: the profile of ZIP to use, the part referencing system, digital signatures and so on. It is the part that has URLs and IRIs etc.

Now here it gets a little complicated. The ZIP technology is not standardized. What difference does it make in practice, is a reasonable question. The difference is that in a standard, you get pro-active in areas such as internationalization and accessibility. Often proprietary formats leave internationalization issues unspecified for as long as possible: it is at the bottom of the list of work items. Now the difficulty with both internationalization and accessibility is that you cannot just add them on as an afterthought, casually. They can be quite disruptive, and consequently take time to get buy-in.

Just as the RFC for IRIs didn’t actually come out until January 2005, the ZIP specification didn’t specifically sort out using UTF-8 for filenames until 2006-09-29, according to the release notes. That is only shortly before Ecma 376 was released. This shows the dilemma for standards: now we are a year later, how should we trade off the need for Native Language Markup on the one hand (now that the ZIP spec has a way to support it) and the need to be compatible with interoperability in the form of actual ZIP libraries in the real world (and therefore fit in with major platforms and not require users to upgrade or switch libraries unnecessarily)? Document standards are full of these kinds of trade-offs, and reasonable people may differ.

So even though I said earlier that I think it is probably just a drafting error, there is also the real issue of compatibility with ZIP implementations to consider. It may have been the pragmatic choice in 2005, or whenever the OPC work was done, but probably not now.

My own opinion is that first I would like some objective evidence. How do libzip, Java and .NET handle UTF-8 etc part names in ZIP archives? (I believe Java is cool with them; I haven’t looked at the rest.) But unless there is some major lack of support, I think it is a no-brainer to fix it.

So when we look at the treatment of IRIs in Open XML, it needs to be against the background that URLs don’t allow non-ASCII characters directly and that (at the time of drafting) ZIP did not have an adequate system either. So the choice of translating the IRI in markup to a URL-like percent-encoded syntax for filenames was reasonable.

The difficulty with that is that it leaves it up to the user interface to translate back into the non-ASCII characters. Now that does work (e.g. web browsers) so don’t imagine it is a showstopper, but it does add an inconvenience.
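For readers who have not seen the round trip, here is a minimal sketch using Python’s `urllib.parse`. The example address and the `iri_path_to_url` helper are made up for illustration, and the helper is deliberately incomplete: it only percent-encodes the path as UTF-8, and it is not the algorithm from Annex A of the draft.

```python
from urllib.parse import quote, unquote, urlsplit, urlunsplit

def iri_path_to_url(iri):
    """Percent-encode the non-ASCII path characters of an IRI as UTF-8."""
    parts = urlsplit(iri)
    # Only the path is handled here; non-ASCII host names would need IDNA,
    # and queries and fragments are passed through unchanged.
    return urlunsplit(parts._replace(path=quote(parts.path, safe="/")))

iri = "http://example.org/文档/报告.xml"   # what a person would like to see
url = iri_path_to_url(iri)                 # what goes over the wire
print(url)
print(unquote(url))                        # what a user interface can show again
```

The markup or the protocol carries the ASCII-only form, and it is the `unquote` step that the user interface has to remember to perform. A path that already contains percent escapes would be escaped a second time by this helper, which is one reason a real implementation needs more care.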

Now if I were to predict where a possible problem might be, it is that there may be ZIP libraries which are still in the 70s/80s stage mentioned above: allowing bytes in file names and not saying what encoding is used. The effect of this is that if a ZIP library saves filenames using the locale-character set rather than UTF-8, the filenames will be garbled on systems with other locale-character sets.
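Both halves of that prediction can be sketched with Python’s standard `zipfile` module, which happens to be a library I can check easily; it says nothing about libzip, Java or .NET. The member name is an arbitrary example. The first part writes a non-ASCII name and checks that the archive records it as UTF-8 (general purpose bit 11, 0x800, in the ZIP header). The second part simulates the bytes-without-a-label situation by encoding the name in one legacy character set and decoding it in another.

```python
import io
import zipfile

name = "報告.xml"

# Write one member with a non-ASCII name into an in-memory ZIP archive.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr(name, "<doc/>")

with zipfile.ZipFile(buffer) as archive:
    info = archive.infolist()[0]

UTF8_NAME_FLAG = 0x800  # general purpose bit 11: the name is stored as UTF-8
print(info.filename == name, bool(info.flag_bits & UTF8_NAME_FLAG))

# Without that flag a reader has to guess. Here the name bytes are written in
# one legacy encoding and read back in another, which garbles the name.
garbled = name.encode("shift_jis").decode("cp437")
print(garbled == name)
```

The pairing of Shift-JIS and code page 437 is just one plausible mismatch picked for the demonstration; any two different locale character sets would do.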

So I think the OPC needs improvement in this area. Hurrah for the standards process! OPC should allow IRIs in external URLs as part of acceptance of the standard. But I think OPC should also support UTF-8 part names in ZIP packages rather than requiring percent-encoding only: I don’t know whether this should be arranged as part of the current ballot process, in maintenance or in a subsequent version.

M. David Peterson


Update: It just keeps getting better. Or is it worse? Guess that depends on your perspective. And with that, from a Wired News article from two days ago,

Crew Member: Previous AT&T Show Had “No Politics” Policy
By Eliot Van Buskirk August 13, 2007 | 10:26:44 AM Categories: AT&T

A crew member who worked on a show webcast by AT&T confirmed that there was a policy in place to remove artists’ political comments from shows before they were webcast.

“I can definitively say that at a previous event where AT&T was covering the show, the instructions were to shut it down if there was any swearing or if anybody starts getting political. Granted, they didn’t say to shut down any Anti-Bush comments or anything specific to any point of view or party, but ‘getting political’ was mentioned.”

The crew member went on to say that the order to mute political speech was issued by Davie Brown Entertainment, which had been hired by AT&T to produce the recordings.

Sure, the policy — which AT&T initially denied was in place — applies to all political speech, not just criticism of Bush. But most bands, when they get political, tend to lean pretty hard to the left (especially when they’re on the stage of Lollapalooza, which is trying to hang onto a rebellious, “alternative” reputation).

Randall L. Stephenson, the CEO of AT&T, is also the Vice-Chairman of the President’s National Security Telecommunications Advisory Committee, and has motivation to shield Bush from criticism. And as some readers of this blog have pointed out, AT&T is free to do whatever it wants to the audio on its webcasts.

But one has to wonder whether the same political filtering policy applied to AT&T’s webcasts could eventually affect to the company’s portion of the internet backbone, in the absence of the net neutrality legislation it actively opposes.

PLEASE NOTE: I believe it’s important I point out the fact that I personally am not Anti-Bush. In fact, I voted for him in both 2000 and 2004. Did I make a mistake in doing so? Well, that’s neither here nor there as there’s nothing I can do to change the past, only learn from it. Even still, as per a post I made a year ago last February,

Rick Jelliffe


I see (hat-tip Tim Bray) that Apple has released a new spreadsheet, Numbers, that looks very pretty. I think it brings out a couple of points about document standards too.

The first is that every time a new application comes out, it needs to have some new distinctive feature to sell itself. It may be simplicity, it may be beauty, it may be speed, it may be features, but there has to be a reason why people would want to buy it. If the distinctive be novel features, then there is every chance that the format will need to save data for that particular feature in new markup (e.g. a new namespace).

And as soon as we get to application-specific data, we have left the world of standards and guaranteed interoperability. Now this is a real problem for applications and catholic standards. (A little bird told me that Lotus won’t be adopting ODF as its native format for much this reason, and I haven’t seen Adobe Framemaker go native with ODF either. I would expect that the same is true for the delayed Mac version of Office with Open XML support: the problem being feature mismatch not format complexity per se.)

Which leads us into the world, not of guaranteed interoperability, but of graceful degradation.

It is interesting in the Apple promotional material that they say Numbers handily imports spreadsheets created in earlier Excel formats, as well as Excel 2007 documents created in new Office Open XML formats. I think this backs up a prediction I made recently, that no application can afford to ignore major formats for too long.

Rick Jelliffe


Australian national body Standards Australia had an industry forum today on Open XML. The agenda and invitation for this is up at Tom Worthington’s website.

The Invitation

Here is the most interesting part of the invitation:

This forum is being conducted by Standards Australia as a courtesy to stakeholders. It is an extraordinary meeting that we are not required to hold, but do so to provide an open process. We appreciate your attendance and expect that you appreciate our effort in making this opportunity available to you.

Standards Australia values its vote as a participating member of all international committees, and does not exercise it injudiciously. We provide considered Australian viewpoints that are beneficial to Australian stakeholders, including industry, government, academia and the general community, through the facilitation of trade and the inclusion of clear Australian requirements in international standards.

The JTC1 process has established that the ECMA-376 document is not contradictory to existing standards and ECMA has responded to a number of technical considerations raised in the initial consultation period. This forum is not to debate the merits of the JTC1 decision making process or the validity of the ECMA response.

While technical comments are welcomed, it would be entirely counter productive to use this forum to reiterate technical comments that have already been raised and are likely to be debated in every JTC1 member body in some form.

We are looking for creative, positive contributions that emphasise our commitment to representing truly Australian views to the international community.

More on that later.

The Speakers

The meeting had 30 to 35 attendees (I didn’t count, oops), based on the membership of existing technical committees and people who had sent in comments to the process so far. It was not a voting meeting at all, just a meeting to help consensus and to give more information to higher committee members. (However, participants can submit comments by Aug 21 for consideration by the Australian CITeC Standards Sector Board.)

The meeting was a three hour affair, with the first half invited speakers and the second half question and answer and commentary.

The first half started with an introduction by Standards Australia’s Alistair Tegart, who provided good strong chairmanship that left most people frustrated that they had not had a chance to say more, but which gave everyone a chance to make their most important comments in the allotted time. The interesting thing was that discussion of technical minutiae was strongly discouraged (wrong meeting for that), which is a nice break for me. Discussion was civil, everyone friendly in the coffee break, and frank in the meeting.

I had been invited to speak on the subject of General overview of the standards process because of my involvement as Australian delegate to (what is now) SC34 in the 1990s and my continuing involvement with standards. A nice comment afterwards (by a law professor!) was that mine was the only talk with new content. I tried to present an SC34-based perspective on standards: what SC34 standards are, how the preference for enabling standards rather than applications has been overtaken by the fast-track process, the basic standards posture for Australia in SC34 in the mid-90s (need for simplicity to suit our small development teams, I didn’t mention support for regional neighbours though it was important) and how each different country has different requirements. (For example, some countries have a requirement that they do not want to be blocked out from international contracts because of the lack of standards.)

Then a quick mention of some of the issues that I prototype in this blog: that ISO standards for documents are voluntary, that standards form a library of choices, that the mere existence of alternative standards does not prevent any group from choosing one over the other, that standards such as PDF and Torx are not open in the sense of allowing arbitrary change but are nevertheless valuable, and so on. I emphasized again that the ISO process is a win/win system in which attempts by one group to stymie another’s needs do not fit.

That took about 15 minutes, then there were speakers on the case against the adoption of Open XML (the scheduled IBM speaker was hospitalized so we were treated to an emergency podcast from Rob Weir which was basically the same content as his Technical Case against OOXML) and for the adoption: a quick tag team with an MS representative, then the local CompTIA representative, then a CEO. The CEO, Richard White from CargoWise EDI, was particularly forceful on how it would help his business.

The Discussion

Then after coffee we had over an hour of moderated discussion. By and large it went as expected: people from local industry welcomed it as solving a real problem, people from business rivals of MS didn’t like it, people who identified themselves with Open Source didn’t like it, people from academia or standards bodies seemed to think that having it as a standard would somehow force them to use it (I didn’t get this.)

I had to gag myself a few times. The local Google Maps operation was represented, but I was quite surprised to hear Lars Rasmussen say how difficult it would be to implement Open XML…surprised because he had told me last year how Google Maps used VML to deliver to IE and how the format was simply not a problem. Here are the first lines sent for a Google map, for confirmation; note the namespace declaration and stylesheet reference:


<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xmlns:v="urn:schemas-microsoft-com:vml">
<head><meta http-equiv="content-type" content="text/html; charset=UTF-8"/><noscript>
<meta http-equiv="refresh" content="0; URL=http://maps.google.com.au/?output=html"/></noscript>
<title>Google Maps</title>
<link rel="stylesheet" type="text/css" href="http://www.google.com/intl/en_au/mapfiles/86/maps2.css" />
<style type="text/css">body{margin-top: 3px;margin-bottom: 0;margin-left: 8px;}
#vp {position: absolute;top: -10px;left: -10px;width: 1px;height: 1px;visibility: hidden;}
#homestate {display: none;}
v:* {behavior: url(#default#VML);}   ...

It seemed strange to be saying that a technology was too big to implement when you were in fact using that technology successfully. Maybe the Google speaker didn’t realize that VML is a part, though obsolescent, of DIS 29500. I think what happens is that “implement” gets stretched to mean “implement all the parts of a specification”: so “It is too big to implement” means “It is too big to implement it all” in Googlespeak. But Australians aren’t going to implement a full new office suite. It is too big, even if you just used ODF; and Open Source people will more naturally join the existing Open Source and Free projects rather than set up new ones, it seems to me. For Australian requirements, “full implementation from scratch” is an imaginary and spurious requirement.

What has maybe slipped Lars’s mind is that most integrators will use DIS 29500 in the same way that Google Maps would be using it: just cherry picking the parts that are needed (in their case, the subset of VML.) And, in particular, when you are using it as an end-format, you only need to “implement” (i.e. generate) the elements that correspond to your input. Not the whole thing.

When I was talking to the local Google people last year, they told me that Google doesn’t actually have any fulltime people allocated to standards work in general. I gathered that was a little pedestrian for them, because they made their money by innovating not by following the pack: sounds like a recipe for QA disaster to me. I don’t know whether their foray into web-based applications will make them a little more savvy with standards.

Another Google guy (who turned out to be a ring-in: Georg Greve, initiator and president of the Free Software Foundation Europe, who gfim says was flown in from Switzerland by Google especially for the occasion) stood up and recommended we should track the Indian standards body’s concerns about binary mappings. Again I had to gag myself (actually, Alistair did it for me) because I believe I actually was present at the meeting in Delhi where that issue was raised: by the Indian representatives of Sun and IBM. I am afraid I couldn’t help thinking this was the classic Colbert “Echo Chamber” effect, similar to Wikipedia’s astroturfing: one member of a collective puts something up in one forum, then other members of the same collective bring up the first as independent evidence. (In this case with the added twist that Sun and IBM were not mentioned: a lay listener could easily have had the impression that this was some kind of position adopted by the BIS, whereas, as far as I know, it has not been. I hope Georg will be a little more careful with attributions in the future, because people can so easily get the wrong impression.)

An interesting comment from local IT29? committee head, Jamie (surname illegible sorry), that for educational users, they needed to guarantee interoperability and could not force students to purchase particular programs. I didn’t quite see the logic of how this meant that DIS 29500 should not be adopted at ISO. Marcus Carr from Allette Systems (who I consult for and teach standards seminars for) sits on a local IT committee too and responded that students would be better served by PDF if guaranteed interoperability were the issue.

A few comments later, I got a chance to mention that none of the XML formats today provide guaranteed interoperability in the sense of visual fidelity, for the reasons that readers of this blog will be familiar with: every application supports different feature sets, has different fonts, hyphenation dictionaries, kerning tables, line break algorithms, and so on. Plus the formats are extensible, so can have all sorts of strange media types. Standard documents may perhaps be a necessary condition, but they certainly are not sufficient. What is needed are profiles, which restrict the features, require certain application behaviours and require certain fonts, together with uncramped page designs that reduce the chance of page overflow on different systems.

Several speakers raised the issue of IP rights and worries about the MS Covenant Not to Sue and the Open Specification Promise. MS gave the usual response: we have run it past lots of external lawyers who say it is fine, and since OSP is so similar to Sun’s equivalent, why don’t you have the same concerns about ODF? I think Standards Australia has been playing a little coy here, because they are trying to be scrupulous not to be seen to take sides, I guess. I had asked them in email to have a clear position on standards and IP from the legal perspective, and they ended up saying, in effect, that for Standards Australia, it is JTC1’s responsibility and competency to evaluate the IP issues of drafts submitted for Fast-Tracking, and not a technical issue for voting.

I think a more complete answer would be better. People who are interested in this area should first read the excellent webpage by OASIS lawyer and anti-Open XML conduit Andy Updegrove, especially on the Allied Tubemakers and Dell cases. Standards don’t exist in a vacuum: MS’s participation in the standards process, and the very strong and constant statements that MS has made, would be considered in any court.

Another aspect of the IP discussions is that typesetting and desktop applications are not now a new thing. With a 20 year limit, patents before 1987 have expired, which is well after the invention of all the basic ideas in office suite software. (Last week in Thailand, MS’ Oliver Bell was asked a question on this issue, and IIRC he said that actually MS only uses its patent portfolio defensively and has never sued on IP. Does anyone have a list on this?)

Jamie ? pointed out that participation in a standards body does not nullify the IP; however, the issue is the scant chance that submarine patents are enforceable. So add the two covenants, external legal opinion, vetting by Ecma and ISO/IEC JTC1, the age of likely patents, the recent stronger court awareness of junk patents, the difficulty of enforcing submarine patents, the multiple statements made by MS executives and staff to the highest level, and the basic fact that a document standard is more concerned with schemas and general description and not with methods or algorithms, and I don’t know how much more would be possible to satisfy someone.

One comment mentioned the idea that the MS covenant not to sue etc. does not cover external technologies. Of course not, but that is no different for ODF and HTML.

Other speakers brought up a few of the usual suspects. autoSpaceLikeWord95 made its scheduled appearance, along with statements that made it clear that the speaker had never read the spec and was parroting. There is an easy way to tell a parrot in this area: they will say something like “The standard is full of compatibility elements like autoSpaceLikeWord95 which are undocumented and prevent implementation”. In fact, there are 64 compatibility elements, and IIRC all but two are adequately documented with explanations of their general functionality. The autoSpaceLikeWord95 element is optional and is clearly marked as deprecated: it seems to be a warning flag that some document was originally created by Word95 with this bug and it had never been corrected. The bug is related to the treatment of fullwidth characters used in East Asian typesetting (zenkaku): certainly for Australian users it is utterly extraneous to our national requirements.

The issue of the definition of various functions in SpreadsheetML came up too, from a localization perspective. (Again, irrelevant to Australia.) If the moderator hadn’t been so tough, I would have liked to have asked whether the speaker wanted to remove or rename the existing function (and break the spreadsheets of everyone who used this function) or merely to add better localized functions (which belongs in a maintenance phase).

On the issue of maintenance, I did get another opportunity to spout. I said that it is too early to tell what effective systems for maintenance will emerge. When OASIS and ECMA submit their standards, they also submit information about how maintenance will occur. There would be collaboration with JTC1 SC34, for example. I said I thought this was only practicable for fast corrigenda (which don’t add functionality, just fix the text and clear mistakes) and that the approach that OASIS seem to be taking, which would involve resubmitting ODF 1.1 and ODF 1.2 etc for fast-tracking each time, was probably the more realistic thing to expect. However, I noted that DIS 29500 has a quite strong extension regime, indeed a whole part (Part 5), and starts from a much more complete position than ODF: so one would expect complete updates to be rare events, perhaps aligned to the three-year product cycle.

The main Google guy at some stage made a good point about overlapping standards, along the lines that having multiple standards for programming languages was OK because the differences could be justified, but he had not heard arguments why Open XML was so different from ODF that it could be justified.

A few others also had comments that could be fairly reduced to “We don’t need it, therefore we do not support it becoming an ISO standard, therefore we oppose it becoming an ISO standard”, which is a non sequitur.

Baseline formats and downstream formats

A very interesting point was made by the National Archives representative. They don’t have the resources to cope with Open XML and ODF, he said, so they would adopt ODF for their future format and didn’t support Open XML becoming a standard. Again, I don’t see how the standardization of Open XML forces their policy in any way. Standards Australia is not even a government agency, and has no legal clout on the National Archives: moving to ODF where possible seems a reasonable choice (well, ODF 1.3).

Marcus Carr objected to this. He spoke from the perspective of document processing from the early 90s, and the difficulties in practice of dealing with Word documents (with the various hijinks: converting .DOC to the Rainbow DTD, converting .DOC to RTF then processing that, etc) and brought up the key processing issue that I think almost all the commentators on Open XML miss. He brought up the issue of the need for a full-fidelity baseline schema to allow the most flexibility in downstream processing.

Now this is a pipeline approach that has proven itself to work over the last 15 years we have been using it. Elephantine readers may remember a blog of mine a year ago:

A typical strategy when converting from XML into some structured text format is to have three transformations:

* first, convert the XML into ideal XML: resolve links as needed, remove extraneous elements and attributes, convert cases, generate headings and other things that need to be generated
* second, convert that ideal XML into an XML-ized version of the output format
* third, convert the output XML into the text format, delimiting and indenting as needed

If the input data is non-XML, then we have an additional stage where we first convert the data into a “baseline” XML format that maintains all the information from the data source (it could be a database, another format, a binary, no matter.) You never know what information you need, and you don’t want to trust someone else’s abstraction but work with as unmediated a form of the data as possible.
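As a rough illustration only, the staged approach described above might be sketched in Python as follows. Everything here is hypothetical and made up for the example: the function names, the dictionary standing in for a non-XML data source, and the element names. In practice the XML-to-XML stages would more likely be written in XSLT; the point is simply that each stage does one job and the baseline stage throws nothing away.

```python
import xml.etree.ElementTree as ET


def to_baseline(record):
    """Extra stage for non-XML input: wrap every field in XML, dropping nothing."""
    root = ET.Element("record")
    for key, value in record.items():
        field = ET.SubElement(root, "field", name=key)
        field.text = str(value)
    return root


def to_ideal(baseline):
    """Stage 1: make 'ideal' XML (remove empty fields, normalise case)."""
    ideal = ET.Element("doc")
    for field in baseline.findall("field"):
        text = (field.text or "").strip()
        if not text:
            continue  # extraneous: nothing to carry forward
        item = ET.SubElement(ideal, field.get("name").lower())
        item.text = text
    return ideal


def to_output_xml(ideal):
    """Stage 2: an XML-ized version of the output format (one <line> per item)."""
    out = ET.Element("lines")
    for child in ideal:
        line = ET.SubElement(out, "line", indent="2")
        line.text = f"{child.tag}: {child.text}"
    return out


def to_text(output_xml):
    """Stage 3: serialise to text, delimiting and indenting as needed."""
    return "\n".join(
        " " * int(line.get("indent")) + line.text
        for line in output_xml.findall("line")
    )


def convert(record):
    """Run the stages in order, starting from the full-fidelity baseline."""
    return to_text(to_output_xml(to_ideal(to_baseline(record))))


if __name__ == "__main__":
    print(convert({"Title": "Report", "Notes": "   ", "Author": "RJ"}))
```

Because the baseline keeps every field, a later decision to use the dropped information only means changing `to_ideal`, not re-reading the original source.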

So Marcus’ comment was that we (the system integration and document processing community) need Open XML as an ISO standard, because it alone provides an adequate baseline format for subsequent transformations. So the Australian National Archives could well decide to archive data using ODF, but they may well decide to implement their conversion to ODF by going through Open XML. So one standard would be useful for one purpose (saving future archives), the other standard for a different purpose (opening existing files in the archive.) If the Australian National Archive is moving to ODF 1.0 fast, I hope they don’t throw away the original binaries…

Breadcrumbs

So all in all, I think the day was a worthwhile exercise, and a good opportunity to help us all escape groupthink.

I suspect, from the tone of the invitation and comments made at the meeting and elsewhere, that when Standards Australia looks at the comments that people send in (deadline August 21) they will be completely uninterested in comments that question JTC1 decisions and comments on issues that have no local relevance. I gather they may not be much impressed by arguments that can be refuted by precedent: for example, that there should be no overlapping standards. However, we shall see, and I don’t know anything about the CITeC Standards Sector Board.

The most enlightened part of their approach, and I think this is pretty novel, is that Standards Australia seem very aware that the role of an individual standards body in vetting a standard when there is a multi-national campaign to discredit it (on the one hand) and promote it (on the other) changes the requirements for a review. In the case of a normal standard review, you raise as many (sensible) flaws as you can, because you don’t know whether the issue will be addressed by anyone else. In the case of a global campaign, it is clear that almost every National Body has been mail-bombed with the same spiel and that therefore those are issues that we can actually ignore, unless they have a clear national significance, because we know that other national bodies will be examining them. I think that is what may be behind the last line of the invitation: “We are looking for creative, positive contributions that emphasise our commitment to representing truly Australian views to the international community”. They want to husband their resources to what is important for local industry and local requirements. They don’t want to succumb to a Denial of Service attack where by concentrating on sorting out edge cases and typos they miss out the big picture of national interest.

I’m preparing my comments to the CITeC Standards Sector Board at the moment, and I will put them online here too, if anyone is interested.

Uche Ogbuji

I’ll use this entry as an anchor for my observations on the third day of Extreme Markup Languages. I’ll update it with a note each time a new talk begins, but I’ll add my comments on the talk in the comments section. There is a numbering scheme for the talks, to correlate to comments.

If you happen to be reading this in an aggregator, much of the meat is in the comments, so you might want to click through.

D3.1. “Principles, patterns, and procedures of XML schema design: Reporting from the XBlog project”, Anne Brüggemann-Klein, Thomas Schöpf, Karlheinz Toni

D3.2. “Enhancing AIML Bots using semantic web technologies”, Eric Freese

D3.3. “Converting into pattern-based schemas: A formal approach”, Antonina Dattolo, Angelo Di Iorio, Silvia Duca, Antonio Angelo Feliziani, Fabio Vitali

D3.6. “Relational database preservation through XML modeling”, José Carlos Ramalho, Miguel Ferreira, Luís Francisco da Cunha Cardoso de Faria, Rui Castro

D3.7. “Mind the Gap: Seeking holes in the markup-related standards suite”, Chris Lilley, James David Mason, Mary McRae

Rick Jelliffe

It is extraordinary that we have no standard, ISO or otherwise, for the ZIP format, when it is the basis for modern packaging: JAR, WAR, EAR, SCORM, ODF, Open XML, etc.

I have found that I was wrong that DIS 29500 (Open XML) includes a ZIP specification. What it has is a quite detailed profile (more than 20 pages), requiring the use of deflate compression and disabling all the advanced features of ZIP; however, it falls well short of being an actual ZIP specification. So Open XML and ODF (which has 2 paragraphs only on this) ultimately both reference the PKWARE definition of ZIP. Sigh…
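To give a feel for what checking a package against this kind of profile involves, here is a minimal sketch using Python’s standard `zipfile` module. It is not the DIS 29500 profile itself: the function names are invented for the example, and the particular rules (allow only stored or deflated entries, reject encrypted ones) are my assumption about what a minimal profile might demand.

```python
import io
import zipfile

# Assumption for this sketch: a minimal profile permits only uncompressed
# ("stored") and deflate-compressed entries, and no encryption.
ALLOWED_METHODS = {zipfile.ZIP_STORED, zipfile.ZIP_DEFLATED}


def profile_violations(data, allowed=ALLOWED_METHODS):
    """Return a list of human-readable problems found in a ZIP held in bytes."""
    problems = []
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        for info in archive.infolist():
            if info.compress_type not in allowed:
                problems.append(
                    f"{info.filename}: compression method {info.compress_type}"
                )
            if info.flag_bits & 0x1:  # bit 0 of the flags marks an encrypted entry
                problems.append(f"{info.filename}: encrypted")
    return problems


def make_package(method):
    """Build a tiny in-memory ZIP with one XML part, for trying the checker."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", compression=method) as archive:
        archive.writestr("content.xml", "<doc/>")
    return buffer.getvalue()


if __name__ == "__main__":
    print(profile_violations(make_package(zipfile.ZIP_DEFLATED)))
    print(profile_violations(make_package(zipfile.ZIP_BZIP2)))
```

A conservative "what you send" profile is easy to check in this way; the harder question for a standard is how much of what a reader might receive it should have to accept.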

Last year, I was tasked by SC34 to investigate an ISO standard for ZIP, so I will have to start looking into it again. I am interested in finding out what people think about how much of ZIP should be standardized (if we can indeed get any of it standardized): is a minimal Open XML-style ZIP the way to go, or is something more full-featured better? Should the goal be implementability and out-going compatibility (be conservative in what you send), in which case something like the Open XML subset is appropriate, or in-coming compatibility (be generous in what you receive), in which case an ISO ZIP would try to allow as much variability as possible?

Rick Jelliffe

One of the oddest comments that is coming up on DIS 29500 is that plain old XML is not human readable. I would love to hear an explanation of this. A string of characters saved in a text file with a .xsd extension is not human readable, but exactly the same string when cut and pasted into a word processor is human-readable?

(To forestall talking in circles, this is not about whether XSD is baroque, nor whether a human who can read XML can then necessarily understand the intended semantics of the markup.)

Rick Jelliffe

I have been hearing a lot of fine sentiments about ISO standards recently. Is a high bar being drawn that does not reflect established practice objectively evidenced by the catalog of ISO standards?

There can only be one ISO standard for any application area

Oh yeah? What about ISO FORTRAN, ISO PASCAL, ISO Eiffel, ISO Common LISP, ISO C, ISO BASIC, ISO ADA, ISO C++, ISO C#, ISO EcmaScript, and so on? These are all programming languages, with enormous overlap in what they can be used for. What about ISO DTD, ISO RELAX NG and ISO Schematron? These are schema languages, again with enormous overlap in what they can be used for. What about ISO POSIX and ISO Linux Standard Base? What about (Ecma sponsored) ISO9660 disk format and (Ecma sponsored) ISO13346 UDF disk format?

ISO Standards must be made by combining the best of all worlds, and cannot rubberstamp a technology that came from a single vendor

Oh yeah? What about ISO 10664 Hexalobular internal driving feature for bolts and screws - Torx screw head? What about ISO PDF, ISO C#? What about (JIS sponsored) ISO QR Codes? (“QR Code is open in the sense that the specification of QR Code is disclosed and that the patent right owned by Denso Wave is not exercised.”)

In an ISO standard, all elements should be supported by all applications for interoperability

Oh yeah? What about ISO ODF s1.6: “There are no rules regarding the elements and attributes that actually have to be supported by conforming applications”?

Uche Ogbuji

I’ll use this entry as an anchor for my observations on the second day of Extreme Markup Languages. I’ll update it with a note each time a new talk begins, but I’ll add my comments on the talk in the comments section. There is a numbering scheme for the talks, to correlate to comments.

If you happen to be reading this in an aggregator, much of the meat is in the comments, so you might want to click through.

D2.1. “Retiring your metadata shoehorn (For OpenOffice documents)”, Patrick Durusau

D2.2. “Localization of schema languages”, Felix Sasaki

D2.5. “Semantic resolvers for semantic web glasses”, Nikita Ogievetsky

Kurt Cagle

The Dow Jones Industrial Average (the DOW) did quite a dance today, with its peak to trough extending nearly 300 points before closing, pretty much at random, more or less where it started. I bring this up not to turn this column into an economic report about Wall Street (definitely out of bounds here, except perhaps in the discussion of Atom-based XML feeds retrieving DJ stats) but to discuss a bit about systems theory and to review a book that I think should be pretty much required reading for XML architects.

I suspect that I’ve always been something of a systems theorist, and I’ve noticed that systems theory tends to attract architects like moths to a bright light (no comment about getting burned). You can tell the systems theorists out there - they are the ones that clandestinely like to play Sim City at work, who can readily tell you what the Austrian school of economics is despite not being an economist, who were getting nervous about calving ice shelves and CO2 concentrations long before Al Gore started doing his stage show. Some of us are scientists, some are programmers, some are environmentalists or economists, but the common thread that binds us together is that we’re the ones who never stopped asking “WHY?” as kids.

Uche Ogbuji

I’ll use this entry as an anchor for my observations on the first day of Extreme Markup Languages (See also: Looking forward to Extreme Markup Languages). I’ll update it with a note each time a new talk begins, but I’ll add my comments on the talk in the comments section. I also added a numbering scheme for the talks, to correlate to comments.

If you happen to be reading this in an aggregator, much of the meat is in the comments, so you might want to click through.

D1.1. B. Tommie Usdin, one of the organizers, opens up the conference with “Riding the wave, riding for a fall, or just along for the ride?”

D1.2. “Easy RDF for real-life system modeling”, Thomas B. Passin

D1.3. “Writing an XSLT optimizer in XSLT”, Michael Kay

D1.4. “From Word to XML to mobile devices”, David Lee

D1.5. “MYCAREVENT: OWL and the automotive repair information supply chain”, Martin Bryan & Jay Cousins (Martin presented alone)

D1.6. “Advanced approaches to XML document validation”, Petr Nalevka, Jirka Kosek (Petr presented alone)

Uche Ogbuji

I’m still getting my Weblogger profile here updated, but this year I transitioned from one company I co-founded to another. Zepheira provides data architecture solutions, with a focus on semantic technology. I was early on the Semantic Web bandwagon, and I almost fell off at one point because I felt the useful, modest ideas at the core had been overrun by an academic brand of technological megalomania. This year I felt the timing was right to not only renew my interest in the technology, but to stake my livelihood on it. Part of it was timing: I was starting to see the more useful underpinnings of semantic technology take hold in corporations. Part of it was people: I found a group of professionals who I believed were capable of building practical semantic technology solutions, and, more importantly, selling them.

One of those people, Eric Miller, former chair of the W3C Semantic Web Activity, is especially well known for describing the benefits of semantic technology in terms executives can appreciate, and he’s featured in a new InternetWeek article “The Semantic Web Goes to Work”. The article says:

“You[’d] better figure out what the Semantic Web is and soon, because its concepts have graduated from academia and are starting to contribute to your competitor’s bottom line.”

I’m hearing a lot of that sort of thing, lately. The pundits, having written off semantic technology as so much pipe-dreaming for so long, have switched into a level of hype overdrive. The reality is that, as Eric puts it in the article, a consistent, universal system of identifiers and a layer of technologies for mapping these identifiers is the sweet spot of semantic technology. Semantic Web technology is the specialization that builds these identifiers on Web technology, and in particular URIs. This opens up the benefits of REST architecture, and for me the third pillar is the universal writing system provided by XML. These are all, individually, modest technologies. Hardly nanotech, quantum mechanics or genetic engineering. But take these three and combine them with a skilled data architect and I do believe very special things are possible. There’s a large crowd of folks who still make the free association from “semantic” to “metacrap”, but that presents nothing but a ripe opportunity for others who know how to keep it simple, and thus get real work done.

The article also mentions Eric’s keynote at the Semantic Web Strategies conference, which is chaired by fellow XML-meets-semantic-tech pragmatist Bob DuCharme this October. I’ll also be on a keynote panel, and I’ll be co-presenting with Kristen Harris, long-time collaborator at Sun, about how we improved content architecture for Sun’s mail Web sites using Semantic technology and REST.

This conference, organized by IT industry watchers Jupitermedia, is just another indication of how seriously folks are starting to take this stuff. I almost cringe that the stampede could end up ruining the crop, but that’s a test every worthwhile technology must endure at some point.

Rick Jelliffe

This is a fake fake blog, because even though it is not real-time, I was actually at the Open Publish 2007, Sydney conference, unlike my real fake real-time blogs. They will be putting presentations online next week.

I chaired a full-day Symposium on Standard Open Document formats on Wednesday. Alistair Spiers has blogged on this. We had four sections. First I talked about the history of document technologies and standards, for context, looked at the modern ZIP-based formats, and at the similarities and differences between ODF and Open XML, especially looking at the differences in their goals from their standards.

Rather than being theoretical, we then spent most of the day with presentations from two developers talking about their actual experiences in integrating systems around Open XML and ODF. We certainly have arrived at the point where real implementation experience is available that trumps fact-free blather and FUD.

Jason Harrop is well-known from his SpeedLegal days, and showed a collaborative multi-version editing system built on custom XML (adding structures to the linear text) in Office 2007: he had a helper application in Word and the data was sent to a back-end Java-based repository which handled shredding and storage issues. He didn’t report any particular problems with formats, just that you need to be careful with namespaces. The system would be good for people collaborating on many kinds of legal documents.

Dr Ian Barnes from Australian National University is working in a similar area to Dr Peter Sefton (they are mates, I gather) and his talk was concerned with his Digital Scholar’s Workbench project, which uses Open Office and ODF to pivot through DOCBOOK to various output formats, and with various issues related to multi-publishing and making it appealing for data entry people to use stylesheets. Interestingly, the project grew out of student assignments, where changes in technology allowed a toy problem to inspire a practical result.

I finished off the afternoon with a walk through the standards process, and where Open XML was up to. At the end, we had a welcome lurking visit from Chandi Perera, who emphasized that at the commercial CIO level the issue of standards is entirely subservient to issues of ROI, execution etc.

I didn’t attend any conference presentations, but I had reports that it was the best program yet. The Dr Raymond Wong paper on his new compressed, indexed, DOM/XML-friendly format caused quite a stir in particular. The paper will be up at the Open Publish site, but from what I understand from the drinks session (which I did attend) the format splits the document into a novel tree/index structure at its top, and the data content at the bottom: because the document is smaller than the raw XML it is suitable for transmission, and because it can be used directly without decompression (it is not a compression per se) it can be loaded and used directly by DOMs. It sounded too good to be true, but Raymond answered the various questions I had, so it looks good enough to be true now!

I gave the closing keynote, on “The True Saga of Wikigate” and the audience laughed and cried at the right moments. There were many more questions than I anticipated, particularly about whether Wikipedia was reliable. I said that I thought it was excellent in general, and that my experience of them was really constructive, but that if you see a page where only one side is represented (e.g. if there are many links to one side of the story only) you should beware.

In the final conference debriefing session, Nick Carr made an interesting point. He said that this was the first time in all the years this conference (and its predecessors, XML Open and SGML Open) has been running that there have been *no* presentations giving some trick or product or technology for trying to shoehorn Word into an XML or SGML production process. In previous years there have always been two or three, typically with some combination of VB and massage. It is quite remarkable how far we have come since 2004, when I reported of the same conference in this blog: “If I were to pick a theme or meme, it was that the decision on whether and how to support Word was by far the most critical decision for most large XML deployments.”

The sister conference Open Standards 2007 will be held 15,17 November in Sydney, again in partnership with OASIS.

Uche Ogbuji

I’ll be off to Extreme Markup Languages 2007 on Monday. It will be my first time, and I’m excited because it’s always been one of those conferences I’ve wanted to attend, but August is usually not a good time for such things on my calendar. I’ve always heard that it’s a brilliant conference, and my French friends always tell me Montreal is a very fun city (doesn’t stop them from poking fun at the French-Canadian accent).

Some of the talks I’m especially looking forward to are:

* Writing an XSLT optimizer in XSLT
* Advanced approaches to XML document validation
* Retiring your metadata shoehorn (for OpenOffice documents)
* Localization of schema languages

There are many other juicy-looking talks, but the above really stood out for me at first glance.

I hope to meet up with many old colleagues, and make some new acquaintances at the conference, and I’ll be reporting often from this Weblog. I might even try a bit of live-blogging.

Rick Jelliffe

I was enjoying my new Linux Mint desktop, mentioned in a previous blog. I had upgraded all the packages to the latest versions (for the Bianca distro, at least, which is a few months behind the current major release) using the mint tool and everything was swinging the way it was supposed to. Today, disaster.

Yesterday I clicked on a button (I think called “Upgrade”) in the main menu, and up comes a box asking if I want to upgrade to Ubuntu 7.04, which I checked to find is the latest. That sounds good, I think to myself and press the fateful button. After 24 hours of downloading, it had become obvious that this was a major operation that did not belong in a casual button. First, it required dozens of clicks on command prompts for packages that already existed. Then it suddenly exited. Was it finished? Hard to know. No processes running for it.

So I reboot, and it all fails. I *suspect* the problem is that it has taken the file system configuration from my old Mandrake files on the bootloader, not from the Mint distro. So Ubuntu complains it cannot load the root partition; presumably it cannot find it on partition hd6 or wherever because it no longer exists. So now I don’t have a bootable system; or at least, it boots into a strange RAM-based shell and I suppose I have to use the command-line tools to fix something. I got tired of doing this kind of thing in UNIX 20 years ago, why on earth are we still there? I am lost tonight, grrr.

Anyway, the moral is, don’t push the button to Upgrade to Ubuntu 7.04! I know I should have downloaded the image and written it to ISO discs for booting, but the button was so tempting and seemed to be working well. Sigh. This is not a good experience.

I have been looking around for a non-Unix, non-Mac, non-Windows environment. One option that appeals to me is to have a LISP machine, because I used to work at TI supporting their Explorer AI systems at one stage: I see that some enthusiasts have actually made an emulator for the Explorer I, which might do me, though it is not clear how much of the Explorer system is intact with the distro. Another interesting option would be to build a system that only provided Java applications.

Looking around at the various virtual machine implementations and other technical options, it struck me that probably the easiest course currently is merely to have a stripped-down UNIX+X+DesktopManager with no applications that can be used as a fat host for the actual applications. Maybe I should try out BSD, for the base. It seems that most of the different environments that might be interesting are actually hosted under Linux/BSD anyway.

Suggestions for an interesting alternative to the hackneyed operating systems welcome. And ideas on how to get Ubuntu to look for the right disk partition when it loads doubly welcome!

Rick Jelliffe

I have finally found an example of “contradiction” in two ISO standards that meets what I think ISO means by contradiction: I have been looking for an example (and how it was dealt with) since writing my blog entry earlier in the year, What is a contradiction of an ISO standard. The example is the conflict between ISO POSIX and ISO Linux.

ISO POSIX and ISO Linux came in to ISO from external sources (IEEE/Open Group and Linux/GNU) and have considerable (or more) overlap. The Linux spec is a profile of the POSIX spec in general (and is interesting because it is one of the first signs of Open Source documentation infiltrating the ISO system, which is generally riddled with industrial, government, niche developer and academic interests.) But there are some cases where POSIX says one thing and Linux says another.

Now my definition of a contradiction was:

  • One standard attempts to redefine another, or is a rival standard for exactly the same named thing but is different in some aspect.
  • One standard disrupts another.
  • One standard pretends to be another.
  • One standard incorrectly uses another.

and I gave the Linux/POSIX difficulty as an example of a contradiction. Well, now it is six months later, let’s see how ISO/IEC has dealt with the matter.

They have allowed the ISO standard for Linux as well as POSIX but they have also clearly stated where there is contradiction and gathered the information together into a technical report Technical Report on the Conflicts between the ISO/IEC 9945 (POSIX) and the Linux Standard Base (ISO/IEC 23360). So these contradictions, instead of being showstoppers, are exposed and clearly publicized as “conflicts”.

The POSIX standard remains clear. The people who want a standard for the external technology Linux get what they want. The differences become clear for software developers to work around. And the differences can be passed on as information to the standards maintenance effort. Everybody wins.

I think this shows several things. First, that ISO is (now) geared to getting win/win solutions, and not to allowing one group to stymie another (oh not this hobby horse again!) Next that the mere presence of a minor contradiction is not a showstopper, especially where there is continuing dialog between the parties: there is some aspect of proportionality. And finally that when a standard documents an external, existing technology, the standard needs to be “warts and all” and not a faked-up sugar-coated version.

Now some national bodies are apparently working assuming a much stricter definition of “contradiction”, where one standard disrupts another. And other people have a much slacker version, good luck to them. But it is interesting that no matter whose definition is adopted, the ISO Linux/ISO POSIX conflicts seem to be “contradictions”. And the ISO response? Constructive engagement to allow the development of consensus voluntary standards: this is a point I made before, that ISO standards are like conversations not laws.

Rick Jelliffe

ISO/IEC standards can be purchased from ISO and usually from your local national body. The lack of free online availability has effectively made ISO standards irrelevant to the (home/hacker section of the) Open Source community. However, many important ISO standards can be located and downloaded for free, legally, if you know where to look.

ISO

The first source is ISO itself. Where an ISO standard is based on a pre-existing external specification that is itself freely available (a “Publicly Available Specification” in ISO-speak), the committee managing the standardizing process can ask ISO to make it available on the Publicly Available Standards webpage. This is typically used for standards that come in from an external boutique standards body, but they can also come from companies (e.g. MS C#) or even from individuals (as was the case with standardizing Schematron).

These standards do have some encumbrances: you have a single-user license and you may only retain one printed copy. This is perfectly adequate for a single open source developer or student.

The ISO list contains the standards for programming languages (FORTRAN, C, BNF, ECMAScript, C#, CLI, Eiffel, Ada profile), graphics (CGM), networking and data interchange (OSI, X.25, parts of EDI, phone systems, the basis of Unicode), data formats (ASN.1, parts of MPEG, .iso CD format, JPEG2000, ODF), hardware (data cartridges, optical disks), and even the Linux application binary interface (a near subset of ISO POSIX). Some are of current and some of historical interest, and there are many standards establishing common vocabularies and on conformance testing. Most important for XML people, it has the growing list of ISO DSDL schema languages (RELAX NG, Schematron, etc). It also includes some Technical Reports, which are not standards but more like backgrounders or tutorials: ISO/IEC TR 15285:1998 Information technology — An operational model for characters and glyphs is a good example.

What is notably missing? The subsets of PDF for pre-press exchange (PDF/X) and for archiving (PDF/A) would be nice.

National Bodies and Industrial Consortia

When an ISO standard is a rubberstamp of a national standard or industry consortium specification, the original is frequently available from the original site: for example, from OASIS or Ecma. Sometimes a standard is augmented with extra information that becomes the preferred distribution: this is the case with the Unicode Consortium’s augmentation of ISO 10646 as the Unicode Character Set.

Drafts

Now the other source of standards material is draft versions. As a standard develops through a committee, there are very often discussion drafts made, and these frequently make their way onto the internet, to help discussion and promotion and to provide a record of the progress. Don’t copy them. International standards have an IS number, and technical reports have a TR number.

The early drafts of a standard are called Committee Drafts and have a CD number: you should be very wary of these.

The versions that make it to an initial vote are called Draft International Standards and have a DIS number. These are usually pretty good indications of the final standard: although there may be a few changes every few pages, they are good enough that the Subcommittee (SC) at ISO presented them for an international ballot.

A draft standard that has had all the changes made from the DIS ballot and is submitted for a final vote is the Final Draft International Standard, with an FDIS number. These are gold: rare and valuable. You can expect the final standard to differ only in small editorial ways (typos fixed, etc), but committee members usually take them off public websites when the IS is published at ISO. There can be a gap of many months between when an FDIS is accepted at ballot and when the book is published and available from ISO, so the online version can help tide people over that period. Typically, unless the standard is destined for the free list above, the FDIS would be taken off the website at that time, which is what ISO asks for.

These online drafts serve a very valuable purpose: they are very convenient for helping developers decide whether a standard might be useful for them. After deciding, of course, a serious developer would then progress to supporting ISO and buying a copy. While a CD is unsuitable for the purpose, a DIS is often good enough to use for prototyping some software, with the caveat that you should check whether there was a succession of DIS drafts, in case the standard was contentious.

How do you find these drafts? Google for “ISO” and the keyword for the technology you are looking for. Then look for numbers, preferably with a DIS or FDIS in them. You may also use the same method to find out which committee at ISO handled the standard (look for “TC” or “SC”); many of them have websites with all their formal material, including drafts and comments. For example, SC34 is here thanks to Ken Holman’s efforts. An example of a draft archived on the committee site is the ISO C++ draft.
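If you are sifting through a page of search results, the “look for numbers with a DIS or FDIS in them” step can be roughly automated. The following is only a minimal sketch in Python: the regular expression, the function names and the numeric ranking are my own illustrative assumptions, not anything defined by ISO, and real pages often format designations differently (with dates, part titles and so on).

```python
import re

# Stage markers described above, ordered from least to most settled.
# The numeric ranks are an illustrative assumption, not an ISO scheme.
STAGE_RANK = {"CD": 1, "DIS": 2, "FDIS": 3, "IS": 4}

# Matches designations such as "ISO/IEC DIS 29500" or "ISO/IEC TR 15285".
# A designation with no stage marker is treated as a published standard.
DESIGNATION = re.compile(
    r"\bISO(?:/IEC)?\s+(?:(FDIS|DIS|CD|TR)\s+)?(\d+(?:-\d+)?)\b"
)


def find_designations(text):
    """Return (stage, number) pairs found in text; no marker means 'IS'."""
    return [(m.group(1) or "IS", m.group(2)) for m in DESIGNATION.finditer(text)]


def best_draft(text):
    """Pick the most settled draft (DIS or FDIS) mentioned in text, or None."""
    drafts = [d for d in find_designations(text) if d[0] in ("DIS", "FDIS")]
    return max(drafts, key=lambda d: STAGE_RANK[d[0]], default=None)
```

Calling best_draft on a snippet that mentions both a CD and an FDIS of the same document would return the FDIS entry, which matches the advice above to prefer the later stages and to be wary of Committee Drafts.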

Another good source is Open Standards, which hosts the websites for many ISO groups, such as the POSIX effort.
