June 2008 Archives

Hari K. Gottipati


My saga of problems with GMail continues. Despite the negative feedback on my posts (“GMail is working fine”, “GMail is awesome”, “Not sure why you are complaining about GMail?” etc.), I continue to see problems with GMail. I am not alone on the planet; a lot of people are in the same boat (you can read about the problems with GMail here, here, here, here, here, here, here, here and here). The problems are frequent, particularly when new features are released. Sometimes I feel that GMail is rushing to release features without proper testing. Maybe they think that it is OK to roll out features with bugs because it is in beta. Until now this was only my guess, but it turned out to be a fact. Sergey Solyanik, who worked on GMail, revealed some interesting facts about Google products and culture after leaving Google.

In the last year, and slick as it is, there’s just too much of it that is regularly broken. It seems like every week 10% of all the features are broken in one or the other browser. And it’s a different 10% every week - the old bugs are getting fixed, the new ones introduced. This across Blogger, Gmail, Google Docs, Maps, and more.

It seems Google's culture is focused on introducing cool features, not on quality. Does Google think that since its products are free to use, quality does not matter? Well, they may be free to use, but Google is making money off them by placing ads.

The culture part is very important here - you can spend more time fixing bugs, you can introduce processes to improve things, but it is very, very hard to change the culture. And the culture at Google values “coolness” tremendously, and the quality of service not as much. At least in the places where I worked.

Incidentally, his journey from Microsoft to Google did not turn out as well as he had expected, and he made a U-turn back to Microsoft. He also explained why Microsoft is better than Google for progressing in one's career.

The Google Manager is a very interesting phenomenon. On one hand, they usually have a LOT of people from different businesses reporting to them, and are perennially very busy.
On the other hand, in my year at Google, I could not figure out what was it they were doing. The better manager that I had collected feedback from my peers and gave it to me. There was no other (observable by me) impact on Google. The worse manager that I had did not do even that, so for me as a manager he was a complete no-op. I asked quite a few other engineers from senior to senior staff levels that had spent far more time at Google than I, and they didn’t know either. I am not making this up!
At Microsoft, the role of a manager is far more obvious. A dev lead is responsible for the success of the feature and the health of the feature team. A dev manager is responsible for the success of the product and the culture of the dev team. A PUM is responsible for the success of the business, and interoperation of the three teams that work on the product.

Isn’t it bad for a company like Google not to focus on quality?

Update: Slashdot is also discussing this from a different perspective in “Some Developers Leaving Google For Microsoft” and on:

Everything is pretty much run by [engineering] — PMs and testers are conspicuously absent from the process. Google as an organization is not geared — culturally — to delivering enterprise class reliability to its user applications.

Erik Wilde


During the recent discussion of the OAI-ORE drafts (which use RDF), the claim was made that RDF is serialized in RDF/XML and thus could be considered an XML representation of the underlying data model. My response was that the RDF model is different from XML, and that it is therefore pretty hard to process RDF/XML using XML tools, in particular when considering all the constructs allowed by RDF/XML, and maybe even the question of how to update RDF/XML data using XML tools alone.

I tried for some time to find a general-purpose RDF/XML parser written in XSLT, but so far could not find one. But Google is imperfect and I might not know the best places to look. So here is my question: Is there a general-purpose RDF/XML parser written in XSLT? It has to support all the fun stuff allowed by XML and RDF/XML, such as weird uses of namespace declarations, XML Base, rdf:ID and RDF/XML syntactic sugar. It must accept anything that is valid RDF/XML. As a result, it should produce some form of normalized RDF/XML, but I really don’t care that much about the exact format (ideally, it should be XPath-friendly). The parser must be robust enough to produce the exact same normalized result for inputs that look radically different because of XML and RDF/XML syntax variations.
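To illustrate why plain XML tools struggle here, the following is a minimal sketch (in Python rather than XSLT, purely for brevity) of two RDF/XML documents that state the same triple in different syntax. The subject URI, the use of dc:title, and the function names are assumptions made up for the illustration; the second function handles only these two forms and is nowhere near a general-purpose parser.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DC = "http://purl.org/dc/elements/1.1/"

# Variant A: the property is written as an XML attribute
# (the RDF/XML "property attribute" abbreviation).
DOC_A = f"""<rdf:RDF xmlns:rdf="{RDF}" xmlns:dc="{DC}">
  <rdf:Description rdf:about="http://example.org/doc" dc:title="Hello"/>
</rdf:RDF>"""

# Variant B: the same statement with the property as a child element.
DOC_B = f"""<rdf:RDF xmlns:rdf="{RDF}" xmlns:dc="{DC}">
  <rdf:Description rdf:about="http://example.org/doc">
    <dc:title>Hello</dc:title>
  </rdf:Description>
</rdf:RDF>"""


def titles_via_xml_path(doc):
    """Naive XML-level query: only looks for dc:title elements."""
    root = ET.fromstring(doc)
    return [el.text for el in root.iter(f"{{{DC}}}title")]


def titles_via_both_forms(doc):
    """Handles just the two forms above; real RDF/XML allows many more."""
    root = ET.fromstring(doc)
    found = []
    for desc in root.iter(f"{{{RDF}}}Description"):
        subject = desc.get(f"{{{RDF}}}about")
        attr_value = desc.get(f"{{{DC}}}title")
        if attr_value is not None:
            found.append((subject, attr_value))
        for child in desc.findall(f"{{{DC}}}title"):
            found.append((subject, child.text))
    return found


if __name__ == "__main__":
    print(titles_via_xml_path(DOC_A), titles_via_xml_path(DOC_B))
    print(titles_via_both_forms(DOC_A) == titles_via_both_forms(DOC_B))
```

The element-only query finds the title in one document and misses it in the other, even though both say the same thing in RDF terms. Every additional RDF/XML abbreviation (rdf:ID, XML Base, typed nodes, and so on) would need its own branch, which is the normalization work the hypothetical XSLT parser would have to do.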

I am really interested to see whether such a beast exists, and if so, how big it is. My guess is that it’s not trivial to write such a parser, but it definitely is possible. After finding out whether such a beast exists, my follow-up question will be whether there is an associated function library that can then work on the parsed RDF model, so that the data can be traversed, queried, updated, and serialized.

Eric Larson


It is interesting to see the progression of free software alongside the proliferation of the web. When I first started programming, I got involved with a web CMS I used in my contract work. I would write a new plugin or feature along with migrating a design to the software, and afterwards I would try to contribute it back. One time, the designer I was working with asked me to remove some of the project branding as well as a GPL notice on the login page. Some of the community found out, and a rather long dialog started regarding whether or not I had violated the license. In the end, I came to a compromise with the original author and we all moved on.

I still think I was right. I contacted the FSF about the issue and they confirmed my evaluation. My argument was that users of the site were not the same as users of the software. My clients had access to the source code and they were free to change it however they wished. I considered the clients the actual users of the software. My removing the GPL licensing information from the visible login HTML did not limit or restrict the users’ freedom in any way, and I continued to include the same copyright notices in the source of the page. Needless to say, while I disagreed with the author’s perspective, I had no problem coming to a compromise. The whole situation was far from heated and I personally think it was a healthy dialog for the community.

My situation made it clear there were still questions to be answered in terms of free software on the web. It seems the AGPL is one answer for this kind of problem and there are some folks actively promoting it as a means of distributing web-based free software. While I think the terms of the AGPL (as I understand them) would not work for something like a CMS, the concept of providing software over a network and considering the output as licensed has its value.

The best thing about the AGPL is that it provides a licensing option to help provide free software on the web in a way similar to how the GPL affected desktop software. I don’t think many web developers have thought about web applications in the same light as desktop applications. Installation challenges, database dependencies and server requirements have all made running web applications something left for developers and advanced users. Fortunately, server software is becoming commonplace and programming languages like Ruby and Python are helping smooth over the issues with distribution and deployment. It is good to know that as web applications continue to evolve, free software will evolve with them.

Erik Wilde


The W3C just published a new TAG Finding called Associating Resources with Namespaces. Here’s the abstract:

This Finding addresses the question of how ancillary information (schemas, stylesheets, documentation, etc.) can be associated with a namespace.

I don’t quite understand why the TAG findings are hidden on some badly named Web page. Some of them are pretty interesting documents, and yet they are not published on the W3C Technical Reports page, and the W3C Home Page does not link to them or publish news snippets about new findings. I think these documents should be easier to find.

Technically speaking, the finding talks about how to create namespace description documents, so that namespace names can point to helpful resources, rather than being abstract identifiers. The TAG finding briefly describes possible languages for namespace description documents (RDDL 1.0 and 2.0 and GRDDL), and describes a vocabulary of terms for describing the nature of resources being linked to in a namespace description, and what the purposes of these resources are. The definitions of these terms, though, are one-liners with little guidance as to what each concept is supposed to represent.

What I am missing most (and what we were concentrating on when we were defining our own format for namespace descriptions in an e-government scenario) is the ability to associate namespace descriptions themselves, and make assertions such as namespace x depends on namespace y. Or rather simple but really helpful pieces of information (in particular for developers) such as namespace x is usually associated with one of these two namespace prefixes, here is where you can find test data, or here is where you can find some example data.


At the Semantic Technologies conference in San Jose I attended an interesting presentation entitled “persistent identifiers for the real web”. XML often uses URLs for identifying schema namespaces, and I suppose could be credited with influencing RDF’s practice of using URLs for identifying resources. In using RDF to describe and annotate things a problem arises: are you describing the web page, or the thing the web page is talking about? For example, if I assert that:

<http://tcowan.myopenid.com> :likes <http://www.myspace.com/lettucefunk>

Does that mean I like the web page or the band the page is about? As you’re traversing the semantic web it’s going to be advantageous to distinguish between content assets and the real world entities they may represent. Their proposed solution involves PURLs (http://purl.org for example). Normally a permanent URL redirects you to the best representation of the resource via a 302 response. They propose that when the PURL represents a real world entity, the response be given as a 303 (See Other). The computer agent can then understand that the “thing” is a real world entity, and that the redirect is not to the real thing, but to another web resource about the thing.

I’m very much in favor of permanent URLs. Otherwise all our assertions will become disjointed as links break, or we’ll have to keep our own “archives” of dead links and sites. I also appreciate the simplicity of Dave and Eric’s proposal; however, I’m not so sure this is really the best way to solve identifiers for real world things. Consider books, for example: what would be the best way to represent a book, its URL on Amazon or its ISBN as a URN? If we use the Amazon URL we can’t be sure it’s a book; it might be binoculars or a coffee table. The URN, however, makes it clear:
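As a rough sketch of how an agent might act on that distinction, assuming it has already received the status code and Location header from a PURL server (the function name, the returned labels and the example URLs are made up for the illustration, not part of the proposal):

```python
def interpret_purl_response(status, location):
    """Classify a PURL redirect under the 302-versus-303 convention."""
    if status == 302:
        # Ordinary redirect: the target is taken to be the resource itself.
        return {"kind": "web-resource", "target": location}
    if status == 303:
        # See Other: the PURL names a real world thing, and the target
        # is only a web resource *about* that thing.
        return {"kind": "real-world-entity", "described_by": location}
    raise ValueError(f"not a redirect this sketch understands: {status}")


if __name__ == "__main__":
    print(interpret_purl_response(302, "http://example.org/page.html"))
    print(interpret_purl_response(303, "http://example.org/about-the-band"))
```

The agent never has to inspect the target document; the status code alone tells it which kind of identifier it is holding.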

URN:ISBN:0-395-36341-1

The urn namespace indicates that it’s a book, without a doubt. If PURL were to host a “see also” permanent URL scheme for each declared URN namespace we’d be able to visit that URL to find out more…

http://purl.org/urn/isbn/0-395-36341-1

But on the practical web, we don’t use PURLs or URNs for books, we use the Amazon.com URL. I think in practical terms things are going to be represented on the web by the domain that has the best collection with the best open content. Perhaps the best approach in the end is to take advantage of blank nodes.

<http://tcowan.myopenid.com> :likes _:a
<http://www.myspace.com/lettucefunk> :describes _:a
_:a a :funkBand

In English, http://tcowan.myopenid.com likes the funk band described by http://www.myspace.com/lettucefunk. Now we’ve made it clear, and without the use of PURLs or some new PURL redirection strategy.
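A minimal sketch of how an agent could read those three statements, with the triples held as plain Python tuples and the prefixed names reduced to bare strings (the function name and the result layout are assumptions for the illustration; a real application would use an RDF library):

```python
TRIPLES = [
    ("http://tcowan.myopenid.com", "likes", "_:a"),
    ("http://www.myspace.com/lettucefunk", "describes", "_:a"),
    ("_:a", "a", "funkBand"),
]


def liked_things(triples, person):
    """For each thing the person likes, collect its types and describing pages."""
    results = []
    for subject, predicate, obj in triples:
        if subject == person and predicate == "likes":
            types = [o for s, p, o in triples if s == obj and p == "a"]
            pages = [s for s, p, o in triples if p == "describes" and o == obj]
            results.append({"node": obj, "types": types, "described_by": pages})
    return results


if __name__ == "__main__":
    for thing in liked_things(TRIPLES, "http://tcowan.myopenid.com"):
        print(thing)
```

The blank node keeps the band and the page apart: the page URL only ever appears as the subject of the describes statement, never as the thing that is liked.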

Rick Jelliffe


First some jargon (from the Glossary of Typesetting Terms or Harrod’s Librarians’ Glossary; full props to Google). Castoff: The calculation of the number of typeset pages a manuscript will make, based on a character count. Proof: An impression made from type before being finally prepared for printing. Proofs are made on long sheets of normal page width… Galley proof: Proof of text before it is made up into pages…just as long as can be conveniently photocopied – usually 13 inches. Compose: To set type-matter ready for printing.

Deciding on breaks

I am ancient enough to have used galley proofs, the long pages of a book’s text before it had been made up into the final pages and run off on a printer (or rather, by a printery). It still exists in the draft modes of some modern word processors, I suppose. There has always been a chicken-and-egg problem in documents which contain dynamic forward references that expand to section or page numbers (e.g. See page 99): how do you know how much space to reserve for the page number? A reference on a tightly-set line or full page may cause different page breaks depending on whether it is a two- or three-digit number, for example. A traditional way to deal with this was to allow a lot of space around page references (to reduce the impact) and to take two passes over the document: the first to estimate the pages and the space required for each reference, and the second to actually compose the document, using the calculated space as fixed and squeezing the generated text if necessary.

The idea that you could divide the same text into different length pages is obvious, and quite early on even the electronic typesetting programs allowed draft modes (or provided alternative macros) for producing proofs. Given the requirement of some publishers for double-spaced manuscripts, the idea of separating structure and presentation, ascribed to Charles Goldfarb and (independently) Brian Reid, does not seem a big leap to us nowadays. Multi-publishing and retargeting became commonplace in the SGML arena, with the advent of declarative stylesheets looming for a long while, but the next really big step came with the advent of the WWW and the impact of resizable windows on formatting.
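The two-pass idea can be sketched in a few lines. This is a toy model under loud assumptions: pages hold a fixed number of characters, paragraphs are never split, references are written as {ref:label}, and three characters are reserved for every page number. The real systems padded or squeezed text rather than simply right-aligning digits.

```python
import re

REF = re.compile(r"\{ref:(\w+)\}")
RESERVED_WIDTH = 3  # assumed: room for page numbers up to 999


def paginate(paragraphs, chars_per_page):
    """Assign each (label, text) paragraph to a page without splitting it."""
    pages, page, used = {}, 1, 0
    for label, text in paragraphs:
        if used and used + len(text) > chars_per_page:
            page, used = page + 1, 0
        pages[label] = page
        used += len(text)
    return pages


def compose(paragraphs, chars_per_page):
    # Pass 1: give every reference a fixed-width placeholder and cast off.
    draft = [(label, REF.sub("0" * RESERVED_WIDTH, text))
             for label, text in paragraphs]
    pages = paginate(draft, chars_per_page)
    # Pass 2: drop in the real numbers, padded to the reserved width,
    # so no paragraph changes length and no page break moves.
    final = [
        (label, REF.sub(lambda m: str(pages[m.group(1)]).rjust(RESERVED_WIDTH), text))
        for label, text in paragraphs
    ]
    # Sanity check: fails if a number outgrew its reserved space.
    assert paginate(final, chars_per_page) == pages
    return final, pages


if __name__ == "__main__":
    doc = [
        ("intro", "See page {ref:end}."),
        ("middle", "m" * 30),
        ("end", "The end."),
    ]
    final, pages = compose(doc, chars_per_page=40)
    print(pages)
    print(final[0][1])
```

Because the reserved width is fixed in the first pass, filling in the numbers cannot ripple back into the page breaks, which is what makes the second pass terminate.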

One of the most important ideas following from the separation of presentation (into stylesheets) and content has been the formalization of the page-flow model (frames), which was championed by Frame Corporation’s FrameMaker though the simpler concept of regions was of course older. The idea is you “pour” the text into the frames and they flow, break and cause new pages where they will.

Loose

In my blog yesterday, I mentioned that the transformational approach of stylesheets in XML (the DSSSL, XSL-FO streams) is only loosely coupled with the typesetting engine (or formatting engine…some people think that word processors don’t do typesetting; I don’t want to get hung up on terminology), so there are some kinds of page design rules that are impossible, if only because the developers cannot be aware of every design rule anyone might want to make.

The separation also impacts another area: document interoperability. I have written several blogs referring to Markup’s Dirty Little Secret, which is that because everyone’s system and each system’s algorithms and resources and capabilities are different, you cannot expect perfect fidelity to the extent of the same line and page breaks when exchanging XML+stylesheet documents (such as OOXML, ODF, DocBook, you name them). This goes quite against the expectations of some users (though I think people are much more realistic about this now than two years ago) and quite against the hard requirements of others (for example, people who need fixed page numbering for legal requirements).

In yesterday’s blog, Standardization as a collective loss of imagination?, I suggested that users may need to assert themselves to keep the standardization of the current round of office application formats from falling into a particular pitfall: losing sight of the centrality of page (and document and information) design, of how to help people communicate rather than how to add the latest pet feature from some vendor. Not that pets are not fun and valuable.

Hinting at our priorities

The tie-in between that suggestion and the page-fidelity problem (which is really an interoperability issue) is that I think we need some more imagination about whether our current re-pour-each-time model of formatting is actually good enough if we genuinely want substitutability of office applications. People don’t want to be sold a turkey.

Now SGML did provide processing instructions, a kind of markup that still exists in XML, for applications to add extra information that belonged to formatting, for example. The ArborText Publisher program used them very successfully, with processing instructions that let you force page and line breaks in certain places. That is one way of integrating page markup, but it is not what I am suggesting (for various reasons).

At the moment, I think that a much better approach would be to add a kind of cast off hint as an attribute to each block-level object (paragraph, list item, table cell, etc). This would be added to the XML markup by the formatting engine as a hint, to enable a subsequent formatter to try to get the same results.

The first time data came into a document, the normal composition mechanisms would apply. But the document’s block structures would also be decorated by these hints at save time. And subsequent opens of the document would use these hints as well when composing the pages. For example, the castoff hint might be as simple as giving the bounding box of the block on the page. The composing system would use differences between these bounding boxes and the bounding boxes it wanted to use as penalties to adjust line feathering (or even margins, padding, breakpoints, spacing, text size).
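A minimal sketch of the penalty idea follows. Everything in it is an assumption made for the illustration: the fields of the hint, the way the differences are summed, and the weight charged for landing on a different page. A real formatter would fold such a penalty into its existing line- and page-breaking costs rather than pick among a handful of finished candidates.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Hypothetical castoff hint: where a block sat when it was last saved."""
    page: int
    x: float
    y: float
    width: float
    height: float


def hint_penalty(hint, candidate, page_weight=1000.0):
    """Sum of geometric differences, plus a heavy charge per page of drift."""
    geometry = (
        abs(hint.x - candidate.x)
        + abs(hint.y - candidate.y)
        + abs(hint.width - candidate.width)
        + abs(hint.height - candidate.height)
    )
    return geometry + page_weight * abs(hint.page - candidate.page)


def choose_layout(hint, candidates):
    """Pick the candidate placement that deviates least from the hint."""
    return min(candidates, key=lambda box: hint_penalty(hint, box))


if __name__ == "__main__":
    saved = Box(page=3, x=72, y=100, width=451, height=36)
    options = [
        Box(page=3, x=72, y=100, width=451, height=48),  # looser leading
        Box(page=3, x=72, y=100, width=451, height=38),  # slight feathering
        Box(page=4, x=72, y=0, width=451, height=36),    # pushed to next page
    ]
    print(choose_layout(saved, options))
```

Treating the hint as a penalty rather than a constraint means a formatter with different fonts or hyphenation can still produce a legal page when it cannot match the saved box exactly.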

Auto-sizing is not completely unknown: WordPerfect had a patent on automatically adjusting various page parameters to make sure some range of text fitted on a single page. And many people are aware of the behaviour of some page-oriented systems, such as presentation programs, which automatically resize text (including nested text lists) to fit into the available space.

It could be user selectable whether to freeze the page according to the block hints or just use them as hints, or ignore them. As a hint, it wouldn’t interfere with minimal implementations.

Erik Wilde


have you ever heard of tree trauma, infoset ignorance, model myopia, or RDF rage? if not, and you are interested in these and other XML-related ailments, you might want to read about XML fevers:

The Extensible Markup Language (XML), which just celebrated its 10th birthday, is one of the big success stories of the Web. Apart from basic Web technologies (URIs, HTTP, and HTML) and the advanced scripting driving the Web 2.0 wave, XML is by far the most successful and ubiquitous Web technology. With great power, however, comes great responsibility, so while XML’s success is well earned as the first truly universal standard for structured data, it must now deal with numerous problems that have grown up around it. These are not entirely the fault of XML itself, but instead can be attributed to exaggerated claims and ideas of what XML is and what it can do.

if you are using XML or think about using XML or work with people who are using XML or think about working with people who are using XML, you might be interested in our XML Fever article in the current issue of the Communications of the ACM (CACM). here are your options:

the official citation for this article is Erik Wilde and Robert J. Glushko. XML Fever. Communications of the ACM, 51(7):40-46, July 2008.


One of the areas of web design that is often neglected is the accessibility of your content by impaired users. Because various technologies are used to aid those users who are impaired, you should make sure that your content is usable / readable if it’s ever read aloud.

The developers of the BBC site Programmes have supported semantically marked up data (in the form of Microformats) from day one. Now comes word that, because of certain decisions made during the design of hCalendar and its use of the abbr element (which puts machine-readable dates in a title attribute that screen readers may read aloud), they are removing hCalendar support from the Programmes web site. Other Microformats being used will remain (rel & hCard). However, developer Michael Smethurst has hinted that the Programmes team might migrate over to RDFa and remove all Microformats. This is the first instance that I have heard of where a team will be moving away from Microformats and possibly embracing RDFa.

I wonder if this will become more and more of a common occurrence. As companies begin to look at technologies to apply semantics to their data, I doubt that they will want to choose a technology that limits their audience.

Now, the Microformats community could change hCalendar. However, I’m not sure I have enough faith in the Microformats community to come to an agreement on this topic. In my short time following the various Microformats mailing lists, I quickly became disillusioned with the community and administrators. I witnessed several instances of heavy-handed administration, including the banning of users. Frequently, no real reason was given and I was left with the impression that it wasn’t much of a community after all.

I was an early fan of Microformats, but cases like this certainly make a compelling argument for the use of RDFa. Perhaps the most interesting part of Michael’s post was that this decision was made by the developers themselves and not sent down via some edict:

And probably also best to note that this is not a decision that has come down from on high by the BBC equivalent of suits. The /programmes team has been concerned about this issue for a few months now and it’s good to get some clarity here.

Rick Jelliffe


Regularly as clockwork, every five years another group attempts to make a new standard language for typesetting. FOSI, DSSSL, XSL-FO, and ODF (plus the less grandiose scopes of CSS (styling) and OOXML (legacy).) I predict that in a decade we will see the same thing. In the past, these efforts came from the user side rather than the vendor side, and were driven by user requirements rather than vendor requirements. But requirements for standards now predominately come from questions about “Our product X supports feature Y and therefore the standard should support it” rather than “Our document A uses typesetting feature B therefore the standard should support it”: the cart is driving the horse. There is more vendor buy-in because the new standards demand and achieve so little.

In part it is understandable: the catch-up mentality does not necessarily encourage imagination.

Comparison Matrix

One very common tool for organized standards groups is a feature matrix: rather than just ad hoc consideration of this feature or that feature as proposed by vendors, the idea is to make a list of the general features required by the users’ document sets, or by the technologies being evaluated, or the products chosen to get first-class support. Traditionally, standards groups for typesetting and publishing have included actual typesetters (at ISO, Martin Bryan actually worked in type, for example).

A really good example of this can be seen in a document from a decade ago, Final DSSSL Survey and Assessment Report for the DOD CALS IDE Project (Kidwell, Richman). This is a good introduction both to the Output Specification (FOSI) formatting language used by US military typesetting in the 1990s, and the ISO Document Style and Semantics Specification Language (DSSSL) which has been available standard on many Linux systems using James Clark’s open source JADE program.

The feature matrix can be found in Comparison Matrix which shows how well the standards support the document requirements: we learn that the US military requires both cartoons and running feet. This is the kind of table that I think should be driving requirements for ODF (and OOXML); preferable to the approach of feature- (or vendor- or product-) centric comparison matrix and much preferable to ad hoc feature requests.

FOSI and DSSSL

The US military adopted FOSI because it was under consideration by (what is now) ISO/IEC JTC1 SC34; however, SC34 ultimately went with an extended version of Scheme under the (terrible) name of DSSSL. FOSI was deeply unlovable and never flourished outside its early adopters, who were locked in; DSSSL was like the other power-user oriented standards from SC34 of the time and never found much commercial adoption, but had uptake in the publishing industry that SC34 catered to. James Clark, the DSSSL editor, later merged it with CSS ideas and split it into XSLT and XSL-FO at W3C using an XML+XPath syntax rather than the S-expression syntax.

Where DSSSL and FOSI (MIL-PRF-28001 Output Specification) differed in particular was that DSSSL adopted a strict transformation approach: this is of course a UNIX-ism since the days of nroff, and the idea was that you could output to particular page description languages (RTF, MIF, etc.). Consequently there was no way for the DSSSL processor to make decisions based on typesetting metrics on the fly; instead the race was on for a set of abstract properties that could describe common cases. This fits in well with the checkbox mentality of desktop publishing tools, but was entirely counter to the typesetting-as-programming approach of the 1970s and 1980s generation of systems. Systems such as troff and TeX used macro facilities so that creating a typesetting system for a document could involve all sorts of custom smarts to capture the design and fit in with the data; very high-end systems such as Interleaf even had full-blown LISP available for processing. Some of these systems are still around in their niches, XYwrite and 3B2 for example; however, they face a rising tide where quality and power are increasingly mysterious to the market.

FOSI did allow or require some kind of interrogation of the pages while they were being typeset: while this can certainly allow much more expert typesetting and decision-making, it also must be tightly coupled to the formatting engine, which effectively prevents any network effects.

What do I mean by expert typesetting?

To give an idea of what I mean by expert (also known as “quality” or “industrial”) typesetting and decision-making, consider the case of typesetting a Yellow Pages (a phone directory for businesses categorized by type of business). Imagine you have to produce a Yellow Pages document using your favorite tool. The page designer and sales force come up with a design and timetable. The layout will be five columns. Entries may not span pages. Some entries take up part of a column and should be put as near to alphabetical order as possible, but rather than break they can be placed before or after their alphabetical position, with previous or subsequent entries swapped before them. And there may be two, three, four or five column display ads, which also have this arrangement. And there can even be ads that take a half page but span over two pages.

And it is important that ads should not be orphaned or widowed, with one ad on a previous page by itself or on a subsequent page. And there are 6,000 pages of this. And you get the final data 24 hours before you have to deliver it.

Now how would you do that in ODF, or OOXML, or any of the standard declarative languages? You simply cannot: there is always an extra rule or concept that will not fit. (There are a few modern systems that do allow this kind of flexibility: using JavaScript in Adobe’s InDesign, for example. The program uses XPaths to locate information, but can also access the page model.)

Declarative abstractions are worthy replacements for programs and scripts but have different coverage

Now the history of (SGML and) XML is the effort to key presentation cues from structural information: the benefit of marking up “invisible” containers is that they are often not invisible. The current approach of both ODF and OOXML, allowing foreign container elements (in different syntaxes) but not providing facilities to format based on them, is the worst of all worlds: on QUASIWYG systems, users will be loath to do anything (well) which does not have a resulting visual/stylistic result in the on-screen draft. And (as was pioneered in pre-Adobe FrameMaker and taken up in CSS) the abstraction of frames (floating or relative, linked boundaries into which text can be flowed) also provides many hooks for making declarative properties that otherwise might require programming.

The way that standards for public declarative publishing formats (whether HTML or ODF) should go, in my opinion, is by progressively asking the question, how can we make it easier for users to do what they want to do? In the old days, this was easy: you had physical paper (from mechanical typesetting, for example) or device-independent page designs (the Yellow Pages, for example) and you then programmed it by inserting commands in with the text. SGML and generalized markup came along and said: describe the data in markup, then move the processing out to a presentation system, except for Processing Instructions where you still need specific overrides inline. After this re-factoring came libraries, where common code or functions were provided with the base system; then consolidation, where the code for the libraries was hidden from the user; and then exposure, where the programming capabilities were removed and only the declarative portions left. RTF and MIF are examples, but so are OOXML and ODF.

At this point, users of transformation systems (such as XML with XSLT) have a lot of capabilities, even for overcoming the differences between the underlying typesetting engines of systems (see Different classes of typesetting engines and Markup’s Dirty Little Secret), but they have none for the kind of page-based calculations required by the Yellow Pages.

Now you could continue to make abstractions: nested keeps with partial float for re-ordering, for example. And in the past, there was a hope we might progress there, because the driving factor for markup languages and style languages was to cope with the kinds of designs which simple word processors failed at. But as I said, the cart seems to be driving the horse: I have no objection to document formats for existing and legacy applications (nor, obviously, to having them as voluntary standards, as readers will not be surprised to read).

Universal pretensions without an assertively inclusive process merely disenfranchise the weak and the foreign

However, and this was something that I saw as a flaw in the XML Schemas process, the more that you claim your format as a universal format, the more that you need to cope with cases that may be “niche” to vendors (i.e. that didn’t fit in with their development or profit model) but which are significant in their own right. When a technology, standard or not, mandated or not, does not provide a capability needed for a job, it will not (because it cannot) be used.

Let’s take a concrete example. In about 1999 I spent a year looking at the various requirements for Chinese and XML, at Academia Sinica in Taiwan. As part of this, I looked at how the Chinese actually did typesetting before the advent of computerization. I first made a collection (example below) of some interesting, but not at all atypical tables, some of which have visual structures that Japanese will recognize. (In effect, when you have the equivalent of a very small word size, there are other graphical possibilities that don’t go well in Western text.)

[Image: t-b2.png, sample of Chinese table layouts]

Then I suggested a possible structure that could be used to reconcile them. I was surprised at the reaction: Westerners universally made comments like “Oh, but those are *bad* tables and bad practise” and “They show confusion and unstructuredness”. Microsoft did add diagonal headers to Word 2000, but the SGML (and pre-SGML) idea that you should look at the artifacts and let design lead, rather than merely let vendors’ developers lead, had by the start of this decade died a rather sad death, it seemed to me.

Since then, the Chinese have gone their own way with a fork of ODF called UOF which features, as far as I can make out, Chinese element names (yah!) and extra markup for Chinese-specific requirements that other systems didn’t support. In April 2007, a request came in for ODF/Open Office to add it: Diagonal Header Specification (which has a particularly wonderful and mad table example). I don’t know what the status is at OASIS though, or if Sun has even passed it on: as I mentioned before, they are still discussing 2005 and 2006 user requests, which is what set my alarm bells off. (And in July 2007 Bert Bos raised the related issue of text rotation to dismiss it again for CSS at W3C: theoretically not all diagonal splits in tables require rotation or typesetting along the diagonal path, but the requirements for diagonal splits and for rotated headers spring from the same grapheme/glyph qualities of ideographic scripts.)

Putting page design back at the centre

The only way to put the horse back in front of the cart is to put page design (in all its detailed aspects) at the centre of the process. Get stakeholders involved who are prepared to contribute (many will have them already) the kind of checklists that the Comparison Matrix has.

However, I fear that this may only push the issue back without changing it, if the external stakeholders themselves have their opinions formed by what commercial pre-fabbed systems such as Office provide. In Different classes of typesetting engines I mention how the different implementation approaches lend themselves to different declarative properties. People realize that Office has pretty minimal keep-together control, but instead merely substitute some other product’s capabilities. We are quite lucky that countries like China are now getting fed up with the lack of imagination and responsiveness by Western developers and standards makers: it provides one of the few chinks in the protective armour by vendors that they only want change when driven by them.

So this has been a rather codgery item: are there any good signs? Well, I have praised Office’s Smart Art before and it is exactly an example of what goes on the page driving the technology: it is not making the format the driver but inventing a new class of page object (just as a table is a page object). Now Smart Art actually has crappy arbitrary structure, but the direction it can take is clear, and ODF could leapfrog it, if they could be bothered. There are hundreds of thousands of pages with simple diagrams, and once you decide to support what people actually do (and look at where people find things tedious), that is putting the documents first.

So the only driver I see for this, again, is for large user-side organizations to participate and dominate all the standards bodies, to work out their checklists, and force through the changes to ODF/OOXML/CSS/HTML that are required to conform to how people make documents when their focus is on good or natural page design (usability) rather than on incrementalism and conservatism.

Rick Jelliffe


I had an interesting discussion today with a key player in the development of a large, quite successful industry-specific standard by an industry consortium with representation from all the key stakeholders. I was surprised that he was less than sanguine about the standard: a common vocabulary was being used by multiple groups each making a schema for their particular sectoral use case, so it looked quite healthy.

But my contact had two particular gripes. The first was that the standardization process was addicted to making new vocabulary items, to the extent that talking about standardizing other things had never worked: the consortium was for making schemas not solving problems! In particular, while there was a lot of attention paid to describing what each field meant, there was no facility for comparison or identification: to say that “this address is that address” or “this person is that person” or “this agent is that agent” except by accidental string matching. So electronic forms using these schemas could be filled out, but data could never be integrated.

The second gripe comes out of the first. Because of the lack of ability to integrate and identify data, it all had to be kept together or messaged around in a bunch. So the schema for a complex process has to include fields for everything in that process except for trade secret fields, which wouldn’t be interchanged anyway: the consortium is made up from fierce competitors with a religious belief that their internal processes will be different from the internal processes of any other company in the same industry. Originally many participants would not even disclose the field names in their databases, they regarded them as so important, only to find that their field for address was not so interestingly different from their competitor’s equivalent field.

So the result: kitchen sink standards that include so many optional or process-particular fields that the consortium is now having a problem that not enough vendors are able to implement the whole thing. However, underlying this is the problem that without even a simple process model, where each stage of the process could have a fat-trimmed or specific schema, one size has to fit all.

So my contact actually saw the standard as dying rather than thriving: the mania for new elements and structures bloating the project in the direction of unworkability coupled with a refusal to look at standardizing even basic process models or identity/tracking/aggregation capabilities.

Rick Jelliffe


“The era of closed formats is dead” is a friendly interview with South African standards activist Bob Jolliffe. I enjoy being in the same room as Bob, not least because for once someone else gets their name constantly mispronounced: I think I counted three different mispronunciations from the same person in one day! I believe that both our names come from the Chaucerian English for jolly: fat and happy.

What I particularly like about Bob is that, if you read the interview, he is concerned with establishing requirements for interoperability and substitutability, and encouraging ODF, rather than slagging off MS or OOXML. I do tend to categorize people as “enablers” and “disablers” (not that these are permanent or unqualified vocations), and I certainly classify Bob as an enabler, even though we have different opinions on OOXML. But I don’t think we have particularly different opinions on ODF. Bob (who has been representing South African standards for the last couple of years at SC34) is now participating on OASIS ODF TC, and I think it is really important for government stakeholders to get intimately involved. I have repeatedly called for more government and stakeholder participation in standards groups, and I think Bob’s involvement should be a model for other governments who are wanting to make open standards mission critical.

It is clearly just the next step, that when a government starts adopting open standards, it also needs to develop expertise (Bob’s comment that there is an issue of scattered expertise is interesting), in particular in order to be able to make hard-nosed evaluations about the state of the art in implementation and on profiles.

I would have titled this “Bob Jolliffe gets it” (like Neelie Kroes gets it: “Standards are the foundation of interoperability.” and The Norwegians get it ) because I agree with pretty much everything in Bob’s answers (to the extent that I feel I could have written some of it!) but for the paragraph

One of the big dangers I see is the proliferation of backend office software which is so tightly coupled with single vendor’s office products. The promotion of open standards-based procurement of electronic document management systems is an urgent challenge.

Which goes further than I do, at the moment. I certainly agree that public government documents (in and out) should be in open standard formats, and that for that use OOXML is extraneous given the availability of ODF: however I think it would be better to think in terms of a hierarchy HTML, PDF, ODF with ODF as the last resort for publishing government material at least. (And I don’t see any harm in multiple formats being provided including the original native format of a file, for example OOXML or SVG, as long as the broad-reach standard was also available.)

However, for internal and specialist document systems, to the extent that I have a formed opinion it is that I suspect that functionality still has to trump standards support, until it can be proven that the standards meet the functional requirements. This is not to say that open systems will have to have a higher standard of scrutiny or QA than the old closed proprietary systems, but rather that functional-compliance requirements do not go away merely by deciding that you need standards-compliance, unless there is specific objective evidence that the one is fulfilled by the other.

(Oh, and I think Bob is technically wrong that IS29500 is not now an ISO standard. It has been approved by ballot so it is a standard; its publication has been delayed. The result of a successful appeal would be for it to be withdrawn as a standard (and still not published).)

UPDATE: Bob mentions the South African government’s Minimum Interoperability Standards (MIOS). It is available here (PDF). This extends the normal definition of open standard to include a requirement for multiple implementations: I think this is a mistake of naming (a standard is not open because of its implementations) but the correct requirement for procurement (a technology is open if it allows substitutions): what they should say is “open and mature” standards where multiple implementation is a property of maturity not openness. I think that is just fuzzy thinking that causes unnecessary squabbles: confusing issues doesn’t help thinking them through clearly.

I would severely criticize it for being entirely W3C Schema centred, including support for WSDL, and consequently a tool of one set of vendors. No explicit mention is made of ISO Schematron for example. How on earth does the requirement that XML Schemas should be used for data interoperability square with the fact that ODF has a RELAX NG schema not a W3C XML Schema? The trouble with making unnecessary restrictions is that then you have to turn a blind eye to wherever they are impractical, and turning a blind eye introduces an element of arbitrariness that goes against good government.

However, by the time you get to section 2.7, it turns out that RELAX NG is allowed. And it requires GML which has some Schematron schemas IIRC. Perhaps Schematron can creep in as a kind of XSLT? (Obviously because this is a minimal guideline, it is not exhaustive, so my criticism is unfair to that extent!)

I see that no versions of XSLT or XSD or XML (or most things) are mentioned: it would be interesting to have some idea about why versions don’t matter. And I see the list includes MPEG and ZIP. How do they fit in, given the definition of openness? Anyway, these are all the practical issues that Bob will be grappling with.

M. David Peterson


I’ve known, loved, and respected David Carlisle for quite some time now. As of today, I’ve known him for 24 hours longer than I did yesterday, yet love and respect him twice as much as I did the day before (and that’s saying a *TON*),

Teaching XSLT vs. Teaching XQuery - O’Reilly XML Blog

If you know XQuery and want to learn XSLT you need to learn about template matching, if you know XSLT and want to learn XQuery, you just need to learn a new syntax.

*YES*! The world needs more straight shooters like David (who, like David, have the credentials to back up every word that comes from their general direction), don’t ya think? (NOTE: If you think differently, that’s just because you have no clue what you’re talking about. But that’s okay, I’ll still luv ya. Mmmwwahh! :D)


Is it easier to teach XSLT or XQuery to an experienced SQL developer? My recent training experiences indicate that XQuery is easier to learn.

For the last six years I have been building metadata management systems using a diverse set of XML-centric technologies. These languages include XML Schemas, XSLT, Schematron, XHTML, XForms and most recently XQuery. And to be honest, I really do enjoy XQuery. My job as a consultant is to develop feature-rich and highly customizable metadata management systems for my customers and also transfer the skills needed to maintain and extend these systems to my customers through formal training classes as well as one-on-one mentorship.

I have found that it has been very difficult to teach XSLT to an average support person that is only doing occasional XSLT development. But XQuery has been much easier for me to teach, considering that most of my students have had some exposure to SQL. Looking back on my own learning process, I recall it took me about five months of almost continuous study to really feel comfortable with XSLT. Most of this learning curve was because I had not done production XSLT development. But I picked up XQuery in just a few weeks. Perhaps this is because I was already familiar with SQL and XPath. But perhaps this is because XQuery is a little bit more approachable.

I want to note that this does not necessarily imply a poor design of XSLT, nor does it count against the merits of functional programming. After I did learn XSLT I became a real evangelist of its elegance and beauty. At first I was frustrated by not being able to change a “variable”. Later I realized that this restriction is what made XSLT beautiful, simple and elegant. These features keep the transforms free of side-effects. Once XSLT scripts are deployed I seldom found problems. I became enthralled by the fact that the simplicity of the language implied that XSLT custom-hardware could allow transforms to be orders of magnitude faster than software-only solutions. XSLT may always have a place in CPU-intensive applications.

The difference in my learning time and those of my students reflects the state of our existing knowledge base: most of us are already familiar with SQL. Anyone that knows SQL can take a crash course in XQuery designed specifically for SQL developers. Priscilla Walmsley’s excellent book on XQuery (O’Reilly 2007) includes a single chapter targeted at SQL developers making the transition to XQuery that I use in many of my classes. And 90% of the small support and maintenance tasks that many support and maintenance people need to perform do not require them to ever use the more complex functions and modules features of XQuery.

So with this in mind, I have moved most of my metadata registry tools away from XSLT and toward XQuery. This seems consistent with what other metadata managers are doing today. There are several people now working on open source metadata management systems.

What about you? Do you have experience teaching both XSLT and XQuery to SQL developers? What is your experience on the learning times?

Erik Wilde


Last week, the Open Archives Initiative (OAI) published a set of beta-stage recommendations for compound documents, called Object Reuse and Exchange (ORE). This set of specifications has been published as version 0.9 and has been released for public review and comments (ironically, the press release is a PDF blob).

The problem of compound documents (how to specify that a set of URI-identified resources together form one compound resource) has been around for a while, and never has been solved properly. There are various proposals from different application areas, such as XLink (not quite for compound documents, but it could be used for this purpose as well), METS (using and extending XLink), and DIDL. I am certainly missing some other technologies here, please let me know what they are. The problem is that none of these languages ever caught on, mostly because none of them tried to be general. XLink focused on navigation, METS on libraries, and DIDL on multimedia.

However, it would be good to have a general and simple language for compound documents. If designed well, it could even be easily extended to be used for application-specific scenarios such as those covered by XLink, METS, and DIDL.

The problem is, OAI-ORE will not be it. Instead of designing a simple data model and a simple language for it, they settled for RDF. None of the documents contains any explanation as to why RDF was chosen over a simpler XML-based model. There even is a document that talks about how to implement OAI-ORE in Atom, and all it does is show how to embed RDF into Atom. Which means that for processing such an Atom feed you need an Atom toolkit as well as an RDF toolkit. As a side note: the terms in the Atom categories are URIs, which does not really follow Atom’s idea of terms as strings.

Generally, it is disappointing to see that a problem as important and manageable as compound documents, which still is an open problem looking for a good solution, has been approached on the wrong level. It is of course possible to come up with an RDF-based solution for that problem, but this unnecessarily introduces technology layers which for this particular problem are not required.

This means that the quest for a general and XML-based format for compound document descriptions is still on, and OAI-ORE is not a real contender in this race. Well, maybe it still could be one if the abstract data model also got a representation in plain XML. Unfortunately, the model is not as abstract as its name implies: it is a rather concrete definition of an RDF vocabulary, which will make it quite a bit harder to come up with a good and isomorphic XML representation. The effort might be worth it, however: the installed base of XML is significantly bigger than that of RDF.

Rick Jelliffe


European Commissioner for Competition Policy Neelie Kroes gave a really interesting talk this week at OpenForum Europe: sounds like a breakfast that would have been very stimulating.

There was some very obvious tough talk directed at Microsoft, she is the person with the stick rather than the carrots after all, and most of the commentary I have seen has focussed on that. But there were a few other points that I found interesting with respect to comments I have been making.

Standards for market dominating technologies

Readers may remember that I have been pushing that All Interface Technologies by Market Dominators should be QA-ed, RAND-z Standards! By interface technologies I mean the boundary or exposed technologies: protocols, APIs, file formats.

Dr Kroes writes about so-called de facto standards:

First, the de facto standard could be subject to the same requirements as more formal standards:

* ensuring the disclosure of necessary information allowing interoperability with the standard;

* ensuring that other market participants get some assurance that the information is complete and accurate, and providing them with some means of redress if it is not;

* ensuring that the rates charged for such information are fair, and are based on the inherent value of the interoperability information (rather than the information’s value as a gatekeeper).

The process of subjecting a standard to the same requirements as a formal standard is called, err, standardization.

Note, I strictly use “standard” in the sense of the offered voluntary standard: standardization means being documented, QA-ed, RAND-z, etc and on the books, it certainly does not mean (in my usage) that it is mandated for use (from the demand side of the standards market). If I can fend off some flames before they arrive, at ISO/IEC JTC1 there are types of lesser standards, such as Technical Reports, that may have less scary implications for the panic-ridden and are certainly more appropriate than full standards in some cases: I include these as “standards”.

So I don’t see any difference in what Dr Kroes has suggested and my comment; indeed I think it is a very welcome and logical step forward. Indeed, she mentions it in the context of what competition authorities may be obliged to do!

When a market develops in such a way that a particular proprietary technology becomes a de facto standard, then the owner of that technology may have such power over the market that it can lock-in its customers and exclude its competitors.

Where a technology owner exploits that power, then a competition authority or a regulator may need to intervene. It is far from an ideal situation, but that it is less than ideal does not absolve a competition authority of its obligations to protect the competitive process and consumers.

Dr Kroes does however earlier use “standardization” in a loose way, though I don’t imagine it would cause anyone to choke on their croissants: while I agree with “It is simplistic to assume that because standardisation sometimes brings benefits, more standardisation will bring more benefits.” on the vacuous lines that too much of anything is bad, the two different meanings of “standardization” should not be lumped together. For standardization in the sense of “putting a technology on the books ready for voluntary use or voluntary disdain”, I don’t see that we are anywhere near the point of having too many standards, nor that they are complete enough or updated enough (and I think Dr Kroes may not mean this, given the comments quoted above). However standardization in the sense of adopting or mandating a standard is an entirely different question, and I certainly agree with her for that meaning.

In case people were wondering about MS’s increasing embrace of ODF, the writing is on the wall. Dr Kroes says:

In addition, where equivalent open standards exist, we could also consider requiring the dominant company to support those too.

I certainly support that: see The Norwegians get it!

Cartels

Sometimes I feel like I am the only voice, peeping out “cartelization is a dominating regulatory issue” for standards bodies. Standards organizations have little and perhaps no obligations (or, at least, capability) to redress monopoly positions of technologies in a market, and indeed as the previous section mentions, standardization (if RAND-z and proper) can actually ameliorate monopoly positions (and they may have a duty to assist in making voluntary standards for that technology); however standards bodies must be careful not to operate as cartels of any kind.

Dr Kroes mentions cartels early, in her opening:

Credible competition policy requires competition law enforcement. Cartel cases, merger cases, abuse of dominance cases.

and cuts to the chase later:

…standardisation agreements should be based on the merits of the technologies involved. Allowing companies to sit around a table and agree technical developments for their industry is not something that the competition rules would usually allow. So when it is allowed we have to look carefully at how it is done.

If voting in the standard-setting context is influenced less by the technical merits of the technology but rather by side agreements, inducements, package deals, reciprocal agreements, or commercial pressure … then these risk falling foul of the competition rules.

Now this brings up an interesting question. I raised the issue of cartelization, in particular the aspect of vendor collusion of a majority against their dominant competitor in Is our idea of open standards good enough?

The question may seem provocative to even ask, but sooner or later it must be asked. Are standards made by organizations where vendor stakeholders can and do outnumber non-corporate stakeholders acceptable or sound?

We can take OASIS, ECMA, W3C or any of the boutique consortia that allow corporate members (or their individual proxies.) Why should we believe that a standard is sound enough to mandate merely on the absence of discovered side agreements, inducements, etc, if it has been made by a committee dominated by vendors (at the quorum level of real participation)?

It seems to me that only the various international standards bodies, which have direct voting by National Bodies not individual stakeholders in particular vendors, provide the workable immunity from direct control by vendors (singly or in collusion) that needs to be required for mandatory standards. It can certainly be argued that the boutique consortia may have standards approved ultimately by a larger member vote than the working group that created the standard, and that the membership was not dominated by vendors; but that is something that requires certification or monitoring, whereas with ISO it is manifestly the default case because of National Body voting.

So the National Body system prevents “cartelization-in-the-large”, where the final votes have a good measure of independence. However, no system I have seen completely prevents “cartelization-in-the-small”: this is where the small working groups that prepare the drafts initially have vendor domination. Again, it is not always the case: but look at the composition of the ODF TC at OASIS and the OOXML ECMA TC45 over the last two years and you can catch my drift.

Furthermore, in practice not all members are equal: government members of committees are very likely to be there to advance a particular government agenda (accessibility, say) rather than as providers of alternative technical solutions to those the vendors come up with: a working group may have effective vendor domination at the technology selection level even though the vendors do not control the requirements.

There are some other possible approaches too. For example, some standards bodies allocate chairs in working groups by a fixed number of representatives per sector: some academics, some government, some industry, which has some merit.

All this is why I wrote

But the issue of public and archival formats for government and agency documents is clearly one where governments have a vital interest: the customer is always right. This is why I believe governments need to look beyond the current academic definitions of “open standards” and re-frame the issue as “How do we achieve verifiably vendor-neutral standards?”

Maintenance

There is one part where some implications need to be thought through a little more, perhaps. In the sentence after the When a market section quoted above, Dr Kroes says

In essence the competition authority has to recreate the conditions of competition that would have emerged from a properly carried out standardisation process.

Dr Kroes uses process but means a terminating process, I think. But standardization of a technology is a continuing process, not a one-off event: standards have lifecycles, and waving a magic wand of standardization on a market dominating technology to give it some number or status will do little to help it unless there is an ongoing process of development, correction, evolution, convergence, and so on.

And an ongoing process requires an organization. A standards organization. So when the competition authority “recreates” the conditions of competition that emerged from a properly carried out standardization process (she says this in the context of de facto standards that have had no official process, by the way) this must ultimately involve passing the maintenance on to a standards body and verifying it where there is some concern. (There is certainly scope for Competition Commission action here: if governments and user groups and academia do not participate in standards bodies, say out of some mix of sloth, underinvestment, underskilling, and lack of vision (rather than just because of being poor) it would be great if the Competition Commission could compel or encourage at least matching participation by non-vendors in standards groups of interest. But that is just pure fancy, I know!)

And, of course, this maintenance has to be done with some openness. And openness means not only openness to the needs of stakeholders, but a responsiveness to outside requests. A prioritization of vendor requirements for new features over external user requests for corrections should be taken as ipso facto evidence of vendor domination of the standards group, and/or a failure in openness. Andy Updegrove has recently been talking up the need for metrics for judging the effective operation of standards bodies, a good idea, and metrics for openness and lack of vendor domination in quorums should certainly be one objective measure of this. Despite how it sounds, actually people in almost every standards body are keen for more participation.

Rick Jelliffe


There is a new avenue for participation in the ODF effort at OASIS: ODF Implementation, interoperability and conformance which I commend.

Conventionally, people speak of syntactical conformance and semantic conformance, where the first is easy and the second is hard. In fact, because computers can only deal in symbols, the second is impossible. So the issue for automated conformance testing becomes “how can we reflect the semantic operations into syntactical artifacts: into symbols we can investigate.”

So the semantic conformance problem then resolves into just another validation issue. And we have lots of nice schema languages, notably Schematron, which can help out there. (And using general purpose languages at a pinch, no worries!)

To put it another way, it is an issue of data capture.

For ODF, I would recommend they adopt a strategy of progressive but complete verification.

For ODF import and export, this is easy: have a good RELAX NG schema (make it quite forgiving), use NVDL and DSRL if needed, then use Schematron phases to allow various levels of validity to be detected. The trouble with the monolithic valid/invalid distinction is that there may easily be invalidities in things you don’t care about. An implementation of a word processor may have problems in its support for spreadsheets, but it should be a minor issue, not flagged as a showstopper. Schematron’s phase mechanism groups patterns of assertions so that you can have a much more useful chunked view of the strengths and weaknesses of a system.
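
As a rough illustration of the phased idea (a sketch in plain Python, not Schematron itself): assertions are grouped under named phases and each phase gets its own report, so a word processor’s spreadsheet failures show up separately from its text failures. The phase names, the dictionary keys and the function `validate_by_phase` are all hypothetical, made up for this example; a real setup would express these as Schematron patterns and phases over the ODF XML.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Assertion:
    """A single check: a human-readable message plus a test over the document."""
    message: str
    test: Callable[[dict], bool]


# Hypothetical phases, each grouping the assertions for one area of the format.
PHASES: Dict[str, List[Assertion]] = {
    "text": [
        Assertion("document has at least one paragraph",
                  lambda d: d.get("paragraphs", 0) > 0),
    ],
    "spreadsheet": [
        Assertion("every formula cell has a cached value",
                  lambda d: d.get("formulas_without_value", 0) == 0),
    ],
}


def validate_by_phase(doc: dict, phases: Dict[str, List[Assertion]] = PHASES) -> Dict[str, List[str]]:
    """Return, per phase, the messages of the assertions that failed."""
    return {
        name: [a.message for a in assertions if not a.test(doc)]
        for name, assertions in phases.items()
    }


# A made-up summary of a document, standing in for parsed ODF.
sample = {"paragraphs": 12, "formulas_without_value": 3}
report = validate_by_phase(sample)
for phase, failures in report.items():
    print(phase, "OK" if not failures else failures)
```

The point of the chunked report is that a failure in the "spreadsheet" phase does not mark the whole document invalid for someone who only cares about the "text" phase.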

But this leaves the issue of screen display. How can that be tested? Given my characterization of the issue as being one of data capture, the answer is that ODF needs to specify a page dump format, which can then be tested with automated tests. What would this format look like? Think PDF in XML: tiny-SVG may be good enough—anything where you can get the page position of each character (or string) and graphic on a page.

For example, let us suppose we want to test a table implementation. Now we can use RELAX NG to say that there should be tables, rows, cells etc. And we can use Schematron to say that various numeric constraints should hold. And that gets us a long way into validating that good ODF is being generated and accepted. And we can have tests for whether bad ODF is accepted, and so on.

But what about the graphical component? Having a simple page object dump allows testing that, for example, if you have a string a and a string b in two adjacent cells of the same row in a table (in the same script and of the same metrics, etc), then the (X,Y) co-ordinates of their base points conform to (Xa < Xb) and (Ya ~= Yb)

And you can use Schematron for that kind of validation. The advantage of having this built into the spec is that then the ODF spec can use mathematical properties and constraints rather than just natural language. The disadvantage of this approach is that it imposes a burden on the implementer, in particular if the graphic library cannot be trapped conveniently to provide the information; however, it certainly should be possible to generate this information from the PDF (in a reverse of the Magellan software!) especially if using a nice PDF subset like PDF/A.

Rick Jelliffe

I would like to propose a new test which you can use to see whether your favoured spout* of technical information is biased (or possibly just a re-printer of press releases, if there is a difference) or not. Here it is:

  1. They reported that the UK Unix Users group had taken the British Standards Institution to the UK High Court, and
  2. They didn’t report in the same detail the outcome: that the High Court utterly rejected it.

Surprisingly, the Inquirer gets the guernsey here, in the marvelously titled UK unix beardies appeal for $cash. No sign of it so far on CNET, ComputerWorld, ConsortiumInfo, Slashdot (references welcome). (Groklaw perhaps did not have space for this, given that it has two interesting posts in its news about IBM’s Roadrunner supercomputer which is “to ensure the safety and reliability of the nation’s nuclear weapons stockpile.” Terrific boxes! Perhaps the High Court needs to put out its findings disguised as product press releases in order to get into independent media?)

Quoting from the Inquirer:

Mr Justice Lloyd Jones rejected the UKUUG’s application for a judicial review last Thursday, giving the group until the break of dawn this Friday to raise a legal fund for an appeal.

“This application does not disclose any arguable breach of the procedures of BSI or of rules of procedural fairness,” said Justice Jones on Thursday.

“In any event, the application is academic in light of the adoption of the new standard by ISO,” he added.

A note on terminology: in JTC1, a standard is accepted by a ballot and consequently published. This general process is called adoption. So IS29500 has been accepted as an ISO standard, but not yet published. The UKUUG’s reported comment that

OOXML had not been ratified as a standard, it had merely been put on the fast track to certification.

is mumbo-jumbo.

Have you ever wondered if the laws of evolution apply to computer languages? When you walk down the aisle at your favorite bookstore, does it seem like there are actually more computer languages than last year? What forces are driving each of these new languages to evolve?

In 1835 Charles Darwin visited the Galapagos Islands. There he collected what he thought were about a dozen distinct species of birds. Upon returning to England he discovered that each of these species had evolved from a single species of finch. On the various Galapagos Islands the requirements for food gathering were different, but consistent over hundreds of thousands of years: enough time for a single species to adapt to meet consistent requirements.

Consider the Raccoon: omnivores that have proved to be one of the most adaptable mammals on Earth. The Raccoon’s range has rapidly expanded into urban areas due to their ability to quickly adapt to new requirements before other animals have had time for the wheels of evolution to turn.

So it goes with computer languages. Some procedural languages can be quickly adapted to fill the needs of a new niche. When the web was young, procedural languages like Java and JavaScript quickly filled the need for a variety of tasks. As the requirements for building web applications stabilized, declarative systems like CSS, XForms and XQuery started to push procedural languages back into niche areas. As these declarative languages stabilize and become worldwide standards, graphical tools are being created to allow non-programmers to create, manipulate and extend these systems.

This is why many of us believe there will always be some need for procedural programming, but certainly not for building standard web applications that are controlled by style sheets and user interaction forms. Like the finch, declarative languages need a little longer to evolve. It sometimes takes years for a small vocabulary of functional specification patterns to emerge and be given labels. Additionally, it can take years for the standards bodies to agree on the best way to deliver these new languages in a set of semantically precise data elements that have unambiguous interpretations. Finally, it may take another few years for IT managers to realize that they really do lower costs if they avoid vendor-specific implementations and adopt worldwide standards.

When CSS first came out you may have been a little reluctant to let web designers play with a rules engine. As XForms becomes ubiquitous you may be resisting change because you have invested so much time and energy learning how to debug JavaScript (without a debugger). You cannot hold back the forces of evolution…and now we all need to adapt to the declarative world or risk our own extinction.

If you are interested in more on this topic, see my presentation from the 2007 Semantic Technology Conference, The Semantics of Declarative Systems.

Philip Fennell

My previous post, ‘XSLT and Binary File Formats’, brought up the subject of the sequence in XSLT 2.0 and how it can be used to build a byte sequence for a binary file format like a TIFF image. For the XSLT generation of new binary files to be even remotely useful, you would need something that requires transformation into binary data and a way to transform it.

In the world of 3D computer graphics Pixar are the ‘King of the hill’ and their Reyes Image Rendering Architecture defines a very powerful image processing pipeline that is used for the transformation of complex graphics primitives into smaller, simpler primitives that are easier to sample and rasterize. The keyword in the last sentence was transformation, and XSLT is very good at transforming hierarchical data structures like computer graphics models.

To simplify the implementation of a Reyes pipeline processor in XSLT, it makes sense to start with just two dimensions and use SVG as the source model, while the final output format can be TIFF (see previous post). The following example shows an enhanced Reyes pipeline, expressed in XML, that makes use of their bucketing technique to allow for more efficient sorting and sampling strategies. The XSLT transform consumes the pipeline definition in order to control the processing of the source model.

Rick Jelliffe

ISO Namespace Validation Dispatching Language (NVDL) is a little language for taking an XML document, sectioning it off into single-namespace sections, attaching or detaching these sections in various ways, and then sending the resulting sections to the appropriate validation scripts.

NVDL solves several problems that come up with namespaces, and as with DSRL takes a very different approach than XSD takes (not saying one is better or worse: they have different capabilities and therefore may even be used together). One of these problems is the problem that often the official schema has a wildcard to say “at this point you can put any element”, but you really want to limit this to your own elements only and you don’t want to edit the official schemas (and thereby create versioning and configuration issues).

Another of these issues can be found in ODF. It allows foreign elements anywhere, and in order to validate against the schemas you have to strip these out. However, this does not mean just removing the foreign elements and their children: you have to leave the non-foreign descendants in place.

Now this is something that W3C XSD cannot really handle well. You can have a wildcard to allow foreign elements, and process them laxly so that when you come to an ODF namespace you start validating, but you don’t have the capability of validating that these elements are correct against the content model you want on the parent of the wildcard. You lose synch.

Here is the section of ODF 1.1 clause 1.5 which gives the constraint:

Documents that conform to the OpenDocument specification may contain elements and attributes not specified within the OpenDocument schema. Such elements and attributes must not be part of a namespace that is defined within this specification and are called foreign elements and attributes.

Conforming applications either shall read documents that are valid against the OpenDocument schema if all foreign elements and attributes are removed, or shall write documents that are valid against the OpenDocument schema if all foreign elements are removed before validation takes place.

Hmmm, seems like a job for NVDL.

Here is a rough NVDL script to do this. (It is untested, but thanks to members of the DSDL mailing list for vetting it.)

This script just takes the content.xml file and removes all elements from a foreign namespace. It uses wildcards a bit. Then it sends the result to be validated using the schema. Note that this is a very coarse sieve: there is no need to get too smart with which namespaces are actually allowed under the main office namespace, because validation will handle that. The purpose of the script is to minimally preprocess the file so that the right elements get dispatched to the appropriate validator.

<rules xmlns="http://purl.oclc.org/dsdl/nvdl/ns/structure/1.0" startMode="root">

	<mode name="root">

		<!-- Validation for content.xml -->
		<namespace ns="urn:oasis:names:tc:opendocument:xmlns:office:1.0">
			<validate schema="super-odf.rng"
				useMode="odf"/>
		</namespace>

	</mode>

	<mode name="odf">

		<namespace ns="urn:oasis:names:*">
			<attach/>
		</namespace>

		<namespace ns="http://purl.org/*">
			<attach/>
		</namespace>

		<namespace ns="http://www.w3.org/*">
			<attach/>
		</namespace>

		<anyNamespace>
			<unwrap/>
		</anyNamespace> 

	</mode>

</rules>

So there you have it: a nice declarative way to specify the validation pre-processing which can be actually run with the various NVDL processors around the place.
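For readers without an NVDL processor to hand, the unwrapping step alone can be approximated in a general-purpose language. This Python sketch is not NVDL and does no validation: it only removes foreign elements while promoting their non-foreign descendants, using the same three namespace prefixes as the script above. It ignores text content and foreign attributes, and the example.com namespace and sample document are made up.

```python
import xml.etree.ElementTree as ET

# Namespace prefixes treated as non-foreign, mirroring the NVDL wildcards.
KEPT_PREFIXES = ("urn:oasis:names:", "http://purl.org/", "http://www.w3.org/")


def is_foreign(element):
    """True if the element's namespace matches none of the kept prefixes."""
    tag = element.tag
    if not tag.startswith("{"):
        return True  # no namespace at all: treated as foreign in this sketch
    namespace = tag[1:].split("}", 1)[0]
    return not namespace.startswith(KEPT_PREFIXES)


def unwrap_foreign(element):
    """Remove foreign descendants in place, keeping non-foreign ones."""
    kept = []
    for child in list(element):
        unwrap_foreign(child)  # clean the subtree first
        if is_foreign(child):
            kept.extend(list(child))  # promote the surviving children
        else:
            kept.append(child)
    element[:] = kept


DOC = (
    '<office:document-content '
    'xmlns:office="urn:oasis:names:tc:opendocument:xmlns:office:1.0" '
    'xmlns:text="urn:oasis:names:tc:opendocument:xmlns:text:1.0" '
    'xmlns:my="http://example.com/my-extension">'
    '<my:wrapper><text:p/><my:note/></my:wrapper>'
    '</office:document-content>'
)

root = ET.fromstring(DOC)
unwrap_foreign(root)
print([child.tag for child in root])
```

The text:p element ends up as a direct child of the root, which is the "leave the non-foreign descendants in place" behaviour; the result would then be handed to a RELAX NG validator.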

Now we could duplicate this script to handle the other XML files in an ODF ZIP archive: to say that stylesheet files should start with the appropriate namespaces etc. (I think it would be possible to combine them all into one file, actually, so that different root namespaces would cause the stripped document to be dispatched to be validated by different schemas as appropriate.)

Rick Jelliffe

ISO Document Schema Renaming Language (DSRL) is one of Martin Bryan’s contributions to the ISO Document Schema Definition Languages project at JTC1 SC34 WG1. This brings together various technologies by Murata Makoto, James Clark, Martin Duerst, Jeni Tennison, and others (including me) to try to build a layered solution to validation using a variety of “little languages”.

I don’t need to go into the advantages of little languages, though I will say that I think one major concern is that large languages disenfranchise the solo and part-time developer. This is perhaps no concern if you are a large corporation (though it will become so as the maintenance crunch sets in) but it is a definite issue otherwise. Of course, there are disadvantages too: we might hope that a little language would be easier to reason about than a large language, but a little language may concentrate on depth rather than breadth, and this extra bang-per-buck can add to the complexity of understanding every case. Furthermore, the little languages still need to be combined, and this has its own perils. But admitting these possibilities does not diminish the usefulness of the approach.

A common issue with standards is how to cope with changes from the pre-standard technology to the standard one. Schematron was a typical case: moving from Schematron 1.6 to ISO Schematron involved:

  • Swapping to a new namespace
  • In the pattern element, replacing the attribute called name to id with a title subelement
  • Removing the sch:key element but recommending xsl:key instead.

All these changes are cosmetic as far as functionality is concerned, but prevent a Schematron 1.n schema being a valid ISO Schematron schema.

This kind of renaming problem is not just reserved for the initial step of making a standard. During the life of a schema, different values and names may come into fashion. Sometimes people decide to take a broom through a schema to consolidate names and allowed values.

And this is where DSRL (pronounced DISRULE as in being against a central authority) comes in. It is a simple declarative language that basically maps between “from” names and values and “to” names and values. You can make maps for namespaces, element names, attribute names, PI targets, element values and attribute values (including token lists). Most topically now, in relation to recent ODF discussions, you can also declare maps for the default values for attributes and elements: in fact, it is now looking like the ODF facilities for DTD-compatible attribute value default declarations are fraught with complexity and ugliness such that they should be avoided. One really interesting, but problematic, feature is the ability to provide declarations for undeclared entity references in the document (a feature often requested by the publishing industry) and the ability to rename entity references (which may be quite useful now that SC34 has given the ISO standard entity sets for special characters to the W3C MathML group to maintain: they have a high premium on HTML compatibility even when wrong).

DSRL is now at a very late draft stage, and I expect it will be finalized over this year. DSRL is declarative: it provides mappings, and even though it could be used to rename items in schemas, Martin Bryan’s open source XSLT implementation of it takes the more direct route of renaming the document. The implementation is available in the ZIP file at the DSDL.ORG site.

For a flavour, here are the renaming rules as given above for the changes from Schematron 1.n to ISO Schematron.

<dsrl:maps
     xmlns:dsrl="http://purl.oclc.org/dsdl/dsrl"
     xmlns:sch="http://www.ascc.net/xml/schematron"
     xmlns:iso="http://purl.oclc.org/dsdl/schematron"
     xmlns:xslt="http://www.w3.org/1999/XSL/Transform"  >

  <dsrl:element-map>
     <dsrl:from>sch:schema</dsrl> <dsrl:to>iso:schema</dsrl>
  </dsrl:element-map>
 <dsrl:element-map>
     <dsrl:from>sch:title</dsrl> <dsrl:to>iso:title</dsrl>
  </dsrl:element-map>
 <dsrl:element-map>
     <dsrl:from>sch:phase</dsrl> <dsrl:to>iso:phase</dsrl>
  </dsrl:element-map>
 <dsrl:element-map>
     <dsrl:from>sch:active</dsrl> <dsrl:to>iso:active</dsrl>
  </dsrl:element-map>
 <dsrl:element-map>
     <dsrl:from>sch:pattern</dsrl> <dsrl:to>iso:pattern</dsrl>
     <dsrl:attribute-map> <dsrl:name>name</dsrl:name></dsrl:attribute-map>
  </dsrl:element-map>
 <dsrl:element-map>
     <dsrl:from>sch:rule</dsrl> <dsrl:to>iso:rule</dsrl>
  </dsrl:element-map>
 <dsrl:element-map>
     <dsrl:from>sch:extends</dsrl> <dsrl:to>iso:extends</dsrl>
  </dsrl:element-map>
 <dsrl:element-map>
     <dsrl:from>sch:assert</dsrl> <dsrl:to>iso:assert</dsrl>
  </dsrl:element-map>
 <dsrl:element-map>
     <dsrl:from>sch:report</dsrl> <dsrl:to>iso:report</dsrl>
  </dsrl:element-map>
 <dsrl:element-map>
     <dsrl:from>sch:diagnostics</dsrl> <dsrl:to>iso:diagnostics</dsrl>
  </dsrl:element-map>
 <dsrl:element-map>
     <dsrl:from>sch:diagnostic</dsrl> <dsrl:to>iso:diagnostic</dsrl>
  </dsrl:element-map>
 <dsrl:element-map>
     <dsrl:from>sch:let</dsrl> <dsrl:to>iso:let</dsrl>
  </dsrl:element-map>
 <dsrl:element-map>
     <dsrl:from>sch:p</dsrl> <dsrl:to>iso:p</dsrl>
  </dsrl:element-map>
 <dsrl:element-map>
     <dsrl:from>sch:span</dsrl> <dsrl:to>iso:span</dsrl>
  </dsrl:element-map>
 <dsrl:element-map>
     <dsrl:from>sch:value-of</dsrl> <dsrl:to>iso:value-of</dsrl>
  </dsrl:element-map>
 <dsrl:element-map>
     <dsrl:from>sch:name</dsrl> <dsrl:to>iso:name</dsrl>
  </dsrl:element-map>
 <dsrl:element-map>
     <dsrl:from>sch:dir</dsrl> <dsrl:to>iso:dir</dsrl>
  </dsrl:element-map>
 <dsrl:element-map>
     <dsrl:from>sch:emph</dsrl> <dsrl:to>iso:emph</dsrl>
  </dsrl:element-map>
 <dsrl:element-map>
     <dsrl:from>sch:key</dsrl> <dsrl:to>xsl:key</dsrl>
  </dsrl:element-map>
</dsrl:maps>

What does it do? Replacing a namespace is quite rare, so the declaration is not as simple as could be conceived: you rename each element explicitly. The last entry handles the special case of sch:key.

The sch:pattern element has an attribute name, which ISO Schematron regularized into a title element, but there is no way to declare this in DSRL: it is not a general-purpose transformation language like XSLT (though it can be translated into XSLT, as in Martin’s implementation, which follows the Schematron pattern). DSRL specifies the mapping, not the transformation, so in a sense it is just as convenient in reverse (mapping from new schema documents back to old names), or for renaming schemas rather than documents, given a suitable implementation. So the best we can do is just to strip that attribute out: it is not required for validation.
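To make the "mapping, not transformation" point concrete, here is a small Python sketch that applies the same element renaming to a document. It is not a DSRL implementation and has nothing to do with Martin Bryan's XSLT: the rename function, the dict-based map and the sample schema are all illustrative assumptions. Only the three namespace URIs and the element names come from the maps above.

```python
import xml.etree.ElementTree as ET

SCH = "http://www.ascc.net/xml/schematron"
ISO = "http://purl.oclc.org/dsdl/schematron"
XSL = "http://www.w3.org/1999/XSL/Transform"

LOCAL_NAMES = [
    "schema", "title", "phase", "active", "pattern", "rule", "extends",
    "assert", "report", "diagnostics", "diagnostic", "let", "p", "span",
    "value-of", "name", "dir", "emph",
]

# One explicit entry per element, as in the maps, plus the sch:key case.
ELEMENT_MAP = {f"{{{SCH}}}{n}": f"{{{ISO}}}{n}" for n in LOCAL_NAMES}
ELEMENT_MAP[f"{{{SCH}}}key"] = f"{{{XSL}}}key"

# Attributes to strip, keyed by the old element name.
DROPPED_ATTRIBUTES = {f"{{{SCH}}}pattern": ["name"]}


def rename(root, element_map, dropped_attributes=None):
    """Rename elements in place according to a from-to map."""
    dropped_attributes = dropped_attributes or {}
    for node in root.iter():
        old_tag = node.tag
        node.tag = element_map.get(old_tag, old_tag)
        for attribute in dropped_attributes.get(old_tag, ()):
            node.attrib.pop(attribute, None)


OLD = (
    f'<schema xmlns="{SCH}"><pattern name="Basics">'
    '<rule context="/"><assert test="*">root</assert></rule>'
    '</pattern></schema>'
)

root = ET.fromstring(OLD)
rename(root, ELEMENT_MAP, DROPPED_ATTRIBUTES)
print(root.tag)
print(root[0].attrib)

# Because the map is only data, the reverse direction is just the inverse.
REVERSE_MAP = {new: old for old, new in ELEMENT_MAP.items()}
rename(root, REVERSE_MAP)
print(root.tag)
```

Note that the reverse pass restores the old names but cannot restore the stripped name attribute, which is the limitation described above.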

I think an important aspect of DSRL is that it shows that the SC34 WG1 is asking fundamentally different questions than the W3C XML Schemas WG, which is not to say that one is necessarily asking better questions at all! In XSD you have various facilities like import, redefine, equivalence groups, type derivation by restriction and extension, but there is no systematic facility to allow name and value mapping: to say “What I used to call xxx:yyy I am now calling aaa:bbb!” XSD is not interested in PIs or entities, of course.

So where is WG1 going with all this? The DSDL project is taking time: there has been no shortage of distractions. It has no support from large companies, much as we would welcome this, no publicity or marketing budget, and has to stand or fall squarely on its technical merits, in the context of a market which would really prefer if there were some way to shoehorn XSD into doing this. Now, of course, in a rational world the large corporate (open and closed source) developers would see DSRL as a simple pre-processor to XSD that can help many migration and maintenance issues: as an adjunct. But we are not holding our breaths!

But my vision is that in the near term, with DSRL completing the base DSDL quartet of RELAX NG, NVDL, DSRL and Schematron, standards developers will start to take them on board as a package:

  • ISO NVDL selecting the particular schemas for different namespaces and culling foreign elements as desired
  • ISO DSRL renaming, localization and providing default values to handle common evolution cases
  • ISO RELAX NG performing grammar-based validation, extended with its XSD data types
  • ISO Schematron performing more complex and detailed validation

A couple of years ago we finally arrived at the point where people had come to pretty realistic apprehensions about the proper limits of XSD functionality, and I think we are now arriving at the same kind of level of maturity with RELAX NG. As these limits become commonplace, I think the need for NVDL and DSRL (for XSD and for RELAX NG) will similarly become better known.

My prediction is that it will increasingly occur to community standards bodies that their standards have quite a number of constraints or gotchas which are poorly expressed in English but much clearer (and machine verifiable) when expressed using DSRL (and NVDL and Schematron).

M. David Peterson

It’s amazing to me how seemingly complex programming techniques are really not that complex at all if you break them down into their basic Unix CLI equivalents.

Take, for example, Polyphonic C# where they introduce the concept of the asynchronous join pattern,

Continuations, futures, and whatnot - Thoughts on some asynchronous patterns - B# .NET Blog

Join patterns

If you’ve had a rendez-vous with Ada in a previous life, you’ll be familiar with the concept of join patterns (or shorthand joins). Our Microsoft Research department has done quite some work on this field already, as you can read here: http://research.microsoft.com/~crusso/joins/. It’s part of Polyphonic C#, which is by itself part of Cw and based on join calculus. We already got the LINQ inspiration from it, joins might be lurking around the corner too. There actually just two basic concepts to understand here:

* Asynchronous methods: imagine a keyword async as a modifier on methods (which would imply the method is a procedure, hence the void return type becoming redundant for those type of methods). No longer do you need to create separate versions of a method with or without the Async suffix to work around return type overloading limitations imposed by the runtime. You can think of the method as a wrapper around a task scheduling call, wrapping the entire body. Whether or not this spawns a new thread is another matter.
* Join patterns: also known as chords, is a declarative way of a WaitAll style of guard mechanism to enter a method.

Our monk would look like:


    class Monk
    {
         public async RequestCard(string to) { ... }
         public string GetCard() & public async RequestCard(string to) { ... }
    }

Basically requests for cards queue up on the monk’s GetCard operation, ready to get processed when the monk is awake. Actually this sample is convoluted and our monk suffers from some concurrency diseases (think of the cards it would hand out if there are multiple outstanding requests - any ordering guarantees without “ticketing”?) but I won’t get in details for now - it’s just (yet) another way of thinking about asynchronous processing with today’s materials in the room. If you’re inspired by join patterns, check out the Polyphonic C# paper.

Sounds interesting, and extremely cool. And I can’t wait to start playing with this feature if and when it becomes a part of the core C# language specification. But I’m not sure it’s something we haven’t been able to do for *years* already. In the meantime, for those that want to begin sinking their teeth into async/concurrent programming techniques, it seems to me this same general idea can be simulated using nothing more than nohup and Unix pipes. For example, take the following function which just does some random piping operation enclosed in a loop, taking an argument passed to it as an identifier as to which operation is currently taking place, prepending a formatted date in front of it, and then piping the result to sed to perform a search and replace operation on the current day of the week,

Chris Wallace

“One thing leads to another” might be the sub-title for the web. Last night I found myself, by some circuitous route, in LiteratePrograms, a wiki set up by Derrick Coetzee. The site incorporates a version of that earlier WEB: Donald Knuth’s tangle and weave programs allow a single literate program script to be transformed into one view which makes sense to the compiler and another which makes sense to a human reader.

The wiki is a laudable effort to provide code examples, clearly explained, in many languages, but it is fighting a bad case of spam. Comparative examples are a valuable educational resource, showing how to abstract away the specifics of a syntax to see through to the essential similarities and differences. There are no examples in XQuery and only a couple in XSLT, however, so here’s another opportunity for an XQuery evangelist.

Fibonacci is the computer science ‘hello world’. It is well represented in the wiki in numerous languages and many algorithms. The lack of a higher order factoring of the many algorithms away from the code needs tackling. XQuery with its limited functional model, lacking such devices as higher order functions, lazy evaluation, closures and generators, restricts the algorithms to pure recursive functions. When coming to XQuery from a background in imperative, object-oriented languages, the loss of updating variables is quite a challenge. Good old Fib is not a bad example to start with.
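To show the style in question, here is Fibonacci written as pure recursive functions with no variable ever updated. It is in Python rather than XQuery, purely as a sketch of the shape; the function names are my own.

```python
def fib(n):
    """Naive pure recursion: the textbook definition, exponential time."""
    return n if n < 2 else fib(n - 1) + fib(n - 2)


def fib_linear(n, a=0, b=1):
    """Accumulator-passing recursion: state travels in the arguments
    instead of in variables that get reassigned."""
    return a if n == 0 else fib_linear(n - 1, b, a + b)


print([fib(i) for i in range(10)])
print(fib_linear(50))
```

The second form is how the imperative loop usually gets translated: what would have been two updated variables become two extra parameters of the recursive call.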

Rick Jelliffe

The standard for OOXML, IS29500, was approved a few months back with 75% of votes in favour, 14% against (for a breakdown, see this chart.) Some of the National Bodies who voted no are very determined in their opposition: South Africa, Brazil and India have lodged appeals, as allowed under the JTC1 process. [JTC1 = Joint Technical Committee 1 (Information Technology) of ISO and IEC.]

The details of the Indian appeal are uncertain, but the South African and Brazilian appeals are available online.

It will be quite interesting to see how this goes. An IEC spokesman has been quoted as saying:

This is the first such appeal after a BRM process in ISO/IEC JTC 1, although appeals occur regularly in other technical committees

How is it likely to go? I don’t want to go into each objection systematically, but I’d like to suggest some things I expect may frame the appeal resolution.

Cassandra with a Cigar

Before I start, a quick comment on politics. You would expect that appeals will be taken quite seriously by JTC1.

This has been a very contentious standard, everyone is aware of the politics, slurs and noise, high emotions and the genuine problems. Certainly, it has long been the expectation that there would be dissatisfaction no matter what the result: indeed, BRM convenor Alex Brown predicted it:

Be forewarned: the ISO process will “fail”

Speaking to fellow NB representatives, it is clear that lobbying (in many directions) is intensifying as the BRM nears. It would be naïve to expect anything else, I suppose. It amuses me the degree of self-certainty both sides have (coupled with very high levels of mistrust). One corollary of this is that they both profess that the only thing that can undo them is a “failure of process”. Tweedledum believes their DIS is so good that only a “failure of process” can thwart it; Tweedledee, however, is convinced that the DIS is so deeply flawed that only a “failure of process” could allow it to become standard..

Note that Dr Brown’s point is not that the process would necessarily be perfect, nor that NBs would not be within their rights to make appeals; it is rather that intemperate outbursts would not be tolerated during the BRM (the conduct of the BRM was the topic of that blog entry) but were on the cards for after.

And I personally welcome the appeals, because they are a good chance to get more clarity on issues. But to paraphrase Dr Brown, Be forewarned: the appeal process will “fail”; these are issues where there are quite strong views about how the world and institutions should operate which underlie, and to some extent select, the particular issues.

That there will be grandstanding later does not mean that the appeals themselves are mere grandstanding.

Keep them talking

Next, I’d like to draw a couple of lines in the sand, for setting expectations.

First, if an appeal concerns something that is specifically allowed or disallowed by the JTC1 Directives, it will not succeed. For example, the JTC1 Directives clause 13.1 says

The criteria for proposing an existing standard for the fast-track procedure is a matter for each proposer to decide.

Given that, I cannot see any objection of the form “The fast-track procedure is designed for blah blah and this goes against it” getting anywhere.

Second, to see the JTC1 process as an adversarial process where things can be won on a technicality is to fundamentally mistake its nature. It is not a court of law, it is a forum for formalized discussions aimed at agreement on the text and contents of documents. (Furthermore, these documents are not laws or legislations, they are voluntary standards.) In clause 1.2 of the JTC1 Directives it says:

These Directives shall be complied with in all respects and no deviations can be made without consent of the Secretaries-General.

These Directives are inspired by the principle that the objective in the development of International Standards should be the achievement of consensus between those concerned rather than a decision based on counting votes.

[Note: Consensus is defined as a general agreement, characterized by the absence of sustained opposition to substantial issues by any important part of the concerned interests and by a process that involves seeking to take into account the views of all parties concerned and to reconcile any conflicting arguments. Consensus need not imply unanimity. …]

The interesting parts of that are first that variations are possible with the consent of the Secretaries-General; this is the kind of provision you would expect from a mature standards organization (created about 60 years ago) that has had to weather challenging standards before: a chain of authority but a process to prevent the goals of the organization being thwarted by technicalities. Second, that sustained opposition is a serious thing. And third that the response to conflicting arguments is supposed to be processes of reconciliation: you have to leave the win/lose mindset at the door and think of win/win (where no party in win/win necessarily gets everything they want in the form and timetable that they want, but they can get something.)

So from these, you would expect that even where a National Body had a legitimate grievance that it had been discriminated against during some committee work, in this case the BRM, the result is likely to be not that any standard would be invalidated, but that the NB would be urged to participate in maintenance of the standard with the assurance that the situation would be monitored. (And this is especially true where the issue concerns non-normative text, or additional text, or wordsmithing, or anything that can be done by maintenance.)

Messer im Kopf

Someone asked me what kinds of things would be legitimate causes for an appeal to succeed.

Frankly, I find it very hard to see how conduct of the BRM can invalidate a vote of NBs (especially as the Directives give a lot of discretion to the convenor, where there were ISO and IEC officials advising directly, where the goals of the BRM are so limited, and where the final ballot shows that the BRM’s goal of an improved text was successful.)

I think if there had been a proven miscount of the revised ballot, that would be a ground for an appeal (again, I don’t think it would result in failure of the standard, just a correct count of the already cast ballots.)

And if the endorsed text from the Editor differed from a NB’s view of what the editor’s instructions meant, that would be grounds for an appeal (but again, not resulting in a failure: probably resulting in the issue being dealt with by maintenance.) [I should say, I think ITTF would be remiss to not release the endorsed text of IS29500 to NBs as soon as it was ready, and certainly with time for it to inform NBs before the appeal deadline is closed: this has been an expectation of mainstream delegates from several different NBs.]

If there was a reasonable allegation of cartelization, I think that would be reasonable grounds for appeal. This is where some NB’s legitimate requirements had been deliberately ignored by other NBs, as a kind of gang. However I would expect this to apply far more to ballots that rejected a standard, because where a standard was accepted without some material, the forum for reconciliation would be the maintenance process.

New information about some showstopping technical flaw would be grounds for an appeal. It would have to be something fairly fundamental (for example, if the ZIP format utterly did not operate the way the spec said it did, undermining OPC and everything built on it) and something which could not be dealt with by maintenance.

If some SC was hijacked by multinational loons who wanted to standardize a perpetual motion machine, or by cannibals who wanted to standardize some headhunting apparatus, then that would certainly be a reputation problem. I am struggling a bit here: obviously JTC1 does not deal with machinery!

I don’t see how JTC1 can find reputation arguments very compelling: it will be hard for them to tease out any real reputation issues from the fake ones. I saw this week that someone just automatically added the ISO and IEC Secretaries-General to the list of people to be suspected of corruption, and this is even before they have actually been involved AFAIK! It is like reading “How to win friends and influence people” by Chicken Little.

So I apologize if I am a bit fuzzy: but again it goes to ISO’s approach (as I conceive it, at least) that the purpose of the appeals is to provide forums for formalized discussions to continue. That the Directives use the language of issues getting “resolution” rather than of sides “winning” should be a central part of understanding the process.

Proportionality

A good indication of how appeals will be handled is how alleged “contradictions” were handled during the ISO Linux API standardization. Given that there was an almost total overlap of this with the ISO POSIX ABI, clearly overlap in application area, or even in details, was not a contradiction. However, there was one particular real contradiction, which related to a different function signature. So the issue became: this is clearly a contradiction, but is it big enough to cause the ISO Linux standard to be cancelled? Clearly not. Instead it is labelled, and more discussions between the Linux and POSIX people (if they still exist!) would be encouraged.

So the lesson from that, I think, is that the idea that a small issue can be a “spanner in the works” is unrealistic. There needs to be a sense of proportionality between the technical damage and the response to it.

Fairness

I think the most challenging issues I have seen in the appeals so far revolve around unfairness; in particular, this seems to be a thing of the Brazilians. It seems to me that the need for fairness goes to the heart of ISO procedures, however, so it is quite important.

I think there is an assumption that there is some kind of “poison fruit” doctrine operating, where an earlier SNAFU would prevent later processing of the standard. But I don’t see that in the JTC1 Directives. I am certainly not saying that this would not be a consideration, just that I cannot see why it would be compelling. It would presumably be one of the technical or administrative “principles” which could be discussed under the s11.1.3 procedures.

This is because of the big bottom line here: the National Bodies have voted.

Don’t shoot the messenger

I suppose this would go without saying, but I suspect it would be difficult for appeals which are really objections to the JTC1 Directives to succeed. The people who characterize convenor discretion as “making the rules up as you go along” would surely be appalled at retrospectively changing the rules! The resolution of such an appeal would be “take it up with JTC1”.

Who votes at the BRM?

As I have said, I think having a legalistic rather than goal-responsive view of the administration of the JTC1 Directives is a mistake. But some people take that as a cop-out, so I would like to point out something about one of the procedural issues that has been raised by some. It comes down to the issue of who is allowed to vote at the BRM: P members of SC34, or both P and O members. (If you don’t know the difference, nations can register as Observers or Participants for a particular technical committee. They have different rights and obligations.)

It has been pointed out that clause 13.8 says about the conduct of fast-track BRMs:

At the ballot resolution group meeting, decisions should be reached preferably by consensus. If a vote is unavoidable the vote of the NBs will be taken according to normal JTC1 procedures.

Some astute detectives trace this back to clauses elsewhere about P members voting. However, in this particular case clause 13.7 establishes who the NBs are:

NBs of the relevant SC shall appoint to the ballot resolution group one or more representative who are well aware of the NB’s position. NBs having voted negatively, whether or not an NB of the relevant SC, have a duty to delegate a representative to the ballot resolution meeting.

This is quite an important clause, for people wanting some insight into the process, because

  • it is inclusive, not exclusive: it tries to get as many of the NBs at the table as possible
  • it puts an obligation on the delegations to be properly prepared (which is not to say that there is any blame attached to delegates who are specialists or novices or last-minute additions: there always will be a certain number of these): I think this has some implication against claims about size and logistical problems with review and BRM proposals
  • it also explains why poor NBs, if they are trying to be scrupulous with the Directives, could prefer an abstention over a simple negative, or a “yes” over a “no with comments” (as a conditional yes). It is expensive to send a delegation.
Philip Fennell

With all the recent talk of angle bracket taxes and what XML is and isn’t good for, I thought it would be fun to look at taking XSLT somewhere it is not normally associated with: the generation of binary file formats.

The sequence in XSLT 2.0 is of more use than the humble node-set. It is not restricted to nodes: functions such as tokenize() create a sequence of strings, and you can concatenate sequences using the comma operator, which can be used on items of any data type.
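
To make that concrete, here is a minimal sketch of both ideas. It assumes an XSLT 2.0 processor that can start from a named template; the template name main and the sample values are my own choices for illustration.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    exclude-result-prefixes="xs">

  <xsl:output method="text"/>

  <!-- Entry point; "main" is a hypothetical name for the initial template. -->
  <xsl:template name="main">
    <!-- tokenize() returns a sequence of four strings, not a node-set. -->
    <xsl:variable name="words" as="xs:string*"
        select="tokenize('tagged image file format', '\s+')"/>

    <!-- The comma operator concatenates items of any type into one
         flat sequence: four strings, an integer, a date and a boolean. -->
    <xsl:variable name="mixed" as="item()*"
        select="($words, 42, xs:date('2008-06-01'), true())"/>

    <!-- Writes the two item counts: 4 7 -->
    <xsl:value-of select="count($words), count($mixed)" separator=" "/>
  </xsl:template>
</xsl:stylesheet>
```

With Saxon, for instance, an option along the lines of -it:main should start the transformation at that template without needing a source document.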

However, there is nothing here that lifts us out of the ordinary; not until, that is, you create a sequence of xs:unsignedByte numbers. This sequence can be considered a byte sequence, and if you can create a byte sequence you can create just about any binary file format you like. A good example of this would be an image file like a Tagged Image File Format (TIFF) image. If you don’t get involved in image compression, it is relatively easy to create a TIFF image; after all, it is only a series of sequences of bytes.

Mind you, there are two problems to deal with. The first is that a basic XSLT 2.0 processor does not support the xs:unsignedByte data type. Only a schema-aware processor is required to support that data type. So, in the absence of the latter, you’d have to make do with xs:integer and put up with the extra memory needed. Secondly, and more importantly, there is the question of how to get a byte sequence out the other end of an XSLT processor!
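
The first of those problems can be illustrated with a small sketch. It assumes a basic (non-schema-aware) processor, so the bytes are held as xs:integer values with a hand-written range check standing in for xs:unsignedByte. The helper function f:le and its namespace urn:example:bytes are hypothetical names of mine, and the sequence built is just the 8-byte header that opens a little-endian TIFF file, with the first image file directory assumed to follow straight after it. It says nothing yet about the second problem: the result is written as decimal text purely so it can be inspected.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:f="urn:example:bytes"
    exclude-result-prefixes="xs f">

  <xsl:output method="text"/>

  <!-- Hypothetical helper: split $value into $width bytes, least
       significant byte first (little-endian). -->
  <xsl:function name="f:le" as="xs:integer*">
    <xsl:param name="value" as="xs:integer"/>
    <xsl:param name="width" as="xs:integer"/>
    <xsl:sequence select="
      if ($width le 0) then ()
      else ($value mod 256, f:le($value idiv 256, $width - 1))"/>
  </xsl:function>

  <xsl:template name="main">
    <!-- The 8-byte TIFF header: byte order mark 'II', the magic
         number 42, then the offset of the first image file directory. -->
    <xsl:variable name="header" as="xs:integer*"
        select="(string-to-codepoints('II'), f:le(42, 2), f:le(8, 4))"/>

    <!-- xs:integer does not enforce the 0 to 255 range, so check by hand. -->
    <xsl:if test="some $b in $header satisfies ($b lt 0 or $b gt 255)">
      <xsl:message terminate="yes">Value outside byte range</xsl:message>
    </xsl:if>

    <!-- Decimal text for inspection only: 73 73 42 0 8 0 0 0 -->
    <xsl:value-of select="$header" separator=" "/>
  </xsl:template>
</xsl:stylesheet>
```

Building multi-byte fields with one recursive function keeps the byte order in a single place, so a big-endian variant would only need the two halves of the returned sequence swapped.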
