I made three prototype implementations of the Topologi XSD to RELAX NG Compact Syntax translator before adopting one.
First, I used Topologi’s high-level in-house Java library for XSD, which we use in other products, and looked at converting that into the Java API of one of the versions of RELAX NG in James Clark’s Trang translation software.
Second, I tried using XSLT to generate RELAX NG Compact Syntax directly.
Third, I looked at using XSLT 2 (Saxon) to generate RELAX NG as XML, then using Trang to convert from this XML to RELAX NG Compact Syntax.
Which one did I go with?
Well, the first method was a complete nightmare. In order to do the work I needed to keep in mind
I was interested to see whether a Java API approach (working with objects) would help simplify the problem, but it was actually the worst approach: the number of levels, connections and abstractions multiplied crazily, and every problem meant looking things up in several manuals or specs. James is a great programmer, but no-one would describe his programming style as chatty or discursive, and that was a killer.
The second approach was actually pretty workable, but it was starting to look alarmingly fragile: I kept adding more functions just to generate good RELAX NG Compact Syntax, and I could see the writing on the wall.
So, ultimately, I chose the third way. I sloughed off all Compact Syntax issues to Trang, running it from the command line, and could then concentrate on converting XSD elements into equivalent RELAX NG elements. I only had to deal with the XML syntax and the RELAX NG, XSD and XSLT semantics, dipping into the schema components only where needed, and never building an abstract object representation of the XML independent of the infoset. Phew. Small is beautiful.
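The two-stage pipeline is easy to drive from a small script. This is a minimal sketch in Python, assuming Saxon 9 HE (for XSLT 2.0) and Trang are installed as runnable jar files; the jar names and the stylesheet name are hypothetical placeholders, not Topologi's actual files.

```python
import subprocess
from pathlib import Path

# Hypothetical jar locations: adjust to your own installation.
SAXON_JAR = "saxon9he.jar"
TRANG_JAR = "trang.jar"


def build_commands(xsd, stylesheet, rng, rnc,
                   saxon_jar=SAXON_JAR, trang_jar=TRANG_JAR):
    """Return the two command lines for the XSD -> RNG -> RNC pipeline."""
    # Stage 1: XSLT 2.0 (Saxon) turns the XSD into RELAX NG XML syntax.
    saxon = ["java", "-jar", saxon_jar,
             f"-s:{xsd}", f"-xsl:{stylesheet}", f"-o:{rng}"]
    # Stage 2: Trang converts RELAX NG XML syntax into Compact Syntax.
    trang = ["java", "-jar", trang_jar, "-I", "rng", "-O", "rnc",
             str(rng), str(rnc)]
    return [saxon, trang]


def run_pipeline(xsd, stylesheet):
    """Run both stages, writing <name>.rng and <name>.rnc next to the XSD."""
    xsd = Path(xsd)
    rng = xsd.with_suffix(".rng")
    rnc = xsd.with_suffix(".rnc")
    for cmd in build_commands(xsd, stylesheet, rng, rnc):
        subprocess.run(cmd, check=True)  # stop at the first failing stage
    return rnc
```

For example, `run_pipeline("wml.xsd", "xsd2rng.xsl")` (both names hypothetical). Trang can also infer the input and output formats from the file extensions, so the explicit `-I`/`-O` options are just belt and braces.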
The MS OOX schemas have several features that make translation easier: they use different prefixes for each kind of object, so no name munging is needed when converting components into RELAX NG named patterns, which share a single namespace. And the OOX derivation tree is shallow, thank goodness. OOX uses all sorts of XSD nasties (extension, substitution groups, abstract elements), and I think I found RELAX NG equivalents for almost everything. If there had been multiple levels of derivation between schema documents, it would have needed a bit more work.
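Abstract elements and substitution groups have no direct RELAX NG counterpart. One common way to express them (a general technique, not necessarily what the Topologi translator does) is a named pattern that is a choice of references to the group's members. The sketch below assumes a hypothetical naming convention (`sg_X` for the group, `el_X` for each element's own pattern), looks only at top-level declarations, and follows a single level of substitution, which is enough for a shallow derivation tree.

```python
import xml.etree.ElementTree as ET

XS = "http://www.w3.org/2001/XMLSchema"
RNG = "http://relaxng.org/ns/structure/1.0"


def local_name(qname):
    """Drop any prefix from a QName-valued attribute such as 'w:shape'."""
    return qname.split(":")[-1]


def substitution_group_defines(schema):
    """Build one RELAX NG <define> per substitution-group head.

    Handles only top-level xs:element declarations and a single level of
    substitution (members of members are not followed).
    """
    members = {}      # head name -> member element names, in document order
    abstract = set()  # names of elements declared abstract="true"
    for decl in schema.findall(f"{{{XS}}}element"):
        name = decl.get("name")
        if decl.get("abstract") == "true":
            abstract.add(name)
        head = decl.get("substitutionGroup")
        if head:
            members.setdefault(local_name(head), []).append(name)

    defines = []
    for head, names in members.items():
        # Hypothetical naming: sg_X for the group, el_X for the named
        # pattern assumed to hold each element's own definition.
        define = ET.Element(f"{{{RNG}}}define", name=f"sg_{head}")
        choice = ET.SubElement(define, f"{{{RNG}}}choice")
        # An abstract head cannot appear in instances; a concrete head can.
        alternatives = names if head in abstract else [head] + names
        for alt in alternatives:
            ET.SubElement(choice, f"{{{RNG}}}ref", name=f"el_{alt}")
        defines.append(define)
    return defines
```

An alternative design is to emit a `define` with `combine="choice"` next to each member, so that RELAX NG itself merges the alternatives and no single pass has to collect every member in one place.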
The main difficulty I had, actually, was with understanding RELAX NG. I have had the experience (with XSD and RELAX NG) of understanding a technology well, only to have a subsequent technology blank out that knowledge. So I found it quite troublesome to figure out how to map XSD imports and includes into RELAX NG when foreign namespaces are involved. I had made a rule that the translator would not need to look into other documents; I am still not convinced I have it right yet, actually.
But all in all, I think the draft RELAX NG Compact Syntax schemas for draft Ecma OOX at least show that ISO RELAX NG is a viable technical option even for large, complex document types that use XSD schemas: the choice of a particular document type should not force your hand to adopt one stream of schema technology, especially for grammar-based schema languages. I’ve also been working recently on another project where the independent schema consultant develops in RELAX NG and then distributes as XSD: a nice approach, and I expect that over the next few years schema-language neutrality will become a more widely adopted stance among buyers, developers and overseers.


Question: why not convert the xs:documentation elements to RNG-namespaced documentation? A rather trivial detail, admittedly.
Most comments are translated to >> comments.
A few are removed in situations where RELAX NG Compact Syntax does not allow them (the details escape me; it is something like comments on enumerated values).
And there are one or two situations where the comments are duplicated, both as [xs:documentation] and >> comments. Comments were not a high priority for making the Ecma draft deadline; however, they are certainly high on the list of things to get right by the time the ISO standard comes out.
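In the two-stage approach, the first stage could carry documentation across as `a:documentation` elements from the RELAX NG DTD-compatibility annotations namespace and leave it to Trang to decide how to render them in the compact syntax. This is a minimal sketch of that idea in Python, assuming the XSD and RELAX NG trees are both held as ElementTree elements; the function name is hypothetical and this is not the Topologi code.

```python
import xml.etree.ElementTree as ET

XS = "http://www.w3.org/2001/XMLSchema"
RNG = "http://relaxng.org/ns/structure/1.0"
ANN = "http://relaxng.org/ns/compatibility/annotations/1.0"


def copy_documentation(xsd_component, rng_pattern):
    """Copy xs:annotation/xs:documentation text into a:documentation children."""
    position = 0
    path = f"{{{XS}}}annotation/{{{XS}}}documentation"
    for doc in xsd_component.findall(path):
        # Flatten any markup inside xs:documentation to plain text.
        text = "".join(doc.itertext()).strip()
        if text:
            ann = ET.Element(f"{{{ANN}}}documentation")
            ann.text = text
            # Insert before existing children so the note precedes the content model.
            rng_pattern.insert(position, ann)
            position += 1
    return rng_pattern
```

Placing the annotations first keeps them ahead of the content model; whether a given position is allowed in the compact syntax is exactly the kind of detail Trang handles in the second stage.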
For the first stage in approach #3, couldn't XSLT 1.0, possibly using EXSLT, have done just as well? Or was 2.0 more of an incidental choice?
As for the outcome: #3 is effectively a two-stage transform (even if stage 2 wasn't XSLT), so the conclusion is not at all surprising. (In the absence of Trang, the second stage would have been XSLT as well, and I expect that wouldn't have led to a particularly different result; it would just have taken more effort.) It's always nice to have hard data points from somewhat apples-to-apples comparisons, of course.