First, Happy New Year to friends and readers!

David Orchard has an article up at XML.COM entitled "A Theory of Compatible Versions". It looks at versioning of XML Schemas as a sets-and-subsets issue. The punchline is that languages can be compatibly versioned successfully if the first version of a language defines an Accept Text Set that is a superset of the Defined Text Set, as well as a substitution rule for transforming texts in the Accept Text Set into the Defined Text Set.

And the more that partial understanding is used, the more changes will be compatible rather than incompatible. I agree with the idea of partial understandings, but the substitution rules seem to me to be, in part or in whole, ways of reconciling artificial constraints imposed by the limited expressive power of grammars, rather than necessarily being real. When you only have a hammer, every thumb looks like a sore thumb.

It is an interesting example of the kind of hoops that you have to jump through to make grammars work outside the most trivial examples. Grammars encourage or perhaps force you to build in decisions about positions that are not sourced from any business requirements. (There is a theory that this is a good thing, because every time you force a decision on position you may save a bit for compressibility, believe it or not; but that is slim pickings.)

Let’s look at the following Schematron schema which handles partial ordering without any definite positional or occurrence constraints:

<schema>
   <title>Schema example showing handling different name models</title>

  <pattern name="Partial-Ordering-for-Names" >
     <rule context="name">
         <assert test="count(first[preceding-sibling::last]) = 0">
                    In a name, the first name comes before the last name,
                    because it helps natural printing.</assert>
         <assert test="count(middle[preceding-sibling::last]) = 0">
                    In a name, the middle name comes before the last name,
                    because it helps natural printing.</assert>
         <assert test="count(middle[following-sibling::first]) = 0">
                    In a name, the middle name comes after the first name,
                    because it helps natural printing.</assert>
         <assert test="count(last[following-sibling::first]) = 0">
                    In a name, the last name comes after the first name,
                    because it helps natural printing.</assert>
     </rule>
   </pattern>
</schema>

The schema above will match the various content models David mentions.
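
A few instance documents may make this concrete. These are invented for illustration (the names, the wrapper element and the suffix element are my assumptions, not taken from David's article), and the pass/fail comments refer only to the Partial-Ordering-for-Names pattern:

```xml
<!-- Hypothetical instances; only the name elements are tested by the rule. -->
<names>
  <!-- passes: first comes before last -->
  <name><first>John</first><last>Smith</last></name>

  <!-- passes: first, then middle, then last -->
  <name><first>John</first><middle>Quincy</middle><last>Smith</last></name>

  <!-- passes: no assertion mentions the extra element after last -->
  <name><first>John</first><last>Smith</last><suffix>Jr</suffix></name>

  <!-- passes: the pattern constrains order only, not occurrence -->
  <name><last>Smith</last></name>

  <!-- fails: first follows last -->
  <name><last>Smith</last><first>John</first></name>

  <!-- fails: middle precedes first -->
  <name><middle>Quincy</middle><first>John</first><last>Smith</last></name>
</names>
```

The fourth instance is the interesting one: nothing in the pattern says a first name must be present, only where it may sit relative to the others if it is.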

Notice that the assertions have a "because" phrase. This is one of the things that has been hitting home to me recently: you need to be able to justify constraints (traceability) and to reduce the strength of constraints so that they go only as far as business requirements mandate (alignment).

Let’s now use phases to model the languages V1 (first, last, *) and V2 (first, middle, last). Also, let’s define the superset language, V0, which just has the partial ordering ( (first | *)*, (middle | *)*, last, * ).

<schema>
   <title>Schema example showing defining multiple related content models</title>

  <p>This schema defines a family of three languages for names, V0, V1 and V2, as Schematron phases.
  The V0 language is the one that must be true for all names; general processing software should be written to
  assume that the V0 constraints are honoured by documents, and that other constraints, such as those in the
  V1 and V2 languages, may not always be honoured in other languages in the family.</p>

  <phase name="V0">
    <active pattern="Partial-Ordering-for-Names" />
    <active pattern="last-name-required" />
  </phase>

  <phase name="V1">
    <active pattern="Partial-Ordering-for-Names" />
    <active pattern="first-name-required" />
    <active pattern="no-middle-name" />
    <active pattern="last-name-required" />
    <active pattern="open-back-content-model-for-names" />
  </phase>

  <phase name="V2">
    <active pattern="Partial-Ordering-for-Names" />
    <active pattern="first-name-required" />
    <active pattern="middle-name-required" />
    <active pattern="last-name-required" />
    <active pattern="closed-content-model-for-names" />
  </phase>

  <pattern name="Partial-Ordering-for-Names" >
     <rule context="name">
         <assert test="count(first[preceding-sibling::last]) = 0">
                    In a name, the first name comes before the last name,
                    because it helps natural printing.</assert>
         <assert test="count(middle[preceding-sibling::last]) = 0">
                    In a name, the middle name comes before the last name,
                    because it helps natural printing.</assert>
         <assert test="count(middle[following-sibling::first]) = 0">
                    In a name, the middle name comes after the first name,
                    because it helps natural printing.</assert>
         <assert test="count(last[following-sibling::first]) = 0">
                    In a name, the last name comes after the first name,
                    because it helps natural printing.</assert>
     </rule>
   </pattern>

  <pattern name="first-name-required">
    <rule context="name">
       <assert test="count(first) = 1">A name should have a first name, because ...</assert>
     </rule>
  </pattern>

  <pattern name="no-middle-name">
    <rule context="name">
       <assert test="count(middle) = 0">A name should have no middle name, because ...</assert>
     </rule>
  </pattern>

  <pattern name="middle-name-required">
    <rule context="name">
       <assert test="count(middle) = 1">A name should have a middle name, because ...</assert>
     </rule>
  </pattern>

  <pattern name="last-name-required">
    <rule context="name">
       <assert test="count(last) = 1">A name should have a last name, because ...</assert>
     </rule>
  </pattern>

  <pattern name="closed-content-model-for-names">
     <rule context="name">
       <assert test="count(first) + count(middle) + count(last) = count(*)">
           A name should only contain first, middle or last elements because ...</assert>
     </rule>
  </pattern>

  <pattern name="open-back-content-model-for-names">
     <rule context="name">
       <assert test="count(last/preceding-sibling::*) = count(last/preceding-sibling::first) + count(last/preceding-sibling::middle)">
           A name should only contain first or middle before the last name because ...</assert>
     </rule>
  </pattern>
</schema>

In the above schema we have three languages, V0, V1 and V2, which are implemented as separate Schematron phases. When validating, you select which phase to use, and the phase determines which patterns are tested. The schema factors out the common constraints; the V0 language is equivalent to David’s Accept Set. The schema clearly shows the difference between the languages: if you made a third language V3 as a phase, it would be trivial to see whether V3 documents would be accepted by V0, V1 or V2 by seeing which patterns were active in it. And, more to the point, because we have defined precedence using partial ordering, there is no need in this case (and certainly less need in the general case) for a “substitution rule”, as far as I can understand the concept.
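
To sketch what such a comparison might look like, here is a hypothetical V3 phase. It is not part of the schema above and the choice of constraints is purely my assumption: a closed content model where the middle name is optional. It reuses only pattern names already defined in the schema.

```xml
<!-- Hypothetical phase for illustration; V3 is not defined in the schema above. -->
<phase name="V3">
  <active pattern="Partial-Ordering-for-Names" />
  <active pattern="first-name-required" />
  <active pattern="last-name-required" />
  <active pattern="closed-content-model-for-names" />
</phase>
```

Reading off the active patterns: V3 activates both patterns that V0 does, so every V3 document is acceptable to V0. Every pattern in V3 is also active in V2, so every V2 document is acceptable to V3. The reverse does not hold, since V3 does not activate middle-name-required. Nor is V3 interchangeable with V1, because V3 lacks no-middle-name and V1 lacks closed-content-model-for-names.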

I may be mistaken, but it seems to me that the “substitution rules” which David proposes are mechanisms to overcome the problems of grammars in being unable to specify partial ordering well.