I’ve just started looking into the ACORD schemas. These are the standards of choice in the English-speaking insurance world, from what I can gather (oops I am being too coy), and are quite meaty and mature now: the documentation runs to about 3,000 pages. Various little birds had told me that Schematron had been used to augment the XSD schemas in several places, so I thought it would be interesting to look at why.
This is not to point out deficiencies in XSD (the facts can speak for themselves) but to look for the relative strengths of Schematron. This kind of data, of course, is very prone to having several layers of rules for each user: business rules, occurrence rules that come from the forms used, system limits, and so on. Each of these can usually be represented well by a Schematron schema (or combined as different phases of the same schema).
But let’s just look at three additional constraints beyond the ones that XSD schemas can represent, all from s4, Implementation Conventions, of the ACORD Life, Annuity and Health Standard v2.17.
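As a rough sketch of the phases idea, one schema could hold each layer as its own pattern and switch layers on per phase. The ISO Schematron namespace, the phase names, and the pattern ids here are all assumptions made for illustration; nothing in ACORD prescribes them.

```xml
<sch:schema xmlns:sch="http://purl.oclc.org/dsdl/schematron">
  <sch:title>Layered rules for one user (illustrative sketch)</sch:title>

  <!-- Hypothetical phase: only the rules implied by the paper form -->
  <sch:phase id="intake">
    <sch:active pattern="form-occurrence"/>
  </sch:phase>

  <!-- Hypothetical phase: every layer, once the data has been assessed -->
  <sch:phase id="assessed">
    <sch:active pattern="form-occurrence"/>
    <sch:active pattern="business-rules"/>
    <sch:active pattern="system-limits"/>
  </sch:phase>

  <sch:pattern id="form-occurrence">
    <!-- occurrence rules that come from the forms used -->
  </sch:pattern>
  <sch:pattern id="business-rules">
    <!-- business rules -->
  </sch:pattern>
  <sch:pattern id="system-limits">
    <!-- limits imposed by downstream systems -->
  </sch:pattern>
</sch:schema>
```

The validator is then told which phase to run, so the same document can be checked more strictly as it moves through processing.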
The first interesting thing is the use of typecodes, explained in s4.4. ACORD documents are interesting because they want to use the same XSD schema and elements for each stage of processing. So when a form comes in, before it has been assessed, some process may treat all the data as plain strings. Then, once a datum has been assessed and possibly fixed up, it can be marked with a typecode as having a certain data type, for example being a date (in ISO 8601 form).
This is pretty much unfeasible in XSD: I don’t think we can use xsi:type for this, because IIRC the type nominated by xsi:type has to be derivable from the actual type specified in the XSD schema, and a date is not derived from String. (Maybe XSD 1.1 fixes this, it doesn’t matter.) In Schematron, it is easy: something like
<sch:pattern>
  <sch:title>Type codes</sch:title>
  <sch:rule context="*[@tc='1']">
    <sch:assert test=".='true' or .='false' or .='1' or .='0'">
      A <sch:name/> element should be a boolean</sch:assert>
  </sch:rule>
  ...other rules for other typecodes...
</sch:pattern>
There is an interesting constraint in s4.13 that says that aggregate elements with no optional subelements should be omitted. This is not something that can be specified using grammars, since it makes the occurrence of a parent dependent on the value of a child. The Schematron assertion might be as simple as something like this:
<sch:rule context="*[string-length(.) = 0]">
  <sch:assert test="not(*)">Aggregate elements should contain elements with content</sch:assert>
</sch:rule>
In s4.14 it speaks of nested data ranges: the example they give is
it would not be valid for a PolicyProductInfo to specify an expiration date of 3/1/2005, while one of its child JurisdictionApprovals specifies an expiration date of 4/1/2005.
This is obviously quite trivial for Schematron, especially when you make life easier for yourself by using sch:let to parse the dates into fragments that make comparison easier.
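A minimal sketch of such a rule might look like the following. It makes several assumptions: ExpDate is a hypothetical name for the expiration date element, the dates are in ISO 8601 YYYY-MM-DD form, the elements are in no namespace (otherwise a prefix would have to be declared with sch:ns), and the ISO Schematron namespace with the default XPath 1.0 binding is in use. Rather than splitting each date into separate fragments, it takes the shortcut of stripping the hyphens so each date becomes a single comparable number.

```xml
<sch:pattern xmlns:sch="http://purl.oclc.org/dsdl/schematron">
  <sch:title>Nested date ranges</sch:title>
  <!-- ExpDate is a hypothetical element name, assumed to hold YYYY-MM-DD -->
  <sch:rule context="PolicyProductInfo/JurisdictionApproval[ExpDate]">
    <!-- Strip the hyphens: 2005-04-01 becomes the number 20050401 -->
    <sch:let name="childExp" value="number(translate(ExpDate, '-', ''))"/>
    <sch:let name="parentExp" value="number(translate(../ExpDate, '-', ''))"/>
    <!-- If the parent has no ExpDate, parentExp is NaN and the test passes -->
    <sch:assert test="not($childExp > $parentExp)">
      A <sch:name/> should not expire (<sch:value-of select="ExpDate"/>)
      after its parent PolicyProductInfo (<sch:value-of select="../ExpDate"/>).
    </sch:assert>
  </sch:rule>
</sch:pattern>
```

With the dates from the example above, the child value 20050401 is greater than the parent value 20050301, so the assertion fails and the message is reported.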
TriSystems Infobahn have a brochure (PDF) on their approach for using Schematron with ACORD, for people who want more information. The ACORD schemas were developed with respected industry figure Daniel Vint as the senior architect: I see he is potentially nabbable for contract work now.