Here is a way to express a basic Entity Relationship model using ISO Schematron. Schematron allows you to model entities and various relationships using “abstract patterns” (a parameterized macro facility). You can use the same idea to model other kinds of diagramming and modeling systems.

I think what is quite powerful about this approach is that we can separate the information relationships from the XML serialization. Fields can be child elements, attributes, attributes of the parent, any kind of XPath: we don’t care. Similarly, if two fields are related, they may use a key or containment, but we don’t care. If you like, you can see this as a technique for capturing information from a model in a form which also happens to hook into Schematron validation; but you can use the captured model for non-Schematron purposes too!

(This is an updated version of a post to the XML-DEV mailing list in Nov 2006.)

In the following mini-example, entities with a one-to-one relation can be nested or they can be linked using an ID-like mechanism. XSD is not powerful enough to allow this kind of alternative mechanism. This forces people to make a decision about the serialization strategy, which creates incompatibility because different people make different choices.

Using this mechanism, you can get a complete separation between the declarative portions (which can carry as much additional declarative information as you like) and the operational/implementation code.

Example

This example declares that there is an entity called Address which has fields Street, Town and Postcode, and also an entity called Person which has a field Name. There is a one-to-one relationship between Person and Address. The types and optionality of fields are specified, as you would expect. To start, we have defined (below!) abstract patterns called ENTITY, FIELD, and ONE-TO-ONE-RELATION to match the basics of the ER model.

<sch:pattern is-a="ENTITY" >
   <sch:param name="name" value="Address" />
</sch:pattern>

<sch:pattern is-a="ENTITY" >
   <sch:param name="name" value="Person" />
</sch:pattern>

<sch:pattern is-a="FIELD">
   <sch:param name="entity" value="Address"/>
   <sch:param name="name" value="Street"/>
   <sch:param name="type" value="xs:string"/>
   <sch:param name="required" value="true" />
</sch:pattern>

<sch:pattern is-a="FIELD">
   <sch:param name="entity" value="Address"/>
   <sch:param name="name" value="Town"/>
   <sch:param name="type" value="xs:string"/>
   <sch:param name="required" value="true" />
</sch:pattern>

<sch:pattern is-a="FIELD">
   <sch:param name="entity" value="Address"/>
   <sch:param name="name" value="Postcode"/>
   <sch:param name="type" value="xs:short"/>
   <sch:param name="required" value="false" />
</sch:pattern>

<sch:pattern is-a="ONE-TO-ONE-RELATION">
   <sch:param name="from" value="Person"/>
   <sch:param name="to" value="Address"/>
</sch:pattern>

<sch:pattern is-a="FIELD">
   <sch:param name="entity" value="Person"/>
   <sch:param name="name" value="Name"/>
   <sch:param name="type" value="xs:string"/>
   <sch:param name="required" value="true" />
</sch:pattern>

How easy is that? And, in particular, in what way is that more complicated than XML Schemas?

This kind of declaration is very declarative, IYSWIM, and very easy to use for other purposes. In fact, it means that using Schematron syntax you can model your information using extensible collections of name-value pairs (can someone say “tuple”?) including metadata that you won’t be using in any assertions. You use Schematron abstract patterns to capture all the data and metadata about some information, then you decide which parts of that information you want to make assertions about. The metadata, being captured, is available in the most convenient form for other XML processes.

Implementation

Here are some simple definitions for the abstract patterns. A practical set of abstract patterns would allow more kinds (cardinalities) of relationships to be expressed, and could have a more elaborate implementation. However, the definitions for the abstract patterns are not necessarily something that developers would be required to keep in mind. They can just fill in the forms for the various kinds of pattern, as above.

The implementation of the abstract patterns might be something like this (there is probably some casting required for strings and names, but this is enough to give the idea):

<sch:pattern id="ENTITY"  abstract="true">
  <sch:rule context="/">
    <sch:assert test="true()">
      (We don't make any assertions about an entity.)
    </sch:assert>
  </sch:rule>
</sch:pattern>

<sch:pattern id="FIELD"  abstract="true">
  <sch:rule context=" $entity ">
    <sch:assert test=" boolean( $required ) = false or $name ">
    A <sch:name /> has a field <sch:value-of select=" $name "/>.
    (Fields are always serialized to XML as subelements.)
    </sch:assert>
  </sch:rule>
</sch:pattern>

<sch:pattern id="ONE-TO-ONE-RELATION"  abstract="true">
  <sch:rule context=" $from ">
    <sch:assert test=" $to or attribute::*[name() = $to ] ">
    There is a one-to-one relation from <sch:name /> to <sch:value-of select=" $to "/>.
    (This may be expressed in XML by using a subelement or by using an ID with the same
    name as the entity pointed to.)
    </sch:assert>

    <sch:assert test="count( $to | attribute::*[name() = $to ]) &lt;= 1 ">
    A one to one relation only allows a single child element or attribute.
    </sch:assert>

    <sch:assert test=" not(attribute::*[name() = $to ]) or
        //*[name() = $to]/attribute::*[name() = $from]
                     = current()/attribute::*[name() = $to ] ">
    If a one-to-one relation is serialized in XML using a link, then there should be an element
    somewhere in the document with the name
    of <sch:value-of select=" $to "/> which has an attribute called
    <sch:value-of select=" $from "/> which has the same value (e.g. an ID)
    as the value of the <sch:value-of select=" $to "/> attribute on the
    <sch:value-of select=" $from "/> element.
   </sch:assert>

  </sch:rule>

</sch:pattern>
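
To make the two serializations concrete, here are two small instance documents that the ONE-TO-ONE-RELATION pattern above is intended to accept (once the string casting mentioned earlier is sorted out). The Directory wrapper element and all the sample values are made up for this illustration; only Person, Address, Name, Street, Town and Postcode come from the example.

<!-- Hypothetical instance 1: the relation is expressed by containment -->
<Directory>
  <Person>
    <Name>Ann Example</Name>
    <Address>
      <Street>1 Sample Street</Street>
      <Town>Sampletown</Town>
      <Postcode>2000</Postcode>
    </Address>
  </Person>
</Directory>

<!-- Hypothetical instance 2: the same information, linked by matching attribute values -->
<Directory>
  <Person Address="a1">
    <Name>Ann Example</Name>
  </Person>
  <Address Person="a1">
    <Street>1 Sample Street</Street>
    <Town>Sampletown</Town>
    <Postcode>2000</Postcode>
  </Address>
</Directory>

In the second form, the Person element carries an Address attribute and the Address element carries a Person attribute with the same value, which is what the third assertion checks.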

As I mentioned before, providing the definitions is a guru/vendor task. Using the abstract patterns is trivial form-filling.
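
To see what a filled-in form turns into, here is roughly what a pre-processor would produce for the Street field, assuming it works by plain textual substitution of each $parameter with the matching param value. This is a sketch of the idea, not the output of any particular tool.

<!-- Sketch: the FIELD form for Address/Street after parameter substitution -->
<sch:pattern>
  <sch:rule context=" Address ">
    <sch:assert test=" boolean( true ) = false or Street ">
      (message text as in the abstract FIELD pattern)
    </sch:assert>
  </sch:rule>
</sch:pattern>

This also shows where the casting mentioned above comes in: after substitution, XPath reads true and false as element names rather than booleans, so the comparison does not test what it appears to. One possible fix, assuming the pre-processor also substitutes inside quoted strings, is to write the test as '$required' = 'false' or $name.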

Declarative = Retargetable!

Also, note that because we have been so declarative, we could even, at a pinch, convert the top-level declarations of our address schema to XSD by a simple transformation.
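
As a rough sketch of such a transformation, the following XSLT 1.0 stylesheet turns the ENTITY and FIELD declarations into XSD element declarations. It assumes the declarations sit in a schema that uses the ISO Schematron namespace bound to the sch prefix. It ignores the ONE-TO-ONE-RELATION declarations entirely, and it picks child elements inside xs:all as the serialization, which is exactly the kind of choice the Schematron version leaves open.

<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:sch="http://purl.oclc.org/dsdl/schematron"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    exclude-result-prefixes="sch">

  <xsl:output method="xml" indent="yes"/>

  <!-- Wrap everything in a single xs:schema -->
  <xsl:template match="/">
    <xs:schema>
      <xsl:apply-templates select="//sch:pattern[@is-a = 'ENTITY']"/>
    </xs:schema>
  </xsl:template>

  <!-- Each ENTITY declaration becomes a global element declaration -->
  <xsl:template match="sch:pattern[@is-a = 'ENTITY']">
    <xsl:variable name="entity" select="sch:param[@name = 'name']/@value"/>
    <xs:element name="{$entity}">
      <xs:complexType>
        <!-- Assumption: fields are child elements, in any order -->
        <xs:all>
          <!-- Collect the FIELD declarations that name this entity -->
          <xsl:for-each select="//sch:pattern[@is-a = 'FIELD']
                                  [sch:param[@name = 'entity']/@value = $entity]">
            <xs:element name="{sch:param[@name = 'name']/@value}"
                        type="{sch:param[@name = 'type']/@value}">
              <!-- Optional fields get minOccurs="0" -->
              <xsl:if test="sch:param[@name = 'required']/@value = 'false'">
                <xsl:attribute name="minOccurs">0</xsl:attribute>
              </xsl:if>
            </xs:element>
          </xsl:for-each>
        </xs:all>
      </xs:complexType>
    </xs:element>
  </xsl:template>

</xsl:stylesheet>

Note that the stylesheet reads only the param name-value pairs and never looks at the abstract pattern definitions, which is the point: the declarations can be retargeted without knowing how they are validated.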

And we can change the serialization strategy for our data just by changing the definitions of the abstract patterns, without touching the declarations for the individual ER components. Want to allow any field to be an attribute? Just change one line:

    <sch:assert test=" boolean( $required ) = false or $name ">

to

    <sch:assert test=" boolean( $required ) = false or $name
            or attribute::*[name() = $name] ">

Schematron allows us to model the serialization strategy independent of its uses, in a way that leaves XSD’s substitution groups in the dust.

There are a few implementations of Schematron which implement abstract patterns, but they are not part of Schematron 1.5. A file called iso-pre-pro.xsl is sometimes used. In the next week or so, I will be loading a pre-processor onto the website for my beta validator implementation of ISO Schematron, at www.schematron.com.