Let's imagine that we are transitioning to a “Document Engineering” style of architecture, so that we can model our entire business using old but not-as-outmoded-as-we-first-thought Data Flow Diagrams. At each data flow we need to ask the Exception Question: does an exceptional document need human intervention, or can it be dealt with automatically? Indeed, the expected answer to this question is probably what distinguishes the document community from the database community: the docheads would expect exceptions to be dealt with by humans who can monitor, fix and reset the production flow at all stages; the dataheads would expect exceptions to be dealt with by automated processes, since human involvement is at the input/output periphery of systems.

Obviously, the most “exceptional” kind of document is the invalid-against-a-schema document. However, Schematron allows a much milder (or tougher, depending on how one looks at it) bar: the presence or absence of any arbitrary pattern in the document can allow it to be marked as exceptional. (Schematron not only defines valid/invalid; it also allows complex dynamic diagnostic messages, and it allows various flags to be set by assertions that fail.)
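To make the idea concrete, here is a minimal sketch in Python of the same shape of thing: a rule selects a context, asserts a pattern, and when the assertion fails it produces a message in domain terms and raises a flag that a data flow could route on. This is not Schematron itself (which is written in XML and uses XPath); the `invoice`, `approver` and `currency` elements, the flag names and the routing targets are all hypothetical, invented only for illustration.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import Callable


@dataclass
class Rule:
    context: str                          # ElementTree path selecting elements to check
    test: Callable[[ET.Element], bool]    # the assertion: True means the pattern holds
    message: Callable[[ET.Element], str]  # dynamic diagnostic, phrased in domain terms
    flag: str                             # flag raised when the assertion fails


@dataclass
class Finding:
    flag: str
    message: str


# Hypothetical business rules; none of these names come from a real schema.
RULES = [
    Rule(
        context=".//invoice",
        test=lambda inv: inv.find("approver") is not None,
        message=lambda inv: f"Invoice {inv.get('id')} has no approver.",
        flag="needs-human",
    ),
    Rule(
        context=".//invoice",
        test=lambda inv: inv.findtext("currency", "").strip() != "",
        message=lambda inv: f"Invoice {inv.get('id')} has no currency.",
        flag="auto-fixable",
    ),
]


def check(xml_text, rules=RULES):
    """Return one Finding for every failed assertion in the document."""
    root = ET.fromstring(xml_text)
    findings = []
    for rule in rules:
        for element in root.iterfind(rule.context):
            if not rule.test(element):
                findings.append(Finding(rule.flag, rule.message(element)))
    return findings


def route(findings):
    """Answer the Exception Question from the flags that were raised."""
    flags = {f.flag for f in findings}
    if not flags:
        return "continue"
    if "needs-human" in flags:
        return "human-review"
    return "automated-repair"


SAMPLE = """<batch>
  <invoice id="A1"><approver>kim</approver><currency>AUD</currency></invoice>
  <invoice id="A2"><currency>AUD</currency></invoice>
  <invoice id="A3"><approver>lee</approver></invoice>
</batch>"""
```

With the sample batch, `check(SAMPLE)` reports the missing approver on invoice A2 and the missing currency on invoice A3, and `route(...)` sends the batch to `"human-review"` because one of the raised flags asks for a person. The point of the design is that the flag and the message are chosen by whoever wrote the rule, who knows the problem domain, rather than by whoever wrote the validator.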

So the Exception Question then becomes a criterion for evaluating schema or constraint languages: when exceptional documents are to be sent to humans for intervention, does schema language A provide clear enough information to be usable by those humans? Similarly, when exceptional documents are to be sent to software (services) for intervention, does schema language A provide clear enough information to be usable by that software? Looked at in those terms, grammar-based systems do not shine. Grammar-based systems excel in all-or-nothing Great-Wall-of-China exclusion uses, but then leave the users (systems and humans) at the mercy of the validator developer, who of course has absolutely no idea of the problem domain of the schema, for the kinds of feedback and information possible. XSD is perhaps a little more organized in this regard than the other schema languages, because it defines a specific list of outcomes that can be found in the notional Post-Schema-Validation Infoset (PSVI) after validation.

But the trouble is that, whether for humans or systems, the more that problems are diagnosed in generic terms (i.e. in terms of the markup) rather than in domain terms (i.e. in terms of the intended patterns, or dare I say semantics), the less chance that the diagnostic can serve any practical purpose for downstream systems. Notoriously, this is true for systems that “hide the markup” from the user: the grammatical errors are unavailable and incomprehensible to the users. Grammars have shown themselves over the last 20 years to make programmers more productive but to stupefy end-users: the traceability issue I raised this week on XML-DEV in response to one of Roger Costello’s excellent fishing expeditions is another head of the same Hydra.

(I originally wrote this as a response to a comment on my little blog item of a couple of days ago, “Why Schematron (or something like it) will win”. The thought in my mind then was that questions like the Exception Question show that once you move beyond yes/no validation and into the world of providing diagnostics (whether for humans or computers), you need to be concerned with a whole different set of information from what is provided by the current batch of grammar-based systems.)