The trouble with schemas
Related link: http://www.tbray.org/ongoing/When/200x/2005/05/03/Replacing-WSDL
I received a good amount of feedback, some off-line, on a thread I started on the XML-DEV mail list about incompatible use of XML Schemas, which carried through on a thread called "Schema Compatability".
The executive summary: many tools that use XML Schemas for purposes other than validation implement only subsets, by necessity or by design, which means that schemas are not always portable. This has a major impact on deployment. Am I over-reacting to think this completely stuffs up the presupposition of Web Services: that all the data provider is responsible for is publishing their data, schema and protocol, and that the recipient will be able to use that information with different vendors' applications? In other words, when the Information Architect of an organization says "We shall use web services and schemas", does he or she also have to say "But don't use anything but the most conservative subset of features, otherwise everyone will have to use the same vendor's tools"? That is clearly a problem, because it means that the current state of the art does not yet let suppliers of data, even within a single organization, publish without regard to the applications the receivers are using.
The W3C people are on to this issue too: they obviously need to get it addressed, not only to prevent Web Services from flopping but also so that the XQuery hypists are not left floundering on a non-interoperable substrate. No-one enjoys floundering on a substrate, believe me. Henry Thompson wrote in to publicize the W3C Workshop on XML Schema 1.0 User Experiences, which seems a really practical thing to get to. Unfortunately, from some of the private email it seems that people are very loath to say the Emperor has no clothes: some for fear of looking an idiot if they are wrong, some because the Emperor clearly has some clothes, and at least one because the incompatibility is in their employer's product, which they obviously cannot criticize in public.
I expect the Irish RIG profiles will include good advice on this issue. The kinds of advice I would expect, for maximum compatibility with applications, are:
- Avoid dynamic typing (i.e. the xsi:type attribute and substitution groups)
- Avoid wildcards
- Avoid element recursion
- And the perennial classic: double-check your schema using some other tools, in particular to detect ambiguous content models.
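As a rough illustration of how an organization might enforce a conservative profile like this mechanically, here is a minimal sketch in Python using only the standard library's xml.etree.ElementTree. It flags wildcards, substitution groups and abstract declarations, and recursion through element references or named complex types. The function check_profile and its warning strings are hypothetical, not part of any RIG profile or existing tool. It assumes a single self-contained schema document (no xs:include or xs:import) and compares names with their prefixes stripped. It makes no attempt to detect ambiguous content models; that remains the job of the other tools the last item recommends.

```python
import xml.etree.ElementTree as ET

# Clark-notation prefix for the XML Schema namespace, as used by ElementTree.
XS = "{http://www.w3.org/2001/XMLSchema}"


def local(qname):
    """Drop the prefix from a QName such as 'tns:Item' (assumes one target namespace)."""
    return qname.split(":")[-1]


def check_profile(xsd_text):
    """Return warnings for features outside a conservative (hypothetical) profile."""
    root = ET.fromstring(xsd_text)
    warnings = []

    # Wildcards: xs:any and xs:anyAttribute anywhere in the schema.
    for tag in ("any", "anyAttribute"):
        for _ in root.iter(XS + tag):
            warnings.append("wildcard: xs:" + tag)

    # Dynamic typing: substitution groups and abstract declarations.
    # xsi:type itself only appears in instance documents, so it is not visible here.
    for node in root.iter():
        if "substitutionGroup" in node.attrib:
            warnings.append("substitution group: " + node.get("name", "?"))
        if node.get("abstract") == "true":
            warnings.append("abstract: " + node.get("name", "?"))

    # Recursion: a dependency graph over global elements and named complex types.
    type_names = {t.get("name") for t in root.findall(XS + "complexType")}

    def deps(component):
        """Named components used anywhere inside one top-level declaration."""
        out = set()
        for d in component.iter():
            if d.tag == XS + "element":
                if "ref" in d.attrib:
                    out.add(("element", local(d.get("ref"))))
                if local(d.get("type", "")) in type_names:
                    out.add(("type", local(d.get("type"))))
            elif d.tag in (XS + "extension", XS + "restriction"):
                if local(d.get("base", "")) in type_names:
                    out.add(("type", local(d.get("base"))))
        return out

    graph = {("element", e.get("name")): deps(e) for e in root.findall(XS + "element")}
    graph.update({("type", t.get("name")): deps(t) for t in root.findall(XS + "complexType")})

    def reaches_itself(start):
        """Depth-first search: can start be reached again from its own dependencies?"""
        stack, seen = list(graph[start]), set()
        while stack:
            n = stack.pop()
            if n == start:
                return True
            if n not in seen:
                seen.add(n)
                stack.extend(graph.get(n, ()))
        return False

    for key in graph:
        if reaches_itself(key):
            warnings.append("recursion: %s %s" % key)
    return warnings


# A small example schema: a recursive <section> with an optional wildcard.
SECTION_SCHEMA = """<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="section">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="title" type="xs:string"/>
        <xs:element ref="section" minOccurs="0" maxOccurs="unbounded"/>
        <xs:any namespace="##other" processContents="lax" minOccurs="0"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>"""

if __name__ == "__main__":
    for warning in check_profile(SECTION_SCHEMA):
        print(warning)
```

Run against the example schema, it reports the wildcard and the recursion through section. A checker like this only says where a schema strays from the chosen subset; whether to forbid, deprecate or simply document each feature is still a policy decision.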
Other posts mentioned what I suspect were merely features that an application did not use: an application not understanding ID uniqueness is probably harmless at this level. But if it does not understand the feature and barfs when presented with a document whose schema has ID uniqueness constraints, that is non-interoperable behaviour.
One of the most interesting responses, on profiles and conformance, came from Gregor.
So what is a poor Information Architect to do, when they want schemas to promote, not prevent, interoperability? The first thing is to be quite militant, and to demand that vendors' implementations support the XML Schema non-core abstractions: type substitution, abstract types, recursion, mixed content, wildcards, type derivation, redefinition, and so on. The second is to check with multiple tools that all important corporate schemas are conforming, non-ambiguous schemas. If a vendor's tools do not support these features, either deprecate the feature organization-wide, or demand your money back from the vendor for flogging a non-conforming product.
Let me back-pedal a little. It is possible that some XML Schemas features are impossible for a given application to use. For example, what should an application that generates database table definitions do when a schema has wildcards? But the place to solve this is in the application, not the schema: the application should have a switch such as "ignore non-required wildcards", so that it will not barf unnecessarily.
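To make the suggested switch concrete, here is a toy Python sketch of a hypothetical table generator. The names create_table, UnsupportedFeature and ignore_optional_wildcards, and the small XSD-to-SQL type mapping, are all assumptions for illustration; the sketch handles only a named complex type whose content is a flat xs:sequence of single, simply typed elements, and refuses anything else.

```python
import xml.etree.ElementTree as ET

XS = "{http://www.w3.org/2001/XMLSchema}"

# Hypothetical mapping from a few XSD built-in types to SQL column types.
SQL_TYPES = {"string": "TEXT", "int": "INTEGER", "decimal": "NUMERIC", "date": "DATE"}


class UnsupportedFeature(Exception):
    """Raised when the generator meets a schema feature it cannot map."""


def create_table(xsd_text, type_name, ignore_optional_wildcards=False):
    """Toy generator: one table per named complex type with a flat xs:sequence."""
    root = ET.fromstring(xsd_text)
    ctype = root.find("%scomplexType[@name='%s']" % (XS, type_name))
    if ctype is None or ctype.find(XS + "sequence") is None:
        raise UnsupportedFeature("no flat sequence for " + type_name)
    columns = []
    for particle in ctype.find(XS + "sequence"):
        optional = particle.get("minOccurs", "1") == "0"
        is_simple_element = (particle.tag == XS + "element"
                             and "type" in particle.attrib
                             and particle.get("maxOccurs", "1") == "1")
        if particle.tag == XS + "any":
            # The switch: an optional wildcard can be skipped without losing
            # anything the schema requires; a required one still fails loudly.
            if optional and ignore_optional_wildcards:
                continue
            raise UnsupportedFeature("wildcard in " + type_name)
        elif is_simple_element:
            xsd_type = particle.get("type").split(":")[-1]
            sql_type = SQL_TYPES.get(xsd_type, "TEXT")
            constraint = "" if optional else " NOT NULL"
            columns.append("%s %s%s" % (particle.get("name"), sql_type, constraint))
        else:
            raise UnsupportedFeature("unsupported particle in " + type_name)
    return "CREATE TABLE %s (%s);" % (type_name, ", ".join(columns))


INVOICE_SCHEMA = """<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:complexType name="Invoice">
    <xs:sequence>
      <xs:element name="id" type="xs:int"/>
      <xs:element name="note" type="xs:string" minOccurs="0"/>
      <xs:any namespace="##other" processContents="lax" minOccurs="0"/>
    </xs:sequence>
  </xs:complexType>
</xs:schema>"""

if __name__ == "__main__":
    print(create_table(INVOICE_SCHEMA, "Invoice", ignore_optional_wildcards=True))
```

With the switch on, the example prints CREATE TABLE Invoice (id INTEGER NOT NULL, note TEXT); and with it off, the same call raises UnsupportedFeature. The design keeps the decision in the application: the schema is left untouched, and a required wildcard still fails, because silently dropping it would lose content the schema says must be there.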
Categories: Web
Comments (3)
Read More Entries by Rick Jelliffe.

resources for further research
For example, at least one system apparently doesn't support recursive declaration of elements. That is not on either of the lists you mention, presumably because it is regarded as such an obviously necessary Good Thing by Docheads that it wouldn't enter our minds to state concerns about it. This is an area where the proof of the pudding can be in the eating.
resources for further research
They are both interesting. But are they based on real problems encountered with tools?
I think there is a big difference between saying "This feature is syntactic sugar that may be confusing" or "This feature doesn't do what you may want it to do" and saying "This feature is not supported in some major tools". The former two can be waved away with "Well, you just need to learn how to use it" or "The tools will shield you", and argued about. The latter is less deniable.
resources for further research
For those considering the development of a profile of XSD that uses certain features and avoids others, check out the UBL Schema subcommittee's Risk Analysis of XML Schema Features and Kohsuke Kawaguchi's W3C XML Schema: DOs and DONT's.