Eagerness and laziness just don’t mix.

There are basically two kinds of XML Schema applications: one kind relies on having all the various schema documents present, the other kind dynamically locates and loads schemas for namespaces as elements or attributes with that namespace are found in a document, lazily. Examples of the first kind include XML IDEs and XML Databinding tools; examples of the second kind include most server-based library tools, such as Apache Xerces and MSXML.

The reason is obvious: speed. An IDE needs to have all the information at the user’s fingertips. A server-based validator needs to avoid loading spurious schemas for namespaces that are possible but not actually used in the instance.

But the two approaches are incompatible. The trouble comes whenever you have standard envelope elements, or multi-vocabulary documents that can start in any of several vocabularies. The IDE kind of application resolves all the imports in the schemas eagerly; the server kind of application reads the declarations in a schema, including the import elements, lazily.

XML Schemas allows both behaviours. So you can have a set of schemas that your IDE says is complete and OK, and which validates your sample documents, and then pass the same schemas to a server validator and get reports that certain schemas are not available. HUH, BUT I CAN SEE THE IMPORT STATEMENTS?? (I’ll give a little example later, to help clarify it.)

So what can you do? For a start, if you create documents using an application that has all schemas loaded, you need to be aware that there is a good chance (if you have multiple schemas, standard envelopes, etc.) that your schemas will not run successfully in applications of the other kind. There are a couple of remedies: importing everything from everywhere is one ugly one; having facade schemas is another. But you should probably think in terms of “document type”, just like the old days of DTDs: base the document type on the namespace of the top-level element, and if several top-level namespaces are possible, make up a separate schema for each (invoking the common components directly).

It’s ugly, but there is no way round it in XSD. It’s an implication of XSD’s Schema Document Location Strategy.

For example, let’s say I have two namespaces (ns1 and ns2), each namespace has a single element (e1 and e2), and we have a corresponding schema document for each namespace (s1 and s2), and a sample document:

<ns1:e1  xmlns:ns1="...">
    <ns2:e2  xmlns:ns2="..."/>
</ns1:e1>

So now let’s say that schema s1 and element e1 are some kind of standard enveloping vocabulary. You downloaded the schemas off the net, say. Or your XML King has decreed s1 to be the standard container.

So you put it all together in your schema IDE, and you add an import statement into s2, which is the schema you control; so s2 imports s1. And you validate in your IDE, and it all works fine, and you begin to see light at the end of the tunnel. You go out to your favourite Taiwanese restaurant and have some beef noodle soup to celebrate, perhaps.
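To make the setup concrete, here is a minimal sketch of what the two schema documents might look like. The namespace URIs (urn:example:ns1, urn:example:ns2), the file names s1.xsd and s2.xsd, and the wildcard content model for e1 are all assumptions made up for illustration; the real envelope schema could be written quite differently.

```xml
<!-- s1.xsd: the envelope vocabulary you do not control (hypothetical content) -->
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           targetNamespace="urn:example:ns1"
           elementFormDefault="qualified">
  <xs:element name="e1">
    <xs:complexType>
      <xs:sequence>
        <!-- the envelope accepts one payload element from any other namespace -->
        <xs:any namespace="##other" processContents="strict"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
```

```xml
<!-- s2.xsd: the schema you control (hypothetical content) -->
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           targetNamespace="urn:example:ns2"
           elementFormDefault="qualified">
  <!-- the import added in the IDE: s2 points at s1 -->
  <xs:import namespace="urn:example:ns1" schemaLocation="s1.xsd"/>
  <xs:element name="e2"/>
</xs:schema>
```

An eager tool that is handed s2.xsd follows the import straight away and so ends up knowing about both namespaces.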

Next day, you return, fresh and happy and try to load your schema into software that uses Xerces (such as most Topologi software) or MSXML (such as all the rest of Topologi software) and validation fails with an error. You check your paths, all OK. You check well-formedness, all OK. You check that your validation software indeed does have the right schema location for s2, all OK. You check with a different validator, and it still fails. You begin to suspect you might have been better off with RELAX NG. But you need to press on.

What is happening? Well, the validator starts off knowing that there is a schema for ns2 at document s2. But it has not loaded s2 yet, so it does not know (from the import statement) that there is a schema for ns1 at document s1. So the validator starts validating our document, and comes to the top element: it is in namespace ns1, and the validator says “I don’t have a schema for that namespace” and (since we are using strict validation) fails with an error.

If you had been able to supply the s1 schema for ns1, all would have been well, but you don’t want to fiddle with it.

(At this point, you will try various other things, such as adding extra namespace/schema pairs to the schemaLocation attribute on the document or to the API, and you may find your validation library has a bug that doesn’t accept multiple pairs. Yikes. But it doesn’t really matter: the goose is cooked.)
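For reference, a schemaLocation hint carrying more than one namespace/schema pair looks roughly like this on the sample document. The namespace URIs and file names are the same made-up placeholders as before, standing in for whatever your real ones are.

```xml
<!-- instance document with two namespace/location pairs (hypothetical URIs and paths) -->
<ns1:e1 xmlns:ns1="urn:example:ns1"
        xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
        xsi:schemaLocation="urn:example:ns1 s1.xsd
                            urn:example:ns2 s2.xsd">
  <ns2:e2 xmlns:ns2="urn:example:ns2"/>
</ns1:e1>
```

The attribute value is a whitespace-separated list that alternates namespace name and schema location, and a processor is allowed to treat it as no more than a hint.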

So you have to go back to your IDE and add extra import statements. Or create a little facade schema that includes the schemas for the top-level element and imports whatever else is needed. And finally it works, and you are left trying to figure out whether to curse the IDE, for not making the imports that lazy tools need, or the lazy tools, for not even looking a little bit inside schemas, or XSD for being so giddily liberal in this area, or yourself for having to work with these stupid things.
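A facade schema for the example might be sketched like this, assuming the same placeholder namespace URIs and file names as above (the name facade.xsd is likewise invented). It takes the namespace of the top-level element as its own target namespace, so it is the document you hand to a lazy validator as the schema for ns1.

```xml
<!-- facade.xsd: entry point keyed on the top-level element's namespace (hypothetical) -->
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           targetNamespace="urn:example:ns1">
  <!-- same namespace as the facade, so s1 is included rather than imported -->
  <xs:include schemaLocation="s1.xsd"/>
  <!-- different namespace, so s2 is imported -->
  <xs:import namespace="urn:example:ns2" schemaLocation="s2.xsd"/>
</xs:schema>
```

The envelope schema s1 stays untouched; only the facade knows how the pieces fit together for this particular document type.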