Sean McGrath’s “Master Foo On Structured Documents” makes a similar point to my “Standardize the jellybeans not the jars”, and is worth a read.
However, there is one big problem with open content models and generic containers: many automated XML tools use only schema information, not instance information, to do their stuff. This is a problem I am facing right at the moment, actually: a customer wants to use a Brand X tool, which lets you map controls on a form to elements in a schema, but also wants to use an industry-standard schema that uses data values, rather than element names, to distinguish one kind of information from another.
For example, the tool would like a document like this:
<Customer>
  ...
  <homephone>1234</homephone>
  <businessphone>1324</businessphone>
  <fax>123</fax>
  ...
</Customer>
but the industry standard has
<Party>
  <Person>
    <PersonTypeCode tc="1">Customer</PersonTypeCode>
    ...
    <Phone>
      <PhoneTypeCode tc="1">Home</PhoneTypeCode>
      <DialNumber>1234</DialNumber>
    </Phone>
    <Phone>
      <PhoneTypeCode tc="2">Business</PhoneTypeCode>
      <DialNumber>1234</DialNumber>
    </Phone>
    <Phone>
      <PhoneTypeCode tc="12">Fax</PhoneTypeCode>
      <DialNumber>1234</DialNumber>
    </Phone>
  </Person>
</Party>
In the first case, the XPath to the fax number is
//Customer/fax
In the second case, the XPath is
//Party/Person[PersonTypeCode='Customer']/Phone[PhoneTypeCode/@tc='12']/DialNumber
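As a minimal sketch of the difference, assuming Python's standard xml.etree.ElementTree module (the function and constant names are hypothetical), here are both lookups run against the two sample documents, with the "..." elisions dropped so that they parse. ElementTree implements only a subset of XPath: it can test a child's text, as in [PersonTypeCode='Customer'], but not an attribute on a child, as in [PhoneTypeCode/@tc='12'], so that last test is done in ordinary code. Either way, the second lookup has to inspect data values in the instance; nothing in the schema alone says which Phone is the fax.

```python
import xml.etree.ElementTree as ET

# Tool-friendly shape: each field is identified by its element name.
TOOL_DOC = """<Customer>
  <homephone>1234</homephone>
  <businessphone>1324</businessphone>
  <fax>123</fax>
</Customer>"""

# Industry-standard shape: each field is identified by data values (tc codes).
STANDARD_DOC = """<Party>
  <Person>
    <PersonTypeCode tc="1">Customer</PersonTypeCode>
    <Phone>
      <PhoneTypeCode tc="1">Home</PhoneTypeCode>
      <DialNumber>1234</DialNumber>
    </Phone>
    <Phone>
      <PhoneTypeCode tc="2">Business</PhoneTypeCode>
      <DialNumber>1234</DialNumber>
    </Phone>
    <Phone>
      <PhoneTypeCode tc="12">Fax</PhoneTypeCode>
      <DialNumber>1234</DialNumber>
    </Phone>
  </Person>
</Party>"""


def fax_from_tool_doc(xml_text):
    """Same result as //Customer/fax: a fixed path, knowable from the schema."""
    root = ET.fromstring(xml_text)  # root is <Customer>
    return root.findtext("fax")


def fax_from_standard_doc(xml_text):
    """Same result as
    //Party/Person[PersonTypeCode='Customer']/Phone[PhoneTypeCode/@tc='12']/DialNumber
    """
    root = ET.fromstring(xml_text)  # root is <Party>
    # ElementTree supports the [child='text'] predicate...
    for phone in root.findall("Person[PersonTypeCode='Customer']/Phone"):
        # ...but not an attribute test on a child, so check tc by hand.
        code = phone.find("PhoneTypeCode")
        if code is not None and code.get("tc") == "12":
            return phone.findtext("DialNumber")
    return None


print(fax_from_tool_doc(TOOL_DOC))          # 123
print(fax_from_standard_doc(STANDARD_DOC))  # 1234
```

A full XPath 1.0 engine could evaluate the original expression directly; the point is that a form designer working from the schema sees a simple fax element in the first case, and only a repeatable Phone element in the second.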
This kind of issue is a common problem, and the answer is almost always either to forgo the graphical tools (sometimes the application’s backend can handle more complicated XPaths than the IDE GUI can) or to transform the data in and out so that the application works with data in an optimal form (which requires having a customized schema for the particular application or class of application). In many cases, it seems that the large standard schemas are either “jack of all trades but master of none” or that they really are designed for neutral data interchange and adoptees should expect to have to do some information-preserving transforms in and out.
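The transform-in-and-out approach can be sketched the same way. This is a minimal, hypothetical Python mapping (again using only ElementTree) between the two shapes, keyed on the tc codes that appear in the example (1 Home, 2 Business, 12 Fax). A real industry schema has many more codes, and everything elided by "..." would need mapping too. To keep the transform information-preserving, it refuses a phone type it has no element for rather than silently dropping it, and it regenerates the codes and labels on the way back out.

```python
import xml.etree.ElementTree as ET

# Hypothetical mapping table: (tc code, label in the standard, element in the tool schema).
# The codes and labels are the ones used in the example documents.
PHONE_TYPES = [
    ("1", "Home", "homephone"),
    ("2", "Business", "businessphone"),
    ("12", "Fax", "fax"),
]


def standard_to_tool(party):
    """Transform in: Party/Person[PersonTypeCode='Customer'] -> <Customer>."""
    customer = ET.Element("Customer")
    person = party.find("Person[PersonTypeCode='Customer']")
    if person is None:
        return customer
    element_for_code = {code: name for code, _label, name in PHONE_TYPES}
    for phone in person.findall("Phone"):
        code = phone.find("PhoneTypeCode").get("tc")
        name = element_for_code.get(code)
        if name is None:
            # Refuse rather than silently lose data the tool schema cannot hold.
            raise ValueError(f"no tool element for PhoneTypeCode tc={code!r}")
        ET.SubElement(customer, name).text = phone.findtext("DialNumber")
    return customer


def tool_to_standard(customer):
    """Transform out: <Customer> -> Party/Person, regenerating codes and labels."""
    party = ET.Element("Party")
    person = ET.SubElement(party, "Person")
    ET.SubElement(person, "PersonTypeCode", tc="1").text = "Customer"
    for code, label, name in PHONE_TYPES:
        for number in customer.findall(name):  # keep repeats, e.g. two home phones
            phone = ET.SubElement(person, "Phone")
            ET.SubElement(phone, "PhoneTypeCode", tc=code).text = label
            ET.SubElement(phone, "DialNumber").text = number.text
    return party


SAMPLE = (
    '<Party><Person><PersonTypeCode tc="1">Customer</PersonTypeCode>'
    '<Phone><PhoneTypeCode tc="1">Home</PhoneTypeCode><DialNumber>1234</DialNumber></Phone>'
    '<Phone><PhoneTypeCode tc="2">Business</PhoneTypeCode><DialNumber>1234</DialNumber></Phone>'
    '<Phone><PhoneTypeCode tc="12">Fax</PhoneTypeCode><DialNumber>1234</DialNumber></Phone>'
    '</Person></Party>'
)

customer = standard_to_tool(ET.fromstring(SAMPLE))
print(ET.tostring(customer, encoding="unicode"))
# <Customer><homephone>1234</homephone><businessphone>1234</businessphone><fax>1234</fax></Customer>
print(ET.tostring(tool_to_standard(customer), encoding="unicode") == SAMPLE)  # True
```

Driving both directions from one table keeps the mapping in a single place, which makes it easier to check that nothing falls through the cracks. Note that the sketch regroups phones by type on the way out, so document order only survives when the input is already in table order, as it is here.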
Either way, Sean’s blog is in the ballpark.


"...that they really are designed for neutral data interchange and adoptees should expect to have to do some information-preserving transforms in and out."
This is a pretty accurate statement, as what may be optimal for one system isn't necessarily optimal for another. Unless systems and tools support the industry standard out of the box, there is always going to be some mapping and transformation to do.
It's sort of fun to see articles like Sean's blog post. For so long people put so much faith in XML's capabilities, but the original SGML crew was so small that a lot of sea lessons didn't make it into the distro. This comes down to two important points:
1. Data handling requires rules or processes. Think operation sets over data sets. These might be named, they might be *identified*, they might be ad-hoc or real time, but in the end, a fully declarative structured system of names, identifiers and chunks-o-data will fail without local processing and global guidance.
2. The best way to reuse information is to ensure that the lowest level of granularity is truly atomic: no pointers out. Hypermedia as a data model tends to screw that over by putting links inline. This was the original objection to HTML and it still stands. SGML couldn't fix that. HyTime could. But it was, of course, too hard and not HTML. If we have to assign blame, let's just be sure we are blaming the right resources.
And so it goes. Each generation can only learn so much and can only pass on significantly less.
Sean's blog post is great. This needs to be understood much better.
Meanwhile, it is knights that move like that (appealing to the controlled vocabulary of the game of chess), even though your phrasing is somehow more poetic.
There needs to be a koan somewhere about information preservation. It is what our computers do best, of course: preserve our information without understanding it at all. When we attempt otherwise, there is usually a mess to unravel.
And knowing what transformation preserves that which is important in a given situation is a matter for sapience beyond the computer's purview, it would seem. I bet there's a riff on document formats and what they embody in here somewhere.
Have a joyful and satisfying holiday.
orcmid: Doh, thanks.