Sean McGrath’s “Master Foo On Structured Documents” makes a similar point to my “Standardize the jellybeans not the jars”, and is worth a read.

However, there is one big problem with open content models and generic containers: many automated XML tools work only from schema information, not instance information. This is a problem I am facing right now, actually: a customer wants to use Brand X tool, which lets you map from controls on a form to elements in a schema, but also wants to use an industry-standard schema which distinguishes kinds of data by data values rather than by element names.

For example, the tool would like a document like this:

    <Customer>
        ...
        <homephone>1234</homephone>
        <businessphone>1324</businessphone>
        <fax>123</fax>
        ...
    </Customer>

but the industry standard has

    <Party>
        <Person>
            <PersonTypeCode tc="1">Customer</PersonTypeCode>
            ...
            <Phone>
                <PhoneTypeCode tc="1">Home</PhoneTypeCode>
                <DialNumber>1234</DialNumber>
            </Phone>
            <Phone>
                <PhoneTypeCode tc="2">Business</PhoneTypeCode>
                <DialNumber>1234</DialNumber>
            </Phone>
            <Phone>
                <PhoneTypeCode tc="12">Fax</PhoneTypeCode>
                <DialNumber>1234</DialNumber>
            </Phone>
        </Person>
    </Party>

In the first case the XPath to the fax number is
//Customer/fax

In the second case the XPath is
//Party/Person[PersonTypeCode='Customer']/Phone[PhoneTypeCode/@tc="12"]/DialNumber
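To make the difference concrete, here is a minimal Python sketch that pulls the fax number out of both document shapes. It uses only the standard library's `xml.etree.ElementTree`, whose XPath subset does not cover a predicate like `[PhoneTypeCode/@tc="12"]`, so the type-code test is done in plain Python instead. The function names `fax_from_flat` and `fax_from_standard` are made up for illustration, and the elided `...` content of the samples is simply left out.

```python
import xml.etree.ElementTree as ET

# The two sample shapes from above, with the elided "..." parts omitted.
FLAT = """
<Customer>
  <homephone>1234</homephone>
  <businessphone>1324</businessphone>
  <fax>123</fax>
</Customer>
"""

STANDARD = """
<Party>
  <Person>
    <PersonTypeCode tc="1">Customer</PersonTypeCode>
    <Phone>
      <PhoneTypeCode tc="1">Home</PhoneTypeCode>
      <DialNumber>1234</DialNumber>
    </Phone>
    <Phone>
      <PhoneTypeCode tc="2">Business</PhoneTypeCode>
      <DialNumber>1234</DialNumber>
    </Phone>
    <Phone>
      <PhoneTypeCode tc="12">Fax</PhoneTypeCode>
      <DialNumber>1234</DialNumber>
    </Phone>
  </Person>
</Party>
"""


def fax_from_flat(xml_text):
    """Flat shape: the element name alone identifies the fax number."""
    customer = ET.fromstring(xml_text)  # root element is <Customer>
    return customer.findtext("fax")


def fax_from_standard(xml_text):
    """Standard shape: the fax number is identified by a data value (tc="12")."""
    party = ET.fromstring(xml_text)  # root element is <Party>
    # ElementTree does support the [child='text'] predicate form.
    for person in party.findall("Person[PersonTypeCode='Customer']"):
        for phone in person.findall("Phone"):
            code = phone.find("PhoneTypeCode")
            # The attribute test has to be done by hand here.
            if code is not None and code.get("tc") == "12":
                return phone.findtext("DialNumber")
    return None
```

The flat lookup needs nothing beyond the element names a schema declares, while the standard lookup has to inspect values in the instance, which is exactly the information a schema-only mapping tool never sees.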

This kind of issue is common, and the answer is almost always one of two things: either forgo the graphical tools (sometimes the application’s backend can handle more complicated XPaths than the IDE GUI can), or transform the data in and out so that the application works with data in an optimal form (which requires a customized schema for the particular application or class of applications). In many cases, it seems that the large standard schemas are either “jack of all trades but master of none” or really are designed for neutral data interchange, in which case adopters should expect to do some information-preserving transforms in and out.
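As a rough sketch of the "transform in and out" option, the Python below converts the phone entries of the standard `Party` shape into the flat `Customer` shape and back again. It is an illustration under stated assumptions, not a real mapping: the function names and lookup tables are hypothetical, only the three type codes that appear in the sample (1, 2 and 12) are handled, and everything elided with `...` in the samples is ignored. In practice this job would often be done in XSLT; Python is used here only to keep the example self-contained.

```python
import xml.etree.ElementTree as ET

# Hypothetical lookup tables built from the three codes in the sample above.
TC_TO_FLAT = {"1": "homephone", "2": "businessphone", "12": "fax"}
FLAT_TO_TC = {name: tc for tc, name in TC_TO_FLAT.items()}
TC_LABEL = {"1": "Home", "2": "Business", "12": "Fax"}


def standard_to_flat(party):
    """Transform in: <Party> element to the <Customer> shape the tool wants."""
    person = party.find("Person[PersonTypeCode='Customer']")
    customer = ET.Element("Customer")
    for phone in person.findall("Phone"):
        tc = phone.find("PhoneTypeCode").get("tc")
        # The data value (tc) becomes an element name.
        ET.SubElement(customer, TC_TO_FLAT[tc]).text = phone.findtext("DialNumber")
    return customer


def flat_to_standard(customer):
    """Transform out: <Customer> element back to the standard <Party> shape."""
    party = ET.Element("Party")
    person = ET.SubElement(party, "Person")
    ET.SubElement(person, "PersonTypeCode", tc="1").text = "Customer"
    for child in customer:
        tc = FLAT_TO_TC[child.tag]
        phone = ET.SubElement(person, "Phone")
        # The element name becomes a data value (tc) again.
        ET.SubElement(phone, "PhoneTypeCode", tc=tc).text = TC_LABEL[tc]
        ET.SubElement(phone, "DialNumber").text = child.text
    return party
```

For these three phone types the round trip preserves the information, because each type code maps to exactly one flat element name. A code with no flat counterpart would raise a `KeyError` here, which is the point at which a real application-specific schema would need to be extended.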

Either way, Sean’s blog is in the ballpark.