October 2007 Archives

Rick Jelliffe

XSD allows you to derive your own simple datatypes by restricting the lexical space or the value space of the type. The rule about derivation by restriction is that everything that is valid against the derived type is also valid against the base type.

And this gives us our method. Remember from the previous post in this series that we implement a built-in datatype like this:

   <sch:rule context="imametadataman">
     <sch:extends rule="xsd-datatype-byte"/>
   </sch:rule>

If we want to say that imametadataman should have a facet of minExclusive of 32, we just implement the facet restriction by adding an assertion:

   <sch:rule context="imametadataman">
     <sch:extends rule="xsd-datatype-byte"/>
     <sch:assert test=". > 32">The value for <sch:name/> should be greater than 32.</sch:assert>
   </sch:rule>

Type derivation by restriction can be directly implemented by Schematron abstract rules. There is a mismatch in terminology: we restrict the type (in the XSD) by extending the constraints (in Schematron).

Here is some code to give the flavour of how easy it is to handle each type. (The assertion text needs work, and there is lots of scope for beautification, but you should get the idea. )

	<xsl:when test="xs:simpleType/xs:restriction[@base]">
			<xsl:variable name="baseon" select="xs:simpleType/xs:restriction/@base"/>
			<sch:rule>
				<xsl:choose>
					<xsl:when test="self::xs:attribute[parent::xs:schema]">
						<xsl:attribute name="abstract">true</xsl:attribute>
						<xsl:attribute name="id">
							<xsl:choose>
								<!-- attribute has no namespace -->
								<xsl:when test="ancestor::namespace/@uri=''">
									<xsl:value-of select="concat('global_', @name)"/>
								</xsl:when>
								<!-- attribute has namespace (normal case) -->
								<xsl:otherwise>
									<xsl:value-of select="concat('global_', ancestor::namespace/@prefix, '_', @name)"/>
								</xsl:otherwise>
							</xsl:choose>
						</xsl:attribute>
					</xsl:when>
					<xsl:otherwise>
						<xsl:choose>
							<xsl:when test="self::xs:element">
								<xsl:call-template name="generate-element-context"/>
							</xsl:when>
							<xsl:otherwise>
								<xsl:call-template name="generate-attribute-context"/>
							</xsl:otherwise>
						</xsl:choose>
					</xsl:otherwise>
				</xsl:choose>
				<!-- get base value -->
				<!--  FIX THIS: should use namespace URI not prefix! -->
				<xsl:choose>
					<xsl:when test="starts-with($baseon,'xs:') or
									starts-with($baseon,'xsd:') or
									starts-with($baseon,'xsi:')">
						<sch:extends rule="{concat(ancestor::namespace/@prefix, '-xsd-datatype-', substring-after($baseon, ':'))}"/>
					</xsl:when>
					<xsl:when test="contains($baseon,':')">
						<xsl:variable name="prefix"
							select="substring-before($baseon, ':')"/>
						<xsl:variable name="typename"
							select="substring-after($baseon, ':')"/>
						<sch:extends rule="{concat($prefix, '_', $typename)}"/>
					</xsl:when>
					<xsl:otherwise>
						<sch:extends rule="{concat(ancestor::namespace/@prefix, '_', $baseon)}"/>
					</xsl:otherwise>
				</xsl:choose>
				<!-- check the underneath of restriction -->
				<xsl:if test="xs:simpleType/xs:restriction/xs:enumeration">
					<sch:assert>
						<!-- build a test of the form (. = "a") or (. = "b") ... -->
						<xsl:attribute name="test">
							<xsl:for-each select="xs:simpleType/xs:restriction/xs:enumeration">
								<xsl:text>(. = "</xsl:text>
								<xsl:value-of select="normalize-space(@value)"/>
								<xsl:text>")</xsl:text>
								<xsl:if test="following-sibling::xs:enumeration">
									<xsl:text> or </xsl:text>
								</xsl:if>
							</xsl:for-each>
						</xsl:attribute> The value of <sch:name/> should be one of
						<xsl:for-each select="xs:simpleType/xs:restriction/xs:enumeration">
							<xsl:value-of select="@value"/>
							<xsl:if test="following-sibling::xs:enumeration">
								<xsl:text>, </xsl:text>
							</xsl:if>
						</xsl:for-each>. (It is of type "
						<xsl:value-of select="normalize-space(@name)"/>".)
					</sch:assert>
				</xsl:if>
				<xsl:if test="xs:simpleType/xs:restriction/xs:minLength">
					<!-- the braces are an attribute value template: the facet value is copied into the generated test -->
					<sch:assert test="string-length(.) >= {xs:simpleType/xs:restriction/xs:minLength/@value}"> A
						simpleType(
						<xsl:value-of select="@name"/>)'s value must be no shorter than
						<xsl:value-of select="xs:simpleType/xs:restriction/xs:minLength/@value"/> </sch:assert>
				</xsl:if>
				<xsl:if test="xs:simpleType/xs:restriction/xs:maxLength">
					<sch:assert test="string-length(.) &lt;= {xs:simpleType/xs:restriction/xs:maxLength/@value}"> A
						simpleType(
						<xsl:value-of select="@name"/>)'s value must be no longer than
						<xsl:value-of select="xs:simpleType/xs:restriction/xs:maxLength/@value"/> </sch:assert>
				</xsl:if>
				<xsl:if test="xs:simpleType/xs:restriction/xs:length">
					<sch:assert test="string-length(.) = {xs:simpleType/xs:restriction/xs:length/@value}"> The length of
						this simpleType(
						<xsl:value-of select="@name"/>)'s value must be
						<xsl:value-of select="xs:simpleType/xs:restriction/xs:length/@value"/> </sch:assert>
				</xsl:if>
				<xsl:if test="xs:simpleType/xs:restriction/xs:whiteSpace">
					<sch:assert test="true()"> WhiteSpace would be treated as 'preserve',
						'replace' or 'collapse' </sch:assert>
				</xsl:if>
				<xsl:if test="xs:simpleType/xs:restriction/xs:totalDigits">
					<xsl:comment>Only digits are counted: the dot and sign are stripped. Leading and trailing zeros are not yet discounted.</xsl:comment>
					<sch:assert test="string-length(replace(string(.), '[^0-9]', '')) &lt;= {xs:simpleType/xs:restriction/xs:totalDigits/@value}"> The number of digits for <sch:name/>
						should be no more than <xsl:value-of select="xs:simpleType/xs:restriction/xs:totalDigits/@value"/> </sch:assert>
				</xsl:if>
				<xsl:if test="xs:simpleType/xs:restriction/xs:minExclusive">
					<sch:assert test=". > {xs:simpleType/xs:restriction/xs:minExclusive/@value}"> The value for <sch:name/> should be
						bigger than <xsl:value-of select="xs:simpleType/xs:restriction/xs:minExclusive/@value"/> </sch:assert>
				</xsl:if>
				<xsl:if test="xs:simpleType/xs:restriction/xs:minInclusive">
					<sch:assert test=". >= {xs:simpleType/xs:restriction/xs:minInclusive/@value}"> The value for <sch:name/> should be
						bigger than or equal to <xsl:value-of select="xs:simpleType/xs:restriction/xs:minInclusive/@value"/> </sch:assert>
				</xsl:if>
				<xsl:if test="xs:simpleType/xs:restriction/xs:maxExclusive">
					<sch:assert test=". &lt; {xs:simpleType/xs:restriction/xs:maxExclusive/@value}"> The value for <sch:name/> should be
						smaller than <xsl:value-of select="xs:simpleType/xs:restriction/xs:maxExclusive/@value"/> </sch:assert>
				</xsl:if>
				<xsl:if test="xs:simpleType/xs:restriction/xs:maxInclusive">
					<sch:assert test=". &lt;= {xs:simpleType/xs:restriction/xs:maxInclusive/@value}"> The value for <sch:name/> should be
						smaller than or equal to <xsl:value-of select="xs:simpleType/xs:restriction/xs:maxInclusive/@value"/> </sch:assert>
				</xsl:if>
				<xsl:if test="xs:simpleType/xs:restriction/xs:pattern">
					<xsl:comment>This assertion checks xs:pattern. There may be more than one xs:pattern; the value is valid when any one of them matches.</xsl:comment>
					<xsl:variable name="testString">
						<xsl:for-each select="xs:simpleType/xs:restriction/xs:pattern">
							<xsl:variable name="apost" select='"&apos;"'/>
							<xsl:value-of select="concat('matches(.,', $apost,@value,$apost,')')"/>
							<xsl:if test="position() != last()"> or </xsl:if>
						</xsl:for-each>
					</xsl:variable>
					<sch:assert>
						<xsl:attribute name="test">
							<xsl:value-of select="$testString"/>
						</xsl:attribute> The value for <sch:name/> should match
						<xsl:choose>
							<xsl:when test="count(xs:simpleType/xs:restriction/xs:pattern) = 1">
								the pattern:
							</xsl:when>
							<xsl:otherwise>
								one of patterns:
							</xsl:otherwise>
						</xsl:choose>
						<xsl:for-each select="xs:simpleType/xs:restriction/xs:pattern">
							<!-- HACK: This is strange to make span into a list value, but better than nothing -->
							<sch:span class="li"><xsl:value-of select="@value"/></sch:span>
						</xsl:for-each>
					</sch:assert>
				</xsl:if>
			</sch:rule>
		</xsl:when>

We are not implementing simple type derivation by union or list at the moment, because it is outside our primary requirements. I expect derivation by list would benefit from XPath2’s extra power. Derivation by union needs more thought.
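To give a feel for how derivation by list might look with XPath2, here is a rough sketch only; it is not part of the converter. It assumes a list whose item type is xs:integer. The rule id xsd-list-of-integer and the element name imalistman are made up for illustration, and the sch and xs prefixes are assumed to be bound as in the other listings in this series.

```xml
<!-- Hypothetical abstract rule: a whitespace-separated list of xs:integer items -->
<sch:rule abstract="true" id="xsd-list-of-integer">
  <sch:let name="norm" value="normalize-space(.)"/>
  <!-- tokenize() returns the empty sequence for an empty string,
       so an empty list passes the "every" test -->
  <sch:assert test="every $item in tokenize($norm, ' ') satisfies $item castable as xs:integer">
    Every item in <sch:name/> should be an xs:integer value.
  </sch:assert>
</sch:rule>

<!-- Hypothetical element declared with that list type -->
<sch:rule context="imalistman">
  <sch:extends rule="xsd-list-of-integer"/>
</sch:rule>
```

Facets on the list itself (length, minLength, maxLength) would presumably count the tokens rather than the characters, which is one reason this needs more thought than the restriction case.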

But at least this puts us in the position where I think (have I missed something? never impossible!) we can say that Schematron’s power to validate datatypes is strictly greater than XSD’s power for datatypes derived by restriction; Schematron (i.e. using XPath2) can express all the XSD constraints and more.

But is Schematron more powerful to model type derivation? We want to be able to draw pretty diagrams of type derivation. Well, actually because derivation by restriction is simply implemented by abstract rules, in fact Schematron is equally capable of modeling the derivation structure. And, if we add @role attributes to the assertions with the name of the facet being restricted, actually Schematron models the facet system too: to the extent that (if you know the particular conventions used) you could re-generate versions of the original XSD datatype declarations.
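As a sketch of that convention (the role values and the upper bound of 100 are assumptions for illustration, not something the converter emits today), each facet assertion could carry the facet name in @role:

```xml
<sch:rule context="imametadataman">
  <!-- the base type: corresponds to xs:restriction/@base -->
  <sch:extends rule="xsd-datatype-byte"/>
  <!-- role names the facet; the facet value can be read back out of the test -->
  <sch:assert role="minExclusive" test=". > 32">
    The value for <sch:name/> should be greater than 32.
  </sch:assert>
  <sch:assert role="maxInclusive" test=". &lt;= 100">
    The value for <sch:name/> should be less than or equal to 100.
  </sch:assert>
</sch:rule>
```

A tool that knows this convention could walk the sch:extends chain to draw the derivation diagram, and read each @role and test pair to recover the facets.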

But is Schematron better for diagnostics? Well, here comes the rub. In fact, for the datatypes Schematron does not bring any great improvement, in itself, in the kinds of diagnostics that can be generated by an XSD system that was targeted at humans (does any exist?). It does potentially bring a lot more ease of customization (compared to compiled XSD validators, but this is a benefit of scripting), but basically it is just working with a fairly well-enumerated set of properties, in the facets. We will see that it has a lot more scope for smarter diagnostics when validating so-called complex content.

And, we are not necessarily restricted to even XPath2’s power. It is possible to use an extended version of the query language that invokes functions from the Java (or Eiffel or whatever) platform. But this goes beyond our modest scope of a fairly complete implementation of XSD in a handful of XSLT scripts!

Finally, there is a little potential wrinkle here that needs to be worked out. What if our value for imametadataman is -333? We will get an assertion failure both for the byte constraint and the >32 constraint. There is a danger that a multiply derived datatype will generate a flood of redundant error messages. There are two answers: one is to say “we already treat the built-in derived types as single abstract rules, so there won’t really be much multiple derivation with the same facet, it’s not a big problem!” Another answer is that the assertion for a facet restriction should only test the actual restricted range, and not any range for the base type. So the assertion for >32 also cops out for data outside the byte range (below -128 or above 127) and leaves that assertion failure for the base type’s abstract rule to provide. (I think this second approach is nicer.)
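One way the second approach might be written, as a minimal sketch: guard the facet test so that it only applies to values the base type accepts. This assumes the processor can cast to xs:byte; the if/then/else form is used so that the comparison is never attempted on a value that is not a byte.

```xml
<sch:rule context="imametadataman">
  <!-- the base type's abstract rule reports anything that is not a byte -->
  <sch:extends rule="xsd-datatype-byte"/>
  <!-- the facet test only speaks up for genuine bytes, so -333 yields one message, not two -->
  <sch:assert test="if (. castable as xs:byte) then . > 32 else true()">
    The value for <sch:name/> should be greater than 32.
  </sch:assert>
</sch:rule>
```

With this guard, a value of 20 fails only the facet assertion, and a value of -333 fails only the assertion in the base type's abstract rule.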

M. David Peterson

So I’m not sure if I should claim the title as my *BEST TITLE EVER* or one of life’s most embarrassing moments.

Guess time will tell, ;-) and in the meantime: Having a few spare cycles now that CMJ is over and the death march to the Nov. 1st amp.fm private beta release is at worst a brisk walk/jog in the park as far as feature completion is concerned (the Jan. 1st, 2008 @ 12:00:01 public release is a different matter altogether, but even that isn’t going to be anything like the last 8 months have been), I took a few moments to catch up on my most favorite product and company of all time,

Saxon and Saxonica

In the below linked post to the Saxon-Help mailing list you will find a link to the resources file that contains the following overview of the new Java API called “snappy” which, as Dr. Kay points out, “… is closely modelled on the successful .NET API.”

*SWEET*! Well, sweet from the perspective that my brothers and sisters in software development who use Java as their primary development environment can now understand just how good we .NET developers have had it over the last 20 or so months since Dr. Kay first introduced the Saxon on .NET product. To each of you, I can assure you of one thing,

Snappy’s gunna’ *ROCK YOUR WORLD*!

Thanks, Dr. Kay!

Details follow.

Rick Jelliffe

Because we are using XSLT2 as our query language for the generated Schematron schema, validating built-in simple types from XSD is almost trivial. If you want to validate that, say, an element is a valid boolean, then we can use the test . castable as xs:boolean

What we get is an abstract rule declaration for each built-in type.

<sch:rule abstract="true" id="xsd-datatype-boolean">
  <sch:assert test=". castable as xs:boolean">
    <sch:name/> elements or attributes should have an xs:boolean type value.
  </sch:assert>
</sch:rule>

which can then be used by an element like this: say the element imametadataman is boolean:

   <sch:rule context="imametadataman">
     <sch:extends rule="xsd-datatype-boolean"/>
   </sch:rule>

(What the optimization in the previous entry in this series does is merge types, so that instead of multiple rules we can have a single rule with a combined context.)

   <sch:rule context="imametadataman | ockadocknocka ">
     <sch:extends rule="xsd-datatype-boolean"/>
   </sch:rule>

So hurray for XSLT2 and XPath2!

Not so fast Boy Wonder

There is a rub, however. XSLT2 defines a basic conformance level (which is what the free SAXON XSLT2 transformer uses) that uses the basic XPath2 features. However, the XPath2 working group apparently went mad with their desire to make life simple for implementers, and decided the basic level of XPath2 would understand (for castable) the built-in primitive types of XSD but not the built-in derived types. Err, well except for integer. So then of course, because it is silly and confusing for them to be missing, the diligent implementer like Michael Kay of SAXON has to add support as a custom extension: no-one’s life is made simpler.

So in order to use castable without the gratuitous omissions, SAXON has to be invoked with a special attribute, which in turn has meant I have had to alter the Schematron skeleton to generate that code. (I’ll release it in the next few days.) I hope the XPath2 committee realizes that the more distinctions they make, the more complex their technology and the more difficult for us punters. I am less than impressed. Boo for XSLT2 and XPath2!

Peek at the code

Anyway, here is the basic code, which is part of the larger converter script. This is the most straightforward part of the whole project! Hurray for XSD Datatypes! First we have a list of all the type names, so we can refer to them later. Move constants to headers!

<xsl:stylesheet version="2.0"
	xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
	xmlns:xs="http://www.w3.org/2001/XMLSchema"
	xmlns:sch="http://purl.oclc.org/dsdl/schematron"
>

<xsl:output method="xml" encoding="UTF-8" indent="yes" omit-xml-declaration="no"/>

<!-- supported by Basic XSLT 2.0 processor and XPath 2.0 -->
<xsl:variable name="standard-datatypes">
	<datatype>anyAtomicType</datatype>
	<datatype>anyURI</datatype>
	<datatype>anySimpleType</datatype>
	<datatype>anyType</datatype>
	<datatype>base64Binary</datatype>
	<datatype>boolean</datatype>
	<datatype>date</datatype>
	<datatype>dateTime</datatype>
	<datatype>dayTimeDuration</datatype>
	<datatype>decimal</datatype>
	<datatype>double</datatype>
	<datatype>duration</datatype>
	<datatype>gDay</datatype>
	<datatype>gMonth</datatype>
	<datatype>gMonthDay</datatype>
	<datatype>gYear</datatype>
	<datatype>gYearMonth</datatype>
	<datatype>hexBinary</datatype>
	<datatype>integer</datatype>
	<datatype>QName</datatype>
	<datatype>string</datatype>
	<datatype>time</datatype>
	<datatype>untyped</datatype>
	<datatype>untypedAtomic</datatype>
	<datatype>yearMonthDuration</datatype>
</xsl:variable>

<!-- not supported by Basic XSLT 2.0 processor -->
<xsl:variable name="extended-datatypes">
	<datatype>byte</datatype>
	<datatype>ENTITIES</datatype>
	<datatype>ENTITY</datatype>
	<datatype>float</datatype>
	<datatype>ID</datatype>
	<datatype>IDREF</datatype>
	<datatype>IDREFS</datatype>
	<datatype>int</datatype>
	<datatype>language</datatype>
	<datatype>long</datatype>
	<datatype>Name</datatype>
	<datatype>NCName</datatype>
	<datatype>negativeInteger</datatype>
	<datatype>NMTOKEN</datatype>
	<datatype>NMTOKENS</datatype>
	<datatype>nonNegativeInteger</datatype>
	<datatype>nonPositiveInteger</datatype>
	<datatype>normalizedString</datatype>
	<datatype>NOTATION</datatype>
	<datatype>positiveInteger</datatype>
	<datatype>short</datatype>
	<datatype>token</datatype>
	<datatype>unsignedByte</datatype>
	<datatype>unsignedInt</datatype>
	<datatype>unsignedLong</datatype>
	<datatype>unsignedShort</datatype>
</xsl:variable>

...

Now generate a set of abstract rules for each of these types. The unrestricted string type never needs validation, so its assertion test is always true(). We also generate a custom diagnostics element for each abstract type too.

	<xsl:for-each select="$standard-datatypes/datatype">
		<xsl:variable name="dataType" select="."/>
		<sch:rule abstract="true" id="{concat('xsd-datatype-', $dataType)}">
			<sch:let name="norm" value="normalize-space(.)"/>
			<!-- string needs no check; every other type is tested with "castable as" -->
			<xsl:choose>
				<xsl:when test=" $dataType = 'string' ">
			<!--  strings don't need checking -->
			<sch:assert test="true()"
				diagnostics="{concat($dataType, '-diagnostic')}">
				<sch:name/><xsl:text> elements or attributes should have a </xsl:text>
				<xsl:value-of select="$dataType"/><xsl:text> type value.</xsl:text>
			</sch:assert>
				</xsl:when>
				<xsl:otherwise>
			<sch:assert test="{concat('$norm castable as xs:', $dataType)}"
				diagnostics="{concat($dataType, '-diagnostic')}">
				<sch:name/><xsl:text> elements or attributes should have a </xsl:text>
				<xsl:value-of select="$dataType"/><xsl:text> type value.</xsl:text>
			</sch:assert>
			</xsl:otherwise>
			</xsl:choose>
		</sch:rule>
	</xsl:for-each>

And here is the code for generating a diagnostic element. The intent is that a user can tailor these if needed. (In Schematron we make a distinction between the assertion text, which is a positive statement of what should be true in the document, and the diagnostic, which contains specific messages for describing, locating and correcting the problem.)


<!-- generate diagnostics for standard datatypes check -->
<xsl:template name="generate-standard-datatypes-diagnostics">
	<xsl:for-each select="$standard-datatypes/datatype">
		<xsl:variable name="dataType" select="."/>
		<sch:diagnostic id="{concat($dataType, '-diagnostic')}">
			<xsl:text> "</xsl:text><sch:value-of select="."/>
			<xsl:text>" is not a value allowed for xs:</xsl:text>
			<xsl:value-of select="$dataType"/><xsl:text> datatypes.</xsl:text>
		</sch:diagnostic>
	</xsl:for-each>
</xsl:template>

And finally we use it when we find an element with a built-in simple type:

<sch:extends rule="{concat('xsd-datatype-', substring-after($baseon, ':'))}"/>

where $baseon is the prefixed built-in simple type name.

On top of this, of course, there is great scope for adding much better diagnostics for problems with the datatypes. But not yet.

Rick Jelliffe

I’m going to jump to the end now. Here is a little XSLT stylesheet that is a little specialist, and probably not much use outside this application, but if anyone else is autogenerating large Schematron schemas using lots of simple abstract rules, it may be useful.

Our XSD to Schematron implementation generates large Schematron schemas. We knew they would, but small is beautiful. This script trims the generated schema down by merging some kinds of rules.

We use Schematron abstract rules for handling data typing: there can be quite a lot of rules containing just the same sch:extends element and value. In a trial for merging these we brought a 9000 line generated schema down to about 5000 lines; not bad.

Download file COMPRESS.XSLT

M. David Peterson

So a lot has been written about Blip Messaging here on XML.com, but not a lot of action has been seen. Last week @ the College Music Journal Festival in NYC the world got its first taste of what Blip Messaging is all about,

The above image showcases an overlay of blip messages on top of Google Maps/SOHO/Manhattan that highlights the CMJ-related shows and venues taking place the night of the 16th, a selection from a list that included 1060+ bands playing at 60+ venues over the course of the entire week. It also represents the “communicate”[1] page for Ume, a band headed by Lauren Larson, wife of Eric Larson, who plays bass (with their long-time friend Jeff on drums), who played “The Tank” on Thursday night. A *ROCKSTAR* hacker as well as musician, Eric works with us @ amp.fm, spending a good portion of his time the week before building a text-messaging based blip search engine (e.g. text “Ume” to shows@amp.fm and get all of Ume’s shows, with times, locations, etc., in response) that we demoed during the trade show as well.

So then what’s this all have to do with XML?

M. David Peterson

To be continued? That’s what I intend to find out.

More when it seems appropriate to report more.

M. David Peterson

So this last week was spent in New York City @ the College Music Journal Music Marathon for the pre-launch of The Viberavetions Project (( sonic|radar )) rage.fm amp.fm which in and of itself was an *AMAZING*, successful experience. More on that to come.

In the mean time,

Rick Jelliffe

The ISO/IEC JTC1 committee on Document Description and Processing Languages has had an interesting couple of years, what with all the controversy and new members. But for the last few months, all ballots have failed due to lack of votes. Not lack of “yes” votes, just plain lack of votes. The reason is, obviously, because most of the committee members of new National Bodies (standards organizations from participating countries) that have nominated to be active on SC34 issues are only really interested in two issues: ODF and Open XML.

No interest in anything else; no vote on anything; nothing else progresses. It is easy to castigate NBs for not living up to the obligations they signed up for, but the more productive way to approach this is that the customer is always right: SC34 needs to have some reorganization so that people interested in just one thing don’t stymie other projects. When various SC34 people were discussing this over the last week, I suggested that perhaps what was needed was a completely new SC to handle it: that would leave SC34 free to continue with “enabling technologies” rather than application technologies. A less radical proposal came up, which was merely to make a new Working Group (WG4); apparently NBs can nominate which standards or WGs they are interested in, so this can allow partition. Something will happen.

I was very amused to see this on my second favourite trash website:

Is further proof needed that these are not good-faith members? How to get rid of these MSFT stooges?

So let’s actually look at the NBs that failed to vote, and compare to their vote on DIS 29500 (Open XML) as far as I can figure it out. I’ll use the ballot results on the recent NVDL draft corrigendum as an example.

The countries that voted Yes on DIS 29500 but failed to vote on NVDL DCOR = 12:
Bulgaria, Côte-d’Ivoire, Cyprus, Egypt, Kenya, Kazakhstan, Lebanon, Malta, Pakistan, Sri Lanka, Turkey, Venezuela

The countries that voted No on DIS 29500 but failed to vote on NVDL DCOR = 9:
Brazil, China, France, India, Korea, Norway, New Zealand, Thailand, South Africa

The countries that abstained on DIS 29500 but failed to vote on NVDL DCOR = 2:
Chile, Trinidad and Tobago

So both sides are equally slack. What did you expect the result to be? If you expected it to show that the MS stooge countries were pretty bad, while the valiant anti-OOXML forces were pretty good, it shows you have drunk the Kool-aid, with all gentle respect. Someone says something based on no objective evidence, but if it accords with what has been said enough times before, people think “That sounds about right”. But what if what was said before also had no objective evidence, and there is a chain or ripple of make-believe and demonization that merely emotifies foregone conclusions?

Actually, I think the whole way of thinking about this in blame terms is unhelpful. It is certainly very frustrating for SC34 members to be treading water: it means real money and effort wasted by those involved. May you live in interesting times! is the curse we are suffering from :-)

But ultimately this is just a problem of poor organization at the NBs by the NB bureaucrats: they should be checking that their local committees are indeed voting on the ballots before them, and they should be sending in abstain votes otherwise. It is ultimately a matter of mechanism.

So we have a confluence of three things. First, we have some new National Bodies who need to get their ballot governance procedures into shape: I think we should be highly tolerant of the antics of fledglings. Second, we have a committee structure that does not reflect the interests of members: there are some very mature and deeply involved NBs who didn’t put a vote in, not just newcomers; the WGs need to be organized to reflect this division of interest. And thirdly, despite the previous two things, when the national bodies signed on to SC34 (particularly the “participating” P countries more than the Observer “O” countries) they did incur an obligation to vote, and they need to pull up their socks right away and participate: even if it means just sending in an official “abstain” on each ballot, that doesn’t derail the process.

Michael Day

Embedding XML islands inside HTML documents is an old idea, and lately the debate about how to standardise this in HTML5 has been heating up again. As someone working on an HTML to PDF converter with strong XML support, I have a keen interest in the outcome of this debate. It would be very helpful if HTML and XML could be mixed and matched as necessary. So let me throw my five cents into the ring. (It would be two cents, but in this decimal age that would round down to zero).

David A. Chappell

When: Thursday Oct 18, 12 noon ET, 9AM PT, 1900 GMT

Tomorrow I’ll be joining in with other leading industry vendors to discuss SOA infrastructure in ESB-Con IV. Don’t miss it!

ESB-Con is a half day virtual conference, which includes presenters from BEA, IBM, Oracle, and Progress. The day starts with a 1 hour panel discussion where my Oracle cohort Dave Shaffer will be there to help answer the latest rendition of the “5 tough questions”, which the ESB-Con organizers are notorious for asking.

Immediately following that, I’ll be up next for a 1 hour presentation on what we’re calling “Next Generation Grid Enabled SOA”. After which will follow the other presentations from the other participating vendors.

To sign up and join in, go to - http://www.esbcon4.com/index.asp?type_content=home

and register.

Hope to see you there!

Dave

Rick Jelliffe

Andy Updegrove has a good article, “Words, Standards and Torture: What’s in a Name”, that is well worth the click IMHO.

There are some things that are much more important than whether your file format says <tomato> while mine says <tomato>.

Rick Jelliffe

Using and Referencing ISO and IEC standards for Technical Regulations is the name of a new report just out (late Sept 2007) from ISO and IEC. I think it gives a really useful introduction to ISO and IEC and the breadth of standards (indeed, the vision of standards): it is written primarily for regulators, but it is good for the rest of us too, I think. (It’s free, in glossy PDF.)

It is interesting to see the continuing emphasis on environmental considerations: now it is in the first of the bullet lists for how ISO and IEC want to position their standards. The report takes seriously that different countries may have different requirements,

At first sight …. Any goods and services that have the potential to cause serious harm to the health or safety of the population, or to the environment, would seem to be obvious candidates for technical regulation. However, the differences between countries mean that this concept can be applied differently.

The report details the increasing adoption of ISO and IEC standards: both because more societies are industrializing and because more standards are dealing with issues for even poorer societies.

I won’t try to read between the lines here, but obviously some of the issues that came up in various parts of the world concerning Office Open XML and ODF were in the mix of things that have influenced this document. (ISO had their 30th general assembly recently.) However, I know that last week’s JTC1 meeting in Australia explicitly devoted time to considering some matters arising over the year from fast-tracked standards; I have talked informally with a delegate, but I won’t comment until there are some public documents from JTC1: let’s just say that JTC1 has clarified some issues that other people have thought were cloudy or murky! (Watch this blog!)

Let me just quote this part of the document (and remembering that ISO and IEC standards may be more about health, safety, environment and so on, while IT issues are often delegated to ISO/IEC JTC1 which is for most purposes a separate organization):

In some countries, for example, the response by authorities to a specific need for technical regulation may be a general declaration that certain standards in a subject area must be mandatory. It is therefore vital that a portfolio of ISO and IEC standards exists to help such countries.

However, the big issue is to tackle a central wrinkle with ISO standards: they are voluntary standards, yet many countries adopt them as regulation or require or favour them. So they are developed with one set of requirements but may be used with another. I think section 6.1 is really important, and it is something I have been banging on about; in particular:

Regulators will need to decide what level of checks they wish to put in place to ensure the standard is suitable for use and addresses their needs.

In other words, adopting a standard because it is an ISO standard is not good enough: a regulator needs to be able to justify why it is the suitable and effective standard for their use. Bingo. ISO does not replace or supplant government. It does not set government policy (even though many international standards constitute public policy which is why they are suitable.) And again in s7.1

Using ISO and IEC standards for technical regulation does not imply that regulators have reduced power or that they delegate responsibility to other parties.

The section on conformity assessment was interesting to me, because it has come hot on the heels of some email with an interesting Swedish open-source activist discussing whether ISO allows reference implementations (I gather that it does not, because it is verboten to have alternative specifications for the same thing, but I am open to counter-examples). He raised the example of SQL, where NIST abandoned their test suite on a cost basis. Governments are so shortsighted on standards and conformance: all the major economic advantages of IT have come through standards of all sorts for more than a decade now, and governments still do things on a shoestring. Why aren’t procurement departments aggressively investing in standards development and conformance testing efforts, for example? Why aren’t the politicians and governance officers insisting on it? I expect they will come to the party, but it is already later rather than sooner!

The last part of the report gives examples of how standards fit in to various national contexts: one thing that hit home with me during all my travels earlier in the year was how each different country has a different history and approach to standards. However, in all of them the trend was clear: they thought standards can be helpful and want to encourage their development and adoption as appropriate for each nation. I think this is a very useful little report, especially for giving some balance to those of us participating in debates that can get quite heated.

Rick Jelliffe

I’ve just started looking into the ACORD schemas. These are the standards of choice in the English-speaking insurance world, from what I can gather (oops I am being too coy), and are quite meaty and mature now: the documentation runs to about 3,000 pages. Various little birds had told me that Schematron had been used to augment the XSD schemas in several places, so I thought it would be interesting to look at why.

This is not to point out deficiencies in XSD (the facts can speak for themselves) but to look for the relative strengths of Schematron. This kind of data, of course, is very prone to having several layers of rules for each user: business rules, occurrence rules that come from the forms used, systems limits, and so on. Each of these can usually be well represented by Schematron schemas (or combined as different phases of the same schema).

But let’s just look at three additional constraints above the ones that XSD schemas can represent, just from s4 Implementation Conventions of ACORD Life, Annuity and Health Standard v2.17.

The first interesting thing is the use of typecodes, explained in s4.4. ACORD documents are interesting because they want to use the same XSD schema and elements for each stage of processing. So when a form comes in, before it has been assessed, in some process all the data may be just treated as strings. Then, when a datum has been assessed and possibly fixed up, it can be marked with a typecode as having a certain data type, for example being a date (in 8601 form).

This is pretty much unfeasible in XSD: I don’t think we can use xsi:type for this, because IIRC the type nominated by xsi:type has to be derivable from the actual type specified in the XSD schema, and a date is not derived from String. (Maybe XSD 1.1 fixes this, it doesn’t matter.) In Schematron, it is easy: something like

<sch:pattern>
  <sch:title>Type codes</sch:title>
  <sch:rule context="*[@tc='1']">
    <sch:assert test=".='true' or .='false' or .='1' or .='0'">
      A <sch:name/> element should be a boolean</sch:assert>
  </sch:rule>
  <!-- ...other rules for other typecodes... -->
</sch:pattern>

There is an interesting constraint in s4.13 that says that aggregate elements with no optional subelements should be omitted. This is not something that can be specified using grammars, since it makes the occurrence of a parent dependent on the value of a child. The Schematron assertion might be as simple as something like this:

 <sch:rule context="*[string-length(.) = 0]">
   <sch:assert test="not(*)">Aggregate elements should contain elements with content</sch:assert>
 </sch:rule>

In s4.14 it speaks of nested date ranges: the example they give is

it would not be valid for a PolicyProductInfo to specify an expiration date of 3/1/2005, while one of its child JurisdictionApprovals specifies an expiration date of 4/1/2005.

This is obviously quite trivial for Schematron, especially when you make life easier for yourself by using sch:let to parse the dates into fragments that make comparison easier.
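To give the flavour of the sch:let approach, here is a minimal sketch of the nested date range check. It assumes the dates are held in ISO 8601 YYYY-MM-DD form, and the element names JurisdictionApproval and ExpDate are placeholders for illustration rather than names checked against the ACORD schemas.

```xml
<sch:pattern xmlns:sch="http://purl.oclc.org/dsdl/schematron">
  <sch:title>Nested date ranges</sch:title>
  <!-- Hypothetical names: a JurisdictionApproval child carrying its own ExpDate -->
  <sch:rule context="PolicyProductInfo/JurisdictionApproval[ExpDate]">
    <!-- Strip the hyphens so that YYYY-MM-DD dates compare as plain numbers -->
    <sch:let name="child-exp" value="number(translate(ExpDate, '-', ''))"/>
    <sch:let name="parent-exp" value="number(translate(../ExpDate, '-', ''))"/>
    <!-- If the parent has no ExpDate the comparison is against NaN and the assertion passes -->
    <sch:assert test="not($child-exp > $parent-exp)">
      A <sch:name/> should not expire later than its parent PolicyProductInfo.
    </sch:assert>
  </sch:rule>
</sch:pattern>
```

With XPath 2 as the query language the dates could be compared directly as xs:date values; the translate() trick is only there to keep the sketch usable with plain XSLT 1.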

TriSystems Infobahn have a brochure (PDF) on their approach for using Schematron with ACORD, for people who want more information. The ACORD schemas were developed with respected industry figure Daniel Vint as the senior architect: I see he is potentially nabbable for contract work now.

Simon St. Laurent

Living in a place that used to think of itself as the bright future of America, it’s strange to me how people think that particular places will be the bright future.

Rick Jelliffe

Martin Bryan has released the latest draft (second final committee draft) for ISO DSRL. He has an open source implementation available too: like some of the other parts of ISO DSDL (Schematron and DTLL) it is designed to be implementable on top of XSLT (XSLT2 in this case); of course, it can be implemented directly in Java or .NET or C++ too. DSRL is certainly suitable for building into a validator as a pre-processor, in a way a little analogous to OASIS XML Catalogs.

DSRL is a real missing piece in the puzzle: it provides a simple tool for remapping names in documents. This allows a more declarative approach than just using XSLT, and makes the task of mapping suitable for non-programmers. It fits into the schema ecosystem because it lets you rename the names in your document to suit that of a standard schema. It is only about 18 pages long, including examples, and easy to read.

We are used to thinking of things the other way around: we expect that allowing variability of element names should be an issue of type derivation if we are operating in the XSD mentality for example. (Actually, of course, the XSD mentality is a figment of the imagination, like all mentalities, and is just an extreme reduction which few people would really hold!)

But consider the case where there is a schema made with English-derived names, but the users are going to be Chinese who don’t speak or read English. (The days when “educated” entailed “fluent in English” are long gone, just as are the days when “educated” entailed “fluent in French”.) It would be much more straightforward, for localization, to have a simple mapping of names, than to ensure that the initial schema was rewritten so that every element was actually a substitution group.

More than this: we are used to saying that “syntax is easy; semantics is hard”. But even syntax is not easy without straightforward tools. Just last week I was working with an example of a company that was considering using a standard external schema, but wanted to use its own terms for things where they existed. Exactly a job for DSRL!

DSRL allows these kinds of mapping, and I think that like ISO Schematron and ISO NVDL it will progressively become part of the schema environment for standards, especially outside the English-speaking world, and especially because XML is the structured format “for the rest of us”. Even the most tragic XML Schema devotee realizes that there needs to be some limits to XSD’s scope, and DSRL (and Schematron and NVDL) have a good fit even with XSD. (Of course, we designed them around RELAX NG, but standards are pragmatics not religions.)

DSRL allows the following remappings (organized into a series of maps that typically relate to an output namespace):

  • Element names (in an XPath context)
  • Attribute names
  • Element simple values
  • Attribute values
  • Default values for elements or attributes (including the subelement after which the content may go, in the case of an element: mixed content is not ignored!)
  • Processing instruction targets
  • Entity names (e.g. for undeclared entity references)
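
To make the first two kinds of remapping concrete, here is roughly what they amount to when written by hand in XSLT 1.0. This is not DSRL syntax, just a sketch of the work that a DSRL map would declare for you; the names client, number, customer and id are invented for the example.

```xml
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Identity template: copy anything that is not explicitly remapped -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- Element name remapping: the local term "client" becomes the
       standard schema's "customer" (both names are hypothetical) -->
  <xsl:template match="client">
    <customer>
      <xsl:apply-templates select="@*|node()"/>
    </customer>
  </xsl:template>

  <!-- Attribute name remapping, scoped by its parent element -->
  <xsl:template match="client/@number">
    <xsl:attribute name="id">
      <xsl:value-of select="."/>
    </xsl:attribute>
  </xsl:template>

</xsl:stylesheet>
```

Every extra name needs another template, which is exactly the kind of programming chore that a declarative table of mappings takes away from non-programmers.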

In addition, a new XML-based syntax for defining entities is given. (When implemented on a system (such as DOM) where undeclared entity references are maintained, the implementation is straightforward. When implemented on systems that would not allow this, then implementation would involve converting the XML form of declaration into <!ENTITY declarations and inserting them into the internal subset of the prolog. It’s just rewriting.)

ISO DSDL is a multipart standard that we have been progressing in a slow-and-steady fashion through ISO SC34 for the last few years. It is really what the SC34 WG1 working group does: ODF and OOXML are intrusions, part welcome but part unwelcome. (In fact, dissatisfaction at WG1 and SC34 with the fast-tracked standards is at a crisis point now and has been escalated to discussion at the JTC1 meeting in Australia last week: it probably deserves a blog item by itself, because it may introduce a wildcard to the OOXML BRM: what fun!)

If you have comments on ISO DSRL, send them in to your local national standard committee so that they can be raised during the discussions in Kyoto for SC34 WG1 in December. Contact your local standards body and ask them how to contact the chair of the local “ISO/IEC JTC1 SC34 equivalent” committee. Why not get involved yourself? Standards really are a chance for good technical people to influence technology in ways that reflect their needs, especially in medium-size countries where you might otherwise feel dwarfed by American technocratic power. If your business relies on XML and document standards, why not invest some time back? You may not think yourself brilliant enough or articulate enough or self-promoting enough or rich enough to get involved, but a lot of the real work is just quiet technical review and discussion, that can be done in your own home and in small friendly local committees, getting to know procedures and approaches, and by email.

Kurt Cagle

Efforts have been underway recently to develop a schema language for JSON, analogous to the XML Schema Definition Language (XSD) or RELAX NG languages in the XML arena. Similarly, a JSON transformation language is being proposed and bandied about in various AJAX circles as web2 developers attempt to take the best of what XML has to offer and recast it from the angle-bracket modality to the braced modality.

These efforts are intriguing, and for the most part people within the XML community are now affecting the same rather confused expression on their face that I remember seeing on the SGML generation as they watched the young turks of the XML movement push their view of the world out to the world - “Didn’t we already DO that?”

Rick Jelliffe

W3C’s Service Modeling Language group has two new drafts out: Service Modeling Language 1.1 (latest version) and Service Modeling Language Interchange Format Version 1.1 (latest version). From the abstract:

This specification defines the Service Modeling Language, Version 1.1 (SML) used to model complex services and systems, including their structure, constraints, policies, and best practices. SML uses XML Schema and is based on a profile of Schematron.

SML comes out of the XML activity at W3C, not the WS-* activity, so it seems more aimed at working on top of POX (plain ole’ XML) systems. It has representation from IBM, Sun, BEA, CA, Intel, HP and Microsoft. WS has a bad rep at the moment for over-engineering, but that is partly because many people have problems that they want to be solved by the almost-simplest possible technology. They would prefer erring on the side of modesty rather than grandiosity.

SML has nothing directly to do with services despite the name, and nothing to do with modeling for that matter either: that just seems to be the use-case that has driven the development of a more general technology that takes seriously the problem How do we validate systems of documents, including documents held in multiple files and documents that transclude other documents?, which seems to be an entirely practical question to me: this is the kind of use case that should be driving XSD and DSDL development IMHO.

As I understand it, the recipe for SML is roughly

  • Systems or services are modeled using XML documents which are either definition documents or instance documents
  • Definition documents are either schema documents that use W3C XML Schemas (with a completely reworked version of XSD’s key/keyref mechanism allowed under appinfo that handles multi-file references), or rules documents that use ISO Schematron (vanilla XSLT query language with a slightly extended XPath). A whole Schematron schema is plonked into the appinfo element rather than using Eddie Robertsson’s minimal form for embedded Schematron; however, they use a rule context of “.” which works out the same. A nifty attribute is added to allow better localization.
  • The instance documents are validated against the definition documents
  • A little error report container, to hand back bad data.
  • A kind of transclusion link to allow documents to reference other parts: yet another replacement for entity references! The interesting idea is that the referred-to fragments are not substituted in the document, so we have two PSVIs: the PSVI of the document transcluded and the PSVI of the document without the transclusion. A deref() extension is provided for XPaths: I suppose this is something to add to the list for the Schematron skeleton implementation. XPointers can be used for references: I see that, of course, it is the restricted XPath that doesn’t include the range-to functions that killed XPointer. The link allows the element name at the other end to be specified.
  • The Interchange Format (SML-IF) provides containers and accoutrements for bundling everything up into a single file for interchange
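
As an illustration of the second point, here is a rough sketch of a whole Schematron schema sitting in appinfo with a rule context of “.”. It follows my reading of the draft rather than quoting it, and the PortRange vocabulary and the example.org namespace are invented for the purpose.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns:sch="http://purl.oclc.org/dsdl/schematron"
           targetNamespace="http://example.org/ns/net"
           elementFormDefault="qualified">

  <!-- Hypothetical element: a port range with a lower and an upper bound -->
  <xs:element name="PortRange">
    <xs:annotation>
      <xs:appinfo>
        <!-- The complete Schematron schema is embedded, not just a bare rule -->
        <sch:schema>
          <sch:ns prefix="n" uri="http://example.org/ns/net"/>
          <sch:pattern>
            <!-- "." stands for the element that this declaration applies to -->
            <sch:rule context=".">
              <sch:assert test="not(n:Low > n:High)">
                The Low port should not be greater than the High port.
              </sch:assert>
            </sch:rule>
          </sch:pattern>
        </sch:schema>
      </xs:appinfo>
    </xs:annotation>
    <xs:complexType>
      <xs:sequence>
        <xs:element name="Low" type="xs:unsignedShort"/>
        <xs:element name="High" type="xs:unsignedShort"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>

</xs:schema>
```

The grammar part says what the children are; the assertion adds the co-occurrence constraint between their values that the grammar cannot state.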

I’ll write to the SML group, because they have the use of sch:schema/@queryBinding slightly wrong. It is intended to clearly label what query language is used. The SML draft says that it must be “xslt”; however, they actually use an extended xslt. What they need is a little Query Language Binding document (which only needs to be a paragraph) to define a query language binding name like “xslt+sml” or whatever. If users don’t use deref() they won’t need to do anything, but it is better to catch schema errors early rather than having obscure XPath messages.
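
In a schema, the labelling would look something like the sketch below. The binding name “xslt+sml” is only the placeholder suggested above, not a name that any specification defines, and the rule is a trivial stand-in with an invented element name.

```xml
<sch:schema xmlns:sch="http://purl.oclc.org/dsdl/schematron"
            queryBinding="xslt+sml">
  <!-- queryBinding tells the implementation up front which query language
       the tests are written in; "xslt+sml" is a hypothetical name -->
  <sch:pattern>
    <!-- "Server" and its "name" attribute are invented for this sketch -->
    <sch:rule context="Server">
      <sch:assert test="@name">A <sch:name/> should have a name attribute.</sch:assert>
    </sch:rule>
  </sch:pattern>
</sch:schema>
```

An implementation that does not know the named binding can then reject the schema straight away, instead of failing later on an unknown function inside some XPath.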

The downside of SML is that it again (as did WSDL’s extensions) shows that XSD, despite being so large, is still simply not capable enough: a non-trivial language should be able to handle non-trivial problems, otherwise what is the point? Schematron’s approach of explicitly allowing different query languages (and providing guidance on profiles and embedded vocabularies) is much more flexible and practical, IMHO.

In other Schematron news, I see that it is being used by the RELAXED online HTML validator (SourceForge). This project is a good demonstration of using the ISO DSDL little schema languages together: NVDL, RELAX NG, and Schematron. NVDL and RELAX NG are also used in Open XML, and ODF was defined using RELAX NG. For comments on making standards from Schematron schemas, see this blog item.

Uche Ogbuji

I finally created a FOAF file for myself. I exported my LinkedIn contacts (that link should work for you if you’ve recently logged into LinkedIn) to “vCard (.VCF file)”. I then imported the vCard into FOAFgen. Result is here. I think I’ll write a Python script that works with the vCard file and the FOAF to handle new or updated contact entries. I must say, FOAF is really ugly (as is, unfortunately, so much RDF/XML), so I’ll have to be closing my eyes a lot as I write tools to avoid my having to stick my fingers into it. I guess the saving grace is that everything else is even uglier (including hCard).

Rick Jelliffe

Here are some XSLT scripts for macro-expanding a set of XSD schemas into a single file with references removed, as a more optimal form for schema interrogation and conversion.

Converting an XSD schema into Schematron involves three stages:

  • Preparing the XSD schemas so they are in an optimal form for transforming out from
  • Converting the grammar and datatype constraints of this prepared schema into Schematron for elements and datatypes
  • Converting the other constraints such as KEY and ID into Schematron.

This blog item gives some beta XSLT code for the first part. A pipeline of three XSLT scripts is used:

  • INCLUDE: starting from a schema, substitute all the included and imported schemas in-place. (<redefine> is not supported in this version.)
  • FLATTEN: move schemas for different namespaces to the top-level, removing duplicates.
  • EXPAND: substitute references to complexType, group, attributeGroup and remove declarations (substitution groups and wildcards are not supported in this version.)

The result is a document with a top-level element of <schemas> containing <namespace> elements, each containing an XML Schema module for a single namespace. These modules contain element, attribute and simpleType declarations, but structural references have been replaced. This resolved form makes the job of converting to Schematron much easier, because there are fewer cases to consider and simpler paths. And all the schemas are gathered into a single file.
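
As an illustration only, the resolved form has roughly the following shape. The purchase order vocabulary is invented, and the uri and prefix attributes on <namespace> are assumptions made for this sketch rather than a specification of what the scripts emit.

```xml
<schemas>
  <!-- One <namespace> wrapper per target namespace (attributes are assumed) -->
  <namespace uri="http://example.org/po" prefix="po">
    <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
               targetNamespace="http://example.org/po">
      <!-- The reference to a named complexType has been expanded in place -->
      <xs:element name="order">
        <xs:complexType>
          <xs:sequence>
            <xs:element name="item" type="xs:string" maxOccurs="unbounded"/>
          </xs:sequence>
          <xs:attribute name="date" type="xs:date"/>
        </xs:complexType>
      </xs:element>
    </xs:schema>
  </namespace>
  <namespace uri="" prefix="">
    <!-- Declarations that are in no namespace get a module of their own -->
    <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
      <xs:element name="note" type="xs:string"/>
    </xs:schema>
  </namespace>
</schemas>
```

Because nothing is referenced by name any more, a later stage can reach what it needs with short paths from each element declaration, without chasing type or group definitions across files.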

I have put the beta XSLT files here. It will go to sourceforge or somewhere eventually: watch this space. But I have been frustrated by the lack of tools that expand out XSD schemas, so this code may be useful for other things (I may rewrite Topologi’s XSD to RELAX NG converter to use this as the front end, for example).

I would like to acknowledge JSTOR as the sponsor for this code. Thanks to Matt Stoeffler. It is licensed under GPL as open source.

M. David Peterson

Amazon.com: Amazon S3 / Amazon S3 SLA

Effective Date: October 1, 2007

This Amazon S3 Service Level Agreement (“SLA”) is a policy governing the use of the Amazon Simple Storage Service (“Amazon S3”) under the terms of the Amazon Web Services Customer Agreement (the “AWS Agreement”) between Amazon Web Services, LLC (“AWS”, “us” or “we”) and users of AWS’ services (“you”). This SLA applies separately to each account using Amazon S3. Unless otherwise provided herein, this SLA is subject to the terms of the AWS Agreement and capitalized terms will have the meaning specified in the AWS Agreement. We reserve the right to change the terms of this SLA in accordance with the AWS Agreement.

Service Commitment

AWS will use commercially reasonable efforts to make Amazon S3 available with a Monthly Uptime Percentage (defined below) of at least 99.9% during any monthly billing cycle (the “Service Commitment”). In the event Amazon S3 does not meet the Service Commitment, you will be eligible to receive a Service Credit as described below.

Simon St. Laurent

Yes, I know that looking 43 years ahead is ridiculous for technology. But might it make sense for a place?

Rick Jelliffe

The schedule is up for the US XML 2007 conference in Boston in December, and it looks a cracker.

Here are the papers that seem interesting to me:

  • Monday 10:30. XML Hardware Eugene Kuznetsov (IBM): an area with a lot of interest to me.
  • Monday 2:00 Analysis of an architecture for data validation in end-to-end XML processing systems
    John Clark (Cleveland Clinic Foundation), Chimezie Ogbuji (Cleveland Clinic): mentions Schematron.
  • Monday 2:45 Implementing Healthcare Messaging with XML
    Marc de Graauw (Marc de Graauw IT): seems to come to the same conclusions (from an XSD perspective) that underpin Schematron’s phases mechanism, that layering and separation of concerns is important.
  • Monday 4:00 XProc: An XML Pipeline Language Norman Walsh (Sun Microsystems, Inc.):
    the program notes say that XProc has changed recently, so it would be interesting to see how it is going. ISO DSDL support is one of its use cases.
  • Monday 4:45 XML and XPath in the Wild Adam Lee (Stanford University):
    empirical studies of documents found on the web are a very weak basis for saying anything about XML documents, since XML is used (SGML-like) for so many back-ends, but it is really important that we have them nevertheless, and it will be great to see if we get some real work on document metrics happening. Many people are not so aware of the technique, but document metrics can clarify job estimation and costing: Topologi has a customer that uses a simple count of XPaths (from our XML Detective utility) for this, and the Document Complexity Metric is of course useful too.
  • Tuesday 9:00 Semantic data models and business context modelling Anthony Coates (Miley Watts LLP): Tony popped in to the office here a month ago when he was in town, and I share his scepticism that adopting a markup framework based on explicit modeling of semantics according to some higher abstract model necessarily buys you much that simple labelling does not allow you: his paper’s abstract mentions synchronizing information “across the technology boundary between the semantic and non-semantic models” which sounds very clever or very dumb, but very useful either way. I like that Tony’s paper comes out of experience with the UN/CEFACT standards work.
  • Tuesday 11:00 Case Notes from a Vulnerability Assessment of a Bank’s Web Services Mark O’Neill (Vordel): has an interesting comment that the bank’s attempt to apply preventative security measures, such as SSL and XML Schema validation, actually proved to provide a false sense of security, and in fact introduced a number of security vulnerabilities of their own. (XML firewalls or test services interest me because of Schematron and Topologi Interceptor, of course.)
  • Tuesday 2:00 Building a XSLT Processor for large documents and high-performance. Lan Yi (Intel): I suspect this is another hardware talk, but it is a fascinating topic whether hard or soft.
  • Tues 2:45 Streamlining the Information Lifecycle in Process and Discrete Manufacturing with XML John Klaren (JustSystems Inc.): this should be an interesting talk about standards and IETM, as long as the speaker steers clear of product pitching.
  • Wed 9:00 XML in Support of the Democratic Process Dale Waldt (aXtiveminds): again, a good chance to catch up with recent developments. Legal and parliamentary publishing are traditional SGML areas and it will be interesting to see if XML has brought some new things to the mix.
  • Wednesday 9:45 Separating Mapping from Coding in Transformation Tasks Wendell Piez (Mulberry Technologies, Inc.) One of the things I have been involved with, for Allette Systems, is trying to figure out how to bring better software engineering practices to our XSLT development: papers like this one (and Tony Graham’s tutorial later) look like signs that some maturity is coming to the industry.

This year the tutorials are all on Wednesday afternoon, but they look like a great collection: Elliot Kimber does DITA, Michael McQueen does XSD, Debbie Lapeyre does Schematron, and Michael Smith and Tony Graham are doing various XSLT sessions. I was interested in Tony’s outline, particularly that he mentions XSLTV program verification.

But it is rare that almost every session has something that grabs me: I wish I were going!

Rick Jelliffe

I was reading the Ant (the make system) documentation today, and in the section on copy I came across this horrible note:

Important Encoding Note: The reason that binary files when filtered get corrupted is that filtering involves reading in the file using a Reader class. This has an encoding specifing how files are encoded. There are a number of different types of encoding - UTF-8, UTF-16, Cp1252, ISO-8859-1, US-ASCII and (lots) others. On Windows the default character encoding is Cp1252, on Unix it is usually UTF-8. For both of these encoding there are illegal byte sequences (more in UTF-8 than for Cp1252).

How the Reader class deals with these illegal sequences is up to the implementation of the character decoder. The current Sun Java implemenation is to map them to legal characters. Previous Sun Java (1.3 and lower) threw a MalformedInputException. IBM Java 1.4 also thows this exception. It is the mapping of the characters that cause the corruption.

On Unix, where the default is normally UTF-8, this is a big problem, as it is easy to edit a file to contain non US Ascii characters from ISO-8859-1, for example the Danish oe character. When this is copied (with filtering) by Ant, the character get converted to a question mark (or some such thing).

There is not much that Ant can do. It cannot figure out which files are binary - a UTF-8 version of Korean will have lots of bytes with the top bit set. It is not informed about illegal character sequences by current Sun Java implementations.

One trick for filtering containing only US-ASCII is to use the ISO-8859-1 encoding. This does not seem to contain illegal character sequences, and the lower 7 bits are US-ASCII. Another trick is to change the LANG environment variable from something like “us.utf8” to “us”.
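
For the record, the first trick in that note can be written down as something like the following build file fragment, using the encoding attribute of the copy task. The directory names and the VERSION token are invented for the example.

```xml
<project name="copy-encoding-sketch" default="copy-filtered">
  <target name="copy-filtered">
    <!-- Read (and, by default, write) the files as ISO-8859-1, in which
         every byte value decodes to some character, so the filtering
         does not mangle bytes it cannot decode -->
    <copy todir="build/out" encoding="ISO-8859-1">
      <fileset dir="src/templates"/>
      <filterset>
        <!-- Replaces @VERSION@ in the copied files (hypothetical token) -->
        <filter token="VERSION" value="1.0"/>
      </filterset>
    </copy>
  </target>
</project>
```

Note that this only stops the corruption; it still tells you nothing about what encoding the files are really in, which is the actual problem.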

Now, let’s put aside the question of why anyone would copy using text operations rather than binary operations. The larger question is why on earth, in 2007 and ten years after XML came out, we are still using text files that don’t label their encoding?

Let me put it another way: if you make up or maintain a public text format, and you don’t provide a mechanism for clearly stating the encoding, then, on the face of it, you are incompetent. If you make up or maintain a public text format, it is not someone else’s job to figure out the messy encoding details, it is your job.

If avoiding the issue is the wrong approach, what is the right approach? One of the right approaches is to adopt Unicode character encodings (UTF-8, UTF-16) as the only allowed formats. (This is what RELAX NG compact syntax does for example.)

Another right-ish approach would be for every text format to adopt explicit labelling: the disadvantage of this, however, is that, as with HTML’s <meta> element, it is unsatisfactory to have to parse deep in the document in order to be able to parse the document. And to have recognition software that understands the conventions of each format is impossible.

However, it is possible to generalize XML’s encoding header into a delimiter-independent form that can be adopted. My 2003 suggestion for XTEXT gives the details. I don’t see any disadvantages to XTEXT: in the post-XML world, programmers have moved from being puzzled by encoding labels to understanding that they are a valuable part of the furniture.
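
For anyone who has forgotten what is being generalized, this is the XML encoding header in question; the element below it is just filler for the example.

```xml
<?xml version="1.0" encoding="ISO-8859-1"?>
<!-- The declaration above uses only ASCII characters, so a reader can
     decode it before it knows how to decode the rest of the file -->
<greeting>Hello</greeting>
```

The important property is that the label sits at the very start of the file and can be read without already knowing the encoding, which is why it works where a label buried deep in the document does not.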

An XTEXT-aware Ant (or default readers that recognize XTEXT conventions) would allow the problem to go away incrementally, as developers and maintainers adopt it. But the trouble is a lack of leadership by people developing or maintaining text formats: they don’t see themselves as part of a larger community of text users, I guess, or believe that there is any advantage in participating in a larger community. I suspect that this is ultimately because the developers of text formats are people who think in terms of ASCII or who don’t have contact with use-cases where there are different character sets possible. The problem is pushed downstream. Not only incompetent but lazy?

Am I being too harsh? I hope so. In particular, in this day and age of international standards, the burden for fixing this has shifted from the developers to user-community representatives: it is something that governments and non-ASCII-locale standards bodies need to consider.

When I say “You are incompetent” an entirely satisfactory rejoinder back at me is to say “Yes I am: I can only respond to demand from people who are affected by this issue, and the standards and procurements processes are the place for these demands to be manifested!”

But buck-passing won’t fix anything. If we know the problem won’t go away, why cannot we (we consumers or we developers) deal with it?

Uche Ogbuji

So my paper was accepted at XML 2007. I look forward to seeing some of you folks there. The schedule looks interesting not just because I see topics that I enjoy, and some about which I want to learn, but also because I see a lot of stuff that makes me think: “Oh, it’ll be fun to debate that one”. Anyway for my part I’ll be presenting “XML Data modeling for Web publishing workflow”, which is a pedestrian, but accurate title. I’ve been proud of th