January 2008 Archives

M. David Peterson

As sad, desperate and/or pathetic as it may sound, I often find myself rooting around the Mono Project SVN repository looking for buried treasure. One of the intended side effects of open source software is the freedom and encouragement to experiment, so those willing to dig tend to find things that haven’t made it into an official release but are nonetheless both useful and usable tools, libraries, applications, etc.

Today, apparently, is my lucky day (though I’m surprised I hadn’t noticed this before, given that Eno did the initial check-in 7 months ago):


 Assembly/	 81031	 7 months	 atsushi	 initial checkin.
  Mono.Xml/	 81031	 7 months	 atsushi	 initial checkin.
  Mono.XsltDebugger/	 81155	 6 months	 atsushi	 2007-07-02 Atsushi Enomoto <atsushi@ximian.com> * XsltDebugger.cs XsltDebugg...
  ChangeLog	 81031	 7 months	 atsushi	 initial checkin.
  Makefile	 81031	 7 months	 atsushi	 initial checkin.
  Mono.XsltDebugger.dll.sources	 81031	 7 months	 atsushi	 initial checkin.

Rick Jelliffe

I was chuffed to see the ODF Alliance quoting this blog in their new Alliance Response to Ecma’s Proposed Disposition of Comments on OOXML. And they seem particularly interested in getting good results on the Standards Australia issues AU-09, AU-15 and AU-23, which are issues I submitted.

I guess they love me now! Though not enough to mention me by name: I am the only person quoted who is left nameless, merely “one XML expert”. Hmmm, “He who shall not be named”… Since Groklaw thinks that the mere linking to this blog with my name by colleagues foreshadows bad things, it is only prudent. I suppose it will have to be a secret love.

Since they quote me, I hope it is not too much to look at their response.*

Procedural Irregularities

In their early material various claims are made which bear looking at in more depth. They say there are many “documented irregularities”, yet when ISO JTC1 looked at them they found no substance. Looking at the list on Wikipedia, where is the actual evidence of this villainy?

  • Portugal: a fixed working group size caused late-applicants to have sour grapes. Actually, the Portuguese already had expanded the size of that working group. Not chairs. The problem as such is the regularity not the irregularity, it seems: Sun and IBM didn’t like the rules. (Note the Wikipedia entry is biased.)
  • Sweden: MS withdrew within hours a mistaken, inappropriate offer of support to 2 partners before the meetings and notified the Swedish body themselves before any votes. (Again the Wikipedia entry is biased: IIRC it was MS who reported it, not “it surfaced.”) Sweden ended up abstaining due to a procedural SNAFU: a double count of a vote in a meeting where another meeting could not be convened in time. So what do we have? A cock-up, transparency, the correct channels notified, no votes affected: no smoking gun (unless there is material that hasn’t come out.)
  • In the Netherlands, the MS delegate voted one way, other people voted another way: again, a case of regularity not irregularity. (The Wikipedia entry is biased here: why is that a substantial problem? Different national bodies have different rules depending on their bureaucratic culture and traditions apart from anything.)
  • In Switzerland, it seems discussions were limited to technical and editorial considerations. These are the only comments that can be considered by the BRM, as has been emphasized recently by Alex Brown, the BRM convenor. So the Swiss chairman had in fact a completely legitimate view, as far as I can see, of what is in-scope for ballot comments; that other NBs might put out-of-scope material in their ballot responses might make them feel good but they don’t go anywhere. (The Wikipedia article does not mention the scope of ballot comments to provide some balance.)
  • Malaysia voting abstain is typical when there is no consensus. Australia did the same; it is not an irregular procedure. If a NB submits their comments with the abstention, the comments get to the BRM and they become part of the mix, so no harm is done.
  • Cyprus joins late. The idea that one side is more remiss than the other in trying to stack SC34 is not evidenced by the numbers: they just came in different waves separated by a few months. Given that perhaps 2500 of the 3500 comments sent in by NBs are parroted comments from a mail-in campaign (i.e. not from a proper independent review) it would take a lot of chutzpah for the ODF Alliance to get too excited by this one.
  • Finally, in Norway MS asked its partners to participate. Again, no procedural irregularity at all.

I don’t know if pointing this out will have much effect. I think the point with the various bribery/corruption claims is that they have the necessary truthiness, so it doesn’t matter if none of them have any procedural irregularities.

5 Months?

The ODF Alliance say there were only 5 months to review, yet there was a full year before then during the Ecma process for participation (e.g. by ODF Alliance and Ecma member IBM). And the draft was submitted in December 2006 while the ballot was in September 2007: that is at least 9 months. (And then there are the five months until the BRM for further looking at how to resolve the issues and the issues of other NBs.)

And after that comes the maintenance process, whatever form it will take: certainly it will have a pretty high premium on interoperability with ODF and other standards.

6,045 Pages

I have previously dealt with why raw page count is not a very fertile metric. There is so much duplication, so much whitespace and so many diagrams that the effective size for review is much smaller. Furthermore, the assumption that any large standard will not be reviewed with an international and national division of labour is, in my experience and certainly in this case, incorrect.

3520 Comments

The trouble with this number is that people then think “3520 flaws” rather than “750 individual issues and a lot of repetition”. Too many? In my blog On error rates in drafts of standards I have a good quote from Jim Melton, the editor of SQL, who has commented on his standards frequently getting thousands of comments. For a large standard, a good number of comments is an indication of real review, and says absolutely nothing good or bad about the general quality of the standard or the technology IMHO.

Seven Dwarfs

The ODF Alliance groups its response under 7 heads:

In short, the proposal does NOT address the critical need for: a.) review time; b.) harmonization, c.) a clear name; d.) a sound standard with no (new or old) technical errors; e.) interoperability; f.) support for legacy documents; and g.) consistency of “fixes.”

Let’s have a look at each of them:

Review time

I have mentioned above that there is more review time than is often bandied about.

But the ODF Alliance argument here is that OOXML should not be standardized because of errors that were not found in DIS29500. This is a remarkably hopeful claim (perhaps a cunning plan): see falsifiability for a discussion on why it is shaky ground.

The strongest evidence would be if the (non-duplicate) flaw rates detected for DIS29500 were far in excess of the same for other standards. However, as the blog item above mentions, the numbers don’t go that way.

However, this is not to say that OOXML and ODF and PDF would not have been better submitted as Committee Drafts in the accelerated process to ISO/IEC JTC1 SC34. No-one is particularly enamored of any of the current fast-track processes.

Harmonisation

It is interesting that the ODF Alliance quotes Tim Bray that the world doesn’t need another way to express basic typesetting features. If it is so important, why didn’t ODF just adopt W3C CSS or ISO DSSSL conventions? Why did they adopt the odd automatic styles mechanism which no other standard uses? Now I think the ODF formatting conventions are fine, and automatic styles are a good idea. But there is more than one way to make an omelette, and a good solution space is good for users.

My perspective is that harmonisation (which will take multiple forms: modularity, pluralism, base sets, extensions, mappings, round-trippability, feature-matching, convergence of component vocabularies, etc, not just the simplistic common use of a common syntax) will be best achieved by continued user pressure, both on MS and the ODF side, within a forum where neither side can stymie the legitimate needs of the other.

Clear name

This is actually something that I have been pushing since early last year, in discussions with other SC34 people. It is part of the general observation that many of the problems with DIS29500 are not with the technology or the technical parts but can be fixed editorially: the scoping and conformance issues are examples. My point is not that “Office Open XML” is particularly confusing or that it should not continue as a brand name (not ISO’s business!); my point is rather that it is too similar to ODF/ODA/OpenOffice to be the name of the standard. I don’t know why the standard cannot have an extra part added to its name to be more descriptive. (And indeed if the plan to split out OPC to a separate part comes off, then the name Office Open XML really applies only to the other parts, so it may not be the best collective name.)

For example, the full name of the ISO Schematron is Information technology — Document Schema Definition Languages (DSDL) — Part 3: Rule-based validation — Schematron.

But is this really a showstopper for the standard? Of course not: the brand OOXML is already out in the wild. And Alex Brown has indicated that this kind of issue might be at the bottom of the list for discussion at the BRM; it is the kind of thing where people are happy to spend days discussing, which Alex is clearly not going to allow. 120 people are not traveling from all parts of the world for a week to get the issues they have raised ignored because other people’s issues are taking a disproportionate amount of time.

Sound standard

This is where I (this blog) get quoted! The blog item was The design goals of XML.

Note the difference in approaches. My angle is “I think this is a problem, I hope it can be fixed.” Their angle is “He thinks this is a problem, therefore the whole process should be abandoned.”

I think there is a kind of bait-and-switch going on: to understand it you have to make the distinction in your mind between what a particular draft (e.g. DIS29500) says and the larger concept of what OOXML could be when fixed up (e.g. substantially the same, with the same design approaches, though different in details.) It is the difference between text and technology. Here is the ploy: first find a technical or editorial problem in the draft, then transfer this to OOXML as if it were intrinsic or necessary, then use it as evidence of the unreformability of OOXML, in which case there is no point fixing the draft since the whole thing stinks.

My POV, if anyone cares, is no different from what I wrote in 2005:

I read recently a criticism of the “Binary XML Infoset” project as polluting the stream. I believe the lesson to be learned from XML is not that “Everyone should use one format, it should be simple, it should be Unicode, it should use angle brackets” but the far more challenging “Respect-driven standards development produces really good and generally applicable results.”

Note in particular this:

when I read general, rather than technical, criticism of standards or standards bodies, I usually detect strategic sour grapes, where the organization or writer is trying to undermine a process that they cannot influence enough. XML wasn’t based on the mentality people who don’t or won’t use this are idiots but we want to add to the solution space.

All that being said, I think buried in this section is the germ of an entirely valid point: even things included for legacy reasons should be in standard notations. You have to make a more specific judgment than legacy=good (as some Ecma people are perhaps prone to) or legacy=bad (as some anti-ODF people are perhaps prone to).

For example, I have written about the integer measurement system EMU used in OOXML: this is unusual but useful and a common kind of thing to do (e.g. groff, PDF, etc). But I don’t see any reason for twips let alone half points, they are just a bunion and a carbuncle, if not vice versa. Are they showstoppers? Well, it would be really good to get gratuitous problems fixed now, rather than leaving it for maintenance. But it is a matter of best practice, not an actual error or gap.

Interoperability

Interoperability is a great motherhood word. No-one is perfect.

They complain that

While the proposers “agree that it is important for the specification to support multiple types of object linking,” they suggest changing oleLink(OLE Link) to oleLink(Generic Object Connection). And, instead of referencing the specific OLE2 connection they say to use any generic ‘embedded object’.

When we look at ODF we see they have an element draw:object-ole whose definition is that it “represents objects which only have a binary representation”, almost the same thing. So the ODF Alliance want to keep the reference to OLE (and make it a normative reference, which is probably dubious but I digress). Fair enough: let’s make the spec better! But look at the use this issue is put to: the heading says “What is missing? Interoperability! Why ignore the re-use of existing standards?” but the use of existing standards is never mentioned in the text.

I suspect that the heading is a carry-over from a previous draft, where the body text was changed as it was discovered that among the Editor’s Disposition of Comments are details of adding scores of references to the various standards used by OOXML (both in DIS29500 and in other proposed fixes.) But my point is that the conclusion is not supported by the evidence, and their reaction to the issues they raise is too strident and an over-reaction.

Support for legacy documents

This begins with actually quite an interesting point, and the first really new thing to consider. Should a new standard have deprecated material? Putting aside the general point that a fast-tracked standard is not a new standard but a review and rebadging of an existing external standard, the comment is that OOXML is a different case than other standards where this mechanism has been used: like C++, these standards capture a living technology in which some parts are living and others are dying, but the ODF Alliance thinks that compatibility or legacy options are only warranted when they reflect multiple previous implementations. I wonder whether the presence of compatibility options designed to handle old WordPerfect behaviours puts a spanner in the works for that argument?

From the interesting start, the material on this point rapidly descends, ultimately saying

However, from the details provided, it appears that Ecma is merely taking a subset of VML, giving it another name (DrawingML), and using it in places where VML was previously called for. What is deprecated merely re-enters through the back door.

This is quite bizarre: VML and DrawingML are in different namespaces and I have not seen anything in the Editor’s Disposition of Comments about taking subsets of VML and renaming it. I’d love to know what in particular is meant by this. DrawingML is not something new, but part of the draft (VML had almost been entirely retired, the difference is that the Editor wants to completely retire it.) In particular, there is nothing in the section they quote (Response 92) about subsetting: there is only material on the mechanics of deprecating VML, removing references to it in favour of DrawingML, and enhancing DrawingML so that it can do everything that VML did (for example, to support rich text comments); deprecating VML necessarily involves making sure that DrawingML has equivalent features, how else could it be? So the ODF Alliance comment here is completely wrong; perhaps they think they can get away with it because the Editor’s Disposition of Comments document is not generally available.

The background to all this is that France’s AFNOR in its comments asked that the standard be split up with all the core material in one part and all the deprecated functions, documented settings, VML etc in a second part. Many other NBs also asked for the standard to be split up and for OPC to be its own part. My suggestion, through Standards Australia, was to split into 9 parts, for example. So Ecma’s proposal is to do both: a part for core, one for deprecated/legacy/VML material, and a part for OPC, but then to add various conformance classes for different application areas which would give the same conformance subset effect that having multiple parts would achieve. So splitting up is a straightforward and direct response to NB suggestions.

Consistency

Once the Editor’s initial Disposition of Comments document is out, then the issue of consistency rightly becomes important for reviewers. If the Editor accepts one comment with a particular fix on certain grounds, why not accept another comment with a similar fix on the same grounds? So now is exactly the time to be bringing up consistency issues. And there certainly might be inconsistent responses to different NB comments, where the NB comments are themselves incompatible.

It is the job of the BRM to work through as many of these kinds of issues as it can. The Editor can only say “Here is how I would solve this” and the BRM has to sort through the issues and contradictions. And ultimately it is the National Bodies who then decide whether the revised text of the standard passes their tests.

The ODF Alliance give two examples of horribly inconsistent responses. One is concerned with which version of the schemas is normative, with the choices being suggested of either the electronic version or neither. (I hope what will happen is that the schemas will be printed as an annex in the standard, and that many of the schema fragments in the standard will be removed.) I don’t think they are very serious here: the standard will end up saying something, and that something will in all probability be whatever the BRM decides.

The other inconsistency concerns another one of the Standards Australia Issues I raised. I don’t see the contradiction here: one response concerns content-type labels, the other concerns how to locate executables. Maybe there is some deeper issue that has evaded me…I think there might be a confusion here between OOXML content types (which are expressed using MIME content type notation, and live in the [Content_Types].xml part) and relationship types (which are expressed using a URI syntax and live in the various .rels parts.)
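
To make the distinction concrete, here is a minimal Python sketch that reads one of each. The two fragments are hypothetical stand-ins for a package’s [Content_Types].xml and a .rels part; the namespace and type strings are the ones I believe appear in the published Ecma text, and the draft under discussion may differ in detail.

```python
import xml.etree.ElementTree as ET

# Namespaces assumed from the published Open Packaging Conventions
CT_NS = "{http://schemas.openxmlformats.org/package/2006/content-types}"
REL_NS = "{http://schemas.openxmlformats.org/package/2006/relationships}"

# Hypothetical [Content_Types].xml: content types use MIME notation
CONTENT_TYPES = """<Types xmlns="http://schemas.openxmlformats.org/package/2006/content-types">
  <Default Extension="rels"
           ContentType="application/vnd.openxmlformats-package.relationships+xml"/>
  <Override PartName="/word/document.xml"
            ContentType="application/vnd.openxmlformats-officedocument.wordprocessingml.document.main+xml"/>
</Types>"""

# Hypothetical .rels part: relationship types use URI syntax
RELS = """<Relationships xmlns="http://schemas.openxmlformats.org/package/2006/relationships">
  <Relationship Id="rId1" Target="word/document.xml"
      Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/officeDocument"/>
</Relationships>"""

# Part name -> MIME-style content type
content_types = {
    o.get("PartName"): o.get("ContentType")
    for o in ET.fromstring(CONTENT_TYPES).iter(CT_NS + "Override")
}

# Relationship target -> URI-style relationship type
relationship_types = {
    r.get("Target"): r.get("Type")
    for r in ET.fromstring(RELS).iter(REL_NS + "Relationship")
}
```

One label says what kind of bytes a part holds; the other says what role a part plays relative to its source. They answer different questions, which is why responses about the two need not contradict each other.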

Again, the reason to mention all this is not to say that it is not appropriate to bring up issues like consistency in the lead up to the BRM. My problem is in using these run-of-the-mill things that can happen in any standard as evidence that we should decide to disallow the revised OOXML spec ahead of fixing it.

They write:

Can we in good faith endorse a standard that is not technically sound with conflicting recommendations on technical remedies?

But hold on, who is asking for such an endorsement? The purpose of the BRM is to fix these, so that the identified technical unsoundnesses get addressed and there are no conflicts in the editor’s instructions. Then, after these have been fixed, the National Bodies can respond by changing their ballot responses if they are satisfied.

I am sad if I may jeopardize the love of the ODF Alliance, but this document of theirs is so full of non sequiturs that I don’t see it as adding much light to the discussions. But perhaps the purpose of the document is not to join in any dialog but to try to withdraw participants from it.

[Update: I think if I make fun of poor efforts, I should also praise good efforts. After the disaster of the document above, I see the ODF Alliance has now put out another one, OOXML: Top 10 Worst Responses to the NB Comments, which is a much more respectable effort, raising reasonable issues this time, restraining itself from the dire and lazy mish-mash, and good-humoured rather than ranting, which is particularly welcome. It’s only a document format. In a previous blog I mentioned the spin technique of “inoculation” with the example of a list, but I don’t see the new ODF Alliance document as that at all, but as entirely appropriate, and the kind of thing the BRM should be discussing and that non-armchair people should be thinking about. (Of course, I do make the same proviso as with the NB comments: if you parrot a set of points provided by a campaign, you are not doing an independent review of the standard draft but you are doing a review of the pre-fab talking points! If every NB comes with its own Top 10 Worst list, that allows much more coverage and improvement than just one: otherwise when the BRM takes 10 minutes to fix these 10, there will be four days left twiddling thumbs! :-) ) So, well done ODF Alliance, I hope this is a sign of things to come.]

Rick Jelliffe

One simplification I have made in the XSLT code presented so far is that, except for datatypes, I have elided the issue of diagnostics. Yet the ability to provide better diagnostics is one of the value propositions for Schematron. So let’s quickly add in some diagnostics!

In Schematron schemas, a distinction is made between assertions, which are positive natural language statements about what should be found in a schema (and if possible why!) and diagnostics which provide information about errors for users. So the schema might say “Element X should be followed by element Y” and the diagnostics might say “Element X was followed by element Z”. The user gets both pieces of information.

But in a well-written schema, the assertions can pretty much be printed off without their XPath paraphernalia as bullet points and read as software requirements or human-usable documentation. See Autogenerating standards from Schematron schemas for an XSLT script that does this.

So here is our basic diagnostics section. Each of these is linked to from the appropriate assertions, which use the diagnostics attribute to reference the diagnostic element’s ID.

		<sch:diagnostics>
			<sch:diagnostic id="d1">This element was found:
				"<sch:value-of select="*/name()"/>".</sch:diagnostic>

			<sch:diagnostic id="typo-element">This element was found:
				"<sch:name/>" in "<sch:value-of select="parent::*/name()"/>".</sch:diagnostic>

			<sch:diagnostic id="typo-attribute">This attribute was found:
				"<sch:name/>" on "<sch:value-of select="parent::*/name()"/>".</sch:diagnostic>

			<sch:diagnostic id="expected-element">This element was found:
				"<sch:name/>" in "<sch:value-of select="parent::*/name()"/>".</sch:diagnostic>

			<sch:diagnostic id="expected-attribute">This attribute was found:
				"<sch:name/>" on "<sch:value-of select="parent::*/name()"/>".</sch:diagnostic>

			<sch:diagnostic id="unexpected-immediate-follower">This element was found:
				"<sch:value-of select="following-sibling::*[1]/name()"/>".</sch:diagnostic>

			<xsl:comment>Generating Diagnostics for xs:all/xs:elements</xsl:comment>
			<xsl:for-each select="xs:element[.//xs:all]//xs:all/xs:element">
				<xsl:variable name="ancestor-element" select="ancestor::xs:element/@name"/>
				<xsl:variable name="element-name" select="if (@name) then @name else @ref"/>
				<sch:diagnostic id="{concat('d2-',$ancestor-element,'-',$element-name)}">
				<!-- The braces write the element name into the generated XPath -->
				<sch:value-of select="count({$element-name})"/>
					"<xsl:value-of select="$element-name"/>" elements were found</sch:diagnostic>
			</xsl:for-each>

			<!-- generate diagnostic for each standard datatypes -->
			<xsl:call-template name="generate-standard-datatypes-diagnostics"/>
		</sch:diagnostics>
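
To show how an assertion picks up its diagnostic, here is a small Python sketch that resolves the diagnostics attribute against the diagnostic IDs. The schema fragment and the function name are made up for illustration; only the linking mechanism (a whitespace-separated list of IDs on the assertion) comes from Schematron itself.

```python
import xml.etree.ElementTree as ET

SCH = "http://purl.oclc.org/dsdl/schematron"  # ISO Schematron namespace

# Hypothetical schema fragment: one assertion pointing at one diagnostic
SCHEMA = f"""
<sch:schema xmlns:sch="{SCH}">
  <sch:pattern>
    <sch:rule context="Address/Postcode">
      <sch:assert test="not(following-sibling::*)"
                  diagnostics="unexpected-immediate-follower">
        The element "Postcode" should not be followed by any other element.
      </sch:assert>
    </sch:rule>
  </sch:pattern>
  <sch:diagnostics>
    <sch:diagnostic id="unexpected-immediate-follower">This element was found: ...</sch:diagnostic>
  </sch:diagnostics>
</sch:schema>
"""


def diagnostics_for_asserts(schema_text):
    """Map each assertion's test to the IDs of the diagnostics it references."""
    root = ET.fromstring(schema_text)
    known_ids = {d.get("id") for d in root.iter(f"{{{SCH}}}diagnostic")}
    links = {}
    for assertion in root.iter(f"{{{SCH}}}assert"):
        # The diagnostics attribute is a whitespace-separated list of IDs
        wanted = assertion.get("diagnostics", "").split()
        links[assertion.get("test")] = [i for i in wanted if i in known_ids]
    return links


LINKS = diagnostics_for_asserts(SCHEMA)
```

When the assertion fails, a validator reports the assertion text and then evaluates each linked diagnostic in the same context, so the user sees both what was expected and what was found.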

This is probably the last in this round of articles on schema generation with the XSD to Schematron converter. Thanks to JSTOR and Allette Systems for sponsoring its development. I hope to be transferring the code to SourceForge under GPL in February, though I want to divide the main code out into nice separate files first, for maintainability. I am very interested in finding anyone interested in taking over or contributing to the project: I have a backlog of Schematron matters to attend to!

Rick Jelliffe

This article is part of a series describing how to convert from W3C XML Schemas to ISO Schematron. They are very different schema languages! This time we look at some code with quite complex XPaths: we want to validate that the element that follows another element in a document is one that “goes after” the first. But not necessarily immediately after: the schema might require extra elements in between, for example.

Why would we want to do that? Well, because we are approaching this systematically, and gradually expressing constraints from the most general to the most specific. XML schema content models are rather difficult, even when simplified in the way we already do when pre-processing the schema. By having a pattern that validates consecutive elements for partial order we can cope with all manner of inter-nested choice and element groups and cardinalities. We leave testing of required immediate following elements to another test. (See the previous article in this series, Required pairs in sequences, for a start.) By plugging the hole with a big rock, we need smaller pebbles for the remaining gaps.

(I have previously raised the use of partial order for schemas in my single-element schema language Hook, and readers.)

Output

Let’s start off by showing what we want to achieve. We want to generate from any XSD content model rules like the following:


      <sch:rule context="Address/StreetOrPOBox">
         <sch:assert test="not (following-sibling::*)  or
                   (following-sibling::*[1][name() ='Suburb']  or
                    following-sibling::*[1][name() ='State']  or
                    following-sibling::*[1][name() ='Postcode'])">
			When in a  "Address" element, the element "StreetOrPOBox" can only be followed
			(perhaps with other elements intervening)
			by the following elements: Suburb, State, Postcode
         </sch:assert>
      </sch:rule>

And for elements which cannot have any followers, we want to generate rules like the following:

      <sch:rule context="Address/Postcode">
         <sch:assert test="not(following-sibling::*)">
		When in a "Address" element, the element "Postcode" should not be
		followed by any other element.
	 </sch:assert>
      </sch:rule>
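
For readers who want to see what these two rules test without running a Schematron validator, here is a rough Python equivalent. The follower sets are typed in by hand to match the Address example, and the function name is hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical follower sets, matching the Address example
FOLLOWERS = {
    "StreetOrPOBox": {"Suburb", "State", "Postcode"},
    "Postcode": set(),
}


def check_followers(parent, followers):
    """Return (element, next element) name pairs that break the follower rules."""
    errors = []
    children = list(parent)
    # Like following-sibling::*[1], only the next sibling is inspected
    for child, nxt in zip(children, children[1:]):
        allowed = followers.get(child.tag)
        if allowed is not None and nxt.tag not in allowed:
            errors.append((child.tag, nxt.tag))
    return errors


GOOD = ET.fromstring("<Address><StreetOrPOBox/><Suburb/><Postcode/></Address>")
BAD = ET.fromstring("<Address><Postcode/><StreetOrPOBox/></Address>")
```

Elements with no entry in the table are simply not checked here, just as an element with no rule would not be checked by the schema.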

Main Loop

Here is the start of the named template:

	<xsl:template name="generate-following-elements-checking-rule">

In this we first select all the element declarations or references in the XSD schema which are particles in a content model. Remember that we have pre-processed the schema modules into a single file so we don’t need to worry about import, include and global complexType declarations. And we are not supporting some features at this stage, such as wildcards, substitution groups and dynamic typing, which simplifies our life quite a lot, though unfortunately not entirely.

	<!--  For every use of an element in any content model -->
	<xsl:for-each select="//xs:schema/xs:element//xs:element[not(parent::xs:all)]">
		<!-- Sort them so that local declarations come before globals, and so that deep path
		declarations come before shallow ones -->
		<xsl:sort select="count(ancestor::xs:element)" order="descending" />

Note that we are not worrying about elements in an xs:all group, because these have no partial order constraints that are not tested by the patterns for allowed elements and required elements. We sort our particles longest first so that local declarations are tested before global ones.

Now in this scope we make some convenience variables:

		<!--  Store the name of the parent element -->
		<xsl:variable name="parent-element-name" select="ancestor::xs:element[1]/@name"/>
		<xsl:variable name="parent-element" select="ancestor::xs:element[1]"/>
		<!--  Store the context path -->
		<xsl:variable name="path-to-parent">
			<xsl:for-each select="ancestor::xs:element"
                            ><xsl:value-of select="@name"/>/</xsl:for-each>
		</xsl:variable>

Handling repeating choice elements

Next, we handle a special case. Probably there are more special cases like this, and identifying them would help trim the output Schematron schema and reduce redundant messages.

This is the common case of (a | b | c)*, a single repeating choice group. Like an xs:all there is no need to generate declarations for these (though declarations could be made.)



	<!--  Handle special case -->
	<xsl:when test="parent::xs:choice
		[@maxOccurs='unbounded' or @maxOccurs > 1 ]
		[parent::xs:complexType
			[count(xs:choice)=1]
			[count(xs:sequence)=0]
		or parent::xs:element
			[count(xs:choice)=1]
			[count(xs:sequence)=0]]
		[count(child::xs:choice)=0]
		[count(child::xs:sequence)=0]">
		<!--  If the parent is a repeating choice element and its parents only have that choice,
			and that choice element only has element particles for children
			then we can treat it as a special case: it has no extra positional constraints that the
			presence constraints don't catch. -->

		<!--  only generate the rule when we come to the first subelement -->
		<xsl:if test="not(preceding-sibling::xs:element)">
			<xsl:comment> No sequence constraints for element <xsl:value-of select="$parent-element-name"/>.</xsl:comment>
		</xsl:if>
		</xsl:when>
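
The same special-case test can be sketched in Python over an xs:complexType. This is an approximation of the XPath above rather than a translation: it is slightly stricter, requiring every child of the choice to be an element particle, and the function name and sample content models are made up.

```python
import xml.etree.ElementTree as ET

XS = "{http://www.w3.org/2001/XMLSchema}"


def is_repeating_choice_only(complex_type):
    """True when the content model is a single repeating choice of plain elements."""
    groups = [c for c in complex_type
              if c.tag in (XS + "choice", XS + "sequence", XS + "all")]
    if len(groups) != 1 or groups[0].tag != XS + "choice":
        return False
    choice = groups[0]
    max_occurs = choice.get("maxOccurs", "1")
    if not (max_occurs == "unbounded" or int(max_occurs) > 1):
        return False
    # No nested choice or sequence groups: only element particles
    return all(child.tag == XS + "element" for child in choice)


# Hypothetical content models: (a | b | c)* and (a, b)
REPEATING = """<xs:complexType xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:choice maxOccurs="unbounded">
    <xs:element name="a"/><xs:element name="b"/><xs:element name="c"/>
  </xs:choice>
</xs:complexType>"""

SEQUENCED = """<xs:complexType xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:sequence>
    <xs:element name="a"/><xs:element name="b"/>
  </xs:sequence>
</xs:complexType>"""
```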

Identify followers

Now comes the heart of the matter. We want to identify various kinds of followers, each in variables containing the sequence of possible elements; we use various XPaths to locate the possible elements and put them into variables. This is fraught with error!

At the end, we collect them into a variable followers which hopefully has all the elements we need.

Remember that we are looping through all the element particles in all the content models (except for xs:all and (a | b | c)* models) one by one.

The first variable repeating-cousins holds all the element particles which belong to the same parent as the current element, but have some kind of repetition anywhere between them and the parent element: it could be on the element itself or on a parent sequence or choice. Any of these elements can follow our candidate element, by partial order.

The second variable subsequent-cousins traverses up the document tree-of-nodes from our candidate element and, every time it finds a sequence element, selects all the following elements that are direct particles of it.

The third variable subsequent-nephews is a more elaborate version of this: it selects all the descendant element particles of following groups in sequences.

		<!--  Handle the normal case -->
		<xsl:otherwise>

		<!--  Node identity ("is") is tested here: "=" would compare string values -->
		<xsl:variable name="repeating-cousins"
		    select="$parent-element//*
			[@maxOccurs='unbounded' or @maxOccurs > 1]
			[descendant-or-self::*[. is current()]]
			/descendant-or-self::xs:element
				[ancestor::xs:element[1] is $parent-element]" />

		<xsl:variable name="subsequent-cousins"
			select="ancestor-or-self::*
				[parent::xs:sequence]
				[ancestor::xs:element[1] is $parent-element]
				/following-sibling::xs:element "/>

		<xsl:variable name="subsequent-nephews"
			select="ancestor-or-self::*
				[parent::xs:sequence]
				[ancestor::xs:element[1] is $parent-element]
				/following-sibling::*//xs:element "/>			

		<xsl:variable name="followers"
			select="$repeating-cousins | $subsequent-cousins | $subsequent-nephews" />
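
As a cross-check on the three variables, here is a Python sketch of the same follower calculation over a small, hypothetical Address schema. It is not the converter’s code. It keys the result by element name only, and it keeps only particles that belong directly to the same parent declaration, which is a little stricter than the subsequent-nephews XPath.

```python
import xml.etree.ElementTree as ET

XS = "{http://www.w3.org/2001/XMLSchema}"

# Hypothetical schema: StreetOrPOBox, (Suburb | State)?, Postcode
XSD = """<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="Address">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="StreetOrPOBox"/>
        <xs:choice minOccurs="0">
          <xs:element name="Suburb"/>
          <xs:element name="State"/>
        </xs:choice>
        <xs:element name="Postcode"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>"""


def repeats(node):
    max_occurs = node.get("maxOccurs", "1")
    return max_occurs == "unbounded" or int(max_occurs) > 1


def followers_by_name(xsd_text):
    root = ET.fromstring(xsd_text)
    parent = {child: p for p in root.iter() for child in p}

    def owner(node):
        """Nearest enclosing xs:element declaration, or None."""
        node = parent.get(node)
        while node is not None and node.tag != XS + "element":
            node = parent.get(node)
        return node

    result = {}
    for particle in root.iter(XS + "element"):
        decl = owner(particle)
        if decl is None or parent[particle].tag == XS + "all":
            continue  # a global declaration, or a particle inside xs:all
        # The particle and its ancestors, up to but excluding the declaration
        chain, node = [], particle
        while node is not decl:
            chain.append(node)
            node = parent[node]
        found = []
        for link in chain:
            if repeats(link):  # repeating-cousins
                found += [e for e in link.iter(XS + "element") if owner(e) is decl]
            if parent[link].tag == XS + "sequence":
                siblings = list(parent[link])
                for later in siblings[siblings.index(link) + 1:]:
                    # subsequent-cousins and subsequent-nephews together
                    found += [e for e in later.iter(XS + "element") if owner(e) is decl]
        names = []
        for e in found:
            name = e.get("name") or e.get("ref")
            if name not in names:
                names.append(name)
        result[particle.get("name") or particle.get("ref")] = names
    return result


FOLLOWERS = followers_by_name(XSD)
```

For this schema the follower list for StreetOrPOBox comes out as Suburb, State, Postcode, which is the list used in the sample rule at the start of the article.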

Now we are set up to generate our rules.

Note: I suspect that people would expect the code to work by generating a transitive closure for the reachable following elements of each particle, finding the possible immediate following sibling elements, then finding their possible immediately following sibling elements, and so on repeated. But recursion in this situation seems to me to be prone to exploding (in time, if not in memory) based on some other recent work I was doing on XML schema re-factoring. However, the method above (if it is correct!) uses no recursion and may be better for that reason.

Rules for elements with followers

The odd use of concat() in the context attribute is just to cope with some element particles being locally declared and others being globally declared.

The code here is not difficult. There is a little xsl:if section to customize the assertion text when there is only one possible follower.

    <!--  Make a rule using the current context path -->
		<sch:rule context="{concat($path-to-parent, (@name | @ref))}">
		  <!--  select  all the elements that are under any choice or sequence group which allows
		  	repetition and has the current element under it-->

		<xsl:if test=" $followers " >
		  	    <sch:assert>
		  	    	<xsl:attribute name="test">
		  	    	<xsl:text>not (following-sibling::*)  or (</xsl:text>
				<xsl:for-each select=" $followers ">
				    <xsl:choose>
				    	<xsl:when test="@name"
						>following-sibling::*[1][name() ='<xsl:value-of select="@name  " />']</xsl:when>
				    	<xsl:when test="@ref"
						>following-sibling::*[1][name() ='<xsl:value-of select="@ref  " />']</xsl:when>
				    </xsl:choose>	

				 <xsl:if test="position()!=last()"> or </xsl:if>
				</xsl:for-each>
				<xsl:text>)</xsl:text>
				  </xsl:attribute>
			When in a  "<xsl:value-of select=" $parent-element-name" />" element,
			the element "<xsl:value-of select="concat(@ref, @name)" />" can only be followed
				<xsl:if test="count( $followers ) != 1">(perhaps with other elements intervening)</xsl:if>
				by the following elements:
				<xsl:for-each select=" $followers">
					<xsl:value-of select="@name | @ref" />
					<xsl:if test="position()!=last()">, </xsl:if>
				</xsl:for-each>
			</sch:assert>
		</xsl:if>

Rules for elements with no followers

Finally, we handle the case of elements at the end of the content model. This is the case where there are no elements in the follower set.

		<xsl:if test=" not( $followers ) ">
			<sch:assert test="not(following-sibling::*)">
			When in a "<xsl:value-of select=" $parent-element-name" />" element,
			the element "<xsl:value-of select="concat(@ref, @name)"/>" should not be
			followed by any other element.
			</sch:assert>
		</xsl:if>

		</sch:rule>

	  </xsl:otherwise>

	  </xsl:choose>
	</xsl:for-each>
</xsl:template>

(Acute people might be wondering whether we need any rules to test that an element must start with a particular element. When the same element can appear more than once in a content model, that might indeed be useful. When it only appears once, and is required, the required element rules will report it. Where it is optional, if it goes in the wrong place these partial order rules should catch it. So I am not sure it is an important case at this grain, for us: we are not really making any effort to handle multiple particles, though certainly I expect most patterns we have used so far will cope with them. But it remains another issue to audit!)
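The two kinds of rule differ only in the XPath that ends up in the test attribute. As a rough illustration, here is a small Python sketch that builds that test string from a set of follower names. The function name `follower_test` is hypothetical, and the spacing of the output is tidied compared with what the stylesheet emits.

```python
def follower_test(follower_names):
    """Build the XPath for an assertion's test attribute from a follower set."""
    if not follower_names:
        # End of the content model: nothing may follow.
        return "not(following-sibling::*)"
    alternatives = " or ".join(
        "following-sibling::*[1][name()='%s']" % name
        for name in sorted(follower_names)
    )
    # Either nothing follows, or the next sibling is one of the followers.
    return "not(following-sibling::*) or (%s)" % alternatives

print(follower_test({"Suburb"}))
print(follower_test(set()))
```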

Rick Jelliffe

What we want to do is to have a Schematron pattern that just checks a very specific thing: when the use in a document of one element requires that another element immediately follows it.

Actually, I am skipping over a stage here, because this code is quite small, fun and instructive. Which is perhaps another way of saying that the code we are skipping over (for now) is quite complex. The stage we are skipping over for now has assertions to test partial order (like Topologi’s, and James Clark’s RELAX NG validator Jing’s, feasible validation mode): it passes any element which could go after the current element (in its parent), not just the element that can immediately follow it. Having the test for partial order is useful for progressive validation (for example for feasible validation where we have a document that we know is incomplete, but we just want to know if it is OK as far as it goes) but more importantly it lets us divide and conquer our task.

Back to our simple case… The XML Schemas schema for this is when there is an xs:sequence element which contains two consecutive xs:element particles, with occurrence constraints set so that the first cannot repeat while the second is required.

First here is the kind of code we will have in our Schematron schema:

   <sch:pattern id="Required_Immediate_Followers">
      <sch:title>Required Immediate Followers (Simple)</sch:title>

      <sch:rule context="Address/StreetOrPOBox">
         <sch:assert test="following-sibling::*[1][self::Suburb]">
		When in a "Address" element, the element "StreetOrPOBox" should be immediately followed by
		 the element "Suburb". </sch:assert>
      </sch:rule>
     ...
   </sch:pattern>

And here is the beta XSLT code to generate it from our (expanded and munged) XML schema:

	<xsl:template name="generate-immediate-following-elements-checking-rule">

		
    	<xsl:for-each select="//xs:element
		    	[not(@maxOccurs='unbounded') and not(@maxOccurs > 1) and not(@maxOccurs=0)]
		    	[@minOccurs='unbounded' or not(@minOccurs=0)]
    			[parent::xs:sequence]
    			[following-sibling::*[1]
    				[self::xs:element
    					[@maxOccurs='unbounded' or not(@maxOccurs=0)]
                                        [@minOccurs='unbounded' or not(@minOccurs=0)]]]">
    			 	<!--  Store the name of the parent element -->
		<xsl:variable name="parent-element-name" select="ancestor::xs:element[1]/@name"/>
		<xsl:variable name="parent-element" select="ancestor::xs:element[1]"/>
		<!--  Store the context path -->
		<xsl:variable name="path-to-parent">
			<xsl:for-each select="ancestor::xs:element"><xsl:value-of select="@name"/>/</xsl:for-each>
		</xsl:variable>

		  	<sch:rule context="{concat($path-to-parent, (@name | @ref))}">
		  		<sch:assert diagnostics="unexpected-immediate-follower">
		  			<xsl:attribute name="test">following-sibling::*[1][self::<xsl:value-of
		  			select="concat(following-sibling::*[1]/@name, following-sibling::*[1]/@ref)"/>]
                                       </xsl:attribute>
		  			When in a "<xsl:value-of select=" $parent-element-name" />" element,
                                       the element "<xsl:value-of select="concat(@ref, @name)"/>" should be
                                       immediately followed by  the element  "<xsl:value-of
                                      select="concat(following-sibling::*[1]/@name, following-sibling::*[1]/@ref)"/>".
		  		</sch:assert>
		  	</sch:rule>

    	</xsl:for-each>

   </xsl:template>

One thing to note is the variable path-to-parent: we will see this used again later. It allows us to have local declarations as deep as we need. Another thing to note is that whenever we test the XML Schemas attribute maxOccurs we first have to do a string test for “unbounded” (or a test using number()) because it has a union data type allowing numbers and “unbounded”. (Strictly, minOccurs only allows numbers, so the string test on it in the code above is redundant but harmless.)
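For readers who find the selection logic easier to follow outside XPath, here is a minimal Python sketch of the same generation step: walk the schema, keep track of the path of enclosing element names, and emit a (context, test) pair for each pair of consecutive element particles where the first is required and cannot repeat and the second is required. The schema fragment and the function names are hypothetical, and the sketch ignores details such as maxOccurs="0" and annotations inside a sequence.

```python
import xml.etree.ElementTree as ET

XS = "{http://www.w3.org/2001/XMLSchema}"

# Hypothetical schema fragment, chosen to match the Address example above.
SCHEMA = ET.fromstring("""
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="Address">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="StreetOrPOBox"/>
        <xs:element name="Suburb"/>
        <xs:element name="Postcode" minOccurs="0"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
""")

def name_of(particle):
    return particle.get("name") or particle.get("ref")

def is_required(particle):
    return int(particle.get("minOccurs", "1")) > 0

def cannot_repeat(particle):
    # A plain string comparison, so 'unbounded' is never converted to a number.
    return particle.get("maxOccurs", "1") == "1"

def immediate_follower_rules(node, path=()):
    """Yield (context, test) pairs for consecutive required element particles."""
    if node.tag == XS + "element":
        path = path + (name_of(node),)
    if node.tag == XS + "sequence":
        particles = list(node)
        for first, second in zip(particles, particles[1:]):
            if (first.tag == XS + "element" and second.tag == XS + "element"
                    and is_required(first) and cannot_repeat(first)
                    and is_required(second)):
                context = "/".join(path + (name_of(first),))
                yield context, "following-sibling::*[1][self::%s]" % name_of(second)
    for child in node:
        yield from immediate_follower_rules(child, path)

for context, test in immediate_follower_rules(SCHEMA):
    print(context, "->", test)
```

Only the StreetOrPOBox/Suburb pair produces a rule here, because Postcode is optional.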

Looking at this code I see an immediate potential flaw: in XPath 1.0 you would only need to check the maxOccurs and minOccurs attributes for numeric values: the tests would gracefully fail if “unbounded” was used in the original schema. However, XPath 2.0 will generate a type error, so we put the test for string first (the attribute value will be first tested as a string, then as a number). This relies on short-circuiting: the success of the first test means the second test is not evaluated. But, oh dear, short-circuiting is not guaranteed in XPath 2.0 (it is XPath 1.0 behaviour.) So I will have to make these tests into little if ... then... expressions. This is one place XSLT 2.0 really gets it wrong: it should add the short-circuiting constraint because it makes life sooo much easier for programmers. I am enjoying exploring XSLT 2, but this thing is just dumb and un-idiomatic. If it ain’t broke don’t fix it, and so on. (Having said all that, SAXON acts the way I want here, and short-circuits or at least does not freak out. Keen readers: please let me know if my understanding is wrong here!)
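The shape of the guarded test is easy to show in another language. This is only an analogy in Python, with a hypothetical function name: the string comparison decides whether the numeric conversion is ever attempted, which is the ordering an explicit if ... then ... else expression would pin down in XPath 2.0.

```python
def can_repeat(max_occurs):
    """Decide repeatability from a raw maxOccurs value (None if absent)."""
    if max_occurs is None:
        return False  # attribute absent: the XML Schema default is 1
    # The numeric conversion only happens in the else branch, so the
    # string "unbounded" is never passed to int().
    return True if max_occurs == "unbounded" else int(max_occurs) > 1

print(can_repeat("unbounded"), can_repeat("3"), can_repeat("1"), can_repeat(None))
```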

This simple test actually handles a lot of the required constraints in content models, and obviously it can be improved on: for example, when the first element can repeat, the assertion needs to be broadened to allow it to follow itself. Or what if the second particle is another sequence, or a choice? Or what if the second particle is optional? And what if the same particle appears several times in the content model? (See my initial article on this, Converting Content Models to Schematron, for some ideas.)

However, it does not generate false negatives, which is what we want as we create our finer sieve.

Rick Jelliffe

We can improve on the diagnostics given by the rules in the previous article in this series, Progressive validation for complex content models.

Diagnosing Similar Names

One of the most common typos is simply to make a mistake in upper-case/lower-case. We can generate Schematron code to check this:

<sch:rule context="*[upper-case(local-name())=upper-case('Address')]">
         <sch:report test="true()">The unexpected element "<sch:name/>" has been used,
            which is close to an element in the schema: the element "Address".
	</sch:report>
 </sch:rule>

And here is the XSLT for generating those Schematron rules:

	<xsl:for-each select="//xs:element[@name]">
		<xsl:sort select="@name"/>
		<xsl:variable name="theLocalName" select="replace( @name, '^(.*):(.*)', '$2' )" />
		<xsl:if test="string-length( $theLocalName ) > 0">
			<sch:rule context=
                             "{concat(
                                  '*[upper-case(local-name())=upper-case(&quot;',
                                  $theLocalName,
                                  '&quot;)]')}">
				<sch:report test="true()" role="note"
                                >The unexpected element "<sch:name/>" has been used, which is close to an
				element in the schema: the element "<xsl:value-of select="@name"/>"
                                <xsl:if test="contains(@name, ':')"> in the
				{<xsl:value-of select="ancestor::xs:schema/@targetNamespace"/>} namespace</xsl:if>.
				</sch:report>
			</sch:rule>
		</xsl:if>
	</xsl:for-each>

This code actually catches two problems: have you made an upper-/lower-case typo, or have you used an element with a name in the current namespace but using a different namespace?

Actually, the code as it is will generate a false positive if the same element name is used in multiple namespaces. So I will give it a role attribute of “Note” (as in Note, Caution, Warning). The role attribute lets you know what function a particular assertion plays in its rule or pattern.

These generated rules get put in the pattern that checks for typos, after the checks for defined names, but before the wildcard catch-all entry at the end: this way elements that have correct names and namespaces are dealt with before these rules, and any names that have other problems get dealt with by the default. In Schematron, a schema is made from patterns: each pattern contains rules, and each rule contains assertions (assert or report elements): every assertion in a rule is tested in the context (an XPath that may match nodes of interest from the document) provided by the rule; the rules however form a case statement, so that if some node matches one rule it won’t be tested by a subsequent rule in the same pattern.
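The ordering argument can be made concrete with a tiny model of a pattern as an ordered list of rules where the first match wins. This is a sketch in Python, not Schematron; the function names and messages are hypothetical, and Python's upper() merely stands in for the XPath upper-case() function.

```python
def build_typo_pattern(schema_names):
    """Ordered (predicate, message) rules; the first matching rule wins."""
    rules = []
    # 1. Correctly named elements are swallowed silently.
    for name in schema_names:
        rules.append((lambda n, name=name: n == name, None))
    # 2. Case-insensitive near misses get a note.
    for name in schema_names:
        rules.append((lambda n, name=name: n.upper() == name.upper(),
                      'close to "%s"' % name))
    # 3. Wildcard catch-all at the end.
    rules.append((lambda n: True, "unknown element"))
    return rules

def check(rules, element_name):
    for matches, message in rules:
        if matches(element_name):
            return message  # later rules in the pattern are not tried
    return None

rules = build_typo_pattern(["Address", "Suburb"])
for candidate in ["Address", "address", "Adress"]:
    print(candidate, "->", check(rules, candidate))
```

If the near-miss rules came first, the correctly spelled "Address" would also match them, which is why the defined-name rules have to precede them.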

Towards terser, more declarative schemas

It is almost axiomatic that automatically generated code is ugly and unfriendly. Look at compiler generators for example. Of course, getting consistent code that does the same thing many times is why you use a code generator like Schematron in the first place rather than writing the XSLT yourself, in many cases.

But it is certainly possible to make the code more friendly and more declarative. In Converting Schematron to XML Schemas I showed how to use abstract rules to provide extra declarative information so that there is enough information to convert back to a kind of W3C XML Schema. It doesn’t go so far, but the idea is that abstract rules (and abstract patterns, together with the role attribute) provide the abstraction for grouping assertions and representing types.

I won’t go into the code, it is trivial, but the idea is that there are quite a few rules or assertions that don’t have any dynamic content (sometimes it is handled by the diagnostic element, other times we don’t expect the rule to ever generate messages, see Expressing untested and untestable constraints in Schematron) and we can use abstract patterns to make things much more declarative, readable and terse.

Here is an example, for the rules that swallow element names that are defined in the current namespace:

      <sch:rule id="DefinedElement" abstract="true">
         <sch:assert test="true()">The element name "<sch:name/>" is defined.</sch:assert>
      </sch:rule>

      <sch:rule context="Address">
         <sch:extends rule="DefinedElement"/>
      </sch:rule>

      <sch:rule context="AgeNextBirthday">
         <sch:extends rule="DefinedElement"/>
      </sch:rule>

And here is an example for detecting various kinds of text content:

	<sch:rule abstract="true" id="NoDataContent-ns1">
         <sch:assert test="string-length(normalize-space(string-join(text(), ''))) = 0"
                     diagnostics="d1">Element "<sch:name/>" should have no text content.</sch:assert>
      </sch:rule> 

      <sch:rule abstract="true" id="NoElementContent-ns1">
         <sch:assert test="count(*|processing-instruction()|comment()) = 0" diagnostics="d1"
                     >Element "<sch:name/>" should be completely empty (no XML comments, PIs, or elements).</sch:assert>
      </sch:rule>

      <sch:rule abstract="true" id="NoContents-ns1">
         <sch:extends rule="NoDataContent-ns1"/>
         <sch:extends rule="NoElementContent-ns1"/>
         <sch:assert test="count(processing-instruction()|comment()) = 0" diagnostics="d1"
                >Element "<sch:name/>" should be completely empty (no XML comments, PIs).</sch:assert>
      </sch:rule>

      <sch:rule context="BestTime">
         <sch:extends rule="NoElementContent-ns1"/>
      </sch:rule>

      <sch:rule context="Gender">
         <sch:extends rule="NoDataContent-ns1"/>
      </sch:rule>

      <sch:rule context="Female">
         <sch:extends rule="NoContents-ns1"/>
      </sch:rule>

      <sch:rule context="Male">
         <sch:extends rule="NoContents-ns1"/>
      </sch:rule>

Much easier to read than having all those assertions expanded!
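To see what the terse form stands for, here is a small Python sketch that flattens extends references into the list of assertions a concrete rule ends up with. The rule table is a simplified, hypothetical stand-in loosely modelled on the rules above (ids shortened, assertion texts abbreviated); it is not how a Schematron implementation is required to work.

```python
# Hypothetical, simplified table of abstract rules.
ABSTRACT = {
    "NoDataContent": {"extends": [], "asserts": ["no text content"]},
    "NoElementContent": {"extends": [], "asserts": ["no elements, comments or PIs"]},
    "NoContents": {"extends": ["NoDataContent", "NoElementContent"], "asserts": []},
}

def expand(rule_id, abstract=ABSTRACT):
    """Return the assertions of a rule, with inherited ones first."""
    rule = abstract[rule_id]
    asserts = []
    for base in rule["extends"]:
        asserts.extend(expand(base, abstract))
    return asserts + rule["asserts"]

print(expand("NoContents"))
```

A concrete rule such as the one for Female then needs only its context and a single extends reference.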

Rick Jelliffe

Karen Deane has an article in The Australian, “Battle on Microsoft standard push”, which includes some quotes from a background interview she asked me for, to give her the gossip ahead of the big MS journalist fly-in last week.

David A. Chappell

There has been a fair amount of chatter lately about defending the value of SOA projects or justifying such projects to the “C-level”. Many of these discussions will point at the business value of doing more with less and achieving IT cost reduction by reducing redundant systems and reuse of services in a SOA. Also streamlining processes in order to run the business more efficiently is a popular opinion,

M. David Peterson

As per the description in the newly created Facebook group of the same titled-name,

It’s been 3 1/4 years too many since Chris Sells hosted the last SellsCon @ the beautiful Skamania Lodge in Stevenson, WA. It’s about time we SellsGeeks band together and demand our rights to have our brains properly nourished with the type of brain nourishment that only a SellsCon can provide.

If you haven’t experienced a SellsCon before, then here’s your chance to get in on the ground floor. If you have, then you don’t need any encouragement from me: You know exactly what I mean when I state that there’s no tech conference like a SellsCon. None of this fluffy-puffy corporate sponsored mumbo-jumbo. Just the best and the brightest converging together into the same place for a couple days to make sense of all the crap that gets blasted in your face at all the other conferences (Okay, except for OSCON (and, of course, any other O’Reilly hosted/sponsored conference ;-)).

Now, before I get myself in any (more!) trouble: This isn’t something that is absolutely, without a doubt going to happen. This is just an attempt to get enough people to come together to ensure Chris is made fully aware just how badly we want to attend/participate in another SellsCon, and what better way to do that than to stick our names on an easy-to-locate list that Chris can then look at and go: “Okay, I’m convinced.”

That said: I invited Chris to join the group. And he did. So if nothing else, at least he’s interested in the idea. :D

Are you in?

M. David Peterson

NOTE: ILMOTD == IRC Leaving Message Of The Day


As seen in the scroll back on #xslt,

MikeSmith left the chat room. (“Less talk, more pimp walk.”)

David A. Chappell

Hi all,

A video recording of a SOA Grid presentation that I did at the BeJUG Enterprise SOA Conference has just become available at
http://parleys.com/display/PARLEYS/Next-Generation+Grid+Enabled+SOA?showComments=true

Topics include -

- A new grid based service bus infrastructure concept that combines process flow, horizontally scalable service state caching, and ESB mediation.

- Fault tolerant in memory data grid

- A controversial subject that I call “Not Your MOM’Bus”

- Patterns for transparent state management of load balanced services

- Patterns for transparent fault tolerance of stateful services.

- Optimal server resource allocation that is complementary to virtualization strategies.

Dave

M. David Peterson

Push Button Paradise | Blog Archive | WebPath wants to be free (BSD licensed, specifically)

The focus of WebPath was rapid development and providing an experimental platform. There remains tons of potential work left to do on it…watch this space for continued discussion. I’d like to call out special thanks to the Yahoo! management for supporting me on this, and to Douglas Crockford for turning me on to Top Down Operator Precedence parsers. Have a look at the code. You might be pleasantly surprised at how small and simple a basic XPath 2 engine can be.

Nice! I wonder if it runs via IronPython? That would *ROCK*! And if no,

So, who’s up for some XPath hacking?

ME! :D

M. David Peterson

Actually, there are plenty of reasons why F# ((F == Functional) == True) *ROCKS*. Here are a few from the previously linked F# site on Microsoft Research,

Combining the efficiency, scripting, strong typing and productivity of ML with the stability, libraries, cross-language working and tools of .NET.

F# is a programming language that provides the much sought-after combination of type safety, performance and scripting, with all the advantages of running on a high-quality, well-supported modern runtime system. F# gives you a combination of

* interactive scripting like Python,

* the foundations for an interactive data visualization environment like MATLAB,

* the strong type inference and safety of ML,

* a cross-compiling compatible core shared with the popular OCaml language,

* a performance profile like that of C#,

* easy access to the entire range of powerful .NET libraries and database tools,

* a foundational simplicity with similar roots to Scheme,

* the option of a top-rate Visual Studio integration,

* the experience of a first-class team of language researchers with a track record of delivering high-quality implementations,

* the speed of native code execution on the concurrent, portable, and distributed .NET Framework.

The only language to provide a combination like this is F# (pronounced FSharp) - a scripted/functional/imperative/object-oriented programming language that is a fantastic basis for many practical scientific, engineering and web-based programming tasks.

F# is a pragmatically-oriented variant of ML that shares a core language with OCaml. F# programs run on top of the .NET Framework. Unlike other scripting languages it executes at or near the speed of C# and C++, making use of the performance that comes through strong typing. Unlike many statically-typed languages it also supports many dynamic language techniques, such as property discovery and reflection where needed. F# includes extensions for working across languages and for object-oriented programming, and it works seamlessly with other .NET programming languages and tools.

For those of you unaware, F# is now a first class MSFT language, or in other words, this is no longer a “Hey, here’s an idea. Let’s research it.”-type project and is instead a true-blue MSFT product backed by mean-green MSFT money, led by some of the very best and brightest minds @ MSFT.

If you were to ask me “What’s the future language foundation of the .NET platform?” I would first state “More than likely, XSLT 2.0++.” And then when you stopped laughing and slapped me upside my head to awake me from my dream I’d say, “What the F#!? was that for?” and you’d say “F#??,” followed by “Isn’t that for programming the way God intended for people to program on the .NET platform?”, and then I’d say “Okay, you got me on that.” at which point we’d move on…

So here’s the thing: While there are *TONS* and *TONS* of reasons why F# *ROCKS* (did I mention that F# is distributed as both an MSI and a ZIP, the latter designed to make it easy for folks using Mono to take full advantage of what F# has to offer?), the biggest reason it *ROCKS* is this,

Kurt Cagle

It’s been a while since I’ve written a “non-directed” blog for xml.com, so while I will be covering a few XML topics here if you’re not interested in economic systems theory, then you might as well skip this.

As I write this, it’s about eight hours before the financial markets open in New York. The markets were closed today, Monday the 21st of January, for Martin Luther King, Jr. day, which may prove to be a bad thing in the morning. Today the average market loss globally was about 7% - here in Canada, the drop translated to a 605 point loss, or about 4.75%, on the TSX. The India Sensex fell 11% in two minutes before trading was halted. I can pull out other figures, but they say much the same thing.

It’s hard to say what will happen in the morning - I’m not even going to try, though I have my suspicions. Enough fire control may have been put into place to keep the US markets from getting too badly singed (though I have NO doubt that few people at any brokerage firm in the country were allowed to stay at home today), but what you’re seeing here is something that we’ve not seen in a long time … the start of a worldwide stock crash.

Simon St. Laurent

I’m happy to report that Ruby on Rails not only offers a comfortable way to develop web applications, but that a little-noticed feature makes some formerly theoretical open approaches to XML much more immediately practical.

Rick Jelliffe

The Editor’s Disposition of Comments is quite an important document in the standards development process at ISO. After National Bodies submit their initial positions and comments on a late draft standard, the editor of the standard puts together a document to try to satisfy the various comments. Even though the Disposition of Comments document is not official, in the sense that nothing in it is automatically accepted, it is usually the starting point for comment resolution, and, given that most comments are uncontroversial, is often the end-point too.

Monday 14th Jan was the self-imposed deadline for the circulation of the DIS 29500 Editor’s Disposition of Comments. (The comments and disposition documents have been leaked to the web, with no tears from anyone.) Here is my rough characterization of them:

[Figure: image001.gif]

The Editor (Rex Jaeschke on behalf of ECMA TC45) has accepted the lion’s share. There is a small chunk of comments that are out of scope (typically concerning IPR or procedural comments.) There is a small chunk which the Editor has decided are issues for the maintenance phase, not the fast-track process: these are typically comments like “ODF has feature X, why doesn’t OOXML support it?” There is another chunk of issues where the Editor disagrees with the substance of the comment, but wants to address the issue by adding clarifying or helpful text to the specification: for example, the issue of bitmasks is handled by giving exemplars of how to handle them in XSD, RELAX NG, Schematron, DTLL and XSLT. And finally, another chunk where the Editor disagrees, and gives the rationale for the disagreement. These are typically where the comments cross ECMA’s line in the sand: that no currently valid OOXML document should become invalid.

Of course, even in the comments where the Editor agrees with the comment, there may be some cases where the Ballot Resolution Meeting next month decides to do something different from the Editor’s recommendation.

So how does it compare with the touchstone issues I suggested in Your Country’s Comments Rated!?

The particular touchstone issues I see are that spreadsheet dates need to be able to go before 1900, that DEVMODE issues need to be worked through more, that the retirement of VML needs to be handled now, and that there needs to be a better story for MathML.

Let’s see the suggested resolutions for each of them:

  • Spreadsheet dates can go back before 1900 (and can use the ISO 8601 date format);
  • DEVMODE concerns printer-dependent data which may be binary: the editor suggests some minimal changes to say “information” rather than “data structure” and to show how the system would work with some future XML-based print structure, but leaves the issue of a standard format to maintenance and justifies the need for these printer-dependent data chunks on the need to package information in legacy documents;
  • VML is being withdrawn from the places it is used in the specification, which now use DrawingML (e.g. for backgrounds in WordprocessingML); (furthermore, this provides a level of modularity that theoretically allows some kind of use of SVG for drawing, though I don’t expect this would be a popular option unless Office supports it.)
  • For Maths, the Editor recommends allowing alternative formats, in particular recommending MathML: this is not to replace the OOXML Maths, but in the context of “rehydration”, which is where you want to round-trip through systems that don’t support your full language, so they use some lesser one (such as a graphic) as a fallback, but the systems maintain the text of a higher-level format. This is probably good for MathML adoption, but also for professional maths systems developers.

Finally, what about that issue I have been tracking that I think is a crazy edge-case blown out of proportion: the AutospaceLikeWord95? Well, now we have a few pages of documentation about a tiddly bit of extra space between digits and full-width characters (as used by Japanese); in fact we have much more complete documentation of typesetting behaviour that should not be implemented compared to what should be implemented! It doesn’t do any harm to have this documented, except that it is a distraction from more substantive issues and has notoriously been used as evidence that DIS 29500 cannot be implemented.

M. David Peterson

… and then what will /. have to write about? *NOTHING*, I tell you! *NOTHING*!!!

Wait, that would be a good thing, huh?! *SWEET*! Please carry on…

Brian Jones: Open XML Formats : Mapping documents in the binary format (.doc; .xls; .ppt) to the Open XML format

I wanted to call everyone’s attention to a few interesting developments in Ecma’s proposed disposition document related to the Office binary formats. There were a few comments from national bodies that asked about the documentation of the Office binary formats and the availability of those documents. We had already been talking about these issues in TC45 where there were a number of existing experts in the binary formats (including Apple, Novell, and Microsoft). Based on the feedback from the national bodies, Microsoft decided last week to take some additional steps in this area.

Brian goes on to describe how MSFT will make it even easier to gain access to the documentation for the legacy Office binary formats, will promise not to sue you if (you know) you actually read the documentation and then apply this knowledge to implementing support for these formats, and will take it one step further by not only providing a mapping from the binary formats to Office Open XML, but also promising to start a Binary Format-to-ISO/IEC JTC 1 DIS 29500 Translator Project on SourceForge.net in collaboration with ISVs.

Hey Micah (Dubinko),

Why don’t they just ISO standardize their binary formats? That’s “backwards compatibility” for ya. -m

Micah Dubinko | January 24, 2007 10:13 AM

Okay, so maybe this isn’t standardization, but as it relates to legacy file formats, isn’t this good enough? And either way, nice work, MSFT!

M. David Peterson

A friend of mine here in the Salt Lake City area is in need of the best and brightest C# and Flex developers who either live in the Salt Lake City area or who are willing to relocate. From what I understand these are full time positions. If you have the proper skillset and have interest please contact me directly and I will forward your contact info on appropriately.

Thanks!

M. David Peterson

It just makes them more burnt…

Java and doneness

Just because a standards body can’t recognize done-ness and declare victory, that doesn’t mean the fruits of that body have any bearing on reality.

Kurt Cagle

Over the last couple of years, I’ve worked extensively with Firefox, and while it still has its warts (and while I believe that its days of double digit rises in adoption are probably coming to a close) overall, I’ve found that it has become, for me anyway, my de facto browser into the web and the focus of most of the web applications (and extensions) that I’ve built in the last year. For that reason alone, if nothing else, I’ve been watching closely as Firefox 3.0 approaches its final release.

The second beta version of FF3 is now out, and I have to say that overall I’m feeling quite pleased with what I’m seeing, with a few caveats. Since I do generally dig into the application daily, my focus in trying it out (and in writing this review) is less on the immediate UI and functionality changes for the typical user and more on how it’s going to affect web software development. Thus, I ask that you forgive me for not talking about the new theme (okay, though not a radical departure from the old) or other user improvements or give you a lot of screenshots … I want to look a little more deeply under the hood.

M. David Peterson

I don’t agree with everything Sean McGrath writes in his latest post as I think there are a lot of really smart people who have developed some really smart ways to handle the variable width nature of XML w/o turning to malloc() every time the length of an element or attribute name reaches past any given preset constraints. That said, I can’t help but agree with,

Memory-based caches of “cooked” data structures are your friend.

Absolutely!

For you .NET developers here’s a pre-written recipe that handles all of the dirty work of determining whether to create a new XmlReader or return the in-memory cached version based on the generated ETag for the source file (see Extended Overview below for a deeper understanding of how this works.) To use this recipe you need to do nothing more than create a new XmlServiceOperationManager when your application starts up like so,

XmlServiceOperationManager myXmlServiceOperationManager =  new XmlServiceOperationManager(new Dictionary<int, XmlReader>());

and then use the GetXmlReader method of the XmlServiceOperationManager, passing in the Uri (an actual System.Uri object, not the string value of the URI, though I guess it would be easy enough to create an overload that takes the string value of the URI. Another task for another day. ;-)) of the desired XML file to get an XmlReader in return like so,

XmlReader reader = myXmlServiceOperationManager.GetXmlReader(requestUri);

That’s it! Now you can use your “new” XmlReader however you might need, and the next time that file is requested for processing, if it hasn’t changed, you save all of the time it would normally take to read the source file and convert it into an XmlReader, which is fairly significant.
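The general idea (reuse the parsed form until the source's ETag changes) can be sketched in a few lines. The following is a Python illustration only, not the XmlServiceOperationManager recipe: the class name `XmlCache` is hypothetical, and deriving the ETag from the file's modification time and size is an assumption, since the text above does not say how the recipe generates its ETag.

```python
import os
import tempfile
import xml.etree.ElementTree as ET

class XmlCache:
    """Hypothetical cache: reparse a file only when its ETag changes."""

    def __init__(self):
        self._entries = {}  # path -> (etag, parsed root element)
        self.parses = 0     # how many times a file was actually parsed

    @staticmethod
    def etag(path):
        # Assumed ETag scheme: modification time plus size.
        info = os.stat(path)
        return "%d-%d" % (info.st_mtime_ns, info.st_size)

    def get(self, path):
        tag = self.etag(path)
        cached = self._entries.get(path)
        if cached is not None and cached[0] == tag:
            return cached[1]  # unchanged: skip reading and parsing
        root = ET.parse(path).getroot()
        self.parses += 1
        self._entries[path] = (tag, root)
        return root

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "doc.xml")
    with open(path, "w") as f:
        f.write("<a/>")
    cache = XmlCache()
    first = cache.get(path)
    second = cache.get(path)
    print(first is second, cache.parses)
```

The second call returns the same parsed object without touching the parser, which is where the saving comes from.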

Source code and extended explanation inline below. Enjoy!

Oh, and stay tuned for the next installment of this recipe where we learn how adding,


1 Part memcached
1 Part ETags

and


1 Part GZip encoding

… can turn your lame a$$ performance sucking web application into a lean, mean, kick a$$ performing machine. For a precursor, see Joe Gregorio’s AtomPub presentation slides from this past OSCON