Following on from Kurt’s detailed reevaluation of XSLT 2.0, I thought that I might share an example of what you can do in XSLT 1.0 with the assistance of EXSLT, a useful set of extension functions that are supported by most XSLT implementations.
Since I have been thinking about state machines for parsing XML, why not try and use XSLT to generate one? This will require a transformation that takes a description of the grammar of XML, expressed as an XML file, and produces a description of the state machine for parsing XML, also expressed as an XML file. Very meta.
To keep it simple, we can start with the most basic grammar production in the XML specification, which declares that whitespace (S) is a sequence of one or more space, tab, carriage return or line feed characters:
S ::= (#x20 | #x9 | #xD | #xA)+
Since we want to process this with XSLT, we will need to express this grammar production in XML:
<pattern name="S">
  <oneOrMore>
    <choice>
      <char code="09"/>
      <char code="0A"/>
      <char code="0D"/>
      <char code="20"/>
    </choice>
  </oneOrMore>
</pattern>
Some of the element names I’ve used come from the RELAX NG schema language, which also uses <zeroOrMore>, <oneOrMore> and <choice> to define grammars, but grammars for XML vocabularies, not grammars for XML itself. Of course, it would be a good idea to write a schema for this grammar description language, both for documentation and to catch errors early. Then we would have an XML file describing a grammar for an XML file describing a grammar for XML. The only way to top that would be to finish it off by writing an XML RELAX NG schema describing the grammar of RELAX NG schemas.
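As a rough idea of what such a schema might look like, here is a minimal RELAX NG sketch. It is only an assumption based on the elements that appear in this post (<pattern>, <oneOrMore>, <zeroOrMore>, <sequence>, <choice> and <char>); it treats a single <pattern> as the document element, and the hexadecimal restriction on the code attribute is a guess at what the real language would want.

```xml
<?xml version="1.0"?>
<!-- Hypothetical schema for the grammar description language used in this post -->
<grammar xmlns="http://relaxng.org/ns/structure/1.0"
         datatypeLibrary="http://www.w3.org/2001/XMLSchema-datatypes">

  <!-- Assumption: one <pattern> per document; a real grammar file would
       probably wrap many patterns in a root element -->
  <start>
    <ref name="pattern"/>
  </start>

  <define name="pattern">
    <element name="pattern">
      <attribute name="name">
        <data type="NCName"/>
      </attribute>
      <ref name="expr"/>
    </element>
  </define>

  <!-- An expression is a combinator holding further expressions, or a character -->
  <define name="expr">
    <choice>
      <element name="oneOrMore"><oneOrMore><ref name="expr"/></oneOrMore></element>
      <element name="zeroOrMore"><oneOrMore><ref name="expr"/></oneOrMore></element>
      <element name="sequence"><oneOrMore><ref name="expr"/></oneOrMore></element>
      <element name="choice"><oneOrMore><ref name="expr"/></oneOrMore></element>
      <element name="char">
        <!-- Assumption: code is a string of hexadecimal digits, as in "0A" -->
        <attribute name="code">
          <data type="string">
            <param name="pattern">[0-9A-Fa-f]+</param>
          </data>
        </attribute>
      </element>
    </choice>
  </define>
</grammar>
```

Note how the grammar language's own <oneOrMore> and <choice> elements are described by RELAX NG patterns of the same name, which is exactly the kind of meta-ness mentioned above.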
Now we can write XSLT templates to process the grammar productions and generate state machines. I’m not going to go into the algorithms here; for that I strongly recommend the Dragon Book, but basically I need to process the grammar recursively, depth-first, turning each element into a little state machine and then returning to its parent and turning that into a larger state machine that incorporates the children. When I get back up to the top level I’ll be left with one big state machine that implements the entire grammar production.
This is a simple algorithm for most programming languages, but it is awkward to implement in XSLT 1.0. This is because XSLT 1.0 makes a distinction between node sets, which are sets of XML nodes selected from the input document, and result tree fragments, which are fragments of XML generated by templates. Result tree fragments are intended to go to the output rather than being kept around for further processing, so you can’t recursively apply templates to them.
This is where EXSLT comes in. Among its many useful extension functions is one called exslt:node-set(), which does just what we want:
The exsl:node-set function returns a node-set from a result tree fragment (which is what you get when you use the content of xsl:variable rather than its select attribute to give a variable value). This enables you to process the XML that you create within a variable, and therefore do multi-step processing.
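To make the distinction concrete, here is a minimal sketch of a stylesheet that builds a result tree fragment in a variable and then walks over it. The namespace URI http://exslt.org/common is the one EXSLT defines for this function; the prefix is arbitrary (the quote above uses exsl, the template later in this post uses exslt), and the variable name and its contents are made up for illustration.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:exsl="http://exslt.org/common"
    extension-element-prefixes="exsl">

  <xsl:output method="text"/>

  <xsl:template match="/">
    <!-- A variable defined by its content holds a result tree fragment -->
    <xsl:variable name="chars">
      <char code="09"/>
      <char code="20"/>
    </xsl:variable>

    <!-- XSLT 1.0 does not allow select="$chars/char" on a result tree
         fragment; converting it to a node-set first makes it navigable -->
    <xsl:for-each select="exsl:node-set($chars)/char">
      <xsl:value-of select="@code"/>
      <xsl:text>&#10;</xsl:text>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
```

Run against any input document, this should print the two code values on separate lines. Without the call to the extension function, a strict XSLT 1.0 processor would reject the path expression on the variable.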
As an example I’ll show how I can use this function to simplify the template for processing the <oneOrMore> grammar element. As with regular expressions, we can treat <oneOrMore> as syntactic sugar for <zeroOrMore>, because the regular expression a+ is exactly equivalent to aa*. This is great, as it means we only need to write one tricky template to handle <zeroOrMore>, and the template for handling <oneOrMore> can just expand it like a macro. Here is the complete template:
<xsl:template match="oneOrMore">
  <xsl:variable name="expansion">
    <sequence>
      <xsl:apply-templates mode="copy"/>
      <zeroOrMore>
        <xsl:apply-templates mode="copy"/>
      </zeroOrMore>
    </sequence>
  </xsl:variable>
  <xsl:apply-templates select="exslt:node-set($expansion)"/>
</xsl:template>
The template expands the <oneOrMore> element into a <sequence> containing the original children followed by <zeroOrMore> with the original children repeated, thus turning a+ into aa*. Once this is done we can apply templates recursively to the expansion of our macro, as if the <oneOrMore> element had never been in the original document at all. This is an easy way to handle any elements that exist for user convenience and can be recursively expanded to a longer sequence of simpler elements.
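The template relies on a "copy" mode and on a namespace binding that are not shown in this post. The following is a sketch of a complete stylesheet that should let you try the expansion on its own. The copy-mode identity template is an assumption about what that mode does, and the default-mode identity template is only a stand-in for the real state machine templates, so that the expanded grammar is visible in the output.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:exslt="http://exslt.org/common"
    exclude-result-prefixes="exslt">

  <!-- Assumed helper: deep copy of the children in "copy" mode -->
  <xsl:template match="@*|node()" mode="copy">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()" mode="copy"/>
    </xsl:copy>
  </xsl:template>

  <!-- Stand-in for the state machine templates (hypothetical): copy
       everything through, still applying templates to the children so
       that any nested <oneOrMore> is expanded as well -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- The macro expansion from the post: a+ becomes aa* -->
  <xsl:template match="oneOrMore">
    <xsl:variable name="expansion">
      <sequence>
        <xsl:apply-templates mode="copy"/>
        <zeroOrMore>
          <xsl:apply-templates mode="copy"/>
        </zeroOrMore>
      </sequence>
    </xsl:variable>
    <xsl:apply-templates select="exslt:node-set($expansion)"/>
  </xsl:template>
</xsl:stylesheet>
```

Applied to the S pattern above with a processor that supports EXSLT, such as xsltproc, this should produce a <pattern> whose content is a <sequence> holding the original <choice>, followed by a <zeroOrMore> wrapping a second copy of that <choice>. Because the more specific oneOrMore match takes priority over the generic node() match, no <oneOrMore> element should survive in the output.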
So if there is something you need to do in XSLT 1.0, give EXSLT a try. Many XSLT engines support it and it is a handy way to bridge the gap between XSLT 1.0 and 2.0.


Fantastic article, Micheal! While the approach I took was from an analogy standpoint instead of a technical showcase, you might be interested in the following article I wrote a while back about EXSLT and XSLT 2.0, and about moving forward with both tools as part of our overall toolset.
http://www.xsltblog.com/archives/2006/02/on_procedural_v.html
Once again, great post! And welcome to XML.com, by-the-way... Great to have you here :)
Uh, not sure how I mixed up the ae with ea in spelling your name. Sorry about that, Mich*ae*l! :D
Hi, M. David. It's good to see that there will be cooperation between the EXSLT and XSLT 2.0 communities. Perhaps it will eventually lead to a comfortable detente situation where the only major difference between them is their use of XSD schema. (I have to say that I feel more attached to schema-less XSLT, but understand that the value proposition may be different for people using XML databases based on XSD and XQuery).
In effect, the widespread support for EXSLT has created a de facto XSLT 1.1, even though the official draft has been abandoned. It would be a nice gesture for the W3C to rubber stamp EXSLT as being XSLT 1.1 at some point; at least that would allow the extensions to be brought back into the XSLT namespace and reduce the number of namespace declarations that users have to write :)
I like the way you think, Michael ;-) :D I'll be sure to point a few folks at your comments and see what trouble we might be able to stir up ;-)
See also my XML.com article Extending XSLT with EXSLT.
Michael,
Very nice piece. Just a brief follow-up on this - the XSLT 2.0 engine would handle this in nearly the same way, except that you wouldn't need to use the exslt:node-set() method:
<xsl:template match="oneOrMore">
  <xsl:variable name="expansion">
    <sequence>
      <xsl:apply-templates mode="copy"/>
      <zeroOrMore>
        <xsl:apply-templates mode="copy"/>
      </zeroOrMore>
    </sequence>
  </xsl:variable>
  <xsl:apply-templates select="$expansion"/>
</xsl:template>
At the present time, Saxon supports both XSLT2 and EXSLT, and overall I think that there is in fact a fair degree of cross-communication between the two communities ... especially since a number of people involved in EXSLT (myself included) worked closely with the XSLT WG in pushing most of that functionality into XSLT2. The schema issues for the most part arise due to XQuery, which was driven more by the database vendor side, and I find that you can generally get away with not using schemas for just about all XSLT2 transformations.
Actually, a follow-up on your comment about EXSLT as XSLT 1.1. I'd be a little nervous about retrofitting EXSLT as XSLT 1.1 at this stage. It creates more targets to implement, would slow down adoption of XSLT 2 because XSLT 1.1 would be seen as a good baby step, and for the most part EXSLT code IS implemented in XSLT 2.0, the primary difference being the underlying data model. After years of having to wade through the XSL 1.0 vs. XML Patterns (Microsoft pre-XSLT implementation) I'd prefer not to revisit that, especially given the XSLT community is still not exactly huge.
Hi Kurt,
Just a brief follow-up on this - the XSLT 2.0 engine would handle this in nearly the same way, except that you wouldn't need to use the exslt:node-set() method:
Excellent! The difference between result tree fragments and node sets can be very confusing when you encounter it for the first time; it's good that XSLT 2.0 has simplified the data model.
Besides SAXON, are there any other good XSLT 2.0 implementations I could try? One thing I like about xsltproc (libxslt) is that it is very lightweight, so it's ideal to use in shell scripts and makefiles. I guess that I'm really asking if there are any non-Java/C# implementations of XSLT 2.0 :)
(I do run Jing from shell scripts for validation as it has more complete schema support than xmllint, but there is always a noticeable pause when it starts up).
On the subject of XSLT 1.1, it seems that the incremental revision of XML languages at the W3C can be difficult. XML 1.1, XSLT 1.1, XHTML 1.1; none of them exactly set the world on fire, and there is a lot more interest in XML 2.0 (proposals, at least), XSLT 2.0 and XHTML 2.0. It remains to be seen if XSD 1.1 and XSL-FO 1.1 can break the trend. The most successful incremental revision of a W3C specification would have to be HTTP 1.1, or perhaps the award should go to CSS 2.1. Both of which are non-XML technologies... :)
Michael said:
Besides SAXON, are there any other good XSLT 2.0 implementations I could try? One thing I like about xsltproc (libxslt) is that it is very lightweight, so it's ideal to use in shell scripts and makefiles. I guess that I'm really asking if there are any non-Java/C# implementations of XSLT 2.0 :)
Yeah, I struggled with whether or not to use Saxon for scripting for a while, but eventually gave in because it is such a nice, conformant piece of software. I recently discovered that the Saxon Java interface was easily approachable from JRuby (I do all of my scripting in Ruby anyway), so I'm looking forward to playing with that in the future.
On the subject of XSLT 1.1, it seems that the incremental revision of XML languages at the W3C can be difficult. XML 1.1, XSLT 1.1, XHTML 1.1; none of them exactly set the world on fire, and there is a lot more interest in XML 2.0 (proposals, at least), XSLT 2.0 and XHTML 2.0. It remains to be seen if XSD 1.1 and XSL-FO 1.1 can break the trend. The most successful incremental revision of a W3C specification would have to be HTTP 1.1, or perhaps the award should go to CSS 2.1. Both of which are non-XML technologies... :)
SVG 1.1 is being seen as the canonical form for that spec. The jury is still out on XForms 1.1 - my own take is that it would make more sense there to get XF1.1 out but shoot for an XF2.0 release that would incorporate XPath2, provide a consistent extension mechanism, and introduce a few more critical elements such as treeview editors, which are hard to model in XF1.1. But yes, I'd agree with you that the 1.1 versions have not exactly set the world on fire.