Following on from Kurt’s detailed reevaluation of XSLT 2.0, I thought that I might share an example of what you can do in XSLT 1.0 with the assistance of EXSLT, a useful set of extension functions that are supported by most XSLT implementations.

Since I have been thinking about state machines for parsing XML, why not try to use XSLT to generate one? This will require a transformation that takes a description of the grammar of XML, expressed as an XML file, and produces a description of the state machine for parsing XML, also expressed as an XML file. Very meta.

To keep it simple, we can start with the most basic grammar production in the XML specification, which declares that whitespace (S) is a sequence of one or more space, tab, carriage return or line feed characters:

S ::= (#x20 | #x9 | #xD | #xA)+

Since we want to process this with XSLT, we will need to express this grammar production in XML:

<pattern name="S">
    <oneOrMore>
        <choice>
            <char code="09"/>
            <char code="0A"/>
            <char code="0D"/>
            <char code="20"/>
        </choice>
    </oneOrMore>
</pattern>

Some of the element names I’ve used come from the RELAX NG schema language, which also uses <zeroOrMore>, <oneOrMore> and <choice> to define grammars, but grammars for XML vocabularies, not grammars for XML itself. Of course, it would be a good idea to write a schema for this grammar description language, both for documentation and to catch errors early. Then we would have an XML file describing a grammar for an XML file describing a grammar for XML. The only way to top that would be to finish it off by writing an XML RELAX NG schema describing the grammar of RELAX NG schemas.
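As a rough idea of what such a schema could look like, here is a minimal sketch in the RELAX NG XML syntax. The element names are the ones used in this article; the content models (for example, that every container holds one or more expressions and that code is a string of hex digits) are my assumptions rather than a settled design.

```xml
<grammar xmlns="http://relaxng.org/ns/structure/1.0"
         datatypeLibrary="http://www.w3.org/2001/XMLSchema-datatypes">

  <!-- A grammar description document is a single named pattern. -->
  <start>
    <ref name="pattern"/>
  </start>

  <define name="pattern">
    <element name="pattern">
      <attribute name="name">
        <data type="NCName"/>
      </attribute>
      <ref name="expr"/>
    </element>
  </define>

  <!-- An expression is a container of further expressions, or a single character. -->
  <define name="expr">
    <choice>
      <element name="sequence">
        <oneOrMore><ref name="expr"/></oneOrMore>
      </element>
      <element name="choice">
        <oneOrMore><ref name="expr"/></oneOrMore>
      </element>
      <element name="zeroOrMore">
        <oneOrMore><ref name="expr"/></oneOrMore>
      </element>
      <element name="oneOrMore">
        <oneOrMore><ref name="expr"/></oneOrMore>
      </element>
      <element name="char">
        <!-- Assumption: code is a hexadecimal code point such as 09 or 20. -->
        <attribute name="code">
          <data type="string">
            <param name="pattern">[0-9A-Fa-f]+</param>
          </data>
        </attribute>
      </element>
    </choice>
  </define>

</grammar>
```

Note how the schema's own <choice> and <oneOrMore> live in the RELAX NG namespace, while the elements being described live in no namespace, which is what keeps the two layers apart.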

Now we can write XSLT templates to process the grammar productions and generate state machines. I’m not going to go into the algorithms here; for that I strongly recommend the Dragon Book. Basically, I need to process the grammar recursively, depth-first, turning each element into a little state machine, then returning to its parent and turning that into a larger state machine that incorporates the children. When I get back up to the top level I’ll be left with one big state machine that implements the entire grammar production.

This is a simple algorithm for most programming languages, but it is awkward to implement in XSLT 1.0. This is because XSLT 1.0 makes a distinction between node sets, which are sets of XML nodes selected from the input document, and result tree fragments, which are fragments of XML generated by templates. Result tree fragments are intended to go to the output rather than being kept around for further processing, so you can’t recursively apply templates to them.
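To make the restriction concrete, here is a small sketch of what goes wrong. The variable contents are just an illustrative fragment of the grammar vocabulary, and the exact error (or leniency) depends on the processor; a strict XSLT 1.0 implementation is expected to reject the last instruction.

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:template match="/">
    <!-- A variable given a value through its content (no select attribute)
         holds a result tree fragment, not a node-set. -->
    <xsl:variable name="fragment">
      <zeroOrMore>
        <char code="20"/>
      </zeroOrMore>
    </xsl:variable>

    <!-- Permitted: copy the fragment to the output, or use it as a string. -->
    <xsl:copy-of select="$fragment"/>

    <!-- Not permitted in plain XSLT 1.0: select must evaluate to a node-set,
         so a strict processor reports an error on this instruction. -->
    <xsl:apply-templates select="$fragment"/>
  </xsl:template>

</xsl:stylesheet>
```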

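This is where EXSLT comes in. Among its many useful extension functions is node-set(), from the EXSLT Common module (namespace http://exslt.org/common), which does just what we want. The EXSLT documentation writes it as exsl:node-set(), while the template below uses the prefix exslt; either works as long as the prefix is bound to that namespace. The documentation describes it like this: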

The exsl:node-set function returns a node-set from a result tree fragment (which is what you get when you use the content of xsl:variable rather than its select attribute to give a variable value). This enables you to process the XML that you create within a variable, and therefore do multi-step processing.
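In practice, using the function mostly comes down to declaring the namespace on the stylesheet element. The following is a minimal sketch, not taken from my actual stylesheet: the <repeat> output element is a hypothetical placeholder, and the function-available() check is an optional safeguard for processors that lack EXSLT support.

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:exslt="http://exslt.org/common"
    exclude-result-prefixes="exslt">

  <xsl:template match="/">
    <!-- Built with content, so this is a result tree fragment. -->
    <xsl:variable name="fragment">
      <zeroOrMore>
        <char code="20"/>
      </zeroOrMore>
    </xsl:variable>

    <xsl:choose>
      <xsl:when test="function-available('exslt:node-set')">
        <!-- After conversion, path steps and apply-templates are allowed. -->
        <xsl:apply-templates select="exslt:node-set($fragment)/zeroOrMore"/>
      </xsl:when>
      <xsl:otherwise>
        <xsl:message terminate="yes">exslt:node-set() is not available in this processor</xsl:message>
      </xsl:otherwise>
    </xsl:choose>
  </xsl:template>

  <!-- Hypothetical second-pass template: reports how many children it saw. -->
  <xsl:template match="zeroOrMore">
    <repeat children="{count(*)}"/>
  </xsl:template>

</xsl:stylesheet>
```

The exclude-result-prefixes attribute is there only to keep the EXSLT namespace declaration out of the generated XML.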

As an example I’ll show how I can use this function to simplify the template for processing the <oneOrMore> grammar element. As with regular expressions, we can treat <oneOrMore> as syntactic sugar for <zeroOrMore>, because the regular expression a+ is exactly equivalent to aa*. This is great, as it means we only need to write one tricky template to handle <zeroOrMore>, and the template for handling <oneOrMore> can just expand it like a macro. Here is the complete template:

<xsl:template match="oneOrMore">
 <!-- Build the expansion a+ => a a* as a result tree fragment -->
 <xsl:variable name="expansion">
  <sequence>
   <xsl:apply-templates mode="copy"/>
   <zeroOrMore>
    <xsl:apply-templates mode="copy"/>
   </zeroOrMore>
  </sequence>
 </xsl:variable>
 <!-- Convert the fragment to a node-set and process it like input -->
 <xsl:apply-templates select="exslt:node-set($expansion)"/>
</xsl:template>

The template expands the <oneOrMore> element into a <sequence> containing the original children followed by <zeroOrMore> with the original children repeated, thus turning a+ into aa*. (This relies on templates in the copy mode that reproduce their input unchanged.) Once this is done we can apply templates recursively to the expansion of our macro, as if the <oneOrMore> element had never been in the original document at all. This is an easy way to handle any elements that exist for user convenience and can be recursively expanded to a longer sequence of simpler elements.
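To try the expansion on its own, the template can be dropped into a small stylesheet like the sketch below. Two parts are assumptions on my side: the copy mode is written as a plain identity transform, and the default-mode identity template is only a stand-in for the real state machine templates, which are not shown here.

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:exslt="http://exslt.org/common"
    exclude-result-prefixes="exslt">

  <xsl:output method="xml" indent="yes"/>

  <!-- The macro template from above: rewrite a+ as a a*. -->
  <xsl:template match="oneOrMore">
    <xsl:variable name="expansion">
      <sequence>
        <xsl:apply-templates mode="copy"/>
        <zeroOrMore>
          <xsl:apply-templates mode="copy"/>
        </zeroOrMore>
      </sequence>
    </xsl:variable>
    <xsl:apply-templates select="exslt:node-set($expansion)"/>
  </xsl:template>

  <!-- Assumed helper: identity transform in the "copy" mode. -->
  <xsl:template match="@*|node()" mode="copy">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()" mode="copy"/>
    </xsl:copy>
  </xsl:template>

  <!-- Stand-in for the state machine templates: pass everything else
       through unchanged so that only the macro expansion is visible. -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

</xsl:stylesheet>
```

Run against the S pattern from earlier, this should produce a <pattern> whose only child is a <sequence> holding the four-way <choice>, followed by a <zeroOrMore> wrapping a second copy of that <choice>. A <oneOrMore> nested inside another one is copied verbatim in the first pass and then expanded when templates are applied to the converted node-set, which is where the recursion comes from.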

So if there is something you need to do in XSLT 1.0, give EXSLT a try. Many XSLT engines support it and it is a handy way to bridge the gap between XSLT 1.0 and 2.0.