I wrote some XSLT the other day that was so neat it made me smile, so I thought I’d share it. It’s an example of how the new <xsl:next-match> instruction and tunnel parameters can combine to simplify your code. Fair warning: this is XSLT 2.0 through-and-through, and the use case here is one you will only care about if you process documents (rather than data), and pretty complex documents at that.
The problem was about overlapping markup. I’m processing some legislation, which changes over time, with the changes overlapping the main markup for the legislation. The changes obviously can’t be indicated with actual elements (since XML doesn’t allow overlap), so milestones are used instead: in this case they are processing instructions which I’ve turned into empty elements during preprocessing.
To give you an example, after preprocessing the content looks like this:
<p>
  This is the original paragraph,
  <change id="1234" type="addition" mark="start" />
  to which this text has been added.
</p>
<p>
  The new text carries on into a new paragraph,
  <change id="1234" type="addition" mark="end" />
  and this is the original paragraph again.
</p>
We want the change to be indicated by blue text. That is, any text between the start and end milestone needs to be blue. Like this:
This is the original paragraph, to which this text has been added.
The new text carries on into a new paragraph, and this is the original paragraph again.
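In markup terms, the target output might look roughly like the sketch below. It assumes HTML output, a hypothetical class name (amended) that a stylesheet rule colours blue, and that the milestone elements are dropped from the result. Whitespace is simplified for readability.

```xml
<!-- Sketch of the desired result; the class name "amended" is an assumption. -->
<p>
  This is the original paragraph,
  <span class="amended">to which this text has been added.</span>
</p>
<p>
  <span class="amended">The new text carries on into a new paragraph,</span>
  and this is the original paragraph again.
</p>
```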
To turn the text blue, I use:
1. a function that locates the text that’s within a given change
2. a key that indexes each change by the generated IDs of the text within the change, as identified by that function (1)
3. another function that returns true if some text is within a change, using the key (2)
4. a template that matches text nodes for which that function (3) is true, wrapping it in a span with a class to indicate it’s amended text
The most difficult of these steps, and the one that gave rise to the elegant code, is the first one. (I’ll be leaving steps 2-4 as an exercise for the reader.) As is my wont, because this is a function that acts on a node, I created a function that just applied templates to the node using a mode named after the function:
<xsl:function name="eg:textInChange" as="text()*">
  <xsl:param name="nstChange" as="element(change)" />
  <xsl:apply-templates select="$nstChange" mode="eg:textInChange" />
</xsl:function>
The template is only meaningful for <change> elements that have a mark attribute with the value 'start', so I filter out those that don’t with an empty template:
<xsl:template match="change[not(@mark = 'start')]" mode="eg:textInChange" />
In other cases, I want to collect the text nodes that follow the <change> element. I step through them one by one using the following:: axis. To know when to stop, I need to pass along the id of the change: I stop the recursion when I get to a <change> with a mark attribute with the value 'end' and with that same id.
Crucially, I pass the identifier through using a tunnelling parameter. Tunnelling parameters will automatically be passed through any templates that get called from this one, even if they don’t explicitly declare that parameter. As you’ll see, this means I only have to declare the tunnelling parameter again in the template that actually uses it. The bit that turns a normal parameter into a tunnelling parameter is simply tunnel="yes" in both the <xsl:with-param> that passes the parameter and the <xsl:param> that declares it in the template where it’s used.
<xsl:template match="change" mode="eg:textInChange">
  <xsl:apply-templates select="." mode="eg:collectTextNodes">
    <xsl:with-param name="endId" select="@id" tunnel="yes" />
  </xsl:apply-templates>
</xsl:template>
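To see tunnelling in isolation, here is a minimal standalone sketch. The element names (doc, section, para) and the parameter name (colour) are hypothetical and chosen only for illustration. The point is that the middle template never mentions the parameter, yet the value still reaches the template that declares it.

```xml
<xsl:stylesheet version="2.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Set the tunnel parameter once, at the top of the call chain. -->
  <xsl:template match="/doc">
    <xsl:apply-templates select="section">
      <xsl:with-param name="colour" select="'blue'" tunnel="yes" />
    </xsl:apply-templates>
  </xsl:template>

  <!-- This template neither declares nor passes $colour;
       the processor carries it through automatically. -->
  <xsl:template match="section">
    <div>
      <xsl:apply-templates select="para" />
    </div>
  </xsl:template>

  <!-- Only the template that uses the value declares it, with tunnel="yes". -->
  <xsl:template match="para">
    <xsl:param name="colour" required="yes" tunnel="yes" />
    <p style="color: {$colour}">
      <xsl:value-of select="." />
    </p>
  </xsl:template>

</xsl:stylesheet>
```

Without tunnel="yes" on both sides, the section template would have to declare the parameter and pass it on explicitly.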
The behaviour of most nodes in eg:collectTextNodes mode is just to go on with the recursive traversal of the nodes, moving on to the first descendant or following node:
<xsl:template match="node()" mode="eg:collectTextNodes">
  <xsl:apply-templates select="(descendant::node() | following::node())[1]"
                       mode="eg:collectTextNodes" />
</xsl:template>
There are two exceptions to this behaviour. First, I want to actually collect (return) any text nodes I come across. So I need a template for text nodes that adds that node to the result sequence and goes on with the recursion. I don’t want to repeat the recursion logic, though, so I use <xsl:next-match>:
<xsl:template match="text()" mode="eg:collectTextNodes" priority="1">
  <xsl:sequence select="." />
  <xsl:next-match />
</xsl:template>
<xsl:next-match> tells the processor to apply the next-best template. Here, after adding the text node to the sequence, the processor moves on to the next-best template, which is the one that matches all nodes, and therefore recurses on to the next node.
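As a standalone illustration of the same mechanism, here is a small sketch with a hypothetical note element. The specific template does its own bit of work and then hands over to the generic one instead of duplicating it.

```xml
<xsl:stylesheet version="2.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Generic rule (default priority -0.5): copy the element
       and carry on processing its children. -->
  <xsl:template match="*">
    <xsl:copy>
      <xsl:apply-templates />
    </xsl:copy>
  </xsl:template>

  <!-- Specific rule (default priority 0): emit a comment, then defer
       to the next-best matching template, which is the generic rule above. -->
  <xsl:template match="note">
    <xsl:comment>a note follows</xsl:comment>
    <xsl:next-match />
  </xsl:template>

</xsl:stylesheet>
```

Here the next-best template is found through default priorities. In the text() template above, the explicit priority="1" is what ranks it ahead of the node() template, since both patterns have the same default priority.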
The second exception is the <change> element whose mark attribute is 'end' and whose id attribute matches the id that’s being passed along in the $endId parameter. I can’t test the $endId parameter in the match pattern for the template, so I need to do the testing in the body of the template, and if the ids are different, I want to do the default thing of recursing on to the next node. So this template uses <xsl:next-match> again, to get the default recursion, and it picks up on the $endId tunnelling parameter, because this is where it’s important:
<xsl:template match="change[@mark = 'end']" mode="eg:collectTextNodes">
  <xsl:param name="endId" required="yes" as="xs:string" tunnel="yes" />
  <!-- A different change's end marker: keep going. Our own end marker: stop. -->
  <xsl:if test="@id != $endId">
    <xsl:next-match />
  </xsl:if>
</xsl:template>
And that’s it.
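For readers who would rather skip the exercise, the remaining three steps might be sketched roughly as follows. This is one possible approach under stated assumptions, not necessarily the code used in the real stylesheet. The key name (eg:changeByText), the function name (eg:isInChange), the class name (amended) and the namespace URI bound to the eg prefix are all hypothetical. It also assumes the source tree is rooted at a document node, which the three-argument key() call requires.

```xml
<xsl:stylesheet version="2.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                xmlns:xs="http://www.w3.org/2001/XMLSchema"
                xmlns:eg="http://example.com/ns/eg"
                exclude-result-prefixes="xs eg">

  <!-- Assumption: the eg:textInChange function and the templates in the
       eg:textInChange and eg:collectTextNodes modes are included here. -->

  <!-- Index each start milestone by the generated IDs of the text
       nodes that eg:textInChange() finds inside that change. -->
  <xsl:key name="eg:changeByText"
           match="change[@mark = 'start']"
           use="for $t in eg:textInChange(.) return generate-id($t)" />

  <!-- True if the key holds at least one change for this text node.
       A stylesheet function has no context item, so the tree to search
       is passed explicitly as the third argument of key(). -->
  <xsl:function name="eg:isInChange" as="xs:boolean">
    <xsl:param name="text" as="text()" />
    <xsl:sequence
      select="exists(key('eg:changeByText', generate-id($text), root($text)))" />
  </xsl:function>

  <!-- Wrap amended text in a span that CSS can colour blue. -->
  <xsl:template match="text()[eg:isInChange(.)]">
    <span class="amended">
      <xsl:value-of select="." />
    </span>
  </xsl:template>

</xsl:stylesheet>
```

Note that whitespace-only text nodes between the two paragraphs also fall inside the change, so they would be wrapped too unless they are stripped (for example with <xsl:strip-space>) or excluded by an extra predicate.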
So why does this code give me so much pleasure? Because I do not have to repeat myself. Each template does exactly what it needs to do and no more. No tedious parameter definition and passing. No repetition of the XPath that selects which node to go to next. And that makes it easy to maintain. If I realise that actually I want to recurse only through specific types of nodes, or do the recursion in a different way, I only have to make that change in one place. Plus this DRYness is achieved without introducing any templates over and above what I need to do the job.
Having said that, the danger with <xsl:next-match> in particular, and tunnelling parameters to a lesser extent, is that it can be hard to follow the logic (to identify the next-best template) when you or someone else returns to the code. So if you’re using <xsl:next-match> or tunnelling parameters you should try to keep the templates together, or document the logic of the program somewhere.
It’s funny that in discussions of what’s new in XSLT 2.0, a lot of the emphasis goes on support for grouping, regular expressions and user-defined functions. But I’ve found a lot of the real benefits are here, in new features that augment the basic XSLT processing model.