With all the recent talk of angle bracket taxes and what XML is and isn’t good for, I thought it would be fun to take XSLT somewhere it is not normally associated with: the generation of binary file formats.
The sequence in XSLT 2.0 is of more use than the humble node-set. A sequence is not restricted to nodes: the tokenize() function, for example, creates a sequence of strings, and the comma operator concatenates sequences of items of any data type.
However, there is nothing here that lifts us out of the ordinary; not until, that is, you create a sequence of xs:unsignedByte numbers. This sequence can be considered a byte sequence, and if you can create a byte sequence you can create just about any binary file format you like. A good example of this would be an image file such as a Tagged Image File Format (TIFF) image. If you don’t get involved in image compression, it is relatively easy to create a TIFF image; after all, it is only a series of byte sequences.
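To make the "series of byte sequences" idea concrete, here is a minimal sketch in Python rather than XSLT, chosen only because it is easy to run. It assembles a tiny uncompressed, 8-bit greyscale, little-endian TIFF as a plain list of integers in the range 0 to 255, much as an XSLT sequence would be built up with the comma operator. The function names (u16, u32, ifd_entry, tiny_tiff) are my own inventions for illustration, and the sketch assumes a single strip and leaves out the resolution tags that a strictly conforming baseline TIFF would also carry.

```python
SHORT, LONG = 3, 4  # TIFF field type codes for 16-bit and 32-bit unsigned values


def u16(n):
    """Little-endian 16-bit value as a list of two octets."""
    return [n & 0xFF, (n >> 8) & 0xFF]


def u32(n):
    """Little-endian 32-bit value as a list of four octets."""
    return [(n >> shift) & 0xFF for shift in (0, 8, 16, 24)]


def ifd_entry(tag, field_type, count, value):
    """One 12-byte directory entry: tag, type, count, value (or offset)."""
    return u16(tag) + u16(field_type) + u32(count) + u32(value)


def tiny_tiff(width, height, pixels):
    """Return the octets of an uncompressed greyscale TIFF.

    pixels is a flat list of 8-bit grey values, row by row.
    Hypothetical helper: width and height are assumed to fit in 16 bits.
    """
    pixels = list(pixels)
    assert len(pixels) == width * height
    pixel_offset = 8                       # pixel data follows the 8-byte header
    padding = [0] * (len(pixels) % 2)      # keep the directory on an even offset
    ifd_offset = pixel_offset + len(pixels) + len(padding)

    header = [0x49, 0x49] + u16(42) + u32(ifd_offset)  # "II", magic number, IFD offset
    entries = [                                        # tags in ascending order
        ifd_entry(256, SHORT, 1, width),               # ImageWidth
        ifd_entry(257, SHORT, 1, height),              # ImageLength
        ifd_entry(258, SHORT, 1, 8),                   # BitsPerSample
        ifd_entry(259, SHORT, 1, 1),                   # Compression: none
        ifd_entry(262, SHORT, 1, 1),                   # Photometric: black is zero
        ifd_entry(273, LONG, 1, pixel_offset),         # StripOffsets
        ifd_entry(278, SHORT, 1, height),              # RowsPerStrip: one strip
        ifd_entry(279, LONG, 1, len(pixels)),          # StripByteCounts
    ]
    ifd = u16(len(entries))
    for entry in entries:
        ifd += entry
    ifd += u32(0)                                      # no further directories
    return header + pixels + padding + ifd


octets = tiny_tiff(2, 2, [0, 85, 170, 255])
```

Every element of octets is an ordinary integer, so the same structure could in principle be expressed as an XSLT 2.0 sequence; writing bytes(octets) to a file in binary mode should give something an image viewer can open, although I would treat that as something to verify rather than a guarantee.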
Mind you, there are two problems to deal with. The first is that a basic XSLT 2.0 processor is not required to support the xs:unsignedByte data type; only a schema-aware processor is. So, in the absence of the latter, you’d have to make do with xs:integer and put up with the extra memory needed. The second, and more important, problem is how to get a byte sequence out of the other end of an XSLT processor!
You cannot convert the byte values to Unicode codepoints, because XML does not allow some of the resulting characters, and these would undoubtedly crop up in the byte sequence of an image. However, the byte values, also known as octets, can be Base64 encoded. You can write your own Base64 encoder or, if you are using Saxon 9+, the saxon:octets-to-base64Binary() function will do the job for you.
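For anyone tempted to write their own encoder, the algorithm is short: take the octets three at a time, treat them as one 24-bit number, and emit four 6-bit indexes into the Base64 alphabet, padding the final group with "=". The sketch below shows this in Python purely as an illustration of the logic; octets_to_base64 is a name I have made up, and it is not how Saxon implements its function. The last line cross-checks the result against Python's standard base64 module.

```python
import base64

ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"


def octets_to_base64(octets):
    """Encode a sequence of integers (0..255) as a Base64 string."""
    octets = list(octets)
    out = []
    for i in range(0, len(octets), 3):
        chunk = octets[i:i + 3]
        pad = 3 - len(chunk)              # 0, 1 or 2 missing octets in the last group
        n = 0
        for b in chunk + [0] * pad:       # pack three octets into one 24-bit number
            n = (n << 8) | b
        chars = [ALPHABET[(n >> shift) & 0x3F] for shift in (18, 12, 6, 0)]
        if pad:
            chars[4 - pad:] = ["="] * pad  # replace the unused positions with padding
        out.append("".join(chars))
    return "".join(out)


sample = list(range(256))
assert octets_to_base64(sample) == base64.b64encode(bytes(sample)).decode("ascii")
```

A recursive XSLT function could follow the same three-octets-in, four-characters-out pattern, using integer division and mod in place of the bit shifts.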
With the output method set to text, the resulting Base64 encoded string can be written to a file and subsequently thrown at a Base64 decoder. The transformation and decoding steps can be strung together using a shell script, a build tool like Apache Ant or, in the fullness of time, an XProc pipeline. Other possibilities include calling a method of an external class, from within the transform, to write the byte sequence directly to the file system. Or how about extending the transformation engine to allow a binary output method?
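The decoding step mentioned above needs very little code. As a rough sketch, assuming the transform has already written its Base64 text to a file, a few lines of Python would turn it back into binary; decode_base64_file and both file names in the comment are hypothetical, and any Base64 decoder would do the same job.

```python
import base64
from pathlib import Path


def decode_base64_file(text_path, binary_path):
    """Decode a Base64 text file into a binary file; return the byte count."""
    encoded = Path(text_path).read_text(encoding="ascii")
    # With its default settings, b64decode discards characters outside the
    # Base64 alphabet, so line breaks in the transform's output are harmless.
    data = base64.b64decode(encoded)
    Path(binary_path).write_bytes(data)
    return len(data)


# Hypothetical usage, once the XSLT step has produced image.b64:
# decode_base64_file("image.b64", "image.tif")
```

In a shell script or an Ant build this would simply be the second step, run after the XSLT processor has finished.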
Now, I would be the first to admit that using XSLT to generate Base64 encoded TIFF images is a bit niche, if not downright unusual, and you might ask what it is that I’m transforming that would require a binary output. Well, I’ll explain more in my next post.