With all the recent talk of angle bracket taxes and what XML is and isn’t good for, I thought it would be fun to take XSLT somewhere it is not normally associated with: the generation of binary file formats.
The sequence in XSLT 2.0 is of more use than the humble node-set. A sequence is not restricted to nodes: the tokenize() function, for example, returns a sequence of strings, and you can build or concatenate sequences using the comma operator, which works on items of any data type.
However, there is nothing here that lifts us out of the ordinary; not until, that is, you create a sequence of xs:unsignedByte numbers. This sequence can be considered a byte sequence, and if you can create a byte sequence you can create just about any binary file format you like. A good example of this would be an image file such as a Tagged Image File Format (TIFF) image. If you don’t get involved in image compression, it is relatively easy to create a TIFF image; after all, it is only a series of sequences of bytes.
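To make the "series of sequences of bytes" idea concrete, here is a minimal sketch, written in Python rather than XSLT purely for illustration. It assembles an uncompressed 8-bit grayscale TIFF as a flat list of byte values, much as a stylesheet would assemble a sequence of integers. The function name minimal_tiff is hypothetical, and the sketch assumes little-endian byte order and a single strip. It also omits the resolution tags that the baseline TIFF specification lists as required, so a strict reader may want those added.

```python
import struct

def minimal_tiff(width, height, pixels):
    """Return an uncompressed 8-bit grayscale TIFF as a list of byte values.

    Hypothetical helper for illustration: little-endian, one strip, and
    width/height are assumed to fit in 16 bits.
    """
    if len(pixels) != width * height:
        raise ValueError("expected width * height pixel values")

    SHORT, LONG = 3, 4          # TIFF field types
    entry_count = 9
    ifd_offset = 8              # the IFD follows the 8-byte header
    # IFD = 2-byte entry count + 12 bytes per entry + 4-byte next-IFD offset
    data_offset = ifd_offset + 2 + entry_count * 12 + 4

    entries = [                 # (tag, type, value), in ascending tag order
        (256, SHORT, width),           # ImageWidth
        (257, SHORT, height),          # ImageLength
        (258, SHORT, 8),               # BitsPerSample
        (259, SHORT, 1),               # Compression: none
        (262, SHORT, 1),               # PhotometricInterpretation: BlackIsZero
        (273, LONG, data_offset),      # StripOffsets
        (277, SHORT, 1),               # SamplesPerPixel
        (278, SHORT, height),          # RowsPerStrip
        (279, LONG, width * height),   # StripByteCounts
    ]

    # Header: byte-order mark "II", magic number 42, offset of the first IFD
    header = b"II" + struct.pack("<HI", 42, ifd_offset)

    ifd = struct.pack("<H", len(entries))
    for tag, field_type, value in entries:
        # In little-endian order a SHORT packed into the 4-byte value slot
        # is already left-justified, so one format covers both field types.
        ifd += struct.pack("<HHII", tag, field_type, 1, value)
    ifd += struct.pack("<I", 0)        # no further IFDs

    # bytes() rejects any pixel value outside 0..255
    return list(header + ifd + bytes(pixels))
```

Calling minimal_tiff(2, 2, [0, 255, 255, 0]) yields a 126-value sequence: 8 header bytes, 114 directory bytes and 4 pixel bytes.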
Mind you, there are two problems to deal with. The first is that a basic XSLT 2.0 processor does not support the xs:unsignedByte data type; only a schema-aware processor is required to support it. So, in the absence of the latter, you’d have to make do with xs:integer and put up with the extra memory needed. The second, and more important, problem is how to get a byte sequence out the other end of an XSLT processor!
You cannot simply convert the byte values to Unicode codepoints, because XML does not allow some of the resulting characters (codepoint 0 and most other control characters, for example), and these would undoubtedly crop up in the byte sequence of an image. However, the byte values, also known as octets, can be Base64 encoded. You can write your own Base64 encoder or, if you are using Saxon 9+, the saxon:octets-to-base64Binary() function will do the job for you.
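For anyone tempted to write their own encoder, the algorithm is short. The following is a minimal sketch in Python (not XSLT, and not Saxon's implementation): it takes three octets at a time, treats them as a 24-bit number, and emits four 6-bit characters, padding the final group with "=". The name octets_to_base64 is hypothetical.

```python
ALPHABET = (
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "abcdefghijklmnopqrstuvwxyz"
    "0123456789+/"
)

def octets_to_base64(octets):
    """Base64-encode a sequence of integers in the range 0..255."""
    out = []
    for i in range(0, len(octets), 3):
        chunk = octets[i:i + 3]
        pad = 3 - len(chunk)            # 0, 1 or 2 missing octets

        # Pack the octets into one 24-bit number, zero-filling any gap
        n = 0
        for b in chunk:
            if not 0 <= b <= 255:
                raise ValueError("octet out of range: %r" % (b,))
            n = (n << 8) | b
        n <<= 8 * pad

        # Split the 24 bits into four 6-bit indexes into the alphabet
        chars = [ALPHABET[(n >> shift) & 0x3F] for shift in (18, 12, 6, 0)]
        if pad:
            chars[-pad:] = ["="] * pad  # mark the zero-filled positions
        out.append("".join(chars))
    return "".join(out)
```

The same grouping translates fairly directly to XSLT 2.0 using integer division and the mod operator in place of the bit shifts.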
With the output method set to text, the resulting Base64 encoded string can be written to a file and subsequently thrown at a Base64 decoder. The transformation and decoding steps can be strung together using a shell script, a build tool like Apache Ant or, in the fullness of time, an XProc pipeline. Another possibility is to call a method of an external class, from within the transform, to write the byte sequence directly to the file system. Or how about extending the transformation engine to allow a binary output method?
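The decoding step itself is small. As a rough sketch, assuming the transform has written its Base64 text to a file, a few lines of Python could turn it back into the binary image. The function name and any file names are hypothetical, and any Base64 decoder would serve equally well.

```python
import base64
from pathlib import Path

def decode_base64_file(text_path, binary_path):
    """Decode a Base64 text file into a binary file; return the byte count."""
    encoded = Path(text_path).read_text(encoding="ascii")
    # Drop line breaks or indentation the serializer may have added
    data = base64.b64decode("".join(encoded.split()))
    Path(binary_path).write_bytes(data)
    return len(data)
```

A script could then run the transform and call decode_base64_file("image.b64", "image.tif") as its final step.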
Now, I would be the first to admit that using XSLT to generate Base64 encoded TIFF images is a bit niche, if not downright unusual, and you might ask what it is that I’m transforming that would require a binary output. Well, I’ll explain more in my next post.


MPEG, a working group of ISO/IEC, has standardized (within its MPEG-21 Digital Item Adaptation standard) a means of generating binary file formats based on so-called Bitstream Syntax Descriptions (BSDs). The main application is the adaptation of (scalable) multimedia content (JPEG2000, MPEG-4 SVC, etc.).
A BSD describes the structure of a bitstream in terms of packets, headers, layers, etc. It's also possible to include a parameter value in the BSD. The data type of the parameter is either provided through the schema to which the BSD belongs or directly in the BSD through xsi:type. The standard also defines additional data types (mainly for multimedia formats) that are not covered by the XML Schema built-in data types.
Thanks Christian, I'll take a look at that.