A few years ago, I was briefly involved with a publishing company that was interested in packaging and producing eBooks. Getting from client submissions in Word, the occasional PDF and even straight text files to a finished eBook proved daunting, largely because these works placed such demands on editors that the model was not cost-effective enough to be viable. Most people working with Word have only a limited understanding of, and therefore little use for, Word styles, and the notion of even more stringently structured documents was completely foreign to them.
More recently, I’ve moved into a role as an industry analyst, and consequently spend a significant amount of time creating working papers, conference presentation materials, web articles and blogs, not to mention the occasional book. While I’ve come to thoroughly enjoy DocBook (especially v.5.0, with its wonderful xlink: goodness ;-) ), I find that the transformation from DocBook to XSL-FO to PDF still leaves something to be desired. This has nothing to do with Bob Stayton’s superb XSLTs, nor does it have to do with FOP or XEP, both of which I use quite extensively. Rather, it has to do with that mysterious middle layer of XSL-FO.
FO is hard. It was built on a master publishing model that makes sense when laying out books, but that can quite frankly be a pain in the butt when dealing with otherwise ad-hoc articles. I know about structured layout - that’s why I use DocBook in the first place - but I don’t normally want to think extensively about all of the minutiae of laying out template masters, of creating bindings to those template masters, then figuring out how to write the exceptions around those when things don’t lay out the way that I want. I also like my CSS neat and tidy and accessible as class entities, and with XSL-FO, you don’t even have support for classes (at least in the most prevalent 1.0 version).
In other words, I would like to lay out my printable documents in a way that’s familiar to me, that my existing tools support, and that can easily be changed without having to do a search and replace through a hundred distinct instances of a paragraph. In short, I want CSS, acting on XHTML, generating my printed pages as readily as it displays that content to the screen.
A previous blog post from Michael Day about PrinceXML reminded me that I hadn’t had a chance to play with it. My previous experiences with XHTML to PDF conversion were, to put it bluntly, terrifying, and so, as I was downloading the JAR file, I wasn’t expecting a lot. When I tried it, I wasn’t disappointed … I was stunned. I had taken an article that I’d recently written for XML.com and run it through Prince. It digested the ten-page article and cranked out a PDF in under a second, and the quality was better than anything I’d been able to get with a straight DocBook/FO/PDF rendering.
I looked up the documentation, and found that it supported the CSS 3.0 page rendering set as well as columns (including column rules), that it could print SVG content embedded in or linked from the main XHTML document, and that it included a nice set of extension properties for handling headers and footers, internal links and rounded borders, along with the full panoply of CSS selectors, including nth-child (which seemingly no one supports), content search and the whole gamut of pseudo-classes.
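To give a flavor of the stylesheet side, here is a minimal sketch of an XHTML page carrying a print stylesheet with a page box, a running header and page-number footer, a two-column body with a rule, and an nth-child selector. The names `PRINT_CSS` and `build_document`, along with the sample text, are purely illustrative; the CSS properties come from the CSS paged media and multi-column drafts, and exactly which of them a given Prince release honors is an assumption to check against its documentation.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# Illustrative print stylesheet: page box, running header/footer,
# two columns separated by a rule, and an nth-child selector.
PRINT_CSS = """
@page {
  size: A4;
  margin: 2.5cm;
  @top-center { content: "Sample article"; }
  @bottom-right { content: counter(page); }
}
div.body {
  column-count: 2;
  column-gap: 1cm;
  column-rule: 0.5pt solid #999;
}
li:nth-child(odd) { background: #eee; }
"""

XHTML_NS = "http://www.w3.org/1999/xhtml"


def build_document(title, paragraphs):
    """Return an XHTML string with the print stylesheet embedded."""
    body = "\n".join(f"      <p>{escape(text)}</p>" for text in paragraphs)
    return (
        f'<html xmlns="{XHTML_NS}">\n'
        f"  <head>\n"
        f"    <title>{escape(title)}</title>\n"
        f'    <style type="text/css">{PRINT_CSS}</style>\n'
        f"  </head>\n"
        f"  <body>\n"
        f"    <h1>{escape(title)}</h1>\n"
        f'    <div class="body">\n{body}\n    </div>\n'
        f"  </body>\n"
        f"</html>\n"
    )


document = build_document("Sample article", ["First paragraph.", "Second & last."])

# Sanity check: the generated markup parses as well-formed XML.
ET.fromstring(document)
```

The point of keeping the layout in one stylesheet string is that changing the margins or the column count is a one-line edit rather than a hunt through the document.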
I’d heard some time ago that Bert Bos and Håkon Wium Lie of Opera had completed a book written using Prince and XHTML+CSS, but time pressures kept me from exploring it further. I was somewhat skeptical back then - I knew that it was certainly possible, but I came away thinking that the result would likely have been fairly primitive. After having seen PrinceXML in action, I’m a convert - I think it’s doable, and I think that the recent publication of CDF/WICD (the latter acronym is pronounced “Wicked”) by the W3C certainly opens up the possibility of HTML making serious inroads into the publishing layout sector, if a tool like PrinceXML were used as the rendering engine.
Prince comes in a number of different flavors, including a free personal edition that will place the PrinceXML logo in the upper-right-hand corner of the PDF (it won’t show up when printed), as well as licenses for server-side and corporate use. It uses a simple command line interface.
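Because the interface is just a command line, it is easy to script. The following is a minimal sketch, assuming the executable is installed as `prince` on the PATH and takes an input file followed by `-o` and the output PDF; the function names `prince_command` and `render` are hypothetical helpers, not part of Prince itself.

```python
import shutil
import subprocess


def prince_command(source, target, executable="prince"):
    """Build the argument list for converting one document to PDF."""
    # Assumed invocation: prince <input> -o <output.pdf>
    return [executable, source, "-o", target]


def render(source, target):
    """Run Prince if it is installed; return False when it is not found."""
    executable = shutil.which("prince")
    if executable is None:
        return False
    # check=True raises CalledProcessError if Prince exits with an error.
    subprocess.run(prince_command(source, target, executable), check=True)
    return True
```

Calling `render("article.html", "article.pdf")` with file names of your own would then produce the PDF next to the source, or simply return `False` on a machine where Prince is not installed.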
I judge a software package on a number of factors, from ease of use to quality of output, but one of the biggest indicators for me that something “works” is when I find I have an overwhelming compulsion to just play with it. With Prince, I find myself searching around my hard drive for completed XHTML documents just to feed them through the shredder and see my rather indifferent articles turn into something sparkly and fun.
If you are involved in content management in any way, you owe it to yourself to check out Prince, at http://www.princexml.com.
Kurt Cagle is an author and software developer living in Victoria, British Columbia. He’s currently off trying to feed the family cat through Prince, despite the protestations of his family.