On a project today, one of XML’s paradoxes struck me. We often adopt XML for publishing because we want to re-target our documents to different publications and media; however, we then find it useful to organize or format the information similarly across different media and applications, in order to reduce gratuitous differences, to ease processing, and to strengthen branding.

So our books end up getting PDA-isms such as small sections, HTML-isms such as page focus, or RSS/Atom-isms such as chapter and section summaries. (It was worth writing this blog post just for the pleasure of saying “RSS/Atom-ism”.) And our HTML pages get book-isms, such as the familiar TOC and next/previous/up buttons. The initial movement toward media and publication independence is met by a countermovement toward cross-media and cross-publication homogeneity.

Rather than thinking of XML documents as ideally free of particular presentation dependencies (“Just model the information” was the SGML mantra), it may be more prudent to think of XML as encoding both the raw data and the information required by a mythical all-purpose medium. This would explain, for example, why we have generic tables. Tables have long been a conundrum for pure markup theorists, because specific markup using lists of named elements, plus some selection/sorting/styling code, should provide much better value. But tables encode exactly this cross-media information, so they do not violate the separation of content from specific presentations on specific media.