Related link: http://www.openpublish.com.au/
Open Publish 2004 here in Sydney just closed. (It grew out of the old SGML Open conferences, then became XML Open when the XML brand became hot, and then split out when the publishing industry attendees weren’t getting enough value from all the DBMS interest, and Nick Carr felt that there was a lot of synergy between the Open Source and Standards movements and the XML/PDF publishing industry.)
The quality is not so much in the papers or the size (though these are good: Oracle’s Ben Chang, open content management’s Michael Wechner, Tim Arnold-Moore, PlanetPDF, Microsoft, Adobe), but in the networking: for many people, being able to chat to old/future colleagues, competitors, fellow travellers and potential consultants/clients is just as important as the papers. A large conference covering too broad an area means, in effect, that there is only a small chance that someone you talk to over coffee is interesting to you.
At this conference, just casually talking to people, I discovered a large publishing company who used Schematron with a checkbox interface, so that people could turn on and off the assertions they were interested in. That is a great idea. My implementations of Schematron (the open source ones at Academia Sinica, and the commercial ones from Topologi) all support phases (a higher bundling of assertions) but not individual assertions in each rule: it seems like a really nice capability and the checkbox interface is obvious.
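For readers who haven’t met phases: in Schematron, a phase is just a named bundle of active patterns, so a checkbox interface could go one level finer by keying on the id of each individual assert. A minimal sketch, using the old Schematron 1.5 namespace; the pattern name, rule context and assertion ids are invented for illustration, not taken from any real schema:

```xml
<sch:schema xmlns:sch="http://www.ascc.net/xml/schematron">
  <!-- A phase switches whole patterns on; a checkbox UI could instead
       select individual asserts by their id attributes. -->
  <sch:phase id="editorial-checks">
    <sch:active pattern="headings"/>
  </sch:phase>
  <sch:pattern name="Heading checks" id="headings">
    <sch:rule context="section">
      <sch:assert id="has-title" test="title"
        >A section should have a title.</sch:assert>
      <sch:assert id="title-not-empty" test="normalize-space(title)"
        >A section title should not be empty.</sch:assert>
    </sch:rule>
  </sch:pattern>
</sch:schema>
```

Since each assert already carries an id, a front-end could filter the reported failures against the user’s checkbox selections without touching the schema itself.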
If I were to pick a theme or meme, it was that the decision on whether and how to support Word was by far the most critical decision for most large XML deployments. Deciding not to frees you up considerably, but some users are pathetically entrenched, if you can forgive my sympathy lapsing. If you decide to use it, then you need to spend a lot of resources on figuring out how to do the conversion (lots of tools now) and, more importantly, how to force authors to only do things that will make it into the XML.
(Talking to someone else who was forced to accept RTF, I mentioned my old trick, which saved many projects: you get the Word users to first save as RTF, then to reload the document and save again. This does two things: first, it tidies up the eventual RTF; second, it exposes whether anything essential is not round-trippable through RTF—this may be more comforting than important, though.)
David Drinkwater’s paper analysed the twists and benefits of a beefy year-long project involving scores of CD-ROMs and books, in the area of moving adult learning objects published in paper/CD/Web formats to a single source. He said he doubts it is possible, for most kinds of material, to entirely use single source; when people tried to do too much formatting, the XML approach failed; the benefit was in not needing to hire an HTML designer for each CD, because the same design was reused. He said the project was supposed to take 6 months, but blew out to 12 months (how typical): I asked him how much time was spent on dealing with Word, and he said 7 months (they serialized out the internal DOM of Word rather than convert to RTF, and they didn’t want to upgrade to Word 2003 with the nicer XML export).
I couldn’t help wondering to myself whether that 7 months was the cause of the 6-month blowout; David did say the Word handling took more resources than each of conversion to CD, PDF/paper, or Web. I didn’t say anything, out of politeness to the other speaker (a cheerful Microsoft guy with a good InfoPath demo: XForms really is its competitor) because I was chairing. I managed to introduce David as Michael, embarrassingly.
I enjoyed Dr Peter Sefton’s paper most.
James Robertson’s presentation on Content Management Systems (CMS) was very stimulating too. He said it seems that you should plan your CMS implementation around “What will my needs be in 2 years’ time?”; any contract or expectation of putting in a CMS that will need an ROI or value longer than this is foolish. I chaired that session too, and I always like to ask a question that “forces” the speaker to state more bluntly what they may be too politic to say: so I asked James whether he had ever seen a CMS project intended to meet needs longer than 2 years succeed; he said he never had. A big upfront budget on an expensive product may mean less ongoing funding to customize and take advantage of the swanky features. Many of James’ other points centred around the same core idea: I would restate it as projects targeted at the long term fail in the short term, are abandoned in the mid-term and never make the long term.
Corridor gossip has it that the military here are seriously looking at adopting the International Specification for Technical Publications utilising a Common Source DataBase for new projects. Our existing 5629A DTDs were a compromise to codify the existing practice in the different services. Actually, this was another meme of the conference: that it is often better to first implement something that modernizes and regularizes the production process, without being too purist, and then, when the processes and technologies and expertise are in place, move up to a simpler, unified, company-wide schema. The initial move should concentrate on things like faster delivery time, or cost-cutting and process re-engineering; it reminded me of a recent project I worked on where they discovered they could not put in a CMS until they had cleaned up their documents, which would take two years.
While on that topic, yesterday I heard of one organization that discovered it would take 3 days per document to collate a definitive version of their technical manuals (which had years of loose-leaf updates): that being so, their plans to convert the most critical 500 (of their thousands of) documents into a CMS would take about 70 man-years, and they only had a 10-man team. And that is before conversion, checking, etc.
I had been asked to give an updated version of my Wiki-to-XML article at XML.COM. It was well-attended, and I was surprised when the local Boeing people said they were using a kind of Wiki internally for collaborative writing. People who use it often love it. I guess we must never believe propaganda that says users require WYSIWYG authoring, especially when issues of convenience and efficiency dominate.
In my presentation, I showed a little 400-character document. To mark it up in XHTML took about an extra 430 characters. To mark it up equivalently in a Wiki took 9 characters! Hmm, 430 against 9: is terseness really of minimal importance (http://www.w3.org/TR/REC-xml/#sec-origin-goals) even when using cheap fingers?
To mark the same text up in HTML using a customized HTML editor took between 15 and 27 mouse actions (depending on whether you did it all at once or separately from data entry). My point was to suggest that a 50% reduction in keystrokes (well, given that any decent XML editor will provide templates, end-tag generation and other features, this is probably more like a 30% reduction) is nothing to be dismissed automatically, and that is just when you are adding data+markup: if you are marking up existing raw text, the Wiki is between 4 and 20 times less work! Of course, to make use of it, you need to forgo attributes and (probably) deeply nested element content. But you can always stick on an XSLT transform at the last stage and convert to your production schema.
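As a concrete (and much simplified) illustration of that pipeline—terse wiki markup in, XHTML out, with the production schema left to a downstream XSLT stage—here is a sketch in Python. The markup conventions (*bold*, _italic_, blank lines between paragraphs) are invented stand-ins, not the actual syntax from the XML.COM article:

```python
import re

def wiki_to_xhtml(text):
    """Convert a toy wiki dialect to XHTML fragments.

    Hypothetical conventions for illustration only:
    *word* -> <b>word</b>, _word_ -> <i>word</i>,
    blank lines separate paragraphs.
    """
    paras = []
    for chunk in re.split(r"\n\s*\n", text.strip()):
        html = re.sub(r"\*(.+?)\*", r"<b>\1</b>", chunk)
        html = re.sub(r"_(.+?)_", r"<i>\1</i>", html)
        paras.append("<p>%s</p>" % html)
    return "\n".join(paras)
```

A real converter would of course handle lists, links and character escaping; the point is only that the authoring layer stays trivially terse while a final transform does the heavy lifting into the production schema.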
At this show I was also wearing my spruiker hat as an exhibitor: Topologi’s new Professional Edition 2.1 is being released Monday, adding a lot of analytics and reporting capabilities to the existing validation and making everything work with everything (we can validate SGML with Schematron now!); I’ll sneak the Wiki-to-XML converter into the distribution as a freebie. My appointment book is pretty full next week with people talking about buying; so all in all a good, targeted conference for me.
Should you scream if the O’Reilly Network has one more blog from a conference this week?