This is a continuation of my blogging from XML Conference 2007; see yesterday’s post for more. There are, of course, a lot of folks blogging about the conference. Here’s my colleague Andy’s take. Elliotte Rusty Harold is providing some wonderful reading as well (and apparently did a smashing job at the XForms talk last night). For a visual sense of the conference, check out David Megginson’s photos on Flickr.
Dorothy Hoskins: Outside-In XML Publishing
What role can XML play at the prettiest end of the print production spectrum? Dorothy doesn’t feel that XSL-FO is an appropriate solution for shops requiring graphically rich publishing. Instead of struggling with XSL-FO in these cases, develop your XML outside of your formatting system and import the content near the end of production. Both InDesign and FrameMaker are good options for this route; FrameMaker 8 has particularly good DITA integration. Both have (expensive) server editions for high-volume environments. [FrameMaker 8 DITA demo].
InDesign Scripting Potential
Getting content out of InDesign (perhaps to put it on the web) can be challenging. Tagging (either manual or aided by scripting) can impose a little structure on your otherwise flat InDesign document, which will help make the export more useful.
One thing to avoid when importing XML into InDesign is pretty-printing, because InDesign is very whitespace-sensitive: pretty-printed XML may introduce a lot of unwanted line breaks and spaces. [Some InDesign demos, including a blog-posting application driven from InDesign, which was pretty cool.]
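To illustrate the point (my sketch, not from the talk): indentation-only text nodes can be stripped before import with a few lines of Python. The file names are hypothetical, and whitespace-only nodes inside mixed content may need gentler handling:

    import xml.etree.ElementTree as ET

    def strip_pretty_printing(path):
        tree = ET.parse(path)
        for elem in tree.iter():
            # Drop indentation-only text and tail nodes. Careful: in mixed
            # content a whitespace-only node can be meaningful.
            if elem.text and not elem.text.strip():
                elem.text = None
            if elem.tail and not elem.tail.strip():
                elem.tail = None
        return tree

    strip_pretty_printing("story.xml").write("story-flat.xml", encoding="utf-8")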
Since 2000, publishers have grasped the importance of XML, but in the early days there were no solutions that fit them well. Today, there are a huge number of XML products targeted toward publishers, some of which are actually helpful. Takeaway: seven years has produced progress.
The ups and downs of the entire publishing industry really mirror each individual publisher’s implementation of new technology: ignorance -> mistakes -> learning via pain -> productivity increases. In some cases it has paid to be a late adopter, as some early adopters are now ditching their first set of products for today’s more useful tools. The tools themselves have become smaller in some cases, and this specialization helps publishers solve specific problems rather than trying to solve everything with one huge product (circa 2000).
Specifically, publishers in 2000 were unsure (“skittish”) about whether XML would help them (and what new business model it would support). “All of them were naive about the likely effort of [implementing XML technologies].” Convincing publishers of the ROI of XML was quite difficult. Today, no one asks for that justification anymore, but that’s in large part due to the industry’s wide implementation experience. Another change is that XML is almost always produced at some point in the publishing chain today, although it may be right at the end (perhaps to help get content online).
As far as workflows and publishing philosophy go, “Print-First” has shifted past “Media-Neutral” and “XML (web)-First” to “I don’t want to choose first,” where the organization doesn’t want to commit to an eventual delivery focus on day one. Sooner or later, though, they will think of their content as tied to a rendered version. People, it turns out, just don’t think in a “Media-Neutral” way.
Interestingly, the requirements for CMS technologies haven’t really changed. What’s changed is that those requirements are actually supported by products today, whereas earlier products were extremely focused on the web, treated XML as a bolt-on rather than integrating it nicely, and offered users few editor choices. Today we have web CMSs (Alfresco), native XML CMSs (RSuite), mature print capabilities via XSL-FO and InDesign/FrameMaker integration, and a variety of editing options.
What’s actually being implemented in 2007? It’s really a story of an amalgam of systems all working together: print production systems (like K4, built on InDesign), native XML CMSs, Documentum (perhaps with MarkLogic), and digital asset management systems (like Artesia). What Lisa really likes: native XML storage [pushing MarkLogic hard], which lets you query XML, combine XML queries with full-text search, leave metadata in your XML, and enrich content on an ongoing basis. Editing tools have advanced and diversified as well: some publishers are building interesting tools with XForms, others with typical XML editors, yet others with Adobe Creative Suite integration or web widgets [like xOpus?]. This area of tools, and their customization for individual publishers, is where Lisa thinks more resources should be invested. Looking ahead, what’s still needed:
- “Real” link management
- Real tools for UI, navigation, and community
- Collapse of the external/internal divide for external authoring contributions
- Richer editor-centric experience
- CMS tools for DITA
- Integration with OOXML
Robin Doran and Matthew Browning: BBC iPlayer Content Production: The Evolution of an XML Tool-Chain
The iPlayer is being developed to allow streaming of scheduled BBC TV and radio shows. It’s a massive exercise, with data collection and collation from multiple sources. The scheduling information itself is quite complex and is delivered in the emerging XML standard TVA (TV-Anytime), which the BBC is helping along.
The first generation of the iPlayer was built on an RDBMS, which was an obvious choice for their structured (meta)data. The media itself was stored and delivered separately. The problems with the first generation centered on performance, creeping requirements, and unnecessary storage (they didn’t care about the metadata after the 7-day replay window had passed). The second time around, they built an XML representation of their objects, modeled using RelaxNG and transformed into other formats using XSLT. Upgrades to the schema are now as easy as editing the RelaxNG schema file, and all entries are validated at creation. The targets of the XSLT transforms are web pages, search packages, syndication widgets and feeds, and Facebook messages [phew!].
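A minimal sketch of that shape of pipeline (not the BBC’s code; the schema and stylesheet names are hypothetical, and it requires lxml):

    from lxml import etree

    relaxng = etree.RelaxNG(etree.parse("schedule.rng"))
    to_feed = etree.XSLT(etree.parse("entry-to-feed.xsl"))

    def publish(entry_xml):
        entry = etree.fromstring(entry_xml)
        relaxng.assertValid(entry)   # every entry is validated at creation
        return to_feed(entry)        # one of several XSLT targets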
Does it work? Well, it’s out there as a beta right now and has garnered both good and bad reviews. The move to XML did mean they were able to dramatically reduce the human resources devoted to the project. Some takeaways from the process: SQL is an option but not always a must, and “pipelines rule,” as does simplicity. What hurt SQL? Ease of schema modification mattered a lot; they went through 40 iterations of their database schema.
Micah Dubinko: WebPath: Querying the Web as XML
“We need better web tools.” Not webmaster tools or Eclipse tools, but tools that treat the web as a resource. Pulling random XML off the web rarely works as promised, though some have exaggerated the problem. In the platonic web, none of the problems with non-well-formedness exist. Perhaps that’s one way to shift people closer to a [pro-]“web” bias.
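A quick illustration of the well-formedness point (mine, not Micah’s): a forgiving HTML parser such as lxml.html still produces a tree you can query with XPath, where a strict XML parser would simply refuse:

    from lxml import html

    broken = "<html><body><p>unclosed <b>tags<p>everywhere"
    doc = html.fromstring(broken)
    print(doc.xpath("count(//p)"))  # 2.0: the parser repaired the structure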
Implementing XPath 2.0
WebPath started out as a [Yahoo] Hack Day project and has five pieces: a lexer, a recognizer [simplifies special cases; search for Michael Kay on optimizing XPath], a top-down operator precedence parser [via Doug Crockford; there’s no grammar!], an interpreter [evaluate the left of the /, then use that as the context for evaluating the right side], and ?. WebPath has lots of unit tests [imagine?!]. It also provides a bunch of “webby” methods.
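To make the parser idea concrete, here is a minimal sketch (mine, not WebPath’s actual code) of top-down operator precedence parsing for a toy grammar with names, numbers, and left-associative binary operators:

    import re

    TOKEN = re.compile(r"\s*([A-Za-z]\w*|\d+|.)")
    BINDING = {"+": 10, "/": 20}  # higher number binds tighter

    def tokenize(s):
        return TOKEN.findall(s) + ["(end)"]

    def parse(tokens, pos=0, rbp=0):
        left, pos = tokens[pos], pos + 1              # "nud": a name or number
        while BINDING.get(tokens[pos], 0) > rbp:      # "led": a binary operator
            op = tokens[pos]
            right, pos = parse(tokens, pos + 1, BINDING[op])
            left = (op, left, right)
        return left, pos

    print(parse(tokenize("a/b/c"))[0])  # ('/', ('/', 'a', 'b'), 'c')

Note how the left operand is finished before the operator’s right side is parsed, which lines up with the interpreter strategy above: evaluate the left of the /, then use the result as context for the right.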
Some of the design decisions have been criticized; some came from Hack Day pressures, others were intentional, coming from a “web bias.” There hasn’t been any work on Schema support, for example, but that’s because Micah sees Schema [in XPath 2.0] as being about optimization, and WebPath hasn’t worked on optimization anyway. Another controversial decision was liberal node tests, since many prefer /prefix:html/prefix:head/prefix:title. [Now it’s time for a demo in the Python 2.5 interpreter (on his Mac). One surprise was a new axis, traverse::*, used when reading documents from the web, as in //a/@href/traverse::*. Another example, using two hops and the string() function: string(//a[contains(., '1')]/get(@href)//a[contains(., 'Grok')]/get(@href)/head/title) ==> Groklaw - Digging for Truth]
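Here is my rough guess at the idea behind traverse::, as a Python sketch rather than WebPath’s implementation: follow each href in a node-set and make the fetched document the new context. It assumes absolute URLs and requires lxml:

    from urllib.request import urlopen
    from lxml import html

    def traverse(hrefs):
        # Follow each href and hand back a parsed document, which becomes
        # the context for the rest of the path expression.
        for url in hrefs:
            yield html.parse(urlopen(url))

    start = html.parse(urlopen("http://example.com/"))
    for doc in traverse(start.xpath("//a/@href")):  # cf. //a/@href/traverse::*
        print(doc.xpath("string(/html/head/title)"))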
“minidom is not suitable for … nearly anything.” This has made node-sequencing too hard to implement, as well as a number of other features. Many who like XSLT 1 or Lisp prefer the first line below, but Micah likes the idea of following links as in the second:

    document(document(//a/@href)//a[test()])//*[id="targ"]
    //a/@href ---> //a[test()]/@href ---> //*[id="targ"]

This was why he added traverse:: and a new operator [presumably the get() seen above]. doc() is in XPath/XSLT 2.0, and both XSLT 1.0 and 2.0 have document(), so other people have thought about this problem.
Experiments and Enhancements to XPath
Part of the interest behind this experiment was wanting to grab the deepest row from a page of 60 tables nested 7 deep, which can be a pain. Enter “inline closures.” [I didn’t entirely follow his explanation of this.] Another idea is count(uber-root()//*[contains(@class,'hcard')]), which counts all the hCards on the whole internet, if you happen to have a couple of copies of it lying around like Yahoo! does.
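uber-root() is speculative, of course, but if you do have a crawl on disk the same count is a short loop. A rough sketch (the crawl directory is hypothetical; requires lxml):

    from pathlib import Path
    from lxml import html

    total = 0
    for page in Path("/data/crawl").rglob("*.html"):
        doc = html.parse(str(page))
        total += int(doc.xpath("count(//*[contains(@class,'hcard')])"))
    print(total)  # cf. count(uber-root()//*[contains(@class,'hcard')])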
Mark Birbeck: XForms, REST, XQuery…and skimming
This talk is an advert for XForms, again [like last night]. It’ll try to make the case that XForms is the last piece of a jigsaw that’s been coming together over the last few years, with REST and XQuery among the other pieces.
Web apps today
The client in web applications is too thin and provides insufficient technology to make building web applications easy. As a consequence, all of the work happens on the server (UI, state, forms, validation). While people are moving away from this, it’s still quite typical. The key is that there’s no separation between the UI and data, because applications use some data to help create a UI on the fly [amusing verbal slip where he can only remember the names of the XForms elements rather than HTML forms]. This coupling makes data reuse with multiple UIs hard, as well as UI reuse. Even in a RESTful model, a URL isn’t really a link to some data but rather a link to a UI that acts on some data.
Ideally, data would be passed to the UI. This gives the idea of “skimming,” like a stone thrown across a lake [the data bouncing along]. This model becomes easier to grasp as web services become more important. A model based on web services providing the data is a big improvement over the one described above, but it’s still difficult to maintain a UI server doing all that work. Enter rich clients.
Web apps evolving
Breaking UIs into pieces that only consume data might mean you can have form designers, help-text authors, database builders, and translators all working separately. This is [a-ha!] where XForms comes in, as it explicitly allows these functions to be broken apart. With XForms, as with Ajax, automatic UI updates without page reloads are possible, but that bit is well publicized. Less commonly discussed is the ability to drive the UI from data types (a datetime gets a date selector). [Demo of some of these ideas.]
“XForms is an ideal rich client”:
- allows distributed data sources
- is based on XML standards
- allows for dynamic UI
- is declarative
- can be event driven
- can be accessible
While XForms standardizes the UI side, there isn’t a particular standard on the server side. Two of the most interesting candidates are WebDAV and “ATOM” [AtomPub?], which seem to be generating quite a bit of continued interest this week. [Demo of a WebDAV connection to an eXist database with filesystem-like saves and updates.] The final piece available for standardization is the query language, for which XQuery seems the best choice. [XQuery demo.] [Cool GData demo with XForms, using both GMaps and Google Docs.]
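One reason WebDAV is attractive on the server side: saving a document into an XML database is just an HTTP PUT. A minimal sketch (the eXist URL is hypothetical, and authentication is omitted):

    import urllib.request

    req = urllib.request.Request(
        "http://localhost:8080/exist/webdav/db/forms/entry.xml",
        data=b"<entry><title>Hello</title></entry>",
        method="PUT",
        headers={"Content-Type": "application/xml"},
    )
    with urllib.request.urlopen(req) as resp:
        print(resp.status)  # typically 201 on create, 204 on update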