[Update: Bob DuCharme is also blogging here. And Robin Hastings. Some more stuff here by David Megginson. Simon did a summary, and here’s my second day. Found another more voluminous blogging at Palimpsest]

[Update2: A list of all the blogging links is here]
Here’s my stream-of-consciousness blog on the 2006 XML Conference in Boston. The conference is running
four concurrent tracks: Publishing, Enterprise, Web, and Hands-On. I’ll
be jumping around between different tracks and trying to give a quick
summary of each session.

They’re making a big deal of the 10-year anniversary of the SGML 1996
Conference, which saw the original 27-page first outline of XML, developed
over 11 weeks by 11 people who “left their corporate identities at the
door” (Jon Bosak). Bosak, the original working group chair, is also doing the final session.

Roger Bamford (Oracle)

Roger Bamford (Oracle) presented the opening keynote Tuesday with Getting
There—The XML/XQuery Ecosystem

History

Started with a review of building a scalable application 30 years
ago (hint: buy IBM). The downsides back then included ugly UIs with
minimal data validation, messy applications spread across a lot
of authors, and no (real) database scaling [sounding familiar yet?].

New Requirements

Built-in support for tons of variation and also bring back DIY
customization. Response time is really important [thanks Google].

How XML/XQuery(P) Meets the Bill

XML + XQuery(P) + Apache + REST == no middle tier [hooray!].

What’s Going on Today

The brand new world is a UI machine with XQuery-enabled browser talking
to an XML-ready database (Sleepycat or Oracle) with absolutely nothing in between.
Interestingly, Bamford “considers XSLT to be the same technology” [nice
concession].

Why not other languages?

  • XLinq - proprietary
  • ECMAScript - not really for database programming
  • Perl, others - not built-in enough

With the XQuery recommendation in November, one less argument against
it. Currently, BEA, IBM, Oracle, and Microsoft are all getting onto the XQuery
bandwagon.

What’s Coming Up

We need enough free stuff to make XQuery ubiquitous. Specifically, a
standard XQuery engine, standard application definition, an Excel-like
management tool, and, of course, some killer apps.

Oracle’s attempt at the standard XQuery engine is Zorba, a C++,
open-source “high performance, low footprint” app, available “summer
2007.”

Web of applications instead of a web of content. Many are trying (APEX,
Ruby on Rails, ECMAScript), but none of them is a standard that goes all
the way up from the database to the browser. With XML now
ubiquitous, the exception is XQuery, which does XML natively and is standard and
free. The only scalable paradigm is one that supports
collaborative development by end users.

XQueryP: An XML Application Development Language

Daniela Florescu (Oracle) starts the Enterprise track’s day on
programming XML with a talk about making XQuery a usable, pragmatic
language.

XQuery: The Beginning

In December 1998, everyone thought that they’d be able to develop the
new language fairly quickly. However, the “database” community had a
hard time understanding the “document” community and vice versa. To
throw everything off, the “functional programming” folks got their say
as well. Now, as of November 2006, a whole family of recommendations has been
made (XDM, XQuery, XPath 2.0, XSLT 2.0, XML Functions and Operators,
XQueryX… [see Norm for more]).

There are now 50 XQuery implementations, and three major databases
(Oracle, SQL Server, DB2) implement it [to some extent]. BEA and Oracle
are using it in application servers, and Saxon, an open-source
implementation, is flourishing.

Status of XQuery and its Update Extension

We now have a functional-heritage, read-only language with no
side-effects. The “database” community made sure to keep it optimizable,
with possible lazy evaluation, and compilers have freedom to do code
rewriting. The downside of this, of course, is no side-effects.

In addition to the traditional programming constructs of variables,
conditions, and functions, XQuery expressions are fully composable and
have built-in XML node constructors.

The first step in “ruining” this beautiful language is the XQuery update
extension mechanism, now being worked on at the W3C. Most of these
extensions have to do with updates, of course: Primitive update,
Conditional update, Collection-oriented update. The new update
expressions are not fully composable [bummer, but obvious].
There are now clear distinctions between the side-effecting and the
non-side-effecting functions. Side-effects are not visible until the end
of an entire XQuery program (not to concurrent XQuery apps and not to the
current app either). Consequence: Java programmers have a hard time with
this style of concurrency.
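
[Ed: To make the snapshot idea concrete, here is a minimal Python sketch, not XQuery and not anything shown in the talk. It assumes a toy key/value "document"; the `Snapshot` class and its method names are made up for illustration. Updates are only recorded while the program runs and are applied together at the end, which is roughly what the XQuery Update drafts call a pending update list.]

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

Store = Dict[str, str]  # toy stand-in for an XML document


@dataclass
class Snapshot:
    """Hypothetical model of one snapshot over a key/value 'document'."""
    store: Store
    pending: List[Callable[[Store], None]] = field(default_factory=list)

    def read(self, key: str) -> str:
        # Reads always see the state as it was when the snapshot began.
        return self.store[key]

    def replace_value(self, key: str, value: str) -> None:
        # An update is only recorded here; nothing is applied yet.
        self.pending.append(lambda s: s.__setitem__(key, value))

    def commit(self) -> None:
        # All recorded updates become visible together at the end.
        for apply_update in self.pending:
            apply_update(self.store)
        self.pending.clear()


doc = {"title": "draft"}
snap = Snapshot(doc)
snap.replace_value("title", "final")
seen_during = snap.read("title")  # still the old value
snap.commit()
seen_after = snap.read("title")   # the update is visible only now
```

Reading `title` between the update and the commit still gives the old value, which is exactly the behaviour that surprises people used to imperative languages.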

What Are XQuery Users Doing

Some new use cases: Web Services (XML in, XML out), XML data
transformation from heterogeneous data sources (mashups), processing RSS
feeds and other XML message streams, coordination of services in SOA,
XML cleaning and normalization, and, finally, complex manipulation of
persistent XML data.

Are the folks trying to do the above happy with XQuery 1.0? Yeah, to
some extent, but they all hit the limit when they can’t express their
applications in XQuery. This means that they’re often jumping between
XQuery and some more “normal” programming language. This creates a lot
of friction between the inside and the outside of the XML world.

XQueryP Overview

XQueryP is trying to address this friction, specifically:

  • Preserve state (variable assignment)
  • Invoke side-effecting functions (Web Services)
  • Model Graphs
  • Ability to recover from errors
  • See side-effects during running

The technical proposal has six main points:

  1. Well-defined evaluation order (“sequential order”)
  2. Reduce the granularity of the snapshot to individual atomic update
    expressions
  3. New expressions:
    • Block
    • Set
    • While
    • Break, Continue
  4. Error handling (try-catch)
  5. Modeling graphs in XML
  6. Mapping XQueryP <-> Web Services

Sequential Evaluation
    Evaluation order is forced by side-effects. This takes 3 changes: FLWOR
    evaluation order, sequential evaluation of commas, and rules
    for updating function calls.
Smaller Granularity
    Every single atomic update expression is executed and made visible
    immediately.
Block
    { } syntax, potentially updating
Set
    set $VarName := Expression, only allowed in some places
Functions and Blocks
    A compatible change (with 1.0) where the function updating rules
    are relaxed.
While
    Just like you’d expect..
Break, Continue, Return
    Just like you’d expect, forced by While..
Try/Catch
    Just like you’d expect, except when it comes to lazy evaluation:
    try-catch assumes eager evaluation
Invoking Web Services
    A standard way of importing a Web Service, invoking a WS operation
    as a normal function, and exporting an XQuery module as a WS
Adding XML Reference
    Turning a tree into a graph [apparently a lot of people ask for
    this]
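
[Ed: The try/catch-versus-lazy-evaluation point is easier to see in code. This is a Python analogy using generators, not XQueryP; the function names are made up. The lazy version builds its result inside the `try` but nothing is evaluated there, so the error only surfaces later, when the result is consumed.]

```python
from typing import Iterable, Iterator, List


def parse_all(values: Iterable[str]) -> Iterator[int]:
    # Generator: nothing is evaluated until a consumer iterates over it.
    for value in values:
        yield int(value)  # raises ValueError on bad input


def lazy_attempt(values: List[str]) -> str:
    try:
        result = parse_all(values)  # no work happens here, so no error yet
    except ValueError:
        return "caught inside try"
    try:
        list(result)  # evaluation is forced here, outside the first try
    except ValueError:
        return "escaped the first try"
    return "no error"


def eager_attempt(values: List[str]) -> str:
    try:
        list(parse_all(values))  # evaluation is forced inside the try
    except ValueError:
        return "caught inside try"
    return "no error"


lazy_outcome = lazy_attempt(["1", "x"])
eager_outcome = eager_attempt(["1", "x"])
```

With the same bad input, the eager version catches the error where the programmer expects it and the lazy one does not, which is presumably why the proposal ties try-catch to eager evaluation.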

None of the new syntax allows them to do anything they couldn’t do
before [XQuery 1.0 is Turing complete]. XQueryP is just trying to reduce
pain.

New use cases for XQueryP are in the browser (replacing AJAX), in
databases (to handle complex data manipulation), and, finally, in
application servers.

Frequent Criticism

Answers to common complaints:

“Programmers do not know how to program declaratively”
    SQL?!
“We don’t know how to optimize a language that is not purely
declarative”
    Not any better than half-XQuery, half-Java
“If you give users variable assignment, they’ll use and abuse it”
    Teach them not to, rewrite automatically if they still do
“XML Pipelines are the answer”
    Redundancy of concepts isn’t good for productivity or global
    optimization
“This is slower than hand-coded Java”
    Yeah, but optimization can be in the engine to help everything
“No libraries!”
    Go build some! But remember that the high level of abstraction
    makes many libraries unnecessary

Implementations

In Big OracleDB, could be in BerkeleyDB-XML (if there’s interest [ASK
ASK!]).

The Essence of Declarative, XML-based Web Applications: XForms and XSLT

Chimezie Thomas-Ogbuji (Cleveland Clinic) is talking about best
practices for generating XForms-based UIs. For the last three years,
he’s been reimplementing a data-entry environment for nurses using RDF
and XForms.

[Discussion of XForms background]

Authoring XForms is difficult because of its highly recursive structure,
verbosity, and steep learning curve. The limited programmatic
expressiveness (compared to XSLT/JavaScript) is currently being addressed in the
WG.

Problem: Editing Atom

Atom is fairly recursive, which lends itself well to a recursive
approach, and is becoming increasingly popular. The Atom Publishing
Protocol requires an “expressive submission mechanism,” which might be difficult to implement with more of a
scripting approach.

One approach of interacting with Atom in a well-abstracted way is to use
four XML vocabularies (XUL, XForms, XSLT, and XHTML). This gives some ease
of authoring and component reuse.

Low-level widgets are written in XUL, and
XSLT transforms the XUL into XHTML and XForms. Some sample XUL mappings:

 xul:grid - table
 xul:groupbox - fieldset
 xul:label - xforms:output
 xul:textbox - xforms:input | xforms:secret | xforms:textarea
 xul:radiogroup - xforms:select1
 xul:listbox/xul:listitem - xforms:select1
 xul:menulist/xul:menupopup/xul:menuitem - xforms:select1
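
[Ed: The talk did this mapping in XSLT. As a rough illustration of the same idea, here is a Python sketch (my own, not the speaker's code) that renames elements according to the table above. Two assumptions: where several XForms targets were listed for xul:textbox, the sketch arbitrarily picks xforms:input, and the nested list and menu items are left untranslated.]

```python
import xml.etree.ElementTree as ET

XUL = "http://www.mozilla.org/keymaster/gatekeeper/there.is.only.xul"
XF = "http://www.w3.org/2002/xforms"
XHTML = "http://www.w3.org/1999/xhtml"

# Prefixes used only when serializing the result.
ET.register_namespace("xforms", XF)
ET.register_namespace("html", XHTML)

# Element-level mapping taken from the list above (textbox choice is arbitrary).
XUL_TO_TARGET = {
    f"{{{XUL}}}grid": f"{{{XHTML}}}table",
    f"{{{XUL}}}groupbox": f"{{{XHTML}}}fieldset",
    f"{{{XUL}}}label": f"{{{XF}}}output",
    f"{{{XUL}}}textbox": f"{{{XF}}}input",
    f"{{{XUL}}}radiogroup": f"{{{XF}}}select1",
    f"{{{XUL}}}listbox": f"{{{XF}}}select1",
    f"{{{XUL}}}menulist": f"{{{XF}}}select1",
}


def translate(element: ET.Element) -> ET.Element:
    """Return a copy of the tree with mapped element names replaced."""
    new_tag = XUL_TO_TARGET.get(element.tag, element.tag)
    out = ET.Element(new_tag, dict(element.attrib))
    out.text, out.tail = element.text, element.tail
    for child in element:
        out.append(translate(child))
    return out


source = ET.fromstring(
    f'<groupbox xmlns="{XUL}"><label>Title</label><textbox id="t"/></groupbox>'
)
translated = translate(source)
serialized = ET.tostring(translated, encoding="unicode")
```

A real stylesheet would also have to decide between xforms:input, xforms:secret and xforms:textarea from the XUL attributes, which is where XSLT template matching earns its keep.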

Higher-level abstraction:

 ui:existential-block - a widget which renders a (labeled) placeholder
 when absent
  <xsl:template match="ui:existential-block">
    <xf:group  ref="current()[not({@node})]">
      <img src=".. add icon.." alt="Click to add {@node}"/>
    </xf:group>
    <xsl:apply-templates/>
  </xsl:template>

 ui:attribute-anchor - a simple widget for [ed: Damn.. too fast]
  <xsl:template match="ui:attribute-anchor">
    ...
  </xsl:template>

A combined example:

 <ui:existential-block node="atom:category">
   <xf:repeat nodeset="atom:category">
     <ui:attribute-anchor node=".">
       <ui:attribute label="Scheme" attrName="@scheme">
       ...
     </ui:attribute-anchor>
   </xf:repeat>
 </ui:existential-block>

[Ed: He uses oXygen]

[He now shows a sample of the above technique (with some code) on Tim Bray’s Atom feed,
rendered using X-Smiles.]

Panel: Word and OpenOffice for XML Authoring

Jon Parsons (XyEnterprise)

XML is about getting the right content to the right person in the right format at the right
time. You should be starting with a repository of re-usable objects,
because once you have that you can start assembling them and delivering
them.

When you think about the kinds of content in the world, it looks like a
pyramid. There’s a small core at the tip where there’s very high levels
of structure. Going down, there’s a middle tier of moderately structured
documents which are marked up enough to be re-used. “Everything else,”
which is most content, is unstructured. The middle tier is where XML can add the
most value.

Why do people like to author in word processors (unstructured)? “People
have an inherent need to format”. Sometimes that’s a waste, sometimes
it’s essential. Everywhere, people do not want to give that up.
Also, once people have adopted a tool they don’t want to give it up.
They’re unwilling to give up easy graphics handling, spellchecking, and
a familiar interface.

Instead of giving control of the format, the XML proponents want control
of the content [of course], because they want automated processing,
integration with other applications, content sharing, and a longer
lifespan for content.

Office 2003 showed that it was possible to use XML within Word [ed:
arguable]. 2007 promises to make it easier. But authoring alone won’t
change the larger picture. You need to locate where your company
actually lives in the pyramid and then discuss authoring in
Word or OpenOffice.org.

Even if Word or OOo don’t output tremendously structured XML, at least
it’s XML and processable using standard tools.

Mark Jacobson (Really Strategies)

What’s possible with authoring in XML in MS Word?

Before

Word 97 and 2000 had no XML support, so solutions tended to use RTF to convert to
XML and to be very style-restrictive. Where this did work
was in places that were already accustomed to generic markup.

Now

Word 2003 adds some interesting XML capabilities with XML-save and
custom schemas. You can now edit the XML in Word, but it’s clumsy and
not really suited for complex documents.

Cleaning is probably not done in Word, but there’s a fair amount of
customization in menus in Word to help users do things in a controlled
way.

The problems with custom schemas:

  • Word errors are so cryptic (see cleaning in another editor)
  • Working with mixed content is difficult
  • Tag insertion doesn’t enforce sequence
  • Nested structures are hard
  • Attributes are difficult

Future

Not enough has changed at the UI level (not easy to work with
attributes, for example). The native format, of course, is nice, and
Adobe has been doing this with InCopy.

Conclusions

Consider where in your chain it absolutely has to be XML. You need it at
the right place at the right time. Also, consider the cost/benefit of
customizing Word into a good XML editor or making a good XML editor
user-friendly.

Clyde Hatter (Propylon)

Using OOo to produce legislative documents in the Irish Parliament
since 2004. OOo is the editing environment with transformation to
canonical data model (complex DTD representing Irish legislation).
Most people (“and in that I don’t include anyone in
this room”) would rather eat a “full plate of broccoli” than edit XML.

OOo looked like it has a nice separation of content and style, and it
“is an XML editor”; it just has a fixed format (DTD) with a
zipped, fairly simple file format.

Structured documents are achievable through the disciplined use of
styles. However, there’s no out-of-the-box constraint in OOo or
validation or context-sensitivity (this is a big loss). Happily, as it’s
open-source, we can customize it to our needs.

One place it’s a good fit for is in automatically generating published
content (through generated content.xml and
styles.xml).

[Demo of their customized version of OOo for the Irish Parliament]

They’ve taken away a lot of redundant features and added some validation
features. Their “style palette” is a highly colored, constrained set of
styles which are allowed. There are a lot of styles: “You’re dealing with data models that go back
about 700 years”. Color-coding gives simple visual feedback of how a
document is styled [ed: We’ve _really_ found this in our own work]. This
structured OOo document is up-transformed into a complex XML document
[using XPipe]. Finally, they’ve added validation features to indicate
structural errors.
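
[Ed: A rough idea of what a "constrained style palette" check could look like, in Python. This is my own sketch, not Propylon's code. It assumes the OpenDocument flavour of the file format (a zip containing content.xml, paragraphs as text:p with a text:style-name attribute); the palette names are invented, and the in-memory sample is not a complete document package. Real files also tend to reference automatic styles such as P1, which this ignores.]

```python
import io
import zipfile
import xml.etree.ElementTree as ET

TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"
OFFICE_NS = "urn:oasis:names:tc:opendocument:xmlns:office:1.0"

# Hypothetical palette; the real style names were not given in the talk.
ALLOWED_STYLES = {"SectionHeading", "SectionBody"}


def disallowed_styles(odt_bytes: bytes) -> list:
    """List paragraph styles in content.xml that are outside the palette."""
    with zipfile.ZipFile(io.BytesIO(odt_bytes)) as archive:
        root = ET.fromstring(archive.read("content.xml"))
    style_attr = f"{{{TEXT_NS}}}style-name"
    found = []
    for para in root.iter(f"{{{TEXT_NS}}}p"):
        style = para.get(style_attr)
        if style not in ALLOWED_STYLES:
            found.append(style)
    return found


def make_sample() -> bytes:
    """Build a tiny zip holding only a content.xml, for demonstration."""
    content = (
        f'<office:document-content xmlns:office="{OFFICE_NS}" xmlns:text="{TEXT_NS}">'
        "<office:body><office:text>"
        '<text:p text:style-name="SectionHeading">1. Short title</text:p>'
        '<text:p text:style-name="Default">Stray paragraph</text:p>'
        "</office:text></office:body></office:document-content>"
    )
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("content.xml", content)
    return buffer.getvalue()


violations = disallowed_styles(make_sample())
```

Here the stray "Default" paragraph is the only one reported; a customized editor could color or flag it the way the demo did.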

[He customizes OOo in both Python and Java]

Panel: Agile XML Development [first half]

David Carver (S.T.A.R.)

Agile XML Schema Development

As with everything, the requirements are always changing. For data
specifications, this can cause real problems if the customers don’t yet
know what they want. One way to approach this is to do everything
manually (one time, just before release).

Rather than taking a traditional development process, go crazy XP:
Use Unit Testing, Test Driven Development (testing Elements, ComplexTypes, SimpleTypes,
Code Lists, OAGIS NDR, ATG 2 NDR), continuous integration and source
control (daily checkins with Ant builds).
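
[Ed: For flavour, here is what unit-testing a schema against naming rules might look like in Python. The two rules (element names in UpperCamelCase, complex type names ending in "Type") and the sample schema are my own illustrative assumptions, not quotes from the OAGIS or ATG 2 NDR documents, and this is not the speaker's tooling.]

```python
import re
import xml.etree.ElementTree as ET

XSD_NS = "http://www.w3.org/2001/XMLSchema"

# Made-up schema under test; orderDate deliberately breaks the naming rule.
SCHEMA = """<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="PurchaseOrder" type="PurchaseOrderType"/>
  <xs:complexType name="PurchaseOrderType">
    <xs:sequence>
      <xs:element name="orderDate" type="xs:date"/>
    </xs:sequence>
  </xs:complexType>
</xs:schema>"""

UPPER_CAMEL = re.compile(r"^[A-Z][A-Za-z0-9]*$")


def element_name_violations(schema_text: str) -> list:
    """Names of declared elements that are not UpperCamelCase."""
    root = ET.fromstring(schema_text)
    names = [e.get("name") for e in root.iter(f"{{{XSD_NS}}}element") if e.get("name")]
    return [name for name in names if not UPPER_CAMEL.match(name)]


def complex_type_violations(schema_text: str) -> list:
    """Names of complex types that do not end in 'Type'."""
    root = ET.fromstring(schema_text)
    names = [t.get("name") for t in root.iter(f"{{{XSD_NS}}}complexType") if t.get("name")]
    return [name for name in names if not name.endswith("Type")]


element_problems = element_name_violations(SCHEMA)
type_problems = complex_type_violations(SCHEMA)
```

Checks like these are cheap to run on every check-in, which fits the "automate everything" theme of the talk.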

Two-week iterations seemed to
be the sweet spot for him with a publication-ready build at the end of
each iteration (force unit tests, schema development, requirement
gathering) to help communication with the stakeholders.

Automate everything: builds, guidelines, and testing.

The new process has cut an 8-hour publishing process to 1 hour,
finds errors much earlier, and reduces “knowledge silos.”

Panel: XML Pipeline Processing

Sam Page (LiquidHub)

What is XML Pipeline Processing?

A sequence of XML processes applied to a set of input documents.
Processes: validation, transform, include, merge, aggregate, split,
load, save, rename, or extract. The model is supposed to mirror Unix
pipelines (simple tools chained together).

Why No Standard Already? It is actually a complex problem, and most existing tools are
too simple for it: shells, make, Java, C#. Standard tools would be really nice,
as they’d be XML-centric, reliable, and share improvements. They would offer a
simple domain language with XPath, conditional processing, and common usage
patterns. Cooler stuff: parallel processing and exception handling.

Pipeline uses in the field include code generation, web portal generation,
documentation, and web UI presentation layers.

The Wyeth eCTD Pipeline

Built around a special DTD for submitting drugs to the FDA (and others),
with clinical data, revisions, metadata, and a complex table of
contents. These documents require special UIs, with DHTML tree views and
search/highlight, historical submission views, and role-based security.
The usual submissions to the FDA for a new drug are delivered on 4-5 pallets from semi trucks (“sending them in an electronic format with DVDs is considerably less
postage”).

The eventual Wyeth pipeline used a multi-stage XSL transform approach,
with filter, decorate, and render stages. Because each step was
discrete, they were simpler, more maintainable, and more testable. Splitting
also meant that the initial implementation could be incremental. Later
performance problems could be solved by optimizing single
transformations.
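
[Ed: The filter/decorate/render split is easy to mimic outside XSLT. Below is a small Python sketch of the same shape, with plain functions standing in for the stylesheets. The element names, the draft rule and the numbering are invented for illustration and have nothing to do with the real eCTD DTD.]

```python
import xml.etree.ElementTree as ET
from functools import reduce


def filter_stage(doc: ET.Element) -> ET.Element:
    """Drop <section> elements marked draft='true' (hypothetical rule)."""
    for parent in list(doc.iter()):
        for child in list(parent):
            if child.tag == "section" and child.get("draft") == "true":
                parent.remove(child)
    return doc


def decorate_stage(doc: ET.Element) -> ET.Element:
    """Number the remaining sections in document order."""
    for number, section in enumerate(doc.iter("section"), start=1):
        section.set("n", str(number))
    return doc


def render_stage(doc: ET.Element) -> ET.Element:
    """Produce a minimal HTML list from the decorated sections."""
    ul = ET.Element("ul")
    for section in doc.iter("section"):
        li = ET.SubElement(ul, "li")
        li.text = f"{section.get('n')}. {section.get('title')}"
    return ul


def run_pipeline(doc: ET.Element, stages) -> ET.Element:
    # Feed the output of each stage into the next, Unix-pipe style.
    return reduce(lambda current, stage: stage(current), stages, doc)


source = ET.fromstring(
    "<submission>"
    "<section title='Summary'/>"
    "<section title='Notes' draft='true'/>"
    "<section title='Clinical Data'/>"
    "</submission>"
)
html = run_pipeline(source, [filter_stage, decorate_stage, render_stage])
items = [li.text for li in html]
```

Each stage can be called and tested on its own, and a slow stage can be rewritten without touching the others, which matches the maintainability argument above. Note that the first two stages modify the tree in place to keep the sketch short.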

Norm Walsh (Sun)

[Actually uses *nix, the first presenter I’ve noticed]

Chair of the XML Processing Model Working Group, which he claims will be
finished by Fall 2007.

You want a language that stitches together the many disparate XML
components and processes. Rather than design by committee, “the simplest
thing that will get the job done.”

Components have named, declared “ports”, and inputs and outputs bind
document streams to “ports”. New components can be declared, describing
input and output ports and the number of documents inputted.
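
[Ed: To picture the port model, here is a toy Python sketch. It is not the working group's syntax, which is XML and still in flux; the `Component` class, the port names and the two sample components are all invented. Each component declares its input and output ports, and running it means binding document streams (here just lists of strings) to those ports.]

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

Documents = List[str]  # stand-in for a stream of XML documents


@dataclass
class Component:
    name: str
    inputs: Tuple[str, ...]
    outputs: Tuple[str, ...]
    run: Callable[[Dict[str, Documents]], Dict[str, Documents]]

    def __call__(self, bound: Dict[str, Documents]) -> Dict[str, Documents]:
        # Every declared input port must have a document stream bound to it.
        missing = set(self.inputs) - set(bound)
        if missing:
            raise ValueError(f"{self.name}: unbound input ports {sorted(missing)}")
        produced = self.run({port: bound[port] for port in self.inputs})
        # A component may only write to ports it declared.
        undeclared = set(produced) - set(self.outputs)
        if undeclared:
            raise ValueError(f"{self.name}: undeclared output ports {sorted(undeclared)}")
        return produced


wrap = Component(
    name="wrap",
    inputs=("source",),
    outputs=("result",),
    run=lambda ports: {"result": [f"<wrapper>{d}</wrapper>" for d in ports["source"]]},
)
count = Component(
    name="count",
    inputs=("source",),
    outputs=("result",),
    run=lambda ports: {"result": [f"<count>{len(ports['source'])}</count>"]},
)

# Bind the output port of one component to the input port of the next.
wrapped = wrap({"source": ["<a/>", "<b/>"]})
counted = count({"source": wrapped["result"]})
```

Declaring ports up front is what lets a pipeline processor check the wiring before any document is processed.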

Language constructs in place for the first version: p:choose,
p:for-each, p:viewport, p:try (not
well-described in the current spec),
p:pipeline-library (aforementioned custom components), and a library of standard
components (under discussion).

At the moment, parameters can bind values, but only strings (no document
fragments).

Summary

Current state:

  • They’re making progress but are fighting over naming (of course).
  • Development of the standard library is just beginning.
  • Defaults and abbreviation have not yet been discussed.