March 2004 Archives

Bob DuCharme


Related link: http://lists.xml.org/archives/xml-dev/200403/threads.html#00458

Several people made good points in a recent xml-dev thread begun by Andrew Welch’s posting titled Current Status of XLink. (The link above goes to the entry for Andrew’s posting on the thread index, a web page that includes links to some replies below Andrew’s entry; a different part of the same page lists the remaining thread entries. Note that the thread list indentation doesn’t always reflect the thread structure of the discussion.) The “what’s wrong with XLink” topic is a bit of a permathread on xml-dev, and this “Current Status of XLink” thread gives the flavor of several issues that will be brought to the table if the W3C ever convenes a Working Group to update or replace XLink. My favorite quote was from XSLT 2.0 spec editor and Saxon developer Mike Kay:

In my view the mess is because XLink simply doesn’t fit into the layering of the XML architecture. The whole point of XML is that you can choose any names you like for your objects and attributes, and give them any semantics that you like (typically captured in schemas and stylesheets). So why should relationships be different from objects and attributes, and require fixed names and fixed semantics?

Hyperlinking is something that belongs in the user interface layer, not in the stored information. The stored information needs to hold relationship information in a much more abstract form. The hyperlinks, like all other user interface objects, should be generated by the stylesheet. It’s because the hyperlinking community failed to recognize this that the idea failed to catch on. The other consequence of this is that there is a gaping hole in the XML story as to how abstract relationships should be modelled.

(For a nice example of hyperlinks being generated by the stylesheet instead of being stored in the data, see my last weblog posting.) Eric van der Vlist replied to the question ending Mike’s first paragraph above by pointing out that “one may consider that a markup that would have some support for describing graphs might be more useful than one that only supports trees,” and I replied to Eric that “I don’t think that addresses Mike’s point: why does markup that describes graphs require *fixed* names and semantics?” And, because of Eric’s wish to use markup to describe graphs, I threatened to drag RDF into it, but I backed off.

I also liked Ari Krupnikov’s wish for a CSS “property that would turn an element into a link. Something that would make it possible to replicate a/href only on arbitrary elements/attributes, or even on character content.” It would be the fulfillment of what Mike was talking about: parts of your data have relationships defined as links, and then the stylesheet that prepares that data for delivery to the desktop turns those links into hyperlinks appropriate to the client program (in most cases today, a web browser) running on that desktop. It’s possible with XSLT today; like Ari, I’d love to see CSS allow this as well, although truly arbitrary conversions, such as the use of an element value to fill the href role, would be tougher with CSS than with XSLT. If CSS just allowed us to treat foo/@bar as a/@href, that would be great.
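The foo/@bar-to-a/@href conversion is easy to sketch in code. The following minimal Python example uses the standard library’s xml.etree.ElementTree instead of XSLT or CSS; `foo` and `bar` are the placeholder names from the paragraph above, and the function name `foo_bar_to_a_href` is made up for illustration. The stored data keeps its own vocabulary, and only the delivery step turns the relationship into an HTML link.

```python
import xml.etree.ElementTree as ET

def foo_bar_to_a_href(root: ET.Element) -> ET.Element:
    """Turn each foo/@bar into a/@href, in place (hypothetical helper).

    A foo element without a bar attribute is left untouched.
    """
    # Collect matches first so renaming elements can't disturb the traversal.
    for elem in list(root.iter("foo")):
        target = elem.attrib.pop("bar", None)
        if target is not None:
            elem.tag = "a"
            elem.set("href", target)
    return root

doc = ET.fromstring('<p>See <foo bar="http://example.com/x">this</foo>.</p>')
print(ET.tostring(foo_bar_to_a_href(doc), encoding="unicode"))
# <p>See <a href="http://example.com/x">this</a>.</p>
```

An XSLT template matching foo[@bar] would do the same job declaratively; either way, the link semantics live in the transformation, not in the stored markup.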

For another recent thread on what’s right and what’s wrong with XLink, see the thread beginning with Micah Dubinko’s XLink and mixed vocabulary design posting. See here and here on the threaded list page for links to later entries in the thread.

Instead of closing with the overly-discussed question (at least on xml-dev) “what do you think is wrong with XLink?” below, I decided to go with something inviting more constructive answers…

Is it worth it for the W3C to convene a Working Group to create an XLink 2.0, or maybe to christen a “profile” of 1.0 that strips out certain parts?

Michael Fitzgerald


Since arriving at the party in 2001, RELAX NG, a nifty schema language for XML, has grown from an OASIS committee spec into an international standard (ISO/IEC FDIS 19757-2:2002). Some shrug it off in favor of the more widespread XML Schema, but those who are well acquainted with RELAX NG find it hard to shrug off, least of all on technical grounds. As John Cowan once put it, “Once RELAX NG’s concepts have crossed the blood-brain barrier, you will never be able to take any other schema language seriously again.”

That’s why I am glad to see RELAX NG getting attention at the W3C. I mean, they invented XML Schema, so why would they bother with a competitor? Well, that’s because RELAX NG is hard for the astute mind to resist. Right now at the W3C, the current working draft of XHTML 2.0 is sporting a RELAX NG schema. You can also find a RELAX NG schema (in compact syntax) in the RDF/XML syntax spec, in WSDL 2.0, SVG 1.2, and unofficially for XML Signature.

I think we have advocates like Masayasu Ishikawa and others at the W3C, including Chris Lilley, Dean Jackson, and Joseph Reagle, to thank for the RELAX NG incursion into W3C. You just can’t keep a good schema language down.

Do you use RELAX NG? Let’s hear why or why not.

David A. Chappell


Why are the words “composable” and “stylesheet” both considered spelling errors in Microsoft Word? :)
Dave

Simon St. Laurent


Related link: http://tbray.org/ongoing/When/200x/2004/03/15/Photointegrity

While I’m still quite thrilled with my Canon Digital Rebel (interchangeable lenses! real manual focus! precise exposure control!), a piece Tim Bray posted yesterday reminded me of how far I have to go to regain skills I’d developed long ago on a much more manual camera in a much less forgiving environment.

After writing about how Photoshop affects the way people work with images, Tim writes:

I can see the lure of a cult of photographic puritanism and minimalism; take the bits the camera gives you and push ‘em out on the Web. Because once you’ve decided not to colour-correct and sharpen, shouldn’t you also give up on cropping? If I took that vow there’d be a lot fewer pictures here, but each would, I think, somehow mean more, because you’d know that nobody, however well-intentioned, had pissed in the pipeline from the camera to your screen.

That cult seems alive and well in the thousands of raw cell-phone pictures posted daily, in all of their weird white balance, graininess, and odd composition. It’s much like the way that Instamatic pictures processed automatically showed the world however it had been in the camera, without any opportunity for the taker of the picture to retouch it.

Professional photographers have rarely been members of that cult, however, as darkroom skills have been important as long as there have been cameras. Cropping, dodging, and burning have always been key tools on the path from film to print.

I never developed my darkroom skills too far. I’d always thought that was unfortunate, but now that I compare the results I’ve gotten from my first three months with a new SLR and my last three months of active photography with my old and utterly manual Pentax K1000 SLR, I see that perhaps it was an advantage.

I shot slides in college because the overall costs were lower, paying the premium when I needed the occasional print. After a few years of doing that, even though I wasn’t taking pictures all the time, my compositional skills improved dramatically. The disconnect between the time I took the picture and the immutable results I would get forced me to pay careful attention to framing and exposure. The cost of film (and sometimes the hassles of changing rolls or simply running out) kept me from shooting multiples in the hope that one might work out.

The pictures I’m taking now, even when I’m shooting similar subjects in similar conditions, just aren’t as good. I can feel ten years’ worth of rust that needs removal, but I also feel myself resisting the kind of discipline I used to have. When I can go from original to good enough with a few minutes in Photoshop, it’s tough to convince myself to put in the extra effort when I’m taking the shots.

Maybe writing this will shame me into paying that kind of attention again; otherwise I guess I’ll just have to buy a film body and shoot slides for a while.

Ever find that working with less makes you perform better? (I know assembler has that effect on people.)

Simon St. Laurent


Related link: http://www.w3.org/MarkUp/2004/02/xhtml-rdf.html

There’s long been a disconnect between the regular (X)HTML Web and the RDF Semantic Web, one frequently bridged with duct-tape-like solutions. Unfortunately, rather than improve on that duct tape with a clean joint, a recent proposal suggests replacing the duct tape with a bit of bubblegum and wrapping the whole thing in baling wire.

In the early days of HTML, the META tag seemed like a pretty cool thing, a way to put information into the HEAD of an HTML document that applications could interpret - or not interpret - as they liked. It was easily extensible in those cheerful days before namespaces, even if much of its use went to architecturally suspect tricks like blending HTTP headers into HTML documents.

As people have come up with new things to do with the Web, META (now typically lower-cased as meta) has continued to be popular. Looking at website source code, you may find things like:

<meta name="ICBM" content="42.46558,-76.41397" />

for use with GeoURL, or:

<meta name="dc.creator" content="Simon St.Laurent" />

The former relies, in classical HTML style, on the expectation that it’s unlikely someone else will create a property named “ICBM” and use it for something other than geographic coordinates. The latter reflects a more paranoid time, where Dublin Core metadata is typically prefixed with dc or DC to avoid name collisions.

This prefixing is accomplished here with a duct-tape solution: a convention that roughly fixes the problem for a large class of processors and doesn’t require much hunting around in the document. You can implement tools which hunt for this duct tape without even using XML tools - regular expressions will do just fine.
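To make the duct tape concrete, here is a minimal Python sketch of a regular-expression scan for the dc./DC. convention. The pattern and the function name `dublin_core_metadata` are illustrative assumptions, and the pattern only handles double-quoted name and content attributes in that order, which is exactly the kind of rough fit described above.

```python
import re

# Rough "duct tape" matcher, not an HTML parser: expects double-quoted
# name and content attributes, in that order.
META_RE = re.compile(
    r'<meta\s+name="([^"]*)"\s+content="([^"]*)"\s*/?>', re.IGNORECASE
)

def dublin_core_metadata(html: str) -> dict[str, str]:
    """Collect meta properties whose names use the dc. / DC. prefix convention."""
    found = {}
    for name, content in META_RE.findall(html):
        prefix, dot, local = name.partition(".")
        if dot and prefix.lower() == "dc":
            found[local] = content
    return found

page = '''<head>
<meta name="ICBM" content="42.46558,-76.41397" />
<meta name="dc.creator" content="Simon St.Laurent" />
</head>'''
print(dublin_core_metadata(page))  # {'creator': 'Simon St.Laurent'}
```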

Unfortunately, this convention for Dublin Core doesn’t meet the expectations of URI- and QName-obsessed RDF triple processing. As Birbeck points out in the Note, the triples don’t work because the markup doesn’t carry quite the set of information RDF expects.

Birbeck’s solution is to replace the duct tape holding together the prefix and the name with a colon - the bubblegum - and then wrap the whole thing in baling wire, since you now need to find your namespace declarations in the surrounding context.

Why do I describe this as mere baling wire and not a more robust form of adhesive? It’s because the solution proposed here relies on a use of QNames that requires lots of extra work on the part of the implementer. The information needed to process the meta tag successfully now includes a namespace declaration, likely elsewhere in the document, but that namespace information won’t flow naturally to your processor, even if the XHTML is well-formed XML. This meta element, for instance:

<meta name="dc:creator" content="Simon St.Laurent" />

will still be reported to an application by a garden-variety XML processor as a meta element containing an attribute named “name” whose value is “dc:creator” and an attribute named “content” whose value is “Simon St.Laurent”.
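You can see this with any namespace-aware parser. In this sketch using Python’s xml.etree.ElementTree, the dc prefix is declared on the root element, yet the meta element’s name attribute comes back as the plain string “dc:creator”, with nothing connecting it to the Dublin Core URI. ElementTree doesn’t even keep the prefix declarations on the parsed tree.

```python
import xml.etree.ElementTree as ET

XHTML = "{http://www.w3.org/1999/xhtml}"

doc = ET.fromstring(
    '<html xmlns="http://www.w3.org/1999/xhtml"'
    ' xmlns:dc="http://purl.org/dc/elements/1.1/">'
    '<head><meta name="dc:creator" content="Simon St.Laurent" /></head>'
    '</html>'
)
meta = doc.find(f"{XHTML}head/{XHTML}meta")
# The name attribute is just a string; nothing ties "dc" to the Dublin Core URI.
print(meta.attrib)  # {'name': 'dc:creator', 'content': 'Simon St.Laurent'}
```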

There is no magic processing of the name attribute into a QName whose namespace URI is “http://purl.org/dc/elements/1.1/” and whose local name is “creator”, which is the information you need to actually make the triple. XML processors do this work for element and attribute names, but not for attribute values or content.

Even if a schema identifies the name attribute as being of type xs:QName, only schema-validating processors producing a post-schema-validation infoset (PSVI) will provide that, and there aren’t a whole lot of those in the world. The PSVI and similar approaches aren’t particularly renowned for their efficiency in any case, and it’s especially hard to justify using that heavy a style of processing on what was until recently pretty ordinary and easy HTML, even if it was cleaned up to XHTML.

If a schema isn’t available, the lucky implementer gets to keep track of which namespace prefixes are in scope at a given point in the document and break down the QName manually. If developers are using an environment that didn’t value prefixes enough to keep them around, they’re out of luck. If the user didn’t bother to declare the namespace, the document is still well-formed XML, but the developer can either guess what it was supposed to be - the equivalent of the earlier duct-tape solution with the dot - or just give up.
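Here is roughly what that bookkeeping involves, as a Python sketch built on the standard library’s ElementTree.iterparse, which reports namespace declarations as start-ns events just before the start tag that carries them. The function name `resolve_meta_qnames` is hypothetical, and a real implementation would also need a policy for unprefixed names, which this sketch simply skips along with undeclared prefixes.

```python
import io
import xml.etree.ElementTree as ET

XML_NS = "http://www.w3.org/XML/1998/namespace"  # the xml prefix is always bound

def resolve_meta_qnames(xml_text: str) -> list[tuple[str, str, str]]:
    """Return (namespace URI, local name, content) for each prefixed meta/@name.

    ElementTree drops prefix declarations from the finished tree, so this
    tracks the in-scope prefixes by hand from iterparse's start-ns events.
    """
    scopes = [{"xml": XML_NS}]  # stack of prefix -> URI maps, one per open element
    pending = {}                # declarations reported just before the next start tag
    results = []
    source = io.BytesIO(xml_text.encode("utf-8"))
    for event, item in ET.iterparse(source, events=("start-ns", "start", "end")):
        if event == "start-ns":
            prefix, uri = item
            pending[prefix] = uri
        elif event == "start":
            scopes.append({**scopes[-1], **pending})
            pending = {}
            local_tag = item.tag.rsplit("}", 1)[-1]
            name = item.get("name", "")
            if local_tag == "meta" and ":" in name:
                prefix, local = name.split(":", 1)
                uri = scopes[-1].get(prefix)
                if uri is not None:  # undeclared prefix: nothing to resolve
                    results.append((uri, local, item.get("content", "")))
        else:  # "end": this element's declarations go out of scope
            scopes.pop()
    return results

page = ('<html xmlns="http://www.w3.org/1999/xhtml"'
        ' xmlns:dc="http://purl.org/dc/elements/1.1/">'
        '<head><meta name="dc:creator" content="Simon St.Laurent" />'
        '<meta name="foo:bar" content="undeclared prefix" /></head></html>')
print(resolve_meta_qnames(page))
# [('http://purl.org/dc/elements/1.1/', 'creator', 'Simon St.Laurent')]
```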

There is a simple solution that cleans up the joint and builds a stronger structure on which RDF and other developers can build, however. It does mean discarding the HTML META element’s extensibility through a name attribute, and turning to the very technology that makes this particular use of META so painful: namespaces.

Instead of fiddling with this:

<meta name="dc:creator" content="Simon St.Laurent" />

use this:

<meta dc:creator="Simon St.Laurent" />

This will be consistently reported as a meta element with an attribute whose URI is “http://purl.org/dc/elements/1.1/”, whose local name is “creator”, and whose value is “Simon St.Laurent”. All the information needed to create the triple is available, without the need to use tools any more complicated than a namespace-aware XML processor, which is most of them.
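A minimal Python sketch shows how little work this leaves: ElementTree reports the attribute under the key {http://purl.org/dc/elements/1.1/}creator, so splitting that key and concatenating URI and local name yields the predicate directly. The subject URI http://example.org/this-page and the helper names are placeholders chosen for illustration; what the real subject should be depends on whatever RDF mapping you adopt.

```python
import xml.etree.ElementTree as ET

DC = "http://purl.org/dc/elements/1.1/"

def split_clark(key: str) -> tuple[str, str]:
    """Split ElementTree's '{uri}local' key into (uri, local); no URI gives ''."""
    if key.startswith("{"):
        uri, local = key[1:].split("}", 1)
        return uri, local
    return "", key

def meta_attribute_triples(meta: ET.Element, subject: str) -> list[tuple[str, str, str]]:
    """One (subject, predicate, value) triple per namespace-qualified attribute."""
    triples = []
    for key, value in meta.attrib.items():
        uri, local = split_clark(key)
        if uri:  # plain attributes such as name or content carry no predicate URI
            triples.append((subject, uri + local, value))
    return triples

meta = ET.fromstring(f'<meta xmlns:dc="{DC}" dc:creator="Simon St.Laurent" />')
print(meta_attribute_triples(meta, "http://example.org/this-page"))
# [('http://example.org/this-page', 'http://purl.org/dc/elements/1.1/creator', 'Simon St.Laurent')]
```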

You can even go from there to:

<meta><dc:creator>Simon St.Laurent</dc:creator></meta>

Once you start treating meta as a first-class container, you can make much more sophisticated statements, using all of RDF/XML if you really want. It becomes a clean joint between HTML and RDF using core XML mechanisms, capable of supporting far more weight.
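The element form is just as easy to read with ordinary namespace-aware tools. This Python sketch, again with a placeholder subject URI and a hypothetical helper name, turns each namespace-qualified child of meta into a triple; it only handles simple text values, not the nested resources that full RDF/XML allows.

```python
import xml.etree.ElementTree as ET

def meta_element_triples(meta: ET.Element, subject: str) -> list[tuple[str, str, str]]:
    """One (subject, predicate, text) triple per namespace-qualified child of meta.

    Handles only simple literal-valued children, not nested RDF/XML resources.
    """
    triples = []
    for child in meta:
        if child.tag.startswith("{"):
            uri, local = child.tag[1:].split("}", 1)
            triples.append((subject, uri + local, (child.text or "").strip()))
    return triples

meta = ET.fromstring(
    '<meta xmlns:dc="http://purl.org/dc/elements/1.1/">'
    '<dc:creator>Simon St.Laurent</dc:creator></meta>'
)
print(meta_element_triples(meta, "http://example.org/this-page"))
# [('http://example.org/this-page', 'http://purl.org/dc/elements/1.1/creator', 'Simon St.Laurent')]
```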

The mechanisms for including metadata in the body that are described in the rest of the Note are yet another tangled mess of chewing gum and baling wire that needs repair, but as this blog entry is already far too long, I’ll leave that as an exercise for the reader.

Note: If you’re curious about the gum and baling wire metaphor, see this explanation.

Ever notice that using markup syntax as designed is a lot easier than extending syntax in ways that aren’t magically supported?

Simon St. Laurent


Related link: http://www.w3.org/TR/2004/WD-xmlschema-ref-20040309/

Although I have been a harsh critic of both the widely used W3C XML Schema and the far less common XPointer family of specifications, it was not until this week that I was granted the privilege of seeing the two of them united.

W3C XML Schema has for years been an unexpected guest at a ball, luring away partygoers with its many delightful promises, stealing their hearts and making it impossible for them to escape.

XPointer, though a distant relation in the W3C family, has spent years sulking in a corner, fending off suitors with threats of patents, with long, drawn-out engagements that come to naught, with unexpected complexities surfacing in what began as simple conversations.

But now! To see them dancing together, a sight hardly imaginable before. Such a strange pair, a couple united perhaps by chance and perhaps by destiny.

After all, who can resist the delicate rhythms to which they dance:

#xmlns(ipo=http://www.example.com/IPO) xscd(/complexType(ipo:Items)/sequence()/item/complexType()/@partNum)

#xmlns(ipo=http://www.example.com/IPO) xscd(/simpleType(ipo:SKU))

#xmlns(ipo=http://www.example.com/IPO) xscd(/simpleType(ipo:SKU)/pattern())

#xmlns(r=http://www.example.com/Report) xscd(/r:purchaseReport)

#xmlns(r=http://www.example.com/Report) xscd(/r:purchaseReport/complexType()/sequence()/r:regions)

#xmlns(r=http://www.example.com/Report) xscd(/r:purchaseReport/complexType()/sequence()/r:regions/identityConstraint(r:dummy2))
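Dizziness aside, these pointers follow the XPointer framework’s scheme(data) pattern, with xmlns() parts binding prefixes for the xscd() part that follows. This Python sketch (hypothetical function names; it assumes well-formed input rather than reporting errors) splits a pointer into its parts, honoring nested parentheses and ^ escapes, and collects the xmlns() bindings. It does not attempt to evaluate the schema component designator itself.

```python
def pointer_parts(pointer: str) -> list[tuple[str, str]]:
    """Split an XPointer-framework pointer into (scheme, data) pairs.

    Assumes well-formed input: scheme data may contain balanced
    parentheses, and '^' escapes '(', ')' and '^' itself.
    """
    text = pointer.lstrip("#")
    parts = []
    i = 0
    while i < len(text):
        if text[i].isspace():          # whitespace may separate pointer parts
            i += 1
            continue
        open_paren = text.index("(", i)
        scheme = text[i:open_paren]
        depth, j, data = 1, open_paren + 1, []
        while True:
            ch = text[j]
            if ch == "^":              # escaped character: keep the next one literally
                data.append(text[j + 1])
                j += 2
                continue
            if ch == "(":
                depth += 1
            elif ch == ")":
                depth -= 1
                if depth == 0:         # closing paren of this pointer part
                    break
            data.append(ch)
            j += 1
        parts.append((scheme, "".join(data)))
        i = j + 1
    return parts

def namespace_bindings(parts: list[tuple[str, str]]) -> dict[str, str]:
    """Collect prefix=URI bindings from the xmlns() parts."""
    return dict(data.split("=", 1) for scheme, data in parts if scheme == "xmlns")

ptr = ("#xmlns(ipo=http://www.example.com/IPO) "
       "xscd(/complexType(ipo:Items)/sequence()/item/complexType()/@partNum)")
parts = pointer_parts(ptr)
print(parts)
print(namespace_bindings(parts))  # {'ipo': 'http://www.example.com/IPO'}
```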

It must be true love. There’s no other polite way to explain the dizziness this produces.

It’s quite amazing to see these two come together. I feel it has borne out my most strongly-held convictions about them.

Ever have to say nice things about a partnership that brings out the worst of both sides?

Bob DuCharme


Related link: http://thomas.loc.gov/home/xml_help.html

The Library of Congress’ Thomas web site (named for a former resident of the town where I live) is now making some new legislation available in XML. The XML points to XSLT stylesheets that format it for viewing, so that if you go right to http://thomas.loc.gov/home/gpoxmlc108/h3701_ih.xml with a browser you’ll see centering, bolding, and even links. (Do a View Source to see the markup.) Not all the bills available on their web site are available in XML as well, but I found one directory with links to over 200 XML documents. Their XML Display: Help page has a bit more background, and http://thomas.loc.gov/dtd/ has links to their DTDs.

The XML document mentioned above includes a working link, and I was very pleased when View Source showed me that it wasn’t an HTML a/@href one. The comments in the DTD that the document references describe an interesting evolution of its linking architecture: there was an attempt, later abandoned, to keep it in line with XLink; I was tickled to see the phrase “architectural form” come up in one comment. Ultimately, they modeled the links around the relationships between their particular document types instead of trying to shoehorn these relationships into some wider linking standard, and then the XSLT stylesheet that prepares it for web delivery turns the links into a/@href links. The following shows the attribute list declaration for one of the DTD’s linking elements, external-xref:

<!ATTLIST  external-xref legal-doc (usc | public-law | statute-at-large |
                                    bill | act | executive-order |
                                    regulation | senate-rule | treaty-ust |
                                    treaty-tias | usc-appendix | usc-act |
                                    usc-chapter | usc-subtitle)  #IMPLIED
                          parsable-cite      CDATA       #IMPLIED>

People who conflate linking and hypertext forget that the former is about relationships between data and the latter is about the presentation of those relationships. The markup community learned long ago that separation of content structure from content presentation is a Good Thing—this realization was actually a key driver for the growth of this community. The notion that content relationships and the user interface to express those relationships are also distinct (or rather, that keeping them distinct can offer the same advantages as keeping content structure and content presentation separate) is not quite as widespread, so I was happy to see a great example used in a publicly accessible tax-dollars-at-work project. The links describe the relationships in terms of the document types themselves, not in terms of the UI for expressing those relationships. It says that a House resolution has an external reference to a legal document, which may be part of the United States Code, a public law, a statute, etc.; a stylesheet then converts this relationship to more broadly-used markup necessary to display it as hypertext on a web browser. If another delivery medium uses different markup to describe hypertext links, another stylesheet can convert the same House Resolution XML to the appropriate markup for the other medium. It’s a great model.
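Here is that model sketched in Python: one hypothetical external-xref element, two delivery “stylesheets” written as functions. The URL scheme, the parsable-cite value, and all function names are invented for illustration; the real Thomas XSLT stylesheets and link targets aren’t reproduced here. Only the enumerated legal-doc values come from the DTD above.

```python
import xml.etree.ElementTree as ET
from urllib.parse import quote

# The enumerated legal-doc values from the external-xref ATTLIST.
LEGAL_DOC_VALUES = {
    "usc", "public-law", "statute-at-large", "bill", "act",
    "executive-order", "regulation", "senate-rule", "treaty-ust",
    "treaty-tias", "usc-appendix", "usc-act", "usc-chapter", "usc-subtitle",
}

def hypothetical_url(legal_doc: str, cite: str) -> str:
    """Invented URL scheme; the real stylesheets build their own targets."""
    return f"http://example.org/{legal_doc}?cite={quote(cite)}"

def check_legal_doc(xref: ET.Element) -> str:
    legal_doc = xref.get("legal-doc")
    # legal-doc is #IMPLIED in the DTD, so a real stylesheet would need a
    # fallback for a missing value; this sketch simply rejects it.
    if legal_doc not in LEGAL_DOC_VALUES:
        raise ValueError(f"unexpected legal-doc value: {legal_doc!r}")
    return legal_doc

def to_html_link(xref: ET.Element) -> ET.Element:
    """Delivery medium 1: the relationship rendered as an XHTML a/@href."""
    legal_doc = check_legal_doc(xref)
    a = ET.Element("a", href=hypothetical_url(legal_doc, xref.get("parsable-cite", "")))
    a.text = xref.text
    return a

def to_plain_text(xref: ET.Element) -> str:
    """Delivery medium 2: the same relationship as a bracketed citation."""
    legal_doc = check_legal_doc(xref)
    return f"{xref.text} [{legal_doc}: {xref.get('parsable-cite', '')}]"

xref = ET.fromstring(
    '<external-xref legal-doc="usc" parsable-cite="hypothetical-cite">'
    'the cited section</external-xref>'
)
print(ET.tostring(to_html_link(xref), encoding="unicode"))
print(to_plain_text(xref))
```

Adding a third delivery medium means adding a third function; the legislative markup itself never changes.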

Should they have done it differently?