Related link: http://www.w3.org/MarkUp/2004/02/xhtml-rdf.html
There’s long been a disconnect between the regular (X)HTML Web and the RDF Semantic Web, one frequently bridged with duct-tape-like solutions. Unfortunately, rather than improve on that duct tape with a clean joint, a recent proposal suggests replacing it with a bit of bubblegum and wrapping the whole thing in baling wire.
In the early days of HTML, the META tag seemed like a pretty cool thing, a way to put information into the HEAD of an HTML document that applications could interpret - or not interpret - as they liked. It was easily extensible in those cheerful days before namespaces, even if much of its use went to things like the architecturally suspect blending of HTTP headers into HTML documents.
As people have come up with new things to do with the Web, META (now typically lowercased as meta) has stayed popular. In website source code, you may find things like:
<meta name="ICBM" content="42.46558,-76.41397" />
for use with GeoURL, or:
<meta name="dc.creator" content="Simon St.Laurent" />
The former relies, in classical HTML style, on the expectation that it’s unlikely someone else will create a property named “ICBM” and use it for something other than geographic coordinates. The latter reflects a more paranoid time, in which Dublin Core metadata is typically prefixed with dc or DC to avoid collisions with identically named properties from other vocabularies.
This prefixing is accomplished here with a duct-tape solution, one that roughly fixes the problem for a large class of processors and conveniently doesn’t require much hunting around in the document. You can implement tools that hunt for this duct tape without even using XML tools - regular expressions will do just fine.
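To make the duct tape concrete, here is a minimal Python sketch of that regex approach. The page fragment and the `dublin_core` function are hypothetical, and the pattern assumes double-quoted attributes with `name` appearing before `content`, exactly the kind of assumption duct tape gets away with.

```python
import re

# Hypothetical page fragment using the dotted Dublin Core convention.
PAGE = """<html><head>
<meta name="ICBM" content="42.46558,-76.41397" />
<meta name="dc.creator" content="Simon St.Laurent" />
</head><body></body></html>"""

# Match meta tags whose name starts with "dc." (any case) and capture the
# property name and the content value. No XML parsing involved.
META_DC = re.compile(r'<meta\s+name="dc\.(\w+)"\s+content="([^"]*)"', re.IGNORECASE)


def dublin_core(html: str) -> dict:
    """Return {property: value} for dotted Dublin Core meta tags."""
    return {prop.lower(): value for prop, value in META_DC.findall(html)}


print(dublin_core(PAGE))  # {'creator': 'Simon St.Laurent'}
```

The ICBM tag is skipped simply because its name lacks the dc. prefix; nothing in the pattern knows what Dublin Core actually is.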
Unfortunately, this convention for Dublin Core doesn’t meet the expectations of URI- and QName-obsessed RDF triple processing. As Birbeck points out, the triples don’t work because there isn’t quite the set of information RDF expects; a dotted name like dc.creator carries no namespace URI to pair with the local name.
Birbeck’s solution is to replace the duct tape holding together the prefix and the name with a colon - the bubblegum - and then to wrap the whole thing in baling wire, since you now need to find your namespace declarations in the surrounding context.
Why do I describe this as mere baling wire and not a more robust form of adhesive? Because the solution proposed here puts QNames in attribute values, a use of QNames that requires lots of extra work on the part of the implementer. The information needed to process the meta tag successfully now lives in a namespace declaration, likely elsewhere in the document, but that namespace information won’t flow naturally to your processor, even if the XHTML is well-formed XML. This meta element, for instance:
<meta name="dc:creator" content="Simon St.Laurent" />
will still be reported to an application by a garden-variety XML processor as a meta element containing an attribute named “name” whose value is “dc:creator” and an attribute named “content” whose value is “Simon St.Laurent”.
There is no magic processing of the name attribute into a QName whose URI is “http://purl.org/dc/elements/1.1/” and whose local name is “creator”, which is the information you need to actually make the triple. XML processors do this work for element and attribute names, but not for attribute values or element content.
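A quick way to see this is to hand the markup to Python’s namespace-aware xml.etree.ElementTree parser. The sample document, including its xmlns:dc declaration, is an assumption for illustration. The element name comes back resolved, while the name value comes back as the plain string dc:creator, and the dc binding itself is consumed by the parser rather than exposed as an attribute.

```python
import xml.etree.ElementTree as ET

# Hypothetical XHTML page using the colon-prefixed name from the proposal.
DOC = """<html xmlns="http://www.w3.org/1999/xhtml"
      xmlns:dc="http://purl.org/dc/elements/1.1/">
  <head>
    <meta name="dc:creator" content="Simon St.Laurent" />
  </head>
</html>"""

XHTML = "{http://www.w3.org/1999/xhtml}"  # ElementTree's {uri}local form

root = ET.fromstring(DOC)
meta = root.find(f"{XHTML}head/{XHTML}meta")

# Element names are namespace-resolved...
print(meta.tag)     # {http://www.w3.org/1999/xhtml}meta
# ...but the QName inside the attribute value is just a string.
print(meta.attrib)  # {'name': 'dc:creator', 'content': 'Simon St.Laurent'}
# The xmlns:dc declaration was consumed, not kept as an attribute.
print(root.attrib)  # {}
```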
Even if a schema identifies the name attribute as being of type xs:QName, only schema-aware processors that produce a post-schema-validation infoset (PSVI) will provide that resolution, and there aren’t a whole lot of those in the world. The PSVI and similar approaches aren’t particularly renowned for their efficiency in any case, and it’s especially hard to justify that heavy a style of processing on what was until recently pretty ordinary and easy HTML, even if it has since been cleaned up into XHTML.
If a schema isn’t available, the lucky implementer gets to keep track of which namespace prefixes are in scope at a given point in the document and break down the QName manually. If developers are working in an environment that doesn’t value prefixes enough to keep them around, they’re out of luck. If the author didn’t bother to declare the namespace, the document is still well-formed XML, but the developer can either guess what the prefix was supposed to mean - the equivalent of the earlier duct-tape solution with the dot - or just give up.
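Here’s roughly what that bookkeeping looks like, sketched with ElementTree’s iterparse and its start-ns/end-ns events. The meta_triples and resolve_qname helpers and the subject URI are hypothetical; this sketch takes the “give up” branch, so an undeclared prefix simply yields no triple.

```python
import io
import xml.etree.ElementTree as ET

XHTML_META = "{http://www.w3.org/1999/xhtml}meta"
SUBJECT = "http://example.org/page"  # hypothetical subject for the triples

DC_DOC = """<html xmlns="http://www.w3.org/1999/xhtml"
      xmlns:dc="http://purl.org/dc/elements/1.1/">
  <head><meta name="dc:creator" content="Simon St.Laurent" /></head>
</html>"""


def resolve_qname(qname, scope):
    """Split 'prefix:local' and look the prefix up in the in-scope bindings.
    Returns (uri, local), or None if there is no prefix or it is undeclared."""
    prefix, sep, local = qname.partition(":")
    if not sep:
        return None
    for bound_prefix, uri in reversed(scope):  # innermost declaration wins
        if bound_prefix == prefix:
            return uri, local
    return None


def meta_triples(xml_text, subject):
    scope = []  # stack of (prefix, uri) declarations currently in scope
    triples = []
    source = io.BytesIO(xml_text.encode("utf-8"))
    for event, item in ET.iterparse(source, events=("start-ns", "end-ns", "start")):
        if event == "start-ns":
            scope.append(item)      # item is (prefix, uri)
        elif event == "end-ns":
            scope.pop()             # declaration goes out of scope
        elif item.tag == XHTML_META:
            resolved = resolve_qname(item.get("name", ""), scope)
            if resolved is not None:
                uri, local = resolved
                triples.append((subject, uri + local, item.get("content")))
    return triples
```

That is a fair amount of machinery just to recover what the parser already knows about element names.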
There is, however, a simple solution that cleans up the joint and gives RDF and other developers a stronger structure to build on. It does mean discarding the HTML META element’s extensibility through a name attribute, and turning to the very technology that makes this particular use of META so painful: namespaces.
Instead of fiddling with this:
<meta name="dc:creator" content="Simon St.Laurent" />
use this:
<meta dc:creator="Simon St.Laurent" />
Provided the dc prefix is declared in scope, this will be consistently reported as a meta element with an attribute whose URI is “http://purl.org/dc/elements/1.1/”, whose local name is “creator”, and whose value is “Simon St.Laurent”. All the information needed to create the triple is available, without the need for tools any more complicated than a namespace-aware XML processor, and most XML processors are namespace-aware.
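For comparison, a sketch of the namespaced-attribute version. ElementTree reports the attribute key as {http://purl.org/dc/elements/1.1/}creator (Clark notation), so building the predicate is just string concatenation, following the RDF/XML convention of namespace URI plus local name. The helper names and the subject URI are hypothetical.

```python
import xml.etree.ElementTree as ET

SUBJECT = "http://example.org/page"  # hypothetical subject for the triples

DOC = """<html xmlns="http://www.w3.org/1999/xhtml"
      xmlns:dc="http://purl.org/dc/elements/1.1/">
  <head><meta dc:creator="Simon St.Laurent" /></head>
</html>"""


def split_clark(name):
    """Split ElementTree's '{uri}local' into (uri, local); (None, name) if unqualified."""
    if name.startswith("{"):
        uri, local = name[1:].split("}", 1)
        return uri, local
    return None, name


def meta_attribute_triples(xml_text, subject):
    root = ET.fromstring(xml_text)
    triples = []
    for meta in root.iter("{http://www.w3.org/1999/xhtml}meta"):
        for name, value in meta.attrib.items():
            uri, local = split_clark(name)
            if uri is not None:  # only namespaced attributes carry a property
                triples.append((subject, uri + local, value))
    return triples
```

No prefix tracking appears anywhere: the parser has already done that work.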
You can even go from there to:
Once you start treating meta as a first-class container, you can make much more sophisticated statements, using all of RDF/XML if you really want. It becomes a clean joint between HTML and RDF using core XML mechanisms, capable of supporting far more weight.
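As a sketch, assume a hypothetical form in which meta holds namespaced child elements such as dc:creator (not valid XHTML 1.0, where meta is declared EMPTY). The same namespace-aware processing then carries over from attributes to elements. The markup, function name, and subject URI are all illustrative assumptions.

```python
import xml.etree.ElementTree as ET

SUBJECT = "http://example.org/page"  # hypothetical subject for the triples
XHTML_NS = "http://www.w3.org/1999/xhtml"

# Hypothetical markup: meta used as a container for namespaced children.
DOC = """<html xmlns="http://www.w3.org/1999/xhtml"
      xmlns:dc="http://purl.org/dc/elements/1.1/">
  <head>
    <meta>
      <dc:creator>Simon St.Laurent</dc:creator>
      <dc:title>Example page</dc:title>
    </meta>
  </head>
</html>"""


def container_triples(xml_text, subject):
    root = ET.fromstring(xml_text)
    triples = []
    for meta in root.iter(f"{{{XHTML_NS}}}meta"):
        for child in meta:
            if not child.tag.startswith("{"):
                continue  # no namespace, so no property URI
            uri, local = child.tag[1:].split("}", 1)
            if uri == XHTML_NS:
                continue  # skip ordinary XHTML content
            triples.append((subject, uri + local, (child.text or "").strip()))
    return triples
```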
The mechanisms for including metadata in the body that are described in the rest of the Note are yet another tangled mess of chewing gum and baling wire that needs repair, but as this blog entry is already far too long, I’ll leave that as an exercise for the reader.
Note: If you’re curious about the gum and baling wire metaphor, see
Ever notice that using markup syntax as designed is a lot easier than extending syntax in ways that aren’t magically supported?