REST offers a great way to build simple applications that Create, Read, Update, and Delete resources. But what if you want to get at part of a resource?

I’m having a bit too much fun working with Rails 2.0’s RESTful approach. First, I enjoyed the way it lets applications spew XML without spending forever pondering schemas and agreements. Now, I’m starting to wish for a counterpart to ActiveRecord that works with XML documents instead of relational databases.

I’m not yet nearly enough of a Ruby programmer to build that, but it did get me thinking about some old technology that solves a problem ActiveRecord never has to address.

A database row is a simple thing, even if it enables immense complexity. It contains named fields - no more than one to a given name, usually conforming to a rather predictable schema. There’s nothing floating between the fields, and every row contains the same set of fields. (They may be empty, of course, but they’re clearly defined.)

An XML document - or even an XML document fragment - is potentially incredibly complicated. While many XML documents are regular and relatively simple, the ones that aren’t simply holding data as it moves between databases are often very complicated. XML elements are kind of like fields, sure, but:

  • There might be multiple elements with the same name (and rather different content structure);

  • There might be text (not just whitespace either) between the elements;

  • There’s all kinds of metadata in the attributes on those elements;

  • And most techniques for addressing parts of XML documents have at least the possibility of selecting more than one piece in the same document!

Nonetheless, it seems like the basic operations most people would like to perform on these documents (and other loosely-structured resources) are the same operations people want to perform on database records: Create, Read, Update, Delete. CRUD is everywhere, and CRUD is good.

Typically, though, an XML document is treated as a single resource. A book might assemble itself using entities or XIncludes that pull in chapters, of course, and those chapters could be individually addressed as resources, but that has limits. Though it’s possible, I don’t think anyone wants to write paragraphs in which each sentence is included from a separate file using one of those mechanisms. As soon as you hit mixed content, the entity approach breaks down anyway. (Other formats, like JSON, don’t have entities but share a few of the same problems.)

So how can developers build RESTful applications that address parts of documents?

One approach that has drawn a lot of discussion over the last few days is to add a new verb, PATCH, to HTTP. As soon as I started reading about it, the hairs on the back of my neck stood up, and red flashing lights and sirens went off in my head. Visions of infinite diff formats danced before my eyes, triggering an avalanche of memories from the too-many conference sessions I’ve attended on XML diff techniques.

It seems to me that the problem is not that developers want to do something that can’t be expressed with an existing RESTful verb - in this case, probably PUT, the Update in CRUD. The problem is that developers can’t address the resource on which they want to work with sufficient granularity given their current set of tools and agreements.

Though I’ve inveighed against the many, many sins of XPointer for years, that incredibly broken process was at least working to solve the problem of addressing XML documents at a very fine granularity, extending the tool most commonly used on the client side for this: fragment identifiers.

There are some glitchy things about fragment identifiers that are great kindling for a flame war. Clients are the only tools that process them, and the server never actually sees them. Perhaps most complicating of all, every MIME type is entitled to its own flavor of fragment identifier. XML can use different syntax (and everything) from HTML, JSON, JPEG, PNG, etc. Combine that with content-negotiation, where a server picks what representation of a document to send a client, and it’s a recipe for endless spin and fruitless arguments.

The only reason that fragment identifiers work is that we typically only use them when we have some certainty what’s going to be coming down the pipe for a given request. For the kinds of situations I’m thinking about, that’s mostly okay - or at least when it breaks, it’ll probably be clear what happened.

So, since fragment identifiers don’t normally get sent to the server, how can we use them to get fragments and only fragments from the server?

The key is a shift that was mentioned in the earliest drafts for XLink (from which XPointer was eventually separated). Those drafts offered three different ways to identify a fragment within a URL:

If the XPointer is provided, the designated resource is a “sub-resource” of the containing resource; otherwise the designated resource is the containing resource.

  • If the connector is “#”, this signals an intent that the containing resource is to be fetched as a whole from the host that provides it, and that the XPointer processing to extract the sub-resource is to be performed on the client, that is to say on the same system where the linking element is recognized and processed.
  • If the connector is “?XML-XPTR=”, this signals an intent that the entire locator is to be transmitted to the host providing the resource, and that the host should perform the XPointer processing to extract the sub-resource, and that only the sub-resource should be transmitted to the client.
  • If the connector is “|”, no intent is signaled as to what processing model is to be used to go about accessing the designated resource.

The notion of “subresources” didn’t go over very well, though I think it’s needed. The first of these bullets is the classic client-only fragment identifier approach. The second shifts the fragment identifier into the query string, where the server has access to it. (I believe that’s true of PUT, POST, and DELETE as well as GET, though I couldn’t find much in the way of discussion or examples.) The last was an interesting idea that never went anywhere.

The idea that intrigues me at the moment is putting the fragment identifier into the query string. A URI referencing fragments might presently look like:

http://simonstl.com/book.xml#xpath1(//book/chapter/title)

A client processor that understood the xpath1 scheme (like Mozilla) would read that as a reference to all the chapter titles in a book (to put it in English).
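To make the client-only nature of that identifier concrete, here is a minimal Python sketch of how a client might split such a URI before making a request. The helper names (request_target, local_fragment) are my own inventions for illustration, not part of any specification; the only library call is the standard urllib.parse.urlsplit.

```python
from urllib.parse import urlsplit

def request_target(url):
    """Return the part of the URL that is actually sent to the server.

    Hypothetical helper: path plus query string, never the fragment.
    """
    parts = urlsplit(url)
    target = parts.path or "/"
    if parts.query:
        target += "?" + parts.query
    return target

def local_fragment(url):
    """Return the fragment identifier, which stays on the client."""
    return urlsplit(url).fragment

url = "http://simonstl.com/book.xml#xpath1(//book/chapter/title)"
print(request_target(url))   # what the server is asked for
print(local_fragment(url))   # what the client keeps for itself
```

The server is only ever asked for /book.xml here, so it has no way to know the client cares about chapter titles. Anything placed in the query string, by contrast, survives into the request target.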

To let the server know that something was directed toward that same set of titles, the query string syntax might look like:

http://simonstl.com/book.xml?XML-XPTR=xpath1(//book/chapter/title)

A server might then return just the titles, or perform an operation on those title elements if a verb other than GET was used. (Be very careful with identifiers that reference multiple fragments!)
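As a rough sketch of what the server side could look like, the Python below pulls the pointer out of the query string, evaluates it, and either returns the matching elements or deletes a single match. Everything here is an assumption for illustration: the sample book, the function names, the "document-node" wrapper, and the refuse-on-multiple-matches policy are mine, not anything the XLink drafts specify. It also leans on the limited XPath subset in Python's standard xml.etree.ElementTree rather than a full XPath 1.0 engine.

```python
import re
import xml.etree.ElementTree as ET
from urllib.parse import urlsplit, parse_qs

# Hypothetical stand-in for book.xml
BOOK_XML = """<book>
  <title>Sample Book</title>
  <chapter><title>First</title><para>Text.</para></chapter>
  <chapter><title>Second</title><para>More text.</para></chapter>
</book>"""

def pointer_from_url(url, param="XML-XPTR"):
    """Return the XPath inside an xpath1(...) pointer, or None if absent."""
    values = parse_qs(urlsplit(url).query).get(param)
    if not values:
        return None
    match = re.fullmatch(r"xpath1\((.*)\)", values[0])
    if match is None:
        raise ValueError("this sketch only handles the xpath1() scheme")
    return match.group(1)

def load(xml_text):
    """Parse the document and wrap the root in a stand-in document node,
    so that absolute paths like //book/... can match the root element."""
    doc = ET.Element("document-node")
    doc.append(ET.fromstring(xml_text))
    return doc

def find(doc, xpath):
    """Evaluate an absolute path relative to the stand-in document node."""
    if xpath.startswith("/"):
        xpath = "." + xpath
    return doc.findall(xpath)

def get_fragments(xml_text, xpath):
    """GET: return every matching element, serialized."""
    return [ET.tostring(el, encoding="unicode").strip()
            for el in find(load(xml_text), xpath)]

def delete_fragment(xml_text, xpath):
    """DELETE: remove the match, but only if the pointer selects exactly one
    non-root element (an assumed policy, chosen to be cautious)."""
    doc = load(xml_text)
    matches = find(doc, xpath)
    if len(matches) != 1:
        raise ValueError(f"pointer matched {len(matches)} nodes; refusing")
    if matches[0] is doc[0]:
        raise ValueError("refusing to delete the whole document")
    # ElementTree has no parent pointers, so build a child-to-parent map.
    parents = {child: parent for parent in doc.iter() for child in parent}
    parents[matches[0]].remove(matches[0])
    return ET.tostring(doc[0], encoding="unicode")

url = "http://simonstl.com/book.xml?XML-XPTR=xpath1(//book/chapter/title)"
for fragment in get_fragments(BOOK_XML, pointer_from_url(url)):
    print(fragment)
```

Refusing to act when a pointer matches several nodes is only one possible answer to the multiple-fragment problem; a real design would have to decide whether a PUT or DELETE against many nodes is an error, a batch operation, or something else.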

I have a lot of thinking to do on this, and hopefully eventually some coding, but this seems worthwhile. There’s been some interesting conversation on xml-dev around this, and hopefully some of these nearly 10-year-old ideas can finally get traction.

It may be that we finally need them!