Last week, the Open Archives Initiative (OAI) published a set of beta-stage recommendations for compound documents, called Object Reuse and Exchange (ORE). The specifications have been published as version 0.9 and are open for public review and comment (ironically, the press release is a PDF blob).

The problem of compound documents (how to specify that a set of URI-identified resources together form one compound resource) has been around for a while and has never been solved properly. There are various proposals from different application areas, such as XLink (not quite for compound documents, but it could be used for this purpose as well), METS (using and extending XLink), and DIDL. I am certainly missing some other technologies here; please let me know what they are. The problem is that none of these languages ever caught on, mostly because none of them tried to be general: XLink focused on navigation, METS on libraries, and DIDL on multimedia.

However, it would be good to have a general and simple language for compound documents. If designed well, it could easily be extended to cover application-specific scenarios such as those addressed by XLink, METS, and DIDL.
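To make "general and simple" a bit more concrete, here is a minimal sketch of what such a model could look like: one URI for the compound resource plus a list of member URIs, serialized as plain XML. The element and attribute names (`compound`, `part`, `uri`, `href`) and the `example.org` URIs are made up for illustration; they are not taken from any existing specification.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field


@dataclass
class CompoundDocument:
    """A compound resource: one URI that stands for a set of member URIs."""
    uri: str
    parts: list[str] = field(default_factory=list)


def to_xml(doc: CompoundDocument) -> str:
    """Serialize to a hypothetical plain-XML vocabulary (names are assumptions)."""
    root = ET.Element("compound", {"uri": doc.uri})
    for part_uri in doc.parts:
        ET.SubElement(root, "part", {"href": part_uri})
    return ET.tostring(root, encoding="unicode")


def from_xml(text: str) -> CompoundDocument:
    """Parse the same hypothetical vocabulary back into the data model."""
    root = ET.fromstring(text)
    return CompoundDocument(
        uri=root.get("uri"),
        parts=[part.get("href") for part in root.findall("part")],
    )


doc = CompoundDocument(
    uri="http://example.org/article",
    parts=[
        "http://example.org/article/text.html",
        "http://example.org/article/figure1.png",
    ],
)
print(to_xml(doc))
```

A model this small needs nothing beyond a stock XML parser, and application-specific vocabularies could layer their own attributes or child elements on top of `part`.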

The problem is, OAI-ORE will not be it. Instead of designing a simple data model and a simple language for it, the authors settled on RDF. None of the documents explains why RDF was chosen over a simpler XML-based model. There is even a document that describes how to implement OAI-ORE in Atom, and all it does is show how to embed RDF in Atom, which means that processing such an Atom feed requires an RDF toolkit as well as an Atom toolkit. As a side note: the terms in the Atom categories are URIs, which does not really follow Atom’s idea of terms as strings.
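The side note about category terms can be illustrated with a small sketch. The entry below is hypothetical (it is not copied from the OAI-ORE documents, and the `example.org` values are invented); it only shows that plain Atom processing hands back every `term` as an opaque string, so anything that wants to treat a URI-valued term as a pointer into an RDF vocabulary has to bring extra machinery.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlparse

ATOM_NS = {"atom": "http://www.w3.org/2005/Atom"}

# Hypothetical entry; the term and scheme values are invented for illustration.
ENTRY = """<entry xmlns="http://www.w3.org/2005/Atom">
  <title>Example</title>
  <category term="compound-documents"/>
  <category scheme="http://example.org/schemes/type"
            term="http://example.org/terms/Aggregation"/>
</entry>"""


def category_terms(entry_xml: str) -> list[str]:
    """Return the term attribute of every atom:category in the entry."""
    entry = ET.fromstring(entry_xml)
    return [c.get("term") for c in entry.findall("atom:category", ATOM_NS)]


def looks_like_uri(term: str) -> bool:
    """Rough heuristic: the term has both a URI scheme and an authority part."""
    parsed = urlparse(term)
    return bool(parsed.scheme and parsed.netloc)


for term in category_terms(ENTRY):
    kind = "URI-like term" if looks_like_uri(term) else "plain string term"
    print(f"{term}: {kind}")
```

To the Atom layer both categories look the same; the distinction only matters to a consumer that knows the second term is meant to be interpreted outside of Atom.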

Generally, it is disappointing to see that a problem as important and manageable as compound documents, which is still an open problem looking for a good solution, has been approached at the wrong level. It is of course possible to come up with an RDF-based solution, but doing so introduces technology layers that this particular problem does not require.

This means that the quest for a general, XML-based format for compound document descriptions is still on, and OAI-ORE is not a real contender in this race. Well, maybe it still could be one if the abstract data model also got a representation in plain XML. Unfortunately, the model is not as abstract as its name implies: it is a rather concrete definition of an RDF vocabulary, which will make it quite a bit harder to come up with a good and isomorphic XML representation. The effort might be worth it, though, because the installed base of XML is significantly bigger than that of RDF.