In the markup world, the jargon is that inline markup is the tags that delimit ranges of text in a document (e.g., Plain Old XML), while out-of-line markup is where the structures and labels are in one place but the subjects of those structures and labels are in another place (e.g., XLinks). Of course, you can have XPaths which drill down to some piece or bundle of information with inline markup, but where there is out-of-line markup there is potentially another XPath that can drill down through the out-of-line markup and end up labelling the same information.

What may not be obvious is that a web system that uses PRESTO is in effect using URLs that act like XPaths on virtual out-of-line markup. “Virtual” because no actual tree is (necessarily) ever made explicit: notionally PRESTO uses resolver rewriting.

That good markup practice is to mark up the information directly, without fluff and tricks and in as pleasant a way as possible, is universally acknowledged; and that there are many kinds of information structure where the markup cannot be a neat model of the data, such that all elements represent objects of the same analytical importance, is also widely known and regretted. (Think of the distinction in XSD between the components (the objects of the schemas) and the tags used for each component, for example. Or the *Pr containers in OOXML.)

A PRESTO URL should give the view in terms of the (conceptual) components, not the specific tags used if the resource is stored as an XML document. And not necessarily every tag, certainly. But every concept (every significant concept) should have a URL, even if there is no representation available or only a pretty crappy one.

So if in PRESTO a URL represents a kind of XPath to a virtual out-of-line markup view of some data, then it is possible to have a virtual schema for that virtual markup: in effect, you could have a schema for the URL. For example, given the virtual schema (as RELAX NG compact syntax here):

  element address {
     element tent { text },
     element oasis  { text },
     element wadi { text },
     element desert { text }
  }

which would allow PRESTO URLs like

   http://www.eg.com/address
   http://www.eg.com/address/tent
   http://www.eg.com/address/oasis
   http://www.eg.com/address/wadi
   http://www.eg.com/address/desert

In PRESTO, these should be available regardless of how the data is stored, because the idea is to model the user’s conceptions. (And if an exact match is not available, to provide the best fit. This certainly creates a task allocation between front-end and back-end systems that may not be workable for some organizations or tasks. No sweat.)
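To make the "schema for the URL" idea concrete, here is a minimal sketch that walks the address schema and lists the URLs it implies. The nested-dict encoding of the schema, the name ADDRESS_SCHEMA and the function presto_urls are all hypothetical, invented for this illustration; nothing in PRESTO prescribes them.

```python
# Hypothetical in-memory form of the virtual schema: each key is an element
# name and each value is a dict of its child elements (empty dict = text only).
ADDRESS_SCHEMA = {
    "address": {
        "tent": {},
        "oasis": {},
        "wadi": {},
        "desert": {},
    }
}


def presto_urls(schema, base="http://www.eg.com"):
    """Yield one URL per element in the schema, parents before children."""
    for name, children in schema.items():
        url = f"{base}/{name}"
        yield url
        # Recurse so that nested elements become longer URL paths.
        yield from presto_urls(children, url)


for url in presto_urls(ADDRESS_SCHEMA):
    print(url)
```

The point of the sketch is only that the URL set is derived from the conceptual schema, not from however the data happens to be stored.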

But what about cardinality? Here is a schema more typical of literature:

   element law {
       element title { text },
       element part {
            element title { text },
            ( element p { text } |
              element list {
                  element item { text }+
              }
            )*
       }*
   }

The XPath for accessing a particular part’s title would be /law/part[2]/title, so the PRESTO URLs would need some kind of convention for expressing position.

In PRESTO we *might* have URLs for

     http://www.eg.com/law/
     http://www.eg.com/law/title
     http://www.eg.com/law/part
     http://www.eg.com/law/part2/title
     http://www.eg.com/law/part2/p3
     http://www.eg.com/law/part2/list4
     http://www.eg.com/law/part2/list3/item4

Now, I am not sure I understand the issues well enough to say which system for indexing is absolutely best. But I think the advantage of http://www.eg.com/law/part2/title over http://www.eg.com/law/part2/title is that it is probably a more common case that your system is interested in /law/part[2]/title rather than all titles of parts /law/part/title. But it is a matter of the particular use case and the consequent virtual schema.
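One way to picture the indexing convention is as a mechanical mapping from URL path steps to XPath steps. The sketch below assumes that a step such as part2 means "the second part", i.e. part[2]; the text above does not settle whether list3 means the third list or the third child of its parent, so that reading is an assumption. The function name presto_path_to_xpath is hypothetical, and element names that themselves end in digits would need a different convention.

```python
import re
from urllib.parse import urlparse

# Assumed convention: a path step is an element name made of letters,
# optionally followed by a 1-based position ("part2" -> name "part", index 2).
STEP = re.compile(r"^([A-Za-z_-]+)(\d*)$")


def presto_path_to_xpath(url):
    """Translate a PRESTO-style URL path into the XPath it notionally stands for."""
    path = urlparse(url).path
    steps = []
    for segment in path.strip("/").split("/"):
        match = STEP.match(segment)
        if match is None:
            raise ValueError(f"unrecognised path step: {segment!r}")
        name, index = match.groups()
        # Steps without a number select every element of that name.
        steps.append(f"{name}[{index}]" if index else name)
    return "/" + "/".join(steps)


print(presto_path_to_xpath("http://www.eg.com/law/part2/title"))
# prints /law/part[2]/title
```

Under this reading, http://www.eg.com/law/part2/list3/item4 would correspond to /law/part[2]/list[3]/item[4].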

(Another possibility is just to bite the bullet and allow XPath syntax directly in the URLs, with appropriate percent escaping. For example http://www.eg.com/l/law/part%5B2%5D/title. Is this reinventing XPointer? Well, in a way, except that in XPointer you are locating a file and then drilling down according to the actual markup: in PRESTO the information is merely hierarchically accessible, and you are using the Use Case concepts to zero in on it.)
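The percent escaping in that last URL can be produced with Python's standard library. In this sketch the function names xpath_to_url and url_to_xpath are hypothetical, and the /l prefix is simply copied from the example URL above and treated as an assumed marker for "the rest of the path is an XPath".

```python
from urllib.parse import quote, unquote, urlparse

BASE = "http://www.eg.com/l"  # prefix taken from the example URL; an assumption


def xpath_to_url(xpath, base=BASE):
    """Percent-escape an XPath, keeping '/' so the steps stay visible in the URL."""
    return base + quote(xpath, safe="/")


def url_to_xpath(url, prefix="/l"):
    """Recover the XPath from a URL built by xpath_to_url."""
    path = urlparse(url).path  # the path is still percent-escaped here
    if not path.startswith(prefix + "/"):
        raise ValueError(f"URL path does not start with {prefix}/")
    return unquote(path[len(prefix):])


print(xpath_to_url("/law/part[2]/title"))
# prints http://www.eg.com/l/law/part%5B2%5D/title
print(url_to_xpath("http://www.eg.com/l/law/part%5B2%5D/title"))
# prints /law/part[2]/title
```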