How delightful it is to be confounded, when something that shouldn’t work does work well, and when we are forced to admit the world is not boring, predictable and under control. The success of XML is such a thing, as was the success of HTML. I have come up with a new little theory about the success; maybe it is not new, I forget so much: undoubtedly it is obvious to someone else and has been raised and pooh-poohed before.

Let’s imagine we take a database (a set of facts) and a set of human comments on those facts and a set of metadata about the whole thing. We’ll call that our data. Now let’s categorise the relation between one information item and another as:

  • intimate
  • strong
  • moderately strong
  • connected
  • similar

Such categorisation is in addition to the labels on the data. Even though these relations are very general, they are enough for a human to clarify a lot of semantics based on the labels. They are not nearly as clear as “has a” or “is a” relations, or grouping as bags or sets, or labelling as about or description or topic. They are much fuzzier. Nothing like what goes on with relational data or RDF.

But those categories are just what XML markup provides. An attribute suggests an intimate relationship. A child is strongly related to its parent. A successor element is moderately strongly related to its predecessor. A referenced or linked-to element is connected to its parent. Two information items with the same name have some kind of semantic similarity, which increases the more that their context grows. A programmer uses these to glean the meaning of the markup.
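To make the mapping concrete, here is a rough sketch in Python that walks a small XML document and emits one triple per relation. Everything in it is an illustration rather than part of the theory: the sample document, the `relations` function, and the convention that a `ref` attribute points at an element carrying the matching `id` are all assumptions made for the example. It also reads “connected” as linking the referring element to the element it points at, which is one possible interpretation.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical sample document; the id/ref linking convention is an assumption.
SAMPLE = """<order id="o1">
  <customer ref="c7"/>
  <item sku="A1">Widget</item>
  <item sku="B2">Gadget</item>
  <party id="c7">Acme</party>
</order>"""


def relations(root):
    """Yield (category, source, target) triples for the five fuzzy relations."""
    # Index elements by their id attribute so that ref attributes can be resolved.
    ids = {el.get("id"): el for el in root.iter() if el.get("id")}
    by_name = defaultdict(list)

    for el in root.iter():
        by_name[el.tag].append(el)

        # Attribute: intimate relation with the element that carries it.
        for name in el.attrib:
            yield ("intimate", el.tag, "@" + name)

        # Child: strong relation with its parent.
        children = list(el)
        for child in children:
            yield ("strong", el.tag, child.tag)

        # Successor: moderately strong relation with its predecessor.
        for prev, nxt in zip(children, children[1:]):
            yield ("moderately strong", prev.tag, nxt.tag)

        # Reference: connected relation (assumed ref -> id convention).
        target = ids.get(el.get("ref"))
        if target is not None:
            yield ("connected", el.tag, target.tag)

    # Same name: similar relation, reported once per repeated element name.
    for name, els in by_name.items():
        if len(els) > 1:
            yield ("similar", name, name)


triples = list(relations(ET.fromstring(SAMPLE)))
for triple in triples:
    print(triple)
```

Note that the sketch needs no schema and no knowledge of what `order` or `item` mean: the categories fall out of the syntax alone, which is the point being made above.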

Now the relations I suggest may not be perfect; you could improve on them undoubtedly. And they may not be reliable guides, in that there’s nowt as queer as folk and folk make schemas. And the categories do not necessarily correspond to neat or orthogonal logical or linguistic categories. They don’t need to. But surely we can reverse engineer something about how humans work from their artifacts. Are there quasi-linguistic properties of some kind that make XML successful, apart from the obvious reasons of representational power, internationalization power and corporate power?

Of course, there are other ways to look at it: elements/attributes as noun/verb, as substance/accident, and so on. Those operations can also be at work.