I am implementing the business glossary manager in IBM Information Server for several clients. Some of those clients have existing data dictionaries and glossaries that will need to be imported into the product. IBM Information Server provides an XML format for importing and exporting business glossaries.

There is a lot to examine in this format: the good, the bad and the ugly. Before we begin our dissection, two contextual topics need some discussion. The first is the goals of the format; the second is whether those goals could have been achieved using existing formats.

At a high level, the format has three main goals, which correspond to its three main elements: represent terms and their definitions (via the term element), categorize terms (via the category element) and add custom attributes to categories or terms (via the attribute element). Except for the metadata extension mechanism (custom attributes), this is a simple way to create and organize a dictionary in XML. The schema and the example of the format make it clear that it is far from a complete standard. For example, the only data type available for custom attributes is String. So it is clear that this format will evolve. The bigger question is: should it? And should it even have been created in the first place?
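To make the three-element structure concrete, here is a minimal sketch in Python that builds a small glossary document. Only the element names term, category and attribute come from the description above. The `glossary` root, the `definition` child, the `name` and `type` XML attributes and the nesting of terms inside categories are my own assumptions for illustration, not the actual IBM schema.

```python
import xml.etree.ElementTree as ET


def build_glossary(categories):
    """Build an illustrative glossary tree from a list of category dicts.

    The layout is hypothetical: it only mirrors the three concepts
    (category, term, custom attribute) and is NOT the real IBM schema.
    """
    root = ET.Element("glossary")  # assumed root element name
    for cat in categories:
        cat_el = ET.SubElement(root, "category", name=cat["name"])
        for term in cat["terms"]:
            term_el = ET.SubElement(cat_el, "term", name=term["name"])
            # assumed child element holding the term's definition
            ET.SubElement(term_el, "definition").text = term["definition"]
            # custom attributes: String is the only data type available
            for key, value in term.get("custom", {}).items():
                attr_el = ET.SubElement(term_el, "attribute", name=key, type="String")
                attr_el.text = str(value)
    return root


sample = [
    {
        "name": "Customer",
        "terms": [
            {
                "name": "Customer ID",
                "definition": "Unique identifier assigned to a customer.",
                "custom": {"Steward": "Data Governance Team"},
            },
            {
                "name": "Customer Name",
                "definition": "Legal name of the customer.",
            },
        ],
    }
]

glossary = build_glossary(sample)
xml_text = ET.tostring(glossary, encoding="unicode")
print(xml_text)
```

Even in this toy form, everything a custom attribute carries has to be flattened to a string, which is the limitation noted above.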

There are already quite a few formats for capturing glossaries, dictionaries and thesauri in XML. A colleague of mine, Ken Sall, examined this for the government a few years back. The W3C has SKOS, IBM has subject classification in DITA (though DITA is much broader than glossaries), and XML Topic Maps can also serve this purpose.
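As a rough illustration of how an existing format covers the same ground, here is a sketch that reads terms, definitions and a parent concept from a small SKOS document in RDF/XML. The `skos:Concept`, `skos:prefLabel`, `skos:definition` and `skos:broader` names are part of the W3C SKOS vocabulary. The sample data, the example.org URIs and the `read_skos_terms` helper are hypothetical, and the code assumes the simple typed-node form of RDF/XML rather than handling every serialization a real RDF parser would.

```python
import xml.etree.ElementTree as ET

NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "skos": "http://www.w3.org/2004/02/skos/core#",
}

# Hypothetical sample data; the URIs are placeholders.
SKOS_SAMPLE = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:skos="http://www.w3.org/2004/02/skos/core#">
  <skos:Concept rdf:about="http://example.org/glossary/customer-id">
    <skos:prefLabel xml:lang="en">Customer ID</skos:prefLabel>
    <skos:definition xml:lang="en">Unique identifier assigned to a customer.</skos:definition>
    <skos:broader rdf:resource="http://example.org/glossary/customer"/>
  </skos:Concept>
</rdf:RDF>"""


def read_skos_terms(rdf_xml):
    """Extract name, definition and broader concept from each skos:Concept."""
    root = ET.fromstring(rdf_xml)
    terms = []
    for concept in root.findall("skos:Concept", NS):
        broader = concept.find("skos:broader", NS)
        terms.append({
            "name": concept.findtext("skos:prefLabel", default="", namespaces=NS),
            "definition": concept.findtext("skos:definition", default="", namespaces=NS),
            # the broader concept plays roughly the role of a category
            "category": (
                broader.get("{%s}resource" % NS["rdf"]) if broader is not None else None
            ),
        })
    return terms


skos_terms = read_skos_terms(SKOS_SAMPLE)
for t in skos_terms:
    print(t["name"], "|", t["definition"], "|", t["category"])
```

Terms, definitions and a categorization hierarchy all map onto SKOS constructs without much effort, which is what makes the question of a new format worth asking.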

So, although we will continue to explore the details of this format, and even the conversion of some of the other formats mentioned above into it, what are your thoughts on it?

Until next time, see you in the trenches… - Mike