May 2008 Archives

Kurt Cagle

Is Drupal on your IT map yet? Chances are pretty good that either you are nodding your head vigorously, or you have no idea what I’m talking about. Drupal is an open source web content management system … though this is actually a little like saying that a Jaguar is a car; it’s true as far as it goes, but the description doesn’t really do Drupal justice.

Drupal started out in 2000 as a community project in Belgium originally called Druppel. The creator, Dries Buytaert, originally planned to call it Dorp (which means “village” in Dutch), but he introduced a typo when filling out the domain registration and liked the way it sounded. The idea behind it was simple: build a CMS that promoted the concept of community rather than simply being a way to store content. To do this, Drupal was built early on around the idea of content nodes (think of them as very simple documents with a title and body) and the heavy use of syndication.

M. David Peterson

Update: Okay, so I really needed to clarify my point regarding Miguel a bit better. Hopefully the updated title more clearly describes what I was attempting to suggest.

Also, as chromatic points out in a follow-up comment below,

I think you may have misspelled “Jim Hugunin”.

Point well taken, chromatic! Jim certainly deserves the credit for bringing dynamic languages to Microsoft, doing so on a foundation of openness. I believe Miguel and friends deserve the credit for breaking down the barriers from an external perspective. Jim, John, and the rest of the dynamic languages crew at Microsoft deserve the credit for picking things up and moving them forward internally.

Thanks for helping to set things straight, chromatic!

[Original Post]
I first met John Lam when I was working as the Technical Evangelist for the Windows CE (1.0) team back in ‘96/’97. I was impressed by him then in the same way I am impressed by him now…

He doesn’t give up. Nor does he give in to political pressure.

John Lam on Software: IronRuby and Rails

Perhaps even more important than all of this technical stuff is what the IronRuby project represents at Microsoft. IronRuby has pioneered a number of new processes that make it easier for other folks at the company to build and release Open Source products. What we learn from building IronRuby will be applied in other product groups to help us become more open and transparent than we have been in the past. We have a great leadership team that is willing to push the envelope on openness and transparency to create a world where both Microsoft and our customers can benefit.

IronRuby represents a beacon of change at MSFT. As do John Lam and the rest of the dynamic languages teams at MSFT.

It’s time we pay better attention to what they are accomplishing.

SIDE NOTE: Question: Could any of this have been possible if it wasn’t for Miguel de Icaza?

/methinks not. Prove me wrong.

Eric Larson

At my day job I’ve been working on integration between different systems by creating widgets. There are a few different avenues to take. My metric was the perceived integration with the site, so using the DOM ended up as the best option.

The DOM actually makes for a pretty decent medium for communication. If you force widgets to truly be written in the DOM instead of an iframe, every widget has access to the values that are in the DOM. Sure, you could also use namespaced JavaScript to grab values from other widgets, but then you require those widgets’ dependencies. When the DOM is used, you can mock the information the other widgets provide. This gives you the option of rendering the widget on the server or via JavaScript. It also allows you to specify initialization details via simple HTML instead of including extra JavaScript.

Likewise, by using the DOM, you can integrate your widget throughout the page. Things like CSS can apply directly to the widget HTML. Also, a widget may not even insert visible HTML in the DOM. For example, an annotation widget might look through a DOM element to attach small balloons for notes that others left. None of this is trivial, but assuming the cross-domain security issues can be dealt with reasonably, utilizing the DOM as something of a data store has its advantages.

Where this concept becomes even more powerful is in light of the Semantic Web. The Simile project provides a wealth of examples on how to take Semantic Data and present it in a useful way. Eric Miller describes this as creating an experience, which conveys the true value of Semantic Data. The idea of using the DOM in my widgets is to support a certain user experience where different services are used in unison to integrate the value of the systems. As the Semantic Web continues to evolve, the user experiences are where the biggest gains will be seen.

Eric Larson

I just read this article on where semweb technology really works well. The article is about search, and it points out how semantic search excels at essentially performing complex relational queries over the web.

Hopefully, it is clear from some recent discussions of tools like eXist and XML-focused data stores that there is a market for generic XML document stores. Conceptually, I think this kind of technology is similar to CouchDB, with the difference being that there is a wealth of XML libraries to help with more complex processing, where CouchDB might be more limited to specific uses where JSON excels. I’m not suggesting either technique is better, but rather that the idea of a document store that can be queried in more typical SQL-like contexts seems to be gaining popularity. Taken within the context of RESTful services, I think it is a killer combination.

Big news in the ability for XForms to run under non-Firefox browsers. By the end of June there will be a new IBM/webBackplane XForms client library that will allow XForms 1.1 applications to run under IE.

Here is the link to the Google Code web site:

http://groups.google.com/group/ubiquity-xforms

By September they plan to support Safari.

The plan is to build a smoother on-ramp to get AJAX/JavaScript people to use the XForms specification. To get a fast start they will be working with some of Mark Birbeck’s FormsPlayer code base. The team plans to use the newer AJAXSLT XPath libraries from Google.

This is a big development and it is not a coincidence that they are hosting it on Google code. They are hoping to partner with Google in the future to bring desktop-quality applications to the web.

I am sure that we will be hearing more about this in the near future.

XForms allows you to load an entire XML database into a client with a single statement. But this is not always a good design decision. Consider the concurrent user access requirements when you design the grain of your locks.

In the past, when we developed with POHF (Plain Old HTML Forms), each web form had a small set of key-value pairs. Developers loaded multiple transforms of the database through middle-tier objects into the web form. They then updated the database by reversing these transformations. Developers tended to work at lower levels of the tree (the root being the top of our upside-down tree).

With XForms however, you can easily load an entire database into your client with a single statement. To do this you just add the following line to your XForms model.

<xf:instance src="path-to-my-xml-database.xml"/>

But just because XForms makes this easy does not always make it a good design decision. XForms gives the developer great power, but this power needs to be used responsibly, and you need to take your multi-user requirements into account when you do this.

Why? Consider the issues surrounding the locking of records to prevent multiple users from overwriting each other’s changes. Most databases provide record locking. With eXist this is just a simple XQuery statement (util:exclusive-lock) that the developer puts on the server when an update form is loaded.

What you need to consider is how to gracefully deal with multiple clients accessing the same data. Ideally the first user gets the data and sets the lock. The second user should be notified that the record is locked and when it was locked (and perhaps by whom), and should be given read-only access. By the way, turning a form into a read-only form is also just a single line of code in XForms. See http://en.wikibooks.org/wiki/XForms/Read_Only. I have been tempted to put a nudge button on these forms to let the locker know that someone is waiting for them to close the form, but I have not got around to that yet.
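The first-user-locks, second-user-reads behaviour described above can be sketched roughly as follows. This is a toy in-memory illustration in Python, not eXist's util:exclusive-lock; the LockTable class, its method names and the returned dictionary keys are all invented for the example, and it makes no attempt at thread safety or persistence.

```python
import time


class LockTable:
    """Toy in-memory lock registry (hypothetical): record id -> (user, time locked)."""

    def __init__(self):
        self._locks = {}

    def open_form(self, record_id, user, now=None):
        """Say how the form should open for this user: 'edit' or 'read-only'."""
        now = time.time() if now is None else now
        # The first caller for a record becomes the lock holder; later callers see that holder.
        holder = self._locks.setdefault(record_id, (user, now))
        mode = "edit" if holder[0] == user else "read-only"
        return {"mode": mode, "locked_by": holder[0], "locked_at": holder[1]}

    def close_form(self, record_id, user):
        """Release the lock, but only if this user actually holds it."""
        if self._locks.get(record_id, (None, None))[0] == user:
            del self._locks[record_id]
```

The second caller gets back who holds the lock and since when, which is the information a read-only form (or a nudge button) would need to show.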

I have also created many administrative XForms for giving non-techies the ability to change things like server configurations. They don’t need to use an XML editor, and I can set up complex business rules using bind statements that prevent accidental configuration errors. I wish Apache configuration files came with an XForms front-end! In this case I just lock the file on the server and load the entire configuration file into the XForms client, so the locking grain is coarse.

You still may want to save prior versions of these configuration files and use XML diff tools (which are also bundled with eXist) to see who changed what and when. And you may want to allow users to revert to a prior version with a single click. But most of this work is just a few additional lines of XQuery on the server.

What you find is that all these lock/read/nudge operations can be done simply and elegantly with the combination of using XForms clients, XQuery on the server and REST interfaces.

What is needed now is a unified framework built around XRX that turns all these locking issues into something that can be addressed in a single XQuery module loaded on the server.

Note that you don’t need to lock records as they are being created but you may want to check for duplicate records as the users enter their data. This can easily be done in an as-you-type submission if you don’t mind a little extra bandwidth.

So how have you dealt with locking in the past? Do you have any techniques that you have developed in the POHF/RDBMS era that we can use in the XRX era?

Jeni Tennison

I wrote some XSLT the other day that was so neat it made me smile, so I thought I’d share it. It’s an example of how the new <xsl:next-match> instruction and tunnel parameters can combine to simplify your code. Fair warning: this is XSLT 2.0 through-and-through, and the use case here is one you will only care about if you process documents (rather than data), and pretty complex documents at that.

M. David Peterson

Solution One: Put phriggin’ AdSense ads on everyone’s profile page already!

Do you honestly think anyone would care? And if they do, offer them a premium “Don’t put ads on my profile page, damn it!” opt-out plan. WeatherBug does it. (For weather info I can get for free by sticking my head out the window!) Why can’t you?

Oh, and don’t even think about charging me for the right to “tweet”. Imagine a world where the ideas of yesteryear to impose an SMTP “stamp” had come to fruition.

Actually, never mind. Forget that. SMTP would have never existed (clarification: SMTP, in the form of sendmail, existed *LONG* before the “email stamp” tax ideas came into existence. So SMTP would have survived. The idiotic politicians who attempted to implement this plan, on the other hand, wouldn’t have survived.), nor would it continue to exist if such a scheme were imposed.

Of course, we all seem willing to pay our mobile phone providers anywhere from a penny to twenty cents USD to send the equivalent of a tweet to our buddies’ mobile devices. With this in mind, herein lie the other revenue streams you seem to be more than happy to neglect,

Rick Jelliffe

Charles Goldfarb’s idea of using grammars to represent documents has proven itself useful in many situations, and the DTD legacy lives on in ISO RELAX NG and W3C XSD. However, there are many structures that regular grammars, as conventionally implemented, cannot cope with. And it is possible to get a certain cart-before-the-horse mentality about grammars, where any structure that cannot be represented by a grammar is regarded as bad ipso facto.

However, we need to be striving towards systems that free us so that what is congenial to the mind is easy to do on the computer.

I was looking at Ant files recently and they provide another good example. Ant files are configuration files for a modern make system, open source through Apache and most associated with Java development. Ant files are mostly a defined set of elements and attributes which you could have a grammar-based schema for quite easily.

But you can extend the elements inline in the document itself. For example, I am working on (updating Christopher Lauret and Willy Ekasalim’s) Ant task for Schematron, to be available as an Ant extension. In Ant, you just need this:

  <target name="test-fileset" description="Test with a Fileset">
    <taskdef name="schematron" classname="com.schematron.ant.SchematronTask"
        classpath="../lib/ant-schematron.jar"/>
    <schematron schema="../schemas/test.sch" failonerror="true" debugmode="false">
      <fileset dir="../xml" includes="*.xml"/>
    </schematron>
  </target>

Where the taskdef element defines that there is a task called schematron, and this can then be used as an element later.

In Schematron you could validate this by the following:

  <sch:pattern>
    <sch:title>Check allowed elements</sch:title>

    <sch:rule context="target/*[name() = ancestor::*/taskdef/@name]">
      <sch:assert test="true()">
        The target element may contain user-defined tasks.
      </sch:assert>
    </sch:rule>

    <sch:rule context="target/*">
      <sch:assert test="self::bunzip2 or self::bzip2 or self::depend or self::javac or ..."
          diagnostics="unknown-name">
        The target element should only have built-in Ant tasks apart from user-defined tasks.
      </sch:assert>
    </sch:rule>

  </sch:pattern>
...

  <sch:diagnostic id="unknown-name">
    The element <sch:name/> is not one of the built-in types in Ant (at least, as at Ant 1.7.0).
  </sch:diagnostic>

Unless I have made a mistake with the XPath, what this does is:

  • The first rule finds every element that is a child of target for which there is an in-scope taskdef element for that name. In-scope means any taskdef underneath any ancestor. The assertions in this rule can never fail; they just filter out properly defined extension elements so that they do not fire the second rule.
  • The second rule, which applies to any other element under target, checks against the full list of the built-in Ant tasks.
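For readers who think more easily in procedural code, here is a rough Python rendering of the same two rules using only the standard library. It is a sketch of the logic, not a Schematron implementation: the unknown_tasks function name is made up, and BUILTIN_TASKS is a deliberately tiny stand-in for the full list of built-in Ant tasks that the real assertion enumerates.

```python
import xml.etree.ElementTree as ET

# Hypothetical, deliberately tiny stand-in for the full list of built-in Ant tasks.
BUILTIN_TASKS = {"bunzip2", "bzip2", "depend", "javac", "taskdef"}


def unknown_tasks(build_xml, builtin=BUILTIN_TASKS):
    """Return names of children of <target> that are neither built-in
    tasks nor declared by an in-scope <taskdef>."""
    root = ET.fromstring(build_xml)
    # ElementTree has no parent pointers, so build a child -> parent map.
    parent = {child: p for p in root.iter() for child in p}

    def declared_names(elem):
        # Collect @name of every taskdef that is a child of any ancestor.
        names = set()
        node = parent.get(elem)
        while node is not None:
            names.update(t.get("name") for t in node.findall("taskdef"))
            node = parent.get(node)
        return names

    problems = []
    for target in root.iter("target"):
        for task in target:
            if task.tag in declared_names(task):
                continue  # first rule: a declared extension, always passes
            if task.tag not in builtin:
                problems.append(task.tag)  # second rule fails
    return problems
```

Fed the test-fileset target shown earlier, this should report nothing, since schematron is declared by the taskdef and taskdef itself is a built-in task. The Schematron version still wins on terseness, and the assertion text travels with the rule.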

That grammars cannot represent this is not just a lost opportunity for better validation: after all, the Ant program itself can generate messages. But it is a real shortfall for documentation: I cannot see one place in the Ant documentation in which all the structural rules are consolidated. I suppose if you are not used to going to a schema first, then you might not miss it, but I think one of the major convenience factors of DTDs, RELAX NG compact syntax, and Schematron can be the convenient and terse collection of structural rules, like a help card for programmers.

I have added a little diagnostic message too: just to let the user know what the unexpected element actually was. It isn’t part of the main assertion so that the assertions are “pure” positive descriptions of what should be.

Now, let’s assume you are a Vigorous Grammar Fanboy (VGF). You object: why not just have a container element like user-task for all the points where you want these, along the lines of the CustomXML elements in OOXML, where the name of the desired element is effectively in an attribute, not the actual element name? First, because it is ugly. Second, because it emphasizes that this is an extension element, which is of interest during setup and then extraneous information afterwards. Third, because then you are messed up with using the element name to determine the contents of the element anyway. And fourth, because it is not what the original writers found idiomatic, direct and minimal. Or was that point one again?

But you, the VGF, are not content with that. Oh no, you are relentless, like a killer whale attacking a seal pup on the beach. You say “Err, isn’t this what namespaces are for?” And, indeed, Ant is starting to add support for namespaces, which may in time supersede this. My answer: namespaces are difficult for the kind of developers who are making Ant tasks: they are probably not addressing XML problems at all. And namespaces pose more problems for users. In fact, the Ant declaration system is one of binding a local name to a class, and so it is no more prone to name clashing than if namespaces had been used (i.e. conflicts with the same element name are no different from conflicts from the same prefix).

So a quick comment to developers: if you have used XML for configuration files or other things, and then found that XSD doesn’t have enough power to represent what you have, it is most likely that ISO Schematron can do the job, and do it with clearer diagnostics.

Rick Jelliffe

One of the advantages of Schematron is that because the assertion text and diagnostics text are part of the schema, not built into the validator, a user who is distanced from the markup (e.g. by a GUI) can be given diagnostic information in terms of the application domain and even the GUI rather than just in terms of the invisible XML.

But nevertheless, very often system-specific details can emerge despite our best efforts, like farts in an elevator and about as desirable.

PRESTO is a set of conventions I have been working on recently, and one of the advantages that recently came out from it is that because the URLs are meaningful to users, it is not so tragic if they emerge into the user’s notice. Just like a good Schematron diagnostic, they give system-level information in terms of how the user thinks, as much as that may be practical, rather than being limited to throwing deployment-contaminated muck at the user.

Rick Jelliffe

By popular demand I am reposting this entry. It disappeared mysteriously—Kurt Cagle cruelly suggested it could be something to do with my incompetence, which has the ring of truth unfortunately— so our apologies to readers.

Here is a great quote from William Hazlitt’s essay On the Pleasure of Hating, for a change in pace:

The pleasure of hating, like a poisonous mineral, eats into the heart of religion, and turns it to rankling spleen and bigotry; it makes patriotism an excuse for carrying fire, pestilence, and famine into other lands: it leaves to virtue nothing but the spirit of censoriousness, and a narrow, jealous, inquisitorial watchfulness over the actions and motives of others. What have the different sects, creeds, doctrines in religion been but so many pretexts set up for men to wrangle, to quarrel, to tear one another in pieces about, like a target as a mark to shoot at? Does any one suppose that the love of country in an Englishman implies any friendly feeling or disposition to serve another bearing the same name? No, it means only hatred to the French or the inhabitants of any other country that we happen to be at war with for the time. Does the love of virtue denote any wish to discover or amend our own faults? No, but it atones for an obstinate adherence to our own vices by the most virulent intolerance to human frailties. This principle is of a most universal application. It extends to good as well as evil: if it makes us hate folly, it makes us no less dissatisfied with distinguished merit. If it inclines us to resent the wrongs of others, it impels us to be as impatient of their prosperity. We revenge injuries: we repay benefits with ingratitude. Even our strongest partialities and likings soon take this turn. “That which was luscious as locusts, anon becomes bitter as coloquintida;” and love and friendship melt in their own fires. We hate old friends: we hate old books: we hate old opinions; and at last we come to hate ourselves.

M. David Peterson

Has anyone noticed the same trend I have? We’re in this weird area of language evolution in which those of us who have been trained to think statically are beginning to envy some of the niceties provided by dynamic languages (such as implicit types) while at the same time those of us who have been trained to think dynamically are beginning to envy some of the niceties provided by static languages (such as explicit types.)

Weird.

Take, for example, C# 3.0 implicitly typed local variables,

M. David Peterson

Update: If the embedded YouTube video doesn’t render in your feed reader of choice, you can access it directly via http://www.youtube.com/v/bDsIFspVzfI&hl=en


Kurt Cagle

XML is ten years old this year, which by any measure should be treated as a not insignificant milestone. When I started covering the technology as a writer back in late 1997, each article or book that I wrote had to indicate that this was the eXtensible Markup Language (the X was sexier than E, apparently) and that the language in turn was something that could be used to describe documents and possibly other things, as experimentation with the emerging XML parsers began to illustrate.

Edd Dumbill was the key driving force in getting XML.com off the ground for O’Reilly, with the site seen as being the entrée into a radical new technology that would likely change the way we make web pages and do a few other things. But the decision to set up such a website was also something of a risk: there were other technologies that were more exciting, and for every person who understood the potential of the language, there were dozens, make that hundreds, of otherwise technically competent people who saw XML as being a flash in the pan.

Chris Wallace

I think students of information systems design have a tough time compared to other designers: architects and graphic designers, say. A budding architect might have lived and worked in dozens of buildings, visited scores more and seen hundreds before they start to design their own. Information systems designers have no such grounding in examples, their prior experience sometimes confined to being the victim of poor systems.

The web has improved this situation, particularly for interface designers, but it is still hard to see into the guts of an application to learn about database structures and software architecture. Web applications do have the wonderful advantage that they can be explored, and I exploit this feature in my teaching, getting students to take well-known sites like Flickr and del.icio.us and infer, through source viewing and experimentation, the underlying conceptual data model. This works well but only occasionally can we compare our speculations with the real thing. It would be great if more sites had their systems documentation online.

I have been using the eXist native XML database with REST interfaces to store metadata for the last two years. It is a great system and I have been encouraged by others to document the benefits. Here is an excerpt.

The Problem

I have been working with a group of Business Analysts (BAs) who were using Visio to document business requirements, mostly things like use cases and UML diagrams. They asked the question: can you tell me if we are using all the correct approved business terms in our diagrams? Can you create an easy-to-use process to check these against our glossary of approved business terms?

The Solution

Visio has a way to save its documents in XML format. Specifically, you can use the “Save As…” function and select the “XML Drawing” file type. Since we use eXist as our metadata store and eXist has a WebDAV interface, it was easy to give them a folder on their desktop they could save their Visio drawings into. Once they did this, they just clicked on a URL that ran a simple XQuery on their Visio files. This XQuery (about 15 lines) looked for each text element in the Visio document. The query line looks like this:

for $term in doc($mydoc)//v:Text (: here v is the namespace for Visio :)

The query then uses a standard function supplied with my glossary manager that displays a link to the term if it exists in the registry and red text if it does not. I believe that because I selected eXist and WebDAV/REST interfaces, the solution was simple and elegant.
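To make the shape of that check concrete outside XQuery, here is a small Python sketch of the same idea. It is not the query I ran: the check_terms name is invented, the glossary is just a list of strings rather than a registry lookup, and the namespace URI is my assumption about what the v prefix is bound to in a Visio XML drawing.

```python
import xml.etree.ElementTree as ET

# Assumed binding for the v: prefix (Visio 2003 XML drawings); adjust to match your files.
VISIO_NS = {"v": "http://schemas.microsoft.com/visio/2003/core"}


def check_terms(drawing_xml, glossary):
    """Return (term, is_approved) for each non-empty Text element in the drawing."""
    root = ET.fromstring(drawing_xml)
    approved = {term.lower() for term in glossary}
    results = []
    for text_el in root.iterfind(".//v:Text", VISIO_NS):
        # Text may contain formatting children, so gather all nested text.
        term = "".join(text_el.itertext()).strip()
        if term:
            results.append((term, term.lower() in approved))
    return results
```

A caller would render the approved terms as links and the rest in red, which is all the glossary function does.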

So how about your metadata registry? How would you approach this problem? Can you do it in under 15 lines of code and provide a drag-and-drop interface?

Rick Jelliffe

Kenneth Chiu from SUNY Binghamton this week sent me a couple of exciting papers on recent techniques for XML parsing. He and his colleagues have been looking at parallelizing parsing of (largish) XML documents to suit multicore processors. One of the papers, Parallel XML Parsing Using Meta-DFAs, is listed as available on the ACM website, and the other, Simultaneous Transducers for Data-Parallel XML Parsing, seems (I didn’t check) to be available as part of the large Proceedings 22 IEEE International Parallel and Distributed Processing Conference 2008.

I am always surprised, as I was with the recent work from Simon Fraser University on pipelined parsing techniques, that we are in 2008 and there are still new techniques cropping up.

One thing that struck me about the Chiu papers concerned an application of his parallel ideas outside of the multicore world to the world of interactive text-based XML editors: for example, Topologi’s Markup Editor or other “coloring editors.” These are hardly the glamour end of town, but interesting nonetheless.

Chiu et al. have various ideas based on making new state-machine-like structures that allow all possible parse states of a document to be represented simultaneously. These are presented through two different ideas, one in each paper: the meta-DFA and the simultaneous finite transducer (SFT). I defer to those papers for details.

For the application I am thinking of, consider a “<” in the middle of some XML document. In a conventional XML processor, you can just parse using a simple state machine to find whether it is acting as a delimiter (or is it inside a CDATA section, etc.?). In an interactive editor, the state machine has to be much more complicated, because it also has to cope with what happens when there are errors: how do you recover and resynch? In Topologi's case, we register particular status-bar messages with every error transition, so that when the cursor is at an error location, a message is provided. (The next layer of editing above this is to provide also some kind of user interface for fixing the error.)

Now, for large interactive documents, where memory utilization and snappy response matter, you really don't want to build a parse tree. So various techniques exist. James Clark's RELAX NG mode for emacs uses a check-pointing system, where at regular intervals (e.g. 1000 lines or 1k characters or whatever) the state machine state at that point is available. Jumping to a point only requires the editor to find the last checkpoint and provide a detailed parse only from there to the checkpoint after the current entry.

Making an edit invalidates following checkpoints, but because a state machine is being used, you only need to reparse text until you find the next checkpoint that agrees with your parse: at that point you are in synch. And, in fact, you only re-parse forward when you actually need it, since you don't want trivial edits to force parsing in sections that are way beyond the current display area.

Topologi's Markup Editor takes a slightly different approach, from JEdit and Java's document APIs, where each line has a checkpoint memoizing the current parse state. This reduces the amount of reparsing during editing to a minimum (in fact, we found we had to add delays to the display so that users would be aware that changes had taken place!) But there are the same optimizations: when a change is made, the rest of the lines on the screen are reparsed stopping if there is resynchronization, and an index is kept to the line off the screen to notify where there is some invalidity.

Generally, this works well, but there are a few pathological use cases. Consider a large XML document (say 100 meg) with no comments. At the beginning the user starts typing a comment, gets as far as the open delimiter, then wants to go to some point late in the document to confirm something. In this case, all the document between the comment open delimiter and the jumped-to location needs to be reparsed, but with no real benefit.
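Per-line checkpointing and its pathological case can be sketched in a few lines of Python. This is a toy under loud assumptions: the machine knows only two modes (data and comment) rather than anything like XML's real lexical states, and the function names are invented for the illustration; it is not Topologi's or nXML's code.

```python
def scan_line(state, line):
    """Advance a toy two-mode machine ('data' or 'comment') across one line."""
    i = 0
    while i < len(line):
        if state == "data" and line.startswith("<!--", i):
            state, i = "comment", i + 4
        elif state == "comment" and line.startswith("-->", i):
            state, i = "data", i + 3
        else:
            i += 1
    return state


def build_checkpoints(lines):
    """checkpoints[k] is the state at the start of line k; the last entry is the end state."""
    checkpoints = ["data"]
    for line in lines:
        checkpoints.append(scan_line(checkpoints[-1], line))
    return checkpoints


def reparse_from(lines, checkpoints, edited):
    """Re-scan from an edited line, stopping as soon as a stored checkpoint
    agrees with the new state. Returns the number of lines re-scanned."""
    state = checkpoints[edited]
    scanned = 0
    for k in range(edited, len(lines)):
        state = scan_line(state, lines[k])
        scanned += 1
        if checkpoints[k + 1] == state:
            break  # resynchronized: later checkpoints are still valid
        checkpoints[k + 1] = state
    return scanned
```

An ordinary edit inside a line resynchronizes after one line. Typing an unterminated comment opener on the first line never resynchronizes, so every following line is re-scanned, which is the 100 meg problem in miniature. (A real editor would re-scan lazily, only as far as the display needs.)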

What Chiu et al's work suggests is that you can have a kind of parallelized parse, where each potential delimiter is parsed in all possible states, and each possible transition noted. Think of it as if parsing produced a linked list which associated each delimiter with an array of parse roles and next parse states.

For example, if we had the text <xxx> then our parse list might have a node pointing to the first “<” and a pointer to the following node which points to the “>”. Then each node has an array with one entry per mode in the original DFA: what is < when you are in the prolog, what is it when you are inside a tag, what is it inside a CDATA section, what is it inside an attribute value, what is it inside data content, and so on. Then, for each of these entries there is an index to the next mode from the original DFA, which is used to index into the array of the next node of the parse list. (To keep things under control, XML parsers usually have a lookahead or peek function, which reduces the number of states: the same technique can be used here.) So in our example, the < in the prolog is a start-tag open delimiter, in which case the next delimiter > should be interpreted using the entry for being inside a start tag. If we started in a comment, then the < delimiter would be just data, and the next delimiter > would be interpreted using the inside-a-comment entry.

(You would presumably use singletons for these, to keep size under control.)
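Here is how I picture that parse list, as a Python sketch. It is my own toy reading of the idea, not code from the Chiu papers: there are only three modes (content, tag, comment) and four delimiters, the role names and the TABLES layout are invented, and real XML needs many more modes (prolog, CDATA, attribute values and so on).

```python
import re

# One shared table per delimiter kind (the "singletons"): mode -> (role, next mode).
# Toy simplification: only three modes, far fewer than a real XML DFA.
TABLES = {
    "<": {"content": ("tag-open", "tag"),
          "tag": ("error", "tag"),
          "comment": ("data", "comment")},
    ">": {"content": ("data", "content"),
          "tag": ("tag-close", "content"),
          "comment": ("data", "comment")},
    "<!--": {"content": ("comment-open", "comment"),
             "tag": ("error", "tag"),
             "comment": ("data", "comment")},
    "-->": {"content": ("data", "content"),
            "tag": ("tag-close", "content"),
            "comment": ("comment-close", "content")},
}

DELIMITER = re.compile(r"<!--|-->|<|>")


def build_parse_list(text):
    """Scan the text once, recording each potential delimiter with its all-modes table."""
    return [(m.start(), m.group(), TABLES[m.group()]) for m in DELIMITER.finditer(text)]


def resolve(parse_list, start_mode="content"):
    """Follow the recorded transitions from a given start mode, without touching the text."""
    mode = start_mode
    roles = []
    for offset, _delimiter, table in parse_list:
        role, mode = table[mode]
        roles.append((offset, role))
    return roles, mode
```

The point is that build_parse_list looks at the characters once, while resolve can be re-run from any starting mode by hopping from delimiter to delimiter.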

So in our previous example of adding a <!-- delimiter, reparsing from the top of the document to the new jumped-to location only involves following the parsed delimiters. Getting line-based check-pointing only involves checkpointing the possible transitions at the start of each line together with the transitions that they lead to at the end of the line. (Other optimizations are possible, in particular for resynchronizing.)

So this can reduce the blowout for long documents (though it and the old method are both linear, so neither suffers from combinatorial explosion). That may be a nice optimization for coloring editors, but they are not something that grabs people’s minds!

What may be more interesting is the idea that you can build an “all-possible parse” DOM on top of this parallel parse list. That has a bit more interest for editors which have, for example, on-the-fly well-formedness and validity feedback. Now I am quite aware that in most cases, a syntax-error prevention mechanism is a better UI technique (for many uses): for example, if you want to add a comment, you can only put it inside an element, or you have to select some range, but you can never type just an opening comment delimiter.

But for interactive editors, you already have to cope with non-well-formed transitions, so the DFA is already at the level of transition-complexity of the parallelized DFAs in Chiu’s work. What the parallel approach allows is things like saying that if the document ends up being non-well-formed, we want to trace back through the transitions from a well-formed result to the nearest point to the editing point, so that WF errors are only shown for the most reduced range possible. For example, if the document was <!-- unterminated comment <x>text</x>, then rather than showing all the document as non-well-formed with an indication that the end-of-document had been reached, it would backtrack to the last feasible well-formed point, which is the initial < in this case (taking the text !-- unterminated comment as data content when backtracking).

The difficulty of interactive editing of XML is that errors are identified where they are found, not necessarily where they are caused. These borrowed-from-parallel techniques perhaps could allow a different approach.


XRX is a new web development architecture that is a milestone in elegant simplicity. XRX stands for:

XForms on the client
REST interfaces
and XQuery on the server

Because XRX uses a single model for data (XML), it avoids the translation complexity of other architectures. The simplicity and elegance of XRX allow developers to focus on other value-added features of web application development and enable non-programmers to create a rich web interaction experience without the need to use procedural programming languages.

Our Request: An Open Mind

Before you begin, take a deep breath. Some of the thoughts expressed in this article may be unsettling, especially for those of you who, like me, have invested years in learning procedural programming languages like Java and JavaScript. Those skills may become obsolete if the trends predicted in this article come to pass.

This begins a series of articles describing how one can become an early adopter of this innovative technology. As organizations become more aware, the ability to quickly build rich-client web applications will spread beyond programmers to less technical audiences, thus empowering a new class of web application developers. As you proceed, we ask that you keep an open mind about how emerging technology will affect IT as well as business end-users.

Use Case for Real Estate Forms

For the last five years, developers have asked their peers, “Can we create rich web applications using only XML technologies?” In January 2007, Kurt Cagle encouraged me to use XForms with an open source native XML database/web-server called eXist. The eXist developers selected an innovative architecture where every XQuery is directly callable from a REST interface, which is exactly what XForms applications need to directly send and receive data to and from the database. Kurt’s suggestion came at a very opportune time. I was working on a project with real-estate transactions that had many associated complex real-estate forms. Traditional methods required approximately 40 inserts into separate tables within a relational database. The use of XForms and eXist resulted in one line of XQuery code:

store(collection, file, data)

That was it. Simple. Elegant.
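For readers who want to picture what sits behind that one line, here is a rough sketch of the kind of REST request a client could send to store a whole form as a single XML document. Everything specific in it is an assumption made for illustration: the base URL, the collection and file names, and the helper name are hypothetical, and the request is only built, not sent.

```python
import urllib.request


def build_store_request(base_url, collection, filename, xml_bytes):
    """Build (but do not send) an HTTP PUT that stores one XML document."""
    url = f"{base_url.rstrip('/')}/{collection.strip('/')}/{filename}"
    return urllib.request.Request(
        url,
        data=xml_bytes,
        method="PUT",
        headers={"Content-Type": "application/xml"},
    )


# Hypothetical server location and document; adjust to your own installation.
request = build_store_request(
    "http://localhost:8080/exist/rest/db",
    "forms",
    "listing-1.xml",
    b"<listing><address>12 Main St</address></listing>",
)
```

The point of the sketch is the shape of the exchange: one document, one request, no per-table decomposition in between.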

I was hooked. After spending over 20 years building applications with a variety of procedural languages, I had found my preferred architecture. I have seen the power of XForms and eXist and can’t conceive of returning to my procedural programming ways. It is my hope that I can convey to you my excitement about this architecture.

This is not the first time an attempt to use a non-translation architecture has been made. In the late ’90s tens of millions of dollars funded object-oriented database initiatives with the hope that objects on fat-clients or middle tiers could be stored and queried without translation. However, for all the promises that object-oriented databases made, they lacked standard interfaces and query languages. Further, IT strategists could not overcome their fear of proprietary system lock-in. As web-clients expanded, object-oriented databases soon became niche products for specific industries.

It is only in the last year that the combination of XForms, REST and XQuery has piqued the interest of application architects trying to optimize software development lifecycles. XRX promises to change not only the role of the software developer but also the role of Subject Matter Experts (SMEs) and Business Analysts.

Proof of Architecture: Firefox and eXist

XForms, REST, and XQuery have matured in an environment that is not dominated by a single vendor or product. XRX did not originate in the labs of Silicon Valley, which seems to favor traditional brute-force procedural languages like Java and JavaScript. Rather, it was championed by a group of international software developers led by Wolfgang Meier of Germany.

This collaboration of developers from IBM, Xerox, Novell and other organizations started by building an impressive XForms extension to Firefox. As people combined these disparate systems with REST interfaces, the overall architectural benefits began to emerge. One should not think that XRX is a mature technology. To date, there are no fully integrated development environments for the XRX model, and due to vendor and browser support issues, integrated development tools will be slow in coming. However, if you believe that superior application architecture will trump vendor lock-in strategies, you should closely examine XRX, even in its current form.

XRX represents the confluence of a mature declarative client architecture in XForms and the ability of persistence engines to easily store and query XML datasets. The term declarative identifies XForms as a set of XML elements that tell a client “what” the functionality of an interface is, and leave the “how” to a standardized software system. With XRX, a single line of XML can declare your desired functionality, and graphical tools can manipulate these blocks of code, resulting in tools for non-programmers.

The Translation Pain Chain

To understand the elegant simplicity of XRX, look at the problem of English language translation. Select any passage from any book and enter it into a translation program such as Google Translate. Perform a translation from English to Spanish and from Spanish to German. Then reverse the process by translating the German to Spanish and the Spanish back to the original English. The result will have little resemblance to the original text and will require manual cleanup.

Here is a round-trip, four-step translation of the Gettysburg Address using Google Translate, from English to Spanish to German to Spanish and back into English:

Score six fifty-six years ago our fathers came to this continent a new nation, conceived in liberty and dedicated to the idea that all men are created equal.

We are now in the midst of a great civil war, testing whether that nation or any nation so conceived and so dedicated, can long endure. We have mounted a major battlefield in this war. We have come to dedicate a portion of this area and as a final resting place for those who here gave their lives that that nation might live. It is entirely appropriate and proper that we should do this.

But in a broader sense, we can not dedicate-we can not consecrate, we can not on this sacred ground. The brave men, living and dead, have fought enshrined here, far above our poor power to add or subtract. The World little note nor long remember what we say here, but can never forget what they did here. It gives us life and that is not a case pending struggled here, have so far progressed so noble. Rather us to be here dedicated to the great task before us-that from these honored dead we take increased devotion to that cause was, during the last full devotion that we here highly resolve that these deaths are not in vain - that this nation under God, shall have a new birth of freedom and that government of the people by the people and for the people not perish from the earth.

Now compare this process with what web application developers are doing today in a three-tier stack using Java or .Net systems. Each time one writes a web application using standard HTML forms, the key-value pairs in the form must be converted to a set of middle tier objects using an object-oriented language. Once the objects are in memory, they are translated from the object type libraries to a set of tabular data streams that use the database type libraries, and then inserted in the correct order into one or more relational database tables. When a user wants to view or update the data, it must be gathered from all of the tables, put into objects, translated back to a set of attribute-value pairs, and displayed in a web form.
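The chain described above can be sketched in a few lines. This is only an illustration: the Listing class, the two table names and the helper functions are hypothetical stand-ins for a middle tier and a relational schema, not code from any real project.

```python
from dataclasses import dataclass


@dataclass
class Listing:
    """Hypothetical middle-tier object."""
    address: str
    price: int


def form_to_object(form):
    """Translation 1: form key-value strings -> typed object."""
    return Listing(address=form["address"], price=int(form["price"]))


def object_to_rows(obj, listing_id):
    """Translation 2: object -> rows spread over relational tables."""
    return {
        "listing": [(listing_id, obj.address)],
        "listing_price": [(listing_id, obj.price)],
    }


def rows_to_object(rows):
    """Translation 3: gather the rows from every table back into an object."""
    (_, address), = rows["listing"]
    (_, price), = rows["listing_price"]
    return Listing(address=address, price=price)


def object_to_form(obj):
    """Translation 4: object -> key-value strings for the web form."""
    return {"address": obj.address, "price": str(obj.price)}


form = {"address": "12 Main St", "price": "250000"}
round_trip = object_to_form(rows_to_object(object_to_rows(form_to_object(form), 1)))
```

Even in this two-field toy, four hand-written translations are needed to get the data out and back; each one is a place where types, names or ordering can drift.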

The Disruptive Change of Elegant Simplicity

If you have studied advanced math and physics, you are most likely familiar with Maxwell’s Equations. James Clerk Maxwell discovered four simple, elegant mathematical equations to describe the relationship between electricity and magnetism. Prior to Maxwell’s discovery, the fields of electricity and magnetism were considered separate, each using disparate complex mathematics. Maxwell demonstrated that, by looking at problems from a new perspective, many pages of equations can be represented in four simple and elegant equations that can be printed on a T-shirt in a science museum gift shop.

We believe that XRX will do for web development what Maxwell’s equations did for the study of electricity and magnetism. Briefly stated:

XForms+REST+XQuery = XRX = High ROI for Web Developers

XRX gives developers the luxury of using the same data selection language (XPath) on both the client and server. The same expressions can be used in your MVC bind on the client and in Schematron data validation rules on the server. This, however, is not the main motivation for migrating to XRX. Declarative techniques that use XML structures tend to accelerate the creation of domain-specific languages (DSLs). DSLs are easier to manage with forms and graphical user interfaces, which makes them more usable by SMEs and BAs. XRX is the front runner in the declarative revolution and among the forces empowering non-programmers. This is not to say that XRX will not have opposition. Vendors selling operating-system-specific client APIs or SQL products will resist XRX technologies for the foreseeable future. An entire community of AJAX developers has grown up around the lack of declarative technologies in our browsers. But in the long term these opponents will be required to compete against a simpler and superior architecture. Future articles will explore the hidden benefits of the XRX architecture and the challenges XRX presents to large-scale application developers.
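The point about reusing one path expression on both sides can be illustrated with a small stand-in. This sketch does not use XForms or Schematron; it uses Python’s ElementTree, which supports only a limited XPath subset, and the path, documents and function names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical path shared by the "client" binding and the "server" rule.
PRICE_PATH = "./listing/price"


def bind_value(root, path):
    """Client-side role: fetch the text a form control would be bound to."""
    node = root.find(path)
    return None if node is None else node.text


def failed_assertions(root, path):
    """Server-side role: list values at the same path that are not whole numbers."""
    return [n.text for n in root.findall(path)
            if not (n.text or "").strip().isdigit()]


good = ET.fromstring("<form><listing><price>250000</price></listing></form>")
bad = ET.fromstring("<form><listing><price>call us</price></listing></form>")
```

Here bind_value and failed_assertions both take PRICE_PATH unchanged, which is the property the paragraph above is describing: no second vocabulary is needed to name the same piece of data on the other tier.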

Summary

In the past, the ability to create rich-client web applications was limited to small groups of highly trained and motivated application developers proficient in procedural scripting languages. XRX and declarative programming will expand this community to include a much larger audience, and with it a shift in power will occur. However, in most organizations this will occur only if IT leadership is interested in empowering business units to solve technically challenging problems and create high-quality user experiences. XRX evangelists are needed to break down the walls between IT and the business. We hope this and future articles will be useful as a tool and as a guide for the faithful.

Edd Dumbill


A recent proclamation from the W3C’s Technical Architecture Group recommends against using XRIs as identifiers.

Hold up a second, XRIs?

Unless you’ve been paying extra-special attention you won’t have heard of these little critters much anyway. XRIs are “Extensible Resource Identifiers”, and are detailed in specifications that find their home at OASIS.

XRIs are essentially fancy-pants URIs that support extra features such as decentralized allocation, federation and delegation. And as such they are pretty complex machinery. Here’s a few example XRIs I pulled out of one of the specs:

xri://(tel:+1-201-555-0123)!1234
xri://@!a!b*(mailto:jd@example.com)*e/f
xri://@!a!b!(@!1!2!3)*e/f

I offer these without any real clue as to what they mean.

XRIs recently resurfaced on my agenda since reading the OpenID 2.0 specifications, whose service discovery documents include XML namespaces such as xri://$xrds and xri://$xrd*($v*2.0).

To the casual observer, it looks as though URIs have been given a work-over according to the old adage that nothing in computer science cannot be improved by adding a layer of indirection.

XRIs have risen to common attention again through being factored into the OpenID specifications, due to the desire to keep supporting inames.

The W3C’s point of view is that everything XRIs attempt to do can already be done with URIs (or their internationalized incarnation, IRIs).

I’m inclined to agree. There’s still more than enough confusion as to what a URI means, without adding more machinery. Perhaps in 5 years we might have attained widespread knowledge about using URIs — we may think REST has “made it”, but its dissemination isn’t that far and wide yet.

The web I know and love is a collection of small pieces, loosely joined. XRIs by contrast are heavy metal.

Call me old-fashioned, but the combination of URIs, DNS, and HTTP is all the web I can handle right now.

Rick Jelliffe


JFK’s line after the Bay of Pigs that “victory has a thousand fathers, but defeat is an orphan” has a less adversarial and more useful popular version, “success has a thousand fathers, failure is an orphan”, and that is what came to my mind when thinking about the Microsoft announcements on first-class support for ODF, direct involvement in the OASIS process, and extending the OSP license to ODF.

It is a great opportunity for hatchets to be buried, and I endorse everything that Patrick Durusau has written this week on it: “Not With A Bang, But With A Whimper”: Ending the document format war that never was. Microsoft adopts OpenDocument Format. (Also see Dr Durusau’s “Divorce, Trust and Microsoft”: Immediate steps towards building trust with Microsoft in the OpenDocument community.) Alex Brown also has an item, Microsoft Moves to Support ODF Standard, that I concur with too. (For background, Dr Durusau’s comments hearken back to Dr Brown’s OpenXML vs ODF in SC34: The Phoney War.)

Pandora

A year ago, I wrote a blog post called Fantasy Press Releases, in which I called for MS to support standards out-of-the-box, as many people did. It looks like we will get this before another year is out. Excellent, excellent. I don’t know why they don’t do the PDF support earlier though: surely if it is just a matter of packaging up the existing plug-ins there should be no problems? I cannot see any convincing reason not to support IS29500 Compatibility in the Service Pack 2 either. It would be good for everyone if they put out an early version of SP2 ASAP with the PDF support in it, under some kind of beta scheme. (One thing that I have learned about MS is that it does take about three years to go from plan to execution: this was of course a reason why support for ODF in Office 12 was unreasonable, it was at the wrong stage in their development cycle. [A report has said they are skipping from Office 12 to Office 14: pfah] )

Almost a year ago, in Remembering George the Animal Steele! Why the Open Source community should support an ISO Office Open XML standard (or, at least, not oppose it!) I wrote:

In my view, the drivers for ODF will continue unabated even after/if Open XML becomes a standard.

So, in my jaded view, ODF will not make Office go away, ISO ODF will not make Ecma Open XML go away, and ISO Open XML will not make ISO ODF go away. So I see no downside in Open XML becoming an ISO standard: it ropes Microsoft into a more open development process, it forces them to document their formats to a degree they have not been accustomed to (indeed, the most satisfactory aspect of the process at ISO has been the amount of attention and review that Open XML has been given), and it gives us in the standards movement the thing that we have been calling for for decades (see my blog last week that compared what Slashdotters were calling for in 2004 with the path that MS has taken).

I think this is what we are seeing. The people who saw OOXML as being some kind of defense against ODF (whether they were on the anti-MS side or the MS side) were wrong. The thing that makes MS support ODF is market demand: significant users saying “We want to use ODF”. It is this positive demand, not emotional anti-IS29500 rhetoric, that is prevailing.

To try to put it again, there is a supply for standards and a demand for standards: adding IS29500 to the standards that can be supplied does not alter the dynamics and drivers for the demand of ODF. In fact, in the long run it increases the demand, because the file format information is out in the open, relatively unencumbered, and there will be many governments who will take the line “We know that ODF 1.0 was not complete enough, and we know that ODF 1.1 is better, and that ODF 1.2 is looking very good: it is reasonable for us to anticipate that ODF 1.2 will be generally adequate for our requirements with the extra IS29500 input, and that we can start working towards ODF 1.2 by encouraging ODF 1.1 use.”

(On the issue of why MS will support ODF 1.1 not ISO ODF 1.0, the people to ask are the OASIS ODF TC: why didn’t they do their correct maintenance and submit ODF 1.1 to ISO? It would be paradoxical if MS participation energizes the ODF TC to treat ISO as something more than a rubber stamp!)

Goodbye to all that

The decision to broaden the OSP license to ODF (which is no surprise, this is something that participation in the OASIS group would require) does bring up an interesting point. During the OOXML discussions, there was frequent FUD that ODF was preferable to OOXML because OOXML may have IP problems: one problem I had with that was that surely if there were patents held by MS applying to techniques for implementing office applications, these would apply just as much to ODF implementations as OOXML implementations? In fact, more so, because the OSP applied to OOXML. The same issue is true vice versa: Sun’s equivalent to the OSP for its IP in OpenOffice applies to ODF but AFAIK not to OOXML implementations. (When you get to particular media formats that are outside the scope of the standards, there is a different argument, of course: the two shouldn’t be conflated.)

OASIS

Finally there is the issue of MS joining the OASIS ODF TC. I have argued fairly consistently about the benefits of having MS at the table, and I think we owe Patrick Durusau a really good amount of honour here, for demonstrating that it is possible for self-motivated technical experts, who are above the marketing fray and who open themselves to criticism by refusing to budge from their vision despite partisan attack, to have moved the OASIS ODF TC to a point where MS thinks there is some point in participating in it.

However, frankly, I have my doubts. While I welcome the move, my regular readers will know that I think partisan participation in standards bodies (i.e. where one mob actively blocks the technical requirements of another mob on the grounds “I don’t want to advantage my competitors”) is untenable for a standards body. That there is a significant danger that this attitude will prevail can be seen from the response of (my fanboys) the ODF Alliance’s Marino Marcich, with its talk of “governments will continue to adopt a ‘buyer beware’ attitude” and so on. It will be a challenge for companies who have made “open” a codeword for “anti-Microsoft” to figure out a new marketing position: but where you get “open” people running public conferences on openness under Chatham House secrecy rule and sending emails threatening legal consequences to committee experts if they dare not follow the corporate line, I don’t have high expectations. The word “openness” has become like the “war on terror”: don’t look at the details or what is actually being done too closely!

Will leopards who have made their livelihood pouncing on MS every time it admits or reveals a problem over the last year change their spots: will they learn to have a pragmatic and cooperative attitude where the outcome of a good standard is more important than scoring marketing points along the way? We shall see. I have my hopes and doubts.

What about the dangly bits?

My other reservation about MS’s announcement does have a resonant spike with something else in Marcich’s reported comment: “ODF Alliance managing director Marino Marcich said the proof of Microsoft’s commitment to openness would be whether ODF support is on a par with Open XML.”

We know that ODF 1.1 does not do everything that OOXML can support. So when it is the default format, what happens to the extras? There are a couple of possibilities. Office could just throw them away. That will frustrate users, who expect documents to open the same as when they close, and you would expect that users will be savvy enough to save in whichever format round-trips adequately. Office could embed foreign elements into the ODF: this is of course what ODF allows, but then it will freak out people who apply the “embrace and extend” hammer to every issue. Or it could add OOXML files into the ODF ZIP file, with dual formatting, along the lines I raised in Can a file be ODF and OOOXML at the same time?.

Let’s take a concrete example. As far as I know, ODF has no equivalent to OOXML’s Smart Art feature. Smart Art is one of those features which makes old-time SGML-ers say “At last, after 20 years, this is the kind of thing we have been talking about” and represents IMHO the most radical innovation in structured GUI design in the last 20 years (given that there have been no real advances in structured editors since 1988’s SoftQuad Author/Editor.) What Smart Art does is allow a list to be edited structurally in a simple nesting list editor, then styled into scores of different diagram types: Venn diagrams, circular lists, all sorts of things. In the old SGML days, this is the kind of thing we would do by transforming from an SGML structure into a troff pic script, for example, but with much slicker graphics.

I have found Smart Art really is a great advance for productivity and maintainability (not having to keep the graphics files in a separate format for the drawing application), and it is something for which I wish there were Open Source equivalents. Now if it is so good, why isn’t it on the ODF radar (and I trust readers will correct me if I have missed it!)? When saving out to a format that does not support SmartArt, Office currently converts it to a graphic, but tries to incorporate metadata or extra information to allow “rehydration” (which is MS’ buzzword for when you roundtrip data through a less-capable format with embedded extras which allow reconstruction of the original format.)

SmartArt is addictive. If it is lost by going through a different application or format that does not support it, or maintained as a graphic, you are liable to replace the graphic with another SmartArt graphic when you re-open the file, with steely annoyance.

One thing about the DIS29500 debates that observers have found perplexing has been the idea that a 6000 page standard has too much information. I don’t know that many people realized that in some cases (and I am not saying this is the only issue) it was a code for “Our product cannot match your feature list” which itself has several sub-issues (“We don’t want for our products to march to MS’ drum”, “We can only get interoperability by limiting features to a common subset”, “Our development procedures are too chaotic to have any goalposts other than adding one level of features to what we already have”, and so on.) SmartArt is definitely in this category.

So I would see Smart Art (under whatever guise) as a touchstone issue for seeing how well MS’ participation in the OASIS ODF TC takes them towards real convergence. I certainly expect that there are many issues such as the formula issue Dr Durusau raised earlier that will benefit quite fast. In fact, just as having IS29500 is helping ODF, I think MS participation in OASIS ODF will also help improve IS29500.

As I said, Smart Art is a really important advance in (QUASIWYG) editing of structured information and it shifts the text/graphic barrier in a really interesting and useful way: AFAIK it is not on the list for ODF 1.2, and it will be interesting to see whether the ODF process can handle innovations that come from the MS side. I think the deafening chorus from users especially governments throughout the DIS29500 discussions that IS29500 was acceptable only because it could help towards convergence is something that the ODF old blood may need to take stock of: Microsoft seems to be taking it seriously, yikes!

Do the right thing

Developers/standardizers on both sides need to be whacked on their heady heads with a mackerel until they learn that Not Invented Here is not acceptable. I think people accept that until now there have been reasonable excuses: that Office could not implement ODF before it existed, that Office could not use ODF as its default format until ODF had even minimal features and completeness, that OpenFormula could be syntactically incompatible with everyone else’s spreadsheet syntax, that ODF’s graphics could cherry pick SVG without really providing actual SVG compatibility (SVG Tiny please?), and so on. (Actually, I don’t mean NIH in the sense that there absolutely cannot be multiple syntaxes or technologies for the same thing if there is some historical reason or feature difference, I am primarily talking about rejecting features merely because of their provenance.) The state of the schemas for DIS 29500 mark 1 and ODF 1.0 just reveals their level of maturity and production-level adoption, and there is nothing wrong with being an adolescent. ODF and OOXML will grow up, and they need the partisan spirit and the NIH attitude to be kept under control to do so.

But it remains to be seen whether the OASIS ODF TC can sustain MS participation. I have written before that where there is direct participation in a standards body by rivals who take an uncooperative stance, it is difficult for the work to go ahead without it becoming a ganging-up exercise. (See Is our idea of open standards good enough? for more on this.) If MS proposes things to support Office better in ODF, and Sun and IBM don’t want to have to support those things, what happens? If it were the W3C, with direct member voting, you could expect MS to be rolled and eventually go away out of frustration/pique. The ISO model is one of direct membership for technical work, but indirect membership for final votes (i.e. it is the National Bodies which vote, not corporations or other stakeholders), and that creates a different dynamic that can produce a fairer result.

With all that said, Happy Father’s Day to the many people who have gotten us this far: I think it is positive news!

M. David Peterson


If You Don’t Need XML, Then Don’t Use It! - O’Reilly XML Blog

I think now would be a really good time for anyone who doesn’t “get” XPath to start getting it.

Kurt Cagle


We are, all of us, guilty of a common conceit - the idea that the web is a unique product of the late twentieth and early twenty-first century and those people who worked hard to create the Internet and the structures on it. Yet every so often we’re reminded that the idea of information management goes back a long way, with pioneers, some lauded and some, unfortunately, forgotten, who blazed the trail. Among the latter was a gentleman few in this day and age have heard of, yet who discovered both the power and problems of large information spaces nearly a century before the advent of the Web.

This article was forwarded to a colleague, and then on to me, but I think there is some remarkable insight to be gained from the story of Paul Otlet: The Forgotten Forefather of Information Architecture.

Kurt Cagle


As an editor, it’s all too easy to spend a lot of time reading (and responding to) blogs and articles on the web, and as the editor for xml.com, I find my time is disproportionately allocated to trying to correct misperceptions and even hostile snipings about XML. One such post came yesterday, in Jeff Atwood’s excellent Coding Horror blog, one that (my comments in this article notwithstanding) is a must read for anyone who programs for a living.

In The Angle Bracket Tax, Jeff lays out what he perceives as the flaws of XML, and why they detract from the language. I could attempt to refute him point by point, but in the main, he’s pretty much spot on with regards to his particular complaints - XML is a verbose, complex, oft-misused language that has been pushed into areas where it has no business belonging (more or less) and there are far better formats for doing everything from describing data structures to writing processes to sending messages. Yet he also misses a major, perhaps even a fundamental, point - XML is not a programming language.

Kurt Cagle


While a remarkable amount of both ink and electronic bandwidth have been expended upon the use of XML in the data realm, there are times where it is necessary to step back for a bit and look at what and where XML is being used today. One thing that becomes obvious when studying the XML landscape is that a significant amount of XML is still being used for purposes of describing narrative, for telling a story, advising people in the use of a product, structuring reports, and doing other things that focus more on documents than they do on data.

In some respects, this is not all that surprising. In general, when you’re dealing with data-centric applications, XML isn’t always the best choice for working with structured content, and indeed there are times where XML is perhaps the worst, most hideously inefficient mechanism for dealing with data. However, the use of XML as a means of writing and marking up narrative has become the standard means of encoding structured content in most organizations. That doesn’t mean that XML is dominant in most organizations for “unstructured” content - that distinction is still very much in favor of Microsoft Word, with XML occupying a considerably inferior position there - but for organizations that recognize the benefit of structured content, XML languages such as DITA and DocBook are very quickly becoming the standard for storing information.

I had a chance to see that principle at work this week at the DocTrain conference in Vancouver, British Columbia. Conference chairman Scott Abel (CEO of The Content Wrangler) graciously invited me to the conference and I had the chance to talk with a number of people working with technical documentation, online content creation and related material, and overall it opened up my eyes fairly dramatically to the hyper-accelerated world of content management a decade after the introduction of XML.

M. David Peterson


I had the honor of meeting Guido Sohne when I attended the Microsoft Technology Summit this last March. I feel both honored and privileged to suggest that we quickly became friends.

Guido was a good friend. Guido was a good person. I only wish I had more time to get to know him better.

He will be sorely missed.

Newsvine - Rest in Peace, Guido Sohne

Erik Wilde


I guess I am suffering from *OA overload. Trying to understand landscapes which are still developing and where religious and partisan statements happen frequently can be hard. Or maybe it’s impossible. I am seeing too many *OAs these days. This is probably a subset of what’s around:

  • SOA: Service Oriented Architecture
  • EOA: Event Oriented Architecture (sometimes also called EDA for Event Driven Architecture)
  • WOA: Web Oriented Architecture
  • ROA: Resource (or REST) Oriented Architecture
  • SynOA: Syndication Oriented Architecture

And of course all these *OAs are not independent. WOA = SOA + WWW + REST and similar equations can be found in a number of forums. Everybody is fighting for his/her favorite *OA. Why not build a crawler that collects all these statements, then feeds them into Matlab, and symbolically computes the true *OA landscape?

Of course, IT is a huge market and as usual it’s all about branding and owning a brand and hopping on the right brandwagon, and it is not a mystery where all of this comes from. Sometimes I just think it would be so nice if all these *OAs were actually well-defined methodologies and we could spend less time trying to understand what the latest *OA is all about and if and how it actually introduces something new, and spend more time actually comparing these well-defined methodologies and their strengths and weaknesses.

My personal bet is on SynOA, one of the less popular *OAs. Atom (and more so AtomPub) needs a little overhaul to be less focused on time, the Atom landscape needs a couple more features such as feed query features, but then Web-style syndication is ready to become the main abstraction for information dissemination in loosely coupled systems. I have no idea how SynOA relates to ROA or WOA, but to me it looks like the right thing to do if you are thinking about Web-style cooperation, which of course is not the only possible scenario where *OAs are needed.

Does anybody want to submit his/her favorite *OA equation so that we can figure the *OA landscape out automatically?

Kurt Cagle

Balisage is probably not a term on everyone’s tongue. Its original usage comes from the Navy - for a ship to travel “balisage” means that they are using special dimmed lights for navigation while in enemy territory, a term also known as Silent Running. It has, however, acquired a second meaning more appropriate to computer science in general and XML in particular. Balisage is the use of XML to enable document processing without “giving away” data to a proprietary application’s format. Balisage in this sense is somewhat edgy and subversive, striking at the boundaries where Open Source and Open Standards meet to form Open Data.

It’s perhaps appropriate then that the former Extreme XML conference, long known as the hardest core of XML moots, should take on the name of one of the central tenets of the Open Data movement. Balisage brings together some of the foremost minds in the areas of content management, semantics and ontology, information processing, application development and security to explore how best to build on the shape of this emerging technology. The shift in name also reflects a broader shift going on in the field, as people realize that while XML is core to most of what they are discussing, it is what is being done with XML (and with the harmonics of that activity) that is becoming most important, not the format itself.

Rick Jelliffe

I had thought that we all would be buckling down to productive work by now, but I see that there still is some attention being paid to the idea that there is a correlation between OOXML Yes votes and the corruption index. The most recent form of this (which I consider plays to racist views) replaces the corruption index with GDP (and then mentions that the corruption index is correlated to the GDP anyway, wink wink nudge nudge.)

So I thought readers might be interested in seeing a quick graph, in which the final national body votes are displayed against the per capita GDP. It is all a bit crude graphically: the horizontal axis gives 87 national bodies, and the vertical axis gives the $ per capita GDP from the World Bank figures. The green triangle gives the data points for each NB. On the 100,000 line above each green triangle is a little icon showing whether the national body voted Accept (blue square), Abstain (red diamond) or Reject (yellow triangle).

(Please ignore the icons down on the horizontal bar, that is just my general ineptness. Sorry it is PDF, I couldn’t figure out how to export the long diagram any other way.)

To quickly check the distribution, let’s see how the numbers are distributed when we divide into four quarters.

Quarter            Accept   Abstain   Reject
1 (highest GDP)        16         4        1
2                      15         6        2
3                      16         2        4
4                      15         4        3

So I would like to make the bold interpretation: national bodies did not vote along per capita GDP lines. A curve could be fitted to describe the relationship, but it has no explanatory power either.
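For anyone who wants to check the arithmetic, here is a rough sketch that turns the table into an "Accept" share per quarter. The counts are copied from the table; the function and variable names are just illustrative.

```python
# Vote counts copied from the table: (accept, abstain, reject) per GDP quarter.
# Quarter 1 is the highest per capita GDP, quarter 4 the lowest.
votes = {
    1: (16, 4, 1),
    2: (15, 6, 2),
    3: (16, 2, 4),
    4: (15, 4, 3),
}


def accept_share(counts):
    """Fraction of national bodies in a quarter that voted Accept."""
    accept, abstain, reject = counts
    return accept / (accept + abstain + reject)


# Round to two decimals for readability.
shares = {quarter: round(accept_share(counts), 2) for quarter, counts in votes.items()}

for quarter, share in shares.items():
    print(f"quarter {quarter}: accept share {share:.2f}")
```

The shares come out at roughly 0.76, 0.65, 0.73 and 0.68, which wander up and down rather than following GDP in one direction.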

What did they vote on? I have an equally daring view: all sorts of reasons, but mainly just the boring technical ones. Technical people discussed and judged. Standards people reviewed the technical people’s judgments and made their judgments.

But that is no fun. Let’s entertain the idea that voting was on some subterranean basis: what could it be? I’d say that the more that a national body came from an English-speaking “peripheral” country (e.g. not UK or USA) the more chance that it would not vote “accept”. And the more that a national body came from a socialist (or previously non-aligned) government (China, India, Cuba, Venezuela) the more chance it would vote “Reject”. And perhaps the more that a NB came from a member of Bush’s Coalition of the Willing, the less chance the NB would vote “accept”: it is a funny place for frustrations with having to follow the American lead to emerge, but I think the international mood for independence is very strong, and perhaps it affects the way people see even these standards issues.

However, the way to get standards that are less US dominated is for non-US people to participate in the various technical groups that are developing these standards: OASIS ODF TC including OpenFormula TC, Ecma TC45, ISO SC34, the ISO work on PDF, etc.

Rick Jelliffe

The comments period for the XML 1.0 fifth edition revision finished last Friday 16th May. I didn’t make a submission, in part because I felt I have had a good run in the past and my concerns are pretty well known and unchanged.

In XML 1.0, we went strongly against accepted wisdom which held 1) that the future was Unicode so you didn’t need to support existing encodings, 2) that the present was beautifully layered so one standard shouldn’t try to overcome the deficiencies in others, and 3) that we should all live in a Standards Fantasyland (on the map near Boogie Wonderland) where even if the world had gone one way that didn’t agree with what the existing standards said, we should follow the standard. A complete triumph of engineering (systematizing what works) over schematising (insisting on the right way to do things).

So for 1) the XML encoding header allows multiple encodings. Now, ten years later, we are finally reaching the stage where UTF-8 for web pages has exceeded ASCII and 8859/Windows encoded pages (Unicode wrangler Mark Davis, now with Google but for a long time with IBM, recently released some figures on this), so it may indeed be coming closer to the time when XML can be simplified so as to only support UTF-* encodings. I doubt there will be any demand for that, though, because multiple-encoding support is handy, free (everyone has large transcoder libraries) and doesn’t get in anyone’s way.

For 2) the example is that XML adopted what we now call IRIs for System identifiers in entities: it took IETF almost a decade to catch up and formalize this, surely a record for any standard. “Internet time” are you kidding? XML deliberately didn’t use the official URL syntax, but opted for the approach that it was better to have the software shield the user from the details of delimiting. I think there are very few advocates of XML simplification who would be prepared to go using vanilla URL syntax. But now 10 years later, entities are fast disappearing (mind you, just this week I had a seminar where there were surprisingly many questions on trying to use entities schemas) and the IRI spec is out. Namespaces and XLink should be using IRIs now, but there is an underlying problem that character-by-character comparison of IRIs is not robust unless they are canonicalized.

For 3) the example was again the XML header specifying the encoding, despite the information supposedly being available in the HTTP MIME headers. But the standards got it wrong: the person who creates a file is not the person who sets the HTTP MIME header, in effect. Now 10 years later the relative reduction in the number of encodings in widespread use does make encoding sniffing a much more workable approach, but still too fallible and time-wasting for mission critical data.
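To make the in-document label concrete, here is a minimal sketch in Python. The `declared_encoding` helper is my own illustration, not anything from the spec: it only handles ASCII-compatible encodings and ignores byte order marks and UTF-16, so treat it as a toy. The second half uses the standard library parser, which reads the declaration itself.

```python
import re
import xml.etree.ElementTree as ET
from typing import Optional

# Hypothetical helper: pull the encoding label out of the XML declaration.
# Assumes an ASCII-compatible encoding; a real processor must also look at
# byte order marks and the UTF-16/EBCDIC cases.
_DECL = re.compile(
    rb'^<\?xml\s[^>]*?encoding\s*=\s*["\']([A-Za-z][A-Za-z0-9._-]*)["\']'
)


def declared_encoding(data: bytes) -> Optional[str]:
    match = _DECL.match(data)
    return match.group(1).decode("ascii") if match else None


# The byte 0xE9 is "é" in ISO-8859-1 but is not valid UTF-8 on its own.
doc = b'<?xml version="1.0" encoding="ISO-8859-1"?><a>caf\xe9</a>'

print(declared_encoding(doc))

# The parser trusts the label carried inside the file, with no HTTP header
# involved, and decodes the content accordingly.
print(ET.fromstring(doc).text)
```

The point is that the label travels with the bytes, so it survives being copied to a file server, mailed, or served by someone who never touches the HTTP configuration.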

In XML 1.1, engineering won again. The decision was made to open up the naming rules from XML 1.0 to remove a dependency on versions of Unicode. However, because this meant in turn that XML 1.1 processors would not as reliably detect encoding errors (when you see “encoding error” think “database corruption” or “spurious data” or “spurious rejected documents”) the treatment of the C1 range of control characters (0x80-9F in ISO 8859-* encodings) was clarified to be non-well-formed (with special treatment for IBM’s NEL character). Control characters have no place in markup, as confirmed by Unicode Technical Reports and as emphasized recently by the OOXML BRM which required MS to change a couple of places where some control characters could be entered even though harmlessly delimited. I was startled during the OOXML debates how strongly this was held to be a vital, core part of the XML story from all sides.
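Why the C1 range is such a useful tripwire can be shown with a small sketch. The helper name is mine and this is not how any particular parser implements the check; it simply flags code points U+0080 to U+009F other than NEL (U+0085) in already-decoded text.

```python
NEL = 0x85  # IBM's "next line" character, which XML 1.1 treats specially


def c1_controls(text):
    """Return (index, code point) pairs for C1 controls other than NEL."""
    return [
        (i, ord(ch))
        for i, ch in enumerate(text)
        if 0x80 <= ord(ch) <= 0x9F and ord(ch) != NEL
    ]


# Windows-1252 curly quotes (bytes 0x93 and 0x94) mislabelled as ISO-8859-1:
# decoding succeeds, but the result contains C1 controls, not quotes.
mislabelled = b"\x93quoted\x94".decode("iso-8859-1")

print(c1_controls(mislabelled))
```

Finding C1 controls in a document is a strong hint that the encoding label is wrong, which is exactly the kind of error you would rather catch at parse time than discover in the database later.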

XML 1.1 was an enormous flopperoony, for the unsurprising reason that if you put version="1.1" then an XML 1.0 processor would spit the dummy. Some people have tried to claim that it failed because previously well-formed 1.0 documents that had C1 controls in them became non-WF. I have never seen such a document in the last decade, nor have I ever had any credible reports of one, and I can see no cases where putting C1 control characters in a document would be legitimate practice, so I think it is just bluffing: there has always been a wing of users of XML whose life would be easier if they could embed raw binary into XML and they deserve no sympathy or help.

So along comes XML 1.0 (fifth edition) as a draft. It has only a couple of changes of significance. The first is that it finally puts in place a rudimentary versioning system: E10 allows an XML 1.0 processor to parse an XML 1.x document on the understanding that it only reports things in terms of XML 1.0 rules and capabilities.

The second change then makes a mockery of the first. It introduces the lax naming rules from XML 1.1. Now such a change is not required for any reason, because XML 1.1 exists and could be used. So rather than go into a well-managed regime where documents are well-labelled, and XML minor versions chug along, XML 1.0 draft fifth edition just allows a new XML 1.0 parser to accept documents that all the other old XML 1.0 parsers will reject: and remember this is not because of previous bad practice being more consistently exposed, but because some innocent person has created a document with the new name characters and the XML 1.0 processors deployed in the last decade reject it.

Basically, the W3C XML WG is saying that if you get a document that breaks in this way, it is the receiver’s problem. The sender can say “But it is well-formed against the latest version of XML 1.0” and the XML WG washes its hands. It is the triumph of bad engineering practice, of doing what can be guaranteed to fail, of putting the responsibility on the wrong person. It will cause problems first for the nominal beneficiaries of these extra name characters (since they will be unreliable) and second for people using non-UTF-8 encodings who won’t get as many WF errors. So who will benefit: the makers of standards who will have less housekeeping. They are not an unworthy set of stakeholders.

The W3C XML WG needs to revise the goals of XML (in s 1.1) to accommodate these changes. In particular

6. XML documents should be human-legible and reasonably clear.

no longer holds. The new rules allow a blank check, so you could have a document entirely made with element and attribute names from code points which have never even been allocated a character by Unicode. With the fifth edition, the goal becomes

6. XML documents may be human-legible and reasonably clear.

And goal 5 needs changing

5. The number of optional features in XML is to be kept to the absolute minimum, ideally zero

because in effect support for these new naming characters becomes an optional feature: does your XML 1.0 parser support editions 1-4 or edition 5?
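The “blank check” can be made concrete with a rough sketch of the fifth edition name productions (NameStartChar and NameChar in section 2.3 of the draft). The ranges below are transcribed from that production as I understand it; the function names are mine, and U+0378 is used as an example of a code point that, as far as I know, Unicode has not assigned.

```python
import unicodedata

# NameStartChar ranges from XML 1.0 (fifth edition), section 2.3.
NAME_START_RANGES = [
    (0x3A, 0x3A), (0x41, 0x5A), (0x5F, 0x5F), (0x61, 0x7A),
    (0xC0, 0xD6), (0xD8, 0xF6), (0xF8, 0x2FF), (0x370, 0x37D),
    (0x37F, 0x1FFF), (0x200C, 0x200D), (0x2070, 0x218F),
    (0x2C00, 0x2FEF), (0x3001, 0xD7FF), (0xF900, 0xFDCF),
    (0xFDF0, 0xFFFD), (0x10000, 0xEFFFF),
]
# Extra characters allowed after the first: "-", ".", digits, and a few more.
NAME_EXTRA_RANGES = [
    (0x2D, 0x2E), (0x30, 0x39), (0xB7, 0xB7),
    (0x300, 0x36F), (0x203F, 0x2040),
]


def _in_ranges(ch, ranges):
    cp = ord(ch)
    return any(lo <= cp <= hi for lo, hi in ranges)


def is_name_5e(name):
    """True if `name` matches the fifth edition Name production."""
    if not name or not _in_ranges(name[0], NAME_START_RANGES):
        return False
    return all(
        _in_ranges(ch, NAME_START_RANGES) or _in_ranges(ch, NAME_EXTRA_RANGES)
        for ch in name[1:]
    )


odd_name = "\u0378"  # assumed unassigned; check the category printed below
print(is_name_5e(odd_name), unicodedata.category(odd_name))
```

The ranges are defined purely by code point, so nothing in the production asks whether Unicode has put a character there yet. A parser written to editions 1-4 would reject such a name; one written to edition 5 would accept it.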

I didn’t write a comment to the W3C XML WG because nothing has changed over the last 10 years that makes the decisions in XML 1.0 and in XML 1.1 inappropriate. I don’t have any new information that changes anything, and the XML WG certainly has produced none. All that is needed is for the fifth edition to fix up the minor versioning issue, and then we could all transition to 1.1 on an as-needs basis. This minor-versioning fix is already at least five years overdue: fixing it opens the door for XML 1.1 to have a snowflake’s hope and will allow a better transition to XML 1.2 potentially including some other overdue changes (building in xml:id, namespaces, etc.)

To summarize: XML 1.0 (fifth edition) is bad from a standardization and engineering viewpoint, betrays the goals of XML 1.0 which have served well for the last decade, and may hurt the end-users it is intended to support. It sets up a workable versioning mechanism then fails to use it for a significant change. It provides a good foundation for workable minor versioning, then ignores the foundation and builds on sand with its allowing of incompatible names.

I may be wrong, but it looks like a hack to me. However, fortunately it barely impacts anyone in the West, including me nowadays, so who cares? Interoperability, schminteroparibility! Unambiguous labelling of data formats, gedoudahere!

I am not trying to suggest the W3C XML WG is doing this because they prefer to sit by some giddy swimming pool in their floral-printed bathing costumes sipping umbrella-ed beverages, that they clear their desk by making incompatibility problems someone else’s problem, or any laziness! But I think they at least owe it to explain why they are doing a substantive minor version change as an edition change, failing to use the edition mechanism they are setting up at the same time which would allow people who needed this feature to access an already-existing minor version!

If you are interested in seeing how XForms can be used as a development environment, I would suggest you check out the new Orbeon XForms Builder:

http://www.orbeon.com/forms/builder

This is a great example of “Eating your own dogfood” where a development tool is used to build other development tools.

Orbeon is a great organization because their forms products run not just inside Firefox using the XForms extension but on any web browser. They do this by running on the web server and translating the specification of the XForms application into HTML and JavaScript. So you can start your development with Firefox and deploy when you need IE support.

We want to contrast the Orbeon approach with the traditional “build yet-another Eclipse extension”. With the Orbeon solution you don’t need to download any client or run custom installers, and all your forms can be stored on a central file server, versioned and shared. XForms can be your future IDE.

So my hat’s off to the guys at Orbeon for this great milestone. Hopefully we will have more XForms-tools-to-build-XForms in the near future…like a metadata registry (hint hint).

M. David Peterson

As I pointed out in my post regarding Norm Walsh leaving Sun to join Mark Logic,

And lastly, if the back channel rumor mills are correct, my guess is that this isn’t the last big-name XML luminary we’ll see moving over to Mark Logic. Time will tell… ;-) I’ll write a new entry related to this topic if/when it seems appropriate to do so.

It seems my “sources” were spot on,

Erik Wilde

The recently published HTML 5 draft does not change anything regarding HTML fragment identifiers. They are still limited to IDs only (with <a name=""> as alternative for backwards-compatibility). This means that any reference into an HTML page depends on how the page is using IDs.

But wouldn’t HTML 5 be a wonderful opportunity to bring a little bit more hypermedia back to the Web? XML had XLink and XPointer. Both were failures for a number of reasons, but I am still a big fan of trying to make the Web more hypermedia-like. So why not learn from XPointer and try to give HTML 5 a more practical and useful set of fragment identification methods than just IDs?

The whole fragment identification idea is a classic chicken and egg problem. Why use them when they’re not supported? Why support them when they’re not used? We had a lot of remarks like that when we worked on fragment identifiers for plain text files, but I still believe it is good to have mechanisms like that. Assume Firefox had a feature where you just moused over a paragraph, right-clicked, and then you could send an email with a pointer to that paragraph. If the receiver had Firefox, the browser would scroll to and highlight that paragraph. I am still convinced a lot of people would find such a feature pretty useful. And things would not break in another browser, users would simply not get the scroll/highlight behavior.
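To illustrate how limited the ID-only model is, here is a small sketch of fragment resolution as it works today. The class and function names are made up for this example, and the page content is invented; only the standard library is used.

```python
from html.parser import HTMLParser
from urllib.parse import urldefrag


class _TargetFinder(HTMLParser):
    """Find the first element whose id (or <a name>) equals the fragment."""

    def __init__(self, fragment):
        super().__init__()
        self.fragment = fragment
        self.found = None  # tag name of the matching element, if any

    def handle_starttag(self, tag, attrs):
        if self.found is not None:
            return
        attributes = dict(attrs)
        if attributes.get("id") == self.fragment:
            self.found = tag
        elif tag == "a" and attributes.get("name") == self.fragment:
            self.found = tag


def resolve_fragment(url, html_text):
    """Return the tag name the URL's fragment points at, or None."""
    _, fragment = urldefrag(url)
    if not fragment:
        return None
    finder = _TargetFinder(fragment)
    finder.feed(html_text)
    return finder.found


page = '<p id="intro">Hello</p><a name="old">x</a><p>A paragraph with no id</p>'
print(resolve_fragment("http://example.org/page.html#intro", page))
print(resolve_fragment("http://example.org/page.html#para3", page))
```

The third paragraph is unreachable: unless the page author thought to give it an ID, no fragment identifier can point at it. That is the gap a richer fragment scheme would fill.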

M. David Peterson

In a follow-up conversation to the post made by Dimitre Novatchev, Jesper Tverskov provides an excellent summary as to why XML and JSON are incompatible. He goes on to describe several functions in XSLT he feels would help alleviate at least some of the pain, but to keep things focused on understanding the incompatibilities between XML and JSON, I’ve left them out. When XSL-List archives today’s conversation, I’ll update this post with a link to his complete post.

However, before I provide his summary, I want to quickly provide some of my own thoughts on the XML -> JSON <- XML discussion which I provided in follow-up to a comment made by Robert Koberg,

On Sun, 18 May 2008 08:04:26 -0600, Robert Koberg wrote:

> Bottom line - there is no standardization. If you want to do xml2json
> and json2xml you pick your library and write for it.

I agree. Furthermore I believe the notion of converting XML and JSON to and from each other is the wrong approach altogether. Instead I believe the emphasis should be upon creating a standard format for JSON that can be referenced and queried by XPath in such a way that regardless of whether the incoming format is XML or JSON, the same XPath applied to both will result in the same generalized result set.

In fact, if I’m not mistaken, this is what Mike Champion and friends have been discussing over in the MSFT camp for several years now, which makes sense when you look at what they’re doing with LINQ-to-*.
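As a rough illustration of the idea, and not a proposal for what such a standard format should look like, here is a Python sketch. The JSON-to-tree mapping is one arbitrary convention I picked for the example (object keys become element names, arrays become repeated elements), and ElementTree only supports a small XPath subset.

```python
import json
import xml.etree.ElementTree as ET


def json_to_element(tag, value):
    """Map a parsed JSON value onto an element tree (illustrative convention)."""
    elem = ET.Element(tag)
    if isinstance(value, dict):
        for key, child in value.items():
            if isinstance(child, list):
                # An array under a key becomes repeated elements with that name.
                for item in child:
                    elem.append(json_to_element(key, item))
            else:
                elem.append(json_to_element(key, child))
    elif isinstance(value, list):
        # A bare array has no key to borrow a name from; "item" is a placeholder.
        for item in value:
            elem.append(json_to_element("item", item))
    elif value is not None:
        elem.text = str(value)
    return elem


xml_root = ET.fromstring(
    "<feed><entry><title>One</title></entry>"
    "<entry><title>Two</title></entry></feed>"
)
json_root = json_to_element(
    "feed", json.loads('{"entry": [{"title": "One"}, {"title": "Two"}]}')
)


def titles(root):
    # The same path expression is applied regardless of the source format.
    return [node.text for node in root.findall("./entry/title")]


print(titles(xml_root))
print(titles(json_root))
```

The sketch sidesteps the hard parts (keys that are not valid XML names, mixed content, attributes, type information), which is precisely why a shared, agreed-upon data model would be needed before this could be relied on.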

Jesper’s summary follows in-line below,

Jeni Tennison

Markup design fascinates me. What is it that makes one format easier to use than another? Why, even within that subset of markup that uses XML syntax, are some markup languages elegant and others unreadable? When is it best to use XML, when YAML, when a custom format?

Not all XML is created equal, and I think the biggest distinction between a good markup language and a bad one comes down to whether the XML was designed as a markup language or whether it’s a serialisation of a completely different model. Practically all the XML serialisations that I’ve seen of object-oriented models, or relational models, or graph models, have been dreadful as markup languages.

Michael C. Daconta

I’m a developer who frequently advises senior management on technical policy so I routinely keep one foot in techie-land and one foot in business-land. This position keeps me “squinting” at both and “smelling” both to ensure that technology is built right so it can be applied right.

By now, you all know that Jeff Atwood wrote a controversial article calling XML a “tax”. The very title of the article is spin and a one-sided developer’s perspective. Why one-sided? He is not considering the business value of XML. By trying to squeak out a tiny bit of time-saving and save a few characters per line he would be throwing away a robust standard that offers application-independent data, a simple syntax, and a wealth of tool support. From a business perspective - stability and reliability beat your minimal efficiencies. Wake up and stop thinking like a techie! We easily forget, and I have been guilty of this many times, that technology serves business and not the other way around. YAML? Are you kidding me? Computers are fast and disk storage is cheap so I really am not interested in such minor efficiencies. Using indentation to distinguish hierarchy in data? That is very risky! This is not a programming language like Python that is guaranteed to be run through a compiler. In the data you must have explicit declaration of syntax NOT implicit. Again, if you are thinking like a developer your brain says “hey, I can save those angle brackets and not have to type them” or “I can save that closing tag so I don’t ‘waste’ my precious time.” Please … those angle brackets and that closing tag make the data demarcations explicit. That is more important than a developer saving five keystrokes per line. So, I don’t know whether Jeff is a senior developer or not, but he needs some more exposure to the business side of the house. Developers re-breathing their own air is dangerous.
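One small illustration of what explicit demarcation buys you, sketched in Python with the standard library parser (the order data is invented for the example): when a document is cut off in transit, the missing closing tags make the damage detectable.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text):
    """True if the parser accepts the text as a complete XML document."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


complete = "<order><item>widget</item><qty>5</qty></order>"
truncated = complete[:-12]  # simulate a transfer that was cut short

print(is_well_formed(complete))
print(is_well_formed(truncated))
```

The truncated text is rejected outright rather than being silently read as a shorter record, which is the kind of reliability the business side is paying for.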

Before I go - kudos to Norm Walsh and Eric Larson for their excellent commentaries on this same subject. We have different perspectives on the issue but some of the points are similar.

The key point here is that stability and reliability (like the explicit syntax of angle brackets) over minor developer efficiencies wins every time in the minds of business executives. And they are the ones who pay the bills! Violate this principle and you, or your business, will lose. As an interesting corollary for developers as the customer, here is an interesting ZDNet article on how Microsoft may be losing the trust of developers due to lack of stability in its development platform. In this case the developers are the customers and they have to choose a development platform. If you keep switching positions because the new group of developers Microsoft hired wants to do it differently for “small efficiencies”, you lose the trust of your customer base who get tired of the tail wagging the dog. The perfect example of what not to do is the Ribbon interface in Office 2007 - I just witnessed a large government agency with many, many employees delay pushing out Office 2007 because Microsoft’s Office developers threw out years of people’s knowledge of the menus and layout in Office 2003 - and for what? No gain from a business perspective that’s for sure because creating Word documents or slide presentations is not rocket science. So if Word 2007 reduces worker productivity by any appreciable percentage that is a direct hit to the bottom line.

So, small developer efficiencies must not derail hard-won business value. Application-independent data with explicit syntax that everyone is familiar with (think HTML) is a gold-mine of business value. For any developer to be ambivalent about that is downright naive. Sorry if that offends some, but developers develop to achieve business value and not technical value. The days of “I saved 8 key strokes in my data”, like saving “8 bytes in 64k of memory” are over.

Eric Larson

My wife is a rather amazing woman. In addition to being an amazing song writer, performer and guitarist, she is insanely smart, with a heart to help. This summer she will be helping out Girls Rock Camp. Austin also has a great program called GirlStart that aims to help girls get excited about math and science. What is so great about programs like Girls Rock Camp and GirlStart is that they provide young girls a chance to feel empowered. What excites me most about these kinds of programs is the opportunity for perspective.

My wife and I play music together and have throughout our entire marriage. I can’t tell you how much I have learned from the experience. The biggest benefit is the constant exposure to a woman’s perspective. I’m convinced that men and women are simply different in how problems are approached. If you’ve ever worked on a problem for a while, only to talk to a coworker that provides an entirely different approach that is more elegant and better in all measurable ways, then you get an idea of what it is like to have a woman’s perspective. It is something totally different from your own, and the vast majority of the time it is enlightening.

I recently read an article on why girls should consider a career in IT. The author mentions programming as art and it made me think about my own experience creating something artistic with my wife. I’m confident that without the perspective my wife brings to the music, the songs and band as a whole would be nothing compared to what they are now. I can only imagine what the software landscape would look like if a woman’s perspective were more prevalent.

Assuming there is a wave of ladies infusing this male dominated software industry, we guys should get prepared. The first step is losing the derogatory comments. Anytime you speak of a person as though they are an object or less than a person, you are training yourself for failure because you are refusing to step up to the challenge of acknowledging equality. The next step is recognizing when you are allowing a person’s gender or differences to influence how you evaluate contributions. Becoming attentive to when your internal dialog includes derogatory comments can be scary. On a personal level, realizing that you have bigoted thoughts is a tough pill to swallow. But, if you are to move beyond your own insecurities, you must be in control of your views. Lastly, practice really listening. Learning to listen is a valuable asset, and what’s more, it places the attention on what is being said instead of who is saying it.

Hopefully in the near future more women will find their way into the software industry. I truly believe it could revolutionize how software is written in addition to helping change society for the better.

M. David Peterson

Dimitre Novatchev recently posted the following to XSL-List, something which I thought would be of both interest and benefit to those of you in Land-O-XML who care about these kinds of things. As such,

Kurt Cagle

O’Reilly Video

Building on its influential predecessor chicagocrime.org, EveryBlock takes the local-data mashup to new levels. Founder and hacker Adrian Holovaty talks about the philosophy and technology behind EveryBlock, the untapped potential of address-specific news, open data, and life after Google Maps.



Kurt Cagle

O’Reilly Video

Geoff Zeiss (Autodesk, Inc.)–Convergence is about breaking down islands of information based on traditional disciplines or professional categories or those created by the traditional organization of the architecture, engineering, construction, transportation, and utility and telecommunications industries. The convergence of architectural and engineering design, location, and 3D visualization and simulation technologies developed is resulting in a framework for interoperability across the lifecycle of building and infrastructure including design, construction, and operation and maintenance.

The business drivers for this transformative technology advance are productivity and efficiency in the construction and facilities management industry, and improving the performance of facilities over their full life-cycle. The goal is seamless access to architectural, engineering design, and geospatial data inside, outside, and under a facility.



Kurt Cagle

O’Reilly Video

Paul Torrens (Arizona State University)–Ambient crowds are the new distributed computing platform. Smart mobs are fashioning new architectures for social networking. Armed with cell phones and mobile gaming devices, they are the new business model for location-based services. Seditious crowds are creating havoc in urban theaters of war and at global economic forums. Crowds of shoppers, endowed with smart chip credit cards and RFID tagged merchandise are trailed by long-lasting data shadows that follow them ubiquitously.

Embedded in urban infrastructure and in the very products we consume, new technologies are emerging to enable cities to think about—and process—the people that pulse through them, with a burgeoning code-space being developed to capture the actions and interactions of individuals within large dynamic crowds. This presentation will focus on our recent research work in developing models of crowd behavior and their application to theory-building and scenario evaluation in the contexts just described.

We have developed a reusable modeling platform for constructing large simulations of individual and collective behavior in dense urban environments. The simulations are developed with individual agents, equipped with geospatial AI that allows them to perceive and react to their evolving surroundings with an incredible level of behavioral realism. These agents are also capable of social and antisocial interactions. The simulation architecture is coupled to Geographic Information Systems, allowing for a suite of geospatial analytics and data-mining to be performed, across a wide array of scenarios. Moreover, the models have been developed as realistic 4D immersive environments with unprecedented levels of graphical realism.

From O’Reilly Where 2.0, San Jose, CA, Tuesday, May 29th, 2007.



Kurt Cagle

O’Reilly Video

James Greiner, Senior Vice President and General Manager, MapQuest, Inc. In preparation for Where 2.0, MapQuest conducted an ethnography study. The massive survey polled users on what they want from location-based services, mapping sites, and in mobile. It should be a very informative look into the desires of the people (many) our apps are made for. From O’Reilly Where 2.0, San Jose, CA, Tuesday, May 29th, 2007.

Kurt Cagle

O’Reilly Video

Since Google first presented a snapshot of the geoweb at last year’s Where 2.0, it has considerably evolved: more Geo data is published on the web, KML was accepted as an OGC standard and is adopted by a growing number of tools. Join John Hanke, Director of Google Earth & Maps to hear the latest on the evolution of the Geoweb and Google’s effort to organize it and make it universally accessible and useful. In this video from the O’Reilly 2008 Where 2.0 conference, John Hanke demonstrates the latest in Google geo development with Jack Dangermond of ESRI.

Eric Larson

Today I took some time to quickly scan through a backlog in my feed reader. There were a good number of anti-XML articles cropping up. This got me thinking. What do you think of when I say “XML”? I personally associate XML as a baseline technology in a large set of tools used for describing data. For example, I think of Atom and XHTML within the scope of RESTful web services. Next up would be document formats such as DITA and DocBook. This starts me thinking about linking data and technologies such as XInclude and XPointer. As I reflect on where my mind wanders when thinking about XML, themes of linked data and document resources quickly rise to the top. What does not come to mind is WSDL, XML Schema, object serialization, configuration files, or SOAP.

What do you think of when I say XML? In what kinds of contexts does XML succeed, and where does it fail?

M. David Peterson


Update: len trumps his own QOTD with these two gems. I’ll let you decide which you feel is funnier/more accurate, cuz’ I can’t decide,

I try to tell him that no element is *really* non-terminating but he gets wrapped up in the language abstraction and forgets to breathe.

.. or ..

We made this a lot harder than it has to be in the name of “just in case.”

[Original Post]
Is it really that taxing… - O’Reilly XML Blog

Most of the time when I find a programmer struggling with XML, they are a relational database programmer or an object-oriented programmer, or both. We should have lined these guys up against the wall at the beginning of the revolution, really.

NOTE: I met Jeff Atwood for the first time a month or two back. Nice guy. Obviously an OO-trained programmer. But a nice guy, none-the-less. ;-)

M. David Peterson


Brain.Save() - We are pleased to bring you new features in .NET 3.5 SP1

Syndication OM for the Atom Publishing Protocol. We added strongly-typed OM for all of the constructs defined in the Atom Publishing Protocol specification (like ServiceDocument and Workspaces) and put them in the System.ServiceModel.Syndication namespace.

M. David Peterson


So as Jeff Barr recently pointed out over on the Amazon Web Services blog,

Amazon Web Services Blog: Redundant Disk Storage Across Multiple EC2

XML Hacker M. David Peterson has put together a really interesting article.

As part of his work at 3rd and Urban, he has implemented redundant, fault-tolerant, read-write disk storage on Amazon EC2 using a number of open source tools and applications including LVM, DRBD, NFS, Heartbeat, and VTUN.

Mark notes that "the primary focus of this paper is to present both a detailed overview as well as a working code base that will enable you to begin designing, building, testing, and deploying your EC2-based applications using a generalized persistent storage foundation, doing so today in both lieu of and in preparation for release of Amazon Web Services offering in this same space."

The article provides complete implementation details and links to source code for the scripts that Mark developed.

You can read the article, and you can also follow progress via the discussion group.

– Jeff;

Firstly, and most importantly, as pointed out in the first portion of this article,

Eric Larson


Jeff Atwood mentions the Angle Bracket Tax and not surprisingly, I don’t agree. XML can be difficult and painful at times, but I think the reasons are not entirely technical. Recently, I had the opportunity to work with XML in Java and it was definitely “taxing”. Even though the process was frustrating, it really had little to do with XML. The biggest pain was actually Java.

After working in Python/Ruby for a good portion of time, declaring types, long CamelCase variable/class names and overly complex Object Oriented patterns feel painful. I’m not much of a Java hacker, so most problems could be chalked up to my lack of experience in the language. While a better understanding of Java would have been helpful, in learning XML and C# I had a very similar (and frustrating) experience, which makes me believe it is not necessarily the XML. After working with XML in Python (and Ruby to a lesser extent), it is clear that the real problem is not an “angle bracket” tax. In fact, I would argue that I got a “tax return” by understanding concepts like DOM, which I became exposed to through JavaScript, rather than Python.

It is fine to think XML is “hard” and as I said before, it can be frustrating. But to consider XML the source of frustration is probably not to consider all the factors. Jeff’s blog generally focuses on programming and human factors, and this is a great example of a human factor. If you expect static typing and wasteful OO patterns, and require IDE support, then you have accepted the struggles as normal. When you then are forced to deal with XML, the common links to OO ideals and patterns don’t match, leading you to the conclusion that XML is hard. The people who breeze through XML and enjoy the technology are simply those who have invested a little time in learning basic tools. Of course, these people also have set their own expectations regarding XML, but therein lies the secret.

Programmers are supposed to be logical people who make decisions using reason. The reality is programmers are people with irrational feelings and emotions that impact decision making. There is a good chance your criticisms of XML are rooted in the thousands of blog posts saying XML sucks. There is an even better chance you promote tools such as YAML without ever actually having used them heavily. Opinions obviously have a place in software, but so do logic and reason. The next time you deal with XML, take a minute and try to learn something new before complaining. When I recently worked with Java, I went ahead and put aside my frustrations for a bit to get a good handle on Ant. Lo and behold, it was interesting and I learned something new that helped improve my perspective on Java. If you take the time to learn XML basics with an objective mindset, you might still say it sucks. On the other hand, you might realize it is not so bad.

Rick Jelliffe


I’ve just caught up with this document from W3C which fills in a big gap in English-language technical material. Japanese typesetting technology has been very influential in the other Ideographic countries, and they share many commonalities (e.g. Japanese ruby text and Taiwanese bopomofo.) There is a Japanese standard JIS X 4051, but it has no translation available: though parts of it, usually called the kinsoku rules, are floating around in material from vendors, particularly Adobe’s Ken Lunde and some MS material.

By and large, Chinese and Korean have different details (e.g. different characters) but the same analysis applies.

One term that the W3C draft uses but does not define is kihonhanmen; readers getting held up by this could substitute underlying grid (or text block or even constant width frame) for this.


Hello all.

My name is Griffin Caprio and I’m a new blogger here on xml.com. Apologies are in order to Kurt, my fearless editor, since this is my first post and I actually came into the O’Reilly fold back in March. I am very excited to blog here, as the O’Reilly xml community is at the forefront of XML evangelism.

I’ll be blogging mostly about semantic technologies. However, this will not be just another Semantic Web blog. Most of the talk surrounding semantics is centered on the Semantic Web and its potential to usher in a new era of interactive / integrated web applications. Personally, I’m more interested in using semantic technologies to tackle the oceans of data companies are amassing internally. Some people call that Business Intelligence (BI) and some call it Data Mining (DM). I’ll try and stay away from those types of product categorizations and concentrate on the pragmatic application of semantics.

As is standard practice here at xml.com, please feel free to leave me comments and / or suggestions if there is something you think I should know. We welcome all feedback!

Michael C. Daconta


A few months ago I wrote an article for Government Computer News on the battle over Rich Internet Applications. At that time, I thought it was odd that the other major contenders, Silverlight and Flex, use XML and JavaFX does not. I wonder, if in the rush to push something out the door, Sun forgot about separation of concerns and the benefits of skill specialization to quality production. I see the trend towards declarative User Interfaces as a good thing - the proper domain of graphic designers - so, why did Sun seemingly take a step backwards? If you are a JavaFX guru, I am interested in understanding this. I found a good simple way to compare these techniques was the bubblemark site which programs a simple animation in all three (and many more) variants. What are your thoughts on XAML versus MXML versus JavaFX? In looking at the bubblemark application I still feel that MXML is the cleanest. This definitely will require some more looking in to… in the meantime, see you in the trenches. - Mike

Rick Jelliffe


A couple of years back I had a very surprising experience with a junior programmer, who had just joined our team. I had asked him to work on some code until there were no more JUnit errors. A few hours later he proudly showed there were no errors, and explained it was easier than he expected because he just commented out the tests! Then he paused, regarded my startled expression for a few seconds and quickly blushed deeply. Doh!

Poor old Alex Brown has been in and out of favour with the extreme anti-OOXML-ists (perhaps I should use a new acronym, such as EAOOXMLista, to say for the hundred thousandth time that not every anti-OOXML person is extreme?) over the last few weeks. First, he didn’t somehow (exactly how?) stop the DIS29500 BRM from doing its job. So he is bad. Then he works with SC34 to organize getting more improvements made to OOXML and ODF. Again, bad. Then he says “The question behind the question, for a lot of the current OOXML debate, seems to be: can Microsoft really be trusted to behave? We shall see”, which earned him the quote of the day on ConsortiumInfo. So presumably he is good.

Then he does a smoke test of validation conformance of Office and the various OOXMLs, and reports the validation errors he finds. So he is deemed good. Now he has validated various versions of Open Office and ODF and reported the validation errors he found. And that makes him the devil again.

Unless there is some tussle between evil twins going on, I’d like to suggest that Alex is just trying to faithfully fulfill his normal committee responsibilities, which include checking through standards. Alex has long been involved in Data Quality issues for publishing professionally, and has been very involved in the development of ISO DSDL at SC34 (which includes RELAX NG and Schematron.)

So what is it that Alex found about ODF that has caused the fuss? It is quite technical, but the gist is this, as I understand it: if a schema is not itself valid, no documents can be formally valid against it.

(When the invalid part of the schema is only detected at run-time when exercised by a particular instance document structure, and the document does not contain such a triggering instance, the implementation may report that the document is valid, but that is a false positive. And you may look at the schema and say “I know what was intended, and the false positive is in fact correct against the intent of the schema” but this is a lucky accident, i.e. hacking, not formal validity.)

The particular issue is quite interesting because it relates to an area in a W3C Schema standard where the user requirements for XSD could not be supported by the facet model used, and where XSD fudges it. OASIS RELAX NG also, to an extent, inherited this problem.

The problem is with attributes of type ID in the ODF schema. Alex Brown has provided a very simple fix, which I hope gets adopted into ODF 1.2.

The problem with IDs is this. XML inherits ID type attributes from SGML. They have various constraints, which include that they are XML names (tokens), that their values are unique within the document, and that an element can only have one ID attribute.
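To make those three constraints concrete, here is a minimal Python sketch of a run-time check. It is only an illustration, not how any real validator works: the function name check_ids and the idea of passing in a set of attribute names to treat as ID-typed are assumptions made up for this example (a real validator would learn which attributes are IDs from the DTD or schema), and the name check is a simplified ASCII-only approximation of the XML name rules.

```python
import re
import xml.etree.ElementTree as ET

# Simplified, ASCII-only approximation of an XML NCName.
# The real XML Name production allows many more characters.
NCNAME = re.compile(r"[A-Za-z_][A-Za-z0-9._-]*")


def check_ids(xml_text, id_attrs):
    """Return a list of violations of the three ID constraints.

    id_attrs is a hypothetical set of attribute names that we treat
    as ID-typed for the purposes of this sketch.
    """
    problems = []
    seen = set()
    for elem in ET.fromstring(xml_text).iter():
        ids = [(name, value) for name, value in elem.attrib.items()
               if name in id_attrs]
        # Constraint: an element can only have one ID attribute.
        if len(ids) > 1:
            problems.append(f"<{elem.tag}> has {len(ids)} ID attributes")
        for name, value in ids:
            # Constraint: the value must be an XML name.
            if not NCNAME.fullmatch(value):
                problems.append(f"{name}='{value}' is not a valid name")
            # Constraint: the value must be unique within the document.
            if value in seen:
                problems.append(f"duplicate ID value '{value}'")
            seen.add(value)
    return problems


if __name__ == "__main__":
    doc = '<doc><p id="a1"/><p id="a1"/><p id="b2" key="c3"/></doc>'
    for problem in check_ids(doc, {"id", "key"}):
        print(problem)
```

Note that none of these checks can be decided by looking at one attribute value in isolation: each needs the surrounding element or the whole document, which is exactly why they sit awkwardly with a context-free datatype model.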

When XSD came to make its datatyping, the XSD WG made a nice theoretical distinction between lexical space and value space: these are entirely context-free distinctions, which relate only to the atomic values of the individual pieces of text. XSD also provided another mechanism to declare that certain data values should be unique. But the constraints that an ID attribute value must be document-unique and that an element may only have a single ID attribute are left out in the cold by this model, and are not directly in the XSD specs. Blink and you’ll miss them: there is a little handwaving going on, but it is a good pragmatic workaround. The spec references the XML specification, and that these non-facet constraints on IDs are intended is made explicit in the (non-normative) Primer which forms Part 0 of the spec:

the scope of an ID is fixed to be the whole document.

and, more importantly, the XSD Structures Spec Part 1 specifies the ID/IDREF table as part of the PSVI.

ODF uses RELAX NG, and ISO RELAX NG specifically allows (s. 9.3.8 data and value pattern) datatyping to validate using more than just the atomic string:

services may make use of the context of a string. For example, a datatype representing a QName would use the namespace map.

(This seems to be a difference from the original OASIS RELAX NG, which AFAICS started with a more atomic view of datatypes.)

So when an ODF schema says an attribute is an ID type, we expect that for full validation it will have all the XSD/XML semantics, and that for full validation of the schema, conflicts would be pointed out. If you don’t want these semantics, you just use the base type xs:NCName, which has the lexical and value space but adds none of the other constraints.

So we come to the concrete problem that a couple of content models allow wildcarded attributes in any namespace, and many of the namespaces in question have ID attributes. So the argument (which you can follow on Alex Brown’s and Rob Weir’s blogs) is what class of error this should be: all the implementations of RELAX NG and Alex say this makes the schema invalid (in ISO Schematron I specifically included definitions for a “good schema” and a “correct schema” as well as a “valid schema” in order to make these nuances clearer); Rob thinks it shouldn’t be an error (“thinks” is too weak a term) and seems to think it should only be an error if an element actually has two ID attributes. I think this is also a legitimate possible approach that the standards could take (but they don’t).

Alex has found the fix for ODF, but I think RELAX NG and XSD could well have some extra clarification text (non-normative) to stop basic mistakes. If a schema, whether DTD, XSD or RELAX NG, says something is an ID, it has all the semantics of an XML ID.

So what was the point about the programmer turning off tests to make some code fault-free? That is Rob Weir’s suggestion on how to make the ODF documents valid: turn off ID testing! Brilliant! So what is the point of ODF 1.0 making these things IDs in the first place if that was not the intended semantics?

I suspect this is actually another example of where it would have been more satisfactory all around to have these constraints in Schematron. For example, do not use the ID type but xs:NCName (this is not real code, but to give the idea…you’d use a regex and this assumes a consistent naming convention in ODF and sub-vocabularies wrt attribute naming):

<sch:rule context="whatever">
   <sch:report role="duplicate-ids" test="count(@*[ends-with(name(), 'id')]) &gt; 1">
    There should not be more than one attribute whose name ends in id.
   </sch:report>
</sch:rule>

This seems to give the intended constraint against duplication, but makes it a run-time instance-driven problem, not a static schema error. Another assertion would handle uniqueness.

So my take: Alex is right that the schema has a flaw, and right to point it out and offer a fix; Rob is right that it is unnecessary for this to be a static error (which is the positive point I would infer from his over-reacting blog), but wrong that the way to fix it is to turn off validating that constraint.

Michael C. Daconta


The IBM Information Server has a business glossary manager that I am implementing for several clients. Some of those clients have existing data dictionaries and glossaries that will need to be imported into the product. The IBM information server has an XML format to allow you to import/export business glossaries.

There is a lot to talk about in examining this format. There is the good, the bad and the ugly in this format. Before we begin our dissection there are two contextual topics in need of some discussion. First is examining the goals of the format and second is determining whether those goals could have been achieved using existing formats.

At a high level, the format has three main goals which correspond to its three main elements: represent terms and their definitions (via the term element), categorize terms (via the category element) and add custom attributes to categories or terms (via the attribute element). Except for the metadata extension mechanism (custom attributes), this is a simple way to create and organize a dictionary in XML. When examining the schema or the example of the format it is clear that it is far from a complete standard. For example, the only available data type for custom attributes is String. So, it is clear that this format will evolve. A bigger question is - should it? And should it even have been created in the first place?

There are quite a few formats for capturing glossaries, dictionaries and thesauri in XML. A colleague of mine, Ken Sall, examined this for the government a few years back. The W3C has SKOS, IBM has subject classification in DITA (though DITA is much broader than glossaries), and XML topic maps can also serve this purpose.

So, although we will continue to explore the details of this format and even conversion of some of the others mentioned into this format, what are your thoughts on it?

Until next time, see you in the trenches… - Mike

Rick Jelliffe


Now that ODF and OOXML are both set to be on the ISO/IEC books, it is useful to consider what the next productive steps are.

For genuine ODF Supporters who are concerned that ODF has languished a little out of the limelight during 2007, there are a lot of useful things to be done. You don’t even need to join the OASIS groups or your local National Body or SC34 to begin.

I suggest here are some things that will help the ODF effort coming into ODF 1.2.

  • Lobby the component standards groups, notably the W3C, to have official RELAX NG schemas available. Without schemas, there is no validation, and without validation there is no conformance testing, and without conformance testing there is no interoperability. (Or, at least, it becomes significantly more difficult in each case.) I believe SMIL is an example of this. If possible, actually have the schema ready and waiting, to make it easy: you will feel a greater sense of achievement having a part of the standard of which you can say “I contributed that”!
  • In a similar vein, lobby the component standards groups to harmonize their standards with ODF. SVG is the one in particular that seems needed. It would be great if not only would the W3C SVG group add the few missing attributes and so on, but perhaps also make a profile of SVG to match ODF better (this is not a concrete suggestion, just something whose usefulness could be checked by someone wanting to get involved).
  • Speaking of SVG, some open source XSLT transforms for going from ODF’s “SVG” to standard SVG would be good.
  • Join in the KOffice and Open Office efforts, especially in areas that affect you or for which you have expertise. Maths is a good area, for example.
  • Check through the parts of the IS29500 spec that are of interest, when it comes out, and figure out whether they are decorations (which can be handled merely by foreign elements in ODF), with the current ODF behaviour an adequate fallback, or features that are currently unsupported in ODF and that will need attention. Share your results with the SC34 committee and with the OASIS and ECMA committees.
  • Patrick Durusau has suggested that it would be really helpful to check how well some of the detailed descriptions of formula functions in IS29500 accord with the reality of Office as currently implemented. This would both help IS29500 get improved and provide better information for IS26300.
  • Join in a conformance testing group: make up test documents. Ideally a test library will have some tests that test one thing per document, which makes a very large number of documents, and others that test cascaded errors. So I wonder if algorithmically generating test documents from schemas is viable.
  • Get your National Body to submit more Defect Reports, so that SC34 does not lose impetus. Remember that when something becomes a standard, maintenance becomes a community job, not “their” job.

Of course, if you were not interested in being constructive but in trying to frustrate yourself there are other things you could do. You could, for example, mount a court action asking for something that you know to be impossible (e.g. withdraw a vote on a ballot that has been closed), with reasons that you know won’t stand up (e.g. that a committee of long-term experts changed its vote after being satisfied that there had been enough changes to proceed with a standard), on odd legal grounds (if the voluntary standards group is not subject to administrative law, not being under the government), and where you know that your standards body’s final vote is a credible one (because it was shared by more than an absolute majority of other National Bodies around the world.) Why would someone do that, my readers might be asking themselves? Embarrassment? Sour grapes? Vindictiveness? Marketing?

I certainly hope that national standards bodies will stand by their committee members and provide financial support during court cases, for the time and expenses incurred when private individuals are dragged away from their work. This kind of intimidation, to use courts and the threat of legal action to force a result after you have lost the technical argument, should be seen for what it is.

Now please, I am not saying that I have confidence in every NB’s vote. While I believe that every NB acted intra vires and therefore legal overturns are futile, I was not pleased with the Norwegian national vote (for just the same reason as I was not pleased that several NBs voted for ODF bypassing their technical committees too) and the Brazilian vote (after an IBM representative blogged that he had convinced them that if they had *any* outstanding technical problems they should vote no: if that is true, the NB secretariat should have picked this up in committee and told its members that perfection is not a requirement for a standard). But I don’t see them as acting outside their powers.

And, most importantly, it is a different class of problem to have a standard accepted than to have it blocked.

National and international standards bodies are highly aware that their activities and importance are tolerated and encouraged only because they create markets. The minute a national or international standards effort becomes a servant of some clique or cartel, to the exclusion of others, it loses its fundamental justification. (I say “effort” because a body may have thousands of efforts on the boil at any time.) For standards bodies, exclusive behaviour is a mortal sin; in comparison, too much inclusiveness (i.e. by having multiple standards where in a perfect world we could imagine having only one) is only a mild (and bearable) fault. (And, indeed, in most cases I consider support of plurality, to allow the market to choose, a positive virtue.)

Eric Larson


When I think of REST, the biggest benefit always revolves around using XML dynamically. It is a matter of reducing the contracts to their smallest possible state. This is why technologies like Atom and RSS have flourished: they are totally "abusable". Dynamic XML involves taking this "abusability principle" and applying it to XML. My previous comments on Schematron reflect this mindset. Using dynamically typed languages also helps. Thinking in terms of resources and representations also pushes flexibility over functions and objects.

Dynamic XML involves a subtle constraint of robustness. A huge reason this concept works is because it sets developer expectations. If you expect that you are going to get a large variety of XML, it is much easier to disregard everything as long as there is a value at "//xyz:my-value". This is almost exactly the same practice as a function that accepts some object that has an attribute you need. In both cases you will most likely still be checking the actual value anyway, so why force the entire object or XML to meet requirements that just don’t matter?
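A rough Python sketch of that idea, using the standard library's ElementTree: pull out the one value you care about and ignore whatever else the document contains. The namespace URI and the function name read_my_value are made up for illustration; only the "xyz:my-value" path comes from the paragraph above.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace binding for the "xyz" prefix used in the path.
NS = {"xyz": "http://example.org/xyz"}


def read_my_value(xml_text):
    """Return the first xyz:my-value below the root, or None if absent.

    Everything else in the document is deliberately ignored, in the
    same way a duck-typed function only looks at the attribute it needs.
    """
    root = ET.fromstring(xml_text)
    # ".//" searches all descendants of the root, at any depth.
    text = root.findtext(".//xyz:my-value", namespaces=NS)
    if text is None or not text.strip():
        return None
    # The caller still checks or converts the actual value as needed.
    return text.strip()


if __name__ == "__main__":
    feed = (
        '<feed xmlns:xyz="http://example.org/xyz">'
        "<entry><unrelated/><xyz:my-value> 42 </xyz:my-value></entry>"
        "</feed>"
    )
    print(read_my_value(feed))
```

The two documents could differ in every other respect (root element, default namespace, nesting depth) and the function would not care, which is the whole contract.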

When you think of all the recent discussions on functional programming, monads, duck typing and dynamic languages in general, it is clear the programming landscape is changing. I think many programmers have benefited from taking a serious look at these ideas. There is no reason the same ideas can’t be applied to XML with the same result of finding new ways of solving problems.

M. David Peterson


So I got a ping from William Candillon yesterday on IM, but I wasn’t around so am just now getting in sync with him today. He and I had a discussion about a year or so back regarding a potential internship with Dana Florescu, you know, the primary mastermind behind the XQuery language. Well, fast forward to a year or so later and it turns out that through a collaborative cross-organizational effort, the following folks,

Cezar Andrei
Vinayak Borkar
Matthias Brantner
Nicolae Brinza
William Candillon
Dana Florescu
David Graf
Donald Kossmann
Tim Kraska
Dan Muresan
Sorin Nasoi
Daniel Turcanu
Markos Zaharioudakis

… got together and created,
