As usual, I’ve been busy the last couple of weeks, working mainly on a new JavaScript template library that I hope will ease much of the pain of XML development, at least on Firefox. I’ve discovered the same truth that other JavaScript practitioners are coming to realize: if you stop thinking of JavaScript as Java, the language becomes far more flexible and capable, sometimes scarily so. I’ll be posting this new template library to SourceForge shortly, along with its documentation and an article here describing it, but until then …
Norm Walsh recently provided an update about XProc, a generalized XML pipeline language that is being worked on by the W3C. The idea behind XProc is simple enough: you create an XML document that provides “glue” or conditional bindings for the different processes that can occur in an application. One “standard” project for such a language already exists, Ant, and I find it interesting that Ant has been slowly replacing the cryptic and awkward make syntax in an increasing number of applications, only a small portion of which are XML-based.
There have been more than a few arguments about XML-based “programming languages” (both make and Ant are examples of what used to be called Job Control Languages, way back in the days of mainframes): that they are extremely verbose, that they put an undue requirement upon the use of XML (which for some reason procedural developers of both the C++ and Java stripe tend to embrace only very reluctantly), that because XML is the “new kid on the block” it is being used solely for its novelty, without delivering any other benefit, and so forth.
I’d contend that many of these arguments exist primarily because these same developers often have at best only a marginal understanding of what XML is, or why it is so important. XML’s primary benefit comes about not because of its performance (which even most XML gurus will cringe at) but because it provides a standardized mechanism for abstraction. If I encode a pipeline process in XML, then it doesn’t really matter (for the most part) whether I work in C++, .NET, PHP, Java, JavaScript, Perl, Python, Ruby, Haskell or Miranda. With the XML representation, I can describe processes that can work in any of the languages given, and more importantly, with those processors and some form of web services pipe I can even pass pieces of processes to different languages, machines, or environments and still get my tasks accomplished.
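To make the abstraction argument a little more concrete, here is a minimal sketch of a process described in XML and interpreted by one particular runtime (Python, purely for illustration). The `<pipeline>`/`<step>` vocabulary, the step names and the `run_pipeline` function are all hypothetical, not XProc syntax; the point is only that the descriptor says nothing about the implementation language, so a Java or JavaScript processor could bind the same step names to its own code.

```python
import xml.etree.ElementTree as ET
from typing import Callable, Dict

# Hypothetical step vocabulary: each step type maps to a local implementation.
# Another runtime could bind the same names to its own code.
STEP_REGISTRY: Dict[str, Callable[[str, Dict[str, str]], str]] = {
    "trim": lambda text, opts: text.strip(),
    "uppercase": lambda text, opts: text.upper(),
    "wrap": lambda text, opts: f"<{opts['tag']}>{text}</{opts['tag']}>",
}

# An illustrative descriptor (not XProc syntax).
PIPELINE_XML = """
<pipeline name="demo">
  <step type="trim"/>
  <step type="uppercase"/>
  <step type="wrap" tag="title"/>
</pipeline>
"""


def run_pipeline(descriptor: str, data: str) -> str:
    """Parse an XML pipeline descriptor and apply its steps in document order."""
    root = ET.fromstring(descriptor.strip())
    for step in root.findall("step"):
        step_type = step.get("type")
        impl = STEP_REGISTRY.get(step_type)
        if impl is None:
            raise ValueError(f"no local implementation for step type {step_type!r}")
        # Any remaining attributes are passed to the step as options.
        opts = {k: v for k, v in step.attrib.items() if k != "type"}
        data = impl(data, opts)
    return data


if __name__ == "__main__":
    print(run_pipeline(PIPELINE_XML, "  hello pipeline  "))
```

Running it prints `<title>HELLO PIPELINE</title>`. An unknown step type raises an error rather than being silently skipped, which seems the safer default when descriptors travel between processors with different capabilities.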
The future of programming is distributed and asynchronous (and by extension compartmentalized and localized). A mashup is a comparatively simple concept, but in some respects it represents a huge jump forward compared to almost all previous forms of distributed programming (DCOM, CORBA, RMI, etc.). Provide a common language for abstraction, a common meta-language for encoding structure, a common lexical and linking framework and make the processors for handling this meta-language ubiquitous, and you get distributed programming for very nearly free. Because you are using an abstraction mechanism for the data, it becomes the responsibility of the individual language or processor to provide the specific back-end functionality to properly interpret the form of the data (rather than the data itself) - something which has been accomplished with remarkable alacrity in the programming world for XML.
As an aside, I think this is one of the key benefits that XML has over potential rival technologies such as JSON. JSON is topologically similar to SimpleXML, a notion that keeps popping up now and then but has always remained below the threshold of critical mass. JSON requires that a certain level of data type abstraction be explicitly asserted within the data structures, fundamentally the notion that an element and an attribute are in fact simply representations of the same thing. However, in working extensively with both, I’ve found that attributes (qualities that describe a given element, rather than substructures associated with that element) do tend to occur naturally, and unscrambling them in JSON forces the introduction of artificial (and generally unstandardized) vocabulary labels.
Indeed, it may very well be that the rise of JSON is simply another expression of the SimpleXML debate, except that it is occurring outside of the XML orthodoxy, and as such has neither suffered the disapproval of those who have made investments in XML nor faced the scrutiny that SimpleXML has repeatedly faced in the last decade.
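The attribute problem is easy to see in practice. Below is a minimal sketch of an XML-to-JSON conversion in Python; the `@` prefix for attributes and the `#text` key for character data are one ad hoc convention chosen for illustration (nothing in JSON defines them), and `element_to_json` is a hypothetical helper.

```python
import json
import xml.etree.ElementTree as ET


def element_to_json(elem: ET.Element) -> dict:
    """Convert an element to a dict, marking attributes with an '@' prefix
    and text content with '#text' (an ad hoc convention, not a standard)."""
    result = {}
    for name, value in elem.attrib.items():
        result["@" + name] = value
    for child in elem:
        converted = element_to_json(child)
        # Repeated child names collapse into a list.
        if child.tag in result:
            existing = result[child.tag]
            if not isinstance(existing, list):
                result[child.tag] = [existing]
            result[child.tag].append(converted)
        else:
            result[child.tag] = converted
    text = (elem.text or "").strip()
    if text:
        result["#text"] = text
    return result


doc = ET.fromstring('<price currency="CAD" taxed="false">19.95</price>')
print(json.dumps({doc.tag: element_to_json(doc)}))
```

For the `<price>` example this prints `{"price": {"@currency": "CAD", "@taxed": "false", "#text": "19.95"}}`. Any consumer of that JSON has to know the `@`/`#text` convention in advance, which is exactly the kind of unstandardized labeling described above.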
Nonetheless, the challenge of a distributed environment is that centralization attempts in general are far more difficult to coordinate. This has been especially true of the pure pull model (aka Web client/server), where the only recourse for confirmation or further action on a transaction is an out-of-band process (typically e-mail). However, AJAX is rewriting the rules here, and one of the most immediate effects of this rewriting is that in-band processes (receiving asynchronous confirmation of a transaction, for instance, or receiving a bundle of information from a transaction necessary to perform additional processing) now become feasible. Any form of resource management application, from stock trading to hotel room scheduling to medical management systems, can now treat the browser not so much as an end-point but simply as another processing node in a (potentially cyclic) tree of processes.
Something needs to coordinate those processors, however, and pipelines are the logical mechanism to do that. By encoding them in XML, you can pass the process descriptors to the necessary processors without having to worry about what processing language those processors are using. Pipelines by themselves are fundamentally acyclic, in that they have definitive endpoints, but a good process flow architect also realizes that by joining two pipelines together, what you end up creating is a circuit. If you have a pipeline language that can realistically handle asynchronous invocation over the short term (a fundamental flaw in Ant, which is, I believe, a synchronous application), then ultimately the only synchronous points you need come when one pipeline hands off its results to another pipeline.
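A minimal sketch of that idea, using Python's asyncio as a stand-in for distributed asynchronous calls (it is not how XProc or Ant execute anything): within each pipeline, items move through their steps concurrently, and the only points where the code waits for everything to finish are the handoffs between pipelines. Feeding the second pipeline's output back into the first turns the pair into a simple circuit. The step functions, `run_pipeline` and `circuit` are hypothetical names, and `asyncio.sleep` stands in for a remote call.

```python
import asyncio
from typing import Awaitable, Callable, List

# A step takes a value and asynchronously produces a new one.
Step = Callable[[int], Awaitable[int]]


async def double(x: int) -> int:
    await asyncio.sleep(0.01)  # stand-in for a remote or I/O-bound call
    return x * 2


async def increment(x: int) -> int:
    await asyncio.sleep(0.01)
    return x + 1


async def run_pipeline(steps: List[Step], items: List[int]) -> List[int]:
    """Push every item through the steps. Items proceed concurrently;
    gather() is the single point where this pipeline waits for all of them."""

    async def run_one(x: int) -> int:
        for step in steps:
            x = await step(x)
        return x

    return list(await asyncio.gather(*(run_one(x) for x in items)))


FIRST: List[Step] = [double, increment]
SECOND: List[Step] = [increment, double]


async def circuit(values: List[int], rounds: int) -> List[int]:
    """Join two pipelines into a loop: FIRST hands off to SECOND,
    whose output is fed back into FIRST on the next round."""
    for _ in range(rounds):
        values = await run_pipeline(FIRST, values)   # synchronous handoff point
        values = await run_pipeline(SECOND, values)  # synchronous handoff point
    return values


if __name__ == "__main__":
    print(asyncio.run(circuit([1, 2, 3], 1)))
```

`asyncio.run(circuit([1, 2, 3], 1))` returns `[8, 12, 16]`; with two rounds it returns `[36, 52, 68]`.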
I don’t find it at all accidental that pipeline architectures have recently gained interest (the W3C has been wrestling with this problem for about a year, and orchestration has long been the holy grail of companies such as Microsoft). AJAX programming is redefining the role of the world’s most common user interface, the web. Asynchronous methodologies on the client are now pushing for a significant re-evaluation of asynchronous methodologies on the server, with the attendant realization that existing solutions (including the SOAP/WSDL/WDDL stack) are frequently too complex for people to feel comfortable using, because they presuppose that it is the nodes, rather than the conduits, that are the critical pieces of the network.
Pipelines, on the other hand, are fundamentally RESTian in nature: they concentrate on the interactions of “molecular” conduits. The characteristics of the individual “atomic” servers are important, but without some formal overriding governor at the molecular level, more complex structures become ever harder to build and maintain, and the atoms themselves remain largely isolated.
I’m not sure which pipeline architecture will ultimately prove dominant, though if history is any indication we will likely have several in play for a while before any one becomes sufficiently compelling. I am personally intrigued by the pipeline architecture presented by the W3C, both because of the esteemed efforts of Mr. Walsh and because the design itself has a great deal of merit and integrity. I’d recommend checking it out at http://www.w3.org/TR/xproc/.
Kurt Cagle is an author, industry analyst and software developer living in Victoria, British Columbia, which is, like much of the Pacific Northwest, in significant danger of floating away.


Nice overview! Methinks we may need to collaborate on building a prototype; let's chat offline about it.
SOAP and WS are not antithetical to pipelines. You could argue the SOAP processing model (actor/role) presages the current pipeline fad. WS and enterprise buses are making distributed pipelines real...
Kurt, good thoughts.
We'll have to see how the OASIS CAM technology plays into this in 2007 as well. Seems like a natural fit at first blush: a good mash-up toolkit?!? See the jCAM open source tool.
Rich,
I agree with your assertion (and probably need to rewrite that section a bit, because it is a little forceful). Actor/role SOAP processing (and SOAP as a general messaging protocol) works fine in a pipelining context - I just tend to have reservations about ad hoc client/server SOAP RPCs, because of their inherent fragility. I recognize that most modern day SOAP experts feel much the same way, but I also know that for many people SOAP still means RPC.
Pipelining and orchestration are macro-programming: utilizing the network as a generalized processor. I think we're really just beginning to enter the age where such macro-programming is feasible, and like every other software endeavor we've undertaken over the years, it will take time and experience to sort out the best protocols and methodologies. It will also take a certain realization that shifting from a largely private conversation between two nodes to a public (and ever shifting) conversation among many nodes in many potential configurations necessitates a shift in perspective and abstraction. WS/SOAP is one key approach to this; I think it has helped to highlight both the strengths and weaknesses of previous thinking on macro-programming, and it will continue to gain presence in developing larger scale applications.
My question is whether that will translate into a corresponding change at the other end of the spectrum, the one currently occupied by RSS/Atom feeds and the fairly strong Fielding philosophies that seem to form the counterweight to SOAP. I see the emergence of two architectures at the network level, one marked by reliability at the expense of simplicity, the other marked by simplicity at the expense of reliability. Both are intrinsically necessary (the requirements of a bank processing monetary transactions are VERY different from those of a political organization trying to communicate its message, for instance), and my suspicion is that we'll still be arguing the benefits of REST vs. SOAP (or its contemporary incarnation) in 2035.
-- Kurt
"I’ll be posting this new template library to SourceForge shortly, along with the documentation for it and article here describing it"
Sounds good... Is this available yet?
Lars