Technical Archives

Erik Wilde

During the recent discussion of the OAI-ORE drafts (which use RDF), the claim was made that RDF is serialized in RDF/XML and thus could be considered an XML representation of the underlying data model. My response was that the RDF model is different from XML, and that it is thus pretty hard to process RDF/XML using XML tools, in particular when considering all constructs allowed by RDF/XML, and harder still to update RDF/XML data using XML tools alone.

I tried for some time to find a general-purpose RDF/XML parser written in XSLT, but so far could not find one. But Google is imperfect and I might not know the best places to look. So here is my question: Is there a general-purpose RDF/XML parser written in XSLT? It has to support all the fun stuff allowed by XML and RDF/XML, such as weird uses of namespace declarations, XML Base, rdf:ID and RDF/XML syntactic sugar. It must accept anything that is valid RDF/XML. As a result, it should produce some form of normalized RDF/XML, but I really don’t care that much about the exact format (ideally, it should be XPath-friendly). The parser must be robust enough to produce the exact same normalized result for inputs that look radically different because of XML and RDF/XML syntax variations.
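To make the normalization requirement concrete, here is a minimal sketch in Python (not XSLT, and nowhere near a general-purpose parser). It assumes only two of the many RDF/XML spellings of a literal-valued property, a child element and a property attribute, and shows that two documents with different prefixes and different syntax reduce to the same triple. The function names and the example.org subject are made up for illustration.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

# Same statement twice: different prefixes, element form vs attribute form.
DOC_A = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
                    xmlns:dc="http://purl.org/dc/elements/1.1/">
  <rdf:Description rdf:about="http://example.org/doc">
    <dc:title>Hello</dc:title>
  </rdf:Description>
</rdf:RDF>"""

DOC_B = """<r:RDF xmlns:r="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
                  xmlns:x="http://purl.org/dc/elements/1.1/">
  <r:Description r:about="http://example.org/doc" x:title="Hello"/>
</r:RDF>"""


def expand(clark_name):
    """Turn ElementTree's '{namespace}local' form into a plain URI string."""
    namespace, local = clark_name[1:].split("}", 1)
    return namespace + local


def simple_triples(rdfxml):
    """Collect literal-valued triples from top-level rdf:Description nodes.

    Only property elements with text content and property attributes are
    handled; every other RDF/XML construct is ignored by this sketch.
    """
    root = ET.fromstring(rdfxml)
    found = set()
    for node in root.findall(f"{{{RDF_NS}}}Description"):
        subject = node.get(f"{{{RDF_NS}}}about")
        for name, value in node.attrib.items():
            # Skip rdf:* attributes and attributes without a namespace.
            if name.startswith("{") and not name.startswith("{" + RDF_NS + "}"):
                found.add((subject, expand(name), value))
        for child in node:
            found.add((subject, expand(child.tag), child.text))
    return found
```

Both simple_triples(DOC_A) and simple_triples(DOC_B) yield the same single triple. Covering XML Base, rdf:ID, typed nodes, nesting and the rest of the syntactic sugar is where the real work would be.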

I am really interested to see whether such a beast exists, and if so, how big it is. My guess is that it’s not trivial to write such a parser, but it definitely is possible. After finding out whether such a beast exists, my follow-up question will be whether there is an associated function library that can then work on the parsed RDF model, so that the data can be traversed, queried, updated, and serialized.

AddThis Social Bookmark Button

At the Semantic Technologies conference in San Jose I attended an interesting presentation entitled “persistent identifiers for the real web”. XML often uses URLs for identifying schema namespaces, and I suppose it could be credited with influencing RDF’s practice of using URLs for identifying resources. In using RDF to describe and annotate things a problem arises: are you describing the web page, or the thing the web page is talking about? For example, if I assert that:

<http://tcowan.myopenid.com> :likes <http://www.myspace.com/lettucefunk>

Does that mean I like the web page or the band the page is about? As you’re traversing the semantic web it’s going to be advantageous to distinguish between content assets and the real world entities they may represent. Their proposed solution involves PURLs (http://purl.org for example). Normally a permanent URL redirects you to the best representation of the resource via a 302 response. They propose that when the PURL represents a real world entity, the response be given as a 303 (See Other). The computer agent can then understand that the “thing” is a real world entity, and that the redirect is not to the real thing, but to another web resource about the thing.
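From the agent's side, the proposal boils down to branching on the redirect status code. A minimal sketch in Python, assuming the agent already has the status code in hand; the function name and the returned labels are invented for illustration and are not part of the proposal itself.

```python
def classify_redirect(status_code):
    """Interpret a PURL redirect the way the proposal describes.

    302: the target is a representation of the identified resource.
    303: the identified thing is a real world entity, and the target is
         another web resource about it.
    """
    if status_code == 302:
        return "representation"
    if status_code == 303:
        return "description-of-real-world-entity"
    return "other"
```

An agent that sees the second label would attach its assertions to the original PURL (the band) rather than to the page it was redirected to.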

I’m very much in favor of permanent URLs. Otherwise all our assertions will become disjointed as links break, or we’ll have to keep our own “archives” of dead links and sites. I also appreciate the simplicity of Dave and Eric’s proposal; however, I’m not so sure this is really the best way to solve identifiers for real world things. Consider books, for example: what would be the best way to represent a book, its URL on Amazon or its ISBN as a URN? If we use the Amazon URL we can’t be sure it’s a book; it might be binoculars or a coffee table. The URN however makes it clear:

URN:ISBN:0-395-36341-1

The URN namespace indicates that it’s a book, without a doubt. If PURL were to host a “see also” permanent URL scheme for each declared URN namespace we’d be able to visit that URL to find out more…

http://purl.org/urn/isbn/0-395-36341-1
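The mapping from URN to such a URL is mechanical. Here is a small Python sketch; the base URL layout is simply the hypothetical one shown above (purl.org does not necessarily host anything like it), and the function name is made up.

```python
def urn_to_purl(urn, base="http://purl.org/urn"):
    """Map 'URN:<namespace>:<identifier>' onto the hypothetical PURL layout."""
    scheme, namespace, identifier = urn.split(":", 2)
    if scheme.lower() != "urn":
        raise ValueError("not a URN: " + urn)
    # The namespace identifier is case-insensitive, so normalize it.
    return f"{base}/{namespace.lower()}/{identifier}"
```

Calling urn_to_purl("URN:ISBN:0-395-36341-1") gives back the URL shown above.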

But on the practical web, we don’t use PURLs or URNs for books, we use the Amazon.com URL. I think in practical terms things are going to be represented on the web by the domain that has the best collection with the best open content. Perhaps the best approach in the end is to take advantage of blank nodes.

<http://tcowan.myopenid.com> :likes _:a
<http://www.myspace.com/lettucefunk> :describes _:a
_:a a :funkBand

In English, http://tcowan.myopenid.com likes the funk band described by http://www.myspace.com/lettucefunk. Now we’ve made it clear, and without the use of PURLs or some new PURL redirection strategy.
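To see that the blank-node version really does separate the page from the band, the three statements can be held as plain tuples and queried. This is only a Python sketch with a made-up helper, not an RDF library.

```python
TRIPLES = [
    ("http://tcowan.myopenid.com", ":likes", "_:a"),
    ("http://www.myspace.com/lettucefunk", ":describes", "_:a"),
    ("_:a", "a", ":funkBand"),
]


def liked_things(triples, person):
    """For each thing the person likes, list its types and describing pages."""
    results = []
    for subject, predicate, liked in triples:
        if subject == person and predicate == ":likes":
            types = [o for s, p, o in triples if s == liked and p == "a"]
            pages = [s for s, p, o in triples if p == ":describes" and o == liked]
            results.append({"node": liked, "types": types, "described_by": pages})
    return results
```

The liked node is typed as a :funkBand, and the MySpace URL shows up only as a page that describes it.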

Rick Jelliffe

I had an interesting discussion today with a key player in the development of a large, quite successful industry-specific standard by an industry consortium with representation from all the key stakeholders. I was surprised that he was less than sanguine about the standard: a common vocabulary was being used by multiple groups each making a schema for their particular sectoral use case, so it looked quite healthy.

But my contact had two particular gripes. The first was that the standardization process was addicted to making new vocabulary items, to the extent that talking about standardizing other things had never worked: the consortium was for making schemas not solving problems! In particular, while there was a lot of attention paid to describing what each field meant, there was no facility for comparison or identification: to say that “this address is that address” or “this person is that person” or “this agent is that agent” except by accidental string matching. So electronic forms using these schemas could be filled out, but data could never be integrated.

The second gripe comes out of the first. Because of the lack of ability to integrate and identify data, it all had to be kept together or messaged around in a bunch. So the schema for a complex process has to include fields for everything in that process except for trade secret fields, which wouldn’t be interchanged anyway: the consortium is made up of fierce competitors with a religious belief that their internal processes will be different from the internal processes of any other company in the same industry. Originally many participants would not even disclose the field names in their databases, they regarded them as so important, only to find that their field for address was not so interestingly different from their competitor’s equivalent field.

So the result: kitchen sink standards that include so many optional or process-particular fields that the consortium is now having a problem that not enough vendors are able to implement the whole thing. However, underlying this is the problem that without even a simple process model, where each stage of the process could have a fat-trimmed or specific schema, one size has to fit all.

So my contact actually saw the standard as dying rather than thriving: the mania for new elements and structures bloating the project in the direction of unworkability coupled with a refusal to look at standardizing even basic process models or identity/tracking/aggregation capabilities.

Philip Fennell

My previous post, ‘XSLT and Binary File Formats’, brought up the subject of the sequence in XSLT 2.0 and how it can be used to build a byte sequence for a binary file format like a TIFF image. For the XSLT generation of new binary files to be even remotely useful, you would need something that requires transformation into binary data and a way to transform it.

In the world of 3D computer graphics Pixar are the ‘King of the hill’ and their Reyes Image Rendering Architecture defines a very powerful image processing pipeline that is used for the transformation of complex graphics primitives into smaller, simpler primitives that are easier to sample and rasterize. The keyword in the last sentence was transformation, and XSLT is very good at transforming hierarchical data structures like computer graphics models.

To simplify the implementation of a Reyes pipeline processor in XSLT, it makes sense to start with just two dimensions and use SVG as the source model, while the final output format can be TIFF (see previous post). The following example shows an enhanced Reyes pipeline, expressed in XML, that makes use of their bucketing technique to allow for more efficient sorting and sampling strategies. The XSLT transform consumes the pipeline definition in order to control the processing of the source model.

Chris Wallace

“One thing leads to another” might be the sub-title for the web. Last night I found myself by some circuitous route in LiteratePrograms, a wiki set up by Derrick Coetzee. The site incorporates a version of that earlier WEB, Donald Knuth’s literate programming system, whose tangle and weave programs allow a single literate program script to be transformed to a view which makes sense to the compiler and another which makes sense to a human reader.

The wiki is a laudable effort to provide code examples, clearly explained, in many languages, but it is fighting a bad case of spam. Comparative examples are a valuable educational resource to show how to abstract away the specifics of a syntax to see through to the essential similarities and differences. There are no examples in XQuery and only a couple in XSLT, however, so here’s another opportunity for an XQuery evangelist.

Fibonacci is the computer science ‘hello world’. It is well represented in the wiki in numerous languages and many algorithms. The lack of a higher order factoring of the many algorithms away from the code needs tackling. XQuery with its limited functional model, lacking such devices as higher order functions, lazy evaluation, closures and generators, restricts the algorithms to pure recursive functions. When coming to XQuery from a background in imperative, object-oriented languages, the loss of updating variables is quite a challenge. Good old Fib is not a bad example to start with.
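As a rough illustration of what "pure recursive functions without updating variables" means in practice, here is one possible shape for Fib, written in Python rather than XQuery. It is a sketch of the style only: the running pair of values is passed down as arguments instead of being reassigned in a loop.

```python
def fib(n, a=0, b=1):
    """Return the n-th Fibonacci number (fib(0) == 0) without reassignment.

    Each call hands the next pair (b, a + b) to the recursive call, which is
    the usual substitute for a loop with two updating variables.
    """
    return a if n == 0 else fib(n - 1, b, a + b)
```

The same accumulator-passing shape carries over to an XQuery function with three parameters and an if/then/else body.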

Philip Fennell

With all the recent talk of angle bracket taxes and what XML is and isn’t good for, I thought it would be fun to look at taking XSLT to places where it is not normally associated - the generation of binary file formats.

The sequence in XSLT 2.0 is of more use than the humble node-set. A sequence is not restricted to nodes: the tokenize() function, for example, creates a sequence of strings, and you can concatenate sequences using the comma operator, which can be used on any data type.

However, there is nothing here that lifts us out of the ordinary; not until, that is, you create a sequence of xs:unsignedByte numbers. This sequence can be considered a byte sequence, and if you can create a byte sequence you can create just about any binary file format you like. A good example of this would be an image file like a Tagged Image File Format (TIFF) image. If you don’t get involved in image compression, it is relatively easy to create a TIFF image, after all it is only a series of sequences of bytes.

Mind you, there are two problems to deal with. The first is that a basic XSLT 2.0 processor does not support the xs:unsignedByte data type. Only a schema aware processor is required to support that data type. So, in the absence of the latter you’d have to make do with xs:integer and put up with the extra memory needed. The second, and more important, problem is how to get a byte sequence out the other end of an XSLT processor!
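The idea of "a sequence of small integers that is really a byte sequence" is easy to show outside XSLT. The Python sketch below builds the 8-byte header of a little-endian TIFF file as a tuple of plain integers and only then turns it into bytes. It covers the header alone, not the image file directory or pixel data, and the function names are made up.

```python
def tiff_header(first_ifd_offset=8):
    """Return the 8-byte little-endian TIFF header as a sequence of integers.

    Bytes 0-1: 'II' (0x49 0x49) marks little-endian byte order.
    Bytes 2-3: the magic number 42.
    Bytes 4-7: offset of the first image file directory.
    """
    offset = tuple((first_ifd_offset >> (8 * i)) & 0xFF for i in range(4))
    return (0x49, 0x49, 42, 0) + offset


def to_bytes(sequence):
    """Check every item fits in an unsigned byte, then serialize."""
    for value in sequence:
        if not 0 <= value <= 255:
            raise ValueError(f"{value} is not an unsigned byte")
    return bytes(sequence)
```

The range check plays the part that xs:unsignedByte would play in a schema aware processor; with plain integers you have to enforce it yourself.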

AddThis Social Bookmark Button

XForms allows you to load an entire XML database into a client with a single statement. But this is not always a good design decision. Consider the concurrent user access requirements when you design the grain of your locks.

In the past, when we developed with POHF (Plain Old HTML Forms) each web form had a small set of key-value pairs. Developers loaded multiple transforms of the database through middle tier objects into the web form. They then updated the database by reversing these transformations. Developers tended to work at lower levels of the tree (the root being the top of our upside-down tree).

With XForms however, you can easily load an entire database into your client with a single statement. To do this you just add the following line to your XForms model.

<xf:instance src="path-to-my-xml-database.xml"/>

But just because XForms enables the developer to easily do this does not always make it a good design decision. XForms gives the developer great power, but this power needs to be used responsibly and you need to take your multi-user requirements into account when you do this.

Why? Consider the issues surrounding the locking of records to prevent multiple users from overwriting each other’s changes. Most databases provide record locking. With eXist this is just a simple XQuery statement (util:exclusive-lock) that the developer puts on the server when an update form is loaded.

What you need to consider is how to gracefully deal with multiple clients accessing the same data. Ideally the first user gets the data and sets the lock. The second user should be notified that the record is locked and when it was locked (and perhaps by whom), and should only be given read-only access. By the way, turning a form into a read-only form is just a single line of code in XForms also. See http://en.wikibooks.org/wiki/XForms/Read_Only. I have been tempted to put a nudge button on these forms to let the locker know that someone is waiting for them to close the form, but I have not got around to that yet.
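The first-user-edits, second-user-reads behaviour can be sketched independently of eXist. The following Python class is an in-memory stand-in only: it is not util:exclusive-lock, and the class, method and field names are invented for illustration.

```python
import time


class LockRegistry:
    """Track which user holds the edit lock on each record (in memory only)."""

    def __init__(self):
        self._locks = {}

    def open_form(self, record_id, user, now=None):
        """Return how the form should open for this user."""
        now = time.time() if now is None else now
        holder = self._locks.get(record_id)
        if holder is None:
            self._locks[record_id] = {"user": user, "since": now}
            return {"mode": "edit"}
        if holder["user"] == user:
            return {"mode": "edit"}
        # Someone else got there first: report who and when, open read-only.
        return {
            "mode": "read-only",
            "locked_by": holder["user"],
            "locked_since": holder["since"],
        }

    def close_form(self, record_id, user):
        """Release the lock, but only if this user actually holds it."""
        holder = self._locks.get(record_id)
        if holder is not None and holder["user"] == user:
            del self._locks[record_id]
```

The read-only answer carries exactly the information the second user should see; a nudge button would simply be another message addressed to locked_by.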

I have also created many administrative XForms for giving non-techies the ability to change things like server configurations. They don’t need to use an XML editor and I can set up complex business rules using bind statements that prevent accidental configuration errors. I wish Apache configuration files came with an XForms front-end! In this case I just lock the file on the server and load the entire configuration file into the XForms client. In this case, the locking grain is coarse.

You still may want to save prior versions of these configuration files and use XML diff tools (which are also bundled with eXist) to see who changed what and when. And you may want to allow users to revert to a prior version with a single click. But most of this work is just a few additional lines of XQuery on the server.

What you find is that all these lock/read/nudge operations can be done simply and elegantly with the combination of using XForms clients, XQuery on the server and REST interfaces.

What is needed now is a unified framework built around XRX so that all these locking issues can be addressed in a single XQuery module that can be loaded on the server.

Note that you don’t need to lock records as they are being created but you may want to check for duplicate records as the users enter their data. This can easily be done in an as-you-type submission if you don’t mind a little extra bandwidth.

So how have you dealt with locking in the past? Do you have any techniques that you have developed in the POHF/RDBMS era that we can use in the XRX era?

Jeni Tennison

I wrote some XSLT the other day that was so neat it made me smile, so I thought I’d share it. It’s an example of how the new <xsl:next-match> instruction and tunnel parameters can combine to simplify your code. Fair warning: this is XSLT 2.0 through-and-through, and the use case here is one you will only care about if you process documents (rather than data), and pretty complex documents at that.

M. David Peterson

Has anyone noticed the same trend I have? We’re in this weird area of language evolution in which those of us who have been trained to think statically are beginning to envy some of the niceties provided by dynamic languages (such as implicit types) while at the same time those of us who have been trained to think dynamically are beginning to envy some of the niceties provided by static languages (such as explicit types.)

Weird.

Take, for example, C# 3.0 implicitly typed local variables,

Chris Wallace

I think students of information systems design have a tough time compared to other designers: architects and graphic designers, say. A budding architect might have lived and worked in dozens of buildings, visited scores more and seen hundreds before they start to design their own. Information systems designers have no such grounding in examples, their prior experience sometimes confined to being the victim of poor systems.

The web has improved this situation, particularly for interface designers, but it is still hard to see into the guts of an application to learn about database structures and software architecture. Web applications do have the wonderful advantage that they can be explored, and I exploit this feature in my teaching, getting students to take well-known sites like Flickr and del.icio.us and infer, through source viewing and experimentation, the underlying conceptual data model. This works well but only occasionally can we compare our speculations with the real thing. It would be great if more sites had their systems documentation online.

AddThis Social Bookmark Button

I have been using the eXist native XML database with REST interfaces to store metadata for the last two years. It is a great system and I have been encouraged by others to document the benefits. Here is an excerpt.

The Problem

I have been working with a group of Business Analysts (BAs) that were using Visio to document business requirements. Mostly things like use cases and UML diagrams. They asked the question: can you tell me if we are using all the correct approved business terms in our diagrams? Can you create an easy-to-use process to check these against our glossary of approved business terms?

The Solution

Visio has a way to save their documents in XML format. Specifically you can use the “Save As…” function and select the “XML Drawing” file type. Since we use eXist as our metadata store and eXist has a WEBDAV interface it was easy to give them a folder on their desktop they could save their Visio drawings into. Once they did this they just clicked on a URL that ran a simple XQuery on their Visio files. This XQuery (about 15 lines) looked for each text element in the Visio document. The query line looks like this:

for $term in doc($mydoc)//v:Text (: here v is the namespace for Visio :)

The query then uses a standard function supplied with my glossary manager that displays a link to the term if it exists in the registry and red text if it does not exist in the registry. I believe that because I selected eXist and WEBDAV/REST interfaces, the solution was simple and elegant.
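For readers without eXist at hand, the same check can be approximated in a few lines of Python. This is a sketch under assumptions, not the XQuery described above: the Visio namespace URI is assumed to be the one used by Visio 2003 XML drawings, the glossary is a plain set of lower-cased terms, and the glossary link URL is hypothetical.

```python
import xml.etree.ElementTree as ET
from html import escape
from urllib.parse import quote

# Assumed namespace for Visio 2003 "XML Drawing" files.
VISIO_NS = {"v": "http://schemas.microsoft.com/visio/2003/core"}

SAMPLE = """<VisioDocument xmlns="http://schemas.microsoft.com/visio/2003/core">
  <Pages><Page><Shapes>
    <Shape><Text>Customer</Text></Shape>
    <Shape><Text>Clientt</Text></Shape>
  </Shapes></Page></Pages>
</VisioDocument>"""


def check_terms(drawing_xml, approved):
    """Return (term, is_approved) for every v:Text element in the drawing."""
    root = ET.fromstring(drawing_xml)
    rows = []
    for text in root.iterfind(".//v:Text", VISIO_NS):
        term = "".join(text.itertext()).strip()
        rows.append((term, term.lower() in approved))
    return rows


def render(term, is_approved):
    """Link approved terms to a (hypothetical) glossary page, flag the rest in red."""
    safe = escape(term)
    if is_approved:
        return f'<a href="glossary?term={quote(term)}">{safe}</a>'
    return f'<span style="color:red">{safe}</span>'
```

Running check_terms(SAMPLE, {"customer"}) flags the misspelled "Clientt" as unapproved, which is the red-text case.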

So how about your metadata registry? How would you approach this problem? Can you do it in under 15 lines of code and provide a drag-and-drop interface?

M. David Peterson

In a follow-up conversation to the post made by Dimitre Novatchev, Jesper Tverskov provides an excellent summary as to why XML and JSON are incompatible. He goes on to describe several functions in XSLT he feels would help alleviate at least some of the pain, but to keep things focused on understanding the incompatibilities between XML and JSON, I’ve left them out. When XSL-List archives today’s conversations, I’ll update this post with a link to his complete post.

However, before I provide his summary, I want to quickly provide some of my own thoughts on the XML -> JSON <- XML discussion which I provided in follow-up to a comment made by Robert Koberg,

On Sun, 18 May 2008 08:04:26 -0600, Robert Koberg wrote:

> Bottom line - there is no standardization. If you want to do xml2json
> and json2xml you pick your library and write for it.

I agree. Furthermore I believe the notion of converting XML and JSON to and from each other is the wrong approach altogether. Instead I believe the emphasis should be upon creating a standard format for JSON that can be referenced and queried by XPath in such a way that regardless of whether the incoming format is XML or JSON, the same XPath applied to both will result in the same generalized result set.

In fact, if I’m not mistaken, this is what Mike Champion and friends have been discussing over in the MSFT camp for several years now, which makes sense when you look at what they’re doing with LINQ-to-*.
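To make "the same XPath applied to both" a little more tangible, here is a Python sketch that exposes a JSON document through the same tree API as an XML document and runs one path over both. The mapping used (object keys become child element names, arrays become repeated children) is just one arbitrary convention, which is precisely the standardization gap noted above, and ElementTree only supports a small subset of XPath.

```python
import json
import xml.etree.ElementTree as ET

XML_DOC = "<library><book><title>A</title></book><book><title>B</title></book></library>"
JSON_DOC = '{"book": [{"title": "A"}, {"title": "B"}]}'


def json_to_element(name, value):
    """Expose a parsed JSON value as an element tree (one possible convention)."""
    element = ET.Element(name)
    if isinstance(value, dict):
        for key, child in value.items():
            # An array becomes repeated children with the same name.
            items = child if isinstance(child, list) else [child]
            for item in items:
                element.append(json_to_element(key, item))
    else:
        element.text = str(value)
    return element


def query(tree, path):
    """Return the text of every node matched by the path."""
    return [node.text for node in tree.findall(path)]
```

With this convention, "./book/title" returns the same two titles whether the source was XML_DOC or JSON_DOC. Mixed content, attributes and keys that are not valid element names are where any such convention starts to strain.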

Jesper’s summary follows in-line below,

M. David Peterson

Dimitre Novatchev recently posted the following to XSL-List, something I thought would be of both interest and benefit to those of you in Land-O-XML who care about these kinds of things. As such,

M. David Peterson

So as Jeff Barr recently pointed out over on the Amazon Web Services blog,

Amazon Web Services Blog: Redundant Disk Storage Across Multiple EC2

XML Hacker M. David Peterson has put together a really interesting article.

As part of his work at 3rd and Urban, he has implemented redundant, fault-tolerant, read-write disk storage on Amazon EC2 using a number of open source tools and applications including LVM, DRBD, NFS, Heartbeat, and VTUN.

Mark notes that "the primary focus of this paper is to present both a detailed overview
as well as a working code base that will enable you to begin designing,
building, testing, and deploying your EC2-based applications using a
generalized persistent storage foundation, doing so today in both lieu
of and in preparation for release of Amazon Web Services offering in
this same space."

The article provides complete implementation details and links to source code for the scripts that Mark developed.

You can read the article, and you can also follow progress via the discussion group.

– Jeff;

Firstly, and most importantly, as pointed out in the first portion of this article,

Michael C. Daconta

The IBM Information Server has a business glossary manager that I am implementing for several clients. Some of those clients have existing data dictionaries and glossaries that will need to be imported into the product. The IBM Information Server has an XML format to allow you to import/export business glossaries.

There is a lot to talk about in examining this format. There is the good, the bad and the ugly in this format. Before we begin our dissection there are two contextual topics in need of some discussion. First is examining the goals of the format and second is determining whether those goals could have been achieved using existing formats.

At a high level, the format has three main goals which correspond to its three main elements: represent terms and their definitions (via the term element), categorize terms (via the category element) and add custom attributes to categories or terms (via the attribute element). Except for the metadata extension mechanism (custom attributes), this is a simple way to create and organize a dictionary in XML. When examining the schema or the example of the format it is clear that it is far from a complete standard. For example, the only available data type for custom attributes is String. So, it is clear that this format will evolve. A bigger question is: should it? And should it even have been created in the first place?

There are quite a few formats for capturing glossaries, dictionaries and thesauri in XML. A colleague of mine, Ken Sall, examined this for the government a few years back. The W3C has SKOS, IBM has subject classification in DITA (though DITA is much broader than glossaries), and XML topic maps can also serve this purpose.

So, although we will continue to explore the details of this format and even conversion of some of the others mentioned into this format, what are your thoughts on it?

Until next time, see you in the trenches… - Mike

David A. Chappell

I’m presenting a keynote at the next International Association of Software Architects (IASA) IT Architect conference on May 22 - 23 in New York City.

I was looking through the agenda and I came across this -

Interesting Real-world Architectures and the Handbook of Software Architecture presented by Grady Booch via Second Life

I checked with the conference organizers, and sure enough, Grady is going to be at his home base (which is usually Hawaii), and broadcasting the presentation via 2nd Life, and conference goers will be witnessing his avatar giving the presentation on the big screen.

How cool is that!? I want some!

I assume that means he’ll also be giving it in Second Life, and others will be able to join him there.

Dave

M. David Peterson

SIDENOTE: @amike: The run was good while it lasted, eh? ;-)

SIDENOTE.NEXT: After rereading the title, I’m not even sure it makes any sense. But then again, what’s new? ;-) :D

[Post.Body]
DonXml’s All Things Techie : Mixing Object, Functional and Aspect Oriented Programming

Within a DSL it would be cool if you could map its Nouns to Objects (described via OOP), its Verbs to Functions (described via FP), and its Adjectives and Adverbs to Aspects (via AOP).

I have to do some research, but does this fit within the definition of a composable language? I tried to find a definition of a composable language, but didn’t seem to find one.

Oh, the power of XSLT 2.0 (and XPath 2.0), where you can bring together the knowledge-pool and massive underlying code base of OOP, fold in the power of Functional Programming, and weave all of it together with AOP to produce a truly composable language as a result.

For example,

AddThis Social Bookmark Button

The linked data principles formulated by Tim Berners-Lee read as quite straightforward:

1. Use URIs as names for things.
2. Use HTTP URIs so that people can look up those names.
3. When someone looks up a URI, provide useful information.
4. Include links to other URIs so that they can discover more things.

However, when it comes down to implementing them, you may start wondering how to interpret them. I guess some of them were intentionally phrased rather generically. Astonishingly, RDF is not mentioned explicitly. The third principle, though, talks about ‘useful information’: I think this can be interpreted to mean useful for machines in the first place; we’d expect to find something at the other end of the link that is at least GRDDLable (such as microformats, RDFa, etc.). The fourth principle is also known as follow-your-nose.

One of the key ideas in linked data is to interpret the property in an RDF statement as a typed hyperlink. So, when I come across the RDF statement

<> dc:author <http://sw-app.org/mic.xhtml#i> .

I will assume that the author of this post is indeed http://sw-app.org/mic.xhtml#i. I can learn more about this person when dereferencing this URI, i.e. do an HTTP GET to fetch its content. To learn more about URI design and usage, have a look at Cool URIs for the Semantic Web, a W3C note recently finalised.

The above mainly arose from a recent discussion with Richard Cyganiak regarding our riese system, an RDFised and interlinked version of the Eurostat data.

AddThis Social Bookmark Button

Lately I’m becoming more bullish on RDFa. I lose heart when things don’t materialize on my timeline; however, I was recently reminded of a scene in “Under The Tuscan Sun” wherein the actor describes how they built tracks over the Alps before there was a train in existence that could make the trip.

“They built it because they knew someday the train would come…”

That’s similar to what the semantic web community is doing. In my analogy the tracks are the specifications (RDF, OWL, RDFa) and data sources. The train is of course applications that use these specifications and information to make our online lives more convenient. And for me personally, the mountain is grasping how it all works and most important of all, applying it in the real world.

That brings me to what I want to share. If you are particularly interested in RDFa, http://rdfa.info/ is probably the best link to watch. I found a very well written and informative tutorial on linked data that I highly recommend for anybody gearing up to apply this technology. The recommendation to “not define new vocabularies from scratch” struck me as particularly good advice. If you are wondering what the more common, well used and understood RDF vocabularies are, here are my top 5:

AddThis Social Bookmark Button

One valid answer to the question in the title would be: I’m both into linked-data and RDFa. Hey, but that’s not the answer you are interested in, right? We’ll have a look into both and find a better answer by the end of this post. Oh, right, by the way, let me introduce myself shortly. I’m new to xml.com and I try focusing on Semantic Web stuff.

In the beginning, there was the URI. Kingsley recently wrote about it, coming from the plain old untyped @href hyperlink. Then there was RDF, not so well known, and still often confused with one of its serialisations, namely RDF/XML. But there are other ways to deploy RDF as well. In a couple of weeks, presumably, RDFa will be finalised by W3C. RDFa is all about delivering structured metadata in HTML. Much like microformats, RDFa uses attributes to ‘hide’ - or, more technically: embed - metadata in HTML.

Coming back to URIs: The hyperlinks basically were the success factor of the Web as we know it. Typed or semantic links are expected to be the same for the Semantic Web. TimBL wrote up the so called linked-data principles a bit ago (URI for everything, HTTP URI, RDF properties). An example might help understanding both RDFa and linked-data; compare

this page is under <a href="http://creativecommons.org/licenses/by/2.5/">CC 2.5</a> license
this page is under <a rel="cc:license" href="http://creativecommons.org/licenses/by/2.5/">CC 2.5</a> license
 

The key is the rel="cc:license" bit. This is actually a piece of valid RDFa (telling that this content is under a certain license) and equally is a typed link. It overloads the simple @href hyperlink and lets an agent (be it a search bot or a syndication site) interpret and follow it properly. I think you get the point, right? To sum up: RDFa is the way to do linked-data. Coming back to the initial question, I guess the main point is that both are manifestations of the real-world Semantic Web emerging these days. While in the last couple of years most of the people involved in Semantic Web stuff maybe thought ontologies and reasoning are the most important issues to deal with, it’s a bit like building a marvellous roof and finding out one day that there are no walls, and not even a foundation to put it onto.
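What an agent gains from the rel attribute can be shown with a few lines of Python using only the standard library's HTML parser. This is a sketch, not an RDFa processor: it does not resolve the cc: prefix to a namespace or build triples, and the class and function names are made up.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect (rel, href) pairs from every <a> element that has an href."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            attributes = dict(attrs)
            if "href" in attributes:
                # rel is None for a plain, untyped hyperlink.
                self.links.append((attributes.get("rel"), attributes["href"]))


def typed_links(html):
    collector = LinkCollector()
    collector.feed(html)
    return collector.links
```

Fed the two lines compared above, the first yields a pair whose type is None, while the second yields ("cc:license", "http://creativecommons.org/licenses/by/2.5/"), so the agent knows what the link means before following it.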

Simon St. Laurent

I spend a fair amount of time providing technical support for friends, family, and the occasional local political campaign. Looking back over the past few years, it seems clear that I’m spending a lot less time helping people with Windows (thank you, Macintosh) but a lot more time helping out with various wireless network problems. Most of those problems seem to be caused by dying routers.

Rick Jelliffe

Many people don’t find abstention easy. Some don’t have the habit, some don’t see the point, some people are irrepressible, some people are used to having their way, and others think it is an attack on their rights and duties. Having hung around a few different standards bodies, it seems to me that one of the distinctives about ISO/IEC JTC1 is the role that voting abstain plays. Other standards bodies have it, but there seems sometimes a stigma or idea that abstaining from a particular vote represents a failure in expertise: a loss of face and an insult to pride. The worry that you need to be on top of everything, perhaps coupled with the paranoia that people are trying to scam you. But, as Clint Eastwood says, a man’s got to know his limitations.

Let’s review the Fast-Track procedure. The JTC1 Directives (which have sway here) allow National Bodies three kinds of reply on a standard: see s 9.8 (bold added by me; DIS means Draft International Standard, DAM means Draft Amendment, NB means National Body):

Approval of the technical content of the DIS as presented (editorial or other comments may be appended);

Disapproval of the DIS (or DAM) for technical reasons to be stated, with proposals for change that would make the document acceptable (acceptance of the proposals shall be referred back to the NB concerned for confirmation that the vote can be changed to approve);

Abstention

Note that the only criterion countenanced under these JTC1 rules for approving or disapproving a fast-track standard is the technical content: it is or isn’t up to scratch. Editorial issues alone are not enough. However, any significant comments, even editorial ones, will trigger a Ballot Resolution Meeting, where these things can get looked at: they don’t disappear into a black hole. Under the JTC1 rules, non-technical and non-editorial issues just don’t seem to be legitimate grounds for acceptance or rejection: the only slot for a National Body that wishes to act in good faith to the JTC1 Directives but has significant non-technical and non-editorial concerns is to abstain.

Now, a National Body that votes disapprove has a duty (JTC Directives s13.7) to participate at a Ballot Resolution Meeting (BRM). A Ballot Resolution Meeting has to be open to representation from all affected interests, convened in a timely manner, keeping in mind the spirit of the fast-track process. (JTC Directives s13.1) “The spirit of the fast-track process” does not seem to be a defined term.

Issues from National Bodies that arise after the deadline for the initial ballot (or after the BRM, or where the BRM did not go far enough in some desired direction or went the wrong way in the NB’s opinion, etc.) get handled by the NB raising defect notices with the Steering Committee looking after the standard (in this case, SC34) after the fast-track gets standardized. As well, NBs (and ECMA or other liaison bodies) can raise an immediate draft amendment, which can itself go through the fast-track procedure! (If an NB thinks the editor’s instructions have not been followed, they can raise the matter with the ITTF (the body responsible for making sure that the BRM’s instructions have been followed) who, as I am sure is expected of them, will respond with a service-oriented attitude of “Whoops! Thanks!”)

A Ballot Resolution Meeting for a fast-tracked draft is unusual because what comes out of the meeting is a set of editor’s instructions. I have read some incompetent reporting on other websites that somehow a BRM’s result is an approval or disapproval of the standard in question. Never let the truth get in the way of a good story, I suppose.

My experience of ISO/IEC JTC1 is only through Steering Committees, Working Groups and a certain recent Ballot Resolution Meeting, on and off since the mid-90s. However I have also participated in multiple groups at W3C and observed OASIS and IETF. The thing that is interesting in JTC1 meetings, from what I have seen, is that there is usually a really strong idea that you do not block the minority interests of another national body, just because you have no interest. (I have seen a committee basically fall apart because one NB dominated and tried to block the legitimate and specific interests of another NB: what happens when NBs attempt this kind of selfish trick can be that the parties who were stymied lose faith and simply go to another standards body.)

An effective delegation at a meeting who have niche requirements will take care to remind other NBs’ delegations that unless they have technical expertise in that area, they should abstain. Or if the niche requirement may be significant for broader concerns, an effective delegation will try to explain in or outside their meeting what the technical issue is. However, it is part of the gentleman’s agreement that you vote on the issues: a delegation with particular issues shouldn’t have to make a specific request for other NBs to abstain on issues that they do not have an actual technical opinion on; any good-faith delegation will attempt to do that anyway (though sometimes they may get lost amid all the other tasks.)

I have found that in the ISO meetings I have experienced, the contributions of the individual are really important. In SC34 you think of the contribution of James Clark for example. This was a theme of Martin Bryan’s memorable phrase standardization by corporation (e.g. see farewell report as chairman of SC34 WG1.) The system is geared to having deep experts who are highly sceptical, but who very willingly defer to others in areas outside their expertise. In fact, the ISO Directives (part 1 s 1.11.1, a splendid number) define a Working Group as comprising a restricted number of experts who act in a personal capacity and not as the representative of the…organization…by which they have been appointed; however, the JTC1 Directives nuance this (s2.6.1.2): WG members shall, where possible, make contributions in tune with their respective NB positions (which does not in any way stifle individual contributions, as long as the status is clear.) There is an interesting example in JTC1 Directives Annex J3.1, concerning the development of standards for APIs, which explicitly mentions that multiple kinds of experts are required. I am not saying that generalists or observers are not important in technical meetings; however, the meetings are technical and need technical people: governments wishing to participate more in standards need to be asking themselves what programs they have in place to develop and encourage the necessary range of deep expertise in order to be effective at this level. (And one of the best ways is to start to send experts to meetings, to get them to review standards of different sorts, and to expose them to standards practices of different organizations to help them to be critical and functional.)

Technical experts are frequently ratbags, a (nowadays quite fond and) useful Australianism.

Macquarie dictionary (1991):
n. colloq. 1. a rascal; rogue. 2. a person of eccentric or nonconforming ideas or behaviour. 3. a person whose preoccupation with a particular theory or belief is seen as obsessive or discreditable: that Marxist ratbag. -ratbaggery, n. -ratbaggy, adj.

but the ease of abstinence at ISO tames this tendency. I have read more than once that new people coming to the SC34 meetings are surprised at the level of helpfulness and collegiality that usually can be seen (and I think Ken Holman had a lot to do with achieving this tone.)

JTC1 groups try to act by consensus. Consensus is not unanimity; it is defined in part as a general agreement, characterized by the absence of sustained opposition to substantial issues…. To understand the role that abstention plays in ISO, I think you have to see how it dovetails into this definition of consensus: consensus is not an issue of achieving an absolute positive majority of all parties! In fact, JTC1’s view of consensus demands the ready availability of the option to abstain, otherwise NBs and participants will be forced to make decisions they don’t wish to or are not competent to or are not briefed to.

Voting “abstain” on issues at ISO is not a failure. Indeed, sometimes the briefs for delegations have instructions that require them to abstain. But experts who have to abstain can still be critically valuable to the process. Because of this, and because of the mutual spirit of accommodation and collegiality that usually prevails, abstention is easy and a more frequently used option than people used to other standards systems may feel comfortable with initially. But it is not for no reason.

Rick Jelliffe


Prof. Rob Cameron of Simon Fraser University has just announced on the XML-DEV mail list his open source Parabix XML parser, which seems to set new benchmarks for parsing speed, using the SIMD instructions of modern processors.

I am particularly interested in this, because a year ago when Cameron released his UTF-8 converter that trialled his approach, u8u16, I said

I would love to see an XML parser that combines Cameron’s SIMD work with the optimizations from IBM’s XML Screamer, which seem to increase the speed of Java processing by two or three fold.

I’ll have a look at this over the next few days, time permitting, in more detail. There are not many areas in text processing where there is new work being done: the 60s and 70s saw most of the basic work and data structures, so I think it may be a quite startling development. Well done, Rob!

Intel has also been doing work in the area of hardware speed-ups to parsing. Anyone else doing research in this area?
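For readers who have not met the idea behind this kind of parser, here is a toy sketch of the parallel bit stream technique as I understand it. It is not Parabix's code: Python integers stand in for SIMD registers, and the function names (basis_bit_streams, match_byte, positions) are made up for illustration. The input bytes are transposed into eight bit streams, and every position holding a given character is then found with a handful of bitwise operations rather than a byte-by-byte comparison loop.

```python
def basis_bit_streams(data: bytes) -> list[int]:
    """Transpose bytes into 8 bit streams.

    streams[k] has bit i set when bit k (0 = least significant)
    of data[i] is set. A Python int stands in for a SIMD register.
    """
    streams = [0] * 8
    for i, byte in enumerate(data):
        for k in range(8):
            if byte >> k & 1:
                streams[k] |= 1 << i
    return streams


def match_byte(streams: list[int], target: int, length: int) -> int:
    """Return a bit stream marking every position whose byte equals target."""
    mask = (1 << length) - 1
    result = mask
    for k in range(8):
        if target >> k & 1:
            result &= streams[k]          # this bit must be 1
        else:
            result &= ~streams[k] & mask  # this bit must be 0
    return result


def positions(stream: int) -> list[int]:
    """List the indexes of the set bits in a bit stream."""
    out, i = [], 0
    while stream:
        if stream & 1:
            out.append(i)
        stream >>= 1
        i += 1
    return out


text = b"<a><b/></a>"
streams = basis_bit_streams(text)
lt = match_byte(streams, ord("<"), len(text))
print(positions(lt))  # [0, 3, 7]
```

The transposition here is the slow part because it is written as a plain loop; the point of the real work is that both the transposition and the bitwise matching can be done on many bytes at once in hardware.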

Simon St. Laurent


REST offers a great way to build simple applications that Create, Read, Update, and Delete resources. But what if you want to get at part of a resource?

M. David Peterson


Update: Subbu Allamaraju has followed up my post with “Idempotency Explained” which is worth a read. I’m not sure if I agree 100% with his comments due to the fact that — as far as I know — the same request to create/edit/update an entry/attribute on SimpleDB will always yield the same result no matter how many times the request was made. Then again, I could very well be completely off base here. /me is reading through the docs again to ensure I haven’t missed something.

Anyone in the know care to clarify one way or another?

Either way, thanks for the extended overview, Subbu!
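To make the idempotency question concrete, here is a small sketch using a toy in-memory store. It is not the SimpleDB API: ToyAttributeStore, put_attribute and increment are hypothetical names invented for this example. A replace-style write leaves the same state no matter how many times the request is repeated, while a counter-style write does not.

```python
class ToyAttributeStore:
    """In-memory stand-in for an attribute store (hypothetical, not SimpleDB)."""

    def __init__(self):
        self.items = {}
        self.counters = {}

    def put_attribute(self, item, name, value):
        # Replace-style write: the final state depends only on the request.
        self.items.setdefault(item, {})[name] = value

    def increment(self, item, name):
        # Counter-style write: every repeat changes the state again.
        key = (item, name)
        self.counters[key] = self.counters.get(key, 0) + 1


def state_after(operation, repeats):
    """Apply the same operation `repeats` times to a fresh store."""
    store = ToyAttributeStore()
    for _ in range(repeats):
        operation(store)
    return store.items, store.counters


def put(store):
    store.put_attribute("item1", "color", "red")


def bump(store):
    store.increment("item1", "views")


put_is_idempotent = state_after(put, 1) == state_after(put, 3)
bump_is_idempotent = state_after(bump, 1) == state_after(bump, 3)
print(put_is_idempotent, bump_is_idempotent)  # True False
```

Note that idempotent is not the same thing as safe: the replace-style write above still changes state the first time it runs, which is a separate question from whether repeating it does any harm.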

[Original Post]
So for various reasons I’ve had the opportunity to get to know a lot of the folks who design, develop, deploy, market, and support the various offerings of Amazon Web Services, and it’s because of this I found it funny to hear people criticize Amazon for “setting back web architecture 10 years” with the release of SimpleDB. For example, Dare Obasanjo provided the following commentary,

I’ve talked about APIs that claim to be RESTful but aren’t in the past but Amazon’s takes the cake when it comes to egregious behavior. Again, from the documentation for the PutAttributes method we learn,

<snip/>

Wow. A GET request with a parameter called Action which modifies data? What is this, 2005? I thought we already went through the realization that GET requests that modify data are bad after the Google Web Accelerator scare of 2005?

I’ll admit that at first I was right in line with Dare’s point, or in other words, WTF?

But as I mentioned, I know a lot of these guys personally, and I can assure you not a single one of them could qualify as anything other than the best and brightest this world has to offer as it relates to the field of computer science. So I’ve always held off from criticizing, assuming that eventually it would all make sense.

Apparently eventually =~ February 19th, 2008,

Rick Jelliffe


By default, Schematron uses XPath 1 for setting contexts, testing assertions, and producing dynamic diagnostics. Actually, it is XPath 1 as used and extended in XSLT 1. This has led many people to think it is just a nicer declarative front-end to XSLT, which indeed it usually has been.

However, there have been many requests to allow more powerful languages, and ISO Schematron was designed to allow this. There is an attribute called queryBinding on the top-level schema element, and this lets you declare which query language you are using. The standard even specifies a document called a “Schema Language Binding” and says what information it must provide. It also reserved several names: “xslt1, xslt2, xpath2, exslt” etc.
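As a small illustration of where the attribute sits, the sketch below parses a made-up schema and reads its queryBinding. The schema content (the book rule and its ISBN assertion) and the helper name query_binding are invented for the example; the namespace is the ISO Schematron one. The helper returns None when the attribute is absent and leaves the choice of default to the caller.

```python
import xml.etree.ElementTree as ET

SCH_NS = "http://purl.oclc.org/dsdl/schematron"

# A made-up schema that declares the xslt2 binding so it can use matches().
SCHEMA = """\
<schema xmlns="http://purl.oclc.org/dsdl/schematron" queryBinding="xslt2">
  <pattern>
    <rule context="book">
      <assert test="matches(@isbn, '^[0-9]{13}$')">ISBN must be 13 digits</assert>
    </rule>
  </pattern>
</schema>
"""


def query_binding(schema_text):
    """Return the declared queryBinding of a Schematron schema, or None."""
    root = ET.fromstring(schema_text)
    if root.tag != f"{{{SCH_NS}}}schema":
        raise ValueError("top-level element is not an ISO Schematron schema")
    return root.get("queryBinding")


print(query_binding(SCHEMA))  # xslt2
```

An implementation would typically look at this value first and refuse (or switch engines) if it names a query language it does not support.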

So here is the draft text for new annexes I will be submitting to SC34 (and thence to national vote) for augmenting ISO Schematron. EXSLT was a community effort to define some more powerful functions for XSLT1. XPath2 is the updated version of XPath from W3C, very much changed, in particular with a different and large function library; the xpath2 query language binding allows the minimal, untyped-data profile. XSLT2 is the reworked XSLT1, and the xslt2 query language binding allows the typed data (PSVI) if you want it (Schematron doesn’t provide any mechanism for making sure that is what you are working with) and also user-defined functions in the XSLT2 namespace.

Most interesting, perhaps, is the STX binding. I am supposed to be contacting the STX editor to see about using this query language binding plus the STX specification as an ISO standard (another part of DSDL.) Actually, STX was voted on for this purpose, but without the query language binding some national bodies decided it couldn’t be classed as a schema language. It should be an easy fix, since the hard work has been done and the NBs are onside at last.

The thing about STX is that it works in streaming fashion. So you can test documents larger than your virtual memory. STX is much less limited than the subset of XPath that XSD uses.
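To show what testing in streaming fashion buys you, here is a rough sketch. It is not STX and not a Schematron implementation; it uses Python's iterparse to check one simple rule (every element with a given name must carry a given attribute) while discarding each element as soon as it has been seen. The function name count_violations and the sample log document are assumptions made for the example.

```python
import io
import xml.etree.ElementTree as ET


def count_violations(source, tag, required_attr):
    """Count elements named `tag` that lack `required_attr`, in one pass."""
    violations = 0
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == tag and required_attr not in elem.attrib:
            violations += 1
        # Drop the element's children, text and attributes once it is checked.
        # Emptied shells still hang off the root, so memory grows slowly
        # rather than not at all.
        elem.clear()
    return violations


doc = io.BytesIO(b"<log><entry id='1'/><entry/><entry id='3'/></log>")
print(count_violations(doc, "entry", "id"))  # 1
```

The rule is checked at each element's own end event, so nothing needs to look back at content that has already been thrown away; rules that need arbitrary look-back are exactly the ones a streaming language has to restrict.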

The draft bindings are here (sorry, in boring custom XML, not typeset to HTML.) Comments are very welcome, and thanks to the schematron-love-in mail-list members for comments and prods. There are a few other issues on the table for a revised Schematron upgrade, but they all can proceed independently of these bindings, if time is not my friend.

M. David Peterson


Pat Eyler works about a block and a half from where I live in downtown SLC, UT, and yesterday we met up for lunch. Our far-reaching topics of conversation included the proper way to pronounce Rubinius. In case any of you were like me and had no clue how to properly pronounce it, here’s the general idea,

Say “Rubik’s Cube”.

Then replace the “k” in Rubik’s with “n”, the end result sounding like a Reuben sandwich (mmm… my favorite! :D) or Rubin Stoddard (hmmm… not so much my favorite, though I’m not an American Idol fan (that’s cuz’ I’m not a TV fan, not because I despise the show itself) so that could be why.)

So now that I think of it you can probably just skip the Rubik’s cube ‘k -> n’ transfusion and move right on through to Reuben/Rubin, but whatever makes you happy; I would just run with that and call it good. ;-)

OK! So, now that we have the first part figured out… Say “I” (as in you, but in reverse); make that EEE, as in “I am an idEEEot”. See ‘probablycorey’’s comment and my follow-up below for all the gory details. And then “us” (as in “you and I”), putting all of them together to form,

Rubin EEE us

… and that’s it! You now know how to properly pronounce the Rubinius project :-) But no need to thank me. Thank Pat!

Thanks, Pat! :D

M. David Peterson


Update: Thorsten has followed-up with some interesting comments regarding the current state of the virtualized industry, the problems he’s seeing a lot of companies facing, and various obstacles they’re running into along the way. Interesting stuff!



[Original Post]
((AOTD == Advice of the Day) == True)


Amazon Web Services Developer Connection : Instances not responding …

A word of advice, easy to give, hard to follow: design your system so you can relaunch any critical instance!

Amazon has thousands of instances available, just waiting for you to hit the launch button. If a current instance smells bad and your own troubleshooting doesn’t resolve it, launch a new one and bring your service up on it. Actually, if it’s critical, you should have two running so you’d be left with one while you replace the failing one.

All this should be motherhood and apple pie on EC2 or any other hosting facility, or also in your own datacenter for that matter. Systems fail.

Thorsten von Eicken, Posted: Feb 2, 2008 10:53 AM PST

BTW: Thorsten is one of the smartest individuals I have ever had the fortune of coming to know. *GREAT* guy, and if you need help with Amazon Web Services-related consulting, in particular EC2, I would *HIGHLY* recommend getting in contact with his company, RightScale. Just the right combination of open source, open minds, and openly giving more than he/they receive in return, so I believe it’s certainly both fair and in-line with the ideals of O’Reilly, and therefore this blog, to provide promotion.

M. David Peterson


As sad, desperate and/or pathetic as it may sound, I often times will find myself rooting around the Mono Project SVN repository looking for buried treasure. One of the intended side effects of open source software is the freedom and encouragement to experiment, so there’s a tendency for those willing to dig to find things that haven’t made it into an official release but are nonetheless useful and usable tools, libraries, applications, etc.

Today, apparently, is my lucky day (though I’m surprised I hadn’t noticed this before given Eno did the initial check in 7 months ago),


 Assembly/	 81031	 7 months	 atsushi	 initial checkin.
  Mono.Xml/	 81031	 7 months	 atsushi	 initial checkin.
  Mono.XsltDebugger/	 81155	 6 months	 atsushi	 2007-07-02 Atsushi Enomoto <atsushi@ximian.com> * XsltDebugger.cs XsltDebugg...
  ChangeLog	 81031	 7 months	 atsushi	 initial checkin.
  Makefile	 81031	 7 months	 atsushi	 initial checkin.
  Mono.XsltDebugger.dll.sources	 81031	 7 months	 atsushi	 initial checkin.

M. David Peterson


Actually, there are plenty of reasons why F# ((F == Functional) == True) *ROCKS*. Here are a few from the previously linked F# site on Microsoft Research,

Combining the efficiency, scripting, strong typing and productivity of ML with the stability, libraries, cross-language working and tools of .NET.

F# is a programming language that provides the much sought-after combination of type safety, performance and scripting, with all the advantages of running on a high-quality, well-supported modern runtime system. F# gives you a combination of

* interactive scripting like Python,

* the foundations for an interactive data visualization environment like MATLAB,

* the strong type inference and safety of ML,

* a cross-compiling compatible core shared with the popular OCaml language,

* a performance profile like that of C#,

* easy access to the entire range of powerful .NET libraries and database tools,

* a foundational simplicity with similar roots to Scheme,

* the option of a top-rate Visual Studio integration,

* the experience of a first-class team of language researchers with a track record of delivering high-quality implementations,

* the speed of native code execution on the concurrent, portable, and distributed .NET Framework.

The only language to provide a combination like this is F# (pronounced FSharp) - a scripted/functional/imperative/object-oriented programming language that is a fantastic basis for many practical scientific, engineering and web-based programming tasks.

F# is a pragmatically-oriented variant of ML that shares a core language with OCaml. F# programs run on top of the .NET Framework. Unlike other scripting languages it executes at or near the speed of C# and C++, making use of the performance that comes through strong typing. Unlike many statically-typed languages it also supports many dynamic language techniques, such as property discovery and reflection where needed. F# includes extensions for working across languages and for object-oriented programming, and it works seamlessly with other .NET programming languages and tools.

For those of you unaware, F# is now a first class MSFT language, or in other words, this is no longer a “Hey, here’s an idea. Let’s research it.”-type project and instead a true-blue MSFT product backed by mean-green MSFT money, led by some of the very best and brightest minds @ MSFT.

If you were to ask me “What’s the future language foundation of the .NET platform?” I would first state “More than likely, XSLT 2.0++.” And then when you stopped laughing and slapped me upside my head to awake me from my dream I’d say, “What the F#!? was that for?” and you’d say “F#??,” followed by “Isn’t that for programming the way God intended for people to program on the .NET platform?”, and then I’d say “Okay, you got me on that.” at which point we’d move on…

So here’s the thing: While there are *TONS* and *TONS* of reasons why F# *ROCKS* (did I mention that F# is distributed as both an MSI and a ZIP, the latter designed to make it easy for folks using Mono to take full advantage of what F# has to offer?), the biggest reason it *ROCKS* is this,

M. David Peterson


I don’t agree with everything Sean McGrath writes in his latest post as I think there are a lot of really smart people who have developed some really smart ways to handle the variable width nature of XML w/o turning to malloc() every time the length of an element or attribute name reaches past any given preset constraints. That said, I can’t help but agree with,

Memory-based caches of “cooked” data structures are your friend.

Absolutely!

For you .NET developers here’s a pre-written recipe that handles all of the dirty work of determining whether to create a new XmlReader or return the in-memory cached version based on the generated ETag for the source file (see Extended Overview below for a deeper understanding of how this works.) To use this recipe you need to do nothing more than create a new XmlServiceOperationManager when your application starts up like so,

XmlServiceOperationManager myXmlServiceOperationManager =  new XmlServiceOperationManager(new Dictionary<int, XmlReader>());

and then use the GetXmlReader method of the XmlServiceOperationManager, passing in the Uri (an actual System.Uri object, not the string value of the URI, though I guess it would be easy enough to create an overload that takes the string value of the URI. Another task for another day. ;-)) of the desired XML file to get an XmlReader in return like so,

XmlReader reader = myXmlServiceOperationManager.GetXmlReader(requestUri);

That’s it! Now you can use your “new” XmlReader however you might need and the next time that file is requested for processing if it hasn’t changed you save all of the time it would normally take to read the source file and convert it into an XmlReader which is fairly significant.
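For anyone outside .NET, the same caching idea can be sketched in a few lines of Python. This is not the recipe's source: XmlCache, etag_for and get_tree are names invented here, and deriving the ETag from the file's modification time and size is my assumption about one reasonable way to do it. The parsed tree is kept in memory and only rebuilt when the ETag changes.

```python
import hashlib
import os
import tempfile
import xml.etree.ElementTree as ET


class XmlCache:
    """Keep parsed XML in memory, keyed by path and validated by an ETag."""

    def __init__(self):
        self._cache = {}      # path -> (etag, parsed tree)
        self.parse_count = 0  # how many times a file was actually parsed

    @staticmethod
    def etag_for(path):
        # Assumption: modification time plus size is a good enough fingerprint.
        st = os.stat(path)
        raw = f"{path}:{st.st_mtime_ns}:{st.st_size}".encode()
        return hashlib.sha1(raw).hexdigest()

    def get_tree(self, path):
        etag = self.etag_for(path)
        cached = self._cache.get(path)
        if cached and cached[0] == etag:
            return cached[1]          # unchanged: serve the cooked structure
        tree = ET.parse(path)         # changed or new: parse and remember
        self.parse_count += 1
        self._cache[path] = (etag, tree)
        return tree


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "feed.xml")
        with open(path, "w", encoding="utf-8") as f:
            f.write("<feed><entry/></feed>")
        cache = XmlCache()
        first = cache.get_tree(path)
        second = cache.get_tree(path)   # same file: no second parse
        with open(path, "w", encoding="utf-8") as f:
            f.write("<feed><entry/><entry/></feed>")
        third = cache.get_tree(path)    # size changed: parsed again
        return first is second, cache.parse_count, len(third.getroot())


print(demo())  # (True, 2, 2)
```

A content hash would be a stricter fingerprint than time plus size, at the cost of reading the file on every request, which is most of what the cache is trying to avoid.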

Source code and extended explanation inline below. Enjoy!

Oh, and stay tuned for the next installment of this recipe where we learn how adding,


1 Part memcached
1 Part ETag's

and


1 Part GZip encoding

… can turn your lame a$$ performance sucking web application into a lean, mean, kick a$$ performing machine. For a precursor, see Joe Gregorio’s AtomPub presentation slides from this past OSCON. I assure you, it’s worth every second you spend studying this gem of a resource.

David A. Chappell


I just published part 2 of an article exploring the “Next Generation Grid Enabled SOA”. This one is sub-titled “Not Your MOM’s Bus“.

Abstract: In our previous article we discussed how SOA grids can be used to break the convention of stateless-only services for scalability and high availability (HA) by allowing stateful conversations to occur across multiple service requests, whether between disparate service boundaries or load-balanced groups of cloned service instances.

In this article we will challenge traditional applications of message-oriented middleware (MOM) for achieving high levels of quality of service (QoS) when sharing data between services in an enterprise service bus (ESB). We will further compare and contrast a state-based, in-memory storage and notification model, and investigate the intelligent co-location of processing logic with or near its grid data in large payload scenarios. Finally, we will also explain when to substitute an SOA Grid for existing MOM technologies as driven by the following question: “If you have an SOA grid that can reliably hold application state data and the necessary systems can access it, why continue to utilize conventional messaging?”

Read More..

Cheers,
Dave

M. David Peterson


I’ll keep this short: Decentralized Conversations-as-a-(Web)-Service. Interested? Can you write code?

If += yes here’s the rules,

*ALL* code will be released under a Creative Commons Attribution License. In other words, the only requirement is that whoever uses the code you might write gives you attribution. They can close the source, hide it in a dark corner of the Internet, and in other ways never be required to give back in the same way that you gave. In fact, some people might attempt to steal your code and call it their own.

Are you okay with that?

If += yes,

David A. Chappell


- Grid computing will grip the attention of enterprise IT leaders, although given the various concepts of hardware grids, compute grids, and data grids, and different approaches taken by vendors, the definition of grid will be as fuzzy as ESB. This is likely to happen at the end of 2008.

- At least one application in the area of what Gartner calls “eXtreme Transaction Processing” (XTP) will become the poster child for grid computing. (see Gartner Research ID # G00151768 - Massimo Pezzini). This “killer app” for grid computing will most likely be in the financial services industry or the travel industry. Scalable, fault tolerant, grid enabled middle tier caching will be a key component of such applications.

- Event-Driven Architectures (EDA) will finally become a well understood ingredient for achieving realtime insight into business process, business metrics, and business exceptions. New offerings from platform vendors and startups will begin to feverishly compete in this area.

Rick Jelliffe


The vogue quip that “a camel is a horse designed by committee” probably makes more sense to people who don’t live in a desert country. From here in Australia, camels seem to be a very plausible design. It is the speaker, actually, who is wrong: what you need is a camel when you are in the desert, a horse on the plains, a yak in the mountains, perhaps a porpoise in the sea, and an elephant in the jungle.

The ongoing XML Schemas trainwreck shows little sign of improvement; that users have so repetitively stated their problem and received no satisfaction from the W3C shows how disenfranchised they are. I am thinking about these things again this week for three reasons.

First, I saw (only 2 years too late) the AT&T-originated guidelines on XML Schemas Best Practices which underlie a best-practices checker tool at Java.net. It goes through the capabilities of a particular class of application (while assuming that everyone is interested in the same class of applications, grrr: “XML” is not just what one set of software uses) and gives a list of what will cause problems or be unportable. Some (like deprecating <appinfo>) are dubious, but most seem well-founded. It is a good document for anyone to read.

The tables in A.2 and A.3 are especially interesting, or horrific in practical terms. None of the software supported derivation of complex types by restriction fully, most not at all. None fully supported ID datatypes. Only one implementation fully supported enumerations. Basically, type derivation of complex types was a complete non-starter.

The other reason I am thinking about it was for work. A customer wants to use MS InfoPath with a schema I have been working on. But, predictably, InfoPath has a range of things it doesn’t support. Many of them (replacing “unbounded” for the cardinality of choice groups with some reasonable number) are trivial, but it is the same issue.
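The trivial kind of fix mentioned above can be scripted. The following is only a sketch under assumptions: the function name cap_unbounded_choices, the limit of 50 and the sample schema are all made up, and I have not checked it against what InfoPath actually accepts. It rewrites maxOccurs="unbounded" on choice groups and leaves everything else alone.

```python
import xml.etree.ElementTree as ET

XS = "http://www.w3.org/2001/XMLSchema"

SAMPLE_XSD = """\
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="notes">
    <xs:complexType>
      <xs:choice minOccurs="0" maxOccurs="unbounded">
        <xs:element name="para" type="xs:string"/>
        <xs:element name="list" type="xs:string" maxOccurs="unbounded"/>
      </xs:choice>
    </xs:complexType>
  </xs:element>
</xs:schema>
"""


def cap_unbounded_choices(xsd_text, limit=50):
    """Replace maxOccurs="unbounded" on xs:choice with a fixed number."""
    # Caveat: keeps the "xs" prefix on output. A schema written with another
    # prefix would have QNames in attribute values (type="xsd:string") broken.
    ET.register_namespace("xs", XS)
    root = ET.fromstring(xsd_text)
    changed = 0
    for choice in root.iter(f"{{{XS}}}choice"):
        if choice.get("maxOccurs") == "unbounded":
            choice.set("maxOccurs", str(limit))
            changed += 1
    return ET.tostring(root, encoding="unicode"), changed


profiled_xsd, changed_count = cap_unbounded_choices(SAMPLE_XSD)
print(changed_count)  # 1
```

Of course, the result is a different schema: documents with more repetitions than the chosen limit are valid against the original and invalid against the profiled copy, which is exactly the subset problem being complained about.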

A little over a year ago, Paul Klee had a great summary article on XML.COM XML Schemas Profile. It mentions the 2005 W3C organized W3C Workshop on XML Schema 1.0 User Experiences, and the do-nothing Chair’s report (“No-one wants anything, and if they do they don’t agree, and if they agree it cannot be done, and if it could be done other people don’t want it, and if other people do want it they actually want something else, and if they don’t want something else it would be confusing.”) It looks like very strong leadership for inertia, and it cheeses me off that their laziness affects me and my clients at the end of 2007.

One positive thing that has come out has been the W3C Basic XML Schemas Databinding Patterns which lists various XPaths that databinding tools can have. (It mentions how to use these in Schematron, which is good too!) But it doesn’t come up to the level of a profile. (And, to be fair, the W3C Schema WG has also upgraded XSD to reduce some gotchas that have been reported, such as allowing unbounded on all groups.)

Why not? Because, as far as I can make out, the idea is that if we pretend that XML Schemas is a unified and whole specification, one size that can fit all, then somehow it will magically happen and we will all be better off. But fantasy is a really poor substitute for reality. Time and time again I have seen clients happy about XML Schemas and its promises, only to have their hopes dashed as they realize that as soon as they need to start deploying they have to use subsets and there is no support from “standards” to help interoperability.

The third thing? DIS29500 gave XML Schemas that worked in MSXML, but failed in Xerces. This was raised as an issue (by Japan among others) and the schema is being reworked to support Xerces. (The issue is to do with circular imports IIRC: I think the new schemas will be in a single file per namespace and that will help the RELAX NG conversion too.) Again, this is an issue we are dealing with in late 2007.

And that is what you get when you have a large standard that is not sufficiently modular and focussed to support its main applications: guaranteed non-interoperability. This lack of modularity has been an issue that has been relentlessly pointed out to the W3C XML Schema Working Group and just as relentlessly ignored: and the result is that it is surprising if we find a schema that works out-of-the-box with the particular tools desired for a job.

Why is it that we are going into 2008 and we still have exactly the same kinds of problems that were clearly expressed as real problems in the 2005 experience workshop, and which were predicted vociferously before then?

M. David Peterson


So in about 6 days all direct addressed EC2 instances will be shut down. This day comes with *PLENTY* of warning, so decommissioning the 3 direct addressed EC2 instances that we still have running has been planned for a while. Of course, why do something now if you can just as easily put it off until later? ;-)

Okay, so maybe that’s not the best philosophy in life, but when you’ve designed your server infrastructure around worst case scenario disaster recovery, the thought of “losing” an instance or three doesn’t present the type of anxiety you would normally expect, so in the case of EC2, it actually works pretty well.

That said, as per the following screen scrape *even if* we didn’t design our system with a worst case scenario mentality, we’d probably still be okay,

M. David Peterson


It seems Safari is the only browser that will leave you wondering why on earth it, seemingly randomly, refuses to make even the most token attempt at accessing any particular URI via the document function. That’s because each of the other browsers will automatically URL encode GET requests whereas Safari will not, and as such will throw an internal (I assume internal to the underlying OS?) error. Of course it won’t tell you it threw an error, which will be the source of significant hair pulling, but none-the-less, an error has been thrown... somewhere. ;-)

I’m not immediately finding anything in the XSLT 1.0 spec that even remotely touches on whether or not it is the job of the transformation engine, the underlying system, or the developer to properly URL encode a request, so I can only assume that regardless of whether or not it’s a pain in the a$$, not URL encoding requests made via the document function is completely within the realm of a standards compliant XSLT processor. Anyone in the know care to clarify?

In the meantime, one of the better resources I’ve found for both quick and easy reference as well as on-the-fly encoding of any given URI is located @ http://www.blooberry.com/indexdot/html/topics/urlencoding.htm. If you find yourself about ready to rip your hair out because Safari refuses to make any attempt at retrieving the document located at any given URI, check the above resource. Chances are pretty good that something as simple as a | character not being properly URL encoded is the culprit.
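If the URIs handed to the document function are generated by your own code, one way to sidestep the problem is to percent-encode them before they ever reach the stylesheet. The following is a minimal sketch in Python, assuming the URI is built server-side; the helper name `encode_uri` and the example.com address are hypothetical and only illustrate the idea.

```python
from urllib.parse import quote, urlsplit, urlunsplit


def encode_uri(uri: str) -> str:
    """Percent-encode unsafe characters in the path and query of a URI."""
    parts = urlsplit(uri)
    # Keep the characters that delimit path segments and query parameters.
    # '%' is kept safe so sequences that are already encoded are not
    # encoded a second time.
    path = quote(parts.path, safe="/%")
    query = quote(parts.query, safe="=&%")
    return urlunsplit((parts.scheme, parts.netloc, path, query, parts.fragment))


# The unencoded | character is replaced by %7C.
print(encode_uri("http://example.com/feed?tags=xml|xslt"))
# prints http://example.com/feed?tags=xml%7Cxslt
```

Because `%` is treated as safe, running an already-encoded URI through the helper leaves it unchanged, so it can be applied defensively.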

M. David Peterson

RFC 2068: Section 3.2.1: General Syntax

Note: Servers should be cautious about depending on URI lengths above 255 bytes, because some older client or proxy implementations may not properly support these lengths.

Okay, so if I’m making a web service call to a particular URI it’s more than likely going to be inside of my own code base as opposed to inside of someone’s client. And in the cases where it’s not, chances are pretty good that this same client doesn’t support the extended functionality of my shiny new asynchronous Web 2.x+ app. So whether or not a client supports URI lengths over 255 bytes is probably less of a concern given that these same clients couldn’t support my application in the first place.

But let’s set aside the most likely client-side scenarios and assume nothing: RFC 2068 is about a week shy of being 11 years old. Is the 255 byte URI length recommendation still applicable? From the client perspective, possibly not. But what about from the proxy perspective? And are there clients (possibly mobile browsers?) that I’m not taking into consideration that still impose a limitation on the URI length?

NOTE: As of October 27th, 2007 the limit inside of Internet Explorer is 2083 bytes. Is 2083 bytes today’s equivalent of the 255 byte recommendation of 11 years ago? (You would have to assume that MSFT didn’t arbitrarily arrive at this figure, basing the limitation on known limitations of the existing infrastructure of the Internet, correct?)
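Until the question is settled, it is easy enough to measure where a given request URI falls relative to the two figures mentioned above. This is a small sketch in Python; the constant and function names are made up for illustration, and the two thresholds are simply the 255 byte figure from the RFC 2068 note and the 2083 byte Internet Explorer figure quoted in this post.

```python
RFC2068_CAUTIOUS_LIMIT = 255  # bytes, from the RFC 2068 note quoted above
IE_LIMIT = 2083               # bytes, the Internet Explorer figure cited above


def uri_length_report(uri: str) -> dict:
    """Report the byte length of a URI against the two limits."""
    length = len(uri.encode("utf-8"))
    return {
        "bytes": length,
        "within_rfc2068_note": length <= RFC2068_CAUTIOUS_LIMIT,
        "within_ie_limit": length <= IE_LIMIT,
    }


# A 319 byte URI: over the cautious 255 byte figure, well under 2083.
print(uri_length_report("http://example.com/" + "a" * 300))
```

The length is measured in bytes after UTF-8 encoding rather than in characters, since the RFC note is phrased in bytes.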

David A. Chappell

In recent articles and presentations I have been postulating that a concept called “next generation Grid Enabled SOA”, a.k.a. “SOA Grid” and “Not your MOM’s Bus”, combines conventional SOA infrastructure technologies such as BPEL and ESB with middle tier data grid technology to provide a new level of predictable scalability and high availability for SOA based applications.

I often get asked - “How much better is it? What’s the ROI?”

Keith Fahlgren

Here are my notes from the last day of XML Conference 2007. David has collected some of the blogging about the conference.

Keith Fahlgren

This is the continuation of blogging from XML Conference 2007. See yesterday’s post for more. There are, of course, a lot of folks blogging about the conference. Here’s my colleague Andy’s take. Elliotte Rusty Harold is providing some wonderful reading as well (and apparently did a smashing job at the XForms talk last night). For a visual sense of the conference, check out David Megginson’s photos on Flickr.

Keith Fahlgren

Just like last year, I’ll be blogging from XML Conference 2007. Rather than imposing some editorial structure, this’ll simply be a serialization of the things I hear from various speakers in various sessions.

Rick Jelliffe

I took a day off to install Sabayon Linux 3.4. It has taken me a week to get it right, with many false steps. My initial verdict: simple things are simple and work out-of-the-box, hard things are hard and require a separate internet connection to do research. The unfortunate thing is that you have no idea what is simple and hard until after the event…this has all the hallmarks of being a distribution made by gamers with superfast internet connections and superfast machines, and this caused me a lot of grief and wasted time.

Pros:

  • very nice desktop (KDE with modern 3D effects if your card supports it);
  • very good out-of-the-box capabilities especially for support for different media types in Firefox and lots of drivers; this was my first experience of changing the video card on a working Linux system and having the thing work correctly after.
  • very good for people who want quite a large full featured distribution and have no internet access (it takes about 10 gig installed from a DVD!);
  • very good for old UNIXy types like me who want su and want to recompile the kernel;
  • works well with modest hardware: my PC is 8 years old, for example; during the week I upgraded my RAM to 512Meg, and I typically don’t even get into swapping when running Eclipse, Firefox and Thunderbird.

Cons:

  • It didn’t recognize my ATI card correctly, so I had to install in text mode and fix things up by hand. So much for out-of-the-box.
  • It didn’t recognize my LG screen with 1440×900 resolution.
  • When I replaced the ATI with a new NVIDIA card, it recognized this, but the default nv driver did not provide 3D. (I had played with them on another machine: Beryl/Compiz are pretty attractive.) So I am using just the 2D desktop.
  • I downloaded the driver that NVIDIA provides, and found that there were three different web pages with different methods for installing it. I wish people would bother to write which distribution of Sabayon (or Gentoo even) they were writing about. Anyway, I couldn’t get any of these methods to work. One involved recompiling the kernel, which then wouldn’t run. I ran a repair install from the DVD, and (next day) I had a running Linux again, but I’ve ditched the nvidia driver for the generic nv driver. 2D will have to do.
  • Poor desktop admin tools compared to other distributions to help you connect into local LANs not using dynamic addresses; for example, I could not find anywhere in a graphical tool to set the DNS server location.
  • Attempting to connect up to a printer was a disaster. Its nice automatic search tool located our Ricoh 2035 and let me select the drivers for it, but then told me that it did not in fact have these drivers.
  • Sabayon uses a package manager called emerge; however, it is not an RPM-alike and works very differently. It downloads patches, then recompiles the application. I made the mistake of doing this for Thunderbird, and it took over 5 hours (8 hours? 24 hours? who knows, I was long asleep).

Monday

I’ve previously used Mandrake Linux, then switched to Mint Linux some months ago: Mint has a lot going for it, but I never got around to configuring it happily to what I wanted, and the Upgrade Button Debacle was a bad start.

So a DVD of Sabayon was available in a newsagent, and looked interesting. I don’t think I’ve used a Gentoo-based Linux before. Sabayon is big: it complained that my 12 gig disk might not be big enough: very different to Mint’s dainty footprint. Sabayon comes with a lot of drivers built-in: one of the attractions being that there will be, I hope, less downloading and configuration.

I’d tried the DVD-boot on my laptop so I knew the DVD worked. On the laptop, the Beryl/Compiz windowing worked: lots of effects and translucency and vibrating windows. Fun but useless. Sabayon is very much aimed at gamers, I think, but that was a plus for me: I was tired that in Mint there were several media types that would not run in Firefox: I am too busy to track down and download things.

Booting from the DVD on my desktop machine, the first thing that became clear was that it incorrectly detected my graphics card. I have a decade old ATI Rage Turbo Pro, which has worked fine on other Linuxes until now. The web gave the answer immediately: edit /etc/X11/xorg.conf to use the ati driver.

Next, my screen was not correct. Fair enough: it is an LG wide screen that prefers 1440×900. Again, the web got the answer very fast. Type gtf 1440 900 60 to give the correct modeline entry for /etc/X11/xorg.conf (and make sure there are no other screen resolutions at the same depth that are larger in either the horizontal or vertical axis). Great.

Now that I knew the screen would be OK, I installed the OS and, after installing, edited the /etc/X11/xorg.conf to be correct. I configured the networking and started to play with Firefox, which comes as part of the distro. All up, this had taken 5 hours, but only 2 hours that required my attention: the install from the DVD is a long period.

But oh dear, Firefox was clunky and the graphics stuck. Hmmm, probably time for a new card. Go home and eat up the leftover 60/65 eggs and caviar and rocket from Monday’s dinner. I decided to install a new graphics card, which daunted me a bit, because I have never liked altering the hardware of linux boxes: on old UNIX systems it was always a breeze: just recompile the kernel. But I hoped the Sabayon Out-Of-The-Box approach would make things OK.

Playing around with the utilities that look like they should connect or browse to the SMB network here, no luck either. So after a little more than 8 hours, I have a computer with internet connections, but no working update or other networking: it is just a matter of configuration I am sure, but it is taking more time than I thought. I kind of subscribe to the school that it is not a bad thing to gain the skills to have a working system, however, this is probably the 10th different flavour of UNIX I have installed over the years and it still doesn’t work.

Tuesday

I plugged in an NVidia GeForce and booted, and miracle of miracles it saw it, and everything came up fine. Great! Actually, there is a new error message at boot time and I did have to edit the xorg.conf but no grief. The nv driver is built in: there is also a newer nvidia driver that is supposed to be more snappy, but I only want to do things I have to. The graphics are now much better: there is still some problem on large pages with lots of embedded video, such as www.matrixsynth.com, with sluggishness and even hanging. It may be the slower network connection here. But at least all the videos do come up now.

However, on the down side, there is no sign of the 3D-isms: the Compiz panel says that it requires something called dbus, and who knows where it is or how to make it run? More study needed.

Playing with Sabayon, I was most relieved to see that su is available, unlike Mint. Great.

Gentoo defaults to dynamic IP allocation, which we don’t use. The various tools let me configure the Ethernet OK, but they did not provide any way to actually turn on the ethernet automatically at boot time. Gentoo is way behind Mint and Mandrake there. The solution ended up being the command line rc-update add net.eth0 default, it seems.

Wednesday

There was a nice printer interface, including control of CUPS ACLs. Too much for my requirements. The discovery mechanism found one of the three printers, and let me select the Ricoh 2035 driver, and then asked me which flavour I wanted (foomatic, postscript, etc). However, then it told me that no such driver existed. Grrr, so much for out of the box. I’ll leave printers for another day.

Next, I wanted to do email. One reason for moving from Mint was to give me an excuse to move back to Thunderbird. When I moved to Evolution I had a substantial increase in spam getting through, and there were quite a few features I couldn’t figure out how to access. Oh, no Thunderbird out of the box.

Fair enough, this will give a chance to try out the package manager: so what is it? Looking around the desktop, there is absolutely no indication what to do for upgrades and updates. Mint is way ahead. Eventually, I found that Gentoo uses a command called emerge and provides a tool called portato. So I tried emerge on the command line.

A website had advised emerge mozilla-thunderbird. Oh dear, it kept looking for some mirror site in Korea that would not respond to FTP. Some more web browsing, and I decide to add the line SYNC="rsync://rsync.au.gentoo.org/gentoo-portage" to the file /etc/make.conf. No difference: I suspect it may be to do with updating index files not locating mirrors. Oops cut and pasted the wrong thing. I’ll rebuild the index with emerge --sync: this looks promising; it looks at a local (Australian) site like I’d expect. Perhaps the problem was that the index files were out-of-date and emerge was trying to download removed versions of patches and files?

An hour of downloads and indexing later, and emerge advises me to update portage. Ok emerge portage. An hour later (including a reload, rebuild and reinstall of bash for some reason) it finishes. Oh, and then it advises me to synchronize the index again. An hour later (maybe less) and that finishes. Oh, except that I need to move some files into /etc by hand, for security reasons. Actually, I don’t mind this, it is nice to have the chance to feel in control, and diff shows only trivial differences.

I always reboot regularly during installs, because it lets you zero in on problem stages; while the system is down I take the opportunity to upgrade my RAM from 370Meg to a massive 512Meg! Running the OS plus a busy Firefox only takes about 350Meg RAM with no use of swap, but it would do no harm to have a few Gig sometime in the new year, I suppose. <fogey>Of course, I remember getting an extra 2Meg of RAM for my AT&T UnixPC, bringing it up to the extravagant 4Meg: streaming systems are wonderful at requiring small RAM resources.</fogey>

So it reboots fine, and I try emerge mozilla-thunderbird again. This time, it all is good. Except, it clearly isn’t downloading and installing Thunderbird: it is compiling it…surely this cannot be right! I am torn between thinking “Oh no Installation will never end!” and thinking “Recompilation, great! No crashes because my system doesn’t match the developer’s idea” but really this is so far from being plain-folks-friendly that it seems hard to accept. Surely I am giving the wrong command-line option to emerge? I certainly won’t be using it to get Eclipse (if Eclipse is available) :-)

(Update: I think I should have used emerge mozilla-thunderbird-bin to get the binary distribution.)

…It is now three hours later and Thunderbird is still compiling. I downloaded and installed Eclipse Europa in about 2 minutes. It didn’t start at first, because of some problem with the Java 1.6, so I changed the symbolic link in /etc/java-config-2/current-system-vm to link to the JDK 1.5. At least they are all built-in: no need to download, which is something.

Friday

I decide to try to load the NVIDIA drivers. What a complete waste of time. Up until now, the information on the WWW has been first rate and Google has suggested info well. But for these drivers, it is all crap. I check that my kernel has the options it is supposed to: yes all good. But the driver fails. Finally I go the whole hog and recompile the kernel, following the most elaborate of the instructions. Now the box won’t even reboot.

Luckily the install from the DVD has an option to reinstall but keep foreign files, so I don’t have to go through the eclipse drama again. But it is another overnight operation.

Sunday

So it boots again, and I can reconfigure it for the card and screen, and Eclipse and Thunderbird are still installed. Hoorah. A few little complications again, it is quite confused about the networking settings, so I need to do them again too. Sigh… but 1 week is too long. And actually Thunderbird has stopped working, so I have to reinstall it (just the binary again.)

So…

With that off my chest, I should say that I am much more positive about Sabayon than I was about Mint, at this stage. But both of them required no less configuration effort than Mandrake required, just in different things. It is unfair to judge a gamer’s distribution on office interoperability grounds, I suppose, but not super unfair. I took a lot of extra time, but that is partly me exploring how the thing is put together, and it is not Sabayon’s fault as a distro that an external driver caused grief: I wonder whether it wasn’t provided built-in for a good reason?

On the positive side, Sabayon definitely impressed me in my card-switching adventure and in the range of codecs available for Firefox. And it feels more mainstream UNIX. (And I now definitely prefer the KDE desktop to GNOME, it seems.) But while it is not out-of-the-box, it seems that (apart from the emerge problem) the kinds of things I have to study and configure by hand are the kinds of things I expect to have to: it is not nearly as good as I wanted, but I don’t have an out-of-control, situation-hopeless feeling. Linux Situation Normal: SNAFU.

The final install takes up 10 gig of diskspace, and takes many more hours than the “half hour” that the Sabayon website promises. It suffers from the same problem as Mint and presumably some other Linuxes: the developers have no idea that users might not have superfast connections or plenty of spare time, or that users might be interested in time estimates of how long an operation might take to complete before embarking on it. But it certainly took the auto from the matic to get emerge working.

But 1 week is far too long to be playing around getting an OS installed (OK, it wasn’t a fulltime week, I did other things on other systems most of the time…). And I still have to connect it up to our file and print servers. Perhaps I have the wrong end of the stick: instead of thinking of distros as bases which you then customize to get what you want, perhaps I need to think of them as products where you only go beyond the box with severe trepidation. That would be a pretty bad thing. If I had known a week ago about how long it would take, I would have gone with FreeBSD or Solaris, which were my original choices until I was seduced by the shiny box at the newsagent. But I will give it a go, and see what I think.

I should be in a filthy mood from this, but the desktop is so pretty and I feel more like a mountain-climber on top of a mountain.

M. David Peterson

ongoing · On Communication

You see, if you draw the right graph, maybe you’ll see the gaping hole in it, the Next Big Thing.

Communication-graph.png



I don’t know, Tim, but am I the only one that sees blips on a radar screen? ;-)



Who wouldn’t want to expand the human communication spectrum?

Absolutely!

Why aren’t more people thinking about this stuff?

[They] We are. But instead of thinking and talking (Update: That process has been going on for quite some time now), we’re building and delivering.

BTW… Have I ever mentioned just how much I enjoy releasing projects on January 1st of each year? ;-)

(Something tells me 2008 is going to be a big year.)

M. David Peterson

As per the title, this is the first thing that came to mind when I read the following post from earlier today from Dimitre Novatchev regarding transitive closures,

Rick Jelliffe

I’ve just started looking into the ACORD schemas. These are the standards of choice in the English-speaking insurance world, from what I can gather (oops I am being too coy), and are quite meaty and mature now: the documentation runs to about 3,000 pages. Various little birds had told me that Schematron had been used to augment the XSD schemas in several places, so I thought it would be interesting to look at why.

This is not to point out deficiencies in XSD (the facts can speak for themselves) but to look for the relative strengths of Schematron. This kind of data, of course, is very prone to having several layers of rules for each user: business rules, occurrence rules that come from the forms used, systems limits, and so on: each of these can usually be well represented by Schematron schemas (or combined as different phases of the same schema).

But let’s just look at three additional constraints above the ones that XSD schemas can represent, just from s4 Implementation Conventions of ACORD Life, Annuity and Health Standard v2.17.

The first interesting thing is the use of typecodes, explained in s4.4. ACORD documents are interesting because they want to use the same XSD schema and elements for each stage of processing. So when a form comes in, before it has been assessed, in some process all the data may be just treated as strings. Then when a datum has been assessed and possibly fixed up, it can be marked with a typecode as having a certain data type, for example being a date (in 8601 form).

This is pretty much unfeasible in XSD: I don’t think we can use xsi:type for this, because IIRC the type nominated by xsi:type has to be derivable from the actual type specified in the XSD schema, and a date is not derived from String. (Maybe XSD 1.1 fixes this, it doesn’t matter.) In Schematron, it is easy: something like

<sch:pattern>
  <sch:title>Type codes</sch:title>
  <sch:rule context="*[@tc='1']">
    <sch:assert test=".='true' or .='false' or .='1' or .='0'">
      A <sch:name/> element should be a boolean</sch:assert>
  </sch:rule>
  <!-- ...other rules for other typecodes... -->
</sch:pattern>
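For readers who do not have a Schematron processor to hand, the same check can be approximated in a few lines of ordinary code. This is a rough Python sketch of the rule above, not an ACORD tool: it assumes, as the pattern does, that tc='1' marks a boolean, and the element names in the sample document are invented for illustration.

```python
import xml.etree.ElementTree as ET

# The lexical forms accepted by the assertion in the pattern above.
BOOLEAN_LEXICAL_FORMS = {"true", "false", "1", "0"}


def check_boolean_typecodes(xml_text: str) -> list:
    """Return the names of elements marked tc='1' whose value is not a boolean."""
    root = ET.fromstring(xml_text)
    failures = []
    for element in root.iter():
        if element.get("tc") == "1":
            # Like XPath's string value, join all descendant text.
            value = "".join(element.itertext())
            if value not in BOOLEAN_LEXICAL_FORMS:
                failures.append(element.tag)
    return failures


# Hypothetical sample document; the element names are not from ACORD.
sample = '<Form><Smoker tc="1">true</Smoker><Insured tc="1">maybe</Insured><Name>Ann</Name></Form>'
print(check_boolean_typecodes(sample))  # prints ['Insured']
```

The Schematron version stays declarative and reports a readable message per element, which is the point of the comparison; the script only shows how little logic the rule actually involves.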

There is an interesting constraint in s4.13 that says that aggregate elements with no optional subelements should be omitted. This is not something that can be specified using grammars, since it makes the occurrence of a parent dependent on the value of a child. The Schematron assertion might be as simple as something like this:

 <sch:rule context="*[string-length(.) = 0]">
   <sch:assert test="not(*)">Aggregate elements should contain elements with content</sch:assert>
 </sch:rule>

In s4.14 it speaks of nested data ranges: the example they give is

it would not be valid for a PolicyProductInfo to specify an expiration date of 3/1/2005, while one of its child JurisdictionApprovals specifies an expiration date of 4/1/2005.

This is obviously quite trivial for Schematron, especially when you make life easier for yourself by using sch:let to parse the dates into fragments that make comparison easier.
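To make the constraint concrete, here is a short Python sketch of the comparison itself. It assumes the dates are already in ISO 8601 form and that each aggregate carries its expiration date in a child element; the name ExpDate is a placeholder chosen for this example, not something taken from the ACORD schemas.

```python
import xml.etree.ElementTree as ET
from datetime import date


def late_child_expirations(xml_text: str) -> list:
    """List child expiration dates that fall after their parent's expiration."""
    root = ET.fromstring(xml_text)
    problems = []
    for product in root.iter("PolicyProductInfo"):
        parent_text = product.findtext("ExpDate")
        if parent_text is None:
            continue  # nothing to compare against
        parent_date = date.fromisoformat(parent_text.strip())
        for approval in product.iter("JurisdictionApproval"):
            child_text = approval.findtext("ExpDate")
            if child_text and date.fromisoformat(child_text.strip()) > parent_date:
                problems.append(child_text.strip())
    return problems


# Hypothetical instance mirroring the example quoted above.
sample = (
    "<PolicyProductInfo><ExpDate>2005-03-01</ExpDate>"
    "<JurisdictionApproval><ExpDate>2005-04-01</ExpDate></JurisdictionApproval>"
    "<JurisdictionApproval><ExpDate>2005-02-01</ExpDate></JurisdictionApproval>"
    "</PolicyProductInfo>"
)
print(late_child_expirations(sample))  # prints ['2005-04-01']
```

In Schematron the parsing step would be done once with sch:let and the comparison expressed as a single assertion on the child element.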

TriSystems Infobahn have a brochure (PDF) on their approach for using Schematron with ACORD, for people who want more information. The ACORD schemas were developed with respected industry figure Daniel Vint as the senior architect: I see he is potentially nabbable for contract work now.

Rick Jelliffe

W3C’s Service Modeling Language group has two new drafts out: Service Modeling Language 1.1 (latest version) and Service Modeling Language Interchange Format Version 1.1 (latest version). From the abstract:

This specification defines the Service Modeling Language, Version 1.1 (SML) used to model complex services and systems, including their structure, constraints, policies, and best practices. SML uses XML Schema and is based on a profile of Schematron.

SML comes out of the XML activity at W3C, not the WS-* activity, so it seems more aimed at working on top of POX (plain ole’ XML) systems. It has representation from IBM, Sun, BEA, CA, Intel, HP and Microsoft. WS has a bad rep at the moment for over-engineering, but that is partly because many people have problems that they want to be solved by the almost-simplest possible technology. They would prefer erring on the side of modesty rather than grandiosity.

SML has nothing directly to do with services despite the name, and nothing to do with modeling for that matter either: that just seems to be the use-case that has driven the development of a more general technology that takes seriously the problem How do we validate systems of documents, including documents held in multiple files and documents that transclude other documents?, which seems to be an entirely practical question to me: this is the kind of use case that should be driving XSD and DSDL development IMHO.

As I understand it, the recipe for SML is roughly

  • Systems or services are modeled using XML documents which are either definition documents or instance documents
  • Definition documents are either schema documents that use W3C XML Schemas (with a completely reworked version of XSD’s key/keyref mechanism allowed under appinfo that handles multi-file references), or rules documents that use ISO Schematron (vanilla XSLT query language with a slightly extended XPath). A whole Schematron schema is plonked into the appinfo element rather than using Eddie Robertsson’s minimal form for embedded Schematron; however, they use a rule context of “.” which works out the same. A nifty attribute is added to allow better localization.
  • The instance documents are validated against the definition documents
  • A little error report container, to hand back bad data.
  • A kind of transclusion link to allow documents to reference other parts: yet another replacement for entity references! The interesting idea is that the referred-to fragments are not substituted in the document, so we have two PSVIs: the PSVI of the document transcluded and the PSVI of the document without the transclusion. A deref() extension is provided for XPaths: I suppose this is something to add to the list for the Schematron skeleton implementation. XPointers can be used for references: I see that, of course, it is the restricted XPath that doesn’t include the range-to functions that killed XPointer. The link allows the element name at the other end to be specified.
  • The Interchange Format (SML-IF) provides containers and accoutrements for bundling everything up into a single file for interchange

I’ll write to the SML group, because they have the use of sch:schema/@queryBinding slightly wrong. It is intended to clearly label what query language is used. The SML draft says that it must be “xslt”; however, they actually use an extended xslt. What they need is a little Query Language Binding document (which only needs to be a paragraph) to define a query language binding name like “xslt+sml” or whatever. If users don’t use deref() they won’t need to do anything, but it is better to catch schema errors early rather than having obscure XPath messages.

The downside of SML is that it again (as did WSDL’s extensions) shows that XSD, despite being so large, is still simply not capable enough: a non-trivial language should be able to handle non-trivial problems, otherwise what is the point? Schematron’s approach of explicitly allowing different query languages (and providing guidance on profiles and embedded vocabularies) is much more flexible and practical, IMHO.

In other Schematron news, I see that it is being used by the RELAXED online HTML validator (SourceForge). This project is a good demonstration of using the ISO DSDL little schema languages together: NVDL, RELAX NG, and Schematron. NVDL and RELAX NG are also used in Open XML, and ODF was defined using RELAX NG. For comments on making standards from Schematron schemas, see this blog item.

Uche Ogbuji

I finally created a FOAF file for myself. I exported my LinkedIn contacts (that link should work for you if you’ve recently logged into LinkedIn) to “vCard (.VCF file)”. I then imported the vCard into FOAFgen. Result is here. I think I’ll write a Python script that works with the vCard file and the FOAF to handle new or updated contact entries. I must say, FOAF is really ugly (as is, unfortunately, so much RDF/XML), so I’ll have to be closing my eyes a lot as I write tools to avoid my having to stick my fingers into it. I guess the saving grace is that everything else is even uglier (including hCard).

Rick Jelliffe

I was reading the Ant (the make system) documentation today, and in the section on copy I came across this horrible note:

Important Encoding Note: The reason that binary files when filtered get corrupted is that filtering involves reading in the file using a Reader class. This has an encoding specifing how files are encoded. There are a number of different types of encoding - UTF-8, UTF-16, Cp1252, ISO-8859-1, US-ASCII and (lots) others. On Windows the default character encoding is Cp1252, on Unix it is usually UTF-8. For both of these encoding there are illegal byte sequences (more in UTF-8 than for Cp1252).

How the Reader class deals with these illegal sequences is up to the implementation of the character decoder. The current Sun Java implemenation is to map them to legal characters. Previous Sun Java (1.3 and lower) threw a MalformedInputException. IBM Java 1.4 also thows this exception. It is the mapping of the characters that cause the corruption.

On Unix, where the default is normally UTF-8, this is a big problem, as it is easy to edit a file to contain non US Ascii characters from ISO-8859-1, for example the Danish oe character. When this is copied (with filtering) by Ant, the character get converted to a question mark (or some such thing).

There is not much that Ant can do. It cannot figure out which files are binary - a UTF-8 version of Korean will have lots of bytes with the top bit set. It is not informed about illegal character sequences by current Sun Java implementations.

One trick for filtering containing only US-ASCII is to use the ISO-8859-1 encoding. This does not seem to contain illegal character sequences, and the lower 7 bits are US-ASCII. Another trick is to change the LANG environment variable from something like “us.utf8” to “us”.
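The behaviour the note describes is easy to reproduce outside Ant. The sketch below uses Python rather than Java, so it only imitates the effect: `errors="replace"` stands in for a decoder that silently maps illegal sequences to a substitute character, and the function name is invented for this example.

```python
def survives_text_copy(data: bytes, encoding: str) -> bool:
    """Decode bytes as text, re-encode, and report whether they are unchanged."""
    # errors="replace" imitates a decoder that maps illegal sequences to a
    # substitute character instead of raising an exception.
    text = data.decode(encoding, errors="replace")
    return text.encode(encoding, errors="replace") == data


binary_blob = bytes(range(256))                  # every possible byte value
latin1_text = "smørrebrød".encode("iso-8859-1")  # ISO-8859-1 bytes, not UTF-8

print(survives_text_copy(binary_blob, "iso-8859-1"))  # prints True
print(survives_text_copy(binary_blob, "utf-8"))       # prints False
print(survives_text_copy(latin1_text, "utf-8"))       # prints False
```

ISO-8859-1 assigns a character to every byte value, so a decode and re-encode cycle is lossless; UTF-8 does not, which is why both the binary data and the mislabelled Danish text come out damaged.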

Now, let’s put aside the question of why anyone would copy using text operations rather than binary operations. The larger question is why on earth, in 2007 and ten years after XML came out, we are still using text files that don’t label their encoding?

Let me put it another way: if you make up or maintain a public text format, and you don’t provide a mechanism for clearly stating the encoding, then, on the face of it, you are incompetent. If you make up or maintain a public text format, it is not someone else’s job to figure out the messy encoding details, it is your job.

If avoiding the issue is the wrong approach, what is the right approach? One of the right approaches is to adopt Unicode character encodings (UTF-8, UTF-16) as the only allowed formats. (This is what RELAX NG compact syntax does for example.)

Another right-ish approach would be for every text format to adopt explicit labelling: the disadvantage of this, however, is that, as with HTML’s <meta> element, it is unsatisfactory to have to parse deep in the document in order to be able to parse the document. And to have recognition software that understands the conventions of each format is impossible.

However, it is possible to generalize XML’s encoding header into a delimiter-independent form that can be adopted. My 2003 suggestion for XTEXT gives the details. I don’t see any disadvantages to XTEXT: in the post-XML world, programmers have moved from being puzzled by encoding labels to understanding that they are a valuable part of the furniture.

An XTEXT-aware Ant (or default readers that recognize XTEXT conventions) would allow the problem to go away incrementally, as developers and maintainers adopt it. But the trouble is some mix of a lack of leadership by people developing or maintaining text formats: they don’t see themselves as part of a larger community of text users, I guess, or believe that there is any advantage in participating in a larger community. I suspect that this is ultimately because the developers of text formats are people who think in terms of ASCII or who don’t have contact with use-cases where there are different character sets possible. The problem is pushed downstream. Not only incompetent but lazy?

Am I being too harsh? I hope so. In particular, in this day and age of international standards, the burden for fixing this has shifted from the developers to user-community representatives: it is something that governments and non-ASCII-locale standards bodies need to consider.

When I say “You are incompetent” an entirely satisfactory rejoinder back at me is to say “Yes I am: I can only respond to demand from people who are affected by this issue, and the standards and procurements processes are the place for these demands to be manifested!”

But buck-passing won’t fix anything. If we know the problem won’t go away, why cannot we (we consumers or we developers) deal with it?

David A. Chappell

Since publishing my recent article on Next Generation Grid Enabled SOA and taking this topic out into the world, I have been asked to clarify and frame the discussion around why state management in what is supposed to be “stateless” SOA is such an important issue. Steve Jones of CapGemini bluntly stated “No they ruddy well shouldn’t be” when he wrote his opinion on stateful vs stateless services in a SOA.

My observation has been that the need for state management is a continuum that ranges from completely stateless to fully stateful services as the complexity of the business logic and the longevity of the service instance increases.

David A. Chappell

Since joining Oracle I have been working across the various product teams in the Fusion Middleware Group, to create a vision for what I’m currently calling “Next Generation Grid Enabled SOA”. I recently published an article on the subject in SOA Magazine.

Rick Jelliffe

One possibility for the co-existence that hadn’t grabbed my attention until today has probably been obvious to everyone else: when converting from OOXML to ODF just embed OOXML-namespaced elements inside the ODF where there is no direct equivalent.

This allows good round-tripping, doesn’t require ODF to be extended with legacy Office-isms, allows developers who want to support more than the ODF base to do so, gives better fidelity for Office users and doesn’t require that competitors sit down in the same room. Furthermore, in the case of say DrawingML, the original can be preserved as well as being converted to SVG, so the chances of round-off errors and data corruption from incomplete converter implementations are lessened.

ODF already allows foreign namespace elements. I guess what ODF would need to support this well would be a mechanism to say “This kind of foreign element should be stripped out when its context changes, but round-tripped otherwise.”

The reverse is also true: where ODF supports something that OOXML does not, it can either use the customXML elements or a separate XML part.

That would be a nice use of XML namespaces, actually. Rather than harmonize into a single format, augment each other without defining new elements. I don’t know that this would be satisfactory in every case, though. The daVinci plugin, as I understand it, generated ODF where it could but resorted to nonsemantic markup (binary?) where there were elements it didn’t understand or ODF didn’t support; a better approach would be to use Office Open XML elements for that purpose.

(In a previous blog, I raised the option that a document can be ODF and OOXML at the same time. The idea here augments that considerably. But the two have the same thing in common: that there are other ways of thinking about ODF and OOXML than just as arch-rivals. People think that either ODF or Open XML will go away: I think both will be around for a while, so the issue we have to face is how to manage them. Hat-tip to Patrick Durusau for the namespaces idea.)

M. David Peterson

Step One: Set up a new project on GoogleCode.

Step Two: Access http://code.google.com/p/<project_name>/source and locate the “reset the repository” link on the right hand side of the page.

Step Three: Click that same mentioned link.

Step Four: Locate and click the “Reset Repository” button.

Step Five: Smile, knowing you’re about to “stick it to the man” ;-)

Step Six: Run the following command from the machine on which your project’s repository is located, substituting your own GoogleCode username for mine, the location of your project’s repository on your local file system for mine, and the name of your newly created GoogleCode project for mine:

svnsync init --username xmlhacker file:///srv/svn/nuxleus https://nuxleus.googlecode.com/svn

Don’t have access to the machine your project’s SVN repository is hosted on? No problem. Just replace file:///srv/svn/nuxleus with the network accessible svn://, http:// or https:// equivalent of your project’s SVN root.

Step Seven: With that process now complete, run the following, doing the same replacements mentioned above where appropriate,

svnsync sync --username xmlhacker https://nuxleus.googlecode.com/svn

Step Eight: Smile even bigger knowing “the man has just been stuck” with acting as a mirror for your project’s SVN repository.

Step Nine: Stay tuned for the next installment of HGC, “[Hacking GoogleCode:Part Two] Using GoogleCode as Your Projects Main SVN Repository Without Having To Give Up Your Beloved Trac Project Management Interface To Instead Be Forced Into Using The Man’s GoogleCode’s Broke A$$ Attempt at Implementing a Proper Bug Tracking and Feature Management System Much Like I Have w/ the nuXleus Project Repository Which Is Now Happily Hosted on GoogleCode While My Beloved Trac Project Management Interface Is Still Firmly In Place and In Sync With the Check In’s Made To The GoogleCode Repository” (or maybe a title that’s a slight bit shorter than this one. ;-)

Update: DO NOT CHECK IN A SINGLE NEW REVISION IN YOUR PROJECT’S PREVIOUS REPOSITORY IF YOU PLAN TO USE GOOGLECODE AS YOUR PRIMARY SVN REPOSITORY.

Instead, either check out a new working copy from GoogleCode or run svn switch --relocate <your-previous-repository-URL> https://projectname.googlecode.com/svn from within your existing checked out working copy (svn switch --relocate takes the old URL followed by the new one).

On the other hand, if you plan to use GoogleCode as a read only mirror (not really all that wise if you give read/write access to anyone other than yourself as, as far as I know, you can’t restrict check-ins to the GoogleCode repository past simply not adding any new project owners or developers),

DON’T CHECKOUT A SINGLE WORKING COPY FROM THE GOOGLECODE REPOSITORY USING https://. USE http:// ONLY OR FIND YOURSELF *REALLY* MAD WHEN YOU ACCIDENTALLY CHECK IN CHANGES TO YOUR “READ-ONLY” SVN REPOSITORY.

Rick Jelliffe

The early tally I have of the number of comments from national bodies on DIS 29500 is about 3,550: I expect there are an awful lot of duplications though. Is that a lot?

A question on this came up on XML-DEV this weekend.

Tim Bray said

The task of addressing all ten thousand or so ISO-member comments, even after removing dupes, and dealing with the callouts to unspecified product behavior, and so on, with no assurance that doing so would result in ISO blessing, seems just insanely expensive and difficult to me. If those guys take it on, they have my respect and sympathy.

Michael Kay responded

Actually, 10,000 comments on a 6,000 page spec doesn’t sound like a large number to me. If I had less than two comments per page on a book or spec I had submitted for technical review, I would be concerned that the review wasn’t thorough enough. Perhaps people were holding back because they don’t want to provide MS with a free QA service.

And Jim “ISO SQL” Melton added

Or perhaps most people were somewhat intimidated by the prospect of (thoroughly) reviewing a 6,000 page document. To put this in perspective for those who know SQL’s size and complexity, the sum of all nine parts of SQL is about 3950 pages. A ballot on SQL frequently receives several thousand comments, and we’ve been balloting versions of SQL for 20 years!

In fact, virtually every large spec I’ve ever had the “pleasure” to review leads to “thread-pulling”, in which every page yields at least “one more” bug, and following up on that one leads to more, and following up on those leads to still more, etc. I would personally be stunned if 30 dedicated, knowledgeable reviewers of a 6,000 page spec on its first public review were unable to find at least 3,000 unique significant problems and at least 40,000 minor and editorial problems. But that’s just me…

And here is a comment from my blog a few weeks ago:

A big standard will have a lot of changes. If my 30 page standard had 10 changes in its final stages of national review, then DIS 29500 will have about 1000 changes at the same rate (assuming it has 3000 normative pages, which is probably too much). That is just the slog in getting a standard out the door, tedious work not a cause of panic.

So my bold prediction is that the extreme anti-OOXML squad will alternate incoherently between “It’s too many! We have to draw the line somewhere!” and “It’s not enough! It is beyond the powers of mankind to read this thing!” while MS PR will alternate reactively between “It’s wonderful and thorough! Long live openness” and “We can do it in our sleep!” And the ISO process will continue calmly on, disappointing the bullies and the racists and the cartel-izers and the sour-grapers and the parrots, and deliver a good initial version of the standard. I think many reasonable people who had reasonable concerns about DIS 29500 will see that the process actually has allowed their concerns to be addressed, and will see the hysteria for what it is.

Uche Ogbuji

Rarely do I review XML design without seeing something like:


<spam>
<link>http://example.com</link>
</spam>

Putting URLs in element content seems to come naturally to people, regardless of the age-old convention from HTML:


<p>
<a href="http://example.com"/>
</p>

I’ve always disliked this, as I prefer to have URLs and IDs in attributes. I used to think URLs in content was a manifestation of database-refugee XML, but I see it a lot even in carefully-crafted formats.

Rick Jelliffe

Here is my free advice to headline writers: please use “Maybe” for the countries that vote “No with comments” on DIS 29500 (Office Open XML).

Those are effectively the four major votes that can be given on an ISO standard by a national body. As always, the best place for disinformation on votes is headlines.

A vote by a national body of “No with comments” is a “Maybe”, and not an absolute “No”. Looking at it more, I wouldn’t now go as far as Jon Bosak’s comment that “No with comments” is the same as “Conditional approval”, however. What really matters is the particular comments: if they are doable or reasonable and in line with the goals of the standard and the proposer’s conception of the standard (and if no-one’s hair is on fire), then No means Yes. But if the comments are undoable or unreasonable or out-of-scope for the standard’s goals or depart from what is acceptable to the proposer, then No means No.

As in “New Zealand says Maybe!”, “India says Maybe!”, “Japan says Maybe”, “China says Maybe”, “Brazil says Maybe”, and so on. It is not so difficult, is it? (Now even then there is scope for variation: “New Zealand says Maybe but probably not” or “Japan says Maybe, but probably” for example. But that would require actual research.)

And for journalists struggling to write the story well, here is another big tip: the votes are on particular drafts and the technical and editorial issues in them. So when there is a “No with comments” vote, that is a vote on the particular draft — a book in progress — not on the underlying technology. A careful writer will distinguish between DIS 29500 (the book being voted on) and Office Open XML (the technology.) Sometimes this distinction does not make a difference, but sometimes it really does, especially in the case of “No with comments” where you may be in favour of having a standard for the technology but want some improvements in the draft. In that situation, treating “No with comments” as the same as “No” misrepresents the process.

M. David Peterson

As per a comment I made to a post from Eric Larson to the internal Vibe* mailing list regarding the usage of Mercurial instead of Subversion for our RCS,

Of course maybe someone will come along and create a BitTorrent-based Darcs or Mercurial plug-in. Now *THAT* would be cool! :D

My point was in relation to the fact that with a decentralized RCS (which in most cases creates an exact copy of the repository with each checkout), as the size of the repository increases so does the cost of hosting that repository with each new checkout. But if a BitTorrent plugin were to suddenly surface?

Like I said, “Now *THAT* would be cool! :D”

Anybody care to become the *WORLD’S BIGGEST ROCKSTAR CODER*? This would certainly be one way of becoming just that. :D

Uche Ogbuji

One reason I’m looking forward to Leopard is that unfortunately I’m a victim of the bug where my MacBook Pro 17″ occasionally reboots when I close the lid. Most of the time things are OK, but once a month or so I close the lid and I hear the “bong” chime of the computer restarting. When I open it back up (either right away or after a while) it starts back up as if I’d powered it on. Needless to say I lose any unsaved work, which has caused me to be even more annoyed at software that does not auto-save, such as TextMate. It seems to happen in clusters, a few times in a few days, then fine again for another few weeks or so. Anyway here’s hoping Apple has a handle on this one either in Leopard, or in the hardware update to the MBP line that came out a couple of months ago. I’m provisionally happy enough with mine that I’m irrationally eyeing the 1920×1200 and 4GB RAM options in the latest (though the high res is apparently not available with the glossy screen. What’s up with that?).

Anyway, other references to the closed lid reboot bug:

* MacBook restarts when closing the lid
* MacBook Restarts when put to sleep

Update: s/Tiger/Leopard/g. Can’t keep the big cats straight.

Rick Jelliffe

By now XSD users are pretty aware of the severe limitations in the complex type derivation mechanisms provided by XML Schemas. Apart from the issue of whether they should be there at all, rather than being treated as a kind of validation issue as they are in RELAX NG, the problems are basically these. First, “derivation by extension” only allows new elements at the end of the content model (“extension by suffixation”), so I cannot extend <name><first>Rick</first><last>Jelliffe</last></name> to be <name><first>Rick</first><middle>Alan</middle><last>Jelliffe</last></name> using derivation by extension (I need to change the base schema). Second, I cannot use derivation by restriction to remove or optionalize an element that is required in the base (I need to change the base if I want to remove a required middle name, for example).

Now this is not to say that the definitions of complex type derivation by extension and restriction are not logical. It is just that they are not useful, or are too strong, in many important situations. The W3C XML Schemas Working Group has indeed worked on finding better definitions for them, while maintaining the core concept that a type derived by restriction is valid against the base type.

But I suggest that there may be other kinds of derivation which are useful. One that I would suggest might be called “Derivation by Implied Restriction”. This is where there are two complex types and neither is the base type for the other, but there is clearly some family resemblance. Rather than creating an explicit base type, I wonder whether it would be useful to ask a lesser question of them: could there be a base type created (automagically or notionally) against which both content models were valid by restriction and in which there was only a single particle for each duplicated particle in the source content models? So the implied base type could be given a name that the derived types refer to, but would not be specified (declared, defined) explicitly anywhere.

So if one content model said (first, last, gender?) and the second said (first, middle, last) the implied base content model would be (first, middle?, last, gender?). However if one content model said (first, middle, last) and the second said (first, last, middle) there would be no implied base type, because (first, ((middle, last) | (last, middle))) has duplicated particles. (I haven’t thought wildcards through.)
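One way to picture the implied base type is as an order-preserving merge of two simple sequences. The Python sketch below is only an illustration under stated assumptions: each content model is a flat sequence of uniquely named particles given as (name, required) pairs, the second model in the first example is taken to be (first, middle, last), and the function name implied_base is made up here. Choices, wildcards and repeated particles are not handled.

```python
def implied_base(model_a, model_b):
    """Merge two flat sequence content models, or return None if they conflict.

    Each model is a list of (name, required) pairs in document order,
    with no name repeated inside a single model (an assumption).
    """
    names_a = [name for name, _ in model_a]
    names_b = [name for name, _ in model_b]
    req_a, req_b = dict(model_a), dict(model_b)

    # Particles present in both models must appear in the same relative
    # order; otherwise a base would need a duplicated particle.
    if [n for n in names_a if n in req_b] != [n for n in names_b if n in req_a]:
        return None

    merged = []
    i = j = 0
    while i < len(names_a) or j < len(names_b):
        if i < len(names_a) and names_a[i] not in req_b:
            merged.append(names_a[i])      # particle only in model A
            i += 1
        elif j < len(names_b) and names_b[j] not in req_a:
            merged.append(names_b[j])      # particle only in model B
            j += 1
        else:
            merged.append(names_a[i])      # shared particle, same in both
            i += 1
            j += 1

    # A particle is required in the base only if both models require it.
    return [(n, req_a.get(n, False) and req_b.get(n, False)) for n in merged]


base = implied_base(
    [("first", True), ("last", True), ("gender", False)],
    [("first", True), ("middle", True), ("last", True)],
)
# base is (first, middle?, last, gender?)
```

When both models have an unshared particle at the same position, this sketch arbitrarily puts the one from the first model first; a real definition would have to say what happens there.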

In other words, really the type is being derived from the instances, backwards, and if no derivation is possible then the instances are not related by an implied complex type.

I suspect this derivation type (and I am sure there are more) would reduce the complexity of XSD development from the user’s POV. Something more constrained than ALL but less constrained than current type derivation.

M. David Peterson

Signs on the Sand: Saxon, NET and XInclude

Saxon, famous XSLT 2.0 and XQuery processor, supports XInclude since version 8.9. But in Java version only! When I first heard about it I thought “I have good XInclude implementation for .NET in Mvp.Xml library, let’s check out if Saxon on .NET works with XInclude.NET”. I did some testing only to find out that they didn’t play well together.

Turned out Saxon (or JAXP port to .NET, don’t remember) relies on somewhat rarely used in .NET XmlReader.GetAttribute(int) method (yes, accessing attribute by index), and XIncludingReader had a bug in this method.

Finally I fixed it and so XIncludingReader from recently released Mvp.Xml library v2.3 works fine with Saxon on .NET.

More goodness at the above linked post. Thanks, Oleg!

Rick Jelliffe

When you see a data field with text like 2007-07-05 you are probably looking at a date in ISO 8601 date format. Year, month, day: YYYY-MM-DD

IS 8601 is an international standard which gives several standard syntaxes for representing Gregorian dates and times. The full English title is ISO 8601:2004 Data elements and interchange formats — Information interchange — Representation of dates and times. It is only about 33 pages long; you can purchase it from your local standards body or from ISO, and as is common practice for ISO standards, there are final drafts available for free on the Internet. It is maintained by ISO TC 154 who have the Dr Who-ish name of Time Task Force.

Before IS 8601 there were multiple other standards for dates and times. For example, IS 2711 which allowed formats like 5 Jan 2000 and dates using ordinals. 2711 was withdrawn as an ISO standard in 1988, superseded by 8601. However, other nations and bodies have continued to use many of the other ex-standard formats, because it is convenient to have time written according to local conventions. The difficulty, you see, with IS 8601 is that they managed to get a nice unambiguous format for dates by adopting a format that no-one non-technical used: year, month, day. (I won’t be dealing with times in this blog.)

The second rub with IS 8601 is that it defines multiple formats. So as well as YYYY-MM-DD for dates, such as 2000-01-01, you can also have the basic form 20000101. And the same date could also be represented as YYYY-DDD as 2000-001 with a basic form version of 2000001, where the DDD is an ordinal counting the number of days into the year. And it could be specified as a relative date too: you could specify it as a duration (from a base of, say, 1999-01-01) using the syntax PdddD where P means period and D means days: so P365D or even P12M where M is month. You can do the same with an explicit base and get the notation 19990101/P365D, for example. The more exotic you get, the more chance that you have a need for what the standard calls “a mutual agreement” where the exchanging parties agree on what the notation means, because it could have several different meanings under the standard.
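To see how many spellings one day can have, here is a small Python sketch that prints 1 January 2000 in the calendar and ordinal forms discussed above and reaches the same day as a 365-day offset from 1999-01-01. It uses only the standard datetime module; the helper name iso_forms is made up for this example.

```python
from datetime import date, timedelta


def iso_forms(d):
    """Return several ISO 8601 spellings of one date (illustrative helper)."""
    return {
        "calendar_extended": d.isoformat(),        # YYYY-MM-DD
        "calendar_basic": d.strftime("%Y%m%d"),    # YYYYMMDD
        "ordinal_extended": d.strftime("%Y-%j"),   # YYYY-DDD
        "ordinal_basic": d.strftime("%Y%j"),       # YYYYDDD
    }


forms = iso_forms(date(2000, 1, 1))
# calendar_extended: 2000-01-01, calendar_basic: 20000101
# ordinal_extended: 2000-001,   ordinal_basic: 2000001

# The relative form 19990101/P365D: a base date plus a period of 365 days.
relative = date(1999, 1, 1) + timedelta(days=365)
```

The P12M spelling has no direct counterpart in timedelta, because a month is not a fixed number of days. That is a small instance of why the more exotic notations need agreement between the exchanging parties.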

The third rub with IS 8601 is that it is based on the Gregorian calendar. This is unsurprising, in view of the dominance of the West and its ex-colonies in trade and standards adoption. However it imposes a conceptual, processing and formatting burden in places where other dates formats are used. It is not as uncommon as people think: I have lived in Taiwan and Japan, where non-Gregorian calendars are used for example. And obviously the Islamic calendar is in wide use.

XML Schema Standards and Dates

XML Schemas (XSD) is a Technical Recommendation made by an industry consortium W3C, which allows direct participation by representatives of fee-paying members. XML Schemas supports a wide range of ISO 8601-ish date formats. For most formats, there is no difference, and there is even an explicit appendix ISO 8601 Date and Time Formats which gives clear information.

XSD provides many different datatypes for dates, times and durations. However, it does not allow all ISO 8601 syntaxes, and it does alter others. For example, ISO 8601 allows a year 0000. This is not allowed under the BC/AD calendar system, where you go straight from 1BC to 1AD. XSD’s date types disallow the year 0. Most importantly, the date notation used is the extended one with the minuses, so 2000-01-01 not 20000101. XSD allows you to “derive types” to restrict dates to certain values or ranges.

When we were discussing date formats in the W3C XML Schemas Working Group, I tried to get localized date formats allowed: I think it is the same principle as IRIs: it is good if a human can author directly (or generate directly) in the form or notation that they use to think about the data. However, my brilliant idea was rejected by the XML Schema Working Group (with the MS representative taking quite a strong stand that neutral/standard formats should be used, not localized ones) probably because I did not have a proof of concept. Since then, Jeni Tennison’s DTLL datatype library language has come along, and is being adopted as part of the ISO DSDL multi-part standard. It is, I believe, exactly the right way to go: allow notations in the format that makes most sense to the stakeholders and application requirements, but provide a mapping to neutral/fixed-syntax formats.

In that sense, my personal belief is that ISO 8601 is a relic of a pre-markup and pre-schema mentality. That does not mean it is not valuable nor that it should not be maintained, nor indeed that it shouldn’t be the first port of call when looking at date formats. But it pushes localization to be an application consideration whereas I think it is just as legitimate and feasible to make it a markup/parsing (i.e. schema) level issue. This is not only because localized formats (rigorously described with an appropriate declarative schema language) make it easier for humans to read and write, but also because where the consumers and generators of data are computers and humans are relatively unimportant in the pipeline or critical path, then data field notations localized (again, rigorously described) for optimal computer performance is entirely appropriate and smart.

Office Document Formats

ODF and Open XML both use ISO 8601 dates in the YYYY-MM-DD form throughout for all dates. (The ODF spec uses US date MM/DD/YYYY formatting in places in its text, but don’t let that confuse you.)

ODF has quite a nice, basic and consistent approach to dates in spreadsheets: read and store them in a kind of ISO 8601 format but also allow a “null date” (such as 1899-12-31) to be specified to allow conversion of dates into numbers. Spreadsheets very often actually store, manipulate or transfer dates as ordinal values from an index point: this makes calculations with dates very straightforward. Representing dates as ordinals is also used in other ISO standards: for example, the SQL_DATE data type gives the number of days since January 1, 1841. (It gives this count as a simple integer.) See section 8.5.2 Calculation settings in ISO ODF for more information.
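The ordinal idea is simple enough to sketch in a few lines of Python. This is an illustration only: the null date 1899-12-31 is the example value mentioned above, and the names NULL_DATE, to_serial and from_serial are invented here rather than taken from ODF or any spreadsheet API.

```python
from datetime import date, timedelta

# Example "null date": the index point from which days are counted.
NULL_DATE = date(1899, 12, 31)


def to_serial(d, null_date=NULL_DATE):
    """Number of days from the null date to d, as a plain integer."""
    return (d - null_date).days


def from_serial(n, null_date=NULL_DATE):
    """Inverse of to_serial: recover the calendar date from the ordinal."""
    return null_date + timedelta(days=n)


first_day = to_serial(date(1900, 1, 1))     # 1
millennium = to_serial(date(2000, 1, 1))    # 36525

# Date arithmetic becomes integer arithmetic.
days_between = to_serial(date(2000, 3, 1)) - to_serial(date(2000, 2, 1))  # 29
```

Changing the null date shifts every ordinal by a constant, which is why a document has to state which index point it uses.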

The draft specification for ISO Open XML, from Ecma, does have one oddity, which has attracted much controversy. In SpreadsheetML table cells only, dates are actually saved as durations, as ordinals. The base is set by an attribute on the workbook, and reflects the supported ranges of Excel on different platforms (on Windows, Excel does not support dates before 1900; on the Macintosh, Excel does not support dates before 1904; putting in such a date will be serialized out as 0 into SpreadsheetML.)

The reasons for saving as a duration rather than a date are obvious: it reflects the internal format directly, allows faster load and save times to the XML, and allows faster loading and saving times when interfacing with an SQL system that uses SQL_DATE etc. The economic value of load/store times for Office documents is enormous, and it would be quite inappropriate to apply the criteria that one might use, say, for DocBook documents, to standard office formats: I actually think that ODF gets it quite wrong here, and that best practice should dictate that optimized formats should be available. However, by the same token, I think that SpreadsheetML gets it wrong, and that it also should allow reading of data in ISO 8601 format as well as in its optimized notation.

The logical question that comes up is: should SpreadsheetML use ISO 8601 duration format rather than just raw ordinal integers? If the ISO 8601 standard notation were used, SpreadsheetML would use <v>P1D</v> to mark up the first day in the range, rather than <v>1</v>. However, the P and D are redundant, because the notation is clearly marked up by attributes (and documentation). This is the old issue of where the barriers should be between information in markup and information in embedded formats. I don’t see that <v>P1D</v> has any benefits over <v>1</v> frankly: it would seem to be an exercise in nominalism and pointless compliance.

<digression>The additional difficulty is that we are let down by XSD here, again: XSD doesn’t allow the type of an element to be selected in part or whole by an attribute value on an ancestor, unlike ISO Schematron and ISO RELAX NG. XSD is completely deficient in support for these kinds of idioms, because the database mindset of its developers led them to conceive of attributes as merely funny kinds of elements rather than as metadata on an element, of the same importance and character as the element name. So XSD doesn’t allow attributes to select type; therefore Open XML would have to compromise its design, where elements are highly generic (i.e. data values in spreadsheets are in a “v” for value element), in order to allow values to be typed; however then Open XML could declare the value to be an xsd:duration, which would then require the P1D notation. Another approach in XSD would be to use xsi:type where the v element is a union of integers, durations, strings etc. However, then we would need to consider how to fit shared string references into the datatyping framework. Too much work! </digression>

The second reason why the ordinal values for dates in SpreadsheetML are controversial is because of an out-by-one adjustment that is needed for some functions for the first two months in 1900. To me, this is just a silly edge case: remembering that spreadsheets from Mac Excel don’t even get back to 1900, and on Windows they don’t go before 1900: it is hardly the wholesale subversion of the Gregorian calendar that you might suspect from various comments on the Web. ODF perhaps punts the issue, by allowing date indexes to start on 1899-12-31 or on 1900-01-01 (examples they give) and so they leave it up to the application developer or document generator to figure out which one is appropriate.

In my blog last month on Principles for reviewing standards, I took the position, which I think is the most reasonable one, that for embedded data fields the standard forms should be provided and optimized forms may be provided. From that POV, Open XML should also allow ISO 8601 durations and/or dates as well as the simple duration ordinal. And ODF should allow duration ordinals as a matter of best practice.

M. David Peterson

Update: In a follow-up comment, Dave Johnson provides us with our quote of the day,

If only all browsers had the same XSLT support as IE … and IE worked like other browsers in every other respect ;)

I’ll just let that one speak for itself ;-) :D

[Original Post]
Todd Ditchendorf’s Blog. XML, Cocoa, JavaScript, Java. » Blog Archive » Safari 3, JavaScript, and XSLT

Safari 3 for Windows and Tiger is truly awesome news.

Just a feature note: Safari 2 has always supported client-side XSLT. But Safari 3 includes an implementation of the Mozilla-style JavaScript XSLT API… so now you can programmatically execute XSLT transforms on the client via JS in Safari. Great news.

SWWWEEEEEEEEEEEEETTTTTTTT!!!! :D :D :D

Let’s see, so that just leaves Opera holding the “why is there no support for [fill in missing Client-side XSLT feature, in this case the document() function ;-)]” bag**, but something tells me that within a reasonable distance of time, Glenn will *FINALLY* get to see the light of day. ;-) Poor guy must be getting antsy, huh?!

Hang in there, Glenn! There’s hope still yet, and as I alluded, I have an itchin’ suspitchin’ the company behind my most favorite browser on the planet is going to pull through for us.

** Though I wonder if Safari has migrated any of the EXSLT functionality from libxslt, in particular the node-set() function? Anyone know off hand? If no, then Opera still has one leg up on Safari. Of course they still have one leg down on Safari as well. ;-)

M. David Peterson

So here’s a little somethin’-somethin’ that more than quite a few folks, including myself, will enjoy..

XSLT: Riding the challenge: Transforming JSON

Ever wanted to access and manipulate JSON as ordinary XML? To transform it with XSLT?

No problem, use the f:json-document() and f:json-file-document() as provided by FXSL.

Here is a quick example:

Find more at the above linked post. Thanks, Dimitre!

M. David Peterson

// @author RussMiles.com - Home - Why Rails is not yet ready for SOA…

I am most definitely a Rails advocate. Not to the point of religious fanaticism, as you sometimes see around the Ruby on Rails camp, but definitely a massive fan. Rails, to me, is honestly a technology that makes it very easy to create great web applications and, funnily enough, services[3].

So why do I say that Rails is not ready yet to be great for SOA? Well, the key in my title is the word ‘yet’.

More goodness at the above linked title…

Thanks for the review, Russ!

Rick Jelliffe

Data structures people like to think of an XML document as superficially a rooted tree of the type called an Attribute Value Tree (AVT) and, when you add IDREFs, a kind of ordered, directed graph. This puts the emphasis on the element structure. But of course an XML document is more than that: it is also a tree of entities, a tree of notations, a tree of character set encodings, and so on. A relational person might see tables of atomic values split up and regrouped according to keys. An SGML or markup person would see it in terms of linear text which has had various range annotations to provide metadata; the element ranges being synchronous (i.e. no overlap) also means that they can be viewed as a tree, however there is no reason why a subrange actually relates in any semantic way to a containing range: the element is a property of the text not the other way around.

There are particular, admittedly niche, areas where the synchronous restriction galls. So there have been various systems for concurrent markup proposed. Many of these go outside the meager resources that XML allows back towards the parsing power of SGML, and some even extend SGML. I was looking over Michael Sperberg-McQueen’s Rabbit/Duck grammars, which deal with validating concurrent structures: I wondered about how Schematron could be used.

Let’s take the most common case of overlapping markup, bold and italic, because it is easy to visualize. We want “GRAPHS OF” in italic and “OF WRATH” in bold:

THE GRAPHS OF WRATH

where a brave but naive soul would mark this up as

THE <i>GRAPHS <b>OF </i>WRATH</b>

but the XML markup has to be

THE <i>GRAPHS <b>OF </b></i><b>WRATH</b>

Let’s make a constraint that there can only be one “phrase” of bold in our text. An odd constraint, but it relates to grouping arbitrary elements together. First, let’s use markup to indicate connection, with an IDREF attribute called join.

THE <i>GRAPHS <b id="b1">OF </b></i><b join="b1">WRATH</b>

Note that we now have represented the concurrent structures, but @join does not require that the sections be contiguous: interrupted or dispersed sections are possible too! Now for the Schematron schema

<rule context=" $context ">
                   <!-- .//b is used because the joined b elements may sit at different depths -->
                   <assert test="count(.//b[not(@join)]) = 1 and count(.//b[@join][not(@join = //b[@id]/@id)]) = 0">
                   There should be exactly one phrase of bold. This should be marked up with one or more
                   b elements, where one of those b elements has an id and the others have join attributes.
                  </assert>
</rule>

The same approach can be extended for different occurrence and position constraints.
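
For readers who want to try the constraint outside a Schematron validator, here is a minimal sketch of the same check in Python. It assumes the fragment is wrapped in a single parent element (the `p` wrapper and the function name are mine, not part of the post), and it only mirrors the "one bold phrase" rule, nothing more.

```python
import xml.etree.ElementTree as ET


def has_single_bold_phrase(xml_text):
    """Return True if the fragment holds exactly one (possibly split) bold phrase."""
    root = ET.fromstring(xml_text)
    bolds = list(root.iter("b"))
    # The phrase "head" is the one b element that carries no join attribute.
    heads = [b for b in bolds if b.get("join") is None]
    known_ids = {b.get("id") for b in bolds if b.get("id") is not None}
    # Every continuation must point back at an id that actually exists.
    dangling = [
        b for b in bolds
        if b.get("join") is not None and b.get("join") not in known_ids
    ]
    return len(heads) == 1 and not dangling


# Hypothetical wrapper element around the example from the post.
sample = '<p>THE <i>GRAPHS <b id="b1">OF </b></i><b join="b1">WRATH</b></p>'
print(has_single_bold_phrase(sample))
```

Two unrelated `b` elements, or a `join` that points at a missing id, both make the function return False.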

Michael Day

It is a truth universally acknowledged that “DTDs don’t support namespaces”. Or to be a little more pedantic, that DTDs don’t support namespaces in their full generality. However, one might as well say that XML 1.0 does not support namespaces. Given that the specification of Namespaces in XML augments XML 1.0, it seems more reasonable to ask why don’t namespaces support DTDs?

Rick Jelliffe

Different applications on different systems use different fonts, fonts with the same family name but different metrics, different hyphenation algorithms, hyphenation setting defaults, hyphenation dictionaries, different size spaces, different line-breaking algorithms, different widow/orphan/keeptogether rules, and different co-ordinate space measures.

This means that even if a document is saved as XML which completely captures all the page and style settings and so on, and even if the receiving system has the same generic fonts and even if the same complement of “muffin borders” and other art is available on the receiver and sender systems, a document moved from vendor A’s application on platform A cannot be expected to open up with line-for-line or (for multiple page documents) even page-for-page fidelity on vendor B’s applications or even on vendor A’s application on platform B. Even with good matching, a word here or there will break or hyphenate differently, a line will break differently over a page, and so on.

This is especially noticeable on short measures, particularly table cells. Unless all the cells in the table are each wider than their content, with no multi-line cells, there is every chance that lines may break differently.

Note that this will happen regardless of whether you are using ODF or Open XML: it is not the limits of the XML representation as much as that applications have different code inside them. If you want exact fidelity, the current state-of-the-art is you have to pretty much use the same application (and platform) to open the document that it was created in.

What can you do to minimize this?

Well, for a start you need to set your expectations appropriately: an HTML page looks different on every different browser and OS and depending on the window size too…do you really need exact line and page fidelity? The HTML experience is strongly that it is better to have presentation-independent design, allowing flexibility, in order to get the benefit of re-target-ability.

Strategies for coping with these issues have dominated SGML/looseleaf publishing systems: it is not simple. One thing is to wean yourself off page and line dependencies: use section numbers and IDs to refer to things, not page numbers. Never hard-code page numbers or line numbers, but use references and variables.

In your typesetting specs, make your widow/orphan control move paragraphs over the page readily (if you expect there will be additions) so that there is plenty of whitespace at the bottom of the previous page, so that typing a few extra words here and there will not cause repagination. If you do this well, then it also makes using the occasional forced page break more workable.

There are mixed strategies: send PDF as well as the XML document, and use the PDF as much as possible, until there needs to be editing. Or send HTML as well to discourage page-centricism.

Another strategy is to clearly separate out those pages that must not break, and treat them as artwork, included from external documents. Pages containing examples of forms in particular are better handled as graphics when included in general documents.

And there are procedures to take as well. For example, if someone sends you a document and you open it in application B, first go through all the tables and resize the text so that it breaks the same as the PDF. Of course, this relies on your document using styles: but if you don’t use styles you are probably messed up anyway (because there are many ways to do the same thing, and they may result in different results: for example, on some systems a bold space is bigger than a plain roman space!).

I remember that WordPerfect had a (patented) feature where it would adjust font size and table borders for optimal layout. This is exactly the kind of thing that would be needed if we want to get better guaranteed fidelity at the line and page level between applications.

Is infidelity ever forgivable?

So remember, there are three kinds of fidelity: fidelity because the document has all the information used by the producing and receiving applications, fidelity because the applications have the same resources available to them, and fidelity because the producing and receiving applications have the same algorithms and defaults. When looking at the various claims (Len Bullard mentions Spy versus Spy) made by MS on Open XML and “fidelity”, and ODF people on “interoperability”, we need to interpret them in the hard light of the Dirty Little Secret.

Governments and procurement projects need to be quite clear that whenever they insist on page fidelity, they are probably in fact locking themselves into one vendor’s tools, in which case it becomes a debate on features, quality, price, training, etc. In a limited sense, everything *except* interchangeability.

M. David Peterson

I accidentally blew up the wrong EC2 instance. That same EC2 instance had, amongst other things, Planet XSLTransformations on it. I forgot to set a cron job for S3Sync to back-up that particular directory.

Damn.

Fortunately Google cached the FOAF feed for the site. As such, I created a quick-and-dirty FOAF<2>PlanetPlanet initialization file. Maybe you can make use of it as well? Don’t know, but just in case, here it is…

FOAF2PlanetPlanetInitializationFile

A FOAF to PlanetPlanet Conversion Utility

Introduction

This module will convert a FOAF file to a PlanetPlanet initialization file. This is an XSLT 2.0-based solution.

Details
Repository Location: http://xslt.googlecode.com/svn/trunk/Modules/WebFeed/FOAF-to-Planet.ini/

Inventory

init.xml
FOAF2Planet.xsl
Any number of FOAF files.

Enjoy!
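
To give a feel for what such a conversion involves, here is a rough Python sketch rather than the XSLT 2.0 module itself. It assumes a deliberately simple FOAF shape, where each `foaf:Person` has a `foaf:name` and an `rdfs:seeAlso` pointing at the feed, and it emits Planet-style `[feed-url]` sections with a `name` entry. Real FOAF files can serialize the same data in many other ways, so treat the input shape, the function name, and the sample data as assumptions.

```python
import xml.etree.ElementTree as ET

NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "foaf": "http://xmlns.com/foaf/0.1/",
}
RDF_RESOURCE = "{%s}resource" % NS["rdf"]


def foaf_to_planet_ini(foaf_xml):
    """Build one '[feed]' section per foaf:Person that has a name and a feed."""
    root = ET.fromstring(foaf_xml)
    sections = []
    for person in root.iter("{%s}Person" % NS["foaf"]):
        name = person.findtext("foaf:name", default="", namespaces=NS).strip()
        feed = person.find("rdfs:seeAlso", NS)
        # Skip entries this simplified sketch cannot turn into a subscription.
        if not name or feed is None or not feed.get(RDF_RESOURCE):
            continue
        sections.append("[%s]\nname = %s\n" % (feed.get(RDF_RESOURCE), name))
    return "\n".join(sections)


# Hypothetical input, not taken from the cached Planet XSLTransformations feed.
SAMPLE = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
         xmlns:foaf="http://xmlns.com/foaf/0.1/">
  <foaf:Person>
    <foaf:name>Example Author</foaf:name>
    <rdfs:seeAlso rdf:resource="http://example.org/feed.atom"/>
  </foaf:Person>
</rdf:RDF>"""

print(foaf_to_planet_ini(SAMPLE))
```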

Rick Jelliffe

Ecma 376 Office Open XML’s DrawingML uses an odd measure called the EMU: short for English Metric Unit. There are 360000 EMUs per cm, 914400 EMUs per inch.

The reason for this may become clearer if I note that, using the Adobe “big point” of 72 points per inch (rather than the old 72.27), there are 12700 EMUs per point. Err, maybe not…

What about this then: 360000 and 914400 are divisible by 2, 3, 4, 5, 6, and multiples?

Still no idea? Well, representing numbers in computers is fraught with errors every time you have anything that requires fractions, or multiplication or division by numbers that are not 2^n. That can even include multiplying by 0.5. Computer scientists spent a lot of their early time investigating various techniques to overcome these problems, in a branch of mathematics (or is it engineering?) called numerical methods.

These errors are small by themselves, but when you have, for example, long sequences of calculations such as graphics objects where one segment is positioned using the result of the last segment, the accumulated error can increase. In publishing, misalignment can have a serious effect when there is some kind of multi-color printing: you can get registration errors.
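
A tiny illustration of that accumulation, as a sketch: ten segments of a tenth of an inch each are chained together, once with binary floating point and once with whole-number units. The figure of 91440 integer units per tenth of an inch assumes the Ecma 376 definition of 914400 EMUs per inch, and the function names are made up for the example.

```python
def walk_float(step_inches, count):
    """Chain segments using floating-point inches."""
    position = 0.0
    for _ in range(count):
        position += step_inches
    return position


def walk_units(step_units, count):
    """Chain segments using whole-number units."""
    position = 0
    for _ in range(count):
        position += step_units
    return position


# Ten steps of 0.1 inch should land exactly on one inch.
print(walk_float(0.1, 10) == 1.0)
print(walk_units(91440, 10) == 914400)
```

The floating-point walk misses the one-inch mark by a tiny amount, while the integer walk lands on it exactly.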

One way to circumvent the problem is to move to integer (whole number) arithmetic: you find some convenient small measure that can be multiplied so that you don’t need to use floating-point numbers. When you do divide, you throw away the remainder, because it is below the precision you are supporting; but because the data frequently is aligned to grid positions (1/2 inch, etc.) there will be no loss of precision from data capture (what the user sees) to the internal representation. Now armed with this perspective, let’s imagine a set of criteria for a typesetting system or vector graphics system:

* use a small unit to allow implementation in integer arithmetic
* this unit should allow exact whole divisions (no remainder) of the common measures of modern English-speaking countries’ typesetting: the cm, the inch, and the point. So a half inch, 10.5 points, or a third of a cm are all exact (within the bounds of the system)
* the unit should be small enough to allow non-“English” measurements with, say, 0.01% precision (or do I mean inaccuracy?): the continental Didot or the Japanese Q system for example

If you take these kinds of criteria, and work through the numbers, you get something like EMU. They are used by Ecma 376’s DrawingML for “high precision co-ordinates” in certain places. The rest of the time, people can use locale-dependent measures.
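
The criteria above can be checked with a few lines of arithmetic. This sketch assumes the Ecma 376 value of 914400 EMUs per inch (hence 360000 per cm and 12700 per 1/72-inch point); the helper name and the choice to reject inexact values are mine.

```python
from fractions import Fraction

EMU_PER_INCH = 914400
EMU_PER_CM = 360000
EMU_PER_POINT = EMU_PER_INCH // 72  # 1/72-inch "big point"

UNITS = {"in": EMU_PER_INCH, "cm": EMU_PER_CM, "pt": EMU_PER_POINT}


def to_emu(value, unit):
    """Convert a measure such as '1/2' or '10.5' to whole EMUs, or fail loudly."""
    emu = Fraction(value) * UNITS[unit]
    if emu.denominator != 1:
        raise ValueError("%s %s is not a whole number of EMUs" % (value, unit))
    return int(emu)


print(to_emu("1/2", "in"))   # a half inch
print(to_emu("10.5", "pt"))  # 10.5 points
print(to_emu("1/3", "cm"))   # a third of a cm
```

All three measures named in the list come out as whole numbers, while something like a seventh of an inch does not.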

So if EMU is a reasonable technical approach, is it a reasonable measure to appear in a standard? To my mind, this falls in exactly the same bucket as SpreadsheetML’s use of numeric indexes, though there are accuracy issues as well as performance issues. I think it comes down to the purpose of the standard: when the purpose of the standard is to allow high-quality typesetting and graphics and to reflect the triggering application, I think the exact numbers such as EMU may win. However, when the purpose is to allow data interchange and human readability and writability, then using SI and locale-dependent measurements will win.

The EMU issue is also an interesting one from a standardization viewpoint: there is a kind of premise that supporting a standard (obviously the specific application-independent alternative is SVG-in-ODF in this case, but this applies to systems supporting Open XML too) involves adding functionality or adjusting superficial details (names of elements and attributes, use of property elements rather than attributes, and so on): this is, I think, the view that underlies Tim Bray’s comment (from memory) “how many ways do we need to say some text is bold or italic?” However, there are other changes that go to implementation: converting to and from SVG (as it is) presumably entails giving up exact import and export of data in the “high precision coordinate” system. The difference would be minimal, a rare pixel here or there, I’d expect.

Like the data indexes, I don’t particularly know why Open XML couldn’t support both the common notations as well as the optimized one. Best of both worlds. But EMUs are a rational solution to a particular set of design criteria, it seems to me: and the name English Metric Units that has caused alarm seems less alarming when understood as just a descriptive name and not a reference to something external.

It’s been an XSL kind of weekend for me, thus far. First, an associate from the AOL Developer Community pointed me to the “Ficlets enhanced author feed, an XSL scraper hack” post at the 0xDECAFBAD blog. Then, in the May issue of Dr. Dobb’s Journal, I saw the article “XSL Transformations: A delivery medium for executable content over the Internet”.

My interest in XSL has been on the increase, after several years of lull, driven mostly by the fact that I was too busy with work, and none of the work required XSL. Then M. David Peterson’s “Solving FizzBuzz in XSLT 1.0”, along with the talk about XSLT 2.0, reawakened my interest.

FizzBuzz in XIM?

What I really wanted to accomplish, and hence be able to write about, was that I had created an XML “program” written in the Minimal Imperative Language XIM (see the Dr. Dobb’s article) that would perform FizzBuzz.

Alas, it is not to be — not this weekend anyway. XIM looked straightforward, starting with the example program from the article:

<?xml version="1.0"?>
<program>
  <vars>
    <var_declare name="fact"> 1 </var_declare>
    <var_declare name="last"> 0 </var_declare>
    <var_declare name="nb"> 5 </var_declare>
  </vars>
  <main>
    <assign varn="last">
      <var_use name="nb"/>
    </assign>
    <while>
      <condition>
        <boolop opname="gt">
          <var_use name="last"/>
          <num> 1</num>
        </boolop>
      </condition>
      <statement_list>
        <assign varn="fact">
          <op opname="*">
            <var_use name="fact"/>
            <var_use name="last"/>
          </op>
        </assign>
        <assign varn="last">
          <op opname="-">
            <var_use name="last"/>
            <num> 1</num>
          </op>
        </assign>
      </statement_list>
    </while>
    <end/> <!-- program termination -->
  </main>
</program>

But my attempts to convert this into something that could print a variable each time the loop is executed did not succeed. And studying the 779-line XSL file that performs the processing implied that I’d have to change that too, in order to print variables. [Note: You can get the XSL and the sample XML in the May 2007 source code zip file: 0705.zip]
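
To show what “print a variable each time the loop is executed” would look like, here is a toy interpreter in Python for just the handful of XIM elements used in the sample program. It is emphatically not the article’s 779-line XSL file: the element semantics are inferred from the sample alone, only `*`, `-` and `gt` are handled, and all function names are my own.

```python
import xml.etree.ElementTree as ET

ARITHMETIC = {"*": lambda a, b: a * b, "-": lambda a, b: a - b}
COMPARISON = {"gt": lambda a, b: a > b}


def evaluate(node, env):
    """Evaluate a <num>, <var_use>, <op> or <boolop> element."""
    if node.tag == "num":
        return int(node.text.strip())
    if node.tag == "var_use":
        return env[node.get("name")]
    if node.tag not in ("op", "boolop"):
        raise ValueError("unsupported expression: %s" % node.tag)
    table = ARITHMETIC if node.tag == "op" else COMPARISON
    left, right = (evaluate(child, env) for child in node)
    return table[node.get("opname")](left, right)


def execute(statements, env, trace):
    """Run <assign> and <while>; snapshot the variables after each loop pass."""
    for statement in statements:
        if statement.tag == "assign":
            env[statement.get("varn")] = evaluate(statement[0], env)
        elif statement.tag == "while":
            condition = statement.find("condition")[0]
            body = statement.find("statement_list")
            while evaluate(condition, env):
                execute(body, env, trace)
                trace.append(dict(env))
        # <end/> and any other element is ignored in this sketch.


def run(program_xml):
    root = ET.fromstring(program_xml)
    env = {v.get("name"): int(v.text.strip()) for v in root.find("vars")}
    trace = []
    execute(root.find("main"), env, trace)
    return env, trace


# The factorial sample from above, compacted.
FACTORIAL = """<program>
  <vars>
    <var_declare name="fact"> 1 </var_declare>
    <var_declare name="last"> 0 </var_declare>
    <var_declare name="nb"> 5 </var_declare>
  </vars>
  <main>
    <assign varn="last"><var_use name="nb"/></assign>
    <while>
      <condition>
        <boolop opname="gt"><var_use name="last"/><num> 1</num></boolop>
      </condition>
      <statement_list>
        <assign varn="fact">
          <op opname="*"><var_use name="fact"/><var_use name="last"/></op>
        </assign>
        <assign varn="last">
          <op opname="-"><var_use name="last"/><num> 1</num></op>
        </assign>
      </statement_list>
    </while>
    <end/>
  </main>
</program>"""

final, trace = run(FACTORIAL)
for snapshot in trace:
    print(snapshot["last"], snapshot["fact"])
```

Each printed line shows `last` and `fact` after one pass of the loop, ending with the factorial of `nb` in `fact`.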

It’s an interesting project. But I couldn’t accomplish it yet.

On to looking at what Les Orchard has come up with in his Ficlets enhanced author feed: that XSL is a mere 93 lines long, a size that maybe I can digest before the weekend ends…

Rick Jelliffe

Regular grammars, as used by W3C XML Schemas (XSD), are very good for representing some kinds of patterns in documents. XPaths, as used by ISO Schematron, are very good for locating and testing many other kinds of patterns. One of the reasons that the XML Schemas specification is so difficult is that, after the “XSD schema for XSD schemas” has been taken into consideration, there are still many more constraints left over; and these have to be written up in natural language. The result is that developers and implementers of XSD are left without a standard executable validator.

Here is how you would do it in Schematron. In fact, you could typeset the schema into a useable specification, because it allows rich text and various kinds of linking. Download file

The schema only has a couple of constraint sets, from the several dozen required, but it shows the kinds of thing that Schematron is good for. If you are making up a schema for a public specification, for example an industry standard, and you are finding you have more than a handful of constraints that cannot be expressed in W3C XML Schemas, consider formalizing them in an ISO Schematron schema. The XPaths not only clarify the meaning of the natural language, but they also allow validation with a Schematron validator (which is usually built from a couple of XSLT scripts).

Rick Jelliffe

Schematron uptake is on the increase, and the beta implementation of ISO Schematron is chugging away. The relevant working group at ISO (ISO/IEC JTC1 SC34 WG1) has asked me to look into preparing an update for the standard; most of the other ISO DSDL family of schema languages have just been through a round of corrections based on initial experience, and I want to prepare something by the end of May.

There won’t be any changes that would break existing ISO Schematron schemas. And I don’t think there would be any extra logical apparatus or changes to the class of logic required; and certainly nothing that would prevent implementation in XSLT 1 by default.

I am interested in gathering a wish list, especially things where you have extended Schematron (i.e. your requirement was strong enough that you actually wrote some code!) Please let me know.

Michael Day

[Update: see below]. A few years ago, Eric van der Vlist put together a proof of concept XML schema language called Examplotron. The clever part of Examplotron is that the schema for a given XML document is that document itself; a document is its own schema. This allows schemas to be designed by writing down example documents (examplotron, get it?) which can then be generalised automatically to produce a RELAX NG schema for those documents and other documents like them. Clever. Now, what if XPath worked like that?

M. David Peterson

In regards to the title, as anyone who writes code for a living understands, things don’t always turn out the way you plan. ‘nuf said. ;-)

So this is really more of a status update in regards to the ongoing Atom Publishing Protocol theme that has pretty much dominated much of my focus, both in blogging as well as code, for the last couple of months. That said, there is one open source project that has really caught my eye as of late that I want to bring to your attention, and will do just that in the wrap-up at the end of this post.

As alluded to above, I’ve been heads down for the last who knows how many days/weeks/now months (I don’t keep track <- obviously! ;-) with a fairly APP-focused frame of mind. Though it wouldn’t and won’t be obvious what I mean by this until it is (<- and that’s pretty much all I can say on the matter (for the moment, anyway)), my professional development focus is becoming increasingly honed into finishing some fine-tuned details of several Vibe*-related projects, one of which we nearly launched recently, then decided to hold back to place attention on some detail work, much of which is directly related to the writing I’ve been doing on nuXleus:Xameleon (Amplee, AtomicXML, LiveClipboard, ModuleT, nuXleus, and so forth). I’m excited by all of this in ways that I cannot describe. I hope when you see the result, you too will feel the same level of excitement. More on that when I am able.

In the mean time,

M. David Peterson

Find out **.

M. David Peterson

I recently wrote an entry to my Dev.AOL blog entitled “Solving FizzBuzz in XSLT 1.0” built upon the premise that in the real world, data changes, and it’s because it changes that instead of thinking of how to solve problems using static data variables, we should instead be trying to solve them in ways that are much more adaptable, and therefore, reusable.

In other words, if your desire is to find someone who truly understands how to write code that solves real-world problems, then use real-world scenarios: Data changes > Your code shouldn’t have to change with it.

Of course one could argue “code? data? what’s the difference?”, and of course, I would only be able to agree. But none-the-less, there’s still a need to write the initial processing code that will then process and transform the data code into whatever it is it needs to be transformed into, and it’s on this premise I present the following as a picture perfect example of what truly Beautiful Code looks like.

After writing the initial “in XSLT 1.0” post linked above, I made a post to XSL-List with the following request,

Michael Day

Okay, I’ve heard jokes about people parsing XML files backwards, starting at the end of the file and reporting SAX events in reverse document order, but it seems that someone has actually gone and done it. The justification sounds almost plausible: an instant messaging client (Adium on the Mac) that writes out XML message log files and uses backwards parsing as a method for retrieving the last N messages in constant time, regardless of how many messages the file contains in total. However, it’s crazy to think of doing this for XML in general.
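
The retrieval trick is easier to see in miniature. The sketch below is not Adium’s implementation and not a backwards SAX parser: it simply reads blocks from the end of a log until it has seen enough message start tags, so the work depends on the size of the last N messages rather than on the whole file. It assumes flat, non-nested `message` elements; the element name, function name and block size are illustrative.

```python
import io
import os
import re

MESSAGE = re.compile(rb"<message\b.*?</message>", re.DOTALL)


def last_messages(stream, count, block_size=4096):
    """Return the last `count` <message> elements from a seekable binary stream."""
    if count <= 0:
        return []
    stream.seek(0, os.SEEK_END)
    position = stream.tell()
    tail = b""
    # Grow the tail backwards until it contains enough message start tags.
    while position > 0 and tail.count(b"<message") < count:
        step = min(block_size, position)
        position -= step
        stream.seek(position)
        tail = stream.read(step) + tail
    return MESSAGE.findall(tail)[-count:]


# Hypothetical five-message log held in memory.
log = b"<chat>" + b"".join(b"<message>%d</message>" % i for i in range(1, 6)) + b"</chat>"
print(last_messages(io.BytesIO(log), 2, block_size=8))
```

This only works because the log has a rigid, predictable shape, which is exactly why the approach does not carry over to XML in general.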

Michael Day

Following on from Kurt’s detailed reevaluation of XSLT 2.0, I thought that I might share an example of what you can do in XSLT 1.0 with the assistance of EXSLT, a useful set of extension functions that are supported by most XSLT implementations.

Michael Day

Hello, my name is Michael Day and I’m here to blog about XML, CSS, web standards, declarative programming, UNICODE and other topics of interest to XML.com readers. Since a lengthy biography of me is not one of these topics, I shall limit myself to one sentence: I am the founder of YesLogic and the designer of Prince, an XML + CSS formatter and a great way of getting web content onto paper.

Now that we’ve got that out of the way I would like to get straight into talking about XML parsing and UNICODE encodings. In Prince we use libxml2 for all of our XML and HTML parsing needs, and have been very happy with it. However, it’s always interesting to see new approaches for XML parsing that may offer greater speed or convenience than existing methods.

M. David Peterson

Holy Hannah, is it Friday already!? Actually, I think technically it’s Saturday where I’m located @ in the shadow of the Wasatch Mountains, but the O’Reilly servers are in the Pacific Time Zone, so when I hit post, you won’t realize that I actually missed my self imposed Friday night deadline > Of course, now you do.

I need to learn how to keep my mouth shut.

So here’s the deal: I am holding out what I had planned for today (which is a *BIG PHAT* demo of the wonderful new ccHost 4.0 release) because it exposes one of the underlying sub-projects of the Viberavetions Project, and I haven’t had a chance to discuss with the BizDev folks whether or not they would have a heart attack if I were to somewhat launch one of these aforementioned sub-projects without their blessing.

My guess is “YES!”.

Therefore we wait… ;-)

That said: Holy Hot Flash! < What's Google got cooking over there in the land of Google Code?

Peter Fisk speculates,

M. David Peterson

Where is XML Going? - O’Reilly XML Blog

“I do not think that JSON is going to “replace” XML; what I do see though is perhaps the dawning realization that the XML Infoset does not in fact have to be represented in angle-bracket notation”. I very strongly agree with that. ‘XML’ will come to mean the Infoset (or the XQuery data model, or tree views and XPath-like axes over object graphs) more than the bits on the wire format. That liberates XML tools to support JSON, various binary XML formats, HTML tag soup, etc. without insisting that everyone play by the XML syntax rules.


Michael Champion
| February 28, 2007 07:59 PM

So I’ll just come out and say it: XML doesn’t matter!

Okay, yes it does. But not in the same way people seem to think it does. In this regard, I agree 100% with what both Kurt and Mike (Mike’s comment stems from Kurt’s recent above-linked post) have to say on the matter.

In a follow-up comment a while back, I posed the following question,

M. David Peterson

So much to talk about, so little time, but none-the-less, let’s get this party started ;)

Amplee, IronPython, ASP.NET, WSGI, AtomicXML, and Xameleon Update

[Amplee@SWiK.net]

So both Sylvain and I have been jamming away at the integration of Amplee, IronPython, ASP.NET, WSGI, AtomicXML, and Xameleon. Attempting to merge together such a cross-section of various technologies, as you can imagine, has been interesting. None-the-less, we have things working pretty well at this stage, and have in the works an update to last week’s OSS XML Weekly Roundup, in which I will be providing all of the juicy new details in regards to progress. That said, if you would like to start peeking through the curtains to see what we have in store, please feel free: http://extf.googlecode.com/svn/branches/

Sylvain has already finished the first tutorial as it relates to getting Amplee running via IronPython and WSGI, and when he comes back online here in a few hours, we plan to continue forward fine tuning the API we are collaboratively working on to integrate AtomicXML with Amplee via the Xameleon XML processing engine. And as he has pointed out at the bottom of the above linked post,

M. David Peterson

Update: So much thanks goes out to james (and of course, Ben for getting the party started with his first submission for a Unix 1 line command equivalent to my 60 lines of XSLT.), who has not only helped optimize the optimization process, but has also solved the bug that was excluding some of the necessary dependencies for everything to work properly.

I’ve checked the results into the repository, which can be viewed @ http://nuxleus.com/dev/browser/build (see Changeset 3899 for the specific diff details)

Thanks for all of your help, james!

M. David Peterson

So last week’s “Open Source XML” post was a complete flop. Well, that’s not true, as it forced me to think about and work on another project whose code is now in good enough shape to be considered usable. As the next few days continue forward and I am able to finish up a few higher priority items, I plan to spend some more time on this project to connect it back into the AtomicXML foundation (which is where I pulled the initial code base to then extend from) that I have been sporadically working on for the last three years, and then begin integration of the AspectXML code base, pulling things all together into the Xameleon processing engine.

I’ve moved a good portion of the code base from ExtensibleForge.net into a Google Code-based project @ http://code.google.com/p/extf/. This now includes the Extf.Net.Base library, which is an early implementation of the Atom Publishing Protocol. Sylvain and I, using Amplee as our API guide, are working on bringing this same library in sync with the latest (final? seems possible) v0.13 draft of the APP spec. However, with higher priorities at the moment, it will be next week before much progress is made on this particular section of the repository.

In a recent blog entry, “The Hardware of Tomorrow Versus the Platform of Tomorrow”, Joe Walker raised some important Ajax issues. He talks about the increasing multiprocessing capabilities of today’s hardware, and web browsers’ inability to take full advantage of it.

He mixes two separate issues, talking about the lack of threading/concurrency support in the Javascript language, and also the lack of multi-threading within web browsers.
It’s this second issue that I want to explore here, particularly this comment:

“The problem is that web-browsers are a step backwards as far as multi-threading goes. In Javascript there is no such thing as a new thread, and worse than that, the entire platform (i.e. a browser) runs a single JavaScript thread. If a script in one window goes into a tight loop, or runs some synchronous Ajax then the browser HTML display freezes.”

Hmmm. That last sentence started gnawing at me. I’ll leave the “tight loop” problem for another day, but is it true that a synchronous XMLHttpRequest call will “freeze” the browser?

M. David Peterson

Update: Ric (who links to JSON.com, which seems to be a new blog about JSON-related items… SUBSCRIBED!) provides a nice summary in a comment below,

XML is
1) DOCUMENT centric
2) Well known with lots of tools
3) FAT (SOAP is)

JSON is
1) SERIALIZATION of a structure
2) Less known and not so many tools
3) Thin (No client side libraries needed)

JSON is also a form of remote scripting (no XmlHttp request)

Xml is much more mature: quite a LOT of thought on namespaces, unicode, schemas, external file inclusions, and binary attachments (I am working on Base64 encoding in JSON, so that should be not an issue.)

Sounds about right to me. Thanks, Ric!

[Original Post]
… and then it hit me…

ongoing · JSON and XML

The Arguments Are Over · There used to be an argument about whether platform-neutral, language-neutral data formats were important, or whether distributed objects were the right answer. That’s over: HTML, XML, JSON.

There used to be people who argued that network interchange formats shouldn’t be text-based, but use binary where possible, for efficiency. That’s over: HTML, XML, JSON.

When SOAP is overkill, is JSON the lightweight answer? Seems to me it’s at very least a possibility, and if I were an Anti-SOAP kind of guy (which I’m not, by-the-way… SOAP has its place. If you need it, use it, if you don’t, don’t.) the “war” I would be waging would not be JSON vs. XML, and instead JSON vs. SOAP.

Think of it this way,

JSON: JavaScript Object Notation

SOAP: Simple Object Access Protocol

Now before I get myself in any trouble, I’m not advocating that a war be waged against SOAP (please see note from above regarding my feelings on this topic, as well as this three part article on my personal blog for more info.) What I am suggesting, however, is that if you are going to wage a war of any type, shouldn’t you at very least be comparing apples to apples?

Just food for thought…

M. David Peterson

Update: An interesting article linked to from “cognitively cognate” sets up stage with the following scenario,

Aniruddh Patel of the Neurosciences Institute in San Diego, California, US, and colleagues wanted to know how people from different cultures group non-identical sounds. They recruited a group of 100 volunteers, half of whom were American and the other half Japanese. The volunteers listened to sequences of alternating long and short or loud and soft tones (audio clips in wav format).

The result? Please click-through to find out, though I will point out the fact that with all of what follows, opinions and results may vary. What I might perceive as one thing, someone else might perceive the “same thing” as something completely different.

Of course, I didn’t even touch on percent encoded UR{I|L}’s, but then again there are no doubt those people who simply do not care or do not agree or do not [something all together different], so for what it’s worth, there ya have it :)

Thanks for the comment/link, cognitively! :)

[Original Post]
NOTE: No red, blue, or any other pill colors will be mentioned from this point forward. In other words, this isn’t that kind of Matrix post. ;)

Last April, Mark Nottingham made a post to the web-http discussion list regarding Matrix URI’s…

It would be great if WADL and other Web description formats could
accommodate Matrix URIs;

http://www.w3.org/DesignIssues/MatrixURIs.html

so that the parameters of the Matrix URI would be handled in a
fashion similar to the way query parameters are handled.

We’re starting to use them pretty extensively.

Until that point I had forgotten about Matrix URI’s, but as soon as I was reminded, I realized…

Why have we not been using Matrix URI’s all along???!!!

Matrix URIs - Ideas about Web Architecture
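
For anyone who has not run into them, a Matrix URI hangs `;name=value` parameters off individual path segments instead of collecting everything in the query string. Here is a small sketch of pulling those parameters apart in Python; the URL is a made-up example and the function name is mine, so read it as an illustration of the idea rather than a reference parser.

```python
from urllib.parse import unquote, urlsplit


def parse_matrix_path(url):
    """Split a URL path into (segment, {matrix parameters}) pairs."""
    segments = []
    for raw in urlsplit(url).path.split("/"):
        if not raw:
            continue
        name, *params = raw.split(";")
        parsed = {}
        for param in params:
            # A parameter without '=' is kept with an empty value.
            key, _, value = param.partition("=")
            parsed[unquote(key)] = unquote(value)
        segments.append((unquote(name), parsed))
    return segments


# Hypothetical map service URL with matrix parameters on the last segment.
print(parse_matrix_path("http://example.org/map/color;lat=50;long=20;scale=32000"))
```

Because the parameters belong to a path segment, they can appear at any level of the hierarchy, which is the part a query string cannot express.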

Dan Zambonini

There are numerous misconceptions about the Semantic Web, largely caused by a misunderstanding of its aims and technologies. I’ve created this simple FAQ to help dispel some of the myths.

Rick Jelliffe

I’ve updated this graphic to include XSD 1.1, ISO DSDL part 9, TEI Feature Sets, CSD and WSDL which has a kind of content model capability.

[Image: schema-family-tree-thumb2.jpg]

Contact me for the SVGs or PDFs for printing.

Rick Jelliffe

The new draft of XProc is out and has fewer spangles. Here’s a post I sent to their suggestion box.

M. David Peterson

Designer Widget Properties « Microsoft .Net and Smalltalk

The Designer widget is now updated with z-ordering working and a new panel for changing several common properties.


Designer running in IE7 with property panel opened

The next updates will include exporting and importing of Xaml to permit Vst sessions to share interfaces through Jabber instant messaging.

SWEET!

NOTE: Okay, so technically speaking, anything that can be wrapped inside of an XML envelope can be passed as a message via Jabber/XMPP, so I guess this kind of *IS* your father’s Jabber/XMPP. But did your father ever do this?

Combine XAML, WCF and Jabber/XMPP and what you have is a *WHOLE NEW WORLD* of XML goodness streaming your direction.

Of course, XML as an *OVER THE WIRE CROSS-PLATFORM DATA FORMAT* is nothing new. In fact, it’s what XML is all about. Add VST and WPF/e to the mix and –

Well, I’ll let you decide for yourself what you believe the next generation of Web-based applications will be built with, but if you want my opinion: Please see the above mentioned acronyms.

Thanks for the sneak-peek into the future here and now, Peter!

Kurt Cagle

I’ve been giving a lot of thought lately to JSON and JavaScript in particular. For those in the XML community not familiar with it, JSON was largely the invention of Douglas Crockford, a JavaScript expert who currently works as an architect for Yahoo! and who I had the privilege of meeting at the recent AJAXWorld conference.

The idea behind JSON is surprisingly simple, and like many simple ideas, is also remarkably powerful. Dissatisfied with the complexity involved with using XML as a data format for seemingly lightweight tasks, Crockford asked one of those questions that causes all kinds of interesting repercussions: “Why couldn’t we use the ‘native’ data format of JavaScript, the associative array, as a vehicle for transferring content between client and server, instead of XML?”
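To make the idea concrete, here is a small sketch in Python rather than JavaScript, using the standard `json` module. The record and its field names are invented for illustration; the point is only that an ordinary associative array goes onto the wire as text and comes back unchanged.

```python
import json

# A made-up record; in JavaScript this would be an object literal.
order = {"id": 42, "customer": "Ada", "items": ["pen", "ink"], "paid": True}

# Serialize: this string is what would travel between client and server.
wire_text = json.dumps(order)

# Deserialize on the other side: the same structure comes back.
round_tripped = json.loads(wire_text)
```

No schema, namespace, or element-versus-attribute decision is involved, which is most of the appeal for lightweight tasks.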

M. David Peterson

Sylvain just pinged me to say he had left a comment on my recent “IRC’aholic” post. I refreshed the page expecting to see his comment, but no luck.

I checked my email to see if the system had marked it as “must be approved”.

Nothing.

An investigation into the comment section for that entry via the MT admin interface showed no sign of the comment. It wasn’t until I accessed the master comment section and clicked the “Junk” tab that I was able to locate what he was referring to. In reading his post, there is nothing strange about it that would lead me to believe the system would justifiably assume this to be junk, so I’m a bit bewildered by what might be going on.

I have noticed what I would term a *SIGNIFICANT* drop off of comments over the last week. Like basically *ZERO* as opposed to what is usually a fair level of activity. With Sylvain’s comment being labeled as “Junk” by the system, I was left wondering how many other valid comments had been labeled junk, and began to click through the listing of junk comments to see what I could find.

The problem? 10 pages worth of comments and I’m only about 6 or 7 hours back into the junk comment history, a history which contains about 24,000+ comments. Obviously, this could take a while.

Before I spend too much time rummaging through the “Junk” pile looking for lost treasure, has anybody else been noticing the same problem? Have you left a comment that provided an “Awaiting Moderator Approval” screen, never to see your comment approved? If yes, I would suggest leaving a comment to let me know such that the folks @ ORA can gain a better feel for what might be going on, but I’m not sure the message will get through.

As such, if you have noticed this problem, please let me know in private email such that I can provide this information directly to the folks @ ORA such that they can avoid having to dig through 24,000+ comments themselves looking for clues.

Thanks in advance for your help!

NOTE: If you could provide a searchable field such as the URL or email address you use this should make the task of locating valid comments quite a bit easier. Thanks!

Quick Update: I should note that this shouldn’t be seen as a problem from the ORA side. Sarah Kim came to the rescue a few weeks back, implementing the Akismet comment spam filter plugin, which it seems probably just needs a bit of tweaking to the settings to find the right balance. This kind of thing is normal, expected behavior, *ESPECIALLY* when you are dealing with a variable that can be seen as “typeless” from the standpoint that the comment spammers are always trying new ways to get around the system, so attempting to determine what is legitimate and what is junk is a constant battle between good and evil.

Duck-typing, generics, and to a lesser degree, reflection and constructor overloads are fine when you have a finite set of data types you are dealing with. But when these same data types are changing on an hourly basis, attempting to make a determination in regards to what is what is a task that makes me ill just thinking about it. With this in mind, any help you might be willing/able to provide will be of great value, without a doubt.

M. David Peterson

Four little letters…

LLUP

Life after email

As a social phenomenon, the end of email has been widely reported. The next generation doesn’t use it. As a technical phenomenon, spam is a persistent threat. Spam’s been a lot worse in the last couple of weeks (no doubt the reason I started thinking about these things); apparently the spammers have concocted a strategy that circumvents Bayesian filtering (it’s only temporary, I’m sure, but the next victory in spam filtering is only temporary too).

I’ve noticed the same phenomenon. It’s getting really, *REALLY*, bad!

What’s next? IM, Wikis, web forums instead of email? Bleh!

Agreed!

Maybe I’m just too old to learn new tricks, but I want correspondence pushed to me (or I want the appearance of push, anyway) and I want to read and edit it locally, in the application of my choosing, not in some browser form

Agreed. Too much effort. The solution must be seamless, and work with the tools we already use for email-esque communication. In fact, the solution has to be developed in such a way that those with an established position in the email client/server market(s) can quickly, easily, and as mentioned (and is really the key, in my own opinion) seamlessly integrate with these tools. That way the “switch” from the existing technologies (e.g. SMTP, POP, IMAP, proprietary protocols such as those used in Exchange for advanced workgroup/corporate communication/collaboration, etc.) may not even require a switch at all (i.e. a driver that allows each of these technologies to easily interop with any of the new required protocols), and if it does, will be as transparent as possible to the customer, employee, or whoever else will be using it.

It occurs to me that with a little work, Atom might function as a replacement for POP/IMAP and the Atom publishing protocol might replace SMTP. I can see a glimmer of how I might move forward while mostly preserving a couple of decades of work habits. As usual, the social problems are larger than the technical ones

Yep, completely agree! Through the work I have been doing with LLUP, I have come to my own conclusions that there are a few additional off-the-shelf pieces necessary to complete the puzzle, but without a doubt, Atom and APP are the key behind all of this.

In fact, this was a point I brought out to Eve (Maler) a while back when Russ and I first spoke with her about LLUP. There have been a few people along the way who have insisted that “you guys are taking too long to finish this up” or “if this really was so simple, why not just finish it out and be done with it” to which the answer, as mentioned to Eve, is pretty straightforward,

AddThis Social Bookmark Button

Years ago, I worked in a small group that occasionally needed technical information from a second group. I was young and naive, and believed that if a person said “yes, the proper setting for the frobnitz is 34.128” then setting the frobnitz to 34.128 was the Right Thing To Do.

I quickly found out that some members of that group were more than happy to supply you with an answer to any question you asked, regardless of whether they knew the answer or not. And those guys would speak as authoritatively when spinning a tale from whole cloth as they would when providing information that actually intersected with reality.

The words “I don’t know” were not in their vocabulary.

Beyond a mistrust for anything these guys said, I also learned the importance of those three words. There’s no crime in not knowing the answer to every question. Since those days, I’ve tried to be clear with people when I didn’t know something. I’ve had many opportunities for clarity.

M. David Peterson

I was in Seattle all week and as such, am just getting caught up on things outside the scope of my trip.

I did have a chance to chat with a somewhat excited Sylvain on Tuesday morning who had been chatting with Michael Sparks (BBC Kamaelia project creator) on IRC about various aspects of the Kamaelia project and how this relates to various projects we are both working on, that of Viberavetions and LLUP/Blip Messaging. He specifically pointed at a recent post from Michael that brought together a lot of problems we have been specifically trying to solve with LLUP. When we spoke, I was just getting ready to head out to meet up with Dimitre Novatchev for lunch, but by chance, Russ pinged me around the same time, and as such, picked up the conversation with Sylvain where I left off.

Russ recently provided the following in an entry entitled,

M. David Peterson

ongoing · Practical Transparency

Need For Speed · Even a nice clean well-known feed doesn’t quite solve the whole problem. Your typical feed-reader is set up to poll every half-hour or even less often, and there are those in the financial community who are not going to be happy with a potential half-hour’s latency in getting the news.

I can think of one simple brute-force way to approach the problem, and another that’s a little more sophisticated. The simple solution is, assume that everyone who really cares will want to poll that material-news feed every few seconds. So, you stage Jonathan’s feed, not on the ordinary blogging infrastructure, but on a hyper-fast cache that can take that kind of transaction load; there might even be a business opportunity here for some infrastructure player to offer this kind of special-purpose staging.

If you want to get fancy, you could use something like the proposed new Atom-over-XMPP trick. The idea is that people who want ultra-low-latency feeds don’t poll, but set up a persistent connection to the provider’s server, which pushes entries down the wire the moment they become available. This is elegant in theory; in practice I’d bet on the brute-force polling approach, at least off the top.

And if you want it to really work well, take Atom, APP, XMPP, RFC 4661, OpenSearch, and LLUP (Blip Messaging), and that’s it: problem solved.
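For comparison, the brute-force polling idea from the quoted passage boils down to one step repeated every few seconds: fetch the feed and keep only the entries newer than the last one seen. Here is a rough Python sketch of that step. The sample feed, the function name, and the timestamps are all made up, the network fetch is left out, and the timestamps use explicit `+00:00` offsets so that `datetime.fromisoformat` can read them.

```python
import xml.etree.ElementTree as ET
from datetime import datetime

ATOM = "{http://www.w3.org/2005/Atom}"

# Invented feed content standing in for whatever the poll just fetched.
SAMPLE_FEED = """<feed xmlns="http://www.w3.org/2005/Atom">
  <title>material news</title>
  <entry><title>Old item</title><updated>2006-08-01T09:00:00+00:00</updated></entry>
  <entry><title>New item</title><updated>2006-08-01T09:00:05+00:00</updated></entry>
</feed>"""


def new_entries(feed_xml, last_seen):
    """Return (updated, title) pairs for entries newer than last_seen, oldest first."""
    root = ET.fromstring(feed_xml)
    fresh = []
    for entry in root.iter(ATOM + "entry"):
        updated = datetime.fromisoformat(entry.findtext(ATOM + "updated"))
        if updated > last_seen:
            fresh.append((updated, entry.findtext(ATOM + "title")))
    return sorted(fresh)


# One polling step: everything after the last timestamp we already handled.
last_seen = datetime.fromisoformat("2006-08-01T09:00:00+00:00")
fresh = new_entries(SAMPLE_FEED, last_seen)
```

With push over a persistent connection, the provider does this filtering once and sends each entry as it appears, instead of every subscriber repeating it on every poll.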

From my recent post to the LLUP mailing list,

M. David Peterson

Code Search - O’Reilly Labs

Enter search terms to find relevant sample code from nearly 700 O’Reilly books.
The database currently contains over 123,000 individual examples, composed of 2.6 million lines of code — all edited and ready to use.

SWEET! To whomever built this @O’Reilly: Thanks! :D

Quick Update: As would be expected, there’s a web feed to keep you up-to-date in regards to updates @O’Reilly Labs.

M. David Peterson

DISCLAIMER+SPECIAL NOTE: If your initial reaction to the titled attempt at a Scheme expression was: “That doesn’t even make any sense!” my response to you would be,

  • Firstly, how many people on this planet actually know that, technically speaking, it’s complete nonsense?
  • Secondly, you try writing a title for a piece using a technically correct Scheme expression that includes a function, list, a lambda expression, and its evaluation expression, and then come back and tell me that it technically doesn’t make any sense! ;)
  • Oh, and before you snap open DrScheme, there fly-boy, remember that your technically correct expression must all fit on one line! ;)

[Post]

Dare Obasanjo recently had this to say about multithreaded programming,

Dare Obasanjo aka Carnage4Life - (thread.Priority = ThreadPriority.BelowNormal) is Bad?

This is yet another example of why multithreaded programming is the spawn of Satan.

Eric Fleischman then followed up with,

FWIW, there is something worse than multithreaded programming….distributed programming. Concurrency on a node is hard, concurrency across N nodes is even harder. :)

To be fair, I don’t completely disagree with either of their points. That said, I also don’t completely agree, either.

Here’s why,

M. David Peterson

I had noticed Dan Sickles’ reference/link to LINQ in his follow-up post earlier today (e4(x)linq), but assumed it linked back to the LINQ entry page on MSDN, and didn’t click it. However, I needed to reference the E4X specification for some related work, and given I still had his post open in a tab it was easier to just switch to the tab than it was to search for “E4X” (didn’t have it bookmarked; now I do :).

However, in hovering over the general area (the links are separated by “(x)”) I noticed that “linq” was linked to a URI other than its above linked MSDN home. Out of curiosity I clicked through and found an interesting post from Steve Eichert that apparently solved *ALL* of Scoble’s problems. Curious, I quickly parsed through looking for a Green M&M sorting algorithm of some sort, but unless I’m missing something, there doesn’t seem to be one.

Not to fret, however, as what seems to be in place of the missing algorithm is something even better: Something that the remaining 6,642,658,382 of us (based on the result of the formula ((World Population on August 23rd, 2006 @ 11:00PM EDT) - 100)) can use to our advantage… A URI Filter!

Okay, so maybe not all 6,642,658,382 of us… But close!

Setting this potential time wasting point-of-argument aside,

M. David Peterson

Update to “One additional note:” below: Uh, found the link, and it seems the way I remember things being handled is different than it is actually handled. In short, they handle things the same way.

So I guess my whole rant is effectively pointless, and meaningless.

On a related note: that really bites! ;)

EXTENDED POINT: You would have thought that making a fool out of myself on the Atom syntax mailing list when MS first announced they were going to handle things this way would have been enough to keep me from making a fool out of myself in the follow-up below.

My only response to this is,

You don’t know me very well, do you? ;)

Okay, now that we have that clear, here’s why what I stated below is a bunch of bologna,

99% of the world’s population is (somewhat) normal.

The other 1% of us are not. Of course, “the other 1%” refers to us geeks.

To us geeks, we get web feeds. We’ve adopted them as part of our daily lifestyle.

The rest of the normal people have not.

As such, MS, Apple, and any other company in the business of presenting content served up by web feeds have to be as flexible as they can be, providing a consistent user experience from one web feed to the next.

With this in mind, the reason why,

* <feed is enough information to pass the XML file to the web feed rendering engine
* <?xml-stylesheet ... must be ignored

is the fact that the user experience MUST be consistent regardless of the edge cases where someone (like me) has chosen to preface the top-most parent of an XML document with something like feed-transform-init, or someone (again, like me) would REALLY like to invoke a browser-based transformation of a web feed using the <?xml-stylesheet ... processing instruction.
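The two rules above are easy to picture in code. The following Python sketch decides whether a document should go to a feed view purely from its root element, and never consults an `<?xml-stylesheet?>` processing instruction. It only illustrates the rule as stated here; it is not how IE7 or any other browser actually implements feed detection, and the function name and sample documents are made up.

```python
import xml.etree.ElementTree as ET

ATOM_FEED = "{http://www.w3.org/2005/Atom}feed"


def looks_like_feed(xml_text):
    """True if the document's root element is an Atom <feed>.

    ElementTree's default parser drops processing instructions, so an
    <?xml-stylesheet ...?> before the root has no effect on the answer.
    """
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError:
        return False  # not well-formed XML, so not treated as a feed
    return root.tag == ATOM_FEED


# Hypothetical documents: a styled feed, and an ordinary non-feed document.
styled_feed = (
    '<?xml-stylesheet type="text/xsl" href="feed.xsl"?>'
    '<feed xmlns="http://www.w3.org/2005/Atom"><title>t</title></feed>'
)
not_a_feed = '<html><body>hello</body></html>'

results = (looks_like_feed(styled_feed), looks_like_feed(not_a_feed))
```

Because the stylesheet instruction never enters the decision, every feed gets the same treatment, which is exactly the consistency argument being made.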

The truth of the matter is that folks (you guessed it > like me ;-)) can find other ways to hack around things of this nature (e.g. using a bootstrap XML file that imports an external web feed via the document function (yo Opera, < See why the document function is so important? :D) for rendering locally), whereas normal people who visit a web site, see the little orange icon “light up”, and click it, expect to see what’s contained in the web feed rendered in a consistent manner.

If they don’t,

M. David Peterson

Just found reference to this article in an email from Peter Hale,

User Driven Modelling: Translation and Aspect-Oriented Programming

XML (eXtensible Markup Language) can also be used in the translation as either a programming language AspectXML, or a language for representing results. The results could be visualised using stylesheets and interactive software, and where useful translated further into other kinds of representations other than trees e.g. SVG (Scalable Vector Graphics) diagrams and graphs.

SWEET! I need to study Peter’s graphs to understand things a bit better, but I’m loving the fact that folks are finding new and innovative ways to use AspectXML! :D

Thanks Peter!

Kurt Cagle

This particular series has been ongoing since late last year - not quite a book, though a very healthy chapter towards one, fortunately. When I started it, I was hoping to learn a little bit myself about XForms, as teaching a new technology is, at least for me, one of the best ways I can think of to learn one. However, along the way, I have learned quite a bit, both good and bad, about this technology, and have to admit that I see far more potential in it now than I did when first I addressed the issue in Revisiting XForms.

XForms is not perfect - there were times when I was working on things that I found XForms to be very limiting indeed, sometimes over the most trivial issues. Working on an in-progress implementation certainly didn’t help with this, of course, though I will readily admit that, even unfinished, the Mozilla Firefox XForms implementation is very, very effective, something I’ll say more about at the end of this article.

M. David Peterson

A good year and a half back, I came to the realization that while Bruce (D’Arcus) claims to be a scholar, he’s truly a hacker at heart.

Proof:

darcusblog » Blog Archive » RELAX NG, XSD, Schematron

However, you end up with a much looser schema, so now what? It’s hardly much use to be creating instances against such a loose schema, where they may be invalid against the normative spec and schema.

Answer: create some separate Schematron rules to model the constraints that XSD cannot. If you want to write it within your RNG customization schema (which can then be extracted using Trang XSLT), then just do stuff like:

    s:rule [
      context = "/cs:style[@class='author-date']"
      s:assert [
        test = "cs:bibliography/cs:sort/@algorithm='author-date'"
        "Must use author-date sorting for the author-date class."
      ]
      s:assert [
        test = "name(cs:citation/cs:layout/cs:item/*[1]) = 'author'"
        "The citation item layout must include an author element first."
      ]
    ]

Finally, write a little shell script to run both validations.

Nice!

Please visit the above linked post for more info.

Thanks Bruce! :)

M. David Peterson

Via a recent post to the Live-Clip mailing list from Matt Augustine, we discover,

Charles Torre of Channel9 interviewed Paresh Suthar and me about Live
Clipboard and Simple Sharing Extensions (SSE). The video and discussion
are here: http://channel9.msdn.com/showpost.aspx?postid=222215.

You’ll need 30 minutes to get from start to end, but speaking from experience (just finished watching it) it’s 30 minutes well spent/invested.

Thanks Matt!

Kurt Cagle

One of the things that I so enjoy about watching the Mozilla Firefox development process is that they are not shy in pushing forward with technologies that many would have thought solid and immutable. JavaScript is a case in point. JavaScript had largely stagnated as a language from about 1998 on, until AJAX (spurred largely by renewed interest in the Mozilla platform) suddenly came out of nowhere. Accessors (getters and setters), more sophisticated array handling, E4X and other innovations have come out of the development process.

With the release of Firefox 2.0 beta 1, there are several new features that have been dropped into the language, features which are likely to be used sparingly at first but which offer a significant set of capabilities that will likely become welcome tools in any AJAX developer’s toolbox.

David A. Chappell

There’s quite a lively discussion heating up over on InfoQ on the subject of ESBs. Consensus on the definition of an ESB, and ESB use cases, are two of the forum topics that are going on over there at the moment. InfoQ is a web site that has recently been launched by some ex-ServerSide.com guys. Go have a look.
Dave

David A. Chappell

I was recently interviewed by Vance McCarthy of Integration Developer News, where he asked me a number of questions about what Progress and Sonic have been doing with new advancements in ESB technology -
Here’s how Vance describes the Interview -

“This lively discussion provides IT execs and architects business and technical insights for how ESBs are being used in an Architectural and Lifecycle point-of-view.”

It goes for 17 minutes. I must have been feeling chatty that day. There’s a lot to talk about, primarily how an ESB can provide an implementation of advanced web services specs such as WS-ReliableMessaging, WS-Addressing, WS-Policy, and WS-Security. Also how an Eclipse-based IDE can become an Integrated Services Environment (ISE) for modeling processes, configuring services, and distributed debugging -
http://www.idevnews.com/Program_Code.asp?ProgramID=7
Dave

M. David Peterson

Update: Sylvain has just posted a FANTASTIC overview of LLUP, looking at things from the view point of the importance of its transport protocol independence, and some of the problems we have specifically set-out to help solve in regards to reducing the use of network resources, allowing for those same resources to process more information, in less time. I would encourage you to stop by for a visit just as soon as you have a chance.

Thanks Sylvain!


[Original Post]
Just got a ping from Russ regarding an article he just finished up that implements an LLUP subscription web service that communicates between Ruby on Rails and the .NET platform via a SOAP-based Blip Messaging service,

SOA Ranch - Articles - SOA and Rails, Part 1

This article, the first in a series of three from Russ Miles, will cover how to get started with a simple web service in Rails. We’re going to create a web service and then test for interoperability with a simple C# application.

Now some most every last one of ya might be asking,

“Huh?”

to which I would respond,

Excellent question… Thanks for asking. :)

[From the above linked post from Russ]

Hari K. Gottipati

Simile (Semantic Interoperability of Metadata and Information in unLike Environments), a joint project conducted by the W3C and MIT Libraries, released a visualization tool for time-based information. From the project description, it’s a DHTML-based AJAXy widget for visualizing time-based events. Below is a screenshot of the JFK Assassination timeline, a minute-by-minute account of the day John F. Kennedy was shot in Dallas, November 22nd, 1963.

Kurt Cagle

I’ve been fighting a severe day of laziness today (or perhaps exhaustion - it’s been a fairly trying week all told), and after having started and stopped any number of projects today, I’ve finally decided to get back to the XForms series and concentrate on the next topic in my list - customization. However, before I begin, I wanted to make an announcement or two.

The first is one of those glass half empty type of things. I’ve been working for a while on a Firefox book, but between fairly trying times with a previous employer and a seemingly endless shifting focus on the part of Firefox, I’d really reached the point where progress was crawling there. I think, however, that there’s been another reason for my lack of progress as well - books are a great deal like bananas. If you hit the market too soon, they are green and hard and generally not terribly appetizing. Hit the sweet spot, and bananas are both very good (especially in a bowl of cereal) and yet have a reasonably firm consistency. Wait too long, or if the conditions are not quite right, and you end up with soft, brown, fairly disgusting messes.

M. David Peterson

I sometimes find myself in complete and total awe of how complicated people can make something that simply doesn’t need to be complicated by ANY stretch of the imagination. For example,

In regards to the understanding of “Man” and “Machine”, I’ve come to five ABSOLUTE “without a single, solitary, doubt in my mind” conclusions thus far in life…

1 - As the number of levers, control knobs, and bypass valves on or around a device increases, there is a proportionate increase in the level of “understanding” of how it works, and in the number of individuals who will claim they understand how it works.

2 - As the number of levers, control knobs, and bypass valves on or around a device decreases, there is a proportionate decrease in the level of “understanding” of how it works, and in the number of individuals who will claim they understand how it works.

3 - The fewer the levers, control knobs, and bypass valves; the easier something is to understand.

4 - The fewer the levers, control knobs, and bypass valves, generally speaking, the better something works, and even more so, the more reliable something tends to be.

5 - If you want to understand how a particular device works, ask the guy who invented the version with the fewest number of levers, control knobs, and bypass valves.

So, for example,

Jim Alateras

Here is a brief interview with Werner Vogels on Amazon and SOA which has some important takeaways, particularly around designing for simplicity, maintaining customer focus, being business driven, and technology agnosticism. Amazon has evolved from a single-server, single-database environment to a multi-server database platform serving millions of customers. Along the way they discovered that a service-oriented architecture provided the flexibility and agility to support their business goals. Amazon didn’t attempt to retrofit a set of technologies or an architecture approach into their business; they actually found that such an approach helped them to grow their business. To facilitate the magnitudes of scale and reliability they moved from a centralized mainframe platform to one built on distributed components. Nothing new, but always nice to hear about the practicality of SOA solving real-world problems.

Hari K. Gottipati

Google Inc. revealed the launch of Google Checkout, a checkout process that makes online shopping faster, more convenient and more secure for Google users. It offers an easy and trusted checkout option that enables shoppers to purchase from participating stores with a single Google login. Bypassing their traditional beta releases (years in the beta stage), this time Google came up with a fully functional and tested version, because consumers would be unwilling to trust their bank accounts and credit cards to a beta version. It will serve as a centralized authorization service for customer purchases, promising transaction security with industry-standard SSL technology.

Kurt Cagle

I know, I know … when you talk about a series, it usually implies that the articles will be somewhat closer together than the ones for my Understanding XForms series have been. Mea culpa. The last couple of months have been busy, but a desire to wrap up my Firefox book (another mea culpa, if it comes to that) has combined with being fairly deeply immersed in the vagaries of XForms on Firefox to prompt me to write the third installment in this series. Take a look at the links at the end of the article for the previous installments. Please note that all of these assume that the examples given here, if run in Mozilla Firefox, need the XForms extension available at http://www.mozilla.org/projects/xforms/download.html. However, the examples should (in general) run in most contemporary XForms-enabled browsers (a topic I’ll be discussing a couple of columns from now).

M. David Peterson

Update: Credit where credit is due. From Mike Champion’s comments below, we discover,

Uhh, this really wasn’t my doing. Nithya is the XSLT Program Manager and has worked hard to make the case for XSLT2, Soumitra Sengupta was the Product Unit Manager who made the hard call to pull the plug on XQuery in .NET 2.0, and Anders is the one who laid down the party line that XLinq will not even try to compete with XSLT for loosely-structured doc scenarios that XSLT handles well.

Thanks for the info, Mike! And thanks to each one of these folks who have helped bring ALL of this into reality. It is MUCH appreciated :)

[Original Post]