Reviews Archives

Rick Jelliffe

Kenneth Chiu from SUNY Binghamton this week sent me a couple of exciting papers on recent techniques for XML parsing. He and his colleagues have been looking at parallelizing the parsing of (largish) XML documents to suit multicore processors. One of the papers, Parallel XML Parsing Using Meta-DFAs, is listed as available on the ACM website, and the other, Simultaneous Transducers for Data-Parallel XML Parsing, seems (I didn’t check) to be available as part of the large Proceedings of the 22nd IEEE International Parallel and Distributed Processing Symposium 2008.

I am always surprised, as I was with the recent work from Simon Fraser University on pipelined parsing techniques, that we are in 2008 and there are still new techniques cropping up.

One thing that struck me about the Chiu papers concerned an application of his parallel ideas outside of the multicore world to the world of interactive text-based XML editors: for example, Topologi’s Markup Editor or other “coloring editors.” These are hardly the glamour end of town, but interesting none the less.

Chiu et al. have various ideas based on making new state-machine-like structures that allow all possible parse states of a document to be represented simultaneously. These are presented through two different ideas, one in each paper: the meta-DFA and the simultaneous finite transducer (SFT). I defer to those papers for details.

For the application I am thinking of, consider a “<” in the middle of some XML document. In a conventional XML processor, you can just parse using a simple state machine to find whether it is acting as a delimiter (or is it inside a CDATA section, etc.?). In an interactive editor, the state machine has to be much more complicated, because it also has to cope with what happens when there are errors: how do you recover and resynch? In Topologi's case, we register particular status-bar messages with every error transition, so that when the cursor is at an error location, a message is provided. (The next layer of editing above this is to provide also some kind of user interface for fixing the error.)

Now, for large documents edited interactively, memory utilization and snappy response mean you really don't want to build a parse tree. So various techniques exist. James Clark's RELAX NG mode for emacs uses a check-pointing system, where at regular intervals (e.g. every 1000 lines or 1k characters or whatever) the state machine state at that point is available. Jumping to a point only requires the editor to find the last checkpoint and provide a detailed parse only from there to the checkpoint after the current entry.

Making an edit invalidates following checkpoints, but because a state machine is being used, you only need to reparse text until you find the next checkpoint that agrees with your parse: at that point you are in synch. And, in fact, you only re-parse forward when you actually need it, since you don't want trivial edits to force parsing in sections that are way beyond the current display area.

Topologi's Markup Editor takes a slightly different approach, from JEdit and Java's document APIs, where each line has a checkpoint memoizing the current parse state. This reduces the amount of reparsing during editing to a minimum (in fact, we found we had to add delays to the display so that users would be aware that changes had taken place!) But there are the same optimizations: when a change is made, the rest of the lines on the screen are reparsed stopping if there is resynchronization, and an index is kept to the line off the screen to notify where there is some invalidity.
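To make the per-line checkpointing concrete, here is a minimal Python sketch. It is not Topologi's or JEdit's code: the three-mode scanner, the class name LineCheckpoints and the method edit are all assumptions made for illustration, and the scanner ignores CDATA sections, attribute values and error recovery. Unlike a real editor, it also reparses eagerly rather than stopping at the bottom of the screen.

```python
# Hypothetical three-mode scanner: a stand-in for a real editor DFA.
def scan_line(mode, line):
    """Run the toy state machine over one line and return the mode at its end."""
    i = 0
    while i < len(line):
        if mode == "content" and line.startswith("<!--", i):
            mode, i = "comment", i + 4
        elif mode == "content" and line[i] == "<":
            mode, i = "tag", i + 1
        elif mode == "tag" and line[i] == ">":
            mode, i = "content", i + 1
        elif mode == "comment" and line.startswith("-->", i):
            mode, i = "content", i + 3
        else:
            i += 1
    return mode


class LineCheckpoints:
    """Memoizes the parse mode at the end of every line (one checkpoint per line)."""

    def __init__(self, lines):
        self.lines = list(lines)
        self.end_states = []
        mode = "content"
        for line in self.lines:
            mode = scan_line(mode, line)
            self.end_states.append(mode)

    def edit(self, index, new_text):
        """Replace one line, then reparse forward until the newly computed
        end state agrees with the cached one. Returns the lines rescanned."""
        self.lines[index] = new_text
        mode = self.end_states[index - 1] if index > 0 else "content"
        rescanned = 0
        for i in range(index, len(self.lines)):
            mode = scan_line(mode, self.lines[i])
            rescanned += 1
            if mode == self.end_states[i]:
                break  # back in synch: later checkpoints are still valid
            self.end_states[i] = mode
        return rescanned


cache = LineCheckpoints(["<a>", "<b>text</b>", "<c/>", "</a>"])
print(cache.edit(1, "<b>more</b>"))  # an ordinary edit rescans one line
print(cache.edit(0, "<a><!--"))      # an unclosed comment rescans every line
```

The second call is the awkward case: an edit that changes the end state of its line keeps invalidating checkpoints until something downstream happens to agree again.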

Generally, this works well, but there are a few pathological use cases. Consider a large XML document (say 100 meg) with no comments. At the beginning the user starts typing a comment, gets as far as the open delimiter, then wants to go to some point late in the document to confirm something. In this case, all the document between the comment open delimiter and the jumped-to location needs to be reparsed, but with no real benefit.

What Chiu et al's work suggests is that you can have a kind of parallelized parse, where each potential delimiter is parsed in all possible states, and each possible transition noted. Think of it as if parsing produced a linked list which associated each delimiter with an array of parse roles and next parse states.

For example, if we had the text <xxx> then our parse list might have a node pointing to the first “<” and a pointer to the following node which points to the “>”. Then each node has an array with one entry per mode in the original DFA: what is < when you are in the prolog, what is it when you are inside a tag, what is it inside a CDATA section, what is it inside an attribute value, what is it inside data content, and so on. Then, for each of these entries there is an index to the next mode from the original DFA, which is used to index into the array of the next node of the parse list. (To keep things under control, XML parsers usually have a lookahead or peek function, which reduces the number of states: the same technique can be used here.) So in our example, the < in the prolog is a start-tag open delimiter, in which case the next delimiter > should be interpreted using the entry for being inside a start tag. If we started in a comment, then the < delimiter would be just data, and the next delimiter > would be interpreted using the inside-a-comment entry.

(You would presumably use singletons for these, to keep size under control.)
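The parse list described above can be sketched in a few lines of Python. This is only my reading of the idea, not code from Chiu's papers: the three modes, the role names, and the functions build_parse_list and interpret are invented for illustration, and a real XML DFA has many more modes. The regular expression plays the part of the lookahead, so that <!-- is seen as one delimiter; overlapping delimiters (such as the --> hiding inside <!-->) are ignored for simplicity.

```python
import re

CONTENT, TAG, COMMENT = "content", "tag", "comment"

# One shared ("singleton") table per kind of delimiter:
# mode on arrival -> (role of the delimiter in that mode, next mode).
TABLES = {
    "<!--": {CONTENT: ("comment-open", COMMENT),
             TAG: ("error", TAG),
             COMMENT: ("data", COMMENT)},
    "-->": {CONTENT: ("data", CONTENT),
            TAG: ("tag-close", CONTENT),   # the stray "--" is glossed over here
            COMMENT: ("comment-close", CONTENT)},
    "<": {CONTENT: ("tag-open", TAG),
          TAG: ("error", TAG),
          COMMENT: ("data", COMMENT)},
    ">": {CONTENT: ("data", CONTENT),
          TAG: ("tag-close", CONTENT),
          COMMENT: ("data", COMMENT)},
}

DELIMITER = re.compile(r"<!--|-->|<|>")


def build_parse_list(text):
    """Record every potential delimiter once, independent of any start mode."""
    return [(m.start(), TABLES[m.group()]) for m in DELIMITER.finditer(text)]


def interpret(parse_list, start_mode):
    """Follow the recorded transitions from a given start mode.
    No character of the text is looked at again."""
    mode = start_mode
    roles = []
    for offset, table in parse_list:
        role, mode = table[mode]
        roles.append((offset, role))
    return roles, mode


nodes = build_parse_list("<xxx>")
print(interpret(nodes, CONTENT))  # < opens a tag, > closes it
print(interpret(nodes, COMMENT))  # both delimiters are just data
```

Because every node points at one of four shared tables, the per-delimiter cost is a text offset plus a reference, which is the point of the singleton remark.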

So in our previous example of adding a <!-- delimiter, reparsing from the top of the document to the new jumped-to location only involves following the parsed delimiters. Getting the line-based check-pointing only involves checkpointing the possible transitions at the start of each line together with the transitions that they lead to at the end of the line. (Other optimizations are possible, in particular for resynchronizing.)
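As a rough sketch of that line-based variant, each line can be summarized by a small table mapping every possible start mode to the mode it leads to at the end of the line. The toy three-mode scanner and the names summarize and mode_at are assumptions for illustration only, not anything taken from the papers or from an existing editor.

```python
MODES = ("content", "tag", "comment")


def scan_line(mode, line):
    """Toy scanner (hypothetical): the mode at the end of the line."""
    i = 0
    while i < len(line):
        if mode == "content" and line.startswith("<!--", i):
            mode, i = "comment", i + 4
        elif mode == "content" and line[i] == "<":
            mode, i = "tag", i + 1
        elif mode == "tag" and line[i] == ">":
            mode, i = "content", i + 1
        elif mode == "comment" and line.startswith("-->", i):
            mode, i = "content", i + 3
        else:
            i += 1
    return mode


def summarize(line):
    """Per-line checkpoint: for each possible start mode, the end mode."""
    return {mode: scan_line(mode, line) for mode in MODES}


def mode_at(summaries, line_no, start="content"):
    """Mode at the start of line_no, found by chaining table lookups only."""
    mode = start
    for summary in summaries[:line_no]:
        mode = summary[mode]
    return mode


lines = ["<doc>", "<p>one</p>", "<p>two</p>", "</doc>"]
summaries = [summarize(line) for line in lines]
print(mode_at(summaries, 3))

# Typing an unclosed comment on the first line invalidates only that summary.
lines[0] = "<doc><!--"
summaries[0] = summarize(lines[0])
print(mode_at(summaries, 3))
```

The jump still walks one entry per line, so it stays linear in the number of lines, but each step is a table lookup rather than a rescan of the line's characters, and the summaries of untouched lines never need recomputing.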

So this can reduce the blowout for long documents (though it and the old method are both linear, so neither suffers from combinatorial explosion). That may be a nice optimization for coloring editors, but they are not something that grabs people’s minds!

What may be more interesting is the idea that you can build an “all-possible parse” DOM on top of this parallel parse list. That has a bit more interest for editors which have, for example, on-the-fly well-formedness and validity feedback. Now I am quite aware that in most cases, a syntax-error prevention mechanism is a better UI technique (for many uses): for example if you want to add a comment, you can only put it inside an element, or you have to select some range, but you can never type just an opening comment delimiter.

But for interactive editors, you already have to cope with non-well-formed transitions, so the DFA is already at the level of transition-complexity of the parallelized DFAs in Chiu’s work. What the parallel approach allows is things like saying that if the document ends up being non-well-formed, we want to trace back through the transitions from a well-formed result to the nearest point to the editing point, so that WF errors are only shown for the most reduced range possible. For example, if the document was <!-- unterminated comment <x>text</x>, then rather than showing all the document as non-well-formed with an indication that the end-of-document had been reached, it would backtrack to the last feasible well-formed point, which is the initial < in this case (taking the text !-- unterminated comment as data content when backtracking).

The difficulty of interactive editing of XML is that errors are identified where they are found, not necessarily where they are caused. These borrowed-from-parallel techniques perhaps could allow a different approach.

Kurt Cagle

While a remarkable amount of both ink and electronic bandwidth have been expended upon the use of XML in the data realm, there are times where it is necessary to step back for a bit and look at what and where XML is being used today. One thing that becomes obvious when studying the XML landscape is that a significant amount of XML is still being used for purposes of describing narrative, for telling a story, advising people in the use of a product, structuring reports, and doing other things that focus more on documents than they do on data.

In some respects, this is not all that surprising. In general, when you’re dealing with data-centric applications, XML isn’t always the best choice for working with structured content, and indeed there are times where XML is perhaps the worst, most hideously inefficient mechanism for dealing with data. However, the use of XML as a means of writing and marking up narrative has become the standard means of encoding structured content in most organizations. That doesn’t mean that XML is dominant in most organizations for “unstructured” content - that distinction is still very much in favor of Microsoft Word, with XML occupying a considerably inferior position there - but for organizations that recognize the benefit of structured content, XML languages such as DITA and DocBook are very quickly becoming the standard for storing information.

I had a chance to see that principle at work this week at the DocTrain conference in Vancouver, British Columbia. Conference chairman Scott Abel (CEO of The Content Wrangler) graciously invited me to the conference and I had the chance to talk with a number of people working with technical documentation, online content creation and related material, and overall it opened up my eyes fairly dramatically to the hyper-accelerated world of content management a decade after the introduction of XML.

Rick Jelliffe

The comments period for the XML 1.0 fifth edition revision finished last Friday 16th May. I didn’t make a submission, in part because I felt I have had a good run in the past and my concerns are pretty well known and unchanged.

In XML 1.0, we went strongly against accepted wisdom which held 1) that the future was Unicode so you didn’t need to support existing encodings, 2) that the present was beautifully layered so one standard shouldn’t try to overcome the deficiencies in others, and 3) that we should all live in a Standards Fantasyland (on the map near Boogie Wonderland) where even if the world had gone one way that didn’t agree with what the existing standards said, we should follow the standard. A complete triumph of engineering (systematizing what works) over schematising (insisting on the right way to do things).

So for 1) the XML encoding header allows multiple encodings. Now, ten years later, we are finally reaching the stage where UTF-8 for web pages has exceeded ASCII and 8859/Windows encoded pages (Unicode wrangler Mark Davis, now with Google but for a long time with IBM, recently released some figures on this), so it may indeed be coming closer to the time when XML can be simplified so as to only support UTF-* encodings. I doubt there will be any demand for that, because support for multiple encodings is handy, free (everyone has large transcoder libraries) and doesn’t get in anyone’s way.

For 2) the example is that XML adopted what we now call IRIs for system identifiers in entities: it took IETF almost a decade to catch up and formalize this, surely a record for any standard. “Internet time”? Are you kidding? XML deliberately didn’t use the official URL syntax, but opted for the approach that it was better to have the software shield the user from the details of delimiting. I think there are very few advocates of XML simplification who would be prepared to switch to vanilla URL syntax. But now, 10 years later, entities are fast disappearing (mind you, just this week I had a seminar where there were surprisingly many questions on trying to use entities with schemas) and the IRI spec is out. Namespaces and XLink should be using IRIs now, but there is an underlying problem that character-by-character comparison of IRIs is not robust unless they are canonicalized.
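A small Python illustration of that comparison problem: the two strings below would display identically, yet they differ code point by code point. The example.org IRIs are made up, and Unicode NFC normalization is used here only as one plausible canonicalization step; it is an assumption, not something the Namespaces or XLink specifications require.

```python
import unicodedata

# Two spellings of what a human would call the same IRI (hypothetical examples).
precomposed = "http://example.org/caf\u00e9"   # e-acute as a single code point
decomposed = "http://example.org/cafe\u0301"   # e followed by a combining acute


def same_after_nfc(x, y):
    """Compare two IRIs after normalizing both to Unicode NFC."""
    return unicodedata.normalize("NFC", x) == unicodedata.normalize("NFC", y)


print(precomposed == decomposed)                # character by character: different
print(same_after_nfc(precomposed, decomposed))  # after canonicalizing: equal
```

A fuller canonicalization would also have to decide about things such as the case of the scheme and host and the spelling of percent-escapes, which this sketch leaves alone.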

For 3) the example was again the XML header specifying the encoding, despite the information supposedly being available in the HTTP MIME headers. But the standards got it wrong: in effect, the person who creates a file is not the person who sets the HTTP MIME header. Now, 10 years later, the relative reduction in the number of encodings in widespread use does make encoding sniffing a much more workable approach, but it is still too fallible and time-wasting for mission-critical data.
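The practical upshot is that the label travels inside the file, where the person who made the file put it. A minimal Python sketch of reading it follows; the function name declared_encoding and the 200-byte window are my own assumptions, and the sketch only handles ASCII-compatible encodings with no byte-order mark, so it is far short of the detection procedure outlined in the XML specification itself.

```python
import re

# Matches an encoding pseudo-attribute inside a leading XML declaration.
_XML_DECL = re.compile(
    rb"""^<\?xml[^>]*?encoding\s*=\s*["']([A-Za-z][A-Za-z0-9._-]*)["']"""
)


def declared_encoding(data, default="UTF-8"):
    """Return the encoding named in the XML declaration, else the default."""
    match = _XML_DECL.match(data[:200])
    return match.group(1).decode("ascii") if match else default


doc = b'<?xml version="1.0" encoding="ISO-8859-1"?><note>caf\xe9</note>'
print(declared_encoding(doc))
print(doc.decode(declared_encoding(doc)))
```

However the file is moved, copied or served, the declaration goes with it, which is more than can be said for a header set by whoever configured the web server.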

In XML 1.1, engineering won again. The decision was made to open up the naming rules from XML 1.0 to remove a dependency on versions of Unicode. However, because this meant in turn that XML 1.1 processors would not as reliably detect encoding errors (when you see “encoding error” think “database corruption” or “spurious data” or “spurious rejected documents”) the treatment of the C1 range of control characters (0x80-0x9F in ISO 8859-* encodings) was clarified to be non-well-formed (with special treatment for IBM’s NEL character). Control characters have no place in markup, as confirmed by Unicode Technical Reports and as emphasized recently by the OOXML BRM which required MS to change a couple of places where some control characters could be entered even though harmlessly delimited. I was startled during the OOXML debates how strongly this was held to be a vital, core part of the XML story from all sides.
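Why C1 controls are such a good tripwire for encoding errors can be shown in a few lines of Python. The function name c1_controls is invented for this sketch; it flags code points U+0080 to U+009F other than NEL (U+0085), which is my simplification of the XML 1.1 rule rather than a full implementation of it.

```python
def c1_controls(text):
    """List (offset, code point) for each C1 control other than NEL (U+0085)."""
    return [(i, ord(ch)) for i, ch in enumerate(text)
            if 0x80 <= ord(ch) <= 0x9F and ord(ch) != 0x85]


# Curly quotes saved by a Windows tool: bytes 0x93 and 0x94 in code page 1252.
raw = "\u201cquoted\u201d".encode("cp1252")

mislabelled = raw.decode("iso-8859-1")  # file wrongly labelled as ISO 8859-1
correct = raw.decode("cp1252")          # file labelled with its real encoding

print(c1_controls(mislabelled))  # the mislabelling surfaces as C1 controls
print(c1_controls(correct))      # nothing to report
```

A processor that treats those characters as an error catches the mislabelled file at the door, instead of letting the damaged text through to a database.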

XML 1.1 was an enormous flopperoony, for the unsurprising reason that if you put version="1.1" then an XML 1.0 processor would spit the dummy. Some people have tried to claim that it failed because previously well-formed 1.0 documents that had C1 controls in them became non-WF. I have never seen such a document in the last decade, nor have I ever had any credible reports of one, and I can see no cases where putting C1 control characters in a document would be legitimate practice, so I think it is just bluffing: there has always been a wing of users of XML whose life would be easier if they could embed raw binary into XML and they deserve no sympathy or help.

So along comes XML 1.0 (fifth edition) as a draft. It has only a couple of changes of significance. The first is that it finally puts in place a rudimentary versioning system: E10 allows an XML 1.0 processor to parse an XML 1.x document on the understanding that it only reports things in terms of XML 1.0 rules and capabilities.

The second change then makes a mockery of the first. It introduces the lax naming rules from XML 1.1. Now such a change is not required for any reason, because XML 1.1 exists and could be used. So rather than go into a well-managed regime where documents are well-labelled, and XML minor versions chug along, XML 1.0 draft fifth edition just allows a new XML 1.0 parser to accept documents that all the other old XML 1.0 parsers will reject: and remember this is not because of previous bad practice being more consistently exposed, but because some innocent person has created a document with the new name characters and the XML 1.0 processors deployed in the last decade reject it.

Basically, the W3C XML WG is saying that if you get a document that breaks in this way, it is the receiver’s problem. The sender can say “But it is well-formed against the latest version of XML 1.0” and the XML WG washes their hands. It is the triumph of bad engineering practice, of doing what can be guaranteed to fail, of putting the responsibility on the wrong person. It will cause problems first for the nominal beneficiaries of these extra name characters (since they will be unreliable) and second for people using non-UTF-8 encodings who won’t get as many WF errors. So who will benefit: the makers of standards who will have less housekeeping. They are not an unworthy set of stakeholders.

The W3C XML WG needs to revise the goals of XML (in section 1.1) to accommodate these changes. In particular

6. XML documents should be human-legible and reasonably clear.

no longer holds. The new rules allow a blank check, so you could have a document entirely made with element and attribute names from code points which have never even been allocated a character by Unicode. With the fifth edition, the goal becomes

6. XML documents may be human-legible and reasonably clear.

And the goal 5. needs changing

5. The number of optional features in XML is to be kept to the absolute minimum, ideally zero

because in effect support for these new naming characters becomes an optional feature: does your XML 1.0 parser support editions 1-4 or edition 5?
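The blank check is easy to demonstrate in Python. The ranges below are the NameStartChar production shared by XML 1.1 and the fifth edition draft; the function names and the choice of U+E1000 as a test character are mine, and whether a code point counts as unassigned depends on the Unicode version bundled with your Python.

```python
import unicodedata

# NameStartChar ranges (XML 1.1 and XML 1.0 fifth edition), as code points.
NAME_START_RANGES = [
    (0x3A, 0x3A), (0x41, 0x5A), (0x5F, 0x5F), (0x61, 0x7A),
    (0xC0, 0xD6), (0xD8, 0xF6), (0xF8, 0x2FF), (0x370, 0x37D),
    (0x37F, 0x1FFF), (0x200C, 0x200D), (0x2070, 0x218F),
    (0x2C00, 0x2FEF), (0x3001, 0xD7FF), (0xF900, 0xFDCF),
    (0xFDF0, 0xFFFD), (0x10000, 0xEFFFF),
]


def is_name_start_char(ch):
    """True if ch may begin a name under the lax naming rules."""
    cp = ord(ch)
    return any(lo <= cp <= hi for lo, hi in NAME_START_RANGES)


def allowed_but_unassigned(ch):
    """True if ch may begin a name although Unicode assigns it no character
    (general category Cn in this Python's Unicode database)."""
    return is_name_start_char(ch) and unicodedata.category(ch) == "Cn"


print(is_name_start_char("a"), is_name_start_char("1"))
print(allowed_but_unassigned("\U000E1000"))
```

An element named with a code point like that passes the new name check, while every parser built to editions 1-4 rejects the document.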

I didn’t write a comment to the W3C XML WG because nothing has changed over the last 10 years that makes the decisions in XML 1.0 and in XML 1.1 inappropriate. I don’t have any new information that changes anything, and the XML WG certainly has produced none. All that is needed is for the fifth edition to fix up the minor versioning issue, and then we could all transition to 1.1 on an as-needs basis. This minor-versioning fix is already at least five years overdue: fixing it opens the door for XML 1.1 to have a snowflake’s hope and will allow a better transition to XML 1.2 potentially including some other overdue changes (building in xml:id, namespaces, etc.)

To summarize: XML 1.0 (fifth edition) is bad from a standardization and engineering viewpoint, betrays the goals of XML 1.0 which have served well for the last decade, and may hurt the end-users it is intended to support. It sets up a workable versioning mechanism then fails to use it for a significant change. It provides a good foundation for workable minor versioning, then ignores the foundation and builds on sand with its allowing of incompatible names.

I may be wrong, but it looks like a hack to me. However, fortunately it barely impacts anyone in the West, including me nowadays, so who cares? Interoperability, schminteroparibility! Unambiguous labelling of data formats, gedoudahere!

I am not trying to suggest the W3C XML WG is doing this because they prefer to sit by some giddy swimming pool in their floral-printed bathing costumes sipping umbrella-ed beverages, that they clear their desk by making incompatibility problems someone else’s problem, or any laziness! But I think they at least owe it to explain why they are doing a substantive minor version change as an edition change, failing to use the edition mechanism they are setting up at the same time which would allow people who needed this feature to access an already-existing minor version!

Rick Jelliffe

I’ve just caught up with this document from W3C which fills in a big gap in English-language technical material. Japanese typesetting technology has been very influential in the other Ideographic countries, and they share many commonalities (e.g. Japanese ruby text and Taiwanese bopomofo.) There is a Japanese standard JIS X 4051, but it has no translation available: though parts of it, usually called the kinsoku rules, are floating around in material from vendors, particularly Adobe’s Ken Lunde and some MS material.

By and large, Chinese and Korean have different details (e.g. different characters) but the same analysis applies.

One term that the W3C draft uses but does not define is kihonhanmen; readers getting held up by this could substitute underlying grid (or text block or even constant width frame) for this.

How earth shattering could an upgrade from 0.45 to 0.46 be?

In this case there are some really neat new features. For those who are unfamiliar with the software, Inkscape is a visual svg editor. When I first tried svg I was writing the files by hand with Vim. Nothing against Vim, I still use it for coding, but it’s a lot more fun to create svg’s in Inkscape with a nice image editing interface.

I’ve been able to use it to convert bitmaps into svg’s for a while now, but with the 0.46 release we have a whole new dimension with two new tools. There’s a deformation tool you can use to mush your drawing like a squishy toy.

It’s still beta software: I managed to crash it when I tried to run a raster effect on a bitmap. There’s a whole new set of raster effects, along with a new 3D cube object.

Inkscape doesn’t do animation, but for developing svg graphics or any kind of 2D design Inkscape makes svg fun. You can view the results of thirty minutes of playing with the new 3D functionality on my picasa page.

Rick Jelliffe

The Java Community Process is the mechanism Sun set up to develop and evolve Java “in Internet time”. It brings together “a cross-section of both major stakeholders and other members of the Java community”. A group of experts make the initial draft, then “Consensus around the form and content of the draft is then built using an iterative review process that allows an ever-widening audience to review and comment on the document.”

The result is a specification, a reference (proof of concept) implementation, and a technology compatibility kit (tests).

One specification I have been interested in for a while is JSR 296, the Swing Application Framework. The JSR (Java Specification Request) was approved in May 2006. There is an implementation at Java.net.

However, I cannot figure how to find the spec. Looking at the JCP site, there is everything about the spec, but no actual link to it. Looking at the implementation site, again no actual link to the spec. This strikes me as an entirely odd way to do business. What are they trying to hide? :-) Whatever it is, they are doing an excellent job of making sure that no-one finds it.

Looking at the site, it seems JSR 296 is marked “in-progress”. That means, I suppose, that it is still a committee draft that has not been released. After 18 months? So much for Internet time!

It seems like in order to see the draft, I will have to sign up to be a JCP member. For an individual member, it is $0 which is nice, but I have to send a fax to the other side of the world and wait for them to fax back a password. Or I can fax and send hardcopy by courier.

But even then, I don’t actually know that the JSR 296 draft is available for community review. The status is given as “In progress” but there is no mention of this status on the JCP description page. Presumably for the last 22 months the draft has been in the writing. I presume a draft exists, because there is software that claims to be an implementation.

What is interesting is that this is the opposite of the ISO process. At ISO using the normal rules, it is the early drafts (working drafts, committee drafts) that are given the most exposure and can be floated around openly, and only the very final draft standards that are supposed to be controlled (to reduce interoperability problems where people write systems according to different drafts rather than the final standard, and for the standards that are published commercially by ISO and standards organizations for cost recovery).

While I am generally in favour of committee room secrecy, to prevent intimidation and silly marketing point-scoring and to disenfranchise armchair experts, and while I can understand that drafts can change substantially so you don’t want to have old drafts floating around, openness is better. But after 22 months, and after there is an implementation, to have no actual draft casually available is not “Internet time”, is it?

M. David Peterson

Placeholder for ongoing notes from the Microsoft Technology Summit…

Rick Jelliffe

I have been pretty disappointed in the new operating system distros I have been trying out recently. In the last three to six months there has been:

  • A horrible install of a new Mac where the Expose feature caused windows to run away when I tried to click on controls near the edges of windows. It was like some kind of demented joke or game. (The user, who was previously a dedicated PC user, now loves the Mac and thinks it is much simpler.)
  • Today I tried twice to install the new service pack for MS Vista, only to have the install fail with no useful message.
  • An attempt to install a mainstream Linux on my new PC failed when it could not detect the keyboard. I had to get the new box because another install of a newer Linux from another distro was disastrous for performance on my quite old box.
  • I got too bored to continue with another mainstream Linux install, where the DVD instructed me to first burn the image to a bootable CD.

So instead I have installed a recent Solaris Developer, from the DVD of some Linux magazine in the newsagent.

This is one of the easiest installs I have had. (I could only install onto a partition on the main disk, install onto a partition on the secondary disk failed with a bogus message about user accounts. No biggie.)

The system boots up, SAMBA works fine and detects most things I want to detect. (It is interesting that it only detected one printer on our network, however Vista only detects the other one, so that is not so bad.) It has Firefox and Thunderbird, which are what I’d use anywhere, and StarOffice, which is good enough for now; I cannot really use it or OpenOffice for making presentations until Impress gets tables in v3.0. It comes with Java installed, and Netbeans, though I’ll be downloading Eclipse for compatibility with the workgroup here.

The desktop is a nice GNOME and really uncluttered and to the point.

Best of all, it feels like UNIX. Not a half-assed wannabe, or a messy child’s toyroom, the way some Linux distros seem to be. But lots of GNU goodness. I still have to see how it copes with some issues like updates (which was the only real flaw I found in Mandrake Linux, which I was happy with for a few years).

So I really like Solaris. It seems to suit what I want and expect better than any other OS distro I have come across yet.

But it has one big problem: the screen graphics are super ugly. In fact, so repellent as to make it unusable. I have a 1440×900 LCD monitor and this is not one of the built-in types supported. No problem, I thought, I’ll just change the appropriate xorg.conf (or whatever is the equivalent) file. But I cannot see how to do it: it looks like it is hardcoded or something. So I have some other resolution, with a half inch dangling above the screen and unreachable. And the fonts are ugly and thick: even when I turn on anti-aliasing and play with the LCD settings it makes little real difference. Unless some kind reader can make a good suggestion, it just doesn’t compare to what I have been used to under Windows, Mac or even Linuxes.

I really hope I am doing something wrong, because apart from that Solaris really seems to fit the bill for me. Maybe I have to resurrect the old CRT monitor.

[Further Adventures] I tried to install a different card, only to have a hardware problem, so I switched back to the original new card. Oops, now the thing doesn’t boot. Checking though, for some reason the BIOS had switched around which of the two hard disks to boot from. I don’t understand how this could have happened. In fact, I don’t understand why it can happen either, because I thought both hard disks would be checked for booting in any case. But swapping the order of booting from disks fixes the boot problem, and I am online again.

I looked through the X windows logs, and sure enough the VESA driver only has a limited number of screen resolutions available, and 1440×900 is not one of them. Sigh… So to use Solaris I have to either go down in resolution to fit the monitors we have, or buy in a new monitor. I was finding the wider screen really useful for Eclipse, so I guess I will have to search for something else. All this is taking a frigging long time: I expect to live for three years on a single installation, so having to go through four or five large and problematic installs is wearing me down.

Rick Jelliffe

Patrick Durusau has a few more items on his website. Always worth a read for anyone interested in getting more than the party lines. Here is some of his latest TOC:

Rick Jelliffe

I’m writing this sitting in the sun looking at the pool, somewhere tropical, en route from the exhausting ISO/IEC JTC1 SC34 DIS29500 BRM meeting (hoping for my lost bags to appear and with every flight delayed by up to 12 hours). And not an acronym in sight here!

Apologies to readers; I took down the rest of the article, because it was proper for me to report back to Standards Australia first. This is quite reasonable, I think. But several sites copied the following from caches:

I’ll blog some more, but the BRM clearly has succeeded in its formal aim, which is to produce a better text. Every response by the editor was formally voted on. The big picture issues were given extra time for detailed discussion, and the NBs had opportunity to raise their highest priority issue, in turn. It would have been great to have had more time to deal with more of the middling issues: where we would have preferred some variant or augmentation of the Editor’s response to our issue or where we didn’t like his answer.

The context of this was that the meeting was productive and calm:

The BRM went pretty much the way I expected: grinding through the issues, politeness, assertiveness, corridor sessions, strange bedfellows, a lot of newbies who made up for it with articulateness, candour and brains. In substance, it was a typical ISO meeting: issues, votes, different personalities and cultures interacting, some people happy, some people pissed off about individual results, limited time, stimulation, mind-numbing alterations to resolutions, convivial dinners with fascinating techoes, late-night study sessions and early morning drafting gallops. But in accidents it was very odd indeed: not just the size of the meeting and the size of the draft and the sewerage farm of disinformation surrounding it…what is atypical is the large number of non-technical delegates and that a few delegates seemed surprised that their delegations would have to figure out a position on each issue by the end of the week (which could be “abstain - we have no position”.) It is not as if they hadn’t been told!

And after that quote was material emphasizing that there is a maintenance process to fix outstanding issues and new ones that get discovered:

There are a lot of those, and they will have to go to maintenance, which really is the big issue: will MS continue these baby steps to openness or will it go soggy once out of the spotlight, which is not unprecedented among other standards stakeholders? Even after the final vote (assuming an acceptance vote, as seems likely) governments will need to keep the pressure on Ecma to continue working with SC34 and to get these outstanding issues addressed ASAP; it is not the case that unaddressed issues need to disappear down a black hole, but SC34’s only power comes from having strong government and user backing to give this maintenance the steroids it needs: this not only means monstering MS to continue through maintenance, but also (for governments) to provide adequate resources: staffing, delegates, and long-term support for participation at standards meetings.

I have more details at What is in the new draft of OOXML?. Brian Jones has a fairly detailed Narrative of the ISO/IEC DIS 29500 BRM Meeting that is very factual. I recommend readers take a lot of the other material on the web about the BRM with a large grain of salt.

Rick Jelliffe

Eve Maler and Jeanne el Andaloussi’s out-of-print book Developing SGML DTDs: from Text to Model to Markup has just been put online I see. (Through the magic of Docbook!)

Even though it looks dated in its SGML examples, it really is about a methodology for analysing and designing schemas (especially for literature, i.e. “documents” rather than “data”) that is just as useful today. We might call SGML XML, and we might use “MIME type” or “data type” instead of “notation”, but the development issues this book addresses never went away. Anyone who wants to be an expert in XML schemas and document analysis needs to be aware of it, IMHO.

A good taster might be Learning to recognize semantic components.

Rick Jelliffe

Bruce Byfield has a nice article A Field Guide to Free Software Supporters. On his typology I’d be in between 4) Softcore advocate and 5) Mainstream advocate.

What struck me when reading it was whether pretty well the same categories could also describe people’s attitudes to Standards (and Open Standards, Open APIs, Open Systems)? Not a bad fit, with different names sometimes. In that category I guess I would be somewhere between 6) Hardcore (see All Interface Technologies by Market Dominators should be QA-ed, ZRAND Standards!) and 3) the participating idealist (because the standards issues I participate in are the ones involved in my day-to-day jobs in the markup/industrial publishing industry).

Rick Jelliffe

I was chuffed to see the ODF Alliance quoting this blog in their new Alliance Response to Ecma’s Proposed Disposition of Comments on OOXML. And they seem particularly interested in getting good results on the Standards Australia issues AU-09, AU-15 and AU-23, which are issues I submitted.

I guess they love me now! Though not enough to mention me by name: I am the only person quoted who is left nameless, merely “one XML expert”. Hmmm, “He who shall not be named”… Since Groklaw thinks that the mere linking to this blog with my name by colleagues foreshadows bad things, it is only prudent. I suppose it will have to be a secret love.

Since they quote me, I hope it is not too much to look at their response.*

Procedural Irregularities

In their early material various claims are made which bear looking at in more depth. They say there are many “documented irregularities”, yet when ISO JTC1 looked at them they found no substance. Looking at the list on Wikipedia, where is the actual evidence of this villainy?

  • Portugal: a fixed working group size caused late-applicants to have sour grapes. Actually, the Portuguese already had expanded the size of that working group. Not chairs. The problem as such is the regularity not the irregularity, it seems: Sun and IBM didn’t like the rules. (Note the Wikipedia entry is biased.)
  • Sweden: MS withdrew within hours a mistaken, inappropriate offer of support to 2 partners before the meetings and notified the Swedish body themselves before any votes. (Again the Wikipedia entry is biased: IIRC it was MS who reported it, not “it surfaced.”) Sweden ended up abstaining due to a procedural SNAFU: a double count of a vote in a meeting where another meeting could not be convened in time. So what do we have? A cock-up, transparency, the correct channels notified, no votes affected: no smoking gun (unless there is material that hasn’t come out.)
  • In the Netherlands, the MS delegate voted one way, other people voted another way: again, a case of regularity not irregularity. (The Wikipedia entry is biased here: why is that a substantial problem? Different national bodies have different rules depending on their bureaucratic culture and traditions apart from anything.)
  • In Switzerland, it seems discussions were limited to technical and editorial considerations. These are the only comments that can be considered by the BRM, as has been emphasized recently by Alex Brown, the BRM convenor. So the Swiss chairman had in fact a completely legitimate view, as far as I can see, as far as what is in-scope for ballot comments; that other NBs might put out-of-scope material in their ballot responses might make them feel good but they don’t go anywhere. (The Wikipedia article does not mention the scope of ballot comments to provide some balance.)
  • Malaysia voting abstain is typical when there is no consensus. Australia did the same; it is not an irregular procedure. If a NB submits their comments with the abstention, the comments get to the BRM and they become part of the mix, so no harm is done.
  • Cyprus joins late. The idea that one side is more remiss than the other in trying to stack SC34 is not evidenced by the numbers: they just came in different waves separated by a few months. Given that perhaps 2500 of the 3500 comments sent in by NBs are parroted comments from a mail-in campaign (i.e. not from a proper independent review) it would take a lot of chutzpah for the ODF Alliance to get too excited by this one.
  • Finally, in Norway MS asked its partners to participate. Again, no procedural irregularity at all.

I don’t know if pointing this out will have much effect. I think the point with the various bribery/corruption claims is that they have the necessary truthiness, so it doesn’t matter if none of them have any procedural irregularities.

5 Months?

ODF Alliance say there were only 5 months to review, yet there was a full year before then during the Ecma process for participation (e.g. by ODF Alliance and Ecma member IBM). And the draft was submitted in December 2006 and the ballot was in September 2007: that is at least 9 months. (And then there is the five months until the BRM for further looking at how to resolve the issues and the issues of other NBs.)

And after that comes the maintenance process, whatever form it will take: certainly it will have a pretty high premium on interoperability with ODF and other standards.

6,045 Pages

I have previously dealt with why raw page count is not a very fertile metric. There is so much duplication, so much whitespace and so many diagrams that the effective size for review is much smaller. Furthermore, the assumption that any large standard will not be reviewed with an international and national division of labour is, in my experience and certainly in this case, incorrect.

3520 Comments

The trouble with this number is that people then think “3520 flaws” rather than “750 individual issues and a lot of repetition”. Too many? In my blog On error rates in drafts of standards I have a good quote from Jim Melton, the editor of SQL, who has commented on his standards frequently getting thousands of comments. For a large standard, a good number of comments is an indication of real review, and says absolutely nothing good or bad about the general quality of the standard or the technology IMHO.

Seven Dwarfs

The ODF Alliance groups its response under 7 heads:

In short, the proposal does NOT address the critical need for: a.) review time; b.) harmonization, c.) a clear name; d.) a sound standard with no (new or old) technical errors; e.) interoperability; f.) support for legacy documents; and g.) consistency of “fixes.”

Let’s have a look at each of them:

Review time

I have mentioned above that there is more review time than is often bandied about.

But the ODF Alliance argument here is that OOXML should not be standardized because of errors that were not found in DIS29500. This is a remarkably hopeful claim (perhaps a cunning plan): see falsifiability for a discussion on why it is shaky ground.

The strongest evidence would be if the (non-duplicate) flaw rates detected for DIS29500 were far in excess of the same for other standards. However, as the blog item above mentions, the numbers don’t go that way.

However, this is not to say that OOXML and ODF and PDF would not have been better submitted as Committee Drafts in the accelerated process to ISO/IEC JTC1 SC34. No-one is particularly enamored of any of the current fast-track processes.

Harmonisation

It is interesting that the ODF Alliance quotes Tim Bray that the world doesn’t need another way to express basic typesetting features. If it is so important, why didn’t ODF just adopt W3C CSS or ISO DSSSL conventions? Why did they adopt the odd automatic styles mechanism which no other standard uses? Now I think the ODF formatting conventions are fine, and automatic styles are a good idea. But there is more than one way to make an omelette, and a good solution space is good for users.

My perspective is that harmonisation (which will take multiple forms: modularity, pluralism, base sets, extensions, mappings, round-trippability, feature-matching, convergence of component vocabularies, etc, not just the simplistic common use of a common syntax) will be best achieved by continued user pressure, both on MS and the ODF side, within a forum where neither side can stymie the legitimate needs of the other.

Clear name

This is actually something that I have been pushing since early last year, in discussions with other SC34 people. It is part of the general observation that many of the problems with DIS29500 are not with the technology or the technical parts but can be fixed editorially: the scoping and conformance issues are examples. My point is not that “Office Open XML” is particularly confusing or that it should not continue as a brand name (not ISO’s business!), my point is rather that it is too similar to ODF/ODA/OpenOffice to be the name of the standard. I don’t know why the standard cannot have an extra part added to its name to be more descriptive. (And indeed if the plan to split out OPC to a separate part comes off, then the name Office Open XML really applies to the other parts so it may not be the best collective name.)

For example, the full name of the ISO Schematron is Information technology — Document Schema Definition Languages (DSDL) — Part 3: Rule-based validation — Schematron.

But is this really a showstopper for the standard? Of course not: the brand OOXML is already out in the wild. And Alex Brown has indicated that this kind of issue might be at the bottom of the list for discussion at the BRM; it is the kind of thing that people are happy to spend days discussing, which Alex is clearly not going to allow. 120 people are not traveling from all parts of the world for a week to get the issues they have raised ignored because other people’s issues are taking a disproportionate amount of time.

Sound standard

This is where I (this blog) get quoted! The blog item was The design goals of XML.

Note the difference in approaches. My angle is “I think this is a problem, I hope it can be fixed.” Their angle is “He thinks this is a problem, therefore the whole process should be abandoned.”

I think there is a kind of bait-and-switch going on: to understand it you have to make the distinction in your mind between what a particular draft (e.g. DIS29500) says and the larger concept of what OOXML could be when fixed up (e.g. substantially the same, with the same design approaches, though different in details.) It is the difference between text and technology. Here is the ploy: first find a technical or editorial problem in the draft, then transfer this to OOXML as if it were intrinsic or necessary, then use it as evidence of the unreformability of OOXML, in which case there is no point fixing the draft since the whole thing stinks.

My POV, if anyone cares, is no different from what I wrote in 2005:

I read recently a criticism of the “Binary XML Infoset” project as polluting the stream. I believe the lesson to be learned from XML is not that “Everyone should use one format, it should be simple, it should be Unicode, it should use angle brackets” but the far more challenging “Respect-driven standards development produces really good and generally applicable results.”

Note in particular this:

when I read general, rather than technical, criticism of standards or standards bodies, I usually detect strategic sour grapes, where the organization or writer is trying to undermine a process that they cannot influence enough. XML wasn’t based on the mentality people who don’t or won’t use this are idiots but we want to add to the solution space.

All that being said, I think buried in this section is the germ of an entirely valid point: even things included for legacy reasons should be in standard notations. You have to make a more specific judgment than legacy=good (as some Ecma people are perhaps prone to) or legacy=bad (as some anti-ODF people are perhaps prone to).

For example, I have written about the integer measurement system EMU used in OOXML: this is unusual but useful and a common kind of thing to do (e.g. groff, PDF, etc). But I don’t see any reason for twips let alone half points, they are just a bunion and a carbuncle, if not vice versa. Are they showstoppers? Well, it would be really good to get gratuitous problems fixed now, rather than leaving it for maintenance. But it is a matter of best practice, not an actual error or gap.
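To make the units concrete, here is a minimal sketch of the arithmetic in Python. The constants are the usual definitions of the units (914,400 EMU to the inch, 72 points to the inch, 20 twips to the point, 2 half-points to the point) and are not quoted from the draft text; the function names are hypothetical, made up for this illustration.

```python
from fractions import Fraction

# Usual unit definitions (assumed here, not quoted from DIS29500).
EMU_PER_INCH = 914_400
POINTS_PER_INCH = 72
TWIPS_PER_POINT = 20
HALF_POINTS_PER_POINT = 2
EMU_PER_CM = 360_000  # 914,400 / 2.54, which is exact

# 914,400 divides evenly by 72, so a point is a whole number of EMU.
EMU_PER_POINT = EMU_PER_INCH // POINTS_PER_INCH


def twips_to_emu(twips: int) -> int:
    """Convert twips (1/20 point) to EMU; always an exact integer."""
    return twips * EMU_PER_POINT // TWIPS_PER_POINT


def half_points_to_emu(half_points: int) -> int:
    """Convert half-points to EMU; always an exact integer."""
    return half_points * EMU_PER_POINT // HALF_POINTS_PER_POINT


def emu_to_points(emu: int) -> Fraction:
    """Convert EMU to points as an exact fraction."""
    return Fraction(emu, EMU_PER_POINT)


def emu_to_cm(emu: int) -> Fraction:
    """Convert EMU to centimetres as an exact fraction."""
    return Fraction(emu, EMU_PER_CM)
```

Under these definitions one twip is 635 EMU and one half-point is 6,350 EMU, so any length given in twips or half-points can be restated in EMU without rounding.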

Interoperability

Interoperability is a great motherhood word. No-one is perfect.

They complain that

While the proposers “agree that it is important for the specification to support multiple types of object linking,” they suggest changing oleLink(OLE Link) to oleLink(Generic Object Connection). And, instead of referencing the specific OLE2 connection they say to use any generic ‘embedded object’.

When we look at ODF we see they have an element draw:object-ole which has the definition “represents objects which only have a binary representation”, almost the same thing. So the ODF Alliance want to keep the reference to OLE (and make it a normative reference, which is probably dubious but I digress). Fair enough: let’s make the spec better! But look at the use this issue is put to: the heading says “What is missing? Interoperability! Why ignore the re-use of existing standards?” but the use of existing standards is never mentioned in the text.

I suspect that the heading is a carry over from a previous draft, where the body text was changed as it was discovered that among the Editor’s Disposition of Comments are details of adding scores of references to the various standards used by OOXML (both in DIS29500 and in other proposed fixes.) But my point is that the conclusion is not supported by the evidence, and their reaction to the issues they raise is too strident and over-reacting.

Support for legacy documents

This begins with actually quite an interesting point, and the first really new thing to consider. Should a new standard have deprecated material? Putting aside the general point that a fast-tracked standard is not a new standard but a review and rebadging of an existing external standard, the comment is that OOXML is a different case than other standards where this mechanism has been used: like C++ these standards capture a living technology in which some parts are living and others are dying, but the ODF Alliance thinks that compatibility or legacy options are only warranted when they reflect multiple previous implementations. I wonder whether the presence of compatibility options designed to handle old Word Perfect behaviours puts a spanner in the works for that argument?

From the interesting start, the material on this point rapidly descends, ultimately saying

However, from the details provided, it appears that Ecma is merely taking a subset of VML, giving it another name (DrawingML), and using it in places where VML was previously called for. What is deprecated merely re-enters through the back door.

This is quite bizarre: VML and DrawingML are in different namespaces and I have not seen anything in the Editor’s Disposition of Comments about taking subsets of VML and renaming it. I’d love to know what in particular is meant by this. DrawingML is not something new, but part of the draft (VML had almost been entirely retired, the difference is that the Editor wants to completely retire it.) In particular, there is nothing in the section they quote (Response 92) about subsetting: there is only material on the mechanics of deprecating VML, removing references to it in favour of DrawingML, and enhancing DrawingML so that it can do everything that VML did (for example, to support rich text comments); deprecating VML necessarily involves making sure that DrawingML has equivalent features, how else could it be? So the ODF Alliance comment here is completely wrong; perhaps they think they can get away with it because the Editor’s Disposition of Comments document is not generally available.

The background to all this is that France’s AFNOR in its comments asked that the standard be split up with all the core material in one part and all the deprecated functions, documented settings, VML etc in a second part. Many other NBs also asked for the standard to be split up and for OPC to be its own part. My suggestion, through Standards Australia, was to split into 9 parts for example. So ECMA’s proposal is to do both: a part for core, one for deprecated/legacy/VML material, and a part for OPC, but then to add various conformance classes for different application areas which would give the same conformance subset effect that having multiple parts would achieve. So splitting up is a straightforward and direct response to NB suggestions.

Consistency

Once the Editor’s initial Disposition of Comments document is out, then the issue of consistency rightly becomes important for reviewers. If the Editor accepts one comment with a particular fix on certain grounds, why not accept another comment with a similar fix on the same grounds? So now is exactly the time to be bringing up consistency issues. And there certainly might be inconsistent responses to different NB comments, where the NB comments are themselves incompatible.

It is the job of the BRM to work through as many of these kinds of issues as it can. The Editor can only say “Here is how I would solve this” and the BRM has to sort through the issues and contradictions. And ultimately it is the National Bodies who then decide whether the revised text of the standard passes their tests.

The ODF Alliance give two examples of horribly inconsistent responses. One is concerned with which version of schemas is normative, with the choices being suggested of either the electronic version or neither. (I hope what will happen is that the schemas will be printed as an annex in the standard, and that many of the schema fragments in the standard will be removed.) I don’t think they are very serious here: the standard will end up saying something, and that something will in all probability be whatever the BRM decided.

The other inconsistency concerns another one of the Standards Australia Issues I raised. I don’t see the contradiction here: one response concerns content-type labels, the other concerns how to locate executables. Maybe there is some deeper issue that has evaded me…I think there might be a confusion here between OOXML content types (which are expressed using MIME content type notation, and live in the [Content_Types].xml part) and relationship types (which are expressed using a URI syntax and live in the various .rels parts.)
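The distinction is easier to see in a package than in prose. Below is a small Python sketch that builds a toy package in memory and then reads back the two kinds of type label. The namespace URIs and type values follow what shipped Office Open XML packages use, and I have not checked them against the text of DIS29500; the package contents and the function names are hypothetical, invented for the illustration.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

CT_NS = "http://schemas.openxmlformats.org/package/2006/content-types"
REL_NS = "http://schemas.openxmlformats.org/package/2006/relationships"

# Content types: MIME-style labels, all held in one [Content_Types].xml part.
CONTENT_TYPES_XML = f"""<?xml version="1.0" encoding="UTF-8"?>
<Types xmlns="{CT_NS}">
  <Default Extension="rels" ContentType="application/vnd.openxmlformats-package.relationships+xml"/>
  <Override PartName="/word/document.xml" ContentType="application/vnd.openxmlformats-officedocument.wordprocessingml.document.main+xml"/>
</Types>"""

# Relationship types: URI-style labels, held in the various .rels parts.
RELS_XML = f"""<?xml version="1.0" encoding="UTF-8"?>
<Relationships xmlns="{REL_NS}">
  <Relationship Id="rId1" Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/officeDocument" Target="word/document.xml"/>
</Relationships>"""


def build_toy_package() -> bytes:
    """Build a minimal, hypothetical package as ZIP bytes."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("[Content_Types].xml", CONTENT_TYPES_XML)
        zf.writestr("_rels/.rels", RELS_XML)
        zf.writestr("word/document.xml", "<document/>")
    return buf.getvalue()


def content_types(package: bytes) -> list:
    """Return every ContentType value declared in [Content_Types].xml."""
    with zipfile.ZipFile(io.BytesIO(package)) as zf:
        root = ET.fromstring(zf.read("[Content_Types].xml"))
    return [el.get("ContentType") for el in root if el.get("ContentType")]


def relationship_types(package: bytes, rels_part: str = "_rels/.rels") -> list:
    """Return every relationship Type value declared in one .rels part."""
    with zipfile.ZipFile(io.BytesIO(package)) as zf:
        root = ET.fromstring(zf.read(rels_part))
    return [el.get("Type") for el in root.iter(f"{{{REL_NS}}}Relationship")]
```

Calling content_types(build_toy_package()) gives MIME-style strings, while relationship_types(build_toy_package()) gives URI-style strings: two separate labelling mechanisms living in different parts of the package.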

Again, the reason to mention all this is not to say that it is not appropriate to bring up issues like consistency in the lead up to the BRM. My problem is in using these run-of-the-mill things that can happen in any standard as evidence that we should decide to disallow the revised OOXML spec ahead of fixing it.

They write:

Can we in good faith endorse a standard that is not technically sound with conflicting recommendations on technical remedies?

But hold on, who is asking for such an endorsement? The purpose of the BRM is to fix these, so that the identified technical unsoundnesses get addressed and that there are no conflicts in the editor’s instructions. Then, after these have been fixed, the National Bodies can respond by changing their ballot responses if they are satisfied.

I am sad if I may jeopardize the love of the ODF Alliance, but this document of theirs is so full of non sequiturs that I don’t see it as adding much light to the discussions. But perhaps the purpose of the document is not to join in any dialog but to try to withdraw participants from it.

[Update: I think if I make fun of poor efforts, I should also praise good efforts. After the disaster of the document above, I see the ODF Alliance has now put out another one, OOXML: Top 10 Worst Responses to the NB Comments, which is a much more respectable effort, raising reasonable issues this time, restraining itself from the dire and lazy mish-mash, and good-humoured rather than ranting, which is particularly welcome. It’s only a document format. In a previous blog I mentioned the spin technique of “inoculation” with the example of a list, but I don’t see the new ODF Alliance document as that at all, but entirely appropriate, and the kind of things the BRM should be discussing and that non-armchair people should be thinking about. (Of course, I do make the same proviso as with the NB comments: if you parrot a set of points provided by a campaign, you are not doing an independent review of the standard draft but you are doing a review of the pre-fab talking points! If every NB comes with its own Top 10 Worst list, that allows much more coverage and improvement than just one: otherwise when the BRM takes 10 minutes to fix these 10, there will be four days left twiddling thumbs! :-) ) So, well done ODF Alliance, I hope this is a sign of things to come.]

Kurt Cagle

Over the last couple of years, I’ve worked extensively with Firefox, and while it still has its warts (and while I believe that its days of double digit rises in adoption are probably coming to a close) overall, I’ve found that it has become, for me anyway, my de facto browser into the web and the focus of most of the web applications (and extensions) that I’ve built in the last year. For that reason alone, if nothing else, I’ve been watching closely as Firefox 3.0 approaches its final release.

The second beta version of FF3 is now out, and I have to say that overall I’m feeling quite pleased with what I’m seeing, with a few caveats. Since I do generally dig into the application daily, my focus in trying it out (and in writing this review) is less on the immediate UI and functionality changes for the typical user and more on how it’s going to affect web software development. Thus, I ask that you forgive me for not talking about the new theme (okay, though not a radical departure from the old) or other user improvements or give you a lot of screenshots … I want to look a little more deeply under the hood.

M. David Peterson

Just noticed that the gang over @ Bungee Labs updated their site design, and couldn’t help but be inspired by the following graphic that greeted me upon my arrival,



Now *THAT’S* how to effectively tell your story in less words than exist in one of my average sentences. Nicely done, Bungee!

Kurt Cagle

A few years ago, I was briefly involved with a publishing company that was interested in packaging and producing eBooks. The challenges that we faced in trying to go from client submissions in Word, the occasional PDF and even straight text files proved to be daunting, largely because these works would in general place such a requirement on editors that it was not cost-effective enough to be a viable model. Most people working with Word have only a limited understanding of, and therefore use for, Word styles, and the notion of even more stringent structured documents was completely foreign to them.

Rick Jelliffe

I cannot think of a technical book that I have enjoyed more in the last decade than Yannis Haralambous’ new Fonts & Encodings from O’Reilly. It plonked on my desk this week, with a resounding bang: it has over 1,000 pages with many graphics.

The book really should be called “Fonts and their encoding” as it is not really about character sets at all, though Unicode appears throughout. It surveys the area of fonts, covering multiple platforms and systems, always wryly and clearly. Here is what I like in particular:

  • Reading this book you get the idea that you are encountering a world that would otherwise be almost closed to you: not just technical information but background and gossip. It is almost at the level of Ken Lunde’s CJKV Information Processing (perhaps the best technical book ever written for taking an inchoate mass of facts and constructing a clear and systematic survey), which is probably the highest praise I could give.
  • Haralambous’ style is delightful: Scott Horne’s translation does not attempt to lose the French accent (this is a translation of the original 2004 French Edition) but this is nothing but positive for the text. The result is a book that seems to have been written by a human not a droid. He seems to be a character like BIS’ Martin Bryan, who cannot talk for long without saying something really interesting.
  • Haralambous comes from a background of high-quality typesetting. One of the most tedious aspects of 2007 for me has been the interaction with people who know absolutely nothing about typesetting, even low quality typesetting, but who feel competent to be dogmatic on ODF and Open XML. His even-handedness and expertise are really admirable.
  • Haralambous is one of the instigators of the Omega project, which is a grafting of TeX, Unicode and OpenType (= TrueType) fonts. As such he pays decent attention to fonts from all backgrounds: the last 400 pages of the book are appendixes on bitmap fonts, TeX fonts, PostScript fonts, TrueType fonts, MetaFont, and even a little section on Bezier curves. I see the book as really timely for the next generation of platform-independent Open Source publishing applications.
  • It is interesting to see how integrated XML is to the whole book. Notably, the lengthy sections on TrueType use Just van Rossum’s TTX XML-ization. The author really seems to get XML.
  • Apart from a great 70 page section on the History of Latin Typefaces, the book includes some good material on Arabic/Indic typesetting that I had not seen before. The treatment of CJK (Chinese/Japanese/Korean) issues I didn’t care for much: but it is a big area, and I think it would be great if a new edition of Lunde’s book could be prepared: CJK processing does not involve fiddling with glyphs much, so I can understand why there would not be much treatment of it here.

I haven’t read the sections on typographic programs yet: my license to FontLab is somewhere in storage and I haven’t used it for quite a while. But just skimming the FontLab material here, it seems the book provides a lot of the information I didn’t have workable access to a decade ago. Cool!

The great thing about writing (and, hopefully, reading) a big fat survey book is that the gaps in the status quo become very evident. A decade ago, when I wrote my XML & SGML Cookbook, it became apparent that DTDs and grammars were not capable of representing many of the constraints and abstractions that document description languages needed: out of that idea eventually popped Schematron. Haralambous only briefly mentions it in this book, but his website has some papers where he describes his idea for typesetting based on textemes which comes out of his awareness of the gaps. It will be interesting to see what direction he takes there.

The only quibble I have about the book is that I would have liked to have seen more treatment of cutting edge technologies such as SIL’s work. However, the book’s strength is that it brings a modern European (in particular, a West continental European) perspective and I can understand that the line had to be drawn somewhere.

It is somewhat surprising to me to find a technical book where I think it would be more productive to have the book in my library rather than try to locate the information on the WWW. The local technical bookstores nowadays have computer sections that are full of product manuals and certification courses: finding a book that even has a sense of history and enjoyment of the subject matter is water in the desert.

I don’t know if this is an experiment by O’Reilly, to translate a book from their non-English operations, but it is really successful.

Kurt Cagle

Every so often, you come across a book that not only informs, but challenges your perceptions, leaving you seeing things in a way that you would not have before you started reading. I have a fair number of science fiction books that I’ve read over the years that left me in the major paradigm shift state after reading them (usually at about three in the morning), but it’s been rare in recent years that I’ve found a tech book that has done so. However, the book RESTful Web Services by Leonard Richardson and Sam Ruby (O’Reilly, 2007) managed to do just that.

Kurt Cagle

AddThis Social Bookmark Button

It was perhaps inevitable - having turned the geospatial Earth into an animated, zoomable extravaganza, Google has turned its gaze skyward. With Google Sky, the tens of thousands of Hubble-based images (as well as those of more prosaic Earth-bound telescopes) have been knitted into a seamless fabric that lets you explore the universe in myriads of ways - from zooming in on the Pinwheel nebula to charting the luminescent clouds of the Eagle hatchery.

Kurt Cagle

The Dow Jones Industrial Average (the DOW) did quite a dance today, with its peak to trough extending nearly 300 points before closing, pretty much at random, more or less where it started. I bring this up not to turn this column into an economic report about Wall Street (definitely out of bounds here, except perhaps in the discussion of Atom-based XML feeds retrieving DJ stats) but to discuss a bit about systems theory and to review a book that I think should be pretty much required reading for XML architects.

I suspect that I’ve always been something of a systems theorist, and I’ve noticed that systems theory tends to attract architects like moths to a bright light (no comment about getting burned). You can tell the systems theorists out there - they are the ones that clandestinely like to play Sim City at work, who can readily tell you what the Austrian school of economics is despite not being an economist, who were getting nervous about calving ice shelves and CO2 concentrations long before Al Gore started doing his stage show. Some of us are scientists, some are programmers, some are environmentalists or economists, but the common thread that binds us together is that we’re the ones who never stopped asking “WHY?” as kids.

Rick Jelliffe

DonationCoder.com has a very good Word Processor Review by Zaine Ridling, divided into three tiers: Major Word Processors (Open Office, Office 2007, Word Perfect), Second Tier Word Processors (AbiWord, EIOffice, etc.) and Online Word Processors (Google Docs, etc.) that is well worth reading for an idea of the capabilities of each. The final Pro and Con tables are handy.

The predictable quibble I have is that the reviewer apparently believes that application features are disconnected from save formats. So while he opens with If ever a maxim fit, one size does not fit all applies accurately to word processors and diligently mentions the different feature sets of the different applications, these different features never need to save any information that ODF cannot handle, it seems.

I think the best resolution is that if a document does use some features that a format cannot handle, the application should alert the user, who can choose the appropriate format. For Office 2010, for example, a user could set ODF to be the default default, and OpenXML to be the fidelity default. I think that is one good way to reconcile the basic ODF-wasn’t-designed-for-our-feature-set issue with the we-want-ODF-as-our-default-format issue, rather than panicking “It is impossible to use ODF because it doesn’t support all these things” (which is clearly true for many, but hopefully not for most Office documents, presumably following one of the standard statistical patterns) on the one hand, or chanting “ODF gives you everything you need” on the other hand (which similarly is hopefully true for most, but certainly not all Office documents).

It would be interesting to also include the word processors from Adobe (FrameMaker), IBM and Lotus as well. And it would be interesting to also include validation reports where the XML-in-ZIP save formats were validated against their standard schemas, since validity is a great tool for determining whether an application is doing the right thing.
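As a rough idea of what the first step of such a report might look like, here is a Python sketch that opens a ZIP-based save file and checks each XML part. It only tests well-formedness, which is much weaker than validity: checking against the standard schemas would need a RELAX NG or XSD validator, which the Python standard library does not provide. The function names and the toy package are hypothetical.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# File suffixes treated as XML parts (an assumption for this sketch).
XML_SUFFIXES = (".xml", ".rels")


def check_package(data: bytes) -> dict:
    """Map each XML part name to a well-formedness verdict."""
    report = {}
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        for name in zf.namelist():
            if not name.lower().endswith(XML_SUFFIXES):
                continue  # skip images, the ODF mimetype entry, etc.
            try:
                ET.fromstring(zf.read(name))
                report[name] = "well-formed"
            except ET.ParseError as exc:
                report[name] = f"not well-formed: {exc}"
    return report


def make_toy_package() -> bytes:
    """Build a hypothetical save file with one good and one broken part."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("mimetype", "application/vnd.oasis.opendocument.text")
        zf.writestr("content.xml", "<doc><p>ok</p></doc>")
        zf.writestr("styles.xml", "<styles><style></styles>")  # mismatched tag
    return buf.getvalue()
```

Running check_package(make_toy_package()) reports content.xml as well-formed and flags styles.xml; a real validation report would go on to feed each well-formed part to a schema validator.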

Rick Jelliffe

Bob is a really clear-thinking and enthusiastic guy, and one of the most interesting people to wine and dine with. His book Document Engineering is important for anyone who wants a better vision of where XML is leading us. I’ve just discovered the IT Conversations website, which has podcasts of various people of interest to me: Miguel de Icaza for example.

Bob’s podcast has much of interest. An idea that hadn’t registered with me before is that one of the drivers for (larger) business to adopt a document-engineering approach is because they need to componentize their business functions: a document doesn’t care whether it goes to Florence, Bangalore or Kinshasa. Globalization as a driver for XML: that’s a pretty strong driver.

Bob also has a blog with co-author Tim McGrath, Doc or Die.

Rick Jelliffe

People trying to figure out where they stand on the desirability of multiple overlapping standards for technologies, or who will scream if they hear the issue reduced to VHS versus Betamax one more time, might like to add the article “Why China wants its own video standard” to their reading list.

Of course, the IP and licensing issues of MPEG have long been controversial; standards that are not royalty-free are entirely dubious, especially in the modern climate. I am writing this from Delhi, India, [which (outside my window at least) has the most beautiful greens of any city I have ever seen…I had heard of Assam gardens and so on but was not prepared for how vivid things are]; but from here China’s position against technologies with royalties that only rich countries (rich manufacturers, rich consumers) can afford is not just interesting or prudent, but clearly obvious.

Rick Jelliffe

I wasn’t there, but the XTech 2007 Conference seems to have its presentations online already: fast!

Scanning through them, one made me really happy. It was Henri’s talk on the WhatWG’s HTML 5 validation efforts. Actually, “I’ve won!” flashed through my mind. It was not because the HTML5 group had started to use multiple validation languages, along the layered or progressive lines I (and the DSDL rabble) have been advocating, nor even because they were using Schematron, nor even because Henri says that Schematron (and RELAX NG) while better than XSD were not as good as they expected (thereby giving me a challenge to show how they could do it in Schematron with the correct idiom, and thereby make me appear well smart).

No, what made me happy was a little line towards the end where the issue of generating usable user messages was raised (p41). This is the most important part of Schematron, not the use of paths or assertions or phases or flags or any of the mechanics, nice though they may be. The “big idea” behind Schematron, such as it is, is that the problem of validation is just as much (indeed, more) one of communicating constraints (and therefore unmatched constraints) to users as it is about representing them to machines. Validation is not just binary, or even a set of fixed outcomes: it is about determining, locating and communicating the status of a document and its parts.

This matters especially because the user often experiences the document mediated through some user interface, not as elements and attributes: so validation messages that are given in terms of the elements and attributes, rather than either the information model or the user interface, will just be mystifying. They are especially confusing when they report where the problem was found, not what caused the problem: for example, when there is a missing element and the error message is in terms of “Found unexpected XXX” rather than “YYY is missing”.
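To make the “YYY is missing” point concrete, here is a minimal Python sketch. It is not Schematron and not anything from Henri’s talk; the element names, the `REQUIRED_CHILDREN` table and the message wording are all invented for illustration. The only idea it shows is reporting the cause in the user’s own terms rather than the place where a parser happened to trip.

```python
import xml.etree.ElementTree as ET

# Hypothetical rules: parent element -> list of (required child, user-facing label).
REQUIRED_CHILDREN = {
    "invoice": [("customer", "Customer details"), ("total", "Invoice total")],
}


def validate(xml_text):
    """Return user-facing messages that name what is missing."""
    root = ET.fromstring(xml_text)
    messages = []
    for parent in root.iter():
        for child_name, label in REQUIRED_CHILDREN.get(parent.tag, []):
            # find() looks at direct children only, which is what the rule is about.
            if parent.find(child_name) is None:
                messages.append(f"{label} is missing from the {parent.tag}.")
    return messages


if __name__ == "__main__":
    doc = "<invoice><customer>ACME</customer><note>rush</note></invoice>"
    for message in validate(doc):
        print(message)
```

A grammar-based validator looking at the same document would more likely complain about the unexpected `note` element, which is where the problem was found rather than what caused it.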

I am a bit of a broken record on this, but I think a relentless emphasis on the human user is really important for standards: XML succeeded by providing not only simplicity but native-language markup.

Simon St. Laurent

I’m here at the Web 2.0 Expo, a computer book editor surrounded by all kinds of possibilities for web-related books, articles, PDFs - pretty much everything here is publishable, and would interest someone. At the same time, though, there’s been a consistent message here: everyone out there knows what they want better than you know what they want. So….

M. David Peterson

Last Sunday the power supply on my *MUCH* beloved DevBox finally gave up the ghost.

[Photo: Death of the DevBox (DSC00536.JPG, Picasa Web Albums, xmlhacker)]

As per the above photo, if not obvious, that’s a power supply half the size of my DevBox hanging off to the side. The machine itself is completely custom, right down to the screws that keep the sub-compact power supply and cooling system snugly fit inside. Finding a replacement is no easy task, and it was becoming more and more obvious that my last-minute hack of desperation (ripping a power supply out of a nearby tower, pulling off the side of the DevBox, and plugging it into the motherboard) was not safe and, as such, not something that I could expect to last for very much longer. Couple this with the fact that in its current state it was no longer a “portable” workstation (coupled with a flat-screen monitor and a reasonably sized ergonomic keyboard, you might be surprised at just how portable such a workstation can be) and it became all too obvious,

It was time for Timmy’s well deserved retirement. (< Yes, his name is Timmy. Long story… Don’t ask. ;-)

Anyone who knows me, knows that I have never claimed to be a Mac FanBoy. As per the intro to the photo collage of my first Mac purchase a year ago October,

Rick Jelliffe

Geekfodder!

Rob Cameron, who is a professor at Simon Fraser University, has released u8u16 in open source beta, a really exciting library which implements an “iconv” like transcoder (i.e. it converts data from one character set and encoding to another), and which uses the SIMD instructions that modern CPUs have.

I think I was the first person to write something on this technique, certainly on the Internet, in my blog item Using C++ Intrinsic Functions for Pipelined Text Processing a couple of years ago, but only because the idea was too obvious to people involved with DSP to write about, I gather: of course you can use intrinsic functions for text processing! My code just used C++ intrinsics as an optimization on top of C++ code. But Cameron takes it to another level: his code abstracts out the features of the most common SIMD devices so that his algorithms can be arranged to work on this abstraction and compile to a wide range of target processors, and he can dispense with the code. He reports 4 to 25 times speed increases, depending on the data, which is very promising.

I would love to see an XML parser that combines Cameron’s SIMD work with the optimizations from IBM’s XML Screamer, which seem to increase the speed of Java processing two- or three-fold. Cameron’s work is important because it gives a working abstraction that can inform decision-making on building SIMD-using capabilities into Java’s text processing.

M. David Peterson

Update: The first part of “Week 1 : The Zune Experience” (with more to follow later today) is now available @ http://dev.aol.com/blog/mdavidpeterson/2007/02/26/week-1-the-zune-experience

[Original Post]

So as I blogged about last Thursday, I received the Zune I was awarded for being one of the first 10 folks to create and publish a VHD-based instance of their rPath Linux-based project. In the 10 days since, I’ve realized a couple of things,

1) “WOW! You think maybe you could turn up the quality rating the next time you post a picture of yourself so you don’t look like a 14 year old going through puberty?” Or is it just the angle I’m looking at it from again, this time from a different monitor?

Well, regardless, my apologies if I scared you, your children, loved ones, or possibly any of your pets due to concerns over catching “Whatever the hell that is on his face! Beth, get some rubbing alcohol! John, *DON’T* touch the screen until we disinfect it!”

Yikes!

So, on to the next item on the list,

2) Zune ROCKS!!!

As mentioned at the bottom of that same linked post,

M. David Peterson

So much to talk about, so little time, but none-the-less, let’s get this party started ;)

Amplee, IronPython, ASP.NET, WSGI, AtomicXML, and Xameleon Update

[Amplee@SWiK.net]

So both Sylvain and I have been jamming away at the integration of Amplee, IronPython, ASP.NET, WSGI, AtomicXML, and Xameleon. Attempting to merge together such a cross-section of various technologies, as you can imagine, has been interesting. None-the-less, we have things working pretty well at this stage, and have in the works an update to last week’s OSS XML Weekly Roundup, in which I will be providing all of the juicy new details in regards to progress. That said, if you would like to start peeking through the curtains to see what we have in store, please feel free: http://extf.googlecode.com/svn/branches/

Sylvain has already finished the first tutorial as it relates to getting Amplee running via IronPython and WSGI, and when he comes back online here in a few hours, we plan to continue forward, fine-tuning the API we are collaboratively working on to integrate AtomicXML with Amplee via the Xameleon XML processing engine. And as he has pointed out at the bottom of the above linked post,

M. David Peterson

So this week I am working on a more hands on post that highlights the usage of Ruby, XML, and the .NET platform. The Gardens Point Ruby.NET development team recently released beta 0.6, and I’ve been getting my hands dirty, attempting to integrate Saxon on .NET-based XSLT 2.0 transformations. So that was a miserable failure. NOTE-TO-SELF: Never assume ANYTHING!

So I’m going to rethink this a bit, and will be back once I have something a bit more exciting to showcase that goes beyond what I assumed would be a no brainer via the latest Ruby.NET release. It may very well be a no-brainer, but to put this nicely, the IKVM.NET lack-of-documentation looks like the Encyclopedia Britannica compared to the Gardens Point Ruby.NET lack-of-documentation… YIKES!

Will update this post once I have things in a bit better shape than what is currently the case.

In the meantime, in attempting to get caught up with some email, I noticed this comment from Dr. Michael Kay on the IKVM.NET group list, and I haven’t been able to stop laughing since. It is most deserving of the title Open Source XML Quote of the Week (which seems like an appropriate addition to each weekly post), and as such…

Open Source XML Quote of the Week


SourceForge.net: ikvm-developers

Thanks for the prompt reply (brilliant software, brilliant support, shame
about the documentation…)

Dr. Michael Kay to the IKVM.NET Mailing List

Back a bit later this evening with the mentioned code samples…

M. David Peterson

NOTE: While this is the first post in this weekly series, to stay in context with the weeks of the year I’ve chosen to start with week 5, which, for those of us who use the Gregorian calendar, is the work week that is now coming to a close.


So while there are several projects that I would like to bring to your attention, when at all possible I am going to keep a theme attached to each week, and then wrap up with a highlight summary of those projects that have recently been updated with new releases, or that have been brought to my attention and seem worth noting, but without any extended information beyond links to each project’s SWiK.net entry and a short summary.

This week?

Aspect-Oriented Programming

Rick Jelliffe

Elliotte Rusty Harold has a review over at IBM DeveloperWorks, XML in 2006, which is worth even a quick skim, because he identifies very clearly the split between grassroots technologies (good in his view) and pointy-haired-boss-imposed technologies (bad).

The money quote:
Ten years ago, the grunt programmers and network admins were installing Web servers on surplus PCs reformatted with Linux while the CEOs and CTOs played golf with salespeople and mandated corporate-wide Exchange Server deployments. Those same low-level techies made XML a success by throwing out decades of legacy binary gook and replacing it with off-the-shelf, open source parsers. Today, these people are quietly installing REST, Atom, and RELAX NG.

This is a slightly romantic view, of course. C*Os don’t only spend their time arranging pointy-haired golf deals, but they are also the ones pushing for adoption of Open Standards (though, perhaps the push to Open Standards is as a result of golfing with IBM sales people rather than MS sales people!). And low-level techies are sometimes the most conservative of people; they may play the field early in their careers and strive to find the most fabulous technology to best handle a job, but sooner or later they fall in love, settle down, and don’t want a divorce.

Rick Jelliffe

Ken Holman sent me a copy of the latest draft of the OASIS/UBL Methodology for Code-list and Value Validation, which is a pretty good use of Schematron. It looks like a neat and workable solution to a problem that is somewhere between baroque and a hard place using XSD.

Imagine you are a trading company: you have documents with various fields for countries: countries you can send from, countries you can send to, countries the US won’t allow you to export to, countries you can use as hubs, countries with regional offices, etc. And you also have lots of other documents with similar or different sets of countries. And countries are only the start: you also have product codes where different fields can have different sets of codes, and so on. And this may vary according to where the document came from (the Libyan branch office may have different rules from the Alaskan branch office). And, of course, the values of codes may have interdependencies, such as “the source must be different from the destination.”

So lots of uses of a standard vocabulary, but lots of local and changing subsets that are much closer to “business rules” than “datatypes”.

If you used XML Schemas, you could theoretically derive by restriction all the different subset codes, then use “redefine” on every top-level element that used the subsets. (You’d have to do this redefine on base types where possible, so that subsequent derived types would inherit the restriction, perhaps, except then you’d have to check that any subsequent derived types that themselves define restrictions are indeed subsets. Have a breakdown and a good cup of tea.)

With the Schematron approach, you select the items from the code list you want, and some magic tool provided by the methodology generates the Schematron code, which just uses simple XPaths (i.e. what processing software probably uses.) You could still use an XML Schema, just to constrain the lexical space very broadly, but the Schematron constraints would check the values against the list.
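The kind of check being described can be sketched in a few lines of Python. This is not the UBL methodology or its generated Schematron, just the same idea in miniature: the element names (`shipFrom`, `shipTo`), the country subsets and the error wording are all hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical master code list and per-field subsets (the "business rules").
ALL_COUNTRIES = {"AU", "CN", "IN", "LY", "US"}
ALLOWED = {
    "./shipFrom": {"AU", "US"},
    "./shipTo": {"AU", "CN", "IN", "US"},
}


def check_codes(xml_text):
    """Check each coded field against its own subset, then one cross-field rule."""
    root = ET.fromstring(xml_text)
    errors = []
    for path, subset in ALLOWED.items():
        if not subset <= ALL_COUNTRIES:
            raise ValueError(f"subset for {path} uses codes outside the master list")
        for node in root.findall(path):
            value = (node.text or "").strip()
            if value not in subset:
                errors.append(f"{path}: code '{value}' is not allowed here")
    # Interdependency: the source must be different from the destination.
    source = (root.findtext("./shipFrom") or "").strip()
    destination = (root.findtext("./shipTo") or "").strip()
    if source and source == destination:
        errors.append("shipFrom and shipTo must be different")
    return errors


if __name__ == "__main__":
    order = "<order><shipFrom>LY</shipFrom><shipTo>CN</shipTo></order>"
    print(check_codes(order))
```

Changing which countries a branch office may use means editing the `ALLOWED` table, not deriving and redefining types, which is the attraction of treating these as business rules.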

Rick Jelliffe

My company rolls its own version of Xerces for our products. We can add fixes or enhancements without fear of conflicts. Over the years, the list of things to do has decreased. Now it is just about down to removing HTML stuff, adding a SAX feature to ignore all entity references (editors need this) and customizing the horrible validation messages (humans need this). One part of doing the in-house fork is running the program checkers on each Xerces release to gauge its quality.

Verdict? Pretty good. The best yet as far as automated software tests go: as a user of Xerces, I found it a very encouraging result.

Rick Jelliffe

Rob Weir has done some interesting stats on XML parse time of real documents and the effect of lengthening the element and attribute names. The blog article is called The Celerity of Velocity. The result? Even though we expanded some NCNames to 32-times their original length, making a 5x increase in the average NCName length, it made no significant difference in parse time. There is no discernible slow down in parse time as the element and attribute names increase.

I don’t think he is claiming that this could happen forever or for all software, of course! Indeed, it might be the sign of crap software: if you went mad and allocated a 1K buffer for each name then copied the 1K of text starting with each NCName, you certainly would get constant parsing time regardless of name length.
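Anyone who wants to poke at this on their own machine could start from something like the Python sketch below. It is not Rob’s methodology: he used real documents, whereas this builds a synthetic flat document, and the function names, element count and name lengths are arbitrary choices of mine. With synthetic data the file size grows a lot with the name length, so treat any numbers as a toy comparison only.

```python
import time
import xml.etree.ElementTree as ET


def make_document(name_length, elements=20000):
    """Build a flat document whose element and attribute names are padded to name_length."""
    element = "e".ljust(name_length, "x")
    attribute = "a".ljust(name_length, "y")
    body = "".join(
        f'<{element} {attribute}="1">text</{element}>' for _ in range(elements)
    )
    return f"<root>{body}</root>"


def parse_seconds(xml_text, repeats=3):
    """Best-of-N wall-clock time to parse the text with ElementTree."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        ET.fromstring(xml_text)
        best = min(best, time.perf_counter() - start)
    return best


if __name__ == "__main__":
    for length in (4, 20, 128):
        doc = make_document(length)
        print(f"name length {length:3d}: {len(doc):8d} chars, {parse_seconds(doc):.4f} s")
```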

Rob’s figures are of course difficult to accept. I would like them to be wrong. They seem to go against the kinds of stats that the Efficient XML proponents give. But a number is worth a thousand words.

Rick Jelliffe

Here is more of the fake coverage of the XML 2006 conference. Like Groundhog Day, we get to start from day one again and do it differently now that some more papers are up.

Rick Jelliffe

I’ve just started going through the papers from XML 2006. I wish more people were putting their papers or slides up; so many are missing. Now that we have this new thing called the WWW, the readership for a paper is much more than just the conference participants; people like me on the other side of the world who weren’t there would love to read many papers. (I’d love to read the papers from Kitsis, Champion, Kay and Melton for example, all of whom I admire tremendously.) So here’s my fake realtime blog of the first day’s proceedings…

Rick Jelliffe

I’ve slagged off before in this blog about standards ideas emanating from ODF and Microsoft camps: just to prove I can be positive, courtesy of Dennis Ding’s Open Standards Updates blog comes an Open Standards Definition that looks pretty sane!

M. David Peterson

If it were possible to have TOO MANY XML/XSLT toys (which it’s not!), you could EASILY charge Todd Ditchendorf with a crime.

That said, and as suggested — It *AIN’T POSSIBLE*! (Dear Grammar Nazi(s), BYTE ME! ;) :D )

I’m like a phrickin’ kid in a candy store, I tell ya! (Dear Grammar Nazi(s), Okay, I promise that was the last one (in this post, anyway.) ;) Much love in this heart of mine for you GN; *MUCH LOVE*! ;) :D )

So back to ditchnet.org candyland,

Exhibit A,

M. David Peterson

Four little letters…

LLUP

Life after email

As a social phenomenon, the end of email has been widely reported. The next generation doesn’t use it. As a technical phenomenon, spam is a persistent threat. Spam’s been a lot worse in the last couple of weeks (no doubt the reason I started thinking about these things); apparently the spammers have concocted a strategy that circumvents Bayesian filtering (it’s only temporary, I’m sure, but the next victory in spam filtering is only temporary too).

I’ve noticed the same phenomenon. It’s getting really, *REALLY*, bad!

What’s next? IM, Wikis, web forums instead of email? Bleh!

Agreed!

Maybe I’m just too old to learn new tricks, but I want correspondence pushed to me (or I want the appearance of push, anyway) and I want to read and edit it locally, in the application of my choosing, not in some browser form

Agreed. Too much effort. The solution must be seamless, and work with the tools we already use for email-esque communication. In fact, the solution has to be developed in such a way that those with an established position in the email client/server market(s) can quickly, easily, and as mentioned (and is really the key, in my own opinion) seamlessly integrate with these tools such that the “switch” from the existing technologies (e.g. SMTP, POP, IMAP, proprietary protocols such as those used in Exchange for advanced workgroup/corporate communication/collaboration, etc..) may not even require a switch at all (i.e. a driver that allows each of these technologies to easily interop with any of the new required protocols), and if it does, will be as transparent as possible to the customer/employee, etc… who will be using it.

It occurs to me that with a little work, Atom might function as a replacement for POP/IMAP and the Atom publishing protocol might replace SMTP. I can see a glimmer of how I might move forward while mostly preserving a couple of decades of work habits. As usual, the social problems are larger than the technical ones

Yep, completely agree! Through the work I have been doing with LLUP, I have come to my own conclusions that there are a few additional off-the-shelf pieces necessary to complete the puzzle, but without a doubt, Atom and APP are the key behind all of this.

In fact, this was a point I brought out to Eve (Maler) a while back when Russ and I first spoke with her about LLUP. There have been a few people along the way who have insisted that “you guys are taking too long to finish this up” or “if this really was so simple, why not just finish it out and be done with it” to which the answer, as mentioned to Eve, is pretty straightforward,

M. David Peterson

Firstly, Sylvain Hellegouarch continues to astound me with his dedication to each of the projects we are working on together, as well as his ability to get stuff done. I’ll avoid making a list of all that he does, as

a) it would take too long to research,
b) it would take too long to write,
c) it would take too long to read,

Of course, by too long I don’t mean it wouldn’t be worth the effort, and instead, I can more easily and simply summarize this list by describing the impression you would be left with after reading this list,

One word (and one punctuation mark),

M. David Peterson

IEBlog : SSL, TLS and a Little ActiveX: How IE7 Strikes a Balance Between Security and Compatibility

Obsolete controls disabled through ActiveX opt-in

An important part of the ActiveX opt-in feature is doing good housekeeping of the ActiveX controls that come with Windows. Many sites will benefit from IE7’s new native XMLHTTP control and sites can continue to use the MSXML 6.0 and 3.0 controls. The MSXML 5.0 control will not be enabled by default. The WMP 6.4 player is also disabled because its been replaced by the WMP 7 generation controls. As we can infer from HD Moore’s month of browser bugs, using the newer controls and leaving older controls disabled helps reduce the chances of user being exposed to a security or stability issue in an older control.

Since this should be a straightforward change for most sites, we’re asking for your help in moving your pages towards the native object XMLHTTP, the latest version of MSXML or the newer WMP control. In the best case scenario, the change might be to simply swap in the native object for XMLHTTP or the newer CLSID for the current WMP control.

There was a time that I had every desire and intention to stay closely attached with the development of IE7 and the RSS Web Feed engine via forums, blogs, and in some cases, email communication.

Why did that change?

ADD. My desire to overcome my ADD tendencies and actually place my primary focus on one of a bazillion and a half projects I have rolling around in my head at any given second, of any given day, month, week, year, and etc..

In other words,

Kurt Cagle

I’ve been thinking about the Kudzu principle a lot lately. This particular “rule” was something I first observed in about 1999, and it goes something like this - XML, once introduced into a system, will over time continue to expand into that system. Kudzu was originally introduced into the American Southeast in the 1800s, in order to provide a way to more readily secure the loose clay soil that’s so predominant there. Unfortunately, like many invasive plants it very quickly expanded beyond its original boundaries and became one of the most aggressive weeds in the region.

XML is a mechanism for abstraction. Unlike OOP typed objects, however, the abstraction does not place specific requirements upon the local mechanism for implementation - instead XML forms a document object model where each particular element represents what in other languages would be considered classes or class properties (either implicitly declared - i.e., a string object, or explicitly declared). Because of this, it becomes possible to manipulate that particular object model using a generic set of commands that in general are unaware of the underlying semantics of the given class or property.

M. David Peterson

Squarespace - Blogging Evolved

Bloggers. Independent professionals. Small businesses. Picky people who need to maintain a web presence, who want exacting control over their site, and powerful publishing features that cover everything from blogs to files. Anyone who is sick of bargain bin services and is ready for an elite solution to their publishing needs. No technical skill is required.

A (short, I promise! :) Story,

A while back, and after listening to several recommendations, Russ (Miles) decided to host http://www.russmiles.com with SquareSpace (yes, you can host personal domains with SquareSpace. A BIG++ in my opinion.) There was a (short lived) problem however,

They were still emitting Atom 0.3 instead of Atom 1.0.

The rest of the story goes like this,

Hari K. Gottipati

InformationWeek reviewed online Ajax applications in 6 different categories: Calendar, Email, Info Manager, Spreadsheets, Webtops, Word processors. By their reckoning, Google is the clear winner in 4 of the 6 categories. Google lost the race in the Webtops category and it is behind Zoho Writer in the Word processor category. Some of these outcomes don’t make sense to me, given the way I looked at the applications. In my view, Google is the winner in only two categories (Mail, Info Manager), and Zoho is also the winner in two categories (Spreadsheet, Writer). Anyway, take a look at the results plus my picks:

Feature | Winner | Runner | Also Available | Also Available | My Pick
Calendar | Google Calendar | 30 Boxes | CalendarHub | Kiko | 30 Boxes
E-Mail | GMail | Yahoo Mail | AOL Mail | Windows Live Mail | GMail
Info Manager | Google Notebook | Backpack | Voo2do | TimeTracker | Google Notebook
Spreadsheet | Google Spreadsheets | Zoho Sheet | Num Sum | iRows | Zoho Sheet
Webtop | Pageflakes and YouOS | Goowy | Protopage | Windows Live | Pageflakes
Word Processor | Zoho Writer | Writely | ajaxWrite | Writeboard | Zoho Writer leads in Ajax versions, but ThinkFree (Java Applet version) is the best

Let me explain why I picked (some of those) differently:

Calendar: I like Google Calendar and 30 Boxes, along with Kiko, but 30 Boxes is the winner in my opinion. I like the RSS feed feature in 30 Boxes, which lets you add feeds to your calendar (it supports many more features, such as Flickr, Google search etc). Google Calendar doesn’t provide these features. What surprised me about Kiko is that it was put up for sale on eBay and its makers are no longer interested in taking it further.

E-Mail: I am not a big fan of the threaded approach in GMail, as it always combines received and sent mails in a single thread, which sometimes confuses me. But GMail’s greatness is its simple look and feel and its text ads, which never bother me. The GTalk integration with GMail is also impressive. I like the Yahoo Mail interface better than GMail’s because of its drag-and-drop feature, and I am more comfortable with folders than labels. But the major drawback of Yahoo Mail is its annoying ads, which always bother me; each time the ad changes, it distracts me from my e-mail. The same goes for Windows Live Mail. So my vote is for GMail, but I have to admit that at the moment it has a lot of problems. At times it says “loading” and never loads unless you refresh the browser. Also, GTalk in GMail tries to reconnect frequently and never works over a proxy.

SpreadSheet: I like Zoho Sheet as it supports charts. Since charts are involved in most of my spreadsheets, I have to say Zoho Sheet is the winner. I disagree that Google Spreadsheets is the winner, as it has no charting feature.

Info Manager: I agree with the InformationWeek results. I like Google Notebook as it is so convenient to store content from any web page with the help of a browser plug-in that sits as a small, discreet icon in the browser. It also lets you store content directly: highlight a section of a web page, right-click on it and select the option to store it to the notebook. With the other applications in this category, one has to visit their site to store the information.

Webtop: I enjoy Pageflakes as it is simple and convenient and has a number of applications to choose from. Pageflakes is developed using Microsoft’s Atlas. I like YouOS, but it is more of a browser-based desktop to manage Webtops and an IDE to develop Webtops. YouOS is developed using the Dojo toolkit. I would say Windows Live gadgets are not up to the mark, and I like Yahoo widgets better than Live gadgets, but Yahoo widgets are desktop-based.

Word Processor: I am surprised that ThinkFree did not make the list. I assume InformationWeek did not even consider ThinkFree online office because its Ajax version is not good, but its Java (applet-based) version has much more advanced features, which you cannot find in other online Ajax-based word processors. Since this is a comparison of Ajax applications, they must have disregarded it for that reason. But it is worth a try. It also proves that what cannot be done with Ajax can be done with Java (if you think the initial loading of the applet and forcing the browser to have Java are not a concern). Among Ajax applications, I accept that Zoho Writer is better than Writely as it has cool features such as word to HTML conversion.

Maps: Even though InformationWeek left out this category, let me touch on it too. If you consider the look and feel, again Google steals the show. But as a developer I care about the features, not the look and feel. Compared to Google, Yahoo has a number of APIs (not only JavaScript APIs; it has Flash/Flex APIs too) to mash up with maps. You can mash up maps with their Traffic API, Flickr API, Local Search API, Upcoming.org API and RSS feeds API. On the other hand, Microsoft has Bird’s eye images, which are missing in Google Maps. According to reliable sources, Google is currently (and secretly) working on Bird’s eye maps.

What’s your pick among these applications? Google, or do you also say “Na!”? Share your thoughts in the comments.

David A. Chappell

This week’s issue of eWeek features a comparative review of ESB products, including open-source offerings. eWeek Labs Director Jim Rapoza summarizes: “The Sonic ESB platform defined the ESB category, and Version 7.0 is the most mature and capable ESB available; Sonic ESB, coupled with the Sonic SOA Suite, is a powerful services platform.”

The feature can be found online at http://www.eweek.com/article2/0,1895,1997940,00.asp. It has nice things to say about each product, but also has a few digs for each one. I have included some excerpts from the article that I found interesting -

M. David Peterson

An ABSOLUTE GEM (Ruby? ;)) of a presentation that MUST NOT be missed comes from none other than Dave Johnson,

TriXML2006-BeyondBlogging.pdf

Enjoy!

Jim Alateras

The BPEL4People white paper contributed by IBM and SAP describes how to support People Activities within the scope of the existing BPEL standards. It does not introduce any new constructs but it does suggest that future versions of the BPEL standard may be extended to directly support these concepts. The paper introduces the principle of a manual task which is executed by a human participant. This task, which is part of a long-running business process, fundamentally suspends the process until it is completed by the corresponding participant. In addition the paper presents the notion of the task list that is used to hold tasks (or people activities) for a participant, which may represent a specific person, a group of people, a collection of roles etc.

People links are used to bind a group of people to a business process, in much the same way that partner links are used to bind web services to processes. The people links are resolved at runtime to select the specific person or group of people that can execute a particular people activity. Resolving them may involve a query on an organizational directory.

As mentioned previously, when the business process engine encounters a person activity it will suspend the business process until the person completes the associated task. (This is not always the case, as there may be parallel paths in the business process, but it will stall one thread of the business process.) In order to resume the business process, the user or ‘task list agent’ needs to notify the business process engine when it has successfully or unsuccessfully completed the task. The task list agent should support a number of other features, including query available tasks, claim task, revoke task and fail task.
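To make the interplay between the engine and the task list agent concrete, here is a minimal Python sketch. It is not taken from the white paper: the class names, the state names, and the reading of "revoke" as "release a claimed task back to the list" are all my own assumptions for illustration.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, List, Optional, Tuple


class TaskState(Enum):
    READY = "ready"          # sitting in the task list, unclaimed
    CLAIMED = "claimed"      # a participant has taken the task
    COMPLETED = "completed"
    FAILED = "failed"


@dataclass
class Task:
    task_id: str
    process_id: str
    state: TaskState = TaskState.READY
    owner: Optional[str] = None


class ProcessEngine:
    """Hypothetical stand-in for a business process engine."""

    def __init__(self) -> None:
        self.suspended: Dict[str, str] = {}        # process_id -> task_id awaited
        self.resumed: List[Tuple[str, str]] = []   # (process_id, outcome)

    def suspend_on(self, process_id: str, task_id: str) -> None:
        self.suspended[process_id] = task_id

    def notify(self, process_id: str, outcome: str) -> None:
        # The stalled thread of the process resumes once it is notified.
        self.suspended.pop(process_id, None)
        self.resumed.append((process_id, outcome))


class TaskListAgent:
    """Hypothetical task list agent: query, claim, revoke, complete, fail."""

    def __init__(self, engine: ProcessEngine) -> None:
        self.engine = engine
        self.tasks: Dict[str, Task] = {}

    def add(self, task: Task) -> None:
        self.tasks[task.task_id] = task
        self.engine.suspend_on(task.process_id, task.task_id)

    def query_available(self) -> List[Task]:
        return [t for t in self.tasks.values() if t.state is TaskState.READY]

    def claim(self, task_id: str, person: str) -> None:
        task = self.tasks[task_id]
        if task.state is not TaskState.READY:
            raise ValueError("task is not available")
        task.state, task.owner = TaskState.CLAIMED, person

    def revoke(self, task_id: str) -> None:
        # Assumption: revoking returns a claimed task to the list.
        task = self.tasks[task_id]
        if task.state is not TaskState.CLAIMED:
            raise ValueError("task is not claimed")
        task.state, task.owner = TaskState.READY, None

    def complete(self, task_id: str) -> None:
        self._finish(task_id, TaskState.COMPLETED)

    def fail(self, task_id: str) -> None:
        self._finish(task_id, TaskState.FAILED)

    def _finish(self, task_id: str, final_state: TaskState) -> None:
        task = self.tasks[task_id]
        if task.state is not TaskState.CLAIMED:
            raise ValueError("only a claimed task can be finished")
        task.state = final_state
        # Success or failure, the engine must be told so it can resume.
        self.engine.notify(task.process_id, final_state.value)
```

The point of the sketch is only that the process stays suspended until the agent reports an outcome, whether success or failure; a real engine would also deal with people links, deadlines and persistence.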

During my time at Intalio, many years ago, we developed a set of processes on top of our business process engine (at that time it was based on BPML but it has since moved to support BPEL) to support exactly the principles outlined in this paper. These processes were distributed with the engine, providing support for people activity out-of-the-box. Although there is no need to extend the BPEL language to support people activity, I can also see the advantage of having language support for these types of activities.

The BPEL4People paper also covers the user interface dimension but that will have to be the subject of another blog entry.

M. David Peterson

Opera 9.0 Final released - Desktop Team - by Desktop Team

Hi All
We just released Opera 9.0 Final on opera.com!
Thanks to all for helping us test the product.

Hey cool! A little “out from nowhere comes the final release”, but still cool.

With this in mind, and the fact that they announced they would include support for XSLT 1.0 and XPath 1.0, and given Opera is a standards-focused company,

A + B =

XSLT processing failed!

Huh? Why would it fail? I’m using fully compliant XML and XSLT/XPath 1.0.

Let’s see what the error console says,

XSLT - http://browserbasedxml.com/SessionConfig.xsl
attribute at line 13, column 40
Error: invalid expression: document(my:UserAgentInvalidDoc)
Call to undefined function: document

> Call to undefined function: document < ?

Hmmmm....

We will not stop work now that 9.0 is out and we do continue to fix issues. Stay tuned, the show goes on

Okay… well, I guess a few commercials to pay the bills is understandable.

For now, I guess we stay tuned, and use XMLHttpRequest scripted transformations to gain access to the data we want to transform.

But Opera, no document function?

Ouch! That one hurts!

Rick Jelliffe

Eliminating final is a nice article by Elliotte Rusty Harold. It is an interesting exercise to take his arguments about arguments to method calls and apply them to the slightly higher level of web interfaces and loosely-coupled applications (including applications distributed over time). What technology fits into this level like no other? Schematron.

Rick Jelliffe

Glad to see the National Toilet Map is producing XHTML. The incontinent are best to avoid my suburb Darlinghurst. If you cannot, or you are out of Internet access and the map is not cached in your browser (and many homeless people are out of caches), then please go to the West of the suburb before continuing. Away from my place. Thank you.

Rick Jelliffe

Peter Sefton’s recent blogs are some of the few sensible, non-partisan things I have read about Open XML and ODF in recent times. He has some really good, detailed posts on Word 2007 and on lists.

Peter is far from an anti-Microsoft partisan: indeed, he is probably a poster boy for the kind of developers Microsoft needs to attract with Open XML and Word 2007. He is one of the key players in the free ICE Integrated Content Environment project.

Peter says “the big point that always seems to get missed when people talk about word processing formats” is “use styles.” Styles represent a sweet spot between crappy hodgepodge unusable office documents and retargetable information as allowed (with more effort) by XML with all its information-protection mechanisms like validation.

Peter’s comments on list interoperability with ODF and Open XML are, I think, a real litmus test for how seriously we should take either (or both) as formats.

Rick Jelliffe

I’m heading off to Seoul tomorrow for the ISO SC34 meeting. The Working Group I’m in (WG1) looks after ISO DSDL and office document formats. I’ve been embarrassing my Korean friends asking why they made James Brown their King, and whether their capital’s anthem is “Get on up”; soooo childish I know.

The office document formats are a very entertaining sideshow at the moment. ODF is a format derived from Sun’s Star Office product and now being taken up by IBM; it is being standardized through ISO by fast-tracking an OASIS specification. Open XML is a set of formats derived from Microsoft’s Office 2007 (but being retrofitted to old Office products); it is being proposed for standardization through ISO by fast-tracking an ECMA specification. Both use ZIP files (for which there is no open standard!), support common media types, MathML and Dublin Core metadata. Both are XML languages: ODF has a RELAX NG schema, and I expect Open XML will have both XSD and RELAX NG schemas.
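Since both formats are ZIP archives holding XML parts, generic tools can get a long way with either. The Python sketch below is only an illustration: the part names in the toy package are made up for the example and neither function follows the packaging rules of ODF or Open XML.

```python
import io
import xml.etree.ElementTree as ET
import zipfile
from typing import Dict


def build_toy_package() -> bytes:
    """Build an in-memory ZIP that mimics an office package (toy contents)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        zf.writestr("content.xml", "<doc><p>Hello</p></doc>")
        zf.writestr("meta.xml", "<meta><title>Toy</title></meta>")
        zf.writestr("thumbnail.png", b"not really an image")
    return buf.getvalue()


def xml_root_tags(package: bytes) -> Dict[str, str]:
    """Map each XML part in the package to the name of its root element."""
    roots: Dict[str, str] = {}
    with zipfile.ZipFile(io.BytesIO(package)) as zf:
        for name in zf.namelist():
            if name.endswith(".xml"):
                roots[name] = ET.fromstring(zf.read(name)).tag
    return roots


if __name__ == "__main__":
    for part, root in xml_root_tags(build_toy_package()).items():
        print(part, "->", root)
```

Nothing here needs to know which office format it is looking at, which is rather the point: ordinary ZIP and XML tooling is enough to open the box.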

They are generating lots of media attention, FUD and lobbying; but ODF and Open XML both represent a victory for universal, ubiquitous, standard generalized markup, which is what SC 34 is in large part about. I see Gartner has estimated a less than 70% chance of ISO ratifying two XML office formats. What rubbish. I’ll know more next week.

Ultimately, it is not WG1 or SC34 that makes the decision. It is the national votes of each of the voting members of ISO: the national standards organizations like Standards Australia, ANSI, and so on. While local committees may feel that Microsoft has been conspicuous in their absence, so have the other big companies in recent years: the standards participation focus shifted to W3C and OASIS. But these committees are not stacked with anti-Microsoft (or anti-Sun) people, but with organizations who need good interchange and also need an XML retrieval for legacy documents in proprietary formats (.DOC, etc.). So I find it very difficult to agree with Gartner’s 70%; I’d put it the other way, with a 70% likelihood of success, at least.

Rick Jelliffe

A new draft of Open XML came out on my birthday. 4081 pages of PDF, and very impressive for anyone who has worked on specifications and standards. Two things stick out: first, how horrible XML Schema fragments are when stuck inline to document structure; second, how the implementation-neutral tone of the introduction is at odds with the elements for various kinds of ActiveX embedded objects. I suspect people would be a lot more comfortable if the elements for ActiveX embedded objects were in a different namespace, and gathered into an appendix of some kind. Antiques and curios. It will be interesting to see what the extensibility strategy will be (it hasn’t been released in this draft).

On the technical merits, well, actually I don’t know if they matter much. I say potato. Exporting to HTML or XHTML gives people base-level interoperability for most documents, which neither ODF nor Open XML will challenge; at the high end the solution is exporting to XML using a domain-specific schema (e.g. S1000D for military & aerospace) and not ODF or Open XML at all; in the casual middle we will have ISO ODF available, perhaps as the interchange format of choice, as well as ISO Open XML (if it is accepted) for when you need to track MS Office’s capabilities closely. I think there is substantial value in a standard XML format for MS Office documents even within organizations that will mandate ODF for interchange and archiving. The availability of the alternatives reduces the need for ODF or Open XML to be the one true interchange format.

Probably coming from the industrial publishing background biases me here: the need for dumbed-down interchange formats is real sure enough, but the need for intricate close-to-the-metal feature-exposing typesetting feature access is also important for different contexts. Word’s binary formats and RTF’s weaknesses have long held Microsoft’s applications back from being happily usable in serious industrial publishing systems (or, at least, have often held back the people who adopted them).

Rick Jelliffe

I’ve previously called for Sun to Open Source at least the unprofitable parts of Java in this blog. Sun announced some kind of intention to do something last week, and they have been moving in this area for a time, for example with the various projects at javadesktop.org. Tim Bray wonders why there has been some hostile reaction. I wouldn’t call Richard Stallman’s The Curious Incident of Sun in the Night-Time hostile, but I did have much the same feeling that the Linux announcement (though really great, three cheers for all concerned and Sun!) was not the thing we want to pop our corks for.

It is not surprising we are economical with our enthusiasm. Simon complains that Sun are “not pretending it was open source Java yet”, but Tim Bray calls it an “OSS license”. I suppose that is in the sense of a license to make Java easier to run with OSS operating systems, not in the sense of a license that makes Java itself OSS. A little confusing.

Rick Jelliffe

My favourite technical blogs at MS at the moment are

  • Jensen Harris: An Office User Interface Blog which spruiks and roots and toots about the new GUI for Office. (Java and Linux developers really need to look at this to get an idea of how far Java and Linux will need to come to meet the new bar; I’ve mentioned the Substance and Flamingo projects at javadesktop.org before.) Jensen writes well and has a light touch.
  • Dare Obasanjo is a good omen for Microsoft’s intellectual vitality. Likeable, knowledgeable, pro-active and on-message about his own projects, the honesty of his comments on other areas gives credibility to his comments on his own projects. I didn’t read any Microsoft technical blogs until I started reading Dare’s blog, which overcame my suspicion that the blogs would be reformatted press releases.
  • Brian Jones: Open XML Formats is a goldmine of interesting information, though comments like “Let’s allow people to choose the formats they want. I’m not sure anyone is opposed to choice.” seem a tad insincere unless MS distributes an ODF plugin with Office. (My attitude on ODF versus so-called OpenXML is little different to the Groklaw-style cynics or the Peterson-style enthusiasts: I welcome both but have a big eyeroll thinking of the twenty years of missed opportunities which Microsoft has cheated its users out of by not providing an XML interface until recently: I remember trying out their appalling SGML Author for Word more than a decade ago and wishing they just had a simple mini-SGML version of RTF instead, like the Rainbow DTD. I hope the “Open” in “Open XML” refers to a change of thinking in Microsoft management in favour of aggressive interoperability.)

Rick Jelliffe

Fun software at http://www.faceresearch.com/ allows you to age or youth-ify your face, turn it into an El Greco painting, or morph it into a Manga hero.

Here is me: rick_jelliffe2.jpg

And me as a Manga hero: RickManga.jpg

Rick Jelliffe

Five years ago I looked at Eclipse and NetBeans as rich client platforms (RCPs) and thought both were crap for the purpose. Slow, ugly, immature, incomplete and bringing little but complexity to the table. I looked at both of them again anew last week.

The executive summary: both now look wonderful for Rich Client Platforms, and you would need some pretty strong particular reasons to build your own framework now. Eclipse has the edge if your market is Java developers who probably have Eclipse already and your application can be made a suite-enhancing plug-in. Eclipse also has the edge if you want to provide both a plug-in and a standalone version. NetBeans has the edge if you need Swing or any of the open source libraries which use Swing: the wonderful SwiXML (which lets you specify Swing interfaces in XML, and uses reflection and the JavaBeans conventions to be very small and very fast) and the Substance look and feel library (to whose Flamingo sub-project Topologi has contributed our BreadcrumbBar control) are high on my list.

Rick Jelliffe

I’m writing this blog on a Firefox browser on Windows XP. Works fine.

Actually, I’m not. The Firefox is running on an Ubuntu Linux operating system, which is being hosted through the free VMware private virtualization software on top of Windows XP. Unix on top of Windows! VMware promote a system where you package up configured applications with the ideal operating or desktop environment for them, as virtual appliances.

With all the talk of thin clients versus fat, this surely is the fattest kind of client! Not just a fat application, but the whole operating system with it! This is a great idea for software deployment.

M. David Peterson

ongoing · The Cost of AJAX

So saying “AJAX is expensive” (or that it’s cheap) is like saying “A mountain bike is slower than a battle tank” (or that it’s faster). The truth depends on what you’re doing with it. In the case of web sites, it depends on how many fetches you do and where you have to go to get the data to satisfy them.

I would start with the above, move on to “The Real AJAX Upside”, and then call yourself a better, more informed hacker because of it. (Or a better, more informed Mort, if in fact such a term is more appropriate in your particular case (would you admit it if it was?))

Rick Jelliffe

The W3C XML Schema WG is looking at an XSD 1.1, with so far only the mildest of changes. Maybe that is for the better. At one stage I heard they were considering putting in guard statements on content model particles, using the streaming XPath subset that makes KEY/KEYREF so sad.

I was initially excited by the prospect of more Schematron-isms creeping inexorably into the infrastructure, but I changed my mind on consideration. In fact, it seems the worst thing to do: only compounding XSD’s tendency for extra complication without much power. Instead, I think the W3C XSD WG should attempt to align XSD content models with RELAX NG’s more powerful content models. I was heartened in this regard to see that Michael Sperberg-McQueen, a leading XSD conspirator, had a nice paper (at Extreme XML?) on the parser technique adopted by RELAX NG implementations: I think what holds back rapprochement is in some degree unsureness about the theoretical implications.

I just read a great 2004 paper on this: Expressiveness and Complexity of XML Schemas, by Martens, Neven, Schwentick and Jan Bex. Boy, what a great paper! It is really nice to see a paper that has strong theory, strong awareness of the real world, and a willingness to empirically sample the web.

The gist of the paper is that XSD is only a slight advance on DTD in expressive power, but there is a way that would improve its power (impacting the Unique Particle Attribution (UPA) and Element Declarations Consistent (EDC) constraints) and move it closer to the power of RELAX NG (unranked regular tree languages).

But the paper has a lot of other interesting things to offer: it samples over 800 XSD schemas and finds that few actually use anything more than DTDs could have provided. The ones that did use something more only used one-level local element declarations, it seems: unexpected confirmation of the idea in my blog on PVL yesterday that a parent/child single-step path was good enough for validation, apparently not for 80/20 but for 99/1.
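To show what a parent/child single-step path buys you, here is a small Python sketch of my own. It is neither PVL nor anything from the paper: the rule table and element names are invented, and it only checks which child names are allowed, not order or cardinality.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List, Optional, Set, Tuple

# Hypothetical rules, keyed by (parent name, element name).
# The same element name can get different content under different parents.
RULES: Dict[Tuple[Optional[str], str], Set[str]] = {
    (None, "book"): {"title", "chapter"},
    ("book", "title"): {"subtitle"},      # a book title may have a subtitle
    ("title", "subtitle"): set(),
    ("book", "chapter"): {"title", "para"},
    ("chapter", "title"): set(),          # a chapter title may not
    ("chapter", "para"): set(),
}


def validate(elem: ET.Element,
             parent: Optional[str] = None,
             errors: Optional[List[str]] = None) -> List[str]:
    """Check each element against the rule chosen by its parent's name."""
    if errors is None:
        errors = []
    allowed = RULES.get((parent, elem.tag))
    if allowed is None:
        errors.append(f"no rule for <{elem.tag}> inside <{parent}>")
        return errors
    for child in elem:
        if child.tag not in allowed:
            errors.append(
                f"<{child.tag}> not allowed in <{elem.tag}> "
                f"(child of <{parent}>)"
            )
        else:
            validate(child, elem.tag, errors)
    return errors


GOOD = "<book><title><subtitle/></title><chapter><title/><para/></chapter></book>"
BAD = "<book><chapter><title><subtitle/></title></chapter></book>"

if __name__ == "__main__":
    print(validate(ET.fromstring(GOOD)))
    print(validate(ET.fromstring(BAD)))
```

A DTD could not say that title means one thing under book and another under chapter, but one step of parent context is all it takes, which fits the paper's finding that the schemas going beyond DTDs used only one level of local declarations.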

Disturbingly, 70% of the schemas were not correct in some way: the sample was 2 years ago and some tools that were notorious generators of bad schemas have been corrected now, so I hope things have improved, but still…sheesh!

But finally the paper gets onto a lengthy discussion of having streaming XPaths either as the left-hand side of the grammar or as a guard inside a grammar, as far as I can work out. In other words, this paper both discusses the kind of alignment with RELAX NG that I favour but also discusses the alternative(?) of using ancestor-based XPaths, and surprisingly fits them into RELAX NG too. Nail my hat to the ceiling! Streaming XPath guards and RELAX NG convergence are not as antithetical, perhaps, as they might appear.

© 2013, O'Reilly Media, Inc.