May 2006 Archives

Rick Jelliffe

The ISO SC 34 meeting here in Korea has been sweetness and light so far. Contrary to Groklaw’s claims, Microsoft has not attempted to prevent ODF by underhand methods AFAICS. Contrary to Gartner, it looks like Open XML will proceed through ISO fast tracking to national vote without incident too AFAICS. ODF has gained a lot in reputation by its ISO standardization and raised the bar. Open XML will similarly gain a lot by reaching or surmounting the same bar. Everyone’s a winner, like the Hot Chocolate song. (I am sure Sun/Microsoft/IBM hate having to go through the mechanics and uncertainties of standardization; I am sure many in SC34 don’t like standardizing schemas at the SC34 level, contrary to long-running policy, especially with the low-review, fast-tracking and PAS loopholes. But it is good for us.)

A nice phrase came up yesterday: ISO standardization of an existing standard represents a second round of openness.

Rick Jelliffe

A really common problem facing people moving over from SGML to XML (and yes, there are still industries such as aerospace that are thoroughly SGML!) and from XML DTDs to XML Schemas (including RELAX NG, Schematron, XSD) is the unwillingness to forgo entity references for special characters. ISO defines a whole lot of special character entities: &copy; and so on.

In XML, you can define entities for special characters up in the internal subset of the prolog and use them in element values or attribute values. Or you can have entity declarations in external parameter entities that form part of the DTD. So if you get rid of external DTDs, you also get unresolved entity references in your information set.
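To make the problem concrete, here is a minimal sketch using Python's standard `xml.etree.ElementTree` parser. The `note` element and the helper name `note_text` are made up for illustration. It shows a reference that resolves when the entity is declared in the internal subset, the same reference failing once the declaration is gone, and a numeric character reference that needs no declaration at all.

```python
import xml.etree.ElementTree as ET

# Entity declared in the internal subset of the prolog: the reference resolves.
WITH_DECLARATION = (
    '<!DOCTYPE note [<!ENTITY copy "&#169;">]>'
    "<note>&copy; 2006</note>"
)

# Same document with the declaration removed: the reference is undefined.
WITHOUT_DECLARATION = "<note>&copy; 2006</note>"

# A numeric character reference needs no declaration.
NUMERIC_REFERENCE = "<note>&#169; 2006</note>"


def note_text(xml_text):
    """Return the text of the root element, or None if the parse fails."""
    try:
        return ET.fromstring(xml_text).text
    except ET.ParseError:
        # Raised here for the undefined entity reference.
        return None


for label, doc in [
    ("declared entity", WITH_DECLARATION),
    ("undeclared entity", WITHOUT_DECLARATION),
    ("numeric reference", NUMERIC_REFERENCE),
]:
    print(label, "->", repr(note_text(doc)))
```

Replacing named references with numeric ones is one way out when moving to a schema language that has no entity declarations, at some cost in readability of the source.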

M. David Peterson

Brian Jones: Open XML Formats : Spreadsheet performance - Shared Formulas

via a comment to the above linked post, Biff (unfortunately I don’t have a link (Biff, if you happen to read this and have a preferred link, let me know and I will update this post)) has this to say in regards to the first quoted piece from Mike:

“Since this blog is about the so-called power of Xml and associated API, how do you achieve this without a running Excel instance?” — Mike

Perhaps invest in better coders? This appears to be very sensible and straightforward optimization, not rocket science by any stretch of imagination. C’mon, do you believe every user out there must suffer from performance drops because your coders cannot get their act together? Bleh.

I have to say that I totally digg someone who can say it like it is…

Folks, software development is not an easy task. We all do, or at very least should understand this, right?

Rick Jelliffe

I’m heading off to Seoul tomorrow for the ISO SC34 meeting. The Working Group I’m in (WG1) looks after ISO DSDL and office document formats. I’ve been embarrassing my Korean friends asking why they made James Brown their King, and whether their capital’s anthem is “Get on up”; soooo childish I know.

The office document formats are a very entertaining sideshow at the moment. ODF is a format derived from Sun’s Star Office product and now being taken up by IBM; it is being standardized through ISO by fast-tracking an OASIS specification. Open XML is a set of formats derived from Microsoft’s Office 2007 (but being retrofitted to old Office products); it is being proposed for standardization through ISO by fast-tracking an ECMA specification. Both use ZIP files (for which there is no open standard!), support common media types, MathML and Dublin Core metadata. Both are XML languages: ODF has a RELAX NG schema, and I expect Open XML will have both XSD and RELAX NG schemas.

They are generating lots of media attention, FUD and lobbying; but ODF and Open XML both represent a victory for universal, ubiquitous, standard generalized markup, which is what SC 34 is in large part about. I see Gartner has estimated a less than 70% chance of ISO ratifying two XML office formats. What rubbish. I’ll know more next week.

Ultimately, it is not WG1 or SC34 that makes the decision. It is the national votes of each of the voting members of ISO: the national standards organizations like Standards Australia, ANSI, and so on. While local committees may feel that Microsoft has been conspicuous in its absence, so have the other big companies in recent years: the standards participation focus shifted to W3C and OASIS. But these committees are not stacked with anti-Microsoft (or anti-Sun) people, but with organizations who need good interchange and also need an XML retrieval for legacy documents in proprietary formats (.DOC, etc.). So I find it very difficult to agree with Gartner’s 70%; I’d put it the other way, with a 70% likelihood of success, at least.

M. David Peterson

House panel approves net neutrality bill | InfoWorld | News | 2006-05-25 | By Grant Gross, IDG News Service

Okay, so it’s only a small step, but it’s an important step nonetheless.

But what’s up with this:

M. David Peterson

Thanks for rushing to the rescue! :)

As per my last post:

M. David Peterson

Update: [May 26th, 2006 @ 15:06 MDT] The Phantom is back, and back with vengeance (or maybe just a link that you should really check out in regards to this same subject matter.)

Gracias, Phantom. :)

Update: [May 25th, 2006 @ 16:34 MDT] And just like that, *POOF*, via a phantom commenter we discover a source of documentation in regards to Mozilla’s Storage API, which is kind of what I was hinting at when I mentioned code diving. However, as the above linked document points out,

Rick Jelliffe

A new draft of Open XML came out on my birthday. 4081 pages of PDF, and very impressive for anyone who has worked on specifications and standards. Two things stick out: first, how horrible XML Schema fragments are when stuck inline in the document structure; second, how the implementation-neutral tone of the introduction is at odds with the elements for various kinds of Active X embedded objects. I suspect people would be a lot more comfortable if the elements for Active X embedded objects were in a different namespace, and gathered into an appendix of some kind. Antiques and curios. It will be interesting to see what the extensibility strategy will be (it hasn’t been released in this draft).

On the technical merits, well, actually I don’t know if they matter much. I say potato. Exporting to HTML or XHTML gives people base-level interoperability for most documents, which neither ODF nor Open XML will challenge; at the high end the solution is exporting to XML using a domain-specific schema (e.g. S1000D for military & aerospace) and not ODF or Open XML at all; in the casual middle we will have ISO ODF available, perhaps as the interchange format of choice, as well as ISO Open XML (if it is accepted) for when you need to track MS Office’s capabilities closely. I think there is substantial value in a standard XML format for MS Office documents even within organizations that will mandate ODF for interchange and archiving. The availability of the alternatives reduces the need for ODF or Open XML to be the one true interchange format.

Probably coming from the industrial publishing background biases me here: the need for dumbed-down interchange formats is real sure enough, but the need for intricate, close-to-the-metal, feature-exposing access to typesetting features is also important in different contexts. Word’s binary formats and RTF’s weaknesses have long held Microsoft’s applications back from being happily usable in serious industrial publishing systems (or, at least, have often held back the people who adopted them).

Rick Jelliffe

Peter Heneback has a good article “Advanced XML validation” at IBM’s developerWorks site. They have a few worthwhile Ogbuji articles on Schematron too.

Rick Jelliffe

I’ve previously called for Sun to Open Source at least the unprofitable parts of Java in this blog. Sun announced some kind of intention to do something last week, and they have been moving in this area for a time, for example with the various projects at javadesktop.org. Tim Bray wonders why there has been some hostile reaction. I wouldn’t call Richard Stallman’s The Curious Incident of Sun in the Night-Time hostile, but I did have much the same feeling that the Linux announcement (though really great, three cheers for all concerned and Sun!) was not the thing we want to pop our corks for.

It is not surprising we are economical with our enthusiasm. Simon complains that Sun are “not pretending it was open source Java yet”, but Tim Bray calls it an “OSS license”. I suppose that is in the sense of a license to make Java easier to run with OSS operating systems, not in the sense of a license that makes Java itself OSS. A little confusing.

Rick Jelliffe

How delightful it is to be confounded, when something that shouldn’t work does work well, and when we are forced to admit the world is not boring, predictable and under control. The success of XML is such a thing, as was the success of HTML. I have come up with a new little theory about the success; maybe it is not new, I forget so much: undoubtedly it is obvious to someone else and has been raised and pooh-poohed before.

Let’s imagine we take a database (a set of facts) and a set of human comments on those facts and a set of metadata about the whole thing. We’ll call that our data. Now let’s categorise the relation between one information item and another as:

  • intimate
  • strong
  • moderately strong
  • connected
  • similar

Such categorization is in addition to the labels on the data. Even though these relations are very general, they are enough for a human to clarify a lot of semantics based on the labels. Not nearly as clear as “has a” or “is a” relations, or grouping as bags or sets, or labeling as about or description or topic. Much more fuzzy. Nothing like what goes on with relational data or RDF.

But those categories are just what XML markup provides. An attribute suggests an intimate relationship. A child is strongly related to its parent. A successor element is moderately strongly related to its predecessor. A referenced or linked-to element is connected to its parent. Two information items with the same name have some kind of semantic similarity, which increases the more that their context grows. A programmer uses these to glean the meaning of the markup.
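As a rough illustration of how these fuzzy categories can be read straight off the markup, here is a small Python sketch. The function name `relations`, the sample document, and the use of `id`/`ref` attributes to stand for links are all assumptions made for the example, and a link is reported here from the referring element to its target; nothing in it depends on a schema.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def relations(root):
    """Return (category, source, target) triples read off the markup alone."""
    found = []
    # Assumption: links are expressed with id/ref attribute pairs.
    ids = {el.get("id"): el for el in root.iter() if el.get("id") is not None}
    for el in root.iter():
        for name in el.attrib:
            found.append(("intimate", el.tag, "@" + name))
        previous = None
        for child in el:
            found.append(("strong", el.tag, child.tag))
            if previous is not None:
                found.append(("moderately strong", previous.tag, child.tag))
            previous = child
        target = ids.get(el.get("ref"))
        if target is not None:
            found.append(("connected", el.tag, target.tag))
    # Items sharing a name are taken to be similar in some way.
    counts = Counter(el.tag for el in root.iter())
    for tag, n in counts.items():
        if n > 1:
            found.append(("similar", tag, tag))
    return found


SAMPLE = (
    '<shop><customer id="c1"/>'
    '<order ref="c1"><item sku="a"/><item sku="b"/></order></shop>'
)

for triple in relations(ET.fromstring(SAMPLE)):
    print(triple)
```

The point of the sketch is how little is needed: five generic relations plus the labels already carry a lot for a human reader.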

Now the relations I suggest may not be perfect; you could improve on them undoubtedly. And they may not be reliable guides, in that there’s nowt as queer as folk and folk make schemas. And the categories do not necessarily correspond to neat or orthogonal logical or linguistic categories. They don’t need to. But surely we can reverse engineer something about how humans work from their artifacts. Are there some kind of quasi-linguistic properties that make XML successful, apart from the obvious reasons of representational power, internationalization power and corporate power?

Of course, there are other ways to look at it: elements/attributes as noun/verb, as substance/accident, and so on. Those operations can also be at work.

Michael(tm) Smith

Related link: http://barcamp.org/BarCampAmsterdamII

I’m at Bar Camp Amsterdam II. Sorry, I don’t have time to write much, but here’s some photos. You can find more at Flickr, tag: barcampamsterdamii (http://flickr.com/photos/tags/barcampamsterdamii/).

Photos:

  • iRex iLiad e-ink device
  • Werner from iRex
  • Avi Bryant from Dabble DB
  • Freeman Murray from Pune, India
  • Ryan King from Technorati
  • Iolaire McKinnon

Andrew Savikas

I’ve been a big fan of XMLMind as a DocBook editor since first hearing about it several months ago, and it looks like I’m not the only one — I was quite surprised to see XMLMind is currently the top result on Google for “docbook editor”, especially without the marketing budget of Oxygen, Epic, or Stylus.

The folks at Pixware really found a nice balance between being reasonably friendly for non-techies, and still powerful for advanced users. I love that I can work in a WYSIWYG (CSS-based) environment, and rarely need to use the mouse. And with a very usable free version, they’re definitely worth looking into as an authoring tool.

M. David Peterson

via a post from earlier today, Brian Jones, a program manager working on the XML functionality and file formats in MS Office, reports the news and then adds some commentary of his own:

Draft 1.3 of the Ecma Office Open XML formats standard

Wow, we finally have an updated draft of the Ecma Office Open XML formats standard! http://www.ecma-international.org/news/TC45_current_work/TC45-2006-50.htm I’ve been waiting for a long time to be able to share all the great work that’s been going on in Ecma TC45, and it’s so awesome that we have a new public draft. I can’t wait to hear what everyone thinks. If you go to that site, you’ll see three different downloads:

  • Draft 1.3 of the spec - The big download is the spec itself in PDF form. It’s about 25 megabytes and is around 4000 pages.
  • Draft 1.3 of the spec in the Open XML format - Alternatively, you can download the .docx version of the spec. Once Beta 2 comes out, you can open it that way (although opening 4000 pages of content with beta software may be slightly problematic)
  • Schemas - The schema files are also available for download. They are available in a ZIP file that also contains an index.htm file that describes each xsd

We’ve been working really hard over the past 5 months bringing this standard along. There is still a lot of work to do, but you’ll see pretty clearly that we’ve made a ton of progress over the initial submission from last year. We have weekly 2 hour phone conferences (they are actually at 6am my time which is not ideal), as well as 3 day face to face meetings about every 2 months. The contributions from everyone has just been outstanding. It’s so awesome to work with such a diverse group of people. While the initial submission was made by Microsoft, it’s now completely in Ecma’s control and we’ve had a lot of help from Apple, Barclays Capital, BP, The British Library, Essilor, Intel, Microsoft, NextPage, Novell, Statoil, and Toshiba.

***Note*** Remember that this is just a draft. Some sections of the spec are much further along than others, so keep that in mind while you are looking through the spec. If you are in an area that looks like there isn’t much information, odds are we just haven’t gotten to that yet.

There is a lot of valuable information both in this post and, as I have come to discover over the past couple of weeks, in Brian’s blog entries in general. If you don’t already subscribe, I would HIGHLY recommend (as does Rick Jelliffe from several days before me in this post) adding either the Atom 1.0 or RSS 2.0 feed to your feed reading mechanism of choice.

The world is filled with all types of document formats, and opinions as to which one is better for one purpose or another, but in the end content is King. If we could design one document format that could persist from now until the end of time, if that format were to never contain any content, it simply would not matter.

That said,

M. David Peterson

Update: This just in…

via a recent blog entry from Anne van Kesteren

Attribution

Hereby my apologies to everyone who had to waste his time by writing a rant, because the Web APIs WG and probably Dean and myself in particular (being the editors) didn’t get the attribution right. This was fixed quite soon after the first draft was released in the editor’s draft of XMLHttpRequest, but you can’t change the published version. Sjoerd just told me we made the frontpage of XML.com with that. Great! The current draft reads: “Special thanks also to the Microsoft employees who first implemented the XMLHttpRequest interface, which was first widely deployed by the Windows Internet Explorer browser.”

So it seems that I can now make some changes to my text to instead read as follows:

It seems that some folks are human and make mistakes just like the rest of us. Like me, for example. (more often than I wish I would :)

1 - From all of us, Microsoft, Thanks! :)
2 - We all recognize that it was you and your incredibly talented and dedicated teams of software developers who designed, built, and released all of what is commonly referred to in “modern” terms as AJAX.
3 - Thanks for letting us copy your work, without requiring us even to ask. We didn’t mean to come across as if we were taking credit for your work as if it was our idea in the first place. But sometimes the way the system is designed, correcting obvious mistakes is impossible in a way that doesn’t make it seem like this was an intentional act of negligence.

I’m now one of them again. From all of us, Thank you! We use your work in pretty much every aspect of our web development life these days, so we can quite easily state that without your efforts, we couldn’t do what we enjoy most in our life.

Building cool software. :)

Of course there’s LOTS of other stuff that you’ve developed that we all seem to take for granted. The list is long, so for now please know that your efforts have not gone unnoticed.

At least by some of us anyway. In fact, we all take notice of these things, even if we don’t always make it known.

From ALL of us, Thanks again :)

[Credit to Dare Obasanjo for bringing this matter to the attention of his readers.] (That feels much better to get this all straightened out :))

[NOTE: Thanks Anne! I’m glad to see that I was mistaken with my concerns :)]

Update: It’s unfortunate that the world is filled with people who can’t seem to understand the concept of learning about life, and instead preaching to others their opinions as if they are so far beyond what anybody else could possibly understand in regards to just how incredibly brilliant they are, and anybody else is not.

Sorry for the need to turn off the comments, as it seems that some of you might have something of value to add to this overall conversation. Sadly, the current trend of visitors have ruined things for the rest of you… as they usually tend to do.

Rick Jelliffe

Dare Obasanjo quotes Tim Ewald’s comment

I’ve started thinking about my schema not as the definition of what this system needs right now but as the definition of what the data should look like if it’s present instead.

Glad to see this, because it gets us to the start of where Schematron is useful: if the XSD schema expresses only the most invariant of the invariants of the schema, then what do we use to express the constraints appropriate in particular phases of a document’s life or history or evolution? The “contract” is just one kind of phase.
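The idea of phases can be sketched in a few lines. Schematron itself expresses this with XPath assertions grouped into patterns that a phase switches on; the Python below only imitates that shape, and the `invoice` vocabulary, the phase names, and the helper `failed_assertions` are invented for the example.

```python
import xml.etree.ElementTree as ET

# Constraints that hold whenever the data is present, whatever the phase.
HAS_LINE = ("an invoice has at least one line",
            lambda doc: doc.find("line") is not None)

# Constraints that only apply at a later point in the document's life.
HAS_APPROVER = ("an approved invoice names its approver",
                lambda doc: bool(doc.findtext("approver")))

# Hypothetical phases: each one activates a different set of constraints.
PHASES = {
    "draft": [HAS_LINE],
    "approved": [HAS_LINE, HAS_APPROVER],
}


def failed_assertions(doc, phase):
    """Return the messages of the constraints that fail in the given phase."""
    return [message for message, test in PHASES[phase] if not test(doc)]


invoice = ET.fromstring("<invoice><line/></invoice>")
print(failed_assertions(invoice, "draft"))
print(failed_assertions(invoice, "approved"))
```

The same document is acceptable as a draft and unacceptable as an approved invoice, which is exactly the kind of distinction a single loose base schema cannot make on its own.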

There was some debate (Roger Costello was a champion) of default openness in XSD. But I have seen increasing use of the idea that the master/base schema has to be as loose as possible: indeed, derivation by restriction is fragile and impractical without it, because if you need a derived schema to have something optional you need to go back to the master to make it optional. A friend reports that he has taken to calling super-loose base schemas “vocabularies” and “ontologies” to avoid confusing people who expect a schema to be really prescriptive.

Rick Jelliffe

My favourite technical blogs at MS at the moment are

  • Jensen Harris: An Office User Interface Blog which spruiks and roots and toots about the new GUI for Office. (Java and Linux developers really need to look at this to get an idea of how far Java and Linux will need to come to meet the new bar; I’ve mentioned the Substance and Flamingo projects at javadesktop.org before.) Jensen writes well and has a light touch.
  • Dare Obasanjo is a good omen for Microsoft’s intellectual vitality. Likeable, knowledgeable, pro-active and on-message about his own projects, the honesty of his comments on other areas gives credibility to his comments on his own projects. I didn’t read any Microsoft technical blogs until I started reading Dare’s blog, which overcame my suspicion that the blogs would be reformatted press releases.
  • Brian Jones: Open XML Formats is a goldmine of interesting information, though comments like “Let’s allow people to choose the formats they want. I’m not sure anyone is opposed to choice.” seem a tad insincere unless MS distributes an ODF plugin with Office. (My attitude on ODF versus so-called OpenXML is a little different to the Groklaw-style cynics or the Peterson-style enthusiasts: I welcome both but have a big eyeroll thinking of the twenty years of missed opportunities which Microsoft has cheated its users out of by not providing an XML interface until recently: I remember trying out their appalling SGML Author for Word more than a decade ago and wishing they just had a simple mini-SGML version of RTF instead, like the Rainbow DTD. I hope the “Open” in “Open XML” refers to a change of thinking in Microsoft management in favour of aggressive interoperability.)

Rick Jelliffe

This week I have been watching a colleague apply mashup techniques to revitalize a quite large (25 pages, each with multiple frames, plus various popups) old Java web application. It was initially designed almost ten years ago, and its user interface has remained basically the same (except for graphics and minor changes) since about 2000, with HTML designed to work on IE 5 and quite a lot of use of JavaScript.

The application is a pretty well-structured three-tier-on-one-server system: a backend DBMS, application-logic middleware (one servlet per frame) generating XML, then presentation-level servlets (one servlet per frame) converting the XML to HTML. My colleague’s challenge has been to create an entirely new interface, based more on blogging systems but using XHTML, CSSZenGarden-ish CSS techniques, reduced JavaScript only for menus, almost no use of HTML tables (CSS positioning instead), and removing statefulness from the client end as much as possible. And no change of the Java code.

The biggest change is that the Java servlets previously corresponded to frames. Now there is effectively one big XSLT script which potentially queries any of the servlets to produce the mashed-up page. The programmer intends to refactor the big XSLT: the trouble with a single script is that it becomes difficult to have multiple programmers working on the code. But the project itself is fairly interesting: turning a c. 2000 state-of-the-art HTML interface into a c. 2006 state-of-the-art XHTML interface with an utterly different user interface in every aspect, but with no change to the XML-generating Java.
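The shape of such a server-side mashup can be sketched briefly. The real system does this in XSLT; the Python below is only a stand-in to show the idea, and the servlet names (`news`, `tasks`), the `record`/`title` XML vocabulary, and the function `mash_up` are all invented for the example. Strings stand in for the XML that each servlet would return.

```python
import xml.etree.ElementTree as ET


def mash_up(sources):
    """Combine the XML produced by several servlets into one XHTML page.

    `sources` maps a (hypothetical) servlet name to the XML text it returned.
    """
    html = ET.Element("html", xmlns="http://www.w3.org/1999/xhtml")
    body = ET.SubElement(html, "body")
    for name, xml_text in sources.items():
        data = ET.fromstring(xml_text)
        # One div per data source; layout is left entirely to CSS.
        section = ET.SubElement(body, "div", {"id": name})
        items = ET.SubElement(section, "ul")
        for record in data:
            item = ET.SubElement(items, "li")
            item.text = record.findtext("title", default="")
    return ET.tostring(html, encoding="unicode")


page = mash_up({
    "news": "<records><record><title>Hello</title></record></records>",
    "tasks": "<records><record><title>Refactor</title></record></records>",
})
print(page)
```

Because the page is assembled from whatever the servlets expose, the one-servlet-per-frame correspondence disappears without any change to the code that generates the XML.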

So even though the XML was designed with the idea “what information is needed by this frame in this application”, once all the information for the application is exposed as XML by all the servlets, the information can be mashed up into an entirely new interface. It is the universal access to all the data that is the key here: that the data is exposed as a URL from a servlet using XML enables it too, but the fact that the data is open reduces the need to alter the Java “application logic”: indeed the “application logic” performed by the middle tier no longer reflects the application logic as perceived by the user, the connection between middleware servlet and frame no longer holding.

So this is not AJAX nor mashups in the sense of client-side combination of independent data sources. But the excitement about client-side interactions often means more JavaScript in effect, and the fact that JavaScript programmers frequently are keen to reduce the amount of JavaScript in a system design speaks volumes. The option of using server-side mashups generating beautiful XHTML with aggressive use of CSS and JavaScript for minimal dynamic interface controls shouldn’t be lost in the AJAX noise.

M. David Peterson

PLEASE NOTE: I made a mistake. After careful reanalysis, I now believe the original analysis I made regarding Tim Bray’s blog entries, which I added later and labeled an “Update”, contains inaccuracies that require me to remove the content from this entry. I have annotated a new file with proper information stating that this was an error I can no longer comfortably stand behind as the author, and have made it publicly available to ensure proper dereferencing can be made.

No one asked me, or even suggested to me, that I remove these comments. I did this of my own accord, based on my own decision that this was something I could no longer stand behind, yet I must take full responsibility for the inaccurate content in a public manner to ensure that this information can be properly propagated.

I also owe Tim an apology. This was not his mistake, and instead mine.

Tim, my apologies. I took things too far out of context, without applying enough care to ensure that my final evaluation was, in fact, something I could continue to stand behind with any level of integrity. I couldn’t. It was not a deliberate mistake, but a mistake nonetheless.

Again, my apologies.

The rest of this entry (which was the original post before the additions mentioned were made) I both can and do stand behind, as I believe that it contains accurate, well-researched information. Obviously there are some references that a few folks may not be all that happy about, including Tim. But the content I have now dereferenced was not something that belonged here anymore… I hope you can understand my reasoning for removing it, annotating it, and making this publicly available to ensure that the inaccuracies can be properly referenced and propagated as necessary.


[Original Post]
Consortiuminfo.org - On the Art (?) of Disinformation: telling the Big Lie

The offense of the Big Lie on the personal level is its assumption that, “I can lie to you and you won’t catch me.” Taken to the marketplace, and included in letters to government agencies, the effect is pernicious. As a result, exposing the Big Lies is both important and necessary - and hence the reason for blog entries such as this.

This is becoming silly. We no longer live in a world where software is compared on a feature-by-feature basis, but instead in one of debates from one blog to the next by folks who are dissecting legal documents, attempting to find “fraudulent” statements to extend the idea that they are, in fact, the real pioneers, the real heroes, and the real men and women fighting for the rights of the common man and common woman who, if Microsoft had their way, would be left shoe-less, shirtless, penny-less, and servants to the Almighty Micro-God, bowing down to their every demand just because “that’s what we’re supposed to do…. The GREAT ONE has spoken, and told it to be so.”

First off…

Kurt Cagle

I had an opportunity about a month ago to work with the Microsoft Internet Explorer team to help improve the browser. It was an extraordinarily tempting offer, and it was largely due to family pressures on my part that I reluctantly decided that it was just not possible to do it. The interview was exciting, I had a chance to talk first hand to a number of senior people with the team, and it has left me with a considerably changed impression of both Microsoft and their developers’ aspirations in producing the best product possible. If circumstances had been a little different (if I hadn’t moved to Canada late last year) then I suspect this would have been a Microsoft post you’d be reading now.

One thing that I realized, however, was that the Internet Explorer team has an amazing opportunity if they seize it now. Through a number of circumstances, one piece of technology that was never incorporated into the IE browser was a module capable of handling XHTML. Now, this may seem to be a fairly trivial omission - XHTML isn’t exactly blazing through the commercial sky yet as a must-have technology (though it’s getting there) - but I’ve come to believe that in fact XHTML may be the key to one of the biggest problems that they face with IE - the problem of vendor legacy.

Rick Jelliffe

Structured Document Complexity Metric

The Structured Document Complexity Metric asks the question “How complex is this document set or schema?” for the purposes of project estimation and management. The metric works for a sampled set of documents (XML or SGML) and for grammar-based schemas as well. A schema that perfectly describes a set of document instances will have the same metric as that set, and so can be used to judge how optimal a schema is.

Rick Jelliffe

XML Mapping Additions Ratio

This metric asks the question “How many fields are in the intended schema that are not in the original schema?” It can be used to judge whether adopting a standard schema unchanged will have the (possibly unintended) consequence of adding extra elements or attributes to support.
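A back-of-the-envelope version of the question can be sketched with sets of field names. This excerpt does not give the exact formula, so the choice of denominator here (the size of the intended schema) is an assumption, as are the function name `additions_ratio` and the sample field names.

```python
def additions_ratio(original_fields, intended_fields):
    """Share of the intended schema's fields that the original schema lacks.

    Assumption: the ratio is taken against the size of the intended schema.
    """
    original = set(original_fields)
    intended = set(intended_fields)
    if not intended:
        return 0.0
    return len(intended - original) / len(intended)


# Hypothetical field names for an in-house schema and a standard one.
in_house = {"title", "author"}
standard = {"title", "author", "isbn", "publisher"}
print(additions_ratio(in_house, standard))
```

A high value would suggest that adopting the standard schema unchanged brings in a lot of elements or attributes that nobody asked for.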

Rick Jelliffe

XML Mapping Completeness Ratio

This metric asks the question “How complete a mapping can be made from a document in one schema to a document in another schema?” It can be used to judge whether some intended schema is capable of holding the information in an existing schema.

Rick Jelliffe

Production Count

This metric asks the question “How complex is this schema?” (or DTD).

Rick Jelliffe

Evidence-based management needs comprehensible information; metrics are distilled facts: not a bad fit.

Here is a series of blogs giving a metric that can be useful in many areas of XML project management, from verifying the suitability of adopting a particular schema, to making sure that only work and capabilites arising from business requirements are being carried out, to estimating the price variation that a schema change may entail.

Everyone using XML already uses a metric: well-formedness! Validity is also a metric. (I am simplifying away the difference between a metric and a measure in these blogs: pedants please lower your hackles!) But the metrics for XML on the Web are either concerned with communications and information theory, or are based on programming complexity measures, or are a little polluted by voodoo ideology about good structures and bad structures; I don’t buy into the latter, at least not at the current state of knowledge. But there is a need for a good set of metrics for XML project management and scoping, and to inform XML schema governance, so I thought people might be interested in some of the metrics I have been developing and using.

They all address different, but to me vitally important, aspects of XML projects, and most are, I hope, common sense. Of course, you can make up your own metrics as well: but I think it is good to at least have a basic vocabulary of XML metrics to use or adapt or decry as appropriate.

Element and Attribute Count

This most basic and coarse metric asks the question “How many element and attribute names are there?”
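
For a sampled document, the count can be sketched in a few lines of Python using the standard library’s `xml.etree.ElementTree`. This is only an illustration: the sample document and the `count_names` function are hypothetical, and counting attribute names once regardless of which element they appear on is an assumption.

```python
import xml.etree.ElementTree as ET


def count_names(xml_text):
    """Return (distinct element names, distinct attribute names) in a document."""
    root = ET.fromstring(xml_text)
    elements = set()
    attributes = set()
    # iter() walks the root element and all of its descendants.
    for node in root.iter():
        elements.add(node.tag)
        attributes.update(node.attrib.keys())
    return len(elements), len(attributes)


# Hypothetical sample document, for illustration only.
sample = (
    '<order id="1">'
    '<item sku="a" qty="2"/>'
    '<item sku="b" qty="1"/>'
    '<note>rush</note>'
    '</order>'
)

element_count, attribute_count = count_names(sample)
print(element_count, attribute_count)  # prints "3 3"
```

Running the same function over each document in a sample and taking the union of names would give the count for a document set.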

Rick Jelliffe

Fun software at http://www.faceresearch.com/ allows you to age or youth-ify your face, turn it into an El Greco painting, or morph it into a Manga hero.

Here is me: rick_jelliffe2.jpg

And me as a Manga hero: RickManga.jpg

Dan Zambonini

Last year, I suggested that the Web 2.0 hype could be partly responsible for a lack of interest in the Semantic Web.

Google Trends can’t prove this, but the search volume results for “Web 2.0” against “RDF” show a worrying state of play for the Semantic Web crowd.

Kurt Cagle

It’s been a few months since I’ve addressed the issue of the Massachusetts IT Department’s decision to consider the OASIS Open Document Format as their primary format for interoperability and generation of documents. The last couple of weeks have seen a number of significant new developments that all seem to point to the very real likelihood that ODF will in fact win the day.

According to a series of articles on Groklaw (“MA Asks: Can Anybody Out There Make MS Office Interoperate with ODF?” and “OpenDocument Foundation to MA: We Have a Plugin”), the Massachusetts IT committee raised the question of whether Microsoft could produce some form of plug-in that could be added to the current version of Office to open, display and save ODF content. Microsoft hemmed and hawed, arguing (as they had before the EU) that they were missing critical documentation needed to generate such a plug-in, and that it would place an undue burden on developer time during a period when they were moving close to the full release of the next version of Office.

M. David Peterson

I just finished up a comment to Matthew Russell’s recent “Coming Soon: Gecko-driven Google Office (and Operating System)?” post, a post which has proven to be quite popular. It’s an excellent, thought-provoking piece that I would encourage you to read if you haven’t already, along with the many great comments that follow, which bring out a lot of important points. (Yes, a lot of the comments are mine, but they stem from the comments of others, which were in and of themselves fantastic and thought-provoking…)

I’ve decided to pull my last comment out and republish it as its own post because I believe the overall theme/topic that it covers (Net-Neutrality) is of MAJOR significance. If you have a few minutes to read this, I ask that you would.

Thanks!

M. David Peterson

Saxon diaries :: Wrapping the .NET DOM

A user of Saxon on .NET, Don Burden, has been doing some performance tests:

https://sourceforge.net/forum/forum.php?thread_id=1493510&forum_id=94027

At first sight the figures are not especially good: 227 transformations per second against 613 for the System.Xml.Xsl transformer. However, closer analysis shows that a great deal of the cost is in converting the System.Xml DOM into a Saxon tree prior to performing the real transformation. This isn’t really a surprise - the API documentation contains some clear warnings about the cost of doing this (it’s far better, when you can, to build a native Saxon tree directly from raw XML - the same is true for the Java product).

As would be expected, there is TONS of fantastic information contained in this entry from Dr. Kay. If you are a .NET developer who uses XSLT to any great extent, or a Java developer looking to expand your horizons while building from your existing Java code-base, this article is an absolute must read!

M. David Peterson

In response to the question:

What is the advantage of using xslt 2.0 versus xslt 1.0?

posted yesterday, May 9th, 2006 to XSL-List, Dr. Michael Kay responds:

Features or benefits?

At the “features” level: masses. The quick wins tend to be grouping
capability, multiple output files, regular expressions, date and time
handling, stylesheet functions, temporary trees, sequence data types. The
language has roughly doubled in size, so it’s hard to give a short answer.

At the “benefits” level:

(a) productivity, through

(i) generally fewer lines of code for the same task
(ii) faster debugging cycle because of type checking (especially with
schema-awareness)
(iii) faster learning curve because there’s less need for weird
workarounds to common problems

(b) robustness, again through type checking

(c) applicability to a wider range of tasks, e.g. up-conversion.

(d) performance, particularly for tasks such as grouping that are now
supported by built-in capability.

Michael Kay
http://www.saxonica.com/

Michael(tm) Smith

Related link: http://xtech06.usefulinc.com/schedule

There’s a first-rate lineup of presentations at this year’s XTech conference (May 16-19 at the Grand Hotel Krasnapolsky in Amsterdam). The final day will include a Mobile Web Morning with four separate presentations about developing Web apps for access from mobile devices. I’ll be presenting one of them. Here are links to info about the whole set:

SAX creator David Megginson will be chairing the first pair of talks of the morning (mine and Håkon Lie’s), and Håkon Lie will be chairing the second pair.

I’ll also be chairing a separate set of talks a day prior, on May 18:

All of the Mobile W