November 2007 Archives

Kurt Cagle

Quick! Do you use the Compound Document Format?! You know, CDF … surely you use CDF, right?

Chances are pretty good that you have no idea what I’m talking about. Everyone knows Microsoft’s Word document format and Adobe’s PDF, and chances are pretty good that if you’re reading this on XML.com you’ve heard of ODF and OOXML, especially after the fairly rancorous discussions about ISO status for these two formats. Yet CDF, hmmmm … that’s a rough one. Didn’t it belong to Corel, once upon a time?

Rick Jelliffe

I was pleased to see that Debbie Lapeyre is presenting a tutorial session, Introductory Schematron, at XML 2007 in Boston next week (Dec 3-5). I’ve looked over the online course notes, and it looks like a really good introduction. If you want a good introduction to Schematron, this is a good opportunity for you. Now that XML Schemas 1.1 has also adopted a stripped-down version of Schematron’s assertions, it is even more worthwhile getting to know.

Other sessions that look interesting to me include the XML/XSLT hardware sessions and Miguel de Icaza’s session.

I won’t be attending, because I am going to the ISO/IEC SC34 meeting in Kyoto that weekend, sponsored kindly by Allette Systems and Topologi, for WG1 issues. It seems that our recent jumping up and down made some national bodies vote, and the last set of ballots had a quorum (just). I hope we can slough off the document format single-issue people to a new working group, which the UK has proposed. That would free up the gridlock.

The voting problem has meant that I have had no desire to push ahead Schematron’s evolution, but there are quite a few issues that I want to raise with WG1 members (they probably cannot be discussed formally, because the WG has agenda rules to make sure that delegates get adequate backgrounding time). I want to propose that the next version of ISO Schematron have an official XSLT2 query language binding and a better library import mechanism (there are some editorial fixes too), and users have requested that the ISO Schematron Validation Report Language (SVRL, which has become quite popular) should allow more information from the schema. SC34 tasked me to look at an ISO standard for ZIP: my approaches to PKWARE have not been answered, so it looks like we will have to go ahead without their involvement, which would be a shame.

I also have a discussion draft for a set of abstract application profiles for office applications that I want to run up the Kyoto flagpole. There is clearly a strong need for application profiles for office applications, if we want interoperability: interoperability requires both that an application provides a minimum set of features and that it warns when exporting documents with more than those features (where no graceful degradation is possible). The need is so strong that people expected that somehow document format standards would provide it, which has led to a lot of heated criticism that oranges should be apples. An example of an abstract application profile might be “Unstructured Word Processor with Western and Asian text support with medium quality (styles, hyphenation, kerning, autonumbering, nested tables, hyperlinks, standard images, running headers) but no complex script support. Software reports when features more than this profile allows are used.” Defining a set of features helps procurement specify minimum requirements, and it gives software developers a series of bars that they can demonstrate their software reaches when trying to win contracts. And by making the checking (or validation) ability part of the profile, it promotes interoperability.
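The "software reports when features more than this profile allows are used" requirement amounts to a set difference. The following is only a minimal sketch of that idea: the class, the function and the feature identifiers are all hypothetical names invented for illustration, not part of any draft profile.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ApplicationProfile:
    """An abstract application profile: a name plus the set of features it allows."""
    name: str
    features: frozenset


def features_beyond_profile(profile, document_features):
    """Return, sorted, the features a document uses that the profile does not allow."""
    return sorted(set(document_features) - profile.features)


# Hypothetical feature identifiers, loosely based on the example profile in the text.
MEDIUM_WP = ApplicationProfile(
    name="Unstructured Word Processor, medium quality",
    features=frozenset({
        "styles", "hyphenation", "kerning", "autonumbering",
        "nested-tables", "hyperlinks", "standard-images", "running-headers",
    }),
)

if __name__ == "__main__":
    # Features detected in some document about to be exported (made-up data).
    used = {"styles", "hyperlinks", "complex-script", "embedded-video"}
    for feature in features_beyond_profile(MEDIUM_WP, used):
        print(f"warning: '{feature}' is outside profile '{MEDIUM_WP.name}'")
```

The same check serves both sides of the contract: procurement can list the features it requires, and an exporting application can warn about anything outside the agreed set.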

Unfortunately, while I am out of Sydney, Standards Australia has scheduled the first meeting to discuss OOXML and our response to it. ECMA has released (to national bodies) the first hundreds of responses to the national body comments, and it would be interesting to discuss these. Actually, Standards Australia has taken a very strong line that only issues we raised in our comments would be discussed (and our comments in turn had a requirement that only issues with an Australian angle made it to our comments lists, which precluded us parroting the form-letter material from an American company).

This SC34 meeting has another angle: it marks a changing of the guard for several key players. ANSI’s Dr Jim Mason has been the chair for (is it?) 20 years now, and has brought a cool head and enormous procedural experience (he has seen a lot of hijinks over the years, I imagine); BSA’s Martin Bryan has been the chair of WG1 since it came out and has been particularly involved recently in several DSDL standards such as ISO DSR; Canadian Standard’s Ken Holman has provided the Secretariat over the last few years, an outstanding effort in view of his multiple cancer onslaughts (successfully repelled, I am happy to say) and especially in view of the recent fast-track palaver. I have in the past urged that companies that champion strategic technologies, especially in fast-tracking, should contribute to secretariat expenses: Microsoft and IBM are doing the wrong thing by not doing this, perhaps out of some spurious daintiness that there would be some conflict of interest (but it is not buying a standard, because NBs vote, not the Secretariat!).

These three come from SGML days and have the strong user-oriented bias that makes ISO-created standards uniquely different from standards from boutique bodies where commercial companies are first-class citizens.

Michael Day

For those of you interested in generating high quality PDFs from web content using CSS, a Google Tech Talk on the subject given by Hakon Lie and myself is now available for your viewing pleasure. Alternatively, if you are in Boston next Tuesday evening, why not drop in for the XML 2007 lightning rounds, where I’ll be talking about printing with CSS for exactly 6 minutes and 40 seconds.

M. David Peterson

*FANTASTIC* commentary from Eric Larson regarding the vectors of communication,

The IonBlog

This reflects the real world in that there are times where casual conversations occur in a casual location. In other words, friends go to mall for a casual good time. At the same time though, the mall does not define the conversation as casual. What defines the communication is the tone and the subject matter. You can have conversations anywhere that are casual. What makes the determination regarding the tone of communication are those involved. Again, this means pushing the interaction and communication to occur between users instead of at a location.

Conceptually this is pretty simple stuff, but in practice it is very complicated. Users need to be able to extend their personality and persona on the web while still allowing the social barriers to be present. Users need to have the ability to reveal different sets of information that define who they are. They also need to reliably reveal private details to specific people. This needs to be simple and automatic from the user’s perspective, which is very difficult.

Really the issue is essentially how can we create systems that let people express who they are and have that persona effect their communication without forcing users to add each and every bit to the picture of who they are. It is a complex problem, but as we continue to push and pull the structure of the web, people will eventually gain the tools they need to improve communication on the web.

I should point out that Eric works with us @ 3rd&Urban, and it’s the OSS 3rd&Urban platform in which we are building and extending our Blip Messaging system. Eric has already found some killer ways to extend the Blip Messaging infrastructure, features that you will see when the platform launches on January 1st. After reading this piece, I can’t wait to see where he takes things from here! :D

M. David Peterson

ongoing · On Communication

You see, if you draw the right graph, maybe you’ll see the gaping hole in it, the Next Big Thing.

[Image: Communication-graph.png]

I don’t know, Tim, but am I the only one that sees blips on a radar screen? ;-)



Who wouldn’t want to expand the human communication spectrum?

Absolutely!

Why aren’t more people thinking about this stuff?

We are. But instead of thinking and talking (Update: That process has been going on for quite some time now), we’re building and delivering.

BTW… Have I ever mentioned just how much I enjoy releasing projects on January 1st of each year? ;-)

(Something tells me 2008 is going to be a big year.)

M. David Peterson

[2:15pm] elarson: xmlhacker: you should patent the use of capital letters in association with the ‘:D’ emoticon during communication in order to protect your communications intellectual property ;-)

The above quote comes from Eric Larson in a recent IRC conversation, after his recent post on “Patent Reform” led to a conversation about the 1-Click ordeal from 7 years ago. (I was working on the Microsoft Passport team at the time, so those aware of that overall situation will understand what I mean when I say I had a first-hand look at just how much fun dealing with the ugliness of the patent system truly is.) Those of you who have ever exchanged an email with me or have been a recipient of an email from one of the mailing lists I have posted to might recognize what Eric is referring to. For those who have not, at the bottom of all of my email communications you will find the following at the beginning of my signature,

/M:D

… which is intended to represent my first and middle initial, my first initial being implicitly bound to xmlns:M=”urn:publicid:Peterson:Patronymic+Surname:EN:1.2″ (<- If you don’t get it, don’t bother asking. It’s not all that clever *OR* funny ;-)) using an XML-ish namespace extension-based syntax.

Of course to those of you who have emoticons turned on in your email reader the above will translate to,

Rick Jelliffe

The surprising decision of the ODF Foundation founders that ODF has foundered and that what we really need is a format that no-one uses (I guess that is real neutrality) has triggered a few interesting blogs recently. Rob Weir has one, Document Format FUD: A Guide for the Perplexed (insert obvious joke here), and Gary Edwards gives his rationale in a comment here.

Edwards’ main bone is that he (and his mob) think that the main game has to be to develop an MS Exchange/SharePoint alternative, because the fight for an Office killer has been lost, both because the applications aren’t good enough and because the standard has not been completed along the lines needed, according to their strategy. His point is pretty cogent, but I think wrong, because even if an alternative to OOXML was developed to allow alternatives to stop the MS Exchange/SharePoint monopoly of Edwards’ prophecy, there is nothing stopping MS from reading or writing the other standard language, once it got traction. A nice little open source project, perhaps, giving MS agility and plausible deniability for lock-in.

I think the anti-OOXML, pro-ODF people and the new anti-ODF, pro-CDF people both get it wrong when they think that file formats can change the balance significantly. Open file formats allow conversion to and from different traditions or technical streams: they reduce technical barriers to adoption and to substitution, which in turn means that other quality and cost aspects dominate (as they should).

However, this quote from Edwards is good (though I am not sure whom he is addressing):

OBTW, where were you when this was going down? If there’s anything we learned in Massachusetts it’s that if we’re going to defeat Microsoft, it will be in the trenches, with real world solutions that are competitive alternatives to MS-OOXML. Blogging MS to death isn’t going to get the job done my friend. When the call goes out for real world solutions, as it did with the Massachusetts RFi, you’ve got to show up with more than your keyboard and blog.

(IMHO there is a requirement for pro-active laws on long-term superprofits. Without this kind of legislative action to correct markets, file formats are just fiddling while Rome burns.)

Rick Jelliffe

From two different sources this week comes the news that ISO and IEC have found no substance to the scare tactics on IPR in the OOXML draft DIS 29500.

This came to me first in an email from a Standards Australia official, then also in Alex Brown’s BRM FAQ, which says:

4.1 Will IPR issues be discussed at the BRM?

No. IPR issues in this process are the exclusive preserve of the ITTF. IPR decisions have previously been delegated by all the ISO and IEC members (NBs) to the CEOs of IEC and ISO, and they in turn have examined them and found no outstanding problems. NBs seeking reassurance in such matters must pursue them through other avenues than the BRM.

Now it is good to be clear here: if you want OOXML to be a specification that allows complete reverse engineering of MS Office 2007, then you will find a lot of shortcomings with DIS 29500, particularly in that it just ignores things happening outside the XML such as in media files. However, that is explicitly not the purpose of DIS 29500: its purpose is to document a file format which is the native format of Office 2007 and has been designed to expose as XML all the information previously carried in MS’ closed, binary and/or proprietary formats (with some antiquities and bugs cleaned up, and with some recent parts as befits a living standard). The worries about IPR often relate to these non-DIS 29500 aspects, which belong to some other debate (though certainly not to no debate).

Rick Jelliffe

There is something about XML that makes people go crazy, in particular people trying to make standards: it’s that ol’ tag fever agin, Maude. I think I know what that thing is: the emphasis on standards = good, combined with the desire for complete schemas and the idea that organizing schemas by namespace is the way to shoehorn requirements (rather than being a way of expressing results).

The result: vocabularies where unnecessary order and structuring constraints are given. You can tell when a standard schema is over-specified, because people using it will just snip out the low-level elements they need and plonk these in their own home-made container elements.

I have noticed this in a few schemas I have been working with recently: in fact, the trend I notice is that people start off with their own home-made schema, then “adopt” the standard by finding any elements that have close semantics to their home-made elements, and changing the name of the home-made element to the standard name. SVG in ODF looks like an example of this, and there is another standard I have been working with recently that has the same issue: when you adopt arbitrary portions of a cohesive standard, are you really using or abusing that standard?

I suppose there is a case to be made that transitional schemas should be treated seriously.

One software engineering idea that has stuck with me over the last years (which I wrote about in The XML & SGML Cookbook) is the twinning of cohesion and coupling. Basically, when some information is highly cohesive (think of Eve Maler’s Information Units), i.e., it belongs together semantically and would not make much sense in isolation, it deserves an official container.

Conversely, you should try to reduce coupling of information that is not cohesive.

A rule of thumb for many situations is that industry standard groups (and, indeed, in-house schema developers) may be well advised to standardize data elements eagerly but container elements suspiciously: standardize the jellybeans, not the jars. The next bloke may like your jellybeans but have his own jars.

Various approaches to do this come to mind: think in terms of creating a vocabulary rather than a language; split your industry standard in two, with the tightly coupled elements in one normative section and the loosely-coupled elements in another non-normative section, perhaps with different namespaces even; use open content models and order-independence for loosely-coupled elements.

Another upside of this approach is that it reduces the number of trivial issues for committee members to get excited about.

M. David Peterson

Apparently not anymore.

Microsoft XML Team’s WebLog : Chris Lovett Interview

As for XSLT 2.0 - we’ve heard from customers and understand the improvements in XSLT 2.0 over XSLT 1.0, but right now we’re in the middle of a big strategic investment in LINQ and EDM for the future of the data programming platform which we think will create major improvements in programming against all types of data.

Some advice to those of you considering upgrading to VS.NET 2008: Don’t waste your time.

Oh, and regarding,

But we are always re-evaluating our technology investments so if your readers want to ramp up their volume on XSLT 2.0 please ask them to drop us a line with their comments.

Drop you a line? Some advice to those who think it might actually make a difference: I’ve tried that. As already mentioned, don’t waste your time.

NOTE-TO-SELF: When folks you have reason to trust such as Mike Champion and Alex Barnett start leaving any given team @ MSFT, take this as a sign: Don’t waste your time trying to get through to the Neanderthals they used to report to. Quite obviously they no longer report to these fools for a reason.

DISCLAIMER: I have no clue why Mike or Alex left the Microsoft XML team. I only know that when they left all the goodness they brought to the XML team left with them.

Trust is a hard thing to earn, Microsoft. No doubt I’m not the only one on this planet who no longer feels trust is something you are worthy of. At least not as it relates to the XML team. Fortunately for the rest of us we have better options, e.g. Saxonica and Oxygen. And no doubt with MSFT no longer “threatening” to release an XSLT 2.0 processor and tools to support that processor there are others with a clear vision of the future who will step in and begin building more/better/faster processors, more/better/faster tools, and ultimately leave MSFT realizing that losing people’s trust is really a bad business decision to make, though I doubt you’re going to hear Dr. Kay or George Christian Bina complaining anytime soon as their business opportunities just got a whole lot bigger.

Folks, if you want the best XML processing and development tools on the planet, don’t bother wasting your time OR your money w/ MSFT. Look elsewhere. At least that’s my opinion. No doubt you have your own.

M. David Peterson

Campaign Widget | Creative Commons

Help support CC by putting this widget on your website, blog, or social network!


M. David Peterson

Bitch, Moan, Cry, and in other forms make a complete and total fool of yourself in public.

But it’s worth it…

Desktop Team - by Desktop Team

Added support for the XSLT document() function (poke to xmlhacker ;-) )

*YES*!!! :D

So here’s the thing: I haven’t made it any secret here on XML.com and elsewhere that I am a *HUGE* fan of Opera. In fact, the only thing that has kept me from proclaiming Opera as the undisputed winner** in the Browser Wars is the fact that they’ve been missing support for some key pieces of the XSLT spec. That has now changed, and while there are bound to be bugs (this is a weekly build), I am now making it official,

Opera is the *KING* of the Web Browser world. Nothing else even comes close.

Believe it! ;-)

Thank you, Opera!!! :D

Next Up: The Top 10 Reasons Why Opera Is The Best Browser On The Planet.

** In fact, they’ve been the undisputed winner since about 1997. How and why they don’t pwnz the browser market is a complete and total mystery to me. My new task in life: To help change that by evangelizing each and every product that comes from the *KING* of the Web Browser company: Opera. If you haven’t already, please download Opera (stable, beta, latest weekly w/ document() function support) and when it asks if you would like to make it your default browser > Say *YES!*.

Update: Phreakin’ beautiful,

What’s even more beautiful is that Opera’s XSLT error reporting tool is so good, I was able to pinpoint a silly little mistake in my code that was preventing the above link from transforming correctly. (I had the output method set to ‘xml’, but was using the HTML public identifier literal. Changing the output method from xml to html fixed the problem.) Of course, you might wonder why this little error didn’t get caught by some of the other browsers, and the answer is quite simple,

Opera is a *STANDARDS COMPLIANT* browser company. They write their code to comply with the rules specified in the related standards doc. And yes, it really is that simple.

Thanks for the kick a$$ *STANDARDS COMPLIANT* browser and browser dev/debugging tools, Opera!!! :D

Rick Jelliffe

Noah Mendelsohn on XML-DEV got me thinking again about the obvious dual of the recent blogs here: how to convert Schematron into XSD. Putting aside the natural question of why we might want to do this, here’s my stab at an answer.

Because Schematron is more powerful and more general than XSD, and because it uses different abstractions (phases and patterns rather than a grammar), it is not possible to convert every arbitrary Schematron schema into a useful schema in XSD.

However, it is certainly possible to devise some conventions which allow translation to some extent.

I’ll leave out the abstract declarations, but here is an example for HTML of how it might look. First let’s treat the different types of complex content as if they were just like facets.

<sch:pattern role="Declarations">
<sch:rule context="xhtml:html"  role="element-declaration">
   <sch:rule extends="container"  role="element-content-type" />
   <sch:rule extends="metadata" role="attribute-group-reference" />
</sch:rule>

<sch:rule context="xhtml:head"  role="element-declaration">
   <sch:rule extends="container"  role="element-content-type" />
   <sch:rule extends="metadata" role="attribute-group-reference" />
</sch:rule>

<sch:rule context="xhtml:meta"  role="element-declaration">
   <sch:rule extends="empty"  role="element-content-type" />
</sch:rule>

<sch:rule context="xhtml:p"  role="element-declaration">
   <sch:rule extends="mixed"  role="element-content-type" />
   <sch:rule extends="metadata" role="attribute-group-reference" />
   <sch:rule extends="css" role="attribute-group-reference" />
</sch:rule>

<sch:rule context="xhtml:span"  role="element-declaration">
   <sch:rule extends="text"    role="element-content-type" /> <!-- for example-->
   <sch:rule extends="css" role="attribute-group-reference" />
</sch:rule>

<sch:rule context="xhtml:img"  role="element-declaration">
   <sch:rule extends="empty"  role="element-content-type" />
   <sch:rule extends="css" role="attribute-group-reference" />
</sch:rule>

<sch:rule context="xhtml:img/@width"  role="attribute-declaration">
   <sch:rule extends="dimension-type"  role="attribute-simple-type" />
</sch:rule>

<sch:rule context="xhtml:img/@height"  role="attribute-declaration">
   <sch:rule extends="dimension-type"  role="attribute-simple-type" />
</sch:rule>

</sch:pattern>

This gives us the framework. In this pattern, we use abstract rules for a mix-in style of multiple inheritance.

You should be able to see how each of these can be mechanically converted into partial XSD declarations for elements and attributes, such as

<xsd:element name="html">
   <xsd:complexType mixed="false">
      <xsd:group ref="open" />
      <xsd:attributeGroup ref="metadata" />
   </xsd:complexType>
</xsd:element>

<xsd:element name="meta">
   <xsd:complexType mixed="false">
      <xsd:attributeGroup ref="open" />
   </xsd:complexType>
</xsd:element>

<xsd:element name="head">
   <xsd:complexType mixed="false">
      <xsd:group ref="open" />
      <xsd:attributeGroup ref="metadata" />
   </xsd:complexType>
</xsd:element>

<xsd:element name="p">
   <xsd:complexType mixed="true">
      <xsd:group ref="open" />
      <xsd:attributeGroup ref="metadata" />
      <xsd:attributeGroup ref="css" />
   </xsd:complexType>
</xsd:element>

<xsd:element name="span">
   <xsd:complexType mixed="true">
      <xsd:attributeGroup ref="css" />
   </xsd:complexType>
</xsd:element>

<xsd:element name="img">
   <xsd:complexType mixed="false">
      <xsd:attribute name="height" type="dimension-type" />
      <xsd:attribute name="width" type="dimension-type" />
   </xsd:complexType>
</xsd:element>

So far so good. To get better control of optionality of attributes, we could extend a pattern with a different name, I suppose. But the idea is the same: we use the sch:role attribute to provide the metadata needed to allow an effective transformation.
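To make the role-driven transformation concrete, here is a minimal Python sketch of such a converter. It is an illustration under assumptions rather than a real tool: the function name `convert`, the trimmed two-element sample, and the mapping of `container` to a group reference named `open` are my own choices, and it handles only the `element-declaration`, `element-content-type` and `attribute-group-reference` roles used in the pattern above.

```python
import xml.etree.ElementTree as ET

SCH = "http://purl.oclc.org/dsdl/schematron"
XSD = "http://www.w3.org/2001/XMLSchema"
ET.register_namespace("xsd", XSD)

# A trimmed copy of the declarations pattern, using the same conventions.
SAMPLE = f"""
<sch:pattern xmlns:sch="{SCH}" role="Declarations">
  <sch:rule context="xhtml:html" role="element-declaration">
    <sch:rule extends="container" role="element-content-type"/>
    <sch:rule extends="metadata" role="attribute-group-reference"/>
  </sch:rule>
  <sch:rule context="xhtml:p" role="element-declaration">
    <sch:rule extends="mixed" role="element-content-type"/>
    <sch:rule extends="metadata" role="attribute-group-reference"/>
    <sch:rule extends="css" role="attribute-group-reference"/>
  </sch:rule>
</sch:pattern>
"""


def convert(pattern_xml):
    """Build an xsd:schema element from role-annotated Schematron rules."""
    pattern = ET.fromstring(pattern_xml)
    schema = ET.Element(f"{{{XSD}}}schema")
    for rule in pattern.findall(f"{{{SCH}}}rule"):
        if rule.get("role") != "element-declaration":
            continue  # attribute declarations are left out of this sketch
        # Drop the namespace prefix from the context to get the element name.
        local_name = rule.get("context").split(":")[-1]
        element = ET.SubElement(schema, f"{{{XSD}}}element", name=local_name)
        ctype = ET.SubElement(element, f"{{{XSD}}}complexType", mixed="false")
        for ref in rule.findall(f"{{{SCH}}}rule"):
            role, target = ref.get("role"), ref.get("extends")
            if role == "element-content-type":
                if target in ("mixed", "text"):
                    ctype.set("mixed", "true")
                elif target == "container":
                    # Assumed convention: containers get the "open" model group.
                    ET.SubElement(ctype, f"{{{XSD}}}group", ref="open")
            elif role == "attribute-group-reference":
                ET.SubElement(ctype, f"{{{XSD}}}attributeGroup", ref=target)
    return schema


if __name__ == "__main__":
    print(ET.tostring(convert(SAMPLE), encoding="unicode"))
```

The sketch relies on the content-type rule coming before the attribute-group rules, since XSD wants the model group ahead of the attribute groups inside a complexType; a fuller converter would sort the children itself.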

What about content models? In the version above, any element with subelements just has an “open” content model, presumably declared with some wildcard.

This is where abstract patterns can come in. First let’s alter the schema we generate to make groups with the same name as the element.

<xsd:element name="html">
   <xsd:complexType mixed="false">
      <xsd:group ref="html" />
      <xsd:attributeGroup ref="metadata" />
   </xsd:complexType>
</xsd:element>

<xsd:element name="meta">
   <xsd:complexType mixed="false">
      <xsd:attributeGroup ref="open" />
   </xsd:complexType>
</xsd:element>

<xsd:element name="head">
   <xsd:complexType mixed="false">
      <xsd:group ref="head" />
      <xsd:attributeGroup ref="metadata" />
   </xsd:complexType>
</xsd:element>

<xsd:element name="p">
   <xsd:complexType mixed="true">
      <xsd:group ref="p" />
      <xsd:attributeGroup ref="metadata" />
      <xsd:attributeGroup ref="css" />
   </xsd:complexType>
</xsd:element>

<xsd:element name="span">
   <xsd:complexType mixed="true">
      <xsd:attributeGroup ref="css" />
   </xsd:complexType>
</xsd:element>

<xsd:element name="img">
   <xsd:complexType mixed="false">
      <xsd:attribute name="height" type="dimension-type" />
      <xsd:attribute name="width" type="dimension-type" />
   </xsd:complexType>
</xsd:element>

We could have:

<sch:pattern is-a="container"   role="group-declaration">
  <sch:param  name="context" value="xhtml:html"/>
  <sch:param  name="content-model" value="( xhtml:head?, xhtml:body )"/>
  <sch:param  name="required-children" value="xhtml:body"/>
  <sch:param  name="optional-children" value="xhtml:head"/>
</sch:pattern>

<sch:pattern is-a="container"   role="group-declaration">
  <sch:param  name="context" value="xhtml:head"/>
  <sch:param  name="content-model" value="( xhtml:title, xhtml:meta*, (xhtml:style | xhtml:script)* )" />
  <sch:param  name="required-children" value="xhtml:title"/>
  <sch:param  name="optional-children" value="xhtml:meta | xhtml:style | xhtml:script"/>
</sch:pattern>

<sch:pattern is-a="container"   role="group-declaration">
  <sch:param  name="context" value="xhtml:p"/>
  <sch:param  name="content-model" value="( xhtml:i | xhtml:b | xhtml:span | xhtml:img )*"/>
  <sch:param  name="required-children" value=""/>
  <sch:param  name="optional-children" value="xhtml:i | xhtml:b | xhtml:span | xhtml:img"/>
</sch:pattern>

The purpose of abstract patterns is to allow patterns to be parameterized to bring out and name the parts of the patterns of interest in the schema-developer’s world-view. Create your own schema language!

In this case, we have three uses of abstract patterns, and the role attribute tells our notional converter program that we want to generate an xsd:group from these. (We are giving the element content model in the conventional syntax, so some converter would have to translate between syntaxes, but that is just mechanical.) The final two parameters give (actually repeat) the information in the content model in a way that would be more conducive for use in an XPath.

So our converter program would take these and generate

<xsd:group name="html">
   <xsd:sequence>
      <xsd:element ref="head" minOccurs="0" />
      <xsd:element ref="body" />
   </xsd:sequence>
</xsd:group>

and so on. Of course, there would be many other approaches possible. But you should be able to see from this example how Schematron can, in fact, be highly declarative, if that is what you want.
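The "just mechanical" translation from the conventional content-model syntax to an xsd:group can be sketched as a small recursive-descent parser. This is only one possible sketch: the names `parse_particle` and `content_model_to_group` are hypothetical, namespace prefixes are simply stripped from element names, and error handling is minimal.

```python
import re
import xml.etree.ElementTree as ET

XSD = "http://www.w3.org/2001/XMLSchema"
ET.register_namespace("xsd", XSD)

# Tokens are punctuation or names; whitespace and other characters are skipped.
TOKEN = re.compile(r"[(),|?*+]|[\w:.-]+")
NAME = re.compile(r"[\w:.-]+")
OCCURS = {"?": ("0", "1"), "*": ("0", "unbounded"), "+": ("1", "unbounded")}
END = "<end>"  # sentinel so the parser never reads past the token list


def parse_particle(tokens, pos, parent):
    """Parse one name or parenthesized group under parent; return the next position."""
    if tokens[pos] == "(":
        node = ET.SubElement(parent, f"{{{XSD}}}sequence")
        pos = parse_particle(tokens, pos + 1, node)
        separator = None
        while tokens[pos] in (",", "|"):
            if separator not in (None, tokens[pos]):
                raise ValueError("cannot mix ',' and '|' in one group")
            separator = tokens[pos]
            pos = parse_particle(tokens, pos + 1, node)
        if tokens[pos] != ")":
            raise ValueError(f"expected ')' but found {tokens[pos]!r}")
        if separator == "|":
            node.tag = f"{{{XSD}}}choice"
        pos += 1
    elif NAME.fullmatch(tokens[pos]):
        # Strip any namespace prefix to get the name used in the ref attribute.
        node = ET.SubElement(parent, f"{{{XSD}}}element", ref=tokens[pos].split(":")[-1])
        pos += 1
    else:
        raise ValueError(f"expected a name or '(' but found {tokens[pos]!r}")
    if tokens[pos] in OCCURS:
        low, high = OCCURS[tokens[pos]]
        if low != "1":
            node.set("minOccurs", low)
        if high != "1":
            node.set("maxOccurs", high)
        pos += 1
    return pos


def content_model_to_group(name, model):
    """Translate a DTD-style content model string into a named xsd:group element."""
    tokens = TOKEN.findall(model) + [END]
    group = ET.Element(f"{{{XSD}}}group", name=name)
    wrapper = ET.SubElement(group, f"{{{XSD}}}sequence")
    pos = parse_particle(tokens, 0, wrapper)
    if tokens[pos] != END:
        raise ValueError(f"unexpected trailing token {tokens[pos]!r}")
    top = wrapper[0]
    # The model group directly inside a named xsd:group may not carry occurrence
    # attributes, so keep the wrapper unless the top particle is a bare group.
    if not top.attrib:
        group.remove(wrapper)
        group.append(top)
    return group


if __name__ == "__main__":
    html = content_model_to_group("html", "( xhtml:head?, xhtml:body )")
    print(ET.tostring(html, encoding="unicode"))
```

For the html pattern this produces a sequence with an optional head followed by a required body. The extra wrapping sequence appears only for models such as the one for p, where the whole group carries a repetition indicator.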

Rick Jelliffe

I have two public speaking engagements in Sydney this week at the Open Standards ‘07 Conference. This is a conference mainly concerned with data interchange standards to allow open systems, in particular XML of course: so sessions on HL7, UBL, EDI, this year with a good focus on government as well as business standards.

On Wednesday I have a half-day tutorial Office Document Standards which will be an under-the-hood look at OOXML and ODF.

And on Friday 2:30 I have a paper The Drive to Openness: Open Source, Open Standards, Open Systems, which looks at the drivers for openness and the connection with the need for transparency and better governance.

So if you are there, please feel free to have a chat!

Rick Jelliffe

Brian Reid, the old fuddy-duddy fighting back, turns out to be the Brian Reid, of Scribe fame. Scribe was an early word-processing application that was one of the first practical and public tools for showing that descriptive markup, rather than procedural markup, was workable.

The story is that IBM’s Charles Goldfarb was due to make his big presentation on their GML system to a conference in Switzerland in 1981, but Reid had a paper at the same conference which turned out to present a lot of the same material: stolen thunder! GML morphed into SGML and now XML, while Scribe influenced the direction of word processors by showing the practicality of styles (think CSS!).

Reid revisited his 1981 paper at an SGML/XML conference keynote in 1998, which is still online though large (10 MB PPT?). The paper includes some interesting thoughts on why markup is wrong-headed.

Reid’s dispute with Google is quite interesting to me. Last year or so, I was looking around to see whether there were any interesting opportunities in Sydney, and Google contacted me to come in for an interview. When I arrived, it turned out to be a rather odd interview for a programming job which was pretty much unconnected with anything I had been doing for the last 20 years, but the questions would have been good for a recent graduate, sort of like interviewing Donald Duck for a position as an egg, if you know what I mean; the people seemed super nice, but I think there was a bit of mutual mystification as to why I was there. The impression I got was very much of a mono-culture: the founders wanted people like the founders. It seemed that standards were not on the company’s horizon at all.

But perhaps this is a new way to break up monopolies: everyone above 35 goes to one company, all the rest go to another!

Simon St. Laurent

You don’t have to follow The Oil Drum to know that energy prices just keep climbing. Even if supply holds up, huge demand will make prices a problem for a long time to come. Can the Internet help reduce that demand?

M. David Peterson

Watch to the very end to understand the title.

TED | Talks | Larry Lessig: How creativity is being strangled by the law (video)

Larry Lessig gets TEDsters to their feet, whooping and whistling, following this elegant presentation of “three stories and an argument.” The Net’s most adored lawyer brings together John Philip Sousa, celestial copyrights, and the “ASCAP cartel” to build a case for creative freedom. He pins down the key shortcomings of our dusty, pre-digital intellectual property laws, and reveals how bad laws beget bad code. Then, in an homage to cutting-edge artistry, he throws in some of the most hilarious remixes you’ve ever seen.

M. David Peterson

Open Handset Alliance

Android™ will deliver a complete set of software for mobile devices: an operating system, middleware and key mobile applications. On November 12, we will release an early look at the Android Software Development Kit (SDK) to allow developers to build rich mobile applications.

Open
Android was built from the ground-up to enable developers to create compelling mobile applications that take full advantage of all a handset has to offer. It is built to be truly open. For example, an application could call upon any of the phone’s core functionality such as making calls, sending text messages, or using the camera, allowing developers to create richer and more cohesive experiences for users. Android is built on the open Linux Kernel. Furthermore, it utilizes a custom virtual machine that has been designed to optimize memory and hardware resources in a mobile environment. Android will be open source; it can be liberally extended to incorporate new cutting edge technologies as they emerge. The platform will continue to evolve as the developer community works together to build innovative mobile applications.

All applications are created equal
Android does not differentiate between the phone’s core applications and third-party applications. They can all be built to have equal access to a phone’s capabilities providing users with a broad spectrum of applications and services. With devices built on the Android Platform, users will be able to fully tailor the phone to their interests. They can swap out the phone’s homescreen, the style of the dialer, or any of the applications. They can even instruct their phones to use their favorite photo viewing application to handle the viewing of all photos.

Breaking down application boundaries
Android breaks down the barriers to building new and innovative applications. For example, a developer can combine information from the web with data on an individual’s mobile phone — such as the user’s contacts, calendar, or geographic location — to provide a more relevant user experience. With Android, a developer could build an application that enables users to view the location of their friends and be alerted when they are in the vicinity giving them a chance to connect.

Fast & easy application development
Android provides access to a wide range of useful libraries and tools that can be used to build rich applications. For example, Android enables developers to obtain the location of the device, and allow devices to communicate with one another enabling rich peer-to-peer social applications. In addition, Android includes a full set of tools that have been built from the ground up alongside the platform providing developers with high productivity and deep insight into their applications.

M. David Peterson

As per the title, this is the first thing that came to mind when I read the following post from earlier today from Dimitre Novatchev regarding transitive closures,

M. David Peterson

Just came across what seems like an interesting podcast,

OAuth with Larry Halff, Eran Hammer-Lahav and Chris Messina - Bungee Connect Developer Network

Overview
Three of the minds behind the OAuth initiative, Chris Messina, Larry Halff and Eran Hammer-Lahav, join us to tell us about this emerging “open protocol to allow secure API authentication in a simple and standard method from desktop and web applications.”
54:32, 25 MB

Nice! Am listening to it now and it definitely seems worth a listen.

Update: Yo Alex: You’re right, I do need a haircut.

/me is adding “get a damn! haircut” to my task list for the day. ;-)

Rick Jelliffe

Now we come to the most interesting part: how do we generate Schematron schemas that implement the constraints from an XML Schema? The question often comes up of whether Schematron is strictly more powerful than XML Schemas or just often so. Some academics have offered tentative opinions, and the conclusion I had reached was that it probably was not: for any implementation in Schematron you could probably make a content model so baroque and monstrous that Schematron would not capture some aspect of it. But you would have to try hard.

However, with Schematron using XPath2 or XSLT2 as its query language (rather than the default XSLT1), things are much clearer: I think there is a really simple technique available that captures all the cardinality, optionality, and sequence constraints.

The Regular Expression technique

The technique? Convert a content model into a regular expression; build a string that contains, as space-separated tokens, the name of each child element found in the instance; then validate that string against the regular expression! The regular expression language used in XSLT2 allows sequence, choice, cardinality, and repetition, and these are the same as in XSD. You don’t need a special finite-state automaton library if you have the regular expression library. (The special cases of xsi:type and substitution groups are no problem: xsi:type could be handled by another pattern, because the element must still be valid against the declared type, while substitution groups can be handled during the prior pre-processing and be long gone by this stage. Nillability is not something I have thought about much: it certainly can be done, but I don’t know the impact on the XPaths.)
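To make the idea concrete, here is a minimal sketch of it in Python rather than XSLT2. The content model (title, author?, para+) is made up for illustration, and the names `CONTENT_MODEL`, `child_names` and `matches_content_model` are assumptions, not part of any converter; a real implementation would generate the expression from the XSD.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical content model (title, author?, para+), written as a regular
# expression over element names that are each followed by one space.
CONTENT_MODEL = re.compile(r"(title )(author )?(para )+")


def child_names(element):
    """Return the child element names as one string of space-terminated tokens."""
    return "".join(child.tag + " " for child in element)


def matches_content_model(element):
    """True only if the whole token string matches the expression."""
    return CONTENT_MODEL.fullmatch(child_names(element)) is not None


good = ET.fromstring("<doc><title/><para/><para/></doc>")
bad = ET.fromstring("<doc><para/><title/></doc>")
print(matches_content_model(good), matches_content_model(bad))
```

Terminating every name with a space keeps one name from matching as a prefix or suffix of another.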

So, if we wanted to, we could implement this in our XSD to Schematron converter and say hooray.

But the problem is that even though we could validate all the constraints, we would get lousy diagnostics. What would be the point? I guess it would still be useful as a fallback, as another phase for confidence building and to check whether anything had fallen between the cracks of the method we will be using, but it is not interesting enough to me for us to plan to implement it. If someone else wants to implement it in XSLT and contribute it, it might be a fun and small project!

We could break the regular expression in various interesting ways, however: we could make one version of it that made everything optional, and so implement feasible validation, as found for example in Jing for RELAX NG. (However, there might be ambiguity issues here, so it might not work each time.) Feasible validation is an approach I came up with a couple of years ago, based on the idea that it can be useful to validate only certain constraints: very often you might mark up a document and fill in the metadata at the last stage, so you don’t want validation to fail because of some problem at the start of an element’s children when you are working on subsequent elements. A validator should not dictate a workflow!
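As a rough illustration of the "make everything optional" variant, the sketch below relaxes a made-up content model (title, author?, para+) by hand. The names `STRICT`, `FEASIBLE`, `is_valid` and `is_feasible` are assumptions for this example only, and the relaxation shown is just the simplest one; it is not a description of how Jing does it.

```python
import re

# Hypothetical content model (title, author?, para+) over space-terminated names.
STRICT = re.compile(r"(title )(author )?(para )+")
# The same model with every particle made optional.
FEASIBLE = re.compile(r"(title )?(author )?(para )*")


def is_valid(names):
    """True if the child-name string satisfies the full content model."""
    return STRICT.fullmatch(names) is not None


def is_feasible(names):
    """True if the string could still become valid by adding elements."""
    return FEASIBLE.fullmatch(names) is not None


draft = "para para "         # the title has not been written yet
wrong_order = "para title "  # adding elements can never fix this order
```

Here the draft is feasibly valid but not valid, while the wrongly ordered children fail both checks.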

Validators in editors frequently implement partial validation, where they don’t complain about child elements missing at the end of a content model. It is useful if you are entering the document in element order, but not otherwise.

Now another approach with the regular expression method would be to break it into smaller expressions, for example a string of three tokens with anything allowed before or after: trigrams. Indeed, that is pretty similar to what we do later, in effect, but not using regular expressions.

A more Schematron-ish way

So what is the Schematron-ish way to approach the problem? Well, it is to concentrate on two things: first, What is the most useful way of expressing and organizing diagnostics to help the user? and second, What is the model of user interaction built into the schema? Actually, in my opinion, you cannot answer the first without answering the second, and the second dictates the first.

Rather than talk theory, I’ll show you the approach and you should be able to figure out what I mean by user interaction and so on, with these use cases.

Use Cases

  • The user wants to check for typos: names that are spelled incorrectly
  • The user wants to check for containment: that elements and attributes belong to the correct parents
  • The user wants to check that all required elements and attributes are present
  • The user wants to check that each element is in the required position

This is another example of progressive validation. It allows the user to systematically find certain kinds of mistakes, and partitions them off. Because Schematron will usually report all the errors it finds anywhere in a document, it has the advantage that it is very easy to see systematic errors, if they are presented together; grammar-based validators often just die at the first error. But assertion-based schemas using paths may generate too many diagnostics, as the same error causes multiple assertions to fail.

So Schematron has a feature called phases. Phases let you group some patterns together, give them a name, and then you can instruct the validator to only validate the patterns in that phase. This allows workflows, progressive validation, incremental markup, transformation checking, variant document types, and so on. Very useful.
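The mechanics of phases can be sketched in a few lines of Python. This is only an illustration: the two checking functions, the declared names and the `validate` helper are hypothetical stand-ins for real patterns, though the pattern ids and `#ALL` follow Schematron naming.

```python
def check_typos(names):
    """Report names that are not declared (hypothetical declared set)."""
    declared = {"title", "para"}
    return [f"unknown element: {n}" for n in names if n not in declared]


def check_required(names):
    """Report a missing required name (hypothetical requirement)."""
    return [] if "title" in names else ["missing element: title"]


# Pattern ids mapped to the checks they stand for.
PATTERNS = {
    "Element_Name_Typo": check_typos,
    "Element_Name_Required": check_required,
}

# A phase is just a named selection of patterns.
PHASES = {
    "phase-typo": ["Element_Name_Typo"],
    "#ALL": list(PATTERNS),
}


def validate(names, phase="#ALL"):
    """Run only the patterns that are active in the chosen phase."""
    messages = []
    for pattern_id in PHASES[phase]:
        messages.extend(PATTERNS[pattern_id](names))
    return messages
```

Calling `validate(["para", "titel"], "phase-typo")` reports only the misspelled name, while the default phase also reports the missing title.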

Each of these use cases may take one or more patterns to implement; however, we will make a phase for each of them. (Actually, we have gone phase craaazy, which will be in a later posting.) Here is the phase declaration to validate just the typos, for example:

<sch:phase id="phase-typo">
	<sch:p>This phase has all the patterns for checking typos in names.</sch:p>
	<sch:active pattern="Element_Name_Typo">
		Pattern for checking for typos in element names.
	</sch:active>
	<sch:active pattern="Attribute_Name_Typo">
		Pattern for checking for typos in attribute names.
	</sch:active>
</sch:phase>

As you can see, we are not validating using a state machine or similar grammar system at all.

The patterns

Here are the patterns; we have factored out the guts to make the commonality between this boilerplate more obvious.

<!-- pattern 5: Element name typos -->
<xsl:comment>
	============================================================
	                       ELEMENT NAMES
	============================================================
</xsl:comment>

<sch:pattern id="Element_Name_Typo">
	<sch:title>Typos in Element names</sch:title>
	<xsl:call-template name="generate-elements-typo-checking-rule"/>
</sch:pattern>

<sch:pattern id="Element_Name_Expected">
	<sch:title>Expected in Element names</sch:title>
	<xsl:call-template name="generate-elements-expected-checking-rule"/>
</sch:pattern>

<sch:pattern id="Element_Name_Required">
	<sch:title>Required in Element names</sch:title>
	<xsl:call-template name="generate-elements-required-checking-rule"/>
</sch:pattern>

<!-- pattern 6: Attribute name typos -->
<xsl:comment>
	============================================================
	                       Attributes NAMES
	============================================================
</xsl:comment>

<sch:pattern id="Attribute_Name_Typo">
	<sch:title>Typos in Attributes names</sch:title>
	<xsl:call-template name="generate-attributes-typo-checking-rule"/>
</sch:pattern>

<sch:pattern id="Attribute_Name_Expected">
	<sch:title>Expected in Attributes names</sch:title>
	<xsl:call-template name="generate-attributes-expected-checking-rule"/>
</sch:pattern>

<sch:pattern id="Attribute_Name_Required">
	<sch:title>Required in Attributes names</sch:title>
	<xsl:call-template name="generate-attributes-required-checking-rule"/>
</sch:pattern>

Typos

The typo patterns are very easy. Here is the one for elements.

<xsl:template name="generate-elements-typo-checking-rule">
	<!-- one rule per declared element name; its assertion always passes -->
	<xsl:for-each select="//xs:element[@name]">
		<xsl:sort select="@name"/>
		<sch:rule context="{@name}">
			<sch:assert test="true()">
			The <sch:name/> element is defined in this schema.</sch:assert>
		</sch:rule>
	</xsl:for-each>
	<!-- catch-all rule: any element not matched by a rule above is reported -->
	<sch:rule context="*">
		<sch:report test="true()" diagnostics="typo-element">
		Only elements declared in the schema may be used.</sch:report>
	</sch:rule>
</xsl:template>

In this case, we generate a rule for each element, but with only a vacuous true() assertion test; there is still a useful assertion behind it though, in the assertion text. Elements with typos fall through and are caught by the wildcard test of the last rule.

And finally, here is a simple diagnostics element, to report the miscreant.

	<sch:diagnostic id="typo-element">
		The following element was found <sch:name/>.
	</sch:diagnostic>
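The fall-through behaviour of the typo pattern can be imitated outside Schematron. The following Python sketch assumes a hypothetical set of declared names (`DECLARED`) and a made-up helper `find_typos`; it is meant only to show the logic, not the generated schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical names, as if collected from xs:element/@name in a schema.
DECLARED = {"doc", "title", "para"}


def find_typos(root):
    """Return the names of elements that no declared-name rule would match."""
    typos = []
    for element in root.iter():
        if element.tag in DECLARED:
            continue  # a specific rule fired; its true() assertion always passes
        typos.append(element.tag)  # fell through to the wildcard rule
    return typos


doc = ET.fromstring("<doc><title/><paar/><para/></doc>")
print(find_typos(doc))
```

Every misspelled name is collected in one pass, which is what makes systematic typos easy to spot.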

We’ll continue with handling more of the use cases in another blog.

Rick Jelliffe

A year ago I wrote in this blog a precursor to this series, Converting Content Models to Schematron, in which I outlined one approach. This blog item is an update on that, in particular for the special cases; clearing the decks of them leaves us free to look at XML content models:

  • Empty elements
  • Text content (untyped)
  • Element content
  • XSD ALL content models

Empty Elements

Empty elements are easy. (Update: 2007-11-09)

<xsl:template match="xs:element[xs:complexType
                    [not(xs:simpleContent)]
                    [not(@mixed='true')]
                    [not(.//xs:element)]]"        priority="100">
	<sch:rule>
		<xsl:call-template name="generate-element-context"/>
		<xsl:comment>Check Empty Elements: They can't have
			1, text nodes 2, elements 3, comments 4, processing-instructions </xsl:comment>
		<sch:assert test="count(*|processing-instruction()|comment()|text()) = 0" diagnostics="d1">
		Element <sch:name/> should have no content.</sch:assert>
	</sch:rule>
</xsl:template>

Text Elements (Untyped)

Text elements are easy too.

<xsl:template match="xs:element[xs:complexType[xs:simpleContent]]" priority="99">
	<sch:rule>
		<xsl:call-template name="generate-element-context"/>
		<xsl:comment>Check Text Only: They can't have
			1, elements </xsl:comment>
		<sch:assert test="count(*) = 0" diagnostics="d1">
		Element <sch:name/> should have text content and attributes only, but no sub-elements.
		(It may have processing instructions and comments.)</sch:assert>
	</sch:rule>
</xsl:template>

Element Content

For elements with element-only content, we’ll just check in this pattern that they don’t have text. (We will check whether the child elements are allowed in a different pattern, in a future blog.)

<xsl:template match="xs:element
                    [xs:complexType[not(@mixed='true')][not(xs:simpleContent)]]" priority="98">
	<sch:rule>
		<xsl:call-template name="generate-element-context"/>
		<xsl:comment>Check no text found: They can't have
			1, any text content </xsl:comment>
		<sch:assert test="string-length(normalize-space(string-join(text(), ''))) = 0" diagnostics="d1">
		Element <sch:name/> should have no text content.</sch:assert>
	</sch:rule>
</xsl:template>
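For comparison, the three checks above can be written directly against an instance document. This Python sketch is an approximation under stated assumptions: the function names are made up, and because ElementTree's default parser discards comments and processing instructions, the empty-element check here cannot see them, unlike the generated assertion.

```python
import xml.etree.ElementTree as ET


def is_empty(element):
    """No child elements and no text at all (not even whitespace)."""
    return len(element) == 0 and not element.text


def is_text_only(element):
    """Text is allowed; child elements are not."""
    return len(element) == 0


def is_element_only(element):
    """Child elements are allowed; only whitespace may appear around them."""
    pieces = [element.text or ""] + [child.tail or "" for child in element]
    return "".join(pieces).strip() == ""


section = ET.fromstring("<section>\n  <title/>\n  <para/>\n</section>")
print(is_element_only(section))
```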

The ALL Content Model

The ALL content model, in XSD, is a way of saying that all the elements are required (or optional) but they can be in any order. Doing this with a grammar runs the risk of a combinatorial explosion; in Schematron the ALL content model is very straightforward to implement, though we have to break it into its component assertions.

First, the ALL content model is closed (we don’t implement wildcards). So we check that the total number of child elements equals the sum of the counts of the allowed elements. If the element requires all of A, B and C, then we assert count(A) + count(B) + count(C) = count(*), which is another example of how in Schematron you solve many problems by counting.
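The counting identity is easy to try out on its own. In this Python sketch the allowed names A, B and C and the helper `only_allowed_children` are assumptions for illustration; the comparison is the same one the assertion makes.

```python
import xml.etree.ElementTree as ET

ALLOWED = ["A", "B", "C"]  # hypothetical members of an xs:all group


def only_allowed_children(element):
    """Check count(A) + count(B) + count(C) = count(*) for one element."""
    allowed_total = sum(len(element.findall(name)) for name in ALLOWED)
    return allowed_total == len(element)  # len(element) counts all child elements


any_order = ET.fromstring("<x><C/><A/><B/></x>")
intruder = ET.fromstring("<x><A/><D/></x>")
print(only_allowed_children(any_order), only_allowed_children(intruder))
```

Order never enters into it, which is why no combinatorial grammar is needed.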


<xsl:template match="xs:element[.//xs:all]" priority="90">
<xsl:comment>======= Handle XS:ALL ========</xsl:comment>
<sch:rule>
	<xsl:call-template name="generate-element-context"/>
	<xsl:comment>check allowed elements</xsl:comment>
	<sch:assert>
		<xsl:attribute name="test">
			<!-- get names of each allowed element -->
			<xsl:for-each select=".//xs:all/xs:element">
				<xsl:text>count(</xsl:text>
				<xsl:value-of select="if (@name) then @name else @ref"/>
				<xsl:text>)</xsl:text>
				<xsl:if test="following-sibling::xs:element"> + </xsl:if>
			</xsl:for-each>
			<xsl:text> = count(*)</xsl:text>
		</xsl:attribute>
			The element <xsl:value-of select="@name"/> can only have the following elements:
		<!-- get names of each allowed element -->
		<xsl:for-each select=".//xs:all/xs:element">
			<xsl:value-of select="if (@name) then @name else @ref"/>
			<xsl:if test="following-sibling::xs:element">, </xsl:if>
		</xsl:for-each>.
	</sch:assert>

Next we generate assertions that each element occurs only with the cardinality given by its minOccurs and maxOccurs.

	<xsl:for-each select=".//xs:all/xs:element">
		<xsl:variable name="ancestor-element" select="ancestor::xs:element/@name"/>
		<xsl:variable name="element-name" select="if (@name) then @name else @ref"/>
		<!-- minOccurs and maxOccurs both default to 1 in XSD -->
		<xsl:variable name="MAXOccurs" select="if (@maxOccurs) then @maxOccurs else '1'"/>
		<xsl:variable name="MINOccurs" select="if (@minOccurs) then @minOccurs else '1'"/>
		<xsl:choose>
			<xsl:when test="$MAXOccurs = $MINOccurs">
				<sch:assert diagnostics="{concat('d2-',$ancestor-element,'-',$element-name)}">
					<xsl:attribute name="test">count(<xsl:value-of select="$element-name"/>) = <xsl:value-of select="$MAXOccurs"/></xsl:attribute>
						There should be <xsl:value-of select="$MAXOccurs"/> of element <xsl:value-of select="$element-name"/>
				</sch:assert>
			</xsl:when>
			<xsl:otherwise>
				<sch:assert>
					<xsl:attribute name="test">count(<xsl:value-of select="$element-name"/>) &lt;= <xsl:value-of select="$MAXOccurs"/></xsl:attribute>
						There should be at most <xsl:value-of select="$MAXOccurs"/> of element <xsl:value-of select="$element-name"/>
				</sch:assert>
				<sch:assert diagnostics="{concat('d2-',$ancestor-element,'-',$element-name)}">
					<xsl:attribute name="test">count(<xsl:value-of select="$element-name"/>) &gt;= <xsl:value-of select="$MINOccurs"/></xsl:attribute>
						There should be at least <xsl:value-of select="$MINOccurs"/> of element <xsl:value-of select="$element-name"/>
				</sch:assert>
			</xsl:otherwise>
		</xsl:choose>
	</xsl:for-each>
</sch:rule>
</xsl:template>

So every element with an ALL type only requires a single rule to implement.
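The cardinality assertions amount to a pair of bounds checks per allowed element. The sketch below assumes a hypothetical table `OCCURS` of (minOccurs, maxOccurs) pairs and a made-up helper `cardinality_failures`; the message wording only loosely follows the assertion text above.

```python
import xml.etree.ElementTree as ET

# Hypothetical xs:all members mapped to (minOccurs, maxOccurs); both default to 1.
OCCURS = {"A": (1, 1), "B": (0, 1), "C": (1, 1)}


def cardinality_failures(element):
    """Return one message per member whose count falls outside its bounds."""
    failures = []
    for name, (minimum, maximum) in OCCURS.items():
        found = len(element.findall(name))
        if found < minimum:
            failures.append(f"There should be at least {minimum} of element {name} ({found} found)")
        if found > maximum:
            failures.append(f"There should be at most {maximum} of element {name} ({found} found)")
    return failures


for message in cardinality_failures(ET.fromstring("<x><A/><A/></x>")):
    print(message)
```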

Now we want to add some more information for better diagnostics, so on each of the count assertions we put a diagnostics attribute

<sch:assert diagnostics="{concat('d2-',$ancestor-element,'-',$element-name)}">

and we generate the corresponding diagnostics to give an actual count of the overpopulation:

<xsl:for-each select="xs:element[.//xs:all]//xs:all/xs:element">
	<xsl:variable name="ancestor-element" select="ancestor::xs:element/@name"/>
	<xsl:variable name="element-name" select="if (@name) then @name else @ref"/>
	<sch:diagnostic id="{concat('d2-',$ancestor-element,'-',$element-name)}">
		<!-- report how many of this element were actually found -->
		<sch:value-of select="count({$element-name})"/> elements were found
	</sch:diagnostic>
</xsl:for-each>

In Schematron, we make a distinction between the assertion text, which is a positive statement of what is true, and diagnostics, which give extra help to humans. Very often people new to Schematron want to put diagnostic messages as the assertion text. (Indeed, some of the programmers working on this project did it, so it is not always an obvious thing.) To get the idea, think about what happens if you want to generate a paper document with the schema printed out, with one bullet point per assertion: the diagnostics information would not make much sense, while good assertions would usually be perfectly readable and useful for domain experts.

Housekeeping

Finally, here are a couple of useful housekeeping templates, to be used in the same pattern as above: they warn about element declarations that the converter does not handle yet, which helps to prove the converter.

<xsl:template match="xs:element[@ref]" priority="1" >
	<xsl:message>PROGRAMMING ERROR: trying to process an element reference.</xsl:message>
</xsl:template>

<xsl:template match="xs:element" >
	<xsl:message>I don't know how to handle this kind of element declaration yet.</xsl:message>
</xsl:template>

Rick Jelliffe

There are three rules concerning documents with xs:ID and xs:IDREF.

First, they must contain token values that accord with the XML naming conventions. We check this already as part of the simple type checking. (The empty string is not allowed.)

Second, no attribute of type ID can have the same value as another attribute of type ID.

Third, for every attribute of type IDREF there must be an attribute of type ID with the same value. (There can be multiple IDREFs with the same value, but only one ID with that value.) That is what this entry is about.

Here is how to check IDREFs. First of all, we make a variable collecting all the attribute declarations of type IDREF. Then we make a variable collecting all the attribute declarations of type ID. Then we make a list with just the distinct ID attribute names, just to make life easier. (There are other ways to do this, of course.)

	<xsl:variable name="idref-list">
		<root>
			<xsl:for-each select="//xs:attribute[@name][@type='xs:IDREF']">
				<xsl:sort select="@name"/>
				<idref><xsl:value-of select="@name"/></idref>
			</xsl:for-each>
		</root>
	</xsl:variable>

	<xsl:variable name="id-list">
		<root>
			<xsl:for-each select="//xs:attribute[@name][@type='xs:ID']">
				<xsl:sort select="@name"/>
				<id><xsl:value-of select="@name"/></id>
			</xsl:for-each>
		</root>
	</xsl:variable>

	<xsl:variable name="id-distinct-list">
		<root>
			<xsl:for-each select="$id-list/root/id">
				<xsl:if test="position() = 1 or . != preceding-sibling::id[1]">
					<id><xsl:value-of select="."/></id>
				</xsl:if>
			</xsl:for-each>
		</root>
	</xsl:variable>

Now that we have all our input data nicely available in variables, generating IDREF rules is easy. For each attribute that can contain an IDREF we check it against each attribute that can contain an ID. (Now this would be better factored out into an abstract rule, but it is easier to read this.)


	<xsl:for-each select="$idref-list/root/idref">
		<xsl:if test="position() = 1 or . != preceding-sibling::idref[1]">
			<sch:rule context="*/@{.}">
				<sch:assert>
					<xsl:attribute name="test">
						<xsl:for-each select="$id-distinct-list/root/id">
							<xsl:text>//@</xsl:text>
							<xsl:value-of select="."/>
							<xsl:text> = . </xsl:text>
							<xsl:if test="position() != last()"> or </xsl:if>
						</xsl:for-each>
					</xsl:attribute>
					Element <sch:name/> 's IDRef hasn't been found. IDRef: <sch:value-of select="."/>.
				</sch:assert>
			</sch:rule>
		</xsl:if>
	</xsl:for-each>

You can get an idea from this how the ID uniqueness checking could be generated. KEY/KEYREF and UNIQUENESS checks in XSD already use XPath, and don’t use types, so they also should be straightforward to integrate.
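As a cross-check on the logic, both the uniqueness rule and the IDREF rule can be written as a short Python sketch. The attribute names `id` and `ref` are hypothetical stand-ins for whatever the schema types as xs:ID and xs:IDREF, and `check_ids` is a made-up helper; like the generated XPaths, it matches attributes by name anywhere in the document.

```python
import xml.etree.ElementTree as ET
from collections import Counter

ID_ATTRIBUTES = ["id"]      # hypothetical attribute names typed xs:ID
IDREF_ATTRIBUTES = ["ref"]  # hypothetical attribute names typed xs:IDREF


def check_ids(root):
    """Return (duplicate ID values, IDREF values with no matching ID)."""
    ids = Counter(
        element.get(name)
        for element in root.iter()
        for name in ID_ATTRIBUTES
        if element.get(name) is not None
    )
    duplicates = sorted(value for value, count in ids.items() if count > 1)
    dangling = sorted(
        element.get(name)
        for element in root.iter()
        for name in IDREF_ATTRIBUTES
        if element.get(name) is not None and element.get(name) not in ids
    )
    return duplicates, dangling


doc = ET.fromstring('<doc><a id="x"/><a id="x"/><b ref="x"/><b ref="y"/></doc>')
print(check_ids(doc))
```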

Jennifer Golbeck

Ok. So perhaps this is not a conspiracy because it’s out in the open, but eBay’s role in keeping feedback ratings artificially high is something worth discussing.

My argument is not about retaliatory feedback, but let’s discuss that briefly. Anyone who has used eBay much knows that feedback retaliation happens. You get treated badly, you leave feedback that says so, and the recipient leaves you bad feedback, sometimes even lying. This is a disincentive for leaving anything negative in the first place. eBay could take steps to make the system more fair, but they don’t. In fact, they have an incentive to leave the system exactly like it is. Retaliation discourages people from leaving bad feedback, and less bad feedback makes the entire marketplace look more trustworthy.

But that could be a bit of me creating a conspiracy, and perhaps eBay has better intentions than the previous paragraph gives them credit for. I considered this a possibility until one of my most recent forays into the depths of their system.

I was sold a counterfeit item on eBay. I paid about $100 for it and when it arrived it was obviously fake. After the seller did not respond to my emails, I filed a claim with PayPal (which is owned by eBay, for those of you who are not familiar). They offer “buyer protection” designed to make sure you don’t get ripped off. PayPal sent some messages back and forth and after about a month told me I would need to get the item appraised and send them evidence that it was counterfeit. This can cost hundreds of dollars, and discourages a cheated buyer from proceeding with the process, but let’s allow that it is necessary. On principle, I continued and found someone to certify that I had received a fake. After more than two months of fighting, PayPal finally resolved the dispute in my favor and sent me a refund. That is all well and good. However, what really got me was the email they sent notifying me of all this:

PayPal has received the item in dispute. A refund will be issued to your
PayPal account within 5 business days.

PayPal regrets any inconvenience you may have experienced.

This claim has been resolved amicably. Please consider this when leaving feedback for this seller.

Thank you for your cooperation.

Sincerely,

Protection Services Department

“Amicably”? What was amicable about this claim? I spent a lot of money and effort trying to get a refund that the seller refused me for months. Why would PayPal tell me to consider this claim amicable when I leave feedback? Well, they have the same incentive as before. A marketplace with no negative feedback looks safer. But none of us want to participate in a system where a seller who regularly sends out counterfeit items is ranked highly, simply because eventually buyers can get their money back by the actions of a third party.

I do not hold out any hope that eBay, which is thriving, will correct its ways. I think it could eventually lead to a third party system (and several have popped up) for creating real and honest feedback about buyers and sellers. I would certainly use such a service - I might even pay for it - because I want to truly know how much to trust people I interact with. I don’t want to be falsely reassured that everything will be ok, even though that seems to be the tactic eBay is betting on for continued success.
