April 2007 Archives

Rick Jelliffe

As governments and organizations increasingly adopt document standards such as ODF, both for data interchange and to give non-Microsoft products a more level playing field, designers of documents and forms will need to alter their approaches so that users can print or display the forms on their particular systems without concern. Here are some rough tips, and I’d welcome more.

  • The first is to treat paper pages more like HTML pages: expect that there will be some variability, rather than requiring absolute positions. This will cause challenges for machine-read forms, though.
  • Don’t overfill the page. Allow space so that any fairly full paragraph of text, as displayed on your system, can take an extra line on someone else’s system.
  • Don’t overfill table cells. If you are specifying the absolute size of table cells, make sure that the largest word plus some extra can fit in the smallest cell: don’t size columns to tightly fit the word size in a paragraph.
  • Use common fonts
  • Use styles
  • Use ems or font-size relative units where possible.
  • Train your operators so that they start and end bold or italic runs on actual word boundaries, not on the space before or after.
  • Allow generous space around graphics, because some systems may make different borders.
  • Consider PDF and HTML forms as well.
  • View the forms on the top two applications that users will be expected to use.
Rick Jelliffe

I’m just heading off to Thailand for a week: on Monday I am speaking at the “Interoperable ICT Systems Seminar”, with speakers from NECTEC, CompTIA and Microsoft, and with me as Dr Strangelove. James Clark has threatened to be there and ask hard questions: scary! He lives in Thailand and has been promoting open source software there for several years.

Going over the Office Open XML schemas to prepare for the seminar, I’ve been struck by the similarities with the early-90s SGML “big system” approaches: the HyTime era. It is interesting to see old approaches reborn: the HTML generation of systems went a different way…small documents, no link integrity control, no reuse of links, no semantic labelling, indirection handled by servers not documents. MIME, HTTP, REST: the WWW was about how you could take lots of small dumb documents and build a big dumb eco-system, which turned out to be a fine and practical approach for many things.

I remember Dave Peterson suggesting that tables as we know them (HTML-style, CALS-style) were bad because they mixed presentation with content, for example: instead the data should be maintained in a separate semantical structure, and included by reference; so in SpreadsheetML, data and strings can be maintained separately.
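
As a rough illustration of keeping strings apart from the grid that uses them, here is a minimal Python sketch. It is not SpreadsheetML itself: the function names `intern_cells` and `resolve` and the sample rows are assumptions made up for the example. Cells hold only an index into a separate string table, and the text is included by reference when the table is displayed.

```python
def intern_cells(rows):
    """Split a grid of strings into (string_table, grid_of_indexes)."""
    table = []   # each distinct string is stored exactly once
    index = {}   # string -> position in `table`
    indexed_rows = []
    for row in rows:
        indexed_row = []
        for text in row:
            if text not in index:
                index[text] = len(table)
                table.append(text)
            indexed_row.append(index[text])
        indexed_rows.append(indexed_row)
    return table, indexed_rows


def resolve(table, indexed_rows):
    """Rebuild the displayed grid by following each index into the table."""
    return [[table[i] for i in row] for row in indexed_rows]


# Hypothetical sample data, for illustration only.
rows = [["Name", "Country"], ["Ann", "Thailand"], ["Bob", "Thailand"]]
table, indexed = intern_cells(rows)
```

Changing one entry in `table` changes every cell that refers to it, which is the maintenance benefit the indirection buys.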

Eliot Kimber has often argued that many “difficult” problems with handling large dynamic document sets go away with a suitable, simple indirection method: hence his XIndirect, and indeed OASIS SGML/XML catalogs and even ISO DSDL’s Document Schema Renaming Language (DSRL), which comes through Martin Bryan; the relationship system in the Open Packaging Conventions seems similar.

It is an interesting thought, though: at what point of complexity/maintainability does it become a requirement to add extra levels of indirection? I can see that both extremes are appealing: the one that says “just make do with simplicity” and the other that says “build in moderate indirection because it is easier to have it there when you need it and impossible to retrofit.”

Kurt Cagle

Adobe announced a few days ago that they would be open sourcing the FLEX API and framework. I’ve found it amusing and instructive over the years to realize just how open source licenses are increasingly being used by companies as a business weapon, as a means of gaining (or keeping) control over a market at the cost of losing software license fees.

Certainly, this is the case here. Adobe and Microsoft have long been engaged in a quiet cold war that has, at its base, control of the way that information is presented - how documents are laid out and fonts are displayed, how vector graphics work in two, three and four dimensions (assuming time as the fourth), how we build user interfaces for everything from game programming to advertisements to forms. Adobe, Microsoft and the W3C have each established differing approaches to this problem of presentation, the first two by creating proprietary standards and technologies on top of them, the last by creating open standards and encouraging the use of these standards by others to build the technologies.

Niel M. Bornstein

I’ve never been to XTech myself, although I almost delivered a tutorial at its previous incarnation, XML Europe, back in 2004 or so. But I happen to know that conference chair Edd Dumbill and his committee have put together another fantastic program. If there’s any way you can attend, you should register by May 2 for XTech 2007 in Paris, May 15-18. I wish I could be there too.

M. David Peterson

// @author RussMiles.com - Home - Why Rails is not yet ready for SOA…

I am most definitely a Rails advocate. Not to the point of religious fanaticism, as you sometimes see around the Ruby on Rails camp, but definitely a massive fan. Rails, to me, is honestly a technology that makes it very easy to create great web applications and, funnily enough, services[3].

So why do I say that Rails is not ready yet to be great for SOA? Well, the key in my title is the word ‘yet’.

More goodness at the above linked title…

Thanks for the review, Russ!

Rick Jelliffe

Data structures people like to think of an XML document as superficially a rooted tree of the type called an Attribute Value Tree (AVT) and, when you add IDREFs, a kind of ordered, directed graph. This puts the emphasis on the element structure. But of course an XML document is more than that: it is also a tree of entities, a tree of notations, a tree of character set encodings, and so on. A relational person might see tables of atomic values split up and regrouped according to keys. An SGML or markup person would see it in terms of linear text which has had various range annotations to provide metadata; the element ranges being synchronous (i.e. no overlap) also means that they can be viewed as a tree, however there is no reason why a subrange actually relates in any semantic way to a containing range: the element is a property of the text not the other way around.

There are particular, admittedly niche, areas where the synchronous restriction galls. So there have been various systems for concurrent markup proposed. Many of these go outside the meager resources that XML allows, back towards the parsing power of SGML, and some even extend SGML. I was looking over Michael Sperberg-McQueen’s Rabbit/Duck grammars, which deal with validating concurrent structures, and I wondered how Schematron could be used.

Let’s take the most common case of overlapping markup, bold and italic, because it is easy to visualize. We want:

THE GRAPHS OF WRATH

where a brave but naive soul would mark this up as

THE <i>GRAPHS <b>OF </i>WRATH</b>

but the XML markup has to be

THE <i>GRAPHS <b>OF </b></i><b>WRATH</b>

Let’s make a constraint that there can only be one “phrase” of bold in our text. An odd constraint, but it relates to grouping arbitrary elements together. First, let’s use markup to indicate connection, with an IDREF attribute called join.

THE <i>GRAPHS <b id="b1">OF </b></i><b join="b1">WRATH</b>

Note that we now have represented the concurrent structures, but @join does not require that the sections be contiguous: interrupted or dispersed sections are possible too! Now for the Schematron schema

<rule context=" $context ">
                   <assert test="count(.//b[not(@join)]) = 1 and count(.//b[@join][not(@join = current()//b[@id]/@id)]) = 0">
                   There should be exactly one phrase of bold. This should be marked up with one or more
                   b elements, where one of those b elements has an id and the others have join attributes.
                  </assert>
</rule>

The same approach can be extended for different occurrence and position constraints.
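
To make the constraint concrete outside Schematron, here is a minimal Python sketch of the same check, using only the standard library’s ElementTree rather than a Schematron processor. The function name `one_bold_phrase` and the wrapper element `p` are assumptions for illustration, not part of the markup above.

```python
import xml.etree.ElementTree as ET


def one_bold_phrase(xml_text):
    """Return True if the fragment holds exactly one 'phrase' of bold.

    One b element (the head) has no join attribute; every other b
    element must carry a join attribute naming the head's id.
    """
    root = ET.fromstring(xml_text)
    bolds = list(root.iter("b"))  # b elements at any depth
    heads = [b for b in bolds if b.get("join") is None]
    head_ids = {b.get("id") for b in heads if b.get("id") is not None}
    joined = [b for b in bolds if b.get("join") is not None]
    return len(heads) == 1 and all(b.get("join") in head_ids for b in joined)


# The example from the text, wrapped in a hypothetical <p> element.
good = '<p>THE <i>GRAPHS <b id="b1">OF </b></i><b join="b1">WRATH</b></p>'
# Two independent bold phrases: should fail the constraint.
two_phrases = '<p><b id="b1">THE</b> GRAPHS <b id="b2">OF</b> WRATH</p>'
# A join that points at no existing id: should also fail.
dangling = '<p><b id="b1">THE</b> <b join="nope">GRAPHS</b></p>'
```

Like @join itself, the check says nothing about whether the joined sections are contiguous.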

M. David Peterson

Update: BTW, on the *HIGHLY* off-chance you are unaware of who Jeni Tennison is (I realize the chances are basically zero, but as I once proclaimed suggested may have partially fabricated completely lied about for the sake of pretending I knew what I was talking about on XSL-List, a new XSLT developer is born once every 4 minutes (or maybe it was 4 hours… I can’t remember as it was an on-the-fly-lie, and we all know on-the-fly-lies leave our memory banks about the same time they enter), and with this “fact” in mind), maybe Jeni’s search link on Amazon.com will help you get up to speed.

Oh, and while you’re there, pick up this one, that one, and at very least this one as well, though if you have the means, I would recommend buying all of them, and then set aside a solid year or two to learn how to write code the way God intended us to write code: The Correct Way (AKA XSLT, Functional Programming, and/or Lisp, Scheme, Haskell (still need to learn Haskell myself, but all the cool kids think it’s great, so I think it’s time I start following the in-crowd and start learning a thing or two about it. ;-) and so forth.

[Original Post]
“And I think to myself… What a won-der-ful Wwoooorrrrllldd….”

Hello, David Carlisle! - O’Reilly XML Blog

That final piece slips into place…

Jeni Tennison | April 23, 2007 02:24 PM

and via http://www.jenitennison.com/blog/node/1

So I finally have just enough spare capacity to start blogging. And if James Clark and David Carlisle can join the party late, why not me. (Yes, M. David Peterson, your eveel plan is coming to fruition.)

“Ohhhhhhhh, Yeahhhhhhh!!!!!!” (or is it “Bwah, haa, haa, haa, haa, haaahhh”? Pick one and run with it ;-))

Welcome to the blogosphere, Jeni! Not like I really have to say this, but >> SUBSCRIBED! :D Oh, and *GREAT* design! Good layout, nice choice of color and contrast. Of course, given that it’s you, that’s to be expected, but still worth pointing out none-the-less. :)

Tommie? Wendell? Feel like a little Christmas gift giving in July? (or anytime sooner? :D :D :D) You know, for the children (a group in which I proudly count myself amongst. ;-))

Thanks in advance for your considerations! :D

Kurt Cagle

This has been one of the rougher weeks I’ve faced technologically speaking. A technical glitch forced the server that housed most of the Understandingxml.com content for the last year to reformat itself, wiping out much of what’s been there for the last several months. Fortunately, much of this content was also echoed on this site, so not a huge amount has been lost, but it will take a while before all of my essays have been migrated back. For those of you who’ve wondered where Understandingxml.com has gone, well - it will be back up in time. I’ll be using XForms.org as my primary blogging point for a while, until we can get things sorted out with UXML.

It happens periodically that I get hit with what I’ve come to term a “chaos storm”, where technology in general seems to experience extreme entropy around me. When I was first learning computers in high school, I’d actually held off even though programming appealed to me because I had absolutely no aptitude whatsoever with electronics, and even several years later when I was working on a degree in physics I chose fairly esoteric theoretical areas because the experimental physics professors quietly discouraged me from getting anywhere close to a laboratory.

I attended the Web 2.0 Expo last week, representing the AOL Developer Community. One thing that stands out for me is — not only is XML experiencing a kind of “renaissance” (renewed interest in XSLT, application of microformats as a mechanism for creating the uncapitalized “semantic web,” revived XML-related standards activity, etc.) — but in a very real sense, XML has become pervasive on the Web. It’s become a natural part of every Web developer’s toolkit.

In a sense, you can no longer put “XML” on your resume in the list of technologies you understand. Yes, it’s been that way for a while, but what I mean is that today there are new complexities, new mechanisms which utilize XML, and these are moving to the forefront, becoming a “standard” means for distributing data and interfacing applications on the Web. Hence, for a Web developer to say “I know XML” will prompt a well-deserved “well, duh!” response from any other Web developer.

Even in cases where the technologies themselves aren’t brand new, their application is growing. For example, ProgrammableWeb.com founder John Musser presented a slide in his “API And Mashup Best Practices” session that suggested that large companies that have APIs increasingly consider it critical to offer a REST version of their APIs. 68% of the APIs were accessible using REST, compared with 40% using SOAP, with Javascript, XML-RPC, Atom, and proprietary interfaces all in the single digits. (The totals exceed 100% because many APIs provide multiple interface methods.) The conclusion is that the user community increasingly expects to be able to access APIs using REST, and in response vendors are making the effort to provide a REST interface to their APIs. REST apparently is considered the most efficient and easiest-to-work-with API interface by developers.

There was not a single session (as far as I’m aware) that was “about” XML or XSLT or REST. There was a session about microformats. Yet XML as a data transport and/or application interface device was an element in almost every code-centric session I attended.

Interesting!

M. David Peterson

Update: PLEASE NOTE: I’ve turned off comments to this post as,

1) I don’t have any more invites right at this moment.
2) I probably should have stated “send me an email to m.davidATxmlhacker.com” instead of hyperlinking “first request” with a mailto: link to the same address. That’s my bad. When I get the next batch I will update the top of this post with something more explicit as to what you need to do to get the invitation I have available.
3) Each additional comment adds another point to the popularity of this post. Neither the content nor the related conversation is worth being broadcast to the world in various places as something people need to check out because of its popularity. At the current comment rate this post would maintain the #1 spot on the front page of the blogs section of O’ReillyNet from now until Christmas 2009. For that reason alone it really isn’t fair to leave the comments open. There are a lot of interesting blog entries on O’ReillyNet. As much as I appreciate each of you who have spent the time to leave a comment, this post isn’t one of them.

When I get a new batch of invites, again, I will update the top of this post at which point the first person who emails me with the specified and proper subject will get the invite.

Update: The time is now 2:42 P.M. Mountain Daylight Time on Friday, April the 27th, 2007. I’ve got one invite left from the latest batch of invites I was given. I realize that a *TON* (<- *WOW* <- is all I have to say. Well that + the fact that (Skype + YouTube) = Nothing even close to what Joost will ultimately be sold for, that I can promise you!) of you have followed up the original post, some begging, some demanding (NOTE: Not an approach I would personally recommend, but to each his/her own ;-), and some who are willing to trade an assorted base of items in return for the rights to an invitation that was already long gone before their email arrived.

With this last bit in mind, to keep things exciting and to see how many of you are actually paying attention, the first email to arrive after the above stamped time with the subject line: “Yes, I actually read the post” gets it.

Update: I was unable to access Joost all of yesterday, so was only able to just now send out the invite to Dan Arbel who just barely beat out the next request by a few minutes.

I’ll update the top of this same post when I receive a new batch of invites, so stay tuned…

[Original Post]
The first request that arrives in my inbox gets it.

M. David Peterson

[IronPython] pybench results for CPython and IronPython

For me, the difference between python’s dynamicity and Boo is simply
that python allows for a more exploratory way of development.

One of the things that I have come to absolutely *LOVE* about IronPython is the interactive capabilities the ‘intellisensed’ console facilitates. Programming in “real-time” (AKA Read Evaluate Print Loop, or REPL) as opposed to statically compiling an application to then run the result has got to be the most powerful programming pattern we dev folk have in our development tool bags. Until just now, however, I hadn’t thought of it in terms of “Exploratory Programming”, but Luis has nailed it right on the head, as that’s exactly what programming in Python and/or any other dynamic language-based development environment is all about.

Nice!

Michael Day

It is a truth universally acknowledged that “DTDs don’t support namespaces”. Or to be a little more pedantic, that DTDs don’t support namespaces in their full generality. However, one might as well say that XML 1.0 does not support namespaces. Given that the specification of Namespaces in XML augments XML 1.0, it seems more reasonable to ask why don’t namespaces support DTDs?

Rick Jelliffe

Different applications on different systems use different fonts, fonts with the same family name but different metrics, different hyphenation algorithms, hyphenation setting defaults, hyphenation dictionaries, different size spaces, different line-breaking algorithms, different widow/orphan/keeptogether rules, and different co-ordinate space measures.

This means that even if a document is saved as XML which completely captures all the page and style settings and so on, and even if the receiving system has the same generic fonts and the same complement of “muffin borders” and other art available as the sending system, a document moved from vendor A’s application on platform A cannot be expected to open with line-for-line or (for multiple-page documents) even page-for-page fidelity on vendor B’s applications, or even on vendor A’s application on platform B. Even with good matching, a word here or there will break or hyphenate differently, a line will break differently over a page, and so on.

This is especially noticeable on short measures, table cells in particular. Unless all the cells in the table are each wider than their content, with no multi-line cells, there is every chance that lines may break differently.

Note that this will happen regardless of whether you are using ODF or Open XML: it is not the limits of the XML representation as much as that applications have different code inside them. If you want exact fidelity, the current state-of-the-art is you have to pretty much use the same application (and platform) to open the document that it was created in.

What can you do to minimize this?

Well, for a start you need to set your expectations appropriately: an HTML page looks different on every different browser and OS and depending on the window size too…do you really need exact line and page fidelity? The HTML experience is strongly that it is better to have presentation-independent design, allowing flexibility, in order to get the benefit of re-target-ability.

Strategies for coping with these issues have dominated SGML/looseleaf publishing systems: it is not simple. One thing is to wean yourself off page and line dependencies: use section numbers and IDs to refer to things, not page numbers. Never hard-code page numbers or line numbers, but use references and variables.

In your typesetting specs, make your widow/orphan control move paragraphs over the page readily (if you expect there will be additions) so that there is plenty of whitespace at the bottom of the previous page, so that typing a few extra words here and there will not cause repagination. If you do this well, then it also makes using the occasional forced page break more workable.

There are mixed strategies: send PDF as well as the XML document, and use the PDF as much as possible, until there needs to be editing. Or send HTML as well to discourage page-centricism.

Another strategy is to clearly separate out those pages that must not break, and treat them as artwork, included from external documents. Pages containing examples of forms in particular are better handled as graphics when included in general documents.

And there are procedures to take as well. For example, if someone sends you a document and you open it in application B, first go through all the tables and resize the text so that it breaks the same as the PDF. Of course, this relies on your document using styles: but if you don’t use styles you are probably messed up anyway (because there are many ways to do the same thing, and they may result in different results: for example, on some systems a bold space is bigger than a plain roman space!).

I remember that WordPerfect had a (patented) feature where it would adjust font sizes and table borders for optimal layout. This is exactly the kind of thing that would be needed if we want to get better guaranteed fidelity at the line and page level between applications.

Is infidelity ever forgivable?

So remember, there are three kinds of fidelity: fidelity because the document has all the information used by the producing and receiving applications, fidelity because the applications have the same resources available to them, and fidelity because the producing and receiving applications have the same algorithms and defaults. When looking at the various claims (Len Bullard mentions Spy versus Spy) made by MS on Open XML and “fidelity”, and by ODF people on “interoperability”, we need to interpret them in the hard light of the Dirty Little Secret.

Governments and procurement projects need to be quite clear that whenever they insist on page fidelity, they are probably in fact locking themselves into one vendor’s tools, in which case it becomes a debate on features, quality, price, training, etc. In a limited sense, everything *except* interchangeability.

M. David Peterson

For those unaware, Mr. MathML himself (AKA David Carlisle) has hit the ground of the blogosphere in what I can only term as a full out sprint…

David Carlisle: Hello World

Hello World

I thought I’d start a blog…..

SWEET! Welcome, David!

Now if I could only convince Wendell (Piez) and (B.) Tommie (Usdin) to start a blog, my eveel plan to get the who’s-who in the land of XSLT** blogging will be nearly complete (Jeni Tennison being the final piece of the XSLT who’s-who puzzle, though there are a few others on my “list” as well… “Bwah, haa, haa, haa, haaaaa….” ;-)


** Not that I had anything to do with David, or anyone else for that matter, starting his blog. I guess it’s more of a check-list of people I wish would blog more so than a plan. But it’s an “eveel” check-list, that’s for certain! ;-)

M. David Peterson

I accidentally blew up the wrong EC2 instance. That same EC2 instance had, amongst other things, Planet XSLTransformations on it. I forgot to set a cron job for S3Sync to back-up that particular directory.

Damn.

Fortunately Google cached the FOAF feed for the site. As such, I created a quick-and-dirty FOAF<2>PlanetPlanet initialization file. Maybe you can make use of it as well? Don’t know, but just in case, here it is…

FOAF2PlanetPlanetInitializationFile

A FOAF to PlanetPlanet Conversion Utility

Introduction

This module will convert a FOAF file to a PlanetPlanet initialization file. This is an XSLT 2.0-based solution.

Details
Repository Location: http://xslt.googlecode.com/svn/trunk/Modules/WebFeed/FOAF-to-Planet.ini/

Inventory

init.xml
FOAF2Planet.xsl
Any number of FOAF files.
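
For readers who want to see the shape of the conversion without opening the repository, here is a rough Python sketch. It is not the XSLT module itself, and it rests on two assumptions that are labeled again in the code: that the FOAF file lists each blogger as a `foaf:Agent` with a `foaf:name` and a nested `rss:channel` whose `rdf:about` is the feed URL, and that the target file uses one `[feed-url]` section per feed with a `name` entry. The function name `foaf_to_planet_sections` is hypothetical.

```python
import xml.etree.ElementTree as ET

NS = {"foaf": "http://xmlns.com/foaf/0.1/"}
FOAF_AGENT = "{http://xmlns.com/foaf/0.1/}Agent"
RSS_CHANNEL = "{http://purl.org/rss/1.0/}channel"
RDF_ABOUT = "{http://www.w3.org/1999/02/22-rdf-syntax-ns#}about"


def foaf_to_planet_sections(foaf_xml):
    """Turn FOAF agents into '[feed]' / 'name = ...' sections.

    Assumption: each foaf:Agent has a foaf:name child and one or more
    nested rss:channel elements whose rdf:about holds the feed URL.
    Assumption: the output layout is one section per feed URL.
    """
    root = ET.fromstring(foaf_xml)
    sections = []
    for agent in root.iter(FOAF_AGENT):
        name = agent.findtext("foaf:name", default="", namespaces=NS).strip()
        for channel in agent.iter(RSS_CHANNEL):
            feed = channel.get(RDF_ABOUT)
            if feed:
                sections.append(f"[{feed}]\nname = {name}\n")
    return "\n".join(sections)


# Hypothetical FOAF input, for illustration only.
sample = """<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
    xmlns:foaf="http://xmlns.com/foaf/0.1/"
    xmlns:rss="http://purl.org/rss/1.0/">
  <foaf:Group>
    <foaf:member>
      <foaf:Agent>
        <foaf:name>Example Author</foaf:name>
        <foaf:weblog>
          <foaf:Document rdf:about="http://example.org/blog">
            <rdfs:seeAlso>
              <rss:channel rdf:about="http://example.org/feed"/>
            </rdfs:seeAlso>
          </foaf:Document>
        </foaf:weblog>
      </foaf:Agent>
    </foaf:member>
  </foaf:Group>
</rdf:RDF>"""

result = foaf_to_planet_sections(sample)
```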

Enjoy!

Rick Jelliffe

I just caught up with the interesting news item of a fortnight ago that Malaysian standards body Sirim Bhd has “suspended the process for approving the Open Document Format (ODF)… as a Malaysian standard.”

What was particularly interesting was the reason given: “Ariffin said some TC/G/4 members had taken to belittling other members who did not share their … views, both during committee meetings and in personal blogs. These … members were also attempting to short-circuit the normal consensus process for adopting a document standard, he said.” “There has been unprofessional conduct and a lack of ethical standards among some members of the technical committee.”

Now I don’t know anything more than the article and various blogs around the place claim. But if this represents a trend by standards bodies to get tough on personal attacks and so try to bring back a more professional and civil level of discourse, then that is great.

A cooling-off period may seem a strange approach if we have the idea of standards committees as being like courts of law that judge technologies. But in fact they are like formalized conversations. Courtrooms are the world of accusation and defense; standards procedures are the world of dialog: questions, answers, suggestions, tentative questions, and so on. (Committee procedures stop hectoring and bullying, and make sure that members’ voices are heard: this is frequently boring, but nevertheless a Good Thing.) The aim of a conversation is the meeting of minds: I have mentioned before that the ISO process is one of consensus, of trying to find win-win positions, and I think the same is generally true of national bodies.** (Of course, not every conversation is constructive, or cannot be dominated, so I don’t want to take the analogy too far!) IIRC, ISO gave the 802.3 committee a similar timeout because it did not seem to be constructively engaging with China; it is unusual, and salutary but pragmatic.

Sirim’s boss Dr Ariffin makes some very interesting side points too: I hadn’t heard* his attributed view, for example, that ‘a mandatory standard would constitute an illicit non-tariff barrier against software products using other document formats.’

I think his reported position that ‘a standard can only be mandatory when public health or safety is at stake’ has a nuance: it relies on the distinction between voluntary standards (where users decide to adopt) and mandatory standards (where the state forbids anything else, say due to treaty obligations, and may police). Governments, as organizations, can still restrict themselves to a voluntary standard as part of IT strategy or other policy, all other things being equal, however: that is a different issue.

I will be in Malaysia (and Philippines and Thailand) in mid-May, presenting some seminars on Open XML and the standards process, so maybe I’ll get some better information then.

Keith Fahlgren

The first interoperability session for Atom Publishing Protocol implementations (both clients and servers) was a success. The best news was that many of the clients and servers were able to interoperate with little to no tweaking despite never having met before. Check out the (evolving) grid of success and failures for details. More than 20 implementors attended the event, held yesterday and today at Google, as well as Lisa Dusseault, the IETF Area Director for APP.

Simon St. Laurent

I’m here at the Web 2.0 Expo, a computer book editor surrounded by all kinds of possibilities for web-related books, articles, PDFs - pretty much everything here is publishable, and would interest someone. At the same time, though, there’s been a consistent message here: everyone out there knows what they want better than you know what they want. So….

M. David Peterson

So I’m at the APP Interop at Google today, and discovered that, in fact, it is possible for two Zunes to be in the same place at the same time.

Proof,

Rick Jelliffe

Give me a child until he is seven and I will give you the man. I wonder whether my enthusiasm for plurality springs from my childhood indoctrination by the book Fattypuffs and Thinifers. Not so much Maximalist versus Minimalist, but Monolithist versus Layerist, perhaps.

My dear old Dad had a patient, an old lady who used to work in the railway kitchens who, a little intoxicated by her medication, presented him every week with an enormous multilayered chocolate cake, about three times as high as round and both splendid and terrifying. I think from that I learned that even a highly layered thing can be too big and fat. Perhaps people who make unnecessarily large technologies never had the benefit of the cooking of eccentrics: perhaps we should be feeding our children more, or, at least, more miraculously architected, cakes.

Rick Jelliffe

Ecma 376 Office Open XML’s DrawingML uses an odd measure called the EMU: short for English Metric Unit. There are 360000 EMUs per cm, 914400 EMUs per inch.

The reason for this may become clearer if I note that, using the Adobe “big point” of 72 points per inch (rather than the old 72.27), there are 12700 EMUs per point. Err, maybe not…

What about this then: 360000 and 914400 are divisible by 2, 3, 4, 5, 6, and multiples?

Still no idea? Well, representing numbers in computers is fraught with errors every time you have anything that requires fractions, or multiplication or division by numbers that are not 2^n. That can even include multiplying by 0.5. Computer scientists spent a lot of their early time investigating various techniques to overcome these problems, in a branch of mathematics (or is it engineering?) called numerical methods.

These errors are small by themselves, but when you have long sequences of calculations, for example a graphics object where each segment is positioned using the result of the last segment, the accumulated error can grow. In publishing, misalignment can have a serious effect when there is some kind of multi-color printing: you can get registration errors.
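
A small Python sketch of the accumulation effect, under the assumption that positions are built by repeatedly adding a step to a running total. The helper name `accumulate` and the step sizes are made up for the illustration.

```python
def accumulate(step, count):
    """Add `step` to a running position `count` times, as chained segments would."""
    position = 0
    for _ in range(count):
        position += step
    return position


# Ten segments of 0.1 units each, held in binary floating point.
float_end = accumulate(0.1, 10)

# The same ten segments counted in whole tenths of a unit.
int_end = accumulate(1, 10)

print(float_end == 1.0)  # False: 0.1 has no exact binary representation
print(int_end == 10)     # True: integer addition is exact
```

The floating-point total misses 1.0 by a tiny amount, and that amount is carried into every later position computed from it.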

One way to circumvent the problem is to move to integer (whole number) arithmetic: you find some convenient small measure that can be multiplied so that you don’t need to use floating-point numbers. When you do divide, you throw away the remainder, because it is below the precision you are supporting; but because the data frequently is aligned to grid positions (1/2 inch, etc) there will be no loss of precision from data capture (what the user sees) to the internal representation. Now armed with this perspective, let’s imagine a set of criteria for a typesetting system or vector graphics system:

* use a small unit to allow implementation in integer arithmetic
* this unit should allow exact whole divisions (no remainder) of the common measures of modern English-speaking countries’ typesetting: the cm, the inch, and the point. So a half inch, 10.5 points, or a third of a cm are all exact (within the bounds of the system)
* the unit should be small enough to allow non-“English” measurements with, say, 0.01% precision (or do I mean inaccuracy?): the continental Didot point or the Japanese Q system, for example

If you take these kinds of criteria and work through the numbers, you get something like EMU. They are used by Ecma 376’s DrawingML for “high precision co-ordinates” in certain places. The rest of the time, people can use locale-dependent measures.
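
The criteria can be checked with a short Python sketch. It assumes the Ecma 376 values of 914400 EMUs per inch and 360000 EMUs per cm, and the 72-per-inch point; the helper name `to_emu` is made up for the illustration.

```python
from fractions import Fraction

EMU_PER_INCH = 914400
EMU_PER_CM = 360000
EMU_PER_POINT = EMU_PER_INCH // 72  # 12700, using 72 points per inch


def to_emu(quantity, emu_per_unit):
    """Convert a measurement to EMUs, refusing anything that is not a whole number."""
    emu = Fraction(quantity) * emu_per_unit
    if emu.denominator != 1:
        raise ValueError(f"{quantity} is not a whole number of EMUs")
    return int(emu)


half_inch = to_emu(Fraction(1, 2), EMU_PER_INCH)       # 457200
ten_and_a_half_pt = to_emu("10.5", EMU_PER_POINT)      # 133350
third_of_a_cm = to_emu(Fraction(1, 3), EMU_PER_CM)     # 120000
```

All three measurements from the list above land on whole EMUs, so they can be stored and added as plain integers.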

So if EMU is a reasonable technical approach, is it a reasonable measure to appear in a standard? To my mind, this falls in exactly the same bucket as SpreadsheetML’s use of numeric indexes, though there are accuracy issues as well as performance issues. I think it comes down to the purpose of the standard: when the purpose of the standard is to allow high-quality typesetting and graphics and to reflect the triggering application, I think exact numbers such as EMU may win. However, when the purpose is to allow data interchange and human readability and writability, then using SI and locale-dependent measurements will win.

The EMU issue is also an interesting one from a standardization viewpoint: there is a kind of premise that supporting a standard (obviously the specific application-independent alternative is SVG-in-ODF in this case, but this applies to systems supporting Open XML too) involves adding functionality or adjusting superficial details (names of elements and attributes, use of property elements rather than attributes, and so on). This is, I think, the view that underlies Tim Bray’s comment (from memory) “how many ways do we need to say some text is bold or italic?” However, there are other changes that go to implementation: converting to and from SVG (as it is) presumably entails forgoing exact import and export of data in the “high precision coordinate” system. The difference would be minimal, a rare pixel here or there, I’d expect.
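A hedged Python sketch of the kind of loss involved. It assumes a converter that writes lengths as decimal pixel values at 96 pixels to the inch with two decimal places; both figures, the 914,400 EMU per inch, and the function names are assumptions for illustration, not anything SVG or Open XML mandates.

```python
EMU_PER_INCH = 914_400      # assumed EMU definition
PX_PER_INCH = 96            # assumed pixel size
DECIMALS = 2                # assumed precision of the written-out number


def emu_to_text(emu):
    """Write an EMU length as a decimal pixel value, as a converter might."""
    return f"{emu * PX_PER_INCH / EMU_PER_INCH:.{DECIMALS}f}"


def text_to_emu(text):
    """Read the decimal pixel value back and round to whole EMU."""
    return round(float(text) * EMU_PER_INCH / PX_PER_INCH)


original = 120_000          # a third of a centimetre at 360,000 EMU per cm
written = emu_to_text(original)
restored = text_to_emu(written)
print(original, written, restored, restored - original)
```

Under these assumptions the round trip does not return the original integer, but the discrepancy is a small fraction of a pixel.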

Like the data indexes, I don’t particularly know why Open XML couldn’t support both the common notations as well as the optimized one. Best of both worlds. But EMUs are a rational solution to a particular set of design criteria, it seems to me: and the name English Metric Units that has caused alarm seems less alarming when understood as just a descriptive name and not a reference to something external.

M. David Peterson

Last Sunday the power supply on my *MUCH* beloved DevBox finally gave up the ghost.

[Photo: “Death of the DevBox”, from the xmlhacker Picasa Web Album “Dead DevBox”]

As per the above photo, if not obvious, that’s a power supply half the size of my DevBox hanging off to the side. The machine itself is completely custom, right down to the screws that keep the sub-compact power supply and cooling system snugly fit inside. Finding a replacement is no easy task, and it was becoming more and more obvious that my last-minute hack of desperation (ripping a power supply out of a nearby tower, pulling off the side of the DevBox, and plugging it into the motherboard) was not safe and, as such, not something I could expect to last very much longer. Couple this with the fact that in its current state it was no longer a “portable” workstation (paired with a flat-screen monitor and a reasonably sized ergonomic keyboard, you might be surprised at just how portable such a workstation can be) and it became all too obvious,

It was time for Timmy’s well deserved retirement. (< Yes, his name is Timmy. Long story… Don’t ask. ;-)

Anyone who knows me, knows that I have never claimed to be a Mac FanBoy. As per the intro to the photo collage of my first Mac purchase a year ago October,

Jim Alateras

OASIS members have approved Web Services Business Process Execution Language (WS-BPEL) as an OASIS standard. WS-BPEL, a core piece of WS-*, defines an object model and associated grammar for web service orchestration.

Rick Jelliffe

Since 1999 I have been putting out a diagram “A Family Tree of Schema Languages for XML”. Here is version 7: I have redrawn it so that it runs onto an A3 sheet.

The extra size has allowed me to make the lines less confusing and to add

* the complete set of ISO DSDL schema languages
* the ISO Topic Map set of constraint languages
* the RDF family of schema languages
* the newer and proposed schema-related languages
* some misc older languages, or languages that didn’t fit
* a new section for other languages
* more of the schema languages that I invented during my time at Academia Sinica, in fine print


[Figure: SchemaFamilyTree-small.png, a reduced version of the family tree diagram]

Here is a higher resolution version.

A PDF is available for download here.

M. David Peterson

So in preparation for this, which is also in preparation for the launch of something even bigger, I’ve been pretty much heads down in both code and some other things I can’t really speak about at the moment, so I am coming to this news flash just a tad bit late. But regardless, this is a little too big to just ignore.

I’ve obviously taken a somewhat playful attitude with the fact that I am under NDA with Bungee, but nonetheless, I can’t really speak all that much in regard to my own knowledge about what they have going on behind the curtain. If you will be at the Web 2.0 Expo, you’ll find out soon enough, and if not, well, with the stir they are bound to create, again: You’ll find out soon enough. ;-) That said,

M. David Peterson

only this, and nothing more: Lack of elegance is viral

Many years ago, I made my living writing Perl CGI scripts, but the state of the art has made some serious advances since then. PHP is not one of them.

M. David Peterson

I was out for a walk earlier this afternoon and came across this incredible performance by what I believe is the Bellevue Christian Concert Choir. (I spoke with some of the students afterwards, pictured below, and they mentioned they were from Bellevue Christian School.)

It’s been an XSL kind of weekend for me, thus far. First, an associate from the AOL Developer Community pointed me to the “Ficlets enhanced author feed, an XSL scraper hack” post at the 0xDECAFBAD blog. Then, in the May issue of Dr. Dobb’s Journal, I saw the article “XSL Transformations: A delivery medium for executable content over the Internet”.

My interest in XSL has been on the increase after a lull of several years (I was too busy with work, and none of the work required XSL). Then M. David Peterson’s “Solving FizzBuzz in XSLT 1.0”, along with the talk about XSLT 2.0, reawakened my interest.

FizzBuzz in XIM?

What I really wanted to accomplish, and hence be able to write about, was that I had created an XML “program” written in the Minimal Imperative Language XIM (see the Dr. Dobb’s article) that would perform FizzBuzz.

Alas, it is not to be — not this weekend anyway. XIM looked straightforward, starting with the example program from the article:

<?xml version="1.0"?>
<program>
  <vars>
    <var_declare name="fact"> 1 </var_declare>
    <var_declare name="last"> 0 </var_declare>
    <var_declare name="nb"> 5 </var_declare>
  </vars>
  <main>
    <assign varn="last">
      <var_use name="nb"/>
    </assign>
    <while>
      <condition>
        <boolop opname="gt">
          <var_use name="last"/>
          <num> 1</num>
        </boolop>
      </condition>
      <statement_list>
        <assign varn="fact">
          <op opname="*">
            <var_use name="fact"/>
            <var_use name="last"/>
          </op>
        </assign>
        <assign varn="last">
          <op opname="-">
            <var_use name="last"/>
            <num> 1</num>
          </op>
        </assign>
      </statement_list>
    </while>
    <end/> <!-- program termination -->
  </main>
</program>

But my attempts to convert this into something that could print a variable each time the loop is executed did not succeed. And studying the 779-line XSL file that performs the processing implied that I’d have to change that too, in order to print variables. [Note: You can get the XSL and the sample XML in the May 2007 source code zip file: 0705.zip]
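Read as ordinary imperative code, the sample program appears to compute the factorial of `nb`. The Python below is only a rough reading of the XML, not output from the XIM stylesheet; the `trace` list and the `print` call are hypothetical additions that show the kind of per-iteration output the XIM version does not produce.

```python
# Rough Python reading of the sample XIM program (an interpretation,
# not something generated by the XIM stylesheet).
fact = 1      # <var_declare name="fact"> 1
last = 0      # <var_declare name="last"> 0
nb = 5        # <var_declare name="nb"> 5

trace = []    # hypothetical: records the variables after each iteration

last = nb                       # <assign varn="last"> nb
while last > 1:                 # <boolop opname="gt"> last, 1
    fact = fact * last          # <op opname="*"> fact, last
    last = last - 1             # <op opname="-"> last, 1
    trace.append((last, fact))
    print(last, fact)           # per-iteration output, absent from the XML

print("fact =", fact)
```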

It’s an interesting project. But I couldn’t accomplish it yet.

On to looking at what Les Orchard has come up with in his Ficlets enhanced author feed: that XSL is a mere 93 lines long, a size that maybe I can digest before the weekend ends…

M. David Peterson

James Clark is blogging!

Welcome, James!

But I’m not letting this opportunity pass. James Clark has taken up blogging and with a bang too!

Welcome back, James!

Suddenly, life just feels more complete. This *ROCKS*! :D

Rick Jelliffe

Regular grammars, as used by W3C XML Schemas (XSD), are very good for representing some kinds of patterns in documents. XPaths, as used by ISO Schematron, are very good for locating and testing many other kinds of patterns. One of the reasons that the XML Schemas specification is so difficult is that, after the “XSD schema for XSD schemas” has been taken into consideration, there are still many more constraints left over; and these have to be written up in natural language. The result is that developers and implementers of XSD are left without a standard executable validator.

Here is how you would do it in Schematron. In fact, you could typeset the schema into a useable specification, because it allows rich text and various kinds of linking. Download file

The schema only has a couple of constraint sets, from the several dozen required, but it shows the kinds of thing that Schematron is good for. If you are making up a schema for a public specification, for example an industry standard, and you are finding you have more than a handful of constraints that cannot be expressed in W3C XML Schemas, consider formalizing them in an ISO Schematron schema. The XPaths not only clarify the meaning of the natural language, but they also allow validation with a Schematron validator (which is usually built from a couple of XSLT scripts).
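The idea of pairing an XPath with the natural-language constraint it formalizes can be sketched in a few lines of Python. This is not Schematron and not the downloadable schema: it uses the limited XPath subset in the standard library's `xml.etree.ElementTree`, and the sample document, the `RULES` table and the `validate` function are all hypothetical. The one rule shown is the XSD constraint that an element declaration must not carry both `name` and `ref`.

```python
import xml.etree.ElementTree as ET

XS = {"xs": "http://www.w3.org/2001/XMLSchema"}

# Hypothetical schema fragment: the inner declaration breaks the rule
# by carrying both "name" and "ref".
SCHEMA = """<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="a" type="xs:string"/>
  <xs:complexType name="t">
    <xs:sequence>
      <xs:element name="b" ref="a"/>
    </xs:sequence>
  </xs:complexType>
</xs:schema>"""

# Each rule pairs an XPath that locates offending nodes with the
# natural-language message it formalizes.
RULES = [
    (".//xs:element[@name][@ref]",
     "An element must not have both a name and a ref attribute."),
]


def validate(xml_text):
    """Return (element name, message) for every node that breaks a rule."""
    root = ET.fromstring(xml_text)
    failures = []
    for xpath, message in RULES:
        for node in root.findall(xpath, XS):
            failures.append((node.get("name"), message))
    return failures


failures = validate(SCHEMA)
for name, message in failures:
    print(name, "->", message)
```

A real Schematron schema would express the same pairing declaratively, with full XPath and richer messages.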

Michael Day
