August 2002 Archives

Simon St. Laurent

Related link: http://lists.xml.org/archives/xml-dev/200208/msg01544.html

Miles Sabin brilliantly writes “All in all, I think the costs of standardization have been wildly underestimated,” without noting the corollary that the benefits of standardization are often elusive.

The benefits of XML’s standardized syntax seem pretty clear at this point, despite some annoying aspects that just won’t seem to settle. Markup syntax seems to belong to the side of technology that benefits strongly from network effects - things like low-level protocols. Beyond that, there are lots of questions, especially as the market for any given standard grows smaller.

For the past few years, any time questions arose about how to make XML useful, the usual answer was “well, everyone needs to get together and hammer out an agreement about what vocabulary to use.” It sounds so simple, if you’re fond of committees. Of course, gathering to produce agreements sets up the usual set of political arguments, as participants assume different roles, use their clout, often close the doors to outsiders, and make compromises that either leave much-desired features out or put too many features in.

Sometimes the process has a breakout hit - HTML is pretty widely understood, though even that has been a long slow process - but the notion of setting up an organization to determine the one true approach to a particular information representation problem seems, well, laughable. Once a proposal reaches a certain critical mass, adherence to standards is crucial to using them, but reaching that point doesn’t seem to have anything directly to do with the standardization process.

Looking past the difficulties of getting a standard created, standards themselves create problems. Standards can be traps that keep organizations from applying their own understanding of problems, or burdensome contraptions that require organizations to provide information they may or may not consider important.

Another strange phenomenon is the pile-up of proposed standards that seems to be happening as more organizations join the Web Services universe. While Web Services in some ways reopen the world to anarchic diversity by letting service providers create services and publish descriptions, the organizations pushing this set of technologies seem bent on standardizing every possible tool-related aspect of this anarchy. SOAP begat WSDL and UDDI, and now we have piles of new proposals surfacing nearly weekly.

As automated processing moves further and further up the semantic ladder and standards address more and more specialized fields, it often becomes less clear whether the benefits of “everyone’s doing it” outweigh the costs of “we all have to do it the same way.”

Computing has traditionally been about creating information monocultures, fields of common structures where ambiguity is minimized and the simple logic that computers have traditionally provided is challenged as little as possible.

As the computing world starts to digest markup’s combination of simple foundations and potentially endless labeled structure, it may be time to reconsider the creation of shared monocultures, and perhaps even jettison that process in favor of a wide variety of diverse information understandings that are processed locally, where developers have the clearest understanding of the work they are doing, with sharing handled on an as-needed basis rather than given first priority.

Is this just a standard anti-standards rant?

Simon St. Laurent

Related link: http://simonstl.com/projects/tam/

Armed with a brand-new Palm m125 and a copy of Kim Topley’s excellent J2ME in a Nutshell, I set out to build a very simple SVG viewer and an interface for collecting information in the forest. I’m not nearly there yet, but I’m surprised to find myself enjoying what sounded like a difficult environment.

I’d picked up a Palm V at JavaOne a few years ago when they were first announcing this stuff, and stopped paying attention when it seemed like Java for the Palm got lost in more ambitious projects. Around the beginning of this year I noticed that J2ME, at least in its smallest CLDC/MIDP incarnation, was available. Around June, I finally got around to downloading it and starting to poke at it. (Sadly, I’m stuck for now doing my J2ME work on Windows, since the Mac OS X Java doesn’t provide J2ME support.)

The UI tools are primitive, and interface results vary drastically from device to device. Still, they work pretty nicely, and the low-level UI offers just enough of a toolkit to let me show people a graphic rendition of the information they’ve entered - trees in a forest. Working with integer-only math is a bit tricky when I want to create plots based on compass coordinates, but it just takes some extra thought.
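For anyone facing the same integer-only constraint (CLDC 1.0 has no float or double types), one common workaround is fixed-point arithmetic with a precomputed sine table. This is a minimal sketch, written in Python for brevity rather than J2ME Java; the scale of 1000, the whole-degree resolution, and the function names are my own assumptions, not anything from the MIDP API.

```python
import math

SCALE = 1000  # fixed-point scale: sine values are stored in thousandths

# Quarter-wave sine table for 0..90 degrees, in thousandths. It is generated
# with floating point here for brevity; on a float-less device you would
# precompute these 91 integers and paste them into the source as constants.
SIN_TABLE = [round(math.sin(math.radians(d)) * SCALE) for d in range(91)]


def isin(deg: int) -> int:
    """Integer sine of a whole-degree angle, scaled by SCALE."""
    deg %= 360
    if deg <= 90:
        return SIN_TABLE[deg]
    if deg <= 180:
        return SIN_TABLE[180 - deg]
    if deg <= 270:
        return -SIN_TABLE[deg - 180]
    return -SIN_TABLE[360 - deg]


def icos(deg: int) -> int:
    """Integer cosine, using cos(d) = sin(d + 90)."""
    return isin(deg + 90)


def plot_offset(bearing_deg: int, distance: int) -> tuple[int, int]:
    """Screen offset (dx, dy) of a point at a compass bearing and distance.

    Compass bearings run clockwise from north, so east is +x and north is
    -y on a screen whose y axis grows downward. Only integer math is used
    once the table exists.
    """
    dx = distance * isin(bearing_deg) // SCALE
    dy = -distance * icos(bearing_deg) // SCALE
    return dx, dy
```

Note that Python's `//` rounds toward negative infinity while Java's integer `/` truncates toward zero, so a Java port may place a point one pixel differently for negative offsets.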

The one piece of my J2ME work that’s now available is the Tiny API for Markup, a not-quite-XML-compliant parser that will be one of the bases for my SVG work. After doing this, I can see why Jon Bosak kept telling the XML 1.0 group that XML had to be processable on PDAs - though I think XML misses the J2ME mark. (A processor written in C would likely be more compact and have more memory to work with, so I guess they did okay for PDAs in general.)
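To give a sense of how small a not-quite-XML parser can get, here is a hypothetical tokenizer. It is not the Tiny API for Markup's actual interface; the event tuples, the regular expressions, and the list of skipped features (DTDs, CDATA, comments, processing instructions, numeric character references, most well-formedness checks) are assumptions chosen for illustration. It's sketched in Python; a J2ME version would scan characters by hand, since CLDC has no regular-expression package.

```python
import re
from typing import Iterator

# One token: a start, end, or empty-element tag with optional attributes,
# or else a run of character data.
TOKEN = re.compile(
    r"""<(/?)([A-Za-z_][\w.:-]*)"""                       # '<' or '</' and a name
    r"""((?:\s+[\w.:-]+\s*=\s*(?:"[^"]*"|'[^']*'))*)"""   # zero or more attributes
    r"""\s*(/?)>"""                                       # optional '/' before '>'
    r"""|([^<]+)"""                                       # ...or character data
)
ATTR = re.compile(r"""([\w.:-]+)\s*=\s*(?:"([^"]*)"|'([^']*)')""")
ENTITIES = {"lt": "<", "gt": ">", "amp": "&", "quot": '"', "apos": "'"}


def unescape(text: str) -> str:
    """Expand only the five predefined entities; no numeric references, no DTD."""
    return re.sub(r"&(lt|gt|amp|quot|apos);", lambda m: ENTITIES[m.group(1)], text)


def tokenize(doc: str) -> Iterator[tuple]:
    """Yield ('start', name, attrs), ('end', name) and ('text', data) events."""
    pos = 0
    while pos < len(doc):
        m = TOKEN.match(doc, pos)
        if m is None:  # comments, PIs, DOCTYPE, CDATA, malformed tags...
            raise ValueError(f"unsupported markup at offset {pos}")
        slash, name, attrs, empty, text = m.groups()
        if text is not None:
            if text.strip():  # drop whitespace-only runs between tags
                yield ("text", unescape(text))
        elif slash:
            yield ("end", name)
        else:
            attributes = {
                a.group(1): unescape(a.group(2) if a.group(2) is not None else a.group(3))
                for a in ATTR.finditer(attrs)
            }
            yield ("start", name, attributes)
            if empty:  # <name/> is reported as a start immediately followed by an end
                yield ("end", name)
        pos = m.end()
```

Everything the tokenizer refuses to handle is a feature a full XML 1.0 processor must carry, which is roughly where the size goes.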

Working in J2ME means doing more with less, and it has illuminated specifications for me in a way I couldn’t manage before. The difficulties in the XML spec become much clearer, SVG Tiny starts to look huge, and things which might have looked reasonable in the ever-faster, ever-more-bloated environment of PCs become plainly stupid. While I miss a few things (like Java 2 Collections), for the most part it’s a privilege to work in a leaner environment for a while. I doubt this will sweep the computing world, but maybe it should.

Is less more?

Jeff Bone

In various discussions I’ve been having about whether XML sucks and / or whether there are better alternatives for what it does, I’ve repeatedly alluded to something I read at some point about research going on at IBM Almaden on the efficiency of XML encoding.

Well, I haven’t exactly come across the original reference, but I’ve found some related research at IBM Almaden. The Vinci Service-Oriented Architecture includes something called XTalk, a loosely-coupled component bus that works through XML document exchange.

XTalk doesn’t actually use XML per se, but rather a pre-parsed binary representation of XML. In the words of the authors:

“While the exchange presented above is shown in XML notation, the communication between Vinci components is not “pure” XML, but rather a semi-parsed, pseudo-binary representation of an XML document we call XTalk. This representation is used because most existing XML parsers are too expensive, in terms of code size, processing time and memory footprint, for use in interactive applications. Section 4 shows how this feature allows for Vinci to perform on par with optimized RPC implementations, and an order of magnitude faster than SOAP-based services. (Concerns about SOAP performance have been expressed before.)

“…To summarize, the main advantages of XTalk over XML are speed, size, and simplicity. Speed-wise, we have found our XTalk parsers to provide at worst a 3 times speed up over a hand-optimized, bare-bones XML parser. In practice, when compared to full-blown XML parsers such as Xerces, the speedup is closer to 10 times or more. Size-wise, our XTalk parsers are approximately two orders of magnitude smaller than comparable XML parsers, and memory footprints a factor of 4 times smaller. For example, in PalmOS, our basic client, server, VinciFrame document model and XTalk conversion library has a size of only 13K.”
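A minimal sketch of the general idea behind a pre-parsed binary form: element and attribute names become indexes into a symbol table, and strings are length-prefixed, so the reader never scans for delimiters or unescapes entities. This is not XTalk's actual wire format (the excerpt doesn't describe it); the byte layout, the 16-bit fields, and the function names are assumptions for illustration.

```python
import struct
import xml.etree.ElementTree as ET

ELEMENT, TEXT = 1, 2  # node-kind bytes in the stream


def _content(el):
    """An element's content in document order: text runs and child elements."""
    if el.text:
        yield el.text
    for child in el:
        yield child
        if child.tail:
            yield child.tail


def encode(root: ET.Element) -> bytes:
    symbols: dict[str, int] = {}  # name -> index, in first-seen order

    def sym(name: str) -> bytes:
        return struct.pack(">H", symbols.setdefault(name, len(symbols)))

    def string(text: str) -> bytes:  # 16-bit length prefix + UTF-8 bytes
        data = text.encode("utf-8")
        return struct.pack(">H", len(data)) + data

    def node(n) -> bytes:
        if isinstance(n, str):
            return bytes([TEXT]) + string(n)
        parts = [bytes([ELEMENT]), sym(n.tag), struct.pack(">H", len(n.attrib))]
        for name, value in n.attrib.items():
            parts += [sym(name), string(value)]
        kids = list(_content(n))
        parts.append(struct.pack(">H", len(kids)))
        parts += [node(kid) for kid in kids]
        return b"".join(parts)

    body = node(root)  # fills the symbol table as a side effect
    table = struct.pack(">H", len(symbols)) + b"".join(string(s) for s in symbols)
    return table + body


def decode(data: bytes) -> ET.Element:
    pos = 0

    def u16() -> int:
        nonlocal pos
        (value,) = struct.unpack_from(">H", data, pos)
        pos += 2
        return value

    def string() -> str:
        nonlocal pos
        n = u16()
        text = data[pos:pos + n].decode("utf-8")
        pos += n
        return text

    symbols = [string() for _ in range(u16())]

    def node():
        nonlocal pos
        kind = data[pos]
        pos += 1
        if kind == TEXT:
            return string()
        el = ET.Element(symbols[u16()])
        for _ in range(u16()):  # attributes: table lookups, no parsing
            name = symbols[u16()]
            el.set(name, string())
        last = None
        for _ in range(u16()):  # content, rebuilt into text/tail form
            kid = node()
            if isinstance(kid, str):
                if last is None:
                    el.text = (el.text or "") + kid
                else:
                    last.tail = (last.tail or "") + kid
            else:
                el.append(kid)
                last = kid
        return el

    return node()
```

Like XTalk as described, this does no well-formedness or type checking on the way in; it trusts that the bytes came from a program, not a person.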

So perhaps there’s the 10x that was lurking in my head: a 10x speedup over stock XML parsers, from a type-lite binary encoding built on symbol tables. Now, information density / encoding efficiency has been studied and addressed in other efforts (WBXML, Millau, etc.); I’ll keep looking for a formal treatment of that. Back to XTalk… there are trade-offs to its approach:

“None of these advantages are completely without cost. Our XTalk interpreters offer no type checking and conversion, well-formedness checking, or ambiguity resolution that are features of many fully compliant XML parsers. However, our view is that these features are typically unnecessary in a deployed application. Documents are necessarily well-formed and unambiguous when constructed through programmatic document models instead of by hand. And should document validation or type-checking be a concern, Vinci allows it to be plugged in as a meta-service when and where as needed…”

The theory behind these compromises lies in the intent of XTalk. It is designed for an Intranet environment where developers work in a more coordinated fashion to integrate their services and components, and where security and other considerations are less pressing than on the open Internet. The idea is that messages need not be as strongly typed or self-descriptive, since any given message is normally used only within an established, well-understood exchange of information. This is a legitimate engineering trade-off.

Point is, though, this isn’t a safe set of assumptions in *all* the contexts XML is used in and all the functions it is designed to fulfill. If you can go with implicit typing, possible well-formedness issues, ambiguity, etc. (i.e., if you can punt many common error cases or handle them yourself), then maybe XML does suck. Maybe something like XTalk - a kind of XML-lite with whatever representational syntax you want - maybe that’s for you.

It’s interesting to note that XTalk’s transmission format is translated to an in-memory document representation based on FramerD, one of the classic frame data models. This lends some weight to the arguments of those who claim “frames are all you need.”
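For readers who haven't met frames: a frame is essentially a named bundle of slots, each holding one or more values. This sketch maps an element onto a plain dictionary of slots to show how naturally simple XML collapses into that shape. It is not FramerD's API or data model; the `isa` slot and the rule that text-only children become plain string values are my own assumptions, and mixed content is simply dropped, which is exactly the kind of simplifying assumption that makes frames sufficient.

```python
import xml.etree.ElementTree as ET


def to_frame(el: ET.Element) -> dict:
    """Map an element onto a frame: a dict from slot name to a list of values.

    Attributes and child elements both become slots. A child with no
    attributes and no children becomes a string value; anything richer
    becomes a nested frame. Mixed-content text is ignored.
    """
    frame: dict[str, list] = {"isa": [el.tag]}
    for name, value in el.attrib.items():
        frame.setdefault(name, []).append(value)
    for child in el:
        if len(child) == 0 and not child.attrib:
            value = (child.text or "").strip()
        else:
            value = to_frame(child)
        frame.setdefault(child.tag, []).append(value)
    return frame
```

Repeated children (two `note` elements, say) simply become a multi-valued slot.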

This all comes back to the basic point I was making in the first article [4] on this topic. The operative philosophy in question is: this is a technical question, not a philosophical one. (Whew! Get that? ;-) This is sort of like strong typing in programming languages: if you need it in your particular situation, you really need it, but if you don’t, it’s a pain in the ass. Similarly, if you need strong typing, extensible / nestable metadata, namespaces, etc. in a representational format, XML probably isn’t too much better or worse than any alternative you could come up with. OTOH, if all you need is frames and / or you can make certain other simplifying / limiting assumptions, then XML, and certainly XML with its full acronym soup brought to bear, is probably overkill.

$0.02,

-jb

Hmmm…?

Micah Dubinko

Related link: http://dubinko.info/writing/xforms/dblite.css

“There’s always Word.”

But if the thought of writing an entire book in Word hurts your head as much as it does mine, there’s always XML. Specifically, there’s DocBook Lite, or “dblite”, which is a preferred XML format at O’Reilly, and perhaps other publishers.

As I mentioned earlier, I use SoftQuad XMetaL, which has the great feature of being able to edit any XML styled by any CSS.

Steve Muench, who wrote Building Oracle XML Applications for O’Reilly, was kind enough to share his CSS, which I have modified to add a few things and suit a few personal tastes. With his permission, we’re putting the file under the GFDL for the world to share and use.

Enjoy! -m

Share your thoughts on DocBook Lite or the CSS used to work with it.

Jeff Bone

As mentioned in the Slashdot talkback comments on this issue, Activerse developed and released an IM bot development toolkit called DingBot SDK in 1997-1998, over two years before ActiveBuddy’s initial filing. I was the CTO of Activerse during most of this period, and there’s something particularly and personally galling about the egregious lack of due diligence exercised by the PTO in granting ActiveBuddy’s patent.

Up front: though this is perhaps an increasingly unpopular position for me to take, I *believe* in software patents; I just filed 6 of them at the end of last month on behalf of my current company. The problem with patents like the ActiveBuddy patent is that, when granted, they undermine the legitimacy of the entire patent system. They turn the patent system into an incredible waste of time, money, and effort on the part of anyone who seeks to obtain protection on true inventions; since patents are granted without apparently any substantial due diligence, no inventor who receives a patent on something is in any way assured that his invention is protected.

The patent system needs to be reformed, not overturned. Patents often serve a useful purpose: commercial interests often would not be incented to invest in research / development without some assurance of monopoly protection for a limited duration in order to recoup their investment. In the long term, with limited duration and when valid, patents are also a boon to society: upon expiry they help grow our “commons”. But when they are granted frivolously, with inadequate due diligence on the part of the inventors and the examiners, patents are a problem. The PTO MUST begin to exercise appropriate due diligence particularly in prior art discovery, and the law should better safeguard the right of the public to an invention by ensuring that the duration of patent protection is effectively limited.

The ActiveBuddy patent is a good example of a bad patent; apparently the patent examiner did not so much as bother to query Google and read the first few hits in the course of their search for prior art. While I (obviously) don’t have any continuing commercial interest in the Activerse area and IP, it seems to me that — as someone seeking patent protection on new, novel inventions — I do indeed have a vested interest in seeing the quality and defensibility of granted patents improved. To this end, it is in my interest to see the ActiveBuddy patent struck down.

It’s impractical for me to expend the financial resources and personal bandwidth necessary to pursue a reexamination request through its conclusion just on principle, but I’d be happy to help anyone else seeking to strike down this patent. I can assist in the preparation of affidavits and can help put you in contact with other developers from Activerse who also might have an interest in lending a hand.

If anybody is going to take this ball and run with it, feel free to contact me.

If you or someone you know is going to run with this, let me know…

Simon St. Laurent

Related link: http://www.wired.com/news/mac/0,2125,54365,00.html

Wired News is running two stories, one on how HyperCard is still around, and another on Bill Atkinson’s reflections about missing out on networked stacks. They’ve gotten me thinking about how I came to be a Web and then an XML developer.

When I bought an iMac this year, my first new Mac since the Quadra 840AV, I immediately bought a copy of HyperCard to go with it. It runs happily in Classic mode, and all my old stacks still run.

I first got into hypertext when I was frustrated by teaching Mac users to solve the same problems over and over, and decided to put my knowledge in one place. HyperCard let me do that, complete with interactive walkthroughs, and HyperTalk was the key to making it all work. It was simpler than the Applesoft BASIC I’d learned years before, and event-based interactivity was really cool.

A few years after that first project, I was living alone and working at Kinko’s Copies. I’d run the copiers by day, and come home to HyperCard programming at night - or vice versa. The stacks I wrote then are still available. Trying to move from HyperCard to some other environment was tough, I found - hypertext was getting started, but all I could find were Xanadu’s promises, HyTime’s remarkable complexity (which put me off markup completely for a while), and the enormous challenge of writing my own projects from scratch.

Fortunately, the Web was getting started at just about that time, and a year later I dropped my other work to focus on the Web. I started converting my stacks to HTML, though there were plenty of character-escaping glitches to make life entertaining. After a few years in HTML, I shifted over to XML, where I’ve stayed ever since. I have to admit, though, that I’ve repeatedly considered rebuilding the work I did in HyperCard using XML, XLink, and Java, and that I’ll probably get to it some day.

I’d love it if Apple open-sourced or even just freed HyperCard, but I’m not waiting. It was a fantastic place to start, though!

Anyone else have HyperCard nostalgia?

Ben Hammersley

Related link: http://groups.yahoo.com/group/rss-dev/files/Modules/Proposed/mod_cc.html

Just a quickie: I’ve just uploaded the Creative Commons initiative’s RSS 1.0 module: mod_cc.

It is designed for content authors to mark up their RSS feeds with the licensing details of the <item>s contained within. Any license can be used, but the Creative Commons team, and the module, provide seven components to build together to describe your own.
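A quick sketch of a feed consumer reading per-item licenses. I'm assuming the module's namespace is `http://web.resource.org/cc/` and that each item carries a `cc:license` element whose `rdf:resource` points at the license; check the linked proposal for the authoritative element names. The trimmed-down feed and its example.org URLs are made up.

```python
import xml.etree.ElementTree as ET
from typing import Optional

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RSS = "http://purl.org/rss/1.0/"
CC = "http://web.resource.org/cc/"  # assumed mod_cc namespace; verify against the proposal

SAMPLE = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns="http://purl.org/rss/1.0/"
         xmlns:cc="http://web.resource.org/cc/">
  <channel rdf:about="http://example.org/">
    <title>Example</title>
    <link>http://example.org/</link>
    <description>A made-up feed</description>
  </channel>
  <item rdf:about="http://example.org/a">
    <title>Licensed item</title>
    <link>http://example.org/a</link>
    <cc:license rdf:resource="http://creativecommons.org/licenses/by/1.0/"/>
  </item>
  <item rdf:about="http://example.org/b">
    <title>Unlabeled item</title>
    <link>http://example.org/b</link>
  </item>
</rdf:RDF>"""


def item_licenses(feed_xml: str) -> dict[str, Optional[str]]:
    """Map each item's rdf:about URI to its cc:license URI, or None if it has none."""
    root = ET.fromstring(feed_xml)
    result = {}
    for item in root.findall(f"{{{RSS}}}item"):
        lic = item.find(f"{{{CC}}}license")
        result[item.get(f"{{{RDF}}}about")] = (
            lic.get(f"{{{RDF}}}resource") if lic is not None else None
        )
    return result
```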

Anything to say? Go oooonnnn, my son.

Jeff Bone

RANT_ON

Okay guys, I give up. Since my first online post to USENET in the mid-late 80s, I have received an ever-increasing amount of unsolicited and increasingly commercial (and decreasingly interesting, well-constructed, compelling) e-mail advertisement. It was amusing, then annoying, then infuriating, then… I thought… intolerable.

I have used every ounce of ingenuity and technology at my disposal against this blight. I’ve used local mail filters, increasingly sophisticated procmail scripts, commercial services, misdirection, etc. etc. etc.

None of them have worked satisfactorily.

I have tried to maintain my anonymity online, to the detriment of my public persona, my personal / public career, and my “reachability” to those persons in my various latent communities who I do not yet know.

I’ve opted out. (How naive am I? Respond to a spammer? What was I thinking?)

I’ve lobbied my congressmen, and have done so with a certain sense of guilt. Appeal to the gov’t to restrict a freedom that, while an economic and personal nuisance to me, is still a basic constitutional right of all Americans and probably all human beings, depending on one’s theory of rights? How big a hypocrite am I?

None of this worked. None of this *should* work — the solutions I’ve sought have not been consistent with my basic philosophy of the technology in question, as not receiving e-mail from unknown parties as a policy is a form of censorship. And CERTAINLY any legislative solution is also censorship — and we all know how the ‘Net deals with censorship.

Messaging technologies must ALWAYS make a choice: either open communication from unknowns is allowed by default (enabling opportunistic communication), or it is denied, either by fiat / heuristic or by requiring inconvenient point-of-contact user intervention, or inbound communications are restricted to some explicitly designated group of “already knowns.” At Activerse, in our Ding! p2p instant messaging product, we required explicit user approval after initial contact, but it was not without its drawbacks, namely the user intervention needed to permit communications after an initial contact. The better spam filters (the personal, “client” side or user-defined ones) today use exactly this mode of operation, but the domain of e-mail messages is too rich, the notion of personal identity online too weak, and the filters themselves too dumb to make this effective.
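The "explicit approval after initial contact" policy is simple to state in code. This is a hypothetical model, not Ding!'s implementation; the class, its method names, and the three outcomes are assumptions for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class ApprovalInbox:
    """Deliver messages from known senders; hold first contact from strangers.

    A hypothetical model of approval-after-initial-contact, not Ding!'s code.
    """
    known: set[str] = field(default_factory=set)
    delivered: list[tuple[str, str]] = field(default_factory=list)
    pending: dict[str, list[str]] = field(default_factory=dict)
    blocked: set[str] = field(default_factory=set)

    def receive(self, sender: str, message: str) -> str:
        if sender in self.blocked:
            return "dropped"
        if sender in self.known:
            self.delivered.append((sender, message))
            return "delivered"
        self.pending.setdefault(sender, []).append(message)  # wait for the user
        return "pending"

    def approve(self, sender: str) -> None:
        """User intervention: trust the sender and release anything held."""
        self.known.add(sender)
        for message in self.pending.pop(sender, []):
            self.delivered.append((sender, message))

    def reject(self, sender: str) -> None:
        """User intervention: discard held messages and drop future ones."""
        self.blocked.add(sender)
        self.pending.pop(sender, None)
```

Every legitimate first contact waits on `approve()`, which is the user-intervention cost described above, and the whole scheme keys on a sender string that e-mail lets anyone forge: the weak-identity problem in miniature.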

So I have come to this conclusion: I will fight no more forever. Spammers, send me your lame-ass solicitations, your broken English, your poorly phrased and even more poorly conceived dreams of having me as your customer. My e-mail address is jbone@deepfile.com. Go for it. Waste your time sending me these things; waste your money buying my name as a statistically-insignificant part of some list you pay good money for. Do this and KNOW that I will never, ever, ever buy your freakin’ product. I’m not going to visit your porn site. I’m not going to waste the time — time you say will be well spent — reading your message. I will never, ever, ever respond in any way whatsoever to your crap. Get it? You’re wanking off by spewing your rhetoric at me.

Now here’s the deal: these low-life IQ80 scum-of-the-earth who send us this garbage in fact have *EVERY RIGHT TO DO SO* and *THERE’S NO TRULY FREE TECHNOLOGICAL SOLUTION TO THE PROBLEM.* But they thrive; they justify their existence on the minuscule percent-of-a-percent of the population (also IQ80?) that gives them any sort of feedback whatsoever. IF WE ALL just IGNORE them, entirely (if we can reduce that percent-of-a-percent by another order of magnitude or two), THEY WILL GO AWAY, forever. Eventually.

Seriously.

That’s my solution. Everybody just wise up, cool down, lose the outrage, and stop feeding these bottom-feeders even the notoriety and (economically justifiable) feedback they get even now… and they’ll die off.

RANT_OFF
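The "just ignore them" argument is really a break-even calculation: a bulk mailing pays only while responses times revenue per response exceed the cost of sending. Every number here (message volume, cost per message, revenue per response, response rates) is hypothetical, chosen only to show the shape of the arithmetic.

```python
def campaign_profit(messages: int, cost_per_message: float,
                    response_rate: float, revenue_per_response: float) -> float:
    """Net return of a bulk mailing: revenue from responders minus sending cost."""
    revenue = messages * response_rate * revenue_per_response
    cost = messages * cost_per_message
    return revenue - cost


def break_even_rate(cost_per_message: float, revenue_per_response: float) -> float:
    """Response rate at which a mailing exactly pays for itself."""
    return cost_per_message / revenue_per_response


# Hypothetical campaign: 10 million messages at a hundredth of a cent each,
# $20 earned per response.
MESSAGES, COST, REVENUE = 10_000_000, 0.0001, 20.0

today = campaign_profit(MESSAGES, COST, 0.0001, REVENUE)          # 0.01% respond
ignored = campaign_profit(MESSAGES, COST, 0.0001 / 100, REVENUE)  # 100x fewer respond
```

With these made-up figures the campaign clears $19,000 at a 0.01% response rate and loses $800 at a rate 100 times lower; break-even sits at 0.0005%, a factor of 20 below the starting rate.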

Of course, that’s just my opinion, I could be wrong.

jb

Better solutions? Can technology solve what are fundamentally social problems?

Jeff Bone

XML-bashing seems to have become a semi-popular pastime of late… of the many critiques I’ve come across, this presentation is one of the best. Here are a few hopefully reasonable comments addressing the whole anti-XML sentiment that’s floating around and this critique in particular.

Up front… I’m not particularly an XML advocate; I’ve been involved in none of the XML specifications or working groups. However, while I share some of the same feelings that somehow XML is rather “grotty,” I’ve developed what I hope is a reasonable position on the matter.

Some of Aaron’s arguments are pretty good, but some rest on a few assumptions and philosophical positions that are, IMHO, erroneous.

What Technology Should (and Shouldn’t) Try to Achieve

All instances of technology have one meta-purpose: to accomplish or achieve some function, feature, or design requirement. That’s it. It’s not the job of a technology to be beautiful, aesthetically pleasing, etc. In fact, there’s *no such thing* as beautiful or aesthetically pleasing technology. We technologists are prone to thinking that technology can have these qualities because we “feel” that some technologies do; however, this is a trap — one that we technologists often fall into.

A technology should accomplish what it is designed to accomplish in a reasonably efficient manner, with minimal cognitive overhead. Let’s break that down: technology should be designed to accomplish some purpose. It should not attempt to accomplish things for which it is not explicitly and specifically designed. (A desire to make our artifacts very general is another pervasive trap that we techies fall into.) It should fulfill its design in a reasonably efficient manner; this means that it shouldn’t be obviously and grossly inefficient, i.e., some other design should not be able to fulfill the specific requirements in a significantly more efficient manner. Efficiency is a loosey-goosey term, but it can mean computational efficiency, storage or bandwidth efficiency, ease-of-use, or any combination of the above and other types of efficiency. Efficiency is likely to be domain-specific, so the statement needs to be interpreted in the context of the particular application. Minimal cognitive overhead means that the simplest design that achieves the design requirements reasonably efficiently wins.

All other considerations about a technology are illusory, insignificant, unimportant. A technology that does what it is supposed to in an efficient manner with no more than a minimum amount of complexity is “good enough”; there’s no such thing as “better than good enough.” The mistaken belief that there is such a thing is what leads to a chronic and puzzling thing in the marketplace: technologies that are deemed “best of breed” (i.e., exceed their design requirements on somebody’s subjective quality assessment) *ALWAYS* underperform “inferior” but adequate technologies in the market. (NeXT, Beta, Objective C, the Mac, Be, Newton, etc. etc. etc.)

Aaron falls into this trap when he describes XML as “technologically terrible.” He’s expressing an aesthetic opinion dressed up as a technological argument. The question we should ask when evaluating XML (or any technology) is: “can something else do as good or better a job on the relevant / important dimensions?” The answer for XML *might* be “yes”, but in absence of any compelling evidence to the contrary it’s probably “no.”

“Does XML Suck?” Revisited

Aaron lists “verbosity” as one of the problems with XML. He’s not alone in that complaint. However, this criticism is off base for several reasons. First, it’s important to distinguish between XML-the-syntax and XML-the-datastructure. Complaints about syntax are, generally, pretty silly. Two otherwise equivalent syntaxes for something should be considered the same; and a range of techniques exist for reducing the verbosity of XML (including judicious use of namespaces and schemas, as well as things like binary and indexed representations). Complaining about XML’s verbosity is a generalization from certain bad examples of XML. And: some researchers at IBM Almaden about two years ago [lost the ref, anybody] showed that a reasonably efficient XML representation of something was, IIRC, of the same order of size as the minimal representation / encoding of something that carried the same amount of structural and semantic information. That is, with appropriately efficient use of namespaces, etc., XML will be less than 10x the minimal compressed size of the same info given any non-lossy compression scheme. In my experience, differences of that order can largely be ignored in almost all systems. (It’s the O(n^2), O(n!) etc. stuff that we’ve got to worry about.)
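A rough way to test the verbosity complaint against your own data: compare an XML encoding, a bare delimited encoding of the same records (which relies on field order instead of labels), and compressed versions of each. This is not a reproduction of the IBM result mentioned above; the record layout and sample data are hypothetical, and zlib merely stands in for "any non-lossy compression scheme."

```python
import zlib
import xml.etree.ElementTree as ET

SPECIES = ["oak", "maple", "pine"]


def sample_records(n: int) -> list[tuple[int, str, int]]:
    """Hypothetical survey records: (id, species, height)."""
    return [(i, SPECIES[i % 3], 10 + (i * 7) % 25) for i in range(n)]


def as_xml(records) -> bytes:
    root = ET.Element("trees")
    for rid, species, height in records:
        ET.SubElement(root, "tree", id=str(rid), species=species, height=str(height))
    return ET.tostring(root, encoding="utf-8")


def as_csv(records) -> bytes:
    """Delimited encoding of the same values, with no labels at all."""
    return "".join(f"{rid},{species},{height}\n" for rid, species, height in records).encode("utf-8")


def size_report(n: int = 1000) -> dict[str, int]:
    records = sample_records(n)
    xml_bytes, csv_bytes = as_xml(records), as_csv(records)
    return {
        "xml": len(xml_bytes),
        "csv": len(csv_bytes),
        "xml_zlib": len(zlib.compress(xml_bytes, 9)),
        "csv_zlib": len(zlib.compress(csv_bytes, 9)),
    }
```

The interesting figure is arguably `xml_zlib / csv_zlib` rather than `xml / csv`: much of the raw verbosity is repeated tag and attribute names, which is exactly the redundancy a general-purpose compressor removes.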

Aaron also emphasizes that XML isn’t the most human-readable representation. But that assumes that XML is intended to be read / written by humans. It’s tempting to make that assumption, but IMHO it’s incorrect: we make that assumption because HTML is often read / written by humans. However, just as *more* HTML is created / processed by software than by people, so too (and even more so) is more XML created / processed by software than by people. The nice thing about both is that they *can*, when necessary, be processed by humans. XML represents a nice tradeoff between human readability and efficient machine representation.

Aaron’s arguments about complexity sound reasonable on the surface, however… Complexity is a tough thing to pin down. In computer science we have good (at least adequate) tools for analyzing and understanding computational complexity, e.g. time-space tradeoffs, algorithmic complexity, etc. Information theory gives us some tools for dealing with information complexity… But we have very poor or no tools for analyzing and quantitatively addressing problems of dynamic complexity of component interactions in systems, representational complexity of data structures, expressivity of languages, etc. I’ve spent over a year trying to create a theory of the first of these (compositional complexity in software architectures) and let me tell you, complexity is a complex notion. ;-)

Representational complexity and expressivity are even less studied and less understood areas, and while Aaron may in fact be right, his argument isn’t well supported. And there are hints from information theory, such as the size order of XML vs. theoretic optima, that indicate that it’s wrong.

If Aaron can state exactly what he means by “complexity” and quantify / generalize his argument, it would be significant not only as XML criticism but as an important result in computer science.

The “acronym proliferation” problem is very real, but it’s a function of where this technology is in its lifecycle and the amount of attention it has received. It’s not surprising that there’s a “fan-out” of overlapping applications / standards / etc. related to XML — it’s relatively early, very general, and lots of people are trying to do stuff. That leads to quite a bit of noise and frustration but — inevitably — there will be a “fan-in” to a few general, standard tools for various things. Aaron even recognizes that this is happening: “Even here, the situation is improving.”

The bottom line is this: it *might* be possible to design a similar representational mechanism that accomplishes all the things that XML accomplishes — i.e., multidimensional reference structures with arbitrary attribution and strong typing… But *today* there are no existence proofs of such alternatives and, indeed, I believe that if there were they would strongly resemble XML except in the trivial details. In the absence of proposals for such alternatives, it would seem that criticizing XML is a rather empty exercise.

And Aaron recognizes the most important argument *for* XML: its socioeconomic benefits. “Everybody’s doing it” is a very *good* argument for any technology; systems that can communicate through such a mechanism grow in value with the square of the number of components, per Metcalfe’s law. Anyone using other idiosyncratic technologies to accomplish some or all of the same things actually inhibits the overall growth of value of the system.
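The Metcalfe's-law arithmetic is easy to check: n components that share a format have n(n-1)/2 possible pairwise connections, which grows roughly as n²/2. This sketch (function names my own) also shows what happens when some components go idiosyncratic and split off into groups that can't talk to the rest.

```python
def pairwise_links(n: int) -> int:
    """Possible point-to-point connections among n components: n(n-1)/2."""
    return n * (n - 1) // 2


def links_with_islands(*island_sizes: int) -> int:
    """Connections when components are split into groups that can't interoperate."""
    return sum(pairwise_links(k) for k in island_sizes)
```

For ten components, one shared format gives 45 possible connections; if two of them adopt their own format and can only talk to each other, the total drops to 28 + 1 = 29.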

What do you think? Does XML suck? Is it horribly inefficient? Are there better alternatives that accomplish the same thing?

Micah Dubinko

Related link: http://dubinko.info/writing/xforms/

It’s hard to find much information on tools that authors use, especially technical book authors. Here’s one data point:

  • Paper is my first resource. I outline the whole book and individual chapters on regular blue line paper. Illustrations too begin their life on paper.
  • Zoot: I keep all kinds of electronic notes and Web clippings in a program called Zoot. I really like the ability to instantly search through scads of information.
  • SoftQuad XMetaL is one of the most useful pieces of software I’ve ever used. With the help of some DocBook lite CSS from Steve Muench, I was off and running.
  • Linux …but Windows XP, which came with my notebook computer, just wasn’t cutting it: far too many crashes and other annoying behaviors. So out it went. New computers these days don’t come with OS installation media, which makes dual booting rather difficult, so…
  • Win4Lin from Netraverse provides a complete Windows environment inside a…window. It’s quite breathtaking to see your first Blue Screen Of Death that goes away with a click on the window close button.
  • Wireless 802.11: like having a spare office.

That’s what I’m going with. I’d love to compare notes, so feel free to Talk Back. -m

What writing tools do you use? Drop a note here to compare.