January 2004 Archives

Simon St. Laurent

Related link: http://simonstl.com/articles/sanity2/

Sane XML

Is XML driving you insane? It shouldn’t, and it doesn’t have to. Sanity is within reach, if you’re willing to discard a lot of junk and take a look at some tools that fit XML neatly.

Back in 1996, the XML effort began, focused on creating a subset of SGML that would be easier to work with and would finally let it reach a much wider group of developers and users.

Today, in 2004, XML is far more tangled than SGML ever dreamed of being. Even if you include other SGML-related ISO projects, like HyTime, in the mix, XML has far outstripped its supposedly complex parent. In practice, most users focus only on a tiny subset of the capabilities that standards bodies and vendors have provided, but choosing a subset that doesn’t inflict major pain over the course of a project is still difficult.

Over three years ago, a group of developers in which I participated proposed “Common XML”, a subset of XML 1.0 and Namespaces in XML. We thought it trimmed the fat pretty reasonably and enhanced the interoperability that had been compromised by several design decisions in XML 1.0 itself. In practice, I think we got things mostly right, as developers who work with XML tend to stick to the parts whose use we encouraged, and seem to have gotten the message that some of the pieces we described as extensions may or may not work as expected across applications. (I don’t credit Common XML with making any changes; it just codified practices people have largely found on their own.)

Today, the XML landscape is far more complex, with specifications good and bad littering the computing world. One of the most bloated, W3C XML Schema, has dominated the tools world despite interoperability and complexity issues. Thanks in large part to early support from vendors, this collection of issues masquerading as a schema language continues to dominate the XML world - and in my opinion, makes the cost of using XML much higher for both vocabulary creators and consumers of those vocabularies. W3C XML Schema is only one of a number of complicating specifications from the W3C, and the W3C is starting to find itself troubled by the additions it made to SGML.

Developers don’t need these headaches, though they may feel trapped by currently available tools, and many of them haven’t heard that there are in fact alternatives to W3C XML Schema. Using XML shouldn’t be a mind-binding experience, and it’s possible to discard most of W3C XML Schema and still get work done - even get more and better work done.

The key to this sanity is a strict focus on XML and XML documents. Stop pretending that these things have object hierarchies, and stop hacking around the conflicts between object hierarchies and document realities with broken tools like substitution groups and keys. Focus on the documents themselves and the structures you’d like to have in those documents, and there’s a chance you’ll produce documents that are a pleasure, rather than a burden, to work with. You can build schemas using this understanding of documents with RELAX NG, a schema language that describes document structures, not type structures abstracted on top of document structures.

I gave a presentation last week on how to use RELAX NG to create schemas which work with W3C XML Schema tools - you don’t have to give up compatibility with current tools to escape the complexity. There’s plenty of information at XML.com to get you started, as well as a new O’Reilly book that’s also available online.

Take a look at RELAX NG, and start using it where you can. Start by writing new schemas in RELAX NG, and convert them to W3C XML Schema later if you need to. Ask other developers for schemas in RELAX NG format. Even the W3C, purveyor of W3C XML Schema, has found RELAX NG to be useful.

XML was never meant to be complicated. You shouldn’t have to buy a continuous stream of books, even O’Reilly books, to get your work done using XML. (Given the state of the XML book market, it seems clear that the treadmill has exhausted people.) While you’ll undoubtedly still find data modeling a challenge, RELAX NG will let you focus on your information structures rather than on the intricacies of a bloated schema specification half-hidden by tools.

What else about XML makes you crazy?

David A. Chappell

Related link: http://www.ibm.com/developerworks/library/ws-resource/ws-notification.pdf

We jointly announced the WS-Notification spec yesterday. The co-authors include IBM, Akamai, HP, SAP, Sonic Software, The Globus Alliance, and TIBCO. The spec can be viewed at http://www.ibm.com/developerworks/library/ws-resource/ws-notification.pdf

WS-Notification is part of the WS-Resource framework, also announced yesterday -
http://www-106.ibm.com/developerworks/webservices/library/ws-resource/

I would best describe WS-Notification as “Pub-sub for the Internet”. It’s distributed, broker-based pub-sub using web service interfaces. It doesn’t directly address QoS issues such as exactly-once delivery, although it is intended to be composable with WS-ReliableMessaging (in fact, it’s intended to be composable with the rest of the WS-* set of specs).

WS-Notification allows the use of a message broker, or a series of connected brokers, as intermediaries between producers and consumers. It allows the use of proprietary message-oriented middleware (MOM) and bridging between proprietary MOM products.

I encourage you all to go read the spec. Here are some of the interesting items of note -

- WS-Notification allows for a separate subscription broker that is able to subscribe on behalf of another subscriber.

- While a “Notification broker” is part of the spec, it allows for pub/sub to happen directly between the endpoints. The actual sending of the message works similarly. Publishers can send directly to other subscribers, or can send to a Notification Broker and let the Notification Broker take care of getting it to where it needs to go.

- WS-Notification supports “demand-based publishing”: a publisher can register with a Notification Broker and tell the broker which topics it intends to publish on. This can be used in at least a couple of ways. One is to authenticate the publisher and apply access control lists that limit the publisher to particular topics. Another is to let a Notification Broker manage the topic namespace so that it can apply optimizations, such as suspending the publisher when no subscribers are currently active.

- The publisher need not be a web service. The publisher may act through a Notification producer which is Web service savvy.
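The demand-based publishing idea in the list above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the WS-Notification interface: the class name `NotificationBroker` and the methods `register_publisher`, `subscribe`, `is_paused`, and `publish` are all hypothetical, and the sketch runs in one process rather than over web service calls.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, List, Set

# Hypothetical in-process stand-in for a Notification Broker.
# None of these names come from the WS-Notification spec.
Callback = Callable[[str, str], None]


class NotificationBroker:
    def __init__(self) -> None:
        # publisher id -> topics that publisher registered for
        self._allowed_topics: Dict[str, Set[str]] = {}
        # topic -> subscriber callbacks
        self._subscribers: Dict[str, List[Callback]] = defaultdict(list)

    def register_publisher(self, publisher_id: str, topics: Iterable[str]) -> None:
        """Record the topics a publisher says it intends to publish on."""
        self._allowed_topics[publisher_id] = set(topics)

    def subscribe(self, topic: str, callback: Callback) -> None:
        self._subscribers[topic].append(callback)

    def is_paused(self, publisher_id: str) -> bool:
        """A publisher is paused while none of its topics has a subscriber."""
        topics = self._allowed_topics.get(publisher_id, set())
        return not any(self._subscribers.get(topic) for topic in topics)

    def publish(self, publisher_id: str, topic: str, message: str) -> int:
        """Deliver a message; reject topics the publisher did not register."""
        if topic not in self._allowed_topics.get(publisher_id, set()):
            raise PermissionError(f"{publisher_id} is not registered for {topic}")
        callbacks = self._subscribers.get(topic, [])
        for callback in callbacks:
            callback(topic, message)
        return len(callbacks)
```

The registered topic list does double duty here, as the text describes: it acts as a simple access control list in `publish`, and it lets the broker decide in `is_paused` whether a publisher has any audience at all.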

Publishing and Subscribing:

WS-Notification supports a hierarchical, tree-based topic space (much like SonicMQ). A subscriber can subscribe to entire branches at any level. Hierarchical topic trees allow subscribing to a topic space that can include any branch in the tree, using an XPath-like “TopicPathExpression” such as “tns:t1/t2 | tns:t4/t5”. Security permissions may also be applied using the same mechanism.
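To make the branch-matching idea concrete, here is a minimal Python sketch. It assumes a much simpler grammar than the spec defines: alternatives are separated by “|”, path steps by “/”, and each path is taken to cover its whole branch. The function names are hypothetical, and the real TopicPathExpression syntax is not reproduced here.

```python
from typing import List, Tuple


def parse_topic_expression(expression: str) -> List[Tuple[str, ...]]:
    """Split an expression like 'tns:t1/t2 | tns:t4/t5' into path tuples."""
    paths = []
    for alternative in expression.split("|"):
        alternative = alternative.strip()
        if alternative:
            paths.append(tuple(alternative.split("/")))
    return paths


def topic_matches(expression: str, topic: str) -> bool:
    """True if the topic is one of the named branches or lies beneath one.

    Assumption for illustration: subscribing to a path subscribes to the
    entire branch below it.
    """
    topic_path = tuple(topic.split("/"))
    return any(
        topic_path[: len(path)] == path
        for path in parse_topic_expression(expression)
    )
```

With the expression from the text, `topic_matches("tns:t1/t2 | tns:t4/t5", "tns:t4/t5/t6")` is true because the topic sits under the second branch, while a sibling such as “tns:t1/t3” does not match.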

- Publishers and Subscribers are referenced using WS-Addressing Endpoints.

- Message selectors are also supported using an XPath-like notation. Message selectors allow a subscriber to filter messages based on given criteria.
- There is also the notion of a “precondition”, which is sort of like saying “publish when a certain condition occurs”.

Subscriptions may have an InitialTerminationTime, which governs how long the subscription is valid. The whole lifecycle of resources is governed by a companion spec, WS-ResourceLifetime. http://www.ibm.com/developerworks/library/ws-resource/ws-resourcelifetime.pdf

Dave

Micah Dubinko

Related link: http://xformsinstitute.com

Several folks have asked me for a gentler tutorial to W3C XForms. Several other folks have asked to see some examples of “real-world” XForms. Here are both, together on one fun site.

The site has what you’d expect from a tutorial: progressive lessons, each building upon the last. It also has interactive quizzes, written without script in XForms. These run fine in nearly any browser, thanks to a remarkable Flash program called DENG, the Desktop Engine.

In a mere 120k of SWF files, this small applet implements a huge swipe of XForms, XHTML, and CSS level 3. Each live example includes a “View Source” link so that you can see how it works in the full context of a complete document.

Please link to XFI.

This isn’t the end–there’s more on the way too. -m

David A. Chappell

Related link: http://www.davidchappell.com/blog

The “other” David Chappell has started blogging. The first blog entry he posted was to talk about the David Chappell naming collision. I first wrote about that phenomenon here. It will be interesting to see how this develops in the coming months, as it seems he will continue to focus on all things MS, including BizTalk and Indigo, and I will be focusing on heterogeneous integration and the emerging Enterprise Service Bus technology category.

Dave

Bob DuCharme

Related link: http://www.snee.com/addids

(The following is the introduction from the web page at the URL shown above; see the web page for information on how to use the CGI that does this.)

In the early days of the web, you could only link to a specific point within a web page if that point had an a element with a name attribute. Recent releases of the Mozilla, Internet Explorer, and Opera web browsers, however, let you link to any element that has an id attribute. (More on this in a weblog posting I did.) Hopefully, more and more web development tools will start adding id attributes to more block elements; I’m trying to get into the habit of doing it to everything I write.

Meanwhile, I’ve written a CGI script named addids.cgi (“add IDs”) that creates a temporary copy of any web page you pass to it, with IDs added to block elements so that you can create links to any block element you like in that temporary copy. For a web page that doesn’t change much (not, for example, the home page of a newspaper’s web site), nearly all generated IDs will be the same every time a temporary copy is generated. This means that you can look at a copy created by addids.cgi, create a URL that links to a specific point within that copy, and send that link to someone else with reasonable confidence that it will show them the same point in the document.
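The general technique can be sketched in Python with the standard library’s `html.parser`. This is not the code of addids.cgi, whose ID scheme isn’t described here; the sketch assumes positional IDs (tag name plus a running count, such as `p3`), a hypothetical `BLOCK_TAGS` set, and hypothetical names `IdAdder` and `add_ids`.

```python
from collections import Counter
from html import escape
from html.parser import HTMLParser

# Assumption: which elements count as "block elements" worth linking to.
BLOCK_TAGS = {"p", "div", "h1", "h2", "h3", "h4", "h5", "h6",
              "li", "blockquote", "pre", "table"}


class IdAdder(HTMLParser):
    """Re-emits the markup it parses, adding ids to block elements."""

    def __init__(self):
        # Keep character references as written instead of decoding them.
        super().__init__(convert_charrefs=False)
        self.out = []
        self.counts = Counter()

    def _emit_start(self, tag, attrs, closing=">"):
        if tag in BLOCK_TAGS:
            self.counts[tag] += 1
            # Leave existing ids alone; otherwise add a positional one.
            if not any(name == "id" for name, _ in attrs):
                attrs = attrs + [("id", f"{tag}{self.counts[tag]}")]
        rendered = "".join(
            f" {name}" if value is None
            else f' {name}="{escape(value, quote=True)}"'
            for name, value in attrs
        )
        self.out.append(f"<{tag}{rendered}{closing}")

    def handle_starttag(self, tag, attrs):
        self._emit_start(tag, attrs)

    def handle_startendtag(self, tag, attrs):
        self._emit_start(tag, attrs, closing=" />")

    def handle_endtag(self, tag):
        self.out.append(f"</{tag}>")

    def handle_data(self, data):
        self.out.append(data)

    def handle_entityref(self, name):
        self.out.append(f"&{name};")

    def handle_charref(self, name):
        self.out.append(f"&#{name};")

    def handle_comment(self, data):
        self.out.append(f"<!--{data}-->")

    def handle_decl(self, decl):
        self.out.append(f"<!{decl}>")


def add_ids(html_text):
    """Return a copy of html_text with ids added to block elements."""
    parser = IdAdder()
    parser.feed(html_text)
    parser.close()
    return "".join(parser.out)
```

Because the IDs depend only on the order of elements, an unchanged page always gets the same IDs. An edit near the top of the page would shift the positional IDs below it, which is one reason links into frequently changing pages are less dependable.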

A few random tests show that it works with some slick commercial sites (I linked to stories in the archives so that the examples would last longer): The BBC (“The varying hotel guests in each episode…”), Rolling Stone (“On 1971’s Gets Next to You…”) and a Vignette Storyserver-generated Time Magazine article (“Ethiopia: Tackling terror in East Africa.” Scroll up for slickness.) For a layout so complex that the CGI messes it up (for example, Wired) there may be a “Print” version of the same story that’s easier to link to (“Paper modeling reached the zenith…”). I found that it doesn’t always work properly with IE 6.0 under Windows, but it seems to work fine with Firebird 0.7, Mozilla 1.5, and Opera 6.1 under Windows, and with IE 5, IE 5.1, and Safari 1.0 under OS X.

How did it work for you?

Simon St. Laurent

Related link: http://www.princeton.edu/~mhindman/googlearchy–hindman.pdf

A few days ago, Len Bullard posted a link to “Googlearchy”: How a Few Heavily-Linked Sites Dominate Politics on the Web. After reading it, it occurred to me that the situation they describe is precisely one I’m trying to escape.

It’s not a new story, really - it’s yet another case where network effects create winners and losers. I’ve been writing about network effects since 1991, when I was researching an article on the failure of the Susan B. Anthony dollar that eventually ran in The Journal of Money, Credit, and Banking. John Caskey, the professor I was working with, suggested that network effects might be a good explanation of why it’s so difficult to introduce a new coin unless you pull the bill it replaces.

Network effects have come under fire when they sound like they undermine a free market, and been praised to the skies when they benefit adventures like the Internet and the Web. They’re relevant to standards development, the interpretation of monopolies, and a wide variety of situations which start out as an open field but eventually become much less open.

The distributions which take place as patterns settle in have been discussed before in the context of weblogs, in Clay Shirky’s Power Laws, Weblogs, and Inequality. I think Clay is right about what happens in such systems but wrong to shrug his shoulders and say “There just weren’t enough blogs to have really unequal distributions. Now there are.” The writers of Googlearchy don’t quite shrug their shoulders; they’re researchers pointing out that this field is worth studying, and that naive rhetoric about technology improving communications to create a more level playing field for different perspectives needs to be reconsidered. My favorite quote:

But whether this result is surprising or not, it is clear that in some ways the Web functions quite similarly to traditional media. Yes, almost anyone can put up a political Web site. But our research suggests that this is usually the online equivalent of hosting a talk show on public access television at 3:30 in the morning.

So how can people escape this dire world? Both Shirky and the Googlearchy researchers seem to recognize that community scale is an important factor. As people know less and less about each other, they tend to forge connections through people they do know, and those people effectively gain the ability to serve as a nexus. On the Web, that means links. As Shirky notes:

people’s choices do affect one another. If we assume that any blog chosen by one user is more likely, by even a fractional amount, to be chosen by another user, the system changes dramatically. Alice, the first user, chooses her blogs unaffected by anyone else, but Bob has a slightly higher chance of liking Alice’s blogs than the others. When Bob is done, any blog that both he and Alice like has a higher chance of being picked by Carmen, and so on, with a small number of blogs becoming increasingly likely to be chosen in the future because they were chosen in the past.

Think of this positive feedback as a preference premium.
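The feedback loop in that passage is easy to simulate. The Python sketch below is a toy model and not Shirky’s own: the function name `simulate_choices`, its parameters, and the additive `premium` are assumptions chosen for illustration.

```python
import random
from collections import Counter


def simulate_choices(num_blogs=100, num_users=1000, picks_per_user=5,
                     premium=1.0, seed=0):
    """Each user picks distinct blogs; every pick makes a blog likelier later.

    All blogs start with equal weight. Each time a blog is chosen its weight
    grows by `premium` (a hypothetical stand-in for the "preference premium").
    Returns a Counter mapping blog index to the number of users who chose it.
    """
    rng = random.Random(seed)
    weights = [1.0] * num_blogs
    counts = Counter()
    for _ in range(num_users):
        chosen = set()
        while len(chosen) < picks_per_user:
            chosen.add(rng.choices(range(num_blogs), weights=weights)[0])
        for blog in chosen:
            counts[blog] += 1
            weights[blog] += premium
    return counts
```

Running it with `premium=0.0` gives every blog the same chance on every pick, so comparing `simulate_choices().most_common(5)` against `simulate_choices(premium=0.0).most_common(5)` is one way to see how much of the spread comes from the feedback rather than from chance.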

The Googlearchy folks do note some categories of exceptions to this pattern, and this may suggest a direction for those of us who would like to contribute despite coming late to the race:

This study suggests that communities where most sites have substantial numbers of inlinks are the exception, not the rule. The communities that have previously been studied at length - public companies, universities, newspapers - are all unusual, in that they represent groups in which there is a high degree of mutual recognition among the actors (Pennock et al. (2002)).

These are all communities with a relatively small number of members, and with a sense of equality fostered by there being only a limited number of choices.

While the enormous bazaar of the web, globalization, and all these other delightful 21st century visions of glory may sound exciting, I think it’s time to recognize that we’re losing things people took for granted when there were fewer people in a larger world. Lots of small fish in big ponds growing more and more alienated from each other and from their own possibilities seems like a perverse result for something we’ve called progress.

That said, I think there’s still room to escape this fate, though maybe it has its own downside. “Groups in which there is a high degree of mutual recognition among the actors” are generally pleasant places to be.

Small groups have different dynamics from large groups, and frankly, those dynamics are often less alienating, even when they’re hostile. Geographic proximity, tying ourselves back to a sense of place rather than joining vast and unknowable communities, has its appeal. Similar interests can also bring people together, and I have to say that I’ve found some of the most interesting conversations on sites where people discussed matters of deep concern to them - but “them” was a small cluster of people. (I think most geeks have an unusual interest or two and may recognize the phenomenon.)

I’m not sure that the Googlearchy and the celebrity culture that power laws seem to propel are necessarily that important. They’ll be there, and they’ll have influence, but I can think of lots more exciting things to do, even on the Web, than read A-list bloggers I’ve never met.

Ever find a smaller community that feels comfortable? Start one?

Simon St. Laurent

Related link: http://simonstl.com/dryden/

I suspect the world has enough blogs where people comment on national and international politics, and I know there is an ever-growing number of blogs on people’s own personal lives. Something in between those poles seems to me to be missing, though - blogs about particular places. Two months ago, partly to see if it could work, I started one focusing squarely on Dryden, New York.

Dryden contains 96 square miles and had 13,352 people at the last census. It’s in upstate New York, next to Ithaca, home of Cornell University and Ithaca College, but far from any large cities. Syracuse is an hour away, Rochester two hours, New York four and a half. It doesn’t have a daily paper (Ithaca’s paper covers it), and it’s not a place that generates a tremendous amount of news. Three or four stories a week is typical so far.

A blog about Dryden has a naturally limited audience, but at the same time, the people who are in that audience likely have a thorough knowledge of the place. They drive its roads, pay its taxes, and hear its stories. Because of Cornell, there’s a large population just passing through, but even some of those people are likely interested in figuring out where they are at the moment.

The blog I started has a definite political angle (“One Democrat’s perspective”), and I started it after an election that didn’t go the way I’d hoped, but I don’t think there’s any reason that focusing a blog locally should condemn it to being less opinionated than blogs which look out on a larger world. Local politics is tricky, though - simple platitudes about “those who deserve work will find it” or “everyone deserves to get a good start in life” are hard to sustain when you’re writing at this level. People don’t necessarily know everyone, but alliances shift, ideology is frequently less important than communications, and the flow of news is irregular at best, making it hard to pick and choose stories.

It’s been difficult staying inside the town borders, and I’ve occasionally strayed elsewhere in the county when it seemed relevant, though I’ve tried hard not to discuss issues outside of Dryden unless they had a direct impact here. “Think global, but stick to local” might well be the motto for this kind of blogging.

I don’t think I’m likely to run out of material, though I’ve certainly had to rely on photos and out-of-copyright history for a lot of stories. I’m making sure I post at least once a day, and often end up posting two or more. 120 entries in 60 days feels like a promising start.

I’m not sure this kind of blogging will catch on too widely, but it seems like an opportunity, especially in places where news is quiet, government not widely reported, and people think there isn’t much going on. There’s always something percolating.

Know of any other blogs focused on a particular place?