April 2003 Archives

Simon St. Laurent

Marshall McLuhan published Understanding Media in 1964, back when the foundations of most of our current information processing systems were still developing. While McLuhan’s discussion of media per se may not seem relevant to the dull work of information management (as opposed to the glossy hype of Wired, which adopted him as patron saint), some of McLuhan’s fundamental insights apply as strongly to relational databases and XML as they do to television or the Web.

In Understanding Media, McLuhan first explored a general theory of media and then explored speech, writing, print, the telegraph, the photograph, the phonograph, and much more, including things like roads, housing, and automation. Perhaps the best summary of what he was getting at comes from the Introduction to the Second Edition:

Environments are not passive wrappings but active processes.

In computing, we generally regard environments as active processes we use to get things done, but we don’t often look at the impact that the environment has on the way we think. People recognize, for instance, that there are real transition costs in moving from Perl to Java to Lisp, but people who remain immersed in one environment don’t always recognize how the lessons they learn in that environment may be completely inappropriate in other environments. The paths, the priorities, and the best practices emerge from a combination of design and discovery that is particular to each language.

In my experience, these differences are even less recognized on the data side of computing, which most developers seem to regard as passive storage. Typically, convenience rules the choices here, with developers either working from legacy data or building systems using tools they already understand or have paid for. “It’s all just data” is a fairly common expression, and a lot of developers see the code they write rather than the form of the data as the important part of the puzzle.

Developers who look at information this way make the same mistake made by people who think newspapers and television both deliver news, so what does it matter? To some degree, you can get the same information from different media sources, but no one expects television to be a reading of newspaper stories or the newspaper to be a transcript of the nightly news on TV. Both are containers for information, but the shape of the container inevitably affects the way the information is both produced and consumed. Sophisticated consumers and media moguls both typically understand this. The consumers try to get information from multiple sources and compare different media, while moguls have spent the last decade building business empires that span different media to reach different customers with similar (advertising and more) messages.

While the developer’s view of information politics is usually more local and better understood than mass media politics and the FCC, the differences between media persist. Relational databases are all about linked tables and structured atomic data and the possibilities that opens. Object stores and serializations are generally about flexible hierarchies, with relatively direct linkages to particular processing expectations. XML is about labeled hierarchically-structured containers, with a general separation between content and processing expectations. (I’m using XML here generically for both XML documents and collections of XML documents.) RDF is about directed graphs, keeping away from specific processing expectations regarding their content but with a well-defined general model for manipulating the graphs. Plain text, of course, offers a sequence which may or may not contain identifiable repetitive structures.

Perhaps the most important thing to recognize about all of these forms is that they are different. There are, of course, cases where relationally-modeled information can be represented as objects, XML or RDF, and there are cases where object stores or RDF triple stores use relational databases as back-ends, but these all involve tinkering and compromises. There is no general way for an XML document to serve as an efficient foundation for relational queries, nor is RDF much good at modeling XML’s mixed content. While it may be convenient in some cases to serialize objects to XML, it requires lots of metadata if the object needs to be reconstituted in the same form, and the XML produced by serializations often looks alien to people who actually care to work with XML itself.

At the same time, these different approaches do particular tasks very well. The relational model allows the efficient processing of vast quantities of information stored in unordered rows in related tables. Object stores let developers put objects somewhere without having to spend time creating pathways between their existing model and a different model. XML comes from the document world, and most of its functionality is aimed at creating labeled structured content that both humans and computers can process. RDF is about assertions and how they combine to make statements, and while humans frequently have a hard time making sense of URI chains, some programmers find they solve classification and other problems easily.

XML currently carries the unfortunate burden of being the medium the other forms think they understand. Object serializations in particular produce an enormous amount of lousy markup. Technically, it’s XML, but its creators plainly cared about their program and not much about XML, or how anyone else might want to process the XML they create. Relational database folks have faced the same problem for years, as developers find all kinds of strange structures in databases that reflect the needs of a particular program rather than a vaguely sane normalization of data according to relational best practices. (At least relational databases and XML share a notion of named containers for data, though how they work with those containers is very different!) RDF creates similar problems for XML, as lately there’s been a flurry of proposals for ‘fixing’ XML with RDF tools and structures. RDF’s own XML syntax isn’t widely beloved either, but as long as you never look at the XML…

I don’t think I invented it, but I’ve long described a difference between ‘square’ and ‘groovy’ data. ‘Square’ data is easily atomized and tabulated, a perfect fit for relational models. ‘Groovy’ data is information that doesn’t neatly fit in a box, typically information created by and for human users directly. XML fits that kind of data very well, with its tolerance for arbitrarily recursive structures, reuse of content through inclusion, and structures flexible enough to mix raw text nodes with containers. RDF feels like ‘puzzle’ data to me, interlocking pieces which form larger pictures when assembled. Object stores are kind of a combination of all three of these, with the strong demands for structure common to relational work, the hierarchy of XML (though with multiple and different kinds of hierarchy), and a massive dose of RDF’s interconnectedness. I don’t quite know what to call that, as it both subsets and supersets the other categories.
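
To make the three shapes concrete, here is a minimal Python sketch using only the standard library. Everything in it is an illustrative assumption rather than anything from a real system: the `book` table, the `<em>` markup added to the McLuhan quote, the `book:`/`person:` identifiers, and the `objects` helper. RDF is modeled as plain tuples, not with an actual RDF toolkit.

```python
import sqlite3
import xml.etree.ElementTree as ET

# 'Square' data: atomic values in rows, queried with SQL.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE book (title TEXT, year INTEGER)")
db.execute("INSERT INTO book VALUES ('Understanding Media', 1964)")
years = [row[0] for row in db.execute("SELECT year FROM book")]

# 'Groovy' data: mixed content, where text and child elements interleave.
# The <em> element is hypothetical markup added for illustration.
para = ET.fromstring(
    "<p>Environments are not <em>passive wrappings</em> but active processes.</p>"
)
# ElementTree keeps the text before the child in .text and the text
# after it in the child's .tail; neither has an obvious table column.
pieces = [para.text, para[0].text, para[0].tail]

# 'Puzzle' data: RDF-style (subject, predicate, object) assertions,
# modeled here as plain tuples with made-up identifiers.
triples = {
    ("book:UnderstandingMedia", "dc:creator", "person:McLuhan"),
    ("person:McLuhan", "foaf:name", "Marshall McLuhan"),
}

def objects(subject, predicate):
    """Return every object asserted for a subject/predicate pair."""
    return {o for s, p, o in triples if s == subject and p == predicate}

# Two small assertions interlock to answer "who wrote the book?"
creator = objects("book:UnderstandingMedia", "dc:creator").pop()
name = objects(creator, "foaf:name").pop()
```

The row answers a tabular question directly, the paragraph only makes sense with its text order preserved, and the triples answer a question by chaining assertions. None of the three maps onto the others without some loss or extra bookkeeping.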

I think it’s time for developers to take a closer look at how they’re storing data, and what that means for the data and for other developers. We seem to have moved into an age where modeling information too tightly against a particular set of processing expectations incurs significant costs, and it’s time to start thinking about what media fits our information best rather than what we want to do with the information today.

Diversity has its costs, but recognizing the positive contributions these different media can make should lower costs over the long term. Use relational databases where they’re appropriate, XML where it’s appropriate, RDF where it’s appropriate, and strictly object approaches where they are appropriate. There’s no need to put everything in a single model, despite the claims of both one-model purists and vendors trying to solve everyone’s problems. It may take some learning and some looking around at different models, but the upfront costs should avoid painful legacy disasters later.

Why can’t data all be the same?

Simon St. Laurent

Related link: http://www.taunton.com/finewoodworking/pages/w00138.asp

Wayne Miller, of the now-closed Badger Pond forum, explains how an experimental bulletin board grew into a community.

Fine Woodworking magazine isn’t a typical stop in techie bookmark lists, but today there’s an interview that explores the process of building communities online. Badger Pond started out as an experiment, remained controversial to some extent throughout its existence, and finally shut down when its moderator moved on, but during that time it managed to build a genuine community of woodworkers.

Probably the most interesting, though most difficult, aspect of Badger Pond was its intensive moderation:

My belief was that a community could only evolve if the traditional subjects that tend to polarize communities (and countries for that matter) were not allowed to be discussed. It seems to me that you can bring a community (or country) together in one of two ways: create a common enemy to hate, or stress a shared interest. I chose the latter. One of the kindest characterizations of my role as moderator was that I was a benign dictator. I agree with that assessment. That was my function.

For me, Badger Pond was both a place to learn and a place to enjoy. Personal attacks weren’t permitted and (despite after-the-fact moderation) rarely happened. Wayne kept politics and religion off the radar, though both were plainly and largely politely in evidence at the “Ponder Picnics” I attended.

The latest picnic, a few weeks ago in Anderson, IN, brought together about 75 people from all over the country. I drove 13 hours with two other people from upstate NY, while others drove from as far as Oklahoma and Arkansas. A couple of people even flew in, from Washington state and Florida.

Not a bad result for an experiment in community building, especially one that’s now over.

Can benign dictators help communities grow?

Timothy Appnel

We’re into the third and final day at the Emerging Technology Conference, but here is a quick recap of some of the happenings yesterday.

Alan Kay started with a fascinating history lesson on computing in which he asserted that the last 20 years of computing have been boring. Kevin Lynch took the stage and demonstrated Macromedia's Central product, a stand-alone runtime and application repository. Most impressive was the ability to send information (even broadcast it) between applications using XML or RDF, even if those applications were developed separately. The last keynote of the day was given by the sagely Clay Shirky. (All hail Clay!) Clay took us through a fascinating overview of social software and behavior, drawing on psychology from the past century and accounts of early social software systems. He concluded that a group needs to defend itself from itself. The users are there for one another. The design pattern: build for a handful of users, design for a platform.

Today's keynotes have started off with Felipe Cabrera on the future of Web services, a sometimes contentious but lighthearted and often entertaining discussion of complexity versus simplicity and interoperability. Craig Silverstein is presenting the Google way: how they operate and, more importantly, how they communicate to be as successful as they have been. Later Eric Drexler will discuss Nanotechnology: Bringing Digital Control to Matter.

On a personal note, while I'm running on adrenalin, I'm a bit sad that it's almost over, until next year. It has been really fun, though at times a bit overwhelming and perhaps surreal, to have finally met dozens of people that I've come to know over the past year or more without having spoken or met in person. (A phenomenon for Clay to study?) We need more meatspace meetings like these. Certainly I will be counting the days till next year.

I'll be continuing to do more of my on-the-scene weblogging, as are others.

Timothy Appnel

Exhilarated. Frantic. Enlightened. Engaged. Over-caffeinated. Spent. I have to sheepishly admit to being a kid in a candy store here at the ETech conference. And, as Ben Hammersley put it, it's iBooks as far as the eye can see. (I'm carrying a PowerBook and so is Ben, so I don't know what he's talking about.)

I'm doing on-the-scene weblogging of the proceedings here. Many others are too. Today was an exciting day which included keynotes from Howard Rheingold and Eric Bonabeau and a panel on Digital Rights Management which included Cory Doctorow, Dan Gillmor, Bunnie Huang, Joe Kraus and Wendy Seltzer. There were many other interesting conference sessions and many water cooler (literally) conversations in the halls. Capping off the day was a book signing event with an all-star cast of luminaries on hand. Great fun.

Tomorrow promises to be equally if not more exciting with keynotes from Alan Kay, Kevin Lynch and Clay Shirky. Stay tuned.

Bob DuCharme

Link typing is the assignment of a link to a particular category in order to give a human or automated reader a clue about the implications of traversing that particular link. It would be more accurately termed “link categorization,” because it has very little to do with the computer science notion of data typing: the assignment of a type to data to identify the set of operations that can be performed on that data (integers can be added, subtracted, multiplied, divided; strings can’t, but can be concatenated, have substrings extracted, etc.).

Link typing clearly adds value to a link, and anyone discussing it agrees that it’s a Good Thing. Several link type taxonomies have been proposed, but no one I know of actually uses them for anything. In fact, none of the taxonomies I know of have improved on the one described twenty years ago by Randall Trigg in chapter four of his University of Maryland Ph.D. dissertation.

I have my own ideas about problems with various link taxonomies out there, such as the values proposed for the XHTML 2.0 a element’s rel attribute, but for now I’d like to know if anyone else has done anything besides proposing new taxonomies since Trigg’s thesis. Do you know of any collection of links that actually have link types assigned to them? Even HTML with rel attributes on the a elements? (My limited research into that was not encouraging.) Has a generalized link taxonomy ever been proven useful, or are more application-specific ones such as the History and Treatment labels used in court case citations the only practical applications? Do you know of any specialized ones besides court case citations?
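
For anyone who wants to check their own pages, a survey of rel usage takes only a few lines. This is a rough Python sketch, not a description of the research mentioned above: the `RelSurvey` class name and the sample markup are made up for illustration, and it relies only on the standard library's `html.parser`.

```python
from collections import Counter
from html.parser import HTMLParser

class RelSurvey(HTMLParser):
    """Count how often each rel value appears on a elements."""

    def __init__(self):
        super().__init__()
        self.rel_counts = Counter()
        self.untyped = 0  # a elements with no rel attribute at all

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        rel = dict(attrs).get("rel")
        if rel:
            # rel holds a space-separated list of link types.
            self.rel_counts.update(rel.lower().split())
        else:
            self.untyped += 1

# Hypothetical sample markup; substitute any page you want to survey.
sample = """
<p><a href="ch2.html" rel="next">Next</a>
<a href="toc.html" rel="contents index">Contents</a>
<a href="http://example.org/">An untyped link</a></p>
"""

survey = RelSurvey()
survey.feed(sample)
```

After the feed, `survey.rel_counts` holds a tally per link type and `survey.untyped` counts the links that carry no type at all, which is the ratio the question above is really asking about.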

Please post a comment here or e-mail me at bob@snee.com to let me know. And meanwhile, check out Trigg’s taxonomy.

What kind of link typing applications do you know of?

Bob DuCharme

(An early draft of this was originally posted to the xml-hypertext list in response to Dave Pawson’s question “What is the difference between hypertext and rss?”, but this is more of a bully pulpit so I’m putting a revision of it here.)

Hypertext is a way to present links, and RSS is a way to mark up a class of links for a particular domain.

People can argue about the definition of “hypertext,” but XLink’s definition of “hyperlink” works for me: “a link that is intended primarily for presentation to a human user.” The word “presentation” is key; I think we can also assume that an interactive, electronic medium is what puts the “hyper” in “hyperlink.” A printed footnote reference is a link, not a hyperlink.

RSS is a way to describe links for a particular domain: relationships between a resource (or subresource–in this case, a specific element storing the title of a story) and another resource (the web page storing the complete story) in the world of news. Everyone’s free to call anything they want “news” if it can fit into the RSS structure, so this has been a boon to weblogs.

If we distinguish between link presentation systems (for example, hypertext, endnotes, and sidebars) and classes of links specialized for particular domains (such as RSS, legal case citations, and bibliographic references) we see a many-to-many relationship, which is a very good thing. People can mix and match link presentation styles with link domain classes. Hardcoding presentation styles to link domains loses this flexibility. If I ship all header-story links, case citations, and bibliographic references marked up <a href="http://whatever">like this</a>, I lose that flexibility–what if I want to represent bibliographic reference links differently from case citation links?

RSS links aren’t inherently hypertext links; that’s only one justifiably popular way to present them to the user. Thinking of them as “links” and not just “hypertext links” opens up the possibilities of what you can do with them. For example, you could write a stylesheet that reads an RSS file and writes each header, description, and linked story together in an XSL-FO file for printing as a hot sheet report. The “link” isn’t actively traversed by the reader, but the link’s existence still benefits the reader, because a production step traverses the link for the reader. Maybe I sound like an old SGML hack, but I still believe that separating presentation information from content increases the value of the content because it lets you do more with that content–even when the “content” is data about link relationships.
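
The same separation can be sketched in a few lines of Python instead of a stylesheet. This is only an illustration under stated assumptions: the RSS fragment, its titles and URLs, and the function names `read_links`, `as_hypertext`, and `as_endnotes` are all invented here, and the sketch stops short of fetching the linked stories the way a real hot sheet production step would.

```python
import xml.etree.ElementTree as ET
from html import escape

# Hypothetical RSS 2.0 fragment; the titles and URLs are invented.
RSS = """<rss version="2.0"><channel><title>Example News</title>
<item><title>First story</title><link>http://example.org/1</link>
<description>Summary one.</description></item>
<item><title>Second story</title><link>http://example.org/2</link>
<description>Summary two.</description></item>
</channel></rss>"""

def read_links(rss_text):
    """Extract (title, url, description) tuples: the link data itself."""
    root = ET.fromstring(rss_text)
    return [
        (item.findtext("title"), item.findtext("link"), item.findtext("description"))
        for item in root.iter("item")
    ]

def as_hypertext(links):
    """One presentation: HTML anchors for a reader to traverse."""
    return "\n".join(
        f'<a href="{escape(url)}">{escape(title)}</a>' for title, url, _ in links
    )

def as_endnotes(links):
    """Another presentation: numbered references for a printed report."""
    body = [f"{title} [{n}]: {desc}" for n, (title, _, desc) in enumerate(links, 1)]
    notes = [f"[{n}] {url}" for n, (_, url, _) in enumerate(links, 1)]
    return "\n".join(body + notes)

links = read_links(RSS)
```

Both `as_hypertext(links)` and `as_endnotes(links)` work from the same list of link data, so adding a third presentation (a sidebar, say) means writing one more function and leaving the RSS alone.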

What do you think?

Bob DuCharme

I’ll be using this weblog to discuss issues about linking that I’ve been researching. What do I mean by “linking”? I’ll start with two definitions of linking that I like:

  • The Dexter Model: “Links are entities that represent relations between other components.” (“Components correspond to what is typically thought of as nodes in a hypertext network.”)
  • XLink: “An XLink link is an explicit relationship between resources or portions of resources.” (“As discussed in IETF RFC 2396, a resource is any addressable unit of information or service.”)

For my own purposes, I’ll generalize from these two definitions a bit to this: the identification of related information using an addressing system that lets the reader locate the information—for example, Genesis 1:20, Hamlet III i, 163 U.S. 537, Mencius 5B 1:6, or http://www.weather.com/outlook/travel/weather/tenday/LAX.

You may have a different definition, particularly if you’re talking about sausages, metal chains, the Mod Squad, secret chimps or even hypertext. Hypertext is an application of links that makes certain assumptions about their role and especially user interface, and it’s what got a lot of people thinking about linking in the first place, but I plan to think in more general terms. In fact, untangling the relationship between hypertext and linking will be a recurring theme here.

Take a look at some of my linking-related work to get an idea of how I got where I am.

What do you think?

David A. Chappell

Related link: http://www.sys-con.com/webservices/articlenews.cfm?id=530

How many Reliable SOAP specs do we need? I have read them all, and I have come to the conclusion that they all seem different on the surface, but they are all pretty much the same underneath. Unfortunately, it’s the poor users who suffer from this fracture in the standards wars.

I have posted an objective, detailed article on the WSJ site.

I stress the word “objective” because I happen to be one of the authors of WS-Reliability, but I also think there are some pretty cool things in WS-ReliableMessaging.

Dave

Your comments are always welcome

When designing an architectural style from scratch, one always begins with an implicit style that Roy Fielding termed the null style. While it is certainly possible to build software systems that conform to this style, it has the ignominious distinction that every system built with only it in mind will literally be “good for nothing”; that is, such a system will possess no properties that would make it a good form of solution for any particular problem. It hopefully goes without saying that no amount of hype will save a null-style software system.

A few weeks ago, I asked my weblog readers (note: my primary weblog is still down as I change ISPs; stay tuned) whether they felt that it was possible to make a mistake when designing a software system that would guarantee its failure. I received two responses, both of which stated that no, it wasn’t possible to make such a mistake. I’ll describe here why I believe that to be incorrect.

Sometimes, an architectural property is an absolute necessity for a system to possess in large quantities for the environment in which it is to be used. As a rather trivial example, it should probably not come as a surprise to learn that a high degree of scalability is required of systems that operate on a large scale, for example with billions of components distributed across a network. Should a design mistake be made, either by neglecting to adopt a constraint which induces large degrees of scalability (and for which there exists no other constraint which can induce a similar degree of it), or by selecting a constraint which severely reduces scalability, systems conforming to such an architectural style will fail to scale, and therefore, as objectively as one can determine, be a failure. Again, no amount of hype will prevent this from happening.

As a more concrete example, let’s look at the visibility property. It is a property which all successful systems deployed on the Internet that I’m familiar with exhibit in spades. The bulk of the visibility exhibited by each of these systems derives from their use of a coordination language, where each new software component added to the system uses the same coordination language as every other component. This permits intermediary components, such as firewalls, to understand the interactions as well as the sender and server components do, and explains why those firewalls are able to do their job. Web services are not built with a coordination language, and their visibility is greatly reduced because of it, removing the ability of firewalls to do their job. Once again, no amount of hype will replace this lost visibility.
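
A toy sketch may help show what that lost visibility looks like to an intermediary. It is an illustration under assumptions of my own, not a model of any real firewall: it treats HTTP-style methods as the shared coordination language, and the `Request` class, the `firewall_allows` rule, the paths, and the XML body are all hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Request:
    method: str   # e.g. "GET", "POST", "DELETE"
    path: str
    body: str = ""

# With a shared coordination language, the method alone states the intent,
# so an intermediary can apply policy without application knowledge.
READ_ONLY = {"GET", "HEAD"}

def firewall_allows(request, read_only_paths):
    """Permit anything outside protected paths; inside them, only safe methods."""
    protected = any(request.path.startswith(p) for p in read_only_paths)
    return (not protected) or request.method in READ_ONLY

visible_read = Request("GET", "/accounts/42")
visible_delete = Request("DELETE", "/accounts/42")

# A tunneled call: the real operation is buried in an application-specific
# body, so the same rule sees only a POST to a generic endpoint.
tunneled_delete = Request("POST", "/services", body="<deleteAccount id='42'/>")
```

With `/accounts` marked read-only, the rule passes `visible_read` and blocks `visible_delete`, but it also passes `tunneled_delete`, because nothing the intermediary understands says that an account is being deleted. To catch it, the firewall would need to parse each application's own message vocabulary.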

So while hype is almost always a positive force that can aid in accelerating adoption, it is not, unfortunately, an architectural constraint, and therefore cannot induce any useful architectural properties. It is not a sufficient condition for success.

Hopefully this explains the two main reasons why I’ve taken such a strong position against the current approach to Web services. First, bad design cannot be masked. Second, Web services’ lack of a coordination language is an example of bad design.

What say ye?

Micah Dubinko

Related link: http://news.com.com/2100-1012-995045.html

According to News.com, Microsoft is releasing InfoPath only with Office 2003 Professional Enterprise version–which is only available on the volume licensing program.

In a recent survey, 72% of Microsoft customers declined to join the volume licensing program. As a result, only the largest, most heavily Microsoft-invested companies will be able to run the InfoPath “thick client” on their desktops.

Contrast this with XForms, of which eWeek just said:

there’s a natural synergy between the equally interoperable HTML and XML that InfoPath misses. Those looking for another approach should point their browsers at XForms.

XForms also has multiple free implementations, on devices as small as phones and as large as servers.

For the time being, rather than “embracing and extending”, Microsoft appears to be deliberately distancing themselves from XForms, which seems to be the most comfortable position for both sides.

Share your thoughts on InfoPath and XForms.

Timothy Appnel

ECMA announced (with an online PDF?!?) that it is completing what it calls E4X (ECMAScript for XML). The goal of this extension is to standardize the syntax and semantics of a general-purpose, cross-platform, vendor-neutral set of programming language extensions adding native XML support in ECMAScript. John Schneider's article on BEA's dev2dev site illustrates the concept and value of native XML scripting in detail.

This is really exciting stuff that should contribute significantly to the development of more lightweight and fluid Internet applications that can take full advantage of the emerging spectrum of Web services in a rapid and efficient manner.

This development reflects my thinking in how Sun should (but probably won't) simplify Java development and why Flash/SWF is on track to achieve a great deal of success in developing Internet applications. Insights from Adam Bosworth, Jon Udell and Ward Cunningham have reinforced and contributed to my views further.

Incidentally, Adam Bosworth was instrumental in driving the E4X effort. It is no surprise that BEA is the first to implement it in a product, given that Bosworth is their Vice President of Engineering and public face for technologists.

What do you think of native XML scripting in ECMAScript?
