July 2003 Archives

Simon St. Laurent

Related link: http://tbray.org/ongoing/When/200x/2003/07/29/SearchMeta

In his series of articles on search, Tim Bray explores the value of metadata but also its cost, noting that “There is no cheap metadata.”

Metadata’s value is often taken as a given at the markup conferences I tend to attend, especially as knowledge technologies (including things like RDF and Topic Maps) have become part of the markup mix. (To me, markup is very definitely metadata itself, but there are lots of levels of meta beyond that.) The pitch goes that, given metadata, you can do all of these exciting things automatically, and costs will decline while productivity increases, and so on and so forth.

Bray’s piece explores some different kinds of metadata, contrasting the Yahoo way and the Google way, but the most interesting part to me is the question of where the metadata comes from. Yahoo’s approach involved human editors, while Google’s is developed by scraping existing information from sites to find patterns, but neither of these approaches is free. Distributing the cost of metadata by asking everyone to provide it also tends to annoy users pretty drastically.

I don’t think there are any easy answers here, but there are lots of good pieces. Making metadata work is going to take a lot more than metadata-management tools.

Brother, can you spare some metadata?

Simon St. Laurent

Related link: http://lists.w3.org/Archives/Public/www-tag/2003Jul/0353.html

Tim Berners-Lee, the Director of the World Wide Web Consortium (W3C), suggests that “the web is no good unless it can be a sound foundation for the semantic web and web services too.” This seems horribly wrong.

Once, long ago, there was an understanding of how the Web was different from the Internet. Web designers wasted hours on lists educating people about the differences, noting that the Web was in fact a subset of the Internet, the subset you saw in a Web browser. The IETF and W3C seemed to partition their work along these lines as well.

Over time, this distinction has become pretty badly blurred. Mail clients grew into the browser as a supposed competitive advantage, while Web protocols found themselves reused for a variety of tunneling applications between computers. SOAP, XML-RPC, and related approaches all build on that strand of development. Warnings about problems with these approaches, even when applied directly, have largely been ignored.

Over the last few years, the W3C has also done its best to blur the distinction. While development of the “traditional Web”, notably in the HTML field, has slowed both at the consortium and at some key software organizations, SOAP-based “Web Services” have been the commercial rage while visions of a “Semantic Web” have charmed a smaller but thoroughly devoted group of developers.

While Web Services and the Semantic Web are popular in some quarters, their relationship to what I’ll call the Traditional Web is pretty distant. SOAP reuses HTTP in ways that are barely familiar to those who’ve worked with the protocol in Traditional Web contexts, while the Semantic Web reuses URLs (now rechristened URIs and topped with a strange dose of philosophy about identifiers) in ways which are barely familiar to developers in other contexts. Both Web Services and the Semantic Web cite the success of the Traditional Web as demonstration that their approaches work, but neither has the patience to work within the confines of the system they claim supports their work.

Berners-Lee’s latest claim seems to forget that a huge group of people finds the Traditional Web more than adequate for their needs. Some of those folks even do Web Services-like things with REST-based approaches that more closely follow the patterns laid down by the Traditional Web. Some of them use XML (and even at times RDF) to exchange semantically-rich information between computers without needing the full power of the Semantic Web. The Traditional Web may be no good to the W3C’s director or its members any longer, but it’s still good for a lot of us.

(If that’s not enough for you, the Internet’s still wide-open for possibilities beyond the Web, of course!)

Is “the web… no good unless it can be a sound foundation for the semantic web and web services too”?

Simon St. Laurent

Related link: http://simonstl.com/asdf/

I’d like to pause for a moment in tribute to a key innovation which made the Web possible, the 404 Not Found error.

Prior to the Web, there had been all kinds of adventures in hypertext, from Ted Nelson’s transclusions to path-based approaches to scripting to HyTime to my own trivial keyword-based work in HyperCard. While hypertext was really cool and really great stuff, it wasn’t going anywhere, in large part because the systems for building it were pretty onerous and frequently enormous. The desire (and sometimes obsession) to control the links made for some really thorny problems that grew rapidly with the number of documents and links.

The Web dropped all that pretense. Links were one-way, they weren’t centralized, and best of all, they could break. Make a bad link? The worst thing you’d get was an error message. Sure, users get frustrated by these things, but over time they’ve largely learned to adapt (or use archives if “it used to be there”).

By doing less, the Web did enormously more.

Anyone else appreciate the 404?

Bob DuCharme

In a May weblog posting, Tim Langemann claimed that I said that “hypertext has not advanced.” Like Langemann, I’ve described potential improvements I can picture in the future; I also enjoy researching and publicizing cool linking applications from the past, but I never said that hypertext has not advanced.

One might read this into a discussion of past hypertext achievements, though, because similar discussions often harp on how the modern web falls short of the systems developed back in the day. A typical complainer (and Ted Nelson is not alone here) is still waiting for the web to catch up to his original vision, unaware that while that vision contributed to the web’s progress, it was only The Vision for a small group of people, and other visions and new ideas were bound to contribute as well. The most interesting new things often come out of nowhere, instead of being the implementation of a detail of a grand vision. Who could have predicted the effect of webcrawler-aggregated content? Of Google’s PageRank system? Of wikis? Of XML (well, who besides the SGML folk)? Or of XML-based standards such as XSLT and SVG? Of weblogging, and the capabilities contributed to it by trackback, IRC, and for that matter, relational databases?

When I discuss something that might improve the web, I promise never to get bitter if it doesn’t catch on. For example, it looks like there’s no critical mass of people who believe that link typing adds enough to the web to be worth the trouble. There are still plenty of other new ideas being tried, and plenty of impractical old ideas that Moore’s law may yet render practical. If a new idea improves the web experience for enough people, they’ll adopt it, and if not, they won’t.

It’s nice to see that even Microsoft marketing muscle can’t force the adoption of a perceived web need. Tell people what they want or need and, as with any other report of new technology, they’ll try it if it’s not too expensive and then either continue using it or move on. Instead of wringing hands and pointing fingers when they move on, I’d rather keep on the lookout for new reports.

What signs do you see of positive evolution of the web, hypertext, or linking?

Edd Dumbill

Related link: http://oscon2003.xmlhack.com/

I’ll shortly be heading off to Portland for the O’Reilly Open Source
Convention (http://conferences.oreilly.com/oscon). As an aid to myself and other attendees I’ve set up, together
with Dave Beckett, a publicly logged and “chumped” IRC channel on
irc.freenode.net, channel #oscon.

The chump bot produces a collaborative
weblog from IRC chat. We used one of these for the WWW2003 community coverage site (http://www2003.xmlhack.com/) to great
effect: participants could share URLs and comments relevant to the talk in
real time as with any chat, but the results are preserved in a web site.

It’s my hope that anyone either finding or creating web pages relevant
to OSCON goings-on will drop into the channel and add their URLs to the
list. The site’s also available for syndication via RSS.

I’m sure there will be many ways people will use web technology to cover the
conference, and there’s probably not going to be “one true” IRC channel. I
offer this channel and service in the knowledge it’ll be handy for me
personally, and the more people who want to contribute, the better!

Bob DuCharme

In an earlier weblog entry, I bemoaned the lack of link typing out there. There are several link type taxonomies, but they’re like database schemas without databases: hardly anyone has actually put these taxonomies into practice, assigning the types to any realistic collection of links.

So I started assigning some link types to a bunch of links. Because weblogs, as a new class of content, have inspired new linking applications such as Technorati and Weblog Bookwatch, and because weblog entries often cite each other, they seemed like a good subset of the web to use. For an even more focused subset, I just went with O’Reilly Developer weblogs.

Assigning types to the links within my own weblog entries was easy; that’s what the HTML A element’s REL attribute is for. See the source of this weblog posting or my last few for examples.
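
As a rough illustration of how software might read such types back out of a page, here is a minimal Python sketch that collects the REL values of each A element. The sample markup, the example.org URLs, and the names RelLinkCollector and typed_links are assumptions made up for this example, and writing a type such as blt:Resource-good directly into REL is likewise only an assumed convention.

```python
from html.parser import HTMLParser


class RelLinkCollector(HTMLParser):
    """Collect (href, rel values) pairs from the A elements of an HTML page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names, so <A REL=...> works too.
        if tag != "a":
            return
        attributes = dict(attrs)
        href = attributes.get("href")
        if href is None:
            return  # a named anchor, not a link
        # REL holds a space-separated list of link types; it may be absent.
        rel_values = (attributes.get("rel") or "").split()
        self.links.append((href, rel_values))


def typed_links(html_text):
    """Return every link in html_text together with its list of link types."""
    collector = RelLinkCollector()
    collector.feed(html_text)
    collector.close()
    return collector.links


# Hypothetical markup: one typed link and one untyped link.
SAMPLE = (
    '<p>See <a href="http://example.org/review" rel="blt:Resource-good">'
    'this review</a> and <a href="http://example.org/other">this page</a>.</p>'
)

for href, types in typed_links(SAMPLE):
    print(href, types or "(untyped)")
```

Links without a REL attribute come back with an empty list, which makes it easy to see how much of a page is still untyped.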

To assign types to links created by other O’Reilly Developer webloggers, RDF was a no-brainer: if you can’t use it to add metadata to resources that can be identified by URLs, you can’t use it for anything. (And the necessary RDF turned out to be remarkably simple and straightforward.) For the link type values, I used the link taxonomy from Randall Trigg’s 1983 PhD thesis, augmented with more types that I came up with myself to suit the world of weblogging: Blog Link Types, or BLT. I also threw in a few of the suggested values for the REL attribute.

Technorati.com and Weblog Bookwatch are only the beginning of potential new linking applications built around weblog content. Just imagine the possibilities if a large number of the links in weblogs had link type indicators. When you know why links were created, and can look for patterns in those motivations, all kinds of interesting information can emerge. For example, according to Weblog Bookwatch, Ann Coulter’s “Treason” is the most commonly mentioned book after the new Harry Potter. We can assume that many people admire her book and others consider it badly-documented lies; wouldn’t it be nice to know exactly how many people liked her book (indicated with a link type value of “blt:Resource-good”), how many thought her arguments were simplistic (“tt:Pt-simplistic”), based on strawman arguments (“tt:A-strawman”), based on dubious data (“tt:D-dubious”), and so forth? (I use “tt” as a namespace prefix for Trigg’s types; he has quite a rich set of negative link types to choose from.) Wouldn’t it be great to see how those numbers change from week to week? Or to have an SVG-generated pie graph next to each book on Bookwatch showing the relative proportions of types assigned to all of the links to a given book?
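
The counting behind such a report could be quite small. The Python sketch below is only an illustration under stated assumptions: the records, the example.org URLs, and the function names are invented for the example, and each record is assumed to be a (source page, link type, target) triple of the kind out-of-line link typing would supply.

```python
from collections import Counter, defaultdict

# Hypothetical typed-link records: (source page, link type, target).
TYPED_LINKS = [
    ("http://example.org/blog/a", "blt:Resource-good", "http://example.org/books/some-book"),
    ("http://example.org/blog/b", "blt:Resource-good", "http://example.org/books/some-book"),
    ("http://example.org/blog/c", "tt:A-strawman", "http://example.org/books/some-book"),
    ("http://example.org/blog/d", "tt:D-dubious", "http://example.org/books/some-book"),
]


def type_counts_by_target(records):
    """Tally, for each link target, how often each link type was assigned."""
    counts = defaultdict(Counter)
    for _source, link_type, target in records:
        counts[target][link_type] += 1
    return counts


def proportions(counter):
    """Turn raw counts into the shares a pie graph would display."""
    total = sum(counter.values())
    return {link_type: n / total for link_type, n in counter.items()}


counts = type_counts_by_target(TYPED_LINKS)
for target, counter in counts.items():
    print(target)
    for link_type, share in proportions(counter).items():
        print(f"  {link_type}: {counter[link_type]} link(s), {share:.0%}")
```

Running the same tally over each week’s records would give the week-to-week changes, and the proportions are the numbers a pie graph would need.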

So join me! Go to my Blog Link Types (blt) home page to learn more about what I’ve done so far. Then, add types to your own links. Add types to any links on the web that you want (to add out-of-line link typing entries to my RDF file, I have a form you can fill out), but particularly to weblog links, and especially O’Reilly Developer weblogs. Let me know the URLs of pages where you’ve added REL attributes, or the URLs of RDF files if you’ve created new files of out-of-line link typing.

If enough are added to O’Reilly Developer weblogs, we’ll have the data to experiment with an interesting new class of linking applications.

Have you added types to any links following the Blog Link Types guidelines? Or is my whole idea a waste of time?