May 2003 Archives

Bob DuCharme


Works about linking often claim that it’s been around for thousands of years, and then they give examples that are no more than a few centuries old. I can only find one reference to something more than a thousand years old that qualifies as a link: Peter Stein’s 1966 work “Regulae Iuris: from Juristic Rules to Legal Maxims” describes some late fifth-century lecture notes on a commentary by the legal scholar Ulpian. The notes mention that confirmation of a particular point can be found in the Regulae (“Rules”) of the third-century Roman jurist (and student of Ulpian) Modestinus, “seventeen regulae from the end, in the regula beginning ‘Dotis’…”. The citation’s explicit identification of the point in the cited work where the material could be found makes it the earliest link that I know of.

Other than Stein’s tantalizing example, all of my research points to the 12th century as the beginning of linking. In a 1938 work on the medieval scholars of Bologna, Italy, who studied what remained of ancient Roman law, Hermann Kantorowicz wrote that in “the eleventh century…titles of law books are cited without indicating the passage, books of the Code are numbered, and the name of the law book is considered a sufficient reference.” He uses this to build his argument that a particular work described in his essay is from the eleventh century and not the twelfth, as other scholars had argued. Apparently, it was common knowledge in Kantorowicz’s field that twelfth-century Bolognese scholars would reference a written law using the name of the law book, the rubric heading, and the first few words of the law itself. (Referencing of particular chapters and sections by their first few words was common at the time; the use of chapter, section, and page numbers didn’t begin until the following century.)

Italian legal scholars trying to organize and make sense of the massive amounts of accumulated Roman law contributed a great deal to the mechanics of the cross-referencing that provides many of the earliest examples of linking. The medievalist husband and wife team Richard and Mary Rouse also found some in their research into evolving scholarship techniques in the great universities of England and France (that is, Oxford, Cambridge, and the Sorbonne) and they described Gilbert of Poitiers’s innovative twelfth-century mechanism for addressing specific parts of his work on the psalms: he added a selection of Greek letters and other symbols down the side of each page to identify concepts such as the Penitential Psalms or the Passion and Resurrection. If you found the symbol for the Passion and Resurrection in the margin of Psalm 2 with a little 8 next to it (actually, a little “viii”; they weren’t using Arabic numerals quite yet), it would tell you that the next discussion of this concept appeared in Psalm 8. Once you found the same symbol on one of the eighth psalm’s pages, you might find a little “xii” with it to show that the next discussion of the same concept was in Psalm 12. This addressing system made it possible for someone preparing a sermon on the Passion and Resurrection to easily find the relevant material in the Psalms. (In fact, the need for aids to sermon preparation was one of the main forces in the development of new research tools, as clergymen were encouraged to go out and compete with the burgeoning heretic movements for the hearts and minds of the people.)
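
Gilbert’s marginal symbols behave much like what programmers now call a linked list: each occurrence of a concept stores only the address of the next one. Here is a minimal Python sketch of that idea. The only data taken from the description above are the jumps from Psalm 2 to Psalm 8 and from Psalm 8 to Psalm 12; the concept label, the dictionary layout, and the function name are invented for illustration.

```python
# Each marginal note records where the same concept is discussed next.
# Only the 2 -> 8 -> 12 chain comes from the description above; the
# names and layout are illustrative assumptions.
NEXT_DISCUSSION = {
    "passion_and_resurrection": {2: 8, 8: 12},
}


def follow_chain(concept, start_psalm):
    """Return the psalms reached by following the marginal notes in order."""
    links = NEXT_DISCUSSION[concept]
    visited = [start_psalm]
    current = start_psalm
    # Stop when a psalm has no onward note, or if a chain would loop back.
    while current in links and links[current] not in visited:
        current = links[current]
        visited.append(current)
    return visited


print(follow_chain("passion_and_resurrection", 2))
```

A reader starting at Psalm 2 is led to Psalm 8 and then Psalm 12, without any central index having to exist.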

The use of information addressing systems really got rolling in the thirteenth-century English and French universities, as scholarly monks developed concordances, subject indexes, and page numbers for both Christian religious works and the classic ancient Greek works that they learned about from their contact with the Arabic world. In fact, this is where Arabic numerals start to appear in Europe; page numbering was one of the early drivers for their adoption.

Quoting of one work by another was certainly around long before the twelfth century, but if an author doesn’t identify an address for his source, his reference can’t be traversed, so it’s not really a link. Before the twelfth century, religious works had a long tradition of quoting and discussing other works, but in many traditions (for example, Islam, Theravada Buddhism, and Vedic Hinduism) memorization of complete religious works was so common that telling someone where to look within a work was unnecessary. If one Muslim scholar said to another “In the words of the Prophet…” he didn’t need to name the sura of the Qur’an that the quoted words came from; he could assume that his listener already knew. Describing such allusions as “links” adds heft to claims that linking is thousands of years old, but a link that doesn’t provide an address for its destination can’t be traversed, and a link that can’t be traversed isn’t much of a link. And, such claims diminish the tremendous achievements of the 12th-century scholars who developed new techniques to navigate the accumulating amounts of recorded information they were studying.

Is there something earlier that I’m missing? Where can I find out more about it?

Simon St. Laurent


Related link: http://www.artima.com/intv/xmlapis3.html

Near the end of the first part of his “What’s wrong with XML APIs” interview, Elliotte Rusty Harold makes it beautifully clear why developers used to wearing object and database “glasses” may find it tough to work with XML.

What I’ve just described is essentially seeing the world through database colored glasses—everything’s a table. And yes you could probably figure a way to stuff most anything that can be represented in a computer into a table, but some things fit better than other things. A different version of the same problem is saying well everything’s an object, and we can model everything as objects. And that’s equally flawed, for different reasons.

Elliotte has spent lots of time in XML, objects, and databases, and it’s delightful to see him explain how those perspectives can change how you work with data.

What glasses are you wearing?

Edd Dumbill


I now have three audio players I use regularly: my PC, my MP3 player, and my phone. The PC can play any sound format. The MP3 player will only play MP3s, and my phone (a Sony Ericsson P800) can play Oggs or MP3s.

By choice I’d rather encode my music to the free Ogg Vorbis standard, but so few hardware players support it, it’s hard to make that choice. Also, if Ogg Vorbis is replaced by another standard further down the line, I will have to reconvert from the original CDs — assuming they’re playable at all! Reading a Slashdot discussion on this topic the other day, I saw a suggestion to use the FLAC encoding method to insure against future hardware support and encoding changes.

FLAC, Free Lossless Audio Codec, is as its name suggests a non-lossy way of encoding audio. FLAC files are typically around 50-60% the size of WAV files. They can be easily converted back into WAV files and from there into MP3 or Ogg Vorbis files.

Each of my devices has different storage limits. The optimum space/quality trade-off for the MP3 player is 128kbps, while the phone (max storage 64MB) is suited to an Ogg rate of 64kbps.

This is where FLAC wins. It’s very difficult to step down an MP3’s bitrate or transcode it to an Ogg without really degrading the sound quality. Mind you, with FLAC there is a price to pay: each of my CDs encoded in FLAC takes over seven times as much disk space as the 128kbps MP3s.

However, disk space is becoming ridiculously cheap. So, I plan to convert my CD collection into FLAC files, and downconvert to other formats for the MP3 player and phone.

Playing around with implementing this on my Linux machine, I noticed two things. Firstly, you need to make sure you encode the CD metadata into your FLAC file when ripping. For the CD-ripper grip that meant using these encoding parameters: -V -o %m -T TITLE="%n" -T ARTIST="%a" -T ALBUM="%d" -T DATE="%y" -T TRACKNUMBER="%t" -T GENRE="%G" %w.

Secondly, as oggenc will only encode from WAV files or raw input, this metadata needs re-extracting when downconverting to WAVs. The same is true if you’re downconverting to MP3s, of course.

All of which brings me to the point of this weblog post, to share with you my hacky script, flac2ogg, which converts FLAC files to Ogg Vorbis audio files, preserving the audio metadata. Enjoy and improve!


#!/bin/bash

# by Edd Dumbill
#
# converts a flac file to ogg, preserving the vorbiscomments metadata
# writes filenames like 01-song_title.ogg into the current directory
#
# usage:
#    flac2ogg file [file2 ... fileN]
#
# your .flac files should be encoded with this command line:
#
# flac -V -o <FLAC-FILENAME> -T TITLE="title" -T ARTIST="artist" \
#    -T ALBUM="album" -T DATE="date" -T TRACKNUMBER="n" \
#    -T GENRE="genre" <WAV-FILENAME>
#
# if you use grip, use this format string (all one line)
#
# -V -o %m -T TITLE="%n" -T ARTIST="%a" -T ALBUM="%d" -T DATE="%y"
# -T TRACKNUMBER="%t" -T GENRE="%G" %w
#
# prerequisites: flac, metaflac, oggenc, bash, sed, tr

tmpnam="flac2ogg-$$.wav"

# target bitrate in kbps; set OGG_BITRATE in the environment to override
if test "x$OGG_BITRATE" = "x"; then
	OGG_BITRATE=64
fi

ME=`basename "$0"`

# "$@" keeps filenames containing spaces intact
for file in "$@"; do

if test -f "$file"; then
	# strip only the leading FIELD= so values containing "=" survive
	ARTIST=`metaflac --show-vc-field=artist "$file" | sed -e 's/^[^=]*=//'`
	TITLE=`metaflac --show-vc-field=title "$file" | sed -e 's/^[^=]*=//'`
	ALBUM=`metaflac --show-vc-field=album "$file" | sed -e 's/^[^=]*=//'`
	DATE=`metaflac --show-vc-field=date "$file" | sed -e 's/^[^=]*=//'`
	TRACK=`metaflac --show-vc-field=tracknumber "$file" | sed -e 's/^[^=]*=//'`
	GENRE=`metaflac --show-vc-field=genre "$file" | sed -e 's/^[^=]*=//'`
	echo "$ME: Decoding $file"
	flac --silent --decode -o "/tmp/$tmpnam" "$file"
	# crop leading 0, if any, so printf doesn't read "08" as octal
	TRACKINT=`expr 0 + "$TRACK"`
	# normalise to 2 digits, with leading 0 if required
	fname=`printf "%2.2d-%s.ogg" "$TRACKINT" "$TITLE"`
	# replace spaces and quotes with underscores, then lower-case
	fname=`echo "$fname" | sed -e 's/[ \"]/_/g' | tr A-Z a-z`
	echo "$ME: Encoding $fname"
	oggenc --quiet -b "$OGG_BITRATE" \
		--artist="$ARTIST" --title="$TITLE" \
		--album="$ALBUM" --date="$DATE" \
		--tracknum="$TRACK" --genre="$GENRE" \
		--output="$fname" "/tmp/$tmpnam"
	rm "/tmp/$tmpnam"
else
	echo "$ME: Can't find $file, skipping"
fi

done
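
The script covers the Ogg Vorbis case. For the MP3 player, the same metadata has to be handed to an MP3 encoder instead. The sketch below is in Python rather than bash and only builds the command line; it does not run anything. It assumes the lame encoder and its ID3 tagging options (--tt, --ta, --tl, --ty, --tn, --tg). The function names and the field-to-option mapping are made up for illustration, and genre names that lame does not recognise may need extra handling.

```python
def parse_vorbis_comments(lines):
    """Turn lines such as 'ARTIST=Some Band' into a dict keyed by lower-case field name."""
    tags = {}
    for line in lines:
        # Split on the first "=" only, so values containing "=" survive.
        name, sep, value = line.partition("=")
        if sep:
            tags[name.strip().lower()] = value.strip()
    return tags


def lame_command(tags, wav_path, mp3_path, bitrate=128):
    """Build (but do not run) a lame command line that carries the tags across."""
    # Assumed mapping from Vorbis comment names to lame's ID3 options.
    option_for = {
        "title": "--tt",
        "artist": "--ta",
        "album": "--tl",
        "date": "--ty",
        "tracknumber": "--tn",
        "genre": "--tg",
    }
    cmd = ["lame", "--quiet", "-b", str(bitrate)]
    for field, option in option_for.items():
        if tags.get(field):
            cmd += [option, tags[field]]
    return cmd + [wav_path, mp3_path]


tags = parse_vorbis_comments(["TITLE=Song Title", "ARTIST=Some Band", "TRACKNUMBER=08"])
print(lame_command(tags, "/tmp/track.wav", "08-song_title.mp3"))
```

Passing the result to subprocess.run as a list avoids the shell quoting problems that titles with spaces or quotes would otherwise cause.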

Got any more tips for FLAC, Ogg and MP3? Let me know.

Timothy Appnel


In recent weeks a significant amount of discussion has been ongoing as to the future of Weblog APIs. At issue is that there are two similar, but different Web service APIs in use: the Blogger and MetaWeblog APIs. Within each of those APIs are various interoperability and implementation issues and even some extensions. The community clearly wants one tool-agnostic API that all can utilize and integrate tools with, but there are differing views as to how this will and should happen.

The discussion began with a review of the existing Weblog APIs on Diego Doval’s site that led to a long thread of comments and follow-up posts. Adriaan Tijsseling, creator of the Kung-Log Weblog authoring client, offered his thoughts on the matter, having worked extensively with numerous available implementations. Most notable in the recent posts is the one by Blogger’s Evan Williams where he explains how their API came to be and why Blogger has not supported the MetaWeblog API. He agrees with the call for a universal blogging API and adds that no one vendor should control it. He concludes “I perhaps now understand the need for standards bodies more than I ever have before even though the term gives me willies.”

I think that such a universal interface would be great for the weblogging community and beyond. There absolutely should be one interface that we all can utilize and rely on. (It shouldn’t necessarily be limited to just weblog publishing though.) And yes, standards bodies give me the willies also.

This all being said, I’m left with some nagging questions and omissions in the design and implementation discussion that could affect the utility and effectiveness of such an interface: international character support, robust extensibility, and cohesion with RSS.

The most important amongst these issues is XML-RPC’s lack of international character support. Adriaan Tijsseling noted to me that Apple’s WebServicesCore and MovableType’s API support UTF8 in XML-RPC, but according to the XML-RPC specification strings are limited to ASCII thereby undermining international character support. There have been numerous threads in the past on this issue such as this one made by Charles Cook.
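
To make the character-set problem concrete, here is a minimal sketch using Python’s standard xmlrpc.client module. It shows one possible workaround, not something proposed in the threads mentioned above: non-ASCII characters are written as XML numeric character references, so the bytes on the wire stay within ASCII while an XML parser on the other end still recovers the original text. The blog id, username, password, and post content are hypothetical.

```python
import xmlrpc.client

# Hypothetical post and credentials, used only to have something to encode.
post = {"title": "Café München", "description": "naïve résumé"}
params = ("1", "user", "secret", post, True)

# xmlrpc.client builds the request as text; accented characters appear as-is.
request = xmlrpc.client.dumps(params, methodname="metaWeblog.newPost")

# Replace every non-ASCII character with a reference such as &#233;.
wire = request.encode("ascii", errors="xmlcharrefreplace")

# The receiving side parses the ASCII-only bytes back into the same values.
decoded_params, method = xmlrpc.client.loads(wire)
print(method, decoded_params[3])
```

This relies on the receiver using a real XML parser; a server that treats the payload as a plain ASCII string would still see the references rather than the characters.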

To a lesser extent, but still significant, is robust extensibility. As weblogging expands and evolves, feature sets will become more diverse, the feature sets of tools will begin to vary, and hybrids will emerge. What effect will this have on interoperability? What XML-RPC lacks is a straightforward and reliable way for it to be extended when warranted, whether that’s between two weblogs or two million. In comparison to straight XML, an XML-RPC struct is not straightforward; it’s quite verbose. Structs in XML-RPC don’t have the benefit of utilizing namespaces and thereby cannot directly leverage previous work such as Dublin Core. Back in January, Sam Ruby published “Evolution of the Weblog APIs” that, amongst many things, highlights this issue by comparing the implementation of services with XML-RPC and other formats.
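
The difference is easier to see side by side. The sketch below, using only Python’s standard library, serializes the same two fields first as an XML-RPC struct and then as plain XML that borrows the creator element from Dublin Core through a namespace. The sample data and the item and title element names are invented; only the Dublin Core namespace URI is a real identifier.

```python
import xml.etree.ElementTree as ET
import xmlrpc.client

DC = "http://purl.org/dc/elements/1.1/"  # Dublin Core element namespace
entry = {"title": "Hello", "creator": "Tim"}  # made-up sample data

# The two fields as an XML-RPC struct: each one needs member/name/value wrappers.
as_struct = xmlrpc.client.dumps((entry,))

# The same fields as plain XML, reusing dc:creator via a namespace.
ET.register_namespace("dc", DC)
item = ET.Element("item")
ET.SubElement(item, "title").text = entry["title"]
ET.SubElement(item, "{%s}creator" % DC).text = entry["creator"]
as_plain = ET.tostring(item, encoding="unicode")

print(as_struct)
print(as_plain)
```

In the struct form the field names are just strings inside name elements, so nothing ties "creator" to the Dublin Core definition; in the namespaced form that link is part of the markup.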

The simple answer to both these issues would seem to be that the XML-RPC specification should be modified to address them; however, XML-RPC has been declared frozen and not subject to change.

So I wonder aloud, is a broad API based on XML-RPC the way to go forward? This is a difficult question to answer definitively given the existing landscape, but given the circumstances it is one worthy of consideration.

Furthermore, RSS and Weblog APIs are working with the same data, so why are two very different formats needed? As I have asserted in the past, RSS is a Web service we already have. RSS is even more widely deployed than either of these Weblog APIs. It can handle all international character sets, and with the modules introduced in 1.0 and copied in 2.0 you have all the extensibility you need built in. RSS over HTTP (perhaps even wrapped in a simple SOAP envelope) may be a better long-term solution in terms of extensibility, simplicity, and international support. Merging syndication and APIs in this space seems a plausible and worthy consideration. This notion puts even more emphasis on the need to clean up and better define RSS.

These are difficult and contentious issues without easy answers. Nevertheless, they are better addressed now rather than later in serving the long-term good of the community.

What do you think is the future of Weblog/Publishing APIs?

Timothy Appnel


Andrew Oliver notes that the “inflamatory ‘JCP Better than Open Source?’ monkier for the more appropriate ‘An Independent Look At the Java Community ProcessSM Program’. Thus the power of an informed Java community and a responsible decision maker at Sun listening to the inflamed community fixing this gaffe.”

This change is due to a chain-reaction started by his weblog entry that I highlighted here. I concur with Andrew when he writes “I hope this demonstration proves that we can make a difference. I have a lot more hope that with each of us watching and speaking out, that Sun will over time behave well without our help. Its up to us to be watchdogs. The onion smells much better today.”

Edd Dumbill


Economic conditions notwithstanding, WWW2003 remains one of the
larger meetings focused on web technologies. Over 700 papers were
submitted to the conference, vetted by over 300 reviewers. Of these,
about 12% of the papers were accepted. The poster review committee
received over 200 posters. There are 800 participants from 57
countries.

I spent most of the day today following the W3C updates for XML and
Web Services. There weren’t really any surprises in these sessions: I
suspect they’re aimed more at newcomers than at those following the
development of these technologies day by day.

One talk that did contain something new to me was Hugo Haas’
summary of the W3C web services architecture Working Group. The
“WebArch” group’s job is to try and place the acronym soup of web
services into an ordered context.

Hugo’s slides (http://www.w3.org/2003/Talks/0521-hh-wsa/)
contain a handy diagram giving an introduction to the concept of SOA, or Service
Oriented Architecture (http://www.w3.org/2003/Talks/0521-hh-wsa/slide5-0.html). SOA is rapidly emerging as the buzzword to
describe large web services-based architectures.

Although slides alone aren’t always enough to get the full drift of
a talk, they certainly help. One good thing about the W3C track is
that all the talks are on the web, and the URI of the slides is
usually given out at the beginning. Together with pervasive wireless Internet it enables the audience to have the slides up close on their own computers. That’s the upside of their
adherence to HTML as a presentation format, of course. The downside
is an incredibly klunky experience for the audience as slides often
don’t fit on the screen, or presenters scramble for the “next” button.
Now that OpenOffice.org can store presentations in XML, the W3C ought to
whip up some XSLT to transform to OpenOffice.org’s presentation format, or better still, author in OpenOffice.org and export to HTML. The
result will be much easier on the audience, and still allow the W3C to
use their HTML format for web purposes.

  • In my previous report I mentioned the WWW2003 Community Coverage
    site (http://www2003.xmlhack.com/), which I set up with Dave Beckett. Two days into the conference,
    I think it has proved its usefulness. Happily, usage hasn’t been
    overwhelming, so most of the time useful information and context has
    been available during talks. Various people, including Tim
    Berners-Lee himself, have chipped in, often correcting or clarifying
    things the speakers have said.

Share your comments or WWW2003 experiences in the forum.

Edd Dumbill


I’m in Budapest, Hungary, attending the Twelfth International World Wide Web Conference. During the day I’m writing up interesting and newsworthy sessions: I’ll keep updating this page.

For raw information, check out the WWW2003 Community Coverage, generated directly from IRC chat by conference delegates.

Here are the stories I’ve written today for XMLhack.

Semantic Web and Web Services can live together, says Berners-Lee

In his opening keynote at the Twelfth International World Wide Web conference, the Director of the World Wide Web Consortium explained how the two main thrusts of the development of the web do not compete, but can work together.

W3C announces royalty-free patent policy

The W3C has announced the publication of its patent policy. After long debate, the royalty-free policy has been implemented as a result of widespread consensus.

Attending WWW2003? Add your comments here.

Edd Dumbill


Related link: http://www2003.xmlhack.com/

This week I’m going to be reporting from the WWW2003 conference in Hungary — as will a large number of other delegates, thanks to the increasing popularity of virtual backchannels at conferences.

Love it or hate it, the presence of 802.11b internet access has changed the way delegates consume technical conferences.
The WWW series of conferences was one of the first to provide wireless internet access. Last year in Hawaii quite a few delegates gathered on IRC to comment on and report the talks they attended, and at this year’s Emerging Technologies conference we saw this trend accelerated.

By and large, I found this sort of commentary helpful. Links to subjects related to the talk can be shared in real time. Furthermore, the problems of a schedule clash were mitigated somewhat by being able to consume others’ notes on the session.

The W3C RDF Interest Group has long used an IRC channel and weblogging bot (the Chump) to record notes and commentary. It’s proved a useful tool for simple annotation and information sharing. So for WWW2003 this year, together with Dave Beckett, I’ve set up a chump weblogging bot and chat logs on a special IRC channel dedicated to conference coverage.

The site is at www2003.xmlhack.com, and contains instructions as to how delegates can join in the community note-taking on IRC. It’s my hope that the channel will provide a central point where those that are writing can drop off the URL of their articles or blog entries, so others can share.

If you’re attending the conference, I hope to see you there. If not, try not to be too jealous of our café lifestyle by the Danube, and satisfy yourself with the community coverage!

Is a virtual backchannel good or bad for conference-goers?

Bob DuCharme


For most of the history of the web, linking to a point within a document
meant linking to an a element with a name attribute at
that point in the document. To create the linking URL, you added a pound sign (#) and the
name value to the page’s URL to link to that point in the page. If there was no a element with a name attribute at that point, you couldn’t link to that point.

For example, if you go to http://www.w3.org/TR/REC-xml-names/#ns-decl, do a View
Source, and search for “name='ns-decl',” you’ll see the <a NAME='ns-decl'>
tag that let you use that URL to jump directly to the “Declaring
Namespaces” section of the “Namespaces in XML” Recommendation.
This was possible because the a (“anchor”) element, in addition to
its most popular role as the starting point of a link, could be a link
endpoint if it had a name or, in geekier parlance, if it had
identity.

SGML always let you give any element identity; you just
declared an attribute for it with an attribute type of ID instead of NMTOKEN,
CDATA, or one of the other attribute type choices. (To make things more
confusing, while virtually no one ever named an attribute “nmtoken” or
“cdata,” the most popular name for attributes of type ID was always “id.”) If
an attribute of type ID in an SGML document had a particular value, no other
attribute in that document that had also been declared to be of type ID,
regardless of the attribute’s name (“id,” “uid,” “empnum,” or whatever), was
allowed to have the same value. Part of a parser’s job was to report any ID
attribute value duplication as an error.

When SGML was being simplified into XML, some suggested that,
if DTDs were being made optional, parsers should assume that any attribute
named “id” was of type ID. This suggestion didn’t make the cut for the 1.0 Recommendation,
although the suggestion still crops up in any discussion of potential XML modifications (in fact, it did just this month; see update below). XML 1.0 handles ID attributes the same way SGML did: you have to declare id attributes as having type ID, they don’t have to be named “id,” and most people name them “id” anyway.
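
The uniqueness rule that a validating parser enforces is easy to sketch. Python’s xml.etree.ElementTree does not use DTD attribute declarations, so the example below makes exactly the assumption that was left out of XML 1.0: it treats any attribute named “id” as if it had type ID. The sample document and the function name are invented for illustration.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def duplicate_ids(xml_text, id_attribute="id"):
    """Return the attribute values that appear on more than one element."""
    root = ET.fromstring(xml_text)
    counts = Counter(
        el.get(id_attribute) for el in root.iter() if el.get(id_attribute) is not None
    )
    return sorted(value for value, n in counts.items() if n > 1)


sample = """<doc>
  <section id="intro"><para id="p1">One</para></section>
  <section id="p1"><para id="p2">Two</para></section>
</doc>"""

# A validating parser would report the repeated value as an error.
print(duplicate_ids(sample))
```

A real validating parser would also cover attributes declared as type ID under other names, such as “uid” or “empnum”, which this name-based shortcut cannot know about.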

While the HTML a element had a name attribute
since its first DTD, it didn’t get an optional id attribute until around HTML
4.0. (3.2 doesn’t have it, and 4.01 has it along with the other core attributes; see
http://www.w3.org/TR/1999/REC-html401-19991224/struct/links.html#h-12.2 and
http://www.w3.org/TR/1999/REC-html401-19991224/sgml/dtd.html#coreattrs.)
XHTML had it from the beginning, and the Modularization
of XHTML Recommendation (http://www.w3.org/TR/xhtml-modularization/) actually deprecates the use of the module that declares the
a element’s name attribute
(http://www.w3.org/TR/2001/REC-xhtml-modularization-20010410/abstract_modules.html#s_nameidentmodule). For backward
compatibility, it makes the name attribute available, and tells us
that “if the name attribute is defined for an element, the
id attribute must also be defined. Further, these attributes must
both have the same value.” (A View Source on this document shows that it
practices what it preaches: this “Name Identification Module” section includes
the tag <a name="s_nameidentmodule" id="s_nameidentmodule"> in
its header.)

Of course, what the specs say is purely academic if the
browsers don’t implement it. I only recently found out that you can point
recent releases of Mozilla, Internet Explorer, and Opera directly at any
element (and that’s any element, not just the a element) in
an HTML document that has an id attribute by adding a pound sign and
the id attribute’s value to the URL for that document. For example,
the paragraph above beginning “SGML always let you” has an id
value of “p2” with no a element near it, and the
link in this sentence jumps to “#p2”. Try it.
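
To see which points in a page a URL fragment can reach under each convention, the following Python sketch scans some HTML for both kinds of link endpoint: a elements with a name attribute, and any element with an id attribute. It uses only the standard library; the class name, function name, sample markup, and example.org URLs are invented for illustration, and it says nothing about how any particular browser resolves fragments.

```python
from html.parser import HTMLParser
from urllib.parse import urldefrag


class FragmentTargets(HTMLParser):
    """Collect every value that a '#fragment' could point at in an HTML page."""

    def __init__(self):
        super().__init__()
        self.targets = {}

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)  # HTMLParser lower-cases tag and attribute names
        # Older convention: only <a name="..."> could be a link endpoint.
        if tag == "a" and attrs.get("name"):
            self.targets.setdefault(attrs["name"], "a name")
        # Newer convention: any element with an id attribute.
        if attrs.get("id"):
            self.targets.setdefault(attrs["id"], tag + " id")


def describe_target(url, html_text):
    """Say what kind of element the URL's fragment would land on, or None."""
    fragment = urldefrag(url).fragment
    finder = FragmentTargets()
    finder.feed(html_text)
    return finder.targets.get(fragment)


page = "<h2><a NAME='ns-decl'>Declaring Namespaces</a></h2><p id='p2'>SGML always let you</p>"
print(describe_target("http://example.org/doc.html#ns-decl", page))
print(describe_target("http://example.org/doc.html#p2", page))
```

The second lookup only succeeds because the paragraph carries its own id; under the older convention that point in the page would not be addressable.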

This means a lot for web linking: it means the bar has been
lowered for linking into fairly arbitrary points within documents. It just
wasn’t practical to ask people to add lots and lots of a elements
with name attributes to all of their documents. Some
people like to think that XPointer will make this kind of arbitrary addressing easier,
but even if all browsers supported XPointer (Mozilla 1.4, currently in beta,
has some support) the use of XPointer assumes that the target document is
well-formed, which is too much to assume for most of the web. It’s much more reasonable to hope for increasing use of id attribute values in HTML documents.

And, apparently, it’s already in progress! When I began
writing the preceding paragraph, I didn’t remember where I had heard about
Mozilla’s burgeoning XPointer support, so I did a Google search and found the
appropriate page. When I found the part of the page that I needed, I did a
View Source and saw it enclosed in a div element with an id
value of “linking,” so I was able to use that as my link target. I didn’t have
to wish “if only there was an a element with a name
attribute there…” as countless web page authors have wished about countless
potential web linking destinations since the web began. This is progress, and
I’m psyched. I’m going to get in the habit of adding id attributes to
my block-level HTML elements; all it takes is a short XSLT stylesheet. The
more we all do it, the more we can do with the web.
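
The tool mentioned above is XSLT; as a rough equivalent, the Python sketch below does a similar job with the standard library’s ElementTree. It assumes the input is well-formed markup without namespace prefixes on element names, and the list of block-level elements and the “p1”, “p2” naming pattern are illustrative choices rather than anything taken from an actual stylesheet.

```python
import xml.etree.ElementTree as ET

# Assumed set of block-level element names; adjust to taste.
BLOCK_ELEMENTS = {"p", "div", "blockquote", "ul", "ol", "pre", "table"}


def add_ids(xml_text, prefix="p"):
    """Give every block-level element without an id a fresh, unused one."""
    root = ET.fromstring(xml_text)
    used = {el.get("id") for el in root.iter() if el.get("id")}
    counter = 0
    for el in root.iter():
        if el.tag in BLOCK_ELEMENTS and el.get("id") is None:
            counter += 1
            # Skip over values that already appear elsewhere in the document.
            while prefix + str(counter) in used:
                counter += 1
            new_id = prefix + str(counter)
            el.set("id", new_id)
            used.add(new_id)
    return ET.tostring(root, encoding="unicode")


doc = "<body><p>First</p><div id='p1'>Existing</div><p>Second</p></body>"
print(add_ids(doc))
```

Generated ids like these are only stable as long as the document’s structure is; anyone linking to “#p3” depends on the third block staying where it is.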

Update, May 19th: When I wrote this, I didn’t know about Chris Lilley’s summary of XML ID issues for the W3C’s Technical Architecture Group, How should the problem of identifying ID semantics in XML languages be addressed in the absence of a DTD? It’s required reading for anyone interested in ID issues in XML.

Do you see more use of id attributes in HTML? Am I missing anything here?

Simon St. Laurent


XML Europe is now over, but conference-goers looking for XML and XML-related topics can find plenty of them at WWW2003, OSCOM, OSCON, Applied XML Developers Conference 2003 West, and Extreme Markup Languages.

XML Europe, held two weeks ago in London, kicked off the summer conference season with two notable keynotes. XML Europe provided coverage on multiple levels of a huge variety of XML-related subjects.

WWW2003, which starts tomorrow in Budapest, is an annual conference about the World Wide Web. The main schedule is a series of presentations at a fairly high level, while the developer’s day digs deeper into specific topics. Too late to get to Budapest? xmlhack is sponsoring an IRC-based blog from the site.

Next week, the third Open Source Content Management (OSCOM) conference will be in Cambridge, Massachusetts. The program covers a broad range of content management topics, including subjects like Web Services and RDF, all with an open source angle.

Looking forward to July, O’Reilly’s own Open Source Conference (OSCON) will be in Portland, Oregon from July 7-11. OSCON starts with two days of tutorials and then has three days of sessions, including an XML track. XSLT, Web Services, schemas, Microsoft Office XML, security, and XQuery will all get a close look in an open source context.

Across town in Beaverton, Oregon on July 10 and 11, the Applied XML Developers Conference 2003 West will be exploring a mostly Web Services-oriented agenda. If you need the latest from some of the brightest minds at IBM, Microsoft, and some of the smaller Web Services vendors and promoters, this is a good place to be.

Finally, Extreme Markup Languages will close the summer with five days of high-powered conversation about markup. The beauty of Montreal in August and the depth and intensity of the conference presentations make a great combination. Extreme will be including the Knowledge Technologies conference, making it a good place to learn about Topic Maps and RDF as well as markup directly. The program for this year isn’t yet posted, but programs for past years (2000, 2001, 2002) should give you an idea.

It should be a good - and busy - summer!

Know of more XML-oriented conferences?

Simon St. Laurent


Last week’s XML Europe conference started with two keynotes on the relationships between open source development and XML - Jon Bosak exploring the prospect of combining the two to change the world, and Daniel Veillard exploring how the open source community is using XML.

Jon Bosak has moved from the heart of XML development to ebXML and Universal Business Language (UBL). Bosak’s keynote expressed a somewhat unusual goal for a technology conference - “saving the world”. Bosak argued that the values at the heart of SGML and then XML offer the prospect of a better world if translated - through open standards with open source implementations - to a much larger context.

Daniel Veillard has also moved from work at the heart of XML, now working at RedHat rather than the W3C. His keynote looked from a different perspective, at how the open source community has reacted to XML and XML-related specifications. Veillard focused less on what XML promises and more on what developers had chosen to do with it.

In some ways, these keynotes presented completely different perspectives, but they also suggested that the synergy between XML and open source is growing, with strong prospects for the future. I’ve explored these issues before, but these keynotes offered far more details about what the combination has to offer.

We’ll be focusing more on the synergies between XML and open source at OSCON this July.

Are open source and open data genuinely compatible? Or is there still cause for paranoia about XML?

Simon St. Laurent


Related link: http://www.oreillynet.com/pub/wlg/3190

Tim O’Reilly emphasizes Paul Graham’s observations on a sketching mode for programming. One of my favorite aspects of XML - and markup, more generally - is that it allows a similar improvisation process for data structures.

Tim notes:

The reason why dynamic languages like Perl, Python, and PHP are so important is key to understanding the paradigm shift. Unlike applications from the previous paradigm, web applications are not released in one to three year cycles. They are updated every day, sometimes every hour. Rather than being finished paintings, they are sketches, continually being redrawn in response to new data.

Data structures don’t usually have quite the same need for constant flexibility as programs in flux, but far too much effort has been spent describing how XML lets people standardize their information exchange structures without taking note of the flexible process XML provides for getting there.

Perhaps the greatest innovation in XML 1.0 was that it discarded the dictum that all markup structures must conform to a previously-created design. A large share of XML projects start with XML documents, developers building their structures around existing data to create sample documents rather than working in the abstract to build a set of appropriate structures which are then serialized. There’s no need for a DTD or a schema at this stage, and many vocabularies don’t even bother with such things when they’re mature.

Like programs, markup vocabularies often make a shift from doodles and sketches to something more formal. Prototypes set patterns which are then set into something resembling stone, and rules take over from the joyful improvisation which brought us to that point. That seems to be a common pattern in technology, but it’s worth celebrating the freedom available in less formal technologies, freedom which lets us make our own decisions and experiment on real content before choosing how best to solve a problem.

This doesn’t mean that XML is easy for everyone to do; taking advantage of the freedom it offers requires skill. Improvisation can seem simpler at first than pre-planned structure, but good improvisation is itself an art form. Sketching comes in all levels of quality.

Perhaps my favorite aspect of XML is that the option of doing your own thing is always available, even after a committee has done its work. If you find a particular language rule-bound and contrary, you can use its rules to transform its content (with XSLT or equivalent) into your own favorite flavor. Exposing the data and its structures gives everyone a chance to make it their own, and talk back in a language other systems can understand.
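
As a rough illustration of that kind of transformation, here is a minimal Python sketch. It uses the standard library's ElementTree module as the "equivalent" rather than XSLT itself, and every element name in it (record, personal-name, locality, person, city) is invented for the example, not taken from any real vocabulary.

```python
import xml.etree.ElementTree as ET

# Hypothetical "committee" vocabulary; all element names are invented.
SOURCE = """<record>
  <personal-name>
    <given>Frank</given>
    <family>Shepard</family>
  </personal-name>
  <locality>Chicago</locality>
</record>"""


def to_my_flavor(xml_text: str) -> str:
    """Rewrite the formal record into a terser, home-grown structure."""
    src = ET.fromstring(xml_text)

    # Pull the pieces we care about out of the source structure.
    given = src.findtext("personal-name/given", default="")
    family = src.findtext("personal-name/family", default="")

    # Build our own preferred shape: the name as an attribute, the city as a child.
    person = ET.Element("person")
    person.set("name", f"{given} {family}".strip())
    city = ET.SubElement(person, "city")
    city.text = src.findtext("locality", default="")

    return ET.tostring(person, encoding="unicode")


print(to_my_flavor(SOURCE))
```

No DTD or schema is involved at any point: the code only assumes the input is well-formed and that the elements it looks for happen to be there.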

Ever want to hear jazz improvisation in the middle of a classical concert?

Timothy Appnel

Related link: http://servlet.java.sun.com/javaone/sf2003/conf/sessions/display-3294.en.jsp

Andrew Oliver writes: “This session displays Sun’s cynical view perfectly.” He continues, “the outcome is stated in the topic!” [link via Sam Ruby]

What are your thoughts on the JCP and Open Source?

Micah Dubinko

Related link: http://lists.w3.org/Archives/Public/www-forms/2003May/0011.html

The topic of why W3C standards are so complicated has come up again. Steven Pemberton has a well-reasoned and balanced take on the subject.

Why do you think web standards are so complicated?

Lisa Balbes

Related link: http://www.ornl.gov/TechResources/Human_Genome/

On the 50th anniversary of the discovery of the double helix structure of DNA, the goals of the Human Genome Project were officially completed. When the project started in 1990, it was projected to take 15 years and cost $3 billion. It actually took 13 years and $2.6 billion. This draft covers 99% of the genome and closed 99.5% of the gaps in the rough draft. The 24 April 2003 issue of Nature will lay out 15 grand challenges that will be taken on next, to build on this success.

Bob DuCharme

Frank Shepard was a salesman for a Chicago legal publisher. Shortly
after the American Civil War, he noticed that when one court case overruled,
criticized, or otherwise cited another, lawyers often jotted a note about it
in the margin of the reporter volume with the cited case’s text. For
example, upon learning that the judge in the case known as “La Bourgogne” (210
U.S. 95) made a negative reference to the “Moore v. American
Transportation Company” (65 U.S. 1) case, a lawyer might turn to page 1 in volume 65
of the U.S. Supreme Court case reporter and write “210 U.S. 95, negative” in
the margin next to the Moore case. This way, if the Moore case ever came up
in court, the lawyer would have a better idea of its exact value.

Shepard had an idea: if he printed gummed labels for each case listing
the cases that cited it, he could save the lawyers the trouble of writing in
these references by hand. He built a business out of selling these inter-case
links to the legal profession and named the company after himself: Shepard’s
(http://www.lexisnexis.com/shepards/). (Full disclosure:
since Reed Elsevier acquired Shepard’s in the mid-1990s, Shepard’s Citations
has been a product of my employer, LexisNexis. Other than some occasional XSLT
advice to the folks in Colorado Springs, where Shepard’s has been based since
1947, I don’t do any work on that particular product.) In one sense, the
stickers they produced in 1873 were already more sophisticated than web links,
because if more than one case had cited the same case, the sticker for that
case added a one-to-many link to it.

To help the lawyers quickly learn why one case had been cited by
another, Shepard’s started including one-letter codes
(http://www.lib.odu.edu/research/sguides/soccase.shtml#shep) to show that
the citing case had overruled, criticized, modified, or
applied some other treatment to the cited case. Now their links had link types:
indications about the nature of the links to give a clue about why
they might be worth traversing.

The stickers, or “Adhesive Annotations,” became very popular. While
sitting on the Massachusetts Supreme Judicial Court, future United States
Supreme Court Justice Oliver Wendell Holmes Jr. wrote “I regard Shepard’s
Massachusetts Annotations as the most thorough labor-saving device that has
ever been brought to my attention. No one owning a set of reports can afford
to be without one.”

Before the nineteenth century came to a close, the company began
producing alternatives to the sticker collections: bound books that listed,
for each case, the cases that cited it and codes describing the citing case’s
treatment. Today, we call this separation of the links from the linked
resources “out-of-line links.”

The books became so popular that their inventor’s last name became a
verb. Any lawyer or law student knows that to Shepardize
(http://www.lectlaw.com/files/lwr17.htm) a case is to find
out all relevant cases that cite it. Of course, automating the storage and lookup
of these links is much easier with software, and it’s all online now. When you
view a case using LexisNexis, clicking the “Shepardize” link displays a list
of citing cases with links to the full text of those cases. This saves a lot
of running around a law library, which was how the links were followed for the
first century of their existence. (LexisNexis’s chief competitor, WestLaw, has
a competing on-line product called KeyCite.)

The success of Frank Shepard’s invention tells us several things about linking:

  • Link typing can add real value to a linking application.
    If a lawyer who’s going to bring up a case in court Shepardizes it and sees
    only codes for positive treatment, there’s little need to look up the citing
    cases. If other cases criticized the case to be cited, however, it’s his job
    to find out why. (Too bad it’s so difficult to find other
    examples of link typing adding obvious value! See
    http://www.oreillynet.com/pub/wlg/3094.)

  • Out-of-line links can sometimes be more useful than in-line links.
    The web and other hypertext systems leading up to it have conditioned many
    to think of a link as something that connects the resource they’re looking at to a
    single other resource somewhere else, but links can be more than that. Shepard’s customers
    found that having all the citation links in a single set of books instead of
    as a set of stickers to be spread around hundreds of volumes can make the
    research go much more quickly, especially with the treatment codes added to
    the link identifiers to give clues about whether the links are worth traversing.

  • It’s not about the technology, but about the information. Just as
    a well-written song can work well when performed by different bands, a good
    linking application can still have value when implemented using different
    technologies.
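
To make the first two points concrete, here is a minimal Python sketch of an out-of-line, typed link table. It is only a toy model under stated assumptions, not a description of how Shepard’s or LexisNexis actually store citations: the class and method names are invented, the treatment labels are illustrative stand-ins for the real one-letter codes, and the second citation is made up. Only the “210 U.S. 95” and “65 U.S. 1” pair comes from the example above.

```python
from collections import defaultdict
from dataclasses import dataclass

# Illustrative treatment labels; the real one-letter codes are not reproduced here.
NEGATIVE = {"overruled", "criticized", "negative"}


@dataclass(frozen=True)
class Citation:
    citing: str     # the case that makes the reference
    cited: str      # the case being referenced
    treatment: str  # the link type


class CitationIndex:
    """Out-of-line link table: the links live here, not in the case texts."""

    def __init__(self):
        self._by_cited = defaultdict(list)

    def add(self, citing, cited, treatment):
        self._by_cited[cited].append(Citation(citing, cited, treatment))

    def shepardize(self, cited):
        """Return every recorded citation of a case (a one-to-many link)."""
        return list(self._by_cited.get(cited, []))

    def needs_review(self, cited):
        """Citing cases whose treatment is negative, so worth looking up."""
        return [c.citing for c in self.shepardize(cited)
                if c.treatment in NEGATIVE]


index = CitationIndex()
index.add("210 U.S. 95", "65 U.S. 1", "negative")          # from the example above
index.add("Hypothetical Case A", "65 U.S. 1", "followed")  # invented for illustration

print(index.needs_review("65 U.S. 1"))
```

Because the link type is stored with each link, a lookup can skip the positively treated citations and surface only the ones that need reading, which is the value the treatment codes added.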

What other linking applications have “ported” well across technologies over time?