May 2004 Archives

Bob DuCharme

Related link: http://www.cafeconleche.org/oldnews/news2004May5.html

While writing a report on XML Europe for distribution at work, I couldn’t find the excellent notes taken by a former neighbor from my Brooklyn days, Elliotte Rusty Harold, so I e-mailed him to ask. (It turns out that he archives Cafe Con Leche news items on individual pages for each day of news; for example, his notes on the second day of the conference are at http://www.cafeconleche.org/oldnews/news2004April20.html.) I also suggested that he add ID values to his block-level elements so that documents like my report could link to his discussions of specific talks at the conference.
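Adding IDs is exactly the kind of chore a script can do. Here is a minimal Python sketch, assuming the page is well-formed XHTML (or other XML) that ElementTree can parse; the set of block-level element names and the `b1`, `b2`, ... ID scheme are illustrative assumptions, not anything Elliotte or I settled on.

```python
import xml.etree.ElementTree as ET

# Illustrative set of block-level element names (local names, no namespace).
BLOCK_ELEMENTS = {"p", "h1", "h2", "h3", "h4", "h5", "h6",
                  "blockquote", "pre", "li", "dt", "dd", "table"}

def add_block_ids(root, prefix="b"):
    """Give every block-level element that lacks an id a unique, sequential one."""
    # Collect ids already in the document so generated ones never collide.
    used = {el.get("id") for el in root.iter() if el.get("id") is not None}
    counter = 0
    for el in root.iter():
        if not isinstance(el.tag, str):          # skip comments/processing instructions
            continue
        local_name = el.tag.rsplit("}", 1)[-1]   # strip any "{namespace}" prefix
        if local_name in BLOCK_ELEMENTS and el.get("id") is None:
            counter += 1
            while f"{prefix}{counter}" in used:
                counter += 1
            new_id = f"{prefix}{counter}"
            el.set("id", new_id)
            used.add(new_id)
    return root

page = ET.fromstring(
    "<body><h2>XML Europe, day two</h2>"
    "<p>Notes on the first talk.</p>"
    "<p id='keynote'>Notes on the keynote.</p></body>"
)
add_block_ids(page)
print(ET.tostring(page, encoding="unicode"))
```

Sequential IDs are cheap but fragile: inserting a new paragraph near the top renumbers everything after it, which breaks inbound links. IDs derived from something more stable, such as a date plus a slug of the nearest heading, would survive edits better.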

I’ve written before in this forum (1, 2) about the value of adding ID values to HTML and XML block-level elements. As I wrote to Elliotte, “Simple automated ways to add genuinely useful metadata are few and far between, so I think it’s worth jumping on any we can find.” Tim Bray wrote an excellent essay on metadata with the assertion, catchy enough that Simon St. Laurent blogged it on the O’Reilly Network, that “there is no cheap metadata.” Tim contradicted himself, however, by listing some metadata that’s free: “filename, created/modified dates, who created it, what kind of file (HTML, Excel, PowerPoint), how big it is.” This metadata, although free, has definite value. The knowledge that Google’s number one hit for my search term is a four meg Word file and the number two hit is a 200K HTML file strongly influences my choice of which link to follow first, and providing criteria for making link traversal decisions is the whole point of link metadata.

As Tim alludes, the best metadata comes from a paid staff making human judgments about the best metadata to add. I’ll call this “judgment-call metadata” to distinguish it from metadata generated with algorithms. This is expensive, but makes sense at a business like my employer because lawyers will pay extra for summaries of court decisions and for the ability to search legal cases using keywords from a carefully maintained taxonomy. But what about users without the kind of working budget that lawyers have?

Some useful metadata is still pretty cheap. I managed to convince Elliotte that the trouble of adding IDs to block elements was worth it. Larry Page and Sergey Brin, who started out not with giant server farms but with academic research, identified and took advantage of a new kind of web metadata that was cheaper than human editors: the links between pages. It certainly worked out well for them.
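The idea behind treating links as metadata is that each link counts as a vote for its target, weighted by the rank of the page casting it. The following is a simplified, textbook-style power-iteration version of PageRank in Python, a sketch for illustration and not Google's production algorithm; the damping factor of 0.85 is the value suggested in Brin and Page's original paper, and the tiny link graph is made up.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Simplified PageRank by power iteration.

    links maps each page to the list of pages it links to.
    Returns a dict of page -> rank; the ranks sum to 1.
    """
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page gets a small baseline share (the "random jump").
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # Split this page's rank evenly among the pages it links to.
                share = damping * rank[page] / len(targets)
                for t in targets:
                    new_rank[t] += share
            else:
                # A page with no outbound links spreads its rank over all pages.
                for t in pages:
                    new_rank[t] += damping * rank[page] / n
        rank = new_rank
    return rank

# Hypothetical four-page web: three of the four pages link to "home".
web = {"home": ["about"], "about": ["home"], "blog": ["home"], "notes": ["home", "blog"]}
ranks = pagerank(web)
for page, score in sorted(ranks.items(), key=lambda kv: -kv[1]):
    print(f"{page:6} {score:.3f}")
```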

The Google page rank algorithm and automatic addition of ID values are just two sources of inspiration to spur us to look for new sources of cheap metadata, or at least for new, inexpensive incentives for people to add judgment-call metadata. I’d love to see the semantic web movement more concerned with finding and generating usable metadata and less focused on what to do with that metadata come the revolution. FOAF files are fun, but demonstrating the potential value of the semantic web will require more metadata than information about which of our friends also have FOAF files.

A bit of lobbying might help. I’ve asked O’Reilly to include the Subject, Secondary Subject, and Topic values entered with each of these O’Reilly Network weblog postings in the RSS feeds for those postings. Is your blogging tool collecting more metadata about each entry than it includes in its RSS feeds? Why? Ask the people behind it.
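RSS 2.0 already has a place for this kind of data: an item can carry any number of `category` elements, each with an optional `domain` attribute identifying the taxonomy the value comes from. A minimal Python sketch, assuming the Subject, Secondary Subject, and Topic values are available to the feed generator as plain strings (the sample values and URL are hypothetical):

```python
import xml.etree.ElementTree as ET

def rss_item(title, link, metadata):
    """Build an RSS 2.0 <item>, carrying each (taxonomy, value) pair as a <category>."""
    item = ET.Element("item")
    ET.SubElement(item, "title").text = title
    ET.SubElement(item, "link").text = link
    for taxonomy, value in metadata:
        # The optional domain attribute says which taxonomy the value comes from.
        category = ET.SubElement(item, "category", domain=taxonomy)
        category.text = value
    return item

item = rss_item(
    "Cheap metadata",
    "http://www.example.com/weblog/cheap-metadata.html",  # hypothetical URL
    [("Subject", "XML"), ("Secondary Subject", "Semantic Web"), ("Topic", "Metadata")],
)
print(ET.tostring(item, encoding="unicode"))
```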

What about new incentives for adding judgment-call metadata? Stephen Cayzer’s work at HP Labs (see his XML Europe paper), which demonstrates how better user interfaces can make the entry of metadata less trouble for the user, will hopefully inspire others to think more about acquiring good metadata and postpone some of their ideas about what to do with that metadata.

The success of javadoc compared with the overall slow progress of literate programming should also give us some ideas: why do the majority of Java programmers consider the trouble/payoff ratio for adding javadoc comments and tags to be low enough that they follow through and do it, while the developers who follow through with all the principles of literate programming are still a tiny minority? What sweet spot has the javadoc system hit that people bother to add this metadata to their code even though their code will compile and run just fine without it?

Where will new sources of inexpensive, parsable metadata come from?

Michael(tm) Smith

Related link: http://www.debian.org/News/2004/20040506

From a news item the Debian project released at the beginning of May:

The upcoming stable Debian release (code-named sarge) will feature fully integrated XML support. Multiple toolchains for XSL(T) processing, a fully standards-compliant XML catalog system, and a Debian XML policy document for both Debian developers and users.

I reckoned the XML support in Debian testing was already pretty good, so I’m not sure what all is new here. Maybe the news is not so much the tools, which were already in place, as the other stuff: the XML policy doc, the packaging guidelines, and a claim of FHS compliance.

Simon St. Laurent

Related link: http://story.news.yahoo.com/news?tmpl=story&cid=581&e=1&u=/nm/20040520/tc_nm/tec…

Bill Gates is talking about blogs, and I’m wondering what that might mean over the next few years.

Reuters reports Gates as saying:

“What blogging and these notifications are about is that you make it very easy to communicate.”

Gates is right about that: blogging is certainly easier, for cases where it’s appropriate, than setting up a website in more traditional ways, and the infrastructure built up around various flavors of RSS and Atom makes it easy for people to publish once and have their content distributed automatically to interested parties.

I wonder, though, what form Microsoft’s involvement in the space might take. Having just finished writing a book on Office 2003’s XML capabilities, I know that the parts for making familiar applications talk with weblogs are all there, but it’s still a long way from having a button in Word that “publishes to blog”. (Don Box seems to be working on that problem, and he also used InfoPath as a blog editor for a while.)

It’s a space worth watching, anyway.

Any thoughts on what an MS blogging system might look like?

Michael(tm) Smith

Related link: http://xmlhack.com/read.php?item=2187

Big thanks to Micah Dubinko, Uche Ogbuji, Eric van der Vlist, Simon St. Laurent, Edd Dumbill, and the others who made XMLhack such a great resource over the last five years. I don’t remember reading the very first XMLhack item when Edd originally posted it, but somewhere between that and the first one I ended up posting (geez, that was more than three years ago now…), I became a devotee, and I feel very fortunate to have had the opportunity to contribute to it.

I’ll really miss it.

Related link: http://cnt.ucsdx.net/courses/WSeducation.php

I’m psyched to be working with Michele Leroux Bustamante on an interoperability demo for Web Services Education Interoperability Day. It’s a welcome shift from crafting PowerPoint slides and technical architecture strategies.

The demonstration focuses on leading-edge web service specifications such as WS-Policy, WS-Trust, WS-Security, and SAML. As we all know, there’s not much out-of-the-box integration when using open source projects, and there is even less when mixing cross-platform web services (.NET/Java), security, identity management, and open source.

I’m currently firing up Axis Web Services augmented with open source identity management from OpenSAML and SourceID to test federated authentication and authorization across Java and .NET components. I’ll keep you posted on lessons learned.

Any success mixing web services and open source identity management projects?

Timothy Appnel

So I've spent the better part of yesterday and this morning reading email and responses to my earlier O'Reilly post. I still stand by what I said and think it generally holds up. There are some good counterpoints being made, too, in addition to some real rubbish and loony ranting. Let me recap the mostly worthwhile views I've been reading.

  • A significant number of users said they don't expect MT to be free and have no qualms about paying for it; however, the current fees and restrictions are out of line, especially in regard to past requests for donations. Most complaints were about the personal licensing. I saw little noise about the commercial licenses. In fact, the commercial users who spoke up expressed support for those licenses as is, noting that their companies pay significantly more for software that does less.
  • Many MT users are operating multiple MT weblogs to do things like remaindered links, photoblogs, and book reviews that together appear as one weblog. Others are operating multiple weblogs for friends, family, and so on.
  • Although MT3 does not have any nagware or crippling code, many people care about not stealing from Six Apart by running illegal or improperly licensed copies of MT. (That was a bit encouraging.)
  • The damaging effects of the radio silence I mentioned in a past post still continue. Many did not expect the fees. Even testers were not made aware of them. Others pointed to an earlier Six Apart post announcing MT3 as a significant and free upgrade. There was no retraction or explanation that things had changed. To a lesser extent, some were annoyed that, without notice, this release was a developer edition instead of a general-use version.
  • The new terms are unclear in a few places. For instance, many who host with ISPs point out that the servers they are on have multiple CPUs, but the new license restricts use to 1 CPU. The aforementioned hack of using multiple MT blogs to create what readers see as a single blog is yet another.
  • Conspiracy theorists who equate TypeKey with Big Brother were miffed that you have to have a TypeKey account to download the free version of MT.
  • Some users don't understand, or aren't accustomed to, software licensing practices, costs, and economics, particularly when it comes to servers. Should a user who has one low-traffic weblog pay the same as someone who hosts a dozen that get lots of traffic?

I was once interviewing a musician about his avid fan base and their sometimes negative reaction to his band's work. He said something profound that has always stuck with me. To paraphrase, he said being hated is better than indifference, because when you're hated, at least they care enough to have an opinion. If there is a silver lining in all of this, it's that a lot of people still care about Movable Type and Six Apart's products.

That said, there is work to be done and it will be interesting to see how this will play out. I thought a comment Jay Allen made summed the situation up.

To all of this, I can only say, they are a young company and are bound to make mistakes. I know that they are also razor sharp and have the good of this community at heart, so it won't take them long to make these things clear.

Agreed, which is why I think all of this outcry is a bit over the top. It's not as if this license came out, the user community said "Hey, I have a problem with this because of X" or "I don't understand Y because of my situation," and Six Apart told them to piss off, pay up, or go away. These licensing terms have barely been available for 24 hours, which is no time for any company to react and respond coherently. Let us not overlook that version 2.661 is still available and is free without limitations. Also, none of their software has a poison pill or nagware built in for those who do not acquire a proper license. I'm also really astounded by the evil-empire mentality that many have adopted toward a company that has a proven track record of trying to do the right thing. But I digress.

There are things to learn from this situation, as michaelashby commented on my earlier post, Eating Should Include A Balanced Diet. Here are my thoughts on potentially finding a better balance:

  • Six Apart needs to start communicating, a lot and soon. Start by explaining how this happened in straightforward terms. Also acknowledge that you are listening and are going to address the feedback on the new licensing. While the situation seems urgent, the worst thing they could do is rush to a decision that is out of line for them or for users. I think we'll see that they are not some greedy evil empire as some have accused them of being.
  • The prices and limitations need to be tweaked in response to the uproar, especially when it comes to personal licenses. Not doing something in response to the extraordinary outcry would add more fuel to the fire.
  • Offer credits and discounts to donors to move to TypePad. From what I can tell looking at a lot of these sites, many would be better off on TypePad, where they get hosting and lots of other features with no limitation on weblogs or authors. One post noted that TypePad costs $180 a year. True, but how much do many of us pay for hosting MT? Dreamhost would be about $120. Pair Networks would cost about $216, more than TypePad. (This is a really rough comparison and is to a degree oversimplified, but I think it gets my point across.) As part of the program, make it extremely easy to transfer a weblog running in MT to TypePad Pro. (The current MT export does not handle templates, which I suspect would be a big deterrent to many users' migration.)
  • Clarify licensing terms such as the single CPU/shared hosting snag many have noted.
  • Clarify the term blog, particularly in regard to those who use multiple MT blogs to hack together advanced features such as remaindered links, photoblogs, etc. If the terms are to remain per MT blog, declare a grace period in the licensing for these users while the company addresses this situation with tools/services or a few new license-friendly how-tos that remove the need for multiple weblogs.

What do you think is an acceptable balance?

Bob DuCharme

Related link: http://weblog.infoworld.com/udell/2004/05/13.html#a1000

Remember when people used to say “hypermedia” instead of “hypertext” to hint at the wider possibilities? Jon Udell’s new posting updates us on his experiments with linking to selected portions of video streams on the web.

Timothy Appnel

In the wee hours of the morning today, Six Apart released Movable Type 3.0 to mixed reviews. (More on that in a bit.) This release is being called a developer edition that is not for general public use. It is also not a feature release, says Six Apart. In many ways this release is like the original release of Mac OS X: there were few new features, but significant changes to the underlying system that are poised to take the company in a whole new direction.

In that vein, MT is graduating to a platform rather than just a personal publishing system. This is great news and an important distinction for developers looking to extend and enhance MT for various non-traditional weblog uses, as I have in my work. Six Apart is acknowledging the importance of developers to the evolution of MT. To kick off this renewed commitment to developers, they've announced the formation of a developer's network, a plugin contest (more here), and new licenses that are less restrictive, more diverse, and more costly. First, the happier side of the news.

Drilling down, new features for developers include:

  • The ability to create object callback plugins on pre- and post-save, load, and remove events. These will come in handy for automated mirroring, versioning, and integrating subsystems that link to core information.
  • A plugin registration API that displays the plugin name and other metadata such as description, documentation and configuration script links in the MT content management interface.
  • A pluggable authentication system for comment boards. (The company launched its own hosted system, named TypeKey, which MT uses by default.)
  • Numerous performance enhancements including lazy fetching of data. (Developers can take advantage of this capability in their own plugins.)
  • A number of bug fixes. For instance, MTElse works with conditional plugin tags now.
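To make the pre/post callback idea concrete, here is a generic sketch of the pattern in Python. It is not Movable Type's plugin API (MT plugins are written in Perl, and the class, method, and event names below are hypothetical); it only shows how pre- and post-save hooks let a plugin normalize data and keep a version history without touching the core save code.

```python
from collections import defaultdict

class CallbackRegistry:
    """Maps event names (e.g. "pre_save") to lists of callback functions."""

    def __init__(self):
        self._callbacks = defaultdict(list)

    def register(self, event, func):
        self._callbacks[event].append(func)

    def run(self, event, obj):
        # Call every callback registered for this event, in registration order.
        for func in self._callbacks[event]:
            func(obj)


class Entry:
    def __init__(self, title, body):
        self.title = title
        self.body = body


class EntryStore:
    """Toy storage layer that fires pre_save/post_save around each save."""

    def __init__(self, callbacks):
        self.callbacks = callbacks
        self.rows = {}

    def save(self, key, entry):
        self.callbacks.run("pre_save", entry)
        self.rows[key] = entry
        self.callbacks.run("post_save", entry)


# Usage: a pre_save hook tidies data; a post_save hook records versions.
history = []
callbacks = CallbackRegistry()
callbacks.register("pre_save", lambda e: setattr(e, "title", e.title.strip()))
callbacks.register("post_save", lambda e: history.append((e.title, e.body)))

store = EntryStore(callbacks)
store.save(1, Entry("  Hello ", "first draft"))
store.save(1, Entry("Hello", "second draft"))
print(history)
```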

From a user perspective, MT 3 features a new, lighter-weight interface that takes full advantage of CSS. It also reorganizes the interface to make comment and Trackback ping moderation easier to manage. Comments also have a number of new features, including moderated approval of messages and posters in addition to authentication. Email notifications have become more robust, adding a verification step and (finally!) an unsubscribe feature.

Six Apart also announced new licensing, which has been quickly panned by the push-button publishing community. While there will still be a free version of MT, it is limited to 3 weblogs and 1 author. The reaction has been swift, as many who run numerous weblogs with many authors decry the new terms, specifically the fees that using MT will now cost them. Many of these posts gripe that alternative server-based tools such as WordPress do not yet support multiple blogs and/or authors. What's a bit silly about these posts is that not one so far notes that the hosted version of MT (TypePad) allows unlimited authors and weblogs (plus many other features not available in MT) at a price that rivals basic hosting packages.

The delineation between TypePad and MT has become clear with this release: TypePad is for general users wanting to blog, and Movable Type is for developers and professional organizations wanting to do more than just weblogging.

Of the reactions I've read this morning, I think Brian Stearns had the most pointed observation about this furor. Noting many of the initial Trackback pings to Mena's post, he writes:

For me this outlines that a large part of the weblog world was in it because it was free to do for the most part and an easy way to do something innovative (at least when they started). I think a large part of the internet world is cheap and not willing to pay for things so I will not be surprised to see people dump MovableType to start using a free weblog tool or discontinuing their weblogs altogether.

Agreed, Brian. Rumor around the MT community is that Six Apart was collecting less than 50 cents (US) for each copy of MT downloaded. That is absurd for a piece of commercial software!

This outcry raises a bigger, more important point, which is the reason for my post. As a developer and one who makes a living writing code, I find this reaction to Six Apart's new licensing really disheartening and, on a certain level, frustrating. I am a firm believer in and backer of free software. (CORRECTION: I originally wrote "open source" here; I meant free software and slipped. I do back open source and open code software, though.) I've personally released quite a bit of open source code myself and will continue to do so. However, this apparent expectation among the vocal part of the community that it is their right to have all great works of software at no cost is bothersome. If users don't have the funds, or won't pay on principle, for my time, effort, or talent, how do I eat?

How are professional developers supposed to eat?

Michael Fitzgerald

Related link: http://subversion.tigris.org/

I don’t have a ton of complaints about RCS (which I started using about 14 years ago) or CVS, though I am sure others have more. The one fog-minded problem I had with CVS was configuration (just getting it to work as expected), which might have been a documentation or operator (that’s me) problem. It’s been a while, so I don’t remember what sources I was using, but I got frustrated and just created my own silly backup system with a script and ZIP. In the last few weeks, though, I stumbled on Subversion. I downloaded and installed it, and so far I am liking it. Actually, I am liking it a lot.

Subversion has really good documentation, which made the setup easy and sort of fun. I downloaded the Win32 version of Subversion to my Windows XP box (Jostein Anderson’s installer version). (Subversion also has builds for Red Hat, Debian, SUSE, Mandrake, FreeBSD, pkgsrc, and Mac OS X.) Once installed, Subversion was available at the command prompt (there’s a GUI version for Windows, too), and I was able to set up a repository and start checking things in with just a handful of commands.

Here is a cheat sheet for using Subversion on Windows on the command line (probably works on other platforms if you just change the file paths to match your platform).

Create a repository:

svnadmin create c:/repository

Get help on any command:

svn help import
svn help checkout
svn help ci

Import a directory (c:/Stories) to the repository:

svn import Stories file:///c:/repository/Stories -m "initial import"

Checkout (co) the directory (I moved the original directory out of the way first, just to keep things straight in my own mind):

svn checkout file:///c:/repository/Stories Stories

The Stories directory has a file called genx.html. Edit. Edit. Edit. Then, from inside the Stories working copy, commit (ci):

svn ci genx.html -m "Incremental backup"

Look at the log in XML:

svn log genx.html --xml

Result:

<?xml version="1.0" encoding="utf-8"?>
<log>
<logentry
   revision="6">
<author>mike</author>
<date>2004-05-10T23:24:56.959208Z</date>
<msg>Incremental backup</msg>
</logentry>
<logentry
   revision="5">
<author>mike</author>
<date>2004-05-10T21:49:15.263054Z</date>
<msg>Incremental changes</msg>
</logentry>
<logentry
   revision="4">
<author>mike</author>
<date>2004-05-10T21:36:39.796747Z</date>
<msg>Incremental changes</msg>
</logentry>
<logentry
   revision="3">
<author>mike</author>
<date>2004-05-07T21:20:05.425290Z</date>
<msg>Added to set up section</msg>
</logentry>
<logentry
   revision="1">
<author>mike</author>
<date>2004-05-07T19:27:47.276308Z</date>
<msg>Initial import</msg>
</logentry>
</log>
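Because the log comes out as XML, it is easy to feed into other tools. A short Python sketch using the standard library's ElementTree, assuming only the element and attribute names shown in the output above (`logentry/@revision`, `author`, `date`, `msg`); the sample is two entries trimmed from that output.

```python
import xml.etree.ElementTree as ET

SAMPLE = """<?xml version="1.0" encoding="utf-8"?>
<log>
<logentry revision="6">
<author>mike</author>
<date>2004-05-10T23:24:56.959208Z</date>
<msg>Incremental backup</msg>
</logentry>
<logentry revision="1">
<author>mike</author>
<date>2004-05-07T19:27:47.276308Z</date>
<msg>Initial import</msg>
</logentry>
</log>"""

def parse_svn_log(xml_text):
    """Return (revision, author, date, message) tuples from `svn log --xml` output."""
    root = ET.fromstring(xml_text)
    return [
        (int(entry.get("revision")),
         entry.findtext("author"),
         entry.findtext("date"),
         entry.findtext("msg"))
        for entry in root.iter("logentry")
    ]

for rev, author, date, msg in parse_svn_log(SAMPLE):
    print(f"r{rev} {date[:10]} {author}: {msg}")
```

In practice you would capture the output of `svn log genx.html --xml` (for example with `subprocess.run([...], capture_output=True, text=True)`) and pass its `stdout` to `parse_svn_log`.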

If you are not entirely tickled with CVS, try Subversion. It’s clean, it runs on lots of platforms, it’s nicely documented, and it’s a pleasure to use. There is promise of cool new features (such as HTTP WebDAV/DeltaV repositories on Apache servers), and from what I’ve seen, I think the Subversion folks will deliver.

Will you try Subversion, or is it too much of a hassle to move away from CVS?

Bob DuCharme

Transclusion is the inclusion of all or part of a resource within another one. The name and earliest implementations, like the name and earliest implementations of hypertext, came from Ted Nelson.

XML’s original way to do this, external general entities, could only transclude entire documents, and its SGML-rooted syntax scared some people off. I’ve written of a way to implement transclusion with XSLT, which does let you insert a subtree of another document by specifying that subtree with an XPath expression, but it’s a bit kludgy. The best way to perform transclusion in XML is XInclude, for several reasons:

  • It’s a W3C specification designed to complement the others. Many agree that XML 1.0 DTDs had too many jobs to do, and transclusion was pretty far outside of the general responsibilities of validation. Simple utilities that perform transclusion well and don’t try to do anything else fit well with the growing popularity of the pipelining approach to XML processing.
  • By using the W3C XPointer specification, XInclude lets you be much more granular than even XPath in specifying a portion of another document to include. You can specify a subtree easily enough, but you can also grab a text range of the document being retrieved for insertion into the host document.

(See Elliotte Rusty Harold’s July 2002 introduction to XInclude on XML.com for more background.) There was a good deal of talk (in addition to Eliot Kimber’s presentation) about XInclude at XML Europe, and with good reason: the W3C spec for it became a Candidate Recommendation the week before, and more and more implementations are available. Not all implement the W3C XPointer spec, but libxml does. This was my first exposure to real hands-on XPointer use, and it’s fun.

I love how, if you set the include/@parse attribute to “text” and point the include element at an XML file, the XInclude processor will escape the < and & characters. This will be so useful to me as I write about XML or code that I’ll probably write an Emacs macro to insert <pre><xi:include href="" parse="text"/></pre> into documents with one keystroke.
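Python's standard library includes a basic XInclude processor, `xml.etree.ElementInclude`, which supports `parse="text"` but not XPointer. This minimal sketch shows the escaping behavior described above; the in-memory loader and the file name `example.xml` are hypothetical stand-ins for real files on disk.

```python
import xml.etree.ElementTree as ET
from xml.etree import ElementInclude

# Hypothetical stand-in for the file system: href -> file contents.
FILES = {"example.xml": '<greeting lang="en">Hello &amp; welcome</greeting>'}

def loader(href, parse, encoding=None):
    """Resolve an href; ElementInclude calls this with parse='xml' or 'text'."""
    data = FILES[href]
    if parse == "xml":
        return ET.fromstring(data)   # parsed: becomes an element in the result
    return data                      # text: inserted as raw characters

doc = ET.fromstring(
    '<doc xmlns:xi="http://www.w3.org/2001/XInclude">'
    '<pre><xi:include href="example.xml" parse="text"/></pre>'
    "</doc>"
)
ElementInclude.include(doc, loader=loader)

# On output, the serializer escapes the included markup as &lt; and &amp;.
print(ET.tostring(doc, encoding="unicode"))
```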

XInclude also makes it very simple to aggregate content from a collection of URIs. This makes it great for RDF applications, because a key strength of RDF is its ability to store data in geographically distributed locations to be aggregated as needed. If you run libxml’s xmllint utility with the --xinclude option on the following file, you can pipe the output to a program that parses RDF/XML and loads the triples right into a triplestore:

<foafs xmlns:xi="http://www.w3.org/2001/XInclude">
  <xi:include href="http://www.snee.com/bob/foaf.rdf"/>
  <xi:include href="http://norman.walsh.name/foaf"/>
  <xi:include href="http://heddley.com/edd/foaf.rdf"/>
  <xi:include href="http://simonstl.com/foaf.rdf"/>
</foafs>

Or if you want to just pull the rdf:RDF element out of a larger document, your xi:include instruction could follow this model, which takes advantage of XPointer:

 <xi:include href="http://www.example.com/some/path/somefile.xml"
             xpointer="xmlns(rdf=http://www.w3.org/1999/02/22-rdf-syntax-ns#)
                       xpointer(//rdf:RDF)"/>
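If your XInclude processor doesn't support XPointer, you can approximate what `xpointer(//rdf:RDF)` selects after fetching the document yourself. A minimal Python sketch with ElementTree; the sample host document is hypothetical, and fetching the real URL (for example with `urllib.request.urlopen`) is left out so the example runs offline.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

def rdf_elements(xml_text):
    """Return every rdf:RDF element in the document, like xpointer(//rdf:RDF)."""
    root = ET.fromstring(xml_text)
    # iter() visits the root and all of its descendants, in document order.
    return list(root.iter(f"{{{RDF_NS}}}RDF"))

# Hypothetical host document with RDF embedded somewhere inside it.
SAMPLE = f"""<page>
  <title>About me</title>
  <meta>
    <rdf:RDF xmlns:rdf="{RDF_NS}" xmlns:foaf="http://xmlns.com/foaf/0.1/">
      <foaf:Person><foaf:name>Bob DuCharme</foaf:name></foaf:Person>
    </rdf:RDF>
  </meta>
</page>"""

for rdf in rdf_elements(SAMPLE):
    print(ET.tostring(rdf, encoding="unicode"))
```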

A REST implementation of XInclude that lets us pass a URL to it and then sends an XInclude-processed version back would be a big step forward for transclusion, the web, and for two little-noticed but valuable W3C specs.

Have you used XInclude at all? How about XPointer with or without XInclude?

Michael Fitzgerald

Related link: http://idealliance.org/papers/dx_xmle04/papers/03-02-02/03-02-02.html

I didn’t see it at XML Europe 2004 (in fact, I didn’t go, though I wished I could have), but according to Eric van der Vlist, Alexander Peshkov of RenderX, Inc. gave a presentation on writing a RELAX NG schema for XSL-FO.

This leads me to a question: I wonder why the W3C did not provide a schema for XSL-FO 1.0 written in XML Schema? A search of the XSL-FO 1.1 working draft yielded no reference to XML Schema, so I am not sure such a schema is in the queue. I guess if someone needed it, he or she could use Trang to convert Mr. Peshkov’s RELAX NG schema to XML Schema.

Schemas can do more than just validate instances. They can concisely express to the human reader (in this case, a reader of the XSL-FO spec) just how to put a working XSL-FO document together. Ordinary English does not always do the trick (have you read the RSS specs?). I would hope that the W3C would provide a schema, in either RELAX NG or XML Schema, along with the XSL-FO spec in the future.

Do you think, like I do, that the XSL-FO spec begs for a supporting schema?

Edd Dumbill

Related link: http://www.go-mono.com/archive/beta1/beta1.html

The first beta release of Mono is out! Headed for a 1.0 release in late June, Mono’s starting to look very polished indeed.

I’ve been using Mono for several months now, mostly with the IDE MonoDevelop.
Talking with other hackers who’ve started playing with Mono, we’ve agreed that the best thing about Mono is that it’s fun to program in.

It’s always a matter of subjective opinion, of course, but here’s why Mono and C# are fun. You benefit from object orientation and a comprehensive API library.

A bit like Java.

You don’t however have stupid restrictions like one public class per file, or a rigid deeply nested directory structure. You don’t have to give up make or autotools.

A bit like C.

You don’t however have to worry about memory management, or traipse through the heap in the increasingly cranky gdb debugger. And compilation is blisteringly fast. So fast, it may as well be interpreted.

A bit like Python.

Except that the strong typing means you catch more errors up front, and your libraries can be reused from other languages that interoperate with the common runtime.

So if you’re finding your development cycle on Linux getting bogged down, why not give Mono a whirl?

What do you like best about Mono?

Micah Dubinko

Related link: http://dubinko.info/blog

I finally have a personal content management system that lets me access the data, even when the software isn’t running.

More than a little inspired by Danny O’Brien’s Life Hacks talk at ETech, and using more than a little of David Mertz’s public domain code from Text Processing in Python, I finally took a first major step toward the Brain Attic concept I first wrote about in this very weblog.

A funny thing happened on the way to XML, though. It turns out that plain old text is a better format for writing, and reading (which happens much more often). As an author and editor of the XForms specification, I don’t say this lightly. Your favorite text editor is the greatest productivity tool there is.

All my important textual data, my working (and searching) set, is now spread out over a tree of intelligently named directories. All *.txt files. I can move to any OS and be instantly productive. I can easily copy these files to my iPod or any other PDA.
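Part of the appeal of a plain-text tree is that searching it takes only a few lines of script. A minimal Python sketch, assuming UTF-8 `*.txt` files under a single root directory; the function name is hypothetical, not one of the scripts mentioned here.

```python
from pathlib import Path

def search_notes(root, term):
    """Yield (path, line number, line) for each line in root/**/*.txt containing term."""
    needle = term.lower()
    for path in sorted(Path(root).rglob("*.txt")):
        with path.open(encoding="utf-8", errors="replace") as f:
            for lineno, line in enumerate(f, start=1):
                if needle in line.lower():   # case-insensitive substring match
                    yield path, lineno, line.rstrip("\n")
```

For example, `search_notes("notes", "XForms")` would list every line mentioning XForms in any `.txt` file under a `notes` directory (a hypothetical path). Non-text files in the tree are simply ignored by the `*.txt` pattern.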

I have scripts to convert structured text to XHTML, suitable for printing or, say, submitting a manuscript. I have scripts and XSLT to produce a weblog and RSS feed from a text file (now active, check it out).
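The scripts themselves aren't shown here, but the core idea fits in a few lines. A minimal Python sketch, assuming a made-up convention (blank lines separate blocks; a block starting with `# ` is a heading) rather than whatever structured-text rules those scripts actually use.

```python
import html

def text_to_xhtml(text, title):
    """Convert simple structured text to a minimal XHTML document.

    Assumed convention: blank lines separate blocks, and a block that
    starts with "# " is a heading; everything else is a paragraph.
    """
    blocks = [b.strip() for b in text.split("\n\n") if b.strip()]
    body = []
    for block in blocks:
        if block.startswith("# "):
            body.append(f"<h2>{html.escape(block[2:])}</h2>")
        else:
            # Re-flow the paragraph's lines into a single line of text.
            body.append(f"<p>{html.escape(' '.join(block.split()))}</p>")
    return (
        '<html xmlns="http://www.w3.org/1999/xhtml">\n'
        f"<head><title>{html.escape(title)}</title></head>\n"
        "<body>\n" + "\n".join(body) + "\n</body>\n</html>\n"
    )
```

Escaping every text block with `html.escape` keeps the output well-formed even when the notes contain `&` or `<`, which matters if the result is later fed to XSLT.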

XML is, of course, still important, and as long as people need to edit XML, they’ll need XForms. But something more fundamental, something missed by practically every existing piece of software, is the most important thing:

It’s the data, stupid.

How do you manage all your “stuff”?
