April 2002 Archives

Sam Ruby


If you listen to the tale Paul Prescod tells, Google had in its
possession a vastly superior API which it chose to forsake, relegating it
to a lifetime locked away in an ivory tower, waiting for a prince to
rescue it.  Meanwhile, a much-hyped evil and ugly stepsister has been
paraded around town.  At this point, we have all the essential elements
of a fairy tale.

A fairy tale indeed.


I have seen no evidence that Google has behaved like an evil
stepmother.  To the contrary, the one thing that has consistently
shone through is that Google has taken a low-key, pragmatic, and
essentially hype-free approach to all things technical.  This instance has been no
exception.


What was the seminal event that sparked countless people’s imaginations,
inspired more than two dozen implementations, and led 10,000 developers
to sign up in the first week alone?  No, this isn’t the result of
coercion on the part of Google, or substantial financial incentives to
the participants, or even a sustained marketing campaign.  Instead, it is
the result of a relatively quiet post to a mailing list for an excellent
but otherwise relatively obscure (in the US at least) programming
language.


Google’s Genius?  To pick a wire format for which there are dozens
of toolkits poised to directly translate the protocol into readily
consumable bits.  To directly test
interop against a small but diverse set of platforms.  To
provide early access to an undisclosed number of other interested
parties.  To provide a sample that runs on a wide range of operating
systems and instruction set architectures. To document the wire protocol
adequately, including all the optional type annotations.  And to
provide sufficient metadata, in the form of WSDL, so that a large number of
developers can be instantly up and running.

In short, they did their homework.

In return, Google was likened to the wizard Saruman, a benign and
powerful force inexplicably turned from the path of virtue.

HTTP GET 

At about the same time as Paul’s article, Simon St. Laurent posted a
series of articles that suggest that SOAP is unclean and unRESTful.  The
key difference between the Google approach and the Amazon approach, which
he apparently likes better?  The use of HTTP GET.  This is also a central
theme of Paul’s writings.


I’ve taken some time the last few days to read up on the topic of
REST.  This term was coined by Apache Software Foundation’s Chairman
Roy T. Fielding in his Ph.D. dissertation.  Suffice it to say that this
paper has been both very influential and deeply misunderstood and
misrepresented.  Here are three related quotes from Roy Fielding himself:

In fact, there are a number of tradeoffs between GET and POST.  Roy
mentions size.  Safety is another consideration.  I’ll add a third:
security.  I understand and appreciate that Amazon and Google have only
chosen to employ a rather lightweight security mechanism at the present
time for their free services.  Placing an associate ID or key in the
payload is like placing a key under the doormat.  Given the way URLs are
tracked and cached, placing it in the URL is like taping it to the
door.  As I have stated before, if and when Google ever decides to
commercialize this particular service, I’d like to suggest that they
consider alternatives such as X.509 certificates, Kerberos tickets, or
security tokens from mobile devices.
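To make the doormat-versus-door distinction concrete, here is a minimal sketch in shell.  The endpoint http://api.example.com/search and the parameter names key and q are hypothetical, chosen only for illustration; they are not the actual Amazon or Google interfaces.  It builds the same query both ways and prints what a log that keeps only the method and URL would retain.

```sh
#!/bin/sh
# Hypothetical endpoint and parameter names; not a real Amazon or Google API.
ENDPOINT="http://api.example.com/search"

# GET: the key is part of the URL, so anything that records URLs records the key.
get_url() {   # usage: get_url KEY QUERY
    printf '%s?key=%s&q=%s\n' "$ENDPOINT" "$1" "$2"
}

# POST: the URL stays bare and the key travels in the request body.
post_url() {
    printf '%s\n' "$ENDPOINT"
}
post_body() { # usage: post_body KEY QUERY
    printf 'key=%s&q=%s\n' "$1" "$2"
}

# What a log that keeps only the method and URL would retain.
logged() {    # usage: logged METHOD URL
    printf '%s %s\n' "$1" "$2"
}

logged GET  "$(get_url SECRET rest)"
logged POST "$(post_url)"
```

Neither form protects the key on the wire, since both send it in the clear.  The sketch only shows why the URL form tends to leak into more places.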


There are other tradeoffs as well.  Paul points out that using HTTP
GET enables one to participate in XInclude.  This is a valid
consideration for static queries.  For more ad hoc queries, a facility
like the IO JSP Taglib or the SOAP Cocoon Taglib may be more appropriate.


One thing I like about Roy is that he is rather direct.  His
opinion on most CGI programs is rather clear and succinct: Most
CGI scripts, in fact, provide interfaces to applications that suck.

Take a CGI implemented using HTTP POST and convert it to using HTTP GET and
Roy’s opinion will not change.  Take this same design and convert it to SOAP, and
Roy’s reaction is again very predictable.


So the question as to whether or not a given interface meets the
criteria of REST does not rely on the protocol syntax.  It relies on
the nature of the interaction, and in particular how state is represented
and transferred.  As a general rule, pure query interfaces with no
side effects meet these criteria.  Even if they use HTTP POST.

GoogleML 

Moving beyond HTTP GET, we look for other areas of disagreement.
As Paul has made clear, he is *PRO-WSDL*.  It apparently is also not the
SOAP encoding that Paul disagrees with, as Paul states "My opposition is
to the SOAP-RPC protocol, not the SOAP encoding."  He mentioned
SOAPAction in passing, something that I have verified is not significant
in this service.  Furthermore, SOAPAction promises to become optional
in upcoming versions of the SOAP specification.  Paul mentions optional
arguments, something that has long been a part of the SOAP
specification.  Both Apache Axis and Microsoft ASP.Net support optional
parameters.


Much of the simplification in Paul’s examples comes from omitting type
specifications.  I agree that including types in this message is entirely
unnecessary.  I have verified
that the Google API in no way requires such annotation.  It is hard to
say whether "most" SOAP toolkits will inline the types into
the message - the Apache ones currently default to sending such
information on the theory that it is readily available and might be useful.
In Axis, this can be easily overridden.  The default for Microsoft’s
ASP.Net is to not send such information.  And the comment that
"I could just as easily have left them in" leads me to
believe that this isn’t a crucial issue either.
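As an illustration of how little the annotations carry, the following shell sketch strips xsi:type attributes from a request fragment with sed.  The fragment is a hand-written approximation of a doSpellingSuggestion call, not a capture of real toolkit output; the element names key and phrase and the placeholder key value are assumptions.

```sh
#!/bin/sh
# Remove inline xsi:type annotations from XML read on stdin.
# Assumes the attribute is written with double quotes and a single
# leading space, as in the sample below.
strip_types() {
    sed -e 's/ xsi:type="[^"]*"//g'
}

# Hand-written sample; element names and key value are illustrative only.
typed_request() {
    cat <<'EOF'
<doSpellingSuggestion>
  <key xsi:type="xsd:string">00000000</key>
  <phrase xsi:type="xsd:string">open sorce</phrase>
</doSpellingSuggestion>
EOF
}

typed_request | strip_types
```

The element content is identical in both forms.  Whether a given server accepts the untyped form is something to verify per service, as was done here for Google.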


What’s left?  I guess there is the envelope.  We certainly
could discuss this, but somehow this issue does not quite seem to rise to
the level of Paul’s call to arms for "like-minded Hobbits, Dwarves,
Elves and men and go on a quest to educate the world about the limitations
of SOAP-RPC interfaces".


Perhaps the most illuminating part of Paul’s essay is when he describes
his optimized doSpellingSuggestion API.  In this case, he declares
that XML is overkill for the job.  Unquestionably, omitting XML in some cases creates a tighter data
stream.  It can also require custom marshallers
and parsers to be written.  More tradeoffs to consider.

An Analogy 

It is impossible to escape the fact that there is
much active hostility directed at the SOAP protocol from within the
REST community.  I’ve been giving this
some deep thought lately, and I finally came up with an analogy that might
explain this situation.


Few Object Oriented Programming advocates would list Perl among their
top choices in a programming language.  No one will deny that it is
possible to write OO code in Perl.  In fact, there clearly are features
in the language designed to support objects.  But does Perl require
you to write in an OO style?  Well, no.  Does it even guide you
in that direction?  Again, no, not particularly.  In fact, the
Perl motto is TMTOWTDI.


But it goes deeper than that.  Few, if any, of the beginners’ samples
on how to use Perl start from an object orientation.  This leads many
towards programming practices which some find inappropriate.  One
might also note that a significant fraction of the CGI programs that Roy
declares as sucky are, in fact, written in Perl.


One could make a similar case against SOAP.

Conclusion 

I will readily agree that the architecture, analysis and design of any
complex distributed system need to focus on the concept of state in
general, and on its representation and transfer in particular.  Once
that work is complete, there remain a large number of implementation
tradeoffs that need to be made.  Some of these deal with ease of use
and the rate of adoption.  Adopting a canonical means to represent
such information may have a positive influence on such important secondary
characteristics of one’s implementation.

Simon St. Laurent


Related link: http://www.w3.org/TR/

Apparently in a rush before meetings at the Eleventh International WWW Conference, the W3C has released eighteen documents in two days.

Topics covered include:

  • SVG - two Candidate Recommendations, plus an XHTML+MathML+SVG Profile working draft
  • XSLT 2.0 and XPath 2.0 working drafts
  • XQuery - four working drafts
  • RDF - four working drafts
  • Character Model for the World Wide Web - Last Call Working Draft
  • Web Content Accessibility Guidelines - Requirements working draft
  • Web Services - two Requirements working drafts
  • XHTML Media Types - a Note

Do you follow W3C publications, or wait until the projects are final?

Simon St. Laurent


Related link: http://lists.w3.org/Archives/Public/www-tag/2002Apr/0235.html

After Anne Thomas Manes described the W3C as “at heart, an academic organization” and insisted that “the W3C TAG must be willing to accomodate the requirements of big business”, Roy Fielding reminded Manes that:

the W3C was created by big businesses specifically to prevent their own marketing departments from destroying the value inherent in the Web through their own, and their competitors’, short-sighted, quarterly-revenue-driven pursuit of profits. It was not created by academics. Open source developers actively opposed the creation of a pay-to-play consortium. The only reason it is at MIT is because that’s what was needed to attract the people with a clue to an underpaid job.

After noting that:

The Web creates more business value, every day, than has been generated
by every single example of an RPC-like interface in the entire history
of computers.

Fielding concludes:

If this thing is going to be called Web Services, then I insist that it actually have something to do with the Web. If not, I’d rather have the WS-I group responsible for abusing the marketplace with yet another CORBA/DCOM than have the W3C waste its effort pandering to the whims of marketing consultants. I am not here to accommodate the requirements of mass hysteria.

As one of those regularly troubled that the keeper of the Web is in fact a pay-to-play consortium, it’s nice to hear that some people at that consortium value the Web itself rather than “big business”, “marketing consultants”, or “mass hysteria.”

Should big business rule the Web?

Simon St. Laurent


Related link: http://lists.xml.org/archives/xml-dev/200204/threads.html#00648

While some developers seem delighted that Google has provided a SOAP API, a number of developers are questioning whether SOAP provides any value beyond hype to the work being done, and suggest that maybe a plain old XML-over-HTTP approach would be more appropriate and just as useful.

Matt Sergeant’s journal entry also questions whether the new API is an improvement on the prior XML interface. He’ll be talking about this at the O’Reilly Open Source Conference as well.

Does a dash of SOAP make Google’s information tastier?

Sam Ruby


Update to IBM patents threaten ebXML via CNet: IBM patent plan royalty-free

Simon St. Laurent


As IBM and Microsoft grow ever cozier, posting specifications with unpleasant intellectual property ramifications, and setting up organizations that seem quite likely intent on building “Reasonable and Non-Discriminatory” (RAND) tollbooths, maybe it’s time for the rest of us to appreciate what HTTP and XML have given us, and take a REST.

Covering Web Services and the intellectual property territory they represent is growing pretty terrifying. At first I thought David Berlind’s suggestions that “IBM and Microsoft plan Net takeover” were fairly ridiculous, but IBM’s apparent patent bombing of ebXML (which IBM now claims to have retracted) certainly makes it more plausible, as do the intellectual property policies of the Web Services Interoperability Organization and UDDI.org. I used to worry about the W3C, but vendors seem to be heading for more controllable consortia as the W3C focuses on royalty-free specifications.

While SOAP itself seems, for the moment, to be unencumbered, WSDL, UDDI, various security add-ons, and possibly alternate transport mechanisms are all looking questionable.

REST, even apart from the ongoing debates over its merits, is pretty much built on HTTP and HTTP principles, with possibly a dose of XML. There isn’t a whole lot in REST (or in XML) that is strikingly new, and hopefully there isn’t anything there that is, well, patentable.

However you feel about the technical merits of Web Services or REST, it seems pretty clear that REST is a critical option that offers developers an apparently safe harbor from intellectual property claims. REST may let us get our work done, whatever the outcome of various patent claims, and may also give patent-holders a reason to ponder a more generous approach to their intellectual property.

Do patents on various aspects of Web Services affect your decisions about which technologies to use?

Edd Dumbill


The frenzy over Google’s new SOAP API is just plain silly. Today I was sent details by a proud PR representative (I’ll not mention them, but you’ll likely hear from them yourselves) of his company’s Google-over-email service, using the SOAP interface. What a waste of space for something that can be done in one line of shell script! Here’s how…

Grab your local Linux/BSD box of preference, put this bit of script into a file, say /usr/local/bin/google.sh, and make it executable.

#!/bin/sh
/bin/cat >/tmp/msg$$ && (/usr/bin/formail -b -t -I 'Content-Type: text/html; charset=ISO-8859-1' -r </tmp/msg$$ && QUERY=`/bin/grep Subject /tmp/msg$$ | /bin/sed -e 's/Subject: *//;'` && /bin/rm /tmp/msg$$ && /usr/bin/lynx -source "http://www.google.com/search?num=100&q=$QUERY") | /usr/sbin/sendmail -t -F 'Google mail server' -f google@example.com

Then, edit your /etc/aliases file, and add in something like:

google: "|/usr/local/bin/google.sh"

Run newaliases and you’re all set. Send email to the google user on your host, and put your query string in the subject field. You’ll get back a nice HTML email with up to 100 results from Google.

It’s more or less a one-liner. (You’ll need lynx and formail, which comes with procmail, installed.)
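
For readers who want to see the subject-handling step in isolation, here is a sketch of that part of the pipeline as two small shell functions.  It is not the script above: extract_subject and encode_spaces are hypothetical helper names, the header match is anchored to the start of the line and stops at the end of the headers, and only spaces are encoded, which is far from full URL encoding.

```sh
#!/bin/sh
# Print the first Subject header of a mail message read on stdin.
# Stops at the first blank line so body text is never matched.
# Assumes the header is spelled exactly "Subject:" and is not folded.
extract_subject() {
    sed -n -e '/^$/q' -e 's/^Subject: *//p' | head -n 1
}

# Replace spaces with "+" so the query can sit in a URL query string.
# Other reserved characters are left alone in this sketch.
encode_spaces() {
    sed -e 's/ /+/g'
}

# Example message: headers, a blank line, then a body that mentions Subject.
printf '%s\n' 'From: someone@example.com' 'Subject: open source' '' 'Subject: not this one' |
    extract_subject | encode_spaces
```

The result could then be substituted for $QUERY in the lynx URL.  A production version would need to handle folded headers and the remaining reserved characters.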

Please don’t come running to me with SOAP demos until they do something useful.

Think this curmudgeon’s got it wrong? Feel free to teach me a trick or two.

Simon St. Laurent


Related link: http://xmlhack.com/read.php?item=1615

While Google’s generated a lot of buzz with developers about its SOAP interface, Amazon’s offering a more REST-like approach through its Associates program.

Amazon’s use of XML and HTTP is much simpler than the Google SOAP approach - download the Google developer package and visit the Amazon explanation (Associates only, alas) to see the difference.

Does it matter at all that Amazon’s XML is much cleaner when Google’s XML is easily plugged into Web Services frameworks?

Simon St. Laurent


Related link: http://zdnet.com.com/2102-1106-884681.html

In an article titled “IBM drops patent bombshell”, ZDNet writer David Berlind notes the appearance of IBM patents on the previously unencumbered ebXML specification - a project of OASIS and the United Nations. Responses to the announcement don’t seem particularly happy.

(IBM later claimed to retract this, but see the followup story for more.)

Aren’t patents a wonderful way to encourage innovation across developer communities?

Sam Ruby


To explore that question, first one must understand what the Google API
provides.  To use the new Google API, you need to create a Google
account in order to obtain a license key.  That license key must be
passed on every Google API call.  This key is passed in the clear. 
What can happen if somebody captures this key?  Well, I guess they can
issue queries.  Perhaps they can even deny you service for the rest of the
day.  If they do so, this likely will be noticed, and if repeated, they are
in danger of being caught.  To many, the risks are acceptable, and a simple
key is an adequate solution.

All this time, your usage is limited to the Terms and Conditions.  What
they essentially say is experiment, play, and develop; but if you want to
create a commercial service, you need to get prior written consent from
Google.  Fair enough.

Now let’s assume that somebody pursues this.  At this point, the game
changes.  Neither party will be particularly amused if the key is used
for other purposes by a third party.

There are alternatives that may be more appropriate for commercial use than a
simple user key.  Alternatives such as X.509 certificates, Kerberos
tickets, and security tokens from mobile devices.  It may even be
appropriate to have these tokens cryptographically signed by a third party.

Announced yesterday was the WS-Security specification.  The stated goal
of WS-Security is to enable applications to construct secure SOAP message
exchanges.  It does this by providing three main mechanisms: security
token propagation, message integrity, and message confidentiality.

Arguably the most important sentence in the document is as follows:

This document supercedes existing web services security specifications from IBM and Microsoft including SOAP-SEC; Microsoft’s WS-Security and WS-License; and IBM’s security token and encryption documents.

By standardizing on a common vocabulary, we can achieve
interoperability.  The result is a loosely-coupled,
language-neutral, platform-independent way of linking applications. 
Securely.

Simon St. Laurent


Related link: http://www.infoworld.com/articles/hn/xml/02/04/10/020410hnctohailstorm.xml

Microsoft’s chief software architect for Hailstorm offered some observations on XML and Web Services recently. Looking beyond the (now ended) Hailstorm project, Mark Lucovsky described a lot of issues in XML development more generally.

Despite the frequent description of Web Services as a mechanism for building distributed object systems, Lucovsky suggests that XML itself is important:

He also encouraged attendees to consider when building Web services whether to program to an object-centric or XML-centric model. One pitfall to look out for in the object-centric path is how to obtain extensibility when you are programming at a high level, he said.

Building on this, he notes that Microsoft had difficulties adapting to the XML world:

The primary skill a workforce needs is a deep understanding of XML, particularly name spaces and versioning.

“The big thing we had to deal with [at Microsoft] was learning and embracing XML,” Lucovsky said. “XML is more difficult than people think.”

For those of us who’ve argued long and hard that XML isn’t merely serialized objects, this is good to hear. While most of us wish that XML was a little less difficult than it has grown to be, information modeling isn’t and really can’t be a magical process. Objects and XML (not to mention relational databases) have some features in common and many features apart.

Taking advantage of XML does require some effort, but I think developers are finding the effort worth the trouble - extensibility is powerful stuff.

Why can’t XML just be like familiar friendly objects and databases?

Sam Ruby


Related link: http://radio.weblogs.com/0101679/stories/2002/04/05/neurotransmitters.html

The REST vs. RPC debate rages on. I’d like to inject into this argument an alternative perspective, one that is data centric and multicast. Just like RSS and weblogs.

Edd Dumbill


I get far too much email to deal with, so I came up with a procmail hack that sends my “personal FAQ” to new correspondents — ensuring they receive it only once.

The first version of this hack is designed to work with the exim mail agent.

SENDER=$1
:0 Whc: autorespond-xml.lock
* ^To.*edd@xml.com
* !^FROM_DAEMON
* !^FROM_MAILER
* !^X-Loop: edd@xml.com
* !^X-MailScanner-SpamCheck.*
* ? test -f $HOME/.autorespond-xml
* !? grep -qi "$SENDER" $HOME/.autorespond-xml.cache
| (formail -rI"Precedence: junk" \
    -A"X-Loop: edd@xml.com" \
    -I"From: Edd Dumbill <edd@xml.com>"; \
    cat $HOME/.autorespond-xml \
  ) | $SENDMAIL -oi -t && \
      echo $SENDER >>$HOME/.autorespond-xml.cache

In the procmail_pipe entry of exim, ensure the command looks like this (all one line):

command = "/usr/bin/procmail -a ${sender_address} -d ${local_part}"
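
The once-only behaviour rests on the cache check in the last condition and the echo at the end of the recipe.  A rough shell sketch of that logic alone follows.  should_respond is a hypothetical helper, not part of procmail, and it differs from the recipe in two ways: it matches the whole address as a fixed string rather than as a pattern, and it records the sender at decision time rather than after the mail has been sent.

```sh
#!/bin/sh
# Cache file; defaults to the path used in the recipe, overridable for testing.
CACHE=${CACHE:-$HOME/.autorespond-xml.cache}

# Succeeds (exit 0) the first time an address is seen and records it;
# fails (exit 1) on later calls with the same address, ignoring case.
# Usage: should_respond "$SENDER" && send_the_faq   (send_the_faq is yours to supply)
should_respond() {
    sender=$1
    if [ -f "$CACHE" ] && grep -qixF -e "$sender" "$CACHE"; then
        return 1
    fi
    printf '%s\n' "$sender" >>"$CACHE"
}
```

Matching whole lines avoids treating the dots in an address as wildcards, at the cost of missing partial matches that the original grep would catch.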

I also did another version of this that doesn’t require altering the exim config, but it does abuse formail in strange and interesting ways.

:0 Whc: autorespond-xml.lock
* ^To.*edd@xml.com
* !^FROM_DAEMON
* !^FROM_MAILER
* !^X-Loop: edd@xml.com
* !^X-MailScanner-SpamCheck.*
| formail -rD 200000 $HOME/.autorespond-xml.cache \
    -I"Message-ID: $1"

  :0 ehc      # if sender not in the cache
  | (formail -rI"Precedence: junk" \
    -A"X-Loop: edd@xml.com" \
    -I"From: Edd Dumbill <edd@xml.com>"; \
    cat $HOME/.autorespond-xml \
    ) | $SENDMAIL -oi -t

I adapted one of the recipes from procmailex (5). The version requiring the exim change seems cleaner to me, as it doesn’t abuse the semantics of formail quite so badly. But I’m a novice at this game.

Anyone got a better solution to the problem? I’d be glad to hear.