by Tim O'Reilly

I usually speak extemporaneously rather than reading a prepared speech, but when I'm giving a new talk for the first time, I'll often write out a version of it in advance, to get my thoughts in order. This is the talk I prepared for my keynote at JavaOne on June 8, 2000. The streaming video of the actual talk will also be available, and eventually a transcript as well, so when that happens, you'll be able to compare what I meant to say, and how it actually turned out.

Replay the Webcast of the Keynote from JavaOne
(See Sun's keynote summary page for more details)

GTS Video
(Java technology: no plug-in required)
Netscape 4.7 required for the Mac.

Yahoo! Broadcast
(plug-in required)
28.8K  56.6K  100K

AltaVista Tech Live
(plug-in required)
56K  100K

I have a feeling I was invited here because I'm the person associated with the open source movement who is most likely to say nice things about Sun. So why don't I get that out of the way right up front.

It looks like more people paid a lot of money to get into this room than went to any talk at LinuxWorld for free, so you guys must be doing something right!

But seriously, I'm not here to talk about the differences between Sun's Java license and various open source licenses, but to talk about Sun's slogan, The Network is the Computer, and the way that it is wrapped up in both the history and the future of open source.

Preamble: Ecology and Architecture

Before I start, though, I want to introduce a few concepts that provide a foundation for thinking about the spread of technologies. The first of these concepts comes from ecology.

You may think that I want to bring in ecology because O'Reilly's books all have animals on the cover, but it really does go deeper than that!

First off, ecology teaches us that it takes a web of cooperating species to create a truly rich environment. Each of us depends on thousands, if not millions, of other organisms, each pursuing its own selfish goals, yet somehow weaving a cooperative web that, for the most part, benefits all. I believe that open source has many parallels to a functioning ecology. Each developer builds for his or her own use, and that of friends, but also makes it easy for collateral benefits to accrue to others he or she doesn't know. And the open source developer contributes even his or her failures back into the environment, to enrich the soil from which other innovations can grow.

Second, ecology teaches us that a rich environment takes time to evolve. One species prepares the ground for another. For example, lichens and mosses break down rock, creating soil that can support more complex plants. Ecological succession takes time. But as the substrate laid down by simple organisms grows richer, the possibilities for complexity increase.

(Those of you who are science fiction fans will find a wonderful depiction of this process in Kim Stanley Robinson's trilogy about the terraforming of Mars, Red Mars, Green Mars, and Blue Mars. These books really got me thinking about the way ecologies evolve and change, and the way that, in the open source world, one project makes the next one more possible.)

But there's an interesting twist. I read recently that the recovery of the blasted land around Mount St. Helens in Washington shows the role of chance -- the species that somehow survived the volcanic eruption -- in just how an ecosystem evolves. Those random plants and animals that survived had a chance to shape the rebirth of the entire ecosystem. But you don't need to be starting an ecosystem from scratch to see this effect. Anyone who has a garden finds constant "volunteers."

So there are three themes here: cooperation, evolution, and surprise.

One of the things that I like best about open source software is that it so clearly demonstrates the power of chance and unintended cooperation in helping the computer industry to evolve. I'll try to highlight that idea as I go forward.

Another set of core concepts that I want to share with you comes from Lawrence Lessig's remarkable book Code and Other Laws of Cyberspace. Lessig is a constitutional lawyer, and the principal focus of his book is on the way that attempts at government regulation need to take into account the changing architecture of cyberspace -- and the way that cyberspace is changing the architecture of our society as a whole. I don't have time to discuss Lessig's points in detail -- I highly recommend that you read his book for that -- but what I do want to share with you is the way he led me to think about the nature of the computer system and network architecture that has supported the open source movement. What I discovered when I thought about it provides much of the substance of this talk.

With that preamble, let me turn to my theme: The network really is the computer. Many of the things I'm going to mention are equivalent to the mosses and lichens I talked about a moment ago. What excites me so much, though, is the future they hint at.

Open Source as an Outgrowth of Wide Area Networking

Most of you know that I'm a big champion of open source technologies, things like Apache, Linux and Perl. But my emphasis in talking about open source has never been on the details of licenses, but on open source as a foundation and expression of the Internet.

By now, it's a truism that the Internet runs on open source. BIND, the Berkeley Internet Name Domain server, is the single most mission-critical program on the Internet, followed closely by Sendmail and Apache, open source servers for two of the Internet's most widely used application protocols, SMTP and HTTP.

I've worked hard to get this concept across, to make sure that people realize that open source means more than Linux.

But the relationship between open source and the Internet goes deeper than the programs that implement and support important Internet standards. I believe that the open source movement as a whole is rooted in the spread of wide area networking.

I look back at the early days of UNIX at Bell Labs. The license under which UNIX was distributed was liberal, but not open source by today's definition. There were two things that were important:

1. The fundamental architecture of UNIX supported work by independent developers. Rather than building a single monolithic system, Thompson and Ritchie developed some simple system services, and a powerful concept for how programs could cooperate. As explained so elegantly by Kernighan and Pike in The UNIX Programming Environment, programs were expected to read a stream of ASCII text as standard input, and write a stream of ASCII text as standard output. As a result, simple programs could be connected in a pipeline, like Legos or Tinkertoys, to accomplish more complex tasks. (A minimal sketch of this convention appears below.)

This same principle is evident in the development of the Internet. Open standards tell you what you need to write and what you need to read in order to be able to cooperate with another program. What you do internally is up to you.

This is fundamentally a loosely coupled architecture that lowers the barriers to entry to participation in the market, or if you like, in the ecosystem. Anyone can write a program, for his or her own purposes, for his or her own small niche, that nonetheless magically becomes a part of the entire system. Ecology in action.
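To make that convention concrete, here is a minimal sketch of a UNIX-style filter, written in Python for brevity (the classic tools were, of course, C programs, and the filename here is just an illustration):

    # A minimal UNIX-style filter: read ASCII text on standard input,
    # write ASCII text on standard output. Any program honoring this
    # contract can be dropped into a pipeline alongside tools like
    # grep, sort, and uniq. (The transformation here -- upcasing --
    # is trivial; the point is the contract, not the computation.)
    import sys

    for line in sys.stdin:
        sys.stdout.write(line.upper())

Saved as, say, upcase.py, it composes with programs whose internals it knows nothing about, for example: cat notes.txt | python upcase.py | sort | uniq -c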

2. Source code was available for inspection.

Much has been made of the merits of various licenses. While it is certainly true that one license or another may meet the needs and values of certain groups of developers and their users better than another, it seems to me that the mere fact of source availability, under any license, so far outweighs the differences between licenses that we'd be better off spending less time arguing, and more time building on each other's work.

That early AT&T license was not open source, and it contained terms that later allowed AT&T to foolishly close down the party, but it was the source availability itself that led to the explosion of creativity behind the first major cooperatively developed operating system.

The key point about having source was that you could see how other people did things. This radically lowered the barriers to learning, and because learning by example means you don't have to spend your energy reinventing the wheel, imitation soon sparked innovation.

We saw a similar explosion of creativity in the early days of the web. Tim Berners-Lee's original web implementation was not just open source, it was public domain. However, NCSA's web server and Mosaic browser were not technically open source. Still, source was freely available, and that allowed Apache to rise like a phoenix from the ashes when the NCSA server was abandoned once its developers went to work on proprietary alternatives.

But even more significantly, the "View Source" menu item migrated from Tim's original browser, to Mosaic, and then on to Netscape Navigator and MSIE. Though no one thinks of HTML as an open source technology (because of the fixation on licensing), it's been absolutely key to the explosive spread of the web. Barriers to entry for "amateurs" were low, because anyone could look "over the shoulder" of anyone else producing a web page. Dynamic content created with interpreted languages continued the trend towards transparency.

But let me go back to the history of wide area networking and the spread of open source.

In the early days, source was sent out on tape, but things really started to take off when source could be distributed over the net.

Usenet, that vast distributed bulletin board, was one of the first great success stories of the kind of voluntary, distributed collaboration that also characterizes open source. You "signed up" for Usenet by finding a neighbor willing to give you a mail and news feed. This was a true collaborative network, where mail and news were relayed from one cooperating site to another, often taking days to travel from one end of the net to the other. Hub sites formed an ad-hoc backbone, but everything was voluntary.

The UUCPnet and Usenet were used for email (the first killer app), but also for software distribution and collaborative tech support. Usenet newsgroups like comp.sources.unix took UNIX beyond the major centers of development at Bell Labs and UC Berkeley, and made it possible for widely distributed individuals and institutions to participate. Programs like Larry Wall's patch, which made it possible to distribute modifications rather than complete source files, compensated for relatively low-bandwidth connections and made source even more widely available.

Once the Internet's higher speed connections became widely available, the culture of cooperation was already firmly in place. The mechanisms that the early developers used to spread and support their work became the basis for a cultural phenomenon that reached far beyond the tech sector. The heart of that phenomenon was the use of wide-area networking technology to connect people around interests, rather than through geographical location or company affiliation. This was the beginning of a massive cultural shift that we're still seeing today.

This power of wide area networking to connect widely separated individuals is key to the success of open source. It's what allows a program written "to scratch your own itch," as Sendmail creator Eric Allman put it at the first open source summit, to easily find others with the same itch. No dollars need be spent on assembling a team; no dollars need be spent on marketing. Just release the product for free, source included, and let other like-minded people with the same problem see if it gives them a leg up.

A final note: while the open source community doesn't generally claim the IETF as its own, the Internet standards process has a great many similarities with an open source software project. The only substantial difference is that the IETF's output is a standards document rather than a code module. Anyone can participate, simply by joining a mailing list and having something to say, or by showing up to one of the three annual face-to-face meetings. Standards are decided on by participating individuals, irrespective of their company affiliations. (Though commercial participation is welcomed and encouraged, companies, like individuals, need to compete on the basis of their ideas and implementations, not their money or disproportionate representation.) The IETF is where open source and open standards meet.

I'd like to argue that open source is the "natural language" of a networked community, and that the growth of the Internet and the growth of open source are interconnected by more than happenstance. As individuals found ways to communicate through highly leveraged network channels, they were able to share information at a new pace and a new level. Just as the spread of literacy in the late Middle Ages disenfranchised old power structures and led to the flowering of the Renaissance, it's been the ability of individuals to share knowledge outside the normal channels that has led to our current explosion of innovation. Just as ease of travel helped new ideas to spread, wide area networking has allowed ideas to spread and take root in new ways. Open source is ultimately about communication.

This is one reason behind one of O'Reilly's new open source ventures, the company Collab.Net, which we founded with Brian Behlendorf of the Apache project. Unlike many other OSS projects, Apache wasn't founded by a single visionary developer but by a group of users who'd been abandoned by their original "vendor" (NCSA) and who agreed to work together to maintain a tool they depended on. (In fact, that's where the name "Apache" comes from--the developers started out simply by agreeing to share their patches to the NCSA server, and so they called what they did "a patchy server".)

Apache gives us lessons about intentional wide-area collaborative software development that can be applied even by companies that haven't fully embraced open source licensing practices. For example, it is possible to apply open source collaborative principles inside a large company, even without the intention to release the resulting software to the outside world. More importantly, though, Collab.Net is teaching companies that it's not enough to slap an open source license on a piece of software; you need to build community and collaborative development processes around it as well.

Because this is, after all, JavaOne, it's probably appropriate to mention in this context that Collab.Net and Sun are working together on the open source release of Sun's NetBeans project, which provides an extensible Java-based IDE. In fact, I'm going to let Brian Behlendorf, who is one of the founders of both the Apache project and Collab.Net, and Roman Stanek, the founder of NetBeans, give a brief demo of Collab.Net's work on NetBeans.

At this point Brian gives a demo of the netbeans.org site, pointing out that:
  • the site is separate from Sun, because it's important for anyone to feel they can participate
  • the site gives access to CVS source trees, bug reports, and mailing lists via the web
Roman points out that Sun chose to open source NetBeans because they want developers to have the freedom to extend it, and that they chose the Mozilla licensing model, with a license based on the MPL.

If you believe me that open source is about Internet-enabled collaboration, rather than just about a particular style of software license, you'll open a much larger tent. You'll see the threads that tie together not just traditional open source projects, but also collaborative "computing grid" projects like SETI@home, user reviews on Amazon.com, technologies like collaborative filtering, new ideas about marketing such as those expressed in The Cluetrain Manifesto, weblogs, and the way that Internet message boards can now move the stock market. What started out as a software development methodology is increasingly becoming a facet of every field, as network-enabled conversations become a principal carrier of new ideas.

In some ways, you can say that what the Internet is enabling is not just networking of computers, but networking of people, with all that implies. As the network becomes more ubiquitous, it becomes clearer and clearer that who it connects is as important as what it connects.

The Web as a Network Services Architecture

There's another implication in what Brian and Roman just showed you that I want to bring to your attention. Like many other applications in the age of the Internet, open source project management is turning into a hosted application.

"Don't think of the Web as a client-server system that simply delivers web pages to web servers. Think of it as a distributed services architecture, with the URL as a first generation "API" for calling those services."
- John Udell

This is where Lessig's point that we need to think about changes in architecture really starts to hit home.

Think for a minute about the most useful new computer applications of the last few years. Few of them are actually applications that you install on your local PC. Instead, they are delivered through the window of your browser: ecommerce applications like Amazon.com, eBay, or E*Trade, or useful information applications like MapQuest.

There are enormous implications in this shift for the open source and free software community. For example, I've tried to get Richard Stallman to realize that the GPL loses its teeth in a world where developers no longer need to distribute software in order for users to make use of it. A hosted web application could be built entirely with GPL'd software, and yet have no requirement that the source code be released, since the application itself is never distributed, and distribution is what triggers the GPL's source code availability clause.

But I don't want to spend time here on the implications for licensing. Instead, I want to talk about the implications for that marvelous aspect of the fundamental UNIX design: the pipe, and its ability to connect small independent programs so that they could collectively perform functions beyond the capability of any of them alone.

What is the equivalent of the pipe in the age of the web?

For an answer, I'm going to quote from Jon Udell, the former Byte magazine editor and author of the book Practical Internet Groupware. Jon is one of the most prescient technology observers I know. Several years ago, he turned me on to a concept that has more implications than I can count. This is one of the REALLY BIG IDEAS that is going to shape the next five or ten years of computing.

Jon starts with a simple premise. Don't think of the web as a client-server system that simply delivers web pages to web browsers. Think of it as a distributed services architecture, with the URL as a first generation "API" for calling those services.

As Jon said in his keynote for the Zope track at the recent Python conference:

"To a remarkable degree, today's Web already is a vast collection of network services. So far, these services are mainly browser-oriented. My browser "calls" a service on Yahoo to receive a page of a directory. Or it "calls" a service on AltaVista to receive a page of search results.

"One of the nicest things about the Web, however, is that browsers aren't the only things that can call on the services offered by websites. Programs written in any URL-aware language -- including Python, Perl, JavaScript, and Java -- can "call" these Web services too. To these programs, the Web looks like a library of callable components. What's more, it's very easy to build new Web services out of these existing components, by combining them in novel ways. I think of this as the Web's analog to the UNIX pipeline."

Jon goes on to describe one such program he wrote, a program he calls the web mindshare calculator. He knows that AltaVista provides a links keyword that allows you to return the number of sites that link to a given URL. He knows that Yahoo provides a categorization of related sites. So he wrote a simple Perl program that, given a starting point in the Yahoo tree, traverses the tree to its bottom, and feeds the resulting list of URLs, one by one, to AltaVista with the links keyword. The output is a list, in descending order, of the most linked-to sites in any given Yahoo category.

As he says, this is the Web's analog to the UNIX pipeline. It allows you to use two websites in a way that their creators didn't quite intend, but which extends them and makes them more useful.
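Something in the spirit of Jon's mindshare calculator might look like the sketch below. The endpoint URLs and the response patterns are hypothetical placeholders (the real thing was a Perl script driving Yahoo and AltaVista), but the shape of the program -- call one service, pipe its output into another -- is the point:

    # Sketch of a "web pipeline": treat two web sites as callable
    # components and feed one's output to the other. The URLs and the
    # patterns scraped from the responses are invented placeholders.
    import re
    import urllib.parse
    import urllib.request

    DIRECTORY_URL = "http://directory.example.com/category?path="
    LINKCOUNT_URL = "http://search.example.com/linkcount?url="

    def fetch(url):
        with urllib.request.urlopen(url) as resp:
            return resp.read().decode("utf-8", errors="replace")

    def sites_in_category(path):
        # "Call" the directory service and scrape the listed site URLs.
        page = fetch(DIRECTORY_URL + urllib.parse.quote(path))
        return re.findall(r'href="(http://[^"]+)"', page)

    def inbound_links(site):
        # "Call" the search service and scrape the reported link count.
        page = fetch(LINKCOUNT_URL + urllib.parse.quote(site))
        match = re.search(r"(\d+) pages link here", page)
        return int(match.group(1)) if match else 0

    # Rank every site in a category by how many pages link to it.
    sites = sites_in_category("Computers/Programming/Languages")
    for site in sorted(sites, key=inbound_links, reverse=True):
        print(site)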

This idea of creating additional "unauthorized" interfaces to various web sites is fairly widespread, especially among the advanced users who make up O'Reilly's target audience.

We created one such program in our market research group, a program that gives our editors and product managers faster access to competitive information from Amazon. We call this program amarank, since it returns a list of books, sorted by Amazon rank and number of positive reviews, for some set of search criteria. Every publisher, and any bookstore or library worth its salt, uses Amazon for bibliographic research. But what a waste of time to click around in a browser! Amarank lets us ask questions like: "Show me the top 30 books on Java at Amazon," "How do books on Perl stack up against books on Java?" or "Show me all the books on Jini published since January."

Because we don't want the user to wait while amarank crawls Amazon's site, and we want the data in a form that can be subjected to further processing, the output is sent to a mail program, and ultimately, a spreadsheet such as Excel or Gnumeric.
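I won't reproduce the crawling code here, but the back half of such a tool is simple enough to sketch. Assume the crawl has already produced (title, rank, positive reviews) records -- the ones below are invented placeholders -- and all that remains is to sort them and hand them to a spreadsheet:

    # Sort placeholder (title, Amazon rank, positive reviews) records and
    # write a CSV that Excel or Gnumeric can open. The records are invented
    # for illustration; the real tool gathers them by crawling Amazon.
    import csv

    records = [
        ("Learning Java", 1540, 37),
        ("Java in a Nutshell", 812, 52),
        ("Programming Perl", 655, 61),
    ]

    # Best (lowest) sales rank first; break ties on positive reviews.
    records.sort(key=lambda r: (r[1], -r[2]))

    with open("amarank-report.csv", "w", newline="") as out:
        writer = csv.writer(out)
        writer.writerow(["Title", "Amazon rank", "Positive reviews"])
        writer.writerows(records)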

Some other examples:

  • The Finance::QuoteHist Perl module fetches historical quotes from such sources as The Motley Fool and FinancialWeb.
  • YahooChart is a Perl module that grabs stock charts from Yahoo! Finance.
  • DeepLeap is a new online assistant that, among its many talents, provides a quick search through Google and local movie information via Yahoo! Movies. It can also send information from any web page to your Palm Pilot, reformat a page for your printer, or notify you of page changes via instant messenger, email, or pager.

Now, many of these programs are written in Perl, because it's so good at dealing with the barely-structured HTML text that is returned as the data from a web query.

Jon's contention, and mine, is that this whole area is about to explode, as XML takes off, giving more structure to data, and as web sites realize that they can benefit from exposing a more explicit API for access by other web programs.

In what Jon calls the second generation object web, web sites will provide explicit APIs for interaction with other programs.

Jon goes on:

"Last year, Dun and Bradstreet reengineered their whole online business around the idea of wrapping up back-end data sources and middle-tier components in XML interfaces that are accessible via HTTP or HTTPS. Traditionally, D&B customers bought packaged reports, not raw data, and the creation of a customized report was a slow and painful process. The new scheme turned the old one upside down. It defined an inventory of data services, and empowered developers to transact against those services using an XML-over-HTTP protocol. In their case, the mechanism was based on the webMethods' B2B server, but conceptually that's not too different from XML-RPC. Prior to last year, developers who needed custom D&B feeds had to ask D&B to create them, which took forever, and then had to maintain private network links to D&B and speak a proprietary protocol over those links. In the new scheme, D&B publishes a catalog listing a set of data products. A developer fetches the catalog over the Web, using any XML-oriented and HTTP-aware toolkit, and writes a little glue code to integrate that feed into an application."

We've put these ideas to work in a new tool we've built at O'Reilly, Rael Dornfest's RSS aggregator, which he calls Meerkat.

For those of you who don't know RSS, it's an XML-based rendering of stories that a web site wants to make available for syndication. These stories are composed of a title, a link (back to your site), and an optional description or blurb. Anyone who wants to can come along and grab these stories for incorporation into their own sites -- with links back to the full stories on the originating site.
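A minimal sketch of what consuming such a feed looks like in code (the feed URL is a placeholder; the title/link/description shape is the one just described):

    # Fetch an RSS feed and pull out each story's title, link, and blurb.
    # The feed URL is a placeholder.
    import urllib.request
    import xml.etree.ElementTree as ET

    RSS_URL = "http://www.example.com/headlines.rss"

    with urllib.request.urlopen(RSS_URL) as resp:
        tree = ET.parse(resp)

    for item in tree.iter("item"):
        title = item.findtext("title", default="")
        link = item.findtext("link", default="")
        blurb = item.findtext("description", default="")
        # A syndicating site would drop these into its own pages,
        # linking each headline back to the originating site.
        print(title, link, blurb, sep="\n  ")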

Meerkat takes this a few steps further. Rather than just using RSS to load individual stories into various web pages, Rael built a tool that helps our editorial team manage the flow of syndicated content. Meerkat's back-end searches the net for RSS files representing technology/computer/geek/science-related content. These files are stored in a MySQL database, with an editorial interface organized along category/channel/chronological lines and sporting the power of regular expression searches. This interface allows editors to select stories and dispatch them to individual target publications. Rael has also made a public version of this interface available as a general purpose RSS reader.

Rael found that people were doing to Meerkat what we do with Amazon, taking his data and doing things with it beyond what he intended, and so he made it easier for them. He created a simple API consisting of a set of standard "flavors" that cause data to be returned in formats acceptable to various other programs, from PHP serialized data and JavaScript to Apple's Sherlock, as well as additional URL "keywords" that allow the requestor to control the format and content of the returned data.

I'm not going to go into the details of Rael's API. What I want to call out is how important what he did was, from both an architectural point of view and from the point of view of building a web site that is consonant with the collaborative computing culture. More important than the fact that he built the site with open source tools like MySQL and PHP is that he engineered Meerkat not only so it could meet his objectives, but also so that it could be used in ways he did not know or intend.

What is going to make this whole area explode is the emergence of standards for XML based metadata exchange and service discovery. Web-based services will publish their APIs as XML, and will pass data back and forth as XML, over existing Internet protocols such as HTTP and SMTP. This will help to recreate the loosely coupled architecture that I suggested earlier was so important to the original flowering of open source culture around UNIX.

Once again quoting Jon Udell:

"As Web programmers, we're all in the game of creating -- and using -- network services. A Web server running CGI scripts makes a pretty shabby ORB, but compared to what was available before 1994, it's a wonderful thing. It will get a lot more wonderful with some fairly basic improvements. AltaVista, and Yahoo, and every other site that offers services to the Web ought to be implementing those services, internally, using XML interfaces. When the client is a browser, the XML can be rendered into HTML. When the client is a program, the XML can be delivered directly to that program."

Of course, there are an awful lot of web sites out there that will need to be retrofitted so that they can be used more cooperatively.

In that regard, I saw a very interesting startup here on the JavaOne show floor yesterday, an Irish company called Cape Clear. If I understand it right, they do introspection on JavaBeans or CORBA components, and express the resulting interfaces as XML, which makes JavaBeans and CORBA accessible to XML transport protocols like XML-RPC and SOAP.

I was very excited to see that Sun just yesterday threw its support behind SOAP. IBM also just released Java based SOAP support, and what's more, contributed the code to the XML-Apache project. XML is busting out all over, and that's a good thing!

Automated Service Discovery

The final thing I'm going to talk about is wrapped up in the question:

What do ICQ, Jini, and Napster all have in common?

To me, the essence of all of these systems is an approach to service discovery that fosters informal, peer-to-peer networking, rather than hierarchical client-server computing. Napster is a kind of "instant messaging" where the question isn't "are you online to chat?" but "do you have Metallica's latest?"

The role of a central server, if any, is not to return the data, but just a pointer to who has it. (And need I say it? That's an example of metadata.) The next step is to apply this model to the provision of more complex services.
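In schematic terms, the central index in such a system looks something like the sketch below (the class, names, and addresses are purely illustrative):

    # A minimal sketch of the Napster-style discovery pattern: a central
    # index stores only pointers (who has what), never the data itself.
    # Peers query the index, then talk to each other directly.

    class DiscoveryIndex:
        def __init__(self):
            # resource name -> set of peer addresses offering it
            self.offers = {}

        def register(self, peer_addr, resources):
            for name in resources:
                self.offers.setdefault(name, set()).add(peer_addr)

        def lookup(self, name):
            # Return pointers to providers, not the resource itself.
            return sorted(self.offers.get(name, set()))

    index = DiscoveryIndex()
    index.register("10.0.0.5:6699", ["metallica-one.mp3", "freebird.mp3"])
    index.register("10.0.0.9:6699", ["freebird.mp3"])

    # A peer asks "who has it?" and then fetches directly from a peer.
    print(index.lookup("freebird.mp3"))  # ['10.0.0.5:6699', '10.0.0.9:6699']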

As is often the case, Sun was one of the first to articulate this vision, of devices that expose the services they offer, and other devices that automatically discover and use those services. The initial focus of Jini has been on networked hardware devices, yet the same principles apply in the world of the web that I've been talking about for the past fifteen minutes.

What we're going to see are sites you can query for the services that they provide, and the methods they expose for calling those services. We'll see search engines and directories that help us find those services, and of course, lots of opportunities for companies like O'Reilly to document and explain them.

I'm seeing companies sprouting up all over the net, each of which has hold of some part of the same elephant. For example, there was an intriguing announcement in the past couple of days about the beta release of the InfraSearch technology, based on Gnutella. (In case you haven't heard of it, Gnutella is a robust free software implementation of a Napster-like program.)

Whether InfraSearch scales or not is unknown. But the idea is completely right on: build a search engine that looks for services, not for static pages, and ask questions of those services. The simple example they show, of putting an algebraic equation in the search engine, and returning the answer from a calculator, is just what I've been talking about.

I don't know which of these various architectures for service publication and discovery will catch on. It won't necessarily be the best one.

In fact, as Clayton Christensen pointed out in The Innovator's Dilemma, less is often more. It's the technology that's just good enough, and easier to use, that often wins. After all, going back to ecological succession: it's not until the lichens and mosses have had their day that it's time for higher plants and animals to arrive.

Read the Outtakes

Some material got cut as the talk evolved, but I thought I'd keep it in the written version as a small digression.

We have to allow for false starts, for early ideas that don't quite work but pave the way for others that do, and for technologies that are designed in such a way that other people can surprise us by the way that they use them.

This returns me to the theme with which I began my talk, open source. The real power of open source is that it lowers the barriers to entry, allowing people to participate more easily in development and invention. In the age of the network, which distributes the power of participation to anyone in reach of a computer, not just the people on your own development staff, the technology that makes it easiest for anyone to join in moving things forward will ultimately win.

