This essay is an excerpt from the forthcoming book Peer-to-Peer: Harnessing the Disruptive Potential of Collaborative Networking. It presents the goals that drive the developers of the best-known peer-to-peer systems, the problems they've faced, and the technical solutions they've found. The book will be available at the O'Reilly Peer-to-Peer Conference in February and in bookstores in March.
On Sept. 18, 2000, I organized a "peer-to-peer summit" to explore the bounds of peer-to-peer networking. In my invitation to the attendees, I set out three goals:
This is exactly what we did with the open-source summit. By bringing together people from many projects, we were able to get the world to recognize that free software was more than GNU and Linux; we introduced a lot of people, many of whom, remarkably, had never met; we talked shop; and ultimately, we crafted a new "meme" that completely reshaped the way people thought about the space.
The people I invited do tell part of the story: Gene Kan from Gnutella and Ian Clarke from Freenet were obvious choices. They matched the current industry buzz about peer-to-peer file sharing. Similarly, Marc Hedlund and Nelson Minar from Popular Power made sense, because there was already a sense of some kind of connection between distributed computation and file sharing. But why did I invite Jeremie Miller of Jabber and Ray Ozzie of Groove, Ken Arnold from Sun's Jini project and Michael Tiemann of Red Hat, Marshall Rose, author of BXXP and IMXP, Rael Dornfest of Meerkat and RSS 1.0, Dave Stutz of Microsoft, Andy Hertzfeld of Eazel, Don Box, one of the authors of SOAP, and Steve Burbeck, one of the authors of UDDI? (Note that not all of these people made it to the summit; some sent substitutes.)
As I said in my invitation:
[I've invited] a group of people who collectively bracket what I consider a new paradigm, which could perhaps best be summarized by Sun's slogan, "the network is the computer." They're all working on parts of what I consider the next generation net story.
This article reports on some of the ideas discussed at the summit. It also continues the job of trying to reshape the way people think about that "next generation net story" and the role of peer-to-peer in telling that story. The concepts we use are, at bottom, maps of reality. Bad maps lead to bad decisions. If we believe peer-to-peer is about illegal sharing of copyrighted material, we'll continue to see rhetoric about copyright and censorship at the heart of the debate, and we may push for ill-advised legal restrictions on the use of the technology. If we believe it's about a wider class of decentralized networking applications, we'll be focused on understanding what those applications are good for and on advancing the state of the art.
Harnessing the Power of Disruptive Technologies
This article also gives some background on one of the tools I used at the meeting -- something I'll call a "meme map" -- and presents the results of the meeting as one of those maps. This map is also useful in understanding the thinking behind O'Reilly's P2P directory. This broader map has two benefits. First, the peer-to-peer community can use it to organize itself -- to understand who is doing related work, and identify areas where developers can learn from each other. Second, the meme map helps the community influence outsiders. It can create excitement where there previously was indifference and turn negative impressions into positive ones.
First, though, a bit of background.
Recently, I started working with Dan and Meredith Beam of Beam, Inc., a strategy consulting firm. Dan and Meredith help companies build their "business models" -- one-page pictures that describe "how all the elements of a business work together to build marketplace advantage and company value." It's easy to conclude that two companies selling similar products and services are in the same business; the Beams think otherwise.
For example, O'Reilly and IDG compete in the computer book publishing business, but we have different business models. IDG's strategic positioning is to appeal to the "dummy" who needs to learn about computers but doesn't really want to; O'Reilly's is to appeal to the people who love computers and want to go as deep as possible.
IDG's marketing strategy is to dominate retail outlets and "big box" stores in hopes of putting product in front of consumers who might happen to walk by in search of any book on a given subject. O'Reilly's marketing strategy is to build awareness of our brand and products in the core developer and user communities who then buy directly or drive traffic to retail outlets. One strategy pushes product into distribution channels to reach unknown consumers; the other pulls products into distribution channels in response to queries from consumers who are already looking for the product.
Both companies are extremely successful, but our different business models require different competencies. I won't say more lest this article turn into a lesson for O'Reilly competitors, but, hopefully, I have said enough to get the idea across.
Boiling all the elements of your business down to a one-page picture is a really useful exercise. But what is even more useful is that Dan and Meredith have you run the exercise twice, once to describe your present business and once to describe it as you want it to be.
At any rate, fresh from the strategic planning process at O'Reilly, it struck me that an adaptation of this idea would be useful preparation for the summit. We weren't modeling a single business; we were modeling a space, and the key projects, concepts and messages associated with it.
I call these pictures "meme maps" rather than "business models" in honor of Richard Dawkins' wonderful formulation of memes as ideas that spread and reproduce themselves and are passed on from mind to mind. Just as genetic engineering allows us to artificially shape genes, meme engineering lets us organize and shape ideas so that they can be transmitted more effectively, and have the desired effect once they are transmitted.
I built the free software map by picking key messages from the Free Software Foundation Web site. I also added a few things (the red bubbles in the lower right quadrant of the picture) to show common misconceptions that were typically applied to free software. (Please note that this diagram should not be taken as a complete representation of the beliefs of the Free Software Foundation. While I took the positioning as it appears on their Web site, no one from the FSF has reviewed this slide, and they might well highlight very different points if given the chance to do so.)
Free Software Meme Map
There are a couple of things to note about the diagram. The greenish bubbles at the top represent the outward face of the movement -- the canonical projects or activities. In the case of the Free Software Foundation, these are programs like gcc, the GNU C Compiler, GNU Emacs, Ghostscript (a free PostScript interpreter) and the GNU General Public License, or GPL.
The box in the center lists strategic positioning, the key perceived user benefit, and the core competencies. The strategic goal is right up front on the Free Software Foundation Web site: to build a complete free replacement for the UNIX operating system. The user benefit is sold as one of standing up for what's right, even if there would be practical benefits in compromise. There is little sense of what the core competencies of the free software movement might be, other than that they have right on their side, and the goodwill of talented programmers.
In the Beam models, the beige bubbles in the lower half of the picture represent internal activities of the business. For my purposes, I used them to represent guiding principles and key messages. I used red bubbles to represent undesirable messages that others might be creating and applying to you.
As you can see, the primary messages of the Free Software movement, thought provoking and well articulated as they are, don't address the negative public perceptions that are spread by opponents of the movement.
Open Source Meme Map
Now take a look at the diagram I drew for open source. The content of this diagram was taken partly from the Open Source Initiative Web site, but also from the discussions at the open-source summit I organized in April 1998, and from my own thinking and speaking about open source in the years since. Take the time to read the diagram carefully; it should be fairly self-explanatory. It demonstrates what a well-formed strategic meme map ought to look like.
As you can see by comparing the two diagrams, they put a completely different spin on what formerly might have been considered the "same space." We did more than just change the name that we used to describe a collection of projects from "free software" to "open source." In addition:
While some further discussion of the open-source meme map might be worthwhile in another context, I present it here mainly to clarify the use of meme maps in creating a single unifying vision of a set of related technologies.
Here's the slide I showed to the group at the summit. Things have evolved somewhat since that time, partly as a result of efforts like ours to correct common misconceptions, but this picture still represents the view being bandied about by industries that feel threatened by peer-to-peer technologies:
Current Peer-to-Peer Meme Map
Not a pretty picture. The canonical projects all feed the idea that peer-to-peer is about the subversion of intellectual property. The chief benefit presented to users is that of free music (or other copyrighted material). The core competencies of peer-to-peer projects are assumed to be superdistribution, the lack of any central control point and anonymity as a tool to protect the system from attempts at control.
Clearly, these are characteristics of the systems that put the peer-to-peer buzzword onto everyone's radar. But are they really the key points? A map is useful only to the extent that it reflects underlying reality. A bad map gets you lost; a good one helps you find your way through unfamiliar territory.
We spent a few hours brainstorming about important applications of peer-to-peer technology, key principles, and so on. I've tried to capture the results of that brainstorming session in the same form that I used to spark the discussion, as a meme map. Note that this is my personal take-away from the meeting. The actual map below wasn't fully developed or approved there.
New Peer-to-Peer Meme Map
A quick walk-through of the various projects and how they fit together leads us to a new understanding of the strategic positioning and core competencies for peer-to-peer projects. In the course of this walk-through, I'll also talk about some of the guiding principles that we can derive from studying each project, which are captured in the bubbles in the lower half of the diagram. This discussion is necessarily quite superficial, but suggests directions for further study.
One of the most interesting things about Napster is that it's not a pure peer-to-peer system in the way that radically decentralized systems like Gnutella and Freenet are. While the Napster data is distributed across millions of hard disks, finding that data depends on a central server. In some ways, the difference between MP3.com and Napster is smaller than it appears: one centralizes the files, while the other centralizes the addresses of the files.
The real genius of Napster is the way it makes participation automatic. By default, any consumer is also a producer of files for the network. Once you download a file, your machine is also available to pass along the file to other users. Automatic "pass along" participation decentralizes file storage and network bandwidth, but most importantly, distributes the job of building the Napster song database.
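This hybrid design is easy to see in miniature. The sketch below is a toy model, not Napster's actual protocol, and all names and addresses are invented: the files stay on the peers while only their addresses live in the central index, and every download automatically registers the downloader as a new source.

```python
# Toy model of Napster's hybrid architecture: a central index of
# addresses, with files (and bandwidth) distributed across peers.

class CentralIndex:
    """Maps song titles to the peers that hold a copy."""
    def __init__(self):
        self.locations = {}   # title -> set of peer addresses

    def register(self, title, peer_addr):
        self.locations.setdefault(title, set()).add(peer_addr)

    def search(self, title):
        return sorted(self.locations.get(title, set()))

class Peer:
    def __init__(self, addr, index):
        self.addr, self.index, self.files = addr, index, set()

    def share(self, title):
        self.files.add(title)
        self.index.register(title, self.addr)

    def download(self, title):
        sources = self.index.search(title)
        if sources:
            # The key insight: every consumer automatically becomes
            # a producer -- the downloaded file is shared in turn.
            self.share(title)
        return sources

index = CentralIndex()
alice, bob = Peer("alice:6699", index), Peer("bob:6699", index)
alice.share("Song A")
bob.download("Song A")
print(index.search("Song A"))   # both peers now serve the file
```

Note how the index grows as a by-product of ordinary use: nobody has to volunteer to build the database.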
Dan Bricklin has written an excellent essay on this subject, The Cornucopia of the Commons. In this wonderful reversal of Hardin's tragedy of the commons, Bricklin explains why Napster demonstrates the power of collectively assembled databases in which "increasing the value of the database by adding more information is a natural by-product of using the tool for your own benefit."
This feature is also the source of Dave Winer's insightful comment that "The P in P2P is People."
Dave's comment highlights why the connection to the open-source movement is significant. Open-source projects are self-organizing, decentralized work groups enabled by peer-to-peer Internet technologies. If the P in P2P is people, then the technologies that allow people to create self-organizing communities, and the frameworks that have been developed for managing those communities, provide important lessons for those who want to work in the peer-to-peer space.
Open source isn't just about a set of licenses for software distribution, it's also a set of techniques for collaborative, wide-area software development. (This was one of the principles behind my call for organizing peer-to-peer standards activities, along the same lines as the self-organizing IETF, rather than as a centralized industry consortium.) As I've argued elsewhere, it was the peer-to-peer Usenet that was one of the key drivers of the early open-source community. Technologies that enable people to associate freely, end-to-end, are great levelers, and great hotbeds to promote innovation.
Napster also illustrates another guiding principle: that of redundancy, and tolerance of unreliability. I was talking recently with Eric Schmidt, CEO of Novell, about lessons from peer-to-peer. He remarked on a conversation he'd had with his 13-year-old daughter. "Does it bother you that sometimes songs are there, and sometimes they aren't?" "Does it bother you that there are lots of copies of the same song, and that they aren't all the same?" Her answer, that neither of these things bothered her in the slightest, seemed to him to illustrate the gulf between the traditional computer scientist's concern for reliability and orthogonality and the user's lack of care for these issues.
Another important lesson from Napster is that free riders, "super peers" providing more or better resources, and other variations in peer participation will ultimately decrease the decentralization of the system. Experience is already showing that a hierarchy is starting to emerge. Some users turn off file sharing. Even among those who don't, some have more files, and some have better bandwidth. As in Orwell's Animal Farm, all animals are equal, but some are more equal than others. While this idea is anathema to those wedded to the theory of radical decentralization, in practice, it is this feature that gives rise to many of the business opportunities in the peer-to-peer space. It should give great relief to those who fear that peer-to-peer will lead to the leveling of all hierarchy and the end of industries that depend on it. The most effective way for the music industry to fight what they fear from Napster is to join it, and provide sites that become the best source for high quality music downloads.
Even on Gnutella, the concept of super peers is starting to emerge. Clip2's DSS (Distributed Search Solutions) has developed a program that they call a Gnutella "Reflector," a proxy and index server designed to make Gnutella more scalable. According to Kelly Truelove of Clip2, "Multiple users connect to such a Reflector as they might connect to a Napster central server, yet, unlike such a central server, the Reflector itself can function as a peer, making outgoing connections to other peers on the network."
Usenet was originally carried over the peer-to-peer dialup UUCPnet. Sites agreed to call one another, and passed mail and news from site to site in a store-and-forward network. Over time, though, it became clear that some sites were better connected than others; they came to form a kind of de facto "Usenet backbone." One of the chief sites, seismo, a computer at the U.S. Geological Survey, was run by Rick Adams. By 1987, the load on seismo had become so great that Rick formed a separate company, called UUnet, to provide connectivity services for a monthly fee.
As the UUCPnet was replaced by the newly commercialized Internet, UUnet added TCP/IP services and became the first commercial Internet service provider. Interestingly enough, the IP routing infrastructure of the Internet is still peer-to-peer. Internet routers act as peers in finding the best route from one point on the net to another. Yet overlaid on this architecture are several layers of hierarchy. Users get their Internet connectivity from ISPs, who may in turn connect to each other in hierarchies that are hidden from the end user. Yet beneath the surface, each of those ISPs depends on the same peer-to-peer architecture.
Similarly, e-mail is routed by a network of peered mail servers, and it appears peer-to-peer from the user point of view, yet those users are in fact aggregated into clusters by the servers that route their mail, and the organizations that operate those servers.
Centralization and decentralization are never so clearly separable as anyone fixated on buzzwords might like.
But look a little deeper, and something else emerges: the clients are active participants, not just passive "browsers." What's more, the project uses the massive redundancy of computing resources to work around problems such as reliability and network availability of any one resource. But even more importantly, if you look further down the development timeline, when startups such as United Devices, Popular Power, Parabon and others have their services in the market, the "ecology" of distributed computation is going to be much more complex. There will be thousands (and ultimately, perhaps millions) of compute-intensive tasks looking for spare cycles. At what point does it make sense to have an architecture that allows a two-way flow of tasks and compute cycles?
Further, many of the key principles of Napster are also at play in distributed computation. Both Napster and SETI@Home need to create and manage metadata about a large community of distributed participants. Both need to make it incredibly simple to participate.
Finally, both Napster and SETI@Home have tried to exploit what Clay Shirky memorably called "the dark matter of the Internet" -- the hundreds of millions of interconnected PCs that have hitherto been largely passive participants in the network.
Already, startups like MojoNation are making a link between file sharing and distributed computation. In the end, both distributed file sharing and distributed computation are aspects of a new world in which Sun's long-term slogan, "The Network is the Computer," is finally coming true.
Once you realize this, it becomes clear just how similar the Napster model is to instant messaging. In each case, a central authority manages an addressing system and a "namespace," and uses it to connect end users. In some ways, Napster can be thought of as an instant-messaging system in which the question isn't "Are you online and do you want to chat?" but "Are you online and do you have this song?"
Not surprisingly, a project like AIMster makes explicit use of this insight to build a file sharing network that uses the AOL Instant Messenger (AIM) protocol. This brings IM features such as buddy lists into the file-sharing arena.
The open-source Jabber instant-messaging platform takes things even further. While Jabber started out as a switching system between incompatible Instant Messaging protocols, it is evolving into a general XML routing system, and the basis for applications that allow users and their computers to ask even more interesting questions of each other.
Ray Ozzie's Groove Networks is an even more mature expression of the same insight. It provides a kind of groupware dial tone, or "LAN on demand" for ad-hoc groups of peers. Like Jabber, it provides an XML routing infrastructure that allows for the formation of ad-hoc peer groups that can share not only files and chat, but a wide variety of applications. Replication, security, and so on are taken care of automatically by the underlying Groove system. If systems like Groove deliver what they promise, we can see peer-to-peer as a solution to the IT bottleneck, allowing users to interact more directly with each other in networks that can span organizational boundaries.
A Web hyperlink can point to any other site on the network, without any central intervention, and without the permission of the site being pointed to. What's more, hyperlinks can point to a variety of resources, not just Web pages. Part of the reason for the Web's explosive growth relative to other early Internet information services was that the Web browser became a kind of universal client, able to link to any kind of Internet resource. Initially, these resources were competing services such as FTP, Gopher and WAIS, but eventually, through CGI, the browser became an interface to virtually any information resource that anyone wanted to make available. Mailto and news links even provide gateways to mail and Usenet.
There's still a fundamental flaw in the Web as it has been deployed, though. Berners-Lee created both a Web server and a Web browser, but he didn't join them at the hip the way Napster did. And as the Buddhist Dhammapada says, "If the gap between heaven and earth is as wide as a barleycorn, it is as wide as all heaven and earth." Before long, the asymmetry between clients and servers had grown wide enough to drive a truck through.
Browsers had been made freely available to anyone who wanted to download one, but servers were seen as a high-priced revenue opportunity, and were far less widely deployed. There were free UNIX servers available (including the NCSA server, which eventually morphed into Apache), but by 1995, 95 percent of Web users were on Windows, and there was no Web server at all available to them! In 1995, in an attempt to turn the tide, O'Reilly introduced WebSite, the first Web server for Windows, with the slogan "Everyone who has a Web browser ought to have a Web server." However, by then, the market was fixated on the idea of the Web server as a centralized publishing tool. Microsoft eventually offered PWS, or Personal Web Server, bundled with Windows, but it was clearly a low-powered second-class offering.
Perhaps even more importantly, as Clay Shirky has pointed out, the rise of dynamic IP addressing made it increasingly difficult for individuals to publish to the Web from their desktops. As a result, the original "two-way Web" became something closer to television, a medium in which most of the participants are consumers, and only a relatively small number are producers.
Web site hosting services and participatory sites like Geocities made it somewhat easier to participate, but these services were outside the mainstream of Web development, with a consumer positioning and non-standard tools.
Recently, there's been a new emphasis on the "writeable Web," with projects like Dave Winer's editthispage.com, Dan Bricklin's trellix.com, and Pyra's blogger.com, making it easy for anyone to host their own site and discussion area. Wiki is an even more extreme project, creating Web sites that anyone can edit, with an area set aside for public comment on a given topic. Wiki has actually been around for six or seven years, but has suddenly started to catch on.
The writeable Web is only one way that the Web is recapturing its peer-to-peer roots. Content syndication with RSS (Rich Site Summary) and Web services built with protocols like XML-RPC and SOAP allow sites to reference each other more fully than is possible with a hyperlink alone.
What SOAP does is formalize something that has been done for years by sophisticated programmers. It's relatively easy, using Perl and a library like libwww-perl, to build interfaces to Web sites that do "screen scraping" and then reformulate and reuse the data in ways that the original Web developers didn't intend. It was even possible, as Jon Udell demonstrated, to take data from one Web site, and pass it to another for further processing, in a Web equivalent to the UNIX pipeline.
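The same trick can be sketched in Python (the essay's example assumed Perl and libwww-perl). The page below is an invented stand-in for a real site; the scraper pulls its data back out of markup meant only for human eyes, so it can be reused downstream.

```python
# "Screen scraping": recovering structured data from a page that
# was formatted for browsers, not for programs.
from html.parser import HTMLParser

PAGE = """<html><body>
<ul><li class="title">Peer-to-Peer</li>
<li class="title">Open Sources</li></ul>
</body></html>"""

class TitleScraper(HTMLParser):
    def __init__(self):
        super().__init__()
        self.in_title, self.titles = False, []

    def handle_starttag(self, tag, attrs):
        if tag == "li" and ("class", "title") in attrs:
            self.in_title = True

    def handle_endtag(self, tag):
        if tag == "li":
            self.in_title = False

    def handle_data(self, data):
        if self.in_title:
            self.titles.append(data.strip())

scraper = TitleScraper()
scraper.feed(PAGE)
# The scraped list can now be passed to another program or site for
# further processing -- the Web equivalent of a UNIX pipeline.
print(scraper.titles)
```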
SOAP makes this process more explicit, turning Web sites into peers in providing more complex services to their users. The next generation of Web applications won't consist of single-point conversations between a single server and a single browser, but a multipoint conversation between cooperating programs.
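Here is a minimal sketch of that kind of program-to-program conversation, using XML-RPC (a simpler cousin of SOAP) from the Python standard library. The catalog service and its data are invented for illustration; the point is that one program exposes a function, and a cooperating program calls it over HTTP as if it were local.

```python
# One process exposes a service over XML-RPC; another calls it.
import threading
from xmlrpc.server import SimpleXMLRPCServer
from xmlrpc.client import ServerProxy

def lookup_isbn(title):
    # Stand-in for a real Web service; the catalog is invented.
    catalog = {"Peer-to-Peer": "0-596-00110-X"}
    return catalog.get(title, "unknown")

# Bind to an ephemeral port and serve in a background thread.
server = SimpleXMLRPCServer(("127.0.0.1", 0), logRequests=False)
server.register_function(lookup_isbn)
port = server.server_address[1]
threading.Thread(target=server.serve_forever, daemon=True).start()

# Any cooperating program can now invoke the service over HTTP.
client = ServerProxy(f"http://127.0.0.1:{port}")
isbn = client.lookup_isbn("Peer-to-Peer")
print(isbn)
server.shutdown()
```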
One of the key issues that comes up once you start thinking about more complex interactions between sites on the Net is that metadata management becomes critical. UDDI is a first step toward a standard for cataloging Web services in ways that will allow them to be discovered by sites that wish to use one another's services.
Similarly, content syndication formats such as RSS allow Web sites to cooperate in delivering content. By publishing RSS feeds, sites enable other sites to automatically pick up data about their stories. A site like the O'Reilly Network homepage is updated automatically out of a set of RSS news feeds from a Web of cooperating sites.
Right now, RSS provides only the simplest of metadata about Web pages, for simple syndication applications like creating news digest pages. But the new RSS 1.0 proposal will allow for more complex applications based on distributed data.
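The basic syndication mechanism is simple enough to sketch. The feed below is invented, with element names in the style of the early RSS formats; a digest page is just the (title, link) metadata extracted from feeds like this one.

```python
# Building a news digest from an RSS feed: the simplest form of
# content syndication between cooperating sites.
import xml.etree.ElementTree as ET

FEED = """<rss version="0.91"><channel>
  <title>O'Reilly Network</title>
  <item><title>Remaking the P2P Meme</title><link>http://example.com/p2p</link></item>
  <item><title>Gnutella and Freenet</title><link>http://example.com/gnutella</link></item>
</channel></rss>"""

def headlines(feed_xml):
    """Extract (title, link) pairs -- the metadata RSS syndicates."""
    channel = ET.fromstring(feed_xml).find("channel")
    return [(item.findtext("title"), item.findtext("link"))
            for item in channel.findall("item")]

# A digest page like the O'Reilly Network homepage is assembled
# automatically from many such feeds.
for title, link in headlines(FEED):
    print(f"{title} -> {link}")
```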
As you look at these technologies, you see a great deal of overlap between what's needed for peer-to-peer on devices and peer-to-peer in areas from Web services to file sharing. Key technologies include resource discovery, reliability through redundancy, synchronization and replication, and so on.
Sun first articulated this vision many years ago with the slogan "The Network is the Computer," but that slogan is only now coming true. And if the network is the computer, then the projects under the peer-to-peer umbrella are collectively involved in defining the operating system for that emergent global computer.
That positioning guides technology developers. But there is a story for users, too: You and your computer are more powerful than you think. In the peer-to-peer vision of the global network, a PC and its users aren't just passive consumers of data created at other central sites.
Perhaps more important still is a vision of the core competencies that peer-to-peer projects will need to bring to the table. High on the list is metadata management. Whether you're dealing with networked devices, file sharing, distributed computation, or Web services, users need to find one another and what they offer. While we don't have a clear winner in the resource discovery area, XML has emerged as an important component in the puzzle.
What do we mean by metadata? In the case of Napster, metadata means the combination of artist and song names that users search for. It also includes additional data managed by the central Napster server, such as the names and Internet addresses of users, the size of the music files, and the reported bandwidth of the user's Internet link. (You can refer to this as the Napster "namespace," a privately managed metadata directory that gives Napster the ability to link users and their files with each other.)
In considering Napster, it's worth noting that the "namespace" of popular music is simple, and well-known. The Napster model breaks down in cases where more complex metadata is required to find a given piece of data. For example, in the case of classical music, an artist/song combination is often insufficient, since the same piece may be performed by various combinations of artists.
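The breakdown is easy to demonstrate. The records below are invented, but they show how a flat artist/title query fails to pick out a single recording, while richer metadata succeeds.

```python
# Why a flat artist/title namespace breaks down for classical music:
# the same work exists in many performances.
recordings = [
    {"work": "Symphony No. 5", "composer": "Beethoven",
     "orchestra": "Berlin Philharmonic", "conductor": "Karajan"},
    {"work": "Symphony No. 5", "composer": "Beethoven",
     "orchestra": "Vienna Philharmonic", "conductor": "Kleiber"},
]

def search(composer, work, **extra):
    """Match on composer/work, then narrow by any extra fields."""
    hits = [r for r in recordings
            if r["composer"] == composer and r["work"] == work]
    for key, value in extra.items():
        hits = [r for r in hits if r.get(key) == value]
    return hits

# The pop-style query is ambiguous: two distinct recordings match.
print(len(search("Beethoven", "Symphony No. 5")))
# Only richer metadata pins down one performance.
print(len(search("Beethoven", "Symphony No. 5", conductor="Kleiber")))
```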
A related observation, which Darren New of Invisible Worlds made at the summit, is that Napster depends on the music industry itself to "market its namespace." Without pre-existing knowledge of song titles and artists, there is nothing for the Napster user to search for. This will lead to additional centralization layers as unknown artists try to provide additional information to help users find their work. This is much the same thing that happened on the Web, as a class of portals such as Yahoo! grew up to categorize and market information about the peer-to-peer world of hyperlinked Web pages.
It's easy to see, then, how understanding and managing namespaces and other forms of metadata becomes central even to peer-to-peer applications. What's more, it is also the key to peer-to-peer business models. Controlling namespaces and resource discovery has turned out to be one of the key battlegrounds of the Web. From Network Solutions, which largely controls DNS domain name registration, to Yahoo and search engines, identifying and capitalizing on the ways that centralization impacts even radically decentralized systems has turned out to be one key to financial success.
Instant messaging turns out to tell a similar story. The namespace of an instant-messaging system and the mapping of identity onto user addresses is the key to those systems. You have only to witness the efforts of AOL to keep other instant-messaging vendors from reaching its customers to understand just how important this is. (Note, however, that in the end, an open namespace with multiple providers will create a more powerful network than a closed one, just as the open Web trumped closed information services like AOL and MSN. AOL now succeeds for its customers as a "first among equals" rather than as a completely closed system.)
In the case of a distributed computation application, metadata might mean some identifier that allows the distributed data elements to be reassembled, and the address of the user who is working on a particular segment. SETI@Home also tracks user identity as a way of providing a game-like environment in which users and companies compete to contribute the most cycles. Startups aiming to compensate users for their spare compute cycles will need to track how much is contributed. Depending on the type of problem to be computed, they might want to know more about the resources being offered, such as the speed of the computer, the amount of available memory, and the bandwidth of the connection.
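That bookkeeping can be sketched as follows. This is an invented model, not SETI@Home's actual implementation: the coordinator tracks which segment went to which user, and how much each user has contributed.

```python
# Toy coordinator for a distributed computation: the metadata is
# the assignment of work units to users and the credit each earns.
from dataclasses import dataclass, field

@dataclass
class Coordinator:
    assignments: dict = field(default_factory=dict)  # unit_id -> user
    credit: dict = field(default_factory=dict)       # user -> units done

    def assign(self, unit_id, user):
        self.assignments[unit_id] = user

    def complete(self, unit_id, result):
        user = self.assignments.pop(unit_id)
        self.credit[user] = self.credit.get(user, 0) + 1
        return user, result

    def leaderboard(self):
        # The game-like ranking that motivates users to contribute.
        return sorted(self.credit.items(), key=lambda kv: -kv[1])

c = Coordinator()
c.assign("unit-1", "alice"); c.assign("unit-2", "bob")
c.complete("unit-1", result=0.87)
c.complete("unit-2", result=0.12)
c.assign("unit-3", "alice"); c.complete("unit-3", result=0.50)
print(c.leaderboard())
```

A real coordinator would also record machine speed, memory, and bandwidth, as the paragraph above suggests, so that work units can be matched to the resources on offer.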
We can see then that some of the key business battlegrounds for peer-to-peer will be in defining the standards for metadata, the protocols for describing and discovering network based resources and services, and owning the "namespaces" that are used to identify those resources.
Returning to Napster, though, it's also clear that the core competencies required of successful peer-to-peer projects will include seamless communication and connectivity, facilities that support self-organizing systems, and the management of trust and expectations.
Ultimately, peer-to-peer is about overcoming the barriers to the formation of ad-hoc communities, whether of people, of programs, of devices, or of distributed resources. It's about decoupling people, data and services from specific machines, using redundancy to replace reliability of connections as the key to consistency. If we get it right, peer-to-peer can help to break the IT bottleneck that comes with centralized services. Decentralization and user empowerment enable greater productivity. Edge services allow more effective use of Internet resources.
We're just at the beginning of a process of discovery. To get this right, we'll need a lot of experimentation. But if we can learn lessons from Internet history, we also need to remember to focus on interoperability, rather than treating this as a winner-takes-all game in which a single vendor can establish the standard for the network platform.
The peer-to-peer landscape is changing daily. New companies, applications, and projects appear faster than they can be catalogued. Especially with all the hype around peer-to-peer, the connections between these projects can be fairly tenuous. Is it marketing buzz or is it substance when everyone tries to join the parade?
While there's a danger in casting the net too widely, there's also a danger in limiting it. I believe that the story I've told gives us a good starting point in understanding an emergent phenomenon: the kind of computing that results when networking is pervasive, resources are abundant (and redundant) and the barriers are low to equal participation by any individual network node.
Tim O'Reilly is the founder and CEO of O’Reilly Media Inc. Considered by many to be the best computer book publisher in the world, O'Reilly Media also hosts conferences on technology topics, including the O'Reilly Open Source Convention, Strata: The Business of Data, the Velocity Conference on Web Performance and Operations, and many others. Tim's blog, the O'Reilly Radar "watches the alpha geeks" to determine emerging technology trends, and serves as a platform for advocacy about issues of importance to the technical community. Tim is also a partner at O'Reilly AlphaTech Ventures, O'Reilly's early stage venture firm, and is on the board of Safari Books Online, PeerJ, Code for America, and Maker Media, which was recently spun out from O'Reilly Media. Maker Media's Maker Faire has been compared to the West Coast Computer Faire, which launched the personal computer revolution.
Copyright © 2009 O'Reilly Media, Inc.