
Free Radical: Ian Clarke has Big Plans for the Internet

by Richard Koman

Freenet - one of the Big Three of P2P (the others, of course, are Napster and Gnutella) - has mostly been written about, even by founder Ian Clarke, as a censorship-proof network, where no one knows where a specific piece of information exists. Even the owners of Freenet nodes don't know what content exists on their computers. But Freenet is much more than an anonymity system: Clarke has built into it the seeds of a radically new Internet.

What he'll do with those seeds at Uprizer, his Santa Monica, Calif.-based company, is anyone's guess. All Clarke will say is that they're working on "network infrastructure," a phrase that qualifies as the P2P equivalent of "How do you like this weather?" The one slogan on Uprizer's Web site, however, gives a hint about Clarke's attitude about the current buzz around P2P: "P2P is a technology, not a bandwagon."

What makes Freenet technology radical is the way information is propagated across the system. On the Web, a person puts up a document on a server and clients ask for it. The more popular the content is, the more difficult it is to make it available. If your document is a new book by Stephen King, your server starts to buckle under the load, your bandwidth slows to a crawl, and not only the popular document, but all documents on your server become less and less available. To alleviate the problem, you have to add more servers, more redundancy.

In Freenet, as Clarke explains in this interview, a request for information not only delivers the information to the requesting node, but also replicates the document on the nodes closest to the requestor. This has two effects: information moves closer to people who want it and the more popular information is, the more copies of it exist. Unlike the Web, the more popular content is, the more - not less - available it is.

To give you a flavor of this fairly lengthy interview, here are a few choice tidbits:

On the distribution of content:

"On Freenet, popular information becomes more widely distributed, which means that you're not going to get what some people call 'the slashdot effect,' whereby an extremely popular piece of information becomes unavailable. The availability of information on Freenet increases in proportion to its popularity."

On Freenet's conceptual forebears:

"The intention of the original Arpanet was ... to create a decentralized system, the idea being that if there was a nuclear war, the only two things to survive would be cockroaches and the Internet. ... I think that really Freenet in some ways is the realization of [the vision of] the original creators of the Internet."

On the Domain Name System:

"One of the initial applications for Freenet that occurred to me was to replace the Domain Name System, because Freenet can attach a piece of information to an identifier. So the identifier might be the name of the computer and the piece of information might be the computer's IP address. So one of the opportunities will be for Freenet to replace this mechanism."

On Intel's attempt to jump-start a standards process:

"Trying to come up with standards for something like peer-to-peer is a little bit like trying to decide what color you should be painting your house before you've even started to build it."

Q: Peer-to-peer is the hot topic of the year. What do you make of that?

A: I think it's interesting. It is slightly strange since nobody really seems to know what peer-to-peer is, yet everyone seems to be raving about it. Everybody seems to be trying to nail down peer-to-peer, or at least some people are, yet other people are quite keen to broaden the definition as much as possible, so it will encompass what they're doing. I personally am not convinced of the value of the term "peer-to-peer" anymore, given that it seems to apply to everything from distributed processing, which has been around for a long time, to data distribution, which is perhaps a little bit more recent.

Q: Isn't there a parallel between sharing PC cycles and sharing PC storage?

A: There is a parallel but there are also a couple of important differences. First, in my view, one of the most important features of the Freenet network is that it's completely decentralized. Generally distributed processing relies on some form of centralization in order to orchestrate what's going on, so if your view of what is interesting about peer-to-peer is "Oh, well, we get to use all of these kind of client PCs," then distributed processing is quite a close analogy. But in fact what's interesting to me about these technologies is the fact that they're completely decentralized, and you're no longer beholden to some form of centralized server that allows the system to function.

Q: I wonder if you would just start at the beginning and describe your early thinking that turned into Freenet and how you envisioned this network at first.

A: There were two sides to my early thinking. There was a kind of philosophical interest, and there was also a technical interest. Philosophically I was very interested in the whole idea of freedom of information, and I was somewhat concerned by what I saw as increasing moves to impose censorship on the Internet. While in 1998, when I first started to think about this, this hadn't really begun in earnest, my fears have really been justified in the past two or three years in terms of a number of Western governments making increased efforts to both monitor and censor the Internet in ways that simply wouldn't be tolerated if applied to more conventional means of communication, such as the postal service or the telephone networks.

Q: What specific incidents were of great concern to you?

A: The first Western country to really impose what I viewed as somewhat Draconian censorship on the Internet was Australia, which came up with these laws whereby it had a list of Web sites that were censored and any Internet service provider in Australia that did not restrict access to that list of Web sites could be subjected to huge fines. The way that that list was generated was - in terms of the accountability of the people who were coming up with this list of what should and shouldn't be censored on the Internet - extremely dubious.

Subsequently, the United Kingdom had a Regulation of Investigatory Powers bill, which has now become law, that allows the security services to monitor all Internet traffic, and that was extremely worrying.

Q: You mentioned censorship in Western nations and I wonder if you have been thinking also about censorship in more totalitarian countries.

A: Well, certainly. In countries like China and Saudi Arabia, the Internet is very, very heavily censored. Certainly Freenet could still be used there to communicate securely and to share information securely. But whereas in Western countries it's very unlikely that encryption, for example, would be banned, that is possible in countries like China. Now in terms of their ability to enforce that ban, it will be extremely costly to do that. They could just ban Freenet full stop.

Q: What were your technical interests in creating a fully decentralized network?

A: If you look at the way the Internet works, it's very centralized in terms of its architecture. It's very inefficient. If you look at the Web, if 1,000 people in the U.K. request the same document from the United States, the same information travels across the Atlantic 1,000 times. That and other examples of centralization struck me as being highly inefficient in terms of network-bandwidth usage, which was, and still is, a somewhat valuable commodity.

So one of the things that Freenet does is it actually moves information around and dynamically replicates information to reduce the load on the network bandwidth. So in that specific example, if 1,000 people in the U.K. request the same document from the U.S. and they were using Freenet, it would only need to travel over the Atlantic once, and thereafter it would be stored locally and distributed within the U.K. - or within Europe, depending on where the demand was.
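The arithmetic in that example can be sketched in a few lines of Python. This is an illustration only, not Freenet code: the `RegionalCache` class, its `fetch` method, and the document name are hypothetical, and the model assumes a single shared cache on the U.K. side of the link.

```python
class RegionalCache:
    """Hypothetical model: one shared cache on the near side of an expensive link."""

    def __init__(self, origin):
        self.origin = origin          # documents held on the far side of the link
        self.local = {}               # copies already pulled across
        self.long_haul_transfers = 0  # how many times data crossed the link

    def fetch(self, name):
        if name not in self.local:
            # First request: the document has to cross the link once.
            self.local[name] = self.origin[name]
            self.long_haul_transfers += 1
        return self.local[name]


origin = {"report.txt": "contents of the report"}
uk = RegionalCache(origin)
for _ in range(1000):
    uk.fetch("report.txt")
print(uk.long_haul_transfers)  # 1, versus 1000 if every request crossed the link
```

Real Freenet nodes have no single regional cache; the point is only that once a copy exists near the demand, later requests no longer need the long-haul link.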

Q: And the other problem with the way the Web is organized is that the more popular a piece of information is the less available it becomes.

A: That's certainly true, and that is the other real deficiency that Freenet addresses - in that popular information does become more widely distributed, which means that you're not going to get what some people call "the slashdot effect" whereby an extremely popular piece of information becomes unavailable. The availability of information on Freenet increases in proportion to its popularity.

Q: Can you just explain a bit technically how that works? How the system is architected?

A: You could look at it like an ant colony where instead of food you have pieces of information, and instead of ants you have requests, which travel around this network. When you request a piece of information on Freenet, you ask your local Freenet node for that information. If it has the information itself, it will obviously return it to you. If not, it will forward that request on to another node that is more likely to have that information, and nodes in the network actually learn with time how to better route information through the network. They additionally move information closer to where the demand for that information is, so that immediately after you request a piece of information, a copy of it will reside on your computer and the computers close to you for a short amount of time. If you or other people close to you then request that information, they will receive that information immediately. So this is really the way that it dynamically moves information closer to demand.

Q: And how do you accomplish the spawning of copies of the information based on interest in it or requests for it?

A: Well, the actual act of requesting the information results in the information being propagated further, because the request will pass through a number of computers in order to reach the computer that actually has the information, and when that is passed back, a copy of it is stored on each of the computers which participated in the request, so the more requests there are for a given piece of information, the more widely distributed it becomes.
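A minimal sketch of that behavior, under loud assumptions: the `Node` class, the `max_hops` limit, and the fixed `next_hop` are all hypothetical simplifications. In particular, routing here is a fixed chain rather than the learned routing Clarke describes, and nothing in the sketch reflects Freenet's actual protocol or encryption.

```python
class Node:
    """Hypothetical node: a local store plus one neighbour to forward to."""

    def __init__(self, name, next_hop=None):
        self.name = name
        self.next_hop = next_hop
        self.store = {}

    def request(self, key, max_hops=10):
        if key in self.store:
            return self.store[key]          # this node already holds a copy
        if self.next_hop is None or max_hops == 0:
            return None                     # nowhere left to ask
        data = self.next_hop.request(key, max_hops - 1)
        if data is not None:
            self.store[key] = data          # keep a copy as the reply passes back
        return data


# Chain a -> b -> c -> d, with the document initially stored only on d.
d = Node("d")
c = Node("c", d)
b = Node("b", c)
a = Node("a", b)
d.store["doc"] = "hello"

a.request("doc")
holders = [n.name for n in (a, b, c, d) if "doc" in n.store]
print(holders)  # ['a', 'b', 'c', 'd']
```

After a single request, every node on the path holds the document, so a second request from `a` or from anyone routed through `b` or `c` is answered without reaching `d`.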

Q: And those intermediary nodes store this information for a substantial amount of time or just temporarily?

A: They store the information in a least-recently-used cache, which means that the more frequently a piece of information is requested, the longer it is stored by the node. If demand for a piece of information drops off, then that information will effectively be removed from the node.
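For readers unfamiliar with the term, a least-recently-used cache can be sketched as follows. The `LRUStore` class and its capacity of two items are hypothetical, chosen only to make the eviction visible; this is not how Freenet's data store is implemented.

```python
from collections import OrderedDict


class LRUStore:
    """Hypothetical fixed-size store that evicts the least recently used item."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.items = OrderedDict()   # oldest entry first, newest last

    def get(self, key):
        if key not in self.items:
            return None
        self.items.move_to_end(key)  # a request marks the item as recently used
        return self.items[key]

    def put(self, key, value):
        self.items[key] = value
        self.items.move_to_end(key)
        if len(self.items) > self.capacity:
            self.items.popitem(last=False)  # drop the least recently used item


store = LRUStore(capacity=2)
store.put("popular", "A")
store.put("obscure", "B")
store.get("popular")       # a request refreshes the popular item
store.put("new", "C")      # the store is full, so "obscure" is evicted
print(list(store.items))   # ['popular', 'new']
```

Nothing is deleted on a timer: an item simply falls out once enough other, more recently requested items have pushed it to the front of the eviction queue.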

Q: So is your system of nodes similar to the e-mail system of servers where messages bounce from one server to the next until they get closest to where the recipient lives?

A: Yes, that's a reasonably close analogy. Of course e-mail relies on the domain-name system in order to route messages, and the domain-name system is kind of fundamentally centralized in its architecture. But there are similarities.

Q: I'd like to talk about the domain-name system for a second, since you mention it, because one implication of not being able to identify which machine stores a piece of data is that the whole notion of attaching one name to one machine - you're calling that into question. The whole question of fighting over domain names seems to me to go away.

A: Well, it is an interesting question. I mean certainly one of the initial applications for Freenet that occurred to me was as a way to replace the domain-name system, because Freenet can attach a piece of information to an identifier. So the identifier might be the name of the computer and the piece of information might be the computer's IP address. So one of the opportunities will be for Freenet to replace this mechanism.

Of course, the problem that you then encounter is that whereas it costs some money to register a domain name with the domain-name system, in Freenet it would be free, and that would create the problem of people just taking apart dictionaries and registering them so you could end up with a much more extreme version of what people currently call "domain hogging" or "cybersquatting." There have been proposals for how to address that: there's an idea called "hash cash" where prior to inserting something into the system you're forced to perform this extremely time-consuming calculation, which is one option. The problem, of course, with hash cash is that it discriminates against people with less powerful computers. Another option which we've been pondering is an idea called "think cash" where it actually requires that a person do some thinking prior to ...

Q: ... Not that! ...

A: ... Um? ...

Q: ... Not thinking, for God's sake ...

A: ... Yeah, that would obviously discriminate against stupid people, but that might not be such a bad idea. These are the kind of problems you get when you have an anonymous system. But aside from that, it's probably a much more effective solution because it's much more efficient and it doesn't rely on this somewhat shaky hierarchy of domain-name servers that the Internet relies on, which periodically breaks down.
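The proof-of-work proposal Clarke mentions (forcing a time-consuming calculation before an insert) can be illustrated roughly as below. This is a generic sketch, not the scheme Freenet considered: the `mint` and `verify` functions, the use of SHA-256, the `key:nonce` string format, and the 12-bit difficulty are all assumptions made for the example.

```python
import hashlib
from itertools import count


def leading_zero_bits(digest):
    """Count the leading zero bits of a bytes digest."""
    value = int.from_bytes(digest, "big")
    return len(digest) * 8 - value.bit_length()


def mint(key, bits):
    """Search for a nonce so that SHA-256(key:nonce) starts with `bits` zero bits."""
    for nonce in count():
        digest = hashlib.sha256(f"{key}:{nonce}".encode()).digest()
        if leading_zero_bits(digest) >= bits:
            return nonce


def verify(key, nonce, bits):
    """Checking a claimed nonce takes one hash, however long the search took."""
    digest = hashlib.sha256(f"{key}:{nonce}".encode()).digest()
    return leading_zero_bits(digest) >= bits


nonce = mint("example-name", bits=12)          # roughly 2**12 hashes on average
print(verify("example-name", nonce, bits=12))  # True
```

The asymmetry is the point: registering thousands of dictionary words means paying the search cost thousands of times, while a node accepting an insert pays almost nothing to check it. As Clarke notes, the cost falls hardest on people with slower machines.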

Q: Right. If you're looking at the Third World as a world of machines that go down rather frequently and connectivity that isn't always up, the Freenet system might be a way to maintain availability.

A: Well, never mind the Third World. You've just described the First World. The reliability of the Internet even now, even in well-developed countries, still leaves a lot to be desired.

Q: It also seems to me to address certain economies of, well, I was going to say economies of scale, but really scale can be thought of as economically inefficient. Being a very large information provider means you have to have these giant server farms and redundant connectivity solutions. It becomes very expensive, I think, to be a Yahoo and have to own all of the redundancy.

A: Well, what we saw in terms of Network Solutions with the domain-name system was that if you create this reliance on centralized servers, you basically create these artificial monopolies, which seem destined to abuse those monopolies, and people who need to rely on them generally end up getting a bad deal. So there are a number of reasons, both economically and practically, and also technical efficiency reasons why you should really avoid centralized servers if it's possible. The problem to date has been that not having centralized servers is actually quite a difficult problem to solve in computer science, and I think that Freenet is probably one of the first examples of a piece of software that is designed to address the issue head-on and provide a scalable solution.

Q: Can the issue of anonymity be divorced from the architecture of Freenet?

A: I think to a degree it can. I mean Freenet, the current Freenet architecture, is quite amenable if you want to create an anonymous publication system, which was our intention with Freenet, but anonymity is not an inherent part of the architecture.

Q: So as you look into the future, you indicated that we're quite early into the development of decentralized networks. As you look out a couple of years, do you see a lot of substantial prospect of success for a different kind of Internet, a decentralized Internet?

A: I believe that it's really inevitable, even if you look at the past as opposed to looking at the future. If you look at the intentions of the original Arpanet, it was to create a decentralized system, the idea being that if there was a nuclear war, the only two things to survive would be cockroaches and the Internet. But what has happened since then is that due to the conceptual problem and the difficulties in designing completely decentralized systems, centralization such as the domain-name system has kind of been bolted on top of the Internet and the Internet has really grown into this really hierarchical architecture. So I think that really Freenet in some ways is the realization of [the vision of] the original creators of the Internet.

Q: Could the Freenet architecture be adopted by other software applications?

A: Well, I think that the Freenet architecture - architectures based on the same principles as Freenet - has very significant benefits to anyone who cares to adopt it. The first one being the improved reliability you get from not relying on a server, because it means you've got no single point of failure. The second real benefit is the scalability.

Current information sharing systems really do have scalability issues. If you take Napster and forget about all of the copyright arguments and just look at it from an architectural point of view, it has quite serious scalability problems, and then it's got this huge number of users and they all rely on a central server, and a time comes when you've bought the most expensive server you can and it's simply not big enough. So what Napster had to do is get multiple servers and split them up, and now people who happen to be attached to one server cannot communicate with people attached to another server. So I think that really reliability and scalability are the two core benefits that Freenet-style architectures could bring to existing systems.

Q: Is searching possible on Freenet?

A: Well, it depends on precisely what you mean by searching. The way Freenet works at the moment, it's a little bit like a file system. If you have a file name you can retrieve the information that was stored - except instead of a file name, we use keys. In terms of doing a fuzzy search, that's as yet still not possible with Freenet.

However, what is possible is the creation of Yahoo-style directories of information that is stored in the system. Fuzzy searching is one of the major things that we plan to implement in the near future, and we're confident that the Freenet architecture is in fact very well-suited to that. The way that Freenet organizes information, Freenet groups together similar information, and what happens because of that is you get this kind of collaborative filtering effect, and so you actually get that as an added bonus of any fuzzy-searching system that makes use of Freenet. So Freenet doesn't yet do fuzzy searching, but it will have the potential to do it in the near future.

Q: Does fuzzy searching compromise anonymity?

A: No, not really. The only thing that you lose is, at the moment, if there's a piece of information stored on your computer by Freenet, there's no way to tell what it is because of the way that it's encrypted. With searching information, it will not be encrypted in the same way. So it will actually be possible to tell what pieces of searching information are stored on your computer. But that's not really that serious an issue, given that the searching information is very widely distributed and the fact that searching information is stored on your computer does not indicate that the information itself is stored on your computer.

Q: What do you think about attaching metadata descriptions to the content and doing searches on the metadata rather than the actual ...

A: Well, that's exactly what we do, or that's exactly what we're planning, that each piece of content would actually be described in terms of metadata and you would search on the basis of metadata.
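Searching on metadata rather than on content might look something like the sketch below. Clarke describes this only as a plan, so everything here is hypothetical: the `catalog` structure, the `description` field, the key names, and the simple all-words-must-match rule stand in for whatever Freenet eventually adopts.

```python
def search(catalog, words):
    """Return the keys whose metadata description contains every query word."""
    wanted = {w.lower() for w in words}
    return [
        key
        for key, meta in catalog.items()
        if wanted <= set(meta["description"].lower().split())
    ]


# Hypothetical catalog: each key maps to metadata describing the content,
# not to the content itself.
catalog = {
    "key-a": {"description": "jazz piano recording"},
    "key-b": {"description": "piano tuning guide"},
}

print(search(catalog, ["piano", "jazz"]))  # ['key-a']
```

The search touches only the descriptions, which fits Clarke's earlier point that holding searching information on a node says nothing about whether the content itself is stored there.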

Q: Did you attend the Intel P2P Working Group?

A: I attended a small part of it. I was lucky enough to see Tim O'Reilly's little speech at that.

Q: I just wonder what the need for standards is at this time.

A: Well, trying to come up with standards for something like peer-to-peer is a little bit like trying to decide what color you should be painting your house before you've even started to build it.

Q: ... A bit premature ...

A: Not just a bit, I think it's very premature. Nobody agrees on what peer-to-peer is yet. You can't come up with standards for something until it's very well-defined and people are very confident they know what it is. And right now, peer-to-peer encompasses everything from Freenet, which is really distributed information distribution, to stuff like SETI@Home, which is distributed processing. And it's very difficult to see where the common ground is between those two things, which will be useful for defining a standard.

Richard Koman is a freelance writer and editor based in Sonoma County, California. He works on SiliconValleyWatcher, ZDNet blogs, and is a regular contributor to the O'Reilly Network.

Copyright © 2009 O'Reilly Media, Inc.