
Gnutella and Freenet Represent True Technological Innovation

The Holy Grail: searching for dynamically generated data

Gnutella is a fairly simple protocol. It defines only how a string is passed from one site to another, not how each site interprets the string. One site might handle the string by simply running fgrep on a bunch of files, while another might insert it into an SQL query, and yet another might assume that it's a set of Japanese words and return rough English equivalents, which the original requester may then use for further searching. This flexibility allows each site to contribute to a distributed search in the most sophisticated way it can. Would it be pompous to suggest that Gnutella could become the medium through which search engines operate in the 21st century?
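To make the division of labor concrete, here is a minimal sketch of two sites that receive the same opaque query string and interpret it in completely different ways. The `QueryHandler` type and the `grep_handler`/`sql_handler` functions are hypothetical names for illustration, not part of any Gnutella client. The protocol itself only carries the string; everything inside the handlers is each site's private business.

```python
import sqlite3
from pathlib import Path
from typing import Callable, List

# Hypothetical handler type: a site receives the raw query string from the
# network and decides for itself what it means and what to return.
QueryHandler = Callable[[str], List[str]]


def grep_handler(directory: Path) -> QueryHandler:
    """Behaves roughly like running fgrep over a directory of text files."""
    def handle(query: str) -> List[str]:
        hits = []
        for path in sorted(directory.glob("*.txt")):
            # Fixed-string match, like fgrep: no regex interpretation.
            if query in path.read_text(errors="ignore"):
                hits.append(path.name)
        return hits
    return handle


def sql_handler(conn: sqlite3.Connection) -> QueryHandler:
    """Inserts the query into an SQL statement (as a bound parameter)."""
    def handle(query: str) -> List[str]:
        # Binding the string as a parameter avoids SQL injection; note that
        # any % or _ characters in the query still act as LIKE wildcards.
        rows = conn.execute(
            "SELECT title FROM docs WHERE body LIKE ? ORDER BY title",
            (f"%{query}%",),
        )
        return [title for (title,) in rows]
    return handle
```

Both handlers have the same signature, so a client could plug in either one (or a translation step, or anything else) without the rest of the network knowing or caring.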



Status of Gnutella

Gnutella was started by a division of America Online called Nullsoft. America Online cut off support when it heard about the project, afraid of its potential use for copyright infringement. But a programmer named Brian Mayland reverse engineered the protocol and started a new project to develop clients. None of the developers of the current clients have looked at code from Nullsoft. Gnutella is an open source project, with clients released under the GNU General Public License.

Limitations and risks of Gnutella

Early experiments with Gnutella suggest that it is efficient and useful but has trouble scaling. If you send out a request with a time-to-live of 10, for instance, and each site contacts six other sites, the final hop alone could generate up to 6^10, or about 60 million, messages.
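The arithmetic behind this kind of estimate is easy to check. The sketch below assumes the worst case: every site forwards the request to `fanout` neighbors that have never seen it, so the number of messages multiplies by `fanout` at every hop. Real networks have overlapping neighbors and drop duplicate messages, so actual traffic would be lower; the function name is hypothetical.

```python
from typing import List


def flood_message_counts(fanout: int, ttl: int) -> List[int]:
    """Worst-case number of messages sent at each hop of a flooded query.

    Assumes no overlap between neighbor sets, so hop k carries fanout**k
    messages. The request stops being forwarded once its TTL is used up.
    """
    return [fanout ** hop for hop in range(1, ttl + 1)]


per_hop = flood_message_counts(fanout=6, ttl=10)
print(per_hop[-1])   # messages on the last hop: 60466176
print(sum(per_hop))  # messages across all ten hops: 72559410
```

Because the total is dominated by the last hop, trimming the time-to-live by even one or two hops cuts traffic far more than trimming the number of neighbors each site contacts.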

The exponential spread of requests opens up the most likely source of disruption: denial-of-service attacks caused by flooding the system with requests. The developers have no solution at present, but suggest that clients keep track of the frequency of requests so that they can recognize bursts and refuse further contact with offending nodes.
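One way a client might follow that suggestion is a sliding-window counter per neighbor. This is only a sketch under stated assumptions: nodes are identified by a string such as an IP address, the thresholds are arbitrary, and a node that exceeds them is blocked permanently. None of these choices come from the Gnutella developers; `BurstDetector` is a hypothetical name.

```python
import time
from collections import defaultdict, deque
from typing import Deque, Dict, Optional, Set


class BurstDetector:
    """Hypothetical per-node request counter using a sliding time window."""

    def __init__(self, max_requests: int, window_seconds: float) -> None:
        self.max_requests = max_requests
        self.window = window_seconds
        self.history: Dict[str, Deque[float]] = defaultdict(deque)
        self.blocked: Set[str] = set()

    def allow(self, node: str, now: Optional[float] = None) -> bool:
        """Record a request from `node`; return False if it should be refused."""
        if node in self.blocked:
            return False
        if now is None:
            now = time.monotonic()
        times = self.history[node]
        # Forget requests that have fallen out of the window.
        while times and now - times[0] >= self.window:
            times.popleft()
        times.append(now)
        if len(times) > self.max_requests:
            # Burst detected: refuse all further contact with this node.
            self.blocked.add(node)
            del self.history[node]
            return False
        return True
```

A permanent block is the bluntest possible policy; a real client might instead block for a cooling-off period, since a busy but honest neighbor could also produce bursts while relaying other people's requests.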

Furthermore, the time-to-live imposes a horizon on each user. I may repeatedly search a few hundred sites near me, but I will never find files stored a step beyond my horizon. In practice, information may still get around. After all, Europeans in the Middle Ages enjoyed spices from China even though they knew nothing except the vaguest myths about China. All they had to know was some sites in Asia Minor, who traded with sites in Central Asia, who traded with China.
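The horizon falls straight out of how a flooded query is forwarded. The sketch below models it as a breadth-first search that stops forwarding once the time-to-live reaches zero. The `seen` set stands in, roughly, for a client dropping copies of a request it has already handled; the graph and node names are made up for illustration.

```python
from collections import deque
from typing import Dict, List, Set


def reachable_within_ttl(graph: Dict[str, List[str]], start: str, ttl: int) -> Set[str]:
    """Return the nodes a query flooded from `start` can reach before its TTL runs out."""
    seen = {start}
    frontier = deque([(start, ttl)])
    while frontier:
        node, remaining = frontier.popleft()
        if remaining == 0:
            continue  # TTL exhausted: this copy is not forwarded further
        for neighbor in graph.get(node, []):
            if neighbor not in seen:  # approximates duplicate suppression
                seen.add(neighbor)
                frontier.append((neighbor, remaining - 1))
    return seen


# A chain of sites, like the trade route from Europe to China.
chain = {
    "h0": ["h1"],
    "h1": ["h0", "h2"],
    "h2": ["h1", "h3"],
    "h3": ["h2", "h4"],
    "h4": ["h3"],
}
print(sorted(reachable_within_ttl(chain, "h0", ttl=2)))  # ['h0', 'h1', 'h2']
```

With a time-to-live of 2, site h0 never sees h3 or h4. But h2 can see all of them, so a file can still travel from h4 to h0 in stages if intermediate sites copy it, just as the spices did.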

Spencer Kimball, a developer of the Linux client for Gnutella, says this subnetting can serve to protect Gnutella from attack. Gnutella has already suffered service disruptions, mostly because of bugs in clients, and in the future it is certain to face vicious and sophisticated attempts to bring it down. While some groups of sites have slowed down temporarily or become severed from other groups, the system has never actually come down.

People may misuse Gnutella for purposes other than denial of service, of course. One site was recently reported to use it for a sting: The site advertised file names that appeared to offer child pornography, then logged the IP address and domain name of every download request. The reason such information was available is that Gnutella uses HTTP; there is no difference between the user information Gnutella offers and that offered by any Web browser.

A final limitation of Gnutella worth mentioning is the difficulty of authenticating the source of the data returned. You really have no idea where the data came from -- but that's true of e-mail and news right now too. Clients don't have to choose anonymity; they can identify themselves as strongly as they want. If a Gnutella client chooses to return a URL, that's just as trustworthy as a URL retrieved in any other manner. If a digital signature infrastructure becomes widespread, clients could use that too. I examine reliability and related policy issues in the article "The Value of Gnutella and Freenet."
