February 2003 Archives

Robert Kaye

Related link: http://codecon.info

On day three of CodeCon 2.0, Andrew Loewenstern presented Khashmir, a distributed hash table library based on the Kademlia algorithm. The Kademlia algorithm follows in the footsteps of the Chord algorithm, but improves on its performance in a few areas. The basic idea behind distributed hash table (DHT) algorithms is that, like regular hash tables, they can be used to store key-value pairs, but the nodes of the hash table may reside on different peer servers. This makes them well suited for implementing distributed indexing schemes in P2P systems. The Khashmir library utilizes UDP via the AirHook library; using UDP allows Khashmir to reach nodes inside NATted networks, which is crucial for getting widespread acceptance of P2P systems. The API for Khashmir is pretty simple: it supports the findNode(), valueForKey(), storeValueForKey(), and addContact() functions as its basic operations.
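To make the DHT idea a little more concrete, here is a minimal sketch of the XOR distance metric that Kademlia uses to decide which nodes are "close" to a key. It is not Khashmir's code: the helper names (node_id, xor_distance, closest_nodes) and the peer names are made up for illustration, and real implementations keep routing tables rather than sorting a flat list.

```python
import hashlib


def node_id(name: str) -> int:
    """Derive a 160-bit identifier from a name (hypothetical helper)."""
    return int.from_bytes(hashlib.sha1(name.encode()).digest(), "big")


def xor_distance(a: int, b: int) -> int:
    """Kademlia measures the distance between two IDs as their bitwise XOR."""
    return a ^ b


def closest_nodes(key: int, nodes: list[int], k: int = 3) -> list[int]:
    """Return the k node IDs closest to key under the XOR metric."""
    return sorted(nodes, key=lambda n: xor_distance(n, key))[:k]


if __name__ == "__main__":
    # Hypothetical peers; a key-value pair would be stored on the closest ones.
    peers = [node_id(f"peer-{i}") for i in range(20)]
    key = node_id("some-file.mp3")
    for n in closest_nodes(key, peers):
        print(f"{n:040x}")
```

Because keys and node IDs live in the same ID space, any peer can work out which nodes should hold a given key without asking a central index.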

Deep Green was presented by Michael F. Korns. The Deep Green presentation stood out a bit from the other presentations, since it was a financial application with a long history and it makes money! Deep Green is a web application that was designed to allow a single person to run a mutual fund of up to $1 billion by themselves. The Deep Green back-end system includes three million machine learning agents (neural nets, Bayesian classifiers, nonlinear multivariate regression, etc.) that continually analyze current and historical stock market data to provide the end user with information on how to manage the mutual fund. The most confusing aspect of the presentation was the business arrangements surrounding Deep Green. Korns and Associates, the developers of Deep Green, use the system to manage an internal mutual fund that has outperformed all other mutual funds. The proceeds from this internal fund are used to further develop Deep Green. Furthermore, the company Invest by Agent is tasked with bringing the Deep Green software to market, but currently has no customers. Michael’s presentation was the most enigmatic presentation at CodeCon, and I am still confused about how all the pieces fit together. But I must say that hacking on code to make money to continue hacking seems like an admirable model.

Roberto Bayardo presented YouServ. The main idea behind YouServ is to allow users to host web pages even if they don’t have a web server that can be available 100% of the time. Running a web server is not an easy task for people who have transient net connections or are running firewalls or NATs. Their websites may not be up all the time, which makes it difficult for users to view the web pages or for Google to index them. YouServ takes a P2P approach to provide an easy-to-use cooperative web server that works automatically behind firewalls or NATs, ensures secure content access, provides a single login for restricted access, and provides search capabilities. The system relies on a central server to act as the single login access point and as a presence manager, which keeps track of the nodes in the P2P network that are currently available to serve the content of the cooperative web server. The central server stores no content and only takes care of the lightweight tasks of coordinating the network. The peers in the network do the heavy lifting of serving out the content of the pages, but never deal with user authentication. To provide search services, the main server caches site summaries and acts as the first step in a search. Once the main server has used the cached site summaries to narrow down the list of P2P nodes that could contain matching content, the P2P nodes themselves are contacted to complete the search. YouServ is written in 100,000 lines of Java 1.3 code and is self-contained except for its dependence on bind for DNS services.

The next presentation was by Rich Bodo on Bayonne, GNU’s VoIP project. Unfortunately this presentation was quite confused: Rich used two computers and two sets of slides, and he would quickly jump between them, which made the presentation hard to follow. Then his demonstration did not work and he attempted to recompile kernel modules while the audience was watching, leaving the audience dangling for minutes at a time. In the end I gathered that the system uses a three-tiered model (though I’m still not clear exactly what the three tiers are), is written in C++ and has a web services interface. It also contains its own internal script language that can be used to script advanced telephony applications. He briefly showed a script that would implement a minimal voicemail system using Bayonne, but he was unable to get the script to run. If you want more concrete details, please check the website, because I’m still confused.

Last, but not least, was Raph Levien’s excellent Advogato presentation. Advogato lets open source developers rate (certify) one another to establish a reputation rating for each of its members. It accomplishes this with its crafty trust metric that determines membership in a community. The input to the trust metric is a graph of certifications (person A certifies person B at this level, person B certifies person C at that level…), and the metric assumes that the input data is not 100% clean. This means the input data could be subject to malicious attacks, spammers, stupid users and legitimate controversy in the certifications. In light of these adverse conditions, Advogato works without a central authority that ensures that everyone is playing fair, and it still produces good results. The trust metric is resistant to a variety of attacks, but it is not perfect; it gives good results for most cases. Raph didn’t go into the details of how the algorithm works due to the higher math required to grok it. The trust metric employs network flow theory, eigenvectors, power law vectors and social network theory, which is enough math to scare off even most CodeCon geeks. Raph also discussed the success of Advogato and of the other prominent trust metric on the net: Google’s PageRank system. The PageRank system also employs a trust metric and has a latency of only 100ms for a dataset of over 3 billion nodes. Impressive to say the least.

And that wrapped up CodeCon 2.0. Compared to last year, CodeCon has matured immensely, and both Len Sassaman and Bram Cohen have taken their lessons from CodeCon 1.0 to heart and delivered an excellent conference. Congratulations to Len & Bram and their small army of volunteers, and thanks to Up Networks, Google, No Starch Press, and LinuxFund for sponsoring CodeCon 2.0.

CodeCon 2.0 — what did you think?

Bill Venners

Related link: http://www.artima.com/wbc/interprog.html

In January 2003, I attended a Writing Better Code summit in Portland, Oregon, organized by Scott Meyers and Bruce Eckel. At the three-day summit, 15 people gathered to discuss code quality and how they could improve it. Throughout this discussion, one theme was clear: good code is written by good programmers. Therefore, one great way to improve the code quality within an organization is to hire better programmers. The trouble is, recognizing a good programmer among a pool of job applicants is not easy.

Finding good programmers is hard because good programming is dependent on much more than just knowledge of programming language syntax. You need someone who, despite wearing striped pants with a polka dot shirt, has a good sense of taste in OO design. You need someone who is creative enough to find innovative solutions to problems, yet anal retentive enough to always line up their curly braces. You need someone who is humble enough to be open to suggestions for improvement, but arrogant enough to stand firm and provide leadership when they are the best person to provide it. How can you tell all this about a stranger by spending 30 minutes with them in a conference room?

The final morning of the Writing Better Code summit, Bruce Eckel announced he was “hijacking” the meeting. Bruce wanted each person at the table to share his or her interview techniques. He wanted to know how we recognize a good programmer in an interview. In this article, I highlight some interview techniques discussed that morning.

Bill Venners

Related link: http://www.artima.com/suiterunner/xmlreporter.html

Artima SuiteRunner is a free open source testing toolkit for Java released under the Open Software License. You can use this tool with JUnit to run existing JUnit test suites, or standalone to create unit and conformance tests for Java APIs.

One advantage of using Artima SuiteRunner to run your JUnit tests is Artima SuiteRunner’s reporter architecture. A reporter is an object that collects test results and presents them in some way to users. Artima SuiteRunner includes several built-in configurable reporters that can write to the standard error and output streams, files, and a graphical user interface. But Artima SuiteRunner also supports custom reporters. If you want to present results of tests in a different way, such as HTML, email, database, or log files, you can create your own custom reporter that presents results in those ways. This tutorial will show you how to create a custom reporter, using as an example a custom reporter that formats test results in XML.

Bill Venners

Related link: http://www.artima.com/intv/pycomm.html

Python creator Guido van Rossum says, “I’m an email junky. I’ve received many emails from both experienced and beginning Python users. Their suggestions register in my brain, and at some point, manifest into a better design decision.”

Here’s another excerpt from this Artima.com article:

In the early days I was fairly quick to adopt new ideas, and then I realized the community was growing and that meant more and more contributions. I had to be more selective. My first step was always saying no. Then, if people didn’t take no for an answer, I would ask for arguments. Why do you think this is useful not just for you but for a large number of Python users?

If you are writing one particular approach for a popular application area, but there are lots of different ways of doing it, I won’t put your particular way in the standard library if I can help it. But if there’s one obvious way, clearly one best approach, I’m much more likely to put it into the standard library.

Robert Kaye

Related link: http://codecon.info

Brandon Wiley of the Foundation for Decentralization Research presented Alluvium (http://tristero.sf.net/alluvium), a decentralized low bandwidth system for web casting, which applies basic swarmcasting techniques to streaming music across the Internet. Instead of downloading a continuous stream, Alluvium publishes a playlist; clients download the tracks from multiple sources across the net, and just before playback the tracks get sequenced back into a stream on the destination machine. Alluvium utilizes the Open Content Network’s download model and maps a set of streaming channels onto a set of files in the network. Alluvium will also include a client portion that sequences all the tracks into an Icecast stream that can be played by any streaming music player. And if that is not enough, it also works for live streams: it takes the live stream, breaks it into discrete files and then puts the files into the Open Content Network, where the clients can download the files and stitch them into a coherent stream. Brandon would like to make freely available music (preferably Creative Commons or Open Audio licensed content) available in Alluvium streams to give emerging artists more exposure. Last, but not least, Brandon asked the community to donate a buck in order to prove to the IRS that the Foundation for Decentralization Research has public support, which is needed to gain tax exempt status.

Jim Young and James Hong presented the user interface design lessons they learned from creating and maintaining the HOTorNOT site. According to the duo of web designers, the user interface is the first point of failure between your application and the user. Good interfaces are easy to learn yet still efficient for experienced users, and they increase web site usage and retain more users. Web sites should cater to beginners and make common tasks easy, yet the training wheels that guide new users through the site should disappear once the user has mastered the skill being presented. One of the most important overall aspects of a well-designed site is speed: the site needs to be fast to ensure that users don’t get bored and wander off. Another important lesson they learned from the project concerned the placement of banner ads: the click-through rate greatly depends on where the banner ad is placed. If the ad is placed outside the scope of how users interact with the site, it is less likely to get clicked on. The click-through rates for HOTorNOT improved drastically when the banner ad was placed in the path of users using the site.

The Hydan steganography algorithm was presented by Rakan El-Khalil. Steganography (not stenography) is the concept of hiding messages in text files, images or even sound files. The Hydan algorithm is designed to hide messages in binary executable files on the i386 instruction set. Since steganography relies on redundancy in the medium to hide its messages it is difficult to hide messages in binary executables, since CPU instruction sets are designed to contain as little redundancy as possible. The core concept of the algorithm is to slightly change some instructions in a manner that will not alter the execution of the host program. For instance, it is possible to change a subtract instruction into an add instruction by simply negating the value that is being subtracted. Rakan then outlined a few methods for how to traverse a binary application (e.g. random walk) to look for changed instructions in order to retrieve the message from the binary executable. The algorithm requires about 150 binary executable bytes for each 1 byte of message that is to be embedded into an application.
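The add/subtract trick can be sketched in a few lines. This is only a toy model under stated assumptions, not Hydan itself: instructions are represented as (opcode, immediate) pairs acting on a single 32-bit register, CPU flag side effects are ignored, and the convention that "add" encodes a 0 bit and "sub" encodes a 1 bit is invented for the example.

```python
MASK = 0xFFFFFFFF  # model a 32-bit register with wraparound arithmetic


def execute(program, value=0):
    """Run a list of ("add" | "sub", immediate) pairs on one register."""
    for op, imm in program:
        value = (value + imm) & MASK if op == "add" else (value - imm) & MASK
    return value


def flip(op, imm):
    """Swap add <-> sub and negate the immediate; the effect is unchanged."""
    return ("sub" if op == "add" else "add", (-imm) & MASK)


def embed(program, bits):
    """Hide one bit per instruction: "add" means 0, "sub" means 1 (assumed)."""
    out = []
    for (op, imm), bit in zip(program, bits):
        wanted = "sub" if bit else "add"
        out.append((op, imm) if op == wanted else flip(op, imm))
    return out + program[len(bits):]


def extract(program, n):
    """Read the first n hidden bits back out of the instruction stream."""
    return [1 if op == "sub" else 0 for op, _ in program[:n]]


if __name__ == "__main__":
    host = [("add", 5), ("sub", 3), ("add", 10), ("sub", 1)]
    stego = embed(host, [1, 0, 1, 1])
    print(execute(host, 100) == execute(stego, 100))
    print(extract(stego, 4))
```

The host program computes the same result before and after embedding; only the choice between two equivalent encodings carries the message, which is why so many bytes of executable are needed per byte of payload.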

Nick Mathewson presented MixMinion, a third generation anonymous remailer. Nick, or at least we had to believe it was Nick, pranced on stage wearing a facemask to illustrate his point about anonymity. I suppose that’s a good gag for an anonymity hacker, but during the talk he slipped up and admitted to having written the code for MixMinion. With that slip he pulled off his mask and reiterated his point that even the smallest slip-up will compromise anonymity. During his presentation he outlined the previous generations of anonymous remailers, their flaws, and how MixMinion will attempt to avoid those flaws. The goals for MixMinion include a public specification (perhaps IETF/RFC bound) that will provide more public scrutiny and increased interoperability, and will hopefully encourage the community to create other implementations. Many of the specific achievements of the new MixMinion remailer are beyond the scope of this coverage of CodeCon; for details, please check out the MixMinion project.

Dan Kaminsky, the TCP/IP hacker extraordinaire, demonstrated and discussed tools from his Paketto Keiretsu. This package of TCP/IP tools does things that Vint Cerf had never imagined anyone would do with TCP/IP. For instance, his blazingly fast scanrand port scanner utilizes TCP/IP sequence numbers and TTL values instead of actually opening connections to each of the ports being scanned. The Paketto Keiretsu includes a bunch of neat TCP/IP utilities that he demonstrated, including a tool that uses standard command line redirection to grab packets from and put packets onto the network. When piping this output to the strings command, he was able to pick out URLs of web sites that the audience was currently requesting via their wireless connections. Some of the useful things that Dan has found in his TCP/IP hacking go beyond the scope of system security. For instance, one of the tools can quickly establish the true hop count between two hosts, which could be used by P2P networks to reorganize the network dynamically to maximize network efficiency. Also, different operating systems have different delays and retry counts for certain network operations, and these delays and counts can be measured to classify the operating system of the remote host.
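One way a scanner can lean on sequence numbers instead of per-connection state is to derive the sequence number of each outgoing SYN from a keyed hash of the addresses and ports, then check the acknowledgment number of any reply against the same hash. The sketch below illustrates that general idea only; it is an assumption about how such a scheme can work, not code from the Paketto Keiretsu, and the function names and the choice of HMAC-SHA1 are made up for the example. It builds no packets and touches no network.

```python
import hashlib
import hmac
import socket
import struct


def syn_cookie(secret: bytes, src_ip: str, src_port: int,
               dst_ip: str, dst_port: int) -> int:
    """Derive a 32-bit sequence number from the connection 4-tuple."""
    msg = (socket.inet_aton(src_ip) + struct.pack("!H", src_port) +
           socket.inet_aton(dst_ip) + struct.pack("!H", dst_port))
    digest = hmac.new(secret, msg, hashlib.sha1).digest()
    return struct.unpack("!I", digest[:4])[0]


def is_our_reply(secret: bytes, reply_src_ip: str, reply_src_port: int,
                 reply_dst_ip: str, reply_dst_port: int,
                 ack_number: int) -> bool:
    """A genuine SYN-ACK acknowledges our sequence number plus one."""
    # The reply travels in the opposite direction, so swap the endpoints
    # to recompute the value that was sent in the original SYN.
    expected = syn_cookie(secret, reply_dst_ip, reply_dst_port,
                          reply_src_ip, reply_src_port)
    return ack_number == (expected + 1) & 0xFFFFFFFF


if __name__ == "__main__":
    secret = b"per-scan secret"
    seq = syn_cookie(secret, "10.0.0.1", 40000, "10.0.0.2", 80)
    print(is_our_reply(secret, "10.0.0.2", 80, "10.0.0.1", 40000,
                       (seq + 1) & 0xFFFFFFFF))
```

With a check like this, the sending side can fire off probes without remembering any of them, and the receiving side can still tell genuine replies from stray or forged packets.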

The second day of CodeCon 2.0 was packed with as many valuable presentations as the first day and nearly all the demonstrations ran flawlessly. Big thumbs up for day 2!

CodeCon 2.0 — getting better all the time?

Robert Kaye

Related link: http://codecon.info

This year CodeCon convened at Club NV, which is better suited to the task than last year’s venue. Club NV was a little brighter, less cramped, and sported a better podium setup for the speakers. Also improved was the projection system, but the speakers still need to work out some better methods for displaying source code on the projection screen. Then again, reading code off a projection screen is not exactly what projection screens were designed for; call it an oxymoron of CodeCon. The informal introduction of speakers and the AV glitches we loved from the first CodeCon are in full force again this year.

Compared to the first year, CodeCon 2003 seems to have about the same attendance, which I think is a good thing. Too many times good conferences get overrun by idiots after a while — Comdex is probably the prime example here. And one last general observation — the number of female geeks present at CodeCon is once again grossly out of proportion to the male geeks. What is it going to take to get more female geeks involved with computer science?

The first presentation was by Paul Lambert on Cryptopy, his Python crypto APIs. It’s quite amazing to see that many of the scripting languages are going through the pains of creating native APIs for their languages in order to avoid being dependent on C/C++ based APIs. The amount of duplicated effort going into these APIs is staggering, but duplicated effort is something that is common and accepted in the Open Source world.

OpenRatings, presented by J. Paul Reed and Brian Morris (both of my alma mater of Cal Poly), is a good example of an open source project with the usual engineering challenges, combined with interesting legal and political aspects. OpenRatings is an open source web application where college students can rate and review their professors. Some professors don’t take well to criticism, and thus feel that the site should be shut down. However, OpenRatings systems typically don’t run on campus resources, so university officials have no recourse to take down or censor the sites. The technical issues with the software are the typical run of the mill issues that need to be addressed in any tech application, but the political aspects of the project make it unique. The human aspects of users interacting with the system bring out the need for acceptable use policies and other social hacking that addresses the political issues as the project matures. Most open source software projects are not exposed to these types of problems. The OpenRatings project, however, has a getting started guide that packages up all the social learning from the past to give new instances of the project at other universities a head start on avoiding the political problems.

GNU Radio, by Eric Blossom & Matt Ettus, provides a software kit for building and deploying software radios and fosters learning about DSP and communications systems. The kit uses software and hardware to build generic receivers that can decode radio and television signals in software. This high tech, high horsepower approach to listening to the radio enables a host of new features that traditional receivers cannot offer. These high tech radios can do broad spectrum monitoring, listening to and decoding multiple radio/TV channels at the same time. Using generalized data capture devices and custom DSP software setups, GNU Radio can capture and decode radio, TV, HDTV, public safety communications and cell phone conversations. GNU Radio can work with the input from the receivers to decode just about anything on the public airwaves. Furthermore, GNU Radio aims to use this technology to create cognitive radio systems that could analyze the public airwaves and optimize utilization of the spectrum. All of this is cool and fun, but none of the mentioned applications seemed to have any real impact until Eric started talking about the MPAA’s Broadcast Flag. What good is a broadcast flag that is supposed to prevent devices from copying the broadcast content when the broadcast can be decoded by generalized hardware?

Sam Joseph presented NeuroGrid, his decentralized fuzzy metadata search system. His approach to searching for documents on the net uses RDF to present metadata as a graph comprised of subject, predicate and object triples. Unlike other search systems, NeuroGrid relies on users to provide metadata for inclusion in the search system and employs user feedback to establish and strengthen search pathways in the NeuroGrid. And to top it all off, the system is decentralized with multiple NeuroGrid nodes working together to execute the search queries. It sounds like Sam still has a fair amount of testing to do to ensure the system will scale, but NeuroGrid sounds like one of the most advanced decentralized search systems out there.
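For readers unfamiliar with RDF, the subject, predicate and object triples mentioned above can be pictured as plain tuples, and a metadata search as pattern matching over them. The following is a minimal sketch with made-up data and a hypothetical match() helper; it says nothing about how NeuroGrid actually stores, routes or ranks its queries.

```python
def match(triples, subject=None, predicate=None, obj=None):
    """Return the triples that fit the pattern; None acts as a wildcard."""
    return [t for t in triples
            if (subject is None or t[0] == subject)
            and (predicate is None or t[1] == predicate)
            and (obj is None or t[2] == obj)]


# Hypothetical user-supplied metadata about two documents.
METADATA = [
    ("doc1.pdf", "author", "Alice"),
    ("doc1.pdf", "topic", "p2p"),
    ("doc2.html", "author", "Bob"),
    ("doc2.html", "topic", "p2p"),
]

if __name__ == "__main__":
    # Which documents are about p2p?
    for subject, _, _ in match(METADATA, predicate="topic", obj="p2p"):
        print(subject)
```

Each triple is an edge in a graph from subject to object, labeled by the predicate, which is what lets metadata from many users be merged into one searchable structure.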

And to close the first day of CodeCon, a panel comprised of Larry McVoy (BitKeeper), Jonathan Shapiro (OpenCM) and Greg Stein (Subversion) discussed current developments in version control. I am an avid user of CVS and have been quite satisfied with it for the four years I’ve used it, so I was surprised to see three passionate people talking about how CVS is broken and that new approaches to version control are required. Greg Stein even went on to say that “CVS is highly resistant to being fixed,” which drew a round of laughter from the crowd. In turn, each person described the efforts of their groups/companies to fix the flaws in CVS, and each of the three panelists had a unique perspective on the future of version control. Larry McVoy from BitKeeper, which provides version control software for the Linux kernel development team, outlined the ability to commit to a local repository and then synchronize the local repository with a central server later on, as well as advanced automatic merging features. OpenCM focuses on secure and consistent source code repositories that can accurately reproduce past snapshots (which apparently CVS has issues with), and Subversion aims to be a drop-in replacement for CVS that supports simpler branching and tagging by using duplicate copies of the files in question. The message overall was that if CVS works for you, don’t sweat it and keep using it. But for some development teams (most of them apparently commercial dev teams) the limitations of CVS are not acceptable, and these teams should look towards BitKeeper, OpenCM and Subversion as alternatives to CVS.

The overall format and execution of CodeCon has matured quite a bit since last year. Big kudos to Len Sassaman and Bram Cohen for going through the pain and effort of putting on a true geek conference. Thank you!

CodeCon — what do you think?

Richard Koman

Related link: http://www.washingtonpost.com/ac2/wp-dyn/A37942-2003Feb20

In ruling that the Bells don’t have to offer their lines to competitors at regulated rates, the FCC has essentially given the high-speed bandwidth business to the phone company. They can easily price competitors out of the business; the only alternative is for competitors to build their own networks. In the ultimate Coke v. Pepsi choice, you choose: do you want to get broadband from your cable company monopoly or your phone company monopoly? And this from the people who advocate competition and the free market?

Bill Venners

Related link: http://www.artima.com/intv/strongweak.html

Artima.com has published Part V of an interview with Python creator Guido van Rossum, in which he talks about the robustness of systems built with strongly and weakly typed languages, the value of testing, and whether he’d fly on an all-Python plane.

Here’s an excerpt:

Guido van Rossum: That attitude sounds like the classic thing I’ve always heard from strong-typing proponents. The one thing that troubles me is that all the focus is on the strong typing, as if once your program is type correct, it has no bugs left. Strong typing catches many bugs, but it also makes you focus too much on getting the types right and not enough on getting the rest of the program correct.

Strong typing is one reason that languages like C++ and Java require more finger typing. You have to declare all your variables and you have to do a lot of work just to make the compiler happy. An old saying from Unix developers goes something like, “If only your programs would be correct if you simply typed them three times.” You’d gladly do that if typing your programs three times was enough to make them work correctly, but unfortunately it doesn’t work that way.

All that attention to getting the types right doesn’t necessarily mean you don’t have other bugs in your program. A type is a narrow piece of information about your data. When you look at large programs that deal with a lot of strong typing, you see that many words are spent working around strong typing.

Lucas Gonze

Related link: http://xxx.lanl.gov/PS_cache/cond-mat/pdf/0301/0301086.pdf

You attack and kill a node. Its load is redistributed elsewhere. The nodes that take on this new work are now more likely to be overloaded, which means they’re weaker. You attack these overloaded nodes. When they go down their responsibilities, which include those of the first victim, are also redistributed. Nodes that take on this new load have even more new work than after the first attack, so are even more overloaded, and are more vulnerable.
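The chain reaction described above can be sketched with a toy simulation. This is a deliberately simplified model with its own assumptions: a failed node's load is split evenly among its surviving neighbors, and a node fails as soon as its load exceeds its capacity. The graph, loads and capacities below are invented for illustration.

```python
def cascade(neighbors, load, capacity, first_victim):
    """Return the set of nodes that have failed once the cascade stops."""
    load = dict(load)  # work on a copy so the caller's data is untouched
    failed = set()
    queue = [first_victim]
    while queue:
        node = queue.pop(0)
        if node in failed:
            continue
        failed.add(node)
        alive = [n for n in neighbors[node] if n not in failed]
        if not alive:
            continue  # nobody left to take over this node's load
        share = load[node] / len(alive)
        for n in alive:
            load[n] += share
            if load[n] > capacity[n] and n not in queue:
                queue.append(n)  # overloaded: this node goes down next
    return failed


# Hypothetical star network: one heavily loaded hub and four light leaves.
NEIGHBORS = {"hub": ["a", "b", "c", "d"],
             "a": ["hub"], "b": ["hub"], "c": ["hub"], "d": ["hub"]}
LOAD = {"hub": 8, "a": 1, "b": 1, "c": 1, "d": 1}
CAPACITY = {"hub": 10, "a": 2, "b": 2, "c": 2, "d": 2}

if __name__ == "__main__":
    print(sorted(cascade(NEIGHBORS, LOAD, CAPACITY, "a")))
    print(sorted(cascade(NEIGHBORS, LOAD, CAPACITY, "hub")))
```

In this made-up network, knocking out a leaf is absorbed by the hub, while knocking out the hub overloads every leaf, which is the asymmetry between random failures and targeted attacks in miniature.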

“Cascade-based attacks on complex networks” analyzes such attacks. The abstract is:


We live in a modern world supported by large, complex networks. Examples range from financial markets to communication and transportation systems. In many realistic situations the flow of physical quantities in the network, as characterized by the loads on nodes, is important. We show that for such networks where loads can redistribute among the nodes, intentional attacks can lead to a cascade of overload failures, which can in turn cause the entire or a substantial part of the network to collapse. This is relevant for real-world networks that possess a highly heterogeneous distribution of loads, such as the Internet and power grids. We demonstrate that the heterogeneity of these networks makes them particularly vulnerable to attacks in that a large-scale cascade may be triggered by disabling a single key node. This brings obvious concerns on the security of such systems.

Their key finding is that self-healing networks have pretty good resistance to natural failures, but not to deliberate attacks on high-degree nodes.

This doesn’t apply to all peer-to-peer apps. BitTorrent, for example, doesn’t do load balancing in a way that would make it susceptible. A distributed hash table, on the other hand, would be very susceptible.

Thanks to Anatole Shaw for the link.


Update: Gojomo points out that I’m dead wrong for DHTs with each real node at multiple places in the virtual space. He’s right, and not just when each real node is at multiple virtual places. This attack only applies to networks with uneven distributions, which is exactly what the original paper said.

Lucas Gonze

Related link: http://www.aei.brookings.org/publications/abstract.php?pid=302

A paper from the Brookings Institution argues that a fundamentalist approach to copyright is harmful to the market for music. From the abstract:

This article questions the economic justification for copyright law’s prohibition against unauthorized copying. Building on the thesis of Stephen Breyer’s 1970 Harvard Law Review article, “The Uneasy Case for Copyright,” it contends that not only may copyright law’s prohibition against unauthorized copying (17 U.S.C. §106) not be necessary to stimulate an optimal level of new creations, but that §106 appears to have a net negative effect on such output!

The best case we can make against the copyright fundamentalists is economic. We are for deregulating the music industry, they are for government-sanctioned monopoly. We want a free market, they want protection from competition. This paper develops that argument in a disciplined way.

Bill Venners

Related link: http://www.artima.com/suiterunner/start.html

Artima.com has published an article that introduces the main features of Artima SuiteRunner, a free open source testing toolkit, by demonstrating how to start Artima SuiteRunner, debug a failed test, select and run Suites, select and run JUnit TestCases, and edit and save recipe files.

Here’s an excerpt:

Artima SuiteRunner is a free open source testing toolkit for Java released under the Open Software License. You can use this tool with JUnit to run existing JUnit test suites, or standalone to create unit and conformance tests for Java APIs. To get you up and running quickly, the Artima SuiteRunner distribution ZIP file includes a simple example. This article uses the example to introduce the main features of Artima SuiteRunner. This article will show you how to start Artima SuiteRunner using a recipe file, debug a failed test, select and run Suites, select and run JUnit TestCases, and edit and save recipe files.

Bill Venners

Related link: http://www.artima.com/intv/pycontract.html

Artima.com has published Part IV of an interview with Python creator Guido van Rossum, in which he talks about the nature of contracts in a runtime-typed programming language such as Python.

Here’s an excerpt:

In general in Python, there is a contract, but the contract is implicit. The contract isn’t specified by an interface. There’s nothing in what the parser sees at least that says x has to be an object that supports readline that you can call with no arguments and it returns a string that means a certain thing. But that contract is certainly in the documentation or specification.

In Java, if you say this is something that has a readline method that returns a string, what does it mean? Do you expect it to always return the same string? Does it ever return an empty string? There are all sorts of things that aren’t expressed by that interface that you still have to specify in documentation. That’s where the interesting competition between the different languages exists.
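The implicit contract Guido describes in the excerpt can be shown in a few lines. In this sketch, first_line_length() and the CannedLines class are made up for illustration; the only thing the function requires of x is a readline() that takes no arguments and returns a string, and nothing in the code declares that requirement.

```python
import io


def first_line_length(x):
    """Accept any object whose readline() takes no arguments and returns a str."""
    return len(x.readline().rstrip("\n"))


class CannedLines:
    """Not a file at all, but it honours the same implicit contract."""

    def __init__(self, lines):
        self._lines = list(lines)

    def readline(self):
        # Like a file, return an empty string once the input is exhausted.
        return self._lines.pop(0) if self._lines else ""


if __name__ == "__main__":
    print(first_line_length(io.StringIO("hello\nworld\n")))
    print(first_line_length(CannedLines(["abc\n"])))
```

What the returned string means, and whether it may be empty, is exactly the part of the contract that lives in documentation rather than in the code, in Python and in a Java interface alike.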