For the past several months, I've been researching and developing a "graph visualization" system. That's the technical term for the burgeoning field of creating pretty pictures from relational data. To those of you not steeped in graph theory, "graph" in this context refers not to the familiar X-axis and Y-axis plots from high school algebra but instead to a set of "nodes" that may be connected by "edges" to indicate a relationship. Figure 1 shows a few examples as they might appear in a computer science textbook, where graphs occupy roughly the same position as stacks and queues in the pantheon of data structures.
The power of graphs lies in their abstraction — nodes can represent cities, network hubs, or Mafia figures, and the edges can represent interstate connections, T1 lines, or who whacked who respectively. That abstraction, however, comes with a price: there is no inherent spatial relationship among the nodes, so if you want to see what a graph looks like, you must compute the position of each node yourself.
The field of graph visualization is devoted to finding ways to create meaningful layouts from among the millions of possible node positions. (If you're interested in learning more, Roberto Tamassia of Brown University's Center for Geometric Computing maintains a great list of graph resources.)
Since these diagrams summarize complex information quickly and neatly, graph visualization systems are tremendously useful analytical tools. Toward that end, I've been developing a system to produce large numbers of quality drawings that can be easily managed and manipulated.
Figure 1. Example graphs
This is the story of an unexpected discovery along the way.
Although there are several visualization programs floating around the Web, none did exactly what I wanted at a price I could afford. After some research, I decided to write my own system using a force-directed layout algorithm.
The method, based on a physical simulation, models nodes as charged particles and edges as springs. Initially, the nodes are plopped down randomly in a plane, and the program computes the force vectors that result as the particles push each other apart while the springs pull them together. Each node's position updates accordingly. The process repeats until the system reaches equilibrium. While it can take thousands of iterations to draw even a simple graph, the method is simple to code and works surprisingly well in a variety of circumstances.
Once I'd completed the prototype, I was ready to generate lots of samples. Since combing Introduction to Algorithms for sample graphs was truly painstaking, I hit upon a more automated approach: using the letters in a word as nodes and the relationship between letters as edges. For example, the word "david" can be converted into a graph with four nodes ("d", "a", "v", and "i") and four edges ("d-a", "a-v", "v-i", and "i-d"). The following table shows "david" encoded as an XML input file to the program and the program's resulting output.
|"David" Graph in XML Format||"David" Graph Drawing|
This approach has several advantages. First, and most obvious, there are lots and lots of words; you can download free "word lists" with thousands of entries, so finding test data is a breeze. Second, words are easy to convert into graphs — it took all of 10 minutes to write a program to encode a word into the XML-format shown in the example. Third, words have many different structures, so they produce a variety of shapes. Finally, and most important, it's easy to verify that the program works correctly since there is always a route through the nodes that spells the original word. (In the example above, we can find "david" by starting at "d" and working our way counterclockwise.)
Here are a few representative examples of the diversity of shapes found in just six words. Assuming you can use the same letter multiple times, notice how you can always find a route through the edges to spell the original word.
At this point the law of unintended consequences took over. While I had planned to build a sophisticated data analysis tool by merging ideas from physics and graph theory, everyone who saw the test cases were so intrigued that the conversation usually never got past "Cool — 'utilitarian' looks like a rocket from the 50s!" or (from a chemist friend) "Ohhh, that one looks like Benzene!"
So, sensing an opportunity to provide the world with an amusing diversion based on graph theory — and how many people have that on their resumes? — I've developed a puzzle based on the idea. I hope you enjoy the samples on the O'Reilly Network over the coming month, and I look forward to your comments.
Combining elements of crosswords and mazes, RouteWord presents the player with a clue and a network of linked letters. The challenge is to find the route through the letters to reveal the hidden word.
|Argentina Puzzle||Argentina Solution|
Each letter of the solution appears once in the diagram, no matter how many times it appears in the word.
Consecutive double letter pairs (i.e., the "ss" in "impressed") are treated as a single letter.
Letters and links (both forwards and backwards) can be used multiple times.
Links may cross, but this does not indicate anything special.
There is always a route through the edges that reveals the solution.
Example 1. "diversion"
Visit http://www.routeword.com/ to sign up to receive a daily RouteWord via email, as well as to create and submit your own words and clues for publication.
Andrew Odewahn is the CTO of O'Reilly Media. The author of two books on database development, he has experience as a software developer and consultant in a number of industries, including manufacturing, pharmaceuticals, and publishing.
Return to ONLamp.com.
Copyright © 2009 O'Reilly Media, Inc.