Version Control with Subversion: Introductionby Ben Collins-Sussman, Brian W. Fitzpatrick, C. Michael Pilato
[ Editor's Note: This is a preview of Chapter 1 from O'Reilly's upcoming Version Control with Subversion. ]
Version control is the art of managing changes to information. It has long been a critical tool for programmers, who typically spend their time making small changes to software and then undoing those changes the next day. But the usefulness of version control software extends far beyond the bounds of the software development world. Anywhere you can find people using computers to manage information that changes often, there is room for version control. And that's where Subversion comes into play.
This chapter contains a high-level introduction to Subversion: what it is; what it does; how to get it.
What is Subversion?
Subversion is a free/open-source version control system. That is, Subversion manages files and directories over time. Atree of files is placed into a central repository. The repository is much like an ordinary file server, except that it remembers every change ever made to the files and directories. This lets you recover older versions of data, or examine the history of how your data changed. In this regard, many people think of a version control system as a sort of time machine.
Subversion can access its repository across networks, which allows it to be used by people on different computers. At some level, the ability for various people to modify and manage the same set of data from their respective locations fosters collaboration. Progress can occur more quickly without a single conduit through which all modifications must occur. And because the work is versioned, you need not fear that quality is the trade-off for losing that conduit—if some incorrect change is made to the data, just undo that change.
Some version control systems are also software configuration management (SCM) systems. These systems are specifically tailored to manage trees of source code, and have many features that are specific to software development—such as nativelyunderstanding programming languages, or supplying tools for building software. Subversion, however, is not one of these systems. It is a general system that can be used to manage any collection of files. For you, those files might be source code—for others, anything from grocery shopping lists to digital video mixdowns and beyond.
In early 2000, CollabNet, Inc. began seeking developers to write a replacement for CVS. CollabNet offers a collaboration software suite called SourceCast, of which one component is version control. Although SourceCast used CVS as its initial version control system, CVS's limitations were obvious from the beginning, and CollabNet knew it would eventually have to find something better. Unfortunately, CVS had become the de facto standard in the open source world largely because there wasn't anything better, at least not under a free license. So CollabNet determined to write a new version control system from scratch, retaining the basic ideas of CVS, but without the bugs and misfeatures.
In February 2000, they contacted Karl Fogel, the author of Open Source Development with CVS (Coriolis, 1999), and asked if he'd like to work on this new project. Coincidentally, at the time Karl was already discussing a design for a new version control system with his friend Jim Blandy. In 1995, the two had started Cyclic Software, a company providing CVS support contracts, and although they later sold the business, they still used CVS every day at their jobs. Their frustration with CVS had led Jim to think carefully about better ways to manage versioned data, and he'd already come up with not only the name Subversion, but also with the basic design of the Subversion repository. When CollabNet called, Karl immediately agreed to work on the project, and Jim got his employer, RedHat Software, to essentially donate him to the project for an indefinite period of time. CollabNet hired Karl and Ben Collins-Sussman, and detailed design work began in May. With the help of some well-placed prods from Brian Behlendorf and Jason Robbins of CollabNet, and Greg Stein (at the time an independent developer active in the WebDAV/DeltaV specification process), Subversion quickly attracted a community of active developers. It turned out that many people had had the same frustrating experiences with CVS, and welcomed the chance to finally do something about it.
The original design team settled on some simple goals. They didn't want to break new ground in version control methodology, they just wanted to fix CVS. They decided that Subversion would match CVS's features, and preserve the same development model, but not duplicate CVS's most obvious flaws. And although it did not need to be a drop-in replacement for CVS, it should be similar enough that any CVS user could make the switch with little effort.
After fourteen months of coding, Subversion became self-hosting on August 31, 2001. That is, Subversion developers stopped using CVS to manage Subversion's own source code, and started using Subversion instead.
While CollabNet started the project, and still funds a large chunk of the work (it pays the salaries of a few full-time Subversion developers), Subversion is run like most open-source projects, governed by a loose, transparent set of rules that encourage meritocracy. CollabNet's copyright license is fully compliant with the Debian Free Software Guidelines. In other words, anyone is free to download, modify, and redistribute Subversion as he pleases; no permission from CollabNet or anyone else is required.
When discussing the features that Subversion brings to the version control table, it is often helpful to speak of them in terms of how they improve upon CVS's design. If you're not familiar with CVS, you may not understand all of these features. And if you're not familiar with version control at all, your eyes may glaze over unless you first read Chapter 2, in which we provide a gentle introduction to version control in general.
- Directory versioning
- CVS only tracks the history of individual files, but Subversion implements a virtual versioned filesystem that tracks changes to whole directory trees over time. Files and directories are versioned.
- True version history
- Since CVS is limited to file versioning, operations such as copies and renames— which might happen to files, but which are really changes to the contents of some containing directory—aren't supported in CVS. Additionally, in CVS you cannot replace a versioned file with some new thing of the same name without the new item inheriting the history of the old—perhaps completely unrelated— file. With Subversion, you can add, delete, copy, and rename both files and directories. And every newly added file begins a with a fresh clean history all its own.
- Atomic commits
- Acollection of modifications either goes into the repository completely, or not at all. This allows developers to construct and commit changes as logical chunks, and prevents problems that can occur when only a portion of a set of changes is successfully sent to the repository.
- Versioned metadata
- Each file and directory has a set of properties—keys and their values— associated with it. You can create and store any arbitrary key/value pairs you wish. Properties are versioned over time, just like file contents.
- Choice of network layers
- Subversion has an abstracted notion of repository access, making it easy for people to implement new network mechanisms. Subversion can plug into the Apache HTTP Server as an extension module. This gives Subversion a big advantage in stability and interoperability, and instant access to existing features provided by that server—authentication, authorization, wire compression, and so on. Amore lightweight, standalone Subversion server process is also available. This server speaks a custom protocol which can be easily tunneled over SSH.
- Consistent data handling
- Subversion expresses file differences using a binary differencing algorithm, which works identically on both text (human-readable) and binary (humanunreadable) files. Both types of files are stored equally compressed in the repository, and differences are transmitted in both directions across the network.
- Efficient branching and tagging
- The cost of branching and tagging need not be proportional to the project size. Subversion creates branches and tags by simply copying the project, using a mechanism similar to a hard-link. Thus these operations take only a very small, constant amount of time.
- Subversion has no historical baggage; it is implemented as a collection of shared C libraries with well-defined APIs. This makes Subversion extremely maintainable and usable by other applications and languages.
Figure 1-1 illustrates what one might call a mile-high view of Subversion's design.
Figure 1-1. Subversion's architecture
On one end is a Subversion repository that holds all of your versioned data. On the other end is your Subversion client program, which manages local reflections of portions of that versioned data (called working copies). Between these extremes are multiple routes through various Repository Access (RA) layers. Some of these routes go across computer networks and through network servers which then access the repository. Others bypass the network altogether and access the repository directly.
Subversion is built on a portability layer called APR (the Apache Portable Runtime library). This means Subversion should work on any operating system that the Apache httpd server runs on: Windows, Linux, all flavors of BSD, Mac OS X, Netware, and others.
The easiest way to get Subversion is to download a binary package built for your operating system. Subversion's website often has these packages available for download, posted by volunteers. The site usually contains graphical installer packages for users of Microsoft operating systems. If you run a Unix-like operating system, you can use your system's native package distribution system (RPMs, DEBs, the ports tree, etc.) to get Subversion.
Alternately, you can build Subversion directly from source code. From the Subversion website, download the latest source-code release. After unpacking it, follow the instructions in the INSTALL file to build it. Note that a released source package contains everything you need to build a command-line client capable of talking to a remote repository (in particular, the apr, apr-util, and neon libraries). But optional portions of Subversion have many other dependencies, such as Berkeley DB and possibly Apache httpd. If you want to do a complete build, make sure you have all of the packages documented in the INSTALL file. If you plan to work on Subversion itself, you can use your client program to grab the latest, bleeding-edge source code. This is documented in "Get the Source Code."
Subversion, once installed, has a number of different pieces. The following is a quick overview of what you get. Don't be alarmed if the brief descriptions leave you scratching your head—there are plenty more pages in this book devoted to alleviating that confusion.
- The command-line client program
- Aprogram for reporting the state (in terms of revisions of the items present) of a working copy
- A tool for inspecting a Subversion repository
- A tool for creating, tweaking or repairing a Subversion repository
- A program for filtering Subversion repository dumpfile format streams
- Aplug-in module for the Apache HTTP Server, used to make your repository available to others over a network
- Acustom standalone server program, runnable as a daemon process or invokable by SSH; another way to make your repository available to others over a network
Assuming you have Subversion installed correctly, you should be ready to start. The next two chapters will walk you through the use of svn, Subversion's command-line client program.