Subversion for CVS Users - Version Control with Subversion by Ben Collins-Sussman, Brian W. Fitzpatrick, C. Michael Pilato
This appendix is a guide for CVS users new to Subversion. It’s essentially a list of differences between the two systems as “viewed from 10,000 feet.” For each section, we provide references to relevant chapters when possible.
Although the goal of Subversion is to take over the current and future CVS user base, some new features and design changes were required to fix certain “broken” behaviors that CVS had. This means that, as a CVS user, you may need to break habits—ones that you forgot were odd to begin with.
This excerpt is from Version Control with Subversion. Written by members of the development community that maintains Subversion, this is the official guide and reference manual for the popular open source revision control technology. The new edition covers Subversion 1.5 and includes an introduction to Subversion, a guided tour of the capabilities and structure, detailed coverage of advanced topics, such as branching and repository administration, and best practice recommendations.
In CVS, revision numbers are per file. This is because CVS stores its data in RCS files; each file has a corresponding RCS file in the repository, and the repository is roughly laid out according to the structure of your project tree.
In Subversion, the repository looks like a single filesystem. Each commit results in an entirely new filesystem tree; in essence, the repository is an array of trees. Each of these trees is labeled with a single revision number. When someone talks about “revision 54,” he’s talking about a particular tree (and indirectly, the way the filesystem looked after the 54th commit).
Technically, it’s not valid to talk about “revision 5 of foo.c.” Instead, one would say “foo.c as it appears in revision 5.” Also, be careful when making assumptions about the evolution of a file. In CVS, revisions 5 and 6 of foo.c are always different. In Subversion, it’s most likely that foo.c did not change between revisions 5 and 6.
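The "array of trees" idea can be sketched with a toy model. This is only an illustration and not how Subversion stores data on disk; the helper names `commit` and `file_changed`, and the file contents, are made up for the example.

```python
# Toy model: the repository is a list of whole-tree snapshots.
# Index i holds the tree as it looked after the i-th commit.

def commit(revisions, changes):
    """Append a new tree built from the latest tree plus `changes`."""
    new_tree = dict(revisions[-1])   # start from the previous tree
    new_tree.update(changes)         # apply only the paths that changed
    revisions.append(new_tree)
    return len(revisions) - 1        # one revision number for the whole tree

def file_changed(revisions, path, old, new):
    """True if `path` differs between two repository-wide revisions."""
    return revisions[old].get(path) != revisions[new].get(path)

revisions = [{}]                                              # revision 0: empty tree
r1 = commit(revisions, {"foo.c": "int x;", "bar.c": "v1"})
r2 = commit(revisions, {"bar.c": "v2"})                       # foo.c is untouched

print(file_changed(revisions, "foo.c", r1, r2))  # False
print(file_changed(revisions, "bar.c", r1, r2))  # True
```

Every commit bumps the single revision counter, yet foo.c is identical in both trees, which is why "foo.c as it appears in revision N" is the safer phrasing.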
Similarly, in CVS, a tag or branch is an annotation on the file or on the version information for that individual file, whereas in Subversion, a tag or branch is a copy of an entire tree (by convention, into the /branches or /tags directories that appear at the top level of the repository, beside /trunk). In the repository as a whole, many versions of each file may be visible: the latest version on each branch, every tagged version, and of course the latest version on the trunk itself. So, to refine the terms even further, one would often say “foo.c as it appears in /branches/REL1 in revision 5.”
For more details on this topic, see the section called “Revisions”.
Here’s what this means to you, as a former CVS user:
The svn add and svn delete commands work on directories now, just as they work on files. So do svn copy and svn move. However, these commands do not cause any kind of immediate change in the repository. Instead, the working items are simply “scheduled” for addition or deletion. No repository changes happen until you run svn commit.
Directories aren’t dumb containers anymore; they have revision numbers like files. (Or more properly, it’s correct to talk about “directory foo/ in revision 5.”)
Let’s talk more about that last point. Directory versioning is a hard problem; because we want to allow mixed-revision working copies, there are some limitations on how far we can abuse this model.
From a theoretical point of view, we define “revision 5 of directory foo” to mean a specific collection of directory entries and properties. Now, suppose we start adding and removing files from foo, and then commit. It would be a lie to say that we still have revision 5 of foo. However, if we bumped foo’s revision number after the commit, that would be a lie, too; there may be other changes to foo we haven’t yet received, because we haven’t updated yet.
Subversion deals with this problem by quietly tracking committed adds and deletes in the .svn area. When you eventually run svn update, all accounts are settled with the repository, and the directory’s new revision number is set correctly. Therefore, only after an update is it truly safe to say that you have a “perfect” revision of a directory. Most of the time, your working copy will contain “imperfect” directory revisions.
Similarly, a problem arises if you attempt to commit property changes on a directory. Normally, the commit would bump the working directory’s local revision number. But again, that would be a lie, as there may be adds or deletes that the directory doesn’t yet have, because no update has happened. Therefore, you are not allowed to commit property changes on a directory unless the directory is up to date.
For more discussion about the limitations of directory versioning, see the section called “Mixed Revision Working Copies”.
In recent years, disk space has become outrageously cheap and abundant, but network bandwidth has not. Therefore, the Subversion working copy has been optimized around the scarcer resource.
The .svn administrative directory serves the same purpose as the CVS directory, except that it also stores read-only, “pristine” copies of your files. This allows you to do many things offline:
- svn status
- svn diff
- svn revert
Also, the cached pristine files allow the Subversion client to send differences when committing, which CVS cannot do.
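To see why a cached pristine copy makes these operations possible without a network connection, here is a minimal sketch. It assumes the pristine and working contents are already in memory as strings; `local_diff` and `revert` are illustrative names, not part of any Subversion API.

```python
import difflib

def local_diff(pristine_text, working_text, path):
    """Unified diff between the cached pristine copy and the working file."""
    return list(difflib.unified_diff(
        pristine_text.splitlines(keepends=True),
        working_text.splitlines(keepends=True),
        fromfile=f"{path} (pristine)",
        tofile=f"{path} (working copy)",
    ))

def revert(pristine_text):
    """Reverting restores the pristine contents; no server is contacted."""
    return pristine_text

pristine = "int main(void)\n{\n    return 0;\n}\n"
working = "int main(void)\n{\n    return 1;\n}\n"

# Everything needed for the diff is already on the local disk.
print("".join(local_diff(pristine, working, "foo.c")))
```

The same locally computed difference is what a client could send at commit time instead of the whole file.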
The last subcommand in the list, svn revert, is new. It will not only remove local changes, but also unschedule operations such as adds and deletes. Although deleting the file and then running svn update will still work, doing so distorts the true purpose of updating. And, while we’re on this subject…
The cvs status command has two
purposes: first, to show the user any local modifications in the working
copy, and second, to show the user which files are out of date.
Unfortunately, because of CVS’s hard-to-read status output, many CVS users
don’t take advantage of this command at all. Instead, they’ve developed a
habit of running cvs update or cvs -n
update to quickly see their changes. If users forget to use the
-n option, this has the side effect of merging repository
changes they may not be ready to deal with.
Subversion removes this muddle by making the output of svn status easy to read for both humans and parsers. Also, svn update prints only information about files that are updated, not local modifications.
svn status prints all files that have local modifications. By default, the repository is not contacted. While this subcommand accepts a fair number of options, the following are the most commonly used ones:
-u: Contact the repository to determine, and then display, out-of-dateness information.
-v: Show all entries under version control.
-N: Run nonrecursively (do not descend into subdirectories).
The svn status command has two output formats. In the default “short” format, local modifications look like this:
    $ svn status
    M       foo.c
    M       bar/baz.c

When you pass the -u option, svn status uses a longer output format:

    $ svn status -u
    M            1047   foo.c
           *     1045   faces.html
           *            bloo.png
    M            1050   bar/baz.c
    Status against revision:   1066
In this case, two new columns appear. The second column contains an asterisk if the file or directory is out of date. The third column shows the working copy’s revision number of the item. In the previous example, the asterisk indicates that faces.html would be patched if we updated, and that bloo.png is a newly added file in the repository. (The absence of any revision number next to bloo.png means that it doesn’t yet exist in the working copy.)
At this point, you should take a quick look at the list of all possible status codes in “svn status” in Chapter 9, Subversion Complete Reference. Here are a few of the more common status codes you’ll see:
    A    Resource is scheduled for Addition
    D    Resource is scheduled for Deletion
    M    Resource has local Modifications
    C    Resource has Conflicts (changes have not been completely merged between the repository and working copy version)
    X    Resource is eXternal to this working copy (may come from another repository). See the section called “Externals Definitions”
    ?    Resource is not under version control
    !    Resource is missing or incomplete (removed by a tool other than Subversion)
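Because the short format is meant to be easy for parsers, a script can consume it with very little code. The sketch below is a simplified illustration: it assumes the first character of each line is the status code and that paths contain no spaces, and `parse_status` and the sample file `notes.txt` are hypothetical.

```python
# Meanings for the common status codes listed above.
STATUS_CODES = {
    "A": "scheduled for addition",
    "D": "scheduled for deletion",
    "M": "local modifications",
    "C": "conflicts",
    "X": "external",
    "?": "not under version control",
    "!": "missing or incomplete",
    " ": "no local modifications",
}

def parse_status(output):
    """Parse short-format `svn status` lines into (code, meaning, path) tuples."""
    entries = []
    for line in output.splitlines():
        # Skip blank lines and the summary line printed by `svn status -u`.
        if not line.strip() or line.startswith("Status against revision"):
            continue
        code = line[0]              # assumption: the first column is the status code
        path = line.split()[-1]     # assumption: the path is the last field, no spaces
        entries.append((code, STATUS_CODES.get(code, "unknown"), path))
    return entries

sample = "M       foo.c\nM       bar/baz.c\n?       notes.txt\n"
for code, meaning, path in parse_status(sample):
    print(f"{path}: {meaning}")
```

A real tool would need to handle the additional columns and paths with spaces; the complete reference describes the full layout.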
For more details on svn status, see the section called “See an overview of your changes”.
svn update updates your working copy, and prints only information about files that it updates.
Subversion has combined CVS’s P and U codes into just U. When a merge or conflict occurs, Subversion simply prints G or C, rather than a whole sentence about it.
For more details on svn update, see the section called “Update Your Working Copy”.
Subversion doesn’t distinguish between filesystem space and “branch” space; branches and tags are ordinary directories within the filesystem. This is probably the single biggest mental hurdle that a CVS user will need to cross. Read all about it in Chapter 4, Branching and Merging.
Since Subversion treats branches and tags as ordinary directories, your project’s various lines of development probably live in subdirectories of the main project directory. So remember to check out using the URL of the subdirectory that contains the particular line of development you want, not the project’s root URL. If you make the mistake of checking out the root of the project, you may very well wind up with a working copy that contains a complete copy of your project’s content for each and every one of its branches and tags.
A new feature of Subversion is that you can attach arbitrary metadata (or “properties”) to files and directories. Properties are arbitrary name/value pairs associated with files and directories in your working copy.
For more information, see the section called “Properties”.
CVS marks conflicts with inline “conflict markers,” and
then prints a
C during an update or
a merge operation. Historically, this has caused problems because CVS
isn’t doing enough. Many users forget about (or don’t see) the
C after it whizzes by on their terminal. They
often forget that the conflict markers are even present, and then
accidentally commit files containing those conflict markers.
Subversion solves this problem in a pair of ways. First, when a conflict occurs in a file, Subversion records the fact that the file is in a state of conflict and won’t allow you to commit changes to that file until you explicitly resolve the conflict. Second, Subversion 1.5 provides interactive conflict resolution, which allows you to resolve conflicts as they happen instead of having to go back and do so after the update or merge operation completes. See the section called “Resolve Conflicts (Merging Others’ Changes)” for more about conflict resolution in Subversion.
In the most general sense, Subversion handles binary files more gracefully than CVS does. Because CVS uses RCS, it can only store successive full copies of a changing binary file. Subversion, however, expresses differences between files using a binary differencing algorithm, regardless of whether they contain textual or binary data. That means all files are stored differentially (compressed) in the repository.
Where CVS users must mark binary files with the -kb flag to keep keyword expansion and line-ending translation from garbling them, Subversion takes the more paranoid route. First, it never performs any kind of keyword or line-ending translation unless you explicitly ask it to do so (see the section called “Keyword Substitution” and the section called “End-of-Line Character Sequences” for more details). By default, Subversion treats all file data as literal byte strings, and files are always stored in the repository in an untranslated state.
Second, Subversion maintains an internal notion of whether a file is “text” or “binary” data, but this notion is only extant in the working copy. During an svn update, Subversion will perform contextual merges on locally modified text files, but it will not attempt to do so for binary files.
To determine whether a contextual merge is possible, Subversion examines the svn:mime-type property. If the file has no svn:mime-type property, or has a MIME type that is textual (e.g., text/*), Subversion assumes it is text. Otherwise, Subversion assumes the file is binary. Subversion also helps users by running a binary-detection algorithm in the svn import and svn add commands. These commands will make a good guess and then (possibly) set a binary svn:mime-type property on the file being added. (If Subversion guesses wrong, the user can always remove or hand-edit the property.)
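The text-versus-binary rule described above is small enough to write down. This is a simplified sketch of the rule as stated here, not Subversion's actual implementation; `is_text` is an illustrative name, and `None` stands in for a file with no svn:mime-type property.

```python
def is_text(mime_type):
    """Apply the rule above: no svn:mime-type, or a text/* type, means text."""
    return mime_type is None or mime_type.startswith("text/")

# Only files judged to be text are candidates for a contextual merge.
for mime in (None, "text/plain", "image/png", "application/octet-stream"):
    kind = "text" if is_text(mime) else "binary"
    print(f"{mime!r}: {kind}")
```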
Unlike CVS, a Subversion working copy is aware that it has checked out a module. That means if somebody changes the definition of a module (e.g., adds or removes components), a call to svn update will update the working copy appropriately, adding and removing components.
Subversion defines modules as a list of directories within a directory property; see the section called “Externals Definitions”.
With CVS’s pserver, you are required to log into the server (using the cvs login command) before performing any read or write operation—you sometimes even have to log in for anonymous operations. With a Subversion repository using Apache httpd or svnserve as the server, you don’t provide any authentication credentials at the outset—if an operation requires authentication, the server will challenge you for your credentials (whether those are username and password, a client certificate, or even both). So, if your repository is world-readable, you will not be required to authenticate at all for read operations.
The exception to this behavior, however, is in the case of accessing an svnserve server over an SSH tunnel, using the svn+ssh:// URL scheme. In that case, the ssh program unconditionally demands authentication just to start the tunnel.
Perhaps the most important way to familiarize CVS users with Subversion is to let them continue to work on their projects using the new system. And while that can be somewhat accomplished using a flat import into a Subversion repository of an exported CVS repository, the more thorough solution involves transferring not just the latest snapshot of their data, but all the history behind it as well, from one system to another. This is an extremely difficult problem to solve; it involves deducing changesets in the absence of atomicity and translating between the systems’ completely orthogonal branching policies, among other complications. Still, a handful of tools claim to at least partially support the ability to convert existing CVS repositories into Subversion ones.
The most popular (and mature) conversion tool is cvs2svn (http://cvs2svn.tigris.org/), a Python program originally created by members of Subversion’s own development community. This tool is meant to run exactly once: it scans your CVS repository multiple times and attempts to deduce commits, branches, and tags as best it can. When it finishes, the result is either a Subversion repository or a portable Subversion dump file representing your code’s history. See the web site for detailed instructions and caveats.
Footnote on checking out the root of the project: That is, providing you don’t run out of disk space before your checkout finishes.