Article:   Keeping Your Life in Subversion
Subject:   What about file integrity checking?
Date:   2005-01-10 06:38:23
From:   chris_marshall
I use CVS to synchronize my home directories across several machines, as you do, although I only use it for synchronization, not history. Also like you, I manage larger collections of files (my music and pictures) via rsync.


The directory I manage this way is where I keep my most important text files. I call it sys-config because I started it as a way of storing the config files


I periodically wipe out my CVS repository and re-import sys-config because of how hard it is to rename files and directories and move them around. This doesn't bother me since I am not interested in keeping track of history for sys-config.


I am curious how you handle the threat of one of your hard disks rotting out from under you, especially for files you haven't looked at in years. How do you know they aren't decaying?


You need to automate the generation and checking of md5sums to fight that. I have toyed around with some scripts but have not settled on one approach yet. You need a way to separate the files that change a lot from the ones that don't, or you will generate a lot of false alarms (md5sum checks will fail for any file that you edited without recalculating its checksum).
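One way this could look, as a rough sketch rather than a settled approach: record an MD5 only for files that have not been modified for a while, and on later checks treat a changed modification time as an edit and an unchanged modification time with a different MD5 as possible decay. The function names, the 30-day threshold, and the in-memory manifest format are all assumptions made for illustration.

```python
import hashlib
import os
import time


def md5_of(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root, min_age_days=30, now=None):
    """Map relative path -> (mtime, md5) for files untouched for min_age_days.

    Recently modified files are skipped so that active files do not
    produce false alarms later (the threshold is an assumed default).
    """
    now = time.time() if now is None else now
    cutoff = now - min_age_days * 86400
    manifest = {}
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(dirpath, name)
            mtime = os.path.getmtime(path)
            if mtime <= cutoff:
                manifest[os.path.relpath(path, root)] = (mtime, md5_of(path))
    return manifest


def check_manifest(root, manifest):
    """Return (decayed, edited, missing) lists of relative paths."""
    decayed, edited, missing = [], [], []
    for rel, (old_mtime, old_md5) in manifest.items():
        path = os.path.join(root, rel)
        if not os.path.exists(path):
            missing.append(rel)
        elif os.path.getmtime(path) != old_mtime:
            edited.append(rel)    # deliberate change: re-record, don't alarm
        elif md5_of(path) != old_md5:
            decayed.append(rel)   # same mtime, different content
    return decayed, edited, missing
```

Only the `decayed` list would be worth an alert; `edited` entries simply need their checksums recorded again. A real script would also have to save the manifest somewhere safer than the disk it is checking.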



  • What about file integrity checking?
    2005-01-12 12:44:12  joeyh

    This is a very good question.

    First, note that subversion uses a Berkeley DB database with only a few files, not one file per file in the repository, unless you're using the new FSFS backend, with which I am not completely familiar.

    It would be nice if subversion kept an md5sum or something similar of what should be in the repository database files, so it could detect small changes due to disk problems. Then you could detect repository breakage immediately instead of continuing to use a broken repository. Unfortunately, it does not do this.

    What it does do is keep an md5 checksum of each revision of each file in the repository. It checks these checksums at various points during updates and so on, so if a file in your working copy gets corrupted, it will detect that. It will also detect a disk problem that corrupts the content of a file stored in the repository. For example:

    svn: Checksum mismatch on rep '1':
    expected: a70149cb192b21fe371f05ca73e65416
    actual: a2627e0896be2165de90823eaa240a56

    So I don't consider svn's checksum coverage to be complete, but I am sure that I'm getting out the same files that I checked in. To protect against general repository corruption that is not caught by subversion's internal checks, you need to do backups of your repository, preferably using something like svnadmin dump or hot-backup.
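    A dump-based backup along these lines might be scripted as follows. This is only a sketch: the function names, the `--quiet` choice, and the idea of returning a SHA-1 of the dump file are illustrative assumptions, not a description of any particular setup. It relies on `svnadmin dump` writing the dump stream to standard output.

```python
import hashlib
import subprocess


def dump_command(repo_path):
    """Build the svnadmin invocation; --quiet suppresses progress on stderr."""
    return ["svnadmin", "dump", "--quiet", repo_path]


def dump_repository(repo_path, out_path, runner=subprocess.run):
    """Write a full dump of repo_path to out_path and return its SHA-1.

    `runner` is a hypothetical hook so the function can be exercised
    without a real repository; by default it is subprocess.run.
    """
    with open(out_path, "wb") as out:
        # check=True raises CalledProcessError if svnadmin exits non-zero
        runner(dump_command(repo_path), stdout=out, check=True)

    digest = hashlib.sha1()
    with open(out_path, "rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

    Storing the returned digest alongside the dump gives a later way to confirm that the backup file itself has not changed.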

    Personally, I do offsite incremental backups of my svn repository using duplicity; these end up gpg encrypted and sha1-summed, and are written to reliable media.