 Published on ONLamp.com (http://www.onlamp.com/)


FreeBSD Basics: Understanding CPIO

by Dru Lavigne
07/11/2002

In the previous article, I demonstrated the usage of the tar archiver utility. This week I'll continue by introducing the cpio archiver utility.

While both tar and cpio will achieve the same results, the cpio utility approaches things a little bit differently. The tar utility assumes that you want to recursively archive everything under the specified directory or directories, meaning that you have to explicitly tell tar if you want to exclude certain portions of that directory structure. In contrast, the cpio utility expects to be explicitly told which files or directories you wish to archive, and it reads that list from standard input. In other words, cpio expects to receive a list that contains one file per line, and if you remember from the "Finding Things in Unix" and "Find: Part Two" articles, that is exactly the type of list that the find utility creates. The ls utility can also create this type of list, so you will usually see either ls or find used in conjunction with cpio. And since cpio archives the list of files it receives from standard input, you usually use a pipe (|) whenever you create an archive with the cpio utility.
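As a rough sketch of the difference between the two kinds of list, the following script builds a throwaway directory and prints what ls and find each produce. The directory and file names are made up for illustration, and the script assumes mktemp is available on your system.

```shell
#!/bin/sh
# Scratch tree with hypothetical names, created under a temporary directory.
work=$(mktemp -d)
mkdir -p "$work/www/zope"
touch "$work/www/index.html" "$work/www/zope/Makefile"
cd "$work/www" || exit 1

# ls names only the entries of the current directory (one per line when piped).
ls_list=$(ls)

# find walks the whole tree and prints every path it visits.
find_list=$(find . -print | sort)

printf 'list from ls:\n%s\n\n' "$ls_list"
printf 'list from find:\n%s\n' "$find_list"

# Clean up the scratch tree.
cd / && rm -rf "$work"
```

The ls list stops at the name of the zope directory, while the find list also includes the Makefile inside it. Whichever list is piped to cpio is exactly what ends up in the archive.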

The tar utility also assumes that you want to write the archive to your first SCSI tape drive, unless you explicitly specify a file using the f switch. In contrast, the cpio utility writes to what is known as standard output. This means that you will be using a redirector (either < or >) whenever you are creating, listing, or extracting a cpio archive file. Again, that file may be an actual file, or it may be your floppy, or it may be a tape device, since in Unix everything is a file.

This may sound a bit more complicated at first, but a few examples should convince you that it really isn't.

Let's start by creating a cpio archive. In the last article, I created a test user account and created a directory structure named www in this user's home directory so I would have some files on which to practice using the archiving utilities. I'll log in as the test user, cd into the www directory, and see what happens if I use the ls command with the cpio utility:

cd www
ls | cpio -ov > backup.cpio

You'll note that I first cded into the directory that contained the files I wished to archive. I used the ls utility to make a list of the files in the current directory and used a pipe (|) to send that list to the cpio utility. The o switch invokes what is known as "copy out mode," which tells cpio to create an archive. The v switch tells cpio to be verbose, meaning it will list each file as it archives it. Finally, I used the > redirector to write the results (the archive) to a file called backup.cpio. I can call this file anything I like; I chose to give it a cpio extension to remind me that it is a cpio backup file. I can verify the file type using the file utility:

file backup.cpio
backup.cpio: cpio archive

Instead of using the redirector, I could have also used the F switch to specify which file to write the archive to. So the following command will achieve the same results:

ls | cpio -ovF backup.cpio

Once the archive was created, cpio told me how many blocks it wrote to the archive; in my case, it was 48 blocks.

So to create an archive, use the o switch or copy-out mode. To either view or extract the contents of the archive, use what is known as "copy-in mode." You invoke this mode by using the i switch. If you just want to view the contents of the archive, also include the t switch, which will list the contents of the archive without extracting them:

cpio -it < backup.cpio

You'll note that this time I used the other redirector (<), as I wanted the contents of the backup.cpio file to be sent to the cpio utility. I can also include the v switch, if I want to see a verbose listing of the backup:

cpio -itv < backup.cpio

Remember that it is important to view the contents of an archive before attempting to restore it, as you want to ensure that the files don't begin with a /.

To restore this archive, I simply cd into the directory to which I'd like to restore the archive, and repeat the above command without the t switch. I'll cd back into my home directory, create a directory named backup, and do the restore there:

cd
mkdir backup
cd backup
cpio -iv < ~/www/backup.cpio

You'll note something interesting if you try this exercise yourself; if you use the ls -F command, you'll see that you did indeed restore all of the files and directories that were in the www directory. But if you cd into any of those subdirectories, you'll note that they are empty. Even more interestingly, if you try to remove any of those subdirectories with rm, you still have to use rm's R switch, as they are still valid directories.

What happened here? Since the cpio utility received its file list from the ls utility (and ls, used without any switches, only lists the entries in the current directory), cpio was unaware of all of the files that existed below the current directory. Remember, cpio will only archive the files that are sent to it in a list. This may seem odd at first, but it is an ideal way to archive just the files in the current directory. In order to do this with the tar utility, you would have to create an exclude file, as tar wants to recursively copy everything in and below the current directory.

This doesn't mean that cpio can't archive recursively; it simply means that if you want to just archive the current directory, you use ls and if you want to archive recursively, you use find instead.

Let's try that backup and restore again, this time using the find utility. First, I'll remove the old backup and empty out the backup directory:

rm www/backup.cpio
rm -R backup/*

Then I'll cd into the directory I wish to back up (www) and archive its contents:

cd www
find -d . -print | cpio -ov > backup.cpio

When using the find utility with cpio, it is always a good idea to include either the d or the depth switch. Remember from the find article that this switch prevented permissions from interfering with a backup. When using this switch, either put -d right after the word find and before the directory to search (in this case, "."), or put the word -depth after the directory to search, like so:

find . -depth -print | cpio -ov > backup.cpio

So as a recap on the find command, I told find to search the current directory (".") and to "print" its contents; the | was used to send those contents to the cpio utility, which created an archive (-o) and wrote that archive to a file called backup.cpio. When I created this archive, I noted that cpio wrote 43097 blocks, which is many more than the 48 I received with the ls command.

Now let's see what happens when I try to restore this archive:

cd ../backup
cpio -iv < ~/www/backup.cpio

I received an interesting message on my screen when I did this restore:


<snip>
cpio: mod_tsunami/Makefile: No such file or directory
cpio: mod_tsunami/distinfo: No such file or directory
cpio: mod_tsunami/pkg-comment: No such file or directory
cpio: mod_tsunami/pkg-descr: No such file or directory
cpio: mod_tsunami/pkg-plist: No such file or directory
mod_tsunami
Makefile
.
43097 blocks

It looks like cpio read all 43097 blocks but complained about missing files or directories. Indeed, if I do an ls on any of the restored subdirectories, I'll discover that they are once again empty! Don't worry, all of those files and directories are in that archive file; I've simply demonstrated the default extraction behaviour of cpio. Unlike tar, the cpio utility does not create missing directories during the restore unless you specifically ask it to with the d switch. Because find's depth switch lists the contents of a directory before the directory itself, each file reached cpio before the directory it belongs in existed, hence the "No such file or directory" messages. And, unlike tar, the cpio utility will not replace an existing file that is the same age as or newer than the copy in the archive unless you specifically ask it to with the u switch.

So let's try that restore again, this time using the d switch to create the directories and the u switch to overwrite the files I've already restored:

cpio -ivdu < ~/www/backup.cpio

This time I don't receive any error messages and I've successfully restored all of the subdirectories and their files.

There are a few more switches you may consider using when backing up and restoring with cpio. If I compare the modification times of a file before it was archived and after it was restored, I will see this:

ls -l www/zope/Makefile	
-rw-r--r--  1 test  wheel  4308 May 11 09:53 www/zope/Makefile
ls -l backup/zope/Makefile
-rw-r--r--  1 test  wheel  4308 Jun  2 11:38 backup/zope/Makefile
ls -l www/backup.cpio
-rw-r--r--  1 test  wheel  22065664 Jun  2 10:39 www/backup.cpio

You'll note that the original file was last modified on May 11, that it was backed up on June 2 at 10:39, and that it was restored on June 2 at 11:38. If you want the restored file to keep its original modification time, include the m switch when restoring the archive. The a switch, used when creating the archive, resets the access times of the original files after cpio has read them, so the backup itself doesn't disturb them:

cd www
find -d . -print | cpio -ova > backup.cpio
cd ../backup
cpio -ivdm < ~/www/backup.cpio

If you try this and repeat the ls -l command, you'll see that the original times of the archived files were kept intact.

The nice thing about using the find utility with cpio is that you have all of find's switches available to you, to fine-tune which files you would like to back up. For example, if you'd like to do an incremental backup, use find's -newer switch. In this example, I'll back up all of the files in my home directory that have changed since 11 PM on June 1st:

cd
touch -t 06012300 June1
find -d . -newer June1 -print | cpio -ova > backup.cpio

Here I used the touch utility to create an empty file with a timestamp of month 06 day 01 time 2300, then I told find to use the time on that file as the reference point when searching the current directory. Alternatively, if I wasn't concerned so much about the time as the date, I could have used find's atime, ctime, or mtime switches. And if I only want to archive files of a certain size, I can use find's size switch.
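Before trusting an incremental backup, it can help to preview the list that find will hand to cpio. The sketch below does this in a scratch directory with made-up file names. Two details are my own choices rather than part of the command above: I added -type f so that only regular files are listed, and I keep the reference file outside the tree being searched.

```shell
#!/bin/sh
# Scratch "home directory" with one old file and one current file.
work=$(mktemp -d)
mkdir "$work/home"
cd "$work/home" || exit 1

echo "old" > unchanged.txt
touch -t 200201010000 unchanged.txt          # 1 January 2002, 00:00
echo "new" > changed.txt                     # modified just now

# Reference point: 23:00 on 1 June 2002 (hypothetical stamp file).
touch -t 200206012300 "$work/June1"

# Only files modified after the stamp are printed; this is what cpio would receive.
selected=$(find . -depth -type f -newer "$work/June1" -print)
echo "$selected"

# Clean up the scratch tree.
cd / && rm -rf "$work"
```

Once the list looks right, the same find command can be piped to cpio -o. It is worth writing the archive somewhere outside the directory being searched; otherwise the freshly created archive file is itself newer than the reference file and will turn up in the list.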

Before ending today's article, I'd also like to demonstrate cpio's third mode, which is known as "copy-pass mode." This is an interesting mode, as it archives and extracts in the same command, making it ideal for copying one directory structure and recreating it in another location.

Let's say I want to copy the www directory structure from the home directory of the test user to the home directory of the user genisis. I'll have to become the superuser, as I'll be creating the archive in one user's home directory and recreating it in another user's home directory:

su
Password:
cd ~test/www
find -d . -print | cpio -pvd ~genisis/www

Note that I first cded into the directory I wanted to archive, in this case the www subdirectory of the test user's home directory. Then, with the cpio command, I invoked copy-pass mode with the p switch and specified that I wanted the archive recreated in the www subdirectory of the home directory of the user genisis.

If I run this command and then do an ls -l of genisis' home directory, I'll see that I've successfully recreated the entire www directory structure. However, I'll want to fine-tune that above command as those restored files still belong to the user "test." I'll repeat that command using the u switch so it will overwrite that last restore, and I'll include the R switch, which tells cpio to change the ownership of the files as it recreates them:

find -d . -print | cpio -pvdu -R genisis ~genisis/www

When using the R switch, follow it with the name of the user who should become the owner of the files, and then with the name of the directory to restore the files to.

Finally, if I want to keep the original times of the files instead of having them changed to the time the files were restored, I'd also add the a and m switches:

find -d . -print | cpio -pvduam -R genisis ~genisis/www

This should get you started with the cpio command. If you're planning on using cpio to copy between different computers, you'll want to read its manpage first, as there may be considerations, especially if the computers are running different versions of Unix or different architectures.

In next week's article, I'll continue the archiver series by introducing the pax command and, if space permits, the dd command.

Dru Lavigne is a network and systems administrator, IT instructor, author and international speaker. She has over a decade of experience administering and teaching Netware, Microsoft, Cisco, Checkpoint, SCO, Solaris, Linux, and BSD systems. A prolific author, she pens the popular FreeBSD Basics column for O'Reilly and is author of BSD Hacks and The Best of FreeBSD Basics.




Copyright © 2009 O'Reilly Media, Inc.