The Linux Logical Volume Manager (LVM) is a mechanism for virtualizing disks. It can create "virtual" disk partitions out of one or more physical hard drives, allowing you to grow, shrink, or move those partitions from drive to drive as your needs change. It also allows you to create larger partitions than you could achieve with a single drive.
Traditional uses of LVM have included databases and company file servers, but even home users may want large partitions for music or video collections, or for storing online backups. LVM and RAID 1 can also be convenient ways to gain redundancy without sacrificing flexibility.
This article looks first at a basic file server, then explains some variations on that theme, including adding redundancy with RAID 1 and some things to consider when using LVM for desktop machines.
An operational LVM system includes both a kernel filesystem component and userspace utilities. To turn on the kernel component, set up the kernel options as follows:
Device Drivers -->
  Multi-device support (RAID and LVM)
    [*] Multiple devices driver support (RAID and LVM)
    < >   RAID support
    <*>   Device mapper support
    < >     Crypt target support (NEW)
You can usually install the LVM user tools through your Linux distro's packaging system. In Gentoo, the LVM user tools are part of the lvm2 package. Note that you may see tools for LVM-1 as well (perhaps named lvm-user). It doesn't hurt to have both installed, but make sure you have the LVM-2 tools.
To use LVM, you must understand several elements. First are the regular physical hard drives attached to the computer. The disk space on these devices is chopped up into partitions. Finally, a filesystem is written directly to a partition. By comparison, in LVM, Volume Groups (VGs) are split up into logical volumes (LVs), where the filesystems ultimately reside (Figure 1).
Each VG is made up of a pool of Physical Volumes (PVs). You can extend (or reduce) the size of a Volume Group by adding or removing as many PVs as you wish, provided there are enough PVs remaining to store the contents of all the allocated LVs. As long as there is available space in the VG, you can also grow and shrink the size of your LVs at will (although most filesystems don't like to shrink).
Figure 1. An example LVM layout
A simple, practical example of LVM use is a traditional file server, which provides centralized backup, storage space for media files, and shared file space for several family members' computers. Flexibility is a key requirement; who knows what storage challenges next year's technology will bring?
For example, suppose your requirements are:
400G - Large media file storage
50G - Online backups of two laptops and three desktops (10G each)
10G - Shared files
Ultimately, these requirements may increase a great deal over the next year or two, but exactly how much and which partition will grow the most are still unknown.
Traditionally, a file server uses SCSI disks, but today SATA disks offer an attractive combination of speed and low cost. At the time of this writing, 250 GB SATA drives are commonly available for around $100; for a terabyte, the cost is around $400.
SATA drives are not named like ATA drives (hda, hdb), but like SCSI (sda, sdb). Once the system has booted with SATA support, it has four physical devices to work with:
/dev/sda 251.0 GB
/dev/sdb 251.0 GB
/dev/sdc 251.0 GB
/dev/sdd 251.0 GB
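Before partitioning, it can help to compare the requirements against the raw capacity. The following is a minimal sketch of that arithmetic. The function name is illustrative, and it treats the G and GB figures quoted above as the same unit, which slightly overstates the space LVM will actually report.

```shell
#!/bin/sh
# Sketch: compare today's storage requirements with the raw capacity
# of the four drives. The figures come from the article; the names
# used here are illustrative.

# Sum a list of sizes (in GB) given as arguments.
sum_gb() {
    total=0
    for size in "$@"; do
        total=$((total + size))
    done
    echo "$total"
}

required=$(sum_gb 400 50 10)          # media + backups + shared files
capacity=$(sum_gb 251 251 251 251)    # sda through sdd

echo "Required:     ${required} GB"
echo "Raw capacity: ${capacity} GB"
echo "Unallocated:  $((capacity - required)) GB"
```

Roughly half of the raw space is left over, which is what makes the later move to mirrored disks possible.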
Next, partition these for use with LVM. You can do this with fdisk by specifying the "Linux LVM" partition type 8e. The finished product looks like this:
# fdisk -l /dev/sdd

Disk /dev/sdd: 251.0 GB, 251000193024 bytes
255 heads, 63 sectors/track, 30515 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

   Device      Start      End      Blocks   Id  System
/dev/sdd1          1    30515   245111706   8e  Linux LVM
Notice the partition type is 8e, or "Linux LVM."
Initialize each of the partitions using the pvcreate command:
# pvcreate /dev/sda1 /dev/sdb1 /dev/sdc1 /dev/sdd1
This sets up all the partitions on these drives for use under LVM, allowing creation of volume groups. To examine available PVs, use the pvdisplay command. This system will use a single volume group named datavg:
# vgcreate datavg /dev/sda1 /dev/sdb1 /dev/sdc1 /dev/sdd1
Use vgdisplay to see the newly created datavg VG with the four drives stitched together. Now create the logical volumes within it:
# lvcreate --name medialv --size 400G datavg
# lvcreate --name backuplv --size 50G datavg
# lvcreate --name sharelv --size 10G datavg
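The whole sequence from physical volumes to logical volumes can also be collected into one script. The sketch below is one possible way to do that, not a required step. The run helper and the DRY_RUN variable are hypothetical conveniences: by default the script only prints the commands, so you can review them before running anything as root.

```shell
#!/bin/sh
# Sketch: the PV -> VG -> LV sequence as one script. DRY_RUN defaults
# to 1, so the commands are only printed; set DRY_RUN=0 (as root) to
# actually run them.
DRY_RUN=${DRY_RUN:-1}

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

VG=datavg
PARTS="/dev/sda1 /dev/sdb1 /dev/sdc1 /dev/sdd1"

create_volumes() {
    # $PARTS is left unquoted on purpose so it splits into four arguments.
    run pvcreate $PARTS
    run vgcreate "$VG" $PARTS
    # name:size pairs taken from the requirements list
    for spec in medialv:400G backuplv:50G sharelv:10G; do
        run lvcreate --name "${spec%%:*}" --size "${spec##*:}" "$VG"
    done
}

create_volumes
```

Keeping the names and sizes in one list makes it easy to add another volume later without retyping the full command.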
Without LVM, you might allocate all available disk space to the partitions you're creating, but with LVM, it is worthwhile to be conservative, allocating only half the available space to the current requirements. As a general rule, it's easier to grow a filesystem than to shrink it, so it's a good strategy to allocate exactly what you need today, and leave the remaining space unallocated until your needs become clearer. This method also gives you the option of creating new volumes when new needs arise (such as a separate encrypted file share for sensitive data). To examine these volumes, use the lvdisplay command.
Now you have several nicely named logical volumes at your disposal:
/dev/datavg/backuplv (also /dev/mapper/datavg-backuplv)
/dev/datavg/medialv (also /dev/mapper/datavg-medialv)
/dev/datavg/sharelv (also /dev/mapper/datavg-sharelv)
Now that the devices are created, the next step is to put filesystems on them. However, there are many types of filesystems. How do you choose?
For typical desktop filesystems, you're probably familiar with ext2 and ext3. ext2 was the standard, reliable workhorse for Linux systems in years past. ext3 is an upgrade for ext2 that provides journaling, a mechanism to speed up filesystem checks after a crash. ext3's balance of performance, robustness, and recovery speed makes it a fine choice for general purpose use. Because ext2 and ext3 have been the defaults for such a long time, ext3 is also a good choice if you want great reliability. For storing backups, reliability is much more important than speed. The major downside to ext2/ext3 is that to grow (or shrink) the filesystem, you must first unmount it.
However, other filesystems provide advantages in certain situations, such as large file sizes, large quantities of files, or on-the-fly filesystem growth. Because LVM's primary use is for scenarios where you need extreme numbers of files, extremely large files, and/or the need to resize your filesystems, the following filesystems are well worth considering.
For large numbers of small files, ReiserFS is an excellent choice. For raw, uncached file I/O, it ranks at the top of most benchmarks, and can be as much as an order of magnitude faster than ext3. Historically, however, it has not proven as robust as ext3. It's been tested enough lately that this may no longer be a significant issue, but keep it in mind.
If you are designing a file server that will contain large files, such as video files recorded by MythTV, then delete speed could be a priority. With ext3 or ReiserFS, your deletes may take several seconds to complete as the filesystem works to mark all of the freed data blocks. If your system is recording or processing video at the same time, this delay could cause dropped frames or other glitches. JFS and XFS are better choices in this situation, although XFS has the edge due to greater reliability and better general performance.
With all these considerations in mind, format the partitions as follows:
# mkfs.ext3 /dev/datavg/backuplv
# mkfs.xfs /dev/datavg/medialv
# mkfs.reiserfs /dev/datavg/sharelv
Finally, to mount the file systems, first add the following lines to /etc/fstab:
/dev/datavg/backuplv  /var/backup  ext3      rw,noatime  0 0
/dev/datavg/medialv   /var/media   xfs       rw,noatime  0 0
/dev/datavg/sharelv   /var/share   reiserfs  rw,noatime  0 0
and then establish and activate the mount points:
# mkdir /var/media /var/backup /var/share
# mount /var/media
# mount /var/backup
# mount /var/share
Now your basic file server is ready for service.
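To confirm that all three filesystems really are mounted, you can look them up in the kernel's mount table. This is a small sketch that assumes a Linux system exposing /proc/mounts. The is_mounted function is a hypothetical helper, and its optional second argument lets you point it at any file in the same format.

```shell
#!/bin/sh
# Sketch: report whether each expected mount point appears in a mounts
# table. The table defaults to /proc/mounts; a second argument selects
# a different file in the same format.
is_mounted() {
    mount_point=$1
    table=${2:-/proc/mounts}
    [ -r "$table" ] || return 1
    # Field 2 of each line is the mount point.
    awk -v mp="$mount_point" '$2 == mp { found = 1 } END { exit !found }' "$table"
}

for dir in /var/backup /var/media /var/share; do
    if is_mounted "$dir"; then
        echo "$dir is mounted"
    else
        echo "$dir is NOT mounted"
    fi
done
```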
So far, this LVM example has been reasonably straightforward. However, it has one major flaw: if any of your drives fail, all of your data is at risk! Half a terabyte is not an insignificant amount to back up, so this is an extremely serious weakness in the design.
To compensate for this risk, build redundancy into the design using RAID 1. RAID, which stands for Redundant Array of Independent Disks, is a low-level technology for combining disks together in various ways, called RAID levels. The RAID 1 design mirrors data across two (or more) disks. In addition to doubling the reliability, RAID 1 adds performance benefits for reads because both drives have the same data, and read operations can be split between them.
Unfortunately, these benefits do not come without a critical cost: the storage size is cut in half. The good news is that half a terabyte is still enough for the present space requirements, and LVM gives the flexibility to add more or larger disks later.
With four drives, RAID 5 is another option. It restores some of the disk space but adds even more complexity. Also, it performs well with reads but poorly with writes. Because hard drives are reasonably cheap, RAID 5's benefits aren't worth the trouble for this example.
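The space trade-off between these layouts is simple arithmetic. The sketch below works it out for four 251 GB drives. The function names are illustrative, and the figures ignore metadata overhead.

```shell
#!/bin/sh
# Sketch: usable capacity (GB) for the layouts discussed above, given
# the number of drives ($1) and the size of each drive ($2).

# No RAID: every drive contributes its full size.
plain_gb() { echo $(( $1 * $2 )); }

# RAID 1 as two-disk mirrors: each pair stores one drive's worth of data.
raid1_pairs_gb() { echo $(( ($1 / 2) * $2 )); }

# RAID 5: one drive's worth of space goes to parity.
raid5_gb() { echo $(( ($1 - 1) * $2 )); }

echo "Plain LVM:    $(plain_gb 4 251) GB"
echo "RAID 1 pairs: $(raid1_pairs_gb 4 251) GB"
echo "RAID 5:       $(raid5_gb 4 251) GB"
```

With RAID 1 pairs, the 460G allocated so far still fits, but only about 40 GB of headroom remains.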
Although it would have made more sense to start with RAID, we waited until now to introduce it so we could demonstrate how to migrate from raw disks to RAID disks without needing to unmount any of the filesystems.
In the end, this design will combine the four drives into two RAID 1 pairs: /dev/sda + /dev/sdd and /dev/sdb + /dev/sdc. The reason for this particular arrangement is that sda and sdd are the primary and secondary drives on separate controllers; this way, if a controller were to die, you could still access the two drives on the alternate controller. When the primary/secondary pairs are used, the relative access speeds are balanced so neither RAID array is slower than the other. There may also be a performance benefit to having accesses evenly distributed across both controllers.
First, pull two of the SATA drives (sdb and sdd) out of the datavg volume group. Each pvmove needs enough free space on its destination PV to hold the data being moved, so check with pvdisplay first:
# modprobe dm-mirror
# pvmove /dev/sdb1 /dev/sda1
# pvmove /dev/sdd1 /dev/sdc1
# vgreduce datavg /dev/sdb1 /dev/sdd1
# pvremove /dev/sdb1 /dev/sdd1
Then, change the partition type on these two drives to fd (Linux raid autodetect):
   Device Boot      Start      End      Blocks   Id  System
/dev/sdb1               1    30515   245111706   fd  Linux raid autodetect
Now, build the RAID 1 mirrors, telling md that the "other half" of each mirror is missing (because it's not ready to be added to the RAID yet):
# mdadm --create /dev/md0 -a -l 1 -n 2 /dev/sdd1 missing
# mdadm --create /dev/md1 -a -l 1 -n 2 /dev/sdb1 missing
Add these broken mirrors to the LVM:
# pvcreate /dev/md0 /dev/md1
# vgextend datavg /dev/md0 /dev/md1
Next, migrate off of the raw disks onto the broken mirrors:
# pvmove /dev/sda1 /dev/md0
# pvmove /dev/sdc1 /dev/md1
# vgreduce datavg /dev/sda1 /dev/sdc1
# pvremove /dev/sda1 /dev/sdc1
Finally, change the partition types of the raw disks to fd, and get the broken mirrors on their feet with full mirroring:
# fdisk /dev/sda
# fdisk /dev/sdc
# mdadm --manage /dev/md0 --add /dev/sda1
# mdadm --manage /dev/md1 --add /dev/sdc1
That's quite a few steps, but this full RAID 1 setup protects the LVM system without having to reinstall, copy or remount filesystems, or reboot.
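After adding the second half of each mirror, the arrays need time to synchronize. The kernel reports their state in /proc/mdstat, where a healthy two-disk mirror shows [UU] and an underscore (for example [U_]) marks a missing member. The following sketch counts arrays that are still degraded. The degraded_arrays function is a hypothetical helper, and it simply reports 0 if the file is not readable.

```shell
#!/bin/sh
# Sketch: count md arrays that still have a missing or failed member.
# Reads /proc/mdstat by default; pass another file in the same format
# as the first argument to inspect that instead.
degraded_arrays() {
    mdstat=${1:-/proc/mdstat}
    [ -r "$mdstat" ] || { echo 0; return 0; }
    # Match status fields such as [U_] or [_U]; [UU] does not match.
    # grep -c exits non-zero when the count is 0, hence "|| true".
    grep -c '\[[U_]*_[U_]*\]' "$mdstat" || true
}

count=$(degraded_arrays)
if [ "$count" -eq 0 ]; then
    echo "No degraded arrays found"
else
    echo "Degraded arrays: $count"
fi
```

While an array is rebuilding, /proc/mdstat also shows a progress line for it, so rerunning the check until the count drops to zero is a reasonable way to wait for full redundancy.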
A file server isn't much use if you can't get files off of it. There are many ways to serve files, but the most common and powerful is Network File System (NFS). NFS allows other *nix machines to mount the file shares for direct use. It's also pretty easy to set up on Linux.
First, make sure the file server has NFS enabled in the kernel (2.6.15 in this example):
File systems
  Network File Systems
    <*> NFS file system support
    [*]   Provide NFSv3 client support
    <*> NFS server support
    [*]   Provide NFSv3 server support
Rebuild and reinstall the kernel and then reboot the file server. If you'd like to avoid rebooting, build NFS as a module and then load it with modprobe (the server module is named nfsd).
Next, start the NFS service. Your Linux distro will have an init script to do this. For instance, on Gentoo, you'll see:
# /etc/init.d/nfs start
 * Starting portmap ...         [ ok ]
 * Mounting RPC pipefs ...      [ ok ]
 * Starting NFS statd ...       [ ok ]
 * Starting NFS daemon ...      [ ok ]
 * Starting NFS mountd ...      [ ok ]
You can double-check that NFS is running by querying portmapper with the command rpcinfo -p | grep nfs:
   program vers proto   port  service
    100003    2   udp   2049  nfs
    100003    3   udp   2049  nfs
    100003    2   tcp   2049  nfs
    100003    3   tcp   2049  nfs
Next, you must specify which directories the NFS service should export. Add the following to /etc/exports:
/var/backup 192.168.0.0/24(rw,sync)
/var/media  192.168.0.0/24(rw,sync)
/var/share  192.168.0.0/24(rw,sync)
This lists the directories to share, the machines (or networks) to permit to mount the files, and a set of options to control how the sharing works. The options include rw to allow read-write mounts and sync to force synchronous behavior. sync prevents data corruption if the server reboots in the middle of a file write, but sacrifices the performance advantages that async would provide.
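If you export many directories with the same network and options, generating the entries avoids typos. The sketch below prints lines in the /etc/exports format described above. NETWORK, OPTIONS, and the exports_lines function are illustrative names, and the script writes only to standard output.

```shell
#!/bin/sh
# Sketch: print /etc/exports entries for a list of directories.
# NETWORK and OPTIONS mirror the values used in the article.
NETWORK=192.168.0.0/24
OPTIONS=rw,sync

exports_lines() {
    for dir in "$@"; do
        printf '%s %s(%s)\n' "$dir" "$NETWORK" "$OPTIONS"
    done
}

exports_lines /var/backup /var/media /var/share
```

Review the output before appending it to /etc/exports. Note that there must be no space between the network and the opening parenthesis.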
Next, export these file shares from the NFS service:
# exportfs -av
exporting 192.168.0.0/24:/var/backup
exporting 192.168.0.0/24:/var/media
exporting 192.168.0.0/24:/var/share
Now, mount these file shares on each machine that will use them. Assuming the file server is named fileserv, add the following lines to the client machines' /etc/fstab files:
# Device              mountpoint   fs-type  options   dump  fsckorder
fileserv:/var/backup  /var/backup  nfs      defaults  0     0
fileserv:/var/media   /var/media   nfs      defaults  0     0
fileserv:/var/share   /var/share   nfs      defaults  0     0
Finally, create the mountpoints and mount the new shares:
# mkdir /var/backup /var/media /var/share
# mount /var/backup
# mount /var/media
# mount /var/share
Now all the machines on your network have access to large, reliable, and expandable disk space!
As you rely more heavily on this new LVM-enabled disk space, you may have concerns about backing it up. Using RAID protects against basic disk failures, but gives you no protection in the case of fire, theft, or accidental deletion of important files.
Traditionally, tape drives are used for backups of this class. This option is still viable and has several advantages, but it can be an expensive and slow solution for a system of this size. Fortunately, there are other options using today's technology.
rsync is a powerful utility for copying files from one system to another, and it works well across the Internet. You could set up a backup system at a friend's house in a different city and arrange to periodically send backups there. This is easy to do with a cron job (the crontab entry must be on a single line):
04 4 * * 4 rsync --delete -a /var/backup/ fileserv.myfriend.org:/backup/myself/backup > /var/log/crontab.backup.log 2>&1
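The five leading fields of the crontab entry are minute, hour, day of month, month, and day of week, with 0 standing for Sunday. As a quick way to read such an entry, the sketch below turns those fields into a sentence. The describe_schedule function is a hypothetical helper that handles only plain numbers in the minute, hour, and day-of-week fields (0 through 6), not ranges, lists, or names.

```shell
#!/bin/sh
# Sketch: describe the schedule fields of a crontab entry that uses
# plain numbers for minute, hour, and day of week (0 = Sunday) and "*"
# for day of month and month. Other crontab syntax is not handled.
describe_schedule() {
    # Strip one leading zero so that values like 08 are not read as octal.
    minute=${1#0}; minute=${minute:-0}
    hour=${2#0};   hour=${hour:-0}
    weekday=$5
    # Reuse the positional parameters as a weekday lookup table.
    set -- Sunday Monday Tuesday Wednesday Thursday Friday Saturday
    shift "$weekday"
    printf 'Every %s at %02d:%02d\n' "$1" "$hour" "$minute"
}

describe_schedule 04 4 '*' '*' 4
```

For the entry above, this prints "Every Thursday at 04:04", a time when the network link is likely to be idle.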
Another approach is to attach a pair of external RAID 1 hard drives to your file server using Firewire, USB, or eSATA. Add one drive to /dev/md0 and the other to /dev/md1. Once the mirroring is complete, remove the drives and store them in a safe place offsite. Re-mirror weekly or monthly, depending on your needs.
Suppose that over the next year, the storage system fills up and needs to be expanded. Initially, you can begin allocating the unallocated space. For instance, to increase the amount of space available for shared files from 10GB to 15GB, run a command such as:
# lvextend -L15G /dev/datavg/sharelv
# resize_reiserfs /dev/datavg/sharelv
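lvextend can only succeed if the volume group has enough free space for the growth. The sketch below shows that check as plain arithmetic. The can_extend function is a hypothetical helper, and the 40G free-space figure is a placeholder rather than a measured value.

```shell
#!/bin/sh
# Sketch: decide whether a logical volume can grow to a new size using
# only the volume group's free space. Sizes are whole gigabytes.
# Usage: can_extend CURRENT_GB NEW_GB VG_FREE_GB
can_extend() {
    needed=$(( $2 - $1 ))
    [ "$needed" -gt 0 ] && [ "$needed" -le "$3" ]
}

# sharelv grows from 10G to 15G; 40G free is a placeholder figure.
if can_extend 10 15 40; then
    echo "OK: the extra 5G fits in the free space"
else
    echo "Not enough free space in the volume group"
fi
```

On a live system, the free space can be read with a command such as vgs --noheadings --units g -o vg_free datavg, or from the "Free PE / Size" line of vgdisplay.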
But over time, all the unallocated disk space will be used. One solution is to replace the four 250G drives with larger 800G ones.
In the case where you use RAID 1, migration is straightforward. Use mdadm to mark one drive of each of the RAID 1 mirrors as failed, and then remove them:
# mdadm --manage /dev/md0 --fail /dev/sda1
# mdadm --manage /dev/md0 --remove /dev/sda1
# mdadm --manage /dev/md1 --fail /dev/sdc1
# mdadm --manage /dev/md1 --remove /dev/sdc1
Pull out the sda and sdc hard drives and replace them with two of the new 800G drives. Split each 800G drive into a 250G partition and a 550G partition using fdisk, and add the 250G partitions back to md0 and md1:
# fdisk /dev/sda
# fdisk /dev/sdc
# mdadm --manage /dev/md0 --add /dev/sda1
# mdadm --manage /dev/md1 --add /dev/sdc1
Repeat the above process with sdd and sdb to move them to the other two new drives, then create a third and fourth RAID device, md2 and md3, using the new space:
# mdadm --create /dev/md2 -a -l 1 -n 2 /dev/sda2 /dev/sdd2
# mdadm --create /dev/md3 -a -l 1 -n 2 /dev/sdb2 /dev/sdc2
Finally, add these to LVM:
# pvcreate /dev/md2 /dev/md3
# vgextend datavg /dev/md2 /dev/md3
The file server now has 1.6TB of fully redundant storage.
So far, we've talked only about LVM and RAID for secondary disk space via a standalone file server, but what if you want to use LVM to manage the space on a regular desktop system? It can work, but there are some considerations to take into account.
First, the installation and upgrade procedures for some Linux distributions don't handle RAID or LVM, which may present complications. Many of today's distros do support them, and even provide tools to assist in creating and managing them, so check this first.
Second, having the root filesystem on LVM can complicate recovery of damaged file systems. Because boot loaders don't support LVM yet, you must also have a non-LVM /boot partition (though it can be on a RAID 1 device).
Third, you need some spare unallocated disk space for the new LVM partition. If you don't have this, use parted to shrink your existing root partition, as described in the LVM HOWTO.
For this example, assume you have your swap space and /boot partitions already set up outside of LVM on their own partitions. You can focus on moving your root filesystem onto a new LVM partition in the partition /dev/hda4. Check that the partition type on hda4 is LVM (type 8e).
Initialize LVM and create a new physical volume:
# vgscan
# pvcreate /dev/hda4
# vgcreate rootvg /dev/hda4
Now create a 5G logical volume, formatted into an xfs file system:
# lvcreate --name rootlv --size 5G rootvg
# mkfs.xfs /dev/rootvg/rootlv
Copy the files from the existing root file system to the new LVM one:
# mkdir /mnt/new_root
# mount /dev/rootvg/rootlv /mnt/new_root
# cp -ax /. /mnt/new_root/
Next, modify /etc/fstab to mount / on /dev/rootvg/rootlv instead of /dev/hda3.
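The fstab change can be previewed before you commit to it. The following sketch rewrites the root entry of an fstab-format file and prints the result without modifying anything. The switch_root_device function is a hypothetical helper. It also updates the filesystem type field, because the new rootlv volume was formatted as XFS while the old root partition presumably used something else.

```shell
#!/bin/sh
# Sketch: rewrite the root entry of an fstab-format file, printing the
# result to stdout so it can be reviewed before replacing /etc/fstab.
# Usage: switch_root_device OLD_DEVICE NEW_DEVICE NEW_FSTYPE [FSTAB_FILE]
switch_root_device() {
    old=$1 new=$2 fstype=$3 file=${4:-/etc/fstab}
    [ -r "$file" ] || return 0
    awk -v old="$old" -v new="$new" -v fstype="$fstype" '
        # Only the line that mounts OLD_DEVICE on / is changed.
        $1 == old && $2 == "/" { $1 = new; $3 = fstype }
        { print }
    ' "$file"
}

switch_root_device /dev/hda3 /dev/rootvg/rootlv xfs
```

Redirect the output to a temporary file, compare it with the original, and copy it into place (on the new root filesystem under /mnt/new_root) once it looks right.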
The trickiest part is to rebuild your initrd to include LVM support. This tends to be distro-specific, but look for your distribution's initrd-building tool (mkinitrd on many systems). The initrd image must have the LVM modules loaded or the root filesystem will not be available. To be safe, leave your original initrd image alone and make a new one named, for example, /boot/initrd-lvm.img.
Finally, update your bootloader. Add a new section for your new root filesystem, duplicating your original boot stanza. In the new copy, change the root from /dev/hda3 to /dev/rootvg/rootlv, and change your initrd to the newly built one. If you use lilo, be sure to run lilo once you've made the changes. For example, with grub, if you have:
title=Linux
  root (hd0,0)
  kernel /vmlinuz root=/dev/hda3 ro single
  initrd /initrd.img
add a new section such as:
title=LinuxLVM
  root (hd0,0)
  kernel /vmlinuz root=/dev/rootvg/rootlv ro single
  initrd /initrd-lvm.img
LVM is only one of many enterprise technologies in the Linux kernel that have become available for regular users. LVM provides a great deal of flexibility with disk space, and combined with RAID 1, NFS, and a good backup strategy, you can build a bulletproof, easily managed way to store, share, and preserve any quantity of files.
Bryce Harrington is a Senior Performance Engineer at the Open Source Development Labs in Beaverton, Oregon.
Kees Cook is the senior network administrator at OSDL.
Copyright © 2009 O'Reilly Media, Inc.