When I first started using LVM I got bit by a few bugs. It’s all part of being an early adopter. As a result I never really used it on production hardware, and it wasn’t until about 2 years ago that I gave it another look. In a similar manner, I never really thought of software raid as anything more than a novelty. Much of that has changed now, and I use them both on a regular basis for a number of reasons.
I’m in one situation right now where I have a filesystem that I did stupid things to, and fsck.ext3 segfaults on it. Normally this would be a really bad thing (and it is), but because I’m using LVM I can run many tests on the filesystem, in a snapshot, without harming the master filesystem (which has around 2.5T of data on it). In this case the bad blocks have already been written to and likely won’t be changing, so I can continue to use the share in production while working with the fsck devs to figure out what is segfaulting. I can also take two snapshots at the same time, run fsck.ext3 on one of them, and then compare the actual files to see which (if any) of them have changed or become corrupt.
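A minimal sketch of that snapshot trick, for anyone who hasn’t tried it. The names here (vg0, share, fsck-test) and the 10G copy-on-write size are made up for illustration and are not from my production box. By default the script only prints the commands it would run.

```shell
#!/bin/sh
# Sketch only: vg0, share, fsck-test and 10G are hypothetical values.
# With DRY_RUN=1 (the default) commands are printed instead of executed.
DRY_RUN=${DRY_RUN:-1}

run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# snapshot_fsck VG ORIGIN SNAPNAME COW_SIZE
snapshot_fsck() {
    vg=$1; origin=$2; snap=$3; size=$4
    # Create a copy-on-write snapshot of the origin volume.
    run lvcreate -s -L "$size" -n "$snap" "/dev/$vg/$origin"
    # Force a full check of the snapshot; the origin is never touched.
    run fsck.ext3 -f -y "/dev/$vg/$snap"
    # Throw the snapshot away once the results have been looked at.
    run lvremove -f "/dev/$vg/$snap"
}

snapshot_fsck vg0 share fsck-test 10G
```

Set DRY_RUN=0 only after checking that the printed commands point at the volumes you expect. The snapshot needs enough copy-on-write space to absorb every block that fsck and production writes change while it exists.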
This is just one case in a million where knowledge and use of lvm has proved immensely powerful. Another case (and the topic of this article) is a situation where the combination of lvm and software raid will allow me to convert a raid1 array into a raid5 array without any downtime on my critical apps. And for those of you who think you’ll take a performance hit when using software raid over hardware raid, I ask: “Which can calculate parity faster, a 486 processor on the card or one of the xeon processors in your box that isn’t on the card?” Go ahead, run the test yourself.
Anyway, to get started: in our environment we use virtualization heavily. One of our xen dom0’s has two raid1 arrays on it. This machine was built a while ago, and we can only assume that those two arrays were meant for two dedicated machines. Now we want to merge the arrays and just have one raid5. So the basic layout of our setup:
2 raid1 arrays.
2 physical volumes.
2 volume groups.
At the end we want:
1 raid5 array
1 physical volume
1 volume group.
I should warn you that in prep for this we moved all critical apps to one of the volumegroups, so the other volume group is now essentially empty and I’m treating it that way during my test: I’ll only be creating one volumegroup and array. To test this at home (as I’ll be doing in the examples) I assume you have a volume group with 400M free. I’ll be creating logical volumes as analogs for the “disks” in our production server.
First: create the “disks” (100M each)
# lvcreate -L 100M -n 1 VolGroup00
Logical volume "1" created
# lvcreate -L 100M -n 2 VolGroup00
Logical volume "2" created
# lvcreate -L 100M -n 3 VolGroup00
Logical volume "3" created
# lvcreate -L 100M -n 4 VolGroup00
Logical volume "4" created
Then create your raid array:
# mdadm -C /dev/md0 -a yes --level=1 --raid-devices=2 /dev/VolGroup00/1 /dev/VolGroup00/2
mdadm: array /dev/md0 started.
Next convert the array to a physical volume
# pvcreate /dev/md0
Physical volume "/dev/md0" successfully created
Then create your group
# vgcreate testVG /dev/md0
Volume group "testVG" successfully created
# vgs
VG #PV #LV #SN Attr VSize VFree
testVG 1 0 0 wz--n- 96.00M 96.00M
We’ve got a volume group with 96M free.
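If you’re wondering where 4M of each 100M “disk” went, the arithmetic can be sketched like this. It rests on two assumptions that are not stated in the article: that the array uses the old 0.90 md superblock (which reserves 64K at a 64K-aligned offset near the end of each member) and that the volume group uses LVM’s default 4M extent size.

```shell
#!/bin/sh
# Worked arithmetic for the 100M -> 96M shrinkage. All sizes are in KiB.
# Assumptions: md 0.90 superblock, LVM default 4 MiB physical extents.

# md_member_kib SIZE_KIB: space left on one member after the md superblock.
md_member_kib() {
    echo $(( $1 / 64 * 64 - 64 ))
}

# vg_size_mib PV_KIB: whole 4 MiB extents that fit on the PV, in MiB.
# LVM's own metadata area is ignored; in this example it is smaller than
# the remainder that the rounding throws away anyway.
vg_size_mib() {
    echo $(( $1 / 4096 * 4 ))
}

member=$(md_member_kib $(( 100 * 1024 )))
echo "md0 size: ${member} blocks"
echo "VG size:  $(vg_size_mib "$member")M"
```

The first number lines up with the 102336 blocks that /proc/mdstat reports for md0, and the second with the 96M that vgs shows.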
Then create your logical volume
# lvcreate -n testFS -L 96M testVG
# lvs
LV VG Attr LSize Origin Snap% Move Log Copy%
testFS testVG -wi-a- 96.00M
Next create a filesystem and stick some information on it:
# mkfs.ext3 /dev/testVG/testFS
mke2fs 1.40.4 (31-Dec-2007)
Filesystem label=
OS type: Linux
Block size=1024 (log=0)
Fragment size=1024 (log=0)
24576 inodes, 98304 blocks
4915 blocks (5.00%) reserved for the super user
First data block=1
Maximum filesystem blocks=67371008
12 block groups
8192 blocks per group, 8192 fragments per group
2048 inodes per group
Superblock backups stored on blocks:
8193, 24577, 40961, 57345, 73729
Writing inode tables: done
Creating journal (4096 blocks): done
Writing superblocks and filesystem accounting information: done
This filesystem will be automatically checked every 24 mounts or
180 days, whichever comes first. Use tune2fs -c or -i to override.
# mount /dev/testVG/testFS /mnt
# echo "Critical Data" > /mnt/ThisIsVaulable
At this point we’re in an analog to where I am in production. As I mentioned I moved everything critical off of one volumegroup already so I’ve essentially got a logical volume on top of a raid1 array with 2 free disks. At this point we can start moving stuff around:
# mdadm -C /dev/md1 --level=5 --raid-devices=2 /dev/VolGroup00/3 /dev/VolGroup00/4
mdadm: array /dev/md1 started.
You can always check the status of your array in /proc/mdstat:
# cat /proc/mdstat
Personalities : [raid0] [raid1] [raid6] [raid5] [raid4]
md1 : active raid5 dm-11[2] dm-10[0]
102336 blocks level 5, 64k chunk, algorithm 2 [2/1] [U_]
[====>...............] recovery = 21.0% (22408/102336) finish=0.0min speed=22408K/sec
md0 : active raid1 dm-9[1] dm-8[0]
102336 blocks [2/2] [UU]
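Smarties will note that you’re supposed to have 3 disks to create a raid5 array. Technically mdadm only needs 2. With two disks the parity for each stripe is just a copy of the single data block, so the array behaves like a mirror until more disks are added. mdadm builds a new raid5 array in degraded mode and then recovers onto the last disk, which is the [2/1] [U_] and the recovery line you see above.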
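To see what a two-disk raid5 actually stores, it helps to work out the parity by hand. The sketch below uses plain shell arithmetic and arbitrary example byte values; it illustrates the XOR idea only and is not how md lays data out on disk.

```shell
#!/bin/sh
# RAID5 parity is the XOR of the data blocks in a stripe.
# The byte values used below are arbitrary examples.

# parity BYTE...: XOR all arguments together.
parity() {
    p=0
    for b in "$@"; do
        p=$(( p ^ b ))
    done
    echo "$p"
}

# Two-device array: one data block per stripe, so parity equals the data.
d0=$(( 0xA5 ))
echo "2 devices: data=$d0 parity=$(parity "$d0")"

# Three-device array: two data blocks per stripe.
d1=$(( 0x3C ))
par=$(parity "$d0" "$d1")
echo "3 devices: parity=$par"

# Lose d1 and rebuild it from the surviving data block and the parity.
echo "rebuilt d1=$(parity "$d0" "$par") (original $d1)"
```

With a single data block per stripe the parity is identical to the data, which is why /proc/mdstat shows the two-disk array as [2/2] [UU] once the initial recovery finishes.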
After that finishes I’m going to steal one disk from the raid1 array. At this point I feel the need to mention that I have a solid backup of all of the data left on our soon-to-be-gone raid1 array. If the one disk it is left on fails during the migration, the data is likely toast. The only way to protect against this is to add more disks to the machine, which just isn’t feasible in my use case.
So let’s fail a disk in the raid1 array and add it to the raid5 array:
# mdadm /dev/md0 --fail /dev/VolGroup00/2
mdadm: set /dev/VolGroup00/2 faulty in /dev/md0
# cat /proc/mdstat
Personalities : [raid0] [raid1] [raid6] [raid5] [raid4]
md1 : active raid5 dm-11[1] dm-10[0]
102336 blocks level 5, 64k chunk, algorithm 2 [2/2] [UU]
md0 : active raid1 dm-9[2](F) dm-8[0]
102336 blocks [2/1] [U_]
Note the (F): it’s failed.
# mdadm /dev/md0 --remove /dev/VolGroup00/2
mdadm: hot removed /dev/VolGroup00/2
# cat /mnt/ThisIsVaulable
Critical Data
Note that we have not unmounted /mnt and will not do so.
# mdadm -G /dev/md1 --raid-devices=3 --backup-file=/tmp/backup
mdadm: Need to backup 128K of critical section..
mdadm: ... critical section passed.
# mdadm /dev/md1 --add /dev/VolGroup00/2
mdadm: added /dev/VolGroup00/2
# cat /proc/mdstat
Personalities : [raid0] [raid1] [raid6] [raid5] [raid4]
md1 : active raid5 dm-9[3](S) dm-11[1] dm-10[0]
102336 blocks super 0.91 level 5, 64k chunk, algorithm 2 [3/2] [UU_]
[====>...............] reshape = 24.0% (25088/102336) finish=0.4min speed=2787K/sec
md0 : active raid1 dm-8[0]
102336 blocks [2/1] [U_]
At this point it is reshaping the raid5 array, and ultimately it will add my new disk and rebuild the array. The array stays online during all of this and can be used.
So now we have two arrays; it’s time to add our raid5 array to the testVG volume group:
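On real disks a reshape can take hours, so rather than re-running cat by hand you may want something that tells you when the array has gone quiet. The helper below is a rough sketch: md_busy and wait_for_md are names I made up, and the matching is based only on the /proc/mdstat output shown in this article, so treat it as a starting point.

```shell
#!/bin/sh
# md_busy ARRAY: exit 0 while ARRAY shows a recovery, resync or reshape
# line in mdstat-formatted text read from stdin, exit 1 otherwise.
md_busy() {
    awk -v md="$1" '
        $1 == md && $2 == ":" { in_array = 1; next }  # header of our array
        /^md[0-9]+ :/         { in_array = 0 }        # header of another array
        in_array && /(recovery|resync|reshape) *=/ { busy = 1 }
        END { exit busy ? 0 : 1 }
    '
}

# wait_for_md ARRAY [SECONDS]: poll until ARRAY is idle.
# MDSTAT may point at a saved copy of the file when experimenting.
wait_for_md() {
    while md_busy "$1" < "${MDSTAT:-/proc/mdstat}"; do
        sleep "${2:-5}"
    done
}
```

Calling `wait_for_md md1` before each step that pulls or adds a disk would keep a script from racing ahead of the rebuild.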
# pvcreate /dev/md1
Physical volume "/dev/md1" successfully created
# vgs
VG #PV #LV #SN Attr VSize VFree
testVG 1 1 0 wz--n- 96.00M 0
# vgextend testVG /dev/md1
Volume group “testVG” successfully extended
# vgs
VG #PV #LV #SN Attr VSize VFree
testVG 2 1 0 wz--n- 292.00M 196.00M
Notice the size of our volumegroup has changed. Now comes the magic. All of our valuable data is still on the raid1 array (/dev/md0). To move it off of that array and onto the new one, we use the pvmove command:
# pvmove -i 1 /dev/md0
/dev/md0: Moved: 4.2%
/dev/md0: Moved: 37.5%
/dev/md0: Moved: 70.8%
/dev/md0: Moved: 100.0%
Then to remove the disks from the testVG, use vgreduce:
# vgreduce testVG /dev/md0
Removed "/dev/md0" from volume group "testVG"
At that point, the hard part is over, but we’re not done. Now we’re going to disable the /dev/md0 array and add the only remaining disk to /dev/md1.
# mdadm -S /dev/md0
mdadm: stopped /dev/md0
# mdadm -G /dev/md1 --raid-devices=4 --backup-file=/tmp/backup
mdadm: Need to backup 384K of critical section..
mdadm: ... critical section passed.
# mdadm /dev/md1 --add /dev/VolGroup00/1
mdadm: added /dev/VolGroup00/1
# cat /proc/mdstat
Personalities : [raid0] [raid1] [raid6] [raid5] [raid4]
md1 : active raid5 dm-9[2] dm-11[1] dm-10[0]
204672 blocks super 0.91 level 5, 64k chunk, algorithm 2 [4/3] [UUU_]
[==>.................] reshape = 14.0% (15172/102336) finish=0.4min speed=3034K/sec
Again we’re reshaping and then rebuilding the array. Notice md0 is gone.
# pvs
PV VG Fmt Attr PSize PFree
/dev/md1 testVG lvm2 a- 196.00M 100.00M
# pvresize /dev/md1
Physical volume "/dev/md1" changed
# pvs
PV VG Fmt Attr PSize PFree
/dev/md1 testVG lvm2 a- 296.00M 200.00M
# vgs
VG #PV #LV #SN Attr VSize VFree
testVG 1 1 0 wz--n- 296.00M 200.00M
Now we’ve got that additional 100M to use. Remember, you lose one drive’s worth of capacity to parity when using raid5, so our total storage is only 3 of our 4 100M disks (about 300M of usable space, which pvs reports as 296M). 96M of that is already taken by the filesystem.
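The sizes that pvs reported at each stage can be reproduced with a bit of shell arithmetic. This is a sketch that assumes LVM’s default 4M extent size (not stated in the article) and takes the 102336-block member size straight from the /proc/mdstat output above; raid5_vg_mib is just a name picked for the example.

```shell
#!/bin/sh
# raid5_vg_mib DEVICES MEMBER_KIB
# Usable raid5 space is (DEVICES - 1) members, because one member's worth
# of space holds parity. The result is rounded down to whole 4 MiB extents
# (LVM's default extent size, assumed here) and printed in MiB.
raid5_vg_mib() {
    usable_kib=$(( ($1 - 1) * $2 ))
    echo $(( usable_kib / 4096 * 4 ))
}

# 102336 is the per-member block count reported in /proc/mdstat.
for n in 2 3 4; do
    echo "$n devices: $(raid5_vg_mib "$n" 102336)M"
done
```

The three results correspond to the 96M, 196M and 296M figures seen as the array grew from two to four devices.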
To use some of this space, first we need to extend the logical volume we created earlier (testFS). Note size before and after:
# lvs
LV VG Attr LSize Origin Snap% Move Log Copy%
testFS testVG -wi-a- 96.00M
# lvresize /dev/testVG/testFS -L +100M
Extending logical volume testFS to 196.00 MB
Logical volume testFS successfully resized
# lvs
LV VG Attr LSize Origin Snap% Move Log Copy%
testFS testVG -wi-ao 196.00M
Next, resize the filesystem:
# df -h /mnt
Filesystem Size Used Avail Use% Mounted on
/dev/mapper/testVG-testFS
93M 5.6M 83M 7% /mnt
# resize2fs /dev/testVG/testFS
resize2fs 1.40.4 (31-Dec-2007)
Filesystem at /dev/testVG/testFS is mounted on /mnt; on-line resizing required
old desc_blocks = 1, new_desc_blocks = 1
Performing an on-line resize of /dev/testVG/testFS to 200704 (1k) blocks.
The filesystem on /dev/testVG/testFS is now 200704 blocks long.
# df -h
Filesystem Size Used Avail Use% Mounted on
/dev/mapper/testVG-testFS
190M 5.6M 175M 4% /mnt
# cat /mnt/ThisIsVaulable
Critical Data
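As a sanity check on the resize2fs output, the block counts follow directly from the logical volume sizes, since this filesystem was created with 1k blocks. A small sketch (mib_to_fs_blocks is a made-up helper name):

```shell
#!/bin/sh
# mib_to_fs_blocks SIZE_MIB BLOCK_SIZE_BYTES
# Number of filesystem blocks that fit in a volume of SIZE_MIB.
mib_to_fs_blocks() {
    echo $(( $1 * 1024 * 1024 / $2 ))
}

old=$(mib_to_fs_blocks 96 1024)                 # LV size before lvresize
new=$(mib_to_fs_blocks $(( 96 + 100 )) 1024)    # after lvresize -L +100M
echo "before: $old blocks, after: $new blocks"
```

The two values match the 98304 blocks mkfs.ext3 reported and the 200704 blocks resize2fs grew the filesystem to.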
Done, no outage. It may take a few steps, but in my case it’s worth it to avoid the downtime. This is a pretty specific use case, but there are plenty of other situations where the same approach lets you use these tools with little impact on end users.

I challenged all of my boxes to see which could "calculate parody" fastest, but they all just sat there with neither exaggeration nor comic effect.
It's likely then that cpu wasn't the limiting factor in your io system. Keep adding disks until your raid card can't keep up with it :)
My thoughts on software RAID vs. hardware RAID: http://piece.dpiddy.net/2008/4/22/hardware-vs-software-raid
thunk, I prefer the uncalculated parody myself. :-P
If the author doesn't know the difference between parody and parity I'd be wary of anything technical they purport to know...
meh, ad hominem attacks are a common fallacy and one I don't blame Adrian for making. But then, I'm a technician not a writer.
This tutorial is riddled with errors and inconsistencies. I'm pretty sure they have people that will do proof reading and fact checking for you.