Linux device-mapper has a module named 'snapshot', which provides Copy-on-Write layers on top of block devices.

Stack a COW atop a disk, and test out your damaging changes on that instead. If it doesn't work, throw away the COW and try again.
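For reference, a snapshot target table line has the shape below: start and length in 512-byte sectors, the origin device, the COW store, a persistence flag ('n' meaning a throw-away, non-persistent COW) and the chunk size in sectors. This is just the generic form of the commands used further down; sdX and loopN are placeholders.

# <start> <length> snapshot <origin device> <COW device> <P|N> <chunk size in sectors>
echo "0 976773168 snapshot /dev/sdX /dev/loopN n 4096" | dmsetup create sdX.cow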

While a RAID5 was syncing onto a new disk, the server's video card failed and, inexplicably, disabled one of the SATA ports, knocking two disks out of the RAID5.

Assembling the array with --force should have ignored the 'stale' status of the out-of-date member and rebuilt it, but for some reason this failed.
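Roughly what was attempted, assuming the surviving members are the same four disks used in the --create command below; this is the step that unaccountably failed:

# Force assembly, ignoring the event-count mismatch on the stale member.
# The half-synchronized new disk is deliberately left out.
mdadm --assemble --force /dev/md/0 /dev/sda /dev/sdb /dev/sdd /dev/sde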

The only way to recover from this would be to recreate the array with the exact same parameters as the original, and leave out the disk that never finished synchronizing.

# 1GiB COW file per disk, attached to a loop device numbered to match the dmsetup lines below
N=1
for I in a b c d e
do
  dd if=/dev/zero of=sd${I}.cow bs=1048576 count=1024
  losetup /dev/loop${N} sd${I}.cow
  N=$((N+1))
done

echo "0 976773168 snapshot /dev/sda /dev/loop1 n 4096" | dmsetup create sda.cow
echo "0 976773168 snapshot /dev/sdb /dev/loop2 n 4096" | dmsetup create sdb.cow
echo "0 976773168 snapshot /dev/sdc /dev/loop3 n 4096" | dmsetup create sdc.cow
echo "0 976773168 snapshot /dev/sdd /dev/loop4 n 4096" | dmsetup create sdd.cow
echo "0 976773168 snapshot /dev/sde /dev/loop5 n 4096" | dmsetup create sde.cow

mdadm --create /dev/md/0 --assume-clean --metadata=1 --level=5 --chunk=1024 --raid-devices=5 /dev/mapper/sde.cow /dev/mapper/sdd.cow missing /dev/mapper/sdb.cow /dev/mapper/sda.cow
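Since everything above runs against the throw-away COW devices, the result can be checked non-destructively before the same --create is repeated against the real disks. Assuming a filesystem sits directly on the array (ext3 and the /mnt mount point are just guesses here):

mdadm --detail /dev/md/0      # confirm level, chunk size and device order
fsck.ext3 -n /dev/md/0        # read-only check; -n never writes
mount -o ro /dev/md/0 /mnt    # or mount read-only and eyeball the data

If the guessed device order or chunk size is wrong, the check fails harmlessly: tear down the array and the snapshots, and try another permutation.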

In the future, when swapping out a still-functional RAID member for another disk, the following procedure would greatly reduce the window during which the array is vulnerable (a rough command sketch follows the list):

  • Ensure the RAID has a bitmap, for quick resilvering of temporarily removed members
  • Temporarily remove the disk to be replaced, wrap it in a (degraded) RAID1 mirror, and reinsert that mirror in its place
  • Wait for the resilvering to bring the disk back up to date using the bitmap
  • Add the new disk to the RAID mirror, and let it clone the contents across
  • Once synchronized, pull the mirror from the array, add the new disk in its place, and let the resilvering complete.
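A rough sketch of the commands involved, with hypothetical device names: /dev/md0 is the RAID5, /dev/sdc the member being replaced, /dev/sdf the new disk and /dev/md1 the temporary mirror. This is an outline of the idea, not a tested recipe:

# Make sure a write-intent bitmap exists so the later resyncs are short
mdadm --grow /dev/md0 --bitmap=internal

# Pull the old disk and wrap it in a degraded RAID1; superblock at the end,
# so the RAID5 member data at the start of the disk is untouched
# (mdadm will ask for confirmation since sdc already carries a superblock)
mdadm /dev/md0 --fail /dev/sdc --remove /dev/sdc
mdadm --create /dev/md1 --level=1 --metadata=1.0 --raid-devices=2 /dev/sdc missing

# Put the mirror back into the RAID5; the bitmap keeps the resync brief
mdadm /dev/md0 --re-add /dev/md1

# Clone onto the new disk inside the mirror
mdadm /dev/md1 --add /dev/sdf

# Once md1 has finished recovering, swap the bare new disk in for the mirror
mdadm /dev/md0 --fail /dev/md1 --remove /dev/md1
mdadm --stop /dev/md1
mdadm /dev/md0 --re-add /dev/sdf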

Possible problems with the above:

  • Superblock alignment issues - wrapping a RAID5 member disk in a RAID1 while preserving the RAID5 superblock may require extra sectors at the end of the disk, since the RAID1's own metadata takes up space there (a quick size check is sketched after this list)
  • A RAID5 is not bootable anyway, so its superblock can go at the beginning of each member; a RAID1 should remain bootable, so its superblock belongs at the end. With that combination the two superblocks stay out of each other's way and things should Just Work(tm)
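To sanity-check where the superblocks ended up and whether the sizes still line up, something along these lines (device names as in the sketch above):

mdadm --examine /dev/sdc      # metadata version plus superblock and data offsets
blockdev --getsz /dev/sdc     # raw disk size in 512-byte sectors
blockdev --getsz /dev/md1     # size the RAID1 actually presents to the RAID5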