wijte
03-17-2013 10:33 PM
software raid: adding "write-intent bitmap" overwrites data on raid device
I tried to add a write-intent bitmap to an existing software raid 5 array. It didn't work; instead, part of the LVM metadata was overwritten with what looks like the write-intent bitmap.
Fortunately, the data could be restored from the LVM backup files. However, I would really like to get to the bottom of why this happened. It is reproducible on this machine, but I have not yet been able to reproduce it in a virtual machine or with loopback devices. Anyone with ideas about why it happened or how it can be debugged? Any help will be very much appreciated.
System information:
Fedora 17 x86_64
kernel 3.7.9-104
mdadm 3.2.6-7
Symptoms:
Code:
$ mdadm --grow --bitmap=internal --bitmap-chunk=512M /dev/md/bulk
mdadm: failed to set internal bitmap.
$ dmesg | tail
kernel: [ 543.768771] created bitmap (2 pages) for device md124
kernel: [ 543.768778] md124: bitmap file is out of date, doing full recovery
kernel: [ 543.778904] md124: bitmap initialisation failed: -5
$ dd if=/dev/md/bulk bs=16 count=32 | hexdump -C
00000000 62 69 74 6d 04 00 00 00 4d 11 94 a9 39 85 3d 14 |bitm....M...9.=.|
00000010 d6 98 2e ab ed 21 a5 1d 00 00 00 00 00 00 00 00 |.....!..........|
00000020 00 00 00 00 00 00 00 00 00 fc 01 00 00 00 00 00 |................|
00000030 00 00 00 00 00 00 10 00 05 00 00 00 00 00 00 00 |................|
00000040 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
*
00000100 ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff |................|
*
00000200
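The `62 69 74 6d` at offset 0 is the ASCII magic "bitm" that starts an md bitmap superblock, which is how I recognised what had overwritten the LVM metadata. A quick sketch of that check (it uses a throwaway file standing in for the device; on a real array you would read the first bytes of /dev/md/bulk instead):
Code:
```shell
# Simulate the start of a device that was overwritten by the bitmap
# superblock: an md bitmap superblock begins with the ASCII magic "bitm".
printf 'bitm\004\000\000\000' > /tmp/fake-md-start.img

# Read the first 4 bytes; replace the file with your array device to test for real.
MAGIC=$(head -c 4 /tmp/fake-md-start.img)
if [ "$MAGIC" = "bitm" ]; then
    echo "bitmap superblock magic found at start of device"
fi
```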
Edit (removed all debugging output and speculation):
This occurs because the raid5 array was converted from a raid0 array. A raid0 array cannot have a write-intent bitmap, so its layout does not reserve room for one. Checking the data location shows:
Code:
$ mdadm --examine /dev/sdb1 | grep Offset
Data Offset : 16 sectors
Super Offset : 8 sectors
The mdadm code puts the bitmap 4KiB after the superblock. With the offsets above (superblock at sector 8, data at sector 16), that is exactly where the real data starts, so writing the bitmap overwrites it.
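The overlap follows directly from the numbers `mdadm --examine` reports (all offsets in 512-byte sectors). This is just the arithmetic, not mdadm's actual code:
Code:
```shell
SUPER_OFFSET=8    # superblock starts at sector 8 (4 KiB into the member device)
DATA_OFFSET=16    # user data starts at sector 16 (8 KiB)

# mdadm places the internal bitmap 4 KiB (= 8 sectors) after the superblock
BITMAP_OFFSET=$(( SUPER_OFFSET + 8 ))

echo "bitmap at sector $BITMAP_OFFSET, data at sector $DATA_OFFSET"
if [ "$BITMAP_OFFSET" -ge "$DATA_OFFSET" ]; then
    echo "overlap: the bitmap lands on top of the user data"
fi
```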
As a workaround, you can fail and remove the affected drives from the array one at a time, wipe the superblock, and add them back, waiting for each resync to complete. The re-added drives get a new, larger data offset that leaves enough room for the bitmap.
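Applied to a real array, the per-drive sequence would look like this. This is a dry-run sketch: the array and drive names are placeholders, the `refresh_member` helper is mine (not part of mdadm), and the `echo` in front of each command must be removed to actually run it. Do one drive at a time and let the resync finish before touching the next.
Code:
```shell
ARRAY=/dev/md/bulk              # placeholder: your array device

# Hypothetical helper: prints the command sequence for refreshing one member
refresh_member() {
    local DRIVE=$1
    echo sudo mdadm "$ARRAY" --fail "$DRIVE"
    echo sudo mdadm "$ARRAY" --remove "$DRIVE"
    echo sudo mdadm --zero-superblock "$DRIVE"   # forget the old, too-small data offset
    echo sudo mdadm "$ARRAY" --add "$DRIVE"      # re-added with a new, larger data offset
    echo sudo mdadm --wait "$ARRAY"              # wait for resync before the next drive
}

refresh_member /dev/sdb1        # placeholder drive name
```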
The following script reproduces the problem using loopback devices.
Code:
#!/usr/bin/bash
# Created: March 18, 2013
# Author: Lars Wijtemans <lars23091019 - gmail.com>
# Use at your own risk
# This script demonstrates mdadm overwriting user data
# when adding a write-intent bitmap to an array that was
# converted from raid0, leaving 4KiB between the start
# of the superblock and the start of the user data.
# Tested on Fedora 17 x86_64, kernel 3.7.9-104, mdadm 3.2.6-7
RAIDPREFIX="test"
WORKDIR="/tmp"
DEVSIZE="8" # in MB. Script will use 3*DEVSIZE
cd "$WORKDIR"
# Sanity checks
if [ -e "/dev/md/$RAIDPREFIX-five" ]; then
    echo "Raid device $RAIDPREFIX-five exists, stopping"
    exit 1
fi
echo "Please read the script before executing"
echo "Use at your own risk, you can Ctrl-C now"
read -p "Test workaround? [y/n]"
WORKAROUND=$REPLY
# Create loopback devices
for i in 0 1 2; do
    if [ -e "disk$i.img" ]; then
        echo "File disk$i.img exists, stopping"
        if [ $i -eq 0 ]; then
            # Nothing to clean up
            exit 1
        fi
        # Remove the loop devices and image files created so far
        LAST=$(( i - 1 ))
        for (( d=0; d<=LAST; d++ )); do
            sudo losetup -d "${DISK[$d]}"
            rm "disk$d.img"
        done
        exit 1
    fi
    dd if=/dev/zero of="disk$i.img" bs=1M count=$DEVSIZE
    DISK[$i]=$(sudo losetup -f --show "disk$i.img")
done
# Create raid device as raid0
sudo mdadm --create "/dev/md/$RAIDPREFIX-five" --level=0 \
--raid-devices=2 "${DISK[0]}" "${DISK[1]}"
# Convert it to raid5
sudo mdadm --grow "/dev/md/$RAIDPREFIX-five" --level=5 \
--raid-devices=3 --add "${DISK[2]}"
sudo mdadm --wait "/dev/md/$RAIDPREFIX-five"
# Workaround
if [[ $WORKAROUND =~ ^[Yy]$ ]]; then
    echo "Applying workaround"
    # Fail, remove, wipe and re-add each disk left over from the raid0,
    # one at a time so the array never loses more than one member
    for d in 0 1; do
        sudo mdadm "/dev/md/$RAIDPREFIX-five" --fail "${DISK[$d]}"
        sudo mdadm "/dev/md/$RAIDPREFIX-five" --remove "${DISK[$d]}"
        sudo mdadm --zero-superblock "${DISK[$d]}"
        sudo mdadm "/dev/md/$RAIDPREFIX-five" --add "${DISK[$d]}"
        sudo mdadm --wait "/dev/md/$RAIDPREFIX-five"
    done
fi
# Put dummy data on raid device
echo "Writing random data"
sudo dd if=/dev/urandom of="/dev/md/$RAIDPREFIX-five" bs=1M 2>/dev/null
echo -e "\nArray device contains:"
sudo dd if="/dev/md/$RAIDPREFIX-five" bs=16 count=3 2>/dev/null | hexdump -C
# Add write-intent bitmap
sudo mdadm --grow --bitmap=internal --bitmap-chunk=1M \
"/dev/md/$RAIDPREFIX-five"
# Check the data
echo -e "\nArray device contains:"
if [[ $WORKAROUND =~ ^[Yy]$ ]]; then
    TESTSIZE=3
else
    # Without the workaround, dump further to show how far the bitmap extends
    TESTSIZE=35
fi
sudo dd if="/dev/md/$RAIDPREFIX-five" bs=16 count=$TESTSIZE 2>/dev/null \
| hexdump -C
echo "press return to continue with cleanup"
read
# Clean up
sudo mdadm --stop "/dev/md/$RAIDPREFIX-five"
for i in 0 1 2; do
    sudo losetup -d "${DISK[$i]}"
    rm "disk$i.img"
done