LinuxQuestions.org
03-17-2013, 10:33 PM   #1
wijte
LQ Newbie
 
Registered: Jan 2013
Location: The Netherlands
Posts: 1

software raid: adding "write-intent bitmap" overwrites data on raid device


I tried to add a write-intent bitmap to an existing software raid5 array. The command failed, and part of the LVM metadata on the array was overwritten with what looks like the write-intent bitmap.

Fortunately, the data could be restored from the LVM backup files. However, I would really like to get to the bottom of why this happened. It is reproducible on this machine, but I have not yet managed to reproduce it in a virtual machine or with loopback devices. Does anyone have ideas about why it happened or how it can be debugged? Any help will be very much appreciated.

System information:
Fedora 17 x86_64
kernel 3.7.9-104
mdadm 3.2.6-7

Symptoms:
Code:
$ mdadm --grow --bitmap=internal --bitmap-chunk=512M /dev/md/bulk
mdadm: failed to set internal bitmap.
$ dmesg | tail
kernel: [  543.768771] created bitmap (2 pages) for device md124
kernel: [  543.768778] md124: bitmap file is out of date, doing full recovery
kernel: [  543.778904] md124: bitmap initialisation failed: -5
$ dd if=/dev/md/bulk bs=16 count=32 | hexdump -C
00000000  62 69 74 6d 04 00 00 00  4d 11 94 a9 39 85 3d 14  |bitm....M...9.=.|
00000010  d6 98 2e ab ed 21 a5 1d  00 00 00 00 00 00 00 00  |.....!..........|
00000020  00 00 00 00 00 00 00 00  00 fc 01 00 00 00 00 00  |................|
00000030  00 00 00 00 00 00 10 00  05 00 00 00 00 00 00 00  |................|
00000040  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
*
00000100  ff ff ff ff ff ff ff ff  ff ff ff ff ff ff ff ff  |................|
*
00000200
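The first four bytes of the dump, 62 69 74 6d, spell "bitm", which is the magic number md writes at the start of its bitmap superblock. A quick test for that signature at the start of a device or image could look like the sketch below. The helper name has_bitmap_magic is hypothetical (it is not an mdadm command), and the check only looks at the first four bytes.

```bash
# Hypothetical helper, not part of mdadm.
# has_bitmap_magic FILE: succeed if FILE starts with the bytes "bitm".
has_bitmap_magic() {
  # od prints the first 4 bytes as hex; tr strips spaces and the newline
  magic=$(od -An -tx1 -N4 "$1" | tr -d ' \n')
  [ "$magic" = "6269746d" ]
}

# Usage (reading the array device needs root):
#   has_bitmap_magic /dev/md/bulk && echo "starts with a bitmap superblock"
```

A match at offset 0 of the array device means the bitmap landed where user data (here, the LVM metadata) is supposed to start.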
Edit (removed all debugging output and speculation):
This happens because the raid5 array was converted from a raid0 array. A raid0 array cannot have a write-intent bitmap, so its on-disk layout leaves no room for one between the superblock and the data. Examining a member device shows where the data starts:
Code:
$ mdadm --examine /dev/sdb1 | grep Offset
    Data Offset : 16 sectors
   Super Offset : 8 sectors
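mdadm places the bitmap 4 KiB after the start of the superblock. With the superblock at sector 8 and 512-byte sectors, that is sector 16 (8 KiB into the device), which is exactly where the data starts.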
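The same comparison can be run against any member before adding a bitmap. This is only a rough sketch: it assumes the --examine output has the "Data Offset : N sectors" shape shown above, that sectors are 512 bytes, and that the bitmap starts 8 sectors (4 KiB) after the superblock as described in this post. The names parse_offset and bitmap_hits_data are hypothetical helpers.

```bash
# Hypothetical helpers, not part of mdadm.
BITMAP_GAP_SECTORS=8   # assumption: 4 KiB after the superblock, 512-byte sectors

# parse_offset FIELD: read `mdadm --examine` text on stdin and print the
# sector count of the named field, e.g. "Data Offset" or "Super Offset".
parse_offset() {
  awk -F: -v field="$1" '
    {
      key = $1
      gsub(/^[ \t]+|[ \t]+$/, "", key)   # trim whitespace around the name
    }
    key == field { split($2, parts, " "); print parts[1]; exit }
  '
}

# bitmap_hits_data: read `mdadm --examine` text on stdin; succeed if the
# first bitmap sector would fall at or beyond the start of the data.
bitmap_hits_data() {
  examine_text=$(cat)
  super=$(printf '%s\n' "$examine_text" | parse_offset "Super Offset")
  data=$(printf '%s\n' "$examine_text" | parse_offset "Data Offset")
  # Bail out with status 2 if either field is missing
  [ -n "$super" ] && [ -n "$data" ] || return 2
  [ $(( super + BITMAP_GAP_SECTORS )) -ge "$data" ]
}

# Usage (needs root and a real member device):
#   sudo mdadm --examine /dev/sdb1 | bitmap_hits_data && echo "no room for a bitmap"
```

A non-match only says that the bitmap's first sector lies before the data. Whether the whole bitmap fits still depends on its size.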

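As a workaround, fail and remove each affected drive (the ones that were members of the original raid0) from the array, wipe its superblock and add it back again. Do this one drive at a time and wait for the rebuild to finish in between. The new data offset should leave enough room for the bitmap.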

The following script reproduces the problem using loopback devices.
Code:
#!/usr/bin/bash

# Created: March 18, 2013
# Author: Lars Wijtemans <lars23091019 - gmail.com>

# Use at your own risk

# This script demonstrates mdadm overwriting user data
# when adding a write-intent bitmap to an array that was
# converted from raid0, leaving 4KiB between the start
# of the superblock and the start of the user data.

# Tested on Fedora 17 x86_64, kernel 3.7.9-104, mdadm 3.2.6-7

RAIDPREFIX="test"
WORKDIR="/tmp"
DEVSIZE="8" # in MB. Script will use 3*DEVSIZE

cd "$WORKDIR" || exit 1

# Sanity checks
if [ -e "/dev/md/$RAIDPREFIX-five" ]; then
  echo "Raid device $RAIDPREFIX-five exists, stopping"
  exit 1
fi

echo "Please read the script before executing"
echo "Use at your own risk, you can Ctrl-C now"
read -r -p "Test workaround? [y/n] "
WORKAROUND=$REPLY

# Create loopback devices
for i in 0 1 2; do
  if [ -e "disk$i.img" ]; then
    echo "File disk$i.img exists, stopping"
    if [ $i -eq 0 ]; then
      # Nothing to clean up
      exit 1
    fi
    # Detach and remove the images created so far
    LAST=$(( i - 1 ))
    for (( d=0; d<=LAST; d++ )); do
      sudo losetup -d "${DISK[$d]}"
      rm "disk$d.img"
    done
    exit 1
  fi
  COUNT=$DEVSIZE
  dd if=/dev/zero of="disk$i.img" bs=1M count=$COUNT
  DISK[$i]=$(sudo losetup -f --show "disk$i.img")
done


# Create raid device as raid0
sudo mdadm --create "/dev/md/$RAIDPREFIX-five" --level=0 \
--raid-devices=2 "${DISK[0]}" "${DISK[1]}"

# Convert it to raid5
sudo mdadm --grow "/dev/md/$RAIDPREFIX-five" --level=5 \
--raid-devices=3 --add "${DISK[2]}"

sudo mdadm --wait "/dev/md/$RAIDPREFIX-five"


# Workaround
if [[ $WORKAROUND =~ ^[Yy]$ ]]; then
  echo "Applying workaround"
  # Re-add first disk from previous raid0
  sudo mdadm "/dev/md/$RAIDPREFIX-five" --fail "${DISK[0]}"
  sudo mdadm "/dev/md/$RAIDPREFIX-five" --remove "${DISK[0]}"
  sudo mdadm --zero-superblock "${DISK[0]}"
  sudo mdadm "/dev/md/$RAIDPREFIX-five" --add "${DISK[0]}"
  sudo mdadm --wait "/dev/md/$RAIDPREFIX-five"
  # Second disk
  sudo mdadm "/dev/md/$RAIDPREFIX-five" --fail "${DISK[1]}"
  sudo mdadm "/dev/md/$RAIDPREFIX-five" --remove "${DISK[1]}"
  sudo mdadm --zero-superblock "${DISK[1]}"
  sudo mdadm "/dev/md/$RAIDPREFIX-five" --add "${DISK[1]}"
  sudo mdadm --wait "/dev/md/$RAIDPREFIX-five"
fi


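# Put dummy data on raid device. dd writes until the device is full,
# so its "No space left on device" error is expected and suppressed.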
echo "Writing random data"
sudo dd if=/dev/urandom of="/dev/md/$RAIDPREFIX-five" bs=1M 2>/dev/null
echo -e "\nArray device contains:"
sudo dd if="/dev/md/$RAIDPREFIX-five" bs=16 count=3 2>/dev/null | hexdump -C


# Add write-intent bitmap
sudo mdadm --grow --bitmap=internal --bitmap-chunk=1M \
"/dev/md/$RAIDPREFIX-five"


# Check the data
echo -e "\nArray device contains:"
if [[ $WORKAROUND =~ ^[Yy]$ ]]; then
  TESTSIZE=3
else
  TESTSIZE=35
fi
sudo dd if="/dev/md/$RAIDPREFIX-five" bs=16 count=$TESTSIZE 2>/dev/null \
| hexdump -C


echo "press return to continue with cleanup"
read

# Clean up
sudo mdadm --stop "/dev/md/$RAIDPREFIX-five"

for i in 0 1 2; do
  sudo losetup -d "${DISK[$i]}"
  rm "disk$i.img"
done

Last edited by wijte; 03-18-2013 at 02:58 PM. Reason: Problem cause and workaround was found
 
03-27-2013, 02:34 PM   #2
corp769
Guru
 
Registered: Apr 2005
Posts: 5,807

Nice work, and good write-up. Thanks for the effort! (Plus replying to get this off the zero reply list.)

Cheers,

Josh
 
  

