LinuxQuestions.org

LinuxQuestions.org (/questions/)
-   Linux - General (http://www.linuxquestions.org/questions/linux-general-1/)
-   -   Repair a broken RAID 1 partition (http://www.linuxquestions.org/questions/linux-general-1/repair-a-broken-raid-1-partition-540412/)

a_l_a_n 03-25-2007 05:43 AM

Repair a broken RAID 1 partition
 
Simple question: How do you repair a RAID 1? It's a two-disk software RAID. The partition on disk one is broken, the one on disk two is okay, and I want to copy two to one. How?

jdmcdaniel3 03-25-2007 08:22 AM

How to Restore a RAID Configuration After a Hard Drive Failure
 
I have only had to deal with this twice and both times it was a computer that belonged to someone else. I always do a backup of the system first while I can still get to the data.

The first time this happened, I just replaced the bad drive and the RAID manager saw the new drive and automatically duplicated the working drive. I was able to find an exact drive replacement.

The second time I did not have an exact drive, so I had to replace both drives, which means I started over. I had to create a new RAID configuration and then perform a restore from backup. Everything was then back to normal.

So I would ask: if you have a bad hard drive, can you find an exact replacement, or must you replace both hard drives?

Thank You,

a_l_a_n 03-25-2007 01:44 PM

I don't have any reason to think there's physical damage. I have no problems with the RAID 0 partitions on the rest of the disk, at least. The RAID 1 is my /boot partition.

What do you mean by "performed a restore"? I think that's the bit I want to do ... from the good disk to the bad one.

Quakeboy02 03-25-2007 03:00 PM

There is some information about repairing RAIDs on this page: http://www.linux.com/howtos/Software...-HOWTO-4.shtml

makyo 03-26-2007 07:16 AM

Hi.

I did this yesterday.

Background: RHEL4/U4 on a SunFire X2100. I was burning in a few (Seagate SATA) disks for eventual backup. I ran Sun's utilities in a stress mode for a number of hours to be sure there was no infant mortality. The SunFires have a nice mechanism for quickly removing and inserting disks on a carrier -- 4 screws to hold the disk in the carrier.

The procedure I followed was to place a new disk in a slot, and reboot. The version of mdadm that is in RHEL is 1.6.0. The status from mdadm noted the lack of a valid partition table on the new disk and mdadm "removed" the partitions from the RAID1 arrays. This box has a simple partition scheme -- two partitions -- /boot for booting, and everything else in an LVM in one other partition.
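A degraded RAID1 shows up plainly in /proc/mdstat: the `[2/1] [U_]` notation means only one of the two mirror halves is present. The snippet below is a sketch using hypothetical sample output; on a live system you would read /proc/mdstat directly instead of the here-doc.

```shell
#!/bin/sh
# Sketch: spotting a degraded RAID1 in /proc/mdstat-style output.
# The sample text is made up; "[2/1] [U_]" marks a missing mirror half.
cat <<'EOF' | grep -B1 '\[U_\]'
md0 : active raid1 sda1[0]
      104320 blocks [2/1] [U_]
EOF
```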

I rebooted with gparted, and wrote a "label" on the new disk -- the gparted term for empty partition table. I then booted back into RHEL. This time mdadm did not complain about the lack of a partition, but it didn't do anything about syncing the new disk.
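The gparted "label" step can also be done from the command line; parted's `mklabel` writes an empty msdos partition table. This is a hypothetical equivalent, echoed as a dry run, and the device name /dev/sdb is an assumption.

```shell
#!/bin/sh
# Dry run: command-line equivalent of gparted's "label" (empty partition
# table). Remove the "echo" to run it for real -- destructive on /dev/sdb.
echo parted /dev/sdb mklabel msdos
```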

I found a reference to sfdisk on a webpage, and wrote a little script to clone the partition table from the good disk to the new disk:
Code:

#!/bin/sh

# @(#) clone-partition-table    Use sfdisk to copy partition data.

echo " UID is $UID"
if [ ! "$UID" -eq 0 ]
then
        echo " You must be root, your UID is $UID." >&2
        exit 1
fi

GOOD=/dev/sda
NEW=/dev/sdb

echo
echo " GOOD disk is $GOOD"
echo "  NEW disk is $NEW"
echo
echo "If not correct, use ^C to abort, otherwise CR."
read junk

/sbin/sfdisk -d $GOOD |
sed -e "s+$GOOD+$NEW+" |
/sbin/sfdisk $NEW

exit 0

It runs very quickly and produces some output, the last part of which reads:
Quote:

If you created or changed a DOS partition, /dev/foo7, say, then use dd(1)
to zero the first 512 bytes: dd if=/dev/zero of=/dev/foo7 bs=512 count=1
(See fdisk(8).)
I was not sure what the purpose of that was, so I did not do it. I wanted to see if mdadm could handle things from that point onward, so I used mdadm to add the two partitions back into the md0 and md1 arrays, and mdadm seemed happy. Of course, the sync takes some time for large amounts of data.
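The add step above might look like the following; the array and partition names (md0/md1, /dev/sdb1, /dev/sdb2) are assumptions based on the two-partition layout described. Commands are echoed as a dry run -- drop the "echo" to run them for real.

```shell
#!/bin/sh
# Dry-run sketch: re-add the cloned partitions to the RAID1 arrays.
# Device and array names are assumptions from the layout described above.
echo /sbin/mdadm /dev/md0 --add /dev/sdb1
echo /sbin/mdadm /dev/md1 --add /dev/sdb2
```

mdadm then starts the resync in the background automatically.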

If you use the script, you will probably want to do a first run with the last step of the pipeline removed, so that you can verify the rewritten dump looks OK before anything is written. The sfdisk utility is careful -- I had "sbd" in the first run instead of the correct "sdb", and it said it could not find the disk.
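A verification run is just the same pipeline minus the final sfdisk write, so the rewritten dump goes to the screen instead of the disk. The dump lines below are a made-up sample; on a real system the input would come from `/sbin/sfdisk -d /dev/sda`.

```shell
#!/bin/sh
# Verification sketch: apply the sed rewrite to a sample sfdisk dump and
# inspect the result instead of piping it into "sfdisk /dev/sdb".
cat <<'EOF' | sed -e "s+/dev/sda+/dev/sdb+"
/dev/sda1 : start=       63, size=   208782, Id=fd, bootable
/dev/sda2 : start=   208845, size= 77923845, Id=fd
EOF
```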

Once the sync was done, I powered off, removed the "good" disk, and rebooted, simulating a broken disk in the RAID1 arrays. It booted flawlessly. I was grateful (not to mention impressed) that the setup was able to reconstruct the MBR contents (RHEL uses grub). I powered off, inserted the good disk, rebooted, and it ran without incident. Using
Code:

cat /proc/mdstat
allows you to follow the syncing process.

On at least one web page, there was a section on how to use grub to fix up the disk for booting, but the version of mdadm that I used seemed to handle all that.
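If the boot sector does need fixing up by hand, the usual GRUB-legacy incantation (the version shipped with RHEL4) installs the loader on the second disk so either disk can boot on its own. This is a hedged sketch -- the disk numbering (hd1,0) is an assumption for a /boot on the first partition of the second disk -- printed here as a dry run; pipe the same lines into `grub --batch` to apply.

```shell
#!/bin/sh
# Dry run: GRUB-legacy commands to install the boot loader on disk two.
# (hd1,0) = second disk, first partition -- an assumption about the layout.
cat <<'EOF'
root (hd1,0)
setup (hd1)
quit
EOF
```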

I assume your situation varies somewhat from mine -- you may be using IDE disks, you may have more partitions, etc., so you'll need to adapt to your circumstances.

Best wishes ... cheers, makyo
