LinuxQuestions.org


alienDog 07-15-2008 05:35 AM

Raid1 fails to rebuild (unrecoverable read error)
 
We have a server with a RAID1 disk array built on top of an existing filesystem, following the instructions here:

http://www.slacksite.com/slackware/raid.html

Unfortunately, we are unable to get the disk mirrored (raidhotadd) because of a read error on the primary disk (i.e. the disk that already holds the data):

raid1: sda : Unrecoverable I/O read error for block 35128192

After the error message, the recovery process starts over.

The disk with the error seems to be working normally, and we haven't detected any problems caused by the faulty block. Is there a way to find out exactly what the faulty block is supposed to contain (i.e. which file it belongs to)?

Is there a way to force building the secondary disk even when there is an error on the primary, or is there some other workaround for this?

Thanks!

jowagner 07-15-2008 07:51 AM

Hi AlienDog,

Quote:

Originally Posted by alienDog (Post 3214957)
Is there a way to find out exactly what the faulty block is supposed to contain (i.e. which file it belongs to)?

You could try to read the filesystem file by file, e.g.
find . -type f -print0 | tee filelist.null | \
xargs --null md5sum > filelist.md5
Then compare filelist.* to see whether any file is missing from the checksum list. (To use a normal text editor, first convert the null bytes to newlines with tr '\0' '\n'.) If no file is missing, the faulty block is somewhere in the free space of the filesystem. You can try to force the disk to reallocate the block by writing lots of data to your filesystem.

However, the faulty block could be hidden somewhere: the device blocks are typically 512 bytes in size, while the filesystem blocks are typically 4096 bytes. If you have, let's say, a 3400-byte file and only the last 512 bytes of the 4096 bytes allocated for the file are faulty, then the faulty block will never be touched by the filesystem (unless you append data to this particular file).
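For comparing the two lists, something along these lines might work. It is only a sketch: the function name unreadable_files is made up here, and it assumes GNU md5sum's default output ("hash, two spaces, name") and file names without newlines or backslashes. A file that md5sum could not read gets no line in filelist.md5, so it shows up as the difference between the two lists.

```shell
# Print the names that are in the NUL-separated find list ($1)
# but have no checksum line in the md5sum output ($2).
# Assumption: no newlines or backslashes in file names.
unreadable_files() {
    all=$(mktemp) || return 1
    ok=$(mktemp) || { rm -f "$all"; return 1; }

    # NUL-separated names -> one name per line, sorted for comm
    tr '\0' '\n' < "$1" | LC_ALL=C sort > "$all"

    # md5sum prints "<32 hex digits><space><space or *><name>";
    # strip the hash and the separator, keep the name
    sed 's/^[0-9a-f]\{32\} [ *]//' "$2" | LC_ALL=C sort > "$ok"

    # lines only in the first list = files without a checksum
    LC_ALL=C comm -23 "$all" "$ok"

    rm -f "$all" "$ok"
}
```

Usage would be: unreadable_files filelist.null filelist.md5. If it prints nothing, every file that find listed could be read completely.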

Quote:

Originally Posted by alienDog (Post 3214957)
Is there a way to force building the secondary disk even when there is an error on the primary, or is there some other workaround for this?

Workaround:
1. Remove the second disk from the RAID, leaving your current RAID1 in degraded mode.
2. Create a new (second) RAID1 with the second disk in degraded mode.
3. Create a new filesystem on the second RAID.
4. Copy all files to the new filesystem.
5. Switch to the new filesystem.
6. Deconstruct the first RAID (mdadm --stop on the array; --fail followed by --remove will not release the last active member of a degraded RAID1).
7. Add the first disk to the new RAID.
(The resync reads from the second disk and writes to the first disk. Once the faulty block is overwritten, it will hopefully be readable again.)
8. Wait for resync to finish.
9. Check that the first disk is readable:
dd if=/dev/sda of=/dev/null bs=32k
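
As a rough illustration, the steps above could translate into mdadm commands like the following. This is a dry run that only prints the commands, so nothing is touched. All device names, array names and mount points are placeholders I made up (your layout will differ), ext3 is an assumed filesystem type, and the first array is taken apart with mdadm --stop because it is down to its last member at that point.

```shell
#!/bin/sh
# Dry run: prints one command per line instead of executing anything.
# Every name below is a hypothetical placeholder, not taken from the thread.
OLD_MD=/dev/md0        # existing array, degraded, running on the first disk
NEW_MD=/dev/md1        # new array to be created on the second disk
FIRST_DISK=/dev/sda    # disk with the read error
FIRST=/dev/sda1        # RAID partition on that disk
SECOND=/dev/sdb1       # RAID partition on the healthy second disk
OLD_MNT=/data          # where the old filesystem is mounted
NEW_MNT=/mnt/new       # temporary mount point for the new filesystem

plan() {
    # 1. Detach the second disk from the old array and clear its metadata.
    echo "mdadm $OLD_MD --fail $SECOND --remove $SECOND"
    echo "mdadm --zero-superblock $SECOND"
    # 2. New RAID1 with one slot left empty ("missing"), i.e. degraded.
    echo "mdadm --create $NEW_MD --level=1 --raid-devices=2 missing $SECOND"
    # 3. New filesystem on the new array (ext3 is an assumption).
    echo "mkfs.ext3 $NEW_MD"
    # 4. Copy all files, staying within the one source filesystem.
    echo "mount $NEW_MD $NEW_MNT"
    echo "cp -ax $OLD_MNT/. $NEW_MNT/"
    # 5. Switching over (fstab, boot loader, reboot) is manual, not shown.
    # 6. Take the old array apart and clear its metadata on the first disk.
    echo "umount $OLD_MNT"
    echo "mdadm --stop $OLD_MD"
    echo "mdadm --zero-superblock $FIRST"
    # 7. Add the first disk to the new array; the resync overwrites it.
    echo "mdadm $NEW_MD --add $FIRST"
    # 8. Watch the resync until it has finished.
    echo "cat /proc/mdstat"
    # 9. Read the whole first disk once to check that it is readable.
    echo "dd if=$FIRST_DISK of=/dev/null bs=32k"
}

plan
```

Review the printed commands against your own setup before typing any of them by hand; several of them destroy data on the device they are pointed at.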

JJ


All times are GMT -5.