LinuxQuestions.org (/questions/)
-   Linux - Newbie (https://www.linuxquestions.org/questions/linux-newbie-8/)
-   -   Fedora raid recovery issue on /dev/md1 (https://www.linuxquestions.org/questions/linux-newbie-8/fedora-raid-recovery-issue-on-dev-md1-790823/)

jorgeboardman 02-22-2010 12:30 PM

Fedora raid recovery issue on /dev/md1
 
Hi there,

This is my first post on LinuxQuestions.org, so I hope I am posting this in the right forum. English is not my native language, so I hope I can make myself understood.

I have a small PowerEdge 350 server with two 80 GB IDE hard drives running Red Hat Fedora 5, with software RAID 1.

Some months ago hda started to have unrecoverable I/O read errors on some blocks. I left it that way for a while, and then hdb started having the same kind of issues.

I got new hard drives and replaced hdb first, but the problem is that mdadm hasn't been able to rebuild the RAID.

I have three Linux RAID partitions forming the RAID 1 arrays between hda and hdb:

/dev/md0
/dev/md1
/dev/md2

Device file: /dev/md0
RAID level: Mirrored (RAID1)
Filesystem status: Mounted on /boot
Usable size: 104320 blocks (101.88 MB)
Persistent superblock? Yes
Chunk size: Default
RAID status: clean
Partitions in RAID: IDE device B partition 1, IDE device A partition 1


Device file: /dev/md2
RAID level: Mirrored (RAID1)
Filesystem status: Mounted on swap
Usable size: 2096384 blocks (2 GB)
Persistent superblock? Yes
Chunk size: Default
RAID status: clean
Partitions in RAID: IDE device B partition 2, IDE device A partition 2


BUT I have not been able to recover /dev/md1. It starts rebuilding, then the errors appear and the rebuild starts over, again and again.

Device file: /dev/md1
RAID level: Mirrored (RAID1)
Filesystem status: Mounted on /
Usable size: 75939072 blocks (72.42 GB)
Persistent superblock? Yes
Chunk size: Default
RAID errors: 1 disk has failed
RAID status: clean, degraded, recovering
Rebuilding progress: 61 %
Partitions in RAID: IDE device B partition 3, IDE device A partition 3
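
For reference, the same state can also be watched from a shell (a minimal sketch; the device names simply match the Webmin output above):

Code:

cat /proc/mdstat             # overall state and rebuild progress of md0/md1/md2
mdadm --detail /dev/md1      # per-member state of the degraded array
watch -n 5 cat /proc/mdstat  # follow the rebuild and see where it stops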

From the server's logwatch report:
--------------------- Kernel Begin ------------------------

WARNING: Kernel Errors Present
end_request: I/O error, dev hda, sector ...: 379 Time(s)
hda: dma_intr: error=0x40 { Uncorrect ...: 379 Time(s)
hda: dma_intr: status=0x51 { DriveReady SeekComplete Error } ...: 379 Time(s)
raid1: hda: unrecoverable I/O read error for block 139968 ...: 2 Time(s)
raid1: hda: unrecoverable I/O read error for block 139969 ...: 1 Time(s)
raid1: hda: unrecoverable I/O read error for block 139971 ...: 4 Time(s)
raid1: hda: unrecoverable I/O read error for block 139972 ...: 1 Time(s)
raid1: hda: unrecoverable I/O read error for block 139973 ...: 7 Time(s)

---------------------- Kernel End -------------------------


--------------------- Smartd Begin ------------------------


Currently unreadable (pending) sectors detected:
/dev/hda - 48 Time(s)
157 unreadable sectors detected

Offline uncorrectable sectors detected:
/dev/hda - 48 Time(s)
153 offline uncorrectable sectors detected

---------------------- Smartd End -------------------------



So how can I force the RAID to rebuild while ignoring the hda errors? Is there any way to recover this?


Best Regards

J. Boardman

mesiol 02-24-2010 02:27 AM

Hi,

Hopefully you have backups. You should have replaced the hard drive as soon as the first errors occurred. The reported errors are unrecoverable read errors, which means blocks on the disk cannot be read or addressed correctly. These errors only show up once all of the drive's internal spare blocks have been used up for reallocation. In some cases I have seen such errors when an IDE cable was damaged, so you can check whether it works after replacing the cable.

Otherwise, from my point of view, there is no real chance of recovering a RAID when both hard drives have such errors, because not all data can be read from the source and written to the target. As mentioned, hopefully you made regular backups.
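
If you want to check whether it is the drive itself rather than the cable, smartctl can show the same counters smartd is reporting (a rough sketch, assuming smartmontools is installed and the drive is still /dev/hda):

Code:

smartctl -H /dev/hda         # overall SMART health verdict
smartctl -A /dev/hda         # attributes: Reallocated_Sector_Ct, Current_Pending_Sector, Offline_Uncorrectable
smartctl -l error /dev/hda   # the drive's own error log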

jorgeboardman 02-24-2010 10:45 AM

Thanks for the reply, I'll check the cables. Just wondering, is there any way to force the recovery?

JB

worm5252 02-24-2010 12:21 PM

Quote:

Originally Posted by jorgeboardman (Post 3872843)
Hi there,
Device file: /dev/md2
RAID level: Mirrored (RAID1)
Filesystem status: Mounted on swap
Usable size: 2096384 blocks (2 GB)
Persistent superblock? Yes
Chunk size: Default
RAID status: clean
Partitions in RAID: IDE device B partition 2, IDE device A partition 2

First off, this is an incredibly bad idea: you should not put your swap in a RAID. You are writing the exact same swap data to two drives, which just shortens the life of the drives. I use mdadm for RAID 1 on my machine, but I do not put my swap partitions in a RAID device. I have /dev/md0 = /boot, /dev/md1 = /, /dev/md2 = /home, and then a separate swap partition on each drive, as sketched below. Having swap in a RAID device is probably part of why your drives started dying in the first place.
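
A rough sketch of what standalone swap on each drive could look like (the partition numbers here are only an example, not taken from your layout); with equal pri= values the kernel spreads swap activity over both disks anyway:

Code:

# one swap partition per drive, no md device involved
mkswap /dev/hda2
mkswap /dev/hdb2

# /etc/fstab -- equal priority, so swapping is spread over both disks
/dev/hda2  swap  swap  defaults,pri=1  0 0
/dev/hdb2  swap  swap  defaults,pri=1  0 0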

Hopefully you have a backup of your data. If you do, I suggest you rebuild from scratch, leaving swap on separate standalone partitions, and then restore the data from backup. Just note that if you re-image, when you get to the disk partitioner, select the option to configure RAID and delete your existing RAID devices before you try to repartition the disks. If you don't, it is a royal pain to get it sorted out.

worm5252 02-24-2010 12:22 PM

Also, I would try running e2fsck on the unmounted partitions before you re-image.
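
Something along these lines (only a sketch; since /dev/md1 holds / it has to be checked from the Fedora rescue environment or some other boot medium, never while mounted):

Code:

# md0 (/boot) can be unmounted on the running system
umount /boot
e2fsck -f -v /dev/md0

# for md1, boot the install disc with "linux rescue", skip mounting
# the installed system, then:
e2fsck -f -v /dev/md1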

jorgeboardman 02-24-2010 01:06 PM

Sorry for my ignorance about the swap setup; now that you have explained it, it is perfectly clear to me, thanks. The system is currently up and running: what would you recommend I use to create a backup while it is still up and running?


JB

worm5252 02-24-2010 02:05 PM

I use shell scripts for my backups. However, if you just want to get it done, look at something like Amanda.
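
If you go the shell script route, something as simple as this can work while the box is running (the destination path and excludes are just placeholders to adapt):

Code:

#!/bin/sh
# full backup of / to a compressed tar archive on another disk
DEST=/mnt/backup                     # placeholder: wherever the backup drive is mounted
DATE=$(date +%Y%m%d)
tar --one-file-system -czpf "$DEST/root-$DATE.tar.gz" \
    --exclude=/proc --exclude=/sys --exclude=/tmp /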

mesiol 02-25-2010 12:16 AM

Hi,

I agree with worm5252: a shell script using tar is a good small backup solution. If you use tape backup, take a look at Amanda. At home I use rsync to copy data to a USB drive.
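
For example, something like this (the mount point is just a guess at where the USB drive ends up):

Code:

rsync -aHx --delete / /media/usbdisk/backup/
# -a keep permissions/ownership/times, -H keep hard links,
# -x stay on the / filesystem, --delete mirror deletions to the copy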

