LinuxQuestions.org

jimbo1954 04-23-2007 02:47 AM

Software RAID1 Recovery Issues (SARGE)
 
I have a mailserver running Sarge on a 2.6.8 kernel (yeah, I know it's a bit behind, but if it ain't broke, don't fix it). We had some electrical "perturbations" round here just recently (an *enormous* electrical storm), and despite having surge protection, etc., one of the two disks in the RAID1 array got upset. I tried to resynch it, but each time it got to around 48% it failed and started again, and the syslog shows irrecoverable errors, so I think it's safe to say the disk is cooked. OK, so I 'mdadm --fail' and 'mdadm --remove' the drive and check that it's no longer shown as part of the array. Thankfully the other drive of the mirror pair is just getting on with life.

I powered off, took out what I thought was the bad drive, replaced it with another, and restarted, intending to rebuild, but it refused to boot with one of the disks changed. Looks bad, so I returned it to the original configuration, and it boots fine. Hmmm... must have taken the wrong drive out :( So I took the other drive out, and it starts to boot and gets a fair way through, then throws a kernel panic because it couldn't recognise a drive in the array... But I had already failed and removed the drive that was bad...???

Anybody got any ideas? The system is running on one disk now, which makes me nervous, but I'm also nervous about trying to recover the array, as it appears I'm getting something wrong in the recovery process... Is there anybody out there with the *faintest* idea of what I've managed to screw up?

Thanks Guys!

Micro420 04-23-2007 10:33 AM

Maybe you removed the wrong drive through mdadm???

If I were you, I would mirror that existing drive just in case!!!

jimbo1954 04-24-2007 02:47 AM

I wish it were that simple! I have /dev/hda5 and /dev/hdc5, and it was reporting that hdc5 was running and hda5 was recovering. Using mdadm, I failed/removed hda5 after it made several abortive attempts to resynch, each of which ended in a cascade of irrecoverable I/O errors at around 49%. Having failed/removed the disk, mdadm --detail showed that hda5 was the one removed and that hdc5 was running clean.
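
For anyone wanting to double-check which member the kernel considers faulty before pulling hardware, the member list in /proc/mdstat can be read programmatically. What follows is a minimal Python sketch and not something from the original post: the sample text is only an illustrative guess at how the array might have looked after the --fail but before the --remove (the array name md0, the slot numbers, and the block count are all assumptions), and parse_mdstat is a hypothetical helper name.

```python
import re

# Illustrative sample only: md0, the slot numbers and the block count are
# assumptions, not output captured from the poster's machine.
SAMPLE_MDSTAT = """\
Personalities : [raid1]
md0 : active raid1 hdc5[1] hda5[0](F)
      19535040 blocks [2/1] [_U]

unused devices: <none>
"""

# A member token looks like "hdc5[1]" or, when marked faulty, "hda5[0](F)".
MEMBER_RE = re.compile(r"(\w+)\[(\d+)\](\(\w\))?")


def parse_mdstat(text):
    """Return {array: {"state": str, "members": {device: "active"|"faulty"}}}."""
    arrays = {}
    for line in text.splitlines():
        head = re.match(r"(md\d+) : (.*)", line)
        if not head:
            continue  # skip the Personalities line, block counts, blanks
        name, rest = head.groups()
        tokens = rest.split()
        members = {}
        for tok in tokens:
            m = MEMBER_RE.fullmatch(tok)
            if m:
                members[m.group(1)] = "faulty" if m.group(3) == "(F)" else "active"
        arrays[name] = {"state": tokens[0] if tokens else "", "members": members}
    return arrays


if __name__ == "__main__":
    # On a live system one would pass open("/proc/mdstat").read() instead.
    for name, info in parse_mdstat(SAMPLE_MDSTAT).items():
        print(name, info["state"], info["members"])
```

This only reports what the md driver believes about the partitions. It says nothing about which physical drive sits on which cable, so it cannot by itself rule out pulling the wrong disk.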

It's like the system "knew" how to boot OK with a complete RAID1 system, but when half the RAID1 array was dead, GRUB didn't know how to manage the situation. Is there anything about the Debian RAID1 build that makes it do the initial GRUB load from one disk in particular?
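
One way to test the suspicion in the question above is to look at the first sector of each disk and see whether both carry boot code. The Python sketch below is only a heuristic under stated assumptions: it checks for the standard 0x55AA boot signature and for the ASCII text "GRUB" in the boot code area, which GRUB's boot sector typically contains. The function names are hypothetical, and the demo uses synthetic sectors rather than data from the poster's disks.

```python
SECTOR_SIZE = 512
BOOT_CODE_END = 446  # bytes 0..445 hold boot code; the partition table follows


def inspect_mbr(sector):
    """Report whether a 512-byte first sector looks bootable and GRUB-like."""
    if len(sector) < SECTOR_SIZE:
        raise ValueError("need at least 512 bytes")
    return {
        "boot_signature": sector[510:512] == b"\x55\xaa",
        "grub_marker": b"GRUB" in sector[:BOOT_CODE_END],
    }


def read_first_sector(path):
    """Read the first sector of a disk device or image (needs read access)."""
    with open(path, "rb") as f:
        return f.read(SECTOR_SIZE)


def make_demo_sector(with_grub):
    """Build a synthetic sector for the demo; offset 300 is arbitrary."""
    sector = bytearray(SECTOR_SIZE)
    sector[510:512] = b"\x55\xaa"
    if with_grub:
        sector[300:304] = b"GRUB"
    return bytes(sector)


if __name__ == "__main__":
    print("with marker:   ", inspect_mbr(make_demo_sector(True)))
    print("without marker:", inspect_mbr(make_demo_sector(False)))
```

On the real machine, running inspect_mbr(read_first_sector("/dev/hda")) and the same for /dev/hdc as root would show whether both disks have the marker. If only one does, that would be consistent with the bootloader having been installed on a single disk, though it would not prove that is the cause of the failed boots.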


All times are GMT -5.