I have a mailserver running Sarge on a 2.6.8 kernel (yeah, I know it's a bit behind, but if it ain't broke, don't fix it). We had some electrical "perturbations" round here just recently (an *enormous* electrical storm), and despite having surge protection etc., one of the two disks in the RAID1 array got upset. I tried to resync it, but each time it got to around 48% it failed and started again, and the syslog shows unrecoverable errors, so I think it's safe to say the disk is cooked. OK, so I did 'mdadm --fail' and '--remove' on the drive and checked that it's no longer shown as part of the array. Thankfully the other drive of the mirror pair is just getting on with life.
I then tried to power off, take out what I thought was the bad drive, replace it with another, restart and rebuild, but the machine refused to boot with one of the disks changed. That looked bad, so I returned it to the original configuration, and it boots fine. Hmmm... I must have taken the wrong drive out.
So I took the other drive out instead. This time it starts to boot and gets a fair way through, then throws a kernel panic because it can't recognise a drive in the array... But I already failed and removed the drive that was bad...???
Anybody got any ideas? The system is running on one disk now, which makes me nervous, but I'm also nervous about trying to recover the array, as it appears I'm getting something wrong in the recovery process... Is there anybody out there with the *faintest* idea of what I've managed to screw up?