LinuxQuestions.org
Linux - Server This forum is for the discussion of Linux Software used in a server related context.

Old 12-18-2006, 05:14 AM   #1
Cairan
LQ Newbie
 
Registered: Jul 2004
Location: Quebec
Distribution: Slackware 9.0/9.1/10.0
Posts: 10

Rep: Reputation: 0
Software RAID 5 crash and wrongful failed disk flagged


OK, I've just had a nasty hard-drive crash, and as Murphy would have it, it happened at the end of the semester when I have a ***load of work to hand in Tuesday.

Thankfully, I had recently upgraded to Slackware 11 and still had a copy of my previous setup lying on a soon-to-be-erased partition on one of my disks, so it is not as bad as it could have been.

Now to the issue at hand:

I have four identical WD 160 GB hard drives. Two of them are new; the other two are 2 years old. One of the drives, at about 18,000 hours, died this afternoon with no pre-fail indications beforehand. Now smartctl -a /dev/hdg reports that it's pretty much dead.

However, when it died, software RAID (RAID 5 on 4 disks) kicked the wrong drive, /dev/hde, from the arrays, and kept (!!?!) the bad one!
When I came home, everything was frozen solid: not even a kernel panic message, nothing. When I tried to restart things, /dev/md0, which is my root RAID 5 device, wouldn't initialise properly. So I got the rescue CD out, and after quite a bit of fiddling to extract the /var/log/syslog file, I was shocked to see that the wrong disk had been kept up.

Since activity went on instead of stopping right there, all files that were touched in any way during the few hours it took for the system to finally die are corrupted. Forcing the reinsertion of the "good" disk into the arrays does allow quite a lot of stuff to be salvaged; unfortunately, the MySQL databases are wrecked beyond repair.

So, does anyone have suggestions to prevent this kind of horror story from repeating itself with software RAID? It's pretty evident from the SMART diagnostics that /dev/hde was good all along. I still don't get why the bad disk wasn't brushed aside, which would have stopped all the arrays before they lost their sync.
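
For what it's worth, one thing that might at least have flagged the problem hours earlier is a periodic check of /proc/mdstat for degraded arrays. Below is a minimal sketch in Python, not a fix for md kicking the wrong member. The sample text, the partition names (hde1, hdg1, ...) and the function name parse_mdstat are made up for illustration; only the general /proc/mdstat layout is assumed. mdadm also has its own monitor mode, which is probably the better tool in practice.

```python
import re

# Hypothetical sample in the usual /proc/mdstat layout.
# Device names and block counts are invented for illustration.
SAMPLE_MDSTAT = """\
Personalities : [raid5]
md0 : active raid5 hdk1[3] hdg1[2] hde1[1](F) hda1[0]
      468872704 blocks level 5, 64k chunk, algorithm 2 [4/3] [U_UU]

md1 : active raid5 hdk2[3] hdg2[2] hde2[1] hda2[0]
      1959680 blocks level 5, 64k chunk, algorithm 2 [4/4] [UUUU]

unused devices: <none>
"""

# "md0 : active raid5 ..." starts an array entry.
ARRAY_RE = re.compile(r"^(md\S*) : (.*)$")
# A member looks like "hde1[1]" with optional flags such as "(F)" for faulty.
MEMBER_RE = re.compile(r"^([^\[\s]+)\[(\d+)\]((?:\([A-Z]\))*)$")
# "[4/3] [U_UU]" means 4 devices expected, 3 active, '_' marks a missing slot.
STATUS_RE = re.compile(r"\[(\d+)/(\d+)\]\s+\[([U_]+)\]")


def parse_mdstat(text):
    """Return {array_name: info} with failed members and a degraded flag."""
    arrays = {}
    current = None
    for line in text.splitlines():
        header = ARRAY_RE.match(line)
        if header:
            name, rest = header.groups()
            tokens = rest.split()
            failed = []
            for token in tokens:
                member = MEMBER_RE.match(token)
                if member and "(F)" in member.group(3):
                    failed.append(member.group(1))
            current = {
                "state": tokens[0] if tokens else "unknown",
                "failed": failed,
                "expected": None,
                "active": None,
                "degraded": bool(failed),
            }
            arrays[name] = current
            continue
        status = STATUS_RE.search(line)
        if status and current is not None:
            expected, active = int(status.group(1)), int(status.group(2))
            current["expected"] = expected
            current["active"] = active
            if active < expected or "_" in status.group(3):
                current["degraded"] = True
    return arrays


if __name__ == "__main__":
    # On a real system you would read /proc/mdstat instead of the sample.
    for name, info in sorted(parse_mdstat(SAMPLE_MDSTAT).items()):
        if info["degraded"]:
            print(f"{name}: DEGRADED, failed members: {info['failed']}")
        else:
            print(f"{name}: ok")
```

Run from cron and mailed to yourself, something like this would only tell you that an array dropped a member and which one md marked faulty. Checking that member against smartctl output by hand would still be needed to spot a case like mine, where the disk marked as failed was the healthy one.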

Last edited by Cairan; 12-18-2006 at 05:17 AM.
 
  

