LinuxQuestions.org


auroraglacialis 10-29-2007 08:23 AM

RAID 6 failure - 3 (out of 7) disks failed but 2 of them are ok, recovery possible?
 
(edit: misleading title. should be "RAID 6 failure - 3 (out of 7) disks failed but 2 of them are ok, recovery possible?")

Hi.

I used to have a RAID 6 (created with mdadm) consisting of 7 HDDs (5+2). This summer, one disk died and I removed it. I figured that (5+1) disks were still left, so there was still one disk of redundancy, and I kept working with the array until I could buy a new HDD. But then disaster struck and took out another drive; /proc/mdstat told me there were now only 5 drives, so I had no reserve left.

I checked the disk that had dropped out last with SMART and it came out ok, so I added it back to the RAID as a new drive and it started syncing. At about 2% the PC froze; upon reboot only 3 drives were in the RAID. I checked the missing drives with mdadm -E and they were ok, superblocks and all. So I figured I had to manually add them to the RAID for some reason.
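Beyond confirming that a superblock exists, the `mdadm -E` output can also show how far out of date each member is, via its event counter. A minimal sketch, assuming the output contains a line of the form `Events : <number>`; the helper names `extract_events` and `list_events` are made up for illustration and are not part of mdadm:

```shell
#!/bin/sh
# Sketch only: helper names are illustrative, not part of mdadm.

# Read `mdadm -E` output on stdin and print the value of its "Events" line.
extract_events() {
    awk -F' : ' '/^[ \t]*Events[ \t]*:/ { print $2; exit }'
}

# Print "<device> <events>" for each device given (needs root and mdadm).
list_events() {
    for dev in "$@"; do
        printf '%s %s\n' "$dev" "$(mdadm --examine "$dev" | extract_events)"
    done
}
```

Calling `list_events` with the member devices (or loop devices backed by images) as arguments gives one line per member. Members that share the highest counter were in sync with each other when the array last ran; a lower counter marks a member that dropped out earlier.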

Now the big mistake was to use --add for the first device I tried to get back into the RAID; looking at the RAID info, it had been added as a "spare". Then strange things happened again with the PC and I checked for hardware errors. I found that 2 IDE controllers were no longer behaving well, probably causing all the trouble.

Now to get the data back I tried to copy every HD that was at one time part of the RAID to image files (with dd if=/dev/hdx of=/mnt/backup/hdx), so I would only have to use the on-board controller. So now I have 6 files which are images of the 6 HDDs that were formerly installed: 4 were pretty much untouched, one was added as a new drive that started syncing while the RAID was still active, and one was added as a spare while the RAID was inactive.
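For reference, an imaging step like the dd call above could be wrapped roughly as follows. This is a sketch under assumptions: `image_disk` is an illustrative helper name, and the `conv=noerror,sync` options and the 512-byte block size are additions that the command above did not use.

```shell
#!/bin/sh
# Sketch only: image_disk is an illustrative helper, not a standard tool.

# Copy SRC to IMG. conv=noerror keeps going after a read error and
# conv=sync pads each short or failed read with zeros, so data after a
# bad spot stays at its original offset. A 512-byte block size keeps the
# image the same length as the source, assuming the source size is a
# multiple of 512 bytes.
image_disk() {
    src=$1
    img=$2
    dd if="$src" of="$img" bs=512 conv=noerror,sync
}
```

A call would look like `image_disk /dev/hdx /mnt/backup/hdx`. The small block size is slow, but a larger one pads the final partial block with zeros and so changes the image length, which matters when a superblock is located relative to the end of the device.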

Trying to simply assemble the RAID from these images fails with an I/O error; assembling read-only gives no valid filesystem.
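One way such an assembly attempt from image files could be set up is sketched below. It only prints the commands instead of running them, since they need root. The array name /dev/md0, the loop device numbering and the helper name `plan_assemble` are assumptions, and the exact commands used for the attempt described above are not known.

```shell
#!/bin/sh
# Sketch only: prints the commands instead of running them. The array
# name /dev/md0 and the loop device numbering are assumptions.

# Print the commands that would attach each image to a loop device and
# then attempt a forced assembly that starts the array read-only.
plan_assemble() {
    n=0
    loops=""
    for img in "$@"; do
        echo "losetup /dev/loop$n $img"
        loops="$loops /dev/loop$n"
        n=$((n + 1))
    done
    echo "mdadm --assemble --force --readonly /dev/md0$loops"
}
```

Note that `--force` lets mdadm rewrite superblocks whose metadata looks out of date, so this should only ever be pointed at copies of the images, never at the original disks.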

I figure that the data should well be there, since one drive was only 2% written on (the rest should still contain the old data from a time it was still in the RAID) and one drive was just added as spare (it was not written at all unless adding it as a spare deletes the contents). And I basically need only one of them to have 5 valid disks to start the RAID.
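The arithmetic behind "5 valid disks" can be written down as a tiny check: RAID 6 keeps two disks' worth of parity, so an array can run as long as no more than two members are missing. A minimal sketch (`raid6_can_start` is an illustrative helper name):

```shell
#!/bin/sh
# Sketch only: raid6_can_start is an illustrative helper.

# Succeed if a RAID 6 array of TOTAL members can run with PRESENT valid
# members, i.e. at most two members are missing.
raid6_can_start() {
    total=$1
    present=$2
    [ $((total - present)) -le 2 ]
}

for present in 4 5; do
    if raid6_can_start 7 "$present"; then
        echo "7-disk RAID 6 with $present members: can start (degraded)"
    else
        echo "7-disk RAID 6 with $present members: cannot start"
    fi
done
```

With 7 members in total, 4 valid members are not enough and 5 are, which is why getting back just one of the two questionable disks would suffice.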

Now how can I recover at least some data? Somehow telling the RAID to put the "spare" back in the place where it was before dropping out? Recovering some data from the HDD that was in the RAID, dropped out and was added as a new HDD until it was about 2% in the resyncing process?

Hope someone can help me. I put some personal and business files on the RAID which are lost now.

Many thanks
Aurora

aylen 10-31-2007 01:22 AM

Trying various options at this critical stage would be dangerous. I think contacting RAID Recovery Services like Disk Doctors Labs Inc. would be the best solution for your problem.

auroraglacialis 11-01-2007 09:11 AM

private solution required
 
Hi.
Since I am not a company and do not have the means to spend a lot of money on this, I'd prefer a private solution. I made image copies of the original drives and kept the originals, so I could try some recovery with the drive images and leave the original HDDs untouched. Of course, read-only recovery is preferable. I have backups of some of the data and some data is not critical (photo scans, audio CD copies, old Photoshop files), but about 5% of the data is not in backups and not recoverable from other sources (photo printouts etc).

If the definite answer is: "The data is not recoverable at all or only by professionals for hundreds of " then that is a definite answer, too. I would have to start restoring the backups etc which means a lot of work and I will be missing some files, but at least I could free the HDDs and backup-HDDs and start filling them with data again.

Greetings
Aurora

koflanagan 11-09-2007 03:14 PM

Hmm, 3 out of 7 disks are dead in a RAID 6... I would say it is not recoverable.

auroraglacialis 11-10-2007 11:25 AM

Quote:

Originally Posted by koflanagan (Post 2953673)
Hmm, 3 out of 7 disks are dead in a RAID 6... I would say it is not recoverable.

No, actually only one disk is really dead. Two more, however, are somehow out of date. One was removed from the array and re-added as a new disk, but the recovery process stopped at 2% due to a controller failure. Another disk was also removed from the RAID and then re-added, but it was taken up as a "spare" disk and not re-integrated into the RAID, although nothing on the disk was changed (it was part of the array before and just got kicked out due to the same controller failure).

So basically I started with a RAID with 7 disks, then
* one disk died and was removed.
* RAID had 6 drives left
* An unknown error kicked disk A out
* RAID had 5 drives left
* Disk A was checked with SMART and came up error-free
* Disk A was added to the RAID
* Array started "recovery" on Disk A and aborted at 2%
* After reboot, Disk A and Disk B were missing from the Array
* YUK - only 4 Drives in the Array
* Did mdadm --add on Disk B, effectively adding it as "spare"
* YUK now it's 4 drives and 1 spare although Disk B was the 5th Disk!
* Controller was determined to be the source of the problem
* Made a dd-Copy of all HDDs to experiment with recovery options

So Disk B should still contain the data, unless adding it as spare (however that happened) deleted all the content.
And Disk A should at least contain 98% of the data that was in there before the recovery started.

It is not vital to recover all of the content, it would be ok to recover most of it.


All times are GMT -5.