LinuxQuestions.org

LinuxQuestions.org (/questions/)
-   Linux - Server (http://www.linuxquestions.org/questions/linux-server-73/)
-   -   mdadm RAID 5, 6 disks failed. recovery possible? (http://www.linuxquestions.org/questions/linux-server-73/mdadm-raid-5-6-disks-failed-recovery-possible-676572/)

ufmale 10-15-2008 01:25 PM

mdadm RAID 5, 6 disks failed. recovery possible?
 
I have a RAID 5 Linux machine set up using mdadm. The RAID has 10 disks and 6 of them have failed. Is there a way I can still recover the data?

In addition, I have learned in the past that mdadm may report that a disk has failed even though the disk itself may still work if reformatted. Is that true?

anomie 10-15-2008 02:07 PM

Let's just talk RAID 5 theory for starters. You're supposed to be able to lose one drive and continue operating normally.

You've lost six of ten?? (I can't speak to the other questions.)

PTrenholme 10-15-2008 02:46 PM

Was that a simultaneous failure of six drives? If so, were they on the same controller, cable, or power supply? In other words, are you sure you have multiple drive failures rather than a single drive infrastructure failure?
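
One quick way to separate a real multi-drive failure from a cable/controller problem (a sketch only; it assumes smartmontools is installed and that the drives appear as /dev/sda through /dev/sdj, as in the assemble command below):

```shell
#!/bin/sh
# Ask each drive itself whether it is healthy -- this bypasses md entirely.
for d in /dev/sd[a-j]; do
    echo "== $d =="
    smartctl -H "$d"    # overall SMART health verdict for the drive
done

# Look for link resets or bus errors that would point at a cable,
# controller, or power problem rather than the drives themselves.
dmesg | grep -iE 'ata[0-9]|sd[a-j]' | tail -n 50
```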

ufmale 10-15-2008 02:47 PM

That is right.
When I tried to start the RAID with mdadm

$ mdadm --assemble /dev/md0 /dev/sd[a-j]1

I got
mdadm: /dev/md0 assembled from 4 drives - not enough to start the array.

Any suggestions? Should I start reformatting these disks?

thetakan 10-15-2008 11:45 PM

Can you check the cable connections for the drives that are failing?

I haven't heard of a 6-drive failure in my entire IT career.

Thanks though...

ufmale 10-16-2008 09:29 AM

Quote:

Originally Posted by thetakan (Post 3311670)
Can you check the cable connections for the drives that are failing?

I haven't heard of a 6-drive failure in my entire IT career.

Thanks though...

I am puzzled by that myself. It might have been a power surge, since the machine is not connected to a UPS. The cables look fine. I can see all 10 disks with "fdisk -l", but mdadm does not want to assemble the RAID with those 6 drives; only 4 drives are good. I am thinking about reformatting the drives, but am posting here in case someone has a good suggestion for how to recover the data.

The disk enclosure is packed with 10 drives. There are 2 SATA cables from this enclosure to a 2-port interface card in a Gentoo machine.

PTrenholme 10-16-2008 12:03 PM

Can you replace the 2-port interface card, at least temporarily, to see if that makes a difference?

Since fdisk can see the drives, perhaps the problem is in the mdadm control file. Have a look at it to see whether it looks correct. An fsck -f on the drive containing the administrative file might also be in order.

ufmale 10-18-2008 09:16 AM

Quote:

Originally Posted by PTrenholme (Post 3312416)
Can you replace the 2-port interface card, at least temporarily, to see if that makes a difference?

Since fdisk can see the drives, perhaps the problem is in the mdadm control file. Have a look at it to see whether it looks correct. An fsck -f on the drive containing the administrative file might also be in order.



I changed the card, but I got the same problem.
fsck -f does not work since the drive is ext3. What information would I get from that command?

What is the mdadm control file?

PTrenholme 10-18-2008 10:24 AM

Quote:

Originally Posted by ufmale (Post 3314571)
I changed the card, but I got the same problem.
fsck -f does not work since the drive is ext3. What information would I get from that command?

What is the mdadm control file?

O.K., first: fsck is a script that identifies the Linux file system on a partition and then runs the check program specific to that type of file system. It works quite well on ext-type file systems. (Note that, of course, you can only run the checking program if the partition is not mounted.) The information you would get would be a report of any problems found and of what was done to correct them.

As to the mdadm control file: according to man mdadm it's /etc/mdadm.conf, so I was suggesting, in effect, that you boot from a rescue disk and run fsck on your boot drive. I'm suggesting this because the simultaneous failure of six different hard disks is so implausible that some other cause seems more likely, especially since fdisk seems to think the drives are in good shape.

By the way, man mdadm discusses several options for attempting to recover from RAID failures.
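
For example, a sketch of one commonly used recovery sequence (device names are taken from earlier in the thread and are assumptions; --force can make things worse if the remaining members are badly out of date, so treat this as a last resort and copy data off read-only as soon as anything mounts):

```shell
#!/bin/sh
# Sketch only -- do not run blindly against the real array.

# 1. Make sure nothing half-assembled is holding the member devices.
mdadm --stop /dev/md0

# 2. Inspect each member's md superblock: state, role and event counter.
for d in /dev/sd[a-j]1; do
    echo "== $d =="
    mdadm --examine "$d"
done

# 3. Try to force assembly from the members whose event counters agree.
mdadm --assemble --force /dev/md0 /dev/sd[a-j]1

# 4. If the array starts, even degraded, mount read-only and copy data off.
mount -o ro /dev/md0 /mnt/recovery
```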

Oh, another thought: md is sometimes used to set up a RAID system using partitions of physical disks. Is your RAID composed of 10 different physical disk drives, or is it made up of a smaller number of physical drives using different partitions on those drives?
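
A quick way to answer that question on the machine itself (a sketch; /dev/md0 is assumed from earlier in the thread):

```shell
#!/bin/sh
# The kernel's view of all md arrays and the devices/partitions behind them.
cat /proc/mdstat

# More detail for one array: RAID level, member list, state of each member.
mdadm --detail /dev/md0
```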

Yaniv-Fer 10-19-2008 10:19 AM

I found some more details about software RAID here... maybe it will help you:
http://www.linuxhomenetworking.com/w..._Software_RAID


Here is an example of generating the contents of the mdadm.conf file:

[root@bigboy tmp]# mdadm --detail --scan --verbose
ARRAY /dev/md0 level=raid5 num-devices=4
UUID=77b695c4:32e5dd46:63dd7d16:17696e09
devices=/dev/hde1,/dev/hdf2,/dev/hdg1
[root@bigboy tmp]#




Again, this is copied and pasted from the other site... DO NOT PASTE THIS INTO YOUR FILE.
This should give you an idea of how this file should look...



good luck
Yaniv Ferszt
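
For reference, the usual way to generate such ARRAY lines from your own (running) array is something like the following sketch; only append to /etc/mdadm.conf once the array actually assembles, and keep a backup of the old file first:

```shell
#!/bin/sh
# Scan the running arrays and print mdadm.conf-style ARRAY lines.
mdadm --detail --scan --verbose

# If the output looks right, append it to the config file.
mdadm --detail --scan >> /etc/mdadm.conf
```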

slackman 10-20-2008 08:24 AM

6 disks failed? Ehh, that's a bit strange. Are they visible to the controller? If not, check the power cables. If they are, check each HD independently with mdadm -E /dev/sdX. Run badblocks on the failed devices.
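
When comparing the mdadm -E output across members, the "Events" counter is one of the most useful fields: members whose event counts are close together were in sync when the array stopped. A small sketch for pulling that field out of saved output (the sample line below is made up for illustration, not taken from the poster's array):

```shell
#!/bin/sh
# Extract the value of the "Events" line from `mdadm --examine` output.
extract_events() {
    awk -F':' '/Events/ { gsub(/ /, "", $2); print $2 }'
}

# Made-up sample of one line of `mdadm --examine` output.
sample="         Events : 0.1542"

printf '%s\n' "$sample" | extract_events    # prints 0.1542
```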

Could it be the enclosure's SATA controllers? Do you have spare SiL controllers so that you can hook up all 10 HDs and force a reassembly? Just a thought.
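
A sketch of that read-only surface scan (the device name /dev/sdc1 is an assumption; note that badblocks' -w write mode would destroy the data, so it is deliberately not used here):

```shell
#!/bin/sh
# Read-only surface scan: -s shows progress, -v reports each bad block.
badblocks -sv /dev/sdc1

# Examine the md superblock on the same member.
mdadm -E /dev/sdc1
```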

