Software RAID 5 (md) recovery using mdadm
Hello,

after receiving the following errors from two of my four disks in my md0 array (see the syslog excerpt), I already tried to re-add the dirty disks, without success:

Code:
root:~# mdadm -a /dev/md0 /dev/hde1

My mdadm configuration:

Code:
root:~# cat /etc/mdadm/mdadm.conf

Details for the four drives:

Code:
/dev/hde1:
/dev/hdf1:
/dev/hdg1:
/dev/hdh1:
Hmmmm ... messy. I lost a hardware RAID1 on a Promise controller (yes, I know, I should have known better ... hindsight ... blah!). Burned hand teaches best, so they say - I did quite a bit of testing on soft RAID1 and RAID5. I could not get soft RAID5 to behave acceptably after pulling power cables out to check the results: RAID5 doesn't like it very much, and has a tendency to refuse to mount the volume because the filesystem isn't clean while the RAID is still critical. I'd go for soft RAID1 or hardware RAID5.
If you've lost two drives, then the RAID set is dead. There are some specialist tools which can attempt to recover some data from a multiple-disk RAID5 failure, but given the way the data is written, I wouldn't be too hopeful about what they would get back. A multiple-drive failure at one time is quite rare, though not unheard of. Assuming that hdg and hdh are master and slave on a single IDE bus, it is possible that the failure of one of the drives is causing some weird bus errors and making it look like the other drive has problems too - I've seen that before. You could try removing each drive from the bus in turn and booting the system to see if either drive miraculously recovers, then replace the failed drive and rebuild the RAID.
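If one of the drives does come back, the replace-and-rebuild cycle with mdadm looks roughly like this - a sketch only, with /dev/hdg1 standing in for whichever member actually failed:

Code:
# mark the dead member failed and pull it out of the array
mdadm /dev/md0 --fail /dev/hdg1 --remove /dev/hdg1
# after physically swapping the drive and re-creating the partition:
mdadm /dev/md0 --add /dev/hdg1
# watch the rebuild progress
cat /proc/mdstat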
Thanks for your recommendations. As you mentioned: burned hand teaches best. Your assumptions regarding the drives were correct. There are four drives connected. Two as masters and two as slaves on two IDE busses. Unplugging each device one by one and trying to start the array didn't succeed.
Code:
~# /etc/init.d/mdadm-raid start

Any other suggestions?
Also, though multiple drive failures are uncommon, if you purchased all the drives at the same time, when one fails the others are likely to follow. I've seen RAID 5 arrays on hardware controllers die twice during rebuilds. Generally, when one drive fails, go through the cycle and swap every drive in the array, or you may be sorry.
Or get enough disks that RAID 6 makes sense. That is all we use at work now; it is RAID 5 with an additional hot spare. Most hardware controllers will allow the hot spare to be any of the physical drives in the array, so when one goes bad the hot spare takes its place; then you pull the bad drive out, put a blank drive in, and set it as the new hot spare. Much safer. I've yet to see a RAID 6 failure. Software RAID 5 sounds like a very bad idea to me. I am aware that it is possible, but any data important enough to be on a RAID 5 array is also important enough that the additional $300 or so is spent on a hardware controller.

Peace,
JimBass
I had (am having) a similar problem with a Silicon Image 3124 PCI-X Serial ATA controller on a Norco DS-500 storage array. The sata_sil24 driver with a port multiplier is still a bit experimental, and the only way I've been able to get it to work is with a patch on a 2.6.17.4 kernel. The controller timed out and two of the 5 drives in a raid5 were lost. The drives were still good, but re-adding the "failed" drives would not work. I was, however, able to recover the raid array by re-creating it:

Code:
mdadm --create /dev/md0 --level=5 --raid-devices=5 /dev/sd[d-h]1

If the IDE ribbon cables were the problem, and the drives did not really "fail" (i.e., they were only marked failed in the array), this might work for you.
I got the following messages during the process:

Code:
mdadm: /dev/sdd1 appears to contain an ext2fs file system
    size=1953535744K  mtime=Thu May 17 22:24:08 2007
mdadm: /dev/sdd1 appears to be part of a raid array:
    level=raid5 devices=5 ctime=Sat May  5 15:19:07 2007
mdadm: /dev/sde1 appears to be part of a raid array:
    level=raid5 devices=5 ctime=Sat May  5 15:19:07 2007
mdadm: /dev/sdf1 appears to be part of a raid array:
    level=raid5 devices=5 ctime=Sat May  5 15:19:07 2007
mdadm: /dev/sdg1 appears to be part of a raid array:
    level=raid5 devices=5 ctime=Sat May  5 15:19:07 2007
mdadm: /dev/sdh1 appears to contain an ext2fs file system
    size=2005702402K  mtime=Wed Nov 28 02:32:38 2007
mdadm: /dev/sdh1 appears to be part of a raid array:
    level=raid5 devices=5 ctime=Sat May  5 15:19:07 2007
Continue creating array? y
mdadm: array /dev/md0 started.

I was then able to mount the array and access all files.

good luck,
Chris
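For anyone trying the same thing, the recover sequence is roughly the two commands below. This is only a sketch: the device list, order, level and device count must match the original array exactly or the data will be scrambled, and the mount point is made up for the example.

Code:
# re-create the array over the existing members (answer 'y' at the prompt)
mdadm --create /dev/md0 --level=5 --raid-devices=5 /dev/sd[d-h]1
# mount read-only first, to check the data before trusting the new array
mount -o ro /dev/md0 /mnt/recovered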
Hi all (first post here)!
I had identical problems this evening with my Fedora 7 server, which has 4 drives hanging off the Nvidia SATA controller in a RAID 5 array. The console started spitting out "ATA: Abnormal Status" errors, and after rebooting the RAID 5 array would not mount. After attempting the maintenance/reassemble options with mdadm I had no success, so I turned to Google and stumbled across this thread. Thanks to fsbooks's reply I have successfully re-created the array and mounted it without any problems. I am not sure what caused this problem. I am running Fedora 7 with kernel 2.6.21-1.3228.fc7. It would be interesting to see how many others have come across this. I had the same issues as fakeroot, with 2 drives showing the "removed, faulty removed" status when examined with mdadm.
Echoing the previous comment: I have a media server with a lot of large files on it - too much to back up effectively until Blu-ray comes down in price a lot. So I had this bright idea about using software RAID. I purchased 2 more 500G SATA drives and created a RAID 5 array on them, with my original drive "missing". Then I copied my files onto it.
I verified that it survived rebooting. So far, so good. So I repartitioned my original drive, added it to the array, and left for work with it syncing nicely. I came home to find that two drives had failed - probably a loose power cable (off a splitter from 1 IDE to 2 SATA), because reseating everything brought the drives back to life - but not the array. I followed fsbooks's advice using the two drives I'd originally set up, in case the sync hadn't completed, then added the third drive. My files are there and the drives are once again syncing. So far, so good....
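The "missing" trick garydale describes - building a deliberately degraded array so an existing data drive can be folded in later - looks roughly like this. A sketch only: the device names are assumptions, with /dev/sda1 standing in for the original drive that still holds the data.

Code:
# create a 3-member RAID5 with one slot deliberately left empty
mdadm --create /dev/md0 --level=5 --raid-devices=3 /dev/sdb1 /dev/sdc1 missing
# put a filesystem on it and copy the data over, then repartition the
# old drive and add it into the empty slot:
mdadm /dev/md0 --add /dev/sda1
# watch the sync
cat /proc/mdstat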
Lucky garydale!
I've tried out the advice from the previous posts on my drives (I'm the thread starter). Never got the files back. I think a loose power cable was the reason here too. Thanks anyway.
Hi
If a /dev/md0 raid partition has already been created, is it possible to create /dev/md1 with other disks or partitions? If so, then why am I getting the error below?

Code:
[root@cjpunjabiradio ~]# mdadm -C /dev/md1 --level=5 --raid-devices=2 /dev/hda{14,15}
mdadm: error opening /dev/md1: No such file or directory

Also look at my md0 configuration:

Code:
[root@cjpunjabiradio ~]# mdadm -D /dev/md0
/dev/md0:
        Version : 00.90.03
  Creation Time : Sun Jun 15 17:50:54 2008
     Raid Level : raid5
     Array Size : 104320 (101.89 MiB 106.82 MB)
    Device Size : 104320 (101.89 MiB 106.82 MB)
   Raid Devices : 2
  Total Devices : 2
Preferred Minor : 0
    Persistence : Superblock is persistent

    Update Time : Sun Jun 15 18:14:34 2008
          State : clean
 Active Devices : 2
Working Devices : 2
 Failed Devices : 0
  Spare Devices : 0

         Layout : left-symmetric
     Chunk Size : 64K

           UUID : b28a774d:6f5ec466:2611d4aa:2bdd8c95
         Events : 0.10

    Number   Major   Minor   RaidDevice State
       0       3       12        0      active sync   /dev/hda12
       1       3       13        1      active sync   /dev/hda13

I'm new to Linux, so I have only a little knowledge of RAID. Please assist.

Thanks,
Charanjit Cheema
1) RAID5 really requires a minimum of 3 drives (see RAID5 on Wikipedia). With 2 drives you can set up RAID0 (<-- makes 2 drives look like 1 big drive, no redundancy) or RAID1 (mirrored drive) if you wish; there's a sketch of that after the links below.
2) Here are the links I used to set up my mdadm RAID array:
http://tldp.org/HOWTO/Software-RAID-HOWTO.html
http://ubuntuforums.org/showthread.php?t=408461
I bookmarked these suckers and refer back anytime I have RAID questions.
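As for the error itself: "No such file or directory" for /dev/md1 usually just means the device node doesn't exist yet. A sketch, reusing the hda14/hda15 partitions from the question as a mirror instead of a two-disk RAID5:

Code:
# older setups may need the md1 block device node created by hand
# (major 9 is md; minor 1 matches md1)
mknod /dev/md1 b 9 1
# then create a two-partition mirror; --auto=yes asks mdadm to create
# the node itself where supported
mdadm --create /dev/md1 --auto=yes --level=1 --raid-devices=2 /dev/hda14 /dev/hda15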
I've been having a problem with my RAID arrays (WD 250GB SATA drives) failing frequently (once per month?), mostly when under a large load (they max at about 5MB/sec). Two disks will simultaneously fail, but every time they fail, I'm able to recover by recreating the array. Any idea what this could be? Anyone else have this happen?
On another note, I've also been able to successfully move the array from one computer to another (this is a data RAID5 array; the OS is installed on a separate single drive) by using the create command. mdadm will recognize the drives as already being part of an array and will recover them with my data intact.
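Worth noting that re-creating isn't the only way to move an array between machines; if the superblocks are intact, assembling is the gentler route. A sketch, with assumed device names:

Code:
# scan all partitions for md superblocks and assemble the arrays they describe
mdadm --assemble --scan
# or name the members explicitly
mdadm --assemble /dev/md0 /dev/sdb1 /dev/sdc1 /dev/sdd1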
Generally, if the RAID has crashed then the filesystem will have a problem mounting; fsck the filesystem, or switch to a journalled filesystem like ext3 to minimise that risk.

In my experience it's hardware RAID systems which are harder to recover, as you're limited to the tools available in the BIOS of the controller's vendor. With software RAID you're not quite so limited, and in many cases you can recover from situations where you'd be stuck if you were running hardware RAID. (I've successfully recovered from a 2-disk failure in a software RAID5 array of 7 drives without losing much data, so it's certainly possible.)

It's also *much easier* to be able to pull the disks out of a machine and drop them into an entirely different system running a different linux distribution and even a different architecture, ie: PPC to x86 or Sparc. Doing that with a hardware RAID card can cause driver issues and all sorts. Software RAID is incredibly flexible.

Either spend lots of money on a hardware RAID5 controller with battery-backed cache and a well-trusted chipset, or stick with software RAID5. Anything else is a false economy.
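The fsck-first advice above, as commands - a sketch, assuming the array assembled as /dev/md0 and carries ext2/ext3:

Code:
# dry-run check: report filesystem problems without changing anything
fsck.ext3 -n /dev/md0
# if the filesystem is still ext2, adding a journal converts it to ext3
tune2fs -j /dev/md0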
(Replying to JimBass:) What you're describing is just RAID5 with a hot spare. You can achieve this with software RAID5 under linux by defining one or more hot spares; if a drive fails in the RAID5 set, the hot spare is automatically brought into the array and the array is rebuilt onto it. RAID6 is actually RAID5 with two parity blocks, rather than one.
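Defining a spare at creation time, or adding one to a live array, is a one-liner in mdadm - a sketch with assumed device names:

Code:
# 3-member RAID5 plus one hot spare (the 4th device becomes the spare)
mdadm --create /dev/md0 --level=5 --raid-devices=3 --spare-devices=1 \
      /dev/sda1 /dev/sdb1 /dev/sdc1 /dev/sdd1
# or add a spare to an existing, healthy array
mdadm /dev/md0 --add /dev/sde1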
And on the second point: RAID5, or any level of RAID, is not a replacement for a good back-up strategy. Hardware controllers don't give you any additional resilience or safety for a given raid level unless they utilise battery-backed cache or similar. There is nothing intrinsically wrong with software RAID5; in many cases it can be more flexible and resilient than a hardware controller, and in almost all instances it's a better and safer bet than a hardware RAID5 controller without battery backup.

Cheers,
John
I trust software RAID more than hardware RAID
Echoing the previous comments, hardware RAID controllers don't offer anything that software RAID doesn't have. They just move it to a controller instead of letting the OS handle it. In practical terms, this means you've got an extra piece of hardware that can mess up, along with some special drivers that aren't exactly mass market items.
Software RAID, on the other hand, needs no extra hardware, and the software RAID driver doesn't have to handle as much as a hardware RAID controller does. Some hardware RAID controllers do have a battery backup that lets them preserve unwritten data, but that is not a substitute for a decent UPS and a proper shutdown during a power outage.
Similar problem... please help!
Hi there:
I have a similar problem: a RAID 5 array with 11 drives, where one drive (sdh) encountered problems. I tried to rebuild, but sdb failed midway and my array was degraded. I followed some advice to recover the data using

Code:
mdadm -C /dev/md0 /dev/sd[efghiabcdkj]1

both from the command line and from Webmin, but the original drive order (sde[0], sdf[2], sdg[3] ... sdk[9], sdj[10]) was messed up and the array was reordered sda[0] ... sdk[10]. I tried mounting and received a "VFS: ext3 file system not found" error. I've tried for a week now to recover the data (which consists of personal data I saved over the last 20 years and work data I have spent the last 2 years on), but to no avail. Any help is greatly appreciated. Thanks in advance.
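For what it's worth, before any re-create the original slot order can usually be read back from the member superblocks, so the --create device list can be put back in the right sequence. A sketch, assuming 0.90-format metadata (the "this" line in the --examine output shows each member's own slot):

Code:
# print each member's idea of its own slot in the array
for d in /dev/sd[a-k]1; do
    echo "== $d"
    mdadm --examine "$d" | grep this
done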