LinuxQuestions.org
Old 10-11-2009, 03:14 PM   #1
HellesAngel
Member
 
Registered: Jun 2007
Posts: 84

Rep: Reputation: 15
Failed RAID5 disk array, questions about mdadm and recovery


My server runs openSUSE 11.1, and I built a RAID5 array using YaST and three identical Samsung 1TB disks, mounted as /dev/sdb1, /dev/sdc1, and /dev/sdd1. Everything ran fine for a couple of months; then suddenly, for no apparent reason, the computer failed to start. It boots from another disk, /dev/sda1, but tries to mount the RAID array /dev/md0, and as this fails the boot stops at a rescue prompt. Not good. I removed /dev/md0 from fstab and the PC now at least boots to the GUI, so repairing might be easier.

After a bit of fiddling about and trying solutions mentioned in other threads, it seems my problem is slightly different, hence the new thread. What seems slightly contradictory is that all three of my disks appear clean:

Quote:
Server:/ # mdadm -E /dev/sdb1
/dev/sdb1:
Magic : a92b4efc
Version : 1.0
Feature Map : 0x1
Array UUID : 9c3b5bc4:67da9f15:e071825b:9c2c48ab
Name : poodle:0
Creation Time : Tue Jun 16 22:13:17 2009
Raid Level : raid5
Raid Devices : 3

Avail Dev Size : 1953503728 (931.50 GiB 1000.19 GB)
Array Size : 3907006976 (1863.01 GiB 2000.39 GB)
Used Dev Size : 1953503488 (931.50 GiB 1000.19 GB)
Super Offset : 1953503984 sectors
State : clean
Device UUID : 295984b5:7e86e77b:6300abfc:1ca278ce

Internal Bitmap : -234 sectors from superblock
Update Time : Tue Oct 6 21:43:02 2009
Checksum : d7f6af2 - correct
Events : 9802

Layout : left-asymmetric
Chunk Size : 128K

Array Slot : 0 (0, failed, failed, 2)
Array State : U_u 2 failed

Server:/ # mdadm -E /dev/sdc1
/dev/sdc1:
Magic : a92b4efc
Version : 1.0
Feature Map : 0x1
Array UUID : 9c3b5bc4:67da9f15:e071825b:9c2c48ab
Name : poodle:0
Creation Time : Tue Jun 16 22:13:17 2009
Raid Level : raid5
Raid Devices : 3

Avail Dev Size : 1953503728 (931.50 GiB 1000.19 GB)
Array Size : 3907006976 (1863.01 GiB 2000.39 GB)
Used Dev Size : 1953503488 (931.50 GiB 1000.19 GB)
Super Offset : 1953503984 sectors
State : clean
Device UUID : a52f3171:9ecd0ce0:d297e7d9:93cf72af

Internal Bitmap : -234 sectors from superblock
Update Time : Tue Sep 29 21:49:09 2009
Checksum : eb820fb8 - correct
Events : 8608

Layout : left-asymmetric
Chunk Size : 128K

Array Slot : 1 (0, 1, failed, 2)
Array State : uUu 1 failed
Server:/ # mdadm -E /dev/sdd1
/dev/sdd1:
Magic : a92b4efc
Version : 1.0
Feature Map : 0x1
Array UUID : 9c3b5bc4:67da9f15:e071825b:9c2c48ab
Name : poodle:0
Creation Time : Tue Jun 16 22:13:17 2009
Raid Level : raid5
Raid Devices : 3

Avail Dev Size : 1953503728 (931.50 GiB 1000.19 GB)
Array Size : 3907006976 (1863.01 GiB 2000.39 GB)
Used Dev Size : 1953503488 (931.50 GiB 1000.19 GB)
Super Offset : 1953503984 sectors
State : clean
Device UUID : 399e57c7:cf4d16dc:91a54f6b:2830c350

Internal Bitmap : -234 sectors from superblock
Update Time : Tue Oct 6 22:58:34 2009
Checksum : 7071bc41 - correct
Events : 9802

Layout : left-asymmetric
Chunk Size : 128K

Array Slot : 3 (failed, failed, failed, 2)
Array State : __U 3 failed
This also makes the problem sections rather obvious: the Array Slot and Array State lines. I checked the mdadm docs, but the explanations I found don't quite match, and in any case they didn't offer any solutions.

My attempts at recovery so far have been to try various combinations of
Quote:
mdadm --assemble --force /dev/md0 /dev/sdb1 /dev/sdc1 /dev/sdd1
but these usually result in output like this
Quote:
Server:/ # mdadm --assemble --force -vv /dev/md0 /dev/sdb1 /dev/sdc1 /dev/sdd1
mdadm: looking for devices for /dev/md0
mdadm: /dev/sdb1 is identified as a member of /dev/md/0, slot 0.
mdadm: /dev/sdc1 is identified as a member of /dev/md/0, slot 1.
mdadm: /dev/sdd1 is identified as a member of /dev/md/0, slot 2.
mdadm: added /dev/sdc1 to /dev/md/0 as 1
mdadm: added /dev/sdd1 to /dev/md/0 as 2
mdadm: added /dev/sdb1 to /dev/md/0 as 0
mdadm: failed to RUN_ARRAY /dev/md/0: Input/output error
Tomorrow I will buy a couple of new disks and add these to the array to see if that helps the repair/recovery process, but if anyone has suggestions I would be eternally grateful.
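For reference, the most telling detail in the output above seems to be the Events counter: sdb1 and sdd1 are both at 9802, while sdc1 is stuck at 8608 with an Update Time a week older, so sdc1 appears to have dropped out of the array earlier. A quick way to compare the members side by side (just a sketch, using the device names from above):

```shell
# Compare event counters and update times across all array members;
# a member with a lower Events count has fallen behind the array.
for d in /dev/sdb1 /dev/sdc1 /dev/sdd1; do
    echo "== $d =="
    mdadm -E "$d" | grep -E 'Events|Update Time|Array State'
done
```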
 
Old 10-12-2009, 04:25 PM   #2
HellesAngel
Member
 
Registered: Jun 2007
Posts: 84

Original Poster
Rep: Reputation: 15
Still stumbling around in the dark. It seems that /dev/sdc has failed, so I got another identical drive and replaced the faulty one. The drive is new; I used fdisk to create one partition of type 0xFD and then formatted it with mkfs -t ext3 /dev/sdc1.

The disks are now:
Quote:
# fdisk -l

Disk /dev/sda: 300.0 GB, 300090728448 bytes
255 heads, 63 sectors/track, 36483 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Disk identifier: 0xb19dd427

Device Boot Start End Blocks Id System
/dev/sda1 * 1 35728 286985128+ 83 Linux
/dev/sda2 35729 36483 6064537+ 5 Extended
/dev/sda5 35729 36483 6064506 82 Linux swap / Solaris

Disk /dev/sdb: 1000.2 GB, 1000204886016 bytes
255 heads, 63 sectors/track, 121601 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Disk identifier: 0x000e4a57

Device Boot Start End Blocks Id System
/dev/sdb1 2 121601 976752000 fd Linux raid autodetect

Disk /dev/sdc: 1000.2 GB, 1000204886016 bytes
255 heads, 63 sectors/track, 121601 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Disk identifier: 0x259714c5

Device Boot Start End Blocks Id System
/dev/sdc1 1 121601 976760001 fd Linux raid autodetect

Disk /dev/sdd: 1000.2 GB, 1000204886016 bytes
255 heads, 63 sectors/track, 121601 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Disk identifier: 0x000e1fb0

Device Boot Start End Blocks Id System
/dev/sdd1 2 121601 976752000 fd Linux raid autodetect
As far as I know this should be enough, but I still can't get the new /dev/sdc1 added to the array. Perhaps this is because the old disk was also /dev/sdc1 and mdadm has it labelled as bad/faulty somewhere, but how do I clear that?

Quote:
# mdadm /dev/md0 -a /dev/sdc1
mdadm: add new device failed for /dev/sdc1 as 4: Invalid argument
is as far as I get. Any offers of help? Meanwhile I've read a fair bit more but still am not making progress. The array's content is largely backed up but, as usual, the restoration process is not without a fair bit of pain, and it would be very nice if the array would spring back to life. Thanks in advance...
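With hindsight, a few things about my replacement procedure look suspect: the mkfs was unnecessary (mdadm overwrites the partition contents anyway), the new partition's geometry differs from the surviving members (it starts at cylinder 1 and is 976760001 blocks rather than 976752000), and the array was not actually running when the add was attempted. A sketch of what I believe the usual replacement steps look like (untested here, device names as above):

```shell
# Clone the partition layout from a surviving member so sizes match exactly
sfdisk -d /dev/sdb | sfdisk /dev/sdc

# Make sure no stale RAID superblock is left on the new partition
mdadm --zero-superblock /dev/sdc1

# The array must be assembled (even degraded) before a member can be added
mdadm --assemble --run /dev/md0 /dev/sdb1 /dev/sdd1
mdadm --manage /dev/md0 --add /dev/sdc1
```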
 
Old 10-13-2009, 02:39 PM   #3
HellesAngel
Member
 
Registered: Jun 2007
Posts: 84

Original Poster
Rep: Reputation: 15
OK, not generating a huge number of suggestions here...

It seems that I might be able to issue the create command again, i.e.
Quote:
mdadm --create /dev/md0 -v --level=5 --raid-disks=3 /dev/sdb1 /dev/sdc1 /dev/sdd1
and this may re-create the array for me without overwriting the disks' contents (Reference). The question now is: is this better with the [possibly] failed /dev/sdc1, or with the new clean /dev/sdc1 that I bought and formatted yesterday?
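One caveat I've seen mentioned about this trick: for a re-create to leave the data readable, the new array must be built with exactly the same parameters as the original. The -E output above reports metadata version 1.0, left-asymmetric layout, and 128K chunks, whereas mdadm's defaults differ, so presumably something like this would be needed (a sketch only; --assume-clean should stop mdadm from resyncing over the data):

```shell
# Re-create using the ORIGINAL parameters reported by `mdadm -E`; any
# mismatch (metadata version, layout, chunk size, device order) will
# leave the filesystem on top unreadable.
mdadm --create /dev/md0 --assume-clean \
      --metadata=1.0 --level=5 --raid-devices=3 \
      --layout=left-asymmetric --chunk=128 \
      /dev/sdb1 /dev/sdc1 /dev/sdd1
```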

In any case it seems my efforts to add the new clean /dev/sdc1 have not been successful.
Quote:
# cat /proc/mdstat
Personalities : [raid6] [raid5] [raid4]
md0 : inactive sdb1[0] sdd1[3]
1953503488 blocks super 1.0

unused devices: <none>
What is going on?
 
Old 10-13-2009, 03:02 PM   #4
HellesAngel
Member
 
Registered: Jun 2007
Posts: 84

Original Poster
Rep: Reputation: 15
A leap in the dark

As all other avenues seem totally dark, I tried the create command; now I have:
Quote:
# mdadm --detail /dev/md0
/dev/md0:
Version : 0.90
Creation Time : Tue Oct 13 21:51:52 2009
Raid Level : raid5
Array Size : 1953503872 (1863.01 GiB 2000.39 GB)
Used Dev Size : 976751936 (931.50 GiB 1000.19 GB)
Raid Devices : 3
Total Devices : 3
Preferred Minor : 0
Persistence : Superblock is persistent

Update Time : Tue Oct 13 21:55:17 2009
State : clean, degraded, recovering
Active Devices : 2
Working Devices : 3
Failed Devices : 0
Spare Devices : 1

Layout : left-symmetric
Chunk Size : 64K

Rebuild Status : 4% complete

UUID : 2f771e5c:ed85feca:39b3243c:51ef9e42 (local to host PoodleServer)
Events : 0.8

Number Major Minor RaidDevice State
0 8 17 0 active sync /dev/sdb1
1 8 33 1 active sync /dev/sdc1
3 8 49 2 spare rebuilding /dev/sdd1
I don't quite follow the device numbering, but at least the 'rebuilding' bit seems to be progress, and three disks are involved, incidentally with the new clean disk as /dev/sdc1. If any data is recovered from this, it will just be a bonus.

Last edited by HellesAngel; 10-14-2009 at 10:16 AM. Reason: Added glorious colour and some details.
 
Old 10-13-2009, 03:37 PM   #5
HellesAngel
Member
 
Registered: Jun 2007
Posts: 84

Original Poster
Rep: Reputation: 15
And while the array rebuilds, here are some other useful resources:
Google Books: Managing RAID on Linux
Heroic Journey to RAID5 Data Recovery
Another SUSE 'figured it out by myself' story
RAID5 Data Recovery on this very site, but not the same problem I had.

Last edited by HellesAngel; 10-13-2009 at 03:40 PM.
 
Old 10-14-2009, 12:31 AM   #6
HellesAngel
Member
 
Registered: Jun 2007
Posts: 84

Original Poster
Rep: Reputation: 15
So, the recovery seemed to go well last night, and now the assemble looks good too:
Quote:
# mdadm --assemble --verbose /dev/md0 /dev/sdb1 /dev/sdc1 /dev/sdd1
mdadm: looking for devices for /dev/md0
mdadm: /dev/sdb1 is identified as a member of /dev/md/0, slot 0.
mdadm: /dev/sdc1 is identified as a member of /dev/md/0, slot 1.
mdadm: /dev/sdd1 is identified as a member of /dev/md/0, slot 2.
mdadm: added /dev/sdc1 to /dev/md/0 as 1
mdadm: added /dev/sdd1 to /dev/md/0 as 2
mdadm: added /dev/sdb1 to /dev/md/0 as 0
mdadm: /dev/md/0 has been started with 3 drives.
However /dev/md0 still won't mount:
Quote:
# mount /dev/md0 /mnt/raid
mount: wrong fs type, bad option, bad superblock on /dev/md0,
missing codepage or helper program, or other error
In some cases useful info is found in syslog - try
dmesg | tail or so
# dmesg
8< ---- snip ---- >8
md: md0 stopped.
md: bind<sdc1>
md: bind<sdd1>
md: bind<sdb1>
xor: automatically using best checksumming function: generic_sse
generic_sse: 8796.000 MB/sec
xor: using function: generic_sse (8796.000 MB/sec)
async_tx: api initialized (async)
raid6: int64x1 2321 MB/s
raid6: int64x2 3017 MB/s
raid6: int64x4 2495 MB/s
raid6: int64x8 2119 MB/s
raid6: sse2x1 4097 MB/s
raid6: sse2x2 4605 MB/s
raid6: sse2x4 7268 MB/s
raid6: using algorithm sse2x4 (7268 MB/s)
md: raid6 personality registered for level 6
md: raid5 personality registered for level 5
md: raid4 personality registered for level 4
raid5: device sdb1 operational as raid disk 0
raid5: device sdd1 operational as raid disk 2
raid5: device sdc1 operational as raid disk 1
raid5: allocated 3234kB for md0
raid5: raid level 5 set md0 active with 3 out of 3 devices, algorithm 2
RAID5 conf printout:
--- rd:3 wd:3
disk 0, o:1, dev:sdb1
disk 1, o:1, dev:sdc1
disk 2, o:1, dev:sdd1
EXT3-fs error (device md0): ext3_check_descriptors: Block bitmap for group 1920 not in group (block 0)!
EXT3-fs: group descriptors corrupted!
EXT3-fs error (device md0): ext3_check_descriptors: Block bitmap for group 1920 not in group (block 0)!
EXT3-fs: group descriptors corrupted!
So, it seems there's still some file system checking to be done. No time now; I'll have a look tonight.
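Before letting a full e2fsck loose, one thing I've read is worth trying when the primary group descriptors are corrupted is a backup superblock (a sketch; the actual backup block numbers depend on the filesystem's block size, so they should be listed first):

```shell
# Dry run (-n): mke2fs only PRINTS where superblock backups would live,
# it writes nothing to the device.
mke2fs -n /dev/md0

# Then point fsck at one of the reported backup superblocks, e.g. 32768
# (a typical location for 4K-block filesystems).
e2fsck -b 32768 /dev/md0
```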
 
Old 10-14-2009, 03:56 AM   #7
HellesAngel
Member
 
Registered: Jun 2007
Posts: 84

Original Poster
Rep: Reputation: 15
So, Google reveals:
Ubuntu Recovering mdadm superblocks, a similar story.
A similar story that ended in disaster (wiped array), and one that ended happily.

I'm a little out of my depth here; I would appreciate words of wisdom.

Any advice on how to proceed?

Last edited by HellesAngel; 10-14-2009 at 04:04 AM.
 
Old 10-14-2009, 01:45 PM   #8
HellesAngel
Member
 
Registered: Jun 2007
Posts: 84

Original Poster
Rep: Reputation: 15
So, I tried swapping the order of the disks in the mdadm --create command (Reference), to no avail, and am now waiting for e2fsck -y /dev/md0 to do its stuff; it's doing a lot. The array is probably trousered and the data is gone, and what's even worse than having to recreate it all from the backups is that I don't know how I lost the entire array when only one disk failed in a three-disk RAID5 array.

What went wrong? Could it be that I swapped the cables around to see if there was a bad connection and something got confused? I don't know, but I do know a bit more about mdadm now; small consolation.

Edit: Yes, the array was destroyed, all data lost. No idea what happened, but backups are important, aren't they?

Last edited by HellesAngel; 10-14-2009 at 02:05 PM.
 
Old 10-15-2009, 12:57 AM   #9
chrism01
Guru
 
Registered: Aug 2004
Location: Sydney
Distribution: Centos 6.5, Centos 5.10
Posts: 16,269

Rep: Reputation: 2028
Quote:
I don't know how I lost the entire array when only one disk failed from a three disk RAID 5 array.
Three disks is the minimum number for a RAID 5 array; if one breaks you are SOL (as you discovered)
http://en.wikipedia.org/wiki/RAID
 
Old 10-15-2009, 10:20 AM   #10
HellesAngel
Member
 
Registered: Jun 2007
Posts: 84

Original Poster
Rep: Reputation: 15
Thanks for the reply - I was under the impression that a RAID 5 array with three disks could suffer a single disk failure and still recover fully. I read the docs carefully, I thought, before building the array, and understood this to be the case; even on re-reading the Wikipedia article it's still the way I understand it. Indeed, following the link to the Wikipedia RAID 5 article leads to this text:
Quote:
A minimum of three disks is required for a complete RAID 5 configuration. In some implementations a degraded RAID 5 disk set can be made (three disk set of which only two are online), while mdadm supports a fully-functional (non-degraded) RAID 5 setup with two disks
I understand this to mean that a RAID5 array built from three disks may or may not stay online after a single disk failure [this isn't important to me], but recovery is possible once the failed disk is replaced.

So, now that I'm rebuilding the array, what is the best configuration to use? I have a maximum of five 1TB disks available, need a capacity of at least 2TB, need security against a single disk failure, and downtime while the array rebuilds is not a big problem. Must I go for four active disks in the array plus one spare to have any data security, or is three active plus one spare enough?
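For what it's worth, the usable-capacity arithmetic with 1 TB members is simple: RAID5 gives (n-1) disks' worth of space and RAID6 gives (n-2), counting only active members, since hot spares add redundancy but no capacity:

```shell
# Usable capacity with 1 TB members: RAID5 loses one disk's worth to
# parity, RAID6 loses two. Hot spares contribute nothing to capacity.
active=4
raid5_tb=$(( active - 1 ))   # 4 active in RAID5 -> 3 TB usable
raid6_tb=$(( active - 2 ))   # 4 active in RAID6 -> 2 TB usable
echo "RAID5: ${raid5_tb} TB, RAID6: ${raid6_tb} TB"
```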
 
Old 10-15-2009, 10:52 PM   #11
chrism01
Guru
 
Registered: Aug 2004
Location: Sydney
Distribution: Centos 6.5, Centos 5.10
Posts: 16,269

Rep: Reputation: 2028
Well, from the same page, I agree you should have been able to recover from a single disk failure, if that's all it was.
RAID6 (min 4 disks) allows for up to 2 disk failures. As it says, with bigger disks these days it's a good idea, because a disk rebuild takes a long time, but you can still use the rest of the array at the same time.
You could just put all 5 disks in the array, e.g. 5 active (given the previous sentence) or 4+1.
With RAID 6 I'd say you might as well use all 5 actively: lots more disk space, and still up to 2 disk failures allowed.
 
Old 10-16-2009, 06:52 AM   #12
HellesAngel
Member
 
Registered: Jun 2007
Posts: 84

Original Poster
Rep: Reputation: 15
Thanks for the advice. I use openSUSE and its YaST config tool only offers RAID5 as a choice but, after this forced education in mdadm, I feel confident about configuring a RAID6 manually - I'll set it up with four active disks and one spare.

And then get a 3TB ZyXEL NSA-220 for a backup server...
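For the record, the manual creation I have in mind - RAID6 with four active disks plus one hot spare - would look something like this (a sketch; the device names are assumptions):

```shell
# RAID6 over four active members with one hot spare: survives two
# simultaneous member failures, usable capacity = (4-2) x 1 TB = 2 TB.
mdadm --create /dev/md0 --level=6 \
      --raid-devices=4 --spare-devices=1 \
      /dev/sdb1 /dev/sdc1 /dev/sdd1 /dev/sde1 /dev/sdf1

# Record the array so it assembles at boot
mdadm --detail --scan >> /etc/mdadm.conf
```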
 
Old 11-06-2009, 01:06 PM   #13
miguel.arce
LQ Newbie
 
Registered: Nov 2009
Posts: 1

Rep: Reputation: 0
Never on a live system

Hi, I am happy to collaborate with you... it will be interesting to learn how to recover from a disaster such as this, and to learn ways to still retrieve some files.

As a suggestion, NEVER issue any commands that you are not familiar with, especially when it comes to stored information. I also have a RAID5 array, with four 500 GB drives.

To start clarifying things, the minimum number of disks required by a RAID5 array is 4, not 3.

The whole thing about the array is how the system saves things to it. As an example, suppose you have a 3 MB file on a RAID5 device: the file is divided into three 1 MB parts, and the first MB goes to device 1, the second to device 2, the third to device 3. Now, if any of those three pieces is missing, the file is lost; nothing more to do.

What makes a RAID5 array able to survive a disk failure is another disk, the 4th, which stores special information called parity. The concept is pretty much this: any one missing piece can be calculated if you have the other two pieces and the parity, and the parity can be calculated if you have all 3 pieces.
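That parity calculation is plain XOR, which a quick shell experiment can illustrate (a toy example with single bytes, not the actual chunk-level layout):

```shell
# Toy XOR-parity demo: two "data" bytes and their parity byte.
d0=$(( 0x41 ))
d1=$(( 0x73 ))
p=$(( d0 ^ d1 ))            # parity, as stored on the parity device

# If the device holding d1 dies, its byte is recomputed from the survivors:
recovered=$(( p ^ d0 ))
echo "lost byte was $d1, recovered $recovered"
```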

When a RAID5 device is left with 3 hard drives you are at the edge of the abyss: one more disk cracks and you are left with only 2 pieces of each file. To put it simply, the RAID5 can still operate with 3 healthy disks, but it's called a "degraded array".

I am not sure what you did to the array, or whether it's recoverable, but let me do some reading about what you did.

One recommendation: download and install some virtual machine software, create a virtual machine, create a RAID5 inside it, install the exact same software that you had, and let's do the tests on that RAID; only once you are sure of the result of an action, do it on the real array.

From the little I read, you had an array that was able to assemble, then you tried to install a new disk, so you tried to assemble a RAID5 array with two original members and one blank disk, and that triggered the recovery. I don't know for sure, but I have the feeling that the only disk that got written to was the new disk; the other two I hope are intact, and the original third disk is still available.

If we are in luck, the pieces of your files are still on three separate good disks, so some files should be recoverable. The trick will be making mdadm accept these disks as part of an array again and forcing it to start them; once the array is started, the /dev/mdX device becomes available and you could use testdisk, etc., to recover the partition information and repair it. And maybe then we will see some intact files.

Regards, I'll be checking this thread once a week until Christmas.

Last edited by miguel.arce; 11-07-2009 at 10:16 AM.
 
Old 04-08-2012, 05:30 AM   #14
ssuede
LQ Newbie
 
Registered: Apr 2012
Posts: 1

Rep: Reputation: Disabled
Sorry miguel.arce, but you are wrong.

RAID 5 requires a minimum of 3 (THREE) drives and will tolerate the loss of 1 drive. In the case of a drive failure in an array containing 3 drives, the array can still operate without loss of data (albeit in a degraded state) from the remaining 2 drives.

Additionally, RAID 5 distributes parity information across all disks of the array - not solely on a dedicated disk as you suggest. In the event of a failure of one disk, RAID 5 can reconstruct the missing parity information contained on the failed drive from that of the remaining drives, and using that, can also reconstruct the data that was stored on the failed drive.

Even though this thread is long dead, it remains one of the top Google hits for "mdadm raid 5 fail", and so I feel I must correct this error for the benefit of others (I'm surprised it has gone uncontested for this long).
 
  

