Old 07-03-2009, 11:36 AM   #1
shachar
LQ Newbie
 
Registered: Jul 2009
Posts: 2

Rep: Reputation: 0
md device failure (help!)


Hi all,

my md array just crapped out on me. I'm partly responsible, since one of the devices in the RAID5 array died some time ago and I neglected to replace it, but I don't think that's the whole problem now.

When I assemble the array I get the following:
root@server:~# mdadm --assemble --verbose /dev/md0 /dev/sdb1 /dev/sdc1 /dev/sdd1
mdadm: looking for devices for /dev/md0
mdadm: /dev/sdb1 is identified as a member of /dev/md0, slot 1.
mdadm: /dev/sdc1 is identified as a member of /dev/md0, slot 2.
mdadm: /dev/sdd1 is identified as a member of /dev/md0, slot 3.
mdadm: no uptodate device for slot 0 of /dev/md0
mdadm: added /dev/sdb1 to /dev/md0 as 1
mdadm: added /dev/sdd1 to /dev/md0 as 3
mdadm: added /dev/sdc1 to /dev/md0 as 2
mdadm: /dev/md0 assembled from 2 drives - not enough to start the array.
mdadm: /dev/md0 assembled from 2 drives - not enough to start the array

(slot 0 is the long-dead drive)
The output of "mdadm --examine" for 2 of the drives (sdc & sdd) is similar and looks like this:
...
State: Clean
Active Devices: 2
Working Devices: 2
Failed Devices: 1
Events: 1923796
...

while the output for sdb looks different:
...
State: active
Active Devices: 3
Working Devices: 3
Failed Devices: 0
Events: 1923787
...

Note the difference in the Events counter and the state. My guess is that the drive is out of sync with the rest.
I tried "mdadm --assemble --force --update=summaries" to bring the stray Events counter up to date per a recommendation I saw in a forum, but this command segfaults.
I tried strace-ing it and it faults right after reading 4K of data from /dev/sdb1.
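From what I've read, a plain forced assemble (without the --update=summaries flag) is supposed to make mdadm accept the member with the lagging Events counter and bump it itself, so that may be worth a try; this is only what I'm considering, assuming the superblock on /dev/sdb1 is still readable:

# stop any half-assembled array first
mdadm --stop /dev/md0

# --force lets mdadm use /dev/sdb1 even though its Events counter is slightly behind
mdadm --assemble --force --verbose /dev/md0 /dev/sdb1 /dev/sdc1 /dev/sdd1

# if it starts, check the state before mounting anything
cat /proc/mdstat
mdadm --detail /dev/md0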

To summarize: I'm not sure what to do next. I've read in forums that I should try to re-create the array, but I fear it will completely destroy the data (I'm not sure what creating an array from previously-arrayed disks actually does).

Any help will be appreciated, really!

Thanks,

-- Shachar
 
Old 07-03-2009, 05:45 PM   #2
eco
Member
 
Registered: May 2006
Location: BE
Distribution: Debian/Gentoo
Posts: 412

Rep: Reputation: 48
Well, for a start, if you have the space, dd each disk to make sure you have a backup. That way, just in case something does go wrong, you can always get back to the current point in time.
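Something like this would do; the target path is just an example (any disk or mount point with enough space), and conv=noerror,sync makes dd carry on over any unreadable sectors instead of aborting:

dd if=/dev/sdb1 of=/mnt/backup/sdb1.img bs=1M conv=noerror,sync
dd if=/dev/sdc1 of=/mnt/backup/sdc1.img bs=1M conv=noerror,sync
dd if=/dev/sdd1 of=/mnt/backup/sdd1.img bs=1M conv=noerror,sync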

Did you put a new disk in the RAID and try to rebuild it, or are you still trying all of this with the failed disk?

Can you not see the contents of your RAID? A RAID5 array should still work when only one disk has failed.
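A quick, non-destructive way to see what md thinks the state is:

cat /proc/mdstat
mdadm --detail /dev/md0
mdadm --examine /dev/sd[bcd]1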
 
Old 07-04-2009, 03:08 AM   #3
shachar
LQ Newbie
 
Registered: Jul 2009
Posts: 2

Original Poster
Rep: Reputation: 0
I am planning to go and buy a big disk and dd all the block devices onto it before making any changes.

But - as I said, this is not the first disk failure. I had a previous failure and didn't replace it.

I cannot see the contents of the RAID array since it won't start with 2 disks (out of 4). However, I'm not sure this is really a disk failure. From what I can tell, it somehow managed to keep writing to 2 of the 3 disks while one disk was left behind and marked faulty, even though I don't see any read/write errors on that disk.

My question is: what can be done to "mark" this disk as fine, with the same Events count, so I can start the array, even at the cost of minor data loss?

Also - I found a post somewhere that says that "mdadm --build /dev/md0 --chunk-size=64 --raid-level=5 --devices /dev/sdb1 /dev/sdc1 /dev/sdd1 missing" worked for him when he tried to recover from a similar (but not identical) condition. Does "build" destroy data, or does it just reset the md superblock metadata? Will my logical volumes survive this?
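From what I can gather (and I'd appreciate confirmation), what that post probably meant is --create with --assume-clean rather than --build, since --build is only for arrays without superblocks. If I understand it right, it rewrites just the md superblocks and skips the resync, so the data should survive as long as the level, chunk size, metadata version and device order exactly match the original (the chunk size of 64 here comes from that post, not from my --examine output). Something like this, which I would only try after the dd backups:

mdadm --create --assume-clean --verbose /dev/md0 \
      --level=5 --raid-devices=4 --chunk=64 \
      missing /dev/sdb1 /dev/sdc1 /dev/sdd1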

Thanks
 
Old 07-06-2009, 01:08 AM   #4
eco
Member
 
Registered: May 2006
Location: BE
Distribution: Debian/Gentoo
Posts: 412

Rep: Reputation: 48
Sorry for the delay in answering.

You should back up your disks to another disk using dd. That way, you don't have just one go at getting your data back.

I suggest you read the man pages to make sure you know what each mdadm option does.

Best of luck getting your data back. You should have had backups, and you should have replaced the disk as soon as it failed, or at least had a spare disk that would have started rebuilding the RAID as soon as there was a failure. A good option for you might have been RAID6.
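For the future, something along these lines gives you a hot spare that jumps in automatically on the next failure (the extra device name is just an example), or you can build the array with double parity from the start:

# add a hot spare to a healthy, running array
mdadm /dev/md0 --add /dev/sde1

# or create it as RAID6 in the first place
mdadm --create /dev/md0 --level=6 --raid-devices=4 /dev/sd[b-e]1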
 
  

