LinuxQuestions.org
Linux - Server: This forum is for the discussion of Linux software used in a server-related context.

06-15-2010, 08:04 AM   #1
GregIthaca
LQ Newbie
 
Registered: Jun 2010
Posts: 7

Reputation: 0
RAID5 two disk failure, botched recovery, need help finding filesystem


I'm running a server at work (FC3, kernel 2.6.12) that contains nearly all of the company's documents. I've been at this for going on 22 hours, so apologies if I leave out important details; the whole thing is looking a bit fuzzy right now. It's an mdadm RAID5, with /dev/md0 made from /dev/sda1, /dev/sdb1, and /dev/sdc1. The boot partition is on /dev/hda, so I'm able to bring the machine up and down readily despite the RAID problems. All the drives except /dev/hda are Seagate Barracuda 7200 160GB SATA drives.

Long story short, I noticed yesterday that one of the RAID5 drives (sdb) was offline with errors, swapped it for one of the hot spares we have, and let the array start rebuilding. But sda failed before the rebuild was done. I've gone through a bunch of different permutations trying to get things to work (swapping out the old sdb and the new sdb, switching SATA controllers, etc.).

Somewhere along the way I did something BAD: I probably assembled the array incorrectly and then ran an fsck that showed a LOT of errors. (Damn.)

However, by doing the (A missing B), (C missing B), etc. permutations, I have been able to resurrect a /dev/md0 on which running
Code:
dd if=/dev/md0 count=512 skip=xxxxxx | strings
shows what looks like a lot of valid data. I can identify pieces of text documents, Word files, etc. I'm still holding out a glimmer of hope that this means I'm not royally screwed.
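A sketch of how the `dd` numbers above map to byte offsets, assuming GNU dd: without `bs=`, dd reads in 512-byte blocks, so `skip=N` starts at byte N*512 and `count=512` reads 512*512 = 262144 bytes. `dd_window` is a hypothetical helper, and the demo reads a scratch file rather than `/dev/md0`.

```shell
#!/bin/sh
# Hypothetical helper: byte range covered by
#   dd if=... count=COUNT skip=SKIP
# when no bs= is given, so dd uses its default 512-byte blocks.
dd_window() {
    skip=$1
    count=$2
    start=$(( skip * 512 ))
    end=$(( start + count * 512 - 1 ))
    echo "$start-$end"
}

echo "skip=2048 count=512 reads bytes $(dd_window 2048 512)"

# Demo on a scratch file instead of /dev/md0: put a marker at block 2048
# (byte 1048576), then read that block back the same way as in the post.
tmp=$(mktemp)
dd if=/dev/zero of="$tmp" bs=512 count=4096 2>/dev/null
printf 'MARKER' | dd of="$tmp" bs=512 seek=2048 conv=notrunc 2>/dev/null
# tr strips the zero padding; the post uses strings for the same purpose.
found=$(dd if="$tmp" skip=2048 count=1 2>/dev/null | tr -d '\000')
echo "found: $found"
rm -f "$tmp"
```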

The problem is that even though dumpe2fs works pretty well, e2fsck doesn't seem to be able to find a valid superblock no matter where I tell it to look. I'm trying things like -b 8192000, 8192001, 32768000, 32768001, etc. (I'm not sure why all the docs show the -b argument as an odd number while dumpe2fs shows an even one, so I experimented.) Whatever I do, it just says 'invalid argument' and:
Code:
The superblock could not be read or does not describe a correct ext2
filesystem.  If the device is valid and it really contains an ext2
filesystem (and not swap or ufs or something else), then the superblock
is corrupt, and you might try running e2fsck with an alternate superblock:
etc.
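A sketch of where ext2 backup superblocks usually live, which also explains the odd numbers in the docs. `-b` takes a block number in units of the filesystem's own block size. Assuming the common `sparse_super` feature, group 0 holds the primary superblock and backups sit only in group 1 and in groups that are powers of 3, 5 or 7. With 1 KiB blocks the first data block is 1 and a group is 8192 blocks, so backups start at 8193 (odd); with 4 KiB blocks the first data block is 0 and a group is 32768 blocks, so they start at 32768. `sb_backups` and `is_sparse_group` are hypothetical helpers; the real list for this filesystem is whatever dumpe2fs reports, and `-B 4096` can be passed along with `-b` to pin e2fsck to a 4 KiB block size.

```shell
#!/bin/sh
# Hypothetical helper: list the first N backup superblock block numbers of
# an ext2 filesystem, ASSUMING sparse_super and default blocks-per-group.
#   $1 = block size in bytes (1024, 2048 or 4096)
#   $2 = how many backups to print
sb_backups() {
    bs=$1
    want=$2
    per_group=$(( bs * 8 ))      # default: one bitmap block tracks 8*bs blocks
    if [ "$bs" -eq 1024 ]; then first=1; else first=0; fi
    out=""
    n=0
    g=1                          # group 0 holds the primary superblock
    while [ "$n" -lt "$want" ]; do
        if is_sparse_group "$g"; then
            out="$out $(( first + g * per_group ))"
            n=$(( n + 1 ))
        fi
        g=$(( g + 1 ))
    done
    echo "${out# }"
}

# Hypothetical helper: true if group $1 is 1 or a power of 3, 5 or 7.
is_sparse_group() {
    [ "$1" -eq 1 ] && return 0
    for base in 3 5 7; do
        x=$1
        while [ $(( x % base )) -eq 0 ]; do x=$(( x / base )); done
        [ "$x" -eq 1 ] && return 0
    done
    return 1
}

echo "1 KiB blocks: $(sb_backups 1024 5)"   # 8193 24577 40961 57345 73729
echo "4 KiB blocks: $(sb_backups 4096 5)"   # 32768 98304 163840 229376 294912
```

Under the 4 KiB assumption, the values tried in the post (8192000, 32768000 and their +1 variants) fall in groups 250 and 1000, neither of which holds a backup.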

Can anyone give me suggestions for what I might try to recover/rebuild the filesystem?

Sorry I don't have more output to show; the system is booted single-user right now and I can't get files on or off it. I'm pretty sure no LVM is involved here, because /etc/fstab just lists
Code:
/dev/md0  /server ...
Very scared here... our last full backup seems to have been in January.

Thanks,
Greg
 
06-15-2010, 09:36 AM   #2
never say never
Member
 
Registered: Sep 2009
Location: Indiana, USA
Distribution: SLES, SLED, OpenSuse, CentOS, ubuntu 10.10, OpenBSD, FreeBSD
Posts: 195

Reputation: 37
Ouch!

I am afraid I don't have any real advice for you on recovering the RAID 5 array quickly or easily. From what you have described, and on my initial read-through, I would guess that you are able to find individual stripes of data, but since the superblocks are hosed (likely due to swapping the disks back and forth) and at least two of the drives have failed, I would not hold out hope for a full recovery.
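To illustrate why two failed members are so much worse than one: in a 3-disk RAID5 each stripe holds two data chunks plus a parity chunk that is their XOR, so any single missing chunk can be recomputed from the other two (which is also why an array assembled with one member "missing" can still show real data). With two members gone there is nothing left to XOR against. This is a toy sketch in which each chunk is one 32-bit word and `xor` is a hypothetical helper; real md chunks are much larger, but the arithmetic is the same byte by byte.

```shell
#!/bin/sh
# Toy model of one stripe of a 3-disk RAID5. Each chunk is a single 32-bit
# word here; real md chunks are many KiB, but parity is the same XOR.

# Hypothetical helper: XOR two integers, print as 8-digit hex.
xor() {
    printf '0x%08x\n' $(( $1 ^ $2 ))
}

d0=0x44415441            # data chunk on member A ("DATA" in ASCII)
d1=0x54455354            # data chunk on member B ("TEST" in ASCII)
p=$(xor "$d0" "$d1")     # parity chunk on member C

# One member missing: its chunk is the XOR of the two survivors.
rebuilt=$(xor "$d1" "$p")
echo "parity:  $p"
echo "rebuilt: $rebuilt (original $d0)"

# Two members missing (say A and C): only d1 is left, and d1 alone says
# nothing about d0, so the stripe cannot be reconstructed.
```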

My advice is to build a new server (or at least a new RAID array) and restore from the last backup. Hopefully this will allow the business to function at some level. Then I would image all the drives in the RAID array and attempt to reassemble the array from those images on a separate, offline backup server. As you are able to recover data, you can push it to the live server.

Also, by working from the images you can leave the originals in their current state, so you don't risk destroying any data that may still be on them. Depending on how valuable (how critical) the missing data is, the drives may need to be sent to a data recovery company, and you don't want to risk destroying anything before then.

Do you have any incremental, or differential backups since January?

Good luck! If you find a solution, please let us know.
 
06-15-2010, 09:57 AM   #3
GregIthaca
LQ Newbie
 
Registered: Jun 2010
Posts: 7

Original Poster
Reputation: 0
Well, I'm going to start on something like that now. Here are my planned steps:

1. Before I even shut down the server or turn off the drives (in case they become flakier with power cycles), I'm going to plug in a large external drive, make an ext2 file system on it, and run dd if=/dev/sda of=/newdrive/image_a.bin, and so on for each drive.

2. Pull all the RAID array drives and replace them with two larger drives in a RAID1 configuration.

3. Restore from backups. Things are spotty. We have a full backup from January and a differential from mid-February, but after that the tape drive needed cleaning; everyone just ignored the errors and left the tapes in, and stuff got overwritten. In addition, I have a mirror of some of the data on my own home server (which I can't reach from the office, only the other way around), plus whatever is on people's local hard drives, laptops, etc. I was running a full backup last night when the reconstruction failed; I'm guessing that if I *hadn't* done that, the rebuild would have finished and I wouldn't be where I am now. The array went offline partway through the process.

4. Get a quote from a recovery company.
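A sketch of step 1, with its assumptions stated: GNU dd, the external drive mounted at /newdrive as in the plan, and `conv=noerror,sync` so dd keeps going past unreadable sectors and pads them with zeros instead of aborting, which keeps offsets in the image aligned with the disk. Roughly, each read error costs a whole `bs`-sized block of zeros in the image, so a smaller block size loses less around a bad sector at the cost of speed. GNU ddrescue is a dedicated tool for actively failing drives and may be worth a look. The demo images a scratch file standing in for /dev/sda and checks the copy with cmp.

```shell
#!/bin/sh
# Sketch of imaging one array member before anything else is touched.
# Real use (assumed paths, run as root):
#   dd if=/dev/sda of=/newdrive/image_a.bin bs=4096 conv=noerror,sync
# The demo below uses a scratch file as a stand-in for /dev/sda.

src=$(mktemp)    # pretend disk
img=$(mktemp)    # pretend /newdrive/image_a.bin
dd if=/dev/urandom of="$src" bs=4096 count=256 2>/dev/null   # 1 MiB

# noerror: keep going after read errors; sync: pad failed/short reads with
# zeros so offsets in the image stay aligned with the source.
dd if="$src" of="$img" bs=4096 conv=noerror,sync 2>/dev/null

if cmp -s "$src" "$img"; then result=identical; else result=different; fi
echo "image is $result"
rm -f "$src" "$img"
```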

I'm curious what folks know about the striping. I was looking at 262144-byte (512*512) sections and getting a lot of valid data from each one (though I didn't count how much). md0 was probably set up with the defaults; dumpe2fs lists 32768 as the block size. Is that also likely to be my chunk size?
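On the striping question, a sketch under stated assumptions. ext2's block size (and the 32768 blocks-per-group figure dumpe2fs typically reports for 4 KiB-block filesystems) is a filesystem property, independent of the md chunk size; `mdadm --examine /dev/sda1` on a member that still has its md superblock reports the chunk size and layout. Assuming md's default left-symmetric RAID5 layout and version 0.90 metadata (superblock at the end of each member, so data starts at offset 0), logical chunk c of an n-disk array lands in stripe c/(n-1), the parity for stripe s is on disk (n-1) - (s mod n), and the data chunks follow the parity disk, wrapping around. `raid5_ls_map` and `member_offset` are hypothetical helpers, and 64 KiB below is only an example chunk size.

```shell
#!/bin/sh
# Hypothetical helper, ASSUMING md's left-symmetric RAID5 layout:
# where does logical data chunk $1 of an $2-disk array live?
# Prints "<data disk> <parity disk> <stripe>". Disks are numbered by their
# slot in the array (as mdadm --examine reports), not by sda/sdb/sdc.
raid5_ls_map() {
    c=$1
    n=$2
    stripe=$(( c / (n - 1) ))            # n-1 data chunks per stripe
    idx=$(( c % (n - 1) ))               # position within the stripe
    parity=$(( (n - 1) - stripe % n ))   # parity steps back one disk per stripe
    disk=$(( (parity + 1 + idx) % n ))   # data starts right after parity
    echo "$disk $parity $stripe"
}

# Hypothetical helper: byte offset of stripe $1 on a member partition,
# ASSUMING the data area starts at offset 0 (as with 0.90 metadata)
# and a chunk size of $2 KiB.
member_offset() {
    echo $(( $1 * $2 * 1024 ))
}

# Walk the first six chunks of a 3-disk array.
i=0
while [ "$i" -lt 6 ]; do
    echo "chunk $i -> disk/parity/stripe $(raid5_ls_map "$i" 3)"
    i=$(( i + 1 ))
done
echo "stripe 2 with 64 KiB chunks starts at byte $(member_offset 2 64)"
```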

Still curious why dumpe2fs can find valid superblocks and e2fsck can't, if anyone knows.

Greg
 
  



Tags: disk, e2fsck, failure, raid5, reconstruction, superblock, two






All times are GMT -5.
