LinuxQuestions.org
Linux - Server This forum is for the discussion of Linux Software used in a server related context.
03-02-2012, 11:32 AM #1
m_dev34
LQ Newbie
 
Registered: Mar 2012
Posts: 11

Rep: Reputation: Disabled
Really need help recovering corrupted software RAID filesystem


Hi all,

I really could use some help with this one!!

I'm having a nightmare trying to repair a broken software RAID5 array. One of its 7 disks died a few weeks ago; I replaced it today and started the resync. All was fine until mdadm found a bad sector on the new disk and threw it out. I tried to remove it and then add it again with mdadm --manage --add, but the system hung and I was forced to reboot. In the process the array was completely killed (it showed 'inactive' in /proc/mdstat) and I couldn't start it at all, even after removing the new disk to try to get back to the old state. In the end the only solution I could find was to recreate the array using:

mdadm --create /dev/md0 --assume-clean --level=5 --verbose --raid-devices=7 /dev/sdb1 /dev/sdc1 /dev/sdd1 /dev/sde1 /dev/sdf1 /dev/sdg1 missing

mdadm: layout defaults to left-symmetric
mdadm: chunk size defaults to 512K
mdadm: layout defaults to left-symmetric
mdadm: layout defaults to left-symmetric
mdadm: /dev/sdb1 appears to contain an ext2fs file system
size=-1163822592K mtime=Fri Mar 2 10:35:48 2012
mdadm: /dev/sdb1 appears to be part of a raid array:
level=raid5 devices=7 ctime=Fri Mar 2 17:30:55 2012
mdadm: layout defaults to left-symmetric
mdadm: /dev/sdc1 appears to be part of a raid array:
level=raid5 devices=7 ctime=Fri Mar 2 17:30:55 2012
mdadm: layout defaults to left-symmetric
mdadm: /dev/sdd1 appears to be part of a raid array:
level=raid5 devices=7 ctime=Fri Mar 2 17:30:55 2012
mdadm: layout defaults to left-symmetric
mdadm: /dev/sde1 appears to be part of a raid array:
level=raid5 devices=7 ctime=Fri Mar 2 17:30:55 2012
mdadm: layout defaults to left-symmetric
mdadm: /dev/sdf1 appears to be part of a raid array:
level=raid5 devices=7 ctime=Fri Mar 2 17:30:55 2012
mdadm: layout defaults to left-symmetric
mdadm: /dev/sdg1 appears to be part of a raid array:
level=raid5 devices=7 ctime=Fri Mar 2 17:30:55 2012
mdadm: size set to 1953511936K
Continue creating array? y
mdadm: Defaulting to version 1.2 metadata
mdadm: array /dev/md0 started.

After that the RAID array came back, degraded as before, but I can't mount it. The 'ext2fs' that mdadm detected is wrong; the filesystem on the array was ext4.

fsck gives me a superblock error:

fsck.ext4 /dev/md0

e2fsck 1.41.14 (22-Dec-2010)
fsck.ext4: Superblock invalid, trying backup blocks...
fsck.ext4: Bad magic number in super-block while trying to open /dev/md0

I still have 6 good disks, but the metadata seems to be completely messed up. Is there any hope of recovering the data left on the array?
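Why six out of seven disks can be enough: for every stripe, RAID5 stores one parity chunk that is the bytewise XOR of that stripe's data chunks, so any single missing chunk can be recomputed from the others. A minimal Python sketch with made-up 4-byte chunks (the real chunks on this array are 512K, and the member holding parity rotates from stripe to stripe):

```python
from functools import reduce


def xor_chunks(chunks):
    """Bytewise XOR of equal-length chunks (this is how RAID5 parity is formed)."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*chunks))


# Hypothetical contents of one stripe: six data chunks, one per data member.
data = [bytes([i, i + 1, i + 2, i + 3]) for i in range(0, 60, 10)]
parity = xor_chunks(data)  # what the seventh member stores for this stripe

# Pretend the member holding data[3] is the dead disk and rebuild its chunk
# from the five surviving data chunks plus the parity chunk.
survivors = data[:3] + data[4:] + [parity]
rebuilt = xor_chunks(survivors)
assert rebuilt == data[3]
```

This only works while exactly one member is missing and the survivors are combined with the original order and chunk size; a re-created array with different geometry XORs the wrong chunks together.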
 
03-02-2012, 11:53 AM #2
m_dev34
LQ Newbie
 
Registered: Mar 2012
Posts: 11

Original Poster
Rep: Reputation: Disabled
Is there any way I can tell mdadm to create the array with ext4 instead? Then maybe I could fix it with fsck?
 
03-07-2012, 08:09 AM #3
ba.page
Member
 
Registered: Feb 2012
Location: Canada
Distribution: Scientific,Debian
Posts: 35

Rep: Reputation: 7
The array device itself doesn't care what filesystem it's formatted as.
Think of md0 as you would hda - just a block device for you to format using ext3, ext4, xfs, etc...
So, you should just be able to mount /dev/md0 on a folder and mount will automatically detect the filesystem (assuming it's native like ext4).
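Because md0 is just a block device, "detecting the filesystem" means reading known signatures from it. For ext2/3/4 the superblock starts 1024 bytes into the device, and its 16-bit little-endian magic number 0xEF53 sits 0x38 bytes into the superblock (byte 1080 overall). A minimal Python sketch of that one check, assuming you hand it the first couple of KiB of the device; this is only the ext part of the idea, not the full probing logic mount or blkid use:

```python
import struct

EXT_SUPERBLOCK_OFFSET = 1024  # the ext2/3/4 superblock starts 1 KiB into the device
EXT_MAGIC_OFFSET = 0x38       # offset of s_magic inside the superblock
EXT_MAGIC = 0xEF53


def looks_like_ext(header):
    """Return True if `header` (the first bytes of a block device) carries the ext magic."""
    pos = EXT_SUPERBLOCK_OFFSET + EXT_MAGIC_OFFSET
    if len(header) < pos + 2:
        return False
    (magic,) = struct.unpack_from("<H", header, pos)
    return magic == EXT_MAGIC
```

For example, `with open("/dev/md0", "rb") as f: print(looks_like_ext(f.read(2048)))` (read-only, as root). A False on an array that used to hold ext4 is the same thing fsck reports as "Bad magic number in super-block": the superblock is either damaged or no longer at the start of the array.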

what does it say when you run:
Code:
mount /dev/md0 /some/folder
I would recommend a couple things:
1) fdisk each disk removing all partitions
2) rebuild the array as raid 5 with 6 members, and then add a 7th as a hot spare.
3) after building the array, format it like so:
Code:
mkfs.ext4 /dev/md0
4) then mount it as you would any other disk:
Code:
mount /dev/md0 /some/folder
 
03-07-2012, 09:19 AM #4
m_dev34
LQ Newbie
 
Registered: Mar 2012
Posts: 11

Original Poster
Rep: Reputation: Disabled
Hi, thanks a lot for the reply!

I can't mount the filesystem. mount asks me to specify the fs type. It was originally ext4, but whether I specify ext2, ext3 or ext4 I always get the same message:

mount: wrong fs type, bad option, bad superblock on /dev/md0,
missing codepage or helper program, or other error
In some cases useful info is found in syslog - try
dmesg | tail or so

dmesg reveals that it couldn't identify any filesystem, even after also trying XFS etc. The RAID array seems to be back up and running, but I really need to repair the old ext4 filesystem to get at the old files. Won't mkfs reformat the disk and make recovery of the original filesystem even harder?

My previous comment wasn't particularly well thought through, but does mdadm write metadata for the filesystem when it creates the array? Is there any way to repair the old ext4 filesystem like that, or are there any specialist tools for recovering the old filesystem that you could recommend?
 
03-07-2012, 09:41 AM #5
ba.page
Member
 
Registered: Feb 2012
Location: Canada
Distribution: Scientific,Debian
Posts: 35

Rep: Reputation: 7
yes, mkfs will destroy the data; that's why I posted it at the end of a set of rebuild instructions.

the mount command will detect the filesystem automatically; you shouldn't be seeing wrong fs type errors unless you've specified an incorrect fs type, or it's not a natively supported fs.
fsck /dev/md0 will also detect the filesystem automatically in the same way mount does.

please post the results of:
Code:
mdadm -D /dev/md0
 
03-07-2012, 10:49 AM #6
m_dev34
LQ Newbie
 
Registered: Mar 2012
Posts: 11

Original Poster
Rep: Reputation: Disabled
/dev/md0:
Version : 1.2
Creation Time : Fri Mar 2 17:48:18 2012
Raid Level : raid5
Array Size : 11721071616 (11178.09 GiB 12002.38 GB)
Used Dev Size : 1953511936 (1863.01 GiB 2000.40 GB)
Raid Devices : 7
Total Devices : 6
Persistence : Superblock is persistent

Update Time : Fri Mar 2 17:48:18 2012
State : clean, degraded
Active Devices : 6
Working Devices : 6
Failed Devices : 0
Spare Devices : 0

Layout : left-symmetric
Chunk Size : 512K

Name : archive:0 (local to host archive)
UUID : 3f9b90e2:cf0ed0f0:22b36f1d:14c30d6c
Events : 0

Number Major Minor RaidDevice State
0 8 17 0 active sync /dev/sdb1
1 8 33 1 active sync /dev/sdc1
2 8 49 2 active sync /dev/sdd1
3 8 65 3 active sync /dev/sde1
4 8 81 4 active sync /dev/sdf1
5 8 97 5 active sync /dev/sdg1
6 0 0 6 removed
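The sizes above are consistent with a 7-member RAID5: one member's worth of space per stripe goes to parity, so the usable size is (Raid Devices - 1) x Used Dev Size. A quick Python check of that arithmetic with the figures from the mdadm -D output (both in KiB):

```python
def raid5_usable_kib(used_dev_size_kib, raid_devices):
    """RAID5 gives up one member's worth of capacity to parity."""
    return used_dev_size_kib * (raid_devices - 1)


used_dev_size_kib = 1953511936  # "Used Dev Size" reported by mdadm -D
array_kib = raid5_usable_kib(used_dev_size_kib, raid_devices=7)
print(array_kib)                   # compare with "Array Size"
print(array_kib * 1024 / 1000**3)  # decimal GB, about 12002.38
```

A matching size only confirms "RAID5 over 7 members of this size"; it says nothing about whether chunk size, member order or data offset match the original array.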


fsck gives:

fsck from util-linux 2.19
e2fsck 1.41.14 (22-Dec-2010)
fsck.ext2: Superblock invalid, trying backup blocks...
fsck.ext2: Bad magic number in super-block while trying to open /dev/md0

The superblock could not be read or does not describe a correct ext2
filesystem. If the device is valid and it really contains an ext2
filesystem (and not swap or ufs or something else), then the superblock
is corrupt, and you might try running e2fsck with an alternate superblock:
e2fsck -b 8193 <device>
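The "-b 8193" in that message is the first backup superblock for a filesystem with 1 KiB blocks. A large ext4 filesystem almost certainly uses 4 KiB blocks, where the first backup is at block 32768 instead. With the default sparse_super feature, backups sit at the start of block group 1 and of groups that are powers of 3, 5 and 7. A Python sketch that lists them, assuming the default of 8 x block-size blocks per group (mke2fs -g can change that, so treat the numbers as candidates):

```python
def ext_backup_superblocks(block_size, total_blocks, sparse_super=True):
    """Block numbers of backup superblocks, assuming default blocks-per-group."""
    blocks_per_group = 8 * block_size                  # one block bitmap per group
    first_data_block = 1 if block_size == 1024 else 0  # 1 KiB filesystems skip block 0

    groups = (total_blocks - first_data_block + blocks_per_group - 1) // blocks_per_group

    def is_power_of(n, base):
        while n % base == 0:
            n //= base
        return n == 1

    def has_backup(group):
        if not sparse_super:
            return True
        return group == 1 or any(is_power_of(group, b) for b in (3, 5, 7))

    return [
        g * blocks_per_group + first_data_block
        for g in range(1, groups)  # group 0 holds the primary superblock
        if has_backup(g)
    ]
```

A read-only probe against one of those would look something like `e2fsck -n -b 32768 -B 4096 /dev/md0` (-n stops e2fsck from writing anything). If none of the candidate locations holds a valid superblock either, that points at the array geometry rather than one damaged superblock, which is what the thread eventually found.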

I have to leave the office now, but if you have any further suggestions I can get back to them first thing tomorrow.
 
03-07-2012, 11:37 AM #7
ba.page
Member
 
Registered: Feb 2012
Location: Canada
Distribution: Scientific,Debian
Posts: 35

Rep: Reputation: 7
burn a copy of SystemRescueCd and boot off of that; this will eliminate the OS as a variable here.

reassemble the array:
Code:
sudo mdadm --assemble --auto=yes /dev/md0 /dev/sdb1 /dev/sdc1 /dev/sdd1 /dev/sde1 /dev/sdf1 /dev/sdg1
check its rebuild progress:
Code:
cat /proc/mdstat
check its fs:
Code:
fsck /dev/md0
do NOT fsck any of the component members of the array (i.e. fsck /dev/sdb1)

mount the array and check for your data:
Code:
mkdir /mnt/md0
mount /dev/md0 /mnt/md0
ls /mnt/md0
 
03-07-2012, 02:07 PM #8
m_dev34
LQ Newbie
 
Registered: Mar 2012
Posts: 11

Original Poster
Rep: Reputation: Disabled
Thanks for the tip, I'll try that with Knoppix tomorrow. Shouldn't I include information about the failed 7th disk somewhere, though? E.g.

--raid-devices=7 /dev/sdb1 /dev/sdc1 /dev/sdd1 /dev/sde1 /dev/sdf1 /dev/sdg1 missing
 
03-07-2012, 02:58 PM #9
ba.page
Member
 
Registered: Feb 2012
Location: Canada
Distribution: Scientific,Debian
Posts: 35

Rep: Reputation: 7
yes, the array needs to know about it, but none of your posts actually name the 7th device ('missing' is a placeholder that mdadm --create accepts, not a device name you can pass to --assemble).
 
03-08-2012, 11:01 AM #10
m_dev34
LQ Newbie
 
Registered: Mar 2012
Posts: 11

Original Poster
Rep: Reputation: Disabled
Ok, I just tried that. The assembly goes well and I recover a running array with one missing device as before (contents of /proc/mdstat are as in my earlier post) but when I try fsck /dev/md0 I get the same error about invalid superblock as before. It really looks like the filesystem has been corrupted somewhere along the line?
 
03-08-2012, 11:32 AM #11
ba.page
Member
 
Registered: Feb 2012
Location: Canada
Distribution: Scientific,Debian
Posts: 35

Rep: Reputation: 7
Is it possible that you ran fsck against the array when it was mounted?
Is it possible that you ran fsck against one of the array members while the array was assembled and running?

If either is true, you may have corrupted your array.
 
03-09-2012, 10:28 AM #12
m_dev34
LQ Newbie
 
Registered: Mar 2012
Posts: 11

Original Poster
Rep: Reputation: Disabled
No, I haven't been able to mount the array so I couldn't run fsck on it while it was mounted. I haven't fsck'd any of the individual devices either (checked .bash_history to make sure).

I'm pretty sure that the array is in some way corrupted, though. Question is, is there anything I can do to fix it or at least identify the problem (superblock, mbr, filetable...)?
 
03-09-2012, 10:39 AM #13
ba.page
Member
 
Registered: Feb 2012
Location: Canada
Distribution: Scientific,Debian
Posts: 35

Rep: Reputation: 7
like you mentioned in your first post:
Quote:
After that the RAID array came back, degraded as before, but I can't mount it. The 'ext2fs' it automatically detected is wrong, the array was created as ext4.
that means you're able to start the array, but you can't mount it.
It therefore follows that you've lost the filesystem.
 
03-09-2012, 11:40 AM #14
m_dev34
LQ Newbie
 
Registered: Mar 2012
Posts: 11

Original Poster
Rep: Reputation: Disabled
Yes, my question is really how this could have happened. Is it possible that the array was rebuilt with the wrong stripe size or something due to corrupt metadata, so that it can't read the filesystem even though the array is running again? Or is there a problem with the parted partition table, or something to do with the ext4 filesystem? I really don't know enough about the inner workings to pin down the problem properly so that I can look for an answer. I'm currently running a testdisk scan (which will probably take all weekend), but if I then try to fix the partition table and it turns out some of my RAID settings are wrong, I guess I could do more harm than good?
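On the stripe-size question: md's RAID5 layout here is left-symmetric (see the mdadm output earlier in the thread), which rotates the parity chunk backwards across the members and starts each stripe's data on the member right after the parity. A Python sketch of that mapping from a byte offset on /dev/md0 to a member index and an offset inside that member's data area, assuming this thread's parameters (7 devices, 512K chunks) and ignoring the per-member data offset that 1.x metadata adds in front of the data:

```python
def left_symmetric(chunk_index, raid_disks):
    """Map a logical chunk number to (stripe, data member, parity member)."""
    data_disks = raid_disks - 1
    stripe, pos_in_stripe = divmod(chunk_index, data_disks)
    parity_disk = data_disks - (stripe % raid_disks)            # parity walks from last member to first
    data_disk = (parity_disk + 1 + pos_in_stripe) % raid_disks  # data starts just after parity
    return stripe, data_disk, parity_disk


def member_location(array_offset, chunk_size, raid_disks):
    """Return (member index, byte offset within that member's data area)."""
    chunk_index, within_chunk = divmod(array_offset, chunk_size)
    stripe, data_disk, _ = left_symmetric(chunk_index, raid_disks)
    return data_disk, stripe * chunk_size + within_chunk


CHUNK = 512 * 1024
for offset in (0, 6 * CHUNK, 7 * CHUNK + 100):
    print(offset, member_location(offset, CHUNK, raid_disks=7))
```

With these parameters the second stripe's data starts on member 6, because that stripe's parity sits on member 5. Feed the same offset in with a 64K chunk, a different member order or a shifted data offset and it lands somewhere else entirely, which is why a re-created array with any parameter different from the original reads as garbage even though md reports it clean.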
 
03-26-2012, 04:12 AM #15
m_dev34
LQ Newbie
 
Registered: Mar 2012
Posts: 11

Original Poster
Rep: Reputation: Disabled
Well, for the purposes of posterity, and hopefully to help out someone who finds themselves in a similar situation in the future, here's what seems to have happened:

1) The crash corrupted the RAID metadata and prevented me from re-assembling the array

2) I eventually resorted to mdadm --create, but made a stupid mistake (I included the partitions, i.e. sdb1, sdc1... instead of the devices sdb, sdc which made up the original array), so new metadata was written, apparently bang in the middle of the ext4 superblock!

3) Lots of stress and reading and a few tips from this forum.

Eventually I tried r-studio. Its hex editor let me pin down exactly where my filesystem was (I recognized the superblock from the ext4 documentation at ext4.wiki.kernel.org). It also let me double-check the block size (based on how big the file fragments were on each member disk), the disk order (a fragment of a text file on sdb continued at the same block on sdc, and so on), the parity layout (by checking the start point of consecutive parity blocks), etc. The virtual RAID array feature didn't work for some reason; it just scanned for 4 days and returned a load of enumerated, broken file fragments. But once I used the hex editor on the mdadm /dev/md0 software RAID device, it was pretty obvious that something was wrong with its configuration (the first block on the device came after the ext4 superblock that I'd found with r-studio!)
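The "find the superblock by eye in a hex editor" step can be partly automated, because the ext4 superblock has a fixed layout. A rough Python sketch that walks a device or image read-only, looking for the 0xEF53 magic and sanity-checking two neighbouring fields; it assumes the documented offsets (s_log_block_size at 0x18, s_magic at 0x38, s_block_group_nr at 0x5A), while the function names, window size and 512-byte step are illustrative choices. Any hit still needs checking by hand:

```python
import struct

EXT_MAGIC = 0xEF53
SB_SIZE = 1024  # the ext superblock is 1 KiB long


def scan_for_ext_superblocks(buf, base_offset=0, step=512):
    """Yield (offset, block_size, block_group) for plausible ext superblocks in buf."""
    for start in range(0, len(buf) - SB_SIZE + 1, step):
        (magic,) = struct.unpack_from("<H", buf, start + 0x38)
        if magic != EXT_MAGIC:
            continue
        (log_block_size,) = struct.unpack_from("<I", buf, start + 0x18)
        if log_block_size > 6:  # block size above 64 KiB: not a real superblock
            continue
        (group,) = struct.unpack_from("<H", buf, start + 0x5A)
        yield base_offset + start, 1024 << log_block_size, group


def scan_device(path, window=64 * 1024 * 1024, step=512, max_bytes=None):
    """Scan a device or image file read-only; window must be a multiple of step."""
    hits = []
    with open(path, "rb") as f:
        pos = 0
        while max_bytes is None or pos < max_bytes:
            f.seek(pos)
            buf = f.read(window + SB_SIZE)  # overlap so a superblock straddling windows is seen
            if len(buf) < SB_SIZE:
                break
            for hit in scan_for_ext_superblocks(buf, pos, step):
                if hit[0] < pos + window:  # the overlap is rescanned by the next window
                    hits.append(hit)
            pos += window
    return hits
```

On a correctly assembled array, the group-0 hit on /dev/md0 should be at offset 1024. Run against a raw member instead (for example `scan_device("/dev/sdb", max_bytes=256 * 1024 * 1024)`), a group-0 hit on the member that held the first chunk shows where the original data area began on that disk, which you can compare with the layout of the re-created array.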

Having rebuilt the array using mdadm again with the correct device names this time I could finally see the (now corrupt) ext4 filesystem using fsck.ext4. The very long process of fixing the filesystem using fsck is now underway, so fingers crossed...

I would say the most important lessons learned are: if you're thinking of trying to recover data using mdadm --create, make sure you've tried absolutely everything else first (including recovery tools like r-studio), and if you really have to give it a go, make certain you know all of your original RAID parameters (especially filesystem type, block size, disk order, offset and parity layout) before you start, as that's a lot easier than tracking them down afterwards.
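One way to act on that last lesson is to save the geometry while the array is still healthy, e.g. keep the output of `mdadm -D /dev/md0` and of `mdadm --examine` for each member somewhere that is not on the array. A small Python sketch that pulls the fields that mattered in this thread out of `mdadm -D` text; the field list and the dict/JSON format are illustrative, and it only understands output shaped like the one quoted earlier in the thread:

```python
import json
import re

# Fields worth keeping; an illustrative choice, not an mdadm-defined format.
FIELDS = ("Raid Level", "Raid Devices", "Layout", "Chunk Size", "Used Dev Size", "UUID")


def parse_mdadm_detail(text):
    """Extract array geometry from `mdadm -D` output into a plain dict."""
    params = {}
    for field in FIELDS:
        m = re.search(rf"^\s*{re.escape(field)}\s*:\s*(.+?)\s*$", text, re.MULTILINE)
        if m:
            params[field] = m.group(1)

    # Device table rows: Number Major Minor RaidDevice State ... /dev/xxx
    slots = {}
    for m in re.finditer(r"^\s*\d+\s+\d+\s+\d+\s+(\d+)\s+.*?(/dev/\S+)\s*$", text, re.MULTILINE):
        slots[int(m.group(1))] = m.group(2)
    # Keep slot numbers so a missing member shows up as a gap.
    params["Device order"] = [(slot, slots[slot]) for slot in sorted(slots)]
    return params


def save_params(text, out_path):
    """Write the parsed parameters as JSON (put out_path somewhere off the array)."""
    with open(out_path, "w") as f:
        json.dump(parse_mdadm_detail(text), f, indent=2)
```

Feeding it the text of `mdadm -D /dev/md0` (for instance via `subprocess.run([...], capture_output=True, text=True).stdout`) gives a record of level, layout, chunk size and member order to check against before any `mdadm --create`.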
 