Broken raid 5 (11 drives in mdadm) -- data recovery/raid reconstruction needed -- ple
Hi there:
Thanks for reading this thread, and thank you in advance for any help you can provide.
So this is what happened: I noticed that my mdadm RAID 5 array, with drives ordered /dev/sd[efghiabcdkj]1, reported a failed drive, /dev/sdb1. I stopped the array and ran smartctl -t long /dev/sdb1, which passed.
So I added /dev/sdb1 back to /dev/md0 with mdadm --add. During the rebuild, /dev/sdh1 went offline (its data cable must have been knocked loose while I was moving from FL to MI), and the array is now degraded. I checked both drives with smartctl again and got two passes.
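For reference, this is roughly what I ran at that point (from memory, so treat the exact invocations as approximate):

# long SMART self-test on the member that was reported failed,
# then read back the self-test log and attributes
smartctl -t long /dev/sdb1
smartctl -a /dev/sdb1

# re-add the member to the array and watch the rebuild
mdadm /dev/md0 --add /dev/sdb1
cat /proc/mdstat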
I read advice on some forum about using mdadm -C /dev/md0 /dev/sd[efghiabcdkj]1, but the array resynced with the drive order scrambled (sd[abcdefghijk]1 instead of sd[efghiabcdkj]1). I then tried mdadm -Af /dev/md0 but got a missing-superblock error.
I came across another post saying I should run mdadm -C --assume-clean /dev/md0 /dev/sd[efghia MISSING cdkj]1, then add /dev/sdb1, then mdadm --assemble /dev/md0 --resync=update. Unfortunately, I had a flash drive plugged into the server that got assigned /dev/sdi1 (oops)... Anyway, I pulled the plug quickly, halted the system, removed the flash drive, and repeated the steps.
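Spelled out, the advice I was following amounted to roughly this (my best reconstruction; the original chunk size and metadata version would also have to match, which I did not double-check):

# stop whatever half-assembled array is there
mdadm --stop /dev/md0

# re-create the array over the old members without resyncing,
# keeping the original order and leaving sdb1's old slot as "missing"
mdadm -C /dev/md0 --assume-clean --level=5 --raid-devices=11 \
    /dev/sde1 /dev/sdf1 /dev/sdg1 /dev/sdh1 /dev/sdi1 \
    /dev/sda1 missing /dev/sdc1 /dev/sdd1 /dev/sdk1 /dev/sdj1

# then add the old member back and let it rebuild
mdadm /dev/md0 --add /dev/sdb1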
================================================================================
fdisk -l reports:
Disk /dev/hda: 500.1 GB, 500107862016 bytes
255 heads, 63 sectors/track, 60801 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Device Boot Start End Blocks Id System
/dev/hda1 * 1 3187 25599546 7 HPFS/NTFS
/dev/hda2 3188 60801 462784455 5 Extended
/dev/hda5 3188 9561 51199123+ 7 HPFS/NTFS
/dev/hda6 9562 28045 148472698+ 83 Linux
/dev/hda7 28046 28835 6345643+ 82 Linux swap / Solaris
/dev/hda8 28836 60801 256766863+ 83 Linux
Disk /dev/sda: 1500.3 GB, 1500301910016 bytes
255 heads, 63 sectors/track, 182401 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Device Boot Start End Blocks Id System
/dev/sda1 * 1 182402 1465138552+ 83 Linux
Disk /dev/sdb: 1500.3 GB, 1500301910016 bytes
255 heads, 63 sectors/track, 182401 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Device Boot Start End Blocks Id System
/dev/sdb1 * 1 182402 1465138552+ fd Linux raid autodetect
Disk /dev/sdc: 1500.3 GB, 1500301910016 bytes
255 heads, 63 sectors/track, 182401 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Device Boot Start End Blocks Id System
/dev/sdc1 * 1 182402 1465138552+ 83 Linux
Disk /dev/sdd: 1500.3 GB, 1500301910016 bytes
255 heads, 63 sectors/track, 182401 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Device Boot Start End Blocks Id System
/dev/sdd1 * 1 182402 1465138552+ 83 Linux
Disk /dev/sde: 1500.3 GB, 1500301910016 bytes
255 heads, 63 sectors/track, 182401 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Device Boot Start End Blocks Id System
/dev/sde1 * 1 182401 1465136001 83 Linux
Disk /dev/sdf: 1500.3 GB, 1500301910016 bytes
255 heads, 63 sectors/track, 182401 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Device Boot Start End Blocks Id System
/dev/sdf1 * 1 182401 1465136001 83 Linux
Disk /dev/sdg: 1500.3 GB, 1500301910016 bytes
255 heads, 63 sectors/track, 182401 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Device Boot Start End Blocks Id System
/dev/sdg1 * 1 182401 1465136001 83 Linux
Disk /dev/sdh: 1500.3 GB, 1500301910016 bytes
255 heads, 63 sectors/track, 182401 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Device Boot Start End Blocks Id System
/dev/sdh1 * 1 182401 1465136001 83 Linux
Disk /dev/sdi: 1500.3 GB, 1500301910016 bytes
16 heads, 63 sectors/track, 2907021 cylinders
Units = cylinders of 1008 * 512 = 516096 bytes
Device Boot Start End Blocks Id System
/dev/sdi1 * 1 2907021 1465138552+ 83 Linux
Disk /dev/sdj: 1500.3 GB, 1500301910016 bytes
255 heads, 63 sectors/track, 182401 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Device Boot Start End Blocks Id System
/dev/sdj1 * 1 182402 1465138552+ 83 Linux
Disk /dev/sdk: 1500.3 GB, 1500301910016 bytes
255 heads, 63 sectors/track, 182401 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Device Boot Start End Blocks Id System
/dev/sdk1 * 1 182402 1465138552+ 83 Linux
Disk /dev/md0: 0 MB, 0 bytes
2 heads, 4 sectors/track, 0 cylinders
Units = cylinders of 8 * 512 = 4096 bytes
Disk /dev/md0 doesn't contain a valid partition table
So I am guessing that my inserting the flash drive messed up the number of heads on all the other drives except the one that did not get included in the first mdadm -C, because its device assignment was taken by the flash drive.
So, bottom line: the resync has now completed (diskstats shows reads but no writes to disk), and I am unable to mount the array. I get a "VFS: Can't find ext3 filesystem on dev md0" message.
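For anyone suggesting next steps, these are the read-only checks I understand to be safe to run at this point (as far as I know, none of them write to the array):

# per-member RAID superblock: role/slot, event count, UUID
mdadm --examine /dev/sd[a-k]1

# current view of the assembled array
mdadm --detail /dev/md0
cat /proc/mdstat

# look for an ext3 superblock without modifying anything
dumpe2fs -h /dev/md0
fsck.ext3 -n /dev/md0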
Current status: R-Studio reports some data, TestDisk is still analyzing my partition, and I aborted RAID Reconstructor because it said it would take about 20 days to complete...
Any hints on how I can recover my data? Any suggestions you can offer will be greatly appreciated, because I am starting a new job and cannot afford to look disorganized despite the bad run of events this past week. Thanks... J
Do you have an mdadm.conf for this array? If it's set to do a certain thing and you tell mdadm to do something else with a /dev/md0 that is already defined in mdadm.conf, you could have a problem.
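For comparison, a typical mdadm.conf entry looks something like this (the UUID here is obviously a placeholder; mdadm --detail --scan will print the real line for an assembled array):

DEVICE partitions
ARRAY /dev/md0 level=raid5 num-devices=11 UUID=xxxxxxxx:xxxxxxxx:xxxxxxxx:xxxxxxxx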
I've had drives mis-ordered in RAID arrays and suffered no ill effects; as long as the drive names themselves are consistent (e.g. it's always the same 3/5/7/10 letters) you shouldn't have problems with that.
I'd say get a copy of Parted Magic and try to assemble/mount the array within it (Parted Magic has mdadm), separate from the installed OS. Also, avoid running a bunch of diagnostic stuff on an array all at once or it'll take forever and possibly return inaccurate results; do one test at a time.
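From the live environment, something along these lines should do it (read-only mount so nothing gets written while you are still diagnosing; the mount point is just an example):

# let mdadm find the members from their superblocks
mdadm --assemble --scan

# or name the members explicitly if --scan picks up the wrong thing
# mdadm --assemble /dev/md0 /dev/sd[abcdefghijk]1

cat /proc/mdstat

# mount read-only while diagnosing
mkdir -p /mnt/md0
mount -o ro /dev/md0 /mnt/md0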
Just out of curiosity, why did you choose RAID-5 with so many drives? RAID-5 only tolerates one drive failure, regardless of the size of the array. In fact, as the array gets bigger, the likelihood of an array failure increases.
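Back-of-the-envelope: if each drive independently has, say, a 3% chance of failing in a given year (a made-up but plausible figure), the chance that at least one of 11 drives fails that year is 1 - 0.97^11, or roughly 28%, and once one has failed a RAID-5 array has no redundancy left until the rebuild completes.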
Thanks for your suggestion... I'll give it a try and get back to you. I did have an mdadm.conf file for the array, and /dev/md0 was already defined (not by UUID but by drive order).
As for why I used RAID-5, my best guess would be as a cost-saving measure (electricity and drive cost). In hindsight, I should have gone with RAID-6 and a more appropriate filesystem such as XFS.
If I may suggest RAID-10...
In your case there would be minimal net change in power consumption and you'd have much better fault tolerance and performance (in fact writes can be almost twice as fast). I speak from experience; I started with RAID-5 as well but as my storage needs grew past 3 physical drives the shrinking fault-tolerance made me nervous. Also, mdadm does RAID-10 natively now (that is, in one step), no more stripe-then-mirror.
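For illustration, a native mdadm RAID-10 is a single create; the device names and member count below are just placeholders:

# one-step RAID-10 across four member partitions ("near" layout is the default)
mdadm --create /dev/md1 --level=10 --raid-devices=4 /dev/sd[b-e]1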
Of course, all of this depends on exactly what you're using the array for; mine was for a combination of storage and hosting VMs which is a lot of small writes with the occasional big one. I don't regret my decision one bit; the performance is better and I feel safer between backups.