Old 08-28-2015, 04:33 AM   #1
flangemonkey
LQ Newbie
 
Registered: Aug 2007
Location: Newcastle UK
Distribution: Arch, Gentoo, Xubuntu
Posts: 11

Rep: Reputation: 0
Salvaging a RAID 5 array with 2 failed drives


I'll try to be concise, and please don't fill the thread with "you should have better backups" messages. Most of the important stuff is safe; it's the "I wasn't too bothered about this stuff until it broke" stuff I want to recover...

Anyway...

I have (had?) an mdraid RAID 5 array across four 3TB Seagate Barracuda drives.

The array kicked out a drive; its SMART data was scrambled and it has read errors and I/O errors (we'll call this one drive 1).

After a reboot I could hear a drive clicking (fubar) and removed it - it wasn't the drive that had been kicked out... (we'll call this drive 4).

I now have this:

Drive 1 - Random read errors and I/O issues
Drives 2 & 3 - Array members, recently scrubbed, should be fine
Drive 4 - fubar

I am cloning what I can from drive 1, with dd and a small block size - the first read errors seem to have been about 1.5TB into the clone and not particularly numerous.
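Roughly the sort of invocation I mean - the block size and device names are just placeholders (sdX is the failing drive, sdY the clone target):

Code:
# carry on past read errors and pad unreadable blocks with zeros so offsets stay aligned
dd if=/dev/sdX of=/dev/sdY bs=4096 conv=noerror,sync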

So, before I ham-fistedly try to get mdadm to have a go at rebuilding, are there any special command options I should use, or tricks to try..?

I do want to try and salvage the array, although I am also prepared to flatten it if worst comes to worst...

Further thought: should I put in a blank spare straight away, or try to initialise in a degraded state?
 
Old 08-28-2015, 05:00 AM   #2
smallpond
Senior Member
 
Registered: Feb 2011
Location: Massachusetts, USA
Distribution: Fedora
Posts: 4,140

Rep: Reputation: 1263
What do you mean by "scrambled"?
 
Old 08-28-2015, 05:15 AM   #3
wpeckham
LQ Guru
 
Registered: Apr 2010
Location: Continental USA
Distribution: Debian, Ubuntu, RedHat, DSL, Puppy, CentOS, Knoppix, Mint-DE, Sparky, VSIDO, tinycore, Q4OS,Manjaro
Posts: 5,617

Rep: Reputation: 2695
Worth a shot, but do not hold high hopes....

To my knowledge: if a RAID5 array with any number of drives loses one, no data is lost; the missing data is rebuilt on the fly from the parity information on the remaining drives.
If a RAID5 array loses two drives, ALL data is lost: the array has lost sync and balance, and there is not adequate information to rebuild the lost data or even maintain the array.

RAID6 doubles up the parity data and allows you to run with two drives down, though with two down the performance drags significantly.

I would do what you are doing: try to re-create the last drive that dropped (or was pulled for errors) and see if the array will rebuild. If it will, you are VERY lucky, but it is worth a shot.
If it will not rebuild (which is what I would expect), then the normal procedure would be to take the surviving good drives plus some replacement drives and start clean: build a new array, install fresh, restore data from backups, and drive on.

Frankly, trying to recover failed arrays is not something I have spent a lot of time on. If I have a good backup, it is hard to imagine that I would WANT to spend much time on it. Someone with more experience in that area may chime in with better advice for your current case.

I do hope that this helps.
 
Old 08-28-2015, 05:27 AM   #4
flangemonkey
LQ Newbie
 
Registered: Aug 2007
Location: Newcastle UK
Distribution: Arch, Gentoo, Xubuntu
Posts: 11

Original Poster
Rep: Reputation: 0
smartctl couldn't read past random points (I don't recall the exact error, but it was along the lines of "couldn't read past a point"), although it does report now.
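For anyone following along, the checks I am referring to look like this (sdX stands in for the suspect drive):

Code:
# overall health, attributes (including power-on hours) and the error log
smartctl -a /dev/sdX
# kick off a short self-test, then read the result a few minutes later
smartctl -t short /dev/sdX
smartctl -l selftest /dev/sdX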
 
Old 08-28-2015, 06:36 AM   #5
TobiSGD
Moderator
 
Registered: Dec 2009
Location: Germany
Distribution: Whatever fits the task best
Posts: 17,148
Blog Entries: 2

Rep: Reputation: 4886
Quote:
Originally Posted by wpeckham
To my knowledge: if a RAID5 array with any number of drives loses one, no data is lost; the missing data is rebuilt on the fly from the parity information on the remaining drives.
If a RAID5 array loses two drives, ALL data is lost: the array has lost sync and balance, and there is not adequate information to rebuild the lost data or even maintain the array.
Exactly this. RAID5 prevents downtime in case of one drive failing. With more drives failing you are pretty much doomed.
 
Old 08-29-2015, 10:39 AM   #6
S.Haran
LQ Newbie
 
Registered: Jun 2014
Location: Boston USA
Posts: 19

Rep: Reputation: Disabled
flangemonkey, you are on the right track attempting to image Drive 1, but I would suggest the Linux ddrescue command for the job, as it is designed to operate on failing drives.
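A typical ddrescue run, as a rough sketch (sdX/sdY and the mapfile name are placeholders):

Code:
# first pass: copy everything that reads easily, recording progress in a mapfile so later passes can resume
ddrescue -f -n /dev/sdX /dev/sdY rescue.map
# second pass: go back for the difficult areas, retrying bad sectors a few times
ddrescue -f -d -r3 /dev/sdX /dev/sdY rescue.map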

I've worked on many similar RAID recoveries and can say a 99+ percent recovery is often achievable. Your level of success will depend on the quality of the image you make and the degree to which the failing drive is out of sync.

To assist further, post the output of mdadm --examine against the RAID member partitions so we can assess the state of your RAID5.
 
1 member found this post helpful.
Old 08-29-2015, 05:03 PM   #7
flangemonkey
LQ Newbie
 
Registered: Aug 2007
Location: Newcastle UK
Distribution: Arch, Gentoo, Xubuntu
Posts: 11

Original Poster
Rep: Reputation: 0
S.Haran, you are indeed wise! I vaguely remembered ddrescue when you mentioned it and read a couple of resources on it - it is much faster than dd...

This one was quite interesting

I have set a ddrescue running and will post back how I get on.

Thanks
 
Old 09-02-2015, 06:03 AM   #8
flangemonkey
LQ Newbie
 
Registered: Aug 2007
Location: Newcastle UK
Distribution: Arch, Gentoo, Xubuntu
Posts: 11

Original Poster
Rep: Reputation: 0
Update!

Right... ddrescue finished with a total data loss of 90MB.

I now have 3 out of 4 drives from the array.

The event counts for the 3 drives differ, but not massively:

Code:
# mdadm --examine /dev/sd[abc]1 | grep Event
         Events : 351397
         Events : 355232
         Events : 355232
I have tried to assemble the array:

Code:
# mdadm --assemble /dev/md1 --scan --force
mdadm: forcing event count in /dev/sda1(0) from 351397 upto 355232
mdadm: Marking array /dev/md1 as 'clean'
mdadm: /dev/md1 assembled from 3 drives - not enough to start the array.
[root@anu phill]# cat /proc/mdstat
Personalities : [raid6] [raid5] [raid4]
md1 : inactive sda1[0](S) sdc1[2](S) sdb1[1](S)
      8784093337 blocks super 1.2

unused devices: <none>
And all of the rest of the information I have found:

fdisk -l /dev/sd{a,b,c,d}

Code:
Disk /dev/sda: 2.7 TiB, 3000592982016 bytes, 5860533168 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes
Disklabel type: gpt
Disk identifier: 9C113EDE-CFE9-4F87-8D28-ED138A3DEA32
Device     Start        End    Sectors  Size Type
/dev/sda1   2048 5856326416 5856324369  2.7T Linux RAID

Disk /dev/sdb: 2.7 TiB, 3000592982016 bytes, 5860533168 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes
Disklabel type: gpt
Disk identifier: B21FD950-DDF3-441A-AD3E-4E7C09253920
Device     Start        End    Sectors  Size Type
/dev/sdb1   2048 5856326416 5856324369  2.7T Linux RAID

Disk /dev/sdc: 2.7 TiB, 3000592982016 bytes, 5860533168 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes
Disklabel type: gpt
Disk identifier: 230BDB8E-FE1C-4392-BB06-1424686031DB
Device     Start        End    Sectors  Size Type
/dev/sdc1   2048 5856326416 5856324369  2.7T Linux RAID

Disk /dev/sdd: 2.7 TiB, 3000592982016 bytes, 5860533168 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes
mdadm --examine /dev/sd{a,b,c}1

Code:
/dev/sda1:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x0
     Array UUID : 294e5cbd:82264ac6:4e11d1fd:9295556f
           Name : archiso:md1
  Creation Time : Wed Jul  3 00:10:15 2013
     Raid Level : raid5
   Raid Devices : 4

Avail Dev Size : 5856062225 (2792.39 GiB 2998.30 GB)
     Array Size : 8784092928 (8377.16 GiB 8994.91 GB)
  Used Dev Size : 5856061952 (2792.39 GiB 2998.30 GB)
    Data Offset : 262144 sectors
   Super Offset : 8 sectors
   Unused Space : before=262064 sectors, after=273 sectors
          State : active
    Device UUID : 6dade4db:030d150a:771eb5f1:2f50eec3

    Update Time : Thu Aug 20 16:06:16 2015
       Checksum : 3a255bb9 - correct
         Events : 351397

         Layout : left-symmetric
     Chunk Size : 128K

   Device Role : Active device 0
   Array State : AAAA ('A' == active, '.' == missing, 'R' == replacing)


/dev/sdb1:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x0
     Array UUID : 294e5cbd:82264ac6:4e11d1fd:9295556f
           Name : archiso:md1
  Creation Time : Wed Jul  3 00:10:15 2013
     Raid Level : raid5
   Raid Devices : 4

Avail Dev Size : 5856062225 (2792.39 GiB 2998.30 GB)
     Array Size : 8784092928 (8377.16 GiB 8994.91 GB)
  Used Dev Size : 5856061952 (2792.39 GiB 2998.30 GB)
    Data Offset : 262144 sectors
   Super Offset : 8 sectors
   Unused Space : before=262064 sectors, after=273 sectors
          State : clean
    Device UUID : 0842eeef:9304704d:e9c84c39:f13e59d3

    Update Time : Mon Aug 24 22:33:45 2015
       Checksum : e89433cd - correct
         Events : 355232

         Layout : left-symmetric
     Chunk Size : 128K

   Device Role : Active device 1
   Array State : .AA. ('A' == active, '.' == missing, 'R' == replacing)


/dev/sdc1:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x0
     Array UUID : 294e5cbd:82264ac6:4e11d1fd:9295556f
           Name : archiso:md1
  Creation Time : Wed Jul  3 00:10:15 2013
     Raid Level : raid5
   Raid Devices : 4

Avail Dev Size : 5856062225 (2792.39 GiB 2998.30 GB)
     Array Size : 8784092928 (8377.16 GiB 8994.91 GB)
  Used Dev Size : 5856061952 (2792.39 GiB 2998.30 GB)
    Data Offset : 262144 sectors
   Super Offset : 8 sectors
   Unused Space : before=262064 sectors, after=273 sectors
          State : clean
    Device UUID : 858f9dfb:ae734ca7:216839e5:b53f24f8

    Update Time : Mon Aug 24 22:33:45 2015
       Checksum : 1ed78d6c - correct
         Events : 355232

         Layout : left-symmetric
     Chunk Size : 128K

   Device Role : Active device 2
   Array State : .AA. ('A' == active, '.' == missing, 'R' == replacing)
And finally:
Code:
# mdadm --detail /dev/md1
/dev/md1:
        Version : 1.2
     Raid Level : raid0
  Total Devices : 3
    Persistence : Superblock is persistent

          State : inactive

           Name : archiso:md1
           UUID : 294e5cbd:82264ac6:4e11d1fd:9295556f
         Events : 351397

    Number   Major   Minor   RaidDevice

       -       8        1        -        /dev/sda1
       -       8       17        -        /dev/sdb1
       -       8       33        -        /dev/sdc1
Does anyone know how I can assemble with a missing drive? (When I tried, I got this:)

Code:
mdadm --assemble /dev/md1 /dev/sd{a,b,c}1 missing 
mdadm: cannot open device missing: No such file or directory
mdadm: missing has no superblock - assembly aborted
Which was the same as this: http://unix.stackexchange.com/questi...md-raid5-array

TIA
 
Old 09-02-2015, 03:04 PM   #9
S.Haran
LQ Newbie
 
Registered: Jun 2014
Location: Boston USA
Posts: 19

Rep: Reputation: Disabled
Well, 90MB seems like a lot - did you try multiple passes with ddrescue?

The Event count difference looks significant. Not a good sign.

You can't use "missing" with mdadm --assemble - it's a --create thing.

It's a bit risky in that it wipes out your current mdadm superblocks, but you can try a --create:

Quote:
mdadm -v --create /dev/md1 --assume-clean --level=5 --raid-devices=4 --chunk=128 --metadata=1.2 /dev/sda1 /dev/sdb1 /dev/sdc1 missing
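Before running that, it is worth saving the existing superblock details somewhere off the array, since --create rewrites them (the filename is just an example):

Code:
mdadm --examine /dev/sd{a,b,c}1 > /root/md1-superblocks-before-create.txt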
 
Old 09-02-2015, 03:20 PM   #10
flangemonkey
LQ Newbie
 
Registered: Aug 2007
Location: Newcastle UK
Distribution: Arch, Gentoo, Xubuntu
Posts: 11

Original Poster
Rep: Reputation: 0
In the end

Code:
mdadm -A /dev/md1 /dev/sd{a,b,c}1
Brought up the array degraded and it's rebuilding now...

And as a percentage 90MB is nowt... There'll be a few things that won't work later though.

I invoked ddrescue (amateurishly) without the logfile option, and as the drive was putting out a lot of I/O errors I was happy to only lose 90MB - especially as I think another drive in the array sounds poorly... smartctl can't get its power-on hours...

Last edited by flangemonkey; 09-02-2015 at 03:22 PM. Reason: Code was wrong!
 
Old 09-03-2015, 03:11 AM   #11
flangemonkey
LQ Newbie
 
Registered: Aug 2007
Location: Newcastle UK
Distribution: Arch, Gentoo, Xubuntu
Posts: 11

Original Poster
Rep: Reputation: 0
Another drive failed during the rebuild... joy.

 
Old 09-03-2015, 06:06 AM   #12
wpeckham
LQ Guru
 
Registered: Apr 2010
Location: Continental USA
Distribution: Debian, Ubuntu, RedHat, DSL, Puppy, CentOS, Knoppix, Mint-DE, Sparky, VSIDO, tinycore, Q4OS,Manjaro
Posts: 5,617

Rep: Reputation: 2695
Oops, too late.

Ouch. Have you run self-tests on the controller?
This is starting to not look good.

Last edited by wpeckham; 09-03-2015 at 06:08 AM.
 
Old 09-07-2015, 01:26 AM   #13
flangemonkey
LQ Newbie
 
Registered: Aug 2007
Location: Newcastle UK
Distribution: Arch, Gentoo, Xubuntu
Posts: 11

Original Poster
Rep: Reputation: 0
To summarise:

We started with a 4 drive RAID 5 array;

Disk 1: (Seagate) I/O errors; cloned with 90MB data loss
Disk 2: (Seagate) I/O errors, 1 URE (unrecoverable read error); cloned with 4096 bytes data loss
Disk 3: (Seagate) Should I worry about this one..?
Disk 4: (Seagate) Fails to spin up at all

I replaced drives 1 and 4 with like-for-like Seagate drives, and drive 2 with a Toshiba; I will be adding another Toshiba into the mix to make it RAID 6 at some point...
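When I get to it, the migration should look something like the following - I have not run this yet, and the fifth drive name and backup-file path are placeholders:

Code:
# add the fifth drive as a spare, then reshape RAID5 -> RAID6 across five devices
mdadm /dev/md1 --add /dev/sde1
mdadm --grow /dev/md1 --level=6 --raid-devices=5 --backup-file=/root/md1-grow.backup
# the reshape takes a long time; progress shows up here
cat /proc/mdstat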

Now - to the present...

fsck says the filesystem is a mess - lots of shared inodes and several corrupt files. I suspect there will be some more data loss; I've already lost a few directories of files and suspect the fsck repair will be the point where I lose a few more...

For those who are in the same boat, here is how I did some of the stages:

Cloning drives:

Code:
ddrescue /dev/sdX /dev/sdY logfile
(where sdX is the failing drive, sdY the new one, and logfile records the progress so further ddrescue passes can resume where they left off)
Copying a filesystem layout / partition table for the blank drive so the partition was the same size as the other members:

Code:
sfdisk -d /dev/sdL > partition1.txt
sfdisk -d /dev/sdM > partition2.txt
sfdisk /dev/sdL < partition2.txt
#sdL is the new drive, sdM is the drive you want to copy; the two dumps double as backups
(from http://linuxaria.com/pills/how-to-cl...ux-with-sfdisk) - this is not the quickest way, but it does give you a backup!

Getting mdadm to assemble the array:

Code:
mdadm --assemble /dev/mdN /dev/sd{a,b,c,d}1 --force
(where mdN is the ID of your array, {a,b,c,d} are the drives in your array, and you are using partition 1 on each. Note: I assembled it first with just a, b and c in a degraded state, and then added the blank drive with mdadm --add so it would rebuild - see the spelled-out sequence below.)
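Spelled out, the degraded-assemble-then-add sequence was roughly this (same device names as above; treat it as a sketch rather than a paste-ready recipe):

Code:
# start the array degraded from the three surviving members
mdadm --assemble /dev/md1 /dev/sd{a,b,c}1 --force
# add the freshly partitioned replacement; the rebuild starts automatically
mdadm /dev/md1 --add /dev/sdd1
# watch the rebuild progress
cat /proc/mdstat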

to check your files, mount the array:

Code:
mount /dev/mdN /mountpoint
and have a look around

to check your filesystem, it should not be mounted (very important!!!)

Code:
fsck /dev/mdN
if you want to check without making changes, add -n after fsck; if you want it to just fix the errors it finds, add -y
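For example (mdN as above):

Code:
fsck -n /dev/mdN   # report problems only, change nothing
fsck -y /dev/mdN   # answer yes to every repair prompt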

hope this helps someone - and don't forget to scrub your arrays - not that it helped me...
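For anyone who doesn't already scrub regularly: a manual check can be kicked off through sysfs (md1 here as an example; many distros ship a cron job or timer that does this monthly):

Code:
echo check > /sys/block/md1/md/sync_action
cat /proc/mdstat   # shows the check progress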

Last edited by flangemonkey; 09-07-2015 at 01:27 AM. Reason: code was wrong...
 
  


Tags
degraded, raid5, rebuild


