LinuxQuestions.org
Old 12-16-2021, 06:10 PM   #1
T-Prime3797
LQ Newbie
 
Registered: Dec 2021
Posts: 3

Rep: Reputation: Disabled
RAID Failure


Good Day,

My RAID has failed, and I'm not sure what's going on. mdadm is giving me strange information (see below):


First off, this says raid0 when it should be raid5
Code:
/dev/md127:
           Version : 1.2
        Raid Level : raid0
     Total Devices : 4
       Persistence : Superblock is persistent

             State : inactive
   Working Devices : 4

              Name : ubuntu-server:Data_RAID
              UUID : e53ba358:1a0b2928:60fa66ce:d96f4138
            Events : 5967434

    Number   Major   Minor   RaidDevice

       -       8       64        -        /dev/sde
       -       8        0        -        /dev/sda
       -       8       48        -        /dev/sdd
       -       8       16        -        /dev/sdb

This one says the first device in the array is missing.
Code:
/dev/sde:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x1
     Array UUID : e53ba358:1a0b2928:60fa66ce:d96f4138
           Name : ubuntu-server:Data_RAID
  Creation Time : Tue Jul 23 01:24:47 2019
     Raid Level : raid5
   Raid Devices : 4

 Avail Dev Size : 3906764976 (1862.89 GiB 2000.26 GB)
     Array Size : 5860147200 (5588.67 GiB 6000.79 GB)
  Used Dev Size : 3906764800 (1862.89 GiB 2000.26 GB)
    Data Offset : 264192 sectors
   Super Offset : 8 sectors
   Unused Space : before=264112 sectors, after=176 sectors
          State : clean
    Device UUID : 1860eee9:458f6d4d:afa39c8e:07fb048a

Internal Bitmap : 8 sectors from superblock
    Update Time : Thu Nov 25 14:03:43 2021
  Bad Block Log : 512 entries available at offset 16 sectors
       Checksum : c94d6a9 - correct
         Events : 5967434

         Layout : left-symmetric
     Chunk Size : 512K

   Device Role : Active device 2
   Array State : .AAA ('A' == active, '.' == missing, 'R' == replacing)

This one says the first and third devices are missing.
Code:
/dev/sda:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x1
     Array UUID : e53ba358:1a0b2928:60fa66ce:d96f4138
           Name : ubuntu-server:Data_RAID
  Creation Time : Tue Jul 23 01:24:47 2019
     Raid Level : raid5
   Raid Devices : 4

 Avail Dev Size : 3906764976 (1862.89 GiB 2000.26 GB)
     Array Size : 5860147200 (5588.67 GiB 6000.79 GB)
  Used Dev Size : 3906764800 (1862.89 GiB 2000.26 GB)
    Data Offset : 264192 sectors
   Super Offset : 8 sectors
   Unused Space : before=264112 sectors, after=176 sectors
          State : clean
    Device UUID : 6f797783:0f21ab6a:69266265:14c4635b

Internal Bitmap : 8 sectors from superblock
    Update Time : Sun Dec  5 00:57:01 2021
  Bad Block Log : 512 entries available at offset 16 sectors
       Checksum : 6b8d409a - correct
         Events : 5967440

         Layout : left-symmetric
     Chunk Size : 512K

   Device Role : Active device 3
   Array State : .A.A ('A' == active, '.' == missing, 'R' == replacing)

This one says the first and third devices are missing, and has bad blocks.
Code:
/dev/sdb:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x9
     Array UUID : e53ba358:1a0b2928:60fa66ce:d96f4138
           Name : ubuntu-server:Data_RAID
  Creation Time : Tue Jul 23 01:24:47 2019
     Raid Level : raid5
   Raid Devices : 4

 Avail Dev Size : 3906764976 (1862.89 GiB 2000.26 GB)
     Array Size : 5860147200 (5588.67 GiB 6000.79 GB)
  Used Dev Size : 3906764800 (1862.89 GiB 2000.26 GB)
    Data Offset : 264192 sectors
   Super Offset : 8 sectors
   Unused Space : before=264112 sectors, after=176 sectors
          State : clean
    Device UUID : b2dd7d4b:a524b4b6:f80cb48e:b4c96bc6

Internal Bitmap : 8 sectors from superblock
    Update Time : Sun Dec  5 00:57:01 2021
  Bad Block Log : 512 entries available at offset 16 sectors - bad blocks present.
       Checksum : 13f69538 - correct
         Events : 5967440

         Layout : left-symmetric
     Chunk Size : 512K

   Device Role : Active device 1
   Array State : .A.A ('A' == active, '.' == missing, 'R' == replacing)

And finally, this says all the devices are active.
Code:
/dev/sdd:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x1
     Array UUID : e53ba358:1a0b2928:60fa66ce:d96f4138
           Name : ubuntu-server:Data_RAID
  Creation Time : Tue Jul 23 01:24:47 2019
     Raid Level : raid5
   Raid Devices : 4

 Avail Dev Size : 3906764976 (1862.89 GiB 2000.26 GB)
     Array Size : 5860147200 (5588.67 GiB 6000.79 GB)
  Used Dev Size : 3906764800 (1862.89 GiB 2000.26 GB)
    Data Offset : 264192 sectors
   Super Offset : 8 sectors
   Unused Space : before=264112 sectors, after=176 sectors
          State : clean
    Device UUID : 23448303:be388788:1628ea60:0186e328

Internal Bitmap : 8 sectors from superblock
    Update Time : Tue Jul 27 00:34:58 2021
  Bad Block Log : 512 entries available at offset 16 sectors
       Checksum : d173ef57 - correct
         Events : 53434

         Layout : left-symmetric
     Chunk Size : 512K

   Device Role : Active device 0
   Array State : AAAA ('A' == active, '.' == missing, 'R' == replacing)
I don't understand what's happening. Can someone help?

Thank you.
 
Old 12-18-2021, 12:36 PM   #2
computersavvy
Senior Member
 
Registered: Aug 2016
Posts: 3,345

Rep: Reputation: 1484
First off, what commands are you using to get each of those outputs?

I cannot even fathom what may give those divergent results, so please update the post with the command used for each.

Without the commands we cannot even hope to know the answer.

It is possible that you have had one drive in a failed state for some time and a second failure took the array offline (and makes it unrecoverable). We need more info to know.

Also please post the output of
Code:
cat /proc/mdstat
 
Old 12-18-2021, 05:39 PM   #3
Crippled
Member
 
Registered: Sep 2015
Distribution: MX Linux 21.3 Xfce
Posts: 595

Rep: Reputation: Disabled
Do you have a hardware RAID or a software RAID? If you have a hardware RAID, just replace the defective drives and the RAID will rebuild itself. If you have a software RAID, it's trashed.
 
Old 12-18-2021, 06:15 PM   #4
smallpond
Senior Member
 
Registered: Feb 2011
Location: Massachusetts, USA
Distribution: Fedora
Posts: 4,138

Rep: Reputation: 1263
md stops writing to a device when it fails, so sdd failed first; all devices were good up to that point. It was device 0.

sde failed next; device 0 had been failed since Nov 25. sde was device 2.

sda and sdb both noted the missing drives (0 and 2) on Dec 5, which is when sde and the RAID failed. They are devices 3 and 1.

Ideally, you should set up monitoring with mdadm in monitor mode and have it email you (or otherwise alert you) when a drive dies.
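For example, a minimal sketch (the email address is a placeholder, and the config file path and service name vary by distro - /etc/mdadm/mdadm.conf and mdmonitor.service on many systems):
Code:
# one-off: run mdadm in monitor mode as a daemon and mail on failure events
mdadm --monitor --scan --daemonise --mail=you@example.com

# or persistently: set MAILADDR in mdadm.conf and enable the monitor service
echo "MAILADDR you@example.com" >> /etc/mdadm/mdadm.conf
systemctl enable --now mdmonitor.service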

No idea why it now thinks the array is RAID 0.
 
1 member found this post helpful.
Old 12-19-2021, 09:21 AM   #5
rnturn
Senior Member
 
Registered: Jan 2003
Location: Illinois (SW Chicago 'burbs)
Distribution: openSUSE, Raspbian, Slackware. Previous: MacOS, Red Hat, Coherent, Consensys SVR4.2, Tru64, Solaris
Posts: 2,800

Rep: Reputation: 550
Quote:
Originally Posted by computersavvy View Post
First off, what commands are you using to get each of those outputs?

I cannot even fathom what may give those divergent results, so please update the post with the command used for each.

Without the commands we cannot even hope to know the answer.
The first appears to be the output of something like:
Code:
mdadm --query --detail /dev/mdNNN
(I haven't figured out what resulted in the remainder, though.)

Update:
Code:
mdadm --examine /dev/sd<A><N>
(It's been a long time since I've had to dig that deeply into an md device.)

Last edited by rnturn; 12-19-2021 at 01:00 PM.
 
Old 12-19-2021, 10:05 AM   #6
michaelk
Moderator
 
Registered: Aug 2002
Posts: 25,681

Rep: Reputation: 5894
Quote:
Originally Posted by rnturn View Post
(I haven't figured out what resulted in the remainder, though.)
mdadm --examine /dev/sdb
 
Old 12-19-2021, 01:02 PM   #7
T-Prime3797
LQ Newbie
 
Registered: Dec 2021
Posts: 3

Original Poster
Rep: Reputation: Disabled
Quote:
Originally Posted by smallpond View Post
md stops writing to a device when it fails, so sdd failed first; all devices were good up to that point. It was device 0.

sde failed next; device 0 had been failed since Nov 25. sde was device 2.

sda and sdb both noted the missing drives (0 and 2) on Dec 5, which is when sde and the RAID failed. They are devices 3 and 1.

Ideally, you should set up monitoring with mdadm in monitor mode and have it email you (or otherwise alert you) when a drive dies.

No idea why it now thinks the array is RAID 0.
Okay, that makes sense to me. Unfortunately I was out of the country when all this happened, so even if I had been notified, I was in no position to do anything about it.

Right now I'm using 'dd' to pull data from the 4 drives in hopes I can rebuild the information at least long enough to recover some of the data. What are the odds of that actually working?
 
Old 12-19-2021, 01:06 PM   #8
T-Prime3797
LQ Newbie
 
Registered: Dec 2021
Posts: 3

Original Poster
Rep: Reputation: Disabled
Quote:
Originally Posted by computersavvy View Post
First off, what commands are you using to get each of those outputs?

I cannot even fathom what may give those divergent results, so please update the post with the command used for each.

Without the commands we cannot even hope to know the answer.

It is possible that you have had one drive in a failed state for some time and a second failure took the array offline (and makes it unrecoverable). We need more info to know.

Also please post the output of
Code:
cat /proc/mdstat
rnturn & michaelk are correct in their deductions of the commands I used. /proc/mdstat states:
Code:
Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10]
unused devices: <none>
 
Old 12-19-2021, 06:37 PM   #9
computersavvy
Senior Member
 
Registered: Aug 2016
Posts: 3,345

Rep: Reputation: 1484
That output from /proc/mdstat is not surprising since the raid array is failed and not active.

I appreciate the confirmation on the commands.

I do not envy you the recovery process as it will certainly be tedious at best.

If attempting to use dd to recover the data, I would not suggest using anything other than /dev/sde (the last of the failed drives to drop out) for that recovery, since it lasted a lot longer than the other failed drive, and the first one to fail will have data that is way out of date.

If you can recover a good image of that one, write the data to a new drive, and thus get the array back online in a (still degraded) state, then you can add in a drive to replace the first one that failed. Once the array has rebuilt the data, you may have a fully functioning raid array again.
 
Old 12-19-2021, 07:05 PM   #10
syg00
LQ Veteran
 
Registered: Aug 2003
Location: Australia
Distribution: Lots ...
Posts: 21,120

Rep: Reputation: 4120
You may find this an interesting read - especially the bit about using overlay files to save stressing dodgy drives. Also note the preference for ddrescue rather than dd where an image is actually required.
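For example, a rough ddrescue sketch (device names are placeholders - assuming /dev/sde is the failing source and /dev/sdf is a new drive of at least the same size; the map file lets ddrescue resume and retry only the bad areas):
Code:
# GNU ddrescue is packaged as "gddrescue" on Debian/Ubuntu
ddrescue -f -n /dev/sde /dev/sdf sde.map    # first pass, skip the slow scraping phase
ddrescue -f -r3 /dev/sde /dev/sdf sde.map   # then retry the bad areas up to 3 times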

Lotsa luck.
 
Old 12-19-2021, 07:49 PM   #11
michaelk
Moderator
 
Registered: Aug 2002
Posts: 25,681

Rep: Reputation: 5894
I don't understand why the status is different between the disks.

With RAID 5 the data is striped across the disks, and it requires at least 3 disks. As far as I know you need at least 2 disks to run RAID 5 in degraded mode.

testdisk can recover data from a RAID.
 
Old 12-20-2021, 05:40 AM   #12
lvm_
Member
 
Registered: Jul 2020
Posts: 912

Rep: Reputation: 314
mdadm --examine reports data stored on the individual devices. Once a device falls out of the array, md naturally stops writing to it, so different data on different devices is perfectly OK and lets you track the order in which the array collapsed: the device showing AAAA was the first to go, followed by the one showing .AAA, and after that the array stopped. Since the event counts on all devices are pretty close, the array should be [almost] ok after you force-assemble it. Run fsck and checkarray after that.
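A minimal sketch of that sequence, assuming the array is /dev/md127 as in the OP's output and that the filesystem sits directly on the md device; list only the member devices whose superblocks show matching, recent event counts (the checkarray path is the one Debian/Ubuntu ships):
Code:
mdadm --stop /dev/md127                      # clear the inactive, partially-assembled array
mdadm --assemble --force /dev/md127 /dev/sda /dev/sdb /dev/sde
cat /proc/mdstat                             # confirm it came up (degraded)
fsck -n /dev/md127                           # read-only filesystem check first
/usr/share/mdadm/checkarray /dev/md127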
 
Old 12-20-2021, 10:15 AM   #13
computersavvy
Senior Member
 
Registered: Aug 2016
Posts: 3,345

Rep: Reputation: 1484
Quote:
Originally Posted by michaelk View Post
I don't understand why the status is different between the disks.

With RAID 5 the data is striped across the disks, and it requires at least 3 disks. As far as I know you need at least 2 disks to run RAID 5 in degraded mode.

testdisk can recover data from a RAID.
His raid5 array was 4 disks. Raid 5 can tolerate only one drive failure and he has had 2 drives fail.
 
Old 12-20-2021, 10:27 AM   #14
computersavvy
Senior Member
 
Registered: Aug 2016
Posts: 3,345

Rep: Reputation: 1484
Quote:
Originally Posted by lvm_ View Post
mdadm --examine reports data stored on the individual devices. Once a device falls out of the array, md naturally stops writing to it, so different data on different devices is perfectly OK and lets you track the order in which the array collapsed: the device showing AAAA was the first to go, followed by the one showing .AAA, and after that the array stopped. Since the event counts on all devices are pretty close, the array should be [almost] ok after you force-assemble it. Run fsck and checkarray after that.
The event count on /dev/sdd is tiny compared to the other three. /dev/sde is only 6 events behind /dev/sda and /dev/sdb, so he may be able to force-assemble those 3 into a degraded array.

I would suggest, as you did, that he do an fsck and checkarray, but that he then immediately add a 4th disk to replace /dev/sdd and allow the array to fully rebuild before doing anything else, not even mounting it. Alternatively, he could back up the data from the array (read-only) while it is still in the degraded state.
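A rough sketch of those last steps, assuming the replacement drive shows up as /dev/sdf (a placeholder name) and the array assembled as /dev/md127:
Code:
mdadm --manage /dev/md127 --add /dev/sdf    # add the replacement for the failed sdd
watch cat /proc/mdstat                      # follow the rebuild; let it finish before doing anything else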
 
Old 12-21-2021, 02:14 AM   #15
lvm_
Member
 
Registered: Jul 2020
Posts: 912

Rep: Reputation: 314
Quote:
Originally Posted by computersavvy View Post
The event count on /dev/sdd is tiny compared to the other three.
Oh yes, missed that on a cursory reading - they are all 5-somethings :) So actually the first drive dropped out of the array ages ago, but the OP was not paying attention...
 
  

