LinuxQuestions.org
Linux - Server This forum is for the discussion of Linux Software used in a server related context.
Old 08-27-2009, 07:36 PM   #1
wingnut64
Member
 
Registered: Sep 2004
Distribution: AIX, RHEL, Ubuntu
Posts: 51

Rep: Reputation: 23
mdadm - re-added disk treated as spare


I recently moved a 4-disk RAID-5 to a new machine. The array had 3 SATA disks and 1 IDE, and as I was planning to replace the IDE disk with a SATA one, I just moved the 3 SATA disks and added the new disk later. The array assembled OK and began to rebuild. Unfortunately, during the rebuild one of the original drives had an unrecoverable read error and was kicked out of the array. I checked it with smartctl, and since it showed only 2 errors within the past month I just removed it and re-added it to the array. However, mdadm added it to the array as a spare and did not rebuild. I stopped the array and attempted to reassemble, but was not successful:

Code:
edge ~ # mdadm --assemble /dev/md0 /dev/sd{b,c,d,e}1
mdadm: WARNING /dev/sdd1 and /dev/sde1 appear to have very similar superblocks.
      If they are really different, please --zero the superblock on one
      If they are the same or overlap, please remove one from the list.
sdb and sdc are the original, working devices; sdd is the original device that failed; sde is the new device that had not finished rebuilding when the array went down.

As I figured sde was certainly not clean, I zeroed its superblock and attempted to start the array with b, c and d to pull off the most important data before trying anything else.

Code:
edge ~ # mdadm --assemble /dev/md0 /dev/sd{b,c,d}1
mdadm: /dev/md0 assembled from 2 drives and 1 spare - not enough to start the array.
mdadm seems to think that sdd is dirty, however:

Code:
edge ~ # mdadm -E /dev/sd{b,c,d}1
/dev/sdb1:
          Magic : a92b4efc
        Version : 0.90.00
           UUID : f59e3fc1:b609490d:db2e1351:d0866abe
  Creation Time : Tue May  1 19:43:52 2007
     Raid Level : raid5
  Used Dev Size : 293033536 (279.46 GiB 300.07 GB)
     Array Size : 879100608 (838.38 GiB 900.20 GB)
   Raid Devices : 4
  Total Devices : 3
Preferred Minor : 0

    Update Time : Fri Aug 28 05:59:02 2009
          State : clean
 Active Devices : 2
Working Devices : 3
 Failed Devices : 1
  Spare Devices : 1
       Checksum : a2d84d22 - correct
         Events : 299596

         Layout : left-symmetric
     Chunk Size : 64K

      Number   Major   Minor   RaidDevice State
this     2       8       17        2      active sync   /dev/sdb1

   0     0       0        0        0      removed
   1     1       8       33        1      active sync   /dev/sdc1
   2     2       8       17        2      active sync   /dev/sdb1
   3     3       0        0        3      faulty removed
   4     4       8       65        4      spare   /dev/sde1
/dev/sdc1:
          Magic : a92b4efc
        Version : 0.90.00
           UUID : f59e3fc1:b609490d:db2e1351:d0866abe
  Creation Time : Tue May  1 19:43:52 2007
     Raid Level : raid5
  Used Dev Size : 293033536 (279.46 GiB 300.07 GB)
     Array Size : 879100608 (838.38 GiB 900.20 GB)
   Raid Devices : 4
  Total Devices : 3
Preferred Minor : 0

    Update Time : Fri Aug 28 05:59:02 2009
          State : clean
 Active Devices : 2
Working Devices : 3
 Failed Devices : 1
  Spare Devices : 1
       Checksum : a2d84d30 - correct
         Events : 299596

         Layout : left-symmetric
     Chunk Size : 64K

      Number   Major   Minor   RaidDevice State
this     1       8       33        1      active sync   /dev/sdc1

   0     0       0        0        0      removed
   1     1       8       33        1      active sync   /dev/sdc1
   2     2       8       17        2      active sync   /dev/sdb1
   3     3       0        0        3      faulty removed
   4     4       8       65        4      spare   /dev/sde1
/dev/sdd1:
          Magic : a92b4efc
        Version : 0.90.00
           UUID : f59e3fc1:b609490d:db2e1351:d0866abe
  Creation Time : Tue May  1 19:43:52 2007
     Raid Level : raid5
  Used Dev Size : 293033536 (279.46 GiB 300.07 GB)
     Array Size : 879100608 (838.38 GiB 900.20 GB)
   Raid Devices : 4
  Total Devices : 3
Preferred Minor : 0

    Update Time : Fri Aug 28 05:59:02 2009
          State : clean
 Active Devices : 2
Working Devices : 3
 Failed Devices : 1
  Spare Devices : 1
       Checksum : a2d84d7b - correct
         Events : 299596

         Layout : left-symmetric
     Chunk Size : 64K

      Number   Major   Minor   RaidDevice State
this     5       8       49       -1      spare   /dev/sdd1

   0     0       0        0        0      removed
   1     1       8       33        1      active sync   /dev/sdc1
   2     2       8       17        2      active sync   /dev/sdb1
   3     3       0        0        3      faulty removed
   4     4       8       65        4      spare   /dev/sde1
edge ~ #
All three report that they are clean and have the same UUID, checksum and event count, so I'm not sure why mdadm is trying to use sdd as a spare when it should be able to build a working array out of the three. I suspect this might be because I did not properly remove the IDE drive (I'm assuming it's the removed RaidDevice 0) from the array before I moved it.

I'm not very experienced with mdadm, so I'm very hesitant to try any --force or --assume-clean options. Is there any other way to tell it that the sdd drive shouldn't be a spare?
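For anyone doing the same superblock comparison later: the event-counter check can be scripted from saved output. A minimal sketch using a stand-in sample file (real input would come from something like `mdadm -E /dev/sd{b,c,d}1 > examine.txt`; the awk pattern is an assumption based on the v0.90 --examine layout shown above):

```shell
# Stand-in for output captured with:  mdadm -E /dev/sd{b,c,d}1 > examine.txt
# Only the lines the awk script cares about are reproduced here.
cat > examine.txt <<'EOF'
/dev/sdb1:
         Events : 299596
/dev/sdc1:
         Events : 299596
/dev/sdd1:
         Events : 299596
EOF

# Remember the most recent device header, then print it next to each
# Events line, giving one "device events" pair per member.
awk '/^\/dev\// {dev=$1} /Events/ {print dev, $3}' examine.txt
# -> /dev/sdb1: 299596  (and likewise for sdc1, sdd1)
```

Members whose Events counts match are generally safe to assemble together; one that lags far behind is stale.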
 
Old 08-27-2009, 08:55 PM   #2
chrism01
LQ Guru
 
Registered: Aug 2004
Location: Sydney
Distribution: Centos 7.7 (?), Centos 8.1
Posts: 17,828

Rep: Reputation: 2559
Try

Code:
mdadm --fail /dev/md0 /dev/sdd1
mdadm --remove /dev/md0 /dev/sdd1
mdadm --add /dev/md0 /dev/sdd1
 
Old 08-27-2009, 09:35 PM   #3
wingnut64
Member
 
Registered: Sep 2004
Distribution: AIX, RHEL, Ubuntu
Posts: 51

Original Poster
Rep: Reputation: 23
Code:
edge ~ # mdadm --fail /dev/md0 /dev/sdd1
mdadm: set device faulty failed for /dev/sdd1:  No such device
I don't understand how it can't see the device when mdadm thinks it's already in the array as a spare, and I seem to be able to access it fine.

Code:
edge ~ # mdadm --detail /dev/md0
/dev/md0:
        Version : 0.90
  Creation Time : Tue May  1 19:43:52 2007
     Raid Level : raid5
  Used Dev Size : 293033536 (279.46 GiB 300.07 GB)
   Raid Devices : 4
  Total Devices : 3
Preferred Minor : 0
    Persistence : Superblock is persistent

    Update Time : Fri Aug 28 05:59:02 2009
          State : active, degraded, Not Started
 Active Devices : 2
Working Devices : 3
 Failed Devices : 0
  Spare Devices : 1

         Layout : left-symmetric
     Chunk Size : 64K

           UUID : f59e3fc1:b609490d:db2e1351:d0866abe
         Events : 0.299596

    Number   Major   Minor   RaidDevice State
       0       0        0        0      removed
       1       8       33        1      active sync   /dev/sdc1
       2       8       17        2      active sync   /dev/sdb1
       3       0        0        3      removed

       5       8       49        -      spare   /dev/sdd1
 
Old 08-27-2009, 11:48 PM   #4
chrism01
LQ Guru
 
Registered: Aug 2004
Location: Sydney
Distribution: Centos 7.7 (?), Centos 8.1
Posts: 17,828

Rep: Reputation: 2559
Maybe because it's not active. You can still try the remove option.
 
Old 08-28-2009, 07:28 PM   #5
wingnut64
Member
 
Registered: Sep 2004
Distribution: AIX, RHEL, Ubuntu
Posts: 51

Original Poster
Rep: Reputation: 23
I'm able to remove it, but whenever I --add or --re-add it sdd still shows up as a spare.
 
Old 08-31-2009, 01:55 AM   #6
chrism01
LQ Guru
 
Registered: Aug 2004
Location: Sydney
Distribution: Centos 7.7 (?), Centos 8.1
Posts: 17,828

Rep: Reputation: 2559
It's possible that this message
Quote:
mdadm: /dev/md0 assembled from 2 drives and 1 spare - not enough to start the array.
indicates that you need to edit the /etc/mdadm.conf file (or equivalent) to force it to realise that all disks should be used and that no spare is expected.
Then try the --add or --assemble again.
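A sketch of what such an entry might look like; the spares=0 keyword and exact syntax are my assumption (adapt the UUID and device list to your system):

```text
DEVICE /dev/sd[bcd]1
ARRAY /dev/md0 level=raid5 num-devices=4 spares=0 uuid=f59e3fc1:b609490d:db2e1351:d0866abe
```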
 
Old 08-31-2009, 07:40 PM   #7
wingnut64
Member
 
Registered: Sep 2004
Distribution: AIX, RHEL, Ubuntu
Posts: 51

Original Poster
Rep: Reputation: 23
Code:
mdadm.conf:
DEVICE /dev/sd[bcd]1
ARRAY /dev/md0 level=raid5 num-devices=4 uuid=f59e3fc1:b609490d:db2e1351:d0866abe
Array still fails with 'mdadm: /dev/md0 assembled from 2 drives and 1 spare - not enough to start the array.'

I noticed that only sdd sees itself as a spare, the other 2 working devices see it as faulty removed:

Code:
      Number   Major   Minor   RaidDevice State
this     2       8       17        2      active sync   /dev/sdb1

   0     0       0        0        0      removed
   1     1       8       33        1      active sync   /dev/sdc1
   2     2       8       17        2      active sync   /dev/sdb1
   3     3       0        0        3      faulty removed
--
      Number   Major   Minor   RaidDevice State
this     1       8       33        1      active sync   /dev/sdc1

   0     0       0        0        0      removed
   1     1       8       33        1      active sync   /dev/sdc1
   2     2       8       17        2      active sync   /dev/sdb1
   3     3       0        0        3      faulty removed
--
      Number   Major   Minor   RaidDevice State
this     5       8       49       -1      spare   /dev/sdd1

   0     0       0        0        0      removed
   1     1       8       33        1      active sync   /dev/sdc1
   2     2       8       17        2      active sync   /dev/sdb1
   3     3       0        0        3      faulty removed
   4     4       8       65        4      spare   /dev/sde1
Not only that, but sdd also sees sde as an additional spare...

At this point I'm thinking about just wiping the superblocks and having mdadm re-create them.

Code:
mdadm --create /dev/md0 --level=5 --raid-devices=4 --layout=left-symmetric --chunk=64 --assume-clean /dev/sd[bcd]1 missing
Am I correct in thinking that this will rewrite the superblocks without touching the data itself, and cause mdadm to make a new, degraded array that I can mount and pull data off? I'm just worried that if the superblocks don't match, the actual data might not either.
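Before running that, one low-risk precaution (a sketch of my own, not something anyone here has suggested; the loop and file names are illustrative) is to save each member's current --examine report, so the original geometry can be compared against whatever --create produces:

```shell
# Save each member's current superblock report before any destructive
# --create, so device order / chunk size / layout can be double-checked.
# `|| true` keeps the loop going where a device is absent or mdadm fails,
# and the redirect still creates the (possibly empty) report file.
for d in /dev/sdb1 /dev/sdc1 /dev/sdd1 /dev/sde1; do
    mdadm -E "$d" > "examine-${d##*/}.txt" 2>/dev/null || true
done
ls examine-*.txt   # one report file per member
```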
 
1 members found this post helpful.
Old 04-12-2013, 02:45 PM   #8
benb
LQ Newbie
 
Registered: Apr 2013
Posts: 1

Rep: Reputation: Disabled
Any solution?

Hey wingnut64,

I have the exact same problem as you do: one drive unnecessarily dropped by the software, another drive failed during the resync. (However, my problem came from just upgrading the OS, not even moving the array.) Like you, I have valuable data on it, and I believe the disks still contain it; mdadm just doesn't let me get to it.

Did you solve it? How?

I'm also scared about the risk of re-creating the array. I read other people's forum posts, and they lost all their data by trying this.

I found one happy outcome on the Gentoo Forums: the poster purchased a new disk, then re-created the array. Apparently he was lucky and got the drive order right. I don't want to take that chance. I don't play the lottery; that's exactly why I run RAID5.

Last edited by benb; 04-12-2013 at 02:53 PM. Reason: Found another thread about this problem
 
Old 04-22-2013, 10:36 PM   #9
wingnut64
Member
 
Registered: Sep 2004
Distribution: AIX, RHEL, Ubuntu
Posts: 51

Original Poster
Rep: Reputation: 23
Wow, it's been 4 years and over 9,000 views...

Unfortunately no, I was not able to rebuild the array. It's been so long that I forget exactly what I did, but I didn't get anything off it. Fortunately I was able to recover a decent chunk of the original data from backups and other systems, and most of the rest was replaceable or re-creatable. This experience actually scared me away from mdadm; my NAS setup now uses ZFS on Solaris.

If you have extra disks or free space somewhere, you could try dd'ing the individual raw disks in your array to other physical disks or files and then loop-mounting them (http://www.linuxquestions.org/questi...images-715343/ ; disclaimer: I've not tried this). Then you could try potentially dangerous commands on a copy of the RAID, possibly on another system.
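The copy-then-experiment idea can be sketched like this; a tiny file stands in for a real partition, and the losetup/mdadm lines at the end are illustrative only (they need root and real images):

```shell
# A toy file stands in for a disk; on a real system the source would be
# e.g. /dev/sdb1 and the image needs as much free space as the partition.
dd if=/dev/zero of=member.img bs=1M count=1 2>/dev/null   # stand-in "disk"
dd if=member.img of=member-copy.img bs=1M 2>/dev/null     # the copy step
cmp member.img member-copy.img && echo "copy verified"

# With real copies you could then (as root) attach them as loop devices
# and try the risky mdadm commands on those instead of the originals:
#   losetup -f --show member-copy.img    # prints the /dev/loopN it used
#   mdadm --assemble /dev/md9 /dev/loop0 /dev/loop1 /dev/loop2
```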

For the benefit of those who might stumble on this in the future, some miscellaneous thoughts on software RAID:
  • Play around with your chosen solution in a VM or test system. Go through the process of replacing disks and such BEFORE you put data you care about on it.
  • Don't cheap out on the disks (or disk adapters). Some of the cheaper 1+ TB disks had issues with the controller hanging for seconds or minutes, which will cause your RAID card or software to mark the disk as failed. Some models note that they're designed for RAID; that's not always just a marketing gimmick, so do your homework. Remember, you're using RAID because you want reliability and/or performance.
  • A degraded array is a single read error away from disaster.
  • Don't ever do anything that will intentionally degrade an array (unless you have a backup...)
  • Remember, RAID is not a backup. It's pretty good at allowing inexperience, typos or bad luck with hardware to destroy large volumes of your data.
 
Old 12-11-2013, 08:19 PM   #10
dominix
LQ Newbie
 
Registered: Sep 2006
Location: Moorea, French Polynesia
Distribution: Debian, Centos, Ubuntu, Fedora ... and more
Posts: 1

Rep: Reputation: 0
I have had the exact same problem.

On a NAS (Linux-powered) with a 4-disk RAID5, one disk (sdc) had failed and another (sdb) was giving SMART alerts. We replaced sdc, and while the array was rebuilding a technician removed the sdb disk by mistake. That crashed the filesystem and unmounted the volume. He put the disk back immediately, but the damage was done.

If I "add" the removed disk again, it shows up as a spare...

/proc/mdstat showed md2 as sda[3] sdb[3]S sdd[3] with one missing, so not enough disks to run.

I solved it by re-creating the RAID.

Code:
mdadm --stop /dev/md2
mdadm --create /dev/md2 --level=5 --raid-devices=4 --chunk=64 --name=RackStation:2 /dev/sda3 /dev/sdb3 missing /dev/sdd3 --assume-clean
the --assume-clean is the magic touch that converts a disk seen as a spare back into a "normal" disk.
Later on, I added the replacement disk and it rebuilt correctly:

Code:
mdadm --add /dev/md2 /dev/sdc3
cat /proc/mdstat
md2 : active raid5 sdc3[4] sdd3[3] sdb3[1] sda3[0]
      2916120768 blocks super 1.2 level 5, 64k chunk, algorithm 2 [4/3] [UU_U]
      [>....................]  recovery =  0.0% (143360/972040256) finish=338.9min speed=47786K/sec

Hope this helps.

Last edited by dominix; 12-11-2013 at 08:22 PM.
 
Old 11-25-2014, 06:20 AM   #11
AngelG
LQ Newbie
 
Registered: Nov 2014
Posts: 1

Rep: Reputation: Disabled

Quote:
Originally Posted by dominix View Post
I have had the exact same problem.

[...]

Hope this helps.
Thank you very much. My raid5 is rebuilding now.

...

No data after rebuilding.

Last edited by AngelG; 11-25-2014 at 03:41 PM.
 
Old 08-26-2015, 10:50 PM   #12
YzRacer
LQ Newbie
 
Registered: Aug 2015
Posts: 1

Rep: Reputation: Disabled
Many Thanks!

Quote:
Originally Posted by dominix View Post
I have had the exact same problem.

[...]

Hope this helps.
I signed up just to say this was the post that saved me! Thanks for your input! I was running RAID-5 on my Synology NAS and wanted to increase the size, so I swapped one of my drives for a larger one. During the automatic rebuild the NAS decided another disk had errors and the volume failed. I searched everywhere and tried a filesystem check, with no luck. Eventually I found this thread and went for it: I used the following command after inserting my original drive back into the NAS, and now I'm back online. I definitely recommend running a filesystem check before and after any major change, and, as many others suggest, make sure you have a backup before making any change that could affect your RAID!

Code:
mdadm --stop /dev/md2
mdadm --create /dev/md2 --level=5 --raid-devices=4 --chunk=64 --name=RackStation:2 /dev/sda3 /dev/sdb3 /dev/sdc3 /dev/sdd3 --assume-clean
 
  

