LinuxQuestions.org > Linux - Server > mdadm - re-added disk treated as spare
(https://www.linuxquestions.org/questions/linux-server-73/mdadm-re-added-disk-treated-as-spare-750739/)

wingnut64 08-27-2009 06:36 PM

mdadm - re-added disk treated as spare
 
I recently moved a 4-disk RAID-5 to a new machine. The array had 3 SATA disks and 1 IDE, and as I was planning to replace the IDE disk with a SATA one, I just moved the 3 SATA disks and added the new disk later. The array assembled OK and began to rebuild. Unfortunately, during the rebuild one of the original drives had an unrecoverable read error and was kicked out of the array. I checked it with smartctl, and as it showed only 2 errors within the past month, I just removed it and re-added it to the array. However, mdadm added it to the array as a spare and did not rebuild. I stopped the array and attempted to reassemble it, but was not successful:

Code:

edge ~ # mdadm --assemble /dev/md0 /dev/sd{b,c,d,e}1
mdadm: WARNING /dev/sdd1 and /dev/sde1 appear to have very similar superblocks.
      If they are really different, please --zero the superblock on one
      If they are the same or overlap, please remove one from the list.

sdb and sdc are the original, working devices
sdd was the original device that failed
sde is the new device that was not finished building when the array went down

As I figured sde was certainly not clean, I zeroed its superblock and attempted to start the array with sdb, sdc and sdd to pull off the most important data before I tried anything else.
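(The command for zeroing a member's superblock is mdadm's --zero-superblock mode, i.e. something along these lines:)

Code:

mdadm --zero-superblock /dev/sde1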

Code:

edge ~ # mdadm --assemble /dev/md0 /dev/sd{b,c,d}1
mdadm: /dev/md0 assembled from 2 drives and 1 spare - not enough to start the array.

mdadm seems to think that sdd is dirty, however:

Code:

edge ~ # mdadm -E /dev/sd{b,c,d}1
/dev/sdb1:
          Magic : a92b4efc
        Version : 0.90.00
          UUID : f59e3fc1:b609490d:db2e1351:d0866abe
  Creation Time : Tue May  1 19:43:52 2007
    Raid Level : raid5
  Used Dev Size : 293033536 (279.46 GiB 300.07 GB)
    Array Size : 879100608 (838.38 GiB 900.20 GB)
  Raid Devices : 4
  Total Devices : 3
Preferred Minor : 0

    Update Time : Fri Aug 28 05:59:02 2009
          State : clean
 Active Devices : 2
Working Devices : 3
 Failed Devices : 1
  Spare Devices : 1
      Checksum : a2d84d22 - correct
        Events : 299596

        Layout : left-symmetric
    Chunk Size : 64K

      Number  Major  Minor  RaidDevice State
this    2      8      17        2      active sync  /dev/sdb1

  0    0      0        0        0      removed
  1    1      8      33        1      active sync  /dev/sdc1
  2    2      8      17        2      active sync  /dev/sdb1
  3    3      0        0        3      faulty removed
  4    4      8      65        4      spare  /dev/sde1
/dev/sdc1:
          Magic : a92b4efc
        Version : 0.90.00
          UUID : f59e3fc1:b609490d:db2e1351:d0866abe
  Creation Time : Tue May  1 19:43:52 2007
    Raid Level : raid5
  Used Dev Size : 293033536 (279.46 GiB 300.07 GB)
    Array Size : 879100608 (838.38 GiB 900.20 GB)
  Raid Devices : 4
  Total Devices : 3
Preferred Minor : 0

    Update Time : Fri Aug 28 05:59:02 2009
          State : clean
 Active Devices : 2
Working Devices : 3
 Failed Devices : 1
  Spare Devices : 1
      Checksum : a2d84d30 - correct
        Events : 299596

        Layout : left-symmetric
    Chunk Size : 64K

      Number  Major  Minor  RaidDevice State
this    1      8      33        1      active sync  /dev/sdc1

  0    0      0        0        0      removed
  1    1      8      33        1      active sync  /dev/sdc1
  2    2      8      17        2      active sync  /dev/sdb1
  3    3      0        0        3      faulty removed
  4    4      8      65        4      spare  /dev/sde1
/dev/sdd1:
          Magic : a92b4efc
        Version : 0.90.00
          UUID : f59e3fc1:b609490d:db2e1351:d0866abe
  Creation Time : Tue May  1 19:43:52 2007
    Raid Level : raid5
  Used Dev Size : 293033536 (279.46 GiB 300.07 GB)
    Array Size : 879100608 (838.38 GiB 900.20 GB)
  Raid Devices : 4
  Total Devices : 3
Preferred Minor : 0

    Update Time : Fri Aug 28 05:59:02 2009
          State : clean
 Active Devices : 2
Working Devices : 3
 Failed Devices : 1
  Spare Devices : 1
      Checksum : a2d84d7b - correct
        Events : 299596

        Layout : left-symmetric
    Chunk Size : 64K

      Number  Major  Minor  RaidDevice State
this    5      8      49      -1      spare  /dev/sdd1

  0    0      0        0        0      removed
  1    1      8      33        1      active sync  /dev/sdc1
  2    2      8      17        2      active sync  /dev/sdb1
  3    3      0        0        3      faulty removed
  4    4      8      65        4      spare  /dev/sde1
edge ~ #

All 3 report they are clean, have the same UUID and event count, and have checksums that verify, so I am not sure why it is attempting to use sdd as a spare when it should be able to build a working array out of the three. I suspect this might be because I did not properly remove the IDE drive (I'm assuming it's the removed RaidDevice 0) from the array before I moved it.

I'm not very experienced with mdadm and so am very hesitant to try any --force or --assume-clean options. Is there any other way to tell it that sdd shouldn't be a spare?

chrism01 08-27-2009 07:55 PM

Try

mdadm --fail /dev/md0 /dev/sdd1
mdadm --remove /dev/md0 /dev/sdd1
mdadm --add /dev/md0 /dev/sdd1
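
If it still comes back as a spare after that, another thing often tried in this situation is a forced assemble from the surviving members. Whether it helps once the drive's own superblock already records it as a spare is uncertain, but it is far less drastic than re-creating the array; a sketch, assuming sdb1, sdc1 and sdd1 are the three remaining members:

mdadm --stop /dev/md0
mdadm --assemble --force /dev/md0 /dev/sdb1 /dev/sdc1 /dev/sdd1

--force tells mdadm to ignore modest event-count differences and mark members clean, so check the filesystem before writing to it.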

wingnut64 08-27-2009 08:35 PM

Code:

edge ~ # mdadm --fail /dev/md0 /dev/sdd1
mdadm: set device faulty failed for /dev/sdd1:  No such device

I don't understand why it can't see the device when mdadm thinks it's already in the array as a spare, and I seem to be able to access it fine.

Code:

edge ~ # mdadm --detail /dev/md0
/dev/md0:
        Version : 0.90
  Creation Time : Tue May  1 19:43:52 2007
    Raid Level : raid5
  Used Dev Size : 293033536 (279.46 GiB 300.07 GB)
  Raid Devices : 4
  Total Devices : 3
Preferred Minor : 0
    Persistence : Superblock is persistent

    Update Time : Fri Aug 28 05:59:02 2009
          State : active, degraded, Not Started
 Active Devices : 2
Working Devices : 3
 Failed Devices : 0
  Spare Devices : 1

        Layout : left-symmetric
    Chunk Size : 64K

          UUID : f59e3fc1:b609490d:db2e1351:d0866abe
        Events : 0.299596

    Number  Major  Minor  RaidDevice State
      0      0        0        0      removed
      1      8      33        1      active sync  /dev/sdc1
      2      8      17        2      active sync  /dev/sdb1
      3      0        0        3      removed

      5      8      49        -      spare  /dev/sdd1


chrism01 08-27-2009 10:48 PM

Maybe because it's not active. You can still try the remove option.

wingnut64 08-28-2009 06:28 PM

I'm able to remove it, but whenever I --add or --re-add it, sdd still shows up as a spare.

chrism01 08-31-2009 12:55 AM

It's possible that this message
Quote:

mdadm: /dev/md0 assembled from 2 drives and 1 spare - not enough to start the array.
indicates that you need to edit the /etc/mdadm.conf file (or equivalent) to force it to realise that all disks should be used and that no spare is listed.
Then try the --add or --assemble again.

wingnut64 08-31-2009 06:40 PM

Code:

mdadm.conf:
DEVICE /dev/sd[bcd]1
ARRAY /dev/md0 level=raid5 num-devices=4 uuid=f59e3fc1:b609490d:db2e1351:d0866abe

Array still fails with 'mdadm: /dev/md0 assembled from 2 drives and 1 spare - not enough to start the array.'

I noticed that only sdd sees itself as a spare; the other 2 working devices see it as faulty removed:

Code:

      Number  Major  Minor  RaidDevice State
this    2      8      17        2      active sync  /dev/sdb1

  0    0      0        0        0      removed
  1    1      8      33        1      active sync  /dev/sdc1
  2    2      8      17        2      active sync  /dev/sdb1
  3    3      0        0        3      faulty removed
--
      Number  Major  Minor  RaidDevice State
this    1      8      33        1      active sync  /dev/sdc1

  0    0      0        0        0      removed
  1    1      8      33        1      active sync  /dev/sdc1
  2    2      8      17        2      active sync  /dev/sdb1
  3    3      0        0        3      faulty removed
--
      Number  Major  Minor  RaidDevice State
this    5      8      49      -1      spare  /dev/sdd1

  0    0      0        0        0      removed
  1    1      8      33        1      active sync  /dev/sdc1
  2    2      8      17        2      active sync  /dev/sdb1
  3    3      0        0        3      faulty removed
  4    4      8      65        4      spare  /dev/sde1

Not only that, but sdd also sees sde as an additional spare...

At this point I'm thinking about just wiping the superblocks and having mdadm re-create them.

Code:

mdadm --create /dev/md0 --level=5 --raid-devices=4 --layout=left-symmetric --chunk=64 --assume-clean /dev/sd[bcd]1 missing
Am I correct in thinking that this will wipe the superblocks without touching the data itself, and cause mdadm to make a new, degraded array that I can mount and pull data off? I'm just worried that if the superblocks don't match, the actual data might not either.
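
One precaution before any --create attempt is to record the existing metadata, so the original slot order, chunk size and layout can be reproduced exactly; a minimal sketch (the output path is just an example):

Code:

mdadm --examine /dev/sd[bcd]1 > /root/md0-examine.txt
mdadm --detail /dev/md0 >> /root/md0-examine.txt
cat /proc/mdstat >> /root/md0-examine.txt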

benb 04-12-2013 01:45 PM

Any solution?
 
Hey wingnut64,

I have the exact same problem as you do - one drive unnecessarily dropped by the software, another drive failed during the resync. (However, my problem came from just upgrading the OS, not even moving the array.) Like you, I have valuable data on it, and I believe the disks still contain it; it's just that mdadm doesn't let me get to it.

Did you solve it? How?

I'm also scared about the risk of re-creating the array. I've read other people's forum posts where they lost all their data by trying this.

I found one happy outcome on the Gentoo Forums: he purchased a new disk, then re-created the array. Apparently he was lucky and got the right order. I don't want to take that chance. I don't play the lottery; this is exactly why I run RAID 5.
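
For what it's worth, the original slot order doesn't have to be guessed: each surviving 0.90 superblock records which slot its device occupied (the "this" line in mdadm --examine), so something like the following shows which device belongs where before any --create (device names are assumed; a member already demoted to a spare loses its role, but the remaining ones pin down the order):

Code:

mdadm --examine /dev/sd[abcd]1 | grep -E '^/dev/|^this'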

wingnut64 04-22-2013 09:36 PM

Wow, it's been 4 years and over 9,000 views...

Unfortunately no, I was not able to rebuild the array. It's been so long that I forget what I did, but I didn't get anything off it. Fortunately I was able to recover a decent chunk of the original data from backups and other systems. Most of the rest was replaceable or re-creatable. This actually kind of scared me away from mdadm, and my NAS setup is now using ZFS on Solaris :)

If you have extra disks or free space somewhere, you could try dd'ing the individual raw disks in your array to other physical disks or files and then loop-mounting them (http://www.linuxquestions.org/questi...images-715343/; disclaimer, I've not tried this). Then you could try potentially dangerous commands on a copy of the RAID, possibly on another system.
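
Roughly, that would look something like the following (untested; the paths, target filesystem and md device number are only examples, and the copies need as much free space as the members themselves):

Code:

# image each member partition; conv=noerror,sync keeps going past read errors
dd if=/dev/sdb1 of=/mnt/spare/sdb1.img bs=1M conv=noerror,sync
dd if=/dev/sdc1 of=/mnt/spare/sdc1.img bs=1M conv=noerror,sync
dd if=/dev/sdd1 of=/mnt/spare/sdd1.img bs=1M conv=noerror,sync

# expose the copies as block devices and experiment on those instead
# (losetup -f --show prints the loop device it picked; loop0-2 assumed below)
losetup -f --show /mnt/spare/sdb1.img
losetup -f --show /mnt/spare/sdc1.img
losetup -f --show /mnt/spare/sdd1.img
mdadm --assemble /dev/md9 /dev/loop0 /dev/loop1 /dev/loop2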

For the benefit of those who might stumble on this in the future, some miscellaneous thoughts on software RAID:
  • Play around with your chosen solution in a VM or test system. Go through the process of replacing disks and such BEFORE you put data you care about on it (see the sketch after this list).
  • Don't cheap out on the disks (or disk adapters). Some of the cheaper 1+ TB disks have issues where the drive hangs for seconds or minutes trying to recover from errors, which will cause your RAID card or software to mark the disk as failed. Some models note that they're designed for RAID; it's not always just a marketing gimmick, so do your homework. Remember, you're using RAID because you want reliability and/or performance.
  • A degraded array is a single read error away from disaster.
  • Don't ever do anything that will intentionally degrade an array (unless you have a backup...)
  • Remember, RAID is not a backup. It's pretty good at allowing inexperience, typos or bad luck with hardware to destroy large volumes of your data.
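
On the first point, a cheap way to practise without spare hardware is a throwaway array built on loopback files; a sketch (sizes, paths and the md device number are arbitrary):

Code:

# four small files as stand-in disks
for i in 0 1 2 3; do dd if=/dev/zero of=/tmp/disk$i.img bs=1M count=256; done
for i in 0 1 2 3; do losetup /dev/loop$i /tmp/disk$i.img; done    # assumes loop0-3 are free

# build a practice RAID-5 and rehearse failing/removing/re-adding a member
mdadm --create /dev/md9 --level=5 --raid-devices=4 /dev/loop0 /dev/loop1 /dev/loop2 /dev/loop3
mdadm --fail /dev/md9 /dev/loop3
mdadm --remove /dev/md9 /dev/loop3
mdadm --add /dev/md9 /dev/loop3
cat /proc/mdstat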

dominix 12-11-2013 07:19 PM

I had the exact same problem.

On a NAS (Linux powered) with a 4-disk RAID-5, I had one disk fail (sdc) and another giving SMART alerts (sdb)...
We replaced sdc, and a technician mistakenly pulled the sdb disk while the array was reconstructing. That crashed the filesystem and unmounted the volume. He put it back immediately, but the damage was done.

If I "added" the removed disk again, it showed up as a spare...

md2 then showed sda3, sdb3 (as a spare) and sdd3, with one member missing, so not enough disks to run.

I solved it by re-creating the RAID.

Code:

mdadm --stop /dev/md2
mdadm --create /dev/md2 --level=5 --raid-devices=4 --chunk=64 --name=RackStation:2 /dev/sda3 /dev/sdb3 missing /dev/sdd3 --assume-clean

--assume-clean is the magic touch that turns a disk seen as a spare back into a "normal" member.
Later on, I added the replacement disk and it rebuilt correctly:

Code:

mdadm --add /dev/md2 /dev/sdc3
cat /proc/mdstat
md2 : active raid5 sdc3[4] sdd3[3] sdb3[1] sda3[0]
      2916120768 blocks super 1.2 level 5, 64k chunk, algorithm 2 [4/3] [UU_U]
      [>....................]  recovery =  0.0% (143360/972040256) finish=338.9min speed=47786K/sec


Hope this helps.

AngelG 11-25-2014 05:20 AM

Quote:

Originally Posted by dominix (Post 5079206)
I had the exact same problem.

[...]

Hope this helps.

Thank you very much. My raid5 is rebuilding now.

...

:( No data after rebuilding.

YzRacer 08-26-2015 09:50 PM

Many Thanks!
 
Quote:

Originally Posted by dominix (Post 5079206)
I had the exact same problem.

[...]

Hope this helps.

I signed up just to say this was the post that saved me! Thanks for your input! I was running RAID-5 on my Synology NAS and wanted to increase the size, so I swapped one of my drives for a larger one. During the automatic rebuild the NAS decided another disk had some errors and the volume failed. I searched everywhere and tried a file system check with no luck. Eventually I found this thread and went for it: I used the following command after inserting my original drive back into the NAS, and now I am back online. I definitely recommend running a file system check before and after any major change, and, as many others suggest, make sure you have a backup before doing anything that could affect your RAID!

Code:

mdadm --stop /dev/md2
mdadm --create /dev/md2 --level=5 --raid-devices=4 --chunk=64 --name=RackStation:2 /dev/sda3 /dev/sdb3 /dev/sdc3 /dev/sdd3 --assume-clean
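
A non-destructive form of that check, for before you trust the array enough to write to it again (assuming the filesystem sits directly on /dev/md2 and is ext4, as on many Synology volumes):

Code:

fsck.ext4 -n /dev/md2    # -n: report problems only, change nothing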


drrrz 07-13-2022 09:49 AM

Like YzRacer in 2015, I only signed up to say that this post saved my Synology RAID. The "--assume-clean" option did the trick for me.

My Synology RAID was in "Clean, FAILED" state, and re-building the array using the above command worked like a charm. Now fsck is running and so far no bad blocks.

Thanks!

dominix 08-09-2022 04:02 PM

\o/

you're welcome.

