Linux - Server: This forum is for the discussion of Linux software used in a server-related context.
08-27-2009, 06:36 PM | #1
Member | Registered: Sep 2004 | Distribution: AIX, RHEL, Ubuntu | Posts: 51
mdadm - re-added disk treated as spare
I recently moved a 4-disk RAID-5 to a new machine. The array had 3 SATA disks and 1 IDE, and as I was planning to replace the IDE disk with a SATA one, I just moved the 3 SATA disks and added the new disk later. The array assembled OK and began to rebuild. Unfortunately, during the rebuild one of the original drives had an unrecoverable read error and was kicked out of the array. I checked it with smartctl, and as it showed only 2 errors within the past month, I just removed it and re-added it to the array. However, mdadm added it to the array as a spare and did not rebuild. I stopped the array and attempted to reassemble, but was not successful:
Code:
edge ~ # mdadm --assemble /dev/md0 /dev/sd{b,c,d,e}1
mdadm: WARNING /dev/sdd1 and /dev/sde1 appear to have very similar superblocks.
If they are really different, please --zero the superblock on one
If they are the same or overlap, please remove one from the list.
- sdb and sdc are the original, working devices
- sdd is the original device that failed
- sde is the new device that had not finished rebuilding when the array went down
Since I figured sde was certainly not clean, I zeroed its superblock and attempted to start the array with b, c, and d so I could pull off the most important data before trying anything else.
Code:
edge ~ # mdadm --assemble /dev/md0 /dev/sd{b,c,d}1
mdadm: /dev/md0 assembled from 2 drives and 1 spare - not enough to start the array.
mdadm seems to think that sdd is dirty, however:
Code:
edge ~ # mdadm -E /dev/sd{b,c,d}1
/dev/sdb1:
Magic : a92b4efc
Version : 0.90.00
UUID : f59e3fc1:b609490d:db2e1351:d0866abe
Creation Time : Tue May 1 19:43:52 2007
Raid Level : raid5
Used Dev Size : 293033536 (279.46 GiB 300.07 GB)
Array Size : 879100608 (838.38 GiB 900.20 GB)
Raid Devices : 4
Total Devices : 3
Preferred Minor : 0
Update Time : Fri Aug 28 05:59:02 2009
State : clean
Active Devices : 2
Working Devices : 3
Failed Devices : 1
Spare Devices : 1
Checksum : a2d84d22 - correct
Events : 299596
Layout : left-symmetric
Chunk Size : 64K
Number Major Minor RaidDevice State
this 2 8 17 2 active sync /dev/sdb1
0 0 0 0 0 removed
1 1 8 33 1 active sync /dev/sdc1
2 2 8 17 2 active sync /dev/sdb1
3 3 0 0 3 faulty removed
4 4 8 65 4 spare /dev/sde1
/dev/sdc1:
Magic : a92b4efc
Version : 0.90.00
UUID : f59e3fc1:b609490d:db2e1351:d0866abe
Creation Time : Tue May 1 19:43:52 2007
Raid Level : raid5
Used Dev Size : 293033536 (279.46 GiB 300.07 GB)
Array Size : 879100608 (838.38 GiB 900.20 GB)
Raid Devices : 4
Total Devices : 3
Preferred Minor : 0
Update Time : Fri Aug 28 05:59:02 2009
State : clean
Active Devices : 2
Working Devices : 3
Failed Devices : 1
Spare Devices : 1
Checksum : a2d84d30 - correct
Events : 299596
Layout : left-symmetric
Chunk Size : 64K
Number Major Minor RaidDevice State
this 1 8 33 1 active sync /dev/sdc1
0 0 0 0 0 removed
1 1 8 33 1 active sync /dev/sdc1
2 2 8 17 2 active sync /dev/sdb1
3 3 0 0 3 faulty removed
4 4 8 65 4 spare /dev/sde1
/dev/sdd1:
Magic : a92b4efc
Version : 0.90.00
UUID : f59e3fc1:b609490d:db2e1351:d0866abe
Creation Time : Tue May 1 19:43:52 2007
Raid Level : raid5
Used Dev Size : 293033536 (279.46 GiB 300.07 GB)
Array Size : 879100608 (838.38 GiB 900.20 GB)
Raid Devices : 4
Total Devices : 3
Preferred Minor : 0
Update Time : Fri Aug 28 05:59:02 2009
State : clean
Active Devices : 2
Working Devices : 3
Failed Devices : 1
Spare Devices : 1
Checksum : a2d84d7b - correct
Events : 299596
Layout : left-symmetric
Chunk Size : 64K
Number Major Minor RaidDevice State
this 5 8 49 -1 spare /dev/sdd1
0 0 0 0 0 removed
1 1 8 33 1 active sync /dev/sdc1
2 2 8 17 2 active sync /dev/sdb1
3 3 0 0 3 faulty removed
4 4 8 65 4 spare /dev/sde1
edge ~ #
All 3 report that they are clean and have the same UUID and event count (and correct checksums), so I am not sure why mdadm is treating sdd as a spare when it should be able to build a working array out of the three. I suspect this might be because I did not properly remove the IDE drive (I'm assuming it's the removed RaidDevice 0) from the array before I moved it.
I'm not very experienced with mdadm, so I am very hesitant to try any --force or --assume-clean options. Is there any other way to tell mdadm that sdd shouldn't be a spare?
08-27-2009, 07:55 PM | #2
LQ Guru | Registered: Aug 2004 | Location: Sydney | Distribution: Rocky 9.2 | Posts: 18,391
Try:
Code:
mdadm --fail /dev/md0 /dev/sdd1
mdadm --remove /dev/md0 /dev/sdd1
mdadm --add /dev/md0 /dev/sdd1
08-27-2009, 08:35 PM | #3
Member (Original Poster) | Registered: Sep 2004 | Distribution: AIX, RHEL, Ubuntu | Posts: 51
Code:
edge ~ # mdadm --fail /dev/md0 /dev/sdd1
mdadm: set device faulty failed for /dev/sdd1: No such device
I don't understand how it can't see the device when mdadm thinks it's already in the array as a spare, and I seem to be able to access it fine.
Code:
edge ~ # mdadm --detail /dev/md0
/dev/md0:
Version : 0.90
Creation Time : Tue May 1 19:43:52 2007
Raid Level : raid5
Used Dev Size : 293033536 (279.46 GiB 300.07 GB)
Raid Devices : 4
Total Devices : 3
Preferred Minor : 0
Persistence : Superblock is persistent
Update Time : Fri Aug 28 05:59:02 2009
State : active, degraded, Not Started
Active Devices : 2
Working Devices : 3
Failed Devices : 0
Spare Devices : 1
Layout : left-symmetric
Chunk Size : 64K
UUID : f59e3fc1:b609490d:db2e1351:d0866abe
Events : 0.299596
Number Major Minor RaidDevice State
0 0 0 0 removed
1 8 33 1 active sync /dev/sdc1
2 8 17 2 active sync /dev/sdb1
3 0 0 3 removed
5 8 49 - spare /dev/sdd1
08-27-2009, 10:48 PM | #4
LQ Guru | Registered: Aug 2004 | Location: Sydney | Distribution: Rocky 9.2 | Posts: 18,391
Maybe because it's not active. You can still try the remove option.
08-28-2009, 06:28 PM | #5
Member (Original Poster) | Registered: Sep 2004 | Distribution: AIX, RHEL, Ubuntu | Posts: 51
I'm able to remove it, but whenever I --add or --re-add it, sdd still shows up as a spare.
08-31-2009, 12:55 AM | #6
LQ Guru | Registered: Aug 2004 | Location: Sydney | Distribution: Rocky 9.2 | Posts: 18,391
It's possible that this message
Quote:
mdadm: /dev/md0 assembled from 2 drives and 1 spare - not enough to start the array.
indicates that you need to edit the /etc/mdadm.conf file (or equivalent) to force it to realise that all disks should be used and that no spare is expected.
Then try the --add or --assemble again.
08-31-2009, 06:40 PM | #7
Member (Original Poster) | Registered: Sep 2004 | Distribution: AIX, RHEL, Ubuntu | Posts: 51
Code:
mdadm.conf:
DEVICE /dev/sd[bcd]1
ARRAY /dev/md0 level=raid5 num-devices=4 uuid=f59e3fc1:b609490d:db2e1351:d0866abe
The array still fails with 'mdadm: /dev/md0 assembled from 2 drives and 1 spare - not enough to start the array.'
I noticed that only sdd sees itself as a spare; the other two working devices see it as faulty removed:
Code:
Number Major Minor RaidDevice State
this 2 8 17 2 active sync /dev/sdb1
0 0 0 0 0 removed
1 1 8 33 1 active sync /dev/sdc1
2 2 8 17 2 active sync /dev/sdb1
3 3 0 0 3 faulty removed
--
Number Major Minor RaidDevice State
this 1 8 33 1 active sync /dev/sdc1
0 0 0 0 0 removed
1 1 8 33 1 active sync /dev/sdc1
2 2 8 17 2 active sync /dev/sdb1
3 3 0 0 3 faulty removed
--
Number Major Minor RaidDevice State
this 5 8 49 -1 spare /dev/sdd1
0 0 0 0 0 removed
1 1 8 33 1 active sync /dev/sdc1
2 2 8 17 2 active sync /dev/sdb1
3 3 0 0 3 faulty removed
4 4 8 65 4 spare /dev/sde1
Not only that, but sdd also sees sde as an additional spare...
At this point I'm thinking about just wiping the superblocks and having mdadm re-create them.
Code:
mdadm --create /dev/md0 --level=5 --raid-devices=4 --layout=left-symmetric --chunk=64 --assume-clean /dev/sd[bcd]1 missing
Am I correct in thinking that this will wipe the superblocks without touching the data itself, and cause mdadm to make a new, degraded array that I can mount and pull data off? I'm just worried that if the superblocks don't match, the actual data might not either.
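If I do end up trying it, I assume the sane precautions would be to record each member's superblock first and then check the re-created array read-only before writing anything to it. A rough sketch (assuming an ext filesystem sits directly on /dev/md0; the output path and mount point are just placeholders):
Code:
# record each member's superblock (slot order, chunk size, layout) before re-creating
mdadm --examine /dev/sd[bcd]1 > /root/md0-examine.txt

# after the --create --assume-clean, inspect the result without writing to it
mdadm --detail /dev/md0
fsck -n /dev/md0              # read-only filesystem check
mount -o ro /dev/md0 /mnt     # mount read-only and look at the data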
1 member found this post helpful.
04-12-2013, 01:45 PM | #8
LQ Newbie | Registered: Apr 2013 | Posts: 1
Any solution?
Hey wingnut64,
I have the exact same problem as you do: one drive unnecessarily dropped by the software, and another drive failed during the resync. (However, my problem came from just upgrading the OS, not even moving the array.) Like you, I have valuable data on it, and I believe the disks still contain it; mdadm just won't let me get to it.
Did you solve it? How?
I'm also scared about the risk of re-creating the array. I have read other people's forum posts where they lost all their data by trying this.
I did find one happy outcome on the Gentoo Forums: the poster purchased a new disk, then re-created the array. Apparently he was lucky and got the right device order. I don't want to take that chance; I don't play the lottery, which is exactly why I run RAID5.
Last edited by benb; 04-12-2013 at 01:53 PM.
Reason: Found another thread about this problem
04-22-2013, 09:36 PM | #9
Member (Original Poster) | Registered: Sep 2004 | Distribution: AIX, RHEL, Ubuntu | Posts: 51
Wow, it's been 4 years and over 9,000 views...
Unfortunately no, I was not able to rebuild the array. It's been so long that I forget exactly what I did, but I didn't get anything off it. Fortunately I was able to recover a decent chunk of the original data from backups and other systems, and most of the rest was replaceable or re-creatable. This actually scared me away from mdadm somewhat, and my NAS setup now uses ZFS on Solaris.
If you have extra disks or free space somewhere, you could try dd'ing the individual raw disks in your array to other physical disks or image files and then loop-mounting them ( http://www.linuxquestions.org/questi...images-715343/ ) (disclaimer: I've not tried this). Then you could try potentially dangerous commands on a copy of the RAID, possibly on another system.
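Something along these lines, though I haven't tested it; the /images paths and /dev/md9 are just placeholders, so adjust the device names to your system:
Code:
# copy each member to an image file, skipping over read errors
# (ddrescue is generally a better tool for disks that are actively failing)
dd if=/dev/sdb1 of=/images/sdb1.img bs=1M conv=noerror,sync
dd if=/dev/sdc1 of=/images/sdc1.img bs=1M conv=noerror,sync
dd if=/dev/sdd1 of=/images/sdd1.img bs=1M conv=noerror,sync

# attach the images to loop devices (--show prints the device that was assigned)
losetup -f --show /images/sdb1.img
losetup -f --show /images/sdc1.img
losetup -f --show /images/sdd1.img

# experiment on the copies instead of the real disks
mdadm --assemble /dev/md9 /dev/loop0 /dev/loop1 /dev/loop2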
For the benefit of those who might stumble on this in the future, some miscellaneous thoughts on software RAID:
- Play around with your chosen solution in a VM or on a test system. Go through the process of replacing disks and such BEFORE you put data you care about on it (see the sketch after this list).
- Don't cheap out on the disks (or disk adapters). Some of the cheaper 1+ TB disks have issues with the controller hanging for seconds or minutes, which will cause your RAID card or software to mark the disk as failed. Some models note that they're designed for RAID; it's not always just a marketing gimmick, so do your homework. Remember, you're using RAID because you want reliability and/or performance.
- A degraded array is a single read error away from disaster.
- Don't ever do anything that will intentionally degrade an array (unless you have a backup...).
- Remember, RAID is not a backup. It's pretty good at allowing inexperience, typos, or bad luck with hardware to destroy large volumes of your data.
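One cheap way to rehearse is on loop devices rather than real hardware. A rough sketch (the /tmp image paths and /dev/md9 are just placeholders, and this assumes free loop devices are available):
Code:
# build a throwaway 4-disk RAID-5 out of small sparse image files
for i in 0 1 2 3; do
    truncate -s 512M /tmp/mdtest$i.img
    losetup /dev/loop$i /tmp/mdtest$i.img
done
mdadm --create /dev/md9 --level=5 --raid-devices=4 /dev/loop0 /dev/loop1 /dev/loop2 /dev/loop3

# then practice failing, removing and re-adding a member
mdadm --fail /dev/md9 /dev/loop2
mdadm --remove /dev/md9 /dev/loop2
mdadm --add /dev/md9 /dev/loop2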
12-11-2013, 07:19 PM | #10
LQ Newbie | Registered: Sep 2006 | Location: Moorea, French Polynesia | Distribution: Debian, CentOS, Ubuntu, Fedora ... and more | Posts: 2
I had the exact same problem on a NAS (Linux powered) with a 4-disk RAID5. One disk had failed (sdc), and another was giving SMART alerts (sdb)...
We replaced sdc, and a technician mistakenly removed the sdb disk while the array was rebuilding. That crashed the filesystem and unmounted the volume. He put it back immediately, but the damage was done.
If I "added" the removed disk again, it showed up as a spare...
md2 showed sda[3] sdb[3]S sdd[3] plus a missing device, so there were not enough disks to run.
I solved it by re-creating the raid.
Code:
mdadm --stop /dev/md2
mdadm --create /dev/md2 --level=5 --raid-devices=4 --chunk=64 --name=RackStation:2 /dev/sda3 /dev/sdb3 missing /dev/sdd3 --assume-clean
The --assume-clean option is the magic touch that converts a disk seen as a spare back into a "normal" member.
Later on, I added the replacement disk and it rebuilt correctly:
Code:
mdadm --add /dev/md2 /dev/sdc3
cat /proc/mdstat
md2 : active raid5 sdc3[4] sdd3[3] sdb3[1] sda3[0]
2916120768 blocks super 1.2 level 5, 64k chunk, algorithm 2 [4/3] [UU_U]
[>....................] recovery = 0.0% (143360/972040256) finish=338.9min speed=47786K/sec
Hope this helps.
Last edited by dominix; 12-11-2013 at 07:22 PM.
11-25-2014, 05:20 AM | #11
LQ Newbie | Registered: Nov 2014 | Posts: 1
Quote:
Originally Posted by dominix
I had the exact same problem.
[...]
Hope this helps.
Thank you very much. My raid5 is rebuilding now.
...
No data after rebuilding.
Last edited by AngelG; 11-25-2014 at 02:41 PM.
08-26-2015, 09:50 PM | #12
LQ Newbie | Registered: Aug 2015 | Posts: 1
Many Thanks!
Quote:
Originally Posted by dominix
I had the exact same problem.
[...]
Hope this helps.
I signed up just to say this was the post that saved me! Thanks for your input! I was running RAID-5 on my Synology NAS and wanted to increase the size, so I swapped one of my drives for a larger one. During the automatic rebuild the NAS decided another disk had some errors and the volume failed. I searched everywhere and tried a file system check with no luck. Eventually I found this thread and went for it. I used the following command after inserting my original drive back into the NAS, and now I am back online. I definitely recommend running a file system check before and after any major change, and, as many others suggest, make sure you have a backup before making any changes that could affect your RAID!
Code:
mdadm --stop /dev/md2
mdadm --create /dev/md2 --level=5 --raid-devices=4 --chunk=64 --name=RackStation:2 /dev/sda3 /dev/sdb3 /dev/sdc3 /dev/sdd3 --assume-clean
07-13-2022, 09:49 AM | #13
LQ Newbie | Registered: Jul 2022 | Posts: 1
Like YzRacer back in 2015, I only signed up to say that this post saved my Synology RAID. The "--assume-clean" option did the trick for me.
My Synology RAID was in "Clean, FAILED" state, and re-building the array using the above command worked like a charm. Now fsck is running and so far there are no bad blocks.
Thanks!
08-09-2022, 04:02 PM | #14
LQ Newbie | Registered: Sep 2006 | Location: Moorea, French Polynesia | Distribution: Debian, CentOS, Ubuntu, Fedora ... and more | Posts: 2
\o/
you're welcome.