So I have this system with two SATA disks, sda and sdb. sdb died, so last night I replaced it with a new disk. I resync'd the array, and everything looked good.
Overnight I started to get unrecoverable read errors on sda. The array's response to these errors is to restart the sync. It's been doing this constantly ever since.
So I say OK, clearly sda is bad too. (First I checked to make sure I'd really pulled the dead drive and not the survivor; I conclude that I did, since I have log files on the device between ...) This resync is never going to finish, and I don't want to prematurely kill my new drive with this constant activity. So I'd like to kill the resync.
Only I can't.
I try to fail, then remove the array member like so:
# cat /proc/mdstat
Personalities : [raid1]
md1 : active raid1 sdb1[2] sda1[0]
77071680 blocks [2/1] [U_]
[=>...................] recovery = 9.2% (7165440/77071680) finish=25.2min speed=46124K/sec
# mdadm /dev/md1 -f /dev/sda1 -r /dev/sda1
mdadm: set /dev/sda1 faulty in /dev/md1
mdadm: hot remove failed for /dev/sda1: Device or resource busy
The resync restarts immediately after the device is marked faulty.
Anyone know how I might get myself out of this loop? (Ideally without having to reboot into single user mode or anything like that -- I'm doing this remotely.) I have two new disks on order and they should get here today or tomorrow, and I do have tape backups which are good, but I still don't want to burn out this new disk if I don't have to.
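One avenue that may be worth trying for this kind of loop, sketched under assumptions: newer kernels expose the md state under /sys/block/<md>/md/, and writing "frozen" (or "idle") to sync_action asks the kernel to stop the current recovery. Whether a kernel as old as the one in this thread has that file at all is an assumption, and "idle" alone may simply be followed by an automatic restart. The function name freeze_resync and its optional second argument (an alternate sysfs root, useful only for dry runs) are made up for illustration.

```shell
#!/bin/sh
# freeze_resync MD_NAME [SYSFS_ROOT]
# Ask md to stop the running resync/recovery on one array.
# Assumption: the kernel provides <root>/block/<md>/md/sync_action.
freeze_resync() {
    fr_md="$1"
    fr_root="${2:-/sys}"
    fr_action="$fr_root/block/$fr_md/md/sync_action"

    if [ ! -w "$fr_action" ]; then
        echo "no writable sync_action for $fr_md (old kernel, or not root?)" >&2
        return 1
    fi

    # Prefer "frozen", which also blocks an automatic restart;
    # fall back to "idle" if the kernel rejects it.
    echo frozen > "$fr_action" 2>/dev/null || echo idle > "$fr_action"

    # Show what the kernel reports afterwards.
    cat "$fr_action"
}

# Usage (as root), for the array in this thread:
#   freeze_resync md1
```

This only pauses the copying. It does not make the failing disk readable, so it buys time for the replacement drives rather than fixing the array.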
This is the text in dmesg related to the disk in distress:
ata1: status=0x51 { DriveReady SeekComplete Error }
ata1: error=0x40 { UncorrectableError }
scsi0: ERROR on channel 0, id 0, lun 0, CDB: Read (10) 00 09 30 06 3b 00 00 04 00
Current sda: sense key Medium Error
Additional sense: Unrecovered read error - auto reallocate failed
end_request: I/O error, dev sda, sector 154142267
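The CDB line and the end_request line above describe the same read, which can be checked with a little arithmetic. Assuming the standard SCSI READ(10) layout (one flags byte, a 4-byte big-endian LBA, one group byte, a 2-byte big-endian transfer length, one control byte, with the opcode itself printed as "Read (10)"), the bytes decode as follows. The helper name decode_read10 is made up for illustration.

```shell
#!/bin/sh
# decode_read10 B1 B2 ... B9
# Takes the nine bytes the kernel printed after "Read (10)" and prints
# the starting LBA and the number of blocks requested.
# Assumed layout: $1 flags, $2-$5 LBA, $6 group, $7-$8 length, $9 control.
decode_read10() {
    dr_lba=$(( 0x$2$3$4$5 ))   # 4-byte big-endian logical block address
    dr_len=$(( 0x$7$8 ))       # 2-byte big-endian transfer length
    echo "lba=$dr_lba blocks=$dr_len"
}

# Bytes copied from the dmesg output above.
decode_read10 00 09 30 06 3b 00 00 04 00
# prints: lba=154142267 blocks=4
```

The decoded LBA is 0x0930063b = 154142267, the same sector reported in the end_request line, so the failed command was a 4-block read starting exactly at the bad sector.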
Actually, the problem turned out to be that I was screwed: the initial sync to sdb never completed, so the sync operation from sda -> sdb kept failing, and the RAID software was trying to recover the only way it knew how. The hint is that, when looking at the array, the alleged "good" disk is labeled "spare" and the "failed" disk is labeled "active sync".
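For anyone wanting to check the same hint, the per-member labels come from the device table at the end of `mdadm --detail` output. A minimal sketch, assuming the usual table layout (Number, Major, Minor, RaidDevice, then the state words, then the device path); the filter name member_states is made up for illustration.

```shell
#!/bin/sh
# member_states: read `mdadm --detail` output on stdin and print
# "device: state" for every member row of the device table.
# Assumption: member rows start with a numeric "Number" column and end
# with a /dev/... path, with the state words in between (fields 5..NF-1).
member_states() {
    awk '$1 ~ /^[0-9]+$/ && $NF ~ /^\/dev\// {
        state = ""
        for (i = 5; i < NF; i++)
            state = (state == "" ? $i : state " " $i)
        print $NF ": " state
    }'
}

# Usage (as root), for the array in this thread:
#   mdadm --detail /dev/md1 | member_states
```

If the disk you believe is healthy shows up as "spare" while the one throwing read errors is still "active sync", the rebuild never finished and the failing disk still holds the only complete copy of the data.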
I've replaced both disks (fortunately the read errors were in unused sectors) and everything looks good again.