Degraded Array on Software RAID
At 5 am, I got this message from my CentOS 4 server:
"DegradedArray event had been detected on md device /dev/md1" This is a web server and is configured with software raid 1 on two 160gb SATA drives in hot swap bays. Here is the errors reported in the log: messages.1:Mar 22 05:03:02 ns0 kernel: ATA: abnormal status 0xD0 on port 0x1F7 messages.1:Mar 22 05:03:02 ns0 kernel: ATA: abnormal status 0xD0 on port 0x1F7 messages.1:Mar 22 05:03:32 ns0 kernel: ata1: command 0x25 timeout, stat 0xd0 host_stat 0x61 messages.1:Mar 22 05:03:32 ns0 kernel: ata1: status=0xd0 { Busy } messages.1:Mar 22 05:03:32 ns0 kernel: SCSI error : <0 0 1 0> return code = 0x8000002 messages.1:Mar 22 05:03:32 ns0 kernel: Current sdb: sense key Aborted Command messages.1:Mar 22 05:03:32 ns0 kernel: Additional sense: Scsi parity error messages.1:Mar 22 05:03:32 ns0 kernel: end_request: I/O error, dev sdb, sector 312576567 messages.1:Mar 22 05:03:32 ns0 kernel: Buffer I/O error on device sdb7, logical block 128592192 messages.1:Mar 22 05:03:32 ns0 kernel: ATA: abnormal status 0xD0 on port 0x1F7 messages.1:Mar 22 05:04:02 ns0 kernel: ata1: command 0x25 timeout, stat 0xd0 host_stat 0x61 Raid on MD0 is still active: more /proc/mdstat Personalities : [raid1] md5 : active raid1 sda1[0] 3068288 blocks [2/1] [U_] md2 : active raid1 sda2[0] 10241344 blocks [2/1] [U_] md1 : active raid1 sda3[0] 10241344 blocks [2/1] [U_] md3 : active raid1 sda6[0] 2048192 blocks [2/1] [U_] md4 : active raid1 sda7[0] 128592192 blocks [2/1] [U_] md0 : active raid1 sdb5[1] sda5[0] 2096384 blocks [2/2] [UU] unused devices: <none> Questions: Is there any troubleshooting that I could (or should) do on this? Or replace the drive that appears to be a problem? If I am to replace the drive, how do I rebuild the array with minimal downtime to the server? Thanks, in advance for your help. Dan |
Everything described below can be found in the mdadm man page.
It looks like the sdb member of every raid1 pair has failed, except for md0. Have a look at md1 using --detail (-D):
Code:
# mdadm -D /dev/md1

If sdb3 is listed as faulty or removed, try removing it and adding it back:
Code:
# mdadm /dev/md1 -r /dev/sdb3 -a /dev/sdb3

If sdb3 reverts to faulty, then the drive may be defective. If you have the manufacturer's diagnostic utility, try testing the drive with it. If it fails that test, then it's time to spend some money on a new drive.
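If the manufacturer's diagnostic is not handy, the drive can also be given a rough health check from within Linux using smartmontools (assuming the smartctl package is installed; on older 2.6 kernels the SATA drive may need an extra -d ata option). This is only a quick check, not a definitive verdict on the drive:
Code:
# smartctl -a /dev/sdb            # health status, attributes and the drive's error log
# smartctl -t long /dev/sdb       # start an extended self-test (runs in the background)
# smartctl -l selftest /dev/sdb   # read the results once the test has finished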
Thanks for your reply.
The command returns this:

# /sbin/mdadm -D /dev/md1
/dev/md1:
        Version : 00.90.01
  Creation Time : Wed Nov  2 03:44:57 2005
     Raid Level : raid1
     Array Size : 10241344 (9.77 GiB 10.49 GB)
    Device Size : 10241344 (9.77 GiB 10.49 GB)
   Raid Devices : 2
  Total Devices : 1
Preferred Minor : 1
    Persistence : Superblock is persistent
    Update Time : Thu Mar 30 18:05:31 2006
          State : clean, degraded
 Active Devices : 1
Working Devices : 1
 Failed Devices : 0
  Spare Devices : 0

    Number   Major   Minor   RaidDevice State
       0       8        3        0      active sync   /dev/sda3
       1       0        0       -1      removed

           UUID : 1d16a55f:71fed86c:5a198cc1:1db31dfa
         Events : 0.1147282

I am assuming that 'degraded' doesn't indicate "faulty"?
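(As a general note: 'degraded' describes the array as a whole, meaning it is running with a member missing, while 'faulty' is a state of an individual member device.) A quick way to get the same state summary for every array at once, as a small sketch that assumes the md0 through md5 device names used in this thread:
Code:
# for md in /dev/md[0-5]; do echo "== $md =="; mdadm -D "$md" | grep -E 'State :|Failed Devices'; done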
It looks like sdb3 has already been removed. Just to verify that sdb3 is paired with sda3, run --examine (-E):
Code:
# mdadm -E /dev/sdb3

If the superblock looks right (same UUID as md1), then add it back into the array:
Code:
# mdadm /dev/md1 -a /dev/sdb3
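Once the partition has been added back, the kernel rebuilds the mirror in the background and the server can keep running; the resync progress can be followed with either of these (nothing here is specific to this particular setup):
Code:
# cat /proc/mdstat
# watch -n 5 cat /proc/mdstat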
Thanks for your assistance. I was able to add them back and the recovery
went fine.

# more /proc/mdstat
Personalities : [raid1]
md5 : active raid1 sdb1[1] sda1[0]
      3068288 blocks [2/2] [UU]
md2 : active raid1 sdb2[1] sda2[0]
      10241344 blocks [2/2] [UU]
md1 : active raid1 sdb3[1] sda3[0]
      10241344 blocks [2/2] [UU]
md3 : active raid1 sdb6[1] sda6[0]
      2048192 blocks [2/2] [UU]
md4 : active raid1 sdb7[1] sda7[0]
      128592192 blocks [2/2] [UU]
md0 : active raid1 sdb5[1] sda5[0]
      2096384 blocks [2/2] [UU]
unused devices: <none>

I appreciate your help.

Dan
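The DegradedArray e-mail that started this thread comes from mdadm's monitor mode (the mdmonitor service on CentOS), so it is worth confirming that the notification path is still in place now that the mirrors are healthy. A minimal sketch of the relevant /etc/mdadm.conf entries; the mail address is a placeholder:
Code:
DEVICE partitions
MAILADDR admin@example.com
# ARRAY lines can be regenerated with:  mdadm --detail --scan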
It’s good to hear that everything worked out.
It could have turned out to be a dying drive, which is never much fun to fix in a RAID.
Maybe a switch of distro?
My RAID1 had exactly the same problem (except it is an IDE setup). I followed the procedure to diagnose and fix it to the letter, and the drive is now rebuilding. Many thanks to WhatsHisName. I recently switched from Mandrake to Ubuntu; could that be the culprit?
Toby
Hello everyone!
I need help. I'm running Red Hat WS with 2 SATA disks (sda and sdb), but lately sdb went "faulty" according to the output of /proc/mdstat. So this is what I did. First I removed all of the failed partitions from the arrays:

#mdadm --manage /dev/md0 --remove /dev/sdb1
#mdadm --manage /dev/md1 --remove /dev/sdb2
#mdadm --manage /dev/md2 --remove /dev/sdb3
#mdadm --manage /dev/md3 --remove /dev/sdb5
#mdadm --manage /dev/md4 --remove /dev/sdb6
#mdadm --manage /dev/md5 --remove /dev/sdb7

I shut down the machine to replace the bad disk with a new one. After I changed the hard disk, I copied the partition table:

#sfdisk -d /dev/sda | sfdisk /dev/sdb

Then I added the sdbY devices back into the arrays:

#mdadm --manage /dev/md0 --add /dev/sdb1
#mdadm --manage /dev/md1 --add /dev/sdb2
#mdadm --manage /dev/md2 --add /dev/sdb3
#mdadm --manage /dev/md3 --add /dev/sdb5
#mdadm --manage /dev/md4 --add /dev/sdb6
#mdadm --manage /dev/md5 --add /dev/sdb7

I watched /proc/mdstat and saw that everything synced well and the mirroring was complete ("[UU]" everywhere). I also tried mdadm --query --detail /dev/md[0-5] and all of them reported "clean", which gave me a good night's sleep. But the next morning the machine gave tons of errors, and looking at the mdstat output, some of the sdb members had failed again. What could the problem be? I don't know what to do next.

Thanks in advance, guys!
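A note on the repeat failure: when a brand-new disk drops out again within a day, the fault is not always the disk itself; a flaky cable, backplane slot, controller port or power connection can produce the same I/O errors. A rough way to narrow it down, assuming smartmontools is available and the device names match the post above:
Code:
# dmesg | grep -iE 'ata|sdb' | tail -n 50   # look for timeouts, resets or CRC/parity errors
# smartctl -a /dev/sdb                      # the drive's own health status and error counters
# mdadm -E /dev/sdb1                        # check the member superblock is still consistent

If SMART on the replacement drive looks clean but the kernel keeps logging ata timeouts, it is worth suspecting the cable, the hot-swap bay or the controller port rather than the disk.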
Degraded Software RAID Array - RAID 5 CentOS 5.0
Hi there,
I have read over this post and I have received similar e-mails... so here is the e-mail I receive: Quote:
Quote:
Thank you!!!