Degraded Array on Software RAID
At 5 am, I got this message from my CentOS 4 server:
"DegradedArray event had been detected on md device /dev/md1" This is a web server and is configured with software raid 1 on two 160gb SATA drives in hot swap bays. Here is the errors reported in the log: messages.1:Mar 22 05:03:02 ns0 kernel: ATA: abnormal status 0xD0 on port 0x1F7 messages.1:Mar 22 05:03:02 ns0 kernel: ATA: abnormal status 0xD0 on port 0x1F7 messages.1:Mar 22 05:03:32 ns0 kernel: ata1: command 0x25 timeout, stat 0xd0 host_stat 0x61 messages.1:Mar 22 05:03:32 ns0 kernel: ata1: status=0xd0 { Busy } messages.1:Mar 22 05:03:32 ns0 kernel: SCSI error : <0 0 1 0> return code = 0x8000002 messages.1:Mar 22 05:03:32 ns0 kernel: Current sdb: sense key Aborted Command messages.1:Mar 22 05:03:32 ns0 kernel: Additional sense: Scsi parity error messages.1:Mar 22 05:03:32 ns0 kernel: end_request: I/O error, dev sdb, sector 312576567 messages.1:Mar 22 05:03:32 ns0 kernel: Buffer I/O error on device sdb7, logical block 128592192 messages.1:Mar 22 05:03:32 ns0 kernel: ATA: abnormal status 0xD0 on port 0x1F7 messages.1:Mar 22 05:04:02 ns0 kernel: ata1: command 0x25 timeout, stat 0xd0 host_stat 0x61 Raid on MD0 is still active: more /proc/mdstat Personalities : [raid1] md5 : active raid1 sda1[0] 3068288 blocks [2/1] [U_] md2 : active raid1 sda2[0] 10241344 blocks [2/1] [U_] md1 : active raid1 sda3[0] 10241344 blocks [2/1] [U_] md3 : active raid1 sda6[0] 2048192 blocks [2/1] [U_] md4 : active raid1 sda7[0] 128592192 blocks [2/1] [U_] md0 : active raid1 sdb5[1] sda5[0] 2096384 blocks [2/2] [UU] unused devices: <none> Questions: Is there any troubleshooting that I could (or should) do on this? Or replace the drive that appears to be a problem? If I am to replace the drive, how do I rebuild the array with minimal downtime to the server? Thanks, in advance for your help. Dan |
Everything described below can be found in the mdadm man page.
It looks like the sdb member of every raid1 pair has failed, except for md0. Have a look at md1 using --detail (-D):
Code:
# mdadm -D /dev/md1

If sdb3 is listed as faulty or removed, try removing it and adding it back:
Code:
# mdadm /dev/md1 -r /dev/sdb3 -a /dev/sdb3

If sdb3 reverts to faulty, then the drive may be defective. If you have the manufacturer's diagnostic utility, try testing the drive with it. If it fails that test, then it's time to spend some money on a new drive.
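If the manufacturer's diagnostic is not handy, the drive can also be given a rough health check from within Linux using smartmontools (assuming the smartctl package is installed; on older 2.6 kernels the SATA drive may need an extra -d ata option). This is only a quick check, not a definitive verdict on the drive:
Code:
# smartctl -a /dev/sdb            # health status, attributes and the drive's error log
# smartctl -t long /dev/sdb       # start an extended self-test (runs in the background)
# smartctl -l selftest /dev/sdb   # read the results once the test has finished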
Thanks for your reply.
The command returns this:

# /sbin/mdadm -D /dev/md1
/dev/md1:
        Version : 00.90.01
  Creation Time : Wed Nov  2 03:44:57 2005
     Raid Level : raid1
     Array Size : 10241344 (9.77 GiB 10.49 GB)
    Device Size : 10241344 (9.77 GiB 10.49 GB)
   Raid Devices : 2
  Total Devices : 1
Preferred Minor : 1
    Persistence : Superblock is persistent
    Update Time : Thu Mar 30 18:05:31 2006
          State : clean, degraded
 Active Devices : 1
Working Devices : 1
 Failed Devices : 0
  Spare Devices : 0

    Number   Major   Minor   RaidDevice State
       0       8        3        0      active sync   /dev/sda3
       1       0        0       -1      removed

           UUID : 1d16a55f:71fed86c:5a198cc1:1db31dfa
         Events : 0.1147282

I am assuming that 'degraded' doesn't indicate "faulty"?
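(As a general note: 'degraded' describes the array as a whole, meaning it is running with a member missing, while 'faulty' is a state of an individual member device.) A quick way to get the same state summary for every array at once, as a small sketch that assumes the md0 through md5 device names used in this thread:
Code:
# for md in /dev/md[0-5]; do echo "== $md =="; mdadm -D "$md" | grep -E 'State :|Failed Devices'; done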
It looks like sdb3 has already been removed. Just to verify that sdb3 is paired with sda3, run --examine (-E):
Code:
# mdadm -E /dev/sdb3

If the superblock looks right (same UUID as md1), then add it back into the array:
Code:
# mdadm /dev/md1 -a /dev/sdb3
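Once the partition has been added back, the kernel rebuilds the mirror in the background and the server can keep running; the resync progress can be followed with either of these (nothing here is specific to this particular setup):
Code:
# cat /proc/mdstat
# watch -n 5 cat /proc/mdstat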
Thanks for your assistance. I was able to add them back and the recovery
went fine.

# more /proc/mdstat
Personalities : [raid1]
md5 : active raid1 sdb1[1] sda1[0]
      3068288 blocks [2/2] [UU]
md2 : active raid1 sdb2[1] sda2[0]
      10241344 blocks [2/2] [UU]
md1 : active raid1 sdb3[1] sda3[0]
      10241344 blocks [2/2] [UU]
md3 : active raid1 sdb6[1] sda6[0]
      2048192 blocks [2/2] [UU]
md4 : active raid1 sdb7[1] sda7[0]
      128592192 blocks [2/2] [UU]
md0 : active raid1 sdb5[1] sda5[0]
      2096384 blocks [2/2] [UU]
unused devices: <none>

I appreciate your help.

Dan
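The DegradedArray e-mail that started this thread comes from mdadm's monitor mode (the mdmonitor service on CentOS), so it is worth confirming that the notification path is still in place now that the mirrors are healthy. A minimal sketch of the relevant /etc/mdadm.conf entries; the mail address is a placeholder:
Code:
DEVICE partitions
MAILADDR admin@example.com
# ARRAY lines can be regenerated with:  mdadm --detail --scan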
It’s good to hear that everything worked out.
It could have turned out to be a dying drive, which is never much fun to fix in a RAID.
Maybe a switch of distro?
My RAID1 had exactly the same problem (except it is an IDE setup). I followed the procedure to diagnose and fix it to the letter, and the drive is now rebuilding. Many thanks to WhatsHisName. I recently switched from Mandrake to Ubuntu; could that be the culprit?
Toby
Hello everyone!
I need help. I'm running Red Hat WS with 2 SATA disks (sda and sdb), but lately sdb went "faulty" according to the output of /proc/mdstat. So this is what I did. First I removed all of the failed partitions from the arrays:

#mdadm --manage /dev/md0 --remove /dev/sdb1
#mdadm --manage /dev/md1 --remove /dev/sdb2
#mdadm --manage /dev/md2 --remove /dev/sdb3
#mdadm --manage /dev/md3 --remove /dev/sdb5
#mdadm --manage /dev/md4 --remove /dev/sdb6
#mdadm --manage /dev/md5 --remove /dev/sdb7

I shut down the machine to replace the bad disk with a new one. After I changed the hard disk, I copied the partition table:

#sfdisk -d /dev/sda | sfdisk /dev/sdb

Then I added the sdbY devices back into the arrays:

#mdadm --manage /dev/md0 --add /dev/sdb1
#mdadm --manage /dev/md1 --add /dev/sdb2
#mdadm --manage /dev/md2 --add /dev/sdb3
#mdadm --manage /dev/md3 --add /dev/sdb5
#mdadm --manage /dev/md4 --add /dev/sdb6
#mdadm --manage /dev/md5 --add /dev/sdb7

I watched /proc/mdstat and saw that everything synced well and the mirroring was complete ("[UU]" everywhere). I also tried mdadm --query --detail /dev/md[0-5] and all of them reported "clean", which gave me a good night's sleep. But the next morning the machine gave tons of errors, and looking at the mdstat output, some of the sdb members had failed again. What could the problem be? I don't know what to do next.

Thanks in advance, guys!
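A note on the repeat failure: when a brand-new disk drops out again within a day, the fault is not always the disk itself; a flaky cable, backplane slot, controller port or power connection can produce the same I/O errors. A rough way to narrow it down, assuming smartmontools is available and the device names match the post above:
Code:
# dmesg | grep -iE 'ata|sdb' | tail -n 50   # look for timeouts, resets or CRC/parity errors
# smartctl -a /dev/sdb                      # the drive's own health status and error counters
# mdadm -E /dev/sdb1                        # check the member superblock is still consistent

If SMART on the replacement drive looks clean but the kernel keeps logging ata timeouts, it is worth suspecting the cable, the hot-swap bay or the controller port rather than the disk.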
Degraded Software RAID Array - RAID 5 CentOS 5.0
Hi there,
I have read over this post and I have received similar e-mails... so here is the e-mail I receive: Quote:
Quote:
Thank you!!!