Linux - Hardware: This forum is for hardware issues. Having trouble installing a piece of hardware? Want to know if that peripheral is compatible with Linux?
03-29-2006, 10:28 PM | #1
LQ Newbie | Registered: Mar 2006 | Posts: 6
Degraded Array on Software RAID
At 5 AM I got this message from my CentOS 4 server:
"DegradedArray event had been detected on md device /dev/md1"
This is a web server, configured with software RAID 1 on two 160 GB SATA drives in hot-swap bays.
Here are the errors reported in the log:
messages.1:Mar 22 05:03:02 ns0 kernel: ATA: abnormal status 0xD0 on port 0x1F7
messages.1:Mar 22 05:03:02 ns0 kernel: ATA: abnormal status 0xD0 on port 0x1F7
messages.1:Mar 22 05:03:32 ns0 kernel: ata1: command 0x25 timeout, stat 0xd0 host_stat 0x61
messages.1:Mar 22 05:03:32 ns0 kernel: ata1: status=0xd0 { Busy }
messages.1:Mar 22 05:03:32 ns0 kernel: SCSI error : <0 0 1 0> return code = 0x8000002
messages.1:Mar 22 05:03:32 ns0 kernel: Current sdb: sense key Aborted Command
messages.1:Mar 22 05:03:32 ns0 kernel: Additional sense: Scsi parity error
messages.1:Mar 22 05:03:32 ns0 kernel: end_request: I/O error, dev sdb, sector 312576567
messages.1:Mar 22 05:03:32 ns0 kernel: Buffer I/O error on device sdb7, logical block 128592192
messages.1:Mar 22 05:03:32 ns0 kernel: ATA: abnormal status 0xD0 on port 0x1F7
messages.1:Mar 22 05:04:02 ns0 kernel: ata1: command 0x25 timeout, stat 0xd0 host_stat 0x61
The RAID on md0 is still fully active:
# more /proc/mdstat
Personalities : [raid1]
md5 : active raid1 sda1[0]
3068288 blocks [2/1] [U_]
md2 : active raid1 sda2[0]
10241344 blocks [2/1] [U_]
md1 : active raid1 sda3[0]
10241344 blocks [2/1] [U_]
md3 : active raid1 sda6[0]
2048192 blocks [2/1] [U_]
md4 : active raid1 sda7[0]
128592192 blocks [2/1] [U_]
md0 : active raid1 sdb5[1] sda5[0]
2096384 blocks [2/2] [UU]
unused devices: <none>
Questions: Is there any troubleshooting that I could (or should) do on this? Or should I just replace the drive that appears to be the problem?
If I am to replace the drive, how do I rebuild the array with minimal downtime for the server?
Thanks in advance for your help.
Dan
03-30-2006, 12:05 AM | #2
Senior Member | Registered: Oct 2003 | Location: /earth/usa/nj (UTC-5) | Distribution: RHEL, AltimaLinux, Rocky | Posts: 1,151
Everything described below can be found in the mdadm man page.
**********
It looks like the sdb member of every raid1 pair has failed, except for md0.
Have a look at md1 using --detail (-D):
Code:
# mdadm -D /dev/md1
Assuming that sdb3 is listed as “faulty”, try removing it and then adding it back:
Code:
# mdadm /dev/md1 -r /dev/sdb3 -a /dev/sdb3
# mdadm -D /dev/md1
If you see the “Rebuild Status:” percentage increasing, then things are looking up and you can try the same procedure on the other degraded raids.
If sdb3 reverts to faulty, then the drive may be defective. If you have the manufacturer’s diagnostic utility, then try testing the drive with it. If it fails that test, then it’s time to spend some money on a new drive.
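If you'd rather not reboot into a vendor diagnostic, a SMART check from within Linux is a reasonable first pass. A minimal sketch, assuming the smartmontools package is installed and the suspect drive is /dev/sdb:
Code:
# smartctl -H /dev/sdb     (overall SMART health verdict)
# smartctl -a /dev/sdb     (full attributes plus the drive's own error log)
# dmesg | grep -i ata      (recent kernel ATA errors; repeated timeouts can also mean a bad cable or port)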
03-30-2006, 07:56 PM | #3
LQ Newbie | Registered: Mar 2006 | Posts: 6 | Original Poster
Thanks for your reply.
The command returns this:
/sbin/mdadm -D /dev/md1
/dev/md1:
Version : 00.90.01
Creation Time : Wed Nov 2 03:44:57 2005
Raid Level : raid1
Array Size : 10241344 (9.77 GiB 10.49 GB)
Device Size : 10241344 (9.77 GiB 10.49 GB)
Raid Devices : 2
Total Devices : 1
Preferred Minor : 1
Persistence : Superblock is persistent
Update Time : Thu Mar 30 18:05:31 2006
State : clean, degraded
Active Devices : 1
Working Devices : 1
Failed Devices : 0
Spare Devices : 0
Number Major Minor RaidDevice State
0 8 3 0 active sync /dev/sda3
1 0 0 -1 removed
UUID : 1d16a55f:71fed86c:5a198cc1:1db31dfa
Events : 0.1147282
Am I right in assuming that 'degraded' doesn't indicate "faulty"?
03-30-2006, 11:43 PM | #4
Senior Member | Registered: Oct 2003 | Location: /earth/usa/nj (UTC-5) | Distribution: RHEL, AltimaLinux, Rocky | Posts: 1,151
It looks like sdb3 has already been removed. Just to verify that sdb3 is paired with sda3, run --examine (-E):
Code:
# mdadm -E /dev/sdb3
That should show the association of sdb3 with sda3 in md1. Assuming that to be true, then try adding sdb3 back to md1:
Code:
# mdadm /dev/md1 -a /dev/sdb3
# mdadm -D /dev/md1
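Once md1 is rebuilding, the remaining degraded arrays can be handled the same way. A sketch, assuming the sdaN/sdbN partition pairing implied by your /proc/mdstat output (verify each with -E first):
Code:
# mdadm /dev/md5 -a /dev/sdb1
# mdadm /dev/md2 -a /dev/sdb2
# mdadm /dev/md3 -a /dev/sdb6
# mdadm /dev/md4 -a /dev/sdb7
# watch cat /proc/mdstat     (the md driver rebuilds arrays on shared disks one at a time; [UU] when done)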
04-01-2006, 08:22 AM | #5
LQ Newbie | Registered: Mar 2006 | Posts: 6 | Original Poster
Thanks for your assistance. I was able to add them back and the recovery
went fine.
# more /proc/mdstat
Personalities : [raid1]
md5 : active raid1 sdb1[1] sda1[0]
3068288 blocks [2/2] [UU]
md2 : active raid1 sdb2[1] sda2[0]
10241344 blocks [2/2] [UU]
md1 : active raid1 sdb3[1] sda3[0]
10241344 blocks [2/2] [UU]
md3 : active raid1 sdb6[1] sda6[0]
2048192 blocks [2/2] [UU]
md4 : active raid1 sdb7[1] sda7[0]
128592192 blocks [2/2] [UU]
md0 : active raid1 sdb5[1] sda5[0]
2096384 blocks [2/2] [UU]
unused devices: <none>
I appreciate your help.
Dan
04-01-2006, 07:41 PM | #6
Senior Member | Registered: Oct 2003 | Location: /earth/usa/nj (UTC-5) | Distribution: RHEL, AltimaLinux, Rocky | Posts: 1,151
It’s good to hear that everything worked out.
It could have turned out to be a dying drive, which is never much fun to fix in a raid.
10-07-2006, 04:57 AM | #7
LQ Newbie | Registered: Oct 2006 | Posts: 1
Maybe a switch of distro?
My RAID1 had exactly the same problem (except mine is an IDE setup). I followed the procedure to diagnose and fix it to the letter, and the drive is now rebuilding. Many thanks to WhatsHisName. I recently switched from Mandrake to Ubuntu; could that be the culprit?
Toby
01-17-2008, 10:03 AM | #8
LQ Newbie | Registered: Jan 2008 | Posts: 1
Hello everyone!
I need help. I'm running Red Hat WS.
I have two SATA disks (sda and sdb), but lately sdb went "faulty", according to the output of /proc/mdstat.
So this is what I did.
First I removed all of the failed partitions from their arrays:
# mdadm --manage /dev/md0 --remove /dev/sdb1
# mdadm --manage /dev/md1 --remove /dev/sdb2
# mdadm --manage /dev/md2 --remove /dev/sdb3
# mdadm --manage /dev/md3 --remove /dev/sdb5
# mdadm --manage /dev/md4 --remove /dev/sdb6
# mdadm --manage /dev/md5 --remove /dev/sdb7
I shut down the machine to replace the bad disk with a new one. After I changed the hard disk, I copied the partition table over:
# sfdisk -d /dev/sda | sfdisk /dev/sdb
Then I added the sdbY devices back into the arrays:
# mdadm --manage /dev/md0 --add /dev/sdb1
# mdadm --manage /dev/md1 --add /dev/sdb2
# mdadm --manage /dev/md2 --add /dev/sdb3
# mdadm --manage /dev/md3 --add /dev/sdb5
# mdadm --manage /dev/md4 --add /dev/sdb6
# mdadm --manage /dev/md5 --add /dev/sdb7
I watched /proc/mdstat and saw everything syncing well, and the mirroring "[UU]" completed on every array. I also tried mdadm --query --detail /dev/md[0-5], and they all reported "clean", which gave me a good night's sleep. But the next morning the machine gave a ton of errors, and the mdstat output showed that some of my sdb members had failed again. What could the problem be? I don't know what to do next.
Thanks in advance, guys!
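If a freshly replaced disk drops out of the arrays again within hours, the drive itself may not be at fault; the cable, the controller port, or power are just as suspect. A few checks worth running before buying yet another drive (a sketch, assuming smartmontools is available):
Code:
# smartctl -l error /dev/sdb     (the new drive's own SMART error log)
# dmesg | grep -i ata            (timeouts and bus resets tend to point at cabling or the port)
# badblocks -sv /dev/sdb         (read-only surface scan of the whole disk)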
07-03-2008, 10:43 AM | #9
Member | Registered: Sep 2006 | Location: Canada, Alberta | Distribution: RHEL 4 and up, CentOS 5.x, Fedora Core 5 and up, Ubuntu 8 and up | Posts: 251
Degraded Software RAID Array - RAID 5 CentOS 5.0
Hi there,
I have read over this post, and I have received similar e-mails... so here is the e-mail I received:
Quote:
From root@localhost.localdomain Sun Jun 29 18:50:17 2008
Date: Sun, 29 Jun 2008 18:50:12 -0600
From: mdadm monitoring <root@localhost.localdomain>
To: root@localhost.localdomain
Subject: DegradedArray event on /dev/md1:sandbox
This is an automatically generated mail message from mdadm
running on sandbox
A DegradedArray event had been detected on md device /dev/md1.
Faithfully yours, etc.
P.S. The /proc/mdstat file currently contains the following:
Personalities : [raid6] [raid5] [raid4] [raid1]
md0 : active raid1 sda1[0] sdb1[1]
256896 blocks [2/2] [UU]
resync=DELAYED
md1 : active raid5 sdd1[4] sdc1[2] sdb2[1] sda2[0]
1464380160 blocks level 5, 256k chunk, algorithm 2 [4/3] [UUU_]
[>....................] recovery = 0.3% (1861128/488126720) finish=263.4min speed=30757K/sec
unused devices: <none>
Then I run this: mdadm -D /dev/md1
Quote:
[root@sandbox ~]# mdadm -D /dev/md1
/dev/md1:
Version : 00.90.03
Creation Time : Sun Jun 29 12:03:01 2008
Raid Level : raid5
Array Size : 1464380160 (1396.54 GiB 1499.53 GB)
Used Dev Size : 488126720 (465.51 GiB 499.84 GB)
Raid Devices : 4
Total Devices : 4
Preferred Minor : 1
Persistence : Superblock is persistent
Update Time : Mon Jun 30 09:33:08 2008
State : clean
Active Devices : 4
Working Devices : 4
Failed Devices : 0
Spare Devices : 0
Layout : left-symmetric
Chunk Size : 256K
UUID : 292edaeb:b8233ecf:71255861:c3b16024
Events : 0.16188
Number Major Minor RaidDevice State
0 8 2 0 active sync /dev/sda2
1 8 18 1 active sync /dev/sdb2
2 8 33 2 active sync /dev/sdc1
3 8 49 3 active sync /dev/sdd1
So basically I am thinking the array looks fine; can someone please point out if I am wrong or missing something?
Thank you!!!
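Note that the two snapshots were taken at different times: the mdstat in the e-mail was captured mid-rebuild ([4/3] [UUU_], recovery at 0.3%), while the mdadm -D output from the next day shows the array fully recovered (4 active devices, State: clean). A quick way to confirm an array has healed, sketched against the device names above:
Code:
# cat /proc/mdstat                              ([4/4] [UUUU] means every member is active)
# mdadm -D /dev/md1 | grep -E 'State|Devices'   (expect "clean" and zero failed devices)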