LinuxQuestions.org
Old 06-11-2007, 07:07 PM   #1
horde
LQ Newbie
 
Registered: Jan 2005
Posts: 24

Rep: Reputation: 0
How do I physically identify a failed RAID disk?


Hi. Sorry if this isn't the right forum but it didn't seem to fit any of the others particularly well.

Let me state up front that I don't currently have a problem.

However, I have a RAID-5 set up with 4 SATA-II drives and am preparing for disaster (ok perhaps not but at least I'm researching it).

In the event that a drive in the array fails - say SDC1 - how do I go about finding out which is the defective physical unit? The SDA1, SDB1 etc. names don't seem to match the motherboard SATA1 etc. labels. My concern is that I didn't add the drives one at a time, determining which was which and marking them as they went in, so I now have 4 drives installed but don't know which is which.

I also want to be careful with the array now that it has been set up. What would happen if I pulled the power cables on the drives one at a time? I assume it would mark the drive as faulty and discard it, so I'd have to re-add it as a spare and then resync it, which would be time consuming.

Would it be as simple as taking the RAID array out of fstab, rebooting with a drive disconnected and checking the messages in the log to determine which one was missing? Do SATA drives get their letters reassigned on boot, or are the letters fixed depending on the port they are plugged in to?

Your advice would be much appreciated.

Thanks
 
Old 06-13-2007, 06:07 AM   #2
Simon Bridge
LQ Guru
 
Registered: Oct 2003
Location: Waiheke NZ
Distribution: Ubuntu
Posts: 9,211

Rep: Reputation: 198
These days they are all sd-something (note, lower case), starting from a and working up. In general, PATA drives come before SATA drives, and the order follows the BIOS order of the drives. You can almost pre-assign a drive's block-special-device letter by taking care when attaching and jumpering the physical drives. However, the drive letters can move around if that care has not been taken.
 
Old 06-18-2007, 05:33 AM   #3
horde
LQ Newbie
 
Registered: Jan 2005
Posts: 24

Original Poster
Rep: Reputation: 0
Thanks for that Simon .... unfortunately pretty much as I expected. No easy way to identify the offending unit.
 
Old 06-19-2007, 09:07 AM   #4
Simon Bridge
LQ Guru
 
Registered: Oct 2003
Location: Waiheke NZ
Distribution: Ubuntu
Posts: 9,211

Rep: Reputation: 198
It is truly tricky, which is why the tutorials all advocate care in the setup. You'll just have to pull the drives one at a time and see which one goes missing. You can, of course, use the syslog or a hardware manager to match the block special device to the physical drive, then read the drive label. The syslog should also tell you which drive goes wrong when that happens.
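One way to do that matching without pulling anything, sketched here as an assumption rather than something tested in this thread: on systems where udev populates /dev/disk/by-id, each entry is a symlink to the kernel device, and the entry names typically embed the drive's model and serial number. The function name list_by_id is made up for this sketch, and readlink -f is assumed to be available (it is part of GNU coreutils).

```shell
#!/bin/sh
# Print "kernel-name by-id-name" for every whole-disk entry in a by-id directory.
# The directory argument is optional and defaults to /dev/disk/by-id
# (assumption: udev creates that directory on this system).
list_by_id() {
    dir=${1:-/dev/disk/by-id}
    for link in "$dir"/*; do
        [ -L "$link" ] || continue                    # skip anything that is not a symlink
        case $link in *-part[0-9]*) continue ;; esac  # skip per-partition entries
        target=$(readlink -f "$link")                 # resolves to e.g. /dev/sdc
        printf '%s %s\n' "$(basename "$target")" "$(basename "$link")"
    done
}
```

Running list_by_id with no argument prints one line per disk, so the serial printed on a drive's label can be looked up against its current sd letter, whatever letter it received on this boot.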

There is good reason to be confused about drives and device names... my system has just the one drive these days, nothing special, jumpered to master and plugged into IDE0, which should make it hda or sda. While /etc/fstab does indeed list /dev/sda1 etc., fdisk -l shows only sde*! Listing /dev shows devices for sde and not for sda... and sda is not a link. So how does this work... <sigh>.
 
Old 06-22-2007, 04:22 AM   #5
RedHatCat
Member
 
Registered: Jun 2005
Location: London, Uk
Distribution: RH-ES 3/4, FC 5/6
Posts: 51

Rep: Reputation: 15
If the drives are in a hot-swap bay, the light on the front will usually indicate which one has failed; the Intel SCSI/SATA ones I have used show a solid orange light upon failure. This is the easy way at least. If there is no hot-swap bay and you can reboot the server, you might be able to find the serial number of the failed drive in the RAID BIOS of your controller. Then pop the box open, check the serials and pull out the appropriate disk.

I often find useful information about the RAID status somewhere in /proc, dependent on the driver. For example, on a build I did a while back, doing a cat /proc/scsi/gdth/0 would display the RAID status and drive serials/models (including whether a drive had failed); gdth was the driver used for the RAID controller, I think. I used this to find a failed drive in a stack of 1U boxes, where identifying the server by the RAID alarm was basically impossible.
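For Linux software RAID (md) rather than a hardware controller, the corresponding file is /proc/mdstat, where each array has a status string such as [UU_UU] and an underscore marks a missing member. A minimal sketch of checking it from a script follows; the function name degraded_arrays is invented for this example, and it only reports which array is degraded, not which physical disk.

```shell
#!/bin/sh
# Print the name and member-status string of every md array that has a
# missing member. Reads /proc/mdstat unless another file is given as $1.
degraded_arrays() {
    awk '
        /^md[0-9]+ :/ { name = $1 }          # remember which array we are in
        match($0, /\[[U_]+\]/) {             # status string, e.g. [UU_UU]
            status = substr($0, RSTART, RLENGTH)
            if (index(status, "_") > 0)      # "_" marks a missing member
                print name, status
        }
    ' "${1:-/proc/mdstat}"
}
```

An empty result means every member of every array is up; run from cron, a non-empty result could be mailed out as the alarm.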

Hope that helped in some way,

Jim
 
Old 02-17-2008, 04:23 AM   #6
horde
LQ Newbie
 
Registered: Jan 2005
Posts: 24

Original Poster
Rep: Reputation: 0
How to Identify failed SATA RAID device

OK - got a failure so here's what I've been doing:

I'm sure there are better solutions and I will probably take them up if I can find them on the net.

Essentially I do:

mdadm --detail /dev/md0

and for each of the devices listed I do:

hdparm -I <device> | grep "Serial Number:"

I scripted it, and once a day I run the Perl code below, with the output emailed to my client machine. (If I were more organised, I suppose I would only need to do this once each time I add hardware to the array and keep the output somewhere safe. However, I am still unsure how the sda1s etc. are doled out, and am not 100% positive they don't depend on some response from the drives, in which case I suppose they could vary on each reboot.)

#!/usr/bin/perl
# List the serial numbers of all disks in the RAID array /dev/md0.
use strict;
use warnings;

# The trailing pipe "|" directs the command's output into our program.
open(my $list_dev, "mdadm --detail /dev/md0 |")
    or die "Can't run mdadm: $!\n";

while (my $line = <$list_dev>) {
    chomp $line;

    my $trimmed = trim($line);
    next if $trimmed eq "";

    if ($trimmed =~ /active sync/) {
        # Fields: Number Major Minor RaidDevice "active" "sync" device
        my @devinfo = split(/\s+/, $trimmed);
        my $device  = $devinfo[6];

        my $serial_line = `hdparm -I $device | grep "Serial Number:"`;
        chomp $serial_line;

        # hdparm indents this line, so field 0 is empty and the serial is field 3.
        my @serialinfo = split(/\s+/, $serial_line);
        my $serial = defined $serialinfo[3] ? $serialinfo[3] : "unknown";

        print "$line Serial Number : $serial\n";
    }
    else {
        print "$line\n";
    }
}
close($list_dev);

# Strip leading and trailing whitespace.
sub trim {
    my $string = shift;
    $string =~ s/^\s+|\s+$//g;
    return $string;
}

Output will look like this for a failure:

/dev/md0:
        Version : 00.90.03
  Creation Time : Mon Jun 11 03:41:55 2007
     Raid Level : raid5
     Array Size : 1250242048 (1192.32 GiB 1280.25 GB)
    Device Size : 312560512 (298.08 GiB 320.06 GB)
   Raid Devices : 5
  Total Devices : 4
Preferred Minor : 0
    Persistence : Superblock is persistent
    Update Time : Sun Feb 17 17:31:57 2008
          State : clean, degraded
 Active Devices : 4
Working Devices : 4
 Failed Devices : 0
  Spare Devices : 0
         Layout : left-symmetric
     Chunk Size : 128K
           UUID : d9f81e55:2fe5e5fb:f8139d5b:a6e55cd4
         Events : 0.454424
    Number   Major   Minor   RaidDevice State
       0       8        1        0      active sync   /dev/sda1 Serial Number : 5QF0YT6C
       1       8       17        1      active sync   /dev/sdb1 Serial Number : 5QF03P11
       2       0        0        2      removed
       3       8       49        3      active sync   /dev/sdd1 Serial Number : 9QF49ERL
       4       3       65        4      active sync   /dev/hdb1 Serial Number : 5QF4S9J4

On my system those serial numbers match the serial numbers printed on the outside of the drives, so it is relatively easy to identify the failed drive: comparing against an old listing shows which one is now missing.

Alternatively, once a failure occurs you could do the same and then pull the drives looking for the one not listed.
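The comparison against an old listing can be done mechanically. Here is a small sketch, assuming both listings were saved to files in the format printed by the script above; the function names serials_in and missing_serials are made up for this example.

```shell
#!/bin/sh
# Print the serial numbers recorded in one saved listing, one per line, sorted.
serials_in() {
    sed -n 's/.*Serial Number : \([^[:space:]]*\).*/\1/p' "$1" | sort -u
}

# Print serials present in the old listing ($1) but absent from the new one ($2).
missing_serials() {
    old_sorted=$(mktemp) || return 1
    new_sorted=$(mktemp) || return 1
    serials_in "$1" > "$old_sorted"
    serials_in "$2" > "$new_sorted"
    comm -23 "$old_sorted" "$new_sorted"   # lines found only in the first file
    rm -f "$old_sorted" "$new_sorted"
}
```

Called as missing_serials old-listing.txt new-listing.txt (hypothetical file names), it prints the serial of any drive that has dropped out of the array since the old listing was taken.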

Once the drive shows as removed from the array (in openSUSE anyway), take out the dead drive, put in the new one and partition it as Linux RAID (a bit more effort if the drives aren't the same size). Then run "mdadm /dev/md0 -a /dev/sdc1" and away goes the rebuild - very easy once you've figured out which drive failed.
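Those replacement steps can be written down as a checklist. The sketch below only prints the commands rather than running them, since they overwrite the new disk's partition table. It assumes the new disk is at least as large as the healthy member it copies from, and that sfdisk is used to clone the partition table; sfdisk is an addition here, not something from the steps above, and the function name plan_replacement is made up.

```shell
#!/bin/sh
# Print (do not run) the commands for swapping a replacement disk into an array.
#   $1 = md array, $2 = a healthy member disk, $3 = the new disk,
#   $4 = partition number the array uses on each disk (defaults to 1)
plan_replacement() {
    array=$1 healthy=$2 new=$3 part=${4:-1}
    # Copy the partition table from a healthy member onto the new disk.
    echo "sfdisk -d $healthy | sfdisk $new"
    # Add the new partition to the array; md then starts the rebuild.
    echo "mdadm $array -a $new$part"
    # Check on rebuild progress.
    echo "cat /proc/mdstat"
}
```

For the failure shown above that would be plan_replacement /dev/md0 /dev/sda /dev/sdc; read the printed commands over and check the device names before running any of them as root.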
 
  


All times are GMT -5.
