LinuxQuestions.org
Linux - Server: this forum is for the discussion of Linux software used in a server-related context.

Post #1, 11-28-2007, 06:40 PM: testnbbuser (LQ Newbie; registered Nov 2007; 14 posts)
Physically detect a failed hard drive in a software RAID 5 array


Hi,

I am running a software RAID 5 array with 4 SATA 3.0 drives (1TB total) on Debian (kernel 2.6.12). The RAID works fine, but I am testing a method to label the drives so that I can identify a failed one. This is the use case:

I labeled the outside of each of the 4 drives in the array with a sticker: /dev/sda1, /dev/sdb1, /dev/sdc1 and /dev/sdd1.

I then want to build the array so that it uses the device letters I assigned. However, mdadm seems to name the drives randomly, which makes it impossible for me to identify each unit the way I want.

My goal is this: if a unit fails, I want to be able to identify it from the outside, so that I can simply swap it for a new drive and the problem is fixed.

So far, every time I rebuild the array, the letters assigned to the units are different; they are never the same twice.

Any help would be appreciated.
Thanks in advance!
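For the detection half of the question, here is a minimal Python sketch (the function name failed_members is hypothetical) that parses the text of /proc/mdstat, where the md driver appends (F) to a failed member, e.g. sdb1[1](F). It assumes the usual layout in which each array's member list sits on one line starting with mdN. It only tells you which /dev/sdX name failed; tying that name to a physical drive is the part discussed in the replies.

```python
import re

# Member entries look like "sdb1[1]"; the md driver appends "(F)" to a failed one.
FAILED_MEMBER = re.compile(r"(\S+?)\[\d+\]\(F\)")

def failed_members(mdstat_text: str) -> dict[str, list[str]]:
    """Map each md array found in /proc/mdstat text to its members marked (F)."""
    result: dict[str, list[str]] = {}
    for line in mdstat_text.splitlines():
        # Array lines start at column 0, e.g. "md0 : active raid5 sda1[0] ..."
        m = re.match(r"(md\d+)\s*:\s*(.*)", line)
        if m:
            name, members = m.groups()
            result[name] = FAILED_MEMBER.findall(members)
    return result
```

On a live system you would call failed_members(open("/proc/mdstat").read()); an empty list for an array means no member is currently flagged as failed.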
 
Post #2, 11-28-2007, 06:51 PM: jschiwal (Guru; registered Aug 2001; location: Fargo, ND; distribution: SuSE AMD64; 15,733 posts)
It doesn't seem right that the /dev/sda# designations for the physical drives would change, although I suppose it may be possible if this information is read from the array's persistent superblock. Try labeling the drives before creating the array (a marker may be better than a sticker, since a sticker might fall off). Look for a unique device serial number or UUID and associate it with a particular /dev/sda#. A device UUID should not change even if you move the drive (note, however, that the filesystem UUID might). Does the kernel print out such a unique identifier for the drive when it is detected?
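As a sketch of the serial-number idea: on systems where udev creates /dev/disk/by-id links, each link name contains the drive's model and serial number and points at whatever /dev/sdX node the drive got on this boot, so the name-to-serial mapping can be read back at any time. This assumes udev's ata-<model>_<serial> naming; whether the udev shipped with a Debian 2.6.12 system creates these links is an assumption, and ids_by_device is a hypothetical helper name.

```python
import os
import re

# Partition links end in "-part1", "-part2", ...; skip them to list whole disks.
PART_SUFFIX = re.compile(r"-part\d+$")

def ids_by_device(by_id_dir: str = "/dev/disk/by-id") -> dict[str, str]:
    """Map kernel disk names (e.g. 'sda') to udev's '<model>_<serial>' id."""
    mapping: dict[str, str] = {}
    for entry in sorted(os.listdir(by_id_dir)):
        # Only ATA links are used here; scsi-/wwn- links are ignored.
        if not entry.startswith("ata-") or PART_SUFFIX.search(entry):
            continue
        # The links are usually relative (../../sda); realpath resolves them.
        target = os.path.realpath(os.path.join(by_id_dir, entry))
        mapping[os.path.basename(target)] = entry[len("ata-"):]
    return mapping
```

Printing ids_by_device() and writing the serial on each drive's sticker ties the label to the hardware rather than to the sdX letter, which can change between boots.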

As a last resort, add the drives one by one before constructing the array.

You can use "mdadm --examine <partition>"; it will print out the UUID for the partition. Maybe using a labeler to print a sticker showing both the device name and the UUID would be a better idea. Even if the device name changed, the UUID wouldn't, and you could replace the drive whose UUID matches the one on the sticker.
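A minimal sketch for pulling the UUID fields out of mdadm --examine, assuming the usual "Key : value" layout of its output. As far as I know, which UUIDs appear depends on the superblock version: 0.90 superblocks show a single UUID line for the whole array, while version 1.x superblocks also show a per-member "Device UUID". The helper names examine and examine_uuids are hypothetical.

```python
import subprocess

def examine_uuids(examine_text: str) -> dict[str, str]:
    """Pick every '... UUID : value' field out of `mdadm --examine` output."""
    fields: dict[str, str] = {}
    for line in examine_text.splitlines():
        key, sep, value = line.partition(" : ")
        key = key.strip()
        if sep and key.endswith("UUID"):
            fields[key] = value.strip()
    return fields

def examine(partition: str) -> str:
    """Run `mdadm --examine` on one member partition (needs root)."""
    return subprocess.run(["mdadm", "--examine", partition],
                          capture_output=True, text=True, check=True).stdout
```

Typical use would be examine_uuids(examine("/dev/sdb2")) for each member; if only a single "UUID" key comes back, every member will report the same array UUID.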

Last edited by jschiwal; 11-28-2007 at 06:59 PM.
 
Post #3, 12-20-2007, 01:47 PM: testnbbuser (Original Poster; LQ Newbie; registered Nov 2007; 14 posts)

Hi,
Sorry for my late answer; I was on vacation.

I thought about generating a UUID for each unit before adding it to the array, but the problem is that when you create the array, the UUID is generated for the whole array (that is, md0 or whatever), so mdadm --examine /dev/sdb2 returns just the UUID of the whole array. It always returns the same details for mdadm --examine /dev/sda2, /dev/sdb2, /dev/sdc2 ... The only part that changes is the line "Checksum : fbc75c3a" (and it changes every time you modify the partition table of the drive).
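To see exactly which fields differ between members, the saved --examine output of each partition can be compared field by field. This sketch (differing_fields is a hypothetical name) only looks at "Key : value" lines, so it would surface a difference like the Checksum line mentioned above, but it ignores the tabular rows at the end of the output.

```python
def differing_fields(outputs: dict[str, str]) -> dict[str, dict[str, str]]:
    """Given {device: `mdadm --examine` text}, return the "Key : value" fields
    whose values are not identical on every device."""
    parsed: dict[str, dict[str, str]] = {}
    for dev, text in outputs.items():
        fields: dict[str, str] = {}
        for line in text.splitlines():
            key, sep, value = line.partition(" : ")
            if sep:
                fields[key.strip()] = value.strip()
        parsed[dev] = fields
    all_keys = set().union(*(f.keys() for f in parsed.values()))
    diff: dict[str, dict[str, str]] = {}
    for key in sorted(all_keys):
        # A field missing on one device counts as "" so it also shows up.
        values = {dev: f.get(key, "") for dev, f in parsed.items()}
        if len(set(values.values())) > 1:
            diff[key] = values
    return diff
```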

I am also trying to use smartmontools (http://smartmontools.sourceforge.net/) to read the information for each unit in the array, but it looks like it does not work with SATA drives that are part of an array.
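A sketch of reading a drive's serial number with smartctl. Assumption to check: for SATA disks driven by libata, smartmontools of this era usually needed -d ata to talk ATA to the drive, and the kernel must support ATA pass-through for that to work, which may be missing on a kernel as old as 2.6.12. The helper names are hypothetical; the parser accepts both "Serial number:" (as printed by smartctl 5.33) and "Serial Number:".

```python
import re
import subprocess
from typing import Optional

SERIAL_LINE = re.compile(r"^Serial [Nn]umber:\s*(\S+)", re.MULTILINE)

def parse_serial(smartctl_text: str) -> Optional[str]:
    """Return the serial from `smartctl -i` / `-a` output, or None if absent."""
    m = SERIAL_LINE.search(smartctl_text)
    return m.group(1) if m else None

def read_serial(device: str, dev_type: str = "ata") -> Optional[str]:
    """Ask one drive for its serial (needs root and smartmontools installed)."""
    # check=False: smartctl uses non-zero exit bits for warnings, and a dead
    # drive may answer with nothing, in which case parse_serial returns None.
    out = subprocess.run(["smartctl", "-i", "-d", dev_type, device],
                         capture_output=True, text=True, check=False).stdout
    return parse_serial(out)
```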

Thanks for the help!
 
Post #4, 12-21-2007, 05:10 PM: J_BOO (LQ Newbie; registered Dec 2007; 6 posts)
It seems odd to me as well that the device names would change, unless the drives are being discovered in a different order on every boot. I have several systems with RAID 5 set up and have never seen this happen. In my case, the disk connected to port 0 of the SATA card is sda, port 1 is sdb, and so on. smartctl should work for detecting a failed SATA drive. It should also report each drive's serial number; of course, a failed drive might not return anything, in which case you find it by process of elimination, matching the serial numbers reported by the good drives.
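The process of elimination can be written down directly: record each drive's serial next to its sticker label when the array is built, then compare against the serials the healthy drives still report. The labels and serials in the usage below are made-up placeholders, and missing_serials is a hypothetical helper.

```python
def missing_serials(labeled: dict[str, str], responding: set[str]) -> dict[str, str]:
    """Return {label: serial} for labeled drives whose serial no healthy drive reported.

    labeled:    hypothetical record made when the array was built,
                e.g. {"bay 1": "<serial>", ...}
    responding: serials that smartctl currently returns from the working drives
    """
    return {label: serial for label, serial in labeled.items()
            if serial not in responding}
```

For example, with four labeled drives and only three serials answering, the one entry left in the result is the drive to pull.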

Again, it seems odd that the devices are changing. On my systems, when sdb1 fails, it means the drive on port 2.

# smartctl -a /dev/sda
smartctl version 5.33 [x86_64-redhat-linux-gnu] Copyright (C) 2002-4 Bruce Allen
Home page is http://smartmontools.sourceforge.net/

Device: ATA WDC WD5000ABYS-0 Version: 1C01
Serial number: WD-WCAPW2036159
Device type: disk
Local Time is: Fri Dec 21 16:57:20 2007 UTC
Device supports SMART and is Enabled
Temperature Warning Disabled or Not Supported
SMART Health Status: OK


<<<CUT>>>>>>>>>>
 
  


All times are GMT -5.
