Cronjob 02-21-2006 07:44 PM

Replacing dead drive in degraded RAID-5.
My server is about 1500 miles from me, so I need to make sure I get everything correct before I start. Since I have never had to deal with a faulty RAID array before, this is completely new to me and I am looking for some guidance.

My server (1U rackmount, dual 1.5 GHz AMD Athlon XP with a 4-port 3ware Escalade, running Debian) reported some errors in syslog yesterday.

kernel: 3w-xxxx: scsi0: AEN: ATA port timeout: Port #1.
kernel: 3w-xxxx: scsi0: AEN: Unit degraded: Unit #0.
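
For anyone who wants to watch syslog for these messages automatically, here is a minimal sketch in Python. It assumes the AEN lines always look like the two quoted above; the regex and the `parse_aen` name are illustrative only and are not part of any 3ware tool. Other 3w-xxxx messages may well be formatted differently.

```python
import re

# Pattern inferred from the two syslog lines quoted above (an assumption,
# not a documented format): "3w-xxxx: scsiN: AEN: <event>: Port|Unit #N."
AEN_RE = re.compile(r"3w-xxxx: scsi(\d+): AEN: (.+?): (Port|Unit) #(\d+)\.")

def parse_aen(line):
    """Return a dict describing a 3w-xxxx AEN line, or None if it does not match."""
    m = AEN_RE.search(line)
    if m is None:
        return None
    return {
        "scsi_host": int(m.group(1)),   # the N in "scsiN"
        "event": m.group(2),            # e.g. "Unit degraded"
        "kind": m.group(3).lower(),     # "port" or "unit"
        "number": int(m.group(4)),      # port or unit number
    }

sample = [
    "kernel: 3w-xxxx: scsi0: AEN: ATA port timeout: Port #1.",
    "kernel: 3w-xxxx: scsi0: AEN: Unit degraded: Unit #0.",
]
events = [parse_aen(line) for line in sample]
```

Feeding it the lines from a real log file instead of `sample` would give one dict per AEN event, which could then be mailed out or otherwise alerted on.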

The 3DMD daemon was down (I presume this is what happens when the array is degraded?) so I started it up. It would only stay up for a few moments at a time. I entered the web interface and checked the ALERT section, which reported the following:

ERROR: Disk Array Unit 0 on controller ID:0 is degraded and no longer fault tolerant. Check log for drive errors. (0x2)

WARNING: Drive timeout encountered on port 1 on controller ID:0. Check cables and drives for media errors. (0x9)

On the 3DMD interface's DETAILS page, it stated that "subunit 1 logical drive status" was 'FAILED' but that PORT 1 was "OK".

I went into the CONFIGURATION page and removed the failed drive from the array. I then reselected it to add it back into the array and after a pause, it refused.

"Subunit 1" now states:

Physical drive number: Drive Inaccessible (0xFF)

So, it would appear to me that this drive is toast and needs to be replaced and then I need to rebuild the array.

My questions are:

+ Am I correct or is there another process I should complete before determining this?

+ With the 3ware 3dmd program/interface, do I just need to ship this drive to my colo personnel and have them shut down the server, pull out the bad drive, stick in the good drive, start the server back up, and then call me so I can log in and run "add drive" and "rebuild array" from the web interface?

+ I could not order an 80GB drive like this as quickly as I could just UPS one that I already have on hand, but it was previously formatted for NTFS. Do I need to reformat this drive before adding it to the machine and rebuilding the array, or will 3dmd do all of that itself? I don't want to send this to my colo guy and have him pop it in only for it not to work (or worse, destroy things).

Any help would be greatly appreciated. I run a non-profit hobby-esque service for about 60,000 people and my members would sure appreciate me not screwing things up.

Thank you.

leandean 02-22-2006 11:02 PM

Sounds like the drive is bad, with the caveat that I've never had the daemon die. Everything on the 'new' drive will get overwritten, but it's not wise to tempt fate, so I would wipe the drive before shipping it.

Oops. Forgot to ask if your server supports hot swap. If that's the case your colo will not have to shut the machine down.
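
On wiping the drive before shipping: a rough sketch in Python of one way to zero it from any Linux box, similar in spirit to `dd if=/dev/zero`. The `zero_fill` helper is made up for illustration, and the device path you pass it is entirely your responsibility. It destroys everything at that path, so triple-check which disk you are pointing it at, and run it as root for a real device.

```python
import os

def zero_fill(path, chunk_size=1024 * 1024):
    """Overwrite every byte of an existing file or block device with zeros.

    Returns the number of bytes written. DESTROYS all data at `path`.
    `zero_fill` is a hypothetical helper, not part of any 3ware tool.
    """
    zeros = bytes(chunk_size)
    fd = os.open(path, os.O_WRONLY)
    try:
        # Seeking to the end reports the size of a regular file or,
        # on Linux, of a block device.
        total = os.lseek(fd, 0, os.SEEK_END)
        os.lseek(fd, 0, os.SEEK_SET)
        written = 0
        while written < total:
            # os.write may write fewer bytes than requested, so loop.
            n = os.write(fd, zeros[: min(chunk_size, total - written)])
            written += n
        os.fsync(fd)  # make sure the zeros actually reach the disk
    finally:
        os.close(fd)
    return written
```

Try it on a scratch file first, for example `zero_fill("/tmp/scratch.img")`, before going anywhere near a real disk.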

