LinuxQuestions.org
Old 08-02-2005, 06:09 PM   #1
Diablo3d
LQ Newbie
 
Registered: Aug 2005
Posts: 10

Rep: Reputation: 0
Software Raid, Bad sectors (drive DOA)


I've done a lot of googling and can't find anything more than "this should do that..." answers for hypothetical situations. Here is what's going on:

2x SATA drives connected via a Sil3114, running software RAID-1 on kernel 2.6.9-11, CentOS 4.1 (similar to Red Hat Enterprise Linux 4, Update 1).

One of the drives has visible bad sectors. That is, the drive's automatic sector reallocation is exhausted, and it now returns "Sector unreadable/writable" to the driver for certain parts of the disk. When the kernel is about 20% of the way through rebuilding the mirror, I start getting "sense key read error" and "unrecovered read error" on the failed disk. The drive appears as a SCSI device.

Now, theoretically, according to threads on the subject in linux.kernel on Usenet, the RAID driver should fail the drive if it has trouble with the disk. What kind of trouble, and how much of it, I haven't been able to discover. If I pull the drive while the system is running, the driver fails the disk after hanging for a few seconds, which is acceptable. I guess it detects an unrecoverable error and appropriately fails the drive.

The problem is that with these read errors, it isn't failing the drive. My message log is just filling up with "Cannot read sector 200001... cannot read sector 200002" for hours, and presumably will continue until it reaches the last bad sector. Meanwhile the RAID array is pretty much unusable by anything else. Presumably the system will stop serving web pages, etc. in this state as well.

So I could just pull the drive completely, MD will fail the drive, and the system will continue on about its business.

The problem is, soon this server will go into production in a co-location facility, and response time under an hour to swap a hard drive will get pretty expensive. Ideally I am trying to find a solution that will let me tell the raid driver to "Fail drive after X consecutive soft errors."

The consensus seems to be that the MD driver does not handle visible bad sectors well.

Alternatively, a setting in the sata_sil driver or the SCSI subsystem that would tell it to disconnect the drive after consecutive read/write errors would be nice as well.

In theory I could have a cron job check the logs for these types of messages and manually fail the drive, but that would be difficult, as the array seems to lock up completely when this happens. Or if I had a job run every 5 seconds, maybe it would stay in memory?
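
The log-watching idea could be sketched as a small long-running process instead of a cron job. This is only a rough sketch under several assumptions: the message pattern below is illustrative (the exact kernel wording depends on the driver and kernel version), the names `should_fail` and `fail_member` are made up for this example, and any non-matching log line resets the streak, which is a simplification. It also does nothing about the array locking up, so the process may still stall if it has to touch the affected disks.

```python
import re
import subprocess

# Assumed message format; the exact kernel wording depends on the driver and
# kernel version, so this pattern is illustrative and must be adapted.
IO_ERROR = re.compile(r"I/O error, dev (\w+), sector (\d+)")


def should_fail(lines, device, threshold):
    """Return True once `threshold` error lines for `device` appear in a row.

    Any line that is not an error for `device` resets the streak.
    """
    streak = 0
    for line in lines:
        match = IO_ERROR.search(line)
        if match and match.group(1) == device:
            streak += 1
            if streak >= threshold:
                return True
        else:
            streak = 0
    return False


def fail_member(array, member):
    """Ask mdadm to mark `member` as faulty in `array` (requires root)."""
    subprocess.run(["mdadm", "--manage", array, "--fail", member], check=True)
```

A watcher would feed recent log lines to `should_fail(lines, "sdb", 10)` and, if it returns True, call `fail_member("/dev/md0", "/dev/sdb1")`. The device names and the threshold of 10 are placeholders, not values from this system.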

Any thoughts or suggestions on this would be greatly appreciated.

Thx.
 
Old 08-02-2005, 06:41 PM   #2
ironwalker
Member
 
Registered: Feb 2003
Location: 1st hop-NYC/NewJersey shore,north....2nd hop-upstate....3rd hop-texas...4th hop-southdakota(sturgis)...5th hop-san diego.....6th hop-atlantic ocean! Final hop-resting in dreamland dreamwalking and meeting new people from past lives...gd' night.
Distribution: Siduction, the only way to do Debian Unstable
Posts: 506

Rep: Reputation: Disabled
fsck --fix-fixable
If it's really bad, it will tell you to use --rebuild-tree.
I have reiserfs, so I run reiserfsck --fix-fixable.
The drive must not be mounted for it to work.
 
Old 08-02-2005, 06:44 PM   #3
Diablo3d
LQ Newbie
 
Registered: Aug 2005
Posts: 10

Original Poster
Rep: Reputation: 0
Thanks for the reply, ironwalker, but I'm not interested in fixing the bad sectors, or even any of the data (it is a fresh install).

I'm interested in getting the MD driver to fail the drive automatically when the SCSI/sata_sil driver reports bad sectors.

It does this properly when I remove the drive from the system completely, but the driver seems to ignore read/write errors.
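
One way to confirm whether MD has actually marked a member as failed is to look for the (F) flag next to the device in /proc/mdstat. Here is a minimal sketch, assuming the usual layout where each array line looks like "md0 : active raid1 sdb1[1](F) sda1[0]"; the function name `failed_members` is made up for this example.

```python
import re

# Matches member entries such as "sdb1[1](F)" or "sda1[0]" on an md status line.
MEMBER = re.compile(r"(\w+)\[\d+\](\(F\))?")


def failed_members(mdstat_text):
    """Map each md array name to the list of members flagged (F)."""
    result = {}
    for line in mdstat_text.splitlines():
        # Array lines start with the array name, e.g. "md0 : active raid1 ..."
        if not line.startswith("md") or " : " not in line:
            continue
        name, rest = line.split(" : ", 1)
        result[name.strip()] = [
            m.group(1) for m in MEMBER.finditer(rest) if m.group(2)
        ]
    return result
```

Reading the live status would be `failed_members(open("/proc/mdstat").read())`. An empty list for an array means no member is currently flagged as failed.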

Thx.
 
  

