LinuxQuestions.org
Old 08-13-2007, 02:58 AM   #1
jostmart
LQ Newbie
 
Registered: Jul 2006
Posts: 8

Rep: Reputation: Disabled
RAID-1 with mdadm. Disk fails sometime.


Hi all
I seem to have some kind of problem with my software RAID. It's a RAID-1 setup with mdadm. The kernel logs errors, and the partition with trouble is kicked out of the array.


WARNING: Kernel Errors Present
Additional sense: Unrecovered read error - auto reallocat ...: 42 Time(s)
ata1.00: tag 0 cmd 0x25 Emask 0x9 stat 0x51 err 0x40 (media error) ...: 12 Time(s)
ata1.00: tag 0 cmd 0xc8 Emask 0x1 stat 0x51 err 0x1 (device error) ...: 1 Time(s)
ata1.00: tag 0 cmd 0xc8 Emask 0x9 stat 0x51 err 0x40 (media error) ...: 350 Time(s)
end_request: I/O error, dev sda, sector ...: 42 Time(s)
raid1:md0: read error corrected (8 sec ...: 11 Time(s)
sd 0:0:0:0: SCSI error: return code = 0 ...: 42 Time(s)
sda: Current: sense key: Medium Error ...: 42 Time(s)

100 Time(s): SCSI device sda: 490234752 512-byte hdwr sectors (251000 MB)
100 Time(s): SCSI device sda: drive cache: write back
363 Time(s): ata1.00: (BMDMA stat 0x20)
363 Time(s): ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
363 Time(s): ata1: EH complete
1 Time(s): printk: 1 messages suppressed.
1 Time(s): printk: 5 messages suppressed.
1 Time(s): raid1: sda1: redirecting sector 1491784 to another mirror
100 Time(s): sda: Mode Sense: 00 3a 00 00
100 Time(s): sda: Write Protect is off




Kernel: Linux helium 2.6.18-4-amd64 #1 SMP
 
Old 08-13-2007, 09:47 PM   #2
macemoneta
Senior Member
 
Registered: Jan 2005
Location: Manalapan, NJ
Distribution: Fedora x86 and x86_64, Debian PPC and ARM, Android
Posts: 4,593
Blog Entries: 2

Rep: Reputation: 344
Did you have a question, or was this just informational that the kernel and RAID-1 are working properly?
 
Old 08-14-2007, 02:25 AM   #3
jostmart
LQ Newbie
 
Registered: Jul 2006
Posts: 8

Original Poster
Rep: Reputation: Disabled
The problem is that I don't know where the problem with the RAID is: whether it's a faulty disk or something in the kernel. So I need some guidance on how to diagnose it.

Sorry for being unclear!
 
Old 08-14-2007, 11:02 AM   #4
macemoneta
Senior Member
 
Registered: Jan 2005
Location: Manalapan, NJ
Distribution: Fedora x86 and x86_64, Debian PPC and ARM, Android
Posts: 4,593
Blog Entries: 2

Rep: Reputation: 344
The messages indicate /dev/sda1 has medium (disk surface) errors. Swap it out.
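
One way to cross-check the kernel messages is the drive's own SMART counters. This is only a minimal sketch, assuming smartmontools is installed and the disk is /dev/sda as in the log above. The helper name smart_bad_sector_counts is made up for illustration; it reads `smartctl -A` output on stdin and prints the raw values of the attributes that usually track bad sectors.

```shell
#!/bin/sh
# Hypothetical helper: filter `smartctl -A` output (on stdin) down to the
# attributes that commonly track bad sectors, printing "NAME RAW_VALUE".
# Assumes the usual 10-column attribute table; attribute names can vary
# between drive vendors, so an empty result does not prove the disk is fine.
smart_bad_sector_counts() {
    awk '$2 == "Reallocated_Sector_Ct" ||
         $2 == "Current_Pending_Sector" ||
         $2 == "Offline_Uncorrectable" { print $2, $10 }'
}

# Usage (needs root and smartmontools):
#   smartctl -A /dev/sda | smart_bad_sector_counts
```

Non-zero and growing raw values here would point at the disk rather than the kernel.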
 
Old 08-15-2007, 03:13 AM   #5
jostmart
LQ Newbie
 
Registered: Jul 2006
Posts: 8

Original Poster
Rep: Reputation: Disabled
What in the messages indicates a surface error? I don't doubt that this is the case, since I've had a great many disks from the same manufacturer fail in different machines. I'm just curious.

Are the problems unrecoverable, or can something (like the kernel) 'tag' the bad sectors to avoid using them?

Another thing I'm wondering about is why several weeks pass between the times a partition is dropped from the RAID. Maybe this has something to do with bad sector marking? One of the partitions I've had trouble with has been running without problems for about 3 weeks now since it happened.
 
Old 08-15-2007, 05:13 AM   #6
macemoneta
Senior Member
 
Registered: Jan 2005
Location: Manalapan, NJ
Distribution: Fedora x86 and x86_64, Debian PPC and ARM, Android
Posts: 4,593
Blog Entries: 2

Rep: Reputation: 344
Quote:
Originally Posted by jostmart
What in the messages indicates a surface error? I don't doubt that this is the case, since I've had a great many disks from the same manufacturer fail in different machines. I'm just curious.
"Medium Error"

The medium in a fixed disk is the disk surface.
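
To see how often those markers occur, the kernel log can be filtered for them. A rough sketch, assuming the raw log lines contain the same strings as the summary posted above ("sense key: Medium Error" and "(media error)"); count_medium_errors is a made-up name.

```shell
#!/bin/sh
# Hypothetical helper: count log lines (on stdin) that report a medium error.
# The two patterns are copied from the messages quoted in this thread; other
# kernel versions or drivers may word them differently.
count_medium_errors() {
    grep -c -i -E 'sense key: medium error|\(media error\)'
}

# Usage:
#   dmesg | count_medium_errors
```

A "(device error)" line is deliberately not counted, since it does not name the medium.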

Quote:
Are the problems unrecoverable, or can something (like the kernel) 'tag' the bad sectors to avoid using them?
You can try re-adding the drive to the array, and the kernel may be able to map around the problem. If not (too many errors), it will drop out again. You can repeat this process until it works or you get tired.
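
Re-adding might look roughly like the sketch below. It assumes the array is /dev/md0 and the dropped member is /dev/sda1, as the log suggests. readd_member and the RUN switch are made up for illustration; by default the commands are only printed, since they need root and a real array.

```shell
#!/bin/sh
# Sketch: put a dropped member back into a RAID-1 array.
# Prints the commands unless RUN=1 is set in the environment.
# --remove clears a member that md still lists as faulty; it fails harmlessly
# if the member is already gone. --add then triggers a resync onto the disk.
readd_member() {
    array=$1
    member=$2
    for cmd in \
        "mdadm $array --remove $member" \
        "mdadm $array --add $member" \
        "cat /proc/mdstat"
    do
        if [ "${RUN:-0}" = "1" ]; then
            $cmd            # unquoted on purpose: split into command + args
        else
            echo "$cmd"
        fi
    done
}

# Usage (dry run):
#   readd_member /dev/md0 /dev/sda1
```

Resync progress shows up in /proc/mdstat; if the disk drops out again during the resync, the damaged area is probably too large to work around.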

You can also try using the drive as an individual unit (in a workstation for example). To initialize and test/remap run (for example):

mke2fs -j -m 0 -c -c /dev/sda1

The '-c -c' performs a read/write test during the initialization to identify and map out bad sectors.

Quote:
Another thing I'm wondering about is why several weeks pass between the times a partition is dropped from the RAID. Maybe this has something to do with bad sector marking? One of the partitions I've had trouble with has been running without problems for about 3 weeks now since it happened.
If you mean that you re-added the drive and it dropped out again several weeks later, that's just a function of when the damaged area is encountered.
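
Rather than waiting for normal I/O to stumble on the damaged area, md can be asked to read the whole array. A sketch, assuming a kernel that exposes md/sync_action in sysfs; start_md_check and the SYSFS_ROOT variable are made up here so the function can be tried against a fake directory tree.

```shell
#!/bin/sh
# Sketch: start an md "check" pass, which reads every block of the array so
# that latent bad areas are hit now instead of weeks later.
# SYSFS_ROOT is a hypothetical override for testing; it defaults to /sys.
start_md_check() {
    md_name=$1                      # e.g. md0
    action_file="${SYSFS_ROOT:-/sys}/block/$md_name/md/sync_action"
    if [ ! -w "$action_file" ]; then
        echo "cannot write $action_file (not root, or no such array?)" >&2
        return 1
    fi
    echo check > "$action_file"
}

# Usage (as root):
#   start_md_check md0 && cat /proc/mdstat
```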

In a production environment, folks usually just swap the drive and return it to the manufacturer for a replacement (if it's still under warranty), or give it to employees (after wiping it) to play with.

Such drives still have a useful life (though with reduced capacity). If you can't map out the area (it's too big), you can allocate the damaged area to a partition that you don't use. I've gotten several additional years of use out of "bad" drives. Most people don't consider it worth their time to play with.
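
Working out where such an unused partition should go is simple arithmetic. The sketch below assumes 512-byte sectors (as the log reports for sda) and a sector number relative to the start of the disk; a number reported relative to a partition or md device would need that offset added first. guard_range and its 100 MiB default margin are illustrative choices, not a standard tool.

```shell
#!/bin/sh
# Sketch: turn a bad sector number into a "START END" range in MiB that an
# unused partition could cover, with a safety margin on both sides.
guard_range() {
    sector=$1
    margin_mib=${2:-100}
    mib=$(( sector / 2048 ))            # 2048 sectors of 512 bytes = 1 MiB
    start=$(( mib - margin_mib ))
    if [ "$start" -lt 0 ]; then
        start=0
    fi
    end=$(( mib + margin_mib + 1 ))
    echo "$start $end"
}

# Usage, with the sector number from the "redirecting sector" log line:
#   guard_range 1491784 100
```

For sector 1491784 this prints "628 829", i.e. the range from 628 MiB to 829 MiB would be left in a partition that is never used.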
 
  



Tags
kernel, mdadm, raid


