LinuxQuestions.org
Linux - General This Linux forum is for general Linux questions and discussion.
If it is Linux Related and doesn't seem to fit in any other forum then this is the place.

Old 12-02-2011, 09:18 AM   #1
misterspookey
LQ Newbie
 
Registered: Dec 2011
Posts: 4

Rep: Reputation: Disabled
Mirror partition keeps failing


Hi,

I am trying to figure out why only one partition of a software RAID1 keeps failing. If the disk were bad, I would expect both the md0 and md1 members to fail. The disks are different sizes, and I'm not sure whether that is causing the issue, or whether it's the SATA cable, the SATA port, or the drive actually going bad. It fails about once a week now, rebuilds OK, then fails again.

sda is 1.5 TB, sdb is 1.0 TB. I've just left the extra 0.5 TB on sda unused. All my data was on the 1 TB drive from a previous failure and no 1 TB drives were readily available, so sdb is an older drive at this point.

Personalities : [raid1]
md0 : active raid1 sdb1[0] sda1[1]
1020032 blocks [2/2] [UU]

md1 : active raid1 sdb3[2](F) sda3[1]
967353856 blocks [2/1] [_U]

unused devices: <none>
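
For anyone who wants to check this kind of output automatically, here is a minimal Python sketch that reads /proc/mdstat-style text and lists which arrays are degraded and which members carry the (F) flag. The sample text is the output pasted above. The function name parse_mdstat and the dictionary layout are made up for illustration; they are not part of mdadm or any library, and the regexes only cover the simple layout shown in this thread.

```python
import re

# Sample copied from the post above; on a live system you would read
# the text from /proc/mdstat instead.
MDSTAT = """\
Personalities : [raid1]
md0 : active raid1 sdb1[0] sda1[1]
1020032 blocks [2/2] [UU]

md1 : active raid1 sdb3[2](F) sda3[1]
967353856 blocks [2/1] [_U]

unused devices: <none>
"""


def parse_mdstat(text):
    """Hypothetical helper: map each md array to its members and counts."""
    arrays = {}
    current = None
    for line in text.splitlines():
        # Array header, e.g. "md1 : active raid1 sdb3[2](F) sda3[1]"
        head = re.match(r"^(md\d+)\s*:\s*\S+\s+\S+\s+(.*)$", line)
        if head:
            current = head.group(1)
            members = re.findall(r"(\w+)\[\d+\](\(F\))?", head.group(2))
            arrays[current] = {
                "members": [name for name, _ in members],
                "failed": [name for name, flag in members if flag],
                "expected": None,
                "active": None,
            }
            continue
        # Status line, e.g. "967353856 blocks [2/1] [_U]"
        counts = re.search(r"\[(\d+)/(\d+)\]", line)
        if counts and current:
            arrays[current]["expected"] = int(counts.group(1))
            arrays[current]["active"] = int(counts.group(2))
    return arrays


arrays = parse_mdstat(MDSTAT)
# An array is degraded when fewer devices are active than expected.
degraded = [
    name
    for name, a in arrays.items()
    if a["active"] is not None and a["active"] < a["expected"]
]

for name in degraded:
    print(name, "is degraded; failed members:", arrays[name]["failed"])
```

In the [2/1] field the first number is how many devices the array expects and the second is how many are currently active, which is why md1 is flagged here while md0 is not.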
 
Old 12-04-2011, 07:20 PM   #2
macemoneta
Senior Member
 
Registered: Jan 2005
Location: Manalapan, NJ
Distribution: Fedora x86 and x86_64, Debian PPC and ARM, Android
Posts: 4,593
Blog Entries: 2

Rep: Reputation: 344
What is the failure in /var/log/messages? Are you getting drive errors when the array has issues? Does SMART report anything in prefail state?
 
Old 12-05-2011, 10:10 AM   #3
misterspookey
LQ Newbie
 
Registered: Dec 2011
Posts: 4

Original Poster
Rep: Reputation: Disabled
Date: Mon, 28 Nov 2011 14:38:06 -0500
From: root <root@alien.xxx>
To: root@alien.xxx
Subject: SMART error (FailedHealthCheck) detected on host: alien.xxx

This email was generated by the smartd daemon running on:

host name: alien
DNS domain: xxx
NIS domain: (none)

The following warning/error was logged by the smartd daemon:

Device: /dev/sdb, not capable of SMART self-check

For details see host's SYSLOG (default: /var/log/messages).

You can also use the smartctl utility for further investigation.
No additional email messages about this problem will be sent.

-=-=-=-=-=-=-=-

Date: Mon, 28 Nov 2011 14:50:02 -0500
From: mdadm monitoring <root@alien.xxx>
To: root@alien.xxx
Subject: DegradedArray event on /dev/md1:alien.xxx

This is an automatically generated mail message from mdadm
running on alien.xxx

A DegradedArray event had been detected on md device /dev/md1.

Faithfully yours, etc.

P.S. The /proc/mdstat file currently contains the following:

Personalities : [raid1]
md0 : active raid1 sdb1[0] sda1[1]
1020032 blocks [2/2] [UU]

md1 : active raid1 sda3[1]
967353856 blocks [2/1] [_U]

unused devices: <none>

-=-=-=-=-=-=-=-=-=-=-=-=-=-

messages.1:Nov 28 11:41:45 alien kernel: disk 0, wo:0, o:1, dev:sdb1
messages.1:Nov 28 14:30:26 alien kernel: dhfis 0x0 dmafis 0x0 sdbfis 0x0
messages.1:Nov 28 14:30:26 alien kernel: ata4: tag : dhfis dmafis sdbfis sacitve
messages.1:Nov 28 14:37:05 alien kernel: dhfis 0x0 dmafis 0x0 sdbfis 0x0
messages.1:Nov 28 14:37:05 alien kernel: ata4: tag : dhfis dmafis sdbfis sacitve
messages.1:Nov 28 14:37:05 alien kernel: dhfis 0x0 dmafis 0x0 sdbfis 0x0
messages.1:Nov 28 14:37:05 alien kernel: ata4: tag : dhfis dmafis sdbfis sacitve
messages.1:Nov 28 14:37:06 alien kernel: dhfis 0x0 dmafis 0x0 sdbfis 0x0
messages.1:Nov 28 14:37:06 alien kernel: ata4: tag : dhfis dmafis sdbfis sacitve
messages.1:Nov 28 14:37:06 alien kernel: sdb: Current [descriptor]: sense key: Aborted Command
messages.1:Nov 28 14:37:06 alien kernel: end_request: I/O error, dev sdb, sector 1953519827
messages.1:Nov 28 14:37:06 alien kernel: SCSI device sdb: 1953525168 512-byte hdwr sectors (1000205 MB)
messages.1:Nov 28 14:37:06 alien kernel: sdb: Write Protect is off
messages.1:Nov 28 14:37:06 alien kernel: SCSI device sdb: drive cache: write back


The I/O error and write-back messages repeat over and over. After a reboot they stop, because sdb no longer appears in the output of 'cat /proc/mdstat':


[root@alien ~]# cat /proc/mdstat
Personalities : [raid1]
md0 : active raid1 sda1[1]
1020032 blocks [2/1] [_U]

md1 : active raid1 sda3[1]
967353856 blocks [2/1] [_U]

unused devices: <none>
 
Old 12-05-2011, 01:44 PM   #4
macemoneta
Senior Member
 
Registered: Jan 2005
Location: Manalapan, NJ
Distribution: Fedora x86 and x86_64, Debian PPC and ARM, Android
Posts: 4,593
Blog Entries: 2

Rep: Reputation: 344
So the drive is hardware failed/failing. Replace it.
 
Old 12-05-2011, 01:58 PM   #5
misterspookey
LQ Newbie
 
Registered: Dec 2011
Posts: 4

Original Poster
Rep: Reputation: Disabled
I understand that resolution, but if it's a hardware problem, why does only one partition go down? Why would /dev/sdb3 be down while /dev/sdb1 stays up? Granted, sdb1 eventually failed too, but why not at the same time? Usage? Stress?
 
Old 12-05-2011, 02:06 PM   #6
macemoneta
Senior Member
 
Registered: Jan 2005
Location: Manalapan, NJ
Distribution: Fedora x86 and x86_64, Debian PPC and ARM, Android
Posts: 4,593
Blog Entries: 2

Rep: Reputation: 344
The drive is reporting a failure in the area relating to sdb3. For example, a sector can no longer be written, but the spare zone is depleted, or the actuator will no longer seek to the area because of a physical binding, or the on-drive cache memory map associated with the area has permanent bit errors. There are many reasons for a portion of a drive to become unusable, but the end result is the same; you need to replace the drive.
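
As a rough illustration of "a failure in the area relating to sdb3", the two kernel log lines from post #3 can be used to work out where on the disk the I/O error occurred. This is only a sketch: the partition table was never posted in this thread, so the idea that sdb3 is the last partition on the disk is an assumption, not something the logs confirm.

```python
import re

# Two kernel log lines copied verbatim from post #3.
LOG = """\
messages.1:Nov 28 14:37:06 alien kernel: end_request: I/O error, dev sdb, sector 1953519827
messages.1:Nov 28 14:37:06 alien kernel: SCSI device sdb: 1953525168 512-byte hdwr sectors (1000205 MB)
"""

SECTOR_BYTES = 512  # sector size as stated in the second log line

bad_sector = int(re.search(r"I/O error, dev sdb, sector (\d+)", LOG).group(1))
total_sectors = int(re.search(r"SCSI device sdb: (\d+) 512-byte", LOG).group(1))

# How far the failing sector sits from the end of the device.
sectors_from_end = total_sectors - bad_sector
bytes_from_end = sectors_from_end * SECTOR_BYTES
fraction = bad_sector / total_sectors

print(f"bad sector {bad_sector} of {total_sectors}")
print(f"{sectors_from_end} sectors ({bytes_from_end / 2**20:.1f} MiB) before the end of the disk")
print(f"position: {fraction:.4%} of the way through the device")
```

The failing sector is within a few MiB of the end of the disk. If sdb3 is the last partition (again, an assumption), the error falls inside sdb3 and well away from sdb1, which would be consistent with md1 dropping its member first while md0 stayed healthy for a while.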
 
1 members found this post helpful.
Old 12-06-2011, 12:55 PM   #7
misterspookey
LQ Newbie
 
Registered: Dec 2011
Posts: 4

Original Poster
Rep: Reputation: Disabled
macemoneta, awesome information. Thanks for your replies; they were very helpful.
 
  



All times are GMT -5.
