LinuxQuestions.org
Old 06-05-2008, 08:47 PM   #1
apomatix
LQ Newbie
 
Registered: Jun 2008
Location: Near Boston, MA, USA
Distribution: SuSE
Posts: 7

Rep: Reputation: 0
RAID1 array rebuild fails at 99.9% recovery


I am running SuSE 10.1 with kernel 2.6.16.13-4-smp.

I have 4 SCSI drives. /dev/sda and /dev/sdb are partitioned and mirrored as the RAID1 arrays /dev/md0, /dev/md1, /dev/md2 and /dev/md3. /dev/sdc and /dev/sdd have only one partition each and form /dev/md4.

For some reason I don't understand, /dev/sdd and /dev/sdb are not actually in the arrays. The system works fine like this, but I want mirroring for redundancy. Here is /proc/mdstat:

Code:
Personalities : [raid1]
md4 : active raid1 sdd1[2](F) sdc1[0]
      312560512 blocks [2/1] [U_]
      [===================>.]  recovery = 99.9% (312559296/312560512) finish=0.0min speed=17772K/sec

md3 : active raid1 sda5[0]
      285193792 blocks [2/1] [U_]

md0 : active raid1 sda1[0]
      104320 blocks [2/1] [U_]

md2 : active raid1 sda3[0]
      20972736 blocks [2/1] [U_]

md1 : active raid1 sda2[0]
      6297408 blocks [2/1] [U_]

unused devices: <none>
As you can see, I have tried to add /dev/sdd. As the recovery progressed there were no problems, but once it got to 99.9% it froze up. Now any operation involving /dev/md4 just hangs. This includes any file access or mdadm-related query. In particular removing it from the array does not work because it says the device is busy. Additionally the computer freezes randomly for a few seconds every minute, which did not happen before I "added" /dev/sdd to /dev/md4.
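A minimal Python sketch of how the mdstat output above can be read: it picks out the members flagged (F), the [total/active] slot counts, and how many blocks the recovery still had to copy. The function name parse_mdstat and the dictionary keys are made up for illustration, and the sample text is a trimmed copy of the listing above. It assumes the usual mdstat layout and is not a general parser.

```python
import re

# Trimmed copy of the /proc/mdstat text quoted above (md4 and md3 only).
MDSTAT = """\
Personalities : [raid1]
md4 : active raid1 sdd1[2](F) sdc1[0]
      312560512 blocks [2/1] [U_]
      [===================>.]  recovery = 99.9% (312559296/312560512) finish=0.0min speed=17772K/sec

md3 : active raid1 sda5[0]
      285193792 blocks [2/1] [U_]
"""


def parse_mdstat(text):
    """Return {array_name: info} for each md array in mdstat-style text.

    Hypothetical helper for illustration; it only understands the line
    shapes that appear in the sample above.
    """
    arrays = {}
    current = None
    for line in text.splitlines():
        head = re.match(r"(md\d+) : (\w+) (\w+) (.*)$", line)
        if head:
            name, state, level, members = head.groups()
            current = {
                "state": state,
                "level": level,
                "members": re.findall(r"(\w+)\[\d+\]", members),
                # A trailing (F) means md has marked that member faulty.
                "faulty": re.findall(r"(\w+)\[\d+\]\(F\)", members),
            }
            arrays[name] = current
            continue
        if current is None:
            continue
        # "[2/1]" reads as: 2 slots in the array, 1 currently active.
        slots = re.search(r"\[(\d+)/(\d+)\]", line)
        if slots:
            current["total_slots"] = int(slots.group(1))
            current["active_slots"] = int(slots.group(2))
        rec = re.search(r"recovery\s*=\s*[\d.]+%\s*\((\d+)/(\d+)\)", line)
        if rec:
            done, total = int(rec.group(1)), int(rec.group(2))
            current["remaining_blocks"] = total - done
    return arrays


if __name__ == "__main__":
    for name, info in parse_mdstat(MDSTAT).items():
        degraded = info["active_slots"] < info["total_slots"]
        print(name, "degraded" if degraded else "ok",
              "faulty:", info["faulty"],
              "remaining blocks:", info.get("remaining_blocks"))
```

Read this way, md4 already shows sdd1 as faulty while the recovery counter sits 1216 blocks short of the end, so the 99.9% figure may simply be where the display stopped updating rather than where the copy actually failed.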

If I try to reboot with shutdown -r the computer hangs, possibly while it is trying to unmount /dev/md4. I then have to hit the power button or the reset button. It reboots OK and even runs OK for a few minutes while it tries to recover the array. Once it gets to 99.9% recovered, though, the hanging starts all over again. The only way to break the cycle is to unplug the hard drive. Then the computer runs great again, with no hanging, except that I am back where I started, with no mirroring.

I checked /var/log/messages and see error messages such as the following:

Code:
Jun  5 22:05:53 innateimmunity kernel: ata4: command 0x35 timeout, stat 0xd0 host_stat 0x21
Jun  5 22:05:53 innateimmunity kernel: ata4: translated ATA stat/err 0xd0/00 to SCSI SK/ASC/ASCQ 0xb/47/00
Jun  5 22:05:53 innateimmunity kernel: ata4: status=0xd0 { Busy }
Jun  5 22:05:53 innateimmunity kernel: sd 3:0:0:0: SCSI error: return code = 0x8000002
Jun  5 22:05:53 innateimmunity kernel: sdd: Current: sense key: Aborted Command
Jun  5 22:05:53 innateimmunity kernel:     Additional sense: Scsi parity error
Jun  5 22:05:53 innateimmunity kernel: end_request: I/O error, dev sdd, sector 191836447
They appear roughly once per minute, with the sector number increasing by 8 each minute. If I reboot and recover to 99.9% the same thing happens, but the sector number might be completely different. I have no idea what these messages mean.
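As a reading aid for the log lines above, here is a small Python sketch that pulls out the failing device and sector and decodes the status byte 0xd0 into the standard ATA status bits. The helper names are invented for illustration. It assumes the kernel counts in 512-byte sectors, in which case a step of 8 sectors is one 4 KiB chunk. It decodes what the log says; it does not diagnose the cause.

```python
import re

# Three of the log lines quoted above.
LOG = """\
Jun  5 22:05:53 innateimmunity kernel: ata4: command 0x35 timeout, stat 0xd0 host_stat 0x21
Jun  5 22:05:53 innateimmunity kernel: ata4: status=0xd0 { Busy }
Jun  5 22:05:53 innateimmunity kernel: end_request: I/O error, dev sdd, sector 191836447
"""

# ATA status register bits, highest bit first.
STATUS_BITS = {
    0x80: "BSY",   # device busy
    0x40: "DRDY",  # device ready
    0x20: "DF",    # device fault
    0x10: "DSC",   # seek complete (obsolete in later ATA revisions)
    0x08: "DRQ",   # data request
    0x01: "ERR",   # error
}

# Assumption: block-layer sector numbers are in 512-byte units.
SECTOR_BYTES = 512


def decode_status(value):
    """Return the names of the status bits set in an ATA status byte."""
    return [name for bit, name in STATUS_BITS.items() if value & bit]


def failed_sectors(text):
    """Return (device, sector) pairs from 'end_request: I/O error' lines."""
    pattern = r"I/O error, dev (\w+), sector (\d+)"
    return [(m.group(1), int(m.group(2))) for m in re.finditer(pattern, text)]


def sectors_to_bytes(count):
    """Convert a sector count to bytes under the 512-byte assumption."""
    return count * SECTOR_BYTES


if __name__ == "__main__":
    print("status 0xd0 ->", decode_status(0xD0))
    print("failed:", failed_sectors(LOG))
    print("8 sectors =", sectors_to_bytes(8), "bytes")
```

Command 0x35 is WRITE DMA EXT in the ATA command set, so under that reading each message is a write to sdd timing out while the drive still reports busy, and the error moves forward one 4 KiB chunk per retry.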

/dev/sdb appears to suffer from exactly the same problem.

I have tried replacing the hard drive but this doesn't help. I also ran SeaTools on both /dev/sdb and /dev/sdd and both drives passed the LONG TEST. So I don't think there is anything physically wrong with the drives.

I would greatly appreciate anyone's thoughts on how to fix this!

Last edited by apomatix; 06-05-2008 at 09:01 PM. Reason: add detail about system log messages
 
Old 06-05-2008, 09:43 PM   #2
stress_junkie
Senior Member
 
Registered: Dec 2005
Location: Massachusetts, USA
Distribution: Ubuntu 10.04 and CentOS 5.5
Posts: 3,873

Rep: Reputation: 331
I got this reference from another RAID problem at this web site. It pays to search for similar problems before you post a question.

http://www.howtoforge.com/replacing_..._a_raid1_array
 
Old 06-05-2008, 10:46 PM   #3
apomatix
LQ Newbie
 
Registered: Jun 2008
Location: Near Boston, MA, USA
Distribution: SuSE
Posts: 7

Original Poster
Rep: Reputation: 0
That is a great website and it shows all the steps in detail. Thank you for posting it. I actually followed that exact site when I replaced the drive. Unfortunately, the last step in the process, rebuilding the array, does not work on my particular machine.
 
Old 06-06-2008, 06:30 AM   #4
stress_junkie
Senior Member
 
Registered: Dec 2005
Location: Massachusetts, USA
Distribution: Ubuntu 10.04 and CentOS 5.5
Posts: 3,873

Rep: Reputation: 331
I'm wondering if there is a problem with the disk driver in Linux. I've had trouble similar to yours, but with encrypting disk partitions instead of RAID. Near the end of the encryption process it sometimes hangs, and eventually other processes that try to access the disk hang as well. The difference is that this only happens on some disks; changing to another disk works around the problem.

You might be able to find some information at http://kerneltrap.org. That site shows a lot of the behind-the-scenes communications between Linux developers on numerous issues. I haven't searched there for disk i/o problems yet.

I'm surprised that nobody else has had any information to contribute to this thread. That suggests that this problem is not widely experienced or not widely understood. Too bad for us.
 
  


Tags
raid1, recovery

