LinuxQuestions.org
Linux - Server This forum is for the discussion of Linux Software used in a server related context.

04-04-2007, 08:04 PM   #1
Ossah
LQ Newbie
Registered: Mar 2005
Location: Morristown, NJ USA
Distribution: openSuse 10.2 / Suse 9.3
Posts: 23

Softraid 5 messed up. Please help me recover


Ok folks, please help me out of my misery!

Subject: softraid 5 /dev/md1, 4 disks, no spares, managed by mdadm

- hdc4 was marked as faulty. That has happened before, and I always just resynced it. The HDD was OK; there only seemed to be a loss of communication from time to time.

- removed hdc4 from the array

- added hdc4 to the array

- some dma_intr: error=0x40 errors occurred during the sync

- hdc4 was marked as faulty

- hdg1 was automatically set to a spare disk <--- this was not a spare disk

- 2/4 active disks in RAID 5 -> not working

I have no idea how to recover from this error. The data should still be consistent across the 4 devices, or at least across the 3 devices other than hdc4.

Any help is highly appreciated!!! (I currently have no physical access to the computer, unfortunately)


SYSLOG:
hdc: dma_intr: status=0x51 { DriveReady SeekComplete Error }
hdc: dma_intr: error=0x40 { UncorrectableError }, LBAsect=298161986, high=17, low=12949314, sector=298161983
ide: failed opcode was: unknown
end_request: I/O error, dev hdc, sector 298161983
md: md1: sync done.

... some more I/O ERRORS

Buffer I/O error on device md1, logical block 1064
lost page write due to I/O error on md1
Aborting journal on device md1.
RAID5 conf printout:
--- rd:4 wd:2 fd:2
disk 0, o:1, dev:hdg1
disk 1, o:1, dev:hde1
disk 2, o:1, dev:hdb1
disk 3, o:0, dev:hdc4
journal commit I/O error
ext3_abort called.
EXT3-fs error (device md1): ext3_journal_start_sb: Detected aborted journal
Remounting filesystem read-only
RAID5 conf printout:
--- rd:4 wd:2 fd:2
disk 1, o:1, dev:hde1
disk 2, o:1, dev:hdb1
disk 3, o:0, dev:hdc4
RAID5 conf printout:
--- rd:4 wd:2 fd:2
disk 1, o:1, dev:hde1
disk 2, o:1, dev:hdb1
disk 3, o:0, dev:hdc4
RAID5 conf printout:
--- rd:4 wd:2 fd:2
disk 1, o:1, dev:hde1
disk 2, o:1, dev:hdb1
Buffer I/O error on device md1, logical block 22
lost page write due to I/O error on md1
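The syslog excerpt above already names the failing drive. A minimal bash sketch for tallying `end_request: I/O error` lines per device, so you can see whether the errors keep clustering on one disk. It assumes the kernel messages end up in a plain-text log (`/var/log/syslog` in the usage line is an assumption about the Debian setup), and the function name `count_io_errors` is hypothetical:

```bash
#!/usr/bin/env bash
# Hypothetical helper: count kernel "end_request: I/O error, dev XXX"
# messages per device. Reads log text on stdin, prints "count device".
count_io_errors() {
    grep -o 'end_request: I/O error, dev [a-z0-9]*' \
        | awk '{ print $NF }' \
        | sort \
        | uniq -c \
        | awk '{ print $1, $2 }'
}
```

Usage: `count_io_errors < /var/log/syslog`. Note that error=0x40 with { UncorrectableError } is the drive itself reporting an unreadable sector at the given LBA, which usually points at the media rather than only at a flaky connection; a SMART check on hdc can confirm or rule that out.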


MORE INFO:

already messed up...

venus:~# cat /proc/mdstat
Personalities : [raid1] [raid5]
md1 : active raid5 hdg1[4] hde1[1] hdc4[5](F) hdb1[2]
360182208 blocks level 5, 32k chunk, algorithm 2 [4/2] [_UU_]


unused devices: <none>
venus:~# mdadm --remove /dev/md1 /dev/hdc4


venus:~# mdadm --misc --examine /dev/md1
mdadm: Cannot read superblock on /dev/md1
venus:~# mdadm --misc --detail /dev/md1
/dev/md1:
Version : 00.90.01
Creation Time : Sun Feb 20 20:52:22 2005
Raid Level : raid5
Array Size : 360182208 (343.50 GiB 368.83 GB)
Device Size : 120060736 (114.50 GiB 122.94 GB)
Raid Devices : 4
Total Devices : 3
Preferred Minor : 1
Persistence : Superblock is persistent

Update Time : Thu Apr 5 02:35:15 2007
State : clean, degraded
Active Devices : 2
Working Devices : 3
Failed Devices : 0
Spare Devices : 1

Layout : left-symmetric
Chunk Size : 32K

UUID : faebbe8d:a84d956c:e764d088:1d6be888
Events : 0.3903270

Number Major Minor RaidDevice State
0 0 0 - removed
1 33 1 1 active sync /dev/hde1
2 3 65 2 active sync /dev/hdb1
3 0 0 - removed

4 34 1 - spare /dev/hdg1
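Reading the /proc/mdstat line above: [4/2] means the array is configured for 4 devices with 2 currently working, and each character of [_UU_] is one RaidDevice slot in order (U for an active member, _ for a missing or failed one). So slots 0 and 3 are down, which matches the two "removed" rows in the --detail output. A small bash sketch of that decoding; the function name `missing_slots` is hypothetical:

```bash
#!/usr/bin/env bash
# Hypothetical helper: given the slot-status field of a /proc/mdstat line,
# e.g. "[_UU_]", print the RaidDevice numbers of slots that are not up.
missing_slots() {
    local status=${1#\[}      # drop the leading "["
    status=${status%\]}       # drop the trailing "]"
    local i out=()
    for (( i = 0; i < ${#status}; i++ )); do
        # "U" = active in-sync member, "_" = missing or failed slot
        if [[ ${status:i:1} == "_" ]]; then
            out+=("$i")
        fi
    done
    echo "${out[*]}"
}
```

For this array, `missing_slots '[_UU_]'` prints `0 3`.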
 
04-04-2007, 08:42 PM   #2
Micro420
Senior Member
Registered: Aug 2003
Location: Berkeley, CA
Distribution: Mac OS X Leopard 10.6.2, Windows 2003 Server/Vista/7/XP/2000/NT/98, Ubuntux64, CentOS4.8/5.4
Posts: 2,986

Unfortunately, if more than 1 drive breaks in a RAID5, you have to revert to your backups and replace the faulty drives. You might be able to get some files back, but I wouldn't be surprised if they were corrupt or incomplete. Check your hard drives and make sure there is nothing wrong with them. Are they S.M.A.R.T. enabled? You might be able to run some SMART tests remotely.
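A remote SMART check can be done with smartctl from smartmontools, assuming it is installed on the box: `smartctl -H /dev/hdc` prints the overall health verdict and `smartctl -A /dev/hdc` the attribute table. The helpers below are hypothetical and only parse that output; they assume the usual 10-column attribute layout that smartctl prints for ATA drives:

```bash
#!/usr/bin/env bash
# Hypothetical helpers that parse smartctl output; they do not call smartctl.

# Succeeds if "smartctl -H" output (on stdin) reports the ATA health
# check as PASSED.
smart_passed() {
    grep -q 'overall-health self-assessment test result: PASSED'
}

# Print the RAW_VALUE (10th column) of one named attribute from
# "smartctl -A" output on stdin, e.g. Current_Pending_Sector.
smart_raw() {
    awk -v name="$1" '$2 == name { print $10 }'
}
```

For example `smartctl -H /dev/hdc | smart_passed && echo ok`, or `smartctl -A /dev/hdc | smart_raw Current_Pending_Sector` to see how many sectors the drive is waiting to remap. A long self-test can be started with `smartctl -t long /dev/hdc` and its result read later with `smartctl -l selftest /dev/hdc`.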

Just curious but what distro are you using? I had a hell of a bad time with Ubuntu and RAID5. Drives kept failing when I knew for a fact that my drives were good. I later dumped Ubuntu and went to CentOS and my RAID5 has been running flawlessly. I don't trust Ubuntu anymore. I also had another issue with Ubuntu, but I'll save that for later.

Last edited by Micro420; 04-04-2007 at 08:55 PM.
 
04-04-2007, 09:10 PM   #3
Ossah
LQ Newbie
Registered: Mar 2005
Location: Morristown, NJ USA
Distribution: openSuse 10.2 / Suse 9.3
Posts: 23

Original Poster
Thanks for your reply Micro420.

Distro: Debian Sarge

I don't think it's distro-related; it's always the same damn device.

Restoring from backup doesn't sound good: I've been abroad for a couple of months now and there has been no backup since.

Correct me if I'm wrong:
By the time hdg1 was marked as a spare disk (why or however that happened), md1 was already mounted read-only.

Nothing could have modified md1 or hdg1 => 3 out of 4 should, at least data-wise, still be in sync.

Once, one of my IDE controllers crashed and took 2 HDDs with it. If I remember correctly, I rewrote the RAID superblock and no data was lost.

I don't really like the idea of doing it again. Do I have any other options? Is this a realistic option at all?
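Whether the three remaining members really agree can be checked without writing anything. `mdadm --examine` reads the superblock of a component device (which is likely why running it against /dev/md1 itself fails with "Cannot read superblock"), and members whose Events counters match were last updated together. A minimal bash sketch; the function names `events_of` and `report_events` are hypothetical:

```bash
#!/usr/bin/env bash
# Hypothetical helper: pull the event counter out of "mdadm --examine"
# output on stdin, e.g. "Events : 0.3903270" -> "0.3903270".
events_of() {
    awk -F: '/^ *Events *:/ { gsub(/ /, "", $2); print $2; exit }'
}

# Print "device events" for each component device given as an argument.
# --examine only reads each member's superblock; run it as root.
report_events() {
    local dev
    for dev in "$@"; do
        printf '%s %s\n' "$dev" "$(mdadm --examine "$dev" | events_of)"
    done
}
```

Usage: `report_events /dev/hdg1 /dev/hde1 /dev/hdb1 /dev/hdc4`. The full `--examine` output for each device also shows which role its superblock now records (active slot or spare).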
 
04-04-2007, 10:31 PM   #4
Micro420
Senior Member
Registered: Aug 2003
Location: Berkeley, CA
Distribution: Mac OS X Leopard 10.6.2, Windows 2003 Server/Vista/7/XP/2000/NT/98, Ubuntux64, CentOS4.8/5.4
Posts: 2,986

Wait until you get back, or until someone with more knowledge of fixing RAID can help you. Try not to mess with the system and mdadm too much because you don't want to permanently ruin your RAID5. There could be hope!
 
04-05-2007, 09:21 AM   #5
Ossah
LQ Newbie
Registered: Mar 2005
Location: Morristown, NJ USA
Distribution: openSuse 10.2 / Suse 9.3
Posts: 23

Original Poster
Hi Micro, in principle I totally agree with you. However, it'll take some months until I get physical access to the computer, and I was hoping to find a RAID expert here.

I'm aware of the risk, but might take the dare.

What I do know:
- I do have the RAID configuration file, so I know exactly how it's built.
- I know which device is out of sync / faulty

What I'm trying to figure out:
- I don't know why the one active disk was marked as a spare disk. Anyhow, marking the disk as spare probably only affects the superblock, not the data itself, right?
- If I now write the new superblock with my former configuration, the spare disk should be set back to active.
- Just writing the superblock does not trigger a sync of the disks, right?
- I should be able to mark the faulty disk as faulty before starting the array, so the data out of sync won't mess with the good data.

=> everything should be ok, except that the RAID is running degraded until I can exchange the other disk.

Can anyone agree / disagree on that? Any input appreciated.
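One commonly used way to "rewrite the superblock" is to stop the array (`mdadm --stop /dev/md1`) and run `mdadm --create` again with exactly the original geometry, naming the failed disk as `missing`. That keeps hdc4 out of the array, and with one slot empty a RAID5 has nothing to resync. The sketch below only prints that command so it can be reviewed first. The values are taken from this thread (0.90 superblock, level 5, 4 devices, 32K chunk, left-symmetric layout, slot order hdg1, hde1, hdb1, hdc4 from the "RAID5 conf printout" in the syslog); the function name `recreate_cmd` is hypothetical. A wrong device order or chunk size would produce garbage, so treat this as untested and check your mdadm version's man page for `--metadata` and `--assume-clean`:

```bash
#!/usr/bin/env bash
# Hypothetical dry-run helper: PRINT (not run) an mdadm --create command
# that rewrites the superblocks with the array's original geometry.
# Arguments: array device, then members in RaidDevice order (disk 0..3),
# with the word "missing" in place of the failed member.
# --chunk is given in KiB (32 = the 32K chunk shown by mdadm --detail).
recreate_cmd() {
    local md=$1
    shift
    echo mdadm --create "$md" --metadata=0.90 --level=5 \
        --raid-devices="$#" --chunk=32 --layout=left-symmetric \
        --assume-clean "$@"
}
```

For this array: `recreate_cmd /dev/md1 /dev/hdg1 /dev/hde1 /dev/hdb1 missing`. After running the printed command, `fsck.ext3 -n /dev/md1` (`-n` opens the filesystem read-only and answers "no" to every question) is one way to see whether the data lines up before anything is mounted.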
 
04-05-2007, 10:52 AM   #6
Micro420
Senior Member
Registered: Aug 2003
Location: Berkeley, CA
Distribution: Mac OS X Leopard 10.6.2, Windows 2003 Server/Vista/7/XP/2000/NT/98, Ubuntux64, CentOS4.8/5.4
Posts: 2,986

Quote:
Originally Posted by Ossah
=> everything should be ok, except that the RAID is running degraded until I can exchange the other disk.

Can anyone agree / disagree on that? Any input appreciated.
Don't quote me, but I think having only 2 disks active out of a 4 disk RAID5 means that your RAID is totally broken and unusable. I wouldn't even attempt to write anything for fear of causing more damage. If you could only convert that spare to a live disk again ...
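Why 2 of 4 is fatal while 3 of 4 is only degraded: in RAID5 each stripe holds one parity chunk that is the XOR of the data chunks, so any one missing chunk can be recomputed from the others, but two missing chunks leave one equation with two unknowns. A toy bash model of a single stripe on 4 disks, with small integers standing in for real 32K chunks (real md also rotates which disk holds parity from stripe to stripe, per the left-symmetric layout above); the function name `parity` is hypothetical:

```bash
#!/usr/bin/env bash
# Toy model of one RAID5 stripe: parity is the XOR of the data chunks,
# with small integers standing in for real 32K chunks.
# XOR of all arguments, e.g. "parity 5 9 3" prints 15.
# Rebuilding a lost chunk is the same XOR over the surviving chunks.
parity() {
    local p=0 x
    for x in "$@"; do
        p=$(( p ^ x ))
    done
    echo "$p"
}
```

Here d0=5, d1=9, d2=3 give parity 15. If the disk holding d1 drops out, `parity 5 3 15` prints 9 and the chunk is recovered. If d1 and d2 are both gone, `parity 5 15` prints 10, which is only d1 XOR d2, not either value.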
 
  

