LinuxQuestions.org
Old 11-22-2010, 12:25 PM   #1
itjstagame
LQ Newbie
 
Registered: Nov 2010
Posts: 2

Rep: Reputation: 0
Software Raid, lots of issues after a power outage, please help me keep data


Ok, I've had a 4x 400GB RAID 5 running for exactly 3 years now. There have been plenty of power outages in that time and lots of resyncing afterwards, but it has always come back with no issues.

Last week after a power outage and reboot I went to check the status of the resync and it only listed 3 hdds.

Looking at dmesg:

md: bind<sdb>
md: bind<sdc>
md: bind<sdd>
md: bind<sda>
md: kicking non-fresh sdd from array!
md: unbind<sdd>
md: export_rdev(sdd)
md: md0: raid array is not clean -- starting background reconstruction
raid5: device sda operational as raid disk 0
raid5: device sdc operational as raid disk 2
raid5: device sdb operational as raid disk 1
raid5: cannot start dirty degraded array for md0
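
For anyone reading along, the md lines above can be summarised mechanically. Here is a minimal Python sketch (the helper name `summarize_md_log` is made up for illustration) that takes the dmesg excerpt exactly as pasted and reports which members were bound, which were kicked as non-fresh, and which raid slots ended up operational.

```python
import re

# dmesg excerpt copied verbatim from the post above
DMESG = """\
md: bind<sdb>
md: bind<sdc>
md: bind<sdd>
md: bind<sda>
md: kicking non-fresh sdd from array!
md: unbind<sdd>
md: export_rdev(sdd)
md: md0: raid array is not clean -- starting background reconstruction
raid5: device sda operational as raid disk 0
raid5: device sdc operational as raid disk 2
raid5: device sdb operational as raid disk 1
raid5: cannot start dirty degraded array for md0
"""

def summarize_md_log(text):
    """Hypothetical helper: collect member state from md/raid5 kernel lines."""
    bound = re.findall(r"md: bind<(\w+)>", text)
    kicked = re.findall(r"md: kicking non-fresh (\w+) from array", text)
    # Map raid slot number -> device name for members md reports as operational
    slots = {
        int(slot): dev
        for dev, slot in re.findall(
            r"raid5: device (\w+) operational as raid disk (\d+)", text
        )
    }
    refused = "cannot start dirty degraded array" in text
    return {"bound": bound, "kicked": kicked, "slots": slots, "refused": refused}

summary = summarize_md_log(DMESG)
print(summary)
```

In this log slots 0 to 2 are filled and the fourth member (sdd) is the one that was kicked, so the array was both unclean and one disk short when md refused to start it.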

Going by information I found online, I ran mdadm with --fail and --remove on sdd, which reported that sdd was already removed (which made sense since only 3 drives were listed).
Then I ran mdadm -R /dev/md0

It came online and started resyncing, so I thought all was well and left it.

After the resync finished I tried to access some recently downloaded files and was not able to write to the disk or read the files I wanted.

Checking syslog, I saw thousands of these:
Nov 22 07:49:02 Byznotchnyai kernel: attempt to access beyond end of device
Nov 22 07:49:02 Byznotchnyai kernel: md0: rw=0, want=14963797120, limit=2344267776
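
To put those numbers in perspective: as far as I know, `want` and `limit` in this kernel message are counted in 512-byte sectors. Under that assumption, a quick Python sketch converts both values to bytes so they can be compared with the expected size of the array.

```python
SECTOR_BYTES = 512  # assumption: the kernel reports want/limit in 512-byte sectors

want = 14963797120   # from the syslog line above
limit = 2344267776   # from the syslog line above

limit_bytes = limit * SECTOR_BYTES
want_bytes = want * SECTOR_BYTES

print(f"md0 size : {limit_bytes / 1e9:.1f} GB")
print(f"requested: {want_bytes / 1e9:.1f} GB")
print(f"requested offset is {want / limit:.2f} times the device size")
```

The limit works out to roughly 1.2 TB, which is what three data disks' worth of 400GB drives should give, so md0 itself appears to have the expected size. The request is aimed at roughly 7.7 TB, an offset that a 1.2 TB filesystem should never reference. That would be consistent with damaged block pointers in the filesystem metadata rather than a wrongly sized array, though that is only a guess.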

cat /proc/mdstat seemed fine; it thought the array was sound. Everything I read online says it must just be errors in the ext3 filesystem itself and to run fsck, but I worry that if the problem is really md related, 'fixing' it in ext3 will just delete almost all of my files.

At first the 'bad' files seemed to be files that were in the process of downloading when the power went out, so I thought, ok, that's fine. But then I noticed stuff that had been finished for weeks wasn't working. So I started copying off things that are irreplaceable (like 5 years of pictures), and even some of those are throwing I/O errors.

It seems a good 1/4 to 1/3 of all of my files are 'bad', and I know if I let fsck do its thing it'll just delete them all.

The hdd itself (sdd) seems fine: all of the drives report the same info in smartctl and don't throw any errors, so I don't know why just that one would be non-fresh or why a resync would trash my data.

I've heard of backup superblocks but I'm not sure how to find them. Does anyone have any suggestions on how to either reassemble the md array (which I've seen mentioned a few times but which also worries me) or how to check whether it's really an ext3 problem? Or how to see which hdd in the array is throwing the I/O errors? Maybe it really is just a bad drive somehow.

I'm really at a loss, and I'm very annoyed because I put all of my important info on my RAID 5 thinking it'd be 'safer' than another method. Thanks so much.
 
Old 11-22-2010, 04:06 PM   #2
jefro
Moderator
 
Registered: Mar 2008
Posts: 21,939

Rep: Reputation: 3619
In my opinion the problem was the software RAID; I have never liked them. A true hardware RAID may also have left you in the ditch, though.

My only guess is the partitions have overlapped but that is a wild guess.

Another issue is what type of journaling is in use on the ext3 filesystem.

You might boot to a live CD and then see how it accesses the array and whether any tools there can recover it. http://planet.admon.org/howto/using-...to-check-ext3/
 
Old 11-22-2010, 06:18 PM   #3
itjstagame
LQ Newbie
 
Registered: Nov 2010
Posts: 2

Original Poster
Rep: Reputation: 0
It's purely a data partition, so I can test and try to fix it from inside my running system.

It's ext3; I didn't know there were different journaling options.

I mean, my array is reporting all good on the md front. It's just ext3 that's telling me there are errors and that a check should be forced, which would be fine normally, but it is flagging a significant number of my files as errors.

It makes me wonder if the drives are in the wrong order or the parity is messed up somehow. I did just try failing sdd again to see if the array could run off just the other 3 hdds using parity, in case the data on sdd was wrong, but the filesystem and I/O issues are exactly the same.
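
On the parity question, a toy Python sketch of how RAID 5 parity behaves may help explain why failing sdd made no difference. It models one stripe as three data chunks plus an XOR parity chunk. The chunk contents and helper name are invented for illustration, and real md also rotates parity across the disks, which is ignored here. The point is that once parity has been recomputed over whatever is on the disks, dropping a member and rebuilding it from the others returns exactly the same bytes, right or wrong.

```python
from functools import reduce

def xor_chunks(chunks):
    """XOR equal-length byte strings together, column by column."""
    return bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*chunks))

# One toy stripe: three data chunks (made-up contents) plus parity
good = [b"AAAA", b"BBBB", b"CCCC"]
parity = xor_chunks(good)

# A lost chunk is recovered by XOR-ing the survivors with the parity
assert xor_chunks([good[0], good[1], parity]) == good[2]

# Now suppose one chunk holds stale data and parity is recomputed over it.
# The recomputation trusts the data chunks as they are.
stale = [b"AAAA", b"BBBB", b"old!"]
resynced_parity = xor_chunks(stale)

# Rebuilding the third chunk from the others reproduces the stale bytes
rebuilt = xor_chunks([stale[0], stale[1], resynced_parity])
print(rebuilt)
```

So identical errors with and without sdd do not by themselves show that the data is intact, only that the array is now self-consistent.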

On that note, I just realized it's strange the array didn't start with just 3 drives in the first place; one non-fresh disk shouldn't be an issue with RAID 5. Also, while copying data off, it seems like anything from Oct or Nov is having an issue, while older stuff seems mostly fine. I'm at a loss.

I am trying to copy off what I can for now, but I'm not getting a lot. I guess if I have to rebuild I'll go with RAID 1 and LVM, or maybe 0+1; at least with mirroring I will know how to sanely get at my data.
 
Old 11-22-2010, 07:44 PM   #4
jefro
Moderator
 
Registered: Mar 2008
Posts: 21,939

Rep: Reputation: 3619
Well, there are plenty of posts on the backup superblocks.

You can certainly try fsck.ext3 (http://linux.die.net/man/8/fsck.ext3) with a backup superblock. It will tell you if you have the wrong format.
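
For reference, backup superblock locations follow a fixed pattern on ext2/ext3 when the sparse_super feature is enabled (the usual default): copies sit at the start of block group 1 and of groups that are powers of 3, 5 and 7. Below is a minimal Python sketch, assuming 4 KiB blocks with 32768 blocks per group and a filesystem spanning the whole 2344267776-sector md0. None of that has been confirmed for this array, so treat the output as a guess; `dumpe2fs` or `mke2fs -n` would report the real values.

```python
BLOCK_SIZE = 4096            # assumption: 4 KiB ext3 blocks
BLOCKS_PER_GROUP = 32768     # 8 * BLOCK_SIZE, the usual value for 4 KiB blocks
DEVICE_SECTORS = 2344267776  # md0 size taken from the syslog "limit" value (512-byte sectors)

def backup_groups(group_count):
    """Groups holding superblock backups under sparse_super: 1 and powers of 3, 5, 7."""
    groups = {1}
    for base in (3, 5, 7):
        g = base
        while g < group_count:
            groups.add(g)
            g *= base
    return sorted(g for g in groups if g < group_count)

total_blocks = DEVICE_SECTORS * 512 // BLOCK_SIZE
group_count = -(-total_blocks // BLOCKS_PER_GROUP)  # ceiling division

# With 4 KiB blocks the first data block is 0, so each backup sits at group * blocks_per_group
backup_blocks = [g * BLOCKS_PER_GROUP for g in backup_groups(group_count)]
print(backup_blocks[:5])
```

A block number from that list is what would be passed to `e2fsck -b`; adding `-n` keeps the check read-only so nothing gets "fixed" until the results have been looked at.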

I'd still be tempted to do that from a live cd just to be sure you have complete control over it.
 
  


