Filesystem corruption on software RAID5

drkdiggler · 02-01-2004, 07:47 PM

Hi, I have been running a software RAID5 array on a Red Hat 8.0 install. The card I am using is a HighPoint RocketRAID 404, with a Seagate 80 GB hard drive on each of its 4 channels. I set up the RAID5 array using the following /etc/raidtab:

raiddev /dev/md0
raid-level 5
nr-raid-disks 4
nr-spare-disks 0
chunk-size 128
persistent-superblock 1
parity-algorithm left-symmetric
device /dev/hde1
raid-disk 0
device /dev/hdg1
raid-disk 1
device /dev/hdi1
raid-disk 2
device /dev/hdk1
raid-disk 3
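For anyone unfamiliar with the raidtab options: with `parity-algorithm left-symmetric`, md rotates the parity chunk across the four disks, moving one position per stripe, and each stripe's data chunks start on the disk just after the parity disk and wrap around. Below is a small model of that layout for this 4-disk, 128 KB-chunk array. It is a sketch based on the usual description of the left-symmetric scheme, not code taken from md, and the function names are hypothetical.

```python
# Hypothetical model of the Linux md RAID5 "left-symmetric" layout
# for the array in the raidtab above (4 disks, 128 KB chunks).

N_DISKS = 4      # nr-raid-disks 4
CHUNK_KB = 128   # chunk-size 128 (KB); each D/P cell below is one chunk


def parity_disk(stripe: int, n: int = N_DISKS) -> int:
    """Parity starts on the last disk and moves back one disk per stripe."""
    return (n - 1) - (stripe % n)


def locate_chunk(logical_chunk: int, n: int = N_DISKS) -> tuple[int, int]:
    """Map a logical data chunk of /dev/md0 to (stripe, raid-disk index).

    Data chunks of a stripe begin on the disk right after the parity
    disk and wrap around to disk 0.
    """
    data_disks = n - 1
    stripe, index_in_stripe = divmod(logical_chunk, data_disks)
    disk = (parity_disk(stripe, n) + 1 + index_in_stripe) % n
    return stripe, disk


if __name__ == "__main__":
    for stripe in range(N_DISKS):
        row = ["?"] * N_DISKS
        row[parity_disk(stripe)] = "P"
        for i in range(N_DISKS - 1):
            chunk = stripe * (N_DISKS - 1) + i
            _, disk = locate_chunk(chunk)
            row[disk] = f"D{chunk}"
        print(f"stripe {stripe}: " + "  ".join(f"{c:>3}" for c in row))
```

Running it prints one row per stripe. Each row holds exactly one parity chunk (P), and the P column walks from raid-disk 3 down to raid-disk 0 before repeating.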

The filesystem on the array is ext2 (I had ext3 for a while too and got the same problem). My issue is that the filesystem corrupts itself over time. The server is never shut off, but whenever I run fsck I get errors like the following:

/dev/md0: clean, 1724/29310976 files, 10298750/58611072 blocks
[root@jwitt root]# fsck -fCVt ext2 /dev/md0
fsck 1.27 (8-Mar-2002)
[/sbin/fsck.ext2 (1) -- /infinity5] fsck.ext2 -f -C0 /dev/md0
e2fsck 1.27 (8-Mar-2002)
Pass 1: Checking inodes, blocks, and sizes
Inode 2518496 is in use, but has dtime set. Fix<y>? yes

Inodes that were part of a corrupted orphan linked list found. Fix<y>?
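As a sanity check, the block count in the fsck summary line is consistent with RAID5 on four 80 GB drives, where one drive's worth of space goes to parity and roughly 3 × 80 GB remains. A quick sketch of that arithmetic, assuming 4 KiB ext2 blocks (the block size is an assumption; the fsck output above does not show it):

```python
# Cross-check of the array size reported by fsck.
# ASSUMPTION: the ext2 filesystem uses 4096-byte blocks.

TOTAL_BLOCKS = 58_611_072   # from "10298750/58611072 blocks"
USED_BLOCKS = 10_298_750
BLOCK_SIZE = 4096           # assumed ext2 block size in bytes
N_DISKS = 4                 # RAID5: one disk's worth of space holds parity


def raid5_summary(total_blocks, used_blocks, block_size, n_disks):
    """Return (filesystem bytes, bytes per data disk, percent used)."""
    fs_bytes = total_blocks * block_size
    per_data_disk = fs_bytes / (n_disks - 1)
    used_pct = 100 * used_blocks / total_blocks
    return fs_bytes, per_data_disk, used_pct


fs_bytes, per_disk, used_pct = raid5_summary(TOTAL_BLOCKS, USED_BLOCKS, BLOCK_SIZE, N_DISKS)
print(f"filesystem size: {fs_bytes / 1e9:.1f} GB")  # 240.1 GB
print(f"per data disk:   {per_disk / 1e9:.1f} GB")  # 80.0 GB
print(f"space used:      {used_pct:.1f}%")          # 17.6%
```

Under that assumption each of the three data-bearing shares comes out at about 80 GB, so the array size itself looks as expected.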

These last errors appeared less than a day after I had previously run fsck and fixed errors, and in that time I hadn't turned the computer off or rebooted. I have also noticed that, slowly but surely, files on the array become corrupted. Running SMART tests on the drives hasn't revealed any problems. Could there just be some bad blocks fouling everything up? If so, how do I go about detecting them and marking them so that the OS doesn't use them? Finally (sorry for the long questions), fsck seems to hang during certain parts of its first pass (checking inodes, blocks, and sizes), and then all of a sudden the progress jumps ahead. I don't know whether this is normal.
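On the bad-block question: the standard tools are `badblocks` and `e2fsck -c`, which runs `badblocks` for a read-only scan and adds any bad blocks it finds to the filesystem's bad-block inode so they are not allocated to files. The sketch below only illustrates what such a read-only scan does: read the device sequentially in fixed-size blocks and record the block numbers whose reads fail. It is a hypothetical helper, not a replacement for those tools, and it does not bypass the page cache.

```python
import os
import sys


def read_only_scan(path: str, block_size: int = 4096) -> list[int]:
    """Return the numbers of blocks that could not be read from `path`.

    Hypothetical helper: a read-only surface scan. It never writes to the
    device. Reads may be served from the page cache.
    """
    bad = []
    fd = os.open(path, os.O_RDONLY)
    try:
        size = os.lseek(fd, 0, os.SEEK_END)  # device or file size in bytes
        n_blocks = (size + block_size - 1) // block_size
        for block in range(n_blocks):
            try:
                os.pread(fd, block_size, block * block_size)
            except OSError:  # e.g. EIO on an unreadable sector
                bad.append(block)
    finally:
        os.close(fd)
    return bad


if __name__ == "__main__" and len(sys.argv) > 1:
    target = sys.argv[1]  # e.g. a member disk such as /dev/hde (needs root)
    bad_blocks = read_only_scan(target)
    print(f"{len(bad_blocks)} unreadable block(s): {bad_blocks[:20]}")
```

A clean scan together with clean SMART results would point away from bad media and toward something higher in the stack, such as the controller driver.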

THANK YOU FOR ANY HELP, THIS IS A VERY ANNOYING PROBLEM!

drkdiggler · 02-02-2004, 07:43 PM

I just ran fsck again after 24 hours and I hadn't even remounted it. There were errors again. Can someone please help me out?

Present · 02-22-2004, 04:53 AM

Well, I'm trying to get Linux to install on a 404 and having no luck, because the installer sees right through the controller (I'm installing SuSE 9.0 Pro, but I've had this problem with Mandrake/RH before). How did you get past this? If I can get past it, maybe I can try to recreate your scenario.

drkdiggler · 02-22-2004, 05:11 AM

Hey, I actually figured out the problem but forgot to post back to the forum. It was the drivers: back when I had Red Hat 7.0, which didn't have the drivers built into the kernel, I didn't have these problems. After following HighPoint's directions for disabling the drivers built into the kernel and using their binaries instead, everything has been fine for the last 2 weeks. I haven't had a single problem with the filesystem <knocks on wood>, and I even calculated MD5 sums for every file on the array, and they haven't changed over that period either. If you go to the HighPoint website http://www.highpoint-tech.com/ they have instructions on how to install an OS onto the controller.
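The MD5 check described above is a good way to catch silent corruption. A minimal sketch of one way to do it: build a manifest mapping every file to its MD5 digest, save it, and later compare a fresh manifest against the old one. The function names are hypothetical, and this is not necessarily how the poster computed the sums.

```python
import hashlib
import os


def md5_of(path: str, bufsize: int = 1 << 20) -> str:
    """MD5 of one file, read in 1 MiB pieces so large files don't fill RAM."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(bufsize), b""):
            h.update(chunk)
    return h.hexdigest()


def build_manifest(root: str) -> dict[str, str]:
    """Map each regular file's path (relative to root) to its MD5 digest."""
    manifest = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            if os.path.islink(full):  # skip symlinks (and broken links)
                continue
            manifest[os.path.relpath(full, root)] = md5_of(full)
    return manifest


def changed_files(old: dict[str, str], new: dict[str, str]) -> list[str]:
    """Files present in both manifests whose digests differ."""
    return sorted(p for p in old.keys() & new.keys() if old[p] != new[p])
```

For example, run `build_manifest` on the array's mount point (presumably `/infinity5`, the label in the fsck output), save the result with `json.dump`, and compare a later run with `changed_files`. A file you never modified whose digest changes points at corruption.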

Present · 02-22-2004, 03:05 PM

I'm just finishing an attempt to build a driver from the source provided for the 404, but from what several people have told me, the problem is likely the controller itself. From what I hear, it is not a true hardware RAID controller but passes off some of its responsibilities to software. This loose definition of "control" could be what is leading to the data problems (I've read/heard the same about many Promise controllers).

One possible suggestion is to move the drives to an onboard controller, or to try a controller that is truly hardware-based and has native support (3ware was recommended to me in a different thread on this forum).

GL

Present · 02-22-2004, 08:22 PM

Ahh, I just went back and re-read your post and noticed you are using software RAID. I have seen several people have problems when Linux tries to write through the chip. I believe HighPoint's website has support for using the "quasi-onboard RAID" of the HPT374 with Red Hat, and that may resolve the corruption issue. Try this URL:

http://www.highpoint-tech.com/USA/brr404.htm

They have support all the way through RH9.

It may be quite a pain to switch everything over. I'm personally looking at replacing my 404, as the Linux support is quite sketchy. I just tried a Slackware installation about 5 ways and couldn't get it to work right. Ahh well.

GL

Present · 02-23-2004, 08:39 PM

lol, I just reread your last post and noticed you fixed the problem completely!!

Good work. For some reason, when I read the post before, I thought you were having a problem with the sums (guess I didn't read very closely).

You can disregard my last several posts (unless you begin to have problems again).