OK, I haven't found this issue mentioned in this forum, but I could have missed it. If I did, just link me to the related thread and pls don't flame.
System info:
gentoo 2.6.11 kernel, no patches (SiI SATA support compiled in, RAID support compiled in)
1.4 GHz Athlon on a K266 motherboard with 512 MB RAM
Background:
A while ago I decided to try software RAID to put together a bunch of 250 GB hard drives. My goal was to put just over 1 TB under my desk: six 250 GB Western Digital SATA drives strung together in RAID 5, for a non-wallet-destroying price (hence software RAID). I did some research to make sure the drives were not blacklisted. Then I bought two 4-port SATA controllers.
http://www.syba.com/product/43/02/05/index.html (SATA card)
http://www.wdc.com/en/products/Products.asp?DriveID=59 (Drives)
I believe the controllers use the SiI 3114 chipset.
How the problem exhibits itself:
When the array gets busy, drives get kicked out of the RAID array.
Any access to a kicked-out drive results in a timeout for whatever utility I use to get stats on it (mdadm, fdisk, mount, anything that touches the device).
Anything that uses the drive is extremely slow or does not work:
-fdisk reads the partition information incorrectly and cannot modify anything
-the RAID becomes extremely slow
-if I try to wipe the kicked-out disk with mkfs, nothing happens
After a while, other I/O subsystems start to fail.
Upon reboot the RAID array does not unmount cleanly, and I get an error code from the drive(s) that got kicked out. (I don't have the code in front of me but I will update later.)
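For anyone who wants to compare notes, this is roughly what I run to see the kicked-out state. Device names are just examples for my setup (assuming the array is /dev/md0):

```shell
# Overall array state; a kicked drive shows a missing slot like [U_UUUU]
cat /proc/mdstat

# Per-device detail; look for "faulty" or "removed" next to the dropped drive
mdadm --detail /dev/md0

# Recent kernel messages; the SATA timeouts show up here when a drive drops
dmesg | tail -n 50
```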
I found a similar issue with an earlier model here:
http://forums.gentoo.org/viewtopic-t...ata+4port.html
Why I think this is a driver problem:
I yanked all the drives from the RAID array and made them standalone vanilla SATA drives. I error-checked the drives and found no problems.
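For completeness, this is the sort of standalone check I mean: a read-only surface scan plus SMART stats (sketch, assuming the drive under test is /dev/sdc; badblocks without -w is non-destructive, but double-check the device name before running anything):

```shell
# Read-only surface scan of the whole disk (non-destructive in this mode)
badblocks -sv /dev/sdc

# SMART overall health verdict, if smartmontools is installed
smartctl -H /dev/sdc

# SMART error log; genuinely bad drives usually have entries here
smartctl -l error /dev/sdc
```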
Other observations:
I have a total of 8 of these drives; the bigger I make the array, the faster disks get kicked out.
There is no bias toward any single drive getting kicked out.
If the array is broken and rebuilt, I have no problem until the array gets busy again.
If the array is > 6 disks, a disk gets kicked out about 15 minutes after the array is created.
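The rebuild itself is just the usual mdadm dance: remove the kicked drive and re-add it, then let the resync run (sketch, assuming /dev/md0 and a kicked partition /dev/sdc1):

```shell
# Mark the dropped device failed (md has usually done this already) and remove it
mdadm /dev/md0 --fail /dev/sdc1
mdadm /dev/md0 --remove /dev/sdc1

# Add it back; md starts a full resync onto it
mdadm /dev/md0 --add /dev/sdc1

# Watch the resync progress
watch cat /proc/mdstat
```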
My WAG:
Something is causing an extreme amount of PCI traffic. By extreme I mean waaaaay more than it should be. This causes some kind of internal timeout. The drive gets marked bad by the md driver (rightfully so, since it is non-responsive) and kicked out. The kernel can't recover, and the badness spreads to other I/O subsystems.
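If anyone wants to sanity-check the load theory on their own box, a crude way is to watch how fast the SATA controller's interrupt count climbs while the array is busy (rough sketch; the interrupt line the SiI card sits on will differ per system):

```shell
# Snapshot interrupt counts, let the array churn for 10 seconds, snapshot again
cat /proc/interrupts > /tmp/irq.before
sleep 10
cat /proc/interrupts > /tmp/irq.after

# The lines that changed show which devices are generating the interrupts
diff /tmp/irq.before /tmp/irq.after
```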
I'll post a link if I find a solution; if you know of one, pls pls pls link me to it.
-Darkseer
This is related I think:
http://www.thisishull.net/archive/in...p/t-21928.html