Quote:
Originally Posted by deathsfriend99
That IS interesting. That sounds like the behavior I have been seeing. Sometimes it kicks a drive out, I pull the drive, run a Seagate Diagnostic on it, and it comes up good. I'd slap it back in and it'd work for a while, then another drive would fail.
Sometimes the array will just go read-only until I unmount it and run fsck.
Perhaps it's a port multiplier issue.
I was incorrect about the version of CentOS. These are running 5.7. Maybe I'll upgrade them to 6 and see what happens. I hate to do CentOS 7. It's so awful!
I've spent the past few days recreating my file server. I started out with OpenSUSE, but quickly got fed up with systemd and ended up going back to installing Gentoo. Most things were fine, but my SiI 3132 controllers were stubbornly uppity with me. The symptoms weren't as bad as they used to be - the drives kept working - but the controllers still kept giving me these glares, like:
Code:
Oct 25 02:54:44 [kernel] [ 5410.276611] ata12.00: exception Emask 0x0 SAct 0x20 SErr 0x0 action 0x6 frozen
Oct 25 02:54:44 [kernel] [ 5410.276618] ata12.00: failed command: WRITE FPDMA QUEUED
Oct 25 02:54:44 [kernel] [ 5410.276627] ata12.00: cmd 61/01:28:08:08:00/00:00:00:00:00/40 tag 5 ncq 512 out
Oct 25 02:54:44 [kernel] [ 5410.276627] res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
Oct 25 02:54:44 [kernel] [ 5410.276631] ata12.00: status: { DRDY }
Oct 25 02:54:44 [kernel] [ 5410.276636] ata12: hard resetting link
Oct 25 02:54:50 [kernel] [ 5415.748990] ata12: SATA link up 3.0 Gbps (SStatus 123 SControl 0)
Oct 25 02:54:50 [kernel] [ 5415.758754] ata12.00: configured for UDMA/100
Oct 25 02:54:50 [kernel] [ 5415.758760] ata12.00: device reported invalid CHS sector 0
Oct 25 02:54:50 [kernel] [ 5415.758765] ata12: EH complete
And I don't like it. It didn't knock the disks offline, didn't break the RAID, and didn't even force the filesystem read-only, but something was still wrong. After reading around the forums a bit more, it seems that SiI 3132 revision 1 is buggy and won't work properly with port multipliers. Apparently revision 5 of the same card should work without issues.
Code:
06:00.0 RAID bus controller: Silicon Image, Inc. SiI 3132 Serial ATA Raid II Controller (rev 01)
According to lspci, what I have is a rev 01 card. It had worked fine for years with only one SATA drive attached, back when the port multiplier code wasn't in the kernel yet, so that's the configuration I decided to go back to: I turned off port multiplier support, recompiled the kernel, and connected just one drive to the card. Time will tell how it ends up, but at least the system booted fine and found the drive without issues.
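For reference, here's roughly what that kernel change looks like. This is just a sketch assuming a typical in-tree kernel build; the /usr/src/linux path and the exact menuconfig location may differ between kernel versions:
Code:
# Disable SATA port multiplier support (CONFIG_SATA_PMP) and rebuild.
# In menuconfig it lives under:
#   Device Drivers -> Serial ATA and Parallel ATA drivers -> SATA Port Multiplier support
cd /usr/src/linux
make menuconfig
grep SATA_PMP .config    # should now show: # CONFIG_SATA_PMP is not set
make && make modules_install && make install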
The SiI 3132 seems to be a pretty common chip for eSATA cards in JBOD configurations, so it's worth checking what chip lspci shows for you - and if it's a 3132, which revision it is.
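Checking only takes a second; the revision is the (rev XX) at the end of the lspci line, and the bus address (06:00.0 in my output above) lets you query that one device directly:
Code:
lspci | grep -i "Silicon Image"
# For full details on just that device, using its bus address:
lspci -v -s 06:00.0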
Here are a couple of threads that may be relevant:
http://ubuntuforums.org/showthread.php?t=2061555
http://www.linuxquestions.org/questi...26-4175445070/
--- EDIT ---
After running a full RAID resync on 3TB drives (it took around 10 hours), the SiI 3132 Rev 1 showed no errors whatsoever with just one of the drives connected to it and the port multiplier code disabled in the kernel. Unfortunately, if I've understood correctly, a common setup in a JBOD environment is to have the 3132's two ports (plus port multiplier) providing the external SATA connectivity, with each port connecting to a JBOD array. If you're running only one JBOD system, and thus using only one port on the SiI 3132, then I imagine this would work fine with the Rev 1 chip and the multiplier code turned off - assuming the JBOD array behind the 3132 shows up as a single disk. If it shows up as an array of individual disks, then the port multiplier may be necessary - I don't know, I've never run such a configuration.
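If you want to keep an eye on a resync the same way I did, this is a sketch assuming Linux software RAID (mdadm); /dev/md0 is a placeholder for whatever your array is called:
Code:
cat /proc/mdstat                  # shows resync progress, speed, and ETA
mdadm --detail /dev/md0           # a "Rebuild Status" line appears during a resync
watch -n 10 cat /proc/mdstat      # refresh the progress every 10 seconds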
Either way, if you need both ports on the SiI 3132, then I think the Rev 1 chip is not entirely stable. In that case I'd try to replace the card with a Rev 5 card, or with another model that's known to be stable. Another option might be to leave the multiplier code disabled and use two SiI 3132 Rev 1 cards, each connected to only one JBOD - if a card came with each JBOD system, and provided the host system has two free PCI-E slots that can support them.
--- END EDIT ---