LinuxQuestions.org

LinuxQuestions.org (http://www.linuxquestions.org/questions/index.php)
-   Linux - Hardware (http://www.linuxquestions.org/questions/forumdisplay.php?f=18)
-   -   SATA (libata) link reset problem with port multiplier (PMP): sil3132, sil3726 (http://www.linuxquestions.org/questions/showthread.php?t=4175445070)

mrkn 01-10-2013 08:55 PM

SATA (libata) link reset problem with port multiplier (PMP): sil3132, sil3726
 
Using Slackware64 14.0, kernel version 3.2.29

I recently purchased and set-up a Rosewill RSV-S8 RAID enclosure (see http://www.rosewill.com/products/118...ifications.htm), which consists of two primary components:

* A PCI card using a Silicon Image 3132 chipset, with two eSATA ports.

* Two SATA port multipliers in the enclosure itself, each using a Silicon Image 3726 chipset. Note that, although this controller can support five SATA connections per multiplier, this hardware configuration only uses four per.

I've updated the sil3132 BIOS to the most recent available from the Silicon Image website (7.7.03), set to "non-RAID" mode (for a JBOD configuration). The PCI card shows up in my lspci as:

Code:

01:00.0 SATA controller: Silicon Image, Inc. SiI 3132 Serial ATA Raid II Controller (rev 01)
Before I get too deep into the bad behavior, let me establish that this is working, that is: I have drives plugged into the enclosure, they mount in my system, and I have a functional software RAID (using mdadm) over them.

The problem is that the kernel takes a very long time to recognize the drives. If I power-up the enclosure after my system is started up, it will take a few minutes for everything to show up. If I have the enclosure powered before I start up, this delay will block the boot process for that duration (that is, for several minutes).

At the bottom of this post I've pasted grepped versions of my dmesg output, as relevant to my issue. The "first" eSATA/PMP link is seen as ata5, and the "second" link is seen as ata6; as is shown in the output, I have four HDDs plugged into ata5, but none into ata6 (waiting for another good sale).

The behavior pattern is:

* The PMP is found;
* The PMP's hardware connections are enumerated and reset;
* The reset fails after a timeout (10 seconds, 10 seconds, 35 seconds, then 5 seconds), then is retried three more times (with corresponding timeouts);
* This whole enumerate/reset/fail-x4 process itself repeats twice more;
* Finally, the message "failed to recover link after 3 tries, disabling" occurs and the system can proceed. At this point, if there are actually drives plugged into the PMP, they get recognized almost instantly and the system can proceed.
* And then this happens all over again for the second PMP.

I've observed the same behavior with the enclosure completely empty, and with my HDDs plugged into the "second" PMP instead of the first.

Does anyone have any ideas on what might cause this repeated failed-reset issue (and hence delay)?

My personal theory is that this has to do with the missing fifth SATA link from the PMP, like libata (via the sil controllers) can't tell that nothing's there, and futilely waits for a response from a hardware port that doesn't exist. But I don't really know enough about libata to investigate this further.

I appreciate any input, but since as I said this is functional (eventually), I don't have any interest in replacing the hardware just yet.

Code:

dmesg | grep ata5

[    4.032197] ata5: SATA max UDMA/100 host m128@0xfea84000 port 0xfea80000 irq 16
[    6.167053] ata5: SATA link up 3.0 Gbps (SStatus 123 SControl 0)
[    6.167406] ata5.15: Port Multiplier 1.1, 0x1095:0x3826 r23, 6 ports, feat 0x1/0x9
[    6.171950] ata5.00: hard resetting link
[    6.488250] ata5.00: SATA link up 3.0 Gbps (SStatus 123 SControl 320)
[    6.488366] ata5.01: hard resetting link
[    6.804251] ata5.01: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
[    6.804366] ata5.02: hard resetting link
[    7.120250] ata5.02: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
[    7.120364] ata5.03: hard resetting link
[    7.436249] ata5.03: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
[    7.436364] ata5.04: hard resetting link
[    7.741375] ata5.04: SATA link down (SStatus 0 SControl 320)
[    7.741513] ata5.05: hard resetting link
[  17.748055] ata5.05: softreset failed (timeout)
[  17.748183] ata5.05: hard resetting link
[  27.755055] ata5.05: softreset failed (timeout)
[  27.755182] ata5.05: hard resetting link
[  62.765056] ata5.05: softreset failed (timeout)
[  62.765211] ata5.05: limiting SATA link speed to 1.5 Gbps
[  62.765292] ata5.05: hard resetting link
[  67.767055] ata5.05: softreset failed (timeout)
[  67.767182] ata5.05: reset failed, giving up
[  67.767263] ata5.15: hard resetting link
[  69.899051] ata5.15: SATA link up 3.0 Gbps (SStatus 123 SControl 0)
[  69.899376] ata5.00: hard resetting link
[  70.215249] ata5.00: SATA link up 3.0 Gbps (SStatus 123 SControl 320)
[  70.215362] ata5.01: hard resetting link
[  70.531251] ata5.01: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
[  70.531364] ata5.02: hard resetting link
[  70.847249] ata5.02: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
[  70.847363] ata5.03: hard resetting link
[  71.163244] ata5.03: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
[  71.163357] ata5.05: hard resetting link
[  81.170051] ata5.05: softreset failed (timeout)
[  81.170179] ata5.05: hard resetting link
[  91.177055] ata5.05: softreset failed (timeout)
[  91.177187] ata5.05: hard resetting link
[  126.187056] ata5.05: softreset failed (timeout)
[  126.187209] ata5.05: hard resetting link
[  131.189055] ata5.05: softreset failed (timeout)
[  131.189182] ata5.05: reset failed, giving up
[  131.189263] ata5.15: hard resetting link
[  133.321051] ata5.15: SATA link up 3.0 Gbps (SStatus 123 SControl 0)
[  133.321373] ata5.00: hard resetting link
[  133.637246] ata5.00: SATA link up 3.0 Gbps (SStatus 123 SControl 320)
[  133.637359] ata5.01: hard resetting link
[  133.953246] ata5.01: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
[  133.953360] ata5.02: hard resetting link
[  134.269245] ata5.02: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
[  134.269358] ata5.03: hard resetting link
[  134.585248] ata5.03: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
[  134.585361] ata5.04: hard resetting link
[  134.890370] ata5.04: SATA link down (SStatus 0 SControl 320)
[  134.890507] ata5.05: hard resetting link
[  144.897055] ata5.05: softreset failed (timeout)
[  144.897184] ata5.05: hard resetting link
[  154.904056] ata5.05: softreset failed (timeout)
[  154.904182] ata5.05: hard resetting link
[  189.914055] ata5.05: softreset failed (timeout)
[  189.914208] ata5.05: hard resetting link
[  194.916056] ata5.05: softreset failed (timeout)
[  194.916182] ata5.05: reset failed, giving up
[  194.916261] ata5.05: failed to recover link after 3 tries, disabling
[  194.916345] ata5.15: hard resetting link
[  197.048050] ata5.15: SATA link up 3.0 Gbps (SStatus 123 SControl 0)
[  197.353192] ata5.00: hard resetting link
[  197.669239] ata5.00: SATA link up 3.0 Gbps (SStatus 123 SControl 320)
[  197.669352] ata5.01: hard resetting link
[  197.985246] ata5.01: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
[  197.985359] ata5.02: hard resetting link
[  198.301250] ata5.02: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
[  198.301362] ata5.03: hard resetting link
[  198.617248] ata5.03: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
[  198.617860] ata5.00: ATA-9: WDC WD30EZRX-00DC0B0, 80.00A80, max UDMA/133
[  198.617968] ata5.00: 5860533168 sectors, multi 16: LBA48 NCQ (depth 31/32)
[  198.618614] ata5.00: configured for UDMA/100
[  198.619209] ata5.01: ATA-9: WDC WD30EZRX-00DC0B0, 80.00A80, max UDMA/133
[  198.619317] ata5.01: 5860533168 sectors, multi 0: LBA48 NCQ (depth 31/32)
[  198.619938] ata5.01: configured for UDMA/100
[  198.620537] ata5.02: ATA-9: WDC WD30EZRX-00DC0B0, 80.00A80, max UDMA/133
[  198.620644] ata5.02: 5860533168 sectors, multi 0: LBA48 NCQ (depth 31/32)
[  198.621274] ata5.02: configured for UDMA/100
[  198.621858] ata5.03: ATA-9: WDC WD30EZRX-00DC0B0, 80.00A80, max UDMA/133
[  198.621964] ata5.03: 5860533168 sectors, multi 0: LBA48 NCQ (depth 31/32)
[  198.622598] ata5.03: configured for UDMA/100
[  198.622760] ata5: EH complete

Code:

dmesg | grep ata6

[    4.032310] ata6: SATA max UDMA/100 host m128@0xfea84000 port 0xfea82000 irq 16
[  200.765054] ata6: SATA link up 3.0 Gbps (SStatus 123 SControl 0)
[  200.765384] ata6.15: Port Multiplier 1.1, 0x1095:0x3826 r23, 6 ports, feat 0x1/0x9
[  200.770127] ata6.00: hard resetting link
[  201.174065] ata6.00: link resume succeeded after 1 retries
[  201.276370] ata6.00: SATA link down (SStatus 0 SControl 310)
[  201.276510] ata6.01: hard resetting link
[  201.581369] ata6.01: SATA link down (SStatus 0 SControl 320)
[  201.581508] ata6.02: hard resetting link
[  201.886368] ata6.02: SATA link down (SStatus 0 SControl 320)
[  201.886506] ata6.03: hard resetting link
[  202.191371] ata6.03: SATA link down (SStatus 0 SControl 320)
[  202.191510] ata6.04: hard resetting link
[  202.496370] ata6.04: SATA link down (SStatus 0 SControl 320)
[  202.496509] ata6.05: hard resetting link
[  212.503057] ata6.05: softreset failed (timeout)
[  212.503188] ata6.05: hard resetting link
[  222.510056] ata6.05: softreset failed (timeout)
[  222.510184] ata6.05: hard resetting link
[  257.520056] ata6.05: softreset failed (timeout)
[  257.520212] ata6.05: limiting SATA link speed to 1.5 Gbps
[  257.520293] ata6.05: hard resetting link
[  262.522056] ata6.05: softreset failed (timeout)
[  262.522183] ata6.05: reset failed, giving up
[  262.522265] ata6.15: hard resetting link
[  264.654052] ata6.15: SATA link up 3.0 Gbps (SStatus 123 SControl 0)
[  264.654372] ata6.05: hard resetting link
[  274.661057] ata6.05: softreset failed (timeout)
[  274.661186] ata6.05: hard resetting link
[  284.668059] ata6.05: softreset failed (timeout)
[  284.668185] ata6.05: hard resetting link
[  319.678054] ata6.05: softreset failed (timeout)
[  319.678211] ata6.05: hard resetting link
[  324.680055] ata6.05: softreset failed (timeout)
[  324.680183] ata6.05: reset failed, giving up
[  324.680265] ata6.15: hard resetting link
[  326.812054] ata6.15: SATA link up 3.0 Gbps (SStatus 123 SControl 0)
[  326.812379] ata6.00: hard resetting link
[  327.216076] ata6.00: link resume succeeded after 1 retries
[  327.318369] ata6.00: SATA link down (SStatus 0 SControl 310)
[  327.318506] ata6.01: hard resetting link
[  327.623367] ata6.01: SATA link down (SStatus 0 SControl 320)
[  327.623506] ata6.02: hard resetting link
[  327.928371] ata6.02: SATA link down (SStatus 0 SControl 320)
[  327.928510] ata6.03: hard resetting link
[  328.233369] ata6.03: SATA link down (SStatus 0 SControl 320)
[  328.233507] ata6.04: hard resetting link
[  328.538391] ata6.04: SATA link down (SStatus 0 SControl 320)
[  328.538529] ata6.05: hard resetting link
[  338.545056] ata6.05: softreset failed (timeout)
[  338.545184] ata6.05: hard resetting link
[  348.552055] ata6.05: softreset failed (timeout)
[  348.552183] ata6.05: hard resetting link
[  383.556056] ata6.05: softreset failed (timeout)
[  383.556211] ata6.05: hard resetting link
[  388.558056] ata6.05: softreset failed (timeout)
[  388.558185] ata6.05: reset failed, giving up
[  388.558265] ata6.05: failed to recover link after 3 tries, disabling
[  388.558350] ata6.15: hard resetting link
[  390.690052] ata6.15: SATA link up 3.0 Gbps (SStatus 123 SControl 0)
[  390.995254] ata6: EH complete


mrkn 01-11-2013 01:34 PM

I decided to rip open the enclosure today - just as a safety check for dangling hardware connections, etc - and noticed that the port multiplier is actually a Silicon Image 3826, not 3726 as advertised on the Rosewill site. (In retrospect this should have been obvious from the dmesg logs, which plainly show a PMP revision of 0x3826)

I'd be willing to bet that the 3826 is a minor update from the 3726, but from browsing the linux-ide mailing list and skimming some source (drivers/ata/libata-pmp.c) it looks like there are a number of quirk-hacks for the 3726 chipset, which probably need to apply to the 3826 as well.

I've never used a custom kernel before but I think that's probably my next step.

EDIT In fact, a quick google search shows that this should be the case:
https://bugzilla.redhat.com/show_bug.cgi?id=890237
The attached PDF (!?) shows that, yeah, the suggested patch is to apply the same quirks for the 3826 as for the 3726.
This still hasn't made its way into the main Linux sources, though.

mrkn 01-13-2013 04:08 PM

After an embarrassing number of missteps in executing my custom kernel build, I did eventually verify that patching libata-pmp.c to use the same quirks for the 3826 as the 3726, solves this problem handily.


All times are GMT -5. The time now is 09:41 AM.