TOO WEIRD!
Sifting through the dmesg output (yet again) I noticed a trend:
When the card failed to set up the drives, the dmesg output related to the card looked like this:
Code:
sata_promise 0000:01:02.0: version 1.04
ACPI: PCI Interrupt Link [LNKB] enabled at IRQ 9
PCI: setting IRQ 9 as level-triggered
ACPI: PCI Interrupt 0000:01:02.0[A] -> Link [LNKB] -> GSI 9 (level, low) -> IRQ 9
irq 9: nobody cared (try booting with the "irqpoll" option)
[<c0146772>] __report_bad_irq+0x36/0x7d
[<c0146968>] note_interrupt+0x1af/0x1eb
[<c01ea916>] acpi_ev_sci_xrupt_handler+0x12/0x19
[<c01e5562>] acpi_irq+0xb/0x14
[<c0145ed6>] handle_IRQ_event+0x23/0x49
[<c0145fb5>] __do_IRQ+0xb9/0xee
[<c01067aa>] do_IRQ+0x71/0x83
[<c0104e1a>] common_interrupt+0x1a/0x20
[<c0124db8>] __do_softirq+0x51/0xbb
[<c0124e58>] do_softirq+0x36/0x3a
[<c01067af>] do_IRQ+0x76/0x83
[<c0104e1a>] common_interrupt+0x1a/0x20
[<c0102b0c>] default_idle+0x0/0x59
[<c0102b3d>] default_idle+0x31/0x59
[<c0102c03>] cpu_idle+0x9e/0xb8
[<c03a26eb>] start_kernel+0x349/0x351
[<c03a219e>] unknown_bootoption+0x0/0x204
handlers:
[<c01e5557>] (acpi_irq+0x0/0x14)
Disabling IRQ #9
ata1: SATA max UDMA/133 cmd 0xE082A200 ctl 0xE082A238 bmdma 0x0 irq 9
ata2: SATA max UDMA/133 cmd 0xE082A280 ctl 0xE082A2B8 bmdma 0x0 irq 9
ata3: SATA max UDMA/133 cmd 0xE082A300 ctl 0xE082A338 bmdma 0x0 irq 9
ata4: SATA max UDMA/133 cmd 0xE082A380 ctl 0xE082A3B8 bmdma 0x0 irq 9
scsi0 : sata_promise
ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
ata1.00: ATA-7, max UDMA/133, 488397168 sectors: LBA48
ata1.00: ata1: dev 0 multi count 0
ata1.00: qc timeout (cmd 0xef)
ata1.00: failed to set xfermode (err_mask=0x4)
ata1.00: limiting speed to UDMA/100
ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
ata1.00: ATA-7, max UDMA/133, 488397168 sectors: LBA48
ata1.00: ata1: dev 0 multi count 0
ata1.00: qc timeout (cmd 0xef)
ata1.00: failed to set xfermode (err_mask=0x4)
ata1.00: limiting speed to PIO0
ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
ata1.00: ATA-7, max UDMA/133, 488397168 sectors: LBA48
ata1.00: ata1: dev 0 multi count 0
ata1.00: qc timeout (cmd 0xef)
ata1.00: failed to set xfermode (err_mask=0x4)
ata1.00: disabled
ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
*the last 20 lines repeat 2 more times, once for each of the other connected HDDs*
That's obviously not right! The whole string of "errors" at the beginning, just before the kernel disables IRQ 9, makes it look like an ACPI/IRQ issue (I already tried disabling ACPI, BTW).
If I booted up with no drives connected, dmesg told me this:
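For anyone who wants to spot the same pattern without reading the whole log by eye, here is a rough Python sketch. It only looks for the two kinds of lines shown above ("Disabling IRQ #N" and the "ataN: ... irq N" port lines); the function name is made up for illustration, and other kernel versions may word these messages differently.

```python
import re

def summarize_irq_trouble(dmesg_text):
    """Map each IRQ the kernel disabled to the ata ports sitting on it.

    Assumes the message formats seen in the log above; this is a
    hypothetical helper, not part of any existing tool.
    """
    disabled = set()
    ports_by_irq = {}
    for line in dmesg_text.splitlines():
        # e.g. "Disabling IRQ #9"
        m = re.search(r"Disabling IRQ #(\d+)", line)
        if m:
            disabled.add(int(m.group(1)))
            continue
        # e.g. "ata1: SATA max UDMA/133 cmd 0x... bmdma 0x0 irq 9"
        m = re.search(r"\b(ata\d+): .* irq (\d+)\s*$", line)
        if m:
            ports_by_irq.setdefault(int(m.group(2)), []).append(m.group(1))
    # Only report IRQs that were actually disabled.
    return {irq: ports_by_irq.get(irq, []) for irq in sorted(disabled)}
```

Feed it the saved dmesg text as one string. An empty result means no IRQ was disabled; a non-empty one lists the ports that were left without a working interrupt.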
Code:
sata_promise 0000:01:02.0: version 1.04
ACPI: PCI Interrupt Link [LNKB] enabled at IRQ 9
PCI: setting IRQ 9 as level-triggered
ACPI: PCI Interrupt 0000:01:02.0[A] -> Link [LNKB] -> GSI 9 (level, low) -> IRQ 9
ata1: SATA max UDMA/133 cmd 0xE082A200 ctl 0xE082A238 bmdma 0x0 irq 9
ata2: SATA max UDMA/133 cmd 0xE082A280 ctl 0xE082A2B8 bmdma 0x0 irq 9
ata3: SATA max UDMA/133 cmd 0xE082A300 ctl 0xE082A338 bmdma 0x0 irq 9
ata4: SATA max UDMA/133 cmd 0xE082A380 ctl 0xE082A3B8 bmdma 0x0 irq 9
scsi0 : sata_promise
ata1: SATA link down (SStatus 0 SControl 300)
scsi1 : sata_promise
ata2: SATA link down (SStatus 0 SControl 300)
scsi2 : sata_promise
ata3: SATA link down (SStatus 0 SControl 300)
scsi3 : sata_promise
ata4: SATA link down (SStatus 0 SControl 300)
That's better, anyway. So what changes when I connect a drive to the card?
I don't know, but the card is in the top PCI slot now, and I don't want to move it to a lower slot only to land on an even lower IRQ, so I dug around in the BIOS looking for a way to force the slot to a higher IRQ. No luck on that front, but I did find a setting to enable bus mastering on the slot. This is a very modern card and I saw no reason that should be necessary, but after 13 hours of hard luck I was willing to try anything.
I enabled bus mastering on slot #1 and booted up with the drives attached:
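As a side note, bus mastering can also be checked from inside Linux: bit 2 of the PCI command register (offset 0x04 in config space) is the Bus Master Enable bit. Below is a small Python sketch that decodes it from raw config-space bytes. Whether the BIOS setting maps directly onto this bit on this board is an assumption on my part, and the function name is just for illustration.

```python
import struct

PCI_COMMAND_OFFSET = 0x04    # 16-bit command register in PCI config space
PCI_COMMAND_MASTER = 0x0004  # bit 2: Bus Master Enable

def bus_master_enabled(config_bytes):
    """Return True if the Bus Master Enable bit is set.

    config_bytes is the start of a device's PCI config space
    (at least the first 6 bytes), stored little-endian.
    """
    (command,) = struct.unpack_from("<H", config_bytes, PCI_COMMAND_OFFSET)
    return bool(command & PCI_COMMAND_MASTER)
```

On a system with sysfs, the bytes for this card should be readable with something like open("/sys/bus/pci/devices/0000:01:02.0/config", "rb").read(64), using the device address from the dmesg output.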
Code:
sata_promise 0000:01:02.0: version 1.04
ACPI: PCI Interrupt Link [LNKB] enabled at IRQ 9
PCI: setting IRQ 9 as level-triggered
ACPI: PCI Interrupt 0000:01:02.0[A] -> Link [LNKB] -> GSI 9 (level, low) -> IRQ 9
ata1: SATA max UDMA/133 cmd 0xE082A200 ctl 0xE082A238 bmdma 0x0 irq 9
ata2: SATA max UDMA/133 cmd 0xE082A280 ctl 0xE082A2B8 bmdma 0x0 irq 9
ata3: SATA max UDMA/133 cmd 0xE082A300 ctl 0xE082A338 bmdma 0x0 irq 9
ata4: SATA max UDMA/133 cmd 0xE082A380 ctl 0xE082A3B8 bmdma 0x0 irq 9
scsi0 : sata_promise
ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
ata1.00: ATA-7, max UDMA/133, 488397168 sectors: LBA48
ata1.00: ata1: dev 0 multi count 0
ata1.00: configured for UDMA/133
scsi1 : sata_promise
ata2: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
ata2.00: ATA-7, max UDMA/133, 488397168 sectors: LBA48
ata2.00: ata2: dev 0 multi count 0
ata2.00: configured for UDMA/133
scsi2 : sata_promise
ata3: SATA link down (SStatus 0 SControl 300)
scsi3 : sata_promise
ata4: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
ata4.00: ATA-7, max UDMA/133, 488397168 sectors: LBA48
ata4.00: ata4: dev 0 multi count 0
ata4.00: configured for UDMA/133
Vendor: ATA Model: WDC WD2500KS-00M Rev: 02.0
Type: Direct-Access ANSI SCSI revision: 05
SCSI device sda: 488397168 512-byte hdwr sectors (250059 MB)
sda: Write Protect is off
sda: Mode Sense: 00 3a 00 00
SCSI device sda: drive cache: write back
SCSI device sda: 488397168 512-byte hdwr sectors (250059 MB)
sda: Write Protect is off
sda: Mode Sense: 00 3a 00 00
SCSI device sda: drive cache: write back
sda:
sd 0:0:0:0: Attached scsi disk sda
*repeat the last 12 lines 2 times for sdb and sdc*
Now that's more like it! How funky is that?
Is this going to hurt drive performance enough to care? Would I do better to move the card to a lower slot and cope with a lower IRQ? I'm sure either option affects performance, but is either one going to be measurable enough to worry about?
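One way to get an actual number is to time a plain sequential read in each configuration and compare. A crude Python sketch follows; the function name and the default sizes are my own choices, it needs read access to the device, and the page cache can inflate the result on a second run, so treat it as a rough comparison only.

```python
import time

def read_throughput_mb_s(path, total_bytes=256 * 1024 * 1024, chunk=1024 * 1024):
    """Read up to total_bytes sequentially from path and return MiB/s.

    Hypothetical helper for a quick before/after comparison; it does
    not bypass the page cache, so repeated runs may look faster.
    """
    done = 0
    start = time.perf_counter()
    with open(path, "rb", buffering=0) as f:
        while done < total_bytes:
            data = f.read(min(chunk, total_bytes - done))
            if not data:  # reached end of file or device
                break
            done += len(data)
    elapsed = time.perf_counter() - start
    return done / (1024 * 1024) / elapsed if elapsed > 0 else float("inf")
```

Run as root it could be pointed at /dev/sda (from the log above), once with the card in each slot, to see whether the difference is big enough to matter.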
Any thoughts before I leave to deliver the server in about 6 or 7 hours are appreciated...
J
(wow, a whole thread of me talking to myself...)