LinuxQuestions.org


gimpy530 06-21-2010 03:19 PM

Kernel panics - RAID or XFS related
 
After building a new system I have been getting kernel panics randomly. I have been able to look at /var/log/messages and determine that either my RAID card, XFS, or something along those lines is causing the problem. It has now gotten bad enough that any heavy use of the RAID will cause it to panic and die.

There is one large file on this that I need which is not on my backups but when I try to get it off, the transfer will cause a kernel panic within minutes.

What is causing the panic? Is there any way to resolve it?

The RAID card is an LSI SAS3081E-R and I am using the built-in drivers/modules within Ubuntu 10.04 x64 (pretty sure the kernel itself includes the driver). There is source available for the driver on LSI's site, but I'm not sure it will solve the problem, and I can't figure out how to compile and install it. It has a Makefile already, but make just complains "no targets".

Code:

Jun 21 13:00:22 cyan kernel: [  706.206498]  ffff88019652daf8 ffff88018ff61698 ffff8801935ddd50 ffffea00038f6650
Jun 21 13:00:22 cyan kernel: [  706.206472] Process xfsdatad/2 (pid: 862, threadinfo ffff8801935dc000, task ffff88019652dac0)
Jun 21 13:00:22 cyan kernel: [  706.206456] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Jun 21 13:00:22 cyan kernel: [  706.206440] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Jun 21 13:00:22 cyan kernel: [  706.206425] CR2: 000000000043b780 CR3: 000000019204b000 CR4: 00000000000006e0
Jun 21 13:00:22 cyan kernel: [  706.206411] CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
Jun 21 13:00:22 cyan kernel: [  706.206392] FS:  00007f94286fb6f0(0000) GS:ffff880028070000(0000) knlGS:0000000000000000
Jun 21 13:00:22 cyan kernel: [  706.206375] R13: ffff00018ff616b0 R14: 0000000000000202 R15: ffffc900117cabc8
Jun 21 13:00:22 cyan kernel: [  706.206360] R10: ffa947f1edff1002 R11: ffff880123fe15c8 R12: ffffea00038f6650
Jun 21 13:00:22 cyan kernel: [  706.206345] RBP: ffff8801935ddd60 R08: 4000000000000000 R09: 8010000000000000
Jun 21 13:00:22 cyan kernel: [  706.206330] RDX: ffff00018ff61698 RSI: ffff880038b81c40 RDI: ffffea00038f6650
Jun 21 13:00:22 cyan kernel: [  706.206315] RAX: 0200ff000000282c RBX: ffffea00038f6650 RCX: 0000000000000034
Jun 21 13:00:22 cyan kernel: [  706.206301] RSP: 0018:ffff8801935ddd20  EFLAGS: 00010282
Jun 21 13:00:22 cyan kernel: [  706.206266] RIP: 0010:[<ffffffff810e37a7>]  [<ffffffff810e37a7>] test_clear_page_writeback+0x47/0x150
Jun 21 13:00:22 cyan kernel: [  706.206246] Pid: 862, comm: xfsdatad/2 Not tainted 2.6.31-21-server #59-Ubuntu TA790GX A3+
Jun 21 13:00:22 cyan kernel: [  706.205997] Modules linked in: ipt_MASQUERADE iptable_nat nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack ipt_REJECT xt_tcpudp bridge stp vboxnetflt vboxnetadp vboxdrv kvm_amd kvm ppdev vmnet parport_pc vmblock vsock vmci vmmon nfsd nfs lockd nfs_acl auth_rpcgss sunrpc snd_hda_codec_atihdmi snd_hda_codec_realtek snd_hda_intel radeon snd_hda_codec ttm iptable_filter snd_hwdep ip_tables drm snd_pcm snd_timer x_tables i2c_algo_bit snd soundcore i2c_piix4 snd_page_alloc xfs shpchp lp amd64_edac_mod parport exportfs edac_core raid10 raid456 raid6_pq async_xor async_memcpy async_tx xor raid1 raid0 multipath linear mptsas mptscsih r8169 mii mptbase scsi_transport_sas ohci1394 ieee1394
Jun 21 13:00:22 cyan kernel: [  706.205986] CPU 2
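
The trace above is pasted newest line first (the bracketed timestamps count down), which makes it harder to read. A minimal Python sketch for putting such lines back into chronological order, assuming every line carries a `[seconds.micros]` stamp like the ones in the paste. The function name `sort_by_kernel_timestamp` is made up for illustration.

```python
import re

# Matches the kernel's "[  706.206498]" uptime stamp anywhere in a syslog line.
STAMP = re.compile(r"\[\s*(\d+\.\d+)\]")

def sort_by_kernel_timestamp(lines):
    """Return the lines ordered by their bracketed kernel timestamp.

    Lines without a stamp are placed at the end. sorted() is stable,
    so lines that share a stamp keep their input order.
    """
    def key(line):
        m = STAMP.search(line)
        return (0, float(m.group(1))) if m else (1, 0.0)
    return sorted(lines, key=key)

if __name__ == "__main__":
    # Three lines copied from the paste above, in the order they were posted.
    sample = [
        "Jun 21 13:00:22 cyan kernel: [  706.206411] CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b",
        "Jun 21 13:00:22 cyan kernel: [  706.206301] RSP: 0018:ffff8801935ddd20  EFLAGS: 00010282",
        "Jun 21 13:00:22 cyan kernel: [  706.205986] CPU 2",
    ]
    for line in sort_by_kernel_timestamp(sample):
        print(line)
```

Feeding it the relevant slice of /var/log/messages should give the oops in the order the kernel printed it, starting with the `CPU 2` line.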


H_TeXMeX_H 06-22-2010 05:33 AM

First thing I would do is use memtest86 to check the RAM on this system for errors. Make sure all the cables are good. Also, post the output of 'dmesg'.

gimpy530 06-23-2010 12:12 PM

I actually already ran a memtest86 which came up clean.

After further testing I found that the problem only comes up with one of the two RAID arrays on this card. I also tried formatting it with ext4, just to be sure there wasn't some filesystem problem that xfs_check and the like were not finding. The kernel panics during mkfs, so this is not a filesystem issue, as I already suspected.

I also ran Western Digital's diagnostics on the drives; each came up clean after an extended test, which is also the result I expected.

The dmesg is too long for this forum so I tossed it onto pastebin. Note that /dev/sdd1 is the one showing the problems and /dev/sdc1 is the other array on the card. They are both RAID0. There are also two drives on a separate controller running RAID1. I went RAID happy when I built this server.

http://pastebin.com/hYxi7K3F

Here's an lspci too:

Code:

02:00.0 SCSI storage controller: LSI Logic / Symbios Logic SAS1068E PCI-Express Fusion-MPT SAS (rev 08)
        Subsystem: LSI Logic / Symbios Logic Device 3140
        Flags: bus master, fast devsel, latency 0, IRQ 18
        I/O ports at d000 [size=256]
        Memory at fe9fc000 (64-bit, non-prefetchable) [size=16K]
        Memory at fe9e0000 (64-bit, non-prefetchable) [size=64K]
        Expansion ROM at fe600000 [disabled] [size=2M]
        Capabilities: <access denied>
        Kernel driver in use: mptsas
        Kernel modules: mptsas


H_TeXMeX_H 06-23-2010 12:58 PM

Some things I notice:

Code:

spurious 8259A interrupt: IRQ7.

This has been known to cause system hangs:
http://jmz.iki.fi/blog/computers/spu...interrupt-irq7

There's also

Code:

[  19.191624] EDAC amd64: WARNING: ECC is NOT currently enabled by the BIOS. Module will NOT be loaded.
[  19.191625]    Either Enable ECC in the BIOS, or use the 'ecc_enable_override' parameter.
[  19.191626]    Might be a BIOS bug, if BIOS says ECC is enabled
[  19.191627]    Use of the override can cause unknown side effects.
[  19.191637] amd64_edac: probe of 0000:00:18.2 failed with error -22

Which is unusual. See:
https://bugs.launchpad.net/linux/+bug/422536

Also, can you post 'lspci -vv' so I can see which drivers are being used for the SATA controllers? I see one pata driver loaded.

gimpy530 06-23-2010 03:12 PM

I booted with noacpi but it still panics/oops.

Tried moving the card to a different PCIe slot, still fails.

Let me know if you need the full lspci or if this will do. I'm hoping I won't have to go through another motherboard on this thing.

Code:

02:00.0 SCSI storage controller: LSI Logic / Symbios Logic SAS1068E PCI-Express Fusion-MPT SAS (rev 08)
        Subsystem: LSI Logic / Symbios Logic Device 3140
        Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B- DisINTx-
        Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
        Latency: 0, Cache Line Size: 64 bytes
        Interrupt: pin A routed to IRQ 18
        Region 0: I/O ports at d000 [size=256]
        Region 1: Memory at fe9fc000 (64-bit, non-prefetchable) [size=16K]
        Region 3: Memory at fe9e0000 (64-bit, non-prefetchable) [size=64K]
        Expansion ROM at fe600000 [disabled] [size=2M]
        Capabilities: [50] Power Management version 2
                Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
                Status: D0 PME-Enable- DSel=0 DScale=0 PME-
        Capabilities: [68] Express (v1) Endpoint, MSI 00
                DevCap:        MaxPayload 4096 bytes, PhantFunc 0, Latency L0s <64ns, L1 <1us
                        ExtTag+ AttnBtn- AttnInd- PwrInd- RBE- FLReset-
                DevCtl:        Report errors: Correctable- Non-Fatal- Fatal- Unsupported-
                        RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop+
                        MaxPayload 128 bytes, MaxReadReq 512 bytes
                DevSta:        CorrErr- UncorrErr+ FatalErr- UnsuppReq+ AuxPwr- TransPend-
                LnkCap:        Port #0, Speed 2.5GT/s, Width x8, ASPM L0s L1, Latency L0 <64ns, L1 <1us
                        ClockPM- Suprise- LLActRep- BwNot-
                LnkCtl:        ASPM Disabled; RCB 64 bytes Disabled- Retrain- CommClk+
                        ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
                LnkSta:        Speed 2.5GT/s, Width x8, TrErr- Train- SlotClk- DLActive- BWMgmt- ABWMgmt-
        Capabilities: [98] Message Signalled Interrupts: Mask- 64bit+ Queue=0/0 Enable-
                Address: 0000000000000000  Data: 0000
        Capabilities: [b0] MSI-X: Enable- Mask- TabSize=1
                Vector table: BAR=1 offset=00002000
                PBA: BAR=1 offset=00003000
        Capabilities: [100] Advanced Error Reporting <?>
        Kernel driver in use: mptsas
        Kernel modules: mptsas
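
The `DevSta:` line in the output above has `UncorrErr+` and `UnsuppReq+` set. Whether that has anything to do with the panics is not established in this thread, but the flags are easy to pull out mechanically when comparing runs. A small Python sketch, assuming the `lspci -vv` text layout shown above (a flag name followed by `+` or `-`). `devsta_flags` is a hypothetical helper name.

```python
import re

def devsta_flags(lspci_text):
    """Map each PCIe DevSta flag name to True ('+') or False ('-').

    Only the first "DevSta:" line in the given text is examined, so pass
    in the block for a single device.
    """
    for line in lspci_text.splitlines():
        if line.strip().startswith("DevSta:"):
            body = line.split("DevSta:", 1)[1]
            return {name: sign == "+"
                    for name, sign in re.findall(r"(\w+)([+-])", body)}
    return {}

if __name__ == "__main__":
    # The DevSta line for 02:00.0 as posted above.
    sample = ("                DevSta:        CorrErr- UncorrErr+ FatalErr- "
              "UnsuppReq+ AuxPwr- TransPend-")
    flags = devsta_flags(sample)
    print([name for name, is_set in flags.items() if is_set])
```

Note that the capability lines only show up when lspci is run as root. The unprivileged listings in this thread print `Capabilities: <access denied>` instead, and for those the function returns an empty dict.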


H_TeXMeX_H 06-26-2010 02:32 PM

I must have missed this update. Yes, post the full 'lspci' and maybe also 'cat /proc/interrupts'; it may be related to interrupts.

gimpy530 06-26-2010 11:39 PM

Here's the information. I also opened a case with LSI but they have not gotten back to me yet.

Code:

white@cyan:~$ cat /proc/interrupts
          CPU0      CPU1      CPU2      CPU3     
  0:        25          0          0          1  IO-APIC-edge      timer
  1:          0          0          0          8  IO-APIC-edge      i8042
  7:          1          0          0          0  IO-APIC-edge   
  8:          0          0          0          1  IO-APIC-edge      rtc0
  9:          0          0          0          0  IO-APIC-fasteoi  acpi
 14:          0          0          0          0  IO-APIC-edge      pata_atiixp
 15:          0          0          0        449  IO-APIC-edge      pata_atiixp
 16:          0          0          1        254  IO-APIC-fasteoi  ohci_hcd:usb3, ohci_hcd:usb4, HDA Intel
 17:          0          0          0          0  IO-APIC-fasteoi  ehci_hcd:usb1
 18:          0          0          0        295  IO-APIC-fasteoi  ohci_hcd:usb5, ohci_hcd:usb6, ohci_hcd:usb7, ioc0
 19:          0          0          0        18  IO-APIC-fasteoi  ehci_hcd:usb2, HDA Intel
 22:          0          0          3      4250  IO-APIC-fasteoi  ahci, ohci1394
 24:      2952          0          0          0  HPET_MSI-edge      hpet2
 27:          0          0          0        160  PCI-MSI-edge      eth0
NMI:          0          0          0          0  Non-maskable interrupts
LOC:        58      2871      1675      2699  Local timer interrupts
SPU:          0          0          0          0  Spurious interrupts
CNT:          0          0          0          0  Performance counter interrupts
PND:          0          0          0          0  Performance pending work
RES:      1556      1490      2263      2296  Rescheduling interrupts
CAL:        164        203        204        83  Function call interrupts
TLB:        575        388        365        343  TLB shootdowns
TRM:          0          0          0          0  Thermal event interrupts
THR:          0          0          0          0  Threshold APIC interrupts
MCE:          0          0          0          0  Machine check exceptions
MCP:          1          1          1          1  Machine check polls
ERR:          1
MIS:          0
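
In the table above, IRQ 18 is shared by three USB controllers and `ioc0` (lspci shows the LSI card routed to IRQ 18 as well). Shared interrupt lines are common and not proof of a fault, but they are worth listing when interrupts are under suspicion. A minimal Python sketch that lists shared IRQs, assuming the 2.6.31-era /proc/interrupts layout shown above (per-CPU counts, one chip/type column, then comma-separated device names). `shared_irqs` is a name invented here.

```python
def shared_irqs(text):
    """Return {irq_number: [device, ...]} for IRQs with more than one device."""
    lines = text.strip().splitlines()
    ncpu = len(lines[0].split())            # header row: CPU0 CPU1 ...
    shared = {}
    for line in lines[1:]:
        # Split at the first colon only, so "ohci_hcd:usb5" stays intact.
        label, _, rest = line.partition(":")
        if not label.strip().isdigit():     # skip NMI, LOC, ERR, ... summary rows
            continue
        fields = rest.split()
        # fields = per-CPU counts, then the chip/type column, then device names
        names = " ".join(fields[ncpu + 1:])
        devices = [d.strip() for d in names.split(",") if d.strip()]
        if len(devices) > 1:
            shared[int(label)] = devices
    return shared

if __name__ == "__main__":
    with open("/proc/interrupts") as fh:
        for irq, devices in sorted(shared_irqs(fh.read()).items()):
            print(irq, ", ".join(devices))
```

Newer kernels print an extra column in each row, so the column offset would need adjusting there.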

Code:

white@cyan:~$ lspci -vv
00:00.0 Host bridge: Advanced Micro Devices [AMD] RS780 Host Bridge
        Subsystem: Advanced Micro Devices [AMD] RS780 Host Bridge
        Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
        Status: Cap+ 66MHz+ UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort+ >SERR- <PERR- INTx-
        Latency: 0
        Capabilities: <access denied>

00:01.0 PCI bridge: Advanced Micro Devices [AMD] RS780 PCI to PCI bridge (int gfx)
        Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B- DisINTx-
        Status: Cap+ 66MHz+ UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
        Latency: 64
        Bus: primary=00, secondary=01, subordinate=01, sec-latency=64
        I/O behind bridge: 0000c000-0000cfff
        Memory behind bridge: fe400000-fe5fffff
        Prefetchable memory behind bridge: 00000000d0000000-00000000dfffffff
        Secondary status: 66MHz+ FastB2B- ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- <SERR- <PERR-
        BridgeCtl: Parity- SERR+ NoISA- VGA+ MAbort- >Reset- FastB2B-
                PriDiscTmr- SecDiscTmr- DiscTmrStat- DiscTmrSERREn-
        Capabilities: <access denied>
        Kernel modules: shpchp

00:02.0 PCI bridge: Advanced Micro Devices [AMD] RS780 PCI to PCI bridge (ext gfx port 0)
        Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B- DisINTx+
        Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
        Latency: 0, Cache Line Size: 64 bytes
        Bus: primary=00, secondary=02, subordinate=02, sec-latency=0
        I/O behind bridge: 0000d000-0000dfff
        Memory behind bridge: fe600000-fe9fffff
        Secondary status: 66MHz- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- <SERR- <PERR-
        BridgeCtl: Parity+ SERR+ NoISA- VGA- MAbort- >Reset- FastB2B-
                PriDiscTmr- SecDiscTmr- DiscTmrStat- DiscTmrSERREn-
        Capabilities: <access denied>
        Kernel driver in use: pcieport-driver
        Kernel modules: shpchp

00:07.0 PCI bridge: Advanced Micro Devices [AMD] RS780 PCI to PCI bridge (PCIE port 3)
        Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B- DisINTx+
        Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
        Latency: 0, Cache Line Size: 64 bytes
        Bus: primary=00, secondary=03, subordinate=03, sec-latency=0
        I/O behind bridge: 0000e000-0000efff
        Memory behind bridge: fea00000-feafffff
        Prefetchable memory behind bridge: 00000000fdf00000-00000000fdffffff
        Secondary status: 66MHz- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- <SERR- <PERR-
        BridgeCtl: Parity+ SERR+ NoISA- VGA- MAbort- >Reset- FastB2B-
                PriDiscTmr- SecDiscTmr- DiscTmrStat- DiscTmrSERREn-
        Capabilities: <access denied>
        Kernel driver in use: pcieport-driver
        Kernel modules: shpchp

00:11.0 SATA controller: ATI Technologies Inc SB700/SB800 SATA Controller [AHCI mode] (prog-if 01)
        Subsystem: ATI Technologies Inc SB700/SB800 SATA Controller [AHCI mode]
        Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B- DisINTx-
        Status: Cap+ 66MHz+ UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
        Latency: 64, Cache Line Size: 64 bytes
        Interrupt: pin A routed to IRQ 22
        Region 0: I/O ports at b000 [size=8]
        Region 1: I/O ports at a000 [size=4]
        Region 2: I/O ports at 9000 [size=8]
        Region 3: I/O ports at 8000 [size=4]
        Region 4: I/O ports at 7000 [size=16]
        Region 5: Memory at fe3ff800 (32-bit, non-prefetchable) [size=1K]
        Capabilities: <access denied>
        Kernel driver in use: ahci

00:12.0 USB Controller: ATI Technologies Inc SB700/SB800 USB OHCI0 Controller (prog-if 10)
        Subsystem: Biostar Microtech Int'l Corp Device 3700
        Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B- DisINTx-
        Status: Cap- 66MHz+ UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
        Latency: 64, Cache Line Size: 64 bytes
        Interrupt: pin A routed to IRQ 16
        Region 0: Memory at fe3fe000 (32-bit, non-prefetchable) [size=4K]
        Kernel driver in use: ohci_hcd

00:12.1 USB Controller: ATI Technologies Inc SB700 USB OHCI1 Controller (prog-if 10)
        Subsystem: Biostar Microtech Int'l Corp Device 3700
        Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV+ VGASnoop- ParErr- Stepping- SERR+ FastB2B- DisINTx-
        Status: Cap- 66MHz+ UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
        Latency: 64, Cache Line Size: 64 bytes
        Interrupt: pin A routed to IRQ 16
        Region 0: Memory at fe3fd000 (32-bit, non-prefetchable) [size=4K]
        Kernel driver in use: ohci_hcd

00:12.2 USB Controller: ATI Technologies Inc SB700/SB800 USB EHCI Controller (prog-if 20)
        Subsystem: Biostar Microtech Int'l Corp Device 3700
        Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV+ VGASnoop- ParErr- Stepping- SERR+ FastB2B- DisINTx-
        Status: Cap+ 66MHz+ UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
        Latency: 64, Cache Line Size: 64 bytes
        Interrupt: pin B routed to IRQ 17
        Region 0: Memory at fe3ff000 (32-bit, non-prefetchable) [size=256]
        Capabilities: <access denied>
        Kernel driver in use: ehci_hcd

00:13.0 USB Controller: ATI Technologies Inc SB700/SB800 USB OHCI0 Controller (prog-if 10)
        Subsystem: Biostar Microtech Int'l Corp Device 3700
        Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B- DisINTx-
        Status: Cap- 66MHz+ UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
        Latency: 64, Cache Line Size: 64 bytes
        Interrupt: pin A routed to IRQ 18
        Region 0: Memory at fe3fc000 (32-bit, non-prefetchable) [size=4K]
        Kernel driver in use: ohci_hcd

00:13.1 USB Controller: ATI Technologies Inc SB700 USB OHCI1 Controller (prog-if 10)
        Subsystem: Biostar Microtech Int'l Corp Device 3700
        Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV+ VGASnoop- ParErr- Stepping- SERR+ FastB2B- DisINTx-
        Status: Cap- 66MHz+ UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
        Latency: 64, Cache Line Size: 64 bytes
        Interrupt: pin A routed to IRQ 18
        Region 0: Memory at fe3f7000 (32-bit, non-prefetchable) [size=4K]
        Kernel driver in use: ohci_hcd

00:13.2 USB Controller: ATI Technologies Inc SB700/SB800 USB EHCI Controller (prog-if 20)
        Subsystem: Biostar Microtech Int'l Corp Device 3700
        Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV+ VGASnoop- ParErr- Stepping- SERR+ FastB2B- DisINTx-
        Status: Cap+ 66MHz+ UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
        Latency: 64, Cache Line Size: 64 bytes
        Interrupt: pin B routed to IRQ 19
        Region 0: Memory at fe3f6800 (32-bit, non-prefetchable) [size=256]
        Capabilities: <access denied>
        Kernel driver in use: ehci_hcd

00:14.0 SMBus: ATI Technologies Inc SBx00 SMBus Controller (rev 3a)
        Subsystem: Biostar Microtech Int'l Corp Device 3700
        Control: I/O+ Mem+ BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
        Status: Cap+ 66MHz+ UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
        Capabilities: <access denied>
        Kernel modules: i2c-piix4

00:14.1 IDE interface: ATI Technologies Inc SB700/SB800 IDE Controller (prog-if 8a [Master SecP PriP])
        Subsystem: ATI Technologies Inc SB700/SB800 IDE Controller
        Control: I/O+ Mem- BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
        Status: Cap+ 66MHz+ UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
        Latency: 64
        Interrupt: pin A routed to IRQ 16
        Region 0: I/O ports at 01f0 [size=8]
        Region 1: I/O ports at 03f4 [size=1]
        Region 2: I/O ports at 0170 [size=8]
        Region 3: I/O ports at 0374 [size=1]
        Region 4: I/O ports at ff00 [size=16]
        Capabilities: <access denied>
        Kernel driver in use: pata_atiixp

00:14.2 Audio device: ATI Technologies Inc SBx00 Azalia (Intel HDA)
        Subsystem: Biostar Microtech Int'l Corp Device 821b
        Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
        Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=slow >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
        Latency: 64, Cache Line Size: 64 bytes
        Interrupt: pin ? routed to IRQ 16
        Region 0: Memory at fe3f0000 (64-bit, non-prefetchable) [size=16K]
        Capabilities: <access denied>
        Kernel driver in use: HDA Intel
        Kernel modules: snd-hda-intel

00:14.3 ISA bridge: ATI Technologies Inc SB700/SB800 LPC host controller
        Subsystem: Biostar Microtech Int'l Corp Device 3700
        Control: I/O+ Mem+ BusMaster+ SpecCycle+ MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
        Status: Cap- 66MHz+ UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
        Latency: 0

00:14.4 PCI bridge: ATI Technologies Inc SBx00 PCI to PCI Bridge (prog-if 01)
        Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B- DisINTx-
        Status: Cap- 66MHz+ UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
        Latency: 64
        Bus: primary=00, secondary=04, subordinate=04, sec-latency=64
        Memory behind bridge: feb00000-febfffff
        Secondary status: 66MHz- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort+ <SERR- <PERR-
        BridgeCtl: Parity+ SERR+ NoISA- VGA- MAbort- >Reset- FastB2B-
                PriDiscTmr- SecDiscTmr- DiscTmrStat- DiscTmrSERREn-

00:14.5 USB Controller: ATI Technologies Inc SB700/SB800 USB OHCI2 Controller (prog-if 10)
        Subsystem: Biostar Microtech Int'l Corp Device 3700
        Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B- DisINTx-
        Status: Cap- 66MHz+ UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
        Latency: 64, Cache Line Size: 64 bytes
        Interrupt: pin C routed to IRQ 18
        Region 0: Memory at fe3f5000 (32-bit, non-prefetchable) [size=4K]
        Kernel driver in use: ohci_hcd

00:18.0 Host bridge: Advanced Micro Devices [AMD] K10 [Opteron, Athlon64, Sempron] HyperTransport Configuration
        Control: I/O- Mem- BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
        Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
        Capabilities: <access denied>

00:18.1 Host bridge: Advanced Micro Devices [AMD] K10 [Opteron, Athlon64, Sempron] Address Map
        Control: I/O- Mem- BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
        Status: Cap- 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-

00:18.2 Host bridge: Advanced Micro Devices [AMD] K10 [Opteron, Athlon64, Sempron] DRAM Controller
        Control: I/O- Mem- BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
        Status: Cap- 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
        Kernel modules: amd64_edac_mod

00:18.3 Host bridge: Advanced Micro Devices [AMD] K10 [Opteron, Athlon64, Sempron] Miscellaneous Control
        Control: I/O- Mem- BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
        Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
        Capabilities: <access denied>

00:18.4 Host bridge: Advanced Micro Devices [AMD] K10 [Opteron, Athlon64, Sempron] Link Control
        Control: I/O- Mem- BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
        Status: Cap- 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-

01:05.0 VGA compatible controller: ATI Technologies Inc Radeon HD 3300 Graphics
        Subsystem: Biostar Microtech Int'l Corp Device 0217
        Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B- DisINTx-
        Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
        Latency: 0, Cache Line Size: 64 bytes
        Interrupt: pin A routed to IRQ 18
        Region 0: Memory at d0000000 (32-bit, prefetchable) [size=256M]
        Region 1: I/O ports at c000 [size=256]
        Region 2: Memory at fe5f0000 (32-bit, non-prefetchable) [size=64K]
        Region 5: Memory at fe400000 (32-bit, non-prefetchable) [size=1M]
        Capabilities: <access denied>
        Kernel modules: radeon

01:05.1 Audio device: ATI Technologies Inc RS780 Azalia controller
        Subsystem: ATI Technologies Inc RS780 Azalia controller
        Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B- DisINTx-
        Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
        Latency: 0, Cache Line Size: 64 bytes
        Interrupt: pin B routed to IRQ 19
        Region 0: Memory at fe5e8000 (32-bit, non-prefetchable) [size=16K]
        Capabilities: <access denied>
        Kernel driver in use: HDA Intel
        Kernel modules: snd-hda-intel

02:00.0 SCSI storage controller: LSI Logic / Symbios Logic SAS1068E PCI-Express Fusion-MPT SAS (rev 08)
        Subsystem: LSI Logic / Symbios Logic Device 3140
        Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B- DisINTx-
        Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
        Latency: 0, Cache Line Size: 64 bytes
        Interrupt: pin A routed to IRQ 18
        Region 0: I/O ports at d000 [size=256]
        Region 1: Memory at fe9fc000 (64-bit, non-prefetchable) [size=16K]
        Region 3: Memory at fe9e0000 (64-bit, non-prefetchable) [size=64K]
        Expansion ROM at fe600000 [disabled] [size=2M]
        Capabilities: <access denied>
        Kernel driver in use: mptsas
        Kernel modules: mptsas

03:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL8111/8168B PCI Express Gigabit Ethernet controller (rev 03)
        Subsystem: Biostar Microtech Int'l Corp Device 2309
        Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B- DisINTx+
        Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
        Latency: 0, Cache Line Size: 64 bytes
        Interrupt: pin A routed to IRQ 27
        Region 0: I/O ports at e800 [size=256]
        Region 2: Memory at fdfff000 (64-bit, prefetchable) [size=4K]
        Region 4: Memory at fdff8000 (64-bit, prefetchable) [size=16K]
        Expansion ROM at feae0000 [disabled] [size=128K]
        Capabilities: <access denied>
        Kernel driver in use: r8169
        Kernel modules: r8169

04:07.0 FireWire (IEEE 1394): Agere Systems FW322/323 (rev 70) (prog-if 10)
        Subsystem: Biostar Microtech Int'l Corp Device 4401
        Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV+ VGASnoop- ParErr- Stepping- SERR+ FastB2B- DisINTx-
        Status: Cap+ 66MHz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
        Latency: 64 (3000ns min, 6000ns max), Cache Line Size: 64 bytes
        Interrupt: pin A routed to IRQ 22
        Region 0: Memory at febff000 (32-bit, non-prefetchable) [size=4K]
        Capabilities: <access denied>
        Kernel driver in use: ohci1394
        Kernel modules: firewire-ohci, ohci1394


H_TeXMeX_H 06-27-2010 04:18 AM

Ok, looking through the dmesg it's hard to tell how you have everything set up. Can you outline how the drives are hooked up (SATA or IDE), which ones are in which array, and which array is causing the hang?

I see you have an IDE drive. Is that part of one of the arrays?

EDIT:
To compile the driver, run './compile' in that directory; you can't run 'make' directly.

gimpy530 06-28-2010 01:18 AM

The IDE is the DVD drive.

sda and sdb are in a software RAID1 and host all the system data (root, var, opt, etc). These drives are connected to the motherboard's on board controller. These are two 80GB SATA drives. They work fine.

sdc is one of the arrays off the RAID card. This is two 250GB SATA drives connected via a mini-SAS to SATA breakout cable.

sdd is the same as above but is instead two 1TB drives.

After further testing I have found that both sdc and sdd are giving the problem. Previously I could do anything with sdc without triggering a panic. That is no longer true; they both cause panics.

I still can't get it to compile without error but I'm going to wait to talk to LSI before I bother to try to fix the compilation errors.

H_TeXMeX_H 06-28-2010 03:55 AM

Aha, ok, then it must be a problem with the RAID card: either the driver or the card itself.

Well, if you post the compilation errors maybe I can help ... or maybe not. Anyway, good luck.

gimpy530 06-28-2010 12:26 PM

As I feared, LSI only supports RHEL. I'll never understand why companies like RHEL. Here is the output of the compilation. I'm not sure what most of it means, as I know nothing about C; BASH scripts are as far as my coding abilities go. Note that running this did not create mptsas.o, which is the module my kernel is using now.

Code:

make: Entering directory `/usr/src/linux-headers-2.6.31-21-server'
  LD      /home/white/raid-src/fusion/built-in.o
  CC [M]  /home/white/raid-src/fusion/mptbase.o
  CC [M]  /home/white/raid-src/fusion/mptscsih.o
  CC [M]  /home/white/raid-src/fusion/mptspi.o
  CC [M]  /home/white/raid-src/fusion/mptfc.o
  CC [M]  /home/white/raid-src/fusion/mptsas.o
  CC [M]  /home/white/raid-src/fusion/mptctl.o
/home/white/raid-src/fusion/mptsas.c: In function ‘mptsas_smp_handler’:
/home/white/raid-src/fusion/mptsas.c:2757: error: ‘struct request’ has no member named ‘data_len’
/home/white/raid-src/fusion/mptsas.c:2758: error: ‘struct request’ has no member named ‘data_len’
/home/white/raid-src/fusion/mptsas.c:2775: error: ‘struct request’ has no member named ‘data_len’
/home/white/raid-src/fusion/mptsas.c:2805: error: ‘struct request’ has no member named ‘data_len’
/home/white/raid-src/fusion/mptsas.c:2808: error: ‘struct request’ has no member named ‘data_len’
/home/white/raid-src/fusion/mptsas.c:2821: error: ‘struct request’ has no member named ‘data_len’
/home/white/raid-src/fusion/mptsas.c:2823: error: ‘struct request’ has no member named ‘data_len’
/home/white/raid-src/fusion/mptsas.c:2854: error: ‘struct request’ has no member named ‘data_len’
/home/white/raid-src/fusion/mptsas.c:2855: error: ‘struct request’ has no member named ‘data_len’
/home/white/raid-src/fusion/mptsas.c:2864: error: ‘struct request’ has no member named ‘data_len’
/home/white/raid-src/fusion/mptsas.c:2867: error: ‘struct request’ has no member named ‘data_len’
make[1]: *** [/home/white/raid-src/fusion/mptsas.o] Error 1
make[1]: *** Waiting for unfinished jobs....
In file included from /home/white/raid-src/fusion/mptctl.c:3178:
/home/white/raid-src/fusion/csmi/csmisas.c: In function ‘csmisas_get_raid_features’:
/home/white/raid-src/fusion/csmi/csmisas.c:4229: warning: the frame size of 1232 bytes is larger than 1024 bytes
/home/white/raid-src/fusion/csmi/csmisas.c: In function ‘csmisas_get_connector_info’:
/home/white/raid-src/fusion/csmi/csmisas.c:5350: warning: the frame size of 1296 bytes is larger than 1024 bytes
make: *** [_module_/home/white/raid-src/fusion] Error 2
make: Leaving directory `/usr/src/linux-headers-2.6.31-21-server'
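
Every error in the output above has the same shape: the vendor source reads the `data_len` field of `struct request`, which the 2.6.31 headers evidently no longer provide. As far as I know, in-tree drivers of that era use the `blk_rq_bytes()` accessor instead, but check that against your kernel source before patching anything. A short Python sketch that groups such gcc errors by struct and member, so the distinct problems stand out from the repeats. The regex assumes gcc's "has no member named" wording and the quote style seen above, and `missing_members` is a made-up name.

```python
import re
from collections import defaultdict

# Example of the line being matched:
#   /path/mptsas.c:2757: error: ‘struct request’ has no member named ‘data_len’
ERR = re.compile(
    r"^(?P<file>[^:\s]+):(?P<line>\d+): error: "
    r"[‘'](?P<struct>struct \w+)[’'] has no member named "
    r"[‘'](?P<member>\w+)[’']"
)

def missing_members(build_log):
    """Group 'no member named' errors as {(struct, member): [(file, line), ...]}."""
    found = defaultdict(list)
    for raw in build_log.splitlines():
        m = ERR.match(raw.strip())
        if m:
            key = (m.group("struct"), m.group("member"))
            found[key].append((m.group("file"), int(m.group("line"))))
    return dict(found)

if __name__ == "__main__":
    import sys
    # Usage: save the ./compile output to a file and pipe it in on stdin.
    for (struct, member), hits in missing_members(sys.stdin.read()).items():
        lines = ", ".join(str(n) for _, n in hits)
        print(f"{struct}.{member}: {len(hits)} use(s) at line(s) {lines}")
```

For the log above this should collapse the eleven error lines into a single entry for `struct request` and `data_len`, all inside `mptsas_smp_handler`.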


H_TeXMeX_H 06-28-2010 01:40 PM

Hmm, looks like they don't maintain the driver anymore. But there are these to look at; you can try the patches in the Red Hat Bugzilla entry.

https://bugzilla.redhat.com/show_bug.cgi?id=493093
http://forums.overclockers.com.au/sh...php?p=11359911
http://kerneltrap.org/mailarchive/li...6885071/thread

gimpy530 06-29-2010 04:01 AM

Well, I still can't get it to compile. I tried applying the patches to the source files, but only one of them reported that it applied. The build now fails with even more errors. Is there anything I can do other than throw away a $200 card?

Code:

sudo ./compile
make: Entering directory `/usr/src/linux-headers-2.6.31-21-server'
  CC [M]  /home/white/raid-src/fusion/mptsas.o
  CC [M]  /home/white/raid-src/fusion/mptlan.o
  CC [M]  /home/white/raid-src/fusion/mptctl.o
/home/white/raid-src/fusion/mptsas.c: In function ‘mptsas_smp_handler’:
/home/white/raid-src/fusion/mptsas.c:2758: error: ‘struct request’ has no member named ‘data_len’
/home/white/raid-src/fusion/mptsas.c:2759: error: ‘struct request’ has no member named ‘data_len’
/home/white/raid-src/fusion/mptsas.c:2776: error: ‘struct request’ has no member named ‘data_len’
/home/white/raid-src/fusion/mptsas.c:2806: error: ‘struct request’ has no member named ‘data_len’
/home/white/raid-src/fusion/mptsas.c:2809: error: ‘struct request’ has no member named ‘data_len’
/home/white/raid-src/fusion/mptsas.c:2822: error: ‘struct request’ has no member named ‘data_len’
/home/white/raid-src/fusion/mptsas.c:2824: error: ‘struct request’ has no member named ‘data_len’
/home/white/raid-src/fusion/mptsas.c:2855: error: ‘struct request’ has no member named ‘data_len’
/home/white/raid-src/fusion/mptsas.c:2856: error: ‘struct request’ has no member named ‘data_len’
/home/white/raid-src/fusion/mptsas.c:2865: error: ‘struct request’ has no member named ‘data_len’
/home/white/raid-src/fusion/mptsas.c:2868: error: ‘struct request’ has no member named ‘data_len’
/home/white/raid-src/fusion/mptlan.c: In function ‘mpt_register_lan_device’:
/home/white/raid-src/fusion/mptlan.c:1495: error: ‘struct net_device’ has no member named ‘open’
/home/white/raid-src/fusion/mptlan.c:1496: error: ‘struct net_device’ has no member named ‘stop’
/home/white/raid-src/fusion/mptlan.c:1497: error: ‘struct net_device’ has no member named ‘get_stats’
/home/white/raid-src/fusion/mptlan.c:1498: error: ‘struct net_device’ has no member named ‘set_multicast_list’
/home/white/raid-src/fusion/mptlan.c:1499: error: ‘struct net_device’ has no member named ‘change_mtu’
/home/white/raid-src/fusion/mptlan.c:1500: error: ‘struct net_device’ has no member named ‘hard_start_xmit’
/home/white/raid-src/fusion/mptlan.c:1503: error: ‘struct net_device’ has no member named ‘tx_timeout’
make[1]: *** [/home/white/raid-src/fusion/mptlan.o] Error 1
make[1]: *** Waiting for unfinished jobs....
make[1]: *** [/home/white/raid-src/fusion/mptsas.o] Error 1
In file included from /home/white/raid-src/fusion/mptctl.c:3178:
/home/white/raid-src/fusion/csmi/csmisas.c: In function ‘csmisas_get_raid_features’:
/home/white/raid-src/fusion/csmi/csmisas.c:4229: warning: the frame size of 1232 bytes is larger than 1024 bytes
/home/white/raid-src/fusion/csmi/csmisas.c: In function ‘csmisas_get_connector_info’:
/home/white/raid-src/fusion/csmi/csmisas.c:5350: warning: the frame size of 1296 bytes is larger than 1024 bytes
make: *** [_module_/home/white/raid-src/fusion] Error 2
make: Leaving directory `/usr/src/linux-headers-2.6.31-21-server'


H_TeXMeX_H 06-29-2010 04:16 AM

The problem is that the patches only bring the driver up to kernel version 2.6.29, and your kernel is 2.6.31. You could try to get someone to help patch the driver for at least that kernel version. I can't do it myself: I don't know enough, and I couldn't guarantee that any fix I made would produce a stable driver.

Maybe post on that Red Hat Bugzilla and ask for a patch for this version? Or e-mail LSI and ask them to patch the driver for this kernel version.
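
To illustrate why a driver patched for 2.6.29 can still fail on 2.6.31: out-of-tree drivers usually gate API differences on the kernel version at compile time. Below is a minimal userspace sketch of that idea, not code from the LSI driver. The `SKETCH_*` names, `struct sketch_request` and its fields are made up for illustration; only the version arithmetic mirrors the kernel's `KERNEL_VERSION()` macro, and the 2.6.31 cutoff is an assumption based on the `data_len` errors above.

```c
/* Mirrors the arithmetic of KERNEL_VERSION() from <linux/version.h>. */
#define SKETCH_KERNEL_VERSION(a, b, c) (((a) << 16) + ((b) << 8) + (c))

/* Assumption: pretend we build against 2.6.31 unless told otherwise. */
#ifndef SKETCH_LINUX_VERSION_CODE
#define SKETCH_LINUX_VERSION_CODE SKETCH_KERNEL_VERSION(2, 6, 31)
#endif

/* Mock of the single field of interest; NOT the real struct request. */
struct sketch_request {
#if SKETCH_LINUX_VERSION_CODE >= SKETCH_KERNEL_VERSION(2, 6, 31)
    unsigned int priv_data_len; /* newer layout: read via an accessor */
#else
    unsigned int data_len;      /* older layout: read directly */
#endif
};

/* Hypothetical compat accessor so call sites look the same on both layouts. */
static unsigned int sketch_rq_bytes(const struct sketch_request *rq)
{
#if SKETCH_LINUX_VERSION_CODE >= SKETCH_KERNEL_VERSION(2, 6, 31)
    return rq->priv_data_len;
#else
    return rq->data_len;
#endif
}
```

Building the same file with `-DSKETCH_LINUX_VERSION_CODE=...` set to an older version selects the other branch, which is roughly what a version-aware driver source would do for each kernel it supports.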

borisk 07-01-2010 05:48 PM

Here are the diffs needed to build a 4.22... module:
 
Code:

--- ./message/fusion/Makefile        2009-11-12 08:23:46.000000000 -0800
+++ ./message_new/fusion/Makefile        2010-06-30 16:20:17.415708295 -0700
@@ -16,4 +16,4 @@
 obj-$(CONFIG_FUSION_FC)                += mptbase.o mptscsih.o mptfc.o
 obj-$(CONFIG_FUSION_SAS)        += mptbase.o mptscsih.o mptsas.o
 obj-$(CONFIG_FUSION_CTL)        += mptctl.o
-obj-$(CONFIG_FUSION_LAN)        += mptlan.o
+#obj-$(CONFIG_FUSION_LAN)        += mptlan.o



--- ./message/fusion/mptsas.c        2009-11-12 08:23:46.000000000 -0800
+++ ./message_new/fusion/mptsas.c        2010-06-30 16:20:46.545081810 -0700
@@ -2754,8 +2754,8 @@
        /* do we need to support multiple segments? */
        if (req->bio->bi_vcnt > 1 || rsp->bio->bi_vcnt > 1) {
                printk(MYIOC_s_ERR_FMT "%s: multiple segments req %u %u, rsp %u %u\n",
-                    ioc->name, __func__, req->bio->bi_vcnt, req->data_len,
-                    rsp->bio->bi_vcnt, rsp->data_len);
+                    ioc->name, __func__, req->bio->bi_vcnt, blk_rq_bytes(req),
+                    rsp->bio->bi_vcnt, blk_rq_bytes(rsp));
                return -EINVAL;
        }
 
@@ -2772,7 +2772,7 @@
        smpreq = (SmpPassthroughRequest_t *)mf;
        memset(smpreq, 0, sizeof(*smpreq));
 
-        smpreq->RequestDataLength = cpu_to_le16(req->data_len - 4);
+        smpreq->RequestDataLength = cpu_to_le16(blk_rq_bytes(req) - 4);
        smpreq->Function = MPI_FUNCTION_SMP_PASSTHROUGH;
 
        if (rphy)
@@ -2802,10 +2802,10 @@
 
        flagsLength = flagsLength << MPI_SGE_FLAGS_SHIFT;
 
-        flagsLength |= (req->data_len - 4);
+        flagsLength |= (blk_rq_bytes(req) - 4);
 
        dma_addr_out = pci_map_single(ioc->pcidev, bio_data(req->bio),
-                                      req->data_len, PCI_DMA_BIDIRECTIONAL);
+                                      blk_rq_bytes(req), PCI_DMA_BIDIRECTIONAL);
        if (!dma_addr_out)
                goto put_mf;
        ioc->add_sge(psge, flagsLength, dma_addr_out);
@@ -2818,9 +2818,9 @@
                MPI_SGE_FLAGS_END_OF_BUFFER;
 
        flagsLength = flagsLength << MPI_SGE_FLAGS_SHIFT;
-        flagsLength |= rsp->data_len + 4;
+        flagsLength |= blk_rq_bytes(rsp) + 4;
        dma_addr_in =  pci_map_single(ioc->pcidev, bio_data(rsp->bio),
-                                      rsp->data_len, PCI_DMA_BIDIRECTIONAL);
+                                      blk_rq_bytes(rsp), PCI_DMA_BIDIRECTIONAL);
        if (!dma_addr_in)
                goto out_unmap;
 
@@ -2851,8 +2851,11 @@
                smprep = (SmpPassthroughReply_t *)ioc->sas_mgmt.reply;
                memcpy(req->sense, smprep, sizeof(*smprep));
                req->sense_len = sizeof(*smprep);
+                rsp->resid_len = blk_rq_bytes(rsp) - smprep->ResponseDataLength;
+                /*
                req->data_len = 0;
                rsp->data_len -= smprep->ResponseDataLength;
+                */
        } else {
                printk(MYIOC_s_ERR_FMT
                    "%s: smp passthru reply failed to be returned\n",
@@ -2861,10 +2864,10 @@
        }
 out_unmap:
        if (dma_addr_out)
-                pci_unmap_single(ioc->pcidev, dma_addr_out, req->data_len,
+                pci_unmap_single(ioc->pcidev, dma_addr_out, blk_rq_bytes(req),
                                  PCI_DMA_BIDIRECTIONAL);
        if (dma_addr_in)
-                pci_unmap_single(ioc->pcidev, dma_addr_in, rsp->data_len,
+                pci_unmap_single(ioc->pcidev, dma_addr_in, blk_rq_bytes(rsp),
                                  PCI_DMA_BIDIRECTIONAL);
 put_mf:
        if (mf)

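The added `resid_len` line in the diff above is simple arithmetic: the residual is the size of the response buffer minus the number of bytes the controller reported back. Here is a minimal sketch of that calculation in plain C, outside the kernel. `sketch_resid_len` is a hypothetical helper, and the clamp to zero is an extra safety check that is not part of the diff.

```c
/* Hypothetical helper: bytes of the response buffer left unfilled. */
static unsigned int sketch_resid_len(unsigned int buffer_bytes,
                                     unsigned int received_bytes)
{
    /* Assumption: clamp so an over-reported length cannot wrap around. */
    if (received_bytes > buffer_bytes)
        return 0;
    return buffer_bytes - received_bytes;
}
```

For example, a 1024-byte response buffer with 60 bytes returned leaves a residual of 964 bytes.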

