LinuxQuestions.org


Tomasu 01-27-2008 04:30 AM

Odd pci errors
 
I'm getting some odd errors from two of my PCI devices: one is a WinTV 401/DBX card, and the other is a D-Link GbE PCI card (using the skge driver).

Here's a fresh sample:
Quote:

bttv0: OCERR @ 1fdad014,bits: HSYNC OFLOW OCERR*
skge 0000:00:0a.0: PCI error cmd=0x157 status=0x22b0
skge 0000:00:0a.0: unable to clear error (so ignoring them)
NETDEV WATCHDOG: eth0: transmit timed out
That's a single sample from dmesg, but the two devices don't usually fail at the same time. Usually I find I have to reset one, and then a few hours (of use) later I have to reset the other.

I've spent several hours looking these up and getting nowhere, so I'd appreciate some help. Oh, and I've also tried booting with and without ACPI and APIC (I've had issues with APIC and ACPI on slightly older hardware).
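
The `status=0x22b0` value in the skge line of the dmesg sample appears to be the card's PCI status register, so it can be decoded bit by bit. A minimal sketch, assuming the standard PCI status register layout (the same masks Linux lists in `pci_regs.h`); the function name `decode_pci_status` is made up for illustration:

```shell
# Print the error flags that are set in a 16-bit PCI status register value.
# Assumption: the masks follow the standard PCI status register layout.
decode_pci_status() {
    status=$(( $1 ))
    while read -r mask name; do
        if [ $(( $status & $mask )) -ne 0 ]; then
            echo "$name"
        fi
    done <<'EOF'
0x0100 master data parity error
0x0800 signaled target abort
0x1000 received target abort
0x2000 received master abort
0x4000 signaled system error
0x8000 detected parity error
EOF
}
```

`decode_pci_status 0x22b0` prints `received master abort`, meaning a transfer the card started as bus master was not claimed by any target. The other bits set in 0x22b0 are capability and timing bits, not error flags.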

aus9 01-27-2008 04:43 AM

That it varies could make this really tricky. I suspect IRQ conflicts from hardware detection. Please do not insert any other devices, to keep it simple. Some hardware could also be failing; let's hope not.

1) Do a number of full reboots (no suspend to RAM etc., please).
On each reboot:
Code:

su
# replace N with the reboot number: 1, then 2, etc.
cat /proc/interrupts > /irqN.txt
lsmod > /modulesN.txt


After, say, 4 reboots, see if there are any differences, and post what they are or that there are none.

The modules file is harder to check by eye, so you may want to run a diff command; I prefer xxdiff, a GUI tool.


2) To eliminate hardware questions, can you confirm that you have not made any hardware changes recently? And, assuming you know about static electricity prevention, please push down on all PCI cards to make sure they are still properly seated.

3) Can you confirm whether these devices worked correctly under a different operating system or distro?
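
The comparison in step 1 can be scripted. One catch is that the interrupt counters in /proc/interrupts change on every boot, so a plain diff of the saved files always reports differences. A minimal sketch that compares only which devices sit on which IRQ, assuming the single-controller-column layout (`IRQ: counts controller devices`) of older kernels; kernels that print extra columns would need a different field offset. The function names are hypothetical:

```shell
# Reduce a saved /proc/interrupts snapshot to "IRQ: devices", dropping the
# per-CPU counters. Assumes one controller column between counts and devices.
irq_assignments() {
    awk 'NR == 1 { ncpu = NF; next }   # header row: one column per CPU
         $1 ~ /^[0-9]+:$/ {            # numbered IRQ rows only
             line = $1
             for (i = ncpu + 3; i <= NF; i++) line = line " " $i
             print line
         }' "$1"
}

# Compare the first snapshot against each of the others.
# Returns non-zero if any snapshot has different IRQ assignments.
compare_irq_snapshots() {
    ref=$1
    shift
    rc=0
    for f in "$@"; do
        if [ "$(irq_assignments "$ref")" != "$(irq_assignments "$f")" ]; then
            echo "IRQ assignments differ: $ref vs $f"
            rc=1
        fi
    done
    return $rc
}
```

Usage would be something like `compare_irq_snapshots /irq1.txt /irq2.txt /irq3.txt /irq4.txt`; no output means the assignments matched on every boot. The module lists have no counters, so `diff /modules1.txt /modules2.txt` can be run on them directly.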

Tomasu 01-27-2008 05:13 AM

1. I'll get back to you on this (it's currently working, and it's being used to watch TV :o)

2+3. This is the fun bit: all the hardware has changed. The box used to be my main desktop, but I bought a new one 6 months ago, and this machine has sat a little. I've since installed a GbE NIC, which is new, and the TV card, which worked fine the last time I used it.

I have noticed that my server, with another identical GbE card, has similar PCI errors when under SUPER HIGH load. But it hasn't happened in a while. It makes me think this particular error stems from bad drivers? But nothing explains the TV card errors so far.

Tomasu 01-27-2008 06:15 AM

All done now; the listings are all the same, at least with ACPI and APIC off, since there is then no dynamic mapping of IRQs. It also seems I got lucky with slot placement, since no major devices seem to be sharing IRQs.

In case it might help, here's a bunch of current, hopefully relevant information:
Code:

root@chauncey:~# lspci
00:00.0 Host bridge: VIA Technologies, Inc. VT8377 [KT400/KT600 AGP] Host Bridge (rev 80)
00:01.0 PCI bridge: VIA Technologies, Inc. VT8237 PCI Bridge
00:07.0 Multimedia video controller: Brooktree Corporation Bt878 Video Capture (rev 11)
00:07.1 Multimedia controller: Brooktree Corporation Bt878 Audio Capture (rev 11)
00:0a.0 Ethernet controller: D-Link System Inc DGE-530T Gigabit Ethernet Adapter (rev 11) (rev 11)
00:0f.0 IDE interface: VIA Technologies, Inc. VT82C586A/B/VT82C686/A/B/VT823x/A/C PIPC Bus Master IDE (rev 06)
00:10.0 USB Controller: VIA Technologies, Inc. VT82xxxxx UHCI USB 1.1 Controller (rev 81)
00:10.1 USB Controller: VIA Technologies, Inc. VT82xxxxx UHCI USB 1.1 Controller (rev 81)
00:10.2 USB Controller: VIA Technologies, Inc. VT82xxxxx UHCI USB 1.1 Controller (rev 81)
00:10.3 USB Controller: VIA Technologies, Inc. VT82xxxxx UHCI USB 1.1 Controller (rev 81)
00:10.4 USB Controller: VIA Technologies, Inc. USB 2.0 (rev 86)
00:11.0 ISA bridge: VIA Technologies, Inc. VT8237 ISA bridge [KT600/K8T800/K8T890 South]
00:11.5 Multimedia audio controller: VIA Technologies, Inc. VT8233/A/8235/8237 AC97 Audio Controller (rev 60)
01:00.0 VGA compatible controller: ATI Technologies Inc RV350 AR [Radeon 9600]
01:00.1 Display controller: ATI Technologies Inc RV350 AR [Radeon 9600] (Secondary)
root@chauncey:~# cat /proc/interrupts
          CPU0
  0:      27948    XT-PIC-XT        timer
  1:        10    XT-PIC-XT        i8042
  2:          0    XT-PIC-XT        cascade
  3:          2    XT-PIC-XT        ehci_hcd:usb5, VIA8237
  5:      1351    XT-PIC-XT        eth0, uhci_hcd:usb3, uhci_hcd:usb4
  8:          2    XT-PIC-XT        rtc
  9:          0    XT-PIC-XT        acpi
 10:      16848    XT-PIC-XT        bttv0, Bt87x audio
 11:      20675    XT-PIC-XT        uhci_hcd:usb1, uhci_hcd:usb2, radeon@pci:0000:01:00.0
 12:        104    XT-PIC-XT        i8042
 14:      6859    XT-PIC-XT        ide0
 15:      4894    XT-PIC-XT        ide1
NMI:          0
LOC:          0
ERR:          0
MIS:          0
root@chauncey:~# lsmod
Module                  Size  Used by
radeon                113344  2
drm                    76532  3 radeon
ac                      6404  0
battery                13064  0
ipv6                  246628  25
xfs                  510744  1
snd_pcm_oss            39872  0
snd_mixer_oss          16064  1 snd_pcm_oss
loop                  17860  0
snd_bt87x              15268  0
tsdev                  8928  0
bt878                  11704  0
tuner                  61192  0
tvaudio                23228  0
msp3400                29600  0
snd_via82xx            27704  0
gameport              15880  1 snd_via82xx
snd_ac97_codec        93156  1 snd_via82xx
button                  9104  0
ac97_bus                3104  1 snd_ac97_codec
snd_pcm                72996  4 snd_pcm_oss,snd_bt87x,snd_via82xx,snd_ac97_codec
psmouse                37136  0
bttv                  167028  1 bt878
snd_timer              21924  1 snd_pcm
snd_page_alloc        10920  3 snd_bt87x,snd_via82xx,snd_pcm
snd_mpu401_uart        8896  1 snd_via82xx
video_buf              24452  1 bttv
firmware_class        10240  1 bttv
lirc_atiusb            18048  0
ir_common              34980  1 bttv
lirc_dev              14612  1 lirc_atiusb
snd_rawmidi            23424  1 snd_mpu401_uart
snd_seq_device          8620  1 snd_rawmidi
serio_raw              7492  0
compat_ioctl32          2208  1 bttv
i2c_algo_bit            6756  1 bttv
shpchp                31860  0
snd                    49124  10 snd_pcm_oss,snd_mixer_oss,snd_bt87x,snd_via82xx,snd_ac97_codec,snd_pcm,snd_timer,snd_mpu401_uart,snd_rawmidi,snd_seq_device
btcx_risc              5608  1 bttv
pci_hotplug            28896  1 shpchp
soundcore              8352  1 snd
tveeprom              15792  1 bttv
i2c_viapro              9300  0
i2c_core              24032  7 tuner,tvaudio,msp3400,bttv,i2c_algo_bit,tveeprom,i2c_viapro
via_agp                10784  1
videodev              27456  1 bttv
v4l2_common            17472  5 tuner,tvaudio,msp3400,bttv,videodev
v4l1_compat            13380  2 bttv,videodev
agpgart                32488  2 drm,via_agp
pcspkr                  4032  0
rtc                    13784  0
evdev                  10240  4
ext3                  122344  3
jbd                    56136  1 ext3
mbcache                9088  1 ext3
dm_mirror              22400  0
dm_snapshot            17700  0
dm_mod                53408  9 dm_mirror,dm_snapshot
raid456              123408  1
async_xor              4960  1 raid456
async_memcpy            3680  1 raid456
async_tx                8620  3 raid456,async_xor,async_memcpy
xor                    15144  2 raid456,async_xor
raid1                  23424  1
md_mod                75188  4 raid456,raid1
ide_cd                37152  0
cdrom                  33376  1 ide_cd
ide_disk              17312  10
via82cxxx              9476  0 [permanent]
generic                5572  0 [permanent]
ide_core              113508  4 ide_cd,ide_disk,via82cxxx,generic
ata_generic            8356  0
ehci_hcd              32428  0
uhci_hcd              23920  0
skge                  39120  0
usbcore              131624  4 lirc_atiusb,ehci_hcd,uhci_hcd
libata                114480  1 ata_generic
scsi_mod              137388  1 libata
thermal                16348  0
processor              35752  1 thermal
fan                    5860  0
root@chauncey:~# uname -a
Linux chauncey 2.6.23-1-686-bigmem #1 SMP Fri Dec 21 14:39:17 UTC 2007 i686 GNU/Linux
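
IRQ sharing can be read straight off a /proc/interrupts listing: any IRQ row with more than one comma-separated name is shared. A minimal sketch, assuming the same column layout as the listing above (one controller column before the device names); `shared_irqs` is a hypothetical helper name:

```shell
# List IRQs that have more than one handler registered.
# Reads the given file, or the live /proc/interrupts if no file is given.
# Assumes one controller column between the counters and the device names.
shared_irqs() {
    awk 'NR == 1 { ncpu = NF; next }   # header row: one column per CPU
         $1 ~ /^[0-9]+:$/ {            # numbered IRQ rows only
             devs = ""
             for (i = ncpu + 3; i <= NF; i++)
                 devs = (devs == "" ? $i : devs " " $i)
             if (index(devs, ",") > 0) print $1, devs
         }' "${1:-/proc/interrupts}"
}
```

Run against the listing above, this reports IRQs 3, 5, 10 and 11: eth0 is on IRQ 5 together with two UHCI USB controllers, while bttv0 shares IRQ 10 only with the audio function of the same card.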


onebuck 01-27-2008 09:22 AM

Hi,

Did you try changing the slot assignment for the PCI cards?

aus9 01-27-2008 04:01 PM

Well, I am glad it isn't IRQ conflicts. But looking at your first post, and since that is not occurring, can you post a link to your full /var/log/dmesg?

I use www.rip.com, which is free for small files.

Thinking out loud: I wonder if you need a module preload for the ethernet before the TV card? I would like to know if your dmesg shows the current working sequence, whatever it is. Then, when you have a failure, keep a copy of that dmesg too, and we might see that the hardware is detected out of sequence, leading to a module load failure.

Of course, with tricky hardware, it may be the one we do not see that is the issue, so it's nice to see your working dmesg.
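
To make a working dmesg and a failing one easier to compare, the lines from the two drivers can be pulled out of each saved log. A minimal sketch; the patterns are taken from the messages in the first post, and `pci_error_lines` is a hypothetical helper name:

```shell
# Print only the kernel log lines from the bttv and skge drivers, plus the
# network watchdog timeout. Reads the named log files, or stdin if none.
pci_error_lines() {
    grep -E 'bttv[0-9]*:|skge |NETDEV WATCHDOG' "$@"
}
```

For example, `dmesg > "dmesg-$(date +%Y%m%d-%H%M%S).txt"` saves a timestamped copy right after a failure, and `pci_error_lines` on that file shows just the relevant lines.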

Tomasu 01-27-2008 09:08 PM

Well, when I said "currently working", I meant that it works for a while, then I get an error some time later, usually hours later.

http://pastebin.ca/875555

That's the current dmesg log.

aus9 01-27-2008 10:00 PM

Yeah, thanks for that. For line numbers, I use F11 in my text editor to get them.

A quick look turns up these lines of interest:
1) Lines 339 & 340 have i2c properties not installing. It's possible that if you built a vanilla kernel and enabled full i2c support for this card, you MAY remove these errors.

2) Lines 354 357 appear to be the source of your hardware issue.

3) The last time I checked, Gentoo is supposed to get you to compile your own kernel. Have you done so?

It might just be an easy step to read your Linux documentation for the drivers and build more in or allow more modules, then do your Gentoo modules etc.

But I do not use Gentoo, so I can not help with those steps.

Tomasu 01-27-2008 10:31 PM

This is a Debian sid/unstable box.


edit: Also, the bt878* modules are from ALSA to handle the audio, which, oddly enough, works just fine, even with the odd exit error. There are two separate devices on the TV card, the Video Capture Device and the Audio Capture Device, both of which seem to work, until they stop working...

aus9 01-28-2008 02:21 AM

OK, I am prepared to admit it may be software rather than hardware.

Can you see any pattern in what you were doing at the time either of those devices fails?

I am now thinking maybe it's a sound server issue. I do not use Debian either, but are you using KDE by any chance? If so, we can fix some sound server issues through the Control Center.

Tomasu 01-28-2008 02:53 AM

It is running KDE, but the onboard sound supports multiple hardware streams, so artsd can have its own dedicated channel while MythTV gets one as well. And it's not the onboard sound that's having the problem; it's the capture device on the WinTV card.

aus9 01-28-2008 06:42 AM

Fair enough. Does it fail after running for a certain amount of time?

I know you have already attempted to eliminate power saving issues, but I am running out of ideas.

Does it happen in sync with your crontab jobs?
See /etc/crontab; the first two fields of each entry are the minute and hour (mm hh).
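
To line cron jobs up against the times the errors appear, the minute and hour fields can be listed next to each command. A minimal sketch, assuming the system crontab format (`minute hour day month weekday user command`); per-user crontabs have no user field and would need a different offset, and `cron_times` is a hypothetical helper name:

```shell
# Print the minute, hour and command of each entry in a system crontab.
# Skips comments, blank lines and variable assignments such as SHELL=/bin/sh.
cron_times() {
    awk '$1 ~ /^#/ || NF < 7 || $1 ~ /=/ { next }
         {
             cmd = $7
             for (i = 8; i <= NF; i++) cmd = cmd " " $i
             print "min=" $1, "hour=" $2, "cmd=" cmd
         }' "${1:-/etc/crontab}"
}
```

An entry such as `17 * * * * root cd / && run-parts --report /etc/cron.hourly` comes out as `min=17 hour=* cmd=cd / && run-parts --report /etc/cron.hourly`, which can then be checked against the timestamps of the errors in the logs.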

Tomasu 01-28-2008 10:57 AM

I'll have to wait and see. I've made some changes: removed half my RAM (didn't need to, but eh), and swapped out the Radeon 9600XT for an older 9200 that I feel more confident about (the 9600XT had its fan replaced after it failed; it's possible the card was damaged in some way, even though it "seems" to work).

I've been watching some TV on it for a few hours now, with no PCI errors. But that's not saying much; the errors pop up semi-randomly.

Tomasu 01-28-2008 02:24 PM

Still getting these darned errors. It's really annoying.

As for cron jobs: no, the errors don't seem to coincide with cron jobs.

aus9 01-28-2008 04:09 PM

Have you tried KnoppMyth?

http://mysettopbox.tv/knoppmyth.html

http://www.knoppmythwiki.org/index.p...kingComponents


All times are GMT -5.