Kernel oops or nvidia driver problem?
My syslog on kernel 2.6 shows:
Badness in pci_find_subsys at drivers/pci/search.c:167
Call Trace:
[pci_find_subsys+232/240] pci_find_subsys+0xe8/0xf0
[pci_find_device+47/64] pci_find_device+0x2f/0x40
[pci_find_slot+40/80] pci_find_slot+0x28/0x50
[_end+279580788/1070243400] os_pci_init_handle+0x39/0x68 [nvidia]
[_end+278091431/1070243400] _nv001243rm+0x1f/0x24 [nvidia]
[_end+279428957/1070243400] _nv000816rm+0x2f5/0x384 [nvidia]
[_end+278808436/1070243400] _nv003801rm+0xd8/0x100 [nvidia]
[_end+279427735/1070243400] _nv000809rm+0x2f/0x34 [nvidia]
[_end+278812056/1070243400] _nv003816rm+0xf0/0x104 [nvidia]
[_end+278806102/1070243400] _nv003795rm+0x6ea/0xaec [nvidia]
[_end+278192303/1070243400] _nv004046rm+0x3a3/0x3b0 [nvidia]
[_end+279247343/1070243400] _nv001476rm+0x277/0x45c [nvidia]
[_end+278102498/1070243400] _nv000896rm+0x4a/0x64 [nvidia]
[_end+278108668/1070243400] rm_isr_bh+0xc/0x10 [nvidia]
[_end+279570681/1070243400] nv_kern_isr_bh+0xf/0x13 [nvidia]
[tasklet_action+70/112] tasklet_action+0x46/0x70
[do_softirq+144/160] do_softirq+0x90/0xa0
[do_IRQ+253/304] do_IRQ+0xfd/0x130
[common_interrupt+24/32] common_interrupt+0x18/0x20

Badness in pci_find_subsys at drivers/pci/search.c:167
Call Trace:
[pci_find_subsys+232/240] pci_find_subsys+0xe8/0xf0
[pci_find_device+47/64] pci_find_device+0x2f/0x40
[pci_find_slot+40/80] pci_find_slot+0x28/0x50
[_end+279580788/1070243400] os_pci_init_handle+0x39/0x68 [nvidia]
[_end+278091431/1070243400] _nv001243rm+0x1f/0x24 [nvidia]
[_end+278816933/1070243400] _nv003797rm+0xa9/0x128 [nvidia]
[_end+279261929/1070243400] _nv001490rm+0x55/0xe4 [nvidia]
[_end+279429020/1070243400] _nv000816rm+0x334/0x384 [nvidia]
[_end+278808436/1070243400] _nv003801rm+0xd8/0x100 [nvidia]
[_end+279427735/1070243400] _nv000809rm+0x2f/0x34 [nvidia]
[_end+278812056/1070243400] _nv003816rm+0xf0/0x104 [nvidia]
[_end+278806102/1070243400] _nv003795rm+0x6ea/0xaec [nvidia]
[_end+278192303/1070243400] _nv004046rm+0x3a3/0x3b0 [nvidia]
[_end+279247343/1070243400] _nv001476rm+0x277/0x45c [nvidia]
[_end+278102498/1070243400] _nv000896rm+0x4a/0x64 [nvidia]
[_end+278108668/1070243400] rm_isr_bh+0xc/0x10 [nvidia]
[_end+279570681/1070243400] nv_kern_isr_bh+0xf/0x13 [nvidia]
[tasklet_action+70/112] tasklet_action+0x46/0x70
[do_softirq+144/160] do_softirq+0x90/0xa0
[do_IRQ+253/304] do_IRQ+0xfd/0x130
[common_interrupt+24/32] common_interrupt+0x18/0x20

Is this a kernel oops or a problem with the nvidia driver? Please help: how do I solve this?
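One way to get a feel for the "kernel or nvidia" question is to sort the frames of the trace by who owns them. The following is only a rough sketch, assuming the 2.6-style frame format shown above (symbol+0xoffset/0xsize, optionally followed by a [module] tag); the helper name split_frames is made up for illustration. In the trace posted here, every frame between tasklet_action and pci_find_slot carries the [nvidia] tag, so the warning is printed by kernel PCI code that is being called from the nvidia module's interrupt bottom half.

```python
import re

# Matches frames like "os_pci_init_handle+0x39/0x68 [nvidia]" or
# "do_IRQ+0xfd/0x130". The bracketed decimal copies such as
# "[do_IRQ+253/304]" are skipped because they have no "0x" prefix.
FRAME_RE = re.compile(r"(\w+)\+0x[0-9a-f]+/0x[0-9a-f]+(?:\s+\[(\w+)\])?")


def split_frames(trace_text):
    """Return (kernel_frames, module_frames) for one call trace.

    kernel_frames is a list of symbol names with no module tag;
    module_frames is a list of (symbol, module) tuples.
    """
    kernel, module = [], []
    for symbol, mod in FRAME_RE.findall(trace_text):
        if mod:
            module.append((symbol, mod))
        else:
            kernel.append(symbol)
    return kernel, module


# A shortened excerpt of the trace from this thread.
sample = (
    "[pci_find_subsys+232/240] pci_find_subsys+0xe8/0xf0 "
    "[pci_find_slot+40/80] pci_find_slot+0x28/0x50 "
    "[_end+279580788/1070243400] os_pci_init_handle+0x39/0x68 [nvidia] "
    "[_end+278108668/1070243400] rm_isr_bh+0xc/0x10 [nvidia] "
    "[do_IRQ+253/304] do_IRQ+0xfd/0x130"
)

kernel, module = split_frames(sample)
print("kernel frames:", kernel)
print("module frames:", module)
```

This only attributes frames; it does not say whether the message is harmful.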
Not a Success Story - moved to Linux-General
I have the same situation w/ 2.6.2 on an Asus A7N266-VM w/ nforce1. I haven't yet found anything that doesn't work.
Code:
013a5d4>] buffered_rmqueue+0xa4/0x110 |
system info
hello rickenbacherus
What is your system? I use kernel 2.6.3 (and before that 2.6.2, 2.6.1 and 2.6.0) on Debian sid, built with make-kpkg. I also compiled the nvidia driver with make-kpkg, but I had the same error with the nvidia installer and, before that, with the minion.de patch. Current nvidia driver version: 1.0.5336-4.

I tried without the via_agp and agpgart modules: same error.
I tried without hotplug: same error.
I tried disabling ACPI and APM: no effect, I get the same error in my syslog.

By the way, Linux itself keeps working; only XFree86 uses 100% CPU. The mouse works too, but the keyboard is dead.

lspci
00:00.0 Host bridge: VIA Technologies, Inc. VT8363/8365 [KT133/KM133] (rev 03)
00:01.0 PCI bridge: VIA Technologies, Inc. VT8363/8365 [KT133/KM133 AGP]
00:07.0 ISA bridge: VIA Technologies, Inc. VT82C686 [Apollo Super South] (rev 40)
00:07.1 IDE interface: VIA Technologies, Inc. VT82C586A/B/VT82C686/A/B/VT8233/A/C/VT8235 PIPC Bus Master IDE (rev 06)
00:07.2 USB Controller: VIA Technologies, Inc. USB (rev 16)
00:07.3 USB Controller: VIA Technologies, Inc. USB (rev 16)
00:07.4 Bridge: VIA Technologies, Inc. VT82C686 [Apollo Super ACPI] (rev 40)
00:08.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL-8139/8139C/8139C+ (rev 10)
00:0d.0 Multimedia audio controller: Creative Labs SB Live! EMU10k1 (rev 0a)
00:0d.1 Input device controller: Creative Labs SB Live! MIDI/Game Port (rev 0a)
00:13.0 Unknown mass storage controller: Triones Technologies, Inc. HPT366/368/370/370A/372 (rev 03)
01:00.0 VGA compatible controller: nVidia Corporation NV34 [GeForce FX 5200] (rev a1)

cat /proc/interrupts
      CPU0
  0:  64271910  XT-PIC  timer
  1:  5378      XT-PIC  i8042
  2:  0         XT-PIC  cascade
  5:  4926372   XT-PIC  nvidia
  8:  4         XT-PIC  rtc
  9:  60139     XT-PIC  EMU10K1
 10:  187518    XT-PIC  uhci_hcd, uhci_hcd, eth0
 11:  0         XT-PIC  acpi
 12:  127541    XT-PIC  i8042
 14:  59629     XT-PIC  ide0
 15:  1         XT-PIC  ide1
NMI:  0
LOC:  64269617
ERR:  75012
MIS:  0

cat /proc/cpuinfo
processor : 0
vendor_id : AuthenticAMD
cpu family : 6
model : 4
model name : AMD Athlon(tm) processor
stepping : 2
cpu MHz : 1000.214
cache size : 256 KB
fdiv_bug : no
hlt_bug : no
f00f_bug : no
coma_bug : no
fpu : yes
fpu_exception : yes
cpuid level : 1
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 mmx fxsr syscall mmxext 3dnowext 3dnow
bogomips : 1970.17

lsmod
Module Size Used by
snd_pcm_oss 48676 0
snd_mixer_oss 17152 1 snd_pcm_oss
nvidia 2070632 12
snd_emu10k1 66244 5
snd_rawmidi 20576 1 snd_emu10k1
snd_pcm 86308 3 snd_pcm_oss,snd_emu10k1
snd_timer 21764 1 snd_pcm
snd_seq_device 6728 2 snd_emu10k1,snd_rawmidi
snd_ac97_codec 59972 1 snd_emu10k1
snd_page_alloc 9220 2 snd_emu10k1,snd_pcm
snd_util_mem 3264 1 snd_emu10k1
snd_hwdep 7456 1 snd_emu10k1
snd 46692 19 snd_pcm_oss,snd_mixer_oss,snd_emu10k1,snd_rawmidi,snd_pcm,snd_timer,snd_seq_device,snd_ac97_codec,snd_util_mem,snd_hwdep
soundcore 7392 1 snd
uhci_hcd 29584 0
usbcore 91036 3 uhci_hcd
via_agp 5184 1
agpgart 26408 2 via_agp
rtc 10424 0
I've been having a similar problem since I flashed my motherboard BIOS (which fixed my problem with corrupted console graphics).
Since then, programs have been crashing at certain points and, more often, not starting at all until the system is rebooted once or twice. My /var/log/messages has "Badness in pci_find_subsys at drivers/pci/search" too.

Some relevant parts:
kernel: 2.6.4
nVidia module: 5336
motherboard: MSI K7T Pro2 (BIOS 3.6)
processor: Duron 700
graphics: Leadtek GeForce 4 MX 440
stability variant
In /etc/X11/XF86Config-4, in the Device section, I use:
Option "NvAGP" "0" # disable agp
and the error goes away.
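To double-check which AGP mode a config file actually requests, something like the following could be used. It is only a sketch: the function name nvagp_setting and the sample Device section (including the identifier "nv-card") are made up for illustration, and the meanings of values 0 to 3 and the default of 3 are taken from the NVIDIA driver README.

```python
import re

# Meanings of the "NvAGP" option values as listed in the NVIDIA README.
NVAGP_MEANING = {
    0: "AGP disabled",
    1: "NVIDIA's internal AGP support, if possible",
    2: "AGPGART, if possible",
    3: "any AGP support (try AGPGART, then NVIDIA's AGP)",
}

# Only matches lines that start with Option, so a line commented out
# with a leading '#' is ignored.
OPTION_RE = re.compile(
    r'^\s*Option\s+"NvAGP"\s+"(\d+)"', re.IGNORECASE | re.MULTILINE
)


def nvagp_setting(config_text, default=3):
    """Return (value, meaning) for the first NvAGP option in config_text.

    Falls back to the README's documented default when the option is absent.
    """
    match = OPTION_RE.search(config_text)
    value = int(match.group(1)) if match else default
    return value, NVAGP_MEANING.get(value, "unknown value")


# Hypothetical Device section; only the Option line comes from this thread.
sample_config = '''
Section "Device"
    Identifier "nv-card"
    Driver     "nvidia"
    Option     "NvAGP" "0"   # disable agp
EndSection
'''

print(nvagp_setting(sample_config))
```

In practice the text would be read from /etc/X11/XF86Config-4; the sketch only looks at the first matching Option line and does not check which Section it sits in.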
Other (better) solution
I had the same problem with kernel 2.6.5 and XFree86-4.3.0-55 (fedora core1).
Note that the lockup of X only started after upgrading from RedHat 9 with custom compiled KDE to fedora core 1 + KDE rpms of that distribution. Using the same kernel and nvidia driver before, I had no lock up (but still the syslog errors mentioned in this thread). You can find information about this instability on http://www.minion.de/nvidia.html stating: Quote:
saying:

Q: My system runs, but seems unstable. What is wrong?
A: Your stability problems may be AGP-related. See Appendix F for details.

Appendix D: The following driver options are supported by the NVIDIA XFree86 driver:

Option "NvAGP" "integer"
Configure AGP support. Integer argument can be one of:
0 : disable agp
1 : use NVIDIA's internal AGP support, if possible
2 : use AGPGART, if possible
3 : use any agp support (try AGPGART, then NVIDIA's AGP)
Please note that NVIDIA's internal AGP support cannot work if AGPGART is either statically compiled into your kernel or is built as a module, but loaded into your kernel (some distributions load AGPGART into the kernel at boot up). Default: 3 (the default was 1 until after 1.0-1251).

And appendix F:

There are several choices for configuring the NVIDIA kernel module's use of AGP: you can choose to either use NVIDIA's AGP module (NVAGP), or the AGP module that comes with the linux kernel (AGPGART). This is controlled through the "NvAGP" option in your XF86Config file:

Option "NvAgp" "0" ... disables AGP support
Option "NvAgp" "1" ... use NVAGP, if possible
Option "NvAgp" "2" ... use AGPGART, if possible
Option "NvAGP" "3" ... try AGPGART; if that fails, try NVAGP

The default is 3 (the default was 1 until after 1.0-1251).

You should use the AGP module that works best with your AGP chip set. If you are experiencing problems with stability, you may want to start by disabling AGP and observing if that solves the problems. Then you can experiment with either of the other AGP modules.

You can query the current AGP status at any time via the /proc filesystem interface (see APPENDIX O: PROC INTERFACE).

To use the Linux AGPGART module, it will need to be compiled with your kernel, either statically linked in, or built as a module. NVIDIA AGP support cannot be used if AGPGART is loaded in the kernel. It is recommended that you compile AGPGART as a module and make sure that it is not loaded when trying to use NVIDIA AGP.
Please also note that changing AGP drivers generally requires a reboot before the changes actually take effect.

The following AGP chipsets are supported by NVIDIA's AGP; for all other chipsets it is recommended that you use the AGPGART module.

o Intel 440LX
o Intel 440BX
o Intel 440GX
o Intel 815 ("Solano")
o Intel 820 ("Camino")
o Intel 830
o Intel 840 ("Carmel")
o Intel 845 ("Brookdale")
o Intel 845G
o Intel 850 ("Tehama")
o Intel 860 ("Colusa")
o AMD 751 ("Irongate")
o AMD 761 ("IGD4")
o AMD 762 ("IGD4 MP")
o VIA 8371
o VIA 82C694X
o VIA KT133
o VIA KT266
o RCC 6585HE
o Micron SAMDDR ("Samurai")
o Micron SCIDDR ("Scimitar")
o nForce AGP
o nForce 2 AGP
o ALi 1621
o ALi 1631
o ALi 1647
o ALi 1651
o ALi 1671
o SiS 630
o SiS 633
o SiS 635
o SiS 645
o SiS 730
o SiS 733
o SiS 735
o SiS 745

If you are experiencing AGP stability problems, you should be aware of the following:

o Support for the processor's Page Size Extension on Athlon Processors

Some linux kernels have a conflicting cache attribute bug that is exposed by advanced speculative caching in newer AMD Athlon family processors (AMD Athlon XP, AMD Athlon 4, AMD Athlon MP, and Models 6 and above AMD Duron). This kernel bug usually shows up under heavy use of accelerated 3D graphics with an AGP graphics card.

Linux distributions based on kernel 2.4.19 and later *should* incorporate the bug fix. But, older kernels require help from the user in ensuring that a small portion of advanced speculative caching is disabled (normally done through a kernel patch) and a boot option is specified in order to apply the whole fix.

NVIDIA's driver automatically disables the small portion of advanced speculative caching for the affected AMD processors without the need to patch the kernel; it can be used even on kernels which do already incorporate the kernel bug fix. Additionally, for older kernels the user performs the boot option portion of the fix by explicitly disabling 4MB pages.
This can be done from the boot command line by specifying:

    mem=nopentium

Or by adding the following line to /etc/lilo.conf:

    append = "mem=nopentium"

o AGP drive strength BIOS setting (Via based mainboards)

Many Via based mainboards allow adjusting the AGP drive strength in the system BIOS. The setting of this option largely affects system stability, the range between 0xEA and 0xEE seems to work best for NVIDIA hardware. Setting either nibble to 0xF generally results in severe stability problems.

If you decide to experiment with this, you need to be aware of the fact that you are doing so at your own risk and that you may render your system unbootable with improper settings until you reset the setting to a working value (w/ a PCI graphics card or by resetting the BIOS to its default values).

o System BIOS version

Make sure to have the latest system BIOS provided by the board manufacturer.

o AGP Rate

You may want to decrease the AGP rate setting if you are seeing lockups with the value you are currently using. You can do so by extracting the .run file:

    sh NVIDIA-Linux-x86-1.0-5336-pkg1.run --extract-only
    cd NVIDIA-Linux-x86-1.0-5336-pkg1/usr/src/nv/

Then edit os-registry.c, and make the following changes:

    - static int NVreg_ReqAGPRate = 7;
    + static int NVreg_ReqAGPRate = 4; /* force AGP Rate to 4x */
    or
    + static int NVreg_ReqAGPRate = 2; /* force AGP Rate to 2x */
    or
    + static int NVreg_ReqAGPRate = 1; /* force AGP Rate to 1x */

and then remove the two leading underscores:

    - { "__ReqAGPRate", &NVreg_ReqAGPRate },
    + { "ReqAGPRate", &NVreg_ReqAGPRate },

Then recompile and load the new kernel module.

On Athlon motherboards with the VIA KX133 or 694X chip set, such as the ASUS K7V motherboard, NVIDIA drivers default to AGP 2x mode to work around insufficient drive strength on one of the signals. You can force AGP 4x by setting NVreg_EnableVia4x to 1. Note that this may cause the system to become unstable.
On ALi1541 and ALi1647 chipsets, NVIDIA drivers disable AGP to work around timing issues and signal integrity issues. You can force AGP to be enabled on these chipsets by setting NVreg_EnableALiAGP to 1. Note that this may cause the system to become unstable.

=========================================

Therefore, the first thing you should try is to remove AGPGART from your kernel configuration (CONFIG_AGP not defined, or built as a module and not loaded; this may require recompiling your kernel), and then set Option "NvAGP" "1" in your XF86Config instead of "2" or "3".

Regards,
Carlo Wood
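Since NvAGP "1" cannot work while agpgart is loaded, it may be worth confirming that before changing the X config. A minimal sketch, assuming the lsmod column layout shown earlier in this thread; the helper names loaded_modules and nvagp_can_work are made up for illustration:

```python
def loaded_modules(lsmod_text):
    """Return the set of module names found in `lsmod` output."""
    names = set()
    for line in lsmod_text.splitlines():
        fields = line.split()
        # Skip blank lines and the "Module Size Used by" header.
        if not fields or fields[0] == "Module":
            continue
        names.add(fields[0])
    return names


def nvagp_can_work(lsmod_text):
    """True if agpgart is not listed as a loaded module.

    Limitation: an agpgart that is statically compiled into the kernel
    does not show up in lsmod, so True here is not a guarantee.
    """
    return "agpgart" not in loaded_modules(lsmod_text)


# Excerpt of the lsmod output posted earlier in this thread.
sample_lsmod = """Module Size Used by
nvidia 2070632 12
via_agp 5184 1
agpgart 26408 2 via_agp
"""

print(sorted(loaded_modules(sample_lsmod)))
print("NvAGP 1 possible:", nvagp_can_work(sample_lsmod))
```

On a live system the input could come from running lsmod; with the excerpt above, agpgart and via_agp are both loaded, so they would have to be unloaded (or kept from loading at boot) first.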