2in1 problem thread. (nvidia kernel module vs X module, and strange workbug phenom)
hey all...got two delicious problems for you all today.
First one should be a cinch.
1: on boot,
(EE) NVIDIA(0): Failed to initialize the NVIDIA kernel module! Please ensure
(EE) NVIDIA(0): that there is a supported NVIDIA GPU in this system, and
(EE) NVIDIA(0): that the NVIDIA device files have been created properly.
(EE) NVIDIA(0): Please consult the NVIDIA README for details.
(EE) NVIDIA(0): *** Aborting ***
somewhere else, i found some logging that says the kernel module VERSION is different from the X module version, and that that's bad. i'll post that as soon as i find where it is...
anyway, the quick fix is easy enough. all i do is kill gdm, `rmmod nvidia`, and restart gdm and presto, it starts, with the right driver...`nvidia` gets loaded.
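The quick fix above can be written down as a small script. This is only a sketch: the `/etc/init.d/gdm` path and the `DRY_RUN` switch are assumptions, not something from the post, and by default it only prints what it would do.

```shell
#!/bin/sh
# Dry-run sketch of the workaround: stop gdm, unload the nvidia module, start gdm.
# DRY_RUN defaults to 1, so commands are only printed unless DRY_RUN=0 is set.
DRY_RUN=${DRY_RUN:-1}

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

reload_nvidia() {
    run /etc/init.d/gdm stop     # assumed init script path for gdm
    run rmmod nvidia             # unload the module that was loaded at boot
    run /etc/init.d/gdm start    # per the post, X then loads the module itself
}

reload_nvidia
```

Running it with `DRY_RUN=0` as root would actually execute the three steps, so it is worth reading the printed commands first.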
the kernel module installed is the one from the nvidia binary driver. i'm sure i just have to get the right module loaded at boot, i just don't know how to do it.
one more weird thing...although i can run glx apps without X crashing, and beryl works just fine, glxgears only gets maybe 15 fps, and some glx apps don't do so hot (scorched3d crashes frequently, some screensavers really choke up). is that just a linux-drivers-blow thing, or is my card not being fully used?
OKAY, problem 2 is a bit more obtuse, i'll be amazed if anyone can help me with it.
2: sometimes when i'm just happily minding my own beeswax (surfing the web, listening to music...nothing cpu intensive) the CPU starts, out of nowhere, working on something furiously. so furiously, in fact, that X completely stops responding, i can't get to any other terminals, and the one time i did get to a terminal, the computer was so busy it wouldn't even run my "top" command. eventually i just have to cut the juice.
i have a cpu applet in my panel, and it shoots up to 100% and stays there, till it eventually stops showing new data, the computer is so busy. worth noting is that about half of the load is "IOWait" as opposed to actual work. anyway here's a list of apps that i'm usually running when it happens...unfortunately i'm usually running enough apps at one time that i can't really pin it down.
also to note is that
a. it's not the http cache cleaner
b. i'm usually using a lot of my ram, and usually around 50% of my paging file.
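High RAM use plus a half-full swap area, together with heavy IOWait, is at least consistent with the machine swapping hard. A minimal sketch for putting a number on swap use, assuming a Linux `/proc/meminfo` with `SwapTotal` and `SwapFree` lines (the function name `swap_used_pct` is made up for illustration):

```shell
#!/bin/sh
# Print swap usage as a whole-number percentage from meminfo-style text.
# Reads /proc/meminfo by default; a file path can be passed instead.
swap_used_pct() {
    awk '
        /^SwapTotal:/ { total = $2 }
        /^SwapFree:/  { free  = $2 }
        END {
            if (total > 0) printf "%d\n", (total - free) * 100 / total
            else           print "no swap"
        }
    ' "${1:-/proc/meminfo}"
}
```

Calling `swap_used_pct` every few seconds while things are still responsive, or watching the `si` and `so` columns of `vmstat 5`, might show whether swap traffic climbs just before a freeze.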
okay, thanks for any responses to my problems, let me know if there's any more info i can provide, especially with the second one i really don't know what to post.
here's some system info:
amd64, x2 4200 (dual core)
1gig or so ram
ubuntu edgy 6.10, kernel 2.6.17-11-generic
nvidia 7600GT using driver 1.0-9755 (for amd64)
xorg 7.1
gnome 2.16
When you used nvidia-installer, were there any error messages? You can check the installer log at /var/log/nvidia-installer.log. If there are errors, please post them.
hey...swap size is 500meg. nvidia installer worked without any errors, but i found that mismatched version message in the installer log. posted the whole end of the log:
Code:
NVIDIA: left KBUILD.
-> done.
-> Kernel module compilation complete.
-> Kernel messages:
[ 134.678476] NVRM: API mismatch: the client has the version 1.0-9755, but
[ 134.678478] NVRM: this kernel module has the version 1.0-8776. Please
[ 134.678479] NVRM: make sure that this kernel module and all NVIDIA driver
[ 134.678480] NVRM: components have the same version.
[ 138.646395] eth0: no IPv6 routers present
[ 138.764159] NVRM: API mismatch: the client has the version 1.0-9755, but
[ 138.764161] NVRM: this kernel module has the version 1.0-8776. Please
[ 138.764162] NVRM: make sure that this kernel module and all NVIDIA driver
[ 138.764164] NVRM: components have the same version.
[ 142.851234] NVRM: API mismatch: the client has the version 1.0-9755, but
[ 142.851236] NVRM: this kernel module has the version 1.0-8776. Please
[ 142.851238] NVRM: make sure that this kernel module and all NVIDIA driver
[ 142.851239] NVRM: components have the same version.
[ 147.319218] Bluetooth: Core ver 2.8
[ 147.319224] NET: Registered protocol family 31
[ 147.319226] Bluetooth: HCI device and connection manager initialized
[ 147.319246] Bluetooth: HCI socket layer initialized
[ 147.367872] Bluetooth: L2CAP ver 2.8
[ 147.367877] Bluetooth: L2CAP socket layer initialized
[ 147.423885] Bluetooth: RFCOMM socket layer initialized
[ 147.423904] Bluetooth: RFCOMM TTY layer initialized
[ 147.423906] Bluetooth: RFCOMM ver 1.7
[ 176.448744] ACPI: PCI Interrupt 0000:02:00.0[A] -> Link [APC5] -> GSI 16
(level, low) -> IRQ 50
[ 176.449040] PCI: Setting latency timer of device 0000:02:00.0 to 64
[ 176.449320] NVRM: loading NVIDIA UNIX x86_64 Kernel Module 1.0-9755 Mon
Feb 26 23:16:31 PST 2007
-> Installing both new and classic TLS OpenGL libraries.
-> Installing both new and classic TLS 32bit OpenGL libraries.
-> Install NVIDIA's 32-bit compatibility OpenGL libraries? (Answer: Yes)
-> Parsing log file:
-> done.
-> Validating previous installation:
-> done.
-> Uninstalling NVIDIA Accelerated Graphics Driver for Linux-x86_64
(1.0-9755):
-> done.
-> Uninstallation of existing driver: NVIDIA Accelerated Graphics Driver for
Linux-x86_64 (1.0-9755) is complete.
-> Searching for conflicting X files:
-> done.
-> Searching for conflicting OpenGL files:
-> done.
-> Installing 'NVIDIA Accelerated Graphics Driver for Linux-x86_64'
(1.0-9755):
executing: '/sbin/ldconfig'...
executing: '/sbin/depmod -aq'...
-> done.
-> Driver file installation is complete.
-> Running post-install sanity check:
-> done.
-> Post-install sanity check passed.
-> Shared memory test passed.
-> Running runtime sanity check:
-> done.
-> Runtime sanity check passed.
-> Would you like to run the nvidia-xconfig utility to automatically update you
r X configuration file so that the NVIDIA X driver will be used when you res
tart X? Any pre-existing X configuration file will be backed up. (Answer: N
o)
-> Installation of the NVIDIA Accelerated Graphics Driver for Linux-x86_64
(version: 1.0-9755) is now complete. Please update your XF86Config or
xorg.conf file as appropriate; see the file
/usr/share/doc/NVIDIA_GLX-1.0/README.txt for details.
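To pull the two version strings out of kernel messages like the ones in this log, something along these lines might help. The function name `nvrm_versions` is made up for illustration; it only filters text on stdin and touches nothing on the system.

```shell
#!/bin/sh
# Extract the client and kernel-module versions from NVRM "API mismatch" lines.
# Reads kernel log text on stdin and prints each distinct version once.
nvrm_versions() {
    sed -n \
        -e 's/.*NVRM: API mismatch: the client has the version \([0-9][0-9.]*-[0-9][0-9]*\).*/client=\1/p' \
        -e 's/.*NVRM: this kernel module has the version \([0-9][0-9.]*-[0-9][0-9]*\).*/module=\1/p' |
        sort -u
}
```

Typical use would be `dmesg | nvrm_versions`. If the two values differ, the module loaded at that moment did not come from the same driver release as the X driver and GLX libraries.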
sorry, forgot my xorg. here it is. (edit) i should mention that the x log says everything in here is okay, and i only get errors when it loads the nvidia module.(/edit)
Code:
# nvidia-xconfig: X configuration file generated by nvidia-xconfig
# nvidia-xconfig: version 1.0 (buildmeister@builder26) Fri Dec 15 10:40:27 PST 2006
# /etc/X11/xorg.conf (xorg X Window System server configuration file)
#
# This file was generated by dexconf, the Debian X Configuration tool, using
# values from the debconf database.
#
# Edit this file with caution, and see the /etc/X11/xorg.conf manual page.
# (Type "man /etc/X11/xorg.conf" at the shell prompt.)
#
# This file is automatically updated on xserver-xorg package upgrades *only*
# if it has not been modified since the last upgrade of the xserver-xorg
# package.
#
# If you have edited this file but would like it to be automatically updated
# again, run the following command:
# sudo dpkg-reconfigure -phigh xserver-xorg
Section "ServerLayout"
Identifier "Default Layout"
Screen "Default Screen" 0 0
InputDevice "Generic Keyboard"
InputDevice "Configured Mouse"
EndSection
Section "Files"
# path to defoma fonts
FontPath "/usr/share/fonts/X11/misc"
FontPath "/usr/share/fonts/X11/cyrillic"
FontPath "/usr/share/fonts/X11/100dpi/:unscaled"
FontPath "/usr/share/fonts/X11/75dpi/:unscaled"
FontPath "/usr/share/fonts/X11/Type1"
FontPath "/usr/share/fonts/X11/100dpi"
FontPath "/usr/share/fonts/X11/75dpi"
FontPath "/usr/share/fonts/X11/misc"
FontPath "/var/lib/defoma/x-ttcidfont-conf.d/dirs/TrueType"
EndSection
Section "Module"
Load "i2c"
Load "bitmap"
Load "ddc"
Load "extmod"
Load "freetype"
Load "glx"
Load "int10"
Load "type1"
Load "vbe"
EndSection
Section "InputDevice"
Identifier "Generic Keyboard"
Driver "kbd"
Option "CoreKeyboard"
Option "XkbRules" "xorg"
Option "XkbModel" "pc105"
Option "XkbLayout" "us"
Option "XkbOptions" "lv3:ralt_switch"
EndSection
Section "InputDevice"
Identifier "Configured Mouse"
Driver "mouse"
Option "CorePointer"
Option "Device" "/dev/input/mice"
# Option "Protocol" "ExplorerPS/2"
Option "Protocol" "ImPS/2"
Option "ZAxisMapping" "4 5"
Option "Emulate3Buttons" "true"
EndSection
Section "Monitor"
Identifier "Acer AL1916W"
Option "DPMS"
EndSection
Section "Device"
Identifier "NVIDIA GeForce 7600GT"
Driver "nvidia"
# additions for beryl
Option "DisableGLXRootClipping" "True"
Option "XvmcUsesTextures" "true"
Option "AllowGLXWithComposite" "true"
Option "Coolbits" "1"
Option "RenderAccel" "true"
Option "NoLogo" "true"
EndSection
Section "Screen"
Identifier "Default Screen"
Device "NVIDIA GeForce 7600GT"
Monitor "Acer AL1916W"
DefaultDepth 24
# Compiz addition
Option "AddARGBGLXVisuals" "True"
SubSection "Display"
Depth 1
Modes "1280x1024" "1024x768" "832x624" "800x600" "640x480"
EndSubSection
SubSection "Display"
Depth 4
Modes "1280x1024" "1024x768" "832x624" "800x600" "640x480"
EndSubSection
SubSection "Display"
Depth 8
Modes "1280x1024" "1024x768" "832x624" "800x600" "640x480"
EndSubSection
SubSection "Display"
Depth 15
Modes "1280x1024" "1024x768" "832x624" "800x600" "640x480"
EndSubSection
SubSection "Display"
Depth 16
Modes "1280x1024" "1024x768" "832x624" "800x600" "640x480"
EndSubSection
SubSection "Display"
Depth 24
Modes "1440x1024" "1280x1024" "1024x768" "832x624" "800x600" "640x480"
EndSubSection
EndSection
Section "Extensions"
Option "Composite" "Enable"
EndSection
Do you remember how you installed the binary driver?
Did you tell it to update (nvidia-installer --update) or did you download a new one from nvidia?
And do you remember if you installed the 1.0-8776 driver, or if slackware did it for you? (I don't know much about slackware. If they have a different method of installing nvidia proprietary software, it may conflict with nvidia's install method)
It might be best to uninstall the nvidia drivers completely before updating them.
I read somewhere that nvidia drivers need to be patched in order to install them successfully on a 2.6.17 machine. Let's hope it doesn't come to that. The thread I read it on is below. It's for suse, though, not slackware.
1. Try loading the nvidia driver as the last thing you do during startup sequence, in fact, put the insmod in rc.local to see if that helps. It looks like the driver's loading before the card is fully initialized and /dev entry created, evidenced by:
(EE) NVIDIA(0): that the NVIDIA device files have been created properly.
They are talking about the device entries in /dev that the driver uses to communicate with the hardware.
If you load it as the last thing in the startup sequence, this issue should go away.
2. Freezing is a really vague problem. I'd unhook every piece of hardware you own, except what's necessary for the system to run, and see if it still happens. If it does, kill every last service you don't need for the system to run. If it still hangs then, I'd bet on hardware, like a spotty switch, NIC, loose cable, hard drive dying etc. I/O issues are the most common cause of stuff like this. This will happen with a fubar'd CD in the drive, a bad or loose drive cable, intermittent network cable, etc. IOWAIT is "I just sent data to a device and I'm waiting for it to respond with something I understand". This will hang your computer like a former dictator if the driver doesn't get an ACK.
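To see how much of the load really is IOWait, the counters on the first line of `/proc/stat` can be sampled twice and compared. A rough sketch, assuming the usual Linux field order (user, nice, system, idle, iowait, irq, softirq, steal); the function name `iowait_pct` is hypothetical:

```shell
#!/bin/sh
# Compute the iowait share (whole-number percent) between two "cpu" lines
# taken from /proc/stat. Field 6 is iowait; fields 2..9 are summed as the total.
iowait_pct() {
    printf '%s\n%s\n' "$1" "$2" | awk '
        {
            total = 0
            for (i = 2; i <= NF && i <= 9; i++) total += $i
            t[NR] = total
            w[NR] = $6
        }
        END {
            dt = t[2] - t[1]
            if (dt > 0) printf "%d\n", (w[2] - w[1]) * 100 / dt
            else        print 0
        }'
}

# Example use on a live system:
#   a=$(head -n 1 /proc/stat); sleep 5; b=$(head -n 1 /proc/stat)
#   iowait_pct "$a" "$b"
```

Logging this to a file every few seconds would leave a record to look at after the next forced power-off.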
haha okay, i'll cut my comp off...nothing but rice crackers and skim milk. hopefully it is just a bunk cd or something.
as for loading the driver...i thought that the driver doesn't load until X does? you mean load the kernel module last? i think the kernel module is getting loaded on runlevel 2... but my bash is a little rusty. this is S20nvidia_kernel (S20 is ubuntu speak for "enabled" i think)
/etc/rc2.d/S20nvidia_kernel:
Code:
#!/bin/sh
PATH=/sbin:/usr/sbin:/usr/local/sbin:/bin:/usr/bin:/usr/local/bin
# How many cards?
[ -r /etc/default/nvidia-kernel ] && . /etc/default/nvidia-kernel
# test if anything is requested
if [ -z "$NVIDIA_CARDS" ] || [ "$NVIDIA_CARDS" -lt 1 ]; then
# Nothing to do but exit.
exit 0
fi
make_nodes () {
if ! [ -e /dev/nvidiactl ]; then
mknod -m 0660 /dev/nvidiactl c 195 255
chgrp video /dev/nvidiactl
fi
for i in $(seq 0 $(($NVIDIA_CARDS - 1))); do
if ! [ -e /dev/nvidia$i ]; then
mknod -m 0660 /dev/nvidia$i c 195 $i
chgrp video /dev/nvidia$i
fi
done
}
case "$1" in
start|restart|reload|force-reload)
make_nodes
;;
stop)
:
;;
*)
echo "Usage: /etc/init.d/nvidia-kernel {start|stop|restart|reload|force-reload}"
exit 1
;;
esac
exit 0
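Note that the script above only creates device nodes; it never loads a kernel module. A rough way to check whether the nodes it is responsible for exist (the function name and its arguments are hypothetical, and the default of one card is an assumption):

```shell
#!/bin/sh
# Report whether the NVIDIA character device nodes exist under a directory.
# $1 = device directory (default /dev), $2 = number of cards (default 1).
check_nvidia_nodes() {
    devdir=${1:-/dev}
    cards=${2:-1}
    status=0
    nodes="nvidiactl"
    i=0
    while [ "$i" -lt "$cards" ]; do
        nodes="$nodes nvidia$i"
        i=$((i + 1))
    done
    for node in $nodes; do
        if [ -c "$devdir/$node" ]; then
            echo "ok: $devdir/$node"
        else
            echo "missing: $devdir/$node"
            status=1
        fi
    done
    return "$status"
}
```

Running `check_nvidia_nodes` right after boot and again after the rmmod workaround would show whether the device files are really what differs between the two states.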
nvidia isn't in /etc/modules, and i'm not really sure what to do with /etc/modprobe.d/nvidia-kernel-nkc:
Code:
alias char-major-195* nvidia
how about disabling the S20nvidia-kernel, and adding "/etc/init.d/nvidia-kernel start" to my rc.local? would that work? i'm pissing in the dark.
samstar: i'm running ubuntu, not slack...hopefully that's not my prob
For x to start the video driver needs to be loaded. You specify the driver name in your xorg.conf but it needs to be loaded already for x to start.
If the driver's not loaded it will fail. This is why you should set your box to boot up to console and start x manually using startx when using vendor provided video drivers. Less hassle when you need to upgrade the kernel and your video driver dies.
If you boot up to shell you just need to re run the nvidia installation script, as opposed to waiting for things to fail so you can drop to a shell. It's just easier on your constitution.
There's more ways than one to skin this cat. This is just how I've been doing it since nvidia first put out a driver for linux. It's hard for old habits to die.
It's an issue that you deal with so infrequently it really doesn't matter as long as your box works.
ahh I see...I didn't realize it had to be loaded beforehand.
yea, when i used to run deb I'd start X by hand, but since switching to ubuntu i've just left the default boot...anyway, i've done the changes, i'll let you know what happens.
okay sorry it's been a while since i posted, been trying stuff out.
new news:
booting X last/booting X by hand doesn't make a difference
I noticed that the module that gets loaded during boot is actually smaller (nearly half the size) than the one that X loads after I remove the boot one. where can I change the module that loads at boot? for that matter, what do I even change it to? both modules have the same name. are they in fact two different modules, or is there something fishier afoot?
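One way to find out whether there really are two files with the same module name is to list every candidate under the kernel's module tree with its size. A minimal sketch, assuming the modules live under `/lib/modules/$(uname -r)` and are named `nvidia*.ko*` (the function name is made up):

```shell
#!/bin/sh
# List files that look like nvidia kernel modules, with their size in bytes.
# $1 = directory to search (default: the running kernel's module tree).
list_nvidia_modules() {
    moddir=${1:-/lib/modules/$(uname -r)}
    find "$moddir" -type f -name 'nvidia*.ko*' | sort | while IFS= read -r f; do
        size=$(wc -c < "$f" | tr -d ' ')
        printf '%s\t%s\n' "$size" "$f"
    done
}
```

If `list_nvidia_modules` prints more than one path, one of them probably came from a distro package and the other from the NVIDIA installer. With the module loaded, `cat /proc/driver/nvidia/version` should report which version is actually in use.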
Did you solve this issue? I'm having this same problem now too. I have to manually rmmod nvidia, modprobe nvidia, then restart the GDM, before I can get into X.