LinuxQuestions.org


Gato Azul 03-03-2005 11:36 PM

Nvidia drivers causing X lockup?
 
Normally I'd be the one offering crazy ideas {ones that probably wouldn't work :rolleyes:}, but this time I was fresh out of them. Here was the deal: every so often my machine would seemingly lock up. The keyboard stopped responding (the num/caps/scroll lock keys did nothing), ctrl + alt + backspace didn't kill X, and the mouse still moved, but the pointer never changed shape (i.e. if the text selection cursor was showing, it stayed that way even when moved off of text). At first I thought it could be a kernel panic, but the lock lights on my keyboard weren't flashing, so that couldn't be it.

I finally decided to use my laptop to see if I could ssh into my box. It worked fine and everything was running. I ran lsmod to see if maybe the keyboard driver had died, but all was well. I then ran top and saw that X was taking up 84.3% of the CPU. I made sure to note the time that the lockup occurred (10:18 pm) and tried to capture as much info as I could before killing X. Then I killed X, went back to my desktop, and everything was fine and dandy: X had respawned and I could log back in. Since this wasn't the first time it had happened, I decided I'd had enough of this. After all, it's GNU/Linux (and Slackware in particular). Why should I have to put up with lockups?

So I did some searching, trying to figure out what could cause X to lock up like that. I'm running Slackware 10.0 (updated with security updates, nothing more) and XFCE 4.2 with the latest NVIDIA driver (6629) on a GeForce4 MX420, and at the time I was only running Firefox 1.0.1. After Googling and searching the LinuxQuestions.org forums, I came across two interesting threads. One person seemed to have a similar problem with Slax, although my screen never went blank: it just stayed on what it was showing at the time, with no response to anything I did other than mouse movement. The other thread had more useful information, as this person had problems similar to mine but thought they were related to the NVIDIA driver (6629, coincidentally). I did what the last post there said and checked /var/log/messages for NVRM, but found nothing. Just for fun I checked /var/log/syslog for NVRM as well and, lo and behold, here's what came up:

Mar 3 22:18:22 DarkWaterLabs kernel: NVRM: Xid: 6, PE0000 047c 00e6e7ea 00e6e7e8 00000000 00e6e7ea

Right here on March 3rd, at 22:18, or 10:18 (right when my machine locked up), logged to the syslog was the evil NVRM message. But what did this mean? Some more Googling. I came across this post with someone who had the same problem as me. Evidently that log message means the NVIDIA driver caused OpenGL to crash and wiped out X with it (is that right?).

So, now I'm here looking at this patch and wondering before I completely destroy my system (I hate patching things) if anyone else has had similar problems to mine and/or tried the nvidia driver patch? Am I completely out of my mind about what's causing this? If this all _is_ on target, hopefully this post will help out some others who have experienced this superly frustrating problem. Also, if anyone else has any suggestions/alternatives/fixes/advice/anything, I'm all ears! Thanks so much :)
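
For anyone wanting to repeat the log check described above, here is a minimal sketch. It assumes the two log paths named in this post (other distributions may log kernel messages elsewhere), and `find_xid` is just a hypothetical helper name.

```shell
#!/bin/sh
# find_xid: hypothetical helper that prints every NVRM Xid line
# found in the log files given as arguments.
find_xid() {
    # -h drops the file-name prefix; errors for missing or unreadable files are hidden.
    grep -h 'NVRM: Xid' "$@" 2>/dev/null
}

# Check the two files mentioned in the post; reading them usually requires root.
find_xid /var/log/messages /var/log/syslog || echo "No NVRM Xid messages found."
```

The timestamps on any matching lines can then be compared with the time a lockup was noticed, as was done with the 22:18 entry.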

BrianW 03-04-2005 03:56 PM

A while back I had problems with X either locking up for a few seconds or freezing completely, requiring a reboot. When X locked up for a few seconds to a minute while I was playing a game, I noticed horrible graphics problems after it unfroze. I would try glxgears and it would have no fps drop, but it would look very horrible, leading me to think it was a glx or driver related crash.
Upon further inspection, I opened my window, since it was about 0-20°F outside when this was happening, and the crashes disappeared. I yanked out my graphics card, pulled the fan off, and found it was no longer working. I ordered a new fan and everything worked flawlessly again. Not sure if this is your problem, but I thought I'd toss it out anywho just in case.

Good luck

EDIT: While I remember, I believe I used to get some errors in dmesg when my graphics card overheated on me. I searched for a few days before I found the real problem, and I tried all sorts of patches for my driver as well as rolling back to previous drivers. That patch looks like something I used on my system during that period (this was actually about a month ago), and I didn't notice any issues while I had it applied for about a week.

funkateer 03-05-2005 08:54 AM

I had exactly the same problem, then applied the patch and it seems to work ok. I haven't had any NVRM crashes anymore.

Seppel 03-05-2005 04:25 PM

Hi Gato Azul,

Same problem here. I just installed the patch and hopefully it will work. If you need it, here's a small howto, starting from the GUI:

1. Make sure you have the Nvidia driver (NVIDIA-Linux-x86-1.0-6629-pkg1.run) and the patch saved on your hd.
2. Open a virtual console, type su, and enter your root password
3. type "init 3"; this will shut down X and get you into text mode.
4. Log in as root
5. Go to the directory where the Nvidia installer is
6. type "./NVIDIA-Linux-x86-1.0-6629-pkg1.run --extract-only"
7. type "cd NVIDIA-Linux-x86-1.0-6629-pkg1"
8. copy the patch file to the current location: "cp /path/to/NVIDIA_kernel-1.0-6629-1161283.diff.txt ."
9. type "patch -p0 <NVIDIA_kernel-1.0-6629-1161283.diff.txt"; this will patch the Nvidia driver
10. type "./nvidia-installer" and follow the instructions
11. type startx and check whether the new kernel module is working; if so:
12. Log out of X
13. type "init 4"

If the new module or anything else does not work, tell us :-)

Greetings and best wishes for success,

Seppel
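
For those nervous about patching, steps 8 and 9 above can be wrapped so that the patch is only applied if a trial run succeeds first. This is a sketch and not part of the howto itself: `apply_patch_checked` is a hypothetical helper name, and it relies on the `--dry-run` option of GNU patch.

```shell
#!/bin/sh
# apply_patch_checked DIR PATCHFILE
# Hypothetical helper: applies PATCHFILE inside DIR only if a dry run succeeds.
# PATCHFILE must be an absolute path, because the function changes directory first.
apply_patch_checked() {
    (
        cd "$1" || exit 1
        # --dry-run (GNU patch) checks that every hunk applies without changing any file.
        patch -p0 --dry-run < "$2" >/dev/null || exit 1
        # The trial run succeeded, so apply the patch for real.
        patch -p0 < "$2"
    )
}

# Example call, using the names from the howto above:
# apply_patch_checked "$PWD/NVIDIA-Linux-x86-1.0-6629-pkg1" /path/to/NVIDIA_kernel-1.0-6629-1161283.diff.txt
```

If the dry run fails, nothing in the extracted driver directory has been touched, so it is safe to stop there and ask.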

ciotog 04-01-2005 02:56 AM

I've found that using the nvidia AGP driver instead of the kernel AGP driver was the only solution that worked for me.

Simply add or change the following line in the "Device" section of xorg.conf:

Code:

        Option          "NvAgp" "1"
Haven't had a lockup since, and performance is excellent.
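
For context, here is roughly where that line sits in a complete "Device" section. This is only a sketch: the Identifier value is a placeholder, so keep whatever your own xorg.conf already uses. As far as I know, the nvidia AGP driver can only take over if the kernel's agpgart is not already loaded, so that may be worth checking too.

```
Section "Device"
    # Placeholder identifier; keep the one already in your file.
    Identifier  "Videocard0"
    Driver      "nvidia"
    # 1 = use the AGP support built into the nvidia driver
    Option      "NvAgp" "1"
EndSection
```

X has to be restarted before the change takes effect.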

netfunk 04-12-2005 03:48 PM

Ditto
 
I've been having the exact same problem. Screen contents remain displayed on lockup, and the mouse sometimes still moves, but no response, no ctrl-alt-backspace or virtual terminals for me, and the NVRM: Xid message shows up in my /var/log/syslog.

I'm running Debian testing (sarge) as of now, 12 April 2005, with driver 7176 (or whatever the latest 1.0 patchlevel is right now), a GeForce4 MX400 (or 420, idk), and kernel 2.6.11.6. Hopefully the patch applies against -7176 (with or without tweaking); I'll see if that changes anything. It's really annoying because these lockups occur about once a day.

If that doesn't work I could try adding Option "NvAgp" "1" to my XF86Config-4 file.

Y0jiMb0 08-10-2005 10:20 AM

It works!
 
Quote:

I've found that using the nvidia AGP driver instead of the kernel AGP driver was the only solution that worked for me.

Simply add or change the following line in the "Device" section of xorg.conf:

Code:

        Option          "NvAgp" "1"
Haven't had a lockup since, and performance is excellent.

Thanks ciotog!!!
It works!

Best regards

LiquidSlumber 02-15-2006 03:11 PM

I was having a similar problem with my computer. The GUI would lock up completely and I couldn't even move the mouse, but I could still ssh in to my computer. I added the Option "NvAgp" "1" thing to the device section of my xorg.conf, and I think, just maybe, that it fixed my computer. I'll have to test it more, but now I can do things that would almost always crash my computer, like playing a music video via mythtv!

CJ Chitwood 09-18-2007 09:11 AM

It works!
 
Quote:

Originally Posted by ciotog (Post 1566355)
I've found that using the nvidia AGP driver instead of the kernel AGP driver was the only solution that worked for me.

Simply add or change the following line in the "Device" section of xorg.conf:

Code:

        Option          "NvAgp" "1"
Haven't had a lockup since, and performance is excellent.

THANK YOU SO MUCH!
DANK U ZEER!
MERCI BEAUCOUP!
VIELEN DANK!
GRAZIE MOLTO!
本当にありがとう! (gotta love free translators! Hope they're accurate!)
대단히 감사합니다!
OBRIGADO MUITO MUITO!
Большое спасибо!
¡MUCHOS GRACIAS!


I don't know how I missed so simple of a line... I have been putting it off, putting it off, putting it off, then finally decided to start searching for the answer. I've gone through about 4 different versions of their drivers, the latest incarnation being .7185 I think, all with the same problem... I would start loading X, and at my logon screen, it would get about 4/5 of the image from top to bottom, and about 2/5 of the next line drawn, then freeze. I'd have mouse, but no keyboard. The only thing other than the reset button that worked for me was the "Magic SysRq Key", which is hardly any better.

Sir, I thank you profusely for this... I've been doing without for so long because of this problem, and trying different things.... the driver worked before, with the only changes being that I reinstalled my entire O/S (Debian -- Etch, I think... or Sarge? 3.1 with some 4.0 elements).



Thank you again!


FWIW: If future readers are unsure, put the line mentioned in the quote above right after the line that says whether to use "nv" or "nvidia" as the driver.

Cheers!

simcox1 09-18-2007 10:03 AM

Option "NvAgp" "1"

That means use the AGP support that comes with the nvidia driver, as opposed to the AGP driver in the kernel. A value of "2" means use the one in the kernel if possible, and the default, "3", tries the kernel one first and then falls back to the nvidia one. So if you were having problems, they were likely due to the AGP driver built in to the kernel. Now that you're using the driver's own AGP support, your problem is solved.
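
To see which AGP driver actually ended up in use, the nvidia kernel module of this era reports it under /proc. A small sketch, assuming the status file lives at /proc/driver/nvidia/agp/status (the path may differ or be absent with other driver versions); `show_agp_status` is a hypothetical helper name.

```shell
#!/bin/sh
# show_agp_status [FILE]
# Hypothetical helper: prints the AGP status reported by the nvidia kernel module.
# FILE defaults to the path assumed above.
show_agp_status() {
    status_file=${1:-/proc/driver/nvidia/agp/status}
    if [ -r "$status_file" ]; then
        cat "$status_file"
    else
        echo "AGP status not available: $status_file is not readable"
    fi
}

show_agp_status
```

Running it once before and once after changing the NvAgp option (and restarting X) should show whether the switch really happened.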

H_TeXMeX_H 09-18-2007 12:27 PM

Quote:

Originally Posted by BrianW (Post 1509949)
A while back I had problems with X either locking up for a few seconds or freezing completely, requiring a reboot. When X locked up for a few seconds to a minute while I was playing a game, I noticed horrible graphics problems after it unfroze. I would try glxgears and it would have no fps drop, but it would look very horrible, leading me to think it was a glx or driver related crash.
Upon further inspection, I opened my window, since it was about 0-20°F outside when this was happening, and the crashes disappeared. I yanked out my graphics card, pulled the fan off, and found it was no longer working. I ordered a new fan and everything worked flawlessly again. Not sure if this is your problem, but I thought I'd toss it out anywho just in case.

Good luck

EDIT: While I remember, I believe I used to get some errors in dmesg when my graphics card overheated on me. I searched for a few days before I found the real problem, and I tried all sorts of patches for my driver as well as rolling back to previous drivers. That patch looks like something I used on my system during that period (this was actually about a month ago), and I didn't notice any issues while I had it applied for about a week.

I had a very similar experience with my nvidia card. The X server kept crashing during games such as Wolfenstein, and I didn't know why. I tried about as many things as you did. Then I went in there and found that the fan was almost completely clogged with dust ... :(. After cleaning it out, it never happened again :). Remember to keep your computer clean, and use cans of compressed air as much as possible to avoid damaging hardware on the motherboard. Obviously sometimes the dust is too much for compressed air to handle, in which case be VERY careful.

H_TeXMeX_H 09-18-2007 03:51 PM

Quote:

Originally Posted by ciotog (Post 1566355)
I've found that using the nvidia AGP driver instead of the kernel AGP driver was the only solution that worked for me.

Simply add or change the following line in the "Device" section of xorg.conf:

Code:

        Option          "NvAgp" "1"
Haven't had a lockup since, and performance is excellent.

One note on performance: there seems to be a reason why the built-in AGP driver (agpgart) is the default, and that may be that performance is significantly better with it than with nvagp. On my laptop:

glxgears

without nvagp:
Quote:

6345 frames in 5.0 seconds = 1268.962 FPS
6348 frames in 5.0 seconds = 1268.932 FPS
6349 frames in 5.0 seconds = 1269.690 FPS
6355 frames in 5.0 seconds = 1270.829 FPS
6351 frames in 5.0 seconds = 1270.035 FPS
6352 frames in 5.0 seconds = 1270.273 FPS
6352 frames in 5.0 seconds = 1270.270 FPS
6350 frames in 5.0 seconds = 1269.893 FPS
6355 frames in 5.0 seconds = 1270.820 FPS
6356 frames in 5.0 seconds = 1271.092 FPS
6362 frames in 5.0 seconds = 1272.263 FPS
6348 frames in 5.0 seconds = 1269.571 FPS
6351 frames in 5.0 seconds = 1270.076 FPS
6350 frames in 5.0 seconds = 1269.917 FPS
6354 frames in 5.0 seconds = 1270.710 FPS
6355 frames in 5.0 seconds = 1270.816 FPS
6354 frames in 5.0 seconds = 1270.728 FPS
6340 frames in 5.0 seconds = 1267.987 FPS
6354 frames in 5.0 seconds = 1270.754 FPS

with nvagp:
Quote:

6001 frames in 5.0 seconds = 1200.062 FPS
6002 frames in 5.0 seconds = 1200.230 FPS
6001 frames in 5.0 seconds = 1200.076 FPS
6005 frames in 5.0 seconds = 1200.861 FPS
6011 frames in 5.0 seconds = 1202.017 FPS
6021 frames in 5.0 seconds = 1204.198 FPS
6020 frames in 5.0 seconds = 1203.894 FPS
6007 frames in 5.0 seconds = 1201.245 FPS
6016 frames in 5.0 seconds = 1203.148 FPS
6016 frames in 5.0 seconds = 1203.052 FPS
6007 frames in 5.0 seconds = 1201.303 FPS
6015 frames in 5.0 seconds = 1202.887 FPS
6016 frames in 5.0 seconds = 1203.113 FPS
6015 frames in 5.0 seconds = 1202.975 FPS

However, being lockup-free is more important than getting more performance out of the card. (I've never had this problem with my laptop, so I use agpgart.)
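
To compare the two runs without eyeballing the columns, the per-interval figures can be averaged. A minimal sketch, assuming the glxgears output has been saved to text files (the file names below are placeholders); `mean_fps` is a helper name made up for this example. For the numbers above, the averages come out at roughly 1270 FPS without nvagp and roughly 1202 FPS with it, a gap of a little over 5%.

```shell
#!/bin/sh
# mean_fps: hypothetical helper that reads glxgears output on stdin
# and prints the average of the FPS figures, to three decimals.
mean_fps() {
    # Only lines whose last field is "FPS" are counted; the figure is the field before it.
    awk '$NF == "FPS" { sum += $(NF - 1); n++ }
         END { if (n > 0) printf "%.3f\n", sum / n }'
}

# Example calls (file names are placeholders):
# mean_fps < without_nvagp.txt
# mean_fps < with_nvagp.txt
```

Keep in mind that glxgears is a very rough benchmark, so small differences between runs may not mean much.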

arubin 09-19-2007 02:37 AM

I had a lockup problem when I installed Slackware 12 with the latest Nvidia driver. When I downgraded to an earlier version of the driver, the lockups stopped.

ciotog 09-19-2007 06:32 AM

If I recall correctly, I did have the very occasional lockup after switching over to NvAGP, but at a significantly lower frequency. Also, the fan started making an awful racket, even after being cleaned and oiled (I tried a single drop on the shaft, which helped temporarily). The heatsink on the card was very cheap as well, so ultimately I replaced the card with one that had a proper heatsink.

Actually, the new card's fan failed as well, and currently I'm running it without the fan. The heatsink itself seems to dissipate enough heat that it hasn't overheated all summer.

edit: I should mention that with the new card I'm using the default AGP driver.

simcox1 09-19-2007 07:24 AM

I've always used NvAGP because AGPGART had problems when using a 2.4 kernel. It seems to work fine now with a 2.6 kernel. The FPS readings from glxgears are 1132 with NvAGP and 1144 with AGPGART, so AGPGART is slightly better.


All times are GMT -5.