I am experiencing stability issues that seem to be related to my nvidia drivers. Is there a way to force a load on the CPU and the GPU independently of each other, to help determine whether the problem is truly as random as it seems? When I crash there are no logs generated and no recovery is possible; I just have to do a hard reset (not happy about that). Nothing seems to be grabbing resources, except possibly at the time of the crash (which is why I want to force a load). I've run memtest86 for 4 hours without a single error, blown out the case, and reseated everything. I'm confident in the general area that I'm looking at; I'm just trying to narrow it down.
A question that quickly comes to mind is which version of Xorg are you running? The newer versions of Xorg do not set the Ctrl+Alt+Bkspace key sequence by default anymore. However, you can re-enable it. It just depends on which way the X server is set to run: with HAL or without. If the X server doesn't need HAL to run, then you can add the following line to your xorg.conf inside the keyboard section of "InputDevice":
Option "XkbOptions" "terminate:ctrl_alt_bksp"
... and in the "ServerFlags" section, find the DontZap option, uncomment it and add "False" to the end of it, like so:
Option "DontZap" "False"
Of course, if your X server *does* require HAL to run, then you need to put a different line in /etc/hal/fdi/policy/10-keymap.fdi:
CTRL+ALT+Backspace restarts X and takes me to the console under normal circumstances. I can't find any reference to the crashes in any of the logs (that I can make out, anyway). When the system freezes it's like a kernel panic: the keyboard LEDs blink, the screen freezes, and the audio skips in place.
Hum. I (obviously) don't know how every distribution works, but - in my limited experience - blinking keyboard lights indicate a BIOS error condition, not a Linux/GNU problem. If I were you, I'd turn on your BIOS logging and check that log (assuming, of course, that you're using a BIOS that supports logging). If your BIOS doesn't support logging, check your BIOS manual for methods for debugging BIOS issues.
Again, in my limited experience, the most common hardware problems causing the BIOS to "barf" are loose hard drive connectors or failing memory chips. If your system is one that uses cabled hard drives (most desktop systems do), try unplugging and replugging the cables (at each end) to "clean" the contacts from any minor corrosion and reset the connectors. If that fails to help, try running the memtest program from the boot prompt. (If it's not on your boot menu, download a rescue CD and boot from it. SystemRescueCD is a good choice.)
Note: A full memtest of a 4GB memory block can take as much as eight hours - or more - to run.
It might also be useful to pull any "cards" in your computer and reseat them at the same time as you check the drive cables.
Of course, if your system is a laptop, then the memtest is the easiest thing to try. The other connections seldom go bad on a laptop before it breaks for some other reason. (I believe that few laptops are designed to survive for more than a year or two of moderate use.)
Last edited by PTrenholme; 12-17-2009 at 09:19 PM.
All of those things were suspect and were done (pulling cards and cables, even removing additional cards and hard drives that aren't in use at present, in case it was a power issue). I've gone far enough to know that I can run a 188.8.131.52 kernel with the NVIDIA proprietary drivers without an issue, but starting at 2.6.31, with any version of the NVIDIA drivers (including the just-released 190.53), I die in under 5 hours. I will have to look closer at the kernel options starting at 2.6.31. I assume whatever it is has to be a default answer in that kernel and higher, because my .config file from 184.108.40.206 plus only default answers to make oldconfig doesn't work.
I haven't tried bios logging, did try changing bios settings. I'll have to look into bios logging to find out if it's an option.
I just looked back at your posts in this thread (and noticed that you mentioned doing the obvious things in your first post - sorry), but I don't see any mention of which distribution you're using when the problem occurs. (You list three distributions in your "member info" section, but it's not clear that you're using the 2.6.31 kernel with all of them.)
I had a lot of problems with the nVidia chipset drivers on this laptop when Fedora switched to loading the nouveau driver with the kernel, since that driver does not (correctly) support the MCP67 chips. (Which worries me, since Linus has accepted nouveau for inclusion in the 2.6.33 kernel. But that's a different issue.) Anyhow, you may want to consider reviewing your initial RAM disk (initrd) image file to see if another driver for your card is being loaded with the kernel. (This is not likely, since the X-server would, naturally, fail to load the nvidia driver if another driver was already loaded, and would then fail to start.)
For what it's worth, I'm now using the nVidia 190.42 driver with the Fedora 220.127.116.11 kernel (and MCP67 chipset) with no problems. But, of course, you're using a different chipset, so that's probably irrelevant.
Anyway, to answer the question you asked when you started this thread, one simple way to load up your system is to start several instances of the glxgears app. (On this laptop, three instances running at the same time bumped CPU usage to 99% on both processors.)
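Since glxgears keeps both the GPU and the CPU busy, a CPU-only load can help separate the two. What follows is only a rough sketch, not a tested stress tool: it starts one busy-loop Python process per core for a fixed time. The name burn_cpu and its parameters are made up for this illustration.

```python
import os
import subprocess
import sys

# Child program: spin in a tight arithmetic loop until the deadline passes.
# The duration in seconds arrives as the first command-line argument.
_BURN_SNIPPET = (
    "import sys, time\n"
    "deadline = time.monotonic() + float(sys.argv[1])\n"
    "x = 0\n"
    "while time.monotonic() < deadline:\n"
    "    x = (x * 31 + 7) % 1000003\n"
)


def burn_cpu(seconds, workers=None):
    """Run `workers` busy-loop processes for `seconds`; return their exit codes.

    burn_cpu is a hypothetical helper name. By default it starts one
    worker per CPU reported by the operating system.
    """
    if workers is None:
        workers = os.cpu_count() or 1
    procs = [
        subprocess.Popen([sys.executable, "-c", _BURN_SNIPPET, str(seconds)])
        for _ in range(workers)
    ]
    # Wait for every worker to reach its deadline and exit.
    return [p.wait() for p in procs]
```

Calling burn_cpu(600) should keep every core busy for about ten minutes without touching the GPU. If the machine only freezes under glxgears and never under this, that points toward the graphics side.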
Last edited by PTrenholme; 12-18-2009 at 08:41 PM.
Reason: Typo fix
Thank you for that tip. I'm currently running slackware64 current. I got the same issue with slackware13 (32-bit) stable when I used the 18.104.22.168 kernel as I now get with 2.6.32 or 22.214.171.124 on current_64. I've been through probably 15 rolls in the last 2 weeks or so and it's just not happening. I thought that surely it was me learning to roll kernels that was causing the problem, so I rolled versions starting at 126.96.36.199 (shipped with slack13 and slack64) and I'm good up to 188.8.131.52, but when I go to 2.6.31.x the issue kicks in. I'm downloading kernels from kernel.org, and nouveau isn't supposed to be included until 2.6.33 (I was HOPING that would be a good open source alternative, but after your post I'm not so sure). In a few rolls I've removed every video driver except VESA to make sure there was no weird loading of conflicting modules. My most recent troubleshooting path has been: start with the stock .config file -> new kernel -> make oldconfig -> default answers -> make -> test till break. I now believe it to be either a bug (I don't think I know enough about the problem to report it) or, more likely, some nuance in the default options beginning in 2.6.31. I don't run any exotic hardware: it's a 3-year-old Gigabyte motherboard with an Intel chipset and an E6600 Core 2 Duo with 6GB RAM.
It's daunting and time consuming to wade through even just the NEW options in a new version config, but I have tried. I have come to appreciate just how much work is going on in the kernel development community that I never would have understood just sticking with a stock installation. I'm learning A LOT through this, I just really would like to learn THE ANSWER to my problem at some point! LOL
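One way to cut down the wading is to compare the working .config against the one make oldconfig produces, and look only at what was added, dropped, or changed. The sketch below is a minimal illustration that assumes the usual .config line shapes ("CONFIG_X=value" and "# CONFIG_X is not set"). The function names are invented for this example.

```python
def parse_kconfig(text):
    """Map each CONFIG_ symbol to its value.

    Options written as '# CONFIG_X is not set' are recorded as 'n'.
    Other comments and blank lines are ignored.
    """
    suffix = " is not set"
    opts = {}
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("CONFIG_") and "=" in line:
            name, value = line.split("=", 1)
            opts[name] = value
        elif line.startswith("# CONFIG_") and line.endswith(suffix):
            # Drop the leading '# ' and the trailing ' is not set'.
            opts[line[2:-len(suffix)]] = "n"
    return opts


def diff_kconfig(old_text, new_text):
    """Return (added, removed, changed) options between two .config texts."""
    old, new = parse_kconfig(old_text), parse_kconfig(new_text)
    added = {k: new[k] for k in new if k not in old}
    removed = {k: old[k] for k in old if k not in new}
    changed = {k: (old[k], new[k]) for k in old if k in new and old[k] != new[k]}
    return added, removed, changed
```

Reading the two files with open(path).read() and printing the "added" dictionary gives the list of options that got a default answer in the newer kernel, which is a much shorter list to go through by hand.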
Is your nVidia hardware a separate card or one "built in" to your mother board?
Have you looked at the changes made in the kernel between the 184.108.40.206 that works for you and the next kernel that fails? (Was that 220.127.116.11 - if that even exists - or did you jump from 2.6.30 to 2.6.31?)
Have you asked the nVidia Linux support people for help? (Last time I did that I received an automated response suggesting I try the procedure that I told them had failed in my message, so contacting them may not be as responsive as you might wish it would be. I never bothered to get back to them after receiving the "response" their system generated.) But, hey, they might know what the problem is, and how to fix it.
Quote: "in my limited experience - blinking keyboard lights indicate a BIOS error condition, not a Linux/GNU problem"
I'm currently testing 18.104.22.168 and it's gone longer than any other incarnation so far that I know of. I did a little digging around for kernel changes that seemed likely to matter, but didn't find much (I prefer hammers to books LOL). I found some error reports in outside forums that described kernel panics with 22.214.171.124 (the first kernel version above 126.96.36.199 that I'd tried, which didn't work), but they didn't seem to be exactly the same problem. Then I saw their motherboards were Gigabyte like mine, but for AMD instead of Intel. Then I remembered your post. So when the machine froze with yet another kernel, I went through the BIOS settings again and found the setting for "pci express frequency" (I'm thinking that was what it was called; I'll edit later if that proves to be incorrect), which was set to "auto". I set it manually to 100 (the BIOS menu said there were no guarantees above 100) and rebooted into 188.8.131.52.
I then used your suggestion and got 6 instances of glxgears going, then opened up firefox, konqueror, and google chrome, pointed them all at youtube and started videos playing while watching top, and then threw amarok on top of that to get CPU% up to a consistent 93. I kept that going for 10 minutes or so with no problems except the expected stuttering video. I've been up for going on 3 hours so far.
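For anyone wanting to repeat the glxgears part of that test without opening six terminals, something like the following might do. It is a sketch only: run_instances is a made-up helper name, it assumes glxgears is on the PATH, and it does not start the browsers or amarok.

```python
import subprocess
import time


def run_instances(cmd, count, seconds):
    """Start `count` copies of `cmd`, let them run for `seconds`, then stop them.

    Returns the number of processes that were started. run_instances is a
    hypothetical helper, not part of glxgears or any library.
    """
    procs = [
        subprocess.Popen(cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
        for _ in range(count)
    ]
    try:
        time.sleep(seconds)
    finally:
        # Ask any process that is still running to exit, then reap them all.
        for p in procs:
            if p.poll() is None:
                p.terminate()
        for p in procs:
            p.wait()
    return len(procs)
```

With that defined, run_instances(["glxgears"], 6, 600) would mirror the six instances for ten minutes described above. Watching top in another terminal while it runs shows how much CPU the instances are taking.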
If this is the answer then the only thing left to do is figure out what changed from 184.108.40.206 to 220.127.116.11 that caused a bios setting that has worked unnoticed by me for years to go silly.
Last edited by damgar; 12-18-2009 at 10:48 PM.
Reason: Poor typing
I'm going to call this solved. glxgears is a great tool I wasn't aware of. Thanks for that.
Also, thanks for mentioning the bios as ultimately you were dead on. I feel like an idiot for not finding this earlier, but I guess with all the things I learned that DIDN'T fix the problem I am now a SMARTER IDIOT than when I started! LOL