Machine freezes, no errors
I have been having a problem with my main box that I would like another set of eyes to look at. The box is a 64 bit AMD, running Debian testing with a 64 bit, 2.6.12-4 kernel. I have x windows/KDE installed. I usually leave the box on all the time.
The box has been freezing up. It sometimes happens while the screensaver is running overnight or while I am at work, and it sometimes happens while I am using it. When I am using it, it always seems to happen while I am viewing webpages. All at once the mouse freezes, and then the machine is locked. The keyboard also freezes (hitting caps lock or num lock doesn't change the indicator light on the keyboard, and ctrl-alt-F2 through F8 has no effect), so it isn't just X windows crashing. I have tried to ssh into the box from my laptop when this happens, and can't. It also won't ping, so it is for real dead. The last image stays locked on the monitor, and all the power indicators on the tower itself stay on, but it is frozen/dead.
This seems to happen most frequently when I view .mpg or .wmv files through totem. Often I can watch the video, but the minute it stops, everything freezes.
The problem is not just that it crashes my machine, but that there is no log of it. Dmesg is useless as it gets rewritten at reboot, and /var/log/messages contains nothing. I don't see errors anywhere, but something is causing the crash. I'd appreciate it if people could suggest logs to check, or possible solutions to this major problem!
Peace, JimBass |
try increasing
/proc/sys/vm/min_free_kbytes by like a factor of 4 or something (worth a try) |
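For anyone following along, the suggestion above could be sketched roughly like this. It is only a sketch: the function names scale_kbytes and raise_min_free are made up for illustration, the factor of 4 is just the guess from the post, and writing to the real /proc entry needs root.

```shell
#!/bin/sh
# scale_kbytes CURRENT FACTOR
# Print CURRENT multiplied by FACTOR (hypothetical helper name).
scale_kbytes() {
    echo $(( $1 * $2 ))
}

# raise_min_free [FILE] [FACTOR]
# Read the number stored in FILE and write back FACTOR times that number.
# FILE defaults to the real proc entry (root needed); FACTOR defaults to 4.
raise_min_free() {
    file=${1:-/proc/sys/vm/min_free_kbytes}
    factor=${2:-4}
    current=$(cat "$file") || return 1
    # Refuse to continue if the file did not contain a plain number.
    case $current in
        ''|*[!0-9]*) echo "unexpected value in $file" >&2; return 1 ;;
    esac
    new=$(scale_kbytes "$current" "$factor")
    echo "$new" > "$file" || return 1
    echo "$file: $current -> $new"
}
```

Run as root, calling raise_min_free with no arguments would quadruple the live value. The change lives only in the running kernel, so it is gone after a reboot.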
I raised min_free_kbytes from 3547 to 14188. We'll see if that helps any. I didn't think RAM would be a factor in this; I have 768 MB of RAM and 1.5 GB of swap space.
Thanks for the suggestion, and if anything new happens (or crashes don't happen) I'll write here. Peace, JimBass |
It just froze again. I had opened a new tab in firefox, went to google (obviously not a graphics or processor intensive page), wrote 3 terms in the search box, hit enter and it froze.
Upon reboot it went back to the old 3547 in the file in question. I boot into runlevel 3 with no GUI, so I increased it again to 14188, then returned to being my regular user and started X. A) Is there a way to get the 14188 to stick between reboots? (I'm thinking /proc is generated at each boot, because the values don't stick.) B) Any other suggestions, since that didn't seem to solve the problem? Peace, JimBass |
Linux should never crash or freeze because of a misconfiguration, just not work. Could the machine be overheating? Or could you check your RAM with memtest86 to make sure you don't have a bad bit somewhere?
|
that's all i can think of except for video drivers
do the x logs show anything weird? i just thought that because of the larger width processor, pages might need more space or something for page allocation (someone else a few weeks ago was showing a kernel page allocation error for 64 bit). just guessing -- like you hadn't figured that out already.
on my machine you can change proc values with the file /etc/sysctl.conf, with a line starting vm.min_free_kbytes = and then the value you want. boot scripts just need to run sysctl -p, which i bet they already do.
is this debian-amd64 https://alioth.debian.org/docman/vie...d64-howto.html or just regular 32 bit debian unstable? for the 64 bit version there are some different howtos for video drivers, and flash for mozilla doesn't work and stuff.
the fact that nothing is showing up in logs suggests what, anyone? kernel segfaulting, which is a memory thing, right? or what else? i'm at a loss |
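To make the /etc/sysctl.conf idea concrete, here is a rough sketch of setting a key in that file without leaving duplicate lines behind. The helper name set_sysctl_conf is invented for this example, and the optional third argument exists only so it can be tried on a scratch file before touching the real one.

```shell
#!/bin/sh
# set_sysctl_conf KEY VALUE [FILE]
# Make sure FILE contains exactly one "KEY = VALUE" line.
# FILE defaults to /etc/sysctl.conf, which needs root to edit.
set_sysctl_conf() {
    key=$1
    value=$2
    conf=${3:-/etc/sysctl.conf}
    touch "$conf" || return 1
    tmp=$(mktemp) || return 1
    # Copy every line except an existing assignment of KEY.
    # The name left of "=" is compared exactly, so comments and
    # other keys pass through untouched.
    awk -F= -v k="$key" '{
        name = $1
        gsub(/[ \t]/, "", name)
        if (name != k) print
    }' "$conf" > "$tmp"
    echo "$key = $value" >> "$tmp"
    # Overwrite in place so the file keeps its owner and permissions.
    cat "$tmp" > "$conf" && rm -f "$tmp"
}
```

As root, something like set_sysctl_conf vm.min_free_kbytes 14188 followed by sysctl -p should load the value right away, and the boot scripts should pick it up on later boots if they run sysctl -p as guessed above.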
Linux is like a classy woman. She doesn't like certain videos. She will take care of you. She might be trying to tell you something.
Certain videos are infected with hijackers. Linux is pretty resilient against this, but will sometimes freeze. Sometimes the same video will keep playing, but you can't do anything with the machine but watch the video. The video plays over and over until you reboot. Other times the kernel recognizes this should not be happening, but is unable to shut down the process cleanly. I've noticed with 2.6.12, killing processes is sometimes impossible without rebooting. They just hold on for dear life, and there is nothing you can do to kill them. This can freeze the machine. If you are using the testing distro, you should report the bug to debian. |
Gentlemen,
First of all, it pleases me very much that 2 senior members both took time out of their Saturday night (or Sunday morning, depending on your physical location) to check out my problems! People like you are a large part of the reason that this site is both so popular and successful, and the reason I stay on as a contributing member! In response to the recent questions -
BTMiller - overheating is a possibility, but not very likely. I had my main box die about 6 months ago due to overheating, so I swore to myself that it wouldn't happen again. I spent the extra money to have 2 high end fans in the case, 1 in back by the processor and RAM and the 2nd up front under the SATA drives to keep the circulation happening. I'm located in NYC, and the temp hasn't been much worse than 90 F or 30 C in about 10-14 days. It is hot enough that I keep my air conditioner on whenever I am home. The sides of the case feel warm, but not hot. Also, I have a feature in my BIOS that allows for automatic shutdown if the CPU temp exceeds 60 C, and that hasn't happened AFAIK. I do occasionally check the CPU temp when I reboot after a lockup, and it hasn't even hit 50 C yet.
In regards to memtest86, that seems like a fine idea. I don't have it installed at present, and I do remember that being an option from GRUB in the booting of my FC3 system. OK, I just used apt-get to install memtest86. I see that it put nothing in /usr/bin or /usr/sbin, but the command
Code:
sudo find / -name '*memtest86*' -print
Code:
/var/lib/dpkg/info/memtest86.list
Foo_bar_foo - I do have a full 64 bit install of debian. I installed the testing version of Deb for AMD64 bit machines. It gave me kernel /vmlinuz-2.6.8-11-amd64-generic in my menu.lst file, and I used the .config file from that kernel as the basis for my 2.6.12-4 kernel. I have these 4 options in menu.lst at present -
Code:
title Debian GNU/Linux, kernel 2.6.12.4-take1
Code:
(WW) NVIDIA(0): horizontal sync start (1178) not a multiple of 8
Code:
#
Peace, JimBass
Ok, I am over the posting limit by +5000 chars, so I'll cut some of the repeats in the XFree86.0.log, and see if that makes it work - yep |
AwesomeMachine, you posted while I was posting that monstrosity. It doesn't only lock up while watching videos; that is just what I am doing most often when it locks up. Also, this happened with kernels 2.6.8-12 and 2.6.12-4, so I don't think it is a Debian bug in particular. Also, I have never had it lock up where I can just kill the video process and it will fix itself. When it locks up, it locks up hard, and the only thing I can do is hit reset and reboot. Thanks for your input in any case.
Peace, JimBass |
I used memtest+ on my knoppix disk. It ran the full battery of tests twice through in about 45 minutes and didn't show any errors. I was actually hoping it would find an error; that way I could replace the faulty piece of RAM and move on.
Peace, JimBass |
i read on lists.debian.org that this is nvidia drivers, like i suspected
someone reports that using proprietary version 1.0-7664 works. http://www.nvidia.com/object/linux_d..._1.0-7664.html
could always try latest as well http://www.nvidia.com/object/linux_d..._1.0-7676.html
let's see, you will have to (i hope i get this right):
this is while x is not running! (Ctrl-Alt-Backspace)
you need gcc 3.4 (apt-get install gcc-3.4) if you haven't *duhh*
export CC='/usr/bin/gcc-3.4'
sh NVIDIA-Linux-x86_64-1.0-7664-pkg2.run -x
cd NVIDIA-Linux-x86_64-1.0-7664-pkg2
make
make install
If /usr/X11R6/lib64 is not a softlink to the /usr/X11R6/lib directory:
cd /usr/X11R6
rm -rf lib64
ln -s lib lib64
put nvidia in /etc/modules
put alias char-major-195 nvidia in /etc/modutils/aliases
and you will have to edit /etc/X11/XF86Config and change the driver to nvidia. make sure you have Load "glx" and nothing about dri or GLCore
at this point i would reboot and let udev make the nvidia devices
hope it works |
I've been having some similar problems with no resolution.
|
I built this box in late July, and I built it with driver 1.0.7676. The only difference I see is that my version of that driver was downloaded and installed on July 29th, and the link on the NVidia page now says it is from August 9th. I will uninstall the version I have, install the newest 1.0.7676, and if that fails I'll regress it to 1.0.7664.
For anyone who is using this to solve a similar problem, getting the NVidia installer to work with debian 64 bit is a problem because NVidia expects a lib64 directory, and debian doesn't use lib64, just lib. I posted the solution I had working on the NVidia linux forums; you can read it here. I will post back after I can try both versions of the driver, and see if either of them solves the problem. Peace, JimBass |
and you used gcc 3.4 for your kernel the way the debian guys are doing?
also the obvious test is to use the nv driver and see if all is stable. also might try the nvidia agp module instead of the stock one.
you pointed out the strangeness of the rm -rf /usr/X11R6/lib64 after the install in the instructions i was given above, rather than ln -s /usr/X11R6/lib /usr/X11R6/lib64 before the driver install. it may be possible the person above was using 32 bit compatibility libs after that, which is weird, BUT given the fact that the processor with a 64 bit kernel can still run in 32 bit mode, this may have been the person's solution. you could try it with
mv /usr/X11R6/lib64 /usr/X11R6/lib64BK
ln -s /usr/X11R6/lib /usr/X11R6/lib64
i guess only if you build the 32 bit compatibility things and they build correctly (don't know how the Makefile is set up). again, i know this doesn't make logical sense |
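The mv and ln -s steps above could be wrapped up a bit more defensively, roughly as sketched below. The function name ensure_lib64_link is made up here, and the directory argument is only there so the idea can be tried on a scratch directory first. It moves a real lib64 aside to lib64BK instead of deleting it, and does nothing if lib64 is already a symlink.

```shell
#!/bin/sh
# ensure_lib64_link [X11DIR]
# Make X11DIR/lib64 a symlink to X11DIR/lib, keeping a backup of any
# real lib64 directory as X11DIR/lib64BK. X11DIR defaults to /usr/X11R6.
ensure_lib64_link() {
    dir=${1:-/usr/X11R6}
    if [ ! -d "$dir/lib" ]; then
        echo "no $dir/lib directory, nothing to link to" >&2
        return 1
    fi
    # Already a symlink: leave it alone.
    if [ -L "$dir/lib64" ]; then
        echo "$dir/lib64 is already a symlink"
        return 0
    fi
    # A real file or directory is in the way: move it aside.
    if [ -e "$dir/lib64" ]; then
        if [ -e "$dir/lib64BK" ]; then
            echo "$dir/lib64BK already exists, not overwriting it" >&2
            return 1
        fi
        mv "$dir/lib64" "$dir/lib64BK" || return 1
    fi
    # Relative link, same as doing "cd $dir; ln -s lib lib64".
    ln -s lib "$dir/lib64"
}
```

On the real system this would be run as root with no argument, before the driver install. If the result is worse, removing the link and moving lib64BK back restores the old layout.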
I installed gcc 4.0 prior to compiling the 2.6.12-4 kernel, so my kernel (though not the stock debian amd64 kernel) was built with that, and I didn't export 3.4. My current gcc is 4.0.1. I did take your earlier suggestion, and just did
Code:
ln -s /usr/X11R6/lib /usr/X11R6/lib64
As things stand right now, removal of the nvidia driver was no problem with the --uninstall flag at the end. The nice thing about that is it leaves all your X config files alone, so I checked them, and they were still set for Nvidia. Reinstall of the driver, then a reboot/modprobe, had X working again. With 1.0.7676 I left it on all day while I went to work, and came home to a frozen screen saver. I rebooted, removed the driver, installed version 1.0.7664, rebooted, started X. It is up and running at present, and we'll see how long. If this fails, I will go back to the nv driver, but boy do those suck! Tough question: is it better to have no 3-d acceleration but not crash, or to have things look nice but crash out? Hopefully that won't be an issue, and it will just run on the 1.0.7664. I'll write back if it crashes. If it makes it 48 hours, that will be better than ever before. Thanks again for all the help, and I'll update once something happens. Peace, JimBass |