LinuxQuestions.org

clifford227 06-14-2012 04:03 PM

New kernel - computer freezing - boot time CPU error message
 
Hello,

A few days ago I was messing around with slackpkg for the first time. Along with various other packages, I upgraded the kernel. Unfortunately, my computer has started freezing when programs max out the CPU, and I'm also getting the following error (but only occasionally) in the boot messages:

CPU 0: Machine Check Exception: 004
Kernel panic - not syncing: CPU context corrupt.

I guess there's a strong possibility that this isn't just a coincidence, and that the new kernel has screwed something up.

Am I OK just upgradepkg-ing back to my previously installed kernel? (It was an official Slackware-packaged kernel, 2.6.27.31 smp i686.)

guanx 06-14-2012 05:08 PM

Do not overclock any part of your system.

Clean your heatsinks. If that doesn't work, get a new computer.

ReaperX7 06-14-2012 10:33 PM

Did you run lilo after you installed the new kernel?

ReggiePerrin 06-15-2012 03:58 AM

I would try running the OS from the dvd drive using the "live disc", as it is called.
This is (usually) the same Linux install disc. Instead of installing, just run the OS from there, and try to max out the CPU as before. See if the same symptoms happen.
You won't have the same programs installed as in your own final install, but you should be able to "max out" your cpu at any rate.
This is part of a way of diagnosing what has gone wrong with a system.

Can you post here your system specifications, please?
It helps to know whether your system is underpowered for a modern operating system, or whether there is a hardware issue (drivers inadequate or not available, etc).

If it was running OK, and the problems appeared right after you updated, that points towards a software problem rather than a sudden coincidental hardware failure such as a PSU or RAM fault, though it does not rule hardware out.
There are many things which can cause your stated symptoms. I would definitely clean the heat sinks (do NOT use a vacuum cleaner!).
Modern OSs are usually quite able to run with the cpu "maxed out". All that happens is that the system slows down somewhat, especially if you don't have much RAM.

Hope this helps.

Perromuerto 06-15-2012 09:43 AM

A description in wikipedia
 
There is a detailed explanation in wikipedia:

http://en.wikipedia.org/wiki/Machine_Check_Exception

Essentially your hardware is toasted!

Ratamahatta 06-15-2012 04:30 PM

I support ReggiePerrin's post. Here are just some additions.

On my old openSUSE 10.3 I had a package installed that was called "lmsensors". That may help to check whether it really is a heat problem or not.

You might want to check the syslog after this happened. /var/log/messages.X or /var/log/kern.log.X (this file doesn't exist on openSUSE 10.3, but on current aptosid and ubuntu) contain logs (dmesg and more) from previous sessions. The kern.log.X often contains quite detailed information (e.g. "Null pointer dereference" and a stack trace).
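To save some scrolling, the log check could be scripted roughly like this. It is only a sketch: the file names (messages, syslog, kern.log) and the search patterns are assumptions based on the paths and error text mentioned in this thread, and a panic that happens early in boot may never reach the disk logs at all.

```python
import re
from pathlib import Path

# Search patterns are assumptions taken from the error text quoted in this thread.
MCE_PATTERN = re.compile(r"machine check|\bmce\b|kernel panic", re.IGNORECASE)


def find_mce_lines(lines):
    """Return the lines that look like machine-check or panic reports."""
    return [line.rstrip("\n") for line in lines if MCE_PATTERN.search(line)]


def scan_logs(log_dir="/var/log", names=("messages", "syslog", "kern.log")):
    """Scan current and rotated plain-text logs; unreadable files are skipped."""
    hits = {}
    for name in names:
        for path in sorted(Path(log_dir).glob(name + "*")):
            if path.suffix == ".gz":
                # Compressed rotations are not handled in this sketch.
                continue
            try:
                with open(path, errors="replace") as handle:
                    found = find_mce_lines(handle)
            except OSError:
                # Missing permissions, directories, and so on.
                continue
            if found:
                hits[str(path)] = found
    return hits


if __name__ == "__main__":
    for log_path, matches in scan_logs().items():
        print(log_path)
        for match in matches:
            print("    " + match)
```

Reading /var/log usually needs root, so run it with enough privileges or point log_dir at a copy of the logs.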

The page linked by Perromuerto mentions that it could just as well be a software bug or the wrong kernel architecture. So double check that your CPU architecture matches the kernel's! If you have tried the things ReggiePerrin suggested and the architectures match, I'd say do go back to the old kernel if you can.
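A rough way to compare the two, assuming an x86 machine: the kernel's architecture comes from uname (via platform.machine()) and the CPU's capabilities from the flags line in /proc/cpuinfo, where "lm" marks a 64-bit capable CPU. The function names here are made up for illustration, and the check is coarse: it only tells 32-bit from 64-bit, not finer mismatches such as a kernel built for a newer CPU family.

```python
import platform


def cpu_flags(cpuinfo_text):
    """Return the flags from the first 'flags' line of /proc/cpuinfo text."""
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "flags":
            return set(value.split())
    return set()


def describe_match(kernel_machine, flags):
    """Compare the kernel's machine string with what the CPU reports (x86 only)."""
    cpu_is_64bit = "lm" in flags  # 'lm' = long mode, i.e. 64-bit capable
    kernel_is_64bit = kernel_machine in ("x86_64", "amd64")
    if kernel_is_64bit and not cpu_is_64bit:
        return "mismatch: 64-bit kernel on a CPU without long mode"
    if cpu_is_64bit and not kernel_is_64bit:
        return "ok: 32-bit kernel on a 64-bit capable CPU"
    return "ok: kernel and CPU agree on word size"


if __name__ == "__main__":
    try:
        with open("/proc/cpuinfo") as handle:
            flags = cpu_flags(handle.read())
    except OSError:
        flags = set()
    machine = platform.machine()
    print("kernel machine:", machine)
    print(describe_match(machine, flags))
```

A 32-bit i686 kernel on a 64-bit capable CPU is a normal, supported combination, so that case is reported as ok.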

purevw 06-16-2012 09:00 AM

It does sound like a possible heating problem. As for your system being too old to run a modern OS, that can make it take forever to do a job (at which point you should consider a new computer), but it should not cause an MCE. lmsensors can give you temperature info as stated by Ratamahatta, assuming that your CPU is modern enough to have temp diodes. hddtemp can give you your hard drive temps, if the drives are SMART capable. GKrellM is also offered by most flavors of Linux. It runs on your desktop and can give you a real time visual of everything your system is doing: what the temps are, which PIDs are using the most resources, network, hard drive throughput, RAM, etc. It is a fairly small window that can be placed out of the way. It runs on my system 24/7.
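If none of those tools are installed, the kernel's hwmon interface in sysfs can be read directly, provided a sensor driver is loaded. A minimal sketch, assuming the usual layout where temp*_input files hold millidegrees Celsius; the function names are hypothetical, and the exact directory differs between kernel versions, so both common locations are tried.

```python
from pathlib import Path


def millidegrees_to_celsius(raw):
    """hwmon temp*_input files hold millidegrees Celsius as an integer string."""
    return int(raw.strip()) / 1000.0


def read_temperatures(root="/sys/class/hwmon"):
    """Return {sensor file path: degrees Celsius} for every readable sensor."""
    readings = {}
    base = Path(root)
    # Some kernels expose the files under hwmonN/, others under hwmonN/device/.
    for pattern in ("hwmon*/temp*_input", "hwmon*/device/temp*_input"):
        for path in sorted(base.glob(pattern)):
            try:
                readings[str(path)] = millidegrees_to_celsius(path.read_text())
            except (OSError, ValueError):
                # Sensor not readable or returned something unexpected.
                continue
    return readings


if __name__ == "__main__":
    temps = read_temperatures()
    if not temps:
        print("No hwmon temperature sensors found (is a sensor driver loaded?)")
    for sensor, celsius in temps.items():
        print(f"{sensor}: {celsius:.1f} C")
```

Running it once at idle and once while the CPU is maxed out should show whether the temperature climbs sharply before a freeze.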

The following advice involves you getting inside your computer and working. If you are not comfortable with that, then you should find someone who is.

The CPU cooler, as well as all other coolers and the power supply, should be cleaned at least every few months. I use an air compressor, but compressed air from a can works too. Do not allow the compressed air to spin the cooling fans, as the fans can fly apart or the bearings can be damaged if spun too quickly. After blowing it out, allow it to sit for a couple of hours. If your compressor does not have a moisture trap, there is a chance that tiny amounts of water were sprayed on the components. It will need time to dry.

One last thing to consider is the thermal compound or pad below the CPU cooler, and the cooler itself. Many manufacturers would let you think that their product will last forever, but this is never the case. I replace my thermal compound every couple of years.

H_TeXMeX_H 06-22-2012 06:46 AM

First thing to do would be switch to the older, good kernel. I would also run memtest86 to be sure it's not a RAM issue.

