LinuxQuestions.org


red_lego_man 06-23-2006 06:09 AM

How to trap kernel crash error messages
 
I have a problem with Fedora Core 5 on my ThinkPad laptop which causes it to crash. It's not the crash itself I need help with, but capturing the messages.

Basically a whole load of messages scroll off the top of the console screen when the box crashes, ending with:

Continuing in 85 seconds. el: Continuing in 120 seconds.
Continuing in 48 seconds. nel: tinuing in 84 seconds.
Continuing in 11 seconds. nel: tinuing in 47 seconds.
Continuing in 1 seconds. rnel: tinuing in 10 seconds.

Is there some way of getting this whole message dumped to a file, so that I can see what is at the top?
Also, can I make the box reboot instead of displaying the "Continuing" messages and then just stopping, as it does now?
It takes a switch off/on to get things back up and running.

Any help much appreciated.

Tinkster 06-24-2006 12:03 AM

Hi, and welcome to LQ!

Check whether FC offers an LKCD rpm...

That said: depending on the kind of crash, you may find info in the
system logs (check the stuff in /var/log) after a reboot.

syg00 06-24-2006 12:53 AM

LKCD is deprecated, and it's very unlikely that either it or kdump would be needed.

Check the logs as advised. If the machine is totally broken, use a liveCD (or your CD #1) to look at the logs on the hard disk.
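A minimal sketch of that log check, assuming Python is available (on the box itself or on the live CD). It walks a log directory and prints every line containing a common kernel crash marker. The marker list is an assumption rather than an exhaustive set, and compressed rotated logs (*.gz) are not searched.

```python
import os
import re

# Assumed markers: common prefixes of kernel crash reports. Not exhaustive.
MARKERS = re.compile(r"Oops|BUG:|kernel BUG|Call Trace")

def scan_logs(log_dir, markers=MARKERS):
    """Yield (path, line_number, line) for every log line matching `markers`."""
    for dirpath, _dirnames, filenames in os.walk(log_dir):
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            if not os.path.isfile(path):
                # Skip sockets, FIFOs and dangling symlinks, which could block or fail
                continue
            try:
                # errors="replace" keeps binary logs (wtmp, lastlog) from raising
                with open(path, "r", errors="replace") as fh:
                    for lineno, line in enumerate(fh, start=1):
                        if markers.search(line):
                            yield path, lineno, line.rstrip("\n")
            except OSError:
                # Unreadable files (e.g. permission denied) are skipped
                continue

# Usage (the path is hypothetical; use wherever the live CD mounted the disk):
#   for path, lineno, line in scan_logs("/mnt/disk/var/log"):
#       print(f"{path}:{lineno}: {line}")
```

Nothing found does not prove nothing happened: if the kernel hangs before syslog writes to disk, the message never reaches the file.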

If you have the full source package, have a look at ../Documentation/oops-tracing.txt for hints and tips, as well as where everything (i.e. the logs) should be.

red_lego_man 06-24-2006 03:39 AM

Quote:

Originally Posted by syg00
LKCD is deprecated, and it's very unlikely that either it or kdump would be needed.

Check the logs as advised. If the machine is totally broken, use a liveCD (or your CD #1) to look at the logs on the hard disk.

If you have the full source package, have a look at ../Documentation/oops-tracing.txt for hints and tips, as well as where everything (i.e. the logs) should be.


A quick reboot gets things going again, but there is absolutely no mention of any problems in the messages file or in any other file in /var/log.

I'll see if I can find "oops-tracing.txt" (who would have guessed to search for "oops"?!). The information always seems to be out there; it's just a matter of knowing exactly what to search for!

HOWEVER, I have found that using SHIFT-PGUP shows me stuff that has scrolled off the top of the console screen, so at least I can write the messages down next time it happens.

In the meantime, is there a way of making the machine automatically reboot instead of just halting after the crash?

Thanks for your help so far, btw. I hope to be able to start answering questions around here soon, as I do have quite a lot of Unix (i.e. not Linux) experience under my belt.

syg00 06-24-2006 04:24 AM

Quote:

Originally Posted by red_lego_man
In the meantime, is there a way of making the machine automatically reboot instead of just halting after the crash?

I doubt it: an oops generally means you are dead in the water.
An oops should end up in the logs; if it doesn't, I'm sure the kernel devs would accept that as a reportable kernel bug.

Of course, if the messages you are seeing are issued from (one of) the init scripts, it ain't an oops ... ;)
Then you'll need something like bootlog.
A quick search indicates it might be a real problem - maybe with FC; I'm just guessing.
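For reference, the kernel does expose two tunables here (documented in Documentation/sysctl/kernel.txt in the kernel source): `panic`, the number of seconds to wait before rebooting after a panic (0 means never), and `panic_on_oops`, which turns an oops into a full panic. A minimal sketch that reads and summarises them is below. Whether they help in this case is uncertain: a hang such as a spinlock lockup may never reach the panic path at all, so treat this as something to test, not a fix.

```python
from pathlib import Path

# Tunables described in Documentation/sysctl/kernel.txt:
#   panic          - seconds to wait before rebooting after a panic (0 = never)
#   panic_on_oops  - 1 = promote every oops to a panic
SYSCTL_DIR = Path("/proc/sys/kernel")

def read_panic_settings(base=SYSCTL_DIR):
    """Return {'panic': int|None, 'panic_on_oops': int|None} read from `base`."""
    settings = {}
    for name in ("panic", "panic_on_oops"):
        try:
            settings[name] = int((base / name).read_text().strip())
        except (OSError, ValueError):
            # Missing or unreadable file: report the setting as unknown
            settings[name] = None
    return settings

def describe(settings):
    """Summarise what the kernel will do after a panic."""
    timeout = settings.get("panic")
    if timeout is None:
        msg = "panic timeout unknown"
    elif timeout == 0:
        msg = "after a panic the machine halts and waits forever"
    elif timeout > 0:
        msg = f"after a panic the machine reboots after {timeout} s"
    else:
        msg = f"panic={timeout}: see Documentation/sysctl/kernel.txt"
    if settings.get("panic_on_oops") == 1:
        msg += "; an oops is promoted to a panic"
    return msg

# Usage:
#   print(describe(read_panic_settings()))
```

To try an automatic reboot, `sysctl -w kernel.panic=30` (or `kernel.panic = 30` in /etc/sysctl.conf, or `panic=30` on the kernel command line) sets the timeout; 30 is just an example value.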

red_lego_man 06-24-2006 01:31 PM

Quote:

Originally Posted by syg00
I doubt it: an oops generally means you are dead in the water.
An oops should end up in the logs; if it doesn't, I'm sure the kernel devs would accept that as a reportable kernel bug.

Of course, if the messages you are seeing are issued from (one of) the init scripts, it ain't an oops ... ;)
Then you'll need something like bootlog.
A quick search indicates it might be a real problem - maybe with FC; I'm just guessing.


Well, it went down again, and the SHIFT+PGUP thing doesn't work when it's hung. I haven't googled yet, but this is the last entry on the screen (it's repeated a few times):
Code:

BUG: spinlock recursion on CPU#0, swapper/0 (Not tainted)
BUG: spinlock lockup on CPU#0, swapper/0, c0341620 (Not tainted)

There is nothing in any logs on the disk; I've used find and grep to search every readable file on the machine.
Ho hum, I'll get to Google later; meantime I have to give the kids a bath...

btmiller 06-24-2006 04:29 PM

If there is a kernel oops, there's no guarantee that the system is still operational enough to write a message out to the logs, so it's not necessarily surprising that there's nothing in them. You might want to look into setting up a netdump server on your LAN and configuring the problem machine to send crash data over the network to it. If you Google, there are a number of tutorials, but I found this guide, which might help you.
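netdump's own setup is covered by those tutorials. As a related, swapped-in technique, here is a minimal sketch of a receiver for the kernel's netconsole module, which sends kernel console messages as plain UDP packets to another host. Port 6666 is netconsole's documented default target port (see Documentation/networking/netconsole.txt for the module parameters); the output file name in the usage comment is hypothetical. It only captures what the kernel manages to send before it locks up.

```python
import socket
import time

# netconsole's documented default target port; change it if configured otherwise.
NETCONSOLE_PORT = 6666

def make_socket(host="0.0.0.0", port=NETCONSOLE_PORT):
    """Open a UDP socket listening for netconsole packets on host:port."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind((host, port))
    return sock

def receive(sock, out, max_packets=None):
    """Write each UDP payload to `out` with a timestamp and sender address.

    Flushes after every packet so messages already received survive even if
    this listener is interrupted. Runs forever when max_packets is None.
    """
    count = 0
    while max_packets is None or count < max_packets:
        data, (addr, _port) = sock.recvfrom(4096)
        text = data.decode("utf-8", errors="replace").rstrip("\n")
        out.write(f"{time.strftime('%Y-%m-%d %H:%M:%S')} {addr} {text}\n")
        out.flush()
        count += 1
    return count

# Usage ("crash-console.log" is a hypothetical file name):
#   with open("crash-console.log", "a") as out:
#       receive(make_socket(), out)
```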

syg00 06-24-2006 08:20 PM

Possible, but more likely the log has "rolled over" and the original message(s) were lost due to the volume of messages.

Noticed this on lkml in response to a query about that spinlock recursion
Quote:

Please try 2.6.17. The spinlock was removed.
What kernel are you running, BTW?

red_lego_man 06-26-2006 03:01 AM

Quote:

Originally Posted by syg00
Possible, but more likely the log has "rolled over" and the original message(s) were lost due to the volume of messages.

Noticed this on lkml in response to a query about that spinlock recursion
What kernel are you running, BTW?

I was running 2.6.15, but now I'm running 2.6.17, because I too found that same comment about using 2.6.17. It seems to have done the trick (fingers crossed): my machine hasn't crashed since Saturday (today is Monday). :)
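The fix here came down to a kernel version (2.6.15 vs 2.6.17). If you want to script that check, compare the numbers as tuples rather than as strings, because as strings "2.6.9" sorts after "2.6.17". A minimal sketch; the 2.6.17 threshold is just the version mentioned in this thread, and the parsing rule (ignore any suffix after the numbers) is an assumption.

```python
import re

def kernel_version_tuple(release):
    """Turn a release string such as '2.6.15-1' into (2, 6, 15).

    Anything after the numeric part (distro suffixes etc.) is ignored; a
    missing third number is treated as 0.
    """
    m = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", release)
    if m is None:
        raise ValueError(f"unrecognised kernel release: {release!r}")
    major, minor, patch = m.groups()
    return (int(major), int(minor), int(patch or 0))

def at_least(release, minimum=(2, 6, 17)):
    """True if `release` is numerically >= `minimum`."""
    return kernel_version_tuple(release) >= minimum

# Usage on the machine itself (platform.release() returns the `uname -r` string on Linux):
#   import platform
#   print(platform.release(), at_least(platform.release()))
```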


All times are GMT -5.