LinuxQuestions.org
Linux - General This Linux forum is for general Linux questions and discussion.
If it is Linux Related and doesn't seem to fit in any other forum then this is the place.

09-01-2008, 06:20 AM   #1
RipClaw
LQ Newbie
 
Registered: Oct 2005
Distribution: PCQ Linux 2006
Posts: 19

Rep: Reputation: 0
How to debug & isolate system hangs?


Hi Folks,

I would like to know the systematic way to find the source of the problem when a computer freezes completely.

The three possibilities are
1> Hardware Problem [bad memory chip/signal attenuation etc]
2> Software Problem
3> Hardware + Software problem

Also, in what situations would this debugger card be useful?
http://www.connecttech.com/dumpswitch_microsite/
 
09-02-2008, 04:56 AM   #2
blackhole54
Senior Member
 
Registered: Mar 2006
Posts: 1,896

Rep: Reputation: 61
Well, your question takes me back about 20 years to when I was working on embedded systems using 8-bit processors. Until I did a quick check just now, I didn't even realize x86 had an NMI! So without further searching I can only give you a general idea.

It sounds to me like the only thing the card does is generate an NMI (non-maskable interrupt) when the button is pushed, and that anything useful for actual debugging is external to the card. I would think the NMI pushes the program counter and flags onto the stack, and the handler can then save the rest of the registers, so that the state of the machine is preserved. According to this Wikipedia article:

Quote:
With the introduction of Windows 2000, Microsoft allowed the use of an NMI to cause a system to either break into a debugger, or dump the contents of memory to disk and reboot.
I am not aware of any such capability within Linux, but perhaps there is. You could search the Internet to try to find out. If it doesn't already exist, you would have to write your own software and tie it to the NMI. If, for example, the NMI was handled by a debugger, the debugger could look at the stack and tell you the state of all of the registers and the program counter at the time you interrupted things. If the debugger didn't disturb any critical memory, then the whole state of the system would be preserved and you could, in principle, trace back how you got to the hung state. I have worked with debugger/emulators that could then even single-step the program that had been running, but I believe that requires additional hardware which may not be available to you. (My memory is very vague and, anyway, this was all on 8-bit processors.)
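
As a starting point for that search, here is a minimal sketch of how one might check whether a Linux machine is set up to react to an NMI like the one the card generates. It assumes an x86 kernel that exposes the `unknown_nmi_panic` and `panic_on_unrecovered_nmi` sysctls under /proc/sys/kernel and an `NMI:` row in /proc/interrupts. The function names are made up for illustration, and on a kernel without those files the script simply reports them as not available.

```python
from pathlib import Path

# Sysctl names assumed to exist on x86 kernels; missing files are reported as None.
NMI_SYSCTLS = ("unknown_nmi_panic", "panic_on_unrecovered_nmi")


def read_nmi_sysctls(base="/proc/sys/kernel"):
    """Return {name: int or None} for each NMI-related sysctl under base."""
    values = {}
    for name in NMI_SYSCTLS:
        try:
            values[name] = int((Path(base) / name).read_text().strip())
        except (OSError, ValueError):
            values[name] = None  # not present or not readable on this kernel
    return values


def parse_nmi_counts(interrupts_text):
    """Return the per-CPU NMI counters from /proc/interrupts-style text."""
    for line in interrupts_text.splitlines():
        fields = line.split()
        if fields and fields[0] == "NMI:":
            counts = []
            # Leading numeric fields are per-CPU counters; the rest is a label.
            for field in fields[1:]:
                if not field.isdigit():
                    break
                counts.append(int(field))
            return counts
    return []  # no NMI row found


if __name__ == "__main__":
    for name, value in read_nmi_sysctls().items():
        print(f"{name}: {'not available' if value is None else value}")
    try:
        text = Path("/proc/interrupts").read_text()
    except OSError:
        text = ""
    print("NMI counts per CPU:", parse_nmi_counts(text))
```

As I understand it, a non-zero `unknown_nmi_panic` makes the kernel panic on an NMI it cannot attribute to a known source, which is what a button press on such a card would look like. Comparing the NMI counters before and after pressing the button is a cheap way to confirm the interrupt actually reaches the kernel.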

However, troubleshooting these situations can be very complicated. I have never attempted it on anything close to the complexity of a running Linux system, but my recollection is that it was as much art as science. So I don't know of any general, systematic way to proceed. Just look at the information about where you are, possibly single-step if you have that capability and if other interrupts and real-time events don't turn that into a meaningless exercise, and try to deduce what it might make sense to do or look at next.

Alternatively, you could have software that simply dumps all the memory along with the registers and program counter, and try to figure it out from there. I have never done anything like that and so can't give you any pointers. Perhaps somebody else with experience with large(r) systems can give you some advice.
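
On Linux, the dump-everything approach is usually done with a crash kernel loaded through kexec (kdump). The sketch below only checks whether such a crash kernel appears to be loaded. It assumes the kernel exposes `kexec_crash_loaded` and `kexec_crash_size` under /sys/kernel; the function names are hypothetical, and it does not trigger or read a dump.

```python
from pathlib import Path


def crash_dump_status(sys_kernel="/sys/kernel"):
    """Report whether a crash (kdump) kernel looks loaded.

    Returns {'loaded': bool or None, 'reserved_bytes': int or None};
    None means the file was missing or unreadable.
    """
    def read_int(name):
        try:
            return int((Path(sys_kernel) / name).read_text().strip())
        except (OSError, ValueError):
            return None

    loaded = read_int("kexec_crash_loaded")
    return {
        "loaded": None if loaded is None else bool(loaded),
        "reserved_bytes": read_int("kexec_crash_size"),
    }


def describe(status):
    """Turn the status dict into a one-line summary."""
    if status["loaded"] is None:
        return "kexec crash interface not found"
    if status["loaded"]:
        return "crash kernel loaded; a panic should produce a memory dump"
    return "no crash kernel loaded; a panic would not produce a dump"


if __name__ == "__main__":
    print(describe(crash_dump_status()))
```

If no crash kernel is loaded, forcing a panic from an NMI would most likely just leave a hung or rebooted machine with nothing to analyze, so this is worth checking before relying on the button.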

As far as figuring out hardware issues, all I know is to deduce or hypothesize it based on what you observe the software has done. If it is flaky hardware rather than a solid failure, things can get quite tricky.
 
  


All times are GMT -5.
