LinuxQuestions.org


kaz2100 01-20-2007 06:41 PM

How to troubleshoot? Software or hardware
 
Hi,

My Debian (etch) penguin has several problems. Random things happen sporadically.

One of them is that X.org dies. The end of the log says:
Code:

SetGrabKeysState - disabled
SetGrabKeysState - enabled

Backtrace:
0: /usr/bin/X11/X(xf86SigHandler+0x84) [0x80c4354]
1: [0xffffe420]
2: /usr/lib/xorg/modules/extensions/libGLcore.so(_swrast_write_rgba_span+0x89c) [0xb6388b1c]
3: /usr/lib/xorg/modules/extensions/libGLcore.so [0xb637b904]
4: /usr/lib/xorg/modules/extensions/libGLcore.so [0xb637cf5c]
5: /usr/lib/xorg/modules/extensions/libGLcore.so(_swrast_Line+0x23) [0xb6371c13]
6: /usr/lib/xorg/modules/extensions/libGLcore.so [0xb63b32f1]
7: /usr/lib/xorg/modules/extensions/libGLcore.so(_tnl_RenderClippedLine+0x23) [0xb63d44b3]
8: /usr/lib/xorg/modules/extensions/libGLcore.so [0xb63cd0f3]
9: /usr/lib/xorg/modules/extensions/libGLcore.so [0xb63d081a]
10: /usr/lib/xorg/modules/extensions/libGLcore.so [0xb63d45b5]
11: /usr/lib/xorg/modules/extensions/libGLcore.so(_tnl_run_pipeline+0x13f) [0xb63bad8f]
12: /usr/lib/xorg/modules/extensions/libGLcore.so(_tnl_playback_vertex_list+0x1ba) [0xb63c184a]
13: /usr/lib/xorg/modules/extensions/libGLcore.so [0xb62d459c]
14: /usr/lib/xorg/modules/extensions/libGLcore.so [0xb62d47bf]
15: /usr/lib/xorg/modules/extensions/libGLcore.so(_mesa_CallList+0x7e) [0xb62d6c0e]
16: /usr/lib/xorg/modules/extensions/libglx.so [0xb7c8d245]
17: /usr/lib/xorg/modules/extensions/libglx.so(__glXRender+0xf3) [0xb7c86fd3]
18: /usr/lib/xorg/modules/extensions/libglx.so [0xb7c8bf6a]
19: /usr/bin/X11/X(Dispatch+0x19b) [0x8086cab]
20: /usr/bin/X11/X(main+0x489) [0x806e699]
21: /lib/tls/libc.so.6(__libc_start_main+0xc8) [0xb7dceea8]
22: /usr/bin/X11/X(FontFileCompleteXLFD+0xa9) [0x806d9d1]

Fatal server error:
Caught signal 11.  Server aborting

After that, the file system (ext3) was broken at the next reboot (no trace in /var/log/...).
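
In the backtrace above, frames 2 to 15 are all inside libGLcore.so and frames 16 to 18 are inside libglx.so. For longer logs, a rough way to see where a crash sits is to count frames per module. The following is only a minimal sketch, assuming the frame format shown above; `frames_per_module` and the shortened `sample` text are made up for illustration. A count like this shows where the server crashed, not whether software or hardware caused it.

```python
import re
from collections import Counter

# Matches frames such as:
#   2: /usr/lib/.../libGLcore.so(_swrast_write_rgba_span+0x89c) [0xb6388b1c]
# Frames without a module path (e.g. "1: [0xffffe420]") do not match
# and are skipped.
FRAME_RE = re.compile(r"^\s*(\d+):\s+(/[^\s(\[]+)")


def frames_per_module(log_text):
    """Count backtrace frames per module path (hypothetical helper)."""
    counts = Counter()
    for line in log_text.splitlines():
        match = FRAME_RE.match(line)
        if match:
            counts[match.group(2)] += 1
    return counts


# A few frames copied from the backtrace above, shortened for the example.
sample = """\
0: /usr/bin/X11/X(xf86SigHandler+0x84) [0x80c4354]
1: [0xffffe420]
2: /usr/lib/xorg/modules/extensions/libGLcore.so(_swrast_write_rgba_span+0x89c) [0xb6388b1c]
3: /usr/lib/xorg/modules/extensions/libGLcore.so [0xb637b904]
16: /usr/lib/xorg/modules/extensions/libglx.so [0xb7c8d245]
"""

for module, count in frames_per_module(sample).most_common():
    print(count, module)
```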

The disk also dies, and /var/log/messages says:
Code:

Jan 15 15:21:17 penguin kernel: EXT3-fs: INFO: recovery required on readonly file system.
Jan 15 15:21:17 penguin kernel: EXT3-fs: write access will be enabled during recovery.

Often the disk dies with something like "sata parity error" on the console and no trace in /var/log/... (I guess because it is not writable by then).
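
When the log is still writable, the relevant kernel messages can be picked out with a small filter. This is a minimal sketch only: the keyword patterns are assumptions based on the messages quoted in this post, `suspicious_lines` is a hypothetical helper, and the third sample line is made up to show a line that gets skipped.

```python
import re

# Hypothetical keyword list based on the messages quoted above;
# extend it to match whatever your kernel actually prints.
PATTERNS = [
    re.compile(r"EXT3-fs", re.IGNORECASE),
    re.compile(r"\b(?:sata|ata\d+)\b", re.IGNORECASE),
    re.compile(r"parity error", re.IGNORECASE),
]


def suspicious_lines(lines):
    """Return the log lines that match any of the patterns."""
    return [
        line.rstrip("\n")
        for line in lines
        if any(pattern.search(line) for pattern in PATTERNS)
    ]


sample = [
    # The first two lines are copied from the log excerpt above.
    "Jan 15 15:21:17 penguin kernel: EXT3-fs: INFO: recovery required on readonly file system.\n",
    "Jan 15 15:21:17 penguin kernel: EXT3-fs: write access will be enabled during recovery.\n",
    # Made-up unrelated line, only here to show that it is filtered out.
    "Jan 15 15:21:18 penguin example: nothing interesting here\n",
]

for line in suspicious_lines(sample):
    print(line)

# To scan a real file instead (the path is an assumption, adjust as needed):
#     with open("/var/log/messages") as log:
#         for line in suspicious_lines(log):
#             print(line)
```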

My kernel is 2.6.17.14. I tried several older ones, with the same result. These things happen once every several days to a week.

I thought it could be a hardware problem, so I reinstalled wINdoes (out-of-the-box state) and left the power on. I think it froze, but I have NULL experience in that weird world, so I sent the machine back to the manufacturer. It came back with "NO problem after extensive test run."

Does anybody know how to troubleshoot?

Happy Penguins!!

stress_junkie 01-20-2007 07:08 PM

That sounds like a hardware problem. It's hard to say whether it is the disk or not. You would have to put the disk in another computer and see whether the problems move with it. I had a situation recently where I started to have a lot of disk problems. I moved the disk to another computer and it worked fine there. I put a new disk in the original computer and that disk appeared to have a lot of problems too. Since I had replaced both the disk and the disk cable and still had problems with the computer, while the disks had no problems in a different computer, I had to conclude that the disk was not at fault in my case. It must have been the motherboard.

So, as far as you are concerned, I believe that you have a hardware problem but I cannot determine which component is faulty.
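
The swap test described above boils down to two observations. A minimal sketch of that reasoning follows; `likely_suspect` and its return strings are made up for illustration, and a real diagnosis has more variables (cable, power supply, memory) than two booleans can capture.

```python
def likely_suspect(disk_fails_in_other_pc, new_disk_fails_in_original_pc):
    """Hypothetical helper encoding the disk-swap reasoning.

    disk_fails_in_other_pc: the original disk still shows errors after
        being moved to a different computer.
    new_disk_fails_in_original_pc: a different disk shows errors when
        installed in the original computer.
    """
    if disk_fails_in_other_pc and not new_disk_fails_in_original_pc:
        # The problem followed the disk.
        return "disk"
    if new_disk_fails_in_original_pc and not disk_fails_in_other_pc:
        # The problem stayed with the machine.
        return "original computer (motherboard or something attached to it)"
    if disk_fails_in_other_pc and new_disk_fails_in_original_pc:
        return "possibly both, or something the two tests have in common"
    return "inconclusive; the fault may be intermittent, so test for longer"


# The case described in the post above: the old disk works elsewhere,
# and a new disk fails in the original computer.
print(likely_suspect(False, True))
```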

kaz2100 01-20-2007 09:03 PM

Hi,

Thanks for your comment.

Yes, your comment makes sense.

I forgot to mention in the first post that the disk drive was replaced in the past by the manufacturer (it was sent back twice...). I think this rules out a faulty disk drive, although not 100%.

What puzzles me is that the manufacturer cannot detect any problem with an extensive test. Are they as lousy as wIdOEws??

If it is the logic board, what is the next step to pinpoint the fault?

Any suggestion will be appreciated.

By the way, my penguin is a Toshiba Satellite A100-S2322TD (I submitted it to the HCL).

Happy Penguins!

tur third 01-22-2007 10:19 AM

When I had a suspected hardware problem, I used memtest86 from a bootable CD. It correctly identified that something was wrong. Just leave it running for an hour or so. You should not get any errors.

If there are problems, it will not tell you exactly what is wrong. Also, not all hardware problems will appear as memory errors.

However, it might give you some more information, and rule out memory or CPU problems.
http://www.memtest86.com/
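
To illustrate the basic idea behind a memory test (write a known pattern, read it back, compare), here is a minimal sketch. It is not memtest86's algorithm and not a substitute for it: it runs under the operating system, so it only touches whatever memory the OS happens to hand out, and a clean run proves very little. `pattern_test`, its default size, and its patterns are assumptions chosen for illustration.

```python
def pattern_test(size_bytes=1 << 20, patterns=(0x00, 0xFF, 0x55, 0xAA)):
    """Fill a buffer with each pattern and verify it reads back unchanged.

    Returns a list of (offset, expected, actual) tuples, empty if every
    byte read back as written.
    """
    buf = bytearray(size_bytes)
    errors = []
    for pattern in patterns:
        # Write phase: fill the whole buffer with the pattern byte.
        buf[:] = bytes([pattern]) * size_bytes
        # Quick check: every byte should still equal the pattern.
        if buf.count(pattern) == size_bytes:
            continue
        # Slow path: locate each byte that does not match.
        for offset, value in enumerate(buf):
            if value != pattern:
                errors.append((offset, pattern, value))
    return errors


if __name__ == "__main__":
    problems = pattern_test()
    if problems:
        print("mismatches found:", len(problems))
    else:
        print("no mismatches in this buffer")
```

A bootable tester such as memtest86 is still the right tool here, because it runs without an operating system in the way.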

Matir 01-22-2007 06:11 PM

I agree with using memtest86. If your drive is fine, then bad or improperly clocked memory, or bad motherboard settings, seem like likely culprits.


All times are GMT -5.