LinuxQuestions.org
Linux - Kernel This forum is for all discussion relating to the Linux kernel.

Old 11-07-2010, 03:36 PM   #1
ian.macky
LQ Newbie
 
Registered: Nov 2010
Posts: 15

Rep: Reputation: 0
Mystery system resets


This new (six-month-old used) 2.6.32-25-generic (Ubuntu 10.04.1) system (Asus AB-P2800) is crashing every single day, so far always while in X. It freezes up solid, then some watchdog notices after about 10 seconds and resets it, as if I had pushed the reset button. I can't find any trace of kernel problems in /var/log so I am thinking it's a hardware problem-- but my question is: how can I be sure??
 
Old 11-07-2010, 04:29 PM   #2
paulsm4
LQ Guru
 
Registered: Mar 2004
Distribution: SusE 8.2
Posts: 5,863
Blog Entries: 1

Rep: Reputation: Disabled
Yes, I agree that it's probably hardware.

"Power supply" is my #1 guess, followed by "RAM".

This link is a bit dated (2006), but the advice is sound:

Quote:
http://www.pcreview.co.uk/forums/thread-2650408.php

Get a copy of memtest86+ from www.memtest.org .

Boot your system with the memtest86+ floppy or CD image, and
allow two complete passes. Is your memory error free?
If a lot of errors are reported, test the memories one at
a time, and eliminate the bad one.

If Memtest86+ is clean, next step is to get a
copy of Prime95 from mersenne.org . This test runs
in Windows (or in Linux) and runs the CPU at 100% load.
If the computer crashes instantly when the Prime95
torture test runs, then it could be power. If the
program stops and reports an error, but the OS stays running,
then it could be either the processor or the memory.
(Prime95 is a better test for flaky memory, than
memtest86+ is. But memtest86+ has the advantage of being
able to test all bytes in the memory, and memtest86+ is
most valuable when there is a permanent stuck bit in the
memory. So both types of tests have value.)

Report back how your testing goes.

If you suspect a bad disk, there are other tests, like
the disk manufacturer's test programs, that can tell you
of problems there.

Paul
PS:
Make sure your fans are all running, and your system is free of dust.

Last edited by paulsm4; 11-07-2010 at 04:31 PM.
 
Old 11-07-2010, 10:34 PM   #3
ian.macky
LQ Newbie
 
Registered: Nov 2010
Posts: 15

Original Poster
Rep: Reputation: 0
Quote:
Originally Posted by paulsm4 View Post
Yes, I agree that it's probably hardware.
"Power supply" is my #1 guess, followed by "RAM".

memtest86+ didn't turn up any errors (just ran 1 pass)-- and the CPU temperature doesn't seem to be a problem. It's not crashing while under heavy load, just normal load, mousing about in X etc.

Quote:
Originally Posted by paulsm4 View Post
PS: Make sure your fans are all running, and your system is free of dust.
Yes, first thing I did was open the case and check it out-- heat sink was a bit furry so I removed it and blew it out. Didn't add new thermal grease between it and the CPU but temperature has been fine-- BIOS reports CPU about 125F, and the fan modulates properly to keep it there.

Is there any trace left by the watchdog? There seem to be several different watchdogs to choose from (at least 3)-- but one appears to be built into the kernel-- at least I don't see any way of turning it off-- there's always a watchdog/0 process, etc, and I don't see what starts it. ???

But the behavior is it freezes, then about 10 seconds pass, then it resets. That sounds like the software watchdog, not just a hardware glitch or voltage drooping or whatever. If it just hung solid, there would be no reset, right? Or is there a hardware reset too?

Wish I could find trace as to what's ailing it. Some of the other software watchdogs leave logs, but I'm leery of using them since the original always seems to be there. Two different watchdogs at the same time doesn't sound very good.

???
 
Old 11-07-2010, 11:06 PM   #4
paulsm4
LQ Guru
 
Registered: Mar 2004
Distribution: SusE 8.2
Posts: 5,863
Blog Entries: 1

Rep: Reputation: Disabled
OK. Two more things you can try (if you haven't already):

1. Check the logs for clues
/var/log/*
<= Especially /var/log/messages, and /var/log/kern.log

2. Modify your Watchdog configuration:
http://manpages.ubuntu.com/manpages/...atchdog.8.html
/etc/watchdog.conf
 
Old 11-08-2010, 12:07 PM   #5
ian.macky
LQ Newbie
 
Registered: Nov 2010
Posts: 15

Original Poster
Rep: Reputation: 0
Quote:
Originally Posted by paulsm4 View Post
OK. Two more things you can try (if you haven't already):

1. Check the logs for clues
/var/log/*
<= Especially /var/log/messages, and /var/log/kern.log
Yes, I grepped around in /var/log and did not find any mention of the watchdog (aside from it starting). Both messages and kern.log just cut off and show the restart at 9:22:52, no information as to what happened:

kern.log:
Nov 8 07:32:16 spunky kernel: [ 205.900018] PPP: VJ decompression error
Nov 8 09:22:52 spunky kernel: imklog 4.2.0, log source = /proc/kmsg started.

messages:
Nov 8 08:25:23 spunky pppd[2121]: secondary DNS address 67.211.172.30
Nov 8 09:22:52 spunky kernel: imklog 4.2.0, log source = /proc/kmsg started.
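
To put a number on how long the logs were silent before each restart, something like the following could be run over kern.log or messages. It is only a sketch: it assumes the "imklog ... started" line marks a fresh boot, and it assumes the year is 2010, since syslog timestamps do not carry one. The function names are made up for illustration.

```python
from datetime import datetime

# Assumption: the line imklog writes at startup marks a fresh boot.
RESTART_MARKER = "imklog"
ASSUMED_YEAR = 2010  # assumption: syslog timestamps carry no year


def parse_stamp(line):
    """Parse the leading 'Mon D HH:MM:SS' syslog timestamp of a line."""
    month, day, clock = line.split()[:3]
    return datetime.strptime(
        f"{ASSUMED_YEAR} {month} {day} {clock}", "%Y %b %d %H:%M:%S"
    )


def gaps_before_restart(lines):
    """Return (last_line_before_restart, silent_seconds) for each restart marker."""
    gaps = []
    previous = None
    for line in lines:
        line = line.rstrip("\n")
        if not line.strip():
            continue  # skip blank lines
        if RESTART_MARKER in line and "started" in line and previous is not None:
            silent = (parse_stamp(line) - parse_stamp(previous)).total_seconds()
            gaps.append((previous, silent))
        previous = line
    return gaps


# The two kern.log lines quoted above.
kern_log = [
    "Nov 8 07:32:16 spunky kernel: [ 205.900018] PPP: VJ decompression error",
    "Nov 8 09:22:52 spunky kernel: imklog 4.2.0, log source = /proc/kmsg started.",
]
for last_line, silent in gaps_before_restart(kern_log):
    print(f"{silent:.0f}s of silence after: {last_line}")
```

A long silent stretch like this one only says that nothing was logged before the reset; it does not say when inside that stretch the freeze happened.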

Quote:
Originally Posted by paulsm4 View Post
2. Modify your Watchdog configuration:
http://manpages.ubuntu.com/manpages/...atchdog.8.html
/etc/watchdog.conf
I de-installed every watchdog package but there are still watchdog processes starting up (with very low PIDs 5 and 8) every time; ps shows them as watchdog/0 and watchdog/1. That's the [logical] CPU# after the slash I assume. I couldn't find anything about these processes in /proc. There is no /etc/watchdog.conf.
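
One way to test the guess that watchdog/0 and watchdog/1 belong to the kernel itself, and not to any package, is to look at their parent PID. The sketch below assumes kernel threads are children of kthreadd and that kthreadd is PID 2, which holds on most 2.6-and-later kernels but is still an assumption here. The helper names are hypothetical.

```python
import os

KTHREADD_PID = 2  # assumption: kthreadd is PID 2 on this kernel


def parse_status(text):
    """Turn the 'Key:<tab>value' lines of /proc/<pid>/status into a dict."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def is_kernel_thread(status_text):
    """Heuristic: a task is a kernel thread if it is kthreadd or a child of it."""
    fields = parse_status(status_text)
    pid = int(fields.get("Pid", -1))
    ppid = int(fields.get("PPid", -1))
    return pid == KTHREADD_PID or ppid == KTHREADD_PID


def list_kernel_threads(proc_root="/proc"):
    """Yield (pid, name) for every task under proc_root that looks like a kernel thread."""
    for entry in sorted(os.listdir(proc_root), key=lambda e: (len(e), e)):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc_root, entry, "status")) as handle:
                text = handle.read()
        except OSError:
            continue  # the task exited while we were looking
        if is_kernel_thread(text):
            yield int(entry), parse_status(text).get("Name", "?")


if __name__ == "__main__" and os.path.isdir("/proc"):
    for pid, name in list_kernel_threads():
        if name.startswith("watchdog"):
            print(pid, name)
```

If the watchdog entries show up in that list, they are started by the kernel at boot, which would explain why de-installing the watchdog packages does not get rid of them.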

Whatever this watchdog is, does it leave any record at all of its activities? The system is only running a few hours now between resets. I disabled some other hardware I'm not using (audio, etc) but it made no difference. I don't see anything in /var/log about the watchdog starting anymore, now that I deinstalled everything (used to see "rtkit-daemon[1921]: Watchdog thread running")-- yet there are the watchdog processes.

Stumped as to what to try next. Can't find any indication of a problem so far-- it just freezes and resets.
 
Old 11-08-2010, 02:08 PM   #6
H_TeXMeX_H
LQ Guru
 
Registered: Oct 2005
Location: $RANDOM
Distribution: slackware64
Posts: 12,928
Blog Entries: 2

Rep: Reputation: 1301
Does it have an nvidia card and the nvidia drivers ? That's what I'd suspect if there are no clues anywhere. That or some hardware issue.
 
Old 11-09-2010, 09:46 AM   #7
ian.macky
LQ Newbie
 
Registered: Nov 2010
Posts: 15

Original Poster
Rep: Reputation: 0
Quote:
Originally Posted by H_TeXMeX_H View Post
Does it have an nvidia card and the nvidia drivers ? That's what I'd suspect if there are no clues anywhere. That or some hardware issue.
Nope, ATI Radeon 9100 IGP...
 
Old 11-09-2010, 09:57 AM   #8
Willian
Member
 
Registered: Oct 2010
Location: Earth
Distribution: Slackware64
Posts: 38

Rep: Reputation: 2
Hey, you mentioned the temperature reported by the BIOS, but the temperature changes under load, and the BIOS does not put a significant load on your system. 55°C is too hot for a processor without load. Put some thermal grease on the heatsink.
Have you updated your OS? Updates on Ubuntu are a bit risky; sometimes they make the system unstable. Check your VGA driver, and run a memory test. I don't think the power supply is your problem, because the system freezes rather than shutting down.

Thanks
 
Old 11-10-2010, 09:23 AM   #9
ian.macky
LQ Newbie
 
Registered: Nov 2010
Posts: 15

Original Poster
Rep: Reputation: 0
Quote:
Originally Posted by Willian View Post
Hey, you mentioned the temperature reported by the BIOS, but the temperature changes under load, and the BIOS does not put a significant load on your system. 55°C is too hot for a processor without load. Put some thermal grease on the heatsink.
Have you updated your OS? Updates on Ubuntu are a bit risky; sometimes they make the system unstable. Check your VGA driver, and run a memory test. I don't think the power supply is your problem, because the system freezes rather than shutting down. Thanks
The Asus booklet that came with the computer said 120F was the normal CPU operating temp. The fan picks up once it hits 125 and drives it back down. The system also does not crash under heavy load-- so again I don't think it's CPU temperature.

I disabled all the other watchdogs (so there's just the built-in watchdog/0 and watchdog/1 now), and yesterday there were no crashes, the first day with none. But I also was not using my original computer much-- it had been plugged into the same outlet and was connected with Ethernet (while I moved stuff to the new box). Might be factors.

Will wait and see if the crashes have gone away on their own (unlikely), or if I can see a pattern. So far, it's always in X, usually when doing something benign like moving the mouse around.
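
Since the temperatures in this thread are quoted in both Fahrenheit and Celsius, a quick conversion makes the figures comparable. This is just the standard formula applied to the two values from the Asus booklet and the fan behavior described above; the function name is made up.

```python
def fahrenheit_to_celsius(temp_f):
    """Convert a temperature from degrees Fahrenheit to degrees Celsius."""
    return (temp_f - 32) * 5 / 9


for label, temp_f in [("normal per Asus booklet", 120), ("fan picks up", 125)]:
    print(f"{label}: {temp_f}F = {fahrenheit_to_celsius(temp_f):.1f}C")
```

That puts the booklet's normal temperature at about 48.9C and the fan threshold at about 51.7C.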
 
Old 11-10-2010, 11:59 AM   #10
H_TeXMeX_H
LQ Guru
 
Registered: Oct 2005
Location: $RANDOM
Distribution: slackware64
Posts: 12,928
Blog Entries: 2

Rep: Reputation: 1301
If nothing works, the only other thing I would recommend is to try a newer kernel (if they have one, or if you can compile one). I've had stability problems with some kernels recently, and upgrading has helped.
 
Old 11-10-2010, 06:33 PM   #11
Willian
Member
 
Registered: Oct 2010
Location: Earth
Distribution: Slackware64
Posts: 38

Rep: Reputation: 2
Mr. ian.macky, have you checked the VGA driver and the RAM? If they are not the problem, then you really do not have a hardware problem; I suppose it may be a kernel problem.

PS: The grease may not be the problem, but I highly recommend you put a little thermal grease on the heatsink.

Thanks

Last edited by Willian; 11-10-2010 at 06:36 PM.
 
Old 11-12-2010, 07:23 PM   #12
ian.macky
LQ Newbie
 
Registered: Nov 2010
Posts: 15

Original Poster
Rep: Reputation: 0
Quote:
Originally Posted by H_TeXMeX_H View Post
If nothing works, the only other thing I would recommend is to try a newer kernel (if they have one, or if you can compile one). I've had stability problems with some kernels recently, and upgrading has helped.
I'm on 2.6.32-25-generic which is the latest, I think.
 
Old 11-12-2010, 07:27 PM   #13
ian.macky
LQ Newbie
 
Registered: Nov 2010
Posts: 15

Original Poster
Rep: Reputation: 0
Quote:
Originally Posted by Willian View Post
Mr. ian.macky, have you checked the VGA driver and the RAM? If they are not the problem, then you really do not have a hardware problem; I suppose it may be a kernel problem.
I've run memtest with no problems. It pegs the CPU which runs the fan up pretty good, and no problems. I'm sure that fan RPM is a good indicator of CPU temp, and overheating doesn't seem to be the problem.

Quote:
Originally Posted by Willian View Post
PS: The grease may not be the problem, but I highly recommend you put a little thermal grease on the heatsink. Thanks
The original grease was on there when I pulled off the heat sink-- looked like enough on both sides to bridge any space between them (there shouldn't be any gap, right?)-- so I didn't add more. Don't think that's the problem, alas.
 
Old 11-12-2010, 07:34 PM   #14
ian.macky
LQ Newbie
 
Registered: Nov 2010
Posts: 15

Original Poster
Rep: Reputation: 0
What's this about checking the VGA driver? What's to check?

Today was a bad day-- it's crashed 6 or 7 times-- 3 times within fifteen minutes. I was in X just dragging a window around all three of those close times.

What I *have* noticed is that when the machine freezes, the fan ramps up a bit, typical for one core running. Not at all like the higher fan speed when running memtest (which is using both cores?)-- or the burst of warp speed you get when powering on the box.

If the CPU's running, then it's not frozen-- maybe the kernel's in a tight loop somewhere??

Not that that helps much. I'm running the latest kernel and the hardware was supposedly fine before it was shipped to me. Memory seems fine. Doesn't seem to be overheating. So, still no idea what's wrong. Ran memtest some more, no problems. Got rid of any extra SW I could, turned off various services and daemons-- made no difference.

Crashes frequently! Machine is nearly useless. Well, no-- you just have to SAVE OFTEN. ...but it's bad, yes it's bad.
 
Old 11-13-2010, 10:11 AM   #15
Willian
Member
 
Registered: Oct 2010
Location: Earth
Distribution: Slackware64
Posts: 38

Rep: Reputation: 2
Do you have compositing effects active? Are you using any driver for your VGA (a separately installed one)?
Does it freeze when you run the system without the graphical interface?

And forget power supply and RAM, they are not the problem.

Last edited by Willian; 11-13-2010 at 10:21 AM.
 
  


All times are GMT -5.
