LinuxQuestions.org

JeffSketch 05-11-2005 05:16 PM

Mandrake 10.1 randomly shuts down
 
I've been using Mandrake 10.1 now without any problems for almost 6 months. Now, it has started randomly shutting itself down. I get a message on the screen saying 'Power saving enabled' and then the system shuts down. I've tried disabling acpid, and turning off acpi in the kernel (booting with acpi=off in lilo.conf), with no luck.

I googled for the problem and the closest I found was a bad video card, but the video card is on the motherboard, so I got a new video card, put it in, and disabled the onboard video card, but the problem remains. Sometimes, the system will only stay up for a few minutes before shutting down, barely long enough to do anything.

I'm a very seasoned linux user with about 7 years of linux experience, but this problem is baffling me. If anyone has any ideas, I'd appreciate it.

I'm not sure if this helps, but if I leave the computer off for a while (like overnight), it will stay up a little longer (10-20 minutes) before shutting down. Once the problem starts, though, it will only stay up for a minute or two. Could it be an overheating problem? If so, what could I do to fix it? I've visually checked the fans and they are spinning, but I can't tell if they are spinning fast enough.

System specs: AMD Athlon XP, Asus A7S266-VM/U2 motherboard, 512MB.

I would appreciate any ideas, but I want to be sure it's not a software problem before I try and get new hardware.

AltF4 05-11-2005 05:22 PM

- check /var/log/messages
- try to install the sensors package (lm_sensors) and "gkrellm" and watch the temperatures
- keep a root window open and try "dmesg | tail -n 50 | tee /tmp/messages.out" when system shuts down
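
Because the machine powers off without warning, anything that is only on screen is lost. As a rough sketch of the same idea, the script below appends timestamped `sensors` output to a file and forces each write to disk. It assumes the lm_sensors `sensors` command is installed and configured; the function names, the log path, and the interval are made up for illustration.

```python
import os
import subprocess
import time
from datetime import datetime


def read_sensors(command=("sensors",)):
    """Run the sensors command and return its text output ("" if it cannot be run)."""
    try:
        result = subprocess.run(list(command), capture_output=True, text=True, check=False)
    except OSError:
        # Command missing or not executable: log an empty snapshot instead of crashing.
        return ""
    return result.stdout


def log_snapshot(path, text):
    """Append one timestamped snapshot and fsync it so it survives a sudden power-off."""
    stamp = datetime.now().isoformat(timespec="seconds")
    with open(path, "a") as log:
        log.write(f"--- {stamp} ---\n{text.rstrip()}\n")
        log.flush()
        os.fsync(log.fileno())


def monitor(path, interval_s=5.0, iterations=None, command=("sensors",)):
    """Log a snapshot every interval_s seconds; iterations=None means run until killed."""
    done = 0
    while iterations is None or done < iterations:
        log_snapshot(path, read_sensors(command))
        done += 1
        time.sleep(interval_s)
```

Calling `monitor("/var/tmp/sensors.log")` from a root window (the path is hypothetical) leaves the last readings before the shutdown at the end of the file.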

godzero 05-11-2005 08:30 PM

Ya, heat sensors will preemptively kill a machine. Check your heatsinks and fans for dust/wear. Bad connections to fans will freak it out too, so check the connections for oxidation/etc.

A quick check for this function is to boot your machine with the case open and pull the cpu fan cord. You should see a warning, and the machine will power down after 2-3 secs. (Sounds scary, I know, but no harm done if it's unplugged for only a few secs.)

JeffSketch 05-11-2005 08:32 PM

Thanks for the info. So I installed the sensors package and it has four temperatures:
Temp: 88C (Max 90C) This fluctuates, and sometimes hits 90C
MBTemp: 64C (Max 40C)
CPUTemp: 36C (Max 45C)
Temp3: 21C (Max 45C)

I'm not sure if the labels are correct or not. So if the processor (or MB) is running too hot, I should just have to add some more fans or will it mean that the processor or motherboard is faulty? I'm also getting the following alarms in the /var/log/messages from sensord:
Code:

May 11 21:30:02 pcp09740815pcs sensord: Sensor alarm: Chip it87-isa-0290: VCore 1: +1.65 V (min = +1.42 V, max = +1.57 V) [ALARM]
May 11 21:30:02 pcp09740815pcs sensord: Sensor alarm: Chip it87-isa-0290: VCore 2: +1.60 V (min = +2.40 V, max = +2.61 V) [ALARM]
May 11 21:30:02 pcp09740815pcs sensord: Sensor alarm: Chip it87-isa-0290: +3.3V: +5.60 V (min = +3.14 V, max = +3.46 V) [ALARM]
May 11 21:30:02 pcp09740815pcs sensord: Sensor alarm: Chip it87-isa-0290: +12V: +7.17 V (min = +11.39 V, max = +12.61 V) [ALARM]
May 11 21:30:02 pcp09740815pcs sensord: Sensor alarm: Chip it87-isa-0290: -12V: -16.32 V (min = -12.63 V, max = -11.41 V) [ALARM]
May 11 21:30:02 pcp09740815pcs sensord: Sensor alarm: Chip it87-isa-0290: -5V: -10.25 V (min = -5.26 V, max = -4.77 V) [ALARM]
May 11 21:30:02 pcp09740815pcs sensord: Sensor alarm: Chip it87-isa-0290: Stdby: +0.00 V (min = +4.76 V, max = +5.24 V) [ALARM]
May 11 21:30:02 pcp09740815pcs sensord: Sensor alarm: Chip it87-isa-0290: fan2: 0 RPM (min = 2657 RPM, div = 2) [ALARM]
May 11 21:30:02 pcp09740815pcs sensord: Sensor alarm: Chip it87-isa-0290: fan3: 0 RPM (min = 2657 RPM, div = 2) [ALARM]


godzero 05-11-2005 09:27 PM

Your MB sensor is probably under your cpu.
(sounds dirty, or your cpu is aging and running hot)

I would try in this order:
clean all dust, etc.
Remove hardware you don't use (dialup modem, etc.)
new (good+ quality) heatsink + fan for cpu
additional fans
New cpu

edit:

The logs look like the program isn't working well with your mobo. Some of the obviously wrong data points can be ignored, but see if there's anything you can do to drop those voltages (did someone try to overclock your machine?)
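
One rough way to separate "obviously wrong" readings from ones worth worrying about is to compare each voltage with the midpoint of its own min/max limits. The sketch below parses sensord voltage alarm lines in the format posted in this thread. The 20% cutoff and the names `ALARM_RE`, `IMPLAUSIBLE_FRACTION`, and `classify` are assumptions for illustration, not anything lm_sensors defines.

```python
import re

# Matches voltage alarm lines in the sensord format quoted in this thread.
ALARM_RE = re.compile(
    r"Chip (?P<chip>\S+): (?P<label>[^:]+): "
    r"(?P<value>[+-]?\d+(?:\.\d+)?) V "
    r"\(min = (?P<low>[+-]?\d+(?:\.\d+)?) V, max = (?P<high>[+-]?\d+(?:\.\d+)?) V\)"
)

# Assumed cutoff: a reading more than 20% away from the midpoint of its limits
# is treated as a probable sensor-configuration problem rather than a real voltage.
IMPLAUSIBLE_FRACTION = 0.20


def classify(line, cutoff=IMPLAUSIBLE_FRACTION):
    """Return (label, verdict) for a voltage alarm line, or None if the line does not match."""
    match = ALARM_RE.search(line)
    if match is None:
        return None  # e.g. fan lines, which report RPM instead of volts
    label = match["label"]
    value = float(match["value"])
    low = float(match["low"])
    high = float(match["high"])
    if low <= value <= high:
        return label, "ok"
    nominal = (low + high) / 2
    if nominal == 0:
        # No meaningful relative deviation; leave the alarm as reported.
        return label, "plausible alarm"
    deviation = abs(value - nominal) / abs(nominal)
    verdict = "suspect sensor config" if deviation > cutoff else "plausible alarm"
    return label, verdict
```

With that cutoff, the VCore 1 line from the log (+1.65 V against a 1.42-1.57 V window) comes out as a plausible alarm, while the +3.3V, +12V, -12V, -5V, Stdby, and VCore 2 lines are far enough off to look like a labelling or scaling mismatch.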

JeffSketch 05-12-2005 05:43 PM

New CPU/Heatsink did the job! All the temps are a lot cooler now!

:)

Thanks godzero and AltF4!


All times are GMT -5.