LinuxQuestions.org

Linux - Software: Random Restarts With No Error Messages (https://www.linuxquestions.org/questions/linux-software-2/random-restarts-with-no-error-messages-4175586920/)

computersavvy 08-15-2016 06:44 PM

If the temperature is climbing like that while idling in the BIOS, with water cooling on an i3, I would be very sure it is a cooling problem. If you have another cooler of any kind that you can install, you might try that and see if there is a difference.

My laptop with an i5 dual core (4 logical processors) running at about 90% load constantly only reaches 67C, and my main PC with an FX 8350 (8 procs) at 50% load runs at about 60C. With the water cooler that one only ran at 68C with 100% load.

zombieno7 08-15-2016 06:50 PM

What I don't understand is how I could run MPrime to stress test the CPU and not have the temps break 50C. It just doesn't make sense.

computersavvy 08-15-2016 07:47 PM

Quote:

Originally Posted by zombieno7 (Post 5591357)
What I don't understand is how I could run MPrime to stress test the CPU and not have the temps break 50C. It just doesn't make sense.

That makes perfect sense if either the fan or the pump is starting to fail. Mechanical failures are seldom sudden.
Intermittent stalling of the pump would give that symptom, as would intermittent stalling or slowing of the fan on the radiator. There would be no symptoms other than intermittent overheating of the proc causing a sudden power off, although there might be some occasional noises coming from either the pump or the fan bearings.
If the intermittent failure did not occur during the test, all would appear normal. You have said it sometimes runs for days and other times for very short times.

zombieno7 08-15-2016 08:10 PM

You might be right. I just removed the storage drive, since it was the last part to be added. Other than that, I'm just going to keep watching it. I might write a quick logging script to keep track of the temps from lm_sensors to see if they go up when I'm not watching.
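
A logging script along those lines might look like the sketch below. It is only an illustration under a few assumptions, not the script that was actually used: it reads the kernel's hwmon files under /sys/class/hwmon (the same source lm_sensors draws on) rather than parsing `sensors` output, and the function names `read_temps` and `log_temps` are made up for this example. Each line is flushed and synced to disk right away, so the last reading before a sudden restart should still be in the file.

```python
import glob
import os
import time


def read_temps(hwmon_root="/sys/class/hwmon"):
    """Return {label: degrees C} for every temp*_input file under hwmon_root."""
    temps = {}
    pattern = os.path.join(hwmon_root, "hwmon*", "temp*_input")
    for path in sorted(glob.glob(pattern)):
        chip_dir = os.path.dirname(path)
        try:
            with open(os.path.join(chip_dir, "name")) as f:
                chip = f.read().strip()
        except OSError:
            chip = "unknown"  # some drivers do not expose a name file
        sensor = os.path.basename(path)[: -len("_input")]
        label = f"{os.path.basename(chip_dir)}:{chip}/{sensor}"
        try:
            with open(path) as f:
                # The kernel reports temperatures in millidegrees Celsius.
                temps[label] = int(f.read().strip()) / 1000.0
        except (OSError, ValueError):
            continue  # sensor unreadable right now; skip it
    return temps


def log_temps(log_path, interval=5.0, samples=None,
              hwmon_root="/sys/class/hwmon"):
    """Append one timestamped line per sample; samples=None runs until killed."""
    count = 0
    with open(log_path, "a") as log:
        while samples is None or count < samples:
            stamp = time.strftime("%Y-%m-%d %H:%M:%S")
            readings = " ".join(
                f"{label}={value:.1f}"
                for label, value in read_temps(hwmon_root).items()
            )
            log.write(f"{stamp} {readings}\n")
            # Flush and fsync so the latest line survives a sudden power-off.
            log.flush()
            os.fsync(log.fileno())
            count += 1
            if samples is None or count < samples:
                time.sleep(interval)
```

Calling `log_temps("cpu_temps.log")` (a hypothetical file name) would then append a reading every five seconds until the process is stopped. After a restart, the tail of the file shows whether the temperature was climbing just before the machine went down.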

zombieno7 08-16-2016 01:00 PM

....well it happened again. Oddly, this time was also around 5AM. I'm not sure if that means anything. Now I know for sure that it has nothing to do with the storage HDD.

273 08-16-2016 01:05 PM

Did you rule out power cuts or brownouts, by the way?

zombieno7 08-16-2016 01:36 PM

From the PSU? I haven't been able to get a voltmeter yet. I did run the self-test function on the PSU, and it ran fine. Is it possible that the old PSU damaged the motherboard when it went too? It seems like literally anything could be causing this.

EDIT: I forgot to mention. I rebuilt the kernel using Gentoo's genkernel script, in case I missed something the first time. Is there any way that this could be a software problem?

273 08-16-2016 01:39 PM

Quote:

Originally Posted by zombieno7 (Post 5591745)
From the PSU? I haven't been able to get a voltmeter yet. I did run the self-test function on the PSU, and it ran fine. Is it possible that the old PSU damaged the motherboard when it went too? It seems like literally anything could be causing this.

No, from the mains or building power supply. Does something kick in at 5AM that causes a temporary brown-out in the building or (as I've experienced around here in the past) is your local electricity supply simply shutting off for a second or so now and again?

zombieno7 08-16-2016 01:52 PM

Well, it's a small house, so there isn't really anything that kicks on at any specific time. It has been very hot for a while, so there is a lot of electricity being used for keeping cool. None of the other appliances are showing any signs of having been off, though, and I have these really annoying cable boxes that go through a long start-up process and turn on to a different channel when they come back. Could it be something so short that only the computer is experiencing the effects?

EDIT: It has happened at other times too. The other day, I got up from my desk to get coffee around 10AM, and when I got back, the computer had restarted.

zombieno7 08-16-2016 04:13 PM

Another update... I was running memtest again and decided to stop the test and move into the BIOS to test the nearby electrical switches to see if there was an electrical issue, since it seems to only happen when there isn't heavy activity. What I noticed immediately was that the BIOS read 60+C. Memtest reported the temperature at 45C. I booted into Gentoo and only saw a reading in the 30s C from lm_sensors. I tried to run MPrime to see what would happen, and it almost immediately reported a hardware failure. I turned the computer off for a while to cool. When it had cooled for a while, I booted up and ran MPrime again. It passed three tests, and lm_sensors was reporting 45C. I shut the machine down again and booted into the BIOS. The temperature in the BIOS was over 60C again. When I booted back into Gentoo and started MPrime again, it immediately failed. Now, none of this has produced symptoms similar to the reboots. Could this be a cooling problem, or do I have two problems?

computersavvy 08-17-2016 12:10 AM

The problem could be either.
If electrical, it may be anywhere. Usually the power company uses one transformer for a block or more of homes. With all those neighbors using a lot of electricity to stay cool, like you are, it may be overloading the transformer and putting the area into a brown-out condition that allows the slightest blip to trigger failure in sensitive equipment. An AC starting could easily trigger it in a near brown-out condition.
The only sure way to solve this kind of problem might be to put your pc on a suitably sized UPS which will condition the voltages and handle those brief blips without shutting down the PC.

If cooling, starting and stopping the PC can make the situation worse. Lm_sensors is only accurate if you have calibrated the sensors to known readings. I would trust the BIOS readings more for accuracy, but sensors definitely does show trends. If you have gkrellm running on the desktop it gives you a near realtime reading on fan speeds, temperatures, and lots more. I have mine set to update once per second.
Cooling problems can be handled by troubleshooting: looking at readings, possibly substituting a different cooler to find out if the symptoms change, etc. Outside of a total failure of a water cooler pump or a fan, this is a trial-and-error bit of troubleshooting. A quick and simple way to narrow this down might be to replace the thermal grease on the CPU. Old dried-out grease or uneven coatings can cause problems, but I would expect that problem to be consistent and the symptoms easily reproduced. A loose or unevenly tightened cooler could give intermittent problems, however.

273 08-17-2016 12:53 AM

It does look like this is pointing to a possible cooling system fault.

As to the electric supply -- it's difficult to tell, but look out for things like house alarms going off at about the same time or a light flicker you may not have noticed had you not been looking out for it. This does sound too frequent for power issues, but if there's always a home alarm going off when it happens, and you didn't notice in the past, then that points to the mains, for example.

Emerson 08-17-2016 07:34 AM

Nearby power loads turning on/off can affect sensitive equipment, correct. But a PC power supply is not supposed to be that sensitive. Nowadays all power supplies are switching power supplies, and this type of PS is very tolerant, unless it is out of spec, as may be the case with a refurbished unit.

zombieno7 08-17-2016 10:09 AM

Alright guys, it looks like two problems were at fault. I've been asking around about the hardware, and apparently, the PSU needed a CMOS reset to sync properly with the motherboard. That's probably the reason for the resets. I haven't had long enough to test, but I haven't had any since the last one yesterday morning. I also believe that there is a cooling problem that I have to look into. If I had to guess, the pump is failing slowly. I have a Phanteks PH-TC14PE that I can use, but I just had a problem with my RX 480 introducing too much heat into the system and causing the CPU temps to rise anyway. I have to figure that one out.

273 08-17-2016 02:29 PM

The PSU needs firmware to deal with the motherboard? That sounds odd to me.


All times are GMT -5.