Slackware: This forum is for the discussion of Slackware Linux.
I have this PC I bought about a year ago (I normally build my own, but whatever, this is what I get), and it had Windows 8 installed by default. At various times (usually, but not always, during intense gaming) I would get a "RED" Screen of Death and it would auto-reboot. Sure, I would get various errors and I would research them, but to no avail.
Now, I did swap out the video card, swapped memory sticks/slots and disabled the onboard LAN/wireless. I am fairly confident it is either the MB or something related to it.
I even reinstalled a fresh Win 8 that I had purchased and it still does the same thing.
My original intent was for Linux anyway, so I put my Slack on it. It runs fine, but at random (sometimes after 3 days, sometimes 5 minutes after a reboot) the screen and everything else freezes and I have to do a hard reboot. It mostly happens in X, but it has also happened in the shell itself without X running at all.
Long story short; does Linux have far better resources/tools/scans/logs to tell me what is causing my system (hardware) to fail?
I downloaded Memtest86 v4.3.7, booted it off USB, and it ran completely through the 12 GB. It came up fine. I assume it is a 99% accurate type of thing, but I would still have to agree that the memory is fine.
But, that was 1 pass. It is still running; would you let it continue or should I spend time looking at other hardware issues?
B.T.W., as I said, this is a new video card. The prior one was an ATI and this one is nVidia, so I think it is safe to assume it is not the GPU, BUT could it be the PCIe slot?
Something just tells me it is the MB... Also, I swapped out the HD as well.
Well, since your system fails anywhere from 5 minutes to 3 days after boot, memtest should run for a similar period. A single pass only rules out hard failures.
My experience with memtest is that it is 100% on detecting failures, less for implied success. By that I mean that when it detects a failure, there really is a problem, but when it detects no error during an extended run it means only that there were no errors during that run (just like the real world!). Typically I will let it run for at least 24 hours (full local ambient thermal conditions) and when trying to spot highly transient problems as you describe, 2-3 days before I feel more confident.
The best test for suspect memory is always to swap it out if possible.
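To put a rough number on why one pass says so little, here is a minimal sketch. It assumes the faults arrive at random with a constant average rate (a Poisson model), which is my simplification and not something memtest or this thread establishes. The 24-hour mean time between faults is a made-up figure for illustration only.

```python
import math


def chance_of_catching_fault(test_hours: float, mean_hours_between_faults: float) -> float:
    """Probability of seeing at least one fault during a test run.

    Assumes faults arrive randomly at a constant average rate (Poisson model).
    """
    return 1.0 - math.exp(-test_hours / mean_hours_between_faults)


if __name__ == "__main__":
    # 24.0 is a hypothetical mean time between faults, not a measured value.
    for hours in (1, 24, 72):
        p = chance_of_catching_fault(hours, 24.0)
        print(f"{hours:>3} h run: {p:.0%} chance of seeing a fault")
```

Under those assumptions a one-hour run would catch the fault only a few percent of the time, while a run of 2-3 days gets into the 90% range. A real test also stresses the hardware differently from normal use, so treat this as intuition, not a guarantee.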
Well as much as I want to play on my box, I will let it run at least until Monday after work so that will be like 30-40 more hours...
As far as swapping memory, the only thing I did was take 1 stick out at a time, which left 3 others, and run until it crashed, then swap one stick back in and move another to a different slot. I had some sort of method (I was drunk), but with each stick having had its own time out of the machine, it still crashed. That does not rule out two sticks being bad.
After I run the Memtest I will verify my BIOS version, but I am pretty sure I did indeed update it.
I just returned to this thread to make that very suggestion, although I would recommend doing it sober!
If you rotated them through with one always out of the machine and it still followed the same crash pattern, then that is a strong indication that it may not be memory. You might try it with different sets of two cards to make it an even better indicator.
Motherboards can usually be tested by a PCI/PCIe diagnostic card. It's a good idea to have one of these if you experience problems often.
You should also check whether the power supply is sufficient for the hardware in use. OEM PCs often ship with power supplies sized to the bare minimum for the stock configuration. If you've added any hardware, the system components may not be getting enough power.
I had something similar happening with an old laptop not long ago. It wasn't the memory as such, but WAS poor connections in the memory slots. Thoroughly cleaning the edge connectors on the memory sticks provided a complete cure!
I think the reason for the intermittent nature of the problem was due to heat build up, causing expansion in the edge connectors and leading to intermittent contact.
To add to that, I bought this PC (HP) from Best Buy last year and it began doing it with the stock components not 2 weeks after I bought it... I know, I know, why on earth did I not return it? If I had the answer I would be rich... Unfortunately, I did not return it and am not rich.
That is not to say it is not a faulty PSU or PCI Slot or loose fitting memory slots... It just adds to the complexity.
Initially my thought was to ask whether Linux keeps a crash dump file showing what errored out when the PC freezes.
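On the crash dump question, one thing to try is scanning the system log after a hard reboot for the messages the kernel prints on serious faults. The sketch below is a rough example and not a definitive tool. The pattern list is my own selection of common kernel message fragments, and the default path in `scan_file` assumes a stock Slackware syslog setup writing to /var/log/messages. A complete hard freeze often leaves nothing in the log at all, so an empty result proves little.

```python
import re
from typing import Iterable, List

# Fragments the kernel commonly prints on serious faults.
# This list is an assumption for illustration, not an exhaustive catalogue.
FAULT_PATTERN = re.compile(
    r"Machine [Cc]heck|Hardware Error|Kernel panic|Oops|\bBUG:|soft lockup"
)


def find_fault_lines(lines: Iterable[str]) -> List[str]:
    """Return the log lines that contain one of the fault fragments."""
    return [line.rstrip("\n") for line in lines if FAULT_PATTERN.search(line)]


def scan_file(path: str = "/var/log/messages") -> List[str]:
    """Scan one log file; the default path is an assumption about the system."""
    with open(path, errors="replace") as log:
        return find_fault_lines(log)
```

Calling `scan_file()` as root after a crash, and again with "/var/log/syslog", would list any matching lines. Lines logged just before the gap in timestamps are the interesting ones.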
Hello
It is my understanding that lm-sensors readings (voltage, temperature, fan speed, etc.) can be logged over a predetermined period and graphed, so that you can spot trends, either normal behavior or conditions accompanying a failure. There are some examples here - http://askubuntu.com/questions/41794...peratures-load
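A minimal logging sketch along those lines, assuming lm-sensors is installed and configured so that the `sensors -u` command (raw output) works on the machine. The function names, log path, and sampling interval are hypothetical choices of mine. Each sample is flushed and synced to disk so the last readings before a freeze have a chance of surviving the hard reboot.

```python
import os
import subprocess
import time
from typing import Dict


def parse_sensors_raw(text: str) -> Dict[str, float]:
    """Extract every '*_input' reading from `sensors -u` style text."""
    readings: Dict[str, float] = {}
    chip = ""
    for line in text.splitlines():
        if not line.strip():
            chip = ""  # a blank line separates chips
            continue
        if not line[0].isspace() and ":" not in line:
            chip = line.strip()  # chip header line
            continue
        key, sep, value = line.strip().partition(":")
        if sep and key.endswith("_input"):
            try:
                readings[f"{chip}/{key}"] = float(value)
            except ValueError:
                pass  # skip anything that is not a plain number
    return readings


def log_sensors(path: str, interval_s: float = 10.0, samples: int = 6) -> None:
    """Append timestamped readings to path; all parameters are illustrative."""
    with open(path, "a") as out:
        for _ in range(samples):
            raw = subprocess.run(
                ["sensors", "-u"], capture_output=True, text=True, check=True
            ).stdout
            stamp = time.strftime("%Y-%m-%d %H:%M:%S")
            for name, value in sorted(parse_sensors_raw(raw).items()):
                out.write(f"{stamp} {name} {value}\n")
            out.flush()
            os.fsync(out.fileno())  # push the data to disk before a possible freeze
            time.sleep(interval_s)
```

Something like `log_sensors("/root/sensors.log", interval_s=5, samples=100000)` left running in the background would give a record to inspect after the next freeze. A rail voltage sagging or a temperature climbing right before the log stops would point at the PSU or cooling.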
You should close the box up since that will/should be the situation for a failure. Your internal temps are different with a side panel removed. Put the system in the same state as when the failure occurs. PSU temps will be different with a side off. Do you have a DVM to check the voltage rails for the system? PSU fan operating? Do you have adequate case ventilation?
If memtest86+ does show an error, then I would power down, clean the edge contacts on the sticks and also clean the slot connectors. Do a search here on LQ, since I have posted proper techniques for cleaning edges and connectors.
Hope this helps.
Have fun & enjoy!