Linux - Hardware This forum is for Hardware issues.
12-03-2021, 05:05 PM
|
#1
|
Member
Registered: May 2009
Distribution: Manjaro
Posts: 156
Rep:
|
Graphics card suddenly causes boot crash with mce error
Something strange and unsettling happened to me today. I woke up to my screen no longer powering back on after moving the mouse, which is not an entirely unique occurrence. I restarted and was surprised to see that right before the login screen the monitor would power itself off, and this time I was unable to do a clean shutdown by pressing the power button. It soon became apparent that the computer would stay frozen for roughly a minute, then restart itself and repeat the cycle. After one restart I was able to catch the following error message in the console:
https://i.imgur.com/zNK01Vs.jpg
I realized it must be hardware related, since I hadn't installed any updates or changed the system configuration for over a week, and this didn't happen yesterday on the exact same system. To confirm it I booted a live image and got the exact same behavior there. I pulled out the memory modules and tried them in sets, disconnected all hard drives, tried two different screens (HDMI and DisplayPort cables), booted two kernels (5.14 and 5.15), tried radeon vs amdgpu, and reset the CMOS via the pins... in the end the only thing that worked was removing my video card and plugging in an older one.
What makes this extremely bizarre is that I get a picture right up until boot time: I can enter the BIOS just fine and see GRUB, and there are no GPU freezes or graphical corruption... this seems to be all Linux detecting an error and freaking out over it. All error messages are prefixed with "mce" and oddly enough reference a CPU issue. The rest of my hardware works just fine, so it's not the processor, thank god.
Does anyone know what could break in a video card that would make Linux do this? I saw a reference to a `mcelog` command for these errors, but like I said the machine becomes completely inoperable after that's printed, so I can't issue any commands. If you can suggest further tests I'll take a look, but please mention everything I could test up front, as I don't feel comfortable plugging and pulling the video card on my motherboard so often and risking breaking things (tried it twice today). If this is a hardware issue that can't be solved from the kernel I have no choice but to spend a large sum of money I didn't want to spend... figured I'd ask for help here first so I know I tried everything else.
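Since the machine locks up right after the messages are printed, one option is to collect them afterwards, once the system is booted with the fallback card. Below is a minimal Python sketch that filters the "mce"-prefixed lines out of a saved kernel log. It assumes the log from the failed boot survived on disk (for example a persistent systemd journal dumped with `journalctl -k -b -1`), which may not be the case after a hard freeze. The function name `extract_mce_lines` and the sample text are made up for illustration and are not taken from the screenshot above.

```python
import re

# Kernel messages from the machine-check subsystem carry an "mce:" tag.
MCE_PATTERN = re.compile(r"\bmce:", re.IGNORECASE)


def extract_mce_lines(log_text):
    """Return the lines of a kernel log dump that carry the "mce:" tag."""
    return [
        line.strip()
        for line in log_text.splitlines()
        if MCE_PATTERN.search(line)
    ]


# Illustrative sample only; a real log would come from a saved text file.
sample_log = (
    "[    1.000000] mce: [Hardware Error]: CPU 0: Machine Check Exception\n"
    "[    1.100000] usb 1-1: new high-speed USB device\n"
)

for line in extract_mce_lines(sample_log):
    print(line)
```

To try it on a real dump, read the saved file into a string and pass it to `extract_mce_lines` in place of `sample_log`.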
|
|
|
12-05-2021, 05:03 AM
|
#2
|
Senior Member
Registered: Sep 2014
Distribution: Slackware
Posts: 1,856
Rep: 
|
Quote:
Originally Posted by MirceaKitsune
Does anyone know what could break in a video card that would make Linux do this?
|
Could be it's just bent because of the heat. Is there another machine where you could test the GPU?
|
|
|
12-05-2021, 09:02 AM
|
#3
|
Member
Registered: May 2009
Distribution: Manjaro
Posts: 156
Original Poster
Rep:
|
Quote:
Originally Posted by elcore
Could be it's just bent because of the heat. Is there another machine where you could test the GPU?
|
The temperature sensor showed no overheating last time. This happens on boot, when the card is still very cool. Overheating in the past would cause square corruption; I repasted the card and those issues went away, until whatever happened this week.
|
|
|
12-05-2021, 09:24 AM
|
#4
|
Senior Member
Registered: Sep 2014
Distribution: Slackware
Posts: 1,856
Rep: 
|
Well, if it's bent it's clearly visible, you can't miss it. I've seen one that got melted and bent because the cable weight pulled it down.
It does not bend back when cooled down; it is stuck in that position until heated and straightened back up.
Have you checked the copper contacts for mold and that sort of thing? I had one that oxidized somehow, cleaned it with a rubber pencil eraser, and that fixed it.
So there is no other machine where you could test it?
|
|
|
12-05-2021, 09:55 AM
|
#5
|
Member
Registered: May 2009
Distribution: Manjaro
Posts: 156
Original Poster
Rep:
|
Quote:
Originally Posted by elcore
Well, if it's bent it's clearly visible, you can't miss it. I've seen one that got melted and bent because the cable weight pulled it down.
It does not bend back when cooled down; it is stuck in that position until heated and straightened back up.
Have you checked the copper contacts for mold and that sort of thing? I had one that oxidized somehow, cleaned it with a rubber pencil eraser, and that fixed it.
So there is no other machine where you could test it?
|
No hardware defects that I can tell, it looks pristine from the outside. Dust was cleared from it a while ago when I repasted it. The only other machine is my mother's computer, and unfortunately I can't test it there, as neither the case nor its PSU allow connecting it (the cable for the additional pins doesn't reach from the motherboard).
|
|
|
12-05-2021, 10:10 AM
|
#6
|
Senior Member
Registered: Sep 2014
Distribution: Slackware
Posts: 1,856
Rep: 
|
Additional pins, so it requires auxiliary power from the PSU. And the working (old) GPU does not require that?
It could mean the GPU burned out, the power connector on the motherboard burned out, or a PSU and/or cable fault.
Possibly just a capacitor, but I'm no electronics expert. I'd test it on another machine to make sure it's not another component's fault.
If all the other parts are working, including the additional power connector, then I'd suspect the GPU should go to a repair shop.
Some folks use the oven, since sometimes all it takes is to reflow the solder... most just buy a new GPU.
|
|
|
12-05-2021, 10:26 AM
|
#7
|
Member
Registered: May 2009
Distribution: Manjaro
Posts: 156
Original Poster
Rep:
|
The broken card has two additional power connectors, a 6-pin plus an 8-pin... the older fallback card has only one 6-pin connector and works fine. The PSU's connectors are configurable (6-pin or 8-pin), so last time I tried swapping which one is plugged into which socket on the card. That had no effect, so I don't suspect a bad socket. A new card is supposed to arrive soon, since I needed an upgrade anyway, so I'll see how it goes.
|
|
|
All times are GMT -5.