06-30-2018, 05:19 PM
|
#1
|
Member
Registered: Sep 2014
Location: Madrid, Spain
Distribution: Mageia (Cauldron)
Posts: 71
Rep: 
|
Advice on non-fatal MCE hardware errors
Starting in January of this year, a small, year-old laptop I have with an Intel N4200 processor (8 GB RAM, 480 GB SSD, running fully updated Mageia 6 on the 4.14.50 kernel) began having hardware problems:
- It reported that the CMOS battery was dying or had recently been replaced, and there were episodes when it took multiple attempts to get the machine to boot.
- At the same time I also noted MCE messages as Mageia 6 booted, specifically:
- mce: [Hardware Error]: CPU 0: Machine Check: 0 Bank 4: a600000000020408
- mce: [Hardware Error]: TSC 0 ADDR fef134c0
- mce: [Hardware Error]: PROCESSOR 0:506c9 TIME 1526150657 SOCKET 0 APIC 0 microcode 2c
The vendor's tech service feels that a BIOS upgrade has taken care of the first problem (too soon to tell for sure), but they missed the fact that the MCE errors continue. The third line of the error mentions the microcode and, in fact, the timing of the problems coincides with the microcode changes. I installed mcelog, but it appears that the N4200 processor is not supported, so I cannot decipher the error messages any further.
My question is two-fold:
- Can anyone tell me more specifically what these errors mean?
- Could they be related to the changes in microcode -- and perhaps something to be lived with?
Basically I am trying to determine if this is truly serious before asking for a motherboard replacement.
For the curious: http://www.vantpc.es/producto/minimo...-pentium-n4200
Basic UEFI Information:
- SMBIOS 3.0.0
- Vendor: AMI
- Version: 5.12
- Release date: 12/05/2017
|
|
|
07-03-2018, 02:32 PM
|
#2
|
LQ Guru
Registered: Dec 2011
Distribution: Slackware, Debian 12, Devuan & MX Linux
Posts: 9,528
|
Hi:
The first error:
Code:
mce: [Hardware Error]: CPU 0: Machine Check: 0 Bank 4: a600000000020408
From what I've found, there are a number of possible causes:
- Memory errors or Error Correction Code (ECC) problems
- Inadequate cooling / processor overheating
- System bus errors
- Cache errors in the processor or hardware
http://www.advancedclustering.com/ac...ptions-or-mce/
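Since mcelog does not decode this CPU, here is a minimal sketch of splitting the status word by hand. It assumes the value after "Bank 4:" is the raw 64-bit IA32_MCi_STATUS register and uses the architectural bit positions from Intel's machine-check documentation; the function name `decode_mci_status` is made up for this example.

```python
# Architectural flag bits of IA32_MCi_STATUS (bit positions per the Intel SDM).
FLAGS = {
    "VAL": 63,    # register holds a valid error
    "OVER": 62,   # an earlier error was overwritten
    "UC": 61,     # error was not corrected by hardware
    "EN": 60,     # reporting of this error was enabled
    "MISCV": 59,  # the MISC register holds extra data
    "ADDRV": 58,  # the ADDR register holds a valid address
    "PCC": 57,    # processor context may be corrupted
}


def decode_mci_status(status: int) -> dict:
    """Split a 64-bit MCi_STATUS value into flag bits and error-code fields.

    Hypothetical helper for illustration; it does not interpret the codes.
    """
    decoded = {name: bool((status >> bit) & 1) for name, bit in FLAGS.items()}
    decoded["mca_code"] = status & 0xFFFF            # bits 15:0, MCA error code
    decoded["model_code"] = (status >> 16) & 0xFFFF  # bits 31:16, model-specific
    return decoded


if __name__ == "__main__":
    # Status value quoted in the first post of this thread.
    result = decode_mci_status(0xA600000000020408)
    for name in FLAGS:
        print(f"{name:6} {result[name]}")
    print(f"mca_code   {result['mca_code']:#06x}")
    print(f"model_code {result['model_code']:#06x}")
```

For the value in this thread, VAL, UC, ADDRV and PCC come out set while OVER, EN and MISCV are clear; ADDRV being set is consistent with the kernel printing an ADDR value on the next line. The low 16 bits still have to be looked up in the error-code tables of Intel's documentation to see which class of error they name.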
Looking at the second error:
Code:
mce: [Hardware Error]: TSC 0 ADDR fef134c0
It could be a processor microcode error relating to the UEFI firmware rather than anything CMOS-related.
https://forums.fedoraforum.org/showt...on-new-machine
And the third error is a bug:
https://bugzilla.redhat.com/show_bug.cgi?id=1575399
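Whatever the bug report concludes, the fields in that third line ("PROCESSOR 0:506c9 TIME 1526150657 ... microcode 2c") can be read off directly. A small sketch, assuming the value after "PROCESSOR 0:" is the CPUID signature in hex and that TIME is seconds since the Unix epoch; both function names are invented for illustration.

```python
from datetime import datetime, timezone


def decode_cpuid_signature(sig: int) -> dict:
    """Split an x86 CPUID signature into family, model and stepping.

    Hypothetical helper; follows the usual x86 rule that the extended
    model bits apply to families 6 and 15, and the extended family
    bits apply to family 15 only.
    """
    stepping = sig & 0xF
    model = (sig >> 4) & 0xF
    family = (sig >> 8) & 0xF
    ext_model = (sig >> 16) & 0xF
    ext_family = (sig >> 20) & 0xFF
    if family in (0x6, 0xF):
        model |= ext_model << 4
    if family == 0xF:
        family += ext_family
    return {"family": family, "model": model, "stepping": stepping}


def mce_time_to_utc(seconds: int) -> str:
    """Convert the TIME field to an ISO 8601 string in UTC."""
    return datetime.fromtimestamp(seconds, tz=timezone.utc).isoformat()


if __name__ == "__main__":
    cpu = decode_cpuid_signature(0x506C9)
    print(f"family {cpu['family']}, model {cpu['model']:#x}, stepping {cpu['stepping']}")
    print(mce_time_to_utc(1526150657))
```

With the values from this thread the signature works out to family 6, model 0x5c, stepping 9, and the TIME field converts to 12 May 2018 (UTC), so the quoted message was logged some weeks before the first post. The "microcode 2c" at the end is the loaded microcode revision in hex, which is worth noting before and after any BIOS update.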
I'm hoping for your sake that the BIOS upgrade helped. If you continue to have issues with the machine booting, then replacing the CMOS battery may just be your best bet.
Is this a Windows machine? An Asus? Toshiba? Sony?
Does running other Linux distributions give you the same errors at boot up?
Did the mcelog provide any helpful details?
If not, try looking in /var/log/mcelog.
-::-Another solution may be to move to a higher version of the kernel for better support.
In helping here for many years I've seen many issues fixed by installing a higher version of the kernel.-::-
|
|
|
07-03-2018, 04:24 PM
|
#3
|
Member
Registered: Sep 2014
Location: Madrid, Spain
Distribution: Mageia (Cauldron)
Posts: 71
Original Poster
Rep: 
|
Thanks for the response, Ztcoracat. mcelog has not been helpful as Apollo Lake series chips are not included in its list of CPUs, although I will try again next week with the information from the Advanced Clustering link (nicest write up on mcelog I've seen). The computer itself is a CLEVO W515PU (yet another Chinese machine) imported by a re-seller here who puts their name on the lid, installs Linux Mint or a flavor of Ubuntu, and provides Windows drivers for those who want to install Win later.
Overall it is not a bad little machine (11.6" screen) if you need something small and light. The case is rather cheap (all plastic) and the battery too small (3 cell), but both of these things keep it light which has been important for me in my work.
One additional question, if I might: This is my first UEFI machine and there is one "normal" behavior it has had from the start. After pressing the on switch it pauses for a couple of seconds before anything (except the LED by the switch lighting) happens. Then it boots up. Normal?
Overheating has not generally been a problem with the machine (not hot to the touch, no high temps in Conky), so that is the only thing I can discount. I am glad you knew of the bug, which essentially means that that error will remain until there is a(nother) BIOS upgrade.
At present the startup problem seems to be OK, but I am going to allow the machine to sit for a week to see if the CMOS message pops again. If it does or I see something more serious with mcelog, it goes back while it is still under warranty.
I will post again next week.
PS Yes, the error shows up with other versions of Linux, including Artix and Ubuntu. Ubuntu 18.04 (no longer installed) does NOT show it on screen (as an earlier version used by Clonezilla does), but it does write the error to "kernel.log". Artix (which I am trying to find the time to play with) and Mageia show the message during boot, and at least Mageia also writes it to "dmesg.log."
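For anyone wanting to compare the logs from the different distributions, here is a rough sketch that pulls the MCE header lines out of saved log text. The regular expression is modelled only on the message format quoted in this thread, and `find_mce_events` is a made-up name; other kernel versions may print the line differently.

```python
import re

# Matches the first line of an MCE report as quoted in this thread, e.g.
# "mce: [Hardware Error]: CPU 0: Machine Check: 0 Bank 4: a600000000020408"
MCE_LINE = re.compile(
    r"mce: \[Hardware Error\]: CPU (?P<cpu>\d+): Machine Check: \d+ "
    r"Bank (?P<bank>\d+): (?P<status>[0-9a-fA-F]+)"
)


def find_mce_events(log_text: str) -> list:
    """Return a (cpu, bank, status) tuple for each MCE header line found."""
    events = []
    for line in log_text.splitlines():
        match = MCE_LINE.search(line)
        if match:
            events.append(
                (int(match["cpu"]), int(match["bank"]), int(match["status"], 16))
            )
    return events


if __name__ == "__main__":
    # Sample text built from the lines in the first post.
    sample = (
        "mce: [Hardware Error]: CPU 0: Machine Check: 0 Bank 4: a600000000020408\n"
        "mce: [Hardware Error]: TSC 0 ADDR fef134c0\n"
    )
    for cpu, bank, status in find_mce_events(sample):
        print(f"CPU {cpu} bank {bank} status {status:#018x}")
```

Feeding it the saved kernel log text from each distribution and comparing the tuples shows whether the same bank and status value come up every time.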
|
|
|
07-03-2018, 06:08 PM
|
#4
|
LQ Guru
Registered: Dec 2011
Distribution: Slackware, Debian 12, Devuan & MX Linux
Posts: 9,528
|
You're Welcome:-
Yes, the slight delay after pressing the on switch is normal.
Yeah I would see how things go since the BIOS upgrade.
If you have time, maybe read up more on the different machine check exceptions that can occur. The more you know, the better you'll be able to diagnose the issue.
Since you have seen the errors with Artix and Ubuntu, that's a pattern which most likely confirms that it's a bug.
What version of Ubuntu were you running that was cloned with Clonezilla?
What kernel is Mageia running now?
Before you consider installing a fresh CMOS battery, I'd call them first, since the machine is still under warranty and you wouldn't want to void it. Then I'd try a higher version of the kernel and see if that puts a stop to the errors. If not, as you've already said, and I agree, the issue probably won't be solved until the next BIOS upgrade.
The CLEVO User Manual is here in this search if you need it.
https://www.google.com/search?client...74.aw7NEXOPgNc
|
|
|
09-17-2019, 12:34 AM
|
#5
|
LQ Newbie
Registered: Sep 2019
Posts: 1
Rep: 
|
Generally speaking, a machine check exception indicates a hardware failure. This is rarely, if ever, an operating system error. It is highly recommended that you follow your vendor's hardware diagnostic procedure to find and replace any faulty hardware.
Last edited by Isabel8; 09-18-2019 at 12:36 AM.
|
|
|
09-18-2019, 07:32 AM
|
#6
|
LQ Guru
Registered: Jan 2006
Location: Ireland
Distribution: Slackware, Slarm64 & Android
Posts: 17,275
|
Something to watch for on an unusual PC like you appear to have is weird behaviour. For instance, the Pine64 is a 64-bit PC that boots in 32-bit mode :-/. I would try a CD boot or USB key boot where there is no GRUB or LILO and, lastly, a 32-bit boot.
|
|
|
All times are GMT -5.