LinuxQuestions.org
Linux - Hardware This forum is for Hardware issues.
Having trouble installing a piece of hardware? Want to know if that peripheral is compatible with Linux?

Old 06-30-2018, 04:19 PM   #1
aguador
Member
 
Registered: Sep 2014
Location: Madrid, Spain
Distribution: Mageia (Cauldron)
Posts: 71

Rep: Reputation: Disabled
Advice on non-fatal MCE hardware errors


Starting in January of this year, a small, year-old laptop I have with an Intel N4200 processor (8 GB RAM, 480 GB SSD, running fully updated Mageia 6 on the 4.14.50 kernel) began having hardware problems:
  • It indicated that the CMOS battery was dying or had been replaced recently, and there were episodes when it took multiple attempts to get the machine to boot.
  • At the same time I also noticed MCE messages as Mageia 6 booted, specifically:
    • mce: [Hardware Error]: CPU 0: Machine Check: 0 Bank 4: a600000000020408
    • mce: [Hardware Error]: TSC 0 ADDR fef134c0
    • mce: [Hardware Error]: PROCESSOR 0:506c9 TIME 1526150657 SOCKET 0 APIC 0 microcode 2c
The vendor's tech service feels that a BIOS upgrade has taken care of the first problem (too soon to tell for sure), but they missed the fact that the MCE errors continue. As can be seen from the third error line, the microcode is mentioned and, in fact, the timing of the problems coincides with the microcode changes. I installed mcelog, but it appears that the N4200 processor is not supported, so I cannot decipher the error messages any further.

My question is two-fold:
  • Can anyone tell me more specifically what these errors mean?
  • Could they be related to the changes in microcode -- and perhaps something to be lived with?
Basically I am trying to determine if this is truly serious before asking for a motherboard replacement.

For the curious: http://www.vantpc.es/producto/minimo...-pentium-n4200

Basic UEFI Information:
  • SMBIOS 3.0.0
  • Vendor:AMI
  • Version: 5.12
  • Release date: 12/05/2017
 
Old 07-03-2018, 01:32 PM   #2
Ztcoracat
LQ Guru
 
Registered: Dec 2011
Distribution: Slackware, MX 18
Posts: 9,484
Blog Entries: 15

Rep: Reputation: 1176
Hi:
The first error:
Code:
mce: [Hardware Error]: CPU 0: Machine Check: 0 Bank 4: a600000000020408
There are a number of possible causes, from what I've found:
- Memory errors or Error Correction Code (ECC) problems
- Inadequate cooling / processor overheating
- System bus errors
- Cache errors in the processor or hardware
http://www.advancedclustering.com/ac...ptions-or-mce/
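For anyone who wants to pick the first line apart by hand: the hex value after "Bank 4" is that bank's status register, and its top bits are flags whose layout is defined architecturally in Intel's Software Developer's Manual. The following is only a minimal Python sketch of that layout, not a replacement for mcelog. The function name `decode_mci_status` is made up for illustration, and the model-specific code in bits 16-31 is left uninterpreted because its meaning on Apollo Lake is not known here.

```python
# Architectural flag bits of IA32_MCi_STATUS (layout per the Intel SDM).
FLAGS = {
    63: "VAL",    # register contains a valid error
    62: "OVER",   # an earlier error was overwritten
    61: "UC",     # uncorrected error
    60: "EN",     # error reporting was enabled for this error
    59: "MISCV",  # MISC register holds extra information
    58: "ADDRV",  # ADDR register holds a valid address
    57: "PCC",    # processor context may be corrupt
}

def decode_mci_status(hex_status):
    """Split a 64-bit MCE status word into its architectural fields.

    Hypothetical helper for illustration; it does not interpret the
    model-specific part of the word.
    """
    value = int(hex_status, 16)
    return {
        "flags": [name for bit, name in FLAGS.items() if (value >> bit) & 1],
        "model_specific_code": (value >> 16) & 0xFFFF,
        "mca_error_code": value & 0xFFFF,
    }

# Status word quoted in the first post of this thread.
result = decode_mci_status("a600000000020408")
print(result["flags"])
print(hex(result["model_specific_code"]), hex(result["mca_error_code"]))
```

For the value in this thread the set flags are VAL, UC, ADDRV and PCC, while EN is clear, and the low 16 bits are 0x0408. As far as I can tell from the SDM's simple error code table, that pattern (0000 01xx xxxx xxxx) is the "internal unclassified" class, which by itself does not name a failing component.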

Looking at the second error:

Code:
mce: [Hardware Error]: TSC 0 ADDR fef134c0
It could be a processor microcode error relating to the UEFI firmware rather than anything CMOS-related.

https://forums.fedoraforum.org/showt...on-new-machine

And the third error is a BUG:-
https://bugzilla.redhat.com/show_bug.cgi?id=1575399
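The third line is mostly identification rather than a description of the fault, and two of its fields can be decoded with a few lines of Python. This is a sketch under two assumptions: that `506c9` is the CPUID signature in hex, and that `TIME` is seconds since the Unix epoch. The helper name `decode_cpuid_signature` is made up for illustration.

```python
from datetime import datetime, timezone

def decode_cpuid_signature(signature):
    """Return (family, model, stepping) from an x86 CPUID signature.

    Follows the usual rule: the extended model is folded in for
    families 6 and 15, the extended family only for family 15.
    """
    stepping = signature & 0xF
    model = (signature >> 4) & 0xF
    family = (signature >> 8) & 0xF
    ext_model = (signature >> 16) & 0xF
    ext_family = (signature >> 20) & 0xFF
    if family in (6, 15):
        model |= ext_model << 4
    if family == 15:
        family += ext_family
    return family, model, stepping

# Values quoted in the first post: "PROCESSOR 0:506c9 TIME 1526150657".
family, model, stepping = decode_cpuid_signature(0x506C9)
when = datetime.fromtimestamp(1526150657, tz=timezone.utc)

print(f"family {family}, model {model:#x}, stepping {stepping}")
print(when.isoformat())
```

Under those assumptions the signature comes out as family 6, model 0x5c, stepping 9, and the timestamp as 2018-05-12 18:44:17 UTC. The trailing `microcode 2c` is the loaded microcode revision, reported for reference; its presence on the line does not by itself mean the microcode caused the error.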

I'm hoping for you that the BIOS upgrade helped. If you continue to have issues with the machine booting, then replacing the CMOS may just be your best bet.

Is this a Windows machine? An Asus? Toshiba? Sony?

Does running other Linux distributions give you the same errors at boot up?

Did the mcelog provide any helpful details?
If not, try looking in /var/log/mcelog.

-::-Another solution may be to move to a higher version of the kernel for better support.
In helping here for many years I've seen many issues fixed by installing a higher version of the kernel.-::-
 
Old 07-03-2018, 03:24 PM   #3
aguador
Member
 
Registered: Sep 2014
Location: Madrid, Spain
Distribution: Mageia (Cauldron)
Posts: 71

Original Poster
Rep: Reputation: Disabled
Thanks for the response, Ztcoracat. mcelog has not been helpful as Apollo Lake series chips are not included in its list of CPUs, although I will try again next week with the information from the Advanced Clustering link (nicest write up on mcelog I've seen). The computer itself is a CLEVO W515PU (yet another Chinese machine) imported by a re-seller here who puts their name on the lid, installs Linux Mint or a flavor of Ubuntu, and provides Windows drivers for those who want to install Win later.

Overall it is not a bad little machine (11.6" screen) if you need something small and light. The case is rather cheap (all plastic) and the battery too small (3 cell), but both of these things keep it light which has been important for me in my work.

One additional question, if I might: This is my first UEFI machine and there is one "normal" behavior it has had from the start. After pressing the on switch it pauses for a couple of seconds before anything (except the LED by the switch lighting) happens. Then it boots up. Normal?

Overheating has not generally been a problem with the machine (not hot to the touch, no high temps in Conky), so that is the only thing I can discount. I am glad you knew of the bug, which essentially means that that error will remain until there is a(nother) BIOS upgrade.

At present the startup problem seems to be OK, but I am going to allow the machine to sit for a week to see if the CMOS message pops again. If it does or I see something more serious with mcelog, it goes back while it is still under warranty.

I will post again next week.

PS Yes, the error shows up with other versions of Linux, including Artix and Ubuntu. Ubuntu 18.04 (no longer installed) does NOT show it on screen (as an earlier version used by Clonezilla does), but it does write the error to "kernel.log". Artix (which I am trying to find the time to play with) and Mageia show the message during boot, and at least Mageia also writes it to "dmesg.log."
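Since the message ends up in different places on different distributions, one way to compare them is to save the log text from each and pull out only the MCE lines. Below is a minimal Python sketch of that idea. The helper `find_mce_lines` is hypothetical, the sample text reuses the three lines quoted in the first post plus one placeholder line, and the actual log file names and locations vary by distribution.

```python
import re

# Matches the kernel's MCE prefix and captures the rest of the line.
MCE_LINE = re.compile(r"mce: \[Hardware Error\]: (.*)")

def find_mce_lines(log_text):
    """Return the payload of every 'mce: [Hardware Error]' line in log_text."""
    matches = (MCE_LINE.search(line) for line in log_text.splitlines())
    return [m.group(1).strip() for m in matches if m]

# Sample input: the three lines from this thread plus an unrelated placeholder.
sample = """\
mce: [Hardware Error]: CPU 0: Machine Check: 0 Bank 4: a600000000020408
mce: [Hardware Error]: TSC 0 ADDR fef134c0
some unrelated kernel message
mce: [Hardware Error]: PROCESSOR 0:506c9 TIME 1526150657 SOCKET 0 APIC 0 microcode 2c
"""

mce_lines = find_mce_lines(sample)
for line in mce_lines:
    print(line)
```

To use it on a real log, read the saved file into a string first (for example with `open(path).read()`, where `path` is wherever that distribution's kernel log was saved) and pass the string to the function.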
 
Old 07-03-2018, 05:08 PM   #4
Ztcoracat
LQ Guru
 
Registered: Dec 2011
Distribution: Slackware, MX 18
Posts: 9,484
Blog Entries: 15

Rep: Reputation: 1176
You're Welcome:-

The slight delay after pressing the on switch is normal yes.

Yeah I would see how things go since the BIOS upgrade.
Maybe, if you have time, read up more on all the different machine check exceptions that occur. The more you know, the better you'll be able to diagnose the issue.

Since you have seen the errors with Artix and Ubuntu, that's a pattern which most likely confirms that it's a bug.
What version of Ubuntu were you running that was cloned with Clonezilla?
What kernel is Mageia running now?

Before you consider installing a fresh CMOS (since it's still under warranty), I'd call them first, as you wouldn't want to VOID the warranty. Then I'd try a higher version of the kernel and see if that puts a stop to the errors. If not, as you've already said, and I agree, the issue probably won't be solved until the next BIOS upgrade.

The CLEVO User Manual is here in this search if you need it.
https://www.google.com/search?client...74.aw7NEXOPgNc
 
Old 09-16-2019, 11:34 PM   #5
Isabel8
LQ Newbie
 
Registered: Sep 2019
Posts: 1

Rep: Reputation: Disabled
Generally speaking, a machine check exception indicates a hardware failure. This is rarely if ever an operating system error. It is highly recommended that you follow your vendor's hardware diagnostic procedure to find and replace any faulty hardware.

Last edited by Isabel8; 09-17-2019 at 11:36 PM.
 
Old 09-18-2019, 06:32 AM   #6
business_kid
LQ Guru
 
Registered: Jan 2006
Location: Ireland
Distribution: Slackware, Slarm64 & Android
Posts: 16,366

Rep: Reputation: 2335
Something to watch for on an unusual PC like you appear to have is weird behaviour. For instance, the pine64 is a 64 bit pc that boots in 32bit mode :-/. I would try a CD boot or USB key boot where there is no grub or lilo, and lastly, a 32 bit boot.
 
  


All times are GMT -5.
