[Hardware Error]: System Fatal error
Hi,
Yesterday, while my PC was starting up and before it had finished booting (i.e. before I got the login prompt), it rebooted on its own. I'm not sure how far it got, but the second time the startup completed successfully and it has been working normally since then. When I checked the syslog, I found these error messages:
Jan 26 08:46:10 epg-hp kernel: [ 3.839827] [Hardware Error]: System Fatal error.
Jan 26 08:46:10 epg-hp kernel: [ 3.839938] [Hardware Error]: CPU:0 (15:60:1) MC4_STATUS[Over|UE|MiscV|PCC|AddrV|-|-]: 0xfe00000000070f0f
Jan 26 08:46:10 epg-hp kernel: [ 3.840214] [Hardware Error]: MC4 Error Address: 0x00000000d0d00e50
Jan 26 08:46:10 epg-hp kernel: [ 3.840314] [Hardware Error]: MC4 Error (node 0): Watchdog timeout due to lack of progress.
Jan 26 08:46:10 epg-hp kernel: [ 3.840510] [Hardware Error]: cache level: L3/GEN, mem/io: GEN, mem-tx: GEN, part-proc: GEN (timed out)
I googled a bit and found some people saying this could be caused by RAM errors; however, after ~8 hours of running, memtest didn't find any errors. So... what next? Any ideas about what could have caused this error?
Running Slackware64-14.2, kernel 4.4.38.
Thank you! |
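For anyone trying to read the raw MC4_STATUS value in the log above, here is a minimal Python sketch that splits the register into its fields. It assumes the architectural x86 machine-check layout (flags in bits 63 to 57, error code in bits 15:0, bus-error format 0000 1PPT RRRR IILL) and, as a further assumption, a 5-bit extended error code in bits 20:16 for this AMD family. The function names are made up for illustration and are not part of any tool.

```python
# Architectural x86 MCA status flags (bits 63..57 of MCi_STATUS).
STATUS_FLAGS = [
    (63, "Val"),    # register holds a valid error record
    (62, "Over"),   # a previous error was overwritten
    (61, "UC"),     # uncorrected error (the kernel prints this as "UE")
    (60, "En"),     # error reporting was enabled
    (59, "MiscV"),  # MISC register contents are valid
    (58, "AddrV"),  # error address register is valid
    (57, "PCC"),    # processor context may be corrupt
]


def decode_status(status: int) -> dict:
    """Split a raw 64-bit MCA status value into flags and code fields."""
    return {
        "flags": [name for bit, name in STATUS_FLAGS if (status >> bit) & 1],
        "error_code": status & 0xFFFF,
        # Assumption: 5-bit extended error code, as on AMD family 15h.
        "ext_error_code": (status >> 16) & 0x1F,
    }


def decode_bus_error(code: int):
    """Decode a bus-error code (0000 1PPT RRRR IILL); None if not a bus error."""
    if (code & 0xF800) != 0x0800:
        return None
    return {
        "pp": (code >> 9) & 0x3,        # participating processor
        "timed_out": bool((code >> 8) & 0x1),
        "rrrr": (code >> 4) & 0xF,      # memory transaction type
        "ii": (code >> 2) & 0x3,        # memory or I/O
        "ll": code & 0x3,               # cache level
    }


if __name__ == "__main__":
    decoded = decode_status(0xFE00000000070F0F)  # value from the log above
    print(decoded["flags"])
    print(hex(decoded["error_code"]), decoded["ext_error_code"])
    print(decode_bus_error(decoded["error_code"]))
```

On the value from the log this gives the flags Val, Over, UC, En, MiscV, AddrV and PCC, an extended error code of 7, and a bus-error code with the timeout bit set, which lines up with the "timed out" text the kernel printed.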
It might be one of those things that only happens when you're not looking.
I'd keep an eye on it and perhaps get a new set of RAM modules in case the current ones are failing, or at least set some money aside for them just in case. This may give you a little more info: How to identify defective DIMM from EDAC error on Linux |
Thank you BW-userx for your reply and for the link; very useful...
I just ran memtest over the weekend (48+ hours) and it still found no errors. So I guess I can only wait and see whether it happens again. |
I'm no expert on the subject, but aren't these errors related to the CPU cache and not the RAM? I agree with BW-userx that it could be a one-time thing.
|
Thanks for the feedback... Yeah, you could be right. Anyway, I tried to run mcelog so I could capture proper logs if this happens again, but unfortunately AMD CPUs are not supported. :-(
|
It just happened again:
[ 3.839717] [Hardware Error]: System Fatal error.
[ 3.839828] [Hardware Error]: CPU:0 (15:60:1) MC4_STATUS[Over|UE|MiscV|PCC|AddrV|-|-]: 0xfe00000000070f0f
[ 3.840069] [Hardware Error]: MC4 Error Address: 0x00000000d0d00e50
[ 3.840208] [Hardware Error]: MC4 Error (node 0): Watchdog timeout due to lack of progress.
[ 3.840406] [Hardware Error]: cache level: L3/GEN, mem/io: GEN, mem-tx: GEN, part-proc: GEN (timed out)
Same error message, same address... Is it fair to say it's a hardware issue? Any suggestions on how to troubleshoot this further?
Thank you |
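To check whether two occurrences really carry the same signature, the status and address can be pulled out of the kernel log lines and compared. A rough Python sketch follows; the helper names are hypothetical, and the regular expressions are written against the exact message format quoted in this thread, so other kernels may need different patterns.

```python
import re

# Patterns matched against the "[Hardware Error]" lines quoted in this thread.
STATUS_RE = re.compile(r"MC(\d+)_STATUS\[[^\]]*\]:\s*(0x[0-9a-fA-F]+)")
ADDR_RE = re.compile(r"MC(\d+) Error Address:\s*(0x[0-9a-fA-F]+)")


def extract_events(log_text: str) -> list:
    """Collect one record per MCx_STATUS line, attaching the address that follows."""
    events = []
    for line in log_text.splitlines():
        if "[Hardware Error]" not in line:
            continue
        m = STATUS_RE.search(line)
        if m:
            events.append({"bank": int(m.group(1)),
                           "status": int(m.group(2), 16),
                           "address": None})
            continue
        m = ADDR_RE.search(line)
        # Attach the address only to a preceding status line of the same bank.
        if m and events and events[-1]["bank"] == int(m.group(1)):
            events[-1]["address"] = int(m.group(2), 16)
    return events


def same_signature(a: dict, b: dict) -> bool:
    """True when bank, raw status and error address all match."""
    return (a["bank"], a["status"], a["address"]) == \
           (b["bank"], b["status"], b["address"])


# The two occurrences reported in this thread.
SAMPLE = """\
Jan 26 08:46:10 epg-hp kernel: [ 3.839827] [Hardware Error]: System Fatal error.
Jan 26 08:46:10 epg-hp kernel: [ 3.839938] [Hardware Error]: CPU:0 (15:60:1) MC4_STATUS[Over|UE|MiscV|PCC|AddrV|-|-]: 0xfe00000000070f0f
Jan 26 08:46:10 epg-hp kernel: [ 3.840214] [Hardware Error]: MC4 Error Address: 0x00000000d0d00e50
[ 3.839828] [Hardware Error]: CPU:0 (15:60:1) MC4_STATUS[Over|UE|MiscV|PCC|AddrV|-|-]: 0xfe00000000070f0f
[ 3.840069] [Hardware Error]: MC4 Error Address: 0x00000000d0d00e50
"""

if __name__ == "__main__":
    first, second = extract_events(SAMPLE)
    print(same_signature(first, second))
```

A matching signature does not prove a hardware fault on its own, but it does show that the failure is reproducible rather than random.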
Repeat until the problem is no longer there. Take back everything that did not fix the problem and get your money back. |
Not at all, no overclocking...
As for the idea of changing hardware until the problem disappears, I'm afraid that's not going to work for me. First, this is a company-owned laptop, so I can't/shouldn't change the parts myself. Second, it's still under warranty, so I'd void it if I opened the laptop. I could just call warranty support and see what they say, but I wanted to be sure this is indeed a hardware issue... |
Is input–output memory management unit (IOMMU) available in the BIOS, and is it on? |
Yes, it's an AMD CPU on an HP 745 G3. I didn't see any IOMMU option in the BIOS, so I don't think my PC supports it.
|
Thank you for replying!
It's a brand new PC; I got it just a couple of months ago. The first time I noticed this error was around two weeks ago, when I started this thread. Yesterday it happened again... And no, no changes have been made since I installed Slackware. I had seen the link you shared, but unfortunately I couldn't run mcelog; it seems (correct me if I'm wrong) that it doesn't support AMD CPUs. |