LinuxQuestions.org


Ook 03-03-2013 10:28 PM

Kernel message - bad memory? bad cpu?
 
Slackware 14 64-bit, AMD Phenom 6-core 3200 MHz CPU.

The box has a tendency to kernel panic or freeze on occasion. I was getting ready to look at it, and it was just sitting idle on my desk, when I got this in the console:

Message from syslogd@ook_Winbloze at Sun Mar 3 21:13:15 2013 ...
ook_Winbloze kernel: [ 897.903094] [Hardware Error]: CPU:2 MC1_STATUS[-|CE|-|-|AddrV]: 0x9400000000000151

Message from syslogd@ook_Winbloze at Sun Mar 3 21:13:15 2013 ...
ook_Winbloze kernel: [ 897.905679] [Hardware Error]: MC1_ADDR: 0x000000000068edd0

Message from syslogd@ook_Winbloze at Sun Mar 3 21:13:15 2013 ...
ook_Winbloze kernel: [ 897.908235] [Hardware Error]: Instruction Cache Error: Parity error during data load.

Message from syslogd@ook_Winbloze at Sun Mar 3 21:13:15 2013 ...
ook_Winbloze kernel: [ 897.910825] [Hardware Error]: cache level: L1, tx: INSN, mem-tx: IRD

Is it telling me that the cache memory on the CPU die is bad? L1 cache is on the die, so a parity error in the L1 cache would basically mean bad L1 memory: throw the CPU out and get another one.
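For anyone who wants to cross-check the kernel's decode by hand, here is a rough sketch of how the MC1_STATUS word breaks down. It assumes the standard x86 machine-check layout (flag bits at the top of the 64-bit word, error code in the low 16 bits, cache errors encoded as 0000 0001 RRRR TTLL). The short names in the tables are meant to mirror the strings the kernel prints, and decode_mc_status is a made-up helper name, not anything from the kernel.

```python
# Sketch only: decode an x86 machine-check MCi_STATUS word.
# Assumption: architectural MCA bit layout; names mirror the kernel's output.
# decode_mc_status is a hypothetical helper, not a kernel or library API.

FLAG_BITS = {
    "VAL": 63,    # register holds a valid error
    "OVER": 62,   # an earlier error was overwritten
    "UC": 61,     # uncorrected error (clear means corrected, shown as "CE")
    "EN": 60,     # reporting was enabled for this error
    "MISCV": 59,  # MCi_MISC holds valid data
    "ADDRV": 58,  # MCi_ADDR holds a valid address
    "PCC": 57,    # processor context corrupt
}

# Lookup tables for the low fields of a memory-hierarchy error code.
CACHE_LEVELS = ["RESV", "L1", "L2", "L3/GEN"]                           # LL
TX_TYPES = ["INSN", "DATA", "GEN", "RESV"]                              # TT
MEM_TX = ["GEN", "RD", "WR", "DRD", "DWR", "IRD", "PRF", "EV", "SNP"]   # RRRR


def decode_mc_status(status):
    """Split a 64-bit status value into flags and, if applicable, cache fields."""
    flags = {name: bool((status >> bit) & 1) for name, bit in FLAG_BITS.items()}
    code = status & 0xFFFF
    result = {"flags": flags, "error_code": code, "corrected": not flags["UC"]}

    # Memory-hierarchy (cache) errors have the form 0000 0001 RRRR TTLL.
    if (code & 0xFF00) == 0x0100:
        rrrr = (code >> 4) & 0xF
        result["cache_level"] = CACHE_LEVELS[code & 0x3]
        result["tx"] = TX_TYPES[(code >> 2) & 0x3]
        result["mem_tx"] = MEM_TX[rrrr] if rrrr < len(MEM_TX) else "unknown"
    return result


if __name__ == "__main__":
    print(decode_mc_status(0x9400000000000151))
```

Fed the value from the console message, this gives a corrected error with a valid address, cache level L1, tx INSN, mem-tx IRD, which agrees with what the kernel printed. Note that the status word flags the error as corrected (CE), so this particular event was caught by the hardware rather than being the freeze itself.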

Anyone care to concur or otherwise comment on what it is trying to tell me? This box has a history of instability, and this would explain it if that is what is actually wrong.

volkerdi 03-03-2013 10:53 PM

I've had these on my own 6 core Phenom II a couple of times, evidently caused by dust buildup in the CPU heat sink. If that looks clean (and as you say, the box has a history of instability) it might also be possible that the thermal grease wasn't applied evenly.

It could also be a defective CPU, but I'd look into the cooling issues first. Even if nothing looks wrong, a better quality heat sink might be all you need.

irgunII 03-03-2013 11:57 PM

I just had that problem a couple weeks ago. I tried everything (IIRC I had another thread about it), but nothing was stopping it. Not a cooling problem, not bad RAM, nothing I or anyone else could figure out, so I contacted AMD (it was an Athlon II X2 260) through their website and did the whole warranty thing there. I got the RMA a few days ago and Monday I'm sending it to them so they can check it out to see if it *is* the CPU. If it is, they're gonna send me a new one since it was only a couple of weeks old.

It's going to keep happening to you, I guarantee it. There will be *no* pattern. You won't be able to know when it will happen at all. Sometimes it may go a whole day and you'll never get a 'warning' window, and then some days you'll get them every 2 or 3 minutes.

So, that's 2 people now that this has happened to... I'm betting just a bad batch of CPUs.

H_TeXMeX_H 03-04-2013 04:17 AM

Try running GIMPS:
http://www.mersenne.org/freesoft/#source
with test option 1 to try to diagnose a CPU hardware error. Yes, the program can also be used to search for prime numbers, but it has a good CPU hardware test built in to make sure the results you get are actually trustworthy.

I would also run memtest86+ because it could also be bad RAM.

Ook 03-04-2013 01:01 PM

Quote:

Originally Posted by irgunII (Post 4904065)
I just had that problem a couple weeks ago. I tried everything (IIRC I had another thread about it), but nothing was stopping it. Not a cooling problem, not bad RAM, nothing I or anyone else could figure out, so I contacted AMD (it was an Athlon II X2 260) through their website and did the whole warranty thing there. I got the RMA a few days ago and Monday I'm sending it to them so they can check it out to see if it *is* the CPU. If it is, they're gonna send me a new one since it was only a couple of weeks old.

It's going to keep happening to you, I guarantee it. There will be *no* pattern. You won't be able to know when it will happen at all. Sometimes it may go a whole day and you'll never get a 'warning' window, and then some days you'll get them every 2 or 3 minutes.

So, that's 2 people now that this has happened to... I'm betting just a bad batch of CPUs.

That is *exactly* what it does. A day or three with no problems, then it dies every five minutes. No pattern at all. Nothing related to ambient temperature, nothing related to time or usage. It just dies when it feels like it. I have a Zalman cooler on the CPU; I've been using them for years and they work really well. BIOS shows CPU temp well within reason even under load. Swapped out memory, no help; replaced power supply, no help. I was about to replace the motherboard, as it is an Asus, and every Asus board I've bought during the last five years has fried in less than 12 months. I've never seen a Phenom go bad, so that was not one of the first places I would have looked.
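If anyone wants to put numbers on the "no pattern" impression, a small script can pull the [Hardware Error] bursts out of a saved log and list the gaps between them. This is only a sketch under some assumptions: the lines look like the console messages earlier in the thread, the log covers a single boot (the bracketed stamp is seconds since boot and restarts after a reboot), and lines less than a second apart belong to the same event. The function names are made up for illustration.

```python
import re

# Seconds-since-boot stamp immediately followed by the hardware-error tag.
HW_ERR = re.compile(r"\[\s*(\d+\.\d+)\]\s*\[Hardware Error\]")


def error_events(lines, window=1.0):
    """Return the start time of each burst of [Hardware Error] lines.

    Lines whose stamps are within `window` seconds of the previous
    hardware-error line are treated as part of the same event.
    """
    events = []
    last = None
    for line in lines:
        match = HW_ERR.search(line)
        if not match:
            continue
        stamp = float(match.group(1))
        if last is None or stamp - last > window:
            events.append(stamp)
        last = stamp
    return events


def gaps(events):
    """Seconds between consecutive events (single boot assumed)."""
    return [round(later - earlier, 3) for earlier, later in zip(events, events[1:])]


if __name__ == "__main__":
    sample = [
        "kernel: [ 897.903094] [Hardware Error]: MC1_STATUS ...",
        "kernel: [ 897.910825] [Hardware Error]: cache level: L1, tx: INSN, mem-tx: IRD",
    ]
    print(error_events(sample), gaps(error_events(sample)))
```

The four console lines from the first post collapse into a single event this way; with a longer log, widely scattered gaps would back up the "dies when it feels like it" description.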

This CPU is a few years old. I think I'll see if I can get a replacement; it's an AM3 board, so I should be able to get one fairly inexpensively. To be continued...

irgunII 03-05-2013 07:16 AM

Okay, my old CPU, the one messing up like the OP's, got sent in the mail yesterday. It'll take a while before I hear anything back from AMD, given this note in their warranty e-mail:

<quote>We recommend using a trackable carrier, such as UPS, Federal Express, DHL, etc. Be advised that packages sent through the Postal service are directed through AMD's mail sorting facility, which can cause undue or lengthy delays in processing your RMA.</quote>

I couldn't afford UPS or the others, so there's no telling how long it will take before I hear back from them. Could be 3 or 4 days, could be a couple weeks <mutter>.

I'll post whatever happens when it happens.

slacker2012 03-05-2013 11:50 AM

Maybe check your capacitors on the MoBo? All of my AMDs usually eat Motherboard caps for lunch.

irgunII 03-05-2013 12:17 PM

Quote:

Originally Posted by slacker2012 (Post 4905254)
Maybe check your capacitors on the MoBo? All of my AMDs usually eat Motherboard caps for lunch.

I've got a brand-new, much nicer CPU (FX-6300) on the same MoBo that the Athlon II X2 260 went out on. Working fine.

Ook 03-11-2013 01:49 PM

Put a new cpu in, and it's been running fine ever since. Problem solved.

irgunII 03-11-2013 11:22 PM

Quote:

Originally Posted by Ook (Post 4909373)
Put a new cpu in, and it's been running fine ever since. Problem solved.

I just got a reply back from AMD. They confirm that my Athlon II X2 260 was bad and they are sending me a new one since it was still under warranty. So, with both of us putting in a new CPU and everything now running right in our systems, it seems to point to simply bad CPUs and nothing more. It's bound to happen out of millions(?) of them being made daily/monthly/whatever.


All times are GMT -5.