Slackware This Forum is for the discussion of Slackware Linux.
02-14-2013, 09:17 PM
|
#1
|
Member
Registered: Jan 2012
Location: Directly above the center of the earth
Distribution: Slackware. There's something else?
Posts: 383
Rep:
|
Suddenly, strange hardware(?) warnings from syslog...
I've been running Slackware 14 from this hdd on a new MoBo with new (4GB) RAM for about a week now, and everything has been working fine.
Yesterday, I came back in to find about 50 little windows with 'warnings' about hardware and stuff.
I thought maybe I'd made a strange/wrong setting in the UEFI BIOS, so I went in and set it to 'default'. It still happened about 10 to 15 minutes later: a bunch of those 'warnings' pop up real fast (almost all at once, actually) and I have to click on each one to get rid of it.
So all day today I've been testing different configurations in the BIOS, and it just keeps happening.
About 2 hours ago I decided to boot from my backup hdd instead to see if it happens over there. (I use luckybackup to copy everything, including hidden files, from my /home dir to the backup hdd, so it's no different from my main hdd other than a very few things/apps not being installed.) That hdd stayed up and there were no problems with it. Not one warning window or anything.
Here are a few of the 'warnings' from the syslog in my /var/log...
Feb 14 10:26:42 oogah kernel: [12900.000036] [Hardware Error]: ^IMC0_ADDR: 0x00000000cb5c8c00
[Hardware Error]: Data Cache Error: during L1 linefill from L2.
[Hardware Error]: cache level: L2, tx: DATA, mem-tx RD
[Hardware Error]:CPU:0^IMC0_STATUS[Over|CE|-|-|AddrV|CECC]: 0xd40040004a000136
[Hardware Error]: ^IMC0_ADDR: 0x00000000c0ce8c00
[Hardware Error]: Data Cache Error: during L1 linefill from L2.
[Hardware Error]: cache level: L2, tx: DATA, mem-tx RD
[Hardware Error]:CPU:0^IMC2_STATUS[-|CE|-|-|AddrV|CECC]: 0x940040000000018a
[Hardware Error]: ^IMC2_ADDR: 0x000000008bba8800
[Hardware Error]: Bus Unit Error: SNP error during data copyback.
[Hardware Error]: cache level: L2, tx: GEN, mem-tx: SNP
[Hardware Error]: CPU:0^IMC0_STATUS[Over|CE|-|-|AddrV|CECC]: 0xd40040004a000136
[Hardware Error]: ^IMC0_ADDR: 0x00000000c0938d80
[Hardware Error]: Data Cache Error: during L1 linefill from L2.
[Hardware Error]: cache level: L2, tx: DATA, mem-tx RD
[Hardware Error]: CPU:0^IMC2_STATUS[-|CE|-|-|AddrV|CECC]: 0x940040000000018a
[Hardware Error]: ^IMC2_ADDR: 0x00000000c8378800
[Hardware Error]: Bus Unit Error: SNP error during data copyback.
[Hardware Error]: cache level: L2, tx: GEN, mem-tx: SNP
[Hardware Error]: CPU:0^IMC0_STATUS[Over|CE|-|-|AddrV|CECC]: 0xd40040004a000136
[Hardware Error]: ^IMC0_ADDR: 0x00000000c0978800
[Hardware Error]: Data Cache Error: during L1 linefill from L2.
[Hardware Error]: cache level: L2, tx: DATA, mem-tx RD
I can't make heads or tails of the warnings. Anyone have any ideas? Some things to try?
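For anyone else trying to read entries like these: the bracketed flags after MC0_STATUS and MC2_STATUS are a decode of the 64-bit status value printed after them. Below is a minimal Python sketch of that decode, assuming the usual x86 machine-check MCi_STATUS bit positions (Over = bit 62, UC = bit 61, MiscV = bit 59, AddrV = bit 58, PCC = bit 57) plus the AMD-specific CECC/UECC bits 46 and 45. The function name decode_mc_status is made up for illustration.

```python
# Bit positions in the MCi_STATUS register. These are assumptions taken from
# the x86 machine-check architecture; CECC/UECC are AMD-specific bits.
OVER, UC, MISCV, ADDRV, PCC = 62, 61, 59, 58, 57
CECC, UECC = 46, 45


def bit(value: int, pos: int) -> bool:
    """Return True if bit `pos` of `value` is set."""
    return bool(value >> pos & 1)


def decode_mc_status(status: int) -> str:
    """Rebuild the bracketed flag string the kernel prints for a status word."""
    fields = [
        "Over" if bit(status, OVER) else "-",    # an earlier error was overwritten
        "UE" if bit(status, UC) else "CE",       # uncorrected vs. corrected error
        "MiscV" if bit(status, MISCV) else "-",  # MISC register holds valid data
        "PCC" if bit(status, PCC) else "-",      # processor context corrupt
        "AddrV" if bit(status, ADDRV) else "-",  # ADDR register holds valid data
    ]
    if bit(status, CECC):
        fields.append("CECC")   # correctable ECC error
    elif bit(status, UECC):
        fields.append("UECC")   # uncorrectable ECC error
    return "|".join(fields)


# The two status values that appear in the log excerpt above.
for raw in (0xD40040004A000136, 0x940040000000018A):
    print(f"{raw:#018x} -> [{decode_mc_status(raw)}]")
```

Both values decode to CE with no PCC flag, matching the bracketed text in the log. In other words, the entries describe errors the hardware reported as corrected, not uncorrected ones.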
|
|
|
02-15-2013, 02:28 AM
|
#2
|
Member
Registered: Nov 2006
Location: Europe,Latvia,Riga
Distribution: slackware,slax, OS X, exMandriva
Posts: 591
Rep:
|
I am not a specialist in this, but to me it looks like something with the processor or motherboard.
First I would check the RAM with one of the memtest programs: download UBCD, write it to a CD or a USB flash drive, make it bootable, then boot from it and choose one of the memtests.
The L2 cache, as I understand it, sits in the processor. Do you monitor temperatures and voltages via sensors? Is everything OK there? If yes, I would start by taking the processor out of the mobo, checking that all its contacts are clean, without dust and debris, and seating it again. Refit the heatsink with new thermal paste. Reseat the RAM modules, and look at the mobo's electrolytic capacitors: is everything OK with them? If yes, try to power on again. If the warnings remain, try swapping in another PSU. If none of that helps, then I don't know; try swapping in another CPU and see whether it helps, and if not, try checking the mainboard...
Last edited by WiseDraco; 02-15-2013 at 02:30 AM.
|
|
|
02-15-2013, 02:59 AM
|
#3
|
Member
Registered: Jan 2012
Location: Directly above the center of the earth
Distribution: Slackware. There's something else?
Posts: 383
Original Poster
Rep:
|
Sorry...I already did a Memtest86 and the RAM is all good. The temp on the cpu is at 107F, which is nice and cool. Nothing wrong with the cpu or MoBo or RAM as the backup hdd runs just fine. I'm posting from it right now. I've been on this hdd for the past few hours now and nothing pops up on this hdd.
I did get Parted Magic and booted with it and did some tests and it looks like it's more than likely the hdd is going out, though the tests weren't conclusive that anything is or will happen soon. I have to figure since this hdd is working fine and the other keeps popping up 'warnings' that it's the hdd going out. Thank goodness for backing up!
Thank you though for your input.
|
|
|
02-15-2013, 03:07 AM
|
#4
|
Member
Registered: Nov 2006
Location: Europe,Latvia,Riga
Distribution: slackware,slax, OS X, exMandriva
Posts: 591
Rep:
|
Very strange to me: the logs talk about the CPU and the L1 and L2 caches, and as I understand it, that has no connection to the HDD.
Did you check that HDD with smartctl -a /dev/sda?
|
|
|
02-15-2013, 03:48 AM
|
#5
|
LQ Guru
Registered: Oct 2005
Location: $RANDOM
Distribution: slackware64
Posts: 12,928
|
Try running:
http://www.mersenne.org/freesoft/#source
Run it in mode 1 to test for CPU issues.
|
|
|
02-15-2013, 04:10 AM
|
#6
|
Member
Registered: Apr 2009
Location: Oz
Distribution: slackware64-14.0
Posts: 875
|
It is your CPU, but it's not necessarily faulty; it may just be a kernel incompatibility.
What you need to do is compile the very latest kernel from kernel.org and test it with that.
|
|
|
02-15-2013, 07:56 AM
|
#7
|
Member
Registered: Jan 2012
Location: Directly above the center of the earth
Distribution: Slackware. There's something else?
Posts: 383
Original Poster
Rep:
|
@WiseDraco - No, but I will shortly and give the results.
@H_TeXMeX_H - Download the source or what? What is that anyway? It looks like something similar to BOINC just not distributed computing (which I'm running with seti@home and is working fine on the backup hdd).
@wildwizard - But why all of a sudden out of the blue after almost a week of being fine? And why isn't my backup hdd making it do the same thing? Not doubting you, it just doesn't make any sense to me and mentioning it is all.
Last edited by irgunII; 02-15-2013 at 08:19 AM.
|
|
|
02-15-2013, 08:12 AM
|
#8
|
Member
Registered: Sep 2008
Location: The Netherlands
Distribution: Slackware64 current
Posts: 594
Rep:
|
Mersenne is a program which searches for Mersenne prime numbers and is highly optimized, so it stresses your processor and memory extremely hard. If there is a problem with your processor or RAM, it will report errors. Even computers that normally show no errors can produce errors under this load, so it's a very good test.
|
|
|
02-15-2013, 08:43 AM
|
#9
|
Member
Registered: Jan 2012
Location: Directly above the center of the earth
Distribution: Slackware. There's something else?
Posts: 383
Original Poster
Rep:
|
@whizje - Gotcha. Thanks. I used all the CPU tests that come on the latest Parted Magic, which were similar IIRC. The CPU was flying without error and doing well against the other CPUs in the list.
Here's what I got for the #smartctl command on sda:
SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   200   200   051    Pre-fail  Always       -       3
  3 Spin_Up_Time            0x0003   162   154   021    Pre-fail  Always       -       2891
  4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       25
  5 Reallocated_Sector_Ct   0x0033   199   199   140    Pre-fail  Always       -       2
  7 Seek_Error_Rate         0x000e   100   253   000    Old_age   Always       -       0
  9 Power_On_Hours          0x0032   051   051   000    Old_age   Always       -       36012
 10 Spin_Retry_Count        0x0012   100   253   000    Old_age   Always       -       0
 11 Calibration_Retry_Count 0x0012   100   100   000    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       507
192 Power-Off_Retract_Count 0x0032   200   200   000    Old_age   Always       -       279
193 Load_Cycle_Count        0x0032   200   200   000    Old_age   Always       -       507
194 Temperature_Celsius     0x0022   112   100   000    Old_age   Always       -       31
196 Reallocated_Event_Count 0x0032   198   198   000    Old_age   Always       -       2
197 Current_Pending_Sector  0x0012   200   200   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0010   200   200   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x0008   200   200   000    Old_age   Offline      -       0
This test is on Parted Magic also, and it told me that when Raw_Read_Error_Rate, Reallocated_Sector_Ct or Reallocated_Event_Count has a Raw Value of anything but zero, it's possibly time to start thinking about a new hdd.
|
|
|
02-15-2013, 09:00 AM
|
#11
|
Member
Registered: Nov 2006
Location: Europe,Latvia,Riga
Distribution: slackware,slax, OS X, exMandriva
Posts: 591
Rep:
|
I think the raw read error rate is nothing to be afraid of; I have that value at 1050 right now. Reallocated sectors are not good, but if the number is small and does not increase, then, IMHO, it is nothing tragic.
Try running HDD Regenerator on that disk? But for me, I don't see any way the disk could cause those problems...
http://www.dposoft.net/hdd.html
|
|
|
02-15-2013, 09:31 AM
|
#12
|
LQ Guru
Registered: Oct 2005
Location: $RANDOM
Distribution: slackware64
Posts: 12,928
|
Quote:
Originally Posted by irgunII
@whizje - Gotcha. Thanks. I used all the CPU tests that come on the latest Parted Magic, which were similar IIRC. The CPU was flying without error and doing well against the other CPUs in the list.
Here's what I got for the #smartctl command on sda:
Code:
SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
1 Raw_Read_Error_Rate 0x000f 200 200 051 Pre-fail Always - 3
3 Spin_Up_Time 0x0003 162 154 021 Pre-fail Always - 2891
4 Start_Stop_Count 0x0032 100 100 000 Old_age Always - 25
5 Reallocated_Sector_Ct 0x0033 199 199 140 Pre-fail Always - 2
7 Seek_Error_Rate 0x000e 100 253 000 Old_age Always - 0
9 Power_On_Hours 0x0032 051 051 000 Old_age Always - 36012
10 Spin_Retry_Count 0x0012 100 253 000 Old_age Always - 0
11 Calibration_Retry_Count 0x0012 100 100 000 Old_age Always - 0
12 Power_Cycle_Count 0x0032 100 100 000 Old_age Always - 507
192 Power-Off_Retract_Count 0x0032 200 200 000 Old_age Always - 279
193 Load_Cycle_Count 0x0032 200 200 000 Old_age Always - 507
194 Temperature_Celsius 0x0022 112 100 000 Old_age Always - 31
196 Reallocated_Event_Count 0x0032 198 198 000 Old_age Always - 2
197 Current_Pending_Sector 0x0012 200 200 000 Old_age Always - 0
198 Offline_Uncorrectable 0x0010 200 200 000 Old_age Offline - 0
199 UDMA_CRC_Error_Count 0x003e 200 200 000 Old_age Always - 0
200 Multi_Zone_Error_Rate 0x0008 200 200 000 Old_age Offline - 0
This test is on Parted Magic also, and it told me that when Raw_Read_Error_Rate, Reallocated_Sector_Ct or Reallocated_Event_Count has a Raw Value of anything but zero, it's possibly time to start thinking about a new hdd.
|
I don't see anything to worry about there. If you still are concerned that it may be the HDD, you can run a SMART long test.
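One way to check that reading of the table: smartctl treats an attribute as failing when its normalized VALUE drops to or below THRESH, while the raw counts are vendor-specific. The sketch below parses a few of the rows quoted above, assuming the usual ten-column layout. The helper names and the choice of "watch" IDs (5, 196, 197, 198) are illustrative assumptions, not something smartctl itself defines.

```python
# A few rows copied from the quoted smartctl output.
SAMPLE = """\
  1 Raw_Read_Error_Rate     0x000f   200   200   051    Pre-fail  Always       -       3
  5 Reallocated_Sector_Ct   0x0033   199   199   140    Pre-fail  Always       -       2
196 Reallocated_Event_Count 0x0032   198   198   000    Old_age   Always       -       2
197 Current_Pending_Sector  0x0012   200   200   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0010   200   200   000    Old_age   Offline      -       0
"""

# Sector-health attributes whose raw count is worth tracking (an assumption).
WATCH_IDS = {5, 196, 197, 198}


def parse_attributes(text):
    """Turn attribute rows into dicts; skip headers and anything malformed."""
    rows = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) < 10 or not parts[0].isdigit():
            continue
        rows.append({
            "id": int(parts[0]),
            "name": parts[1],
            "value": int(parts[3]),
            "worst": int(parts[4]),
            "thresh": int(parts[5]),
            "raw": int(parts[9]),
        })
    return rows


def below_threshold(rows):
    """Names of attributes whose normalized value is at or below threshold."""
    return [r["name"] for r in rows
            if r["thresh"] > 0 and r["value"] <= r["thresh"]]


def nonzero_watched(rows):
    """Raw counts of the watched sector attributes that are not zero."""
    return {r["name"]: r["raw"] for r in rows
            if r["id"] in WATCH_IDS and r["raw"] > 0}


rows = parse_attributes(SAMPLE)
print("at or below threshold:", below_threshold(rows))
print("non-zero sector counters:", nonzero_watched(rows))
```

On the posted data nothing is at or below its threshold, and the only non-zero sector counters are the two reallocation counts (raw value 2). Re-running the check later would show whether that number grows.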
|
|
|
02-15-2013, 01:49 PM
|
#13
|
Member
Registered: Jan 2012
Location: Directly above the center of the earth
Distribution: Slackware. There's something else?
Posts: 383
Original Poster
Rep:
|
Ran all three hdd tests, and the one I posted above was the only one with anything 'negative' to say (the two short ones and the 39-minute one).
Does it not make any sense to everyone else, though, that my main hdd starts, out of the blue one day, to get those 'warnings', yet when I restart the system and boot into my backup hdd nothing happens at all and the backup hdd runs like my main one did for almost a week? I understand that the warnings said L1 cache and such and that that is something to do with the cpu, but does anyone know how the cpu can be affected by one hdd and not another on the same system? It's really bugging the heck out of me, and I hate not having a backup hdd and don't have the money to get another until next month.
Is it possible that maybe a SATA cable can make the cpu hiccup and burp and send out warnings? Or possibly the SATA port on the MoBo? I can't think of anything else (which doesn't mean much, heh).
|
|
|
02-15-2013, 01:57 PM
|
#14
|
LQ Guru
Registered: Oct 2005
Location: $RANDOM
Distribution: slackware64
Posts: 12,928
|
The L1 cache is on the CPU die, so I doubt anything can affect it, the HDD included.
However, if the HDD became corrupt, it could be software that is reporting it incorrectly, but the HDD seems fine.
|
|
|
02-15-2013, 02:53 PM
|
#15
|
Member
Registered: Nov 2006
Location: Europe,Latvia,Riga
Distribution: slackware,slax, OS X, exMandriva
Posts: 591
Rep:
|
Quote:
Originally Posted by H_TeXMeX_H
The L1 cache is on the CPU die, so I doubt anything can affect it, the HDD included.
However, if the HDD became corrupt, it could be software that is reporting it incorrectly, but the HDD seems fine.
|
+1
The log file reports CPU / L1 / L2 errors, and theoretically those things have no connection with the hdd or RAM. Something like this may be caused (I think) by a bad PSU, the mb, or the CPU itself. The best approach is to get hold of those parts and swap them with yours to see what happens.
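To see which CPU units the posted entries actually name, the description lines can be tallied. This is a rough Python sketch, assuming the messages keep the "<Unit> Error: ..." shape seen in the first post. The regex and the function name tally_error_units are illustrative only.

```python
import re
from collections import Counter

# Matches description lines such as
# "[Hardware Error]: Data Cache Error: during L1 linefill from L2."
# and captures the unit name ("Data Cache Error").
ERROR_RE = re.compile(r"\[Hardware Error\]:\s*([A-Za-z ]+ Error): ")


def tally_error_units(lines):
    """Count how often each '<Unit> Error' description appears."""
    counts = Counter()
    for line in lines:
        match = ERROR_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


# A few lines copied from the syslog excerpt in the first post.
excerpt = [
    "[Hardware Error]: Data Cache Error: during L1 linefill from L2.",
    "[Hardware Error]: cache level: L2, tx: DATA, mem-tx RD",
    "[Hardware Error]:CPU:0^IMC2_STATUS[-|CE|-|-|AddrV|CECC]: 0x940040000000018a",
    "[Hardware Error]: Bus Unit Error: SNP error during data copyback.",
    "[Hardware Error]: cache level: L2, tx: GEN, mem-tx: SNP",
    "[Hardware Error]: Data Cache Error: during L1 linefill from L2.",
]

for unit, count in tally_error_units(excerpt).most_common():
    print(f"{unit}: {count}")
```

Every description in the excerpt names a unit inside the CPU (data cache or bus unit); none of them mentions a disk.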
|
|
|
All times are GMT -5. The time now is 09:16 AM.