LinuxQuestions.org


slackhack 05-28-2008 03:41 PM

smart prefailure errors on new hard drive
 
I installed a new 250GB Maxtor hard drive a couple of weeks ago, and SMART is giving this message in the daily logs:

Code:

/dev/hdb :
    Prefailure: Raw_Read_Error_Rate (1) changed to
      114, 116, 114, 115, 110, 105, 110, 112, 111, 110, 105, 111,
      112, 108, 103, 102, 105, 107, 102, 109, 107, 108, 107, 108,
      109, 111, 114, 112, 108, 104, 105, 107,


But manual SMART testing says it's okay:

Code:

root@moe:~ # smartctl -l error /dev/hdb
smartctl version 5.38 [i686-pc-linux-gnu] Copyright (C) 2002-8 Bruce Allen
Home page is http://smartmontools.sourceforge.net/

=== START OF READ SMART DATA SECTION ===
SMART Error Log Version: 1
No Errors Logged

root@moe:~ # smartctl -l selftest /dev/hdb
smartctl version 5.38 [i686-pc-linux-gnu] Copyright (C) 2002-8 Bruce Allen
Home page is http://smartmontools.sourceforge.net/

=== START OF READ SMART DATA SECTION ===
SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline      Completed without error      00%      3639        -
# 2  Short offline      Completed without error      00%      3639        -
# 3  Short offline      Completed without error      00%      3639

Which one should I believe? Are those prefailure numbers normal? I just wonder whether I should be concerned there's a problem or not.

GrapefruiTgirl 05-28-2008 04:26 PM

Many of the attributes reported by SMART start at a baseline of 100 and drift from there over time, generally downward. Obvious exceptions are things like temperature and power-on hours.

Try "smartctl -A /dev/hdb" for a list such as the following:

Code:

sh-3.1# smartctl -A /dev/hda
smartctl version 5.37 [i686-pc-linux-gnu] Copyright (C) 2002-6 Bruce Allen
Home page is http://smartmontools.sourceforge.net/

=== START OF READ SMART DATA SECTION ===
SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG    VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate    0x000f  111  096  006    Pre-fail  Always      -      37767785
  3 Spin_Up_Time            0x0003  096  095  000    Pre-fail  Always      -      0
  4 Start_Stop_Count        0x0032  100  100  020    Old_age  Always      -      126
  5 Reallocated_Sector_Ct  0x0033  100  100  036    Pre-fail  Always      -      0
  7 Seek_Error_Rate        0x000f  072  060  030    Pre-fail  Always      -      30210708769
  9 Power_On_Hours          0x0032  093  093  000    Old_age  Always      -      6175
 10 Spin_Retry_Count        0x0013  100  100  097    Pre-fail  Always      -      0
 12 Power_Cycle_Count      0x0032  100  100  020    Old_age  Always      -      301
187 Unknown_Attribute      0x0032  100  100  000    Old_age  Always      -      0
189 Unknown_Attribute      0x003a  100  100  000    Old_age  Always      -      0
190 Temperature_Celsius    0x0022  064  054  045    Old_age  Always      -      639631396
194 Temperature_Celsius    0x0022  036  046  000    Old_age  Always      -      36 (Lifetime Min/Max 0/20)
195 Hardware_ECC_Recovered  0x001a  065  056  000    Old_age  Always      -      113790004
197 Current_Pending_Sector  0x0012  100  100  000    Old_age  Always      -      0
198 Offline_Uncorrectable  0x0010  100  100  000    Old_age  Offline      -      0
199 UDMA_CRC_Error_Count    0x003e  200  200  000    Old_age  Always      -      0
200 Multi_Zone_Error_Rate  0x0000  100  253  000    Old_age  Offline      -      0
202 TA_Increase_Count      0x0032  100  253  000    Old_age  Always      -      0

sh-3.1#

So you can see many standard defaults of 100 in the first column (VALUE), followed by a "worst recorded" value (WORST), and finally a threshold (THRESH). I am not certain, but I believe the threshold is an "if you reach this value you're screwed" type of value. A threshold of 000 seems to mean it is irrelevant.
Basically, if VALUE is hanging around its default of 100 (or 200 in some cases), WORST isn't too far removed from that on a regular basis, AND neither has dropped to or below THRESH, I don't worry. Any row that reads "100 100 ###" all the time, I don't worry about.
After some regular monitoring of a device, you get familiar with its usual values.
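
For anyone who wants to automate that eyeballing, here is a rough Python sketch. It is not part of smartmontools: the function names (parse_attributes, margin, at_risk) are made up for illustration, and the sample rows are copied from the output above. It assumes the smartmontools convention that an attribute counts as failed once VALUE is at or below THRESH, and it treats a THRESH of 000 as "no failure threshold".

```python
import re

# Sample rows copied from the smartctl -A output quoted above.
SAMPLE = """\
  1 Raw_Read_Error_Rate    0x000f  111  096  006    Pre-fail  Always      -      37767785
  7 Seek_Error_Rate        0x000f  072  060  030    Pre-fail  Always      -      30210708769
 10 Spin_Retry_Count        0x0013  100  100  097    Pre-fail  Always      -      0
194 Temperature_Celsius    0x0022  036  046  000    Old_age  Always      -      36 (Lifetime Min/Max 0/20)
"""

# ID, NAME, FLAG (hex, ignored), VALUE, WORST, THRESH, TYPE
ROW = re.compile(
    r"^\s*(\d+)\s+(\S+)\s+0x[0-9a-fA-F]+\s+(\d+)\s+(\d+)\s+(\d+)\s+(\S+)"
)


def parse_attributes(text):
    """Turn smartctl -A attribute rows into a list of dicts (hypothetical helper)."""
    rows = []
    for line in text.splitlines():
        m = ROW.match(line)
        if m:
            rows.append({
                "id": int(m.group(1)),
                "name": m.group(2),
                "value": int(m.group(3)),
                "worst": int(m.group(4)),
                "thresh": int(m.group(5)),
                "type": m.group(6),
            })
    return rows


def margin(attr):
    """Headroom between VALUE and THRESH; None when THRESH is 0 (assumed: no threshold)."""
    if attr["thresh"] == 0:
        return None
    return attr["value"] - attr["thresh"]


def at_risk(attrs):
    """Names of attributes whose VALUE has dropped to or below a non-zero THRESH."""
    return [a["name"] for a in attrs
            if a["thresh"] > 0 and a["value"] <= a["thresh"]]


if __name__ == "__main__":
    attrs = parse_attributes(SAMPLE)
    for a in attrs:
        print(a["name"], "margin:", margin(a))
    print("at risk:", at_risk(attrs))
```

On the sample rows, Raw_Read_Error_Rate sits 105 points above its threshold while Spin_Retry_Count sits only 3 above, simply because that threshold (097) is set close to the default of 100. The margin is easier to compare across rows than the bare VALUE.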

If someone knows a better way to interpret this stuff, do tell! Or perhaps a site where various drives' defaults can be read and compared to our own?


You can also use the --all flag to get the full SMART capability and usage info from the drive, including recommended testing intervals and the like. The -H flag gives a brief health status, and -h is for help.
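
If you want to check the -H result from a script, something like the following Python sketch could work. The helper names (parse_health, read_health) are hypothetical, and it assumes an ATA drive, where smartctl -H prints a line of the form "SMART overall-health self-assessment test result: PASSED". Other drive types word this differently, in which case the parser returns None.

```python
import subprocess


def parse_health(output):
    """Extract the verdict from smartctl -H text, or None if the line is absent."""
    for line in output.splitlines():
        if "overall-health self-assessment test result" in line:
            return line.split(":", 1)[1].strip()
    return None


def read_health(device):
    """Run smartctl -H on a device (usually needs root) and return the verdict."""
    result = subprocess.run(
        ["smartctl", "-H", device],
        capture_output=True,  # Python 3.7+
        text=True,
    )
    return parse_health(result.stdout)
```

Calling read_health("/dev/hdb") as root would return the verdict string for that drive; parse_health can be tried on saved output without touching any hardware.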

The message in your logs is more of an INFO sort of message, rather than cause for concern.

Sasha

rtspitz 05-28-2008 04:51 PM

I was once confused by the "pre failure" attributes as well.

They're just called "pre-failure" to indicate that IF those numbers go out of range, the drive will most likely be dead within a short time; they are the critical parameters, that is.

As long as the SMART self-tests are OK and smartctl doesn't come up with "failing now", I would just keep an eye on them, but not worry too much about it.
Interpretation of those numbers can be a bit tricky though as they are usually normalized to some value.
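
To make the distinction concrete, here is a small Python sketch of how the TYPE and WHEN_FAILED columns of smartctl -A might be combined into a rough verdict. The function name classify and the verdict strings are invented for illustration. It assumes WHEN_FAILED is one of "-", "In_the_past" or "FAILING_NOW", which is how smartctl prints that column.

```python
def classify(attr_type, when_failed):
    """Rough verdict for one attribute row (hypothetical helper).

    attr_type:   "Pre-fail" or "Old_age" (the TYPE column)
    when_failed: "-", "In_the_past" or "FAILING_NOW" (the WHEN_FAILED column)
    """
    if when_failed == "FAILING_NOW":
        # A pre-fail attribute at or below its threshold suggests imminent
        # failure; an old-age one suggests the drive is simply worn out.
        return "replace soon" if attr_type == "Pre-fail" else "worn out"
    if when_failed == "In_the_past":
        # It crossed the threshold at some point but has since recovered.
        return "watch"
    return "ok"


if __name__ == "__main__":
    print(classify("Pre-fail", "-"))            # a normal row
    print(classify("Pre-fail", "FAILING_NOW"))  # the case to worry about
```

A row with "-" in WHEN_FAILED comes out as "ok" whatever its TYPE, which matches the point above: "Pre-fail" describes what a failure of that attribute would mean, not that one is happening.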

slackhack 05-28-2008 06:41 PM

Thanks for the detailed responses, GrapefruiTgirl and rtspitz. I had seen someone in an email archive somewhere saying that if it got much above 100 you should start worrying, but I guess that was a little off. I'm going to run smartctl -A and see what those values say, thanks. :cool:

farslayer 05-29-2008 10:37 AM

If you are concerned about the drive, you can always grab the hard drive test program from the drive manufacturer's website.

For instance: SeaTools for DOS (bootable ISO image).
Most drive manufacturers provide similar tools for testing their hard drives.


All times are GMT -5.