imminent drive failure or overly sensitive monitoring?

LovinFedora · 01-01-2015, 06:23 PM

Hello, I am pretty much a newbie to Linux, though I have been using it for a few years. I am running Fedora 21 Workstation on a Toshiba Satellite C55-A laptop and recently got the following notifications on my KDE desktop:

WARNING: Your hard drive is failing Device: /dev/sda [SAT], 1 Currently unreadable (pending) sectors

Device: /dev/sda [SAT], 1 Offline uncorrectable sectors

WARNING: Your hard drive is failing

I got these twice in the last week. I ran tests using smartmontools, both extended and short, and here are the complete results. I want to know: is failure imminent, or do I have a few months or more? Or is it just overly sensitive monitoring? Thank you in advance.
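For reference, reports like the ones below come from smartctl, part of smartmontools. A minimal Python sketch of driving it, assuming root privileges and the /dev/sda device named in the warnings. The helper names `smartctl_cmd` and `run` are hypothetical; `-t short`, `-t long` and `-a` are standard smartctl options. The sketch only prints the commands unless `dry_run=False` is passed:

```python
import subprocess

# Assumed device path, taken from the warnings above; smartctl needs root.
DEVICE = "/dev/sda"


def smartctl_cmd(*args: str, device: str = DEVICE) -> list[str]:
    """Build a smartctl command line as an argument list."""
    return ["smartctl", *args, device]


# Starting a self-test returns immediately; the drive runs it in the
# background and records the outcome in its self-test log.
start_short = smartctl_cmd("-t", "short")  # about 1 minute on this drive
start_long = smartctl_cmd("-t", "long")    # about 144 minutes on this drive
# Full report: identity, attributes, error log and self-test log.
full_report = smartctl_cmd("-a")


def run(cmd: list[str], dry_run: bool = True) -> str:
    """Return the command as text, or execute it when dry_run is False."""
    if dry_run:
        return " ".join(cmd)
    result = subprocess.run(cmd, capture_output=True, text=True)
    # smartctl's exit status is a bit mask, so a nonzero code does not
    # necessarily mean the command itself failed.
    return result.stdout


if __name__ == "__main__":
    for cmd in (start_short, full_report):
        print(run(cmd))
```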

Short test output
smartctl 6.2 2014-07-16 r3952 [x86_64-linux-3.17.7-300.fc21.x86_64] (local build)
Copyright (C) 2002-13, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family: Seagate Momentus 5400.6
Device Model: ST9500325AS
Serial Number: 5VE8CHS0
LU WWN Device Id: 5 000c50 01f6059c6
Firmware Version: 0002BSM1
User Capacity: 500,107,862,016 bytes [500 GB]
Sector Size: 512 bytes logical/physical
Rotation Rate: 5400 rpm
Device is: In smartctl database [for details use: -P show]
ATA Version is: ATA8-ACS T13/1699-D revision 4
SATA Version is: SATA 2.6, 1.5 Gb/s
Local Time is: Thu Jan 1 14:49:24 2015 EST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
See vendor-specific Attribute list for marginal Attributes.

General SMART Values:
Offline data collection status: (0x00) Offline data collection activity
was never started.
Auto Offline Data Collection: Disabled.
Self-test execution status: ( 0) The previous self-test routine completed
without error or no self-test has ever
been run.
Total time to complete Offline
data collection: ( 0) seconds.
Offline data collection
capabilities: (0x73) SMART execute Offline immediate.
Auto Offline data collection on/off support.
Suspend Offline collection upon new
command.
No Offline surface scan supported.
Self-test supported.
Conveyance Self-test supported.
Selective Self-test supported.
SMART capabilities: (0x0003) Saves SMART data before entering
power-saving mode.
Supports SMART auto save timer.
Error logging capability: (0x01) Error logging supported.
General Purpose Logging supported.
Short self-test routine
recommended polling time: ( 1) minutes.
Extended self-test routine
recommended polling time: ( 144) minutes.
Conveyance self-test routine
recommended polling time: ( 2) minutes.
SCT capabilities: (0x103b) SCT Status supported.
SCT Error Recovery Control supported.
SCT Feature Control supported.
SCT Data Table supported.

SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
1 Raw_Read_Error_Rate 0x000f 110 095 006 Pre-fail Always - 167400250
3 Spin_Up_Time 0x0003 098 097 000 Pre-fail Always - 0
4 Start_Stop_Count 0x0032 100 100 020 Old_age Always - 967
5 Reallocated_Sector_Ct 0x0033 100 100 036 Pre-fail Always - 2
7 Seek_Error_Rate 0x000f 078 060 030 Pre-fail Always - 65045670
9 Power_On_Hours 0x0032 098 098 000 Old_age Always - 2246
10 Spin_Retry_Count 0x0013 100 100 097 Pre-fail Always - 0
12 Power_Cycle_Count 0x0032 100 037 020 Old_age Always - 541
184 End-to-End_Error 0x0032 100 100 099 Old_age Always - 0
187 Reported_Uncorrect 0x0032 088 088 000 Old_age Always - 12
188 Command_Timeout 0x0032 100 099 000 Old_age Always - 4295032836
189 High_Fly_Writes 0x003a 100 100 000 Old_age Always - 0
190 Airflow_Temperature_Cel 0x0022 059 035 045 Old_age Always In_the_past 41 (38 86 44 36 0)
191 G-Sense_Error_Rate 0x0032 100 100 000 Old_age Always - 23
192 Power-Off_Retract_Count 0x0032 100 100 000 Old_age Always - 20
193 Load_Cycle_Count 0x0032 096 096 000 Old_age Always - 8929
194 Temperature_Celsius 0x0022 041 065 000 Old_age Always - 41 (0 16 0 0 0)
195 Hardware_ECC_Recovered 0x001a 045 031 000 Old_age Always - 167400250
197 Current_Pending_Sector 0x0012 100 100 000 Old_age Always - 1
198 Offline_Uncorrectable 0x0010 100 100 000 Old_age Offline - 1
199 UDMA_CRC_Error_Count 0x003e 200 200 000 Old_age Always - 0

SMART Error Log Version: 1
ATA Error Count: 11 (device log contains only the most recent five errors)
CR = Command Register [HEX]
FR = Features Register [HEX]
SC = Sector Count Register [HEX]
SN = Sector Number Register [HEX]
CL = Cylinder Low Register [HEX]
CH = Cylinder High Register [HEX]
DH = Device/Head Register [HEX]
DC = Device Command Register [HEX]
ER = Error register [HEX]
ST = Status register [HEX]
Powered_Up_Time is measured from power on, and printed as
DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,
SS=sec, and sss=millisec. It "wraps" after 49.710 days.

Error 11 occurred at disk power-on lifetime: 2244 hours (93 days + 12 hours)
When the command that caused the error occurred, the device was active or idle.

After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
40 51 00 ff ff ff 0f Error: UNC at LBA = 0x0fffffff = 268435455

Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
60 00 68 ff ff ff 4f 00 5d+16:35:34.722 READ FPDMA QUEUED
60 00 08 ff ff ff 4f 00 5d+16:35:34.721 READ FPDMA QUEUED
60 00 08 ff ff ff 4f 00 5d+16:35:34.526 READ FPDMA QUEUED
60 00 08 ff ff ff 4f 00 5d+16:35:34.437 READ FPDMA QUEUED
60 00 08 ff ff ff 4f 00 5d+16:35:34.433 READ FPDMA QUEUED

Error 10 occurred at disk power-on lifetime: 2049 hours (85 days + 9 hours)
When the command that caused the error occurred, the device was active or idle.

After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
40 51 00 ff ff ff 0f Error: UNC at LBA = 0x0fffffff = 268435455

Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
60 00 08 ff ff ff 4f 00 15:42:59.485 READ FPDMA QUEUED
60 00 80 ff ff ff 4f 00 15:42:59.480 READ FPDMA QUEUED
60 00 08 ff ff ff 4f 00 15:42:59.477 READ FPDMA QUEUED
60 00 20 ff ff ff 4f 00 15:42:59.471 READ FPDMA QUEUED
60 00 18 ff ff ff 4f 00 15:42:59.471 READ FPDMA QUEUED

Error 9 occurred at disk power-on lifetime: 803 hours (33 days + 11 hours)
When the command that caused the error occurred, the device was active or idle.

After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
40 51 00 ff ff ff 0f Error: UNC at LBA = 0x0fffffff = 268435455

Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
25 00 00 ff ff ff 4f 00 00:02:41.646 READ DMA EXT
25 00 00 ff ff ff 4f 00 00:02:41.537 READ DMA EXT
25 00 00 ff ff ff 4f 00 00:02:41.451 READ DMA EXT
25 00 00 ff ff ff 4f 00 00:02:38.724 READ DMA EXT
25 00 00 ff ff ff 4f 00 00:02:38.672 READ DMA EXT

Error 8 occurred at disk power-on lifetime: 803 hours (33 days + 11 hours)
When the command that caused the error occurred, the device was active or idle.

After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
40 51 00 ff ff ff 0f Error: UNC at LBA = 0x0fffffff = 268435455

Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
25 00 00 ff ff ff 4f 00 00:02:38.724 READ DMA EXT
25 00 00 ff ff ff 4f 00 00:02:38.672 READ DMA EXT
25 00 00 ff ff ff 4f 00 00:02:38.649 READ DMA EXT
25 00 00 ff ff ff 4f 00 00:02:38.641 READ DMA EXT
25 00 00 ff ff ff 4f 00 00:02:38.633 READ DMA EXT

Error 7 occurred at disk power-on lifetime: 803 hours (33 days + 11 hours)
When the command that caused the error occurred, the device was active or idle.

After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
40 51 00 ff ff ff 0f Error: UNC at LBA = 0x0fffffff = 268435455

Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
25 00 00 ff ff ff 4f 00 00:02:35.993 READ DMA EXT
25 00 00 ff ff ff 4f 00 00:02:35.985 READ DMA EXT
25 00 00 ff ff ff 4f 00 00:02:35.977 READ DMA EXT
25 00 00 ff ff ff 4f 00 00:02:35.970 READ DMA EXT
25 00 00 ff ff ff 4f 00 00:02:35.962 READ DMA EXT

SMART Self-test log structure revision number 1
Num Test_Description Status Remaining LifeTime(hours) LBA_of_first_error
# 1 Short offline Completed without error 00% 2246 -
# 2 Short offline Completed without error 00% 2104 -
# 3 Short offline Completed without error 00% 2076 -
# 4 Extended offline Completed without error 00% 2065 -
# 5 Short offline Completed without error 00% 2059 -
# 6 Short offline Completed without error 00% 2058 -
# 7 Extended offline Completed without error 00% 2054 -
# 8 Short offline Completed: read failure 90% 2050 372497004
# 9 Short offline Completed without error 00% 2050 -
1 of 1 failed self-tests are outdated by newer successful extended offline self-test # 4

SMART Selective self-test log data structure revision number 1
SPAN MIN_LBA MAX_LBA CURRENT_TEST_STATUS
1 0 0 Not_testing
2 0 0 Not_testing
3 0 0 Not_testing
4 0 0 Not_testing
5 0 0 Not_testing
Selective self-test flags (0x0):
After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

Long test output
smartctl 6.2 2014-07-16 r3952 [x86_64-linux-3.17.7-300.fc21.x86_64] (local build)
Copyright (C) 2002-13, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family: Seagate Momentus 5400.6
Device Model: ST9500325AS
Serial Number: 5VE8CHS0
LU WWN Device Id: 5 000c50 01f6059c6
Firmware Version: 0002BSM1
User Capacity: 500,107,862,016 bytes [500 GB]
Sector Size: 512 bytes logical/physical
Rotation Rate: 5400 rpm
Device is: In smartctl database [for details use: -P show]
ATA Version is: ATA8-ACS T13/1699-D revision 4
SATA Version is: SATA 2.6, 1.5 Gb/s
Local Time is: Thu Jan 1 18:42:08 2015 EST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
See vendor-specific Attribute list for marginal Attributes.

General SMART Values:
Offline data collection status: (0x00) Offline data collection activity
was never started.
Auto Offline Data Collection: Disabled.
Self-test execution status: ( 121) The previous self-test completed having
the read element of the test failed.
Total time to complete Offline
data collection: ( 0) seconds.
Offline data collection
capabilities: (0x73) SMART execute Offline immediate.
Auto Offline data collection on/off support.
Suspend Offline collection upon new
command.
No Offline surface scan supported.
Self-test supported.
Conveyance Self-test supported.
Selective Self-test supported.
SMART capabilities: (0x0003) Saves SMART data before entering
power-saving mode.
Supports SMART auto save timer.
Error logging capability: (0x01) Error logging supported.
General Purpose Logging supported.
Short self-test routine
recommended polling time: ( 1) minutes.
Extended self-test routine
recommended polling time: ( 144) minutes.
Conveyance self-test routine
recommended polling time: ( 2) minutes.
SCT capabilities: (0x103b) SCT Status supported.
SCT Error Recovery Control supported.
SCT Feature Control supported.
SCT Data Table supported.

SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
1 Raw_Read_Error_Rate 0x000f 107 095 006 Pre-fail Always - 12986015
3 Spin_Up_Time 0x0003 098 097 000 Pre-fail Always - 0
4 Start_Stop_Count 0x0032 100 100 020 Old_age Always - 967
5 Reallocated_Sector_Ct 0x0033 100 100 036 Pre-fail Always - 2
7 Seek_Error_Rate 0x000f 078 060 030 Pre-fail Always - 65192573
9 Power_On_Hours 0x0032 098 098 000 Old_age Always - 2250
10 Spin_Retry_Count 0x0013 100 100 097 Pre-fail Always - 0
12 Power_Cycle_Count 0x0032 100 037 020 Old_age Always - 541
184 End-to-End_Error 0x0032 100 100 099 Old_age Always - 0
187 Reported_Uncorrect 0x0032 087 087 000 Old_age Always - 13
188 Command_Timeout 0x0032 100 099 000 Old_age Always - 4295032836
189 High_Fly_Writes 0x003a 100 100 000 Old_age Always - 0
190 Airflow_Temperature_Cel 0x0022 063 035 045 Old_age Always In_the_past 37 (38 86 46 36 0)
191 G-Sense_Error_Rate 0x0032 100 100 000 Old_age Always - 23
192 Power-Off_Retract_Count 0x0032 100 100 000 Old_age Always - 19
193 Load_Cycle_Count 0x0032 096 096 000 Old_age Always - 9019
194 Temperature_Celsius 0x0022 037 065 000 Old_age Always - 37 (0 16 0 0 0)
195 Hardware_ECC_Recovered 0x001a 047 031 000 Old_age Always - 12986015
197 Current_Pending_Sector 0x0012 100 100 000 Old_age Always - 1
198 Offline_Uncorrectable 0x0010 100 100 000 Old_age Offline - 1
199 UDMA_CRC_Error_Count 0x003e 200 200 000 Old_age Always - 0

SMART Error Log Version: 1
ATA Error Count: 11 (device log contains only the most recent five errors)
CR = Command Register [HEX]
FR = Features Register [HEX]
SC = Sector Count Register [HEX]
SN = Sector Number Register [HEX]
CL = Cylinder Low Register [HEX]
CH = Cylinder High Register [HEX]
DH = Device/Head Register [HEX]
DC = Device Command Register [HEX]
ER = Error register [HEX]
ST = Status register [HEX]
Powered_Up_Time is measured from power on, and printed as
DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,
SS=sec, and sss=millisec. It "wraps" after 49.710 days.

Error 11 occurred at disk power-on lifetime: 2244 hours (93 days + 12 hours)
When the command that caused the error occurred, the device was active or idle.

After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
40 51 00 ff ff ff 0f Error: UNC at LBA = 0x0fffffff = 268435455

Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
60 00 68 ff ff ff 4f 00 5d+16:35:34.722 READ FPDMA QUEUED
60 00 08 ff ff ff 4f 00 5d+16:35:34.721 READ FPDMA QUEUED
60 00 08 ff ff ff 4f 00 5d+16:35:34.526 READ FPDMA QUEUED
60 00 08 ff ff ff 4f 00 5d+16:35:34.437 READ FPDMA QUEUED
60 00 08 ff ff ff 4f 00 5d+16:35:34.433 READ FPDMA QUEUED

Error 10 occurred at disk power-on lifetime: 2049 hours (85 days + 9 hours)
When the command that caused the error occurred, the device was active or idle.

After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
40 51 00 ff ff ff 0f Error: UNC at LBA = 0x0fffffff = 268435455

Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
60 00 08 ff ff ff 4f 00 15:42:59.485 READ FPDMA QUEUED
60 00 80 ff ff ff 4f 00 15:42:59.480 READ FPDMA QUEUED
60 00 08 ff ff ff 4f 00 15:42:59.477 READ FPDMA QUEUED
60 00 20 ff ff ff 4f 00 15:42:59.471 READ FPDMA QUEUED
60 00 18 ff ff ff 4f 00 15:42:59.471 READ FPDMA QUEUED

Error 9 occurred at disk power-on lifetime: 803 hours (33 days + 11 hours)
When the command that caused the error occurred, the device was active or idle.

After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
40 51 00 ff ff ff 0f Error: UNC at LBA = 0x0fffffff = 268435455

Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
25 00 00 ff ff ff 4f 00 00:02:41.646 READ DMA EXT
25 00 00 ff ff ff 4f 00 00:02:41.537 READ DMA EXT
25 00 00 ff ff ff 4f 00 00:02:41.451 READ DMA EXT
25 00 00 ff ff ff 4f 00 00:02:38.724 READ DMA EXT
25 00 00 ff ff ff 4f 00 00:02:38.672 READ DMA EXT

Error 8 occurred at disk power-on lifetime: 803 hours (33 days + 11 hours)
When the command that caused the error occurred, the device was active or idle.

After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
40 51 00 ff ff ff 0f Error: UNC at LBA = 0x0fffffff = 268435455

Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
25 00 00 ff ff ff 4f 00 00:02:38.724 READ DMA EXT
25 00 00 ff ff ff 4f 00 00:02:38.672 READ DMA EXT
25 00 00 ff ff ff 4f 00 00:02:38.649 READ DMA EXT
25 00 00 ff ff ff 4f 00 00:02:38.641 READ DMA EXT
25 00 00 ff ff ff 4f 00 00:02:38.633 READ DMA EXT

Error 7 occurred at disk power-on lifetime: 803 hours (33 days + 11 hours)
When the command that caused the error occurred, the device was active or idle.

After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
40 51 00 ff ff ff 0f Error: UNC at LBA = 0x0fffffff = 268435455

Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
25 00 00 ff ff ff 4f 00 00:02:35.993 READ DMA EXT
25 00 00 ff ff ff 4f 00 00:02:35.985 READ DMA EXT
25 00 00 ff ff ff 4f 00 00:02:35.977 READ DMA EXT
25 00 00 ff ff ff 4f 00 00:02:35.970 READ DMA EXT
25 00 00 ff ff ff 4f 00 00:02:35.962 READ DMA EXT

SMART Self-test log structure revision number 1
Num Test_Description Status Remaining LifeTime(hours) LBA_of_first_error
# 1 Extended offline Completed: read failure 90% 2249 418430339
# 2 Short offline Completed without error 00% 2246 -
# 3 Short offline Completed without error 00% 2104 -
# 4 Short offline Completed without error 00% 2076 -
# 5 Extended offline Completed without error 00% 2065 -
# 6 Short offline Completed without error 00% 2059 -
# 7 Short offline Completed without error 00% 2058 -
# 8 Extended offline Completed without error 00% 2054 -
# 9 Short offline Completed: read failure 90% 2050 372497004
#10 Short offline Completed without error 00% 2050 -
1 of 2 failed self-tests are outdated by newer successful extended offline self-test # 5

SMART Selective self-test log data structure revision number 1
SPAN MIN_LBA MAX_LBA CURRENT_TEST_STATUS
1 0 0 Not_testing
2 0 0 Not_testing
3 0 0 Not_testing
4 0 0 Not_testing
5 0 0 Not_testing
Selective self-test flags (0x0):
After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
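The error-log header in both reports explains the Powered_Up_Time format (DDd+hh:mm:SS.sss, wrapping after 49.710 days). A small Python sketch for reading those stamps; the idea that the counter is a 32-bit millisecond timer is an assumption inferred from the wrap period (2^32 ms is about 49.710 days), not something the drive documents here:

```python
import re

# DDd+hh:mm:SS.sss; the "DDd+" prefix is absent when the day count is zero,
# as in "15:42:59.485" from the log.
_PUT = re.compile(r"^(?:(\d+)d\+)?(\d{2}):(\d{2}):(\d{2})\.(\d{3})$")

# Assumption: a 32-bit millisecond counter, hence the 49.710-day wrap.
WRAP_MS = 2**32


def powered_up_ms(stamp: str) -> int:
    """Convert a Powered_Up_Time stamp to milliseconds since power-on."""
    m = _PUT.match(stamp)
    if m is None:
        raise ValueError(f"unrecognised timestamp: {stamp!r}")
    days, hh, mm, ss, ms = (int(g) if g else 0 for g in m.groups())
    return (((days * 24 + hh) * 60 + mm) * 60 + ss) * 1000 + ms


print(round(WRAP_MS / 86_400_000, 3))             # 49.71
print(powered_up_ms("5d+16:35:34.722") / 3.6e6)   # hours since power-on
```

For example, Error 11's stamp of 5d+16:35:34.722 places it roughly 5 days and 16.5 hours into that power-on session.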

If more information is needed, please let me know. I have the drive encrypted with LUKS, and I don't know Linux well enough to tell whether this would affect the results. Also, though it is a laptop, I leave it on and running for a week or more at a time, as I run a Freenet node. Thank you again for your replies.

sag47 · 01-01-2015, 09:59 PM

Definitely imminent drive failure. The following values should be near zero.

Code:

1 Raw_Read_Error_Rate 0x000f 110 095 006 Pre-fail Always - 167400250
7 Seek_Error_Rate 0x000f 078 060 030 Pre-fail Always - 65045670
188 Command_Timeout 0x0032 100 099 000 Old_age Always - 4295032836
195 Hardware_ECC_Recovered 0x001a 045 031 000 Old_age Always - 167400250

With 2246 power-on hours I'd say it's had a good life. Replace it immediately. The error counts (the numbers all the way to the right) are extremely high. The errors do not appear to be due to a loose cable or anything similar; otherwise the following would have a value greater than zero.

Code:

199 UDMA_CRC_Error_Count 0x003e 200 200 000 Old_age Always - 0

It's definitely on its last legs of life.

SAM

LovinFedora · 01-01-2015, 10:08 PM

Quote:

Originally Posted by sag47

Definitely imminent drive failure. The following values should be near zero.

Thank you very much. I have 6 days until payday before I can order a new drive. I was really hoping this was just a case of overly sensitive monitoring. Time for a new drive, I guess. I appreciate your help.

sag47 · 01-01-2015, 10:14 PM

Quote:

Originally Posted by LovinFedora

Thank you very much. I have 6 days until payday before I can order a new drive. I was really hoping this was just a case of overly sensitive monitoring. Time for a new drive, I guess. I appreciate your help.

If you'd like to learn more, the Wikipedia article is pretty good.

http://en.wikipedia.org/wiki/S.M.A.R.T.

Your best bet, if you can, is to power off your drive until your new drive arrives.

metaschima · 01-01-2015, 10:22 PM

I would not say that it is failing imminently. The attributes look mostly OK, but there are bad blocks according to the long test. The drive is likely old and should be replaced soon. Back up your data if you haven't already done so.

LovinFedora · 01-01-2015, 10:59 PM

Thank you all very much. I am in the process of backing up all my data now.

rknichols · 01-02-2015, 09:59 AM

Quote:

Originally Posted by sag47

Definitely imminent drive failure. The following values should be near zero.

Code:

1 Raw_Read_Error_Rate 0x000f 110 095 006 Pre-fail Always - 167400250
7 Seek_Error_Rate 0x000f 078 060 030 Pre-fail Always - 65045670
188 Command_Timeout 0x0032 100 099 000 Old_age Always - 4295032836
195 Hardware_ECC_Recovered 0x001a 045 031 000 Old_age Always - 167400250

With 2246 power-on hours I'd say it's had a good life. Replace it immediately. The error counts (the numbers all the way to the right) are extremely high. The errors do not appear to be due to a loose cable or anything similar; otherwise the following would have a value greater than zero.

Code:

199 UDMA_CRC_Error_Count 0x003e 200 200 000 Old_age Always - 0

It's definitely on its last legs of life.

SAM

That is very, very wrong. Some drives, Seagate models in particular, store something other than a simple error count in those "RAW" numbers. It's typically split into two fields, one being the total number of operations (which will continue to increment rapidly and does wrap around). See http://www.pcreview.co.uk/forums/sea...-t4040327.html for a more complete description.
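To make the point concrete, here is a hedged Python sketch of the split commonly described for Seagate drives: the 48-bit raw value of Raw_Read_Error_Rate and Seek_Error_Rate holds an error count in the upper 16 bits and an operation count in the lower 32 bits. That layout is an assumption based on the description above, not a published Seagate specification. `split_words` only illustrates that the Command_Timeout number is not one huge count; it makes no claim about what each word means.

```python
def split_error_rate(raw: int) -> tuple[int, int]:
    """Split a 48-bit Seagate-style raw value into (errors, operations).

    Assumed layout: upper 16 bits = error count, lower 32 bits =
    operation count (the part that increments rapidly and wraps).
    """
    return raw >> 32, raw & 0xFFFF_FFFF


def split_words(raw: int) -> tuple[int, int, int]:
    """Split a 48-bit raw value into three 16-bit words, high to low."""
    return (raw >> 32) & 0xFFFF, (raw >> 16) & 0xFFFF, raw & 0xFFFF


# Raw values from the OP's short-test report.
for name, raw in [("Raw_Read_Error_Rate", 167400250),
                  ("Seek_Error_Rate", 65045670)]:
    errors, ops = split_error_rate(raw)
    print(f"{name}: errors={errors} operations={ops}")

print(split_words(4295032836))  # Command_Timeout -> (1, 1, 4)
```

Under this reading, both "huge" values decode to zero errors; they are operation counters, not error counts.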

The 2246 power-on hours is just over 93 days. That is a very young drive.
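The conversion is just division by 24; a trivial Python check, which also reproduces the "93 days + 12 hours" shown for Error 11 in the OP's error log:

```python
def hours_to_days(hours: int) -> str:
    """Express a power-on hour count as whole days plus leftover hours."""
    days, rem = divmod(hours, 24)
    return f"{hours} hours = {days} days + {rem} hours"


print(hours_to_days(2246))  # 2246 hours = 93 days + 14 hours
print(hours_to_days(2244))  # 2244 hours = 93 days + 12 hours
print(f"{2246 / 8760:.2f} years of continuous operation")  # 0.26
```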

I can only presume that you are making some kind of sick joke when you say the drive needs immediate replacement. The drive currently has just 3 bad sectors, two that have been reallocated and one more pending reallocation (it will be reallocated, or the error cleared, the next time that sector is written).
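The count of 3 comes from adding the raw values of attributes 5 (Reallocated_Sector_Ct) and 197 (Current_Pending_Sector). A minimal parsing sketch, assuming smartctl's default attribute table layout with the raw value in the tenth whitespace-separated column; the rows are copied from the OP's long-test report and `raw_values` is a hypothetical helper:

```python
# Rows copied from the long-test report.
ROWS = """\
5 Reallocated_Sector_Ct 0x0033 100 100 036 Pre-fail Always - 2
197 Current_Pending_Sector 0x0012 100 100 000 Old_age Always - 1
"""


def raw_values(text: str) -> dict[int, int]:
    """Map attribute ID -> leading raw number for each table row."""
    out = {}
    for line in text.splitlines():
        fields = line.split()
        # Columns: ID NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW
        if len(fields) >= 10 and fields[0].isdigit():
            out[int(fields[0])] = int(fields[9])
    return out


raw = raw_values(ROWS)
reallocated, pending = raw[5], raw[197]
print(f"bad sectors: {reallocated} reallocated + {pending} pending"
      f" = {reallocated + pending}")
```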

metaschima · 01-02-2015, 10:16 AM

Quote:

Originally Posted by rknichols

The 2246 power-on hours is just over 93 days. That is a very young drive.

Yeah, I guess it is not that old, so you could keep it and keep monitoring it for bad blocks.

rknichols · 01-02-2015, 10:26 AM

Quote:

Originally Posted by metaschima

Yeah, I guess it is not that old

Especially considering the OP's comment,

Quote:

I leave it on and running for a week or more at a time...

schneidz · 01-02-2015, 10:34 AM

s.m.a.r.t. seems to be very finicky. i have 2 usb drives that refuse to write because they were flagged as faulty and should be replaced soon. i have been using them as read-only video drives for my xbmc machine for more than 6 months.

metaschima · 01-02-2015, 10:42 AM

Basically, a drive is considered failing when:
1) A SMART attribute is failing: it shows "FAILING_NOW" in the WHEN_FAILED column.
2) A SMART attribute is near-failing or has failed in the past, AND the SMART long test or badblocks fails.

Neither is the case here, so the drive is clearly not failing. However, the bad blocks still do concern me. They should disappear as they are reallocated by the drive, but if they keep increasing that is not good. So, just keep monitoring it for now.
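One way to "keep monitoring" is to save the raw values from each report and compare them over time. A minimal Python sketch using the two reports posted in this thread, taken about four hours apart (2246 vs 2250 power-on hours). The set of watched attributes is my own assumption, and only rule 1 above is implemented, since rule 2 also needs the self-test results:

```python
# Bad-sector-related counters to watch (an assumed selection).
WATCHED = {
    5: "Reallocated_Sector_Ct",
    187: "Reported_Uncorrect",
    197: "Current_Pending_Sector",
    198: "Offline_Uncorrectable",
}

# Raw values from the short-test report and the later long-test report.
earlier = {5: 2, 187: 12, 197: 1, 198: 1}
later = {5: 2, 187: 13, 197: 1, 198: 1}


def growth(old: dict[int, int], new: dict[int, int]) -> dict[str, int]:
    """Return {attribute name: increase} for watched counters that grew."""
    return {WATCHED[i]: new.get(i, 0) - old.get(i, 0)
            for i in WATCHED if new.get(i, 0) > old.get(i, 0)}


def failing_now(when_failed_column: list[str]) -> bool:
    """Rule 1: any attribute reporting FAILING_NOW in WHEN_FAILED."""
    return "FAILING_NOW" in when_failed_column


print(growth(earlier, later))  # {'Reported_Uncorrect': 1}
```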

LovinFedora · 01-02-2015, 11:09 AM

Thank you all for your responses. I backed up all important data and still plan on ordering a backup drive, but will just keep it in reserve in case this one does indeed fail. Now I am not so worried. In the meantime, I am studying up like crazy on S.M.A.R.T. and the readouts it gave me. I started using Linux years ago and went full bore into learning everything I could, using the terminal for almost everything, but I got lazy over time because of GUIs, everyday life and things like that. This situation has really reignited that old fire to learn as much as possible, so I can solve issues like this. Thank you all so much for your help. This is truly a great forum. Glad to be a member.

schneidz · 01-02-2015, 12:03 PM

similar:
http://www.linuxquestions.org/questi...17#post5294217