LinuxQuestions.org - Hard drive failing

Hard drive failing

Hi Folks,

It looks like my hard drive is on its last legs. I didn't want to spend the money, but it looks like I get to upgrade to an SSD.

Code:

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   100   099   062    Pre-fail  Always       -       0
  2 Throughput_Performance  0x0025   100   100   040    Pre-fail  Offline      -       0
  3 Spin_Up_Time            0x0023   168   100   033    Pre-fail  Always       -       1
  4 Start_Stop_Count        0x0032   098   098   000    Old_age   Always       -       3201
  5 Reallocated_Sector_Ct   0x0033   100   100   005    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x002f   100   100   067    Pre-fail  Always       -       0
  8 Seek_Time_Performance   0x0025   100   100   040    Pre-fail  Offline      -       0
  9 Power_On_Hours          0x0032   080   080   000    Old_age   Always       -       8950
 10 Spin_Retry_Count        0x0033   100   100   060    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   098   098   000    Old_age   Always       -       3200
183 Runtime_Bad_Block       0x0032   100   100   000    Old_age   Always       -       0
184 End-to-End_Error        0x0033   100   100   097    Pre-fail  Always       -       0
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       592709156864
188 Command_Timeout         0x0032   100   100   000    Old_age   Always       -       81606148102
190 Airflow_Temperature_Cel 0x0022   070   049   045    Old_age   Always       -       30 (Min/Max 20/31)
191 G-Sense_Error_Rate      0x0032   082   082   000    Old_age   Always       -       4689
192 Power-Off_Retract_Count 0x0032   100   100   000    Old_age   Always       -       851981
193 Load_Cycle_Count        0x0032   059   059   000    Old_age   Always       -       419500
196 Reallocated_Event_Count 0x0032   100   100   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0030   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0036   100   100   000    Old_age   Always       -       0
223 Load_Retry_Count        0x002a   100   100   000    Old_age   Always       -       0

SMART Error Log Version: 1
No Errors Logged

--glenn

I don't see anything wrong in those attributes. The Load_Cycle_Count and Power-Off_Retract_Count are awfully high, but so far they don't seem to have hurt anything. Having the drive power cycling every 40 seconds or so (851981 times in just 8950 hours) seems like a configuration problem.
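
As a quick check of that rate, here is a rough Python sketch of the arithmetic. It takes the raw values from the output above at face value; whether the raw figure for attribute 192 really is a plain event count on this drive is an assumption.

```python
# Raw values copied from the SMART output posted above.
POWER_ON_HOURS = 8950          # attribute 9
POWER_OFF_RETRACTS = 851981    # attribute 192
LOAD_CYCLES = 419500           # attribute 193


def seconds_per_event(hours, events):
    """Return the mean number of seconds between events over the given hours."""
    return hours * 3600 / events


print(f"retract every {seconds_per_event(POWER_ON_HOURS, POWER_OFF_RETRACTS):.0f} s")
print(f"load cycle every {seconds_per_event(POWER_ON_HOURS, LOAD_CYCLES):.0f} s")
```

Read literally, that works out to a retract roughly every 38 seconds and a load cycle roughly every 77 seconds of power-on time.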

Smartctl can usually report a lot more if you run:

Code:

sudo smartctl -a /dev/sdX

Unless you are certain it is failing (e.g. you are getting a lot of corruption that requires fsck or similar), I would dig deeper before automatically replacing it.

With that said, the output of the smartctl command above is a better judge of the status than the little bit you posted here.

Also remember that a backup is always recommended just in case of catastrophic failure.

Well, I certainly wouldn't be happy with those numbers for 187, 188.

Quote:

Originally Posted by PsychoHermit (Post 6313829)

Hi Folks,

It looks like my hard drive is on its last legs. I didn't want to spend the money, but it looks like I get to upgrade to an SSD.

Code:

ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   100   100   005    Pre-fail  Always       -       0
196 Reallocated_Event_Count 0x0032   100   100   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   100   100   000    Old_age   Always       -       0

From these I see no reason to expect imminent failure, especially with such low power-on hours. Don't be afraid to upgrade to an SSD anyway. Unless your PC is running the disk on an old SATA-1 controller, the upgrade should be worth it for the speed increase. This HDD looks like a great candidate for backing up your SSD.

Quote:

Originally Posted by syg00 (Post 6313841)

Well, I certainly wouldn't be happy with those numbers for 187, 188.

Nothing wrong with those - you should check the normalised values, not the raw ones. It is probably a Seagate drive - they always have insane raw values. And if it is, it should be replaced even if it looks perfectly healthy, as this one does.
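
To make the normalised-versus-raw point concrete, here is a minimal Python sketch that reads a few rows from the output posted at the top of the thread and compares VALUE against THRESH. It assumes the usual smartctl rule that an attribute has failed when its normalised VALUE is less than or equal to THRESH, and it treats a THRESH of 000 as "informational, never fails". The function name `has_failed` is made up for illustration.

```python
# Four attribute rows copied from the smartctl output at the top of the thread.
ROWS = """\
187 Reported_Uncorrect      0x0032  100  100  000    Old_age  Always      -      592709156864
188 Command_Timeout         0x0032  100  100  000    Old_age  Always      -      81606148102
190 Airflow_Temperature_Cel 0x0022  070  049  045    Old_age  Always      -      30 (Min/Max 20/31)
193 Load_Cycle_Count        0x0032  059  059  000    Old_age  Always      -      419500
"""


def has_failed(row):
    """Return True when the normalised VALUE is at or below a non-zero THRESH.

    Columns: ID, NAME, FLAG, VALUE, WORST, THRESH, ... (the raw value is ignored).
    Assumption: a THRESH of 000 marks an informational attribute that cannot fail.
    """
    fields = row.split()
    value, thresh = int(fields[3]), int(fields[5])
    return thresh > 0 and value <= thresh


for row in ROWS.splitlines():
    fields = row.split()
    status = "FAILING" if has_failed(row) else "ok"
    print(f"{fields[0]:>3} {fields[1]:<24} value={fields[3]} thresh={fields[5]} {status}")
```

None of the four rows trips the check, however large the raw numbers for 187 and 188 look.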

Quote:

Originally Posted by syg00 (Post 6313841)

Well, I certainly wouldn't be happy with those numbers for 187, 188.

Convert those values to hex and look at the low-order bits. Seagate drives (I'm guessing this is a Seagate hybrid drive) typically have some raw values for which the low-order bits are the actual exception count and the higher-order bits are the number of operations.
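
For anyone who wants to try it, here is a small Python sketch of that conversion applied to the two raw values from this thread. Splitting the 48-bit raw value into three 16-bit words is an assumption about how the drive packs the fields, and what each word means is vendor-specific; `split_words` is a name made up for illustration.

```python
def split_words(raw):
    """Split a 48-bit SMART raw value into three 16-bit words (high, mid, low)."""
    return (raw >> 32) & 0xFFFF, (raw >> 16) & 0xFFFF, raw & 0xFFFF


# Raw values for attributes 187 and 188 from the output at the top of the thread.
for attr_id, raw in [(187, 592709156864), (188, 81606148102)]:
    high, mid, low = split_words(raw)
    print(f"{attr_id}: 0x{raw:012x} -> high={high} mid={mid} low={low}")
```

This prints `0x008a00380000` (low word 0) for 187 and `0x0013001b0006` (low word 6) for 188, so the low-order words are tiny compared with the decimal raw values.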

Unless you're really into parsing low-level details, you can just ask smartctl for its evaluation with -H:

Code:

sudo smartctl -H /dev/nvme0n1
smartctl 7.2 2020-12-30 r5155 [x86_64-linux-5.15.4-101.fc34.x86_64] (local build)
Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

Looks OK. But hey ... any excuse to spend $100 on an SSD sounds good to me :D . My spinning rust storage devices are now only for backups. All our laptops, desktops, and a data server are running on SSDs.

Never too late to learn - thanks for the education.

I guess I will hold off replacing the drive and see what develops. It may keep working for quite some time.

Thanks,
--glenn

If you think that a drive might be headed for failure, get rid of the damned thing. :)

SSDs, both internal and external, are now insanely big and no longer expensive. I have several external drives attached to all of my computers, for continuous backups and other purposes.

If you are using LVM (the Logical Volume Manager), as you should be, you can actually migrate all of the data off the failing drive and onto the new one automagically ... and without downtime.
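
A rough sketch of what that migration usually looks like, written as a Python dry run that only prints the commands and executes nothing. The volume group name `vg0` and the partitions `/dev/sdX1` (old) and `/dev/sdY1` (new) are placeholders, not values from this thread, and the function name is made up for illustration. Check each step against your own layout, and have a backup, before running anything for real.

```python
import shlex


def lvm_migration_commands(vg, old_pv, new_pv):
    """Return the usual LVM command sequence for moving a VG off one disk."""
    return [
        ["pvcreate", new_pv],         # initialise the new partition as a physical volume
        ["vgextend", vg, new_pv],     # add it to the existing volume group
        ["pvmove", old_pv, new_pv],   # move all extents off the old disk, online
        ["vgreduce", vg, old_pv],     # drop the now-empty old disk from the group
        ["pvremove", old_pv],         # wipe the LVM label from the old disk
    ]


# Dry run: print the commands for review instead of executing them.
for cmd in lvm_migration_commands("vg0", "/dev/sdX1", "/dev/sdY1"):
    print("sudo", shlex.join(cmd))
```

The `pvmove` step is the one that does the actual copying while the filesystems stay mounted.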