Linux - General
This Linux forum is for general Linux questions and discussion.
If it is Linux Related and doesn't seem to fit in any other forum then this is the place.
Welcome to LinuxQuestions.org, a friendly and active Linux Community.
I wonder if anyone can tell me why Linux (any version) always tells me my drives are failing? I have bought many new drives over the years based on these warnings.
Now it is telling me a drive is failing yet again, and the drive just ran out of warranty on the 23rd (it always seems to happen that way).
Anyway, smartmontools is flagging some attributes as Pre-fail and some as Old_age, yet my power-on time doesn't even add up to a year. Are drives really getting this bad?
According to Linux, I have had 3 Seagate drives fail, and I replaced them after only a little over a year of use each. Now this WD is failing. Horrible.
Here is smart info:
Code:
smartctl 5.41 2011-06-09 r3365 [x86_64-linux-3.2.0-36-generic] (local build)
Copyright (C) 2002-11 by Bruce Allen, http://smartmontools.sourceforge.net
=== START OF INFORMATION SECTION ===
Model Family: Western Digital Caviar Green (Adv. Format)
Device Model: WDC WD10EARS-00Y5B1
Serial Number: WD-WCAV55587265
LU WWN Device Id: 5 0014ee 2ae8246a5
Firmware Version: 80.00A80
User Capacity: 1,000,204,886,016 bytes [1.00 TB]
Sector Size: 512 bytes logical/physical
Device is: In smartctl database [for details use: -P show]
ATA Version is: 8
ATA Standard is: Exact ATA specification draft version not indicated
Local Time is: Mon Jan 28 05:59:06 2013 EST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
General SMART Values:
Offline data collection status: (0x84) Offline data collection activity
was suspended by an interrupting command from host.
Auto Offline Data Collection: Enabled.
Self-test execution status: ( 0) The previous self-test routine completed
without error or no self-test has ever
been run.
Total time to complete Offline
data collection: (20460) seconds.
Offline data collection
capabilities: (0x7b) SMART execute Offline immediate.
Auto Offline data collection on/off support.
Suspend Offline collection upon new
command.
Offline surface scan supported.
Self-test supported.
Conveyance Self-test supported.
Selective Self-test supported.
SMART capabilities: (0x0003) Saves SMART data before entering
power-saving mode.
Supports SMART auto save timer.
Error logging capability: (0x01) Error logging supported.
General Purpose Logging supported.
Short self-test routine
recommended polling time: ( 2) minutes.
Extended self-test routine
recommended polling time: ( 236) minutes.
Conveyance self-test routine
recommended polling time: ( 5) minutes.
SCT capabilities: (0x3031) SCT Status supported.
SCT Feature Control supported.
SCT Data Table supported.
SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
1 Raw_Read_Error_Rate 0x002f 200 200 051 Pre-fail Always - 331
3 Spin_Up_Time 0x0027 134 130 021 Pre-fail Always - 6300
4 Start_Stop_Count 0x0032 098 098 000 Old_age Always - 2876
5 Reallocated_Sector_Ct 0x0033 199 199 140 Pre-fail Always - 6
7 Seek_Error_Rate 0x002e 200 200 000 Old_age Always - 0
9 Power_On_Hours 0x0032 090 090 000 Old_age Always - 7838
10 Spin_Retry_Count 0x0032 100 100 000 Old_age Always - 0
11 Calibration_Retry_Count 0x0032 100 100 000 Old_age Always - 0
12 Power_Cycle_Count 0x0032 099 099 000 Old_age Always - 1464
192 Power-Off_Retract_Count 0x0032 200 200 000 Old_age Always - 92
193 Load_Cycle_Count 0x0032 190 190 000 Old_age Always - 30697
194 Temperature_Celsius 0x0022 118 108 000 Old_age Always - 29
196 Reallocated_Event_Count 0x0032 199 199 000 Old_age Always - 1
197 Current_Pending_Sector 0x0032 200 200 000 Old_age Always - 124
198 Offline_Uncorrectable 0x0030 200 200 000 Old_age Offline - 0
199 UDMA_CRC_Error_Count 0x0032 200 200 000 Old_age Always - 0
200 Multi_Zone_Error_Rate 0x0008 200 200 000 Old_age Offline - 0
SMART Error Log Version: 1
No Errors Logged
SMART Self-test log structure revision number 1
Num Test_Description Status Remaining LifeTime(hours) LBA_of_first_error
# 1 Short offline Completed without error 00% 7834 -
# 2 Short offline Completed without error 00% 5710 -
# 3 Extended offline Completed without error 00% 374 -
SMART Selective self-test log data structure revision number 1
SPAN MIN_LBA MAX_LBA CURRENT_TEST_STATUS
1 0 0 Not_testing
2 0 0 Not_testing
3 0 0 Not_testing
4 0 0 Not_testing
5 0 0 Not_testing
Selective self-test flags (0x0):
After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
Is this really failing? I mean it's getting to the point where I am replacing drives at the speed of light it seems. New drive about every year here. What gives?
Bah. To me, drives are cheap and data is priceless. If SMART's telling me that drives are failing, I'm not going to be the one to go on assuming that everything's okay until one day I find out the hard way that it wasn't.
Data (drives) are cheapish. But a failure a year? For home use? That makes no sense.
Has drive quality diminished so much that they now require a yearly replacement?
80 bucks a pop. My machine is around 4 years old. I paid about 360 bucks for my home build (not including drives). It has already cost me 310 bucks in drives, and if I replace this one it will be 390? In drives alone? Really? This is crazy. I had an old 25 MHz machine running until about a year ago with the same 12 megabyte drive it started with.
I have backups, so I am not really scared of failure. It is just a bunch of BS in my book. I have a 3 year old laptop on its stock drive; the funny thing is, after a year of use Linux told me that drive was failing too. I backed it up and have never had an issue. Silly scare tactics to make one run out and buy a drive? Or is Linux not reporting the info right (in the SSD case I know it is wrong anyway)?
I agree with whizje: what in those results indicates the drive failed or is failing? Old_age and Pre-fail are just the categories the metrics fall into; if you look in the WHEN_FAILED column, none of the attributes has ever failed.
Now that has taught me something. I have always looked at that data and replaced the drives, so this has always been a matter of me reading the information wrong.
So now I know I have probably overspent over the years, lol.
Why would you replace the drive before it fails anyway? I take SMART warnings like that as just what they are, a warning sign: "hey, your drive is about to fail, so make sure you're current on backups and have a replacement ready to go". Then when it does actually fail, a year or two later (if ever), everything is ready to swap in.
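On that note, the smartmontools package also ships smartd, a daemon that watches SMART attributes and self-test results and mails a warning when something changes, so you get that heads-up without running smartctl by hand. A minimal /etc/smartd.conf sketch; the device name, test schedule, and mail address below are placeholders, not anything from this thread:

```conf
# Monitor /dev/sda: check all attributes (-a), enable automatic
# offline testing (-o on) and attribute autosave (-S on), schedule
# a short self-test daily at 02:00 and a long test Saturdays at
# 03:00 (-s regex), and mail warnings to the given address (-m).
/dev/sda -a -o on -S on -s (S/../.././02|L/../../6/03) -m admin@example.com
```

After editing the file, restart the smartd service so it picks up the change.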
The first time I ran smartctl on a brand new drive that I had just partitioned, I saw "Pre-fail" and "Old_age" and really thought the drive was failing, as I had had problems with another drive (a WD20EARX) a few days earlier. However, I noticed that no failure events were actually reported, and then I understood the report better.
Anyway, it is not usually possible to predict drive failures.
- A friend recently lost 2 drives (hmmm... WD green too) in a NAS RAID-5 array. Normally, the controller checks SMART and warns about the drive status. In this case, there was no warning.
- Another recent case, in a forum server: failure of a drive in a RAID-1 array, failure of the second (good) drive during RAID rebuild!
In both cases restoring the backup was the solution. Backup, backup again, backup often.
Drive failed. I had some problems the other day (read errors and such). Got a new case and transferred everything over (the old one was a dust magnet). Booted the machine up and it started without issue. A couple of updates installed, rebooted. Sat and stared at the screen for 10 minutes waiting for something. Dropped to a console to watch the boot process: read/write errors everywhere. Let it run for over an hour and nothing. The drive is gone.
So now I am running an SSD, with a 320GB drive I had around mounted under /home/me/media for my bigger files, and with cache and swap set up there. Much faster, but let's see how long this lasts. I guess my paranoia had something to it. Dropped my old drive into my new eSATA port and checked SMART: passed. So SMART must not be so smart.
SMART data can be interpreted - different drives can report certain values improperly.
But generally you need to look out for these values - these tend to be reported by most drives correctly:
Code:
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
5 Reallocated_Sector_Ct 0x0033 199 199 140 Pre-fail Always - 6
197 Current_Pending_Sector 0x0032 200 200 000 Old_age Always - 124
The RAW_VALUE has to be 0 for a healthy drive. You have 6 reallocated sectors and 124 sectors that failed a read and that the drive has not yet reallocated. Either way, those sectors cannot be read.
If you can still access the drive, you can try zeroing the whole drive a few times in a row ("dd if=/dev/zero of=/dev/sdX", where X is a, b, c, etc., corresponding to the drive in question); adding a larger block size such as "bs=1M" speeds the write up considerably.
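To make that check mechanical, here is a small Python sketch (mine, not from this thread; the column positions are an assumption based on the table layout smartctl prints) that pulls the raw values of attributes 5 and 197 out of "smartctl -A" output:

```python
#!/usr/bin/env python3
"""Flag the two SMART attributes that most reliably predict failure.

Attribute 5 (Reallocated_Sector_Ct) and 197 (Current_Pending_Sector)
should both have a raw value of 0 on a healthy drive.
"""

CRITICAL_IDS = {5: "Reallocated_Sector_Ct", 197: "Current_Pending_Sector"}

def check_attributes(smartctl_output: str) -> dict:
    """Return {attribute_name: raw_value} for the critical IDs found."""
    results = {}
    for line in smartctl_output.splitlines():
        fields = line.split()
        # Attribute rows start with a numeric ID and have 10 columns;
        # the raw value is the last column.
        if len(fields) >= 10 and fields[0].isdigit():
            attr_id = int(fields[0])
            if attr_id in CRITICAL_IDS:
                results[CRITICAL_IDS[attr_id]] = int(fields[9])
    return results

if __name__ == "__main__":
    # Sample rows taken from the smartctl output posted above.
    sample = """\
  5 Reallocated_Sector_Ct   0x0033   199   199   140    Pre-fail  Always       -       6
  9 Power_On_Hours          0x0032   090   090   000    Old_age   Always       -       7838
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       124
"""
    for name, raw in check_attributes(sample).items():
        status = "OK" if raw == 0 else "WARNING: nonzero"
        print(f"{name}: {raw} ({status})")
```

To check a real drive you would feed it the actual output of "smartctl -A /dev/sdX" (for example by reading sys.stdin) instead of the embedded sample.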
Trying it now, ty.
Here is the result of the first run:
Code:
dd: writing to `/dev/sdc': Input/output error
21242633+0 records in
21242632+0 records out
10876227584 bytes (11 GB) copied, 2197.26 s, 4.9 MB/s
Going again...
And again:
Code:
dd: writing to `/dev/sdc': Input/output error
21242633+0 records in
21242632+0 records out
10876227584 bytes (11 GB) copied, 492.506 s, 22.1 MB/s
I think it is pretty much toast.
A Current_Pending_Sector count of 124 is hardly "fine". Something bad has happened to that drive, either during the 4 hours since that last successful short offline test, or something that the short test does not detect.
As for this, I have roughly 13 computers under my care, running a combined total of 95 hard drives 24/7. Most of these machines are between 3 and 6 years old. Out of the entire set, I typically lose one drive every 2 years. If anything, I think drives have gotten more reliable over the years, not less.
If you really are having hard drives fail on you this often, I would take a closer look at your setup. I used to lose a drive a year on my personal computer until I started putting a case fan on the drive (just a case fan on the front of the case in front of the drive to give it some air flow). I haven't lost a single drive on my personal computers (which currently account for 3 comps and 9 hard drives out of the list above) since I started doing that about 9 years ago.