Smart HDD report interpretation, please!
Hi guys,
Another drive of mine has started making some really disturbing noises, so I fear its time may soon be at hand. Can some kind soul make any sense of this, please? Code:
# smartctl --all /dev/sda
The values for Current_Pending_Sector (potentially bad sectors awaiting reallocation) and Offline_Uncorrectable are off the scale. This drive has major problems and should be replaced ASAP.
Quote:
PS: I'm closing down now to migrate important un-backed-up data to somewhere else, so won't be online for a while.....
Looking at your smartctl dump, I see two possible causes of trouble:
Possible cause #1:
194 Temperature_Celsius 0x0022 051 056 000 Old_age Always - 51 (0 13 0 0 0)
This shows the current drive temperature to be 51 deg Celsius. For SATA/SAS drives, the magic number seems to be 48 deg Celsius: above that, bad things start to happen to the sectors. Sometimes the problem can be solved by improving the cooling of the drive (adding a fan...). Sometimes it is a mechanical issue within the drive that is making it run hot. If so, the drive is a goner.

To revive a drive that has been running hot for a while, take an image backup of the data. But before anything else, solve the heat issue so the drive operates below 48 deg C. When you have made a proper image backup of the drive (and checked that the backup is good), erase/overwrite the "bad" drive with the "dod pattern", AA, 55, FF, 00... This can be done in Linux with a command like "badblocks -wsc 256 /dev/sdx". Keep in mind that -w is a destructive write test, so everything on the drive is lost.

When you write the patterns AA, 55, FF, 00 (that is, bytes of 10101010, 01010101, 11111111 and 00000000), the drive's SMART firmware gets a chance to write fresh sectors and, at the same time, test the sector "goodness" with the different patterns. If you run the SMART long self-test after wiping the drive like this, the firmware gets a chance to tick off any suspect sectors that are not actually bad, and to reallocate sectors that are truly bad.

It may happen that the drive, after being wiped with the DOD pattern at a reasonable operating temperature, is good again once SMART has had the chance to scrub out all the suspect sectors. But it is also possible that the drive is truly bad. Then the SMART self-test will eventually terminate with a lot of uncorrectables, a tripped SMART FAIL status, and a drive that barely responds to read/write requests.
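If you want to check the temperature row from cause #1 in a script rather than by eye, something like the following rough Python sketch could work. It assumes the usual smartctl column order (ID, name, flag, value, worst, thresh, type, updated, when_failed, raw value); the helper names and the HOT_LIMIT_C constant are made up for illustration, and the 48 deg C limit is only the rule of thumb from this post, not a vendor specification.

```python
# Assumed smartctl attribute row layout:
# ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
HOT_LIMIT_C = 48  # rule of thumb from the post above, not a vendor spec


def parse_attribute_line(line):
    """Split one attribute row into a dict (hypothetical helper)."""
    # At most 9 splits, so a RAW_VALUE containing spaces stays in one piece.
    fields = line.split(None, 9)
    if len(fields) < 10:
        raise ValueError("not a SMART attribute row: %r" % line)
    return {
        "id": int(fields[0]),
        "name": fields[1],
        "value": int(fields[3]),
        "worst": int(fields[4]),
        "thresh": int(fields[5]),
        "raw": fields[9].strip(),
    }


def temperature_celsius(attr):
    """Take the leading integer of the raw value as the temperature."""
    return int(attr["raw"].split()[0])


line = "194 Temperature_Celsius 0x0022 051 056 000 Old_age Always - 51 (0 13 0 0 0)"
attr = parse_attribute_line(line)
temp = temperature_celsius(attr)
status = "too hot" if temp > HOT_LIMIT_C else "ok"
print("%s: %d C (%s)" % (attr["name"], temp, status))
```

Only the first number of the raw value is used here; what the numbers in parentheses mean differs between vendors, so the sketch leaves them alone.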
Possible cause #2 for trouble is this:
189 High_Fly_Writes 0x003a 001 001 000 Old_age Always - 3856
This can mean that the write head actuator is worn out, or that an external mechanical shock made the write head move from where it was supposed to be during a write. Imagine you are writing a letter with a ballpoint pen and your daughter comes in and grabs you by the arm, shaking the arm with the pen vigorously while saying something like "daddy, daddy, come look!" Then you will be making "high fly writes" all over the paper, desk, left hand... :) If you have high fly writes on a server standing securely in a quiet corner, the disk is surely worn out, or someone is kicking it when you're not looking. Some data to compare with: Code:
SMART Attributes Data Structure revision number: 16
It's a HITACHI DeskStar HDS7250SASUN500G. Not the fastest. Not the coolest. But it is sturdy.
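Going back to the High_Fly_Writes row from cause #2: if you want to track that counter over time, a small Python sketch along these lines could pull out the raw count and the gap between the normalized value and its threshold. The regex assumes the standard smartctl column order, and the summarize helper and "headroom" field are names invented for this example.

```python
import re

# Assumed row layout: ID NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW
ROW = re.compile(
    r"^\s*(?P<id>\d+)\s+(?P<name>\S+)\s+0x[0-9a-fA-F]+\s+"
    r"(?P<value>\d+)\s+(?P<worst>\d+)\s+(?P<thresh>\d+)\s+"
    r"\S+\s+\S+\s+\S+\s+(?P<raw>\d+)"
)


def summarize(line):
    """Return name, raw count and value-minus-threshold for one row."""
    m = ROW.match(line)
    if m is None:
        raise ValueError("not a SMART attribute row: %r" % line)
    value = int(m.group("value"))
    thresh = int(m.group("thresh"))
    return {
        "name": m.group("name"),
        "raw_count": int(m.group("raw")),
        # "headroom" is a made-up label: normalized value minus threshold
        "headroom": value - thresh,
    }


info = summarize("189 High_Fly_Writes 0x003a 001 001 000 Old_age Always - 3856")
print(info)
```

For the row quoted above this gives a raw count of 3856 and a headroom of 1, so the normalized value is sitting at the bottom of its scale. Run it against each new dump and see whether the raw count keeps climbing.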