LinuxQuestions.org

wroom 07-01-2014 06:55 PM

Have you given your HDDs some TLC lately?
 
Hello! I just want to give some general advice on HDDs, based on my own experience.
This covers mechanical disks only. I don't trust SSDs, maybe because they fail so catastrophically out of the blue.

All HDDs need some airflow around them to keep cool. It doesn't have to be much, just enough that the air is not standing still around the drive.
Size does matter, to a small degree. 2.5 inch drives tend to handle being stuffed into a fan-less enclosure better than 3.5 inch drives. But there are 2.5 inch drives that do not last long in an unventilated compartment beside a hot lithium battery in a laptop.

Never run an HDD hotter than 48 degrees Celsius (as reported by SMART). In my experience they can run for a very long time if you stay below 48 degrees. But above 48 degrees, failure can happen at any time.

And by the way - never run an HDD when it is colder than +5 degrees Celsius. It might just grind itself down.
The drive manufacturers seem to have just discovered this, and have started changing the lowest operating temperature to +5 C on drive models that were previously claimed to handle 0 degrees Celsius.
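A minimal sketch of checking a drive against the 5 to 48 degree range above. It assumes the drive reports its temperature as SMART attribute 194 and that the raw value is the tenth column of "smartctl -A" output; some drives use a different attribute or layout. The function names smart_temp and classify_temp are made up for this example.

```shell
#!/bin/sh
# Read `smartctl -A` output on stdin and print the raw value of
# attribute 194 (assumed here to be the temperature in Celsius).
smart_temp() {
    awk '$1 == 194 { print $10; exit }'
}

# Classify a whole-degree Celsius reading against the 5..48 range
# from the post. Anything that is not a plain number is "unknown".
classify_temp() {
    case "$1" in
        ''|*[!0-9]*) echo "unknown"; return ;;
    esac
    if [ "$1" -gt 48 ]; then
        echo "too hot"
    elif [ "$1" -lt 5 ]; then
        echo "too cold"
    else
        echo "ok"
    fi
}

# Usage (needs root and smartmontools; /dev/sda is only an example):
#   classify_temp "$(smartctl -A /dev/sda | smart_temp)"
```

If the drive does not report attribute 194, smart_temp prints nothing and classify_temp answers "unknown" instead of guessing.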

Vibration and shock are killers. HDDs don't like to be mechanically fixed to each other. Vibrations from two HDDs connected metal to metal can destroy them.
Some disks seem to handle being accidentally dropped on the table. Other disks, mostly high rpm ones like 15000 rpm, can warp when started up after just being nicked against the metal casing of a hot-swap bay.

External HDD enclosures often come with a really cheap power supply. A bad power supply will, after a while, make bad blocks pop up on the HDD.

What good may come from a loose eSATA (or SATA) connector? Well, everything from a simple crash dump up to a drive that will stay silent forever in reply to commands from the controller, and keep its dear secrets to itself. Using SATA cables with locking clips is a good idea.

HDDs don't like to be spun down for a very long time. The spindle bearings might get stuck. I have seen this mostly on old server RAID HDDs that had been spun down for a long time as "hot spares". Guess what? A spun-down hot spare may not be so hot anymore when one of its RAID siblings fails and hot spare recovery is initiated. Never spin down hot spares.
The scary part is that it is often enough to take out the stuck HDD and give it a soft tap with the palm of your hand to get it unstuck. But by then the RAID might already have been trashed.
The big lesson here is that if you have backups on external HDDs stuffed away on the bookshelf, you should start them up at least once per year.

SMART is good. It is good to be smart. Run regular SMART checks on the drive's health. And why not schedule some SMART short selftests? At least one per week. It will give you a heads-up before Alzheimer's strikes the HDD. With Linux, this can be accomplished using smartd.

If a short selftest fails, run a long selftest. In most cases the long selftest will recover sectors that are starting to go bad before data is lost. If the long selftest fails, you "may" have lost some data. If you are lucky, the sector can still be read correctly but is slow to respond. If you then rewrite the sector on a modern SATA/SAS/SCSI HDD, it "may" become good again.

And by the way - the first sign of a disk going bad seems to be that it gets slow. (Check the drive transfer speed with "hdparm -t /dev/sda".) After that come the SMART unrecoverable sector messages. When you see those, take a backup immediately. Sometimes an HDD seems to be repaired after running SMART long selftests. But if the cause is something like an oil film on the platters, a cracked magnetic surface, or any other pathological state, the disk will deteriorate faster and faster even if the bad sectors were just recovered.
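The short-then-long routine can be sketched by hand with smartctl. This is only a rough illustration: the function name latest_selftest_result is made up, and the parsing assumes the usual ATA self-test log layout, where the newest entry starts with "# 1" and a clean run reads "Completed without error". Other drive types print a different log.

```shell
#!/bin/sh
# Read `smartctl -l selftest` output on stdin and report the state of
# the most recent self-test entry: pass, running, fail or unknown.
latest_selftest_result() {
    awk '/^# *1 / {
             if ($0 ~ /Completed without error/) print "pass"
             else if ($0 ~ /in progress/)        print "running"
             else                                print "fail"
             found = 1
             exit
         }
         END { if (!found) print "unknown" }'
}

# Usage (needs root and smartmontools; /dev/sda is only an example):
#   smartctl -t short /dev/sda       # start a short self-test
#   smartctl -l selftest /dev/sda | latest_selftest_result
#   smartctl -t long /dev/sda        # escalate if the short test failed
```

For the weekly schedule itself, smartd can start the tests on its own through the -s directive in smartd.conf; check smartd.conf(5) for the schedule syntax on your system.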

metaschima 07-01-2014 07:27 PM

There should be a fan near the HDD, but you shouldn't worry too much about temperature:
http://static.googleusercontent.com/...k_failures.pdf
According to that study, temperature mainly affects older drives, and only at very high temperatures, above 45 C or so.

I run SMART long tests about every 1000 hours. It's senseless to run them on a calendar schedule, because my computer use varies. I only use SMART short tests for external drives where long tests take forever. I think long tests are more important and useful because they scan for bad blocks. Either way, if you enable Auto Offline Data Collection, you don't even need short tests as the attributes are auto updated.

The most important thing is backing up your data.
https://www.us-cert.gov/security-pub...backup-options

Another important thing: laptop, netbook and "green" HDDs spin themselves down on a regular basis by default. This greatly reduces their life span. So consider turning this feature off unless you plan on buying a new HDD soon. I always put the following in rc.local, no matter the system:
Code:

hdparm -B 254 /dev/sda

onebuck 07-02-2014 08:29 AM

Member Response
 
Hi,

From 'man hdparm';
Quote:

-B Get/set Advanced Power Management feature, if the drive supports it. A low value means aggressive power management and a high value means better performance. Possible settings range from values 1 through 127 (which permit spin-down), and values 128 through 254 (which do not permit spin-down). The highest degree of power management is attained with a setting of 1, and the highest I/O performance with a setting of 254. A value of 255 tells hdparm to disable Advanced Power Management altogether on the drive (not all drives support disabling it, but most do).

You would use a value of '255' to disable APM for the drive.
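The ranges in the man page excerpt can be turned into a small lookup. This is just an illustration of the quoted text; the function name describe_apm is made up and is not part of hdparm.

```shell
#!/bin/sh
# Describe an `hdparm -B` value using the ranges quoted from the man
# page: 1-127 permit spin-down, 128-254 do not, 255 disables APM.
describe_apm() {
    case "$1" in
        ''|*[!0-9]*) echo "invalid"; return ;;
    esac
    if [ "$1" -ge 1 ] && [ "$1" -le 127 ]; then
        echo "APM on, spin-down permitted"
    elif [ "$1" -ge 128 ] && [ "$1" -le 254 ]; then
        echo "APM on, no spin-down"
    elif [ "$1" -eq 255 ]; then
        echo "APM disabled"
    else
        echo "invalid"
    fi
}

# Usage:
#   describe_apm 254
#   hdparm -B /dev/sda     # without a value, reads the current setting
```

Whether a given drive accepts 255 has to be tried on that drive, since the man page notes that not all drives support disabling APM.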

metaschima 07-02-2014 11:10 AM

Quote:

Originally Posted by onebuck (Post 5197324)
Hi,

From 'man hdparm': You would use a value of '255' to disable APM for the drive.

It doesn't work, I tried it. 254 prevents it from spinning down. Many drives don't accept 255 and refuse to turn APM off.

onebuck 07-02-2014 01:14 PM

Member Response
 
Hi,

You should run '~# hdparm -I /dev/sdb' on the device in question to see which features it supports. '254' will set the drive to highest performance.


All times are GMT -5.