LinuxQuestions.org


kevinbenko 11-14-2010 11:47 AM

Is it possible to "stress test" hard drive to fail
 
This is a general hardware question, but I trust the Linux community to be more clueful than those *other* OS users.

I just picked up a new external hard drive.
Back in the 8-bit era of the 1980s, there was a general consensus, which may have been an urban legend, that if a piece of hardware was faulty and going to fail, it would fail within the first $NUMBER hours of use. $NUMBER varied from 100 to 500, depending on who you talked to.

While the above *seems* to make sense, does it, indeed, make sense?

Be that as it may, if I subject the new hard drive to big buckets of reads/writes and it does not fail, does anyone have an opinion on whether that is any sort of guarantee that the drive is less likely to be faulty, and that I can feel more secure in its ability to not melt into a pile of slag on my desk?

macemoneta 11-14-2010 11:58 AM

You are conflating two concepts invalidly.

Premature failures are caused by faulty components, and they cause a failure to occur outside the normal probability for the product. Stress testing can uncover a premature failure, but it's unlikely that anything that you do in normal use will cause that type of stress (usually done with thermal, mechanical, and electrical variations in component testing). If a component is going to fail early, it will fail early no matter how you use it.

Past the premature failure time frame, product failure has a given probability indicated by the MTTF. However, that refers to a population, and provides no predictive function for an individual in the population.

H_TeXMeX_H 11-14-2010 12:06 PM

The bathtub curve does apply here:
http://en.wikipedia.org/wiki/Bathtub_curve

However, like macemoneta says, early failures are not due to wear and tear. They may be caused by some manufacturing fault, and you may not be able to elicit them.

For the wear-out failures, SMART is useful. For early failures, well, don't keep important stuff on a brand new drive, and keep the warranty statement handy.
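
As a rough illustration of the bathtub curve linked above, here is a minimal Python sketch that adds three hazard (failure-rate) terms: a decreasing early-failure term, a constant random-failure term, and an increasing wear-out term. The Weibull form and every parameter value are assumptions picked only to give the curve its shape; none of them come from real drive data.

```python
def weibull_hazard(t, shape, scale):
    """Instantaneous failure rate of a Weibull distribution at time t > 0."""
    return (shape / scale) * (t / scale) ** (shape - 1)


def bathtub_hazard(t):
    """Sum of three hazard terms; all parameters are made-up illustrative values."""
    early = weibull_hazard(t, shape=0.5, scale=2_000_000.0)  # shape < 1: rate falls over time
    random_failures = 1.0 / 1_000_000.0                      # constant rate in mid-life
    wear_out = weibull_hazard(t, shape=5.0, scale=60_000.0)  # shape > 1: rate rises over time
    return early + random_failures + wear_out


# Print the hazard at a few ages (in hours) to see the high-low-high pattern.
for hours in (10, 100, 1_000, 10_000, 40_000, 60_000):
    print(f"{hours:>6} h  hazard = {bathtub_hazard(hours):.2e} per hour")
```

With these assumed numbers the rate is highest in the first hours, flattens out in mid-life, and climbs again as the wear-out term takes over, which is the shape the posts above are talking about.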

jefro 11-14-2010 04:44 PM

Most consumer products don't get a factory test, or at least not like they used to a long time ago. IBM used to stress test AT computers in ovens for 24 hours or more, and that was after a hundred or so line tests. They were quite well tested. Your installing the drive may be its first test.

What some of the space program tests did was, in fact, to try to damage devices. They would actually buy 100 parts, then try to get them to fail. They believed that if 90 or so failed, the 10 that survived were of the best quality. It was an MTBF deal. Their tests were well below predicted MTBF.


So, as to your test: I agree that a proper stress test that used the hard drive as in normal use would ferret out any short-term fault. That test would depend on knowing a lot of other data too. It could be suggested that one could create a situation that made it fail.

Electro 11-14-2010 07:01 PM

All consumer products are tested. Some are not as much as others.

Hard drives are tested, but by a proprietary method that each manufacturer has in house. The MTBF and MTTF are not the same for each manufacturer, so you could use those specs to find the most robust hard drive. You could go with IBM/Hitachi, Western Digital, or Seagate; their hard drives should be tough enough to handle desktop and notebook wear and tear. If you are very, very paranoid, it is best to compare the hard drive utilities from each manufacturer based on the quality of their scan. IMHO, IBM or Hitachi has the best utility, which does a very, very thorough scan of the mechanics and the integrity of the sectors of the platters.

The S.M.A.R.T. feature is not the best way to figure out when a hard drive will fail. It is layered into the hard drive's own software to diagnose problems. The electronics controlling the mechanics of the hard drive can glitch, and so can S.M.A.R.T. How are you going to depend on something that has a chance of glitching and causing S.M.A.R.T. to mark a glitch as a failure? A glitch is a fact of life, so it can be good or bad. IMHO, it is best to use the loudness of a hard drive to tell if the mechanics are going bad (loud).

Hard drive technology has changed very dramatically since 1980, so I suggest you do not be paranoid. If you are this paranoid, it is better to go into a different industry. Any industry has some sort of risk.

macemoneta 11-14-2010 07:16 PM

Quote:

Originally Posted by Electro (Post 4158868)
The MTBF and MTTF are not the same for each manufacturer, so you could use those specs to find the most robust hard drive.

MTBF and MTTF tell you something about the product line, they tell you absolutely nothing about a given drive.

Think of it this way... people have a life expectancy of 75 years. As a population, that is the average (the MTTF). What does that tell you about how long I personally will live? Absolutely nothing. It tells you something about our species, but nothing about an individual.

MTBF and MTTF are population values, and each product line will have its own estimates. They are useful for the manufacturer, because they provide service information. For example, if a drive model has a 1,000,000-hour MTTF, and they sell 10,000,000, then on average ten drives (out of the population of 10,000,000) will be failing per hour. That's not at all useful to you as a consumer, other than to provide a general indication of reliability. Using it for more than that is an error.
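
The arithmetic in that example can be checked with a few lines of Python. This is only a sketch under the usual simplifying assumption of a constant failure rate (the flat middle of the bathtub curve); the function names are made up for illustration, and the annualized figure is an extra population-level number, not something a manufacturer's spec sheet is being quoted for here.

```python
import math


def expected_failures_per_hour(population, mttf_hours):
    """Average failures per hour across a population, assuming a constant failure rate."""
    return population / mttf_hours


def annualized_failure_fraction(mttf_hours, hours_per_year=8760):
    """Fraction of a population expected to fail within one year of continuous use,
    assuming exponentially distributed lifetimes (constant failure rate)."""
    return 1.0 - math.exp(-hours_per_year / mttf_hours)


# Numbers from the post: 10,000,000 drives sold, 1,000,000-hour MTTF.
print(expected_failures_per_hour(10_000_000, 1_000_000))
print(f"{annualized_failure_fraction(1_000_000):.4%}")
```

Both results describe the population. Neither says which drives fail or when a particular drive will, which is the point being made about the individual unit on your desk.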

Electro 11-14-2010 07:40 PM

Quote:

Originally Posted by macemoneta (Post 4158874)
MTBF and MTTF tell you something about the product line, they tell you absolutely nothing about a given drive.

Think of it this way... people have a life expectancy of 75 years. As a population, that is the average (the MTTF). What does that tell you about how long I personally will live? Absolutely nothing. It tells you something about our species, but nothing about an individual.

MTBF and MTTF are population values, and each product line will have its own estimates. They are useful for the manufacturer, because they provide service information. For example, if a drive model has a 1,000,000-hour MTTF, and they sell 10,000,000, then on average ten drives (out of the population of 10,000,000) will be failing per hour. That's not at all useful to you as a consumer, other than to provide a general indication of reliability. Using it for more than that is an error.

MTTF and MTBF are a bunch of bull. They are great for marketing and paranoid idiots, but not for me. I buy hard drives from brands I prefer, like Hitachi and Western Digital. Hard drives eventually fail, so backups are always a must. Saying they are not like you stating of MTTF and MTBF shows how stupid you really are.

Every manufacturer has its own way of measuring MTTF and MTBF. That is my point that you cannot understand.

You can say all you want, but humans and electronics are two different things. Humans heal while electronics cannot. There are a lot of people living longer than what the studies have shown.

macemoneta 11-14-2010 07:56 PM

Quote:

Originally Posted by Electro (Post 4158882)
MTTF and MTBF are a bunch of bull. They are great for marketing and paranoid idiots, but not for me. I buy hard drives from brands I prefer, like Hitachi and Western Digital. Hard drives eventually fail, so backups are always a must. Saying they are not like you stating of MTTF and MTBF shows how stupid you really are.

What are you talking about? Where did I say anything about backups?

Quote:

Every manufacturer has its own way of measuring MTTF and MTBF. That is my point that you cannot understand.

I'm going to go ahead and guess that your major was not in statistics.

Quote:

You can say all you want, but humans and electronics are two different things. Humans heal while electronics cannot. There are a lot of people living longer than what the studies have shown.

Personally, I've never seen a dead person heal.

Electro 11-14-2010 11:48 PM

Quote:

Originally Posted by macemoneta (Post 4158900)
What are you talking about? Where did I say anything about backups?

You are too dependent on MTBF and MTTF, and you overrate them to the point that everybody is forced to rely on them for any failure, when these are just specs that do not relate to real-world failure. There is no way to provide any specs for real-world failure.

Yes, you did not say anything about backups. I am getting annoyed when people like you enforce MTBF and MTTF down everybody's throats when it is not understood. I understand the spec, but it seems you do not.


Quote:

Originally Posted by macemoneta (Post 4158900)
I'm going to go ahead and guess that your major was not in statistics.

It does not matter whether I do or do not have a major in statistics, does it? What matters is that you do not like my opinions, and they are actually true opinions. In fact, MTBF and MTTF are calculated by a proprietary method at each manufacturer. Sure, you can state them the same way, but they are far from being the same. These specs are the same as stated response times and contrast ratios for LCDs. Statistics have shown that they are not the same from one brand to the next. Statistics have also shown that MTBF and MTTF do not relate to real-world use.


Quote:

Originally Posted by macemoneta (Post 4158900)
Personally, I've never seen a dead person heal.

I have not seen a hard drive heal itself.


BTW, I have already had this discussion in other threads, and people still do not understand that I do not care for something like MTBF and MTTF. Sure, it is something to think about, but it is not something to be very, very paranoid about.

macemoneta 11-14-2010 11:55 PM

Quote:

I am getting annoyed when people like you enforce MTBF and MTTF down everybody's throats

Either we are having a language problem, or you need to stop posting in technical forums stoned.

H_TeXMeX_H 11-15-2010 04:38 AM

Ignore Electro, he likes to do these things.

jefro 11-15-2010 03:19 PM

@Kevin Benko, I can say that you can, if you wish, test this drive to find any short-term failure. I base this on having retired from a large computer maker and, before that, having worked in military electronics, which gave me real-life situations where your suggestion has been used for decades.

As I stated before, the test needs to be within the operating bounds of the actual projected use. If you were to exceed the temps and voltages/current, or throw the heads too far, you could damage your hard drive. If you were to test to 10% of MTBF you'd be fine, in my opinion.
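
For anyone who wants to try a test that stays within normal use, here is a minimal Python sketch of a write-then-verify pass against a file on the mounted drive. It is an illustration under stated assumptions, not an established burn-in procedure: the function names, block size, and pass count are all made up for the example, and because the operating system may serve the read-back from its cache, a pass smaller than RAM may not exercise the platters on the read side at all.

```python
import hashlib
import os


def burn_in_pass(path, total_bytes, block_size=1024 * 1024):
    """Write pseudo-random blocks to `path`, read them back, and compare digests.

    Returns True when the data read back matches what was written.
    """
    written = hashlib.sha256()
    remaining = total_bytes
    with open(path, "wb") as f:
        while remaining > 0:
            block = os.urandom(min(block_size, remaining))
            f.write(block)
            written.update(block)
            remaining -= len(block)
        f.flush()
        os.fsync(f.fileno())  # ask the OS to push buffered data to the device

    read_back = hashlib.sha256()
    with open(path, "rb") as f:
        while True:
            block = f.read(block_size)
            if not block:
                break
            read_back.update(block)
    return written.digest() == read_back.digest()


def burn_in(path, total_bytes, passes=3):
    """Run several passes. Returns the number of the first failing pass, or None."""
    try:
        for n in range(1, passes + 1):
            if not burn_in_pass(path, total_bytes):
                return n
        return None
    finally:
        # Clean up the scratch file whether or not the passes succeeded.
        if os.path.exists(path):
            os.remove(path)


# Hypothetical usage (the mount point is an assumption, adjust to your system):
# failed_pass = burn_in("/mnt/newdrive/burnin.bin", 50 * 1024**3, passes=3)
```

A clean result only says that nothing went wrong during these passes. As discussed above, it does not prove the drive is free of an early-failure defect.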

Electro 11-15-2010 07:07 PM

Quote:

Originally Posted by macemoneta (Post 4159035)
Either we are having a language problem, or you need to stop posting in technical forums stoned.

I do not have a problem with technical forums. People just have a problem with opinions. I keep an open mind to think about MTBF and MTTF, but I am not as paranoid about those specs as you are.

The point I am trying to make, which you do not understand, is that manufacturers each have their own way of calculating MTBF and MTTF to suit their own hardware, even though the hard drives seem the same to you.

One thing you forgot is the health of the medium, the area where the data is actually stored. MTBF and MTTF do not tell you that. Getting information on the integrity of the medium is not easy. There are programs like SpinRite that check the integrity. It checks based on ECC. All hard drives do ECC, but you have to run a program to find out how much ECC the hard drive is doing. ECC is another thing to think about, just like MTBF and MTTF. These three factors are good to think about, but they are not something to be paranoid about, because eventually a hard drive will fail.

I am not going to stop, because you cannot control where I go. I take your opinions, but you do not take mine.

jefro 11-15-2010 07:50 PM

Gentlemen, please carry this on through PMs.


All times are GMT -5.