Linux - Hardware: This forum is for Hardware issues.
Distribution: Cinnamon Mint 20.1 (Laptop) and 20.2 (Desktop)
Posts: 1,672
Disks usually have a couple of spare alternate cylinders to be used when re-vectoring bad blocks. (Well, they did in the "Bad Ol' Days!") You'd need to know the disk's geometry to get an idea of how many spare sectors/blocks are available before you can gauge what is "too many".
A cylinder's capacity will be the number of sectors on one track × the number of heads. I reckon a 1 TB disk is going to have a rather large number. I don't know what the manufacturing process is like now (glass platters, etc.?), but even a "new" disk had a bad block file, as they couldn't guarantee a 100% defect-free disk way back in nineteen oatcake.
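The cylinder arithmetic above can be sketched like this. The geometry numbers are illustrative assumptions (the classic 63 sectors/track, 255 heads reported by old BIOS translation); modern LBA drives hide their real geometry anyway:

```shell
# Sectors per cylinder = sectors per track × number of heads.
# These geometry values are assumed for illustration only.
sectors_per_track=63
heads=255
sector_bytes=512

sectors_per_cyl=$(( sectors_per_track * heads ))
bytes_per_cyl=$(( sectors_per_cyl * sector_bytes ))

echo "sectors per cylinder: $sectors_per_cyl"
echo "bytes per cylinder:   $bytes_per_cyl"
```

With those (fictional) numbers, one cylinder holds 16,065 sectors — so even a handful of spare cylinders gives the drive tens of thousands of sectors to re-vector into.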
So, if your disk shows a large number (thousands, for a 1 TB disk), it doesn't mean it has to be replaced... If the bad block count is increasing rapidly, though, it does!
I'm sure there are folk out there with a greater knowledge of modern disks and practices than I have who will correct this if it no longer holds true.
While I'd be tempted to agree that SMART is the normal answer, since the question involves enterprise-level kit, I'd say that you are the person who decides what is too many. Your use case, your required level of data security, and the role of this disk determine what is called for. It could be that 4 is too many.
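For reference, the SMART counters people usually watch here are Reallocated_Sector_Ct and Current_Pending_Sector, which on a real system come from `sudo smartctl -A /dev/sdX`. A minimal sketch of pulling them out with awk — the sample output below is invented for illustration:

```shell
# Sample lines in the format 'smartctl -A' prints; the values here
# are made up for illustration, not from a real drive.
sample='ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       8
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       2'

# The raw value is the last field on the attribute's line.
realloc=$(echo "$sample" | awk '$2 == "Reallocated_Sector_Ct"  {print $NF}')
pending=$(echo "$sample" | awk '$2 == "Current_Pending_Sector" {print $NF}')

echo "reallocated: $realloc, pending: $pending"
```

The raw reallocated count is the number the "how many is too many" question is really about; pending sectors are ones the drive hasn't managed to re-vector yet.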
Distribution: Cinnamon Mint 20.1 (Laptop) and 20.2 (Desktop)
Posts: 1,672
Quote:
Since the question involves enterprise-level kit, I'd say that you are the person who decides what is too many. Your use case, your required level of data security, and the role of this disk determine what is called for. It could be that 4 is too many.
Having worked in an Enterprise environment, we'd generally never replace a disk which had logged a few bad sectors. We always monitored it for about a week, max, to see if the bad sector count increased, and if so, at what sort of rate. The disks are designed to re-vector bad sectors to maintain data integrity. If a disk starts clocking up an increasing number of faulty sectors, then by all means replace it.
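That "monitor the rate, not the count" policy can be sketched as a trivial check run daily from cron. The counts and the threshold below are hardcoded placeholders — in practice the previous count would come from a state file and the current one from smartctl, and the threshold is whatever your site policy calls "rapid":

```shell
# Flag a disk for replacement when the reallocated-sector count is
# growing quickly between checks. All values here are illustrative.
prev_count=8     # e.g. read from a state file saved at the last check
curr_count=9     # e.g. parsed from today's 'smartctl -A' output
threshold=5      # growth per check your site policy treats as "rapid"

growth=$(( curr_count - prev_count ))
if [ "$growth" -ge "$threshold" ]; then
    status="REPLACE: count grew by $growth since last check"
else
    status="OK: count grew by $growth since last check"
fi
echo "$status"
```

The point is the delta, not the absolute value: a disk sitting steady at 8 reallocations is boring; one that jumps by 5 overnight is not.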
For one or two bad sectors, you also have to take into consideration the cost of the replacement disk (negligible), but more importantly the risk involved in its replacement, weighed against leaving it on monitor. Is it a single "Hot Swap" disk, or do you have to pull a tray of, say, ten disks to access the one you want to change?
Replacing a disk invariably involves going through a change management process: Change Boards, Software Analysts (OS & Apps), DBAs, Hardware engineers, etc. So in general, the fact that the disk has logged a couple of bad blocks — which it is designed to handle transparently — has to be weighed against the risk of disturbing the environment it "lives" in: the other disks in the array on the same SCSI bus, the array controller, the state of the DR (Disaster Recovery) partner system, the system load over a set period (you'd replace the disk when it's "quiet"), and even, in some cases, the weather. I'm thinking of utility companies who don't want maintenance happening while they're coping with storm-force winds and disruption to their distribution networks. Financial services restrict maintenance in line with the demand on their businesses: month end, holidays, "Black Friday" events, etc.
I still say the OP is the best person to answer this. For a lot of my servers and systems, I'd never change a disk until it burns up. For a very few, I'd change it on the spot.
Distribution: Red Hat Enterprise Linux, Mac OS X, Ubuntu, Fedora, FreeBSD
Posts: 89
Original Poster
I tested 9,072 2 TB SAS drives, and only 286 (3.15%) had sector errors reported by /sbin/badblocks. Of the drives with sector errors, the number of bad sectors typically ranged from 4 (Q1) to 16 (Q3), with a median of 8. Values above 25 were statistical outliers, more than 3 standard deviations from the mean.
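The headline failure rate quoted above can be reproduced directly from the two counts in the post (awk handles the floating-point division that shell arithmetic can't):

```shell
# Failure rate from the post's figures: 286 drives with sector
# errors out of 9,072 tested.
failed=286
tested=9072

pct=$(awk -v f="$failed" -v t="$tested" 'BEGIN { printf "%.2f", 100 * f / t }')
echo "drives with sector errors: $pct%"
```

286 / 9,072 ≈ 0.0315, which matches the 3.15% figure reported.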
You need to use, or at least look at, the SMART ratios for that model. Use the OEM's test suite for full diagnostics and tests. Then you need to decide whether you can live with the potential data loss.