Linux - Hardware: This forum is for Hardware issues.
Distribution: Cinnamon Mint 20.1 (Laptop) and 20.2 (Desktop)
Posts: 1,672
Disks usually have a couple of spare alternate cylinders to be used when re-vectoring bad blocks. (Well, they did in the "Bad Ol' Days!") You'd need to know the disk's geometry to get an idea of how many spare sectors/blocks are available before you can gauge what is "too many".
A cylinder's capacity will be the number of sectors on one track × the number of heads. I reckon a 1 TB disk is going to have a rather large number. I don't know what the manufacturing process is like now (glass platters, etc.?), but even a "new" disk had a bad block file, as they couldn't guarantee a 100% defect-free disk way back in nineteen oatcake.
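The cylinder arithmetic above can be sketched like this. The geometry numbers are illustrative assumptions (the classic 63 sectors/track, 255 heads reported by old BIOS translation); modern LBA drives hide their real geometry anyway:

```shell
# Sectors per cylinder = sectors per track × number of heads.
# These geometry values are assumed for illustration only.
sectors_per_track=63
heads=255
sector_bytes=512

sectors_per_cyl=$(( sectors_per_track * heads ))
bytes_per_cyl=$(( sectors_per_cyl * sector_bytes ))

echo "sectors per cylinder: $sectors_per_cyl"
echo "bytes per cylinder:   $bytes_per_cyl"
```

With those (fictional) numbers, one cylinder holds 16,065 sectors — so even a handful of spare cylinders gives the drive tens of thousands of sectors to re-vector into.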
So, if your disk shows a large number (thousands, for a 1 TB disk), it doesn't mean it has to be replaced... If the bad block count is increasing rapidly, though, it does!
I'm sure there are folk out there with a greater knowledge of modern disks and practices than I have who will correct this if it no longer holds true.
While I'd be tempted to agree that SMART is the normal answer, since the question involves enterprise-level kit, I'd say that you are the person who decides what is too many. Your use case, your required level of data security, and the role of this disk determine what is called for. It could be that 4 is too many.
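For reference, the SMART counters people usually watch here are Reallocated_Sector_Ct and Current_Pending_Sector, which on a real system come from `sudo smartctl -A /dev/sdX`. A minimal sketch of pulling them out with awk — the sample output below is invented for illustration:

```shell
# Sample lines in the format 'smartctl -A' prints; the values here
# are made up for illustration, not from a real drive.
sample='ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       8
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       2'

# The raw value is the last field on the attribute's line.
realloc=$(echo "$sample" | awk '$2 == "Reallocated_Sector_Ct"  {print $NF}')
pending=$(echo "$sample" | awk '$2 == "Current_Pending_Sector" {print $NF}')

echo "reallocated: $realloc, pending: $pending"
```

The raw reallocated count is the number the "how many is too many" question is really about; pending sectors are ones the drive hasn't managed to re-vector yet.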
Distribution: Cinnamon Mint 20.1 (Laptop) and 20.2 (Desktop)
Posts: 1,672
Quote:
Since the question involves enterprise-level kit, I'd say that you are the person who decides what is too many. Your use case, your required level of data security, and the role of this disk determine what is called for. It could be that 4 is too many.
Having worked in an Enterprise environment, we'd generally never replace a disk which had logged a few bad sectors. We always monitored it for about a week, max, to see if the bad sector count increased, and if so, at what sort of rate. The disks are designed to re-vector bad sectors to maintain data integrity. If a disk starts clocking up an increasing number of faulty sectors, then by all means replace it.
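That "monitor the rate, not the count" policy can be sketched as a trivial check run daily from cron. The counts and the threshold below are hardcoded placeholders — in practice the previous count would come from a state file and the current one from smartctl, and the threshold is whatever your site policy calls "rapid":

```shell
# Flag a disk for replacement when the reallocated-sector count is
# growing quickly between checks. All values here are illustrative.
prev_count=8     # e.g. read from a state file saved at the last check
curr_count=9     # e.g. parsed from today's 'smartctl -A' output
threshold=5      # growth per check your site policy treats as "rapid"

growth=$(( curr_count - prev_count ))
if [ "$growth" -ge "$threshold" ]; then
    status="REPLACE: count grew by $growth since last check"
else
    status="OK: count grew by $growth since last check"
fi
echo "$status"
```

The point is the delta, not the absolute value: a disk sitting steady at 8 reallocations is boring; one that jumps by 5 overnight is not.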
For one or two bad sectors, you also have to take into consideration the cost of the replacement disk (negligible), but more importantly the risk involved in its replacement, weighed against leaving it on monitor. Is it a single "Hot Swap" disk, or do you have to pull a tray of, say, ten disks to access the one you want to change?
Replacing a disk invariably involves going through a change management process: Change Boards, Software Analysts (OS & Apps), DBAs, Hardware engineers, etc. So in general, the fact that the disk has logged a couple of bad blocks — which it is designed to handle transparently — has to be weighed against the risk of disturbing the environment it "lives" in: the other disks in the array on the same SCSI bus, the array controller, the state of the DR (Disaster Recovery) partner system, the system load over a set period (you'd replace the disk when it's "quiet"), and even, in some cases, the weather. I'm thinking of utility companies who don't want maintenance happening while they're coping with storm-force winds and disruption to their distribution networks. Financial services restrict maintenance in line with the demand on their businesses: month end, holidays, "Black Friday" events, etc.
I still say the OP is the best person to answer this. For a lot of my servers and systems, I'd never change a disk until it burns up. For a very few, I'd change it on the spot.
Distribution: Red Hat Enterprise Linux, Mac OS X, Ubuntu, Fedora, FreeBSD
Posts: 89
Original Poster
I tested 9,072 2 TB SAS drives, and only 286 (3.15%) had sector errors reported by /sbin/badblocks. Of the drives with sector errors, the number of bad sectors typically ranged from 4 (Q1) to 16 (Q3), with a median of 8. Values above 25 were statistical outliers, more than 3 standard deviations from the mean.
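The headline failure rate quoted above can be reproduced directly from the two counts in the post (awk handles the floating-point division that shell arithmetic can't):

```shell
# Failure rate from the post's figures: 286 drives with sector
# errors out of 9,072 tested.
failed=286
tested=9072

pct=$(awk -v f="$failed" -v t="$tested" 'BEGIN { printf "%.2f", 100 * f / t }')
echo "drives with sector errors: $pct%"
```

286 / 9,072 ≈ 0.0315, which matches the 3.15% figure reported.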
You need to use, or at least look at, the SMART ratios for that model. Use the OEM's test suite for full diagnostics and tests. Then you need to decide whether you can live with the potential data loss.