Quote:
Originally Posted by zpimp
for ide/sata we have smart but not for servers
|
Yes, we do. All modern hard drives support S.M.A.R.T., but on larger storage systems the S.M.A.R.T. status is checked by the controller firmware, so you would use the RAID controller management software rather than smartd.
Quote:
Originally Posted by zpimp
i know there are some proprietary stuff, wich work sometimes
but for sas hdd , hardware raid i dont know
|
If by "proprietary stuff" you're referring to management software from the likes of HP, Dell, Fujitsu, LSI, Areca and others, I'm happy to report that it all works very well.
Quote:
Originally Posted by zpimp
the reason i ask, is because i see some poor bastards, whose hdds are failing
and only thing they do is pray
|
I've seen that too, and it's most unfortunate. Some individuals seem to believe that hard drives aren't subject to wear and tear, even though the manufacturers' MTBF data are readily available, and companies like Backblaze routinely publish reports with empirical data on the failure rates of hard drives of various types and sizes.
But you started off talking about servers. I hope you haven't seen a server going down because of faulty hard drives? Because that would indicate a seriously incompetent system administrator.
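To put the MTBF figures mentioned above into perspective, here is a rough sketch that converts a datasheet MTBF into an annualized failure rate. It assumes a constant failure rate (the simple exponential model), which real drives only approximate, and the 1,000,000-hour figure is a made-up example rather than any particular drive's spec.

```python
import math

HOURS_PER_YEAR = 8766  # 365.25 days * 24 hours


def annualized_failure_rate(mtbf_hours: float) -> float:
    """Probability that one drive fails within a year of continuous use.

    Assumes a constant failure rate (exponential model); this is a
    simplification, not how manufacturers necessarily derive their numbers.
    """
    if mtbf_hours <= 0:
        raise ValueError("MTBF must be positive")
    return 1.0 - math.exp(-HOURS_PER_YEAR / mtbf_hours)


def expected_failures_per_year(mtbf_hours: float, drive_count: int) -> float:
    """Expected number of failed drives per year in a fleet of identical drives."""
    return drive_count * annualized_failure_rate(mtbf_hours)


if __name__ == "__main__":
    mtbf = 1_000_000  # hypothetical datasheet value, in hours
    print(f"AFR per drive: {annualized_failure_rate(mtbf):.2%}")
    print(f"Expected failures per year across 100 drives: "
          f"{expected_failures_per_year(mtbf, 100):.2f}")
```

Even with an optimistic MTBF the per-drive rate comes out just under 1% per year, so anyone running more than a handful of drives should expect to replace some. The empirical rates in published fleet reports are usually the better guide.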
Quote:
Originally Posted by zpimp
i read about predictive failure analysis
but i dont know much about this
|
S.M.A.R.T. does a decent job when it comes to predicting failure, but not before a number of sectors have gone bad. In a non-RAID setup that means data has probably been lost.
Of course, if the drive suffers a sudden, catastrophic failure (head stopper coming loose, bearings seizing, electronics malfunctioning), S.M.A.R.T. will be no help at all.
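As an illustration of the "sectors going bad" signal, here is a minimal sketch that pulls the sector-related counters out of `smartctl -A` output. It assumes the ATA/SATA attribute table layout; SAS drives report their defects in a different format, so this would not apply to them as-is. The function names, the set of watched attributes and the sample text are my own choices for the example.

```python
import subprocess
from typing import Dict

# Attributes that commonly indicate growing media defects (ATA/SATA naming).
WATCHED = {"Reallocated_Sector_Ct", "Current_Pending_Sector", "Offline_Uncorrectable"}

# Made-up sample in the usual `smartctl -A` table layout, used for a dry run.
SAMPLE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       8
  9 Power_On_Hours          0x0032   095   095   000    Old_age   Always       -       21043
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
"""


def parse_smart_attributes(text: str) -> Dict[str, int]:
    """Return the raw values of the watched attributes found in the table."""
    found = {}
    for line in text.splitlines():
        fields = line.split()
        # Attribute rows start with a numeric ID and have at least 10 columns.
        if len(fields) >= 10 and fields[0].isdigit() and fields[1] in WATCHED:
            raw = fields[9]
            if raw.isdigit():
                found[fields[1]] = int(raw)
    return found


def drive_is_suspect(attrs: Dict[str, int]) -> bool:
    """Flag the drive if any watched counter is non-zero."""
    return any(value > 0 for value in attrs.values())


def read_smart(device: str) -> Dict[str, int]:
    """Run smartctl against a device (needs root and smartmontools installed)."""
    # smartctl uses its exit status as a bit mask, so a non-zero code is not
    # treated as an error here.
    result = subprocess.run(["smartctl", "-A", device],
                            capture_output=True, text=True, check=False)
    return parse_smart_attributes(result.stdout)


if __name__ == "__main__":
    attrs = parse_smart_attributes(SAMPLE)
    print(attrs, "suspect" if drive_is_suspect(attrs) else "ok")
```

In practice smartd does this kind of polling for you and can send mail. The sketch is only meant to show what it is looking at, and that by the time the counters move, some sectors have already failed.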
Quote:
Originally Posted by zpimp
i cant accept you can prevent losing data on pcs but not on servers with raid
|
You don't really believe that's the case, do you? Server manufacturers solved that problem many years ago with management and monitoring software.
Here's what you do:
Make sure the management software for your server/controller is configured to send notifications whenever a drive fails. If you use software RAID, mdadm in monitor mode and smartd will get the job done.
Make sure at least two different notification mechanisms are used, to minimize the risk of a failure going unnoticed.
Make sure the RAID set is verified/scrubbed regularly to catch growing defects ("bit rot") on rarely used areas of the disks. Most hardware RAID controllers can do either scheduled or continuous background/idle scrubbing.
Consider installing an online spare if there are no systems administrators on-site.
Consider using RAID 6 if the RAID set is large enough that there's a real risk of a second drive failing during rebuild.
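For the software RAID case, a second, independent check alongside mdadm's own monitor mode could look something like the sketch below. It parses the text of /proc/mdstat and reports arrays whose member status shows a missing device. The function name and sample text are mine, and the parsing assumes the usual `[n/m] [UU_]` status layout. Treat it as a starting point rather than a replacement for mdadm.

```python
import re
from typing import Dict

# Matches the line that opens an array entry, e.g. "md0 : active raid1 ..."
ARRAY_RE = re.compile(r"^(md\S+)\s*:")
# Matches the member status, e.g. "[3/2] [UU_]" (an underscore is a missing member)
STATUS_RE = re.compile(r"\[(\d+)/(\d+)\]\s+\[([U_]+)\]")

# Made-up sample in the usual /proc/mdstat layout, used for a dry run.
SAMPLE_MDSTAT = """\
Personalities : [raid1] [raid6] [raid5] [raid4]
md0 : active raid1 sdb1[1] sda1[0]
      976630336 blocks super 1.2 [2/2] [UU]

md1 : active raid5 sdc1[0] sdd1[1] sde1[3](F)
      1953260544 blocks super 1.2 level 5, 512k chunk, algorithm 2 [3/2] [UU_]

unused devices: <none>
"""


def degraded_arrays(mdstat_text: str) -> Dict[str, str]:
    """Map each degraded array name to its member status string."""
    degraded = {}
    current = None
    for line in mdstat_text.splitlines():
        header = ARRAY_RE.match(line)
        if header:
            current = header.group(1)
            continue
        status = STATUS_RE.search(line)
        if status and current and "_" in status.group(3):
            degraded[current] = status.group(3)
    return degraded


if __name__ == "__main__":
    # On a real system, read the live file instead of the sample:
    #   with open("/proc/mdstat") as fh: text = fh.read()
    for name, state in degraded_arrays(SAMPLE_MDSTAT).items():
        print(f"WARNING: {name} is degraded ({state})")
```

Run from cron and wired to a different notification channel than mdadm's mail, this would cover the "two mechanisms" point. For the scrubbing point, Linux md starts a check when you write `check` to /sys/block/<array>/md/sync_action, and many distributions already ship a scheduled job that does this.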