What software do server farms use to be able to pull out dead HDDs?
Linux - Server: This forum is for the discussion of Linux software used in a server-related context.
Hello people!
Server farms like Google's and other big ones have tons and tons of HDDs working together, and many HDDs die there every day. If I'm not mistaken, IT workers just pull dead HDDs out of the server racks and replace them with new ones on the fly, with no need to turn the servers off. Apparently they don't even care which disks their data is stored on. It's like RAID, but much more sophisticated than, say, RAID5 or RAID6: one piece of data is stored on several HDDs spread all over the server farm, which may cover several huge buildings or even a network of several buildings... At least that's how I understand it. Is it so? And if yes, what software do they use for this approach (to detect dead HDDs and their location, and to replace them on the fly)?
I believe you're broadly correct: they store the same info in more than one place, and no-one worries if a fact does go missing; Google search is not supposed to be definitive, much less ACID compliant.
The Google spider bot will find it again...
Google apps (docs, email etc) are different.
There are any number of ways of checking for bad disks, usually via SNMP, and some systems will notify you when a disk dies.
They probably don't bother for this app, but you can get hardware RAID that allows hot-swapping of disks, and the OS never even notices.
The disk must support hotplug; SATA disks are supposed to be hotpluggable. The disk subsystem must also support hotplug, and mdraid supports it as well. For the OS to keep running through a swap, RAID is required.
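As a rough sketch of what that swap looks like with Linux mdraid (device names like /dev/md0, /dev/sdb1, and /dev/sdc1 are placeholders, and these commands need root and real block devices):

```shell
# Mark the dying member as failed and remove it from the array
mdadm --manage /dev/md0 --fail /dev/sdb1
mdadm --manage /dev/md0 --remove /dev/sdb1

# Physically hot-swap the drive (SATA/SAS hotplug), partition the
# replacement, then add it back; md rebuilds onto it automatically
mdadm --manage /dev/md0 --add /dev/sdc1

# Watch the rebuild progress
cat /proc/mdstat
```

The array keeps serving reads and writes (in degraded mode) the whole time, which is the "no need to turn the servers off" part of the question.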
Yes, and when you say there's one piece of data spread out over multiple drives... that IS what RAID is. And they don't use software (per se) to do this; they use SANs. The 'disk' is presented to a server via a Fibre Channel host bus adapter (HBA). What that disk actually is depends on how the SAN administrator presents it: it could be part of one disk, one whole disk, or 20 whole disks split into four arrays of 5. The operating system will 'see' one disk. That's it. All the hot-plug and failover happens in the SAN... the OS never knows.
You can buy (relatively) cheap hardware RAID systems and define an array with a hot-spare drive; the OS will notice that a drive has failed, but the system will keep running. You can then swap out the failed drive, and it will rebuild the array and put the spare back into its previous state.
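The same hot-spare idea exists in software RAID too. As a hedged example (device names are placeholders; this destroys any data on those disks), creating an md array with a standby spare looks like:

```shell
# RAID5 over three disks plus one hot spare: if any member fails,
# md immediately starts rebuilding onto the spare with no manual
# intervention, and the failed disk can be swapped at leisure
mdadm --create /dev/md0 --level=5 --raid-devices=3 \
      /dev/sdb1 /dev/sdc1 /dev/sdd1 \
      --spare-devices=1 /dev/sde1
```

With hardware RAID the equivalent is configured in the controller's BIOS or vendor tool, and the OS sees only the single logical disk.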
Servers are commodity-class x86 PCs running customized versions of Linux. The goal is to purchase CPU generations that offer the best performance per dollar, not absolute performance. How this is measured is unclear, but it is likely to incorporate the running costs of the entire server, and CPU power consumption could be a significant factor.[2] Servers as of 2009-2010 consisted of custom-made open-top systems containing two processors (each with two cores[3]), a considerable amount of RAM spread over 8 DIMM slots housing double-height DIMMs, and two SATA hard drives connected through a non-standard ATX-sized power supply.[4] According to CNET and to a book by Hennessy, each server has a novel 12-volt battery to reduce costs and improve power efficiency.[3][5]
In terms of detecting when a hard drive might fail, the utilities in the smartmontools package can be used, in particular the online self-test commands.
There are also distributed file systems (e.g. PVFS, Lustre) that automatically distribute data and metadata over multiple servers in a cluster. Google is using something like this, but they've written their own (if you search for GoogleFS, you should be able to find some high level descriptions of it).
Thanks for information.
There was some news about Google using ext4 (probably with their own modifications); they switched from ext2 without ever using ext3.
And, besides, I'm not only talking about Google. What about Twitter, Tumblr, Facebook, MySpace, Blekko, Wikipedia, Linkedin, PayPal, IBM, Intel, AMD...?