
LinuxQuestions.org (/questions/)
-   Linux - Software (https://www.linuxquestions.org/questions/linux-software-2/)
-   -   Why superblocks go bad (https://www.linuxquestions.org/questions/linux-software-2/why-superblocks-go-bad-4175580235/)

pieterhouwen 05-20-2016 02:26 AM

Why superblocks go bad
 
Hi all,

I had a question regarding superblocks. At my workplace we routinely have failed hard drives, and when examining them I found out that most have bad superblocks. I do know how to remedy it, but I'm not sure what the actual cause is.

Thanks in advance!

-Pieter

zhjim 05-20-2016 03:18 AM

If a superblock goes bad, it watched too many blockbusters :P

Honestly, I think it's mere coincidence that the superblock gets damaged. If you have a hard drive failure, probably all blocks are prone to the error; it's just that the superblock contains all the important information about the filesystem itself, like its size, inode count, and used and free space. But it should not be that big of a problem, because normally you have more than one superblock per filesystem. IIRC you can also set the number of those when creating the fs, or use the appropriate ${fs}tune command (e.g. tune2fs). The actual cause of the damage is the hardware failure.
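For example (assuming ext2/3/4 and a hypothetical device /dev/sdb1; adjust the device name to your own), you can list the backup superblocks and, if the primary one is damaged, point e2fsck at a backup:

Code:

# Show where the backup superblocks live on an existing ext filesystem
dumpe2fs /dev/sdb1 | grep -i superblock

# Alternatively, a dry run of mke2fs prints the locations it would use
# (-n means "don't actually create a filesystem")
mke2fs -n /dev/sdb1

# Repair using a backup superblock, e.g. block 32768 on a 4 KiB-block fs
e2fsck -b 32768 /dev/sdb1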

business_kid 05-20-2016 03:19 AM

Did you pay us the courtesy of checking in Google first? Plenty up there for you.

pieterhouwen 05-20-2016 03:31 AM

@zhjim Thank you for your quick and insightful reply. That answered my question.

@business_kid Yeah, I did look for it on Google, but I couldn't find anything useful, only information on how to recover, not on the cause.

hydrurga 05-20-2016 05:56 AM

Quote:

Originally Posted by pieterhouwen (Post 5548082)
Hi all,

I had a question regarding superblocks. At my workplace we routinely have failed hard drives, and when examining them I found out that most have bad superblocks. I do know how to remedy it, but I'm not sure what the actual cause is.

Thanks in advance!

-Pieter

Out of interest, in what way(s) are the superblocks bad? It could just be that they're inconsistent when compared to what fsck sees on the rest of the failed hard drive.

If you can give us more info on which superblock errors are reported by e2fsck, that might help us determine if the superblock itself is being corrupted or whether it is just becoming out of sync with the rest of the disk due to the corruption of inodes and data blocks.
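If it helps, something like this captures exactly what e2fsck complains about without modifying the disk (read-only flags; /dev/sdb1 is just a placeholder device):

Code:

# -f forces a check even if the filesystem looks clean,
# -n opens the filesystem read-only and answers "no" to every prompt
e2fsck -fn /dev/sdb1 2>&1 | tee /tmp/e2fsck-sdb1.log

# Dump only the superblock fields for later comparison
dumpe2fs -h /dev/sdb1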

pieterhouwen 05-20-2016 06:11 AM

I'll try to find out after the weekend; I currently have no corrupt drives to check.

hydrurga 05-20-2016 06:11 AM

Quote:

Originally Posted by zhjim (Post 5548093)
If a superblock goes bad, it watched too many blockbusters :P

Honestly, I think it's mere coincidence that the superblock gets damaged. If you have a hard drive failure, probably all blocks are prone to the error; it's just that the superblock contains all the important information about the filesystem itself, like its size, inode count, and used and free space. But it should not be that big of a problem, because normally you have more than one superblock per filesystem. IIRC you can also set the number of those when creating the fs, or use the appropriate ${fs}tune command (e.g. tune2fs). The actual cause of the damage is the hardware failure.

Assuming that everyone here is talking about ext2/3/4, just a note that there will always be backup superblocks, unless the filesystem is incredibly small or you are using the sparse_super2 feature and you set num_backup_sb to 0.

However, these backup superblocks are not there to duplicate the changing values in the superblock; that is, after creation they are not updated each time the master superblock is updated on a day-to-day basis, unless you implement a structural change using tune2fs or resize2fs etc. They are there to keep a copy of the *structure* of your filesystem. So, what I am saying is: don't depend on your backup superblock(s) being an exact copy of your master superblock; items such as s_free_blocks_count_lo/hi and s_free_inodes_count will probably be well out of date.
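As a rough illustration of that lag (the device name and block numbers are only examples; on a filesystem with 4 KiB blocks the first backup sits at block 32768), you can compare the counters in the primary and a backup superblock:

Code:

# Primary superblock
dumpe2fs -h /dev/sdb1 | grep -i free

# A backup superblock (block 32768, 4 KiB block size assumed): its counts
# reflect the last structural change (mkfs, tune2fs, resize2fs), not
# day-to-day filesystem activity
dumpe2fs -h -o superblock=32768 -o blocksize=4096 /dev/sdb1 | grep -i free

# For reference, sparse_super2 with no backups at all would be created with
# something like: mke2fs -t ext4 -O sparse_super2 -E num_backup_sb=0 /dev/sdXn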

sundialsvcs 05-20-2016 07:32 AM

And, if you "routinely have failed hard drives," something's wrong at your workplace! Even in continuous duty, a drive mechanism should last for years.

('Scuse me ... what's that noise ... 'click click click' ... uh oh ...) ;)

Easily the most common cause of disk-drive failures is even a very slight instability in the incoming electrical power. Drives use direct-current synchronous motors, which are intolerant of any variation in current. You should have beefy UPS (Uninterruptible Power Supply) boxes on everything. Yes, buy one for every secretary.

Their purpose is actually not to keep the lights on when the power goes out. The battery is there to stabilize the current, in much the same way that a buffer tank or column suppresses "hammer" surges in a water supply line.

Have an electrician review the wiring. Be sure, for example, that photocopiers, laser printers, and other power-hungry devices are on separate circuits from the computers. (They don't need UPSes, and would very quickly chew them up.) These machines can send a nasty surge down the line every time they start to do something, especially the older copier equipment based on mirror optics rather than scanners (which you should get rid of anyway). In poorly designed and especially "repurposed" office buildings, the culprit could even be the tenant next door.

The only reason why superblocks might be seen as "going bad" is that they are (of course) among the most frequently written blocks. Therefore, if the drive is going fishy, this is the block whose corruption you are most likely to notice, especially given that corruption in that block can cut off access to everything else.

Modern disk drives have a built-in diagnostic capability called SMART, in which the drive's own electronics keeps statistics, chooses on its own to spare out questionable blocks, and so on. Linux (and all the other operating systems) has tools that can read this information, and there are hardware monitors (e.g. Nagios plug-ins) that can watch it continuously. You can often discover a pending failure well before it happens, if you are proactively looking for such things.
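For instance (assuming the smartmontools package is installed and the drive is /dev/sda; substitute your real device):

Code:

# Overall health verdict as judged by the drive itself
smartctl -H /dev/sda

# Attribute table: watch Reallocated_Sector_Ct, Current_Pending_Sector
# and Offline_Uncorrectable creeping upward over time
smartctl -A /dev/sda

# Run a short self-test, then read its result a few minutes later
smartctl -t short /dev/sda
smartctl -l selftest /dev/sda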

