[SOLVED] Why wouldn't I want to fix an error with fsck?

Doug G · 07-12-2016, 11:35 PM

For non-RAID configurations, when I've had to recover disks, I pretty much follow JaredDM's advice. If there is the slightest suspicion of a hardware issue with the drive, I plan to immediately replace the drive. Life's too short to entrust stuff I care about to a suspect drive.

The firmware in modern disks will replace bad sectors automatically when writing, but may forever lose the original contents of the bad sector. fsck may then detect a filesystem error and be able to repair it, but the missing data is forever missing. If it happened to be in some important data file, you may not see a problem for days/months.
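One way to see whether the firmware has already been remapping sectors is to look at the drive's SMART attributes. The following is only a minimal sketch: it assumes an ATA drive whose `smartctl -A` table uses the common attribute names `Reallocated_Sector_Ct`, `Current_Pending_Sector`, and `Offline_Uncorrectable` (names vary by vendor), and the `SAMPLE` text is made up for illustration, not taken from a real drive.

```python
def remap_counters(smartctl_output):
    """Pull the raw values of sector-remapping attributes out of `smartctl -A` text."""
    watched = {"Reallocated_Sector_Ct", "Current_Pending_Sector", "Offline_Uncorrectable"}
    found = {}
    for line in smartctl_output.splitlines():
        fields = line.split()
        # Attribute rows have ten columns:
        # ID# NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
        if len(fields) >= 10 and fields[0].isdigit() and fields[1] in watched:
            raw = fields[9]
            found[fields[1]] = int(raw) if raw.isdigit() else raw
    return found


# Hypothetical excerpt in the usual `smartctl -A` layout (illustrative values only).
SAMPLE = """\
  5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       8
  9 Power_On_Hours          0x0032   095   095   000    Old_age   Always       -       4120
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       2
"""

print(remap_counters(SAMPLE))
```

Non-zero counts here say nothing about which files were affected, only that sectors were remapped or are waiting to be.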

If there is a degenerative disk data problem, such as head contact, the problem will grow over time and disk usage. So I will NOT run fsck or any other data recovery software until I have backups in place.

I first attempt to make a full dd image of the drive, both to be a complete backup, and as the disk image to work with to attempt filesystem repair.

If making a full disk image fails, then I immediately try to backup everything important off the failing drive before starting any attempts at filesystem repair.

Finally, with good known backups in place, I will either attempt to repair the filesystem on the failing drive (using the image copy), or (more likely) just replace the drive with a new one.
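The image-first idea can be sketched in a few lines. This is not a substitute for dd or ddrescue: it makes a single pass, does no retries, and keeps no map file. It also assumes a Unix-like system (`os.pread`), and the function name `image_device` and the block size are illustrative choices. As others in this thread point out, every read of a physically damaged drive can make things worse.

```python
import os


def image_device(src_path, dst_path, block_size=64 * 1024):
    """Copy src_path to dst_path block by block.

    Blocks that raise a read error are zero-filled so offsets in the image
    still match the source. Returns the list of (offset, length) gaps.
    """
    bad_ranges = []
    src = os.open(src_path, os.O_RDONLY)  # never open the suspect disk for writing
    try:
        size = os.lseek(src, 0, os.SEEK_END)
        with open(dst_path, "wb") as dst:
            offset = 0
            while offset < size:
                want = min(block_size, size - offset)
                try:
                    data = os.pread(src, want, offset)
                except OSError:
                    # Unreadable block: record the gap and pad with zeros.
                    bad_ranges.append((offset, want))
                    data = b"\0" * want
                if not data:
                    break  # source ended earlier than its reported size
                dst.write(data)
                offset += len(data)
    finally:
        os.close(src)
    return bad_ranges
```

Any filesystem repair would then be attempted on the image file, or on a copy of it, and never on the original drive.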

Shadow_7 · 07-13-2016, 01:00 AM

I can think of one instance where auto-repair failed me: using a USB docking station on a laptop that had a non-working fan (for about a year, but I never pushed the CPU hard enough for it to be an issue). The USB ports started to fail, resulting in reported errors that may or may not have existed. But in its attempt to repair said errors, the repair created new errors because of the failing USB bus. It wasn't automatic, but I pressed Enter to repair, figuring there would only be a couple. A few pages later I was off the Enter key and realizing something had gone awry.

Beyond that, there are filesystems that are better handled with tools other than fsck, especially if you're using something exotic or not fully integrated into fsck. Perhaps xfs_repair or anything zfs, or anything that you've had to acquire by non-distro means. Throw in some encryption and the cut-and-dried simple times are behind us.

LukeRFI · 07-13-2016, 08:18 AM

Quote:

Originally Posted by jpollard

As I also said, or equivalent.

ddrescue is not even closely equivalent to the hardware tools we use to image a hard drive. ddrescue is the best option for those whose data isn't worth a few hundred bucks for a pro to get the best recovery possible, yet important enough to risk killing the drive to recover what they can.

For reference, here is a great article written by DeepSpar, arguably the company that produces the best data recovery imaging systems in the world. This might not be of much interest to @jpollard, but for the rest of the forum members, it might prove to be an interesting read.

https://www.technibble.com/technical...ry-procedures/

Quote:

Those tools assume the disk heads and media are not physically damaged.

On the contrary, those tools assume that the heads and media are physically damaged or failing, which is why REAL data recovery professionals use these tools and have invested tens of thousands of dollars into these systems. It should be noted that in many cases, we will inspect the heads in the clean room before we ever power the drive on.

Quote:

I'm surprised you don't know what a formatter card is then. It happens to be that little circuit card on the back of every disk made. The one handling head positioning, CRC checks, handles low level formatting...

I can only assume you are either using a term for an early version of the PCB from back in the 1970s or English is your second language. When I google "formatter card", the best I can come up with is a "formatter board" for an HP printer.

Quote:

Replacing heads is worthless if the media is actually damaged. You just end up damaging more heads.

Man, I must be stupid...I've replaced heads and recovered data from many drives with physical media damage. Yes, in many instances, we can burn through a few sets of heads. But with the professional data recovery equipment and facilities we use, as well as our brains, we are very good at what we do.

Quote:

The only way to check for that is disassembly of the unit to inspect each under a microscope. And both require a clean-room environment to do. I'm not even sure that can be done with some of the newer disks - they are requiring a helium environment for operation.

You forgot to mention that we take the platters out one by one and read them with our super cool universal platter reader.

I only reply to this for the benefit of those who might read this thread in the future. As with everything on the internet, you need to be careful about what information you read and believe. I see forums full of data recovery advice from many who seem to know less than those asking the questions. If you dropped your drive, it seems reasonable to think that any issues you have moving forward are physical and not something handled by software. If you suddenly have weird file system issues, something must have caused them. If it is just the OS going stupid, it may be fixable with a file system repair. However, if the hard drive is the root cause, anything you do will only make things worse. So, to treat both as though the hard drive is failing is the safest course of action, IMHO.

DaneM · 07-13-2016, 11:41 AM

Thank you very much to all who have replied. This information is bound to prove useful in the future. I hazard to say that some of it ought to be stickied, so those with data corruption problems will have a place to see the pros, cons, and varying viewpoints regarding data recovery. I'm marking the thread as [SOLVED], but please feel free to add any additional advice you have.

jpollard · 07-13-2016, 10:10 PM

Quote:

Originally Posted by Doug G

For non-RAID configurations, when I've had to recover disks, I pretty much follow JaredDM's advice. If there is the slightest suspicion of a hardware issue with the drive, I plan to immediately replace the drive. Life's too short to entrust stuff I care about to a suspect drive.

The firmware in modern disks will replace bad sectors automatically when writing, but may forever lose the original contents of the bad sector. fsck may then detect a filesystem error and be able to repair it, but the missing data is forever missing. If it happened to be in some important data file, you may not see a problem for days/months.

Which is true for any general degradation - though it helps in a raid configuration as the system does occasionally check all blocks for consistency. And then the rebuild of the block goes to a replacement sector and no error actually shows, other than in the smart data reports.

Quote:

If there is a degenerative disk data problem, such as head contact, the problem will grow over time and disk usage. So I will NOT run fsck or any other data recovery software until I have backups in place.

Which is physical damage... and making backups can/will cause additional damage.

Quote:

I first attempt to make a full dd image of the drive, both to be a complete backup, and as the disk image to work with to attempt filesystem repair.

If making a full disk image fails, then I immediately try to backup everything important off the failing drive before starting any attempts at filesystem repair.

Once the drive is physically damaged, there is nowhere else to go. But ANY access to a physically damaged drive can cause additional damage.

Quote:

Finally, with good known backups in place, I will either attempt to repair the filesystem on the failing drive (using the image copy), or (more likely) just replace the drive with a new one.

Which is why raided filesystems are useful. The problem is that with the multi-TB disks in a raid situation, it can take many hours to days (depending on size) to rebuild. I seem to remember rebuilding my home filesystem (raid 1, 500GB) took about 4 hours. By that, my 9 TB (raid 5) would need 24 hours or more.

But the filesystem on top of the raid structures should never show any errors.
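The rebuild-time estimate above is simple scaling, and can be written out as a quick sketch. The 500 GB and 4 hour figures come from the post; the 3 TB member size is purely an assumption for illustration (a RAID 5 rebuild reconstructs one replaced member, not the whole usable capacity), and the calculation assumes the rebuild rate stays constant, which real arrays under load will not guarantee.

```python
def rebuild_hours(capacity_gb, gb_per_hour):
    """Estimated rebuild time at a constant rate."""
    return capacity_gb / gb_per_hour


# Rate observed on the RAID 1 rebuild mentioned above: 500 GB in about 4 hours.
rate_gb_per_hour = 500 / 4

# Hypothetical 3 TB member disk in the RAID 5 set (assumed, not stated in the post).
print(rebuild_hours(3000, rate_gb_per_hour))  # 24.0

# Upper bound if a full 9 TB had to be rewritten at the same rate.
print(rebuild_hours(9000, rate_gb_per_hour))  # 72.0
```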

A system disk is usually considered throwaway as the operating system is easy to rebuild as long as you have configuration level backups. fsck on the OS filesystems is never an issue.

JaredDM · 07-13-2016, 11:58 PM

Quote:

Originally Posted by jpollard

Which is true for any general degradation - though it helps in a raid configuration as the system does occasionally check all blocks for consistency. And then the rebuild of the block goes to a replacement sector and no error actually shows, other than in the smart data reports.

Your understanding of how RAID works is beyond flawed. RAID does not do any occasional checking for consistency except on a few high end cards and even then you usually need to manually run the consistency check. Also RAID has nothing at all to do with re-mapping failed sectors, that's all handled internally in the individual hard drives, the control card is completely oblivious to it.

jpollard · 07-14-2016, 05:12 AM

Quote:

Originally Posted by JaredDM

Your understanding of how RAID works is beyond flawed. RAID does not do any occasional checking for consistency except on a few high end cards and even then you usually need to manually run the consistency check. Also RAID has nothing at all to do with re-mapping failed sectors, that's all handled internally in the individual hard drives, the control card is completely oblivious to it.

Does on Linux (well, at least with Fedora). It is scheduled to be done roughly once a month. After that, yes the drive does the replacement. I wasn't meaning the raid software did the replacement - it just finds the bad reads. The rebuild does a write...and that causes the drive to do the replacement (takes about 6 hours for a 9TB raid5).
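For anyone who wants to look at this on their own machine, Linux md exposes the check through sysfs. A minimal sketch, assuming an array named `md0` (a placeholder) and the standard `sync_action` and `mismatch_cnt` files under `/sys/block/<array>/md/`; the helper names are made up for illustration, and starting a check needs root.

```python
from pathlib import Path


def md_check_status(array="md0", sysfs_root="/sys/block"):
    """Report what the array is doing and how many mismatches the last check found."""
    md_dir = Path(sysfs_root) / array / "md"
    return {
        "sync_action": (md_dir / "sync_action").read_text().strip(),
        "mismatch_cnt": int((md_dir / "mismatch_cnt").read_text().strip()),
    }


def request_check(array="md0", sysfs_root="/sys/block"):
    """Ask md to start a read-only consistency check of the whole array."""
    (Path(sysfs_root) / array / "md" / "sync_action").write_text("check\n")
```

The `sysfs_root` parameter exists only so the functions can be tried against a scratch directory instead of a live array.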

Just one of the advantages of software over a hardware controller - it is easier to add functionality. My problem with hardware raid controllers (and I do have one - but disabled) is that very frequently they just don't work very well. The Linux software raid is just as fast, with little overhead, done in background below the priority of process initiated I/O.

jpollard · 07-14-2016, 05:50 AM

Quote:

Originally Posted by LukeRFI

ddrescue is not even closely equivalent to the hardware tools we use to image a hard drive. ddrescue is the best option for those whose data isn't worth a few hundred bucks for a pro to get the best recovery possible, yet important enough to risk killing the drive to recover what they can.

For reference, here is a great article written by DeepSpar, arguably the company that produces the best data recovery imaging systems in the world. This might not be of much interest to @jpollard, but for the rest of the forum members, it might prove to be an interesting read.

https://www.technibble.com/technical...ry-procedures/

On the contrary, those tools assume that the heads and media are physically damaged or failing, which is why REAL data recovery professionals use these tools and have invested tens of thousands of dollars into these systems. It should be noted that in many cases, we will inspect the heads in the clean room before we ever power the drive on.

If the connection is through a SATA connection without removal of the heads - yes it is assuming the heads are not damaged. Disassembly and inspecting the heads only assumes the media has not been damaged... You also have to inspect the media.

Quote:

I can only assume you are either using a term for an early version of the PCB from back in the 1970s or English is your second language. When I google "formatter card", the best I can come up with is a "formatter board" for an HP printer.

I can't help that Microsoft has dumbed down general knowledge. Ask a manufacturer. Formatting disks and keeping up with the physical recording is what it does. Hence it was called a "formatter". Each drive has to have one. Yes, they are PCBs, but so is a motherboard.

Quote:

Man, I must be stupid...I've replaced heads and recovered data from many drives with physical media damage. Yes, in many instances, we can burn through a few sets of heads. But with the professional data recovery equipment and facilities we use, as well as our brains, we are very good at what we do.

Burning through a few sets of heads also damages the media more (been there, seen that too). Cleaning each platter helps, and I was told it sometimes also requires the platter to be carefully polished to clean up the physical damage and reduce further damage before reading it.

Quote:

You forgot to mention that we take the platters out one by one and read them with our super cool universal platter reader.

Finally. Hopefully someone inspected the platter first - though it may be possible to have optical imaging tools recognize platter damage. This inspection and cleaning is what makes recovery expensive.

Quote:

I only reply to this for the benefit of those who might read this thread in the future. As with everything on the internet, you need to be careful about what information you read and believe. I see forums full of data recovery advice from many who seem to know less than those asking the questions. If you dropped your drive, it seems reasonable to think that any issues you have moving forward are physical and not something handled by software. If you suddenly have weird file system issues, something must have caused them. If it is just the OS going stupid, it may be fixable with a file system repair. However, if the hard drive is the root cause, anything you do will only make things worse. So, to treat both as though the hard drive is failing is the safest course of action, IMHO.

System crashes due to power failures are well understood to be nearly always fixable. Current disk drives stop writing data (and have the power to finish an active write as well as pull the heads out before the disk loses too much spin) - but that can still leave the logical structure incomplete. Journaling filesystems attempt to cover these situations - and fsck on those does an automatic completion. In the few remaining cases, the problems are usually misidentified block allocations (allocated, but not yet added to a file). Sometimes a directory won't get updated - causing lost files. Again, easy to fix with fsck.
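When fsck is run from a script, for example a read-only pass such as `e2fsck -n` against an image file, its exit status says whether anything was found or fixed. The status is a sum of bit flags documented in the fsck man page. A small sketch of decoding it follows; the helper name `describe_fsck_exit` is made up for illustration.

```python
# Exit status bits as documented in fsck(8); the value returned is their sum.
FSCK_EXIT_BITS = {
    1: "filesystem errors corrected",
    2: "system should be rebooted",
    4: "filesystem errors left uncorrected",
    8: "operational error",
    16: "usage or syntax error",
    32: "checking canceled by user request",
    128: "shared-library error",
}


def describe_fsck_exit(code):
    """Translate an fsck exit status into human-readable conditions."""
    if code == 0:
        return ["no errors"]
    return [text for bit, text in FSCK_EXIT_BITS.items() if code & bit]


print(describe_fsck_exit(4))
```

A status of 4 after a read-only pass means errors were found and left alone, which is the point at which to decide whether the hardware can be trusted before repairing anything.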

MOST people know enough that a dropped drive is physically damaged. MOST people can identify when physical damage is likely to have occurred. SOME people can identify possible damage just by listening to the disk - and not just the "click click" of heads, but also the bearing rumble (missing, normal, or the grinding of a cracked bearing) and other sounds of a disk. Sometimes you need a stethoscope to hear it - but good hearing really helps (it also helps to be able to hear at the high end of the spectrum). Not everyone can do that, and those who can don't necessarily have the experience to know what the sounds of a failure are. (It also helps to have grown up before heavy metal - your hearing is less likely to have been damaged :-)

WITHOUT physical damage, fsck is a reasonable tool to use.