badblocks vs. disk cache?

gd2shoe · 06-28-2004, 01:49 AM

This isn't so much a problem as a curiosity that I can think of no other way to answer (I'm no great programmer as yet, so I can't check the source).

As I understand it, badblocks writes a series of patterns to the device (read: hard drive) and reads these back in order to find blocks on the disk that don't properly record data.

On the other hand, hard drive manufacturers have discovered that a good deal of information requested from the drive is repetitive. To exploit this and make our drives respond faster, they have started installing cache directly on the drive itself.

Now, put A and B together. Could a bad sector be masked unintentionally by this cache? Pretend badblocks were to write a pattern, which gets written to cache and the disk. The disk records it wrong, but the correct pattern is still in cache. badblocks asks for the pattern back and gets the appropriate response. The program is then oblivious to the problem.

Associated questions:

The -n option does a piece at a time, while keeping your data safe. Does the -w option do the whole partition at once (being larger than the cache, less likely to cause problems) or does it too do only a little at a time?

Is there a way for software to tell the drive to stop using cache?

I realize drives today are much more reliable than they used to be. But eventually today's drives are going to get old. Is there a problem? Is there a solution?
Or am I just naive?

Electro · 06-28-2004, 02:37 AM

It's unlikely that the cache will go bad before the magnetic medium. I think there are programs to turn off hard drive cache.

IMO, if it's a bad hard drive, replace it. If there are very, very valuable files on the hard drive, send it to a data recovery service. Some data recovery services take the hard drive apart and then use a laser to read the data.

If you are making backups, you should not be worrying.

gd2shoe · 06-28-2004, 02:58 AM

Sorry, not trying to sound rude, but it doesn't seem that you read the scenario carefully. I'm not concerned about the cache going out. I would give badblocks a good chance to catch that (but an interesting possibility in itself). I'm concerned (hypothetically) about the platters themselves going out. And NOT just going out, but going out UNDETECTED, because cache may interfere with the badblocks check. This could make problems very difficult to diagnose. Imagine an undetectable bad sector on the swap partition...

And yes, I know to make backups.
And yes, I realize there are data recovery services.

Though both of these do bear repeating.

Electro · 06-28-2004, 03:48 AM

Any disk utility program that you run may not detect or do everything correctly. You have to trust it, manually fix it, or manually make the change yourself. Running badblocks goes deep into the low level (machine language) to scan the drive or partition. You can either trust what it displays or ignore what it displays.

Will you trust a program like Partition Magic to split your primary partition, or will you trust yourself to split the primary partition?

There are a lot of other things to worry about. This little thing should not be put on the worry list.

TuckChodd · 04-30-2010, 06:33 PM

I know this post is old, but I stumbled across it while wondering the exact same thing that the OP did.

Electro, your reply is so unbelievably worthless you should be ashamed of yourself and seriously consider never posting anything on the internet ever again.

If you don't understand a question, reread it, research the content, etc until you do. If you still don't then NEVER post a response. If you feel that you must still post anyway (maybe you just love to see your username on a webpage or so) the only acceptable response is "I don't know, sorry".

The OP's question was entirely valid and I'm sure he would like to smack you in the face with a hard drive or strangle you with a SATA cable.

Now please go back to drooling on yourself.

gd2shoe · 11-24-2010, 02:07 AM

That's about how I felt at the time. I'm still hoping to find a real answer somewhere.

You'll note that I didn't say anything. After two posts, it was very clear that he wasn't going to understand me, and I was only going to frustrate myself. Insulting him directly wasn't going to get any more intelligent a response.

Truthfully, your response to him 6 years later is kinda silly (much in the same way my response to you is a bit silly 7 months later). Who knows, maybe he's grown up a bit by now.

Incidentally, badblocks scans 64K at a time by default. Perhaps increasing that to a size larger than the drive's cache might clear old data. It's far from a guarantee, but maybe it's a start. This is all assuming badblocks doesn't disable or bypass the onboard cache somehow (my current belief).
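
To make the "chunk larger than the cache" idea concrete, here is a toy Python simulation. It is only a sketch under loud assumptions: the drive cache is modelled as a plain least-recently-used cache counted in blocks, and the names CachedDrive and scan are made up for illustration. Real firmware may behave quite differently, and this is not how badblocks itself is written.

```python
from collections import OrderedDict


class CachedDrive:
    """Toy drive: a bounded LRU cache in front of a platter with some bad blocks.

    Hypothetical model for illustration only; real drive firmware is not public.
    """

    def __init__(self, cache_blocks, bad_blocks):
        self.cache_blocks = cache_blocks      # cache capacity, in blocks
        self.bad_blocks = set(bad_blocks)     # addresses the platter records wrongly
        self.platter = {}
        self.cache = OrderedDict()            # address -> data, oldest first

    def _remember(self, address, data):
        self.cache[address] = data
        self.cache.move_to_end(address)       # mark as most recently used
        if len(self.cache) > self.cache_blocks:
            self.cache.popitem(last=False)    # drop the least recently used block

    def write(self, address, data):
        # A bad block stores garbage on the platter; the cache keeps the good copy.
        self.platter[address] = "garbage" if address in self.bad_blocks else data
        self._remember(address, data)

    def read(self, address):
        if address in self.cache:
            self.cache.move_to_end(address)
            return self.cache[address]        # served without touching the platter
        data = self.platter[address]          # cache miss: go to the platter
        self._remember(address, data)
        return data


def scan(drive, total_blocks, chunk_blocks, pattern="pattern"):
    """Write a chunk, read it back, repeat; return the blocks that mismatched."""
    found = []
    for start in range(0, total_blocks, chunk_blocks):
        chunk = range(start, min(start + chunk_blocks, total_blocks))
        for address in chunk:
            drive.write(address, pattern)
        found += [a for a in chunk if drive.read(a) != pattern]
    return found


# Same faulty drive, scanned with a chunk smaller and larger than the cache.
small = scan(CachedDrive(cache_blocks=8, bad_blocks={5, 20}), 32, chunk_blocks=4)
large = scan(CachedDrive(cache_blocks=8, bad_blocks={5, 20}), 32, chunk_blocks=16)
print(small)
print(large)
```

In this model the 4-block chunk fits inside the 8-block cache, so every read-back is a cache hit and neither bad block is reported. The 16-block chunk overflows the cache, so the read-back has to go to the platter and both bad blocks show up. Whether a real drive behaves like this depends entirely on its caching policy.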

And remember, if memory is bad, badblocks will trash your data. I highly recommend memtest prior to badblocks.

djsmiley2k · 11-24-2010, 02:26 AM

Code:

man badblocks

- it has options to turn the cache off.

As for the cache confusing badblocks: when it's cached, the hdd has no way of knowing if the "new" block you'll read is identical to the one it has in cache. I think the cache simply works by addresses, so if you're requesting data from the same address over and over, it'll pull from cache.

"Think" is the important word there; I have a feeling this might be the kind of thing that HDD manufacturers keep secret.

For example, 3 blocks:

a: 111
b: 111
c: 000

a is requested, returned from hdd and placed into cache
b is requested, returned from hdd and placed into cache <---- There is no way for the hdd to know the content of b is exactly the same as a without first reading it from the platter (Is there?)
c is requested, returned and pushes a out of cache
b is requested, b is returned from cache
a is requested, returned from hdd and pushes c out of cache

Hope that makes sense.
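
The same walk-through as a small Python sketch, assuming a cache that holds two blocks, is keyed purely by address, and evicts the least recently used entry. That policy is a guess that happens to match the steps above; AddressCache is a made-up name, not anything from a real drive.

```python
from collections import OrderedDict


class AddressCache:
    """Toy read cache keyed by block address, evicting the least recently used entry."""

    def __init__(self, platter, capacity):
        self.platter = platter          # dict: address -> contents on the "platter"
        self.capacity = capacity        # how many blocks the cache can hold
        self.cache = OrderedDict()      # address -> cached contents, oldest first

    def read(self, address):
        """Return (contents, source), where source is 'cache' or 'hdd'."""
        if address in self.cache:
            self.cache.move_to_end(address)   # mark as most recently used
            return self.cache[address], "cache"
        contents = self.platter[address]      # miss: go to the platter
        self.cache[address] = contents
        if len(self.cache) > self.capacity:
            self.cache.popitem(last=False)    # evict the least recently used block
        return contents, "hdd"


# Blocks a and b hold identical contents; the cache never compares contents.
platter = {"a": "111", "b": "111", "c": "000"}
drive = AddressCache(platter, capacity=2)
sources = [drive.read(address)[1] for address in "abcba"]
print(sources)            # where each of the five requests was served from
print(list(drive.cache))  # addresses left in the cache, oldest first
```

The lookup only ever uses the address, so b is read from the platter the first time even though its contents match a. Only the second request for b comes from the cache.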

gd2shoe · 11-24-2010, 03:19 AM

Nope, it doesn't.

Quote:

Code:

man badblocks

- it has options to turn the cache off.

Say what? Not last I checked, and the following command returns nothing. In other words, the word "cache" is not mentioned once in the man page. I'd be interested to know what your man page says.

Code:

man -P cat badblocks | grep -i cache

I think you're a little confused either about how badblocks works, or cache, or both.

Let's assume there is no cache, or that cache is disabled. Badblocks sends several blocks to the hard drive to be stored. The drive sees these blocks, and begins the slow process of writing them to the platters. When it's finished, badblocks requests those same blocks from the drive. The drive goes back to the platters and reads the data. Now badblocks can compare what was sent to the drive versus what the drive actually stored. If there's a discrepancy, the drive cannot be trusted.

Now let's assume for a minute that badblocks does not or cannot disable cache. Badblocks sends several blocks to the hard drive to be stored. The drive sees these blocks, and stores them in cache. It writes them to the platters at its leisure. Now badblocks requests those same blocks from the drive. The drive sees no reason to wait for the slow mechanical platters. It's seen this data recently; it's still in the cache. It might not even be magnetically encoded yet. The drive sends the cached copy back to badblocks. Cache has made the whole process much, much faster, but has hidden any problems that badblocks is designed to find.

In diagram form:

Code:

badblocks 0000 -> cache 0000 -> platter 1111 (bad drive, data incorrectly recorded on platter)
badblocks 0000 <- cache 0000 -- platter 1111 (platter is never checked, badblocks passes incorrectly)
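
The two cases can also be played out in a toy Python model. This is a sketch of the hypothetical scenario only: ToyDrive and write_read_test are invented names, the "bad" block simply inverts every bit it stores, and nothing here reflects how badblocks or any real drive is implemented.

```python
class ToyDrive:
    """Toy drive with a write-through cache in front of a possibly faulty platter."""

    def __init__(self, bad_blocks=()):
        self.bad_blocks = set(bad_blocks)   # addresses the platter records wrongly
        self.platter = {}
        self.cache = {}

    def write(self, address, data):
        self.cache[address] = data          # cache keeps exactly what the host sent
        if address in self.bad_blocks:
            data = bytes(b ^ 0xFF for b in data)   # platter stores inverted bits
        self.platter[address] = data

    def read(self, address, use_cache=True):
        if use_cache and address in self.cache:
            return self.cache[address]      # served without touching the platter
        return self.platter[address]


def write_read_test(drive, addresses, pattern, use_cache):
    """Write pattern to each address, read it back, return addresses that mismatch."""
    for address in addresses:
        drive.write(address, pattern)
    return [a for a in addresses if drive.read(a, use_cache) != pattern]


pattern = b"\x00" * 4
cached = write_read_test(ToyDrive(bad_blocks={2}), range(4), pattern, use_cache=True)
uncached = write_read_test(ToyDrive(bad_blocks={2}), range(4), pattern, use_cache=False)
print(cached)    # blocks reported bad with the cache in the read path
print(uncached)  # blocks reported bad when reads go to the platter
```

With the cache in the read path the test reports nothing, even though block 2 is stored wrongly. With reads forced to the platter, block 2 is reported.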

I hope I'm being clear. (I have a bad habit of posting later than I should.)

catkin · 11-24-2010, 04:30 AM

Quote:

Originally Posted by gd2shoe

I hope I'm being clear. (I have a bad habit of posting later than I should.)

I understand your concern and have examined the current badblocks source code. I do not understand every line of it but saw nothing to suggest any special measures to ensure that data written to the HDD and read back has been written to the platter(s) rather than the HDD's cache. It does use O_DIRECT on the open call thus bypassing Linux's file buffers and it may be that when the open() syscall is used this way that the kernel's interaction with the hardware tries to bypass the hardware's buffer/cache too.

H_TeXMeX_H · 11-24-2010, 07:45 AM

The scenario mentioned is highly unlikely, because:

1) The cache is small compared to the size of the drive.
2) The chances of you requesting a block that has been cached as good while the disk is bad are very low.
3) There are probably internal ECC methods that prevent the above scenario by not using the cache if it is bad. I'm not sure about this, but this seems like a good thing to do.
4) I would think there is a way to flush this cache... is there?

djsmiley2k · 11-24-2010, 07:48 AM

Whoops, it's in e2fsck that you can disable the cache, and also run a badblocks check at the same time. Hence my confusion.

catkin · 11-24-2010, 09:02 AM

Quote:

Originally Posted by djsmiley2k

Whoops, it's in e2fsck that you can disable the cache, and also run a badblocks check at the same time. Hence my confusion.

badblocks is part of the e2fsprogs suite (the same package as e2fsck) and uses its libraries, so it would be the only sane design choice to disable the cache when running badblocks and restore it to the original setting afterwards.

H_TeXMeX_H · 11-24-2010, 11:59 AM

Quote:

Originally Posted by catkin

badblocks is part of the e2fsprogs suite (the same package as e2fsck) and uses its libraries, so it would be the only sane design choice to disable the cache when running badblocks and restore it to the original setting afterwards.

I see, so there is a way to disable cache completely, well there you go, that solves it.

gd2shoe · 11-24-2010, 01:53 PM

It must exist therefore it does exist? That's an interesting fallacy. Granted, it's the one I've been using on this issue.

Quote:

The scenario mentioned is highly unlikely, because:

1) The cache is small compared to the size of the drive.

That only applies to -w mode, not to -n mode. If you're unwilling (or hesitant) to erase your data during the scan, you're stuck doing -b*-c bytes at a time. By default, that means scanning 64K at a time, far, far smaller than any drive cache out there. ((2) is unsubstantiated, (3) is irrelevant, but (4) is interesting.)
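
For the arithmetic, a quick Python sketch. The defaults of 1024 bytes for -b and 64 for -c are what I recall from the man page, and they do multiply out to the 64K figure. The 8 MiB cache size is purely an assumed example, and the function names are made up.

```python
DEFAULT_BLOCK_SIZE = 1024      # bytes per block, the badblocks -b default as I recall it
DEFAULT_BLOCKS_AT_ONCE = 64    # blocks tested at a time, the badblocks -c default


def chunk_bytes(block_size=DEFAULT_BLOCK_SIZE, blocks_at_once=DEFAULT_BLOCKS_AT_ONCE):
    """Bytes written and read back per pass: -b times -c."""
    return block_size * blocks_at_once


def blocks_to_exceed(cache_bytes, block_size=DEFAULT_BLOCK_SIZE):
    """Smallest -c value whose chunk is strictly larger than the cache."""
    return cache_bytes // block_size + 1


ASSUMED_CACHE = 8 * 1024 * 1024   # hypothetical 8 MiB drive cache, not a measured value

print(chunk_bytes())                      # default chunk size in bytes
print(ASSUMED_CACHE // chunk_bytes())     # how many default chunks fit in that cache
print(blocks_to_exceed(ASSUMED_CACHE))    # -c needed to outgrow the assumed cache
```

So with the defaults, a chunk is 65536 bytes and 128 of them would fit in the assumed 8 MiB cache. Getting a single chunk past that size at -b 1024 would need -c of at least 8193.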

Quote:

Whoops, it's in e2fsck that you can disable the cache, and also run a badblocks check at the same time. Hence my confusion.

Code:

man e2fsck
-F      Flush the filesystem device's buffer cache before beginning.

Understandable. The e2fsck page is a little ambiguous. It might refer to the kernel RAM buffers associated with the device. (Some part of which is referred to as cache; compare to the "free" command. Device in this context could easily refer to the /dev block file.)

If there is a way to instruct the drive to flush its cache before each read, and if badblocks uses it (or if open() with O_DIRECT does), it would certainly be a solution.

I'm no longer in the position of refurbishing old machines on a semi-regular basis, but I do occasionally need badblocks. I'm just happy to get some on-topic responses this time.

gd2shoe · 11-24-2010, 02:33 PM

I hereby coin:

Quote:

Is must sit ergo is est
(It must be, therefore, it is)

It's only a true fallacy when "must" is debatable, such as when humans are involved...
(Anyone who actually knows Latin better than Google translate should feel free to correct it.)

Sorry, just having fun.