LinuxQuestions.org


exceed1 03-20-2008 10:26 AM

fsck is not removing bad blocks, why not?
 
Hi

I'm running a dedicated server (a private one, not running anything big). I took a look at the log files the other day and noticed that the hard drive had some bad blocks. I then took the system to runlevel 1 and ran fsck.ext3 with the "-c" option, to add the blocks to the bad blocks list, and the "-y" option, to get the problems fixed. After that I ran fsck.ext3 again with "-y" and "-f" (fsck.ext3 reported the filesystem as clean, so I had to force the check), and it still said that the hard drive had 55 bad blocks.

Hmm, I thought, that's weird. According to the manual page for "fsck", the "-p" option should automatically fix any errors and the "-c" option should add the bad blocks to a bad blocks list. So my question is: when I run fsck again now, it still says that there are 55 bad blocks. Why isn't fsck fixing the errors? (The filesystem is unmounted.)

Any help is appreciated :)

MS3FGX 03-20-2008 11:51 AM

You can't fix a bad block; it is permanently destroyed on the physical disk. 55 blocks is quite a lot, and the drive is definitely no longer safe to use.

The purpose of listing them is to make the system aware of which ones are dead, so that the filesystem can still be read and you can get all of your data off it. Bad blocks on a drive are (generally) a sign of imminent failure. You need to back up everything as soon as possible (limit use of the drive until you have recovered everything) and replace it.

You may be able to recover some of the data that was on those blocks, but unless it was very important your best course is to just get what you can still easily copy off before the drive stops working completely.

marozsas 03-20-2008 12:33 PM

I don't know. :)

But you can try to run fsck using the non-destructive read-write test instead of the simple read-only test. Just use a double c: "fsck -cc ...".

Before you do that, save the current list of bad blocks so you can compare it with the list after the second fsck run.

Code:

dumpe2fs -b /dev/your-block-device > /somewhere/in/other/filesystem.before-fsck-cc
fsck -cc ...
dumpe2fs -b /dev/your-block-device > /somewhere/in/other/filesystem.after-fsck-cc

This way you can check whether fsck has marked any blocks as bad.
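To make the comparison concrete, here is a minimal sketch, assuming the two files written by the dumpe2fs -b commands above hold one block number per line. The function name new_bad_blocks is made up for illustration; it only prints the blocks that show up in the second list and not in the first.

```shell
# new_bad_blocks BEFORE AFTER
# Hypothetical helper: print block numbers listed in AFTER but not in BEFORE.
# Assumes each file holds one block number per line (dumpe2fs -b output).
new_bad_blocks() {
    before_sorted=$(mktemp) || return 1
    after_sorted=$(mktemp) || return 1
    # comm needs both inputs sorted the same way
    sort -u "$1" > "$before_sorted"
    sort -u "$2" > "$after_sorted"
    # -13 hides lines found only in BEFORE and lines common to both files
    comm -13 "$before_sorted" "$after_sorted"
    rm -f "$before_sorted" "$after_sorted"
}

# Usage, with the file names from the commands above:
#   new_bad_blocks /somewhere/in/other/filesystem.before-fsck-cc \
#                  /somewhere/in/other/filesystem.after-fsck-cc
```

If it prints nothing, the second fsck run did not add any blocks to the list.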

exceed1 03-20-2008 12:34 PM

Thanks for your reply, it was very interesting.

My question now is: once the blocks have been marked as bad by the badblocks program (fdisk wouldn't mark them as bad blocks, so I had to use the badblocks program), why should the disk fail completely? Why can't it just continue to work as normal, now that it knows it shouldn't read or write anything to these blocks?

jailbait 03-20-2008 01:32 PM

Quote:

Originally Posted by exceed1 (Post 3095140)

Thanks for your reply, it was very interesting.

My question now is: once the blocks have been marked as bad by the badblocks program (fdisk wouldn't mark them as bad blocks, so I had to use the badblocks program), why should the disk fail completely? Why can't it just continue to work as normal, now that it knows it shouldn't read or write anything to these blocks?

How bad blocks are handled on your hard drive can vary depending on how old the drive is. The current method is this:

There are spare blocks at the end of the hard drive. When a block becomes defective, the hard drive's firmware assigns one of the spare blocks to replace it. When your CPU accesses a bad block, the firmware automatically redirects the access to the spare block. This scheme works until all of the spare blocks are in use. Once you have more bad blocks than spares, the remaining bad blocks can no longer be remapped.

--------------------
Steve Stites

exceed1 03-20-2008 02:09 PM

Thanks for the reply, it was very informative :)

When I ran the badblocks program just now, it said it couldn't find any bad blocks, but when I run fsck it says that there are 55 bad blocks. Also, when I checked the logs from some time ago (/var/log/messages and /var/log/syslog), they reported bad blocks. Which tool is correct here? I have also been told that the badblocks program is better at finding bad blocks on the HDD than fsck. Is that correct?

Output from the tools:
badblocks:
"Pass completed, 0 bad blocks found."

fsck:
"....other info.."
"55 bad blocks"
"...more info.."

jailbait 03-20-2008 04:28 PM

Quote:

Originally Posted by exceed1 (Post 3095208)

When I ran the badblocks program just now, it said it couldn't find any bad blocks, but when I run fsck it says that there are 55 bad blocks. Also, when I checked the logs from some time ago (/var/log/messages and /var/log/syslog), they reported bad blocks. Which tool is correct here? I have also been told that the badblocks program is better at finding bad blocks on the HDD than fsck. Is that correct?

Output from the tools:
badblocks:
"Pass completed, 0 bad blocks found."

fsck:
"....other info.."
"55 bad blocks"
"...more info.."

I don't know which program does the best job of diagnosing bad blocks. The hard drive manufacturers have bootable diagnostic diskettes available for download, which will do destructive testing on your hard drive and assign spares to bad blocks. The last time I had this problem, about 6 years ago, I used one of their diagnostic diskettes to straighten it out. I also remember a time when I used one of the diagnostic diskettes and found out that I was out of spare blocks and the drive was kaput.

If all of the bad blocks are clustered near each other you can also cure the problem by partitioning the hard drive so that all of the bad blocks are in free space not allocated to any partition.
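To see whether the bad blocks really are clustered, a rough sketch like the one below can turn the bad block list into a sector range. It assumes a list of filesystem block numbers, one per line (as printed by dumpe2fs -b), and a filesystem block size passed as an argument. The function name bad_block_sector_range is made up for illustration, and the 4096 in the usage line is only an assumed block size; the real one is on the "Block size:" line of dumpe2fs -h.

```shell
# bad_block_sector_range BLOCK_SIZE
# Hypothetical helper: read filesystem block numbers on stdin (one per line)
# and print the first and last 512-byte sector they cover, counted from the
# start of the partition that holds the filesystem (not from the disk start).
bad_block_sector_range() {
    awk -v bs="$1" '
        NF == 0 { next }                    # skip blank lines
        {
            b = $1 + 0                      # force a numeric comparison
            if (!seen || b < min) min = b
            if (!seen || b > max) max = b
            seen = 1
        }
        END {
            if (!seen) exit 1               # empty list: nothing to report
            per = bs / 512                  # sectors per filesystem block
            printf "%.0f %.0f\n", min * per, (max + 1) * per - 1
        }'
}

# Usage (device name is a placeholder, 4096 is an assumed block size):
#   dumpe2fs -b /dev/your-block-device | bad_block_sector_range 4096
```

The numbers are relative to the start of the partition, so the partition's own start sector has to be added before using them to lay out new partitions on the whole disk.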

------------------------
Steve Stites

exceed1 03-20-2008 05:10 PM

OK, I'll check it out.

I tried to check the health of the disk with smartctl, and the output I got was this:
"SMART overall-health assessment test result: PASSED"

Is this information I can trust, or does smartctl only check some parts of the disk and not everything?

MS3FGX 03-20-2008 08:11 PM

SMART testing is little more than an educated guess. I have had completely dead drives pass SMART tests in the past, and perfectly functional ones fail.
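Since the overall verdict can be misleading, some people look at the raw sector counters in the smartctl -A attribute table instead. Below is a minimal sketch, assuming the usual ATA table layout where the attribute ID is the first column, the name the second and the raw value the tenth. The function name smart_sector_warnings is made up for illustration, and the attribute names in the comment are the ones smartmontools commonly prints; some drives report them differently or not at all. These counters are hints too, not a verdict.

```shell
# smart_sector_warnings
# Hypothetical helper: read `smartctl -A` output on stdin and print the
# sector-related attributes whose raw value is above zero:
#     5  Reallocated_Sector_Ct   (spares already in use)
#   197  Current_Pending_Sector  (sectors waiting to be remapped)
#   198  Offline_Uncorrectable   (sectors that failed an offline scan)
smart_sector_warnings() {
    awk '($1 == 5 || $1 == 197 || $1 == 198) && $10 + 0 > 0 {
        print $2 "=" $10
    }'
}

# Usage (device name is a placeholder):
#   smartctl -A /dev/your-block-device | smart_sector_warnings
```

No output means none of the three counters is above zero, which still does not prove the drive is healthy.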

exceed1 03-21-2008 08:07 AM

Hmm, OK, so the SMART tool isn't something you should rely on, or at least you should be aware that the status from the test can be wrong.


All times are GMT -5.