Offline uncorrectable sectors
I'm running a CentOS 5.9 server on an em350 netbook, and on startup I get a warning:
Device: /dev/sda [SAT], 1 Offline uncorrectable sectors
Is there any way to fix this? The machine is command-line only (except for Webmin, which is installed). |
I'd suggest that you boot to the OEM hard drive diags first. Then decide which way to go.
It could be any number of issues but most likely some disk problem. The fix is not really reliable. Any time you have data errors, there is no way to trust the rest of the data. You'd have to compare backup to the current data or use last known good backup for resolution. |
If you have sensitive data on it or it's destined for something important, change the disk.
You can do the following: install smartctl (part of the smartmontools package) and run:
Code:
smartctl --attributes /dev/sda
There are temporary fixes, such as non-destructively rewriting the whole disk a few times. I have had bad sectors go away like that, but sometimes they came back. I used these as a base:
http://www.sjvs.nl/forcing-a-hard-di...e-bad-sectors/
http://www.cyberciti.biz/faq/recover...ted-partition/
http://www.howtogeek.com/howto/37659...isk-utilities/
In particular, the (***destructive!!!***) write-sector approach worked every time the drive wasn't done for good. Be aware that it will mess up the file system to some extent (I was lucky, but you might lose data, so do a full backup of everything on the disk first!!!). The badblocks command was also very useful: it can non-destructively rewrite the whole disk with the data that was previously on it. This sometimes makes bad sectors go away, and if it doesn't, you will at least have all the bad/unreadable sectors listed in dmesg to feed to the write-sector command. Make sure badblocks is used offline (boot the machine from a USB drive or similar live image and do the operations from that). |
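As a rough sketch of the "get the bad sectors from dmesg" step described above: the helper below (the name bad_sectors is made up for illustration) pulls sector numbers out of kernel log lines. It assumes the kernel reports failures in the form "I/O error, dev sda, sector N", which is the usual wording but may differ between kernel versions, so check your own dmesg output first.

```shell
#!/bin/sh
# Hypothetical helper: print the unique sector numbers for which the kernel
# logged an I/O error on one device. Reads log text on stdin.
# Assumption: lines look like "end_request: I/O error, dev sda, sector 12345".
bad_sectors() {
    dev="$1"
    sed -n "s#.*I/O error, dev ${dev}, sector \([0-9][0-9]*\).*#\1#p" | sort -n -u
}

# Typical use, run from a live image with the disk unmounted:
#   badblocks -nsv /dev/sda        # non-destructive read-write pass
#   dmesg | bad_sectors sda        # list sectors the kernel could not read
#
# Each reported sector can then be inspected, and only if you accept losing
# its contents, overwritten (DESTRUCTIVE, needs a recent enough hdparm):
#   hdparm --read-sector 12345 /dev/sda
#   hdparm --write-sector 12345 --yes-i-know-what-i-am-doing /dev/sda
```

The sector number 12345 is only a placeholder. Nothing in the function touches the disk; the destructive commands are left as comments on purpose so they have to be typed deliberately.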
I'd replace the disk if possible, but my understanding is that most disks have spare blocks and your manufacturer's tool should remap around the bad sectors. However, bad sectors seem contagious and generally mean doom is on the way, so I would look to replace the disk as soon as possible.
|
Thanks for your response. I ran smartctl --attributes /dev/sda, which gave me:
Quote:
|
Any reason you don't want to try the factory diags?
|
Yeah, it means rebooting the server into the OEM hard drive diags, and that means server downtime = no email, website and other important functions.
|
Quote:
Either way, a stitch in time saves nine, as the saying goes. If the hard drive is failing you should know, because how much downtime do you think a dead hard drive is going to cost you? |
I understand your point about server downtime sometimes being unavoidable, but I have full image backups and, like I said in an earlier post, I don't think this is a failing drive problem. I got it after restoring an image to the drive, and because the drive wasn't 'exactly' the same size as the image expected, I got this error. Plus I tested the drive before using it and it was 100%, so for the meantime I'd like to hold out for a way to fix this in place.
|
The catch, however, is that the kind of checks that seem to be necessary would require lower-level access to the drive than is perhaps possible while data on the drive is in use, because of the risk of corrupting that data. This is the same reason you can't fsck a mounted volume: data can be corrupted if it's being changed while it's being scanned. If it were my server I'd just bite the bullet and take it offline.
|
I don't think this has anything to do with restoring an image; this is a low-level hard drive issue. If the drive is hot-swappable you can pull it and use another machine to run the diagnostics, but as I understand it the remapping is done in the hard drive's firmware.
Code:
198 Offline_Uncorrectable 0x0010 100 100 000 Old_age Offline - 1 |
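For keeping an eye on that number without rebooting, here is a small sketch that pulls the raw value out of the attribute table. It assumes smartctl's default attribute layout, where the attribute name is the second column and the raw value is the tenth; the function name smart_raw is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: print the raw value of one SMART attribute.
# Reads "smartctl --attributes" output on stdin.
# Assumption: default table layout, name in column 2, raw value in column 10.
smart_raw() {
    awk -v name="$1" '$2 == name { print $10 }'
}

# Typical use (needs root):
#   smartctl --attributes /dev/sda | smart_raw Offline_Uncorrectable
```

Fed the line quoted above, this prints 1. A count that stays at 1 over time is less worrying than one that keeps climbing.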