LinuxQuestions.org
Old 06-19-2013, 04:03 AM   #1
tonj
Member
 
Registered: Sep 2008
Posts: 301

Rep: Reputation: 22
Offline uncorrectable sectors


I'm running CentOS 5.9 server on an em350 netbook, and on startup I get a warning:
Device: /dev/sda [SAT], 1 Offline uncorrectable sectors
Is there any way to fix this? The machine is command-line only (except for Webmin, which is installed).
 
Old 06-19-2013, 02:44 PM   #2
jefro
LQ Guru
 
Registered: Mar 2008
Posts: 13,138

Rep: Reputation: 1665
I'd suggest that you boot to the OEM hard drive diags first. Then decide which way to go.

It could be any number of issues but most likely some disk problem.

The fix is not really reliable. Any time you have data errors, there is no way to trust the rest of the data. You'd have to compare a backup against the current data, or restore the last known good backup, to resolve it.
 
Old 06-20-2013, 04:14 PM   #3
gradinaruvasile
Member
 
Registered: Apr 2010
Location: Cluj, Romania
Distribution: Debian Testing
Posts: 539

Rep: Reputation: 105
If you have sensitive data on it, or it's destined for something important, change the disk.

You can do the following: install smartmontools (the package that provides smartctl) and run:

Code:
smartctl --attributes /dev/sda
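Check that Reallocated_Sector_Ct, Current_Pending_Sector and Offline_Uncorrectable all have a raw value of 0. If any of them is non-zero, the drive has sectors it could not read or has already had to remap, and you can expect problems sooner or later.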
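To avoid reading the whole attribute table by eye, the check can be scripted. This is only a minimal sketch: the function name check_sectors is made up for illustration, and it assumes the raw value is the tenth whitespace-separated column, as in the usual smartctl --attributes layout.

```shell
#!/bin/sh
# Read `smartctl --attributes` output on stdin and report the raw value of
# the three sector-health attributes. Returns non-zero if any raw value is
# not 0. Assumes the raw value is column 10 (ID, name, flag, value, worst,
# thresh, type, updated, when_failed, raw).
check_sectors() {
    awk '
        BEGIN { bad = 0 }
        $2 == "Reallocated_Sector_Ct" ||
        $2 == "Current_Pending_Sector" ||
        $2 == "Offline_Uncorrectable" {
            raw = $10 + 0
            status = (raw == 0) ? "ok" : "WARN"
            printf "%-24s raw=%d %s\n", $2, raw, status
            if (raw != 0) bad = 1
        }
        END { exit bad }
    '
}

# Usage (as root):
#   smartctl --attributes /dev/sda | check_sectors
```

The function only parses text, so it can be tried on saved smartctl output without touching the disk.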
There are temporary fixes, such as non-destructively rewriting the whole disk a few times. I have had bad sectors go away like that, but sometimes they came back. I used these as a starting point:

http://www.sjvs.nl/forcing-a-hard-di...e-bad-sectors/

http://www.cyberciti.biz/faq/recover...ted-partition/

http://www.howtogeek.com/howto/37659...isk-utilities/


In particular, the (***destructive!!!***) write-sector approach (hdparm's --write-sector option) worked every time the drive wasn't done for good. Be aware that it will damage the file system to some extent (I was lucky, but you might lose data, so do a full backup of everything on the disk first!!!).

Also, the badblocks command was very useful. In its non-destructive read-write mode it rewrites the whole disk with the data that was previously on it, which sometimes makes bad sectors go away. Even when it doesn't, the bad/unreadable sectors will be named in dmesg, and you can feed those to the write-sector command.
Make sure badblocks is run offline (boot the machine from a USB drive or other live image and do the operations from there).
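The sequence described above might look roughly like the following. Treat it as a sketch under assumptions: the function name plan_rewrite is made up, the disk and sector number are placeholders you supply yourself, and the hdparm options assume a reasonably recent hdparm on the live image (the one shipped with an older distribution may not have --read-sector or --write-sector). The function deliberately only prints the commands, so nothing is written until you run them by hand.

```shell
#!/bin/sh
# Dry-run sketch: print the commands for the procedure instead of running
# them. $1 is the whole-disk device, $2 an optional sector number taken
# from dmesg. Both are placeholders supplied by the caller.
plan_rewrite() {
    disk=$1
    sector=${2:-}
    # Non-destructive read-write pass over the whole disk (-n), showing
    # progress (-s) and details (-v). The disk must not be mounted.
    echo "badblocks -nsv $disk"
    if [ -n "$sector" ]; then
        # First confirm that the sector really is unreadable.
        echo "hdparm --read-sector $sector $disk"
        # DESTRUCTIVE: overwrites that sector with zeros so the drive can
        # rewrite or remap it. Whatever was stored there is lost.
        echo "hdparm --yes-i-know-what-i-am-doing --write-sector $sector $disk"
    fi
}

# Usage, from a live image, with a full backup already taken:
#   plan_rewrite /dev/sdX          # whole-disk pass only
#   plan_rewrite /dev/sdX 1234567  # plus one sector reported in dmesg
```

Printing first and running by hand is a deliberate choice here, since a typo in the device name or sector number is not recoverable.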
 
Old 06-21-2013, 12:24 AM   #4
Soapm
Member
 
Registered: Dec 2012
Posts: 180

Rep: Reputation: Disabled
I'd replace the disk if possible, but my understanding is that most disks have spare blocks and your manufacturer's tool should remap around the bad sectors. However, bad sectors seem contagious and generally mean doom is on the way, so I would look to replace the disk as soon as possible.
 
Old 06-21-2013, 12:35 AM   #5
tonj
Member
 
Registered: Sep 2008
Posts: 301

Original Poster
Rep: Reputation: 22
thanks for your response. I did smartctl --attributes /dev/sda which gave me:
Quote:
=== START OF READ SMART DATA SECTION ===
SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
1 Raw_Read_Error_Rate 0x000f 100 100 046 Pre-fail Always - 157694
2 Throughput_Performance 0x0005 100 100 030 Pre-fail Offline - 47316992
3 Spin_Up_Time 0x0003 100 100 025 Pre-fail Always - 1
4 Start_Stop_Count 0x0032 099 099 000 Old_age Always - 4864
5 Reallocated_Sector_Ct 0x0033 100 100 024 Pre-fail Always - 0 (2000, 0)
7 Seek_Error_Rate 0x000f 100 100 047 Pre-fail Always - 4046
8 Seek_Time_Performance 0x0005 100 100 019 Pre-fail Offline - 0
9 Power_On_Hours 0x0032 060 060 000 Old_age Always - 20110
10 Spin_Retry_Count 0x0013 100 100 020 Pre-fail Always - 0
12 Power_Cycle_Count 0x0032 100 100 000 Old_age Always - 2716
192 Power-Off_Retract_Count 0x0032 100 100 000 Old_age Always - 167
193 Load_Cycle_Count 0x0032 085 085 000 Old_age Always - 319835
194 Temperature_Celsius 0x0022 100 100 000 Old_age Always - 44 (Min/Max 6/60)
195 Hardware_ECC_Recovered 0x001a 100 100 000 Old_age Always - 914
196 Reallocated_Event_Count 0x0032 100 100 000 Old_age Always - 0 (0, 6469)
197 Current_Pending_Sector 0x0012 100 100 000 Old_age Always - 0
198 Offline_Uncorrectable 0x0010 100 100 000 Old_age Offline - 1
199 UDMA_CRC_Error_Count 0x003e 200 253 000 Old_age Always - 0
200 Multi_Zone_Error_Rate 0x000f 100 100 060 Pre-fail Always - 12427
203 Run_Out_Cancel 0x0002 100 100 000 Old_age Always - 429581793769
240 Transfer_Error_Rate 0x003e 200 200 000 Old_age Always - 0
I couldn't find the badblocks command you spoke of. I was hoping there might be a tool I could run on the system that would repair this error in place. It's worth mentioning that I don't think this is a failing drive problem. I got it after restoring an image to the drive, and because the drive wasn't 'exactly' the same size as the image expected, I got this error.
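On the missing badblocks command: it is part of e2fsprogs and normally lives in /sbin, which a non-login root shell (plain su rather than su -) often leaves out of PATH. A small sketch for locating such a tool follows; the function name find_tool and the list of directories searched are assumptions for illustration.

```shell
#!/bin/sh
# Look for a tool on PATH first, then in the sbin directories that a
# non-login root shell often leaves out of PATH.
find_tool() {
    tool=$1
    if command -v "$tool" >/dev/null 2>&1; then
        command -v "$tool"
        return 0
    fi
    for dir in /sbin /usr/sbin /usr/local/sbin; do
        if [ -x "$dir/$tool" ]; then
            echo "$dir/$tool"
            return 0
        fi
    done
    echo "$tool not found" >&2
    return 1
}

# Usage:
#   find_tool badblocks
```

If it prints a path such as /sbin/badblocks, the tool is installed and can be run by that full path.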
 
Old 06-21-2013, 02:38 PM   #6
jefro
LQ Guru
 
Registered: Mar 2008
Posts: 13,138

Rep: Reputation: 1665
Any reason you don't want to try the factory diags?
 
Old 06-21-2013, 02:45 PM   #7
tonj
Member
 
Registered: Sep 2008
Posts: 301

Original Poster
Rep: Reputation: 22
yeah it means rebooting the server into the OEM hard drive diags and that means server downtime = no email, website and other important functions.
 
Old 06-21-2013, 03:25 PM   #8
frieza
Senior Member
 
Registered: Feb 2002
Location: harvard, il
Distribution: Ubuntu 11.4,DD-WRT micro plus ssh,lfs-6.6,Fedora 15,Fedora 16
Posts: 3,113

Rep: Reputation: 372
Quote:
Originally Posted by tonj
yeah it means rebooting the server into the OEM hard drive diags and that means server downtime = no email, website and other important functions.
Sometimes server downtime is unavoidable. Pick a time during off-peak usage and do it. First and foremost, start backing up the data now.
Either way, a stitch in time saves nine, as the saying goes. If the hard drive is failing you should know, because how much downtime do you think a dead hard drive is going to cost you?
 
Old 06-21-2013, 03:41 PM   #9
tonj
Member
 
Registered: Sep 2008
Posts: 301

Original Poster
Rep: Reputation: 22
I understand your point about server downtime sometimes being unavoidable, but I have full image backups and, like I said in an earlier post, I don't think this is a failing drive problem. I got it after restoring an image to the drive, and because the drive wasn't 'exactly' the same size as the image expected, I got this error. Plus I tested the drive before using it and it was 100%, so for the time being I'd like to hold out for a way to fix this in place.
 
Old 06-21-2013, 03:53 PM   #10
frieza
Senior Member
 
Registered: Feb 2002
Location: harvard, il
Distribution: Ubuntu 11.4,DD-WRT micro plus ssh,lfs-6.6,Fedora 15,Fedora 16
Posts: 3,113

Rep: Reputation: 372
The catch, however, is that the kind of checks that seem to be necessary would require lower-level access to the drive than is perhaps possible while data on the drive is in use, as there would be a risk of corrupting that data. This is the same reason you can't fsck a mounted volume: data can be corrupted if it's being changed while it's being scanned. If it were my server I'd just bite the bullet and take it offline.
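A quick way to see whether anything on the disk is still mounted before running low-level tools could look like this. It is a rough sketch with a made-up function name, disk_in_use, and it has a real limitation: it only matches device names that start with the disk's name, so it will not see file systems that sit on LVM or software RAID on top of the disk, and it does not look at swap.

```shell
#!/bin/sh
# Return 0 if any partition of the given disk appears in the mounts table,
# 1 otherwise. The second argument defaults to /proc/mounts and exists only
# so the check can be tried against a sample file.
# Limitation: matches by name prefix only, so LVM, RAID and swap are missed.
disk_in_use() {
    disk=$1
    mounts=${2:-/proc/mounts}
    awk -v d="$disk" '
        index($1, d) == 1 { found = 1 }
        END { exit !found }
    ' "$mounts"
}

# Usage:
#   if disk_in_use /dev/sda; then
#       echo "partitions of /dev/sda are mounted; do this from a live image"
#   fi
```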
 
Old 06-21-2013, 05:44 PM   #11
Soapm
Member
 
Registered: Dec 2012
Posts: 180

Rep: Reputation: Disabled
I don't think this has anything to do with restoring an image; this is a low-level hard drive issue. If the drive is hot-swappable you can pull it and run the diagnostics on another machine, but as I understand it the remapping is done in the hard drive's firmware.

Code:
198 Offline_Uncorrectable 0x0010 100 100 000 Old_age Offline - 1
 
  


All times are GMT -5.