LinuxQuestions.org
Old 05-10-2014, 12:53 PM   #1
davcefai
Member
 
Registered: Dec 2004
Location: Malta
Distribution: Debian Sid
Posts: 788

Rep: Reputation: 43
Bad Sector won't go away


On a Debian Unstable system (AMD A10-6800K CPU, 4 GB RAM, 2 x 160 GB SATA HDDs, 2 SATA DVD writers) I am getting this message daily:

Quote:
The following warning/error was logged by the smartd daemon:

Device: /dev/sda [SAT], 1 Currently unreadable (pending) sectors

Device info:
Maxtor 6L160M0, S/N:L407W4QH, FW:BANC1E00, 163 GB
When I run a short test
Code:
smartctl -t short -d sat /dev/sda
and then look at the result with
Code:
smartctl -l selftest /dev/sda
I always get the same result:
Quote:
=== START OF READ SMART DATA SECTION ===
SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline       Completed: read failure       60%     48224         144701458
# 2  Short offline       Completed: read failure       60%     48224         144701458
# 3  Short offline       Completed: read failure       60%     48164         144701458
# 4  Short offline       Completed: read failure       60%     48163         144701458
# 5  Short offline       Completed: read failure       60%     48163         144701458
This is the partition table printed by fdisk:

Quote:
Command (m for help): p

Disk /dev/sda: 163.9 GB, 163928604672 bytes
255 heads, 63 sectors/track, 19929 cylinders, total 320173056 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x0005fe80

   Device Boot      Start         End      Blocks   Id  System
/dev/sda1   *          63    39070079    19535008+  83  Linux
/dev/sda2        39070080   320159384   140544652+   5  Extended
/dev/sda5        39070143   164071844    62500851   83  Linux
/dev/sda6       164071908   203141924    19535008+  83  Linux
/dev/sda7       203141988   310568579    53713296   83  Linux
/dev/sda8       310568643   315259559     2345458+  83  Linux
/dev/sda9       315259623   320159384     2449881   82  Linux swap / Solaris
I interpret this to mean that my problem lies in sda5, since LBA 144701458 falls between that partition's start and end sectors.
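
As a cross-check on that reading, the lookup can be scripted. This is only a minimal sketch: find_partition is a hypothetical helper name, and it assumes the older fdisk column layout shown above (Device, Boot, Start, End, Blocks, Id, System) with units of 512-byte sectors. Newer fdisk versions print different columns and would need the field numbers adjusted.

```shell
# Print the partition(s) whose start..end sector range contains the given LBA.
# Reads "fdisk -lu"-style lines on stdin. Extended containers (Id 5, f, 85)
# are skipped so that only the logical partition inside them is reported.
# Assumed usage (as root): fdisk -lu /dev/sda | find_partition 144701458
find_partition() {
    awk -v lba="$1" '
        $1 ~ /^\/dev\// {
            i = ($2 == "*") ? 3 : 2      # skip the boot flag column if present
            first = $i; last = $(i + 1); id = $(i + 3)
            if (id == "5" || id == "f" || id == "85") next
            if (lba + 0 >= first + 0 && lba + 0 <= last + 0) print $1
        }'
}
```

For the partition table above and LBA 144701458 this should print /dev/sda5.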

However, booting with Knoppix and running
Code:
e2fsck -c -f -k -p /dev/sda5
only results in a message "Updating inode table" (or something similar) and the next day I get the SMART warning again.

Should e2fsck have cured this?

Help appreciated.
 
Old 05-10-2014, 01:35 PM   #2
Emerson
Senior Member
 
Registered: Nov 2004
Location: Saint Amant, Acadiana
Distribution: Gentoo ~
Posts: 3,180

Rep: Reputation: Disabled
Your hard drive is toast; it is not passing the test. Order a new one NOW. And make sure your backups are current.
 
Old 05-10-2014, 09:57 PM   #3
rknichols
Senior Member
 
Registered: Aug 2009
Distribution: CentOS
Posts: 1,939

Rep: Reputation: 799
Please post the output from "smartctl -A /dev/sda". The problem could be as simple as a single bad sector which just needs to be written to so that the drive can reallocate it to a spare sector. That is _only_ going to happen when a write to that sector occurs, unless at some point the drive does manage to get a correct read from that sector and so can reallocate it on its own. Bad sectors that are pending reallocation will cause some offline tests to fail.

Assuming that the problem is just some small number of bad sectors, the Bad Block HOWTO shows the procedure for finding them, determining what file they are (or are not) part of, and making the drive reallocate them. If there are just a small number of bad sectors and this number is not increasing with time, then the drive is OK to use. There are various events such as vibration or power supply glitches that can cause a sector to become bad without being a warning of impending doom.

Good backups are, of course, always important. Drives can and do fail without warning.
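
To see whether the count is growing, the relevant raw values can be logged from time to time. A minimal sketch, assuming the ten-column attribute table that "smartctl -A" prints (raw value in the tenth column); smart_raw is a made-up helper name, not a smartmontools command.

```shell
# Print the RAW_VALUE column of one SMART attribute, selected by its ID number.
# Reads "smartctl -A" output on stdin; banner and header lines are ignored
# because their first field never equals a numeric attribute ID.
# Assumed usage (as root): smartctl -A /dev/sda | smart_raw 197
smart_raw() {
    awk -v id="$1" '$1 == id && NF >= 10 { print $10 }'
}
```

Attributes 5 (Reallocated_Sector_Ct), 197 (Current_Pending_Sector) and 198 (Offline_Uncorrectable) are the ones worth watching for this kind of problem.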
 
Old 05-10-2014, 10:20 PM   #4
Emerson
Senior Member
 
Registered: Nov 2004
Location: Saint Amant, Acadiana
Distribution: Gentoo ~
Posts: 3,180

Rep: Reputation: Disabled
Code:
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline       Completed: read failure       60%     48224         144701458
Do not get confused: the test did not complete, it failed. The drive is dead.
Code:
smartctl --all /dev/sda | grep -e "Reallocated_Sector_Ct" -e "Current_Pending_Sector" -e "Offline_Uncorrectable" -e "UDMA_CRC_Error_Count" -e "Hardware_ECC_Recovered"
The command above pulls out the attributes for sda that you should be looking at.
 
Old 05-11-2014, 02:08 AM   #5
davcefai
Member
 
Registered: Dec 2004
Location: Malta
Distribution: Debian Sid
Posts: 788

Original Poster
Rep: Reputation: 43
Feedback as requested:

Quote:
Please post the output from "smartctl -A /dev/sda".
I am assuming that Item 5 is the problem, which is why I am reluctant to just dump the drive on this basis. sda5 is /home, which is backed up daily.

Code:
davcefai:/home/david# smartctl -A /dev/sda
smartctl 6.2 2013-07-26 r3841 [i686-linux-3.14-1-686-pae] (local build)                   
Copyright (C) 2002-13, Bruce Allen, Christian Franke, www.smartmontools.org               
                                                                                                 
Warning! SMART Attribute Thresholds Structure error: invalid SMART checksum.                             
=== START OF READ SMART DATA SECTION ===                                                                 
SMART Attributes Data Structure revision number: 16                                                      
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  3 Spin_Up_Time            0x0027   207   205   063    Pre-fail  Always       -       10075
  4 Start_Stop_Count        0x0032   251   251   000    Old_age   Always       -       4903
  5 Reallocated_Sector_Ct   0x0033   253   253   063    Pre-fail  Always       -       1
  6 Read_Channel_Margin     0x0001   253   253   100    Pre-fail  Offline      -       0
  7 Seek_Error_Rate         0x000a   253   252   000    Old_age   Always       -       0
  8 Seek_Time_Performance   0x0027   247   232   187    Pre-fail  Always       -       54510
  9 Power_On_Minutes        0x0032   114   114   000    Old_age   Always       -       169h+38m
 10 Spin_Retry_Count        0x002b   253   252   157    Pre-fail  Always       -       0
 11 Calibration_Retry_Count 0x002b   253   252   223    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   242   242   000    Old_age   Always       -       4733
192 Power-Off_Retract_Count 0x0032   253   253   000    Old_age   Always       -       0
193 Load_Cycle_Count        0x0032   253   253   000    Old_age   Always       -       0
194 Temperature_Celsius     0x0032   036   253   000    Old_age   Always       -       32
195 Hardware_ECC_Recovered  0x000a   253   252   000    Old_age   Always       -       7920
196 Reallocated_Event_Count 0x0008   253   253   000    Old_age   Offline      -       0
197 Current_Pending_Sector  0x0008   253   253   000    Old_age   Offline      -       1
198 Offline_Uncorrectable   0x0008   252   252   000    Old_age   Offline      -       1
199 UDMA_CRC_Error_Count    0x0008   199   199   000    Old_age   Offline      -       0
200 Multi_Zone_Error_Rate   0x000a   253   252   000    Old_age   Always       -       0
201 Soft_Read_Error_Rate    0x000a   253   252   000    Old_age   Always       -       0
202 Data_Address_Mark_Errs  0x000a   253   252   000    Old_age   Always       -       0
203 Run_Out_Cancel          0x000b   253   252   180    Pre-fail  Always       -       0
204 Soft_ECC_Correction     0x000a   253   252   000    Old_age   Always       -       0
205 Thermal_Asperity_Rate   0x000a   253   252   000    Old_age   Always       -       0
207 Spin_High_Current       0x002a   253   252   000    Old_age   Always       -       0
208 Spin_Buzz               0x002a   253   252   000    Old_age   Always       -       0
209 Offline_Seek_Performnce 0x0024   239   239   000    Old_age   Offline      -       171
210 Unknown_Attribute       0x0032   253   252   000    Old_age   Always       -       0
211 Unknown_Attribute       0x0032   253   252   000    Old_age   Always       -       0
212 Unknown_Attribute       0x0032   253   252   000    Old_age   Always       -       0

davcefai:/home/david#

Last edited by TobiSGD; 05-11-2014 at 09:58 PM. Reason: Mod-Edit: Changed quote-tags to code-tags
 
Old 05-11-2014, 03:33 AM   #6
davcefai
Member
 
Registered: Dec 2004
Location: Malta
Distribution: Debian Sid
Posts: 788

Original Poster
Rep: Reputation: 43
@ Emerson

This is the output of the command you suggested. Does it look so bad that the drive needs to be dumped? OK, it's a good excuse to get a bigger drive, which means I don't need to dump a lot of Beethoven to DVD :-)


Code:
davcefai:/home/david# smartctl --all /dev/sda | grep -e "Reallocated_Sector_Ct" -e "Current_Pending_Sector" -e "Offline_Uncorrectable" -e "UDMA_CRC_Error_Count" -e "Hardware_ECC_Recovered"
  5 Reallocated_Sector_Ct   0x0033   253   253   063    Pre-fail  Always       -       1
195 Hardware_ECC_Recovered  0x000a   253   252   000    Old_age   Always       -       8312
197 Current_Pending_Sector  0x0008   253   253   000    Old_age   Offline      -       1
198 Offline_Uncorrectable   0x0008   252   252   000    Old_age   Offline      -       1
199 UDMA_CRC_Error_Count    0x0008   199   199   000    Old_age   Offline      -       0
davcefai:/home/david#

Last edited by TobiSGD; 05-11-2014 at 09:59 PM. Reason: Mod-Edit: Changed quote-tags to code-tags
 
Old 05-11-2014, 08:06 AM   #7
rknichols
Senior Member
 
Registered: Aug 2009
Distribution: CentOS
Posts: 1,939

Rep: Reputation: 799
The problem is #197, Current_Pending_Sector. That is just one bad sector, and the drive otherwise looks fine. A bad sector that is pending reallocation is visible to the OS (will cause an I/O error if read) and will cause the offline test to fail at that location. Follow the steps in the Bad Block HOWTO to get that sector reallocated. Parameter #5, Reallocated_Sector_Ct, should then increase to 2, and the offline tests should then pass. That drive hasn't been used much, just under 170 power-on hours, and you should expect it to have a normal lifetime.

The steps in the HOWTO aren't as hard as they look (it covers several different cases -- you will be concerned with just one), but if you don't want to do that, the ham-fisted approach would be to back up the files on the affected partition, clear the partition with "dd if=/dev/zero of=/dev/sda5 bs=64k", then remake the filesystem and restore the backup.

Of course if you just want a bigger disk, by all means go ahead and get one.

BTW, when you post output please use [CODE]...[/CODE] tags and not [QUOTE]...[/QUOTE] tags so that formatting is preserved.

Last edited by rknichols; 05-11-2014 at 08:12 AM. Reason: Add BTW
 
Old 05-11-2014, 11:05 AM   #8
metaschima
Senior Member
 
Registered: Dec 2013
Distribution: Slackware
Posts: 1,874

Rep: Reputation: 458
The attributes look fine, except of course for the bad sector, which is not good. You could try zeroing the HDD as rknichols suggests, as this may repair soft errors. Obviously, back up before doing this.
 
Old 05-11-2014, 10:00 PM   #9
TobiSGD
Moderator
 
Registered: Dec 2009
Location: Germany
Distribution: Whatever fits the task best
Posts: 16,443
Blog Entries: 2

Rep: Reputation: 4508
Quote:
Originally Posted by rknichols
BTW, when you post output please use [CODE]...[/CODE] tags and not [QUOTE]...[/QUOTE] tags so that formatting is preserved.
Indeed, this will make your posts much more readable. I have fixed that for now.

Last edited by TobiSGD; 05-11-2014 at 10:02 PM.
 
Old 05-13-2014, 11:18 AM   #10
davcefai
Member
 
Registered: Dec 2004
Location: Malta
Distribution: Debian Sid
Posts: 788

Original Poster
Rep: Reputation: 43
Apologies and thanks for the format fix.
 
Old 05-13-2014, 11:37 AM   #11
davcefai
Member
 
Registered: Dec 2004
Location: Malta
Distribution: Debian Sid
Posts: 788

Original Poster
Rep: Reputation: 43
I have tried following the Bad Block HOWTO but have run into a snag. Here follows a blow-by-blow account in the hope that somebody will point out where I went off the straight and narrow path.

Step 1: Find error:

Code:
davcefai:/home/david# smartctl -l selftest /dev/sda
smartctl 6.2 2013-07-26 r3841 [i686-linux-3.14-1-686-pae] (local build)
Copyright (C) 2002-13, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF READ SMART DATA SECTION ===
SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline       Completed: read failure       60%     48224         144701458
# 2  Short offline       Completed: read failure       60%     48224         144701458
# 3  Short offline       Completed: read failure       60%     48164         144701458
# 4  Short offline       Completed: read failure       60%     48163         144701458
Definitely at 144701458!

-----------------------------------------------------------------------------------------------
Step 2: Locate Partition where the error is:


Block number = 144701458 x 512 / 4096 = 18087682.25

Code:
davcefai:/home/david# fdisk -lu /dev/sda

Disk /dev/sda: 163.9 GB, 163928604672 bytes
255 heads, 63 sectors/track, 19929 cylinders, total 320173056 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x0005fe80

   Device Boot      Start         End      Blocks   Id  System
/dev/sda1   *          63    39070079    19535008+  83  Linux
/dev/sda2        39070080   320159384   140544652+   5  Extended
/dev/sda5        39070143   164071844    62500851   83  Linux
/dev/sda6       164071908   203141924    19535008+  83  Linux
/dev/sda7       203141988   310568579    53713296   83  Linux
/dev/sda8       310568643   315259559     2345458+  83  Linux
/dev/sda9       315259623   320159384     2449881   82  Linux swap / Solaris
davcefai:/home/david#
18087682 must be in sda1


Step 3: Find Mount Point and fs type

looking in /etc/fstab I find:

Code:
# /dev/sda1	= /
/dev/disk/by-uuid/c22032e6-9df4-4cc9-a1ff-9b2698b4a2b7 / ext3 nouser,defaults,errors=remount-ro,atime,auto,rw,dev,exec,suid 0 1
No surprises here (I think)

Step 4: Confirm the Block Size:

Code:
davcefai:/home/david#  tune2fs -l /dev/sda1 | grep Block
Block count:              4883752
Block size:               4096
Blocks per group:         32768
OK, 4096 as assumed earlier.

Step 5: Now to locate the inode:

Code:
davcefai:/home/david# debugfs
debugfs 1.42.9 (4-Feb-2014)
debugfs:  open /dev/sda1
debugfs:  testb 18087682
Illegal block number passed to ext2fs_test_block_bitmap #18087682 for block bitmap for /dev/sda1
Block 18087682 not in use
debugfs:  testb 18087683
Illegal block number passed to ext2fs_test_block_bitmap #18087683 for block bitmap for /dev/sda1
Block 18087683 not in use
debugfs:
I don't know what the error message means. I would appreciate being told what I am doing wrong!
 
Old 05-13-2014, 12:48 PM   #12
rknichols
Senior Member
 
Registered: Aug 2009
Distribution: CentOS
Posts: 1,939

Rep: Reputation: 799
LBA is in 512-byte sectors. "fdisk -u" gives addresses in 512-byte sectors. (The "Blocks" column shows 1024-byte blocks.) So, your bad block is in sda5, as you first suspected.

(144701458-39070143)/8 = 13203914.375

Block 13203914 of the filesystem, at sector offset 3 (0.375 x 8 = 3, i.e. the fourth 512-byte sector) of that 4K block.
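
The same arithmetic as a small shell function, in case anyone wants to repeat it for another LBA. This is only a sketch: lba_to_fs_block is a hypothetical name, and it assumes 512-byte logical sectors, a partition start taken from "fdisk -lu", and a block size taken from "tune2fs -l".

```shell
# Convert a whole-disk LBA (512-byte sectors) to a filesystem block number.
# Arguments: LBA, partition start sector, filesystem block size in bytes.
# Prints "<block> <offset>", where offset is the sector's position inside
# that block, counted from 0.
lba_to_fs_block() {
    lba=$1
    part_start=$2
    block_size=$3
    sectors_per_block=$(( block_size / 512 ))
    rel=$(( lba - part_start ))          # sector number relative to the partition
    echo "$(( rel / sectors_per_block )) $(( rel % sectors_per_block ))"
}

lba_to_fs_block 144701458 39070143 4096   # prints: 13203914 3
```

The first number is the one to hand to testb and icheck in debugfs.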
 
Old 05-13-2014, 01:34 PM   #13
davcefai
Member
 
Registered: Dec 2004
Location: Malta
Distribution: Debian Sid
Posts: 788

Original Poster
Rep: Reputation: 43
Quote:
Block 13203914 of the filesystem, at sector offset 3 of that 4K block
Thanks for this. However, moving along, I get:
Code:
davcefai:/home/david# debugfs
debugfs 1.42.9 (4-Feb-2014)
debugfs:  open /dev/sda5
debugfs:  testb 13203914
Block 13203914 marked in use
debugfs:  icheck 13203914
Block   Inode number
13203914        <block not found>
debugfs:
Which rather puts a damper on the proceedings. icheck takes about half a minute to run. Could it be timing out? I don't see how it cannot find a block it has previously found with the testb command.

Could I trouble you a little longer?

Thanks.
 
Old 05-13-2014, 04:08 PM   #14
rknichols
Senior Member
 
Registered: Aug 2009
Distribution: CentOS
Posts: 1,939

Rep: Reputation: 799
Quote:
Originally Posted by davcefai
Code:
davcefai:/home/david# debugfs
debugfs 1.42.9 (4-Feb-2014)
debugfs:  open /dev/sda5
debugfs:  testb 13203914
Block 13203914 marked in use
debugfs:  icheck 13203914
Block   Inode number
13203914        <block not found>
debugfs:
That means that the block is used by filesystem metadata, probably by some currently free inodes. The only way I know of for finding which one is to run
Code:
dumpe2fs /dev/sda5 | less
and page down through the listing until you see block numbers in that range, e.g.
Code:
Group 4: (Blocks 131072-163839)
  Block bitmap at 131072 (+0), Inode bitmap at 131073 (+1)
  Inode table at 131074-131584 (+2)
  19111 free blocks, 7493 free inodes, 88 directories
  Free blocks: 131597-131607, 132017-132020, 132057-132064, 132079, 132120, 132622, 137249, 137273, 137281-137364, 137657-137659, 137744, 137748-141311, 141313-142311, 142558, 146290-146327, 146589, 146649-146651, 146653-146705, 146745, 147384-147458, 147460-148648, 149383, 149971-154528, 154530-154921, 154926-154961, 154963-159743, 159747, 159751-159889, 159891-159903, 159905-160234, 160247, 160249-160260, 161033-163839
  Free inodes: 32717-32854, 32856-32936, 32938, 32962, 32966, 32982, 32988-33280, 33283, 33285-34542, 34544-34550, 34552-34554, 34556-34558, 34560-34682, 34687-34692, 35305-40880
Unfortunately, the program will probably die from an I/O error at that point, but hopefully you will be able to see the "Inode table at ..." line and can confirm that the bad sector is within that inode table. The inodes in that sector pretty much have to be free or else your e2fsck would have died with an I/O error, so it should be safe to zero them. First, to be absolutely certain you have the right sector run
Code:
hdparm --read-sector 144701458 /dev/sda
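If you do get the expected I/O error from that, zero it by running the command below. hdparm refuses to write a sector unless the extra confirmation flag is given.
Code:
hdparm --yes-i-know-what-i-am-doing --write-sector 144701458 /dev/sda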
That should make "smartctl -A /dev/sda" report "0" for the Current_Pending_Sector count, and the Reallocated_Sector_Ct will probably increase to "2". It would be best to run "e2fsck -f /dev/sda5" just to be sure you haven't stepped on something in use.

You did say you had backups for this filesystem, right?
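
Paging through the dumpe2fs listing by eye can be replaced by a small filter. This is a rough sketch under assumptions: find_inode_table_group is a hypothetical helper name, and it relies on the "Group N:" and "Inode table at A-B" line formats shown in the sample listing above. If dumpe2fs dies with an I/O error partway through, the filter only sees the groups printed before that point.

```shell
# Report the block group whose inode table contains the given filesystem block.
# Reads "dumpe2fs" output on stdin and prints nothing if no inode table matches.
# Assumed usage (as root): dumpe2fs /dev/sda5 | find_inode_table_group 13203914
find_inode_table_group() {
    awk -v blk="$1" '
        /^Group [0-9]+:/ { group = $2; sub(/:$/, "", group) }
        /Inode table at [0-9]+-[0-9]+/ {
            range = $0
            sub(/.*Inode table at /, "", range)   # drop the text before the range
            sub(/ .*/, "", range)                 # drop the text after the range
            split(range, r, "-")
            if (blk + 0 >= r[1] + 0 && blk + 0 <= r[2] + 0)
                print "group " group ": inode table " range
        }'
}
```

An empty result means the block is not in any inode table that was listed, so it would have to be some other kind of metadata.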
 
Old 05-13-2014, 04:30 PM   #15
davcefai
Member
 
Registered: Dec 2004
Location: Malta
Distribution: Debian Sid
Posts: 788

Original Poster
Rep: Reputation: 43
Quote:
You did say you had backups for this filesystem, right?
BackupPC, daily at 1500
 
  

