LinuxQuestions.org
Old 10-27-2010, 01:47 PM   #1
hsugawar
LQ Newbie
 
Registered: Aug 2006
Posts: 17

Rep: Reputation: 0
SMARTD reported disk sector read error


Two months ago, I built a dedicated backup server using four 2TB SATA drives in software RAID5 (6TB total). A few days ago, smartd sent mail saying it had found an unreadable sector on one of the drives. I ran a self-test using smartctl and the error persisted. So far, neither the software RAID nor the ext4 filesystem running on top of the RAID volume has reported errors.

What is the best thing to do now?
MOST OPTIMISTIC: It is normal to have a few bad sectors among billions. Software RAID will cope with them. Keep using the system until RAID gives a more serious warning in the future.

MOST PESSIMISTIC: It is a really bad sign that could lead to disaster. No good disk drive should have bad sectors, especially when it is only two months old. The error simply has not been detected yet by software RAID and ext4. Replace the drive immediately and return the bad drive to the supplier.

I will welcome any useful suggestion. Thanks.
hiro
 
Old 10-27-2010, 05:39 PM   #2
TobiSGD
Moderator
 
Registered: Dec 2009
Location: Hanover, Germany
Distribution: Main: Gentoo Others: What fits the task
Posts: 15,798
Blog Entries: 2

Rep: Reputation: 4201
Most pessimistic is the way to go. Every drive has spare sectors to use if a bad sector is found. The report from SMART usually only comes up if there are no more spare sectors free. Replace the drive.
 
Old 10-29-2010, 09:54 AM   #3
H_TeXMeX_H
Guru
 
Registered: Oct 2005
Location: $RANDOM
Distribution: slackware64
Posts: 12,928
Blog Entries: 2

Rep: Reputation: 1272
Bad sectors are often the first sign of failure, and if it is 2 months old, send it back for a replacement.
 
Old 11-02-2010, 12:55 PM   #4
hsugawar
LQ Newbie
 
Registered: Aug 2006
Posts: 17

Original Poster
Rep: Reputation: 0
Thanks for the replies

I examined the smartctl report a little bit more carefully and found the sector read error is "pending." "Pending" seems to mean "sector replacement is delayed until a write-error is detected on the sector."

Further long self-tests "completed without error." How should I interpret this?

hiro
 
Old 11-02-2010, 01:21 PM   #5
catkin
LQ 5k Club
 
Registered: Dec 2008
Location: Tamil Nadu, India
Distribution: Servers: Debian Squeeze and Wheezy. Desktop: Slackware64 14.0. Netbook: Slackware 13.37
Posts: 8,563
Blog Entries: 29

Rep: Reputation: 1179
Quote:
Originally Posted by hsugawar
I examined the smartctl report a little bit more carefully and found the sector read error is "pending." "Pending" seems to mean "sector replacement is delayed until a write-error is detected on the sector."
Correct; bad block remapping only happens when the block is written. SMART has detected that the block is bad and is reporting it. Your problem is that you don't know how significant the data on that block is. A fuller explanation here. It is possible, but non-trivial to find out where the block is and hence its significance to you. Procedure detailed here. If that's too much to take on or you have a good backup of all the files you can use the HDD manufacturer's utility to fix it -- and take the risk that the bad block held something important to you.
 
Old 11-02-2010, 08:26 PM   #6
hsugawar
LQ Newbie
 
Registered: Aug 2006
Posts: 17

Original Poster
Rep: Reputation: 0
catkin,

Thank you for the good suggestions. Yes, I had read the smartmontools article before coming here. Yeah, it's not trivial, and I wondered if there might be an easy solution.

Below are the dumps for the drive in question. The bad sector seems to lie near the beginning of /dev/sda3, which is part of a software RAID5 volume (/dev/md2). Fortunately, this volume holds only a backup repository file system and can easily be taken offline (unmounted). So I think running a simple program like the following over the first few thousand sectors could detect the bad sector and trigger auto-remapping by the disk. Do you think I am correct?

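#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>
int main(void) {
    char buf[512];
    int fd = open("/dev/sda3", O_RDWR);
    if (fd < 0) { perror("open"); return 1; }
    for (int i = 0; i < 10000; i++) {
        off_t off = (off_t)i * 512;
        ssize_t n = pread(fd, buf, 512, off);  /* explicit offset: a failed read() may not advance the file position */
        if (n == 0) break;                     /* end of device */
        if (n != 512) {                        /* -1 on I/O error, or a short read */
            fprintf(stderr, "Bad sector (%d)\n", i);
            memset(buf, 0, 512);               /* old contents are unreadable, so write zeros */
            if (pwrite(fd, buf, 512, off) != 512) perror("pwrite");  /* the write is what should prompt remapping */
        }
    }
    return close(fd) != 0;
}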

Thanks,
hiro

[root@shadow ~]# smartctl -l selftest /dev/sda
smartctl 5.39.1 2010-01-28 r3054 [i386-redhat-linux-gnu] (local build)
Copyright (C) 2002-10 by Bruce Allen, http://smartmontools.sourceforge.net

=== START OF READ SMART DATA SECTION ===
SMART Self-test log structure revision number 1
Num  Test_Description    Status                   Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed without error        00%             1515  -
# 2  Extended offline    Completed without error        00%             1499  -
# 3  Extended offline    Completed: read failure        80%             1473  98047576
# 4  Extended offline    Aborted by host                10%             1472  -
# 5  Short offline       Completed without error        00%             1446  -

[root@shadow ~]# fdisk -lu /dev/sda

Disk /dev/sda: 2000.4 GB, 2000398934016 bytes
255 heads, 63 sectors/track, 243201 cylinders, total 3907029168 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x00042339

Device     Boot       Start         End       Blocks  Id  System
/dev/sda1  *             63      385559      192748+  fd  Linux raid autodetect
/dev/sda2            385560    98044694    48829567+  fd  Linux raid autodetect
/dev/sda3          98044695  3893609789  1897782547+  fd  Linux raid autodetect
/dev/sda4        3893610496  3907028991     6709248    5  Extended
/dev/sda5        3893614592  3894638591      512000   82  Linux swap / Solaris
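
Editor's sketch: the claim that the bad sector lies near the start of /dev/sda3 can be checked by comparing LBA_of_first_error from the self-test log against the Start/End columns of the fdisk table. The following is a minimal illustration of that arithmetic only. The struct and function names (partition, sda_table, find_partition) are hypothetical and not part of any tool; the numbers are copied from the dumps above, and the extended container /dev/sda4 is left out because /dev/sda5 lives inside it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One row of the `fdisk -lu` table: first and last sector, inclusive. */
struct partition {
    const char *name;
    uint64_t start;
    uint64_t end;
};

/* Values copied from the fdisk output above (sda4, the extended
 * container, is omitted because sda5 lies inside it). */
const struct partition sda_table[] = {
    { "/dev/sda1", 63ULL,         385559ULL },
    { "/dev/sda2", 385560ULL,     98044694ULL },
    { "/dev/sda3", 98044695ULL,   3893609789ULL },
    { "/dev/sda5", 3893614592ULL, 3894638591ULL },
};
const size_t sda_table_len = sizeof sda_table / sizeof sda_table[0];

/*
 * Return the partition containing the given whole-disk LBA, or NULL if
 * the LBA falls outside every listed partition. On success, *offset is
 * set to the sector number relative to the start of that partition.
 */
const struct partition *
find_partition(const struct partition *table, size_t len,
               uint64_t lba, uint64_t *offset)
{
    assert(table != NULL && offset != NULL);
    for (size_t i = 0; i < len; i++) {
        if (lba >= table[i].start && lba <= table[i].end) {
            *offset = lba - table[i].start;
            return &table[i];
        }
    }
    return NULL;
}
```

Calling find_partition(sda_table, sda_table_len, 98047576, &off) selects the /dev/sda3 row and sets off to 98047576 - 98044695 = 2881 sectors, which is inside the first 10000 sectors the program above scans. Mapping that sector further to a file on /dev/md2 would also need the RAID layout, which is not shown in this thread.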
 
Old 11-03-2010, 09:49 AM   #7
catkin
LQ 5k Club
 
Registered: Dec 2008
Location: Tamil Nadu, India
Distribution: Servers: Debian Squeeze and Wheezy. Desktop: Slackware64 14.0. Netbook: Slackware 13.37
Posts: 8,563
Blog Entries: 29

Rep: Reputation: 1179
I don't know whether reading the bad sector would return an error -- but it would be interesting to find out.

The writes should trigger bad-sector mapping but what about the block contents? Is there any guarantee they would be valid? Might it be safer to remove the affected drive, re-initialise it and let RAID 5 re-load it with valid data? Or is it OK to let the RAID 5 correct the possibly invalid sector? I'm no RAID 5 expert.
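
Editor's sketch: the reason a RAID 5 rebuild can supply valid data for an unreadable sector is that the parity block of each stripe is the XOR of its data blocks, so any single missing block equals the XOR of the surviving ones. The code below is a simplified illustration of that principle, not the md driver's implementation; the function name raid5_reconstruct is hypothetical, and chunk sizes and parity rotation are not modelled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Rebuild one missing block of a stripe from the surviving blocks.
 * Parity is the XOR of the data blocks, so XOR-ing all surviving blocks
 * (data and parity alike) yields the missing one.
 *
 * blocks: array of `count` pointers to the surviving blocks
 * count:  number of surviving blocks (3 for a 4-drive array)
 * len:    block length in bytes
 * out:    receives the reconstructed block
 */
void raid5_reconstruct(const uint8_t *const *blocks, size_t count,
                       size_t len, uint8_t *out)
{
    assert(blocks != NULL && out != NULL && count > 0);
    for (size_t i = 0; i < len; i++) {
        uint8_t acc = 0;
        for (size_t b = 0; b < count; b++)
            acc ^= blocks[b][i];
        out[i] = acc;
    }
}
```

The same routine computes parity when it is given all the data blocks, which is why one function covers both directions. It only helps when exactly one block of the stripe is missing; with two unreadable blocks in the same stripe there is nothing left to XOR against.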
 
Old 11-03-2010, 03:08 PM   #8
H_TeXMeX_H
Guru
 
Registered: Oct 2005
Location: $RANDOM
Distribution: slackware64
Posts: 12,928
Blog Entries: 2

Rep: Reputation: 1272
If a SMART long test came up clean, the drive may be OK. I'm pretty sure the SMART long test WILL detect bad blocks; the short one will not.

Some useful info:
http://smartmontools.sourceforge.net/badblockhowto.html
 
Old 11-11-2010, 06:21 PM   #9
hsugawar
LQ Newbie
 
Registered: Aug 2006
Posts: 17

Original Poster
Rep: Reputation: 0
catkin and H_TeXMeX_H,

Thank you very much for the very useful comments.

I tried another long test on the drive and it completed successfully. So, for the time being, I optimistically assume that the fault was temporary or sector remapping is already in effect.

Yes, it would be far better to let MD rewrite the suspicious sector (so the drive can remap it) than to write my own program. It would just mean pulling the SATA cable for a while and reconnecting it. Then the MD driver should start rebuilding the array, possibly rewriting the sector in question.

I will try the let-MD approach if I find the sector read error persists.

Thanks a lot!!
 
  


All times are GMT -5.