LinuxQuestions.org
Old 06-21-2011, 05:41 PM   #1
lgtrean
LQ Newbie
 
Registered: Jan 2006
Location: New York City
Distribution: Debian GNU/Linux 6.0.1 (squeeze)
Posts: 29

Rep: Reputation: 18
How do I check my hard disk for errors? Possible hard disk failure


Hello,

I was in a terminal, browsing a directory in my home folder. My "home" directory is located on "/dev/sdb1".
When I typed "ls" in one of my directories, the output was garbage and didn't show the files in the directory. I think it said something like "input/output error". Unfortunately, I didn't write the exact error down; I rebooted instead.

The hard disk with the problem is:
Code:
$ sudo hdparm -I /dev/sdb
[sudo] password for brian: 

/dev/sdb:

ATA device, with non-removable media
	Model Number:       WDC WD5000KS-00MNB0                     
	Serial Number:      WD-WCANU1019633
	Firmware Revision:  07.02E07
Standards:
	Supported: 7 6 5 4 
	Likely used: 8
Configuration:
	Logical		max	current
	cylinders	16383	16383
	heads		16	16
	sectors/track	63	63
	--
	CHS current addressable sectors:   16514064
	LBA    user addressable sectors:  268435455
	LBA48  user addressable sectors:  976773168
	Logical/Physical Sector size:           512 bytes
	device size with M = 1024*1024:      476940 MBytes
	device size with M = 1000*1000:      500107 MBytes (500 GB)
	cache/buffer size  = 16384 KBytes
Capabilities:
	LBA, IORDY(can be disabled)
	Queue depth: 32
	Standby timer values: spec'd by Standard, with device specific minimum
	R/W multiple sector transfer: Max = 16	Current = 8
	Recommended acoustic management value: 128, current value: 128
	DMA: mdma0 mdma1 mdma2 udma0 udma1 udma2 udma3 udma4 udma5 *udma6 
	     Cycle time: min=120ns recommended=120ns
	PIO: pio0 pio1 pio2 pio3 pio4 
	     Cycle time: no flow control=120ns  IORDY flow control=120ns
Commands/features:
	Enabled	Supported:
	   *	SMART feature set
	    	Security Mode feature set
	   *	Power Management feature set
	   *	Write cache
	   *	Look-ahead
	   *	Host Protected Area feature set
	   *	WRITE_BUFFER command
	   *	READ_BUFFER command
	   *	NOP cmd
	   *	DOWNLOAD_MICROCODE
	    	Power-Up In Standby feature set
	   *	SET_FEATURES required to spinup after power up
	    	SET_MAX security extension
	   *	Automatic Acoustic Management feature set
	   *	48-bit Address feature set
	   *	Device Configuration Overlay feature set
	   *	Mandatory FLUSH_CACHE
	   *	FLUSH_CACHE_EXT
	   *	SMART error logging
	   *	SMART self-test
	   *	General Purpose Logging feature set
	   *	WRITE_{DMA|MULTIPLE}_FUA_EXT
	   *	64-bit World wide name
	   *	Gen1 signaling speed (1.5Gb/s)
	   *	Gen2 signaling speed (3.0Gb/s)
	   *	Native Command Queueing (NCQ)
	   *	Host-initiated interface power management
	   *	Phy event counters
	   *	DMA Setup Auto-Activate optimization
	   *	Software settings preservation
	   *	SMART Command Transport (SCT) feature set
	   *	SCT Long Sector Access (AC1)
	   *	SCT LBA Segment Access (AC2)
	   *	SCT Error Recovery Control (AC3)
	   *	SCT Features Control (AC4)
	   *	SCT Data Tables (AC5)
	    	unknown 206[12] (vendor specific)
Security: 
	Master password revision code = 65534
		supported
	not	enabled
	not	locked
		frozen
	not	expired: security count
	not	supported: enhanced erase
	138min for SECURITY ERASE UNIT. 
Logical Unit WWN Device Identifier: 50014ee20002257a
	NAA		: 5
	IEEE OUI	: 0014ee
	Unique ID	: 20002257a
Checksum: correct
uname output:
Code:
$ uname -r
2.6.32-5-amd64
lsb_release output:
Code:
$ lsb_release -a
No LSB modules are available.
Distributor ID:	Debian
Description:	Debian GNU/Linux 6.0.1 (squeeze)
Release:	6.0.1
Codename:	squeeze
During the reboot my computer was unable to mount my "home" directory located on "/dev/sdb1".
But, I was able to see my other devices. During the reboot I saw a message that said something like "fsck unable to resolve: 'UUID=0f24fae1-135c-4750-9928-4632e2f04f45'". That's the UUID of my "home" directory located on "/dev/sdb1".

fstab output:
Code:
$ cat /etc/fstab
# /etc/fstab: static file system information.
#
# Use 'blkid' to print the universally unique identifier for a
# device; this may be used with UUID= as a more robust way to name devices
# that works even if disks are added and removed. See fstab(5).
#
# <file system> <mount point>   <type>  <options>       <dump>  <pass>
proc            /proc           proc    defaults        0       0
# / was on /dev/sda2 during installation
UUID=f13fe524-b8ae-4f35-8831-9ba9e9db2dfa /               ext4    errors=remount-ro 0       1
# /home was on /dev/sdb1 during installation
UUID=0f24fae1-135c-4750-9928-4632e2f04f45 /home           ext4    defaults        0       2
# /wd500 was on /dev/sdc1 during installation
UUID=1fba7d0c-e82c-4837-b1bd-6192d7dd3f88 /wd500          ext4    rw,user,exec        0       2
# /wdgiga was on /dev/sdc3 during installation
UUID=3baa432d-3480-402d-8df3-1b90dbc5f655 /wdgiga         ext4    rw,user,exec        0       2
# /wdtera was on /dev/sdc2 during installation
UUID=9a762875-5e0d-4edf-8bdd-8aaaea6403d5 /wdtera         ext4    rw,user,exec        0       2
# /xtraSpace was on /dev/sda3 during installation
UUID=654d5c57-129b-4e06-92f1-673a8b4bcf56 /xtraSpace      ext4    defaults        0       2
# swap was on /dev/sda1 during installation
UUID=f7fc69af-d475-44f1-87dd-63eb8ca0b7ed none            swap    sw              0       0
/dev/scd1       /media/cdrom0   udf,iso9660 user,noauto     0       0
/dev/scd0       /media/cdrom1   udf,iso9660 user,noauto     0       0
#/dev/sdc1       /media/usb0     auto    rw,user,noauto  0       0
#/dev/sdc2       /media/usb1     auto    rw,user,noauto  0       0
#/dev/sdc3       /media/usb2     auto    rw,user,noauto  0       0
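
Since /home is mounted by UUID in the fstab above, the "fsck unable to resolve" message most likely means that no block device with that UUID was visible at boot. A minimal sketch for checking every UUID= entry against the symlinks udev normally creates under /dev/disk/by-uuid; the function names are made up for this example, and the directory is a parameter only so the check can be tried on a scratch directory:

```shell
# Print "UUID mountpoint" for each UUID= entry of an fstab read on stdin.
# Comment lines are skipped because their first field does not start with UUID=.
fstab_uuids() {
    awk '$1 ~ /^UUID=/ { sub(/^UUID=/, "", $1); print $1, $2 }'
}

# Report whether each UUID from the fstab on stdin has a matching entry
# in the by-uuid directory (default: /dev/disk/by-uuid).
check_fstab_uuids() {
    dir=${1:-/dev/disk/by-uuid}
    fstab_uuids | while read -r uuid mnt; do
        if [ -e "$dir/$uuid" ]; then
            echo "present $mnt"
        else
            echo "missing $mnt"
        fi
    done
}

# Usage: check_fstab_uuids < /etc/fstab
```

For a single entry, "findfs UUID=<uuid>" does the same lookup and prints the device name if one is found.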

I was able to boot but had no home directory (all my stuff was backed up). I decided to reboot using the SystemRescueCd (www.sysresccd.org) and ran "FSArchiver: Filesystem Archiver for Linux". You can see from the output that it didn't see my "home" partition (normally "/dev/sdb1").

The output is below:
Code:
=====================>>> fsarchiver probe simple <<<=====================
[======DISK======] [=============NAME==============] [====SIZE====] [MAJ] [MIN]
[sda            ] [WDC WD800JD-75MS              ] [    74.51 GB] [  8] [  0]
[sdb            ] [My Book 1130                  ] [    1.82 TB] [  8] [ 16]

[=====DEVICE=====] [==FILESYS==] [======LABEL======] [====SIZE====] [MAJ] [MIN]
[loop0          ] [squashfs  ] [<unknown>        ] [  265.55 MB] [  7] [  0]
[sda1            ] [swap      ] [<unknown>        ] [    1.53 GB] [  8] [  1]
[sda2            ] [ext4      ] [<unknown>        ] [    14.63 GB] [  8] [  2]
[sda3            ] [ext4      ] [<unknown>        ] [    58.35 GB] [  8] [  3]
[sdb1            ] [ext4      ] [wd500            ] [  499.37 GB] [  8] [ 17]
[sdb2            ] [ext4      ] [<unknown>        ] [    1.33 TB] [  8] [ 18]
[sdb3            ] [ext4      ] [<unknown>        ] [    1.00 GB] [  8] [ 19]
I also ran gparted, but it didn't list my home partition either. So I figured the disk holding my "home" directory (normally "/dev/sdb") was dead, and I bought a replacement hard drive.

Then, I rebooted without using the SystemRescueCd and I saw this message scroll by, "/home: recovering journal".
I also saw that message when I looked in /var/log/fsck/checkfs:
Code:
$ cat /var/log/fsck/checkfs 
Log of fsck -C -R -A -a 
Tue Jun 21 15:51:12 2011

fsck from util-linux-ng 2.17.2
/dev/sda3: clean, 205/3825664 files, 15093148/15295744 blocks
wd500: clean, 201315/32727040 files, 118986212/130905644 blocks
/home: recovering journal
/dev/sdc3: clean, 12/65808 files, 12660/263064 blocks
/dev/sdc2: clean, 241520/89300992 files, 275299582/357201258 blocks
/home: Clearing orphaned inode 38781677 (uid=1000, gid=1000, mode=0100644, size=32768)
/home: Clearing orphaned inode 38780933 (uid=1000, gid=1000, mode=0100600, size=77192)
/home: clean, 202121/61063168 files, 119022216/122096000 blocks

Tue Jun 21 15:52:02 2011
----------------
And when I logged in, my "home" directory on "/dev/sdb1" was alive again. Here's the current output of my disk space usage:
Code:
 
$ df -H
Filesystem             Size   Used  Avail Use% Mounted on
/dev/sda2               16G    12G   3.3G  79% /
tmpfs                  1.9G      0   1.9G   0% /lib/init/rw
udev                   1.9G   246k   1.9G   1% /dev
tmpfs                  1.9G   4.1k   1.9G   1% /dev/shm
/dev/sdb1              493G   472G    21G  96% /home
/dev/sdc1              528G   471G    57G  90% /wd500
/dev/sdc3              1.1G    35M   1.1G   4% /wdgiga
/dev/sdc2              1.5T   1.2T   336G  77% /wdtera
/dev/sda3               62G    61G   830M  99% /xtraSpace

Below is a heavily truncated excerpt from /var/log/messages that might refer to the device that had problems ("/home" on "/dev/sdb1"). I don't know if it will be useful:
Code:
Jun 21 12:03:12 kub nagios3: Auto-save of retention data completed successfully.
Jun 21 12:31:24 kub kernel: [59330.816096] ata4: hard resetting link
Jun 21 12:31:29 kub kernel: [59336.180017] ata4: link is slow to respond, please be patient (ready=0)
Jun 21 12:31:34 kub kernel: [59340.828034] ata4: hard resetting link
Jun 21 12:31:39 kub kernel: [59346.188034] ata4: link is slow to respond, please be patient (ready=0)
Jun 21 12:31:44 kub kernel: [59350.836041] ata4: hard resetting link
Jun 21 12:31:49 kub kernel: [59356.196016] ata4: link is slow to respond, please be patient (ready=0)
Jun 21 12:32:19 kub kernel: [59385.876034] ata4: limiting SATA link speed to 1.5 Gbps
Jun 21 12:32:19 kub kernel: [59385.876039] ata4: hard resetting link
Jun 21 12:32:24 kub kernel: [59390.900032] ata4.00: disabled
Jun 21 12:32:24 kub kernel: [59390.900040] ata4.00: device reported invalid CHS sector 0
Jun 21 12:32:24 kub kernel: [59390.900044] ata4.00: device reported invalid CHS sector 0
Jun 21 12:32:24 kub kernel: [59390.900062] ata4: EH complete
Jun 21 12:32:24 kub kernel: [59390.900090] sd 3:0:0:0: [sdb] Unhandled error code
Jun 21 12:32:24 kub kernel: [59390.900093] sd 3:0:0:0: [sdb] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
Jun 21 12:32:24 kub kernel: [59390.900099] sd 3:0:0:0: [sdb] CDB: Read(10): 28 00 26 c8 48 1f 00 01 00 00
Jun 21 12:32:24 kub kernel: [59390.900139] sd 3:0:0:0: [sdb] Unhandled error code
Jun 21 12:32:24 kub kernel: [59390.900142] sd 3:0:0:0: [sdb] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
Jun 21 12:32:24 kub kernel: [59390.900146] sd 3:0:0:0: [sdb] CDB: Read(10): 28 00 26 c8 47 1f 00 01 00 00
Jun 21 12:32:24 kub kernel: [59390.998653] sd 3:0:0:0: [sdb] Unhandled error code
Jun 21 12:32:24 kub kernel: [59390.998659] sd 3:0:0:0: [sdb] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
Jun 21 12:32:24 kub kernel: [59390.998665] sd 3:0:0:0: [sdb] CDB: Read(10): 28 00 0a a2 13 2f 00 00 08 00
Jun 21 12:32:30 kub kernel: [59396.804145] sd 3:0:0:0: [sdb] Unhandled error code
Jun 21 12:32:30 kub kernel: [59396.804151] sd 3:0:0:0: [sdb] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
Jun 21 12:32:30 kub kernel: [59396.804157] sd 3:0:0:0: [sdb] CDB: Write(10): 2a 00 11 f2 46 bf 00 00 08 00
Jun 21 12:32:30 kub kernel: [59396.804181] lost page write due to I/O error on sdb1
Jun 21 12:32:30 kub kernel: [59396.804201] JBD2: Detected IO errors while flushing file data on sdb1-8
Jun 21 12:32:30 kub kernel: [59396.804213] sd 3:0:0:0: [sdb] Unhandled error code
Jun 21 12:32:30 kub kernel: [59396.804337] sd 3:0:0:0: [sdb] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
Jun 21 12:32:30 kub kernel: [59396.804342] sd 3:0:0:0: [sdb] CDB: Write(10): 2a 00 00 00 31 77 00 00 08 00
Jun 21 12:32:30 kub kernel: [59396.804363] lost page write due to I/O error on sdb1
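
For reference, the CDB lines in the log above can be decoded by hand. In a READ(10) or WRITE(10) command block, bytes 2 to 5 hold the logical block address and bytes 7 to 8 the transfer length, both big-endian. A small sketch, where decode_cdb10 is a hypothetical helper name and the input is the byte string exactly as the kernel prints it:

```shell
# Decode LBA and block count from a 10-byte READ(10)/WRITE(10) CDB
# given as one string of space-separated hex bytes.
decode_cdb10() {
    # Split the string into positional parameters $1..$10 (byte 0 is $1).
    set -- $1
    # Bytes 2-5 ($3..$6) are the LBA, bytes 7-8 ($8, $9) the transfer length.
    lba=$(( 0x$3$4$5$6 ))
    len=$(( 0x$8$9 ))
    echo "lba=$lba blocks=$len"
}

# First failed read from the log:
decode_cdb10 "28 00 26 c8 48 1f 00 01 00 00"
# prints: lba=650659871 blocks=256
```

That LBA is inside the 976773168 sectors hdparm reports, so the request itself was valid. DID_BAD_TARGET right after "ata4.00: disabled" suggests the kernel could no longer reach the drive at all, rather than hitting a bad sector at that address. This is an interpretation of the log, not something it states outright.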
My question is: what should I do now? Using Linux, how do I check my hard disk for errors?

Thank you for your advice.
 
Old 06-21-2011, 05:44 PM   #2
TobiSGD
Moderator
 
Registered: Dec 2009
Location: Hanover, Germany
Distribution: Gentoo
Posts: 15,357
Blog Entries: 2

Rep: Reputation: 3980
Download the manufacturer's diagnostic tool, burn it to a CD and test the drive.
 
Old 06-21-2011, 06:25 PM   #3
John VV
Guru
 
Registered: Aug 2005
Posts: 12,602

Rep: Reputation: 1677
a western digital drive
the only problem is there tool is MS Windows ONLY
install win7 then use there tool to test .

"/." and "ars tech" and i think "phoronix" had news on that a while back .
 
Old 06-21-2011, 06:30 PM   #4
TobiSGD
Moderator
 
Registered: Dec 2009
Location: Hanover, Germany
Distribution: Gentoo
Posts: 15,357
Blog Entries: 2

Rep: Reputation: 3980
Quote:
Originally Posted by John VV
the only problem is there tool is MS Windows ONLY
Wrong, you can get the bootable ISO with a DOS-based test utility here.
No need for Windows at all.
 
Old 06-22-2011, 08:15 AM   #5
Soadyheid
Member
 
Registered: Aug 2010
Location: Near Edinburgh, Scotland
Posts: 720

Rep: Reputation: 121
@ JohnVV
Quote:
the only problem is there tool is MS Windows ONLY
install win7 then use there tool to test .
Sorry John, I don't mean to be cheeky or rude... Text speak causes problems in understanding posts, so does using the wrong word.
Code:
there = over there, their = belonging to them, they're = 'they are'
Play Bonny!
 
Old 06-22-2011, 09:11 AM   #6
H_TeXMeX_H
Guru
 
Registered: Oct 2005
Location: $RANDOM
Distribution: slackware64
Posts: 12,928
Blog Entries: 2

Rep: Reputation: 1269
You can either run the manufacturer's utility:
http://www.ultimatebootcd.com/

or you can run 'smartctl -t long /dev/sdb', wait for it to finish, then check the result.

Personally, if I saw errors like that and had never run SMART tests, and the drive was very young or old, I would back up the data ASAP.
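
A rough sketch of the smartctl route, assuming smartmontools is installed and the commands are run as root. The function names and the SMARTCTL override variable are made up for this example (the override only exists so the commands can be dry-run with a stub):

```shell
# Start the extended (long) self-test; it runs inside the drive in the
# background, and smartctl prints an estimate of how long it will take.
smart_start_long() {
    ${SMARTCTL:-smartctl} -t long "$1"
}

# After the test has finished: overall health verdict, self-test log,
# and the vendor attribute table (reallocated and pending sectors, etc.).
smart_report() {
    ${SMARTCTL:-smartctl} -H "$1"
    ${SMARTCTL:-smartctl} -l selftest "$1"
    ${SMARTCTL:-smartctl} -A "$1"
}

# Usage (as root):
#   smart_start_long /dev/sdb
#   ...wait for the estimated time...
#   smart_report /dev/sdb
```

If the self-test log shows the test completed without error and the reallocated/pending sector attributes are not growing, the platters are probably fine, though that does not rule out cabling or controller trouble.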
 
Old 06-22-2011, 04:06 PM   #7
lgtrean
LQ Newbie
 
Registered: Jan 2006
Location: New York City
Distribution: Debian GNU/Linux 6.0.1 (squeeze)
Posts: 29

Original Poster
Rep: Reputation: 18
Thank you everyone.

I downloaded the Data Lifeguard Diagnostic for DOS (CD) from wdc.com and ran all the tests. The disk passed the tests.

I wonder what caused the problem to begin with.
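
One way to look into that: the kernel log in post #1 shows the link on ata4 being reset several times before the device is disabled, and only then do the sdb I/O errors start. A passing surface test would be consistent with a link problem (cable, connector, power or controller), but that is a guess from the log, not a diagnosis. A small sketch for counting such events in a log file; ata_link_events is a hypothetical name and the patterns are taken from the messages quoted in this thread:

```shell
# Summarize libata link trouble in kernel log text read on stdin.
ata_link_events() {
    awk '
        /ata[0-9.]+: hard resetting link/      { resets++ }
        /ata[0-9.]+: link is slow to respond/  { slow++ }
        /ata[0-9.]+: disabled/                 { disabled++ }
        END { printf "resets=%d slow=%d disabled=%d\n", resets, slow, disabled }
    '
}

# Usage: ata_link_events < /var/log/messages
```

If the counts keep growing over the following days, reseating or swapping the SATA cable would be a cheap first thing to try.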
 
  

