"short read (fsck)" on disc access (failed command: READ FPDMA QUEUED)

estellnb · 02-14-2012, 10:40 AM

Whenever I try to access the home partition on sda, the kernel spits out the following sequence of read errors (posted as an attachment). The hard disk is brand new (Samsung SpinPoint 750GB) and could be formatted without any apparent error. Furthermore, I do not seem to get this kind of error for the /var partition, which also resides on sda:

"
ata4.00: failed command: READ FPDMA QUEUED
ata4.00: cmd 60/40:08:18:a0:d9/00:00:22:00:00/40 tag 1 ncq 32768 in
res 41/40:00:18:a0:d9/00:00:22:00:00/40 Emask 0x409 (media error) <F>
ata4.00: status: { DRDY ERR }
ata4.00: error: { UNC }
ata4.00: configured for UDMA/133
ata4: EH complete
....
ata4.00: device reported invalid CHS sector 0
....
Buffer I/O error on device sda8, logical block 820227
"

Is the hard disk rotten or could there also be a problem with my SATA controller?
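
For reference, the failing sector can be read straight out of the cmd/res lines quoted above. A minimal sketch, assuming the usual libata taskfile layout (command/feature:nsect:lbal:lbam:lbah/hob_feature:hob_nsect:hob_lbal:hob_lbam:hob_lbah/device) and 48-bit LBA addressing; ata_lba is a hypothetical helper name, not an existing tool:

```shell
# Decode the 48-bit LBA from a libata taskfile string (assumed layout, see above).
ata_lba() {
    old_ifs=$IFS
    IFS='/:'
    # Split the string on '/' and ':' into the positional parameters.
    set -- $1
    IFS=$old_ifs
    # $4=lbal $5=lbam $6=lbah $9=hob_lbal ${10}=hob_lbam ${11}=hob_lbah
    # The LBA is the six bytes concatenated from most to least significant.
    echo $(( 0x${11}${10}${9}${6}${5}${4} ))
}

ata_lba '60/40:08:18:a0:d9/00:00:22:00:00/40'
```

For the cmd line above this prints 584687640, a sector number that can be compared with the LBA_of_first_error reported by a SMART self-test.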

marozsas · 02-14-2012, 12:31 PM

If you enable S.M.A.R.T. on the disk and run a long self-test, you can tell whether the error is on the disk or elsewhere.
The SMART self-test runs inside the drive and does not transfer data through the disk controller, so if it reports an error, the fault is certainly in the disk itself.
The SMART test can be run on a live system. Something like this (replace sda with your actual device name):

Code:

smartctl -t long /dev/sda

Also, you can re-create the filesystem on the affected partition (losing all data on the current filesystem) using the option -c (or even better, -cc) of the mkfs tools (mke2fs for ext2/3/4). This checks every sector for read errors (-c) or read/write errors (-cc).

estellnb · 02-17-2012, 09:01 AM

Yes, both of my Samsung disks do indeed support SMART well.
After enabling it with
> smartctl -s on /dev/sd[ab]
| smartctl 5.40 2010-07-12 r3124 [x86_64-unknown-linux-gnu] (local build)
| Copyright (C) 2002-10 by Bruce Allen, http://smartmontools.sourceforge.net
|
| === START OF ENABLE/DISABLE COMMANDS SECTION ===
| SMART Enabled.

you can have all the SMART parameters displayed:

> smartctl -P show /dev/sda
No presets are defined for this drive. Its identity strings:
MODEL: SAMSUNG HN-M750MBB
FIRMWARE: 2AR10001
do not match any of the known regular expressions.
Use -P showall to list all known regular expressions.

> smartctl --all /dev/sda
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   100   098   051    Pre-fail  Always       -       103
  2 Throughput_Performance  0x0026   252   252   000    Old_age   Always       -       0
...
> smartctl -a /dev/sdb
  1 Raw_Read_Error_Rate     0x002f   100   100   051    Pre-fail  Always       -       0
  2 Throughput_Performance  0x0026   252   252   000    Old_age   Always       -       0
  3 Spin_Up_Time            0x0023   089   089   025    Pre-fail  Always       -       3414
  4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       37
...

You can see clearly from the WORST column that read errors must already have occurred on sda (098) but not on sdb (100); sda is the 750 GB Samsung SpinPoint, sdb a 640 GB Samsung SpinPoint. Fortunately, in this case SMART also recorded read errors that occurred before SMART was activated, although I suspect that this is not normally guaranteed. (Once SMART is enabled, smartctl -t offline /dev/sdX should update the attributes that are only collected offline, if that is not done automatically.)

Read errors are revealed even more clearly by self-tests:

> smartctl -t short /dev/sda
and afterwards (once the time printed by the command has passed):
> smartctl -t long /dev/sda
The results then appear in the self-test log (smartctl -l selftest /dev/sda):
smartctl 5.40 2010-07-12 r3124 [x86_64-unknown-linux-gnu] (local build)
Copyright (C) 2002-10 by Bruce Allen, http://smartmontools.sourceforge.net

=== START OF READ SMART DATA SECTION ===
SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed: read failure       70%        16         584679424
# 2  Short offline       Completed: read failure       90%        16         584679424

(The Remaining value no longer seems to change once a test has been aborted because a bad sector was encountered (LBA_of_first_error). 'smartctl -t long' has never shown me tests that are still in progress, so keep an eye on the clock!)
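
The LBA_of_first_error column can be pulled out of the self-test log with a short filter. A sketch, assuming the log comes from smartctl -l selftest and has the column layout shown above; first_error_lba is a made-up helper name:

```shell
# Read a SMART self-test log on stdin and print the first sector number
# reported for an entry that ended with a read failure.
first_error_lba() {
    awk '/^# *[0-9]+/ && /read failure/ && $NF ~ /^[0-9]+$/ { print $NF; exit }'
}
```

Typical use would be: smartctl -l selftest /dev/sda | first_error_lba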

I believe the first thing to do with a new hard disk should be to enable SMART and log all attributes (--all) for later comparison. A short self-test may also help confirm that you have not bought a damaged hard disk.
The good SMART support of the new Samsung HDDs came as a pleasant surprise to me, as my old WD drives did not show the results of self-tests correctly.
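
The "log now, compare later" idea could be sketched as follows. This assumes two attribute tables saved earlier (for example with smartctl -A /dev/sda > file) in the ten-column layout shown above; smart_diff is a hypothetical helper name, and only the first word of RAW_VALUE is compared:

```shell
# Compare two saved SMART attribute tables and list attributes whose
# WORST or RAW_VALUE column differs between them.
# $1: older dump, $2: newer dump
smart_diff() {
    awk '
        # Attribute rows start with a numeric ID and carry a 0x.... flag.
        $1 ~ /^[0-9]+$/ && $3 ~ /^0x/ {
            if (FNR == NR) {            # first file: remember the baseline
                worst[$2] = $5; raw[$2] = $10; next
            }
            if (($2 in raw) && (worst[$2] != $5 || raw[$2] != $10))
                printf "%s: worst %s -> %s, raw %s -> %s\n",
                       $2, worst[$2], $5, raw[$2], $10
        }
    ' "$1" "$2"
}
```

Raw values are vendor-specific, so a change is a hint to look closer rather than proof of a defect.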

copy should also be available at elstel.org

selfprogrammed · 02-22-2012, 05:27 PM

I once bought a new drive that could not stand my controller. It passed all the SMART tests, so the diagnostics were useless. I returned it and got another from the same pile. That drive had the same model number, but the jumpers and circuit board were completely different. It worked perfectly.

You have a bad area on the disk.
To find out how big it is, run this on the unmounted partition (sda8 according to your log):
>> badblocks -n -v /dev/sda8
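
To see which partition a reported LBA falls into, the partition start and size that Linux exposes under /sys/block can be compared with it. A rough sketch, assuming 512-byte logical sectors (the unit sysfs uses) and a hypothetical helper name lba_to_partition; the second argument exists only so that the directory can be overridden:

```shell
# Print every partition of a disk that contains the given absolute LBA,
# together with the sector offset inside that partition.
# $1: LBA in 512-byte sectors, $2: sysfs directory of the disk
lba_to_partition() {
    lba=$1
    disk=${2:-/sys/block/sda}
    for part in "$disk"/*/; do
        # Only partition directories have both a start and a size file.
        [ -r "${part}start" ] && [ -r "${part}size" ] || continue
        start=$(cat "${part}start")
        size=$(cat "${part}size")
        if [ "$lba" -ge "$start" ] && [ "$lba" -lt $((start + size)) ]; then
            echo "$(basename "$part") offset $((lba - start))"
        fi
    done
}
```

Called as lba_to_partition 584679424, it would name the partition holding the sector from the SMART self-test log, so that badblocks only has to scan that one.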

If the list gets too big, take the drive back to the dealer. A large number of bad blocks may mean that there is a gouge on the disk, and it will likely spread because the debris is on the heads. In that case, expect major drive failure within a few months.

If you want to live with it, then let fsck do the bad blocks test (option -c), because it is tediously messy any other way. It will read every sector and isolate any that are unusable so they never bother you again. Some drives have spare sectors and can replace such bad blocks invisibly, but to you it does not matter unless you are really concerned about uninterrupted contiguous disk space.
See:
>> man fsck.ext2
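
Since the scan should only be run on an unmounted filesystem, a small guard in front of e2fsck can help. A sketch only: is_mounted is a hypothetical helper that looks for the device name in the first column of /proc/mounts, so it misses filesystems mounted under another name (for example a /dev/disk/by-uuid path):

```shell
# Succeed (exit status 0) if the device appears as a mount source.
# $1: block device, $2: mount table to read (defaults to /proc/mounts)
is_mounted() {
    awk -v dev="$1" '$1 == dev { found = 1 } END { exit !found }' \
        "${2:-/proc/mounts}"
}
```

A possible use: is_mounted /dev/sda8 || e2fsck -c /dev/sda8. With -c, e2fsck calls badblocks for a read-only scan; giving the option twice (-cc) switches to a non-destructive read-write test.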