LinuxQuestions.org - hard drive with bad read/write caching

Hi all,

All of a sudden this week one of my hard drives is working really slowly. It's a 200gig drive using ext3 format with 4k clusters. It has swap, then the remainder is the Linux partition, no Windows partitions. There's no Windows on that PC :D so no scandisk-like tool.

I noticed the issue when moving a large qemu file over to it. It was brutally slow, only about 1 to 2 MB/s transfer rate when using Midnight Commander.

hdparm shows:

# hdparm -tT /dev/hdb

/dev/hdb (the bad drive? 200gig):
Timing cached reads: 1648 MB in 2.00 seconds = 824.60 MB/sec
Timing buffered disk reads: 40 MB in 3.07 seconds = 13.02 MB/sec

# hdparm -tT /dev/hda

/dev/hda (a good drive on same cable of that pc - 80gig):
Timing cached reads: 1648 MB in 2.00 seconds = 824.66 MB/sec
Timing buffered disk reads: 136 MB in 3.02 seconds = 44.97 MB/sec
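
To put a number on the gap between the two drives, here is a minimal Python sketch that pulls the "Timing buffered disk reads" figure out of the hdparm output quoted above and compares the two. The function name and the regular expression are my own assumptions, not part of hdparm. Note that the cached-read figures are nearly identical on both drives, which suggests the slowdown is in reading from the disk itself rather than in the memory or cache path.

```python
import re

def buffered_read_rate(hdparm_output: str) -> float:
    """Return the MB/sec figure from the 'Timing buffered disk reads' line."""
    match = re.search(
        r"Timing buffered disk reads:.*=\s*([\d.]+)\s*MB/sec", hdparm_output
    )
    if match is None:
        raise ValueError("no 'Timing buffered disk reads' line found")
    return float(match.group(1))

# Output lines copied from the post above.
HDB = """Timing cached reads: 1648 MB in 2.00 seconds = 824.60 MB/sec
Timing buffered disk reads: 40 MB in 3.07 seconds = 13.02 MB/sec"""
HDA = """Timing cached reads: 1648 MB in 2.00 seconds = 824.66 MB/sec
Timing buffered disk reads: 136 MB in 3.02 seconds = 44.97 MB/sec"""

slow = buffered_read_rate(HDB)
fast = buffered_read_rate(HDA)
print(f"hdb: {slow} MB/s, hda: {fast} MB/s, hda is {fast / slow:.1f}x faster")
```

With the numbers from this thread, the 80gig drive comes out roughly three and a half times faster on buffered reads.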

hdparm /dev/hdb
/dev/hdb:
multcount = 0 (off)
IO_support = 1 (32-bit)
unmaskirq = 1 (on)
using_dma = 1 (on)
keepsettings = 0 (off)
readonly = 0 (off)
readahead = 256 (on)
geometry = 24321/255/63, sectors = 390721968, start = 0

They're both ATA-100 Western Digital drives. hdparm shows it configured for UDMA5 as the drive should be, with write caching enabled, etc.

It just started doing this a few days ago, and I fear the drive is going to take a dive on me. Other than using the PC for routine stuff, nothing has changed on it at all related to modules, the kernel, etc.

I just reformatted it as reiserfs and I'm getting the same performance.

My kernel config is basically the one from Slackware's testing directory for IDE drivers, etc., which was the same as the stock kernel. Though Pat's default kernel in Slackware puts multcount at 16 and my kernel does not. I'm afraid to turn multcount on for the drive, though, if the kernel does not do it automatically.

I even get these same numbers if I boot up into the 2.6.13 kernel too.

Maybe I need to do a 'surface scan'. Is there a GNU tool for that? Any thoughts greatly appreciated.

Thanks in advance.

badblocks can be used to scan for, well, bad blocks, but each fstype can use this information in a different way so it's really only useful in itself for your own information.

With fsck.ext3 (e2fsck) you want to use the -c option to mark any bad blocks found, but if your HDD is slower overall then I don't think this is a bad blocks issue.
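
To check whether the slowness is drive-wide or limited to one area, one option is to time a short read at several points across the device. The sketch below is not badblocks and does not replace it; it is a rough, read-only Python illustration, and the function name, default sizes, and region count are all assumptions of mine. It needs read permission on the device (usually root), and because reads go through the page cache, a second run over the same offsets may look much faster than the first.

```python
import os
import sys
import time

def time_regions(path, regions=10, sample_bytes=4 * 1024 * 1024, block=64 * 1024):
    """Time a short read at the start of each of `regions` equal slices of `path`.

    Returns a list of (offset, MB_per_sec); MB_per_sec is None on a read error.
    """
    results = []
    with open(path, "rb", buffering=0) as f:
        size = f.seek(0, os.SEEK_END)  # also works for block devices on Linux
        for i in range(regions):
            offset = (size // regions) * i
            to_read = min(sample_bytes, size - offset)
            f.seek(offset)
            start = time.perf_counter()
            done = 0
            try:
                while done < to_read:
                    chunk = f.read(min(block, to_read - done))
                    if not chunk:
                        break
                    done += len(chunk)
            except OSError:
                # The kernel reported an I/O error somewhere in this sample.
                results.append((offset, None))
                continue
            elapsed = time.perf_counter() - start
            rate = done / (1024 * 1024) / elapsed if elapsed > 0 else float("inf")
            results.append((offset, rate))
    return results

if __name__ == "__main__" and len(sys.argv) > 1:
    # Example: python scan.py /dev/hdb
    for offset, rate in time_regions(sys.argv[1]):
        status = "READ ERROR" if rate is None else f"{rate:.1f} MB/s"
        print(f"offset {offset:>15}: {status}")
```

If every region is slow, that fits the "slower overall" case described above; if only some regions are slow or throw errors, a bad-blocks scan becomes more interesting.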

hi ciotog,

thanks for the reply.

I converted the drive back to ext3 and tried as you said, still very slow.

I even went into Data Lifeguard Tools from Western Digital and ran their 'quick test' and 'extended test', and I get no errors. That doesn't really surprise me: over the years I've found that the Western Digital tools only tell you the drive is dead, and by then you can usually tell anyway, as the PC barely POSTs, if that. Though I've never had any great prevention success with Maxtor's MaxBlast either, to be honest.

Interestingly enough, I tried PowerQuest Drive Image in DOS mode to reformat this drive, and it gives an error code near the end of the drive, about 90% into formatting, regardless of whether it's FAT32, NTFS, or Linux ext2 or ext3.

I threw a Puppy Linux live CD on the PC too and I get the same issues. I figured I'd try a different distro for kicks and giggles.

I threw in a new cable too.

I'm now coming to my conclusion on this: like you pointed out, the fact that the whole drive is slower shows something else is going on.

And it figures I'm just out of warranty on this drive :(

At least I found out before it totally died, so I could get my data off of it.

Quote:

Originally Posted by Old_Fogie

hi ciotog,
and it figures i'm just out of warranty on this drive :(

Don't they always seem to do that these days.

I had one of the infamous Fujitsu drives that caused them to drop out of the desktop HDD business, and when it was failing it passed their testing software too. I had to install Windows, boot it twice (it would boot the first time) so that the data got corrupted (at this time Linux still booted fine, but it was evident that any file that wasn't read-only became corrupted), and then it failed the test.

If the drive supports SMART, you could try running smartctl (for example "smartctl --all /dev/hdb"). The "RAW_VALUE" column in the attributes table should tell you if there were any failures, and how bad it is.
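
As a rough illustration of reading that column, here is a small Python sketch that picks the RAW_VALUE of a few sector-related attributes out of smartctl's attribute table. The sample rows below are made up for the example and are not from the drive in this thread; the function name and the choice of attributes to watch are also my assumptions. Non-zero raw values on these particular attributes are commonly taken as a sign of sector trouble, but how to interpret them varies by vendor.

```python
WATCHED = ("Reallocated_Sector_Ct", "Current_Pending_Sector", "Offline_Uncorrectable")

def raw_values(smartctl_output, names=WATCHED):
    """Map attribute name -> integer RAW_VALUE for the watched attributes."""
    found = {}
    for line in smartctl_output.splitlines():
        fields = line.split()
        # Attribute rows look like:
        # ID# NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
        if len(fields) >= 10 and fields[0].isdigit() and fields[1] in names:
            raw = fields[9]
            if raw.isdigit():
                found[fields[1]] = int(raw)
    return found

# Hypothetical sample, for illustration only (not real data from this thread).
SAMPLE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       12
197 Current_Pending_Sector  0x0012   200   200   000    Old_age   Always       -       3
198 Offline_Uncorrectable   0x0010   200   200   000    Old_age   Offline      -       0
"""

for name, value in raw_values(SAMPLE).items():
    flag = "check this" if value > 0 else "ok"
    print(f"{name}: {value} ({flag})")
```

To use it on a real drive, the output of "smartctl --all /dev/hdb" could be saved to a file and passed to raw_values() as a string.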

Wow ciotog, that's a really neat trick, that smartctl. I like that one; it made its way to my notebook in my kjots :D

Hard drives never cease to amaze me, how they just... flat out die or bring the system to a halt. They're always different, but such a pain.

What confuses me more is how 'hard drive cables' go bad. I used to think people were nuts, as I'd see people posting error code 9 for a hard drive on a Windows site, and someone would say 'change your cable'. But those seem to die after a year or so, give or take, too, and I don't understand it. I'm from the old school where a wire was a wire, metal and all, no moving parts, so that confuses me even more LOL.

Well, problem solved: a new hard drive, and back to as normal as I can be. Heheh, time to go break some more stuff now :D