Four hard disk problems over four hard disks in a row.
This is less a Linux question, but I had no idea where else I could get an answer on this. And besides, Linux has helped me to identify a lot of this problem.
I have had four hard disks in a row fail. The first had broken connectors (not the computer's fault), the second failed within two days, and the third was DOA. Now I'm on my fourth drive, and what originally looked like bad sectors is still giving me trouble and seems to be getting worse. Twice in three days, Linux has been forced to remount at LEAST my /home partition read-only because of disk errors.
So I have to wonder. What aspect of my hardware could cause hard disk failure? I refuse to believe my luck is this bad by coincidence.
The most probable cause is your power supply. It may be overloaded (which should trip the circuit breaker, but sometimes doesn't), or there may be large fluctuations in the power (or frequency) supplied by your mains (do you have a UPS between your system and the mains?), but most often the culprit is a failing capacitor in the power supply. A new power supply, if that's the problem, should be fairly cheap and easy to install.
Sorry to hear about all the issues you've been experiencing with your system recently. Unfortunately, there are a great many possibilities.
(As you pointed out, the first hard drive had broken pins, so that doesn't really count; third drive being DOA--same thing)
How did the second drive fail? Did it physically make horrible noises/stop spinning, or simply become unusable to your computer?
With the fourth drive having to remount /home in read-only mode, within a short amount of time... interesting.
It's always possible you have a bad motherboard/system board. That could cause errors to be written to the disk, and you would not necessarily know about it until attempting to access the data later. It could be a bad integrated HD controller on the board, or another defective part in the I/O chain.
You could be connecting the replacement drives to a faulty IDE cable; have you tried replacing that?
A distinct possibility: are these drives being shipped to you via the same delivery company? You might very well have a local delivery person that practices his field goals with your nice, new HD boxes. I've seen it happen, although it certainly isn't the norm.
A long shot, but possible: are you sure you don't have the computer near a massive magnet of some sort? Like a really, really big speaker or unshielded CRT etc?
Is the computer new? If not, are there any other major changes to the computer that might coincide with the approximate start of your HD troubles (i.e. a move to a new room--or house, different electrical connections, additional nearby electrical equipment, or the addition of a new puppy prone to knocking things over) ?
Quote:
Originally Posted by Yaro
What aspect of my hardware could cause hard disk failure?
How did you conclude that it is your hard drive?
Normally an OS communicates with your controller, not directly with your drive.
Second, bad internal memory (RAM) can cause unpredictable errors.
Even a bad PSU can do the same.
In short, are you sure that the rest of the components are good?
How can I determine if this is a PSU problem? I have some money to spare so I might be willing to upgrade the PSU again to meet my needs if I can determine that it is indeed the PSU.
The second drive was returning pretty much the same kernel-log errors and stopping my machine for several seconds so that the soft reset can be done. That drive was Seagate, but my last two were Western Digital.
I do not know if that other drive had bad sectors like this one. I do recall it had remounted /home read-only as well, naturally causing Pidgin to crash. (Pidgin is picky about being able to write to disk.)
As for defective motherboard/controllers, I'd like to know how to determine that, too. Motherboards tend to cost way more than PSUs and I'd definitely want to make absolutely certain it is the motherboard.
Different SATA cables. Same problem.
Well, the first three drives were ordered from NewEgg and delivered via Fedex. The fourth was bought locally at Wal-Mart, so the transport company (Myself) was quite reputable.
Well, I do have a subwoofer nearby, but I didn't have a 2.1 sound system until I got the fourth drive.
This computer was bought in 2007. I've made several hardware upgrades since then, to the point that the only original hardware I still use is the motherboard, the processor, and one of my RAM sticks.
As for it possibly being a false positive, I ran a BIOS SMART test and it keeps failing the read element. I believe that's because there are STILL bad sectors I haven't fixed, despite running fsck -c /dev/sda5. (sda5 is the partition where the bad sectors are. For those who don't know, the -c switch makes fsck transparently call badblocks to update the bad-blocks inode, which is a lot better than doing all this manually.)
Writing is no problem, except of course when my filesystems get remounted read-only after an error.
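A BIOS SMART test only reports pass or fail. If smartmontools is installed (an assumption; nothing in this thread confirms it), `smartctl -A` prints the drive's attribute table, and a small filter can pull out the counters that usually matter for this kind of problem. This is only a sketch: `smart_key_attrs` is a hypothetical helper name, and attribute names vary between drive vendors, so the three names below may not all appear on a given disk.

```shell
# Print "name=raw_value" for SMART attributes related to failing
# sectors and link (cable) errors. Reads `smartctl -A` text on stdin.
# Assumption: the usual 10-column attribute table, with the attribute
# name in column 2 and the raw value in column 10.
smart_key_attrs() {
    awk '
        $2 == "Reallocated_Sector_Ct" ||
        $2 == "Current_Pending_Sector" ||
        $2 == "UDMA_CRC_Error_Count" { print $2 "=" $10 }
    '
}
```

Typical use would be `sudo smartctl -A /dev/sda | smart_key_attrs`. Growing reallocated or pending sector counts tend to point at the disk surface itself, while a growing CRC error count more often points at the cable or the SATA link.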
Quote:
I believe that's because there's STILL bad sectors I haven't fixed, despite running a fsck -c /dev/sda5 (sda5 being the device block where the bad sectors are).
A single -c is only a read test. Run it as "-c -c" to get a non-destructive read-write test instead.
And go find something else to do for a few hours - maybe sleep ...
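For reference, the e2fsck man page describes a single `-c` as a read-only badblocks scan and a doubled `-c -c` as a non-destructive read-write scan. Either one should be run on an unmounted filesystem. The sketch below guards against a mounted partition; `is_mounted` and `safe_badblocks_fsck` are hypothetical helper names, and the wrapper deliberately prints the command instead of running it.

```shell
# Succeed (exit 0) if device $1 appears in the first column of the
# mount table supplied on stdin (same format as /proc/mounts).
is_mounted() {
    awk -v dev="$1" '$1 == dev { found = 1 } END { exit(found ? 0 : 1) }'
}

# Dry run: refuse if $1 is mounted, otherwise print the fsck command.
# $2 is the mount table to consult (defaults to /proc/mounts).
safe_badblocks_fsck() {
    if is_mounted "$1" < "${2:-/proc/mounts}"; then
        echo "refusing: $1 is mounted; unmount it first" >&2
        return 1
    fi
    echo "fsck -c -c $1"
}
```

Note that /proc/mounts can list a partition under a different name (a UUID or label path, for example), so a "not mounted" answer from this simple check is worth confirming by hand before running the real command as root.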
I get those "frozen" errors every time I boot this laptop. The messages result from the SATA driver "believing" the drive when it claims to support SATA-2 with 3Gb/sec transfer rates when, in fact, the drive will only support 1.5Gb/sec rates. Since the drive is sdb on the laptop and I boot from sda, the fact that GRUB can't "adjust" the parameters reported by the drive, and therefore fails to boot from it, is of no consequence for my system. (Although I do keep a System Rescue image on a pen drive "just in case.")
The SATA library eventually "gets tired" of the drive "barfing" at it and lowers the access speed to 1.5Gb/sec, and everything works well thereafter. (I've found no way to change the erroneous information reported by the HD. This is an HP laptop, and the Phoenix BIOS is written to conform with the HP "No user could possibly need to do that!" specifications, so there are no HD tuning options in the BIOS.)
Anyhow, if you're using a SATA drive, I suspect (from the error messages you displayed) that you've got a similar problem. If you have the problem on your boot drive, though, you might find that GRUB is unable to boot the drive and that things like fsck are also unreliable until the drive speed parameter is reset.
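One way to compare two machines' "frozen" errors is to tally the flags that libata prints on its `SError: { ... }` lines. As far as I understand them, flags such as 10B8B, Dispar, BadCRC and Handshk describe errors on the SATA link itself rather than unreadable sectors, so a log dominated by them fits a cable or link-speed problem better than a dying disk surface. A minimal sketch, with `tally_serror_flags` being a hypothetical helper name:

```shell
# Read kernel log text on stdin and print "flag=count" for every flag
# that appears inside libata "SError: { ... }" lines.
tally_serror_flags() {
    sed -n 's/.*SError: { \(.*\) }.*/\1/p' |
        tr ' ' '\n' |
        grep -v '^$' |
        LC_ALL=C sort |
        uniq -c |
        awk '{ print $2 "=" $1 }'
}
```

Typical use would be `dmesg | tally_serror_flags`. An empty result just means no SError lines were in the log, not that the drive is healthy.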
No, this is a full 7200 RPM, 3.0 Gb/s drive. Or so the box claims.
Yes, that's what my box claimed too. But either the BIOS or the box has a problem with the actual drive I installed in this laptop. (Perhaps your BIOS has an upgrade -- HP "declines" to offer BIOS upgrades to support "non Vista" systems on their "Vista certified" laptops.)
<edit>
Oh, by the way, if I boot the Vista on sda, no disk errors are reported for sdb, but I'm not sure if it's accessed correctly since Vista, of course, just reports it as an "Unformatted Drive."
</edit>
Last edited by PTrenholme; 04-25-2009 at 01:04 PM.
Can't say I've done BIOS upgrades, either.
And according to the specs of my motherboard, I'm supposed to have support for 3.0 Gb/s drives.
Have you checked for a BIOS upgrade? Most m/b vendors have Web apps that can do the upgrade for you.
As to the "claims," those are often written by the sales department from specifications provided to the development group. I.e., they are often wishes, not reality.
Now, I'm not sure if my experience is related to your problems, but, as I noted above and illustrate below, my errors seem similar to the excerpt you posted. To illustrate my point, here are a couple of listings from this laptop (edited to remove irrelevant and repetitive items, like the DVD drive listing).
First, this is how lshw reports my two drives. Note that they look identical except for capacity:
Code:
$ sudo lshw -c disk
  *-cdrom
       ...
  *-disk:0
       description: ATA Disk
       product: FUJITSU MHZ2160B
       vendor: Fujitsu
       physical id: 0
       bus info: scsi@0:0.0.0
       logical name: /dev/sda
       version: 8909
       serial: K616T8325RMF
       size: 149GiB (160GB)
       capabilities: partitioned partitioned:dos
       configuration: ansiversion=5 signature=a602a602
  *-disk:1
       description: ATA Disk
       product: Hitachi HTS54323
       vendor: Hitachi
       physical id: 1
       bus info: scsi@1:0.0.0
       logical name: /dev/sdb
       version: FB4O
       serial: 080621FB0400LEG5MTSA
       size: 298GiB (320GB)
       capabilities: partitioned partitioned:dos
       configuration: ansiversion=5 signature=00032256
Second, here's what happens when I boot a 64-bit Linux OS (Fedora, Ubuntu, or DSL; though maybe not DSL, it's been a while):
Code:
$ dmesg | grep ata2
ata2: SATA max UDMA/133 irq_stat 0x00400040, connection status changed irq 23
ata2: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
ata2.00: ATA-8: Hitachi HTS543232L9A300, FB4OC40C, max UDMA/133
ata2.00: 625142448 sectors, multi 16: LBA48 NCQ (depth 0/32)
ata2.00: configured for UDMA/133
--- Block repeated three times . . .
ata2.00: exception Emask 0x10 SAct 0x0 SErr 0x380000 action 0x6 frozen
ata2.00: irq_stat 0x08000001, interface fatal error
ata2: SError: { 10B8B Dispar BadCRC }
ata2.00: cmd c8/00:80:08:00:00/00:00:00:00:00/e0 tag 0 dma 65536 in
ata2.00: status: { DRDY }
ata2: hard resetting link
ata2: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
ata2.00: configured for UDMA/133
ata2: EH complete
--- End of block
ata2: limiting SATA link speed to 1.5 Gbps
ata2.00: exception Emask 0x10 SAct 0x0 SErr 0x180000 action 0x6 frozen
ata2.00: irq_stat 0x08000000, interface fatal error
ata2: SError: { 10B8B Dispar }
ata2.00: cmd c8/00:80:e1:20:06/00:00:00:00:00/e0 tag 0 dma 65536 in
ata2.00: status: { DRDY }
ata2: hard resetting link
ata2: SATA link up <unknown> (SStatus 103 SControl 310)
ata2.00: configured for UDMA/133
ata2: EH complete
ata2.00: exception Emask 0x10 SAct 0x0 SErr 0x780000 action 0x6 frozen
ata2.00: irq_stat 0x08000000, interface fatal error
ata2: SError: { 10B8B Dispar BadCRC Handshk }
ata2.00: cmd c8/00:80:e1:20:06/00:00:00:00:00/e0 tag 0 dma 65536 in
ata2.00: status: { DRDY }
ata2: hard resetting link
ata2: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
ata2.00: configured for UDMA/133
ata2: EH complete
After the above error messages, the 320GB drive works without any further errors.
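The `SStatus` numbers in "SATA link up" lines can be read directly. Assuming the kernel prints the register in hex and that it follows the standard SATA SStatus layout (DET in bits 0-3, SPD in bits 4-7, IPM in bits 8-11), the SPD field is the negotiated link generation. The sketch below decodes it; `decode_sstatus` is a hypothetical helper name, not an existing tool.

```shell
# Decode a hex SStatus value as printed by libata (e.g. 123).
# The body runs in a subshell so its variables do not leak out.
decode_sstatus() (
    value=$((0x$1))
    det=$(( value & 15 ))          # device detection state
    spd=$(( (value >> 4) & 15 ))   # negotiated link speed
    ipm=$(( (value >> 8) & 15 ))   # interface power state
    case $spd in
        0) speed="not negotiated" ;;
        1) speed="1.5 Gbps" ;;
        2) speed="3.0 Gbps" ;;
        3) speed="6.0 Gbps" ;;
        *) speed="unknown" ;;
    esac
    echo "DET=$det SPD=$spd IPM=$ipm speed=$speed"
)
```

For the three values in a log like this one, `decode_sstatus 123` reports 3.0 Gbps, `decode_sstatus 103` reports no negotiated speed (which matches a "link up <unknown>" line), and `decode_sstatus 113` reports 1.5 Gbps. The SControl value uses the same bit positions, but there the SPD field is a limit the host imposes, so a change from 300 to 310 would mean the kernel capped the link at the first-generation speed.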
I haven't encountered any more errors since I did the fsck -c -c... so I am going to wait maybe 24 hours and see what happens before marking this SOLVED.