LinuxQuestions.org
Old 04-24-2009, 12:34 PM   #1
Yaro
Member
 
Registered: Nov 2008
Posts: 41

Rep: Reputation: 21
Four hard disk problems over four hard disks in a row.


This is less a Linux question, but I had no idea where else I could get an answer on this. And besides, Linux has helped me to identify a lot of this problem.

I have had four hard disks in a row fail. On the first, the connectors were broken (not the computer's fault). The second drive failed within two days. The third drive was DOA. And now I'm on my fourth drive... what originally looked like bad sectors is still giving me trouble, and it now seems to be getting worse. Twice in three days Linux was forced to remount at LEAST my /home partition read-only because apparently it was having trouble.

So I have to wonder: what aspect of my hardware could cause hard disk failure? Because I refuse to believe I'd have luck this bad by coincidence.
 
Old 04-24-2009, 12:53 PM   #2
PTrenholme
Senior Member
 
Registered: Dec 2004
Location: Olympia, WA, USA
Distribution: Fedora, (K)Ubuntu
Posts: 4,187

Rep: Reputation: 354
The most probable cause is your power supply. It may be overloaded (which should trip the circuit breaker, but sometimes doesn't), or the power (or frequency) supplied by your mains may fluctuate widely (do you have a UPS between your system and the mains?). Most often, though, it's a failing capacitor in the power supply. A new power supply, if that's the problem, should be fairly cheap and easy to install.
 
Old 04-24-2009, 12:56 PM   #3
strick1226
Member
 
Registered: Feb 2005
Distribution: Arch, CentOS, Fedora, macOS, SLES, Ubuntu
Posts: 327

Rep: Reputation: 63
Hi, Yaro,

Sorry to hear about all the issues you've been experiencing with your system recently. Unfortunately, there are a great many possibilities.

(As you pointed out, the first hard drive had broken pins, so that doesn't really count; third drive being DOA--same thing)

How did the second drive fail? Did it physically make horrible noises/stop spinning, or simply become unusable to your computer?

With the fourth drive having to remount /home in read-only mode, within a short amount of time... interesting.

It's always possible you have a bad motherboard/system board. That could cause errors to be written to the disk, and you would not necessarily know about it until attempting to access the data later. It could be a bad integrated HD controller on the board, or another defective part in the I/O chain.

You could be connecting the replacement drives to a faulty IDE cable; have you tried replacing that?

A distinct possibility: are these drives being shipped to you via the same delivery company? You might very well have a local delivery person that practices his field goals with your nice, new HD boxes. I've seen it happen, although it certainly isn't the norm.

A long shot, but possible: are you sure you don't have the computer near a massive magnet of some sort? Like a really, really big speaker or unshielded CRT etc?

Is the computer new? If not, are there any other major changes to the computer that might coincide with the approximate start of your HD troubles (i.e. a move to a new room--or house, different electrical connections, additional nearby electrical equipment, or the addition of a new puppy prone to knocking things over) ?
 
Old 04-24-2009, 01:00 PM   #4
ronlau9
Senior Member
 
Registered: Dec 2007
Location: In front of my LINUX OR MAC BOX
Distribution: Mandriva 2009 X86_64 suse 11.3 X86_64 Centos X86_64 Debian X86_64 Linux MInt 86_64 OS X
Posts: 2,369

Rep: Reputation: Disabled
How did you conclude that it is your hard drive?
Normally an OS communicates with your controller and not directly with your drive.
Second, bad internal memory can give unpredictable errors.
Even a bad PSU can do the same.
In short, are you sure that the rest of the components are good?
 
Old 04-24-2009, 01:50 PM   #5
Yaro
Member
 
Registered: Nov 2008
Posts: 41

Original Poster
Rep: Reputation: 21
How can I determine if this is a PSU problem? I have some money to spare so I might be willing to upgrade the PSU again to meet my needs if I can determine that it is indeed the PSU.

The second drive was returning pretty much the same kernel-log errors and stopping my machine for several seconds so that the soft reset can be done. That drive was Seagate, but my last two were Western Digital.

I do not know if that other drive had bad sectors like this one. I do recall it had remounted /home read-only as well, naturally causing Pidgin to crash. (Pidgin is picky about being able to write to disk.)

As for defective motherboard/controllers, I'd like to know how to determine that, too. Motherboards tend to cost way more than PSUs and I'd definitely want to make absolutely certain it is the motherboard.

Different SATA cables. Same problem.

Well, the first three drives were ordered from NewEgg and delivered via Fedex. The fourth was bought locally at Wal-Mart, so the transport company (Myself) was quite reputable.

Well, I do have a subwoofer nearby, but I didn't have a 2.1 sound system until I got the fourth drive.

This computer was bought in 2007. I've made several hardware upgrades since then to the point the only hardware that came with it I still use is motherboard, processor, and one of my RAM sticks.

As for it possibly being a false positive, I ran a BIOS SMART test and it keeps failing the read element. I believe that's because there are STILL bad sectors I haven't fixed, despite running fsck -c /dev/sda5 (sda5 being the block device where the bad sectors are. And for those who don't know, the -c switch makes fsck transparently call badblocks to update the bad-blocks inode. A lot better than doing all this manually.)

Writing is no problem, except when my filesystems get remounted read only when something happens of course.
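
For anyone following along: the BIOS SMART test only gives a pass/fail, while smartctl (from the smartmontools package, assuming it is installed) shows the individual counters. A rough sketch of how one might pull out the few attributes that help separate "bad platters" from "bad link". The function name smart_key_attrs is made up for this example, and /dev/sda is simply the device used in this thread.

```shell
#!/bin/sh
# Hypothetical helper: reads `smartctl -A` output on stdin and prints
# "ATTRIBUTE RAW_VALUE" for a handful of attributes worth watching.
smart_key_attrs() {
    # Column 2 is the attribute name, column 10 the start of the raw value.
    awk '$2 ~ /^(Reallocated_Sector_Ct|Current_Pending_Sector|Offline_Uncorrectable|UDMA_CRC_Error_Count)$/ { print $2, $10 }'
}

# Typical use, as root (left commented out because it needs a real drive):
#   smartctl -H /dev/sda                    # overall health verdict
#   smartctl -A /dev/sda | smart_key_attrs  # the counters picked out above
#   smartctl -t long /dev/sda               # start an extended self-test
#   smartctl -l selftest /dev/sda           # read the self-test log afterwards
```

As a rule of thumb, not a guarantee: growing reallocated or pending sector counts point at the disk surface, while a growing UDMA_CRC_Error_Count points at the cable, connector, or controller side of the link.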
 
Old 04-24-2009, 06:20 PM   #6
syg00
LQ Veteran
 
Registered: Aug 2003
Location: Australia
Distribution: Lots ...
Posts: 21,131

Rep: Reputation: 4121
Quote:
Originally Posted by Yaro
I believe that's because there's STILL bad sectors I haven't fixed, despite running a fsck -c /dev/sda5 (sda5 being the device block where the bad sectors are.
A single -c only runs a read-only test. Pass it twice ("-c -c") so that badblocks does a non-destructive read-write test instead.
And go find something else to do for a few hours - maybe sleep ...
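
A minimal sketch of that suggestion as a script, assuming an ext2/ext3 filesystem on /dev/sda5 as in this thread. The function names and the optional second argument to is_mounted are made up for the example. It refuses to run on a mounted partition, since checking a mounted filesystem can do more harm than good.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if device $1 appears in the mounts table.
# $2 defaults to /proc/mounts and is only there so the check can be tried
# against a sample file.
is_mounted() {
    awk -v dev="$1" '$1 == dev { found = 1 } END { exit !found }' "${2:-/proc/mounts}"
}

# Hypothetical wrapper around the command suggested above.
rw_badblock_scan() {
    dev=$1
    if is_mounted "$dev"; then
        echo "refusing: $dev is mounted; unmount it or boot a rescue CD first" >&2
        return 1
    fi
    # -f forces a check even if the filesystem is marked clean.
    # -c given twice makes e2fsck ask badblocks for a non-destructive
    # read-write test; a single -c is a read-only scan.
    e2fsck -f -c -c "$dev"
}

# Typical use, as root:
#   rw_badblock_scan /dev/sda5
```

Note that /proc/mounts may list a partition under another name (a by-uuid path, for example), so the mounted check here is only a first line of defence.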
 
Old 04-24-2009, 06:27 PM   #7
amani
Senior Member
 
Registered: Jul 2006
Location: Kolkata, India
Distribution: Debian 64-bit GNU/Linux, Kubuntu64, Fedora QA, Slackware,
Posts: 2,766

Rep: Reputation: Disabled
Post the output of the following (run as root):

# lshw

# hdparm -I /dev/sda

etc.

Also see
# dmesg
after boot.


Use the Parted Magic or SystemRescueCd live CD for testing drives.
 
Old 04-24-2009, 08:54 PM   #8
Yaro
Member
 
Registered: Nov 2008
Posts: 41

Original Poster
Rep: Reputation: 21
lshw has way too much output. Give me something specific.

Here's the hdparm output:

Code:
/dev/sda:

ATA device, with non-removable media
	Model Number:       WDC WD5000AACS-00G8B1                   
	Serial Number:      WD-WCAUH0307513
	Firmware Revision:  05.04C05
	Transport:          Serial, SATA 1.0a, SATA II Extensions, SATA Rev 2.5
Standards:
	Supported: 8 7 6 5 
	Likely used: 8
Configuration:
	Logical		max	current
	cylinders	16383	16383
	heads		16	16
	sectors/track	63	63
	--
	CHS current addressable sectors:   16514064
	LBA    user addressable sectors:  268435455
	LBA48  user addressable sectors:  976773168
	device size with M = 1024*1024:      476940 MBytes
	device size with M = 1000*1000:      500107 MBytes (500 GB)
Capabilities:
	LBA, IORDY(can be disabled)
	Queue depth: 32
	Standby timer values: spec'd by Standard, with device specific minimum
	R/W multiple sector transfer: Max = 16	Current = 0
	Recommended acoustic management value: 128, current value: 254
	DMA: mdma0 mdma1 *mdma2 udma0 udma1 udma2 udma3 udma4 udma5 udma6 
	     Cycle time: min=120ns recommended=120ns
	PIO: pio0 pio1 pio2 pio3 pio4 
	     Cycle time: no flow control=120ns  IORDY flow control=120ns
Commands/features:
	Enabled	Supported:
	   *	SMART feature set
	    	Security Mode feature set
	   *	Power Management feature set
	   *	Write cache
	   *	Look-ahead
	   *	Host Protected Area feature set
	   *	WRITE_BUFFER command
	   *	READ_BUFFER command
	   *	NOP cmd
	   *	DOWNLOAD_MICROCODE
	    	Power-Up In Standby feature set
	   *	SET_FEATURES required to spinup after power up
	    	SET_MAX security extension
	    	Automatic Acoustic Management feature set
	   *	48-bit Address feature set
	   *	Device Configuration Overlay feature set
	   *	Mandatory FLUSH_CACHE
	   *	FLUSH_CACHE_EXT
	   *	SMART error logging
	   *	SMART self-test
	   *	General Purpose Logging feature set
	   *	64-bit World wide name
	   *	Segmented DOWNLOAD_MICROCODE
	   *	SATA-I signaling speed (1.5Gb/s)
	   *	SATA-II signaling speed (3.0Gb/s)
	   *	Native Command Queueing (NCQ)
	   *	Host-initiated interface power management
	   *	Phy event counters
	    	DMA Setup Auto-Activate optimization
	   *	Software settings preservation
	   *	SMART Command Transport (SCT) feature set
	   *	SCT Long Sector Access (AC1)
	   *	SCT LBA Segment Access (AC2)
	   *	SCT Error Recovery Control (AC3)
	   *	SCT Features Control (AC4)
	   *	SCT Data Tables (AC5)
	    	unknown 206[12] (vendor specific)
	    	unknown 206[13] (vendor specific)
Security: 
	Master password revision code = 65534
		supported
	not	enabled
	not	locked
	not	frozen
	not	expired: security count
		supported: enhanced erase
	118min for SECURITY ERASE UNIT. 118min for ENHANCED SECURITY ERASE UNIT.
Logical Unit WWN Device Identifier: 50014ee22677660
	NAA		: 5
	IEEE OUI	: 14ee
	Unique ID	: 22677660
Checksum: correct
I got an error even in the Ubuntu LiveCD about my hard disk:

Code:
[  563.816075] ata1.00: exception Emask 0x10 SAct 0x0 SErr 0x1950000 action 0xe frozen
[  563.816087] ata1: SError: { PHYRdyChg CommWake Dispar LinkSeq TrStaTrns }
[  563.816099] ata1.00: cmd 35/00:10:e7:6f:d5/00:00:11:00:00/e0 tag 0 dma 8192 out
[  563.816102]          res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x14 (ATA bus error)
[  563.816107] ata1.00: status: { DRDY }
[  564.532031] ata1: soft resetting link
[  564.688052] ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
[  565.168370] ata1.00: configured for UDMA/133
[  565.168392] ata1: EH complete
[  565.192467] sd 0:0:0:0: [sda] 976773168 512-byte hardware sectors: (500 GB/465 GiB)
[  565.192620] sd 0:0:0:0: [sda] Write Protect is off
[  565.192625] sd 0:0:0:0: [sda] Mode Sense: 00 3a 00 00
[  565.203591] sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
I like to think I'm pretty good with Linux, but the libata errors are very hard to read.

I am going to e2fsck -c -c /dev/sda5 now.
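
One part of those libata lines can be decoded mechanically: the SErr value is a bit mask, and the names in the "SError: { ... }" line are just its set bits spelled out. The sketch below covers only the bits that actually show up in the logs posted in this thread, with bit positions worked out from the SErr/SError pairs in those logs. The function name decode_serr is made up for the example.

```shell
#!/bin/sh
# Hypothetical helper: prints the names of the SError bits set in $1.
# Partial table: only bits 16 and 18-24, i.e. the ones seen in this thread.
decode_serr() {
    serr=$1
    out=""
    for pair in 16:PHYRdyChg 18:CommWake 19:10B8B 20:Dispar \
                21:BadCRC 22:Handshk 23:LinkSeq 24:TrStaTrns; do
        bit=${pair%%:*}    # text before the colon
        name=${pair#*:}    # text after the colon
        if [ $(( $serr & (1 << $bit) )) -ne 0 ]; then
            out="${out:+$out }$name"
        fi
    done
    printf '%s\n' "$out"
}

# decode_serr 0x1950000 reproduces the flag list from the LiveCD log:
#   PHYRdyChg CommWake Dispar LinkSeq TrStaTrns
```

All of the names in this list describe trouble on the SATA link itself (PHY state changes, 10b-to-8b decode errors, disparity and CRC errors), which is why the kernel summarises the event as "ATA bus error" rather than a media error.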
 
Old 04-25-2009, 10:30 AM   #9
PTrenholme
Senior Member
 
Registered: Dec 2004
Location: Olympia, WA, USA
Distribution: Fedora, (K)Ubuntu
Posts: 4,187

Rep: Reputation: 354
I get those "frozen" errors every time I boot this laptop. The messages result from the SATA driver "believing" the drive when it claims to support SATA-2 with 3Gb/sec transfer rates when, in fact, the drive will only support 1.5Gb/sec rates. The drive is sdb on the laptop and I boot from sda, so the fact that GRUB can't "adjust" the parameters reported by the drive, and therefore fails to boot from it, is of no consequence for my system. (Although I do keep a System Rescue image on a pen drive "just in case.")

The SATA library eventually "gets tired" of the drive "barfing" at it, and lowers the access speed to 1.5Gb/sec, and everything works well thereafter. (I've found no way to change the erroneous information reported by the HD. This is an HP laptop, and the Phoenix BIOS is written to conform with the HP "No user could possibly need to do that!" specifications, so there are no HD tuning options in the BIOS.)

Anyhow, if you're using a SATA drive, I suspect (from the error messages you displayed) that you've got a similar problem. Although, if you have the problem on your boot drive, you might find that GRUB is unable to boot the drive and that things like fsck are also unreliable until the drive speed parameter is reset.
 
Old 04-25-2009, 10:55 AM   #10
Yaro
Member
 
Registered: Nov 2008
Posts: 41

Original Poster
Rep: Reputation: 21
No, this is a full 7200 RPM, 3.0 Gb/s drive. Or so the box claims.
 
Old 04-25-2009, 01:00 PM   #11
PTrenholme
Senior Member
 
Registered: Dec 2004
Location: Olympia, WA, USA
Distribution: Fedora, (K)Ubuntu
Posts: 4,187

Rep: Reputation: 354
Quote:
Originally Posted by Yaro
No, this is a full 7200 RPM, 3.0 Gb/s drive. Or so the box claims.
Yes, that's what my box claimed too. But either the BIOS or the box has a problem with the actual drive I installed in this laptop. (Perhaps your BIOS has an upgrade -- HP "declines" to offer BIOS upgrades to support "non Vista" systems on their "Vista certified" laptops.)

<edit>
Oh, by the way, if I boot the Vista on sda, no disk errors are reported for sdb, but I'm not sure if it's accessed correctly since Vista, of course, just reports it as an "Unformated Drive."
</edit>

Last edited by PTrenholme; 04-25-2009 at 01:04 PM.
 
Old 04-25-2009, 03:51 PM   #12
Yaro
Member
 
Registered: Nov 2008
Posts: 41

Original Poster
Rep: Reputation: 21
Can't say I've done BIOS upgrades, either.

And according to the specs of my motherboard, I'm supposed to have support for 3.0 Gb/s drives.
 
Old 04-25-2009, 04:46 PM   #13
PTrenholme
Senior Member
 
Registered: Dec 2004
Location: Olympia, WA, USA
Distribution: Fedora, (K)Ubuntu
Posts: 4,187

Rep: Reputation: 354
Quote:
Originally Posted by Yaro
Can't say I've done BIOS upgrades, either.

And according to the specs of my motherboard, I'm supposed to have support for 3.0 Gb/s drives.
Have you checked for a BIOS upgrade? Most m/b vendors have Web apps that can do the upgrade for you.

As to the "claims," those are often written by the sales department from specifications provided to the development group. I.e., they are often wishes, not reality.

Now, I'm not sure if my experience is related to your problems, but, as I noted above -- and illustrate below -- my errors seem similar to the excerpt you posted above. To illustrate my point, here are a couple of listings from this laptop (edited to remove irrelevant and repetitive items, like the DVD drive listing).

First, this is how lshw reports my two drives. Note that they look identical except for capacity:
Code:
$ sudo lshw -c disk
  *-cdrom
...
  *-disk:0
       description: ATA Disk
       product: FUJITSU MHZ2160B
       vendor: Fujitsu
       physical id: 0
       bus info: scsi@0:0.0.0
       logical name: /dev/sda
       version: 8909
       serial: K616T8325RMF
       size: 149GiB (160GB)
       capabilities: partitioned partitioned:dos
       configuration: ansiversion=5 signature=a602a602
  *-disk:1
       description: ATA Disk
       product: Hitachi HTS54323
       vendor: Hitachi
       physical id: 1
       bus info: scsi@1:0.0.0
       logical name: /dev/sdb
       version: FB4O
       serial: 080621FB0400LEG5MTSA
       size: 298GiB (320GB)
       capabilities: partitioned partitioned:dos
       configuration: ansiversion=5 signature=00032256
Second, here's what happens when I boot a 64-bit Linux OS (Fedora, Ubuntu, or DSL, though maybe not DSL; it's been a while):
Code:
$ dmesg | grep ata2
ata2: SATA max UDMA/133 irq_stat 0x00400040, connection status changed irq 23
ata2: SATA link up 3.0 Gbps (SStatus 123 SControl 300)                       
ata2.00: ATA-8: Hitachi HTS543232L9A300, FB4OC40C, max UDMA/133              
ata2.00: 625142448 sectors, multi 16: LBA48 NCQ (depth 0/32)                 
ata2.00: configured for UDMA/133                                             
--- Block repeated three times . . .
ata2.00: exception Emask 0x10 SAct 0x0 SErr 0x380000 action 0x6 frozen       
ata2.00: irq_stat 0x08000001, interface fatal error                          
ata2: SError: { 10B8B Dispar BadCRC }                                        
ata2.00: cmd c8/00:80:08:00:00/00:00:00:00:00/e0 tag 0 dma 65536 in          
ata2.00: status: { DRDY }                                                    
ata2: hard resetting link                                                    
ata2: SATA link up 3.0 Gbps (SStatus 123 SControl 300)                       
ata2.00: configured for UDMA/133                                             
ata2: EH complete                                                            
--- End of block 
ata2: limiting SATA link speed to 1.5 Gbps
ata2.00: exception Emask 0x10 SAct 0x0 SErr 0x180000 action 0x6 frozen
ata2.00: irq_stat 0x08000000, interface fatal error
ata2: SError: { 10B8B Dispar }
ata2.00: cmd c8/00:80:e1:20:06/00:00:00:00:00/e0 tag 0 dma 65536 in
ata2.00: status: { DRDY }
ata2: hard resetting link
ata2: SATA link up <unknown> (SStatus 103 SControl 310)
ata2.00: configured for UDMA/133
ata2: EH complete
ata2.00: exception Emask 0x10 SAct 0x0 SErr 0x780000 action 0x6 frozen
ata2.00: irq_stat 0x08000000, interface fatal error
ata2: SError: { 10B8B Dispar BadCRC Handshk }
ata2.00: cmd c8/00:80:e1:20:06/00:00:00:00:00/e0 tag 0 dma 65536 in
ata2.00: status: { DRDY }
ata2: hard resetting link
ata2: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
ata2.00: configured for UDMA/133
ata2: EH complete
After the above error messages, the 320 GB drive works without any more error messages.
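
To check whether another machine goes through the same downshift, it can help to strip the log down to just the link negotiation lines. A small sketch, assuming the kernel messages are worded like the ones in this thread. The function name sata_link_events is made up for the example.

```shell
#!/bin/sh
# Hypothetical helper: reads kernel log text on stdin and keeps only the
# lines that report a negotiated SATA link speed or a speed limit.
sata_link_events() {
    grep -E 'ata[0-9]+: (SATA link up|limiting SATA link speed)'
}

# Typical use:
#   dmesg | sata_link_events
```

On the log above this would leave the "SATA link up 3.0 Gbps" lines, the "limiting SATA link speed to 1.5 Gbps" line, and the final "SATA link up 1.5 Gbps". If no "limiting" line ever appears, the kernel has not downshifted the port and the errors probably have another cause. If the downshift does happen on every boot, the kernel's libata.force= boot parameter (for example libata.force=1.5Gbps) may be worth a try to start the link at the lower speed, though whether it helps on a given board is untested here.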
 
Old 04-25-2009, 06:47 PM   #14
Yaro
Member
 
Registered: Nov 2008
Posts: 41

Original Poster
Rep: Reputation: 21
I haven't encountered any more errors since I did the fsck -c -c... so I am going to wait maybe 24 hours and see what happens before marking this SOLVED.
 
  

