SATA Hard Drive Causing Machine Crash

simon72post · 12-11-2008, 07:27 AM

Please can someone help?
Otherwise I may have to go back to Windows.

About a year ago I had an old Compaq PC running fine with an IDE hard drive under Fedora Core 6, on an ext3 file system.

I upgraded the hard drive to a Western Digital 1TB Green SATA drive (WD10EACS), connected to a SATA controller card.

After about a month of running, the machine crashed with an error saying it had sector errors.

I would scan the disk and rebuild the machine, and then a month later I would have the same fault again.

When I ran the Western Digital hard drive test software it came back with no errors.

I have tried 2 different SATA controllers, one VIA and one ATI, but was only able to get the original SATA card working.

So I came to the conclusion it was a timing problem caused by trying to use a 10-year-old machine with a new hard drive. So I replaced the motherboard with a Jetway J7F4 ITX motherboard (CN700 chipset).

I then installed the latest version of Mythbuntu, 8.10, because I was unable to get the display working with Fedora.

A month later I had the same problem again: sector errors caused the machine to crash.

Again I ran the Western Digital test disc. This time it found errors and fixed them.

I reinstalled Mythbuntu 8.10 and a week later had the same problem.
So I sent the hard drive back to Western Digital and got a replacement drive.

I have now got the new WD hard drive, and today the system has crashed again after running for 2 days with the new drive installed.

So to sum it up

I have replaced the motherboard.
I have replaced the hard drive.
I have tried 2 different versions of Linux and each time I did a full format as ext3.
And I’m still having the same problem.

Now I have run out of ideas.
The only ideas I have left are to go back to Windows, which I really don't want to do, or to buy a 750 GB IDE drive. But I would prefer to stick with the drive I have.

I only have a little knowledge of Linux. If someone tells me where I can find log files for this problem, I will try to find them to give you some more information.

Please can someone help?

H_TeXMeX_H · 12-11-2008, 08:35 AM

Welcome to LQ.

That is interesting. It would be nice if you could find and post the exact error word for word.

It's obvious that it cannot be a hardware issue as you swapped out everything that could be at fault.

So which SATA controller is that drive currently hooked up to? You can also post the output of:

Code:

/sbin/lspci -vv

I only really need a section that looks like:

Code:

00:1f.2 SATA controller: Intel Corporation 82801IR/IO/IH (ICH9R/DO/DH) 6 port SATA AHCI Controller (rev 02) (prog-if 01 [AHCI 1.0])
	Subsystem: Intel Corporation Unknown device 5044
	Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
	Status: Cap+ 66MHz+ UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
	Latency: 0
	Interrupt: pin A routed to IRQ 21
	Region 0: I/O ports at 3428 [size=8]
	Region 1: I/O ports at 3434 [size=4]
	Region 2: I/O ports at 3420 [size=8]
	Region 3: I/O ports at 3430 [size=4]
	Region 4: I/O ports at 3020 [size=32]
	Region 5: Memory at 93225000 (32-bit, non-prefetchable) [size=2K]
	Capabilities: <access denied>
	Kernel driver in use: ahci

about the SATA controllers.

SlowCoder · 12-11-2008, 09:28 AM

Quote:

Originally Posted by H_TeXMeX_H

It's obvious that it cannot be a hardware issue as you swapped out everything that could be at fault.

Bad SATA cable, or power supply fluctuations, possibly?

H_TeXMeX_H · 12-11-2008, 09:37 AM

Quote:

Originally Posted by SlowCoder

Bad SATA cable, or power supply fluctuations, possibly?

It could be a bad SATA cable. But wouldn't a bad PSU also cause other side effects?

Sure, try a different SATA cable.

simon72post · 12-11-2008, 10:16 AM

Hi

Thanks for your response.

It's not PSU or cable problems because they have both been changed.

I'm unable to look at the problem at the moment as I'm at work.

But I will look at the /sbin/lspci -vv output tonight and post the error.

jiml8 · 12-11-2008, 10:34 AM

It is a mistake to assume that there is no hardware problem because all hardware has been swapped out. There are several scenarios that can be constructed to demonstrate this fallacy. The most common scenario goes like this: component A fails and does damage to component B. Component A is subsequently swapped out, and now the damaged component B damages the new component A. Then component B is swapped out, and the damaged component A now damages the new component B. While not common, it does occasionally happen.

In the case of a hard drive/controller issue, I would expect such a problem to be extraordinarily rare. Far more likely is that the new drive is defective the same way the old drive was defective. However if I read your original post correctly, you were only able to make one SATA controller work, so that is the only SATA controller that you have tried?

Sector errors normally herald a problem with the controller card on the hard drive OR with the hard drive controller on the motherboard or in the PCI slot. This is where I would be looking, even if you have already changed those things. The only other possibility that occurs to me is a thermal problem, but your drive or SATA card would have to be getting really hot to lead to such problems.

edit:

There is a third possibility. Vibration. If the system is getting bumped or subjected to any shocks, this certainly could induce your problem.

H_TeXMeX_H · 12-11-2008, 10:59 AM

Quote:

Originally Posted by jiml8

It is a mistake to assume that there is no hardware problem because all hardware has been swapped out. There are several scenarios that can be constructed to demonstrate this fallacy. The most common scenario goes like this: component A fails and does damage to component B. Component A is subsequently swapped out, and now the damaged component B damages the new component A. Then component B is swapped out, and the damaged component A now damages the new component B. While not common, it does occasionally happen.

Sector errors normally herald a problem with the controller card on the hard drive OR with the hard drive controller on the motherboard or in the PCI slot. This is where I would be looking, even if you have already changed those things. The only other possibility that occurs to me is a thermal problem, but your drive or SATA card would have to be getting really hot to lead to such problems.

edit:

There is a third possibility. Vibration. If the system is getting bumped or subjected to any shocks, this certainly could induce your problem.

Yes, I agree, it could have been such a chain of events, but you have to be damn lucky to have such a thing happen.

I'm assuming, probabilistically, that the controller card on the HDD cannot be bad on 2 different HDDs bought by the same person; the chances of that are small.

I too suspect that if it is hooked up via a SATA PCI card, either that or the PCI port (rare on 2 different boards) could be the problem, that's why I asked how the HDD is currently hooked up.

It could be a thermal problem, and you could try to use 'smartctl' to find out if the temperature ever went over the maximum. But usually only SCSI HDDs get hot enough to worry about.

EDIT: here's an example of how to find this out, by running 'smartctl -A /dev/sda' I get:

Code:

ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
190 Airflow_Temperature_Cel 0x0022   068   059   045    Old_age   Always       -       32 (Lifetime Min/Max 23/32)
194 Temperature_Celsius     0x0022   032   041   000    Old_age   Always       -       32 (0 20 0 0)

In this case the drive never overheated.
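
If you want to automate that check, here is a minimal Python sketch, assuming the attribute table has the ten-column layout shown above. The function names, the sample text and the 55 degree limit are illustrative assumptions, not part of smartmontools or taken from any drive's datasheet.

```python
# Sketch: pull SMART attributes out of `smartctl -A` text output.
# Assumes the classic 10-column table; everything after the 9th
# column is kept as the raw value string.

SAMPLE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
190 Airflow_Temperature_Cel 0x0022   068   059   045    Old_age   Always       -       32 (Lifetime Min/Max 23/32)
194 Temperature_Celsius     0x0022   032   041   000    Old_age   Always       -       32 (0 20 0 0)
"""


def parse_attributes(text):
    """Return {attribute_name: raw_value_string} for each table row."""
    attrs = {}
    for line in text.splitlines():
        fields = line.split(None, 9)  # at most 10 fields per row
        # Data rows start with a numeric ID; the header row does not.
        if len(fields) == 10 and fields[0].isdigit():
            attrs[fields[1]] = fields[9].strip()
    return attrs


def current_temperature(attrs):
    """First integer of the Temperature_Celsius raw value, or None."""
    raw = attrs.get("Temperature_Celsius")
    return int(raw.split()[0]) if raw else None


LIMIT_C = 55  # hypothetical limit; check the drive's datasheet

attrs = parse_attributes(SAMPLE)
temp = current_temperature(attrs)
verdict = "too hot" if temp > LIMIT_C else "within the assumed limit"
print(temp, "C,", verdict)
```

On a real system you would feed it the text printed by 'smartctl -A /dev/sda' instead of the sample string.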

Vibration is definitely possible, but that depends on how long the drive has been in service and in what environment.

What I suspect most is that the SATA controller has drivers that are not 100% good or stable. If that is the case, I would suggest trying to run it in AHCI mode for a while and see if it makes a difference.

simon72post · 12-11-2008, 02:11 PM

Here is a copy of the lspci output you asked for.

00:0f.0 IDE interface: VIA Technologies, Inc. VIA VT6420 SATA RAID Controller (rev 80) (prog-if 8f [Master SecP SecO PriP PriO])
	Subsystem: VIA Technologies, Inc. VIA VT6420 SATA RAID Controller
	Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
	Status: Cap+ 66MHz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
	Latency: 32
	Interrupt: pin B routed to IRQ 20
	Region 0: I/O ports at ff00 [size=8]
	Region 1: I/O ports at fe00 [size=4]
	Region 2: I/O ports at fd00 [size=8]
	Region 3: I/O ports at fc00 [size=4]
	Region 4: I/O ports at fb00 [size=16]
	Region 5: I/O ports at f400 [size=256]
	Capabilities: [c0] Power Management version 2
		Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
		Status: D0 PME-Enable- DSel=0 DScale=0 PME-
	Kernel driver in use: sata_via
	Kernel modules: sata_via

00:0f.1 IDE interface: VIA Technologies, Inc. VT82C586A/B/VT82C686/A/B/VT823x/A/C PIPC Bus Master IDE (rev 06) (prog-if 8a [Master SecP PriP])
	Subsystem: VIA Technologies, Inc. VT82C586A/B/VT82C686/A/B/VT823x/A/C PIPC Bus Master IDE
	Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
	Status: Cap+ 66MHz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
	Latency: 32
	Interrupt: pin A routed to IRQ 20
	Region 0: [virtual] Memory at 000001f0 (32-bit, non-prefetchable) [size=8]
	Region 1: [virtual] Memory at 000003f0 (type 3, non-prefetchable) [size=1]
	Region 2: [virtual] Memory at 00000170 (32-bit, non-prefetchable) [size=8]
	Region 3: [virtual] Memory at 00000370 (type 3, non-prefetchable) [size=1]
	Region 4: I/O ports at fa00 [size=16]
	Capabilities: [c0] Power Management version 2
		Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
		Status: D0 PME-Enable- DSel=0 DScale=0 PME-
	Kernel driver in use: pata_via
	Kernel modules: pata_via
-----------------------------------------------------------------------------------------------

Regarding the message about swapping out parts and the old PCI controller:
the old PCI controller is irrelevant, as I'm now using the onboard controller built into the new motherboard.

I keep wondering if the problem is due to the way the hard drive controls its power and spins up and down, as it's supposed to be a low-power, energy-efficient hard drive which regulates its own platter speed and energy usage.

http://www.wdc.com/en/products/Products.asp?DriveID=336

But I don't know how this is controlled or whether it can be altered.

Just another note: there are no other hard drives or CD drives connected to the motherboard.
Only this one WD (WD10EACS) drive is connected, to the master SATA port.

Are there any other logs I can look at that have information on the hard drive itself?

H_TeXMeX_H · 12-11-2008, 02:16 PM

Well, in case it is the SATA drivers, why not try going into the BIOS and changing the mode of the SATA controller to AHCI? Then you won't need either the sata_via or pata_via drivers, which can cause conflicts and strange behavior from the drive in some cases. I think Ubuntu should have the ahci driver available by default.

simon72post · 12-11-2008, 02:26 PM

I have just tried to run the smartctl command listed above.
But was unable to get it to work.

I did manage to get this information from smartctl though.

=== START OF INFORMATION SECTION ===
Device Model: WDC WD10EACS-00ZJB0
Serial Number: WD-WCASJ0594379
Firmware Version: 01.01B01
User Capacity: 1,000,204,886,016 bytes
Device is: Not in smartctl database [for details use: -P showall]
ATA Version is: 8
ATA Standard is: Exact ATA specification draft version not indicated
Local Time is: Thu Dec 11 20:19:58 2008 GMT
SMART support is: Available - device has SMART capability.
SMART support is: Disabled

SMART Disabled. Use option -s with argument 'on' to enable it.

-----------------------------------------------------------------
Also, H_TeXMeX_H mentioned temperature.

I don't think this is a problem. For a start, the drive is probably one of the coolest-running drives I have used; it hardly gets warm even after a lot of use. And the machine is in a room in my loft, which is cold at the moment.

H_TeXMeX_H · 12-11-2008, 02:39 PM

Well, it says "SMART Disabled. Use option -s with argument 'on' to enable it.", or you can turn it on in the BIOS. But if it wasn't on, then it didn't record the temperature.
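
As a rough illustration of how to spot this in the output, here is a small Python sketch that reads the two "SMART support is:" lines and suggests the enable command. The function name is made up for the example and /dev/sda is an assumption about the device name.

```python
# Sketch: read `smartctl -i` text and report whether SMART still
# needs to be switched on. The sample lines are copied from the
# output posted above.

INFO = """\
SMART support is: Available - device has SMART capability.
SMART support is: Disabled
"""


def smart_state(info_text):
    """Return (available, enabled) from the 'SMART support is:' lines."""
    available = enabled = False
    for line in info_text.splitlines():
        if line.startswith("SMART support is:"):
            value = line.split(":", 1)[1].strip()
            if value.startswith("Available"):
                available = True
            elif value == "Enabled":
                enabled = True
    return available, enabled


available, enabled = smart_state(INFO)
if available and not enabled:
    # '-s on' is the option named in smartctl's own message;
    # /dev/sda is assumed to be the drive in question.
    print("run as root: smartctl -s on /dev/sda")
```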

simon72post · 12-11-2008, 02:47 PM

I have just had a look at the BIOS and HDD S.M.A.R.T. was disabled, so I have enabled it.

The DRDY timing setting is on slowest, if this is related.

I am unable to find AHCI in my BIOS,
but the VIA SATA function is enabled
and the SATA/RAID mode is set to SATA mode, which I believe is correct as I only have the one drive.

simon72post · 12-12-2008, 02:52 AM

Hi

I have managed to get more information from S.M.A.R.T.

I hope this will help diagnose the problem.

=== START OF INFORMATION SECTION ===
Device Model: WDC WD10EACS-00ZJB0
Serial Number: WD-WCASJ0594379
Firmware Version: 01.01B01
User Capacity: 1,000,204,886,016 bytes
Device is: Not in smartctl database [for details use: -P showall]
ATA Version is: 8
ATA Standard is: Exact ATA specification draft version not indicated
Local Time is: Fri Dec 12 08:46:06 2008 GMT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x84) Offline data collection activity
                                        was suspended by an interrupting command from host.
                                        Auto Offline Data Collection: Enabled.
Self-test execution status:      (   0) The previous self-test routine completed
                                        without error or no self-test has ever
                                        been run.
Total time to complete Offline
data collection:                 (26400) seconds.
Offline data collection
capabilities:                    (0x7b) SMART execute Offline immediate.
                                        Auto Offline data collection on/off support.
                                        Suspend Offline collection upon new
                                        command.
                                        Offline surface scan supported.
                                        Self-test supported.
                                        Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine
recommended polling time:        (   2) minutes.
Extended self-test routine
recommended polling time:        ( 255) minutes.
Conveyance self-test routine
recommended polling time:        (   5) minutes.
SCT capabilities:              (0x303f) SCT Status supported.
                                        SCT Feature Control supported.
                                        SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   100   253   051    Pre-fail  Always       -       0
  3 Spin_Up_Time            0x0003   253   195   021    Pre-fail  Always       -       2483
  4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       109
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000e   200   200   051    Old_age   Always       -       0
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       58
 10 Spin_Retry_Count        0x0012   100   100   051    Old_age   Always       -       0
 11 Calibration_Retry_Count 0x0012   100   100   051    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       109
192 Power-Off_Retract_Count 0x0032   200   200   000    Old_age   Always       -       102
193 Load_Cycle_Count        0x0032   199   199   000    Old_age   Always       -       4137
194 Temperature_Celsius     0x0022   134   102   000    Old_age   Always       -       18
196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0012   200   200   000    Old_age   Always       -       1
198 Offline_Uncorrectable   0x0010   100   253   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x0008   100   253   051    Old_age   Offline      -       0

SMART Error Log Version: 1
ATA Error Count: 6 (device log contains only the most recent five errors)
CR = Command Register [HEX]
FR = Features Register [HEX]
SC = Sector Count Register [HEX]
SN = Sector Number Register [HEX]
CL = Cylinder Low Register [HEX]
CH = Cylinder High Register [HEX]
DH = Device/Head Register [HEX]
DC = Device Command Register [HEX]
ER = Error register [HEX]
ST = Status register [HEX]
Powered_Up_Time is measured from power on, and printed as
DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,
SS=sec, and sss=millisec. It "wraps" after 49.710 days.

Error 6 occurred at disk power-on lifetime: 39 hours (1 days + 15 hours)
When the command that caused the error occurred, the device was active or idle.

After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
40 51 00 8f 70 90 e0 Error: UNC at LBA = 0x0090708f = 9465999

Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
c8 00 80 3f 70 90 00 08 00:00:46.564 READ DMA
27 00 00 00 00 00 00 08 00:00:46.564 READ NATIVE MAX ADDRESS EXT
ec 00 00 00 00 00 00 08 00:00:46.555 IDENTIFY DEVICE
ef 03 46 00 00 00 00 08 00:00:46.548 SET FEATURES [Set transfer mode]
27 00 00 00 00 00 00 08 00:00:46.548 READ NATIVE MAX ADDRESS EXT

Error 5 occurred at disk power-on lifetime: 39 hours (1 days + 15 hours)
When the command that caused the error occurred, the device was active or idle.

After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
40 51 00 8f 70 90 e0 Error: UNC at LBA = 0x0090708f = 9465999

Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
c8 00 80 3f 70 90 00 08 00:00:42.673 READ DMA
27 00 00 00 00 00 00 08 00:00:42.673 READ NATIVE MAX ADDRESS EXT
ec 00 00 00 00 00 00 08 00:00:42.664 IDENTIFY DEVICE
ef 03 46 00 00 00 00 08 00:00:42.657 SET FEATURES [Set transfer mode]
27 00 00 00 00 00 00 08 00:00:42.657 READ NATIVE MAX ADDRESS EXT

Error 4 occurred at disk power-on lifetime: 39 hours (1 days + 15 hours)
When the command that caused the error occurred, the device was active or idle.

After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
40 51 00 8f 70 90 e0 Error: UNC at LBA = 0x0090708f = 9465999

Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
c8 00 80 3f 70 90 00 08 00:00:38.782 READ DMA
27 00 00 00 00 00 00 08 00:00:38.782 READ NATIVE MAX ADDRESS EXT
ec 00 00 00 00 00 00 08 00:00:38.773 IDENTIFY DEVICE
ef 03 46 00 00 00 00 08 00:00:38.766 SET FEATURES [Set transfer mode]
27 00 00 00 00 00 00 08 00:00:38.766 READ NATIVE MAX ADDRESS EXT

Error 3 occurred at disk power-on lifetime: 39 hours (1 days + 15 hours)
When the command that caused the error occurred, the device was active or idle.

After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
40 51 00 8f 70 90 e0 Error: UNC at LBA = 0x0090708f = 9465999

Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
c8 00 80 3f 70 90 00 08 00:00:34.890 READ DMA
27 00 00 00 00 00 00 08 00:00:34.890 READ NATIVE MAX ADDRESS EXT
ec 00 00 00 00 00 00 08 00:00:34.881 IDENTIFY DEVICE
ef 03 46 00 00 00 00 08 00:00:34.874 SET FEATURES [Set transfer mode]
27 00 00 00 00 00 00 08 00:00:34.874 READ NATIVE MAX ADDRESS EXT

Error 2 occurred at disk power-on lifetime: 39 hours (1 days + 15 hours)
When the command that caused the error occurred, the device was active or idle.

After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
40 51 00 8f 70 90 e0 Error: UNC at LBA = 0x0090708f = 9465999

Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
c8 00 80 3f 70 90 00 08 00:00:30.867 READ DMA
27 00 00 00 00 00 00 08 00:00:30.867 READ NATIVE MAX ADDRESS EXT
ec 00 00 00 00 00 00 08 00:00:30.858 IDENTIFY DEVICE
ef 03 46 00 00 00 00 08 00:00:30.851 SET FEATURES [Set transfer mode]
27 00 00 00 00 00 00 08 00:00:30.851 READ NATIVE MAX ADDRESS EXT

SMART Self-test log structure revision number 1
No self-tests have been logged. [To run self-tests, use: smartctl -t]

SMART Selective self-test log data structure revision number 1
SPAN MIN_LBA MAX_LBA CURRENT_TEST_STATUS
1 0 0 Not_testing
2 0 0 Not_testing
3 0 0 Not_testing
4 0 0 Not_testing
5 0 0 Not_testing
Selective self-test flags (0x0):
After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

H_TeXMeX_H · 12-12-2008, 03:16 AM

Quote:

Originally Posted by simon72post

and the SATA/RAID mode is set to SATA mode, which I believe is correct as I only have the one drive.
...

Code:

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   100   253   051    Pre-fail  Always       -       0
  3 Spin_Up_Time            0x0003   253   195   021    Pre-fail  Always       -       2483
  4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       109
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000e   200   200   051    Old_age   Always       -       0
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       58
 10 Spin_Retry_Count        0x0012   100   100   051    Old_age   Always       -       0
 11 Calibration_Retry_Count 0x0012   100   100   051    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       109
192 Power-Off_Retract_Count 0x0032   200   200   000    Old_age   Always       -       102
193 Load_Cycle_Count        0x0032   199   199   000    Old_age   Always       -       4137
194 Temperature_Celsius     0x0022   134   102   000    Old_age   Always       -       18
196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0012   200   200   000    Old_age   Always       -       1
198 Offline_Uncorrectable   0x0010   100   253   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x0008   100   253   051    Old_age   Offline      -       0
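
Two rows in that table stand out: Current_Pending_Sector is 1, and Load_Cycle_Count is already 4137 after only 58 power-on hours. A quick back-of-the-envelope calculation in Python, using only the numbers from the table; the 300,000-cycle figure is an assumed rating for illustration, not something taken from this drive's datasheet.

```python
# Sketch: how fast is the drive parking its heads?
power_on_hours = 58       # attribute 9, raw value
load_cycles = 4137        # attribute 193, raw value
assumed_rating = 300_000  # ASSUMPTION: illustrative load-cycle rating

cycles_per_hour = load_cycles / power_on_hours
hours_to_rating = assumed_rating / cycles_per_hour

print(f"{cycles_per_hour:.1f} load cycles per power-on hour")
print(f"about {hours_to_rating / 24:.0f} days of power-on time "
      f"to reach {assumed_rating} cycles")
```

A rate like that suggests the heads are being parked very often, which would fit the question about how the drive manages its own power. It does not by itself explain the unreadable sector, though.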

I say you should try putting the SATA/RAID mode to RAID instead of SATA; this may make it use the ahci driver, if the controller supports AHCI.

Please put terminal output in code tags if possible, it will make it much nicer. As far as I can see, nothing is failing, but there were errors reported, as you can see further down in the output. I would recommend that you do a long test on the drive and see if any errors come up. To do this run:

Code:

smartctl -t long /dev/sda

then wait the amount of time it says for it to finish (you can do other things in the meantime, but don't turn off the computer) then run 'smartctl -a /dev/sda' again to see the results at the bottom.
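
The errors in the log can also be cross-checked by hand. Here is a rough Python sketch, assuming 28-bit LBA addressing (low nibble of DH, then CH, CL, SN); the helper name lba28 is made up for illustration. It rebuilds the failing address from the registers and checks that it falls inside the READ DMA request that preceded it.

```python
# Sketch: rebuild an LBA from the ATA registers printed by smartctl.

def lba28(sn, cl, ch, dh):
    """Combine the task-file registers into a 28-bit LBA."""
    return ((dh & 0x0F) << 24) | (ch << 16) | (cl << 8) | sn


# Registers after the failed command: SN=8f CL=70 CH=90 DH=e0
failed = lba28(0x8F, 0x70, 0x90, 0xE0)

# READ DMA that triggered it: SC=80 SN=3f CL=70 CH=90 DH=00
start = lba28(0x3F, 0x70, 0x90, 0x00)
count = 0x80  # number of sectors requested

inside = start <= failed < start + count

print(hex(failed), failed)
print("inside the request:", inside)
print("offset into the request:", failed - start)
```

All five logged errors point at the same sector, which matches the single Current_Pending_Sector in the attribute table.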

simon72post · 12-12-2008, 08:31 AM

Hi

The machine has just crashed again.

I have gone into the BIOS and enabled RAID.

I now get a message saying "if you want to install the linux default raid driver, please do not use DPROM cration operation"

But due to the machine crashing I'm unable to get the GUI up and running again,
and it will only mount the drive as read-only.

The machine tries to scan the disc, gets to about 16%, then stops
with an error message: /dev/sda1: Inode 187886 has illegal blocks.
fsck died with exit status 4

and it only gives me the option to scan the disc manually.
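
(For reference, the fsck exit status is a sum of bit flags listed in the fsck(8) man page. A small Python sketch that decodes it; the function name is only illustrative.)

```python
# Sketch: decode an fsck exit status into its documented meanings.

FSCK_FLAGS = {
    1: "filesystem errors corrected",
    2: "system should be rebooted",
    4: "filesystem errors left uncorrected",
    8: "operational error",
    16: "usage or syntax error",
    32: "checking canceled by user request",
    128: "shared-library error",
}


def decode_fsck_status(status):
    """Return the list of conditions encoded in an fsck exit status."""
    if status == 0:
        return ["no errors"]
    return [text for bit, text in FSCK_FLAGS.items() if status & bit]


print(decode_fsck_status(4))
```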

I am unable to run the command smartctl -t long /dev/sda.

I get a message saying smartctl is unavailable in /usr/sbin/smartctl

"this is likely caused by the lack of admin privileges."

even though I was trying to run the command as root.

I managed to run e2fsck -y /dev/sda1

and now I have the GUI back.

I'm in the process of running smartctl -t long /dev/sda now.

I will let you know when it's done.