LinuxQuestions.org

betamaxman 10-25-2007 12:00 PM

Smart Error At Bootup, I get this error from suse only
 
I have Vista, SUSE, and a dozen or so other *nix systems installed on separate partitions on my first WD SATA 250 GB drive. All is golden with it; however, SUSE 10.3 names my partitions sdc rather than sda, as every other *nix on the drive does, and each startup displays a window with this error message:

"Your hard disk drive is failing! S.M.A.R.T. message: Device: /dev/sdc, Failed SMART usage Attribute: 190 Temperature_Celsius".
The drive seems to be working fine, and no other OS installed on the same drive displays this message. And 190 Celsius? That's four times hotter than my CPU!

x_terminat_or_3 10-25-2007 01:12 PM

Not all operating systems monitor the SMART data.

Your drive cannot work at 190 degrees Celsius. That figure is about three times what ordinary drives can take. However, if we take this value to be in Fahrenheit, it corresponds to about 87C, which is still ~30C too hot, but more believable.

Assuming that your drive is in fact running at 87C, I can only recommend that you shut down your PC RIGHT NOW and purchase a drive cooler. I have seen some that you bolt onto the underside of the drive, consisting of 2 fans. In my case, they bring the drive temperature down by about 10C; that would still be too hot for you, though. Maybe you also need to drill some extra holes in your case and place more fans there...
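
For anyone who wants to check the arithmetic: a minimal sketch of the Fahrenheit-to-Celsius conversion used above, on the assumption that the reported figure really is in Fahrenheit. The function name is only illustrative.

```python
def fahrenheit_to_celsius(deg_f: float) -> float:
    """Convert a temperature from degrees Fahrenheit to degrees Celsius."""
    return (deg_f - 32) * 5 / 9


# 190 is the figure from the error message, read as Fahrenheit (an assumption).
print(f"190F = {fahrenheit_to_celsius(190):.2f}C")
```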

x_terminat_or_3 10-25-2007 01:16 PM

And if your drives are that hot, take a look at your /var/log/messages:

grep -i tempe /var/log/messages

This will show you all entries regarding temperature.

Note: SUSE may place the messages log somewhere else; consult /etc/syslog.conf if you cannot find the file.
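
If you would rather pull the numbers out than just list the lines, here is a rough Python sketch of the same case-insensitive filter. The first sample line is a smartd entry posted later in this thread; the second is made-up filler so there is something to skip. The function names are only illustrative, and the log path on your system may differ.

```python
import re

# First line: smartd entry as posted in this thread. Second line: invented filler.
SAMPLE_LOG = """\
Oct 26 06:37:27 riba smartd[4512]: Device: /dev/sdb, SMART Usage Attribute: 194 Temperature_Celsius changed from 91 to 90
Oct 26 06:37:30 riba kernel: unrelated message
"""


def temperature_lines(text):
    """Return the lines that contain 'tempe' in any case, like grep -i tempe."""
    pattern = re.compile(r"tempe", re.IGNORECASE)
    return [line for line in text.splitlines() if pattern.search(line)]


def reported_change(line):
    """Return (old, new) from a 'changed from X to Y' entry, or None if absent."""
    match = re.search(r"changed from (\d+) to (\d+)", line)
    return (int(match.group(1)), int(match.group(2))) if match else None


for line in temperature_lines(SAMPLE_LOG):
    print(reported_change(line), line)
```

To scan the real file, read it with open("/var/log/messages") and pass the text to temperature_lines.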

betamaxman 10-25-2007 10:58 PM

Must be a bug. This is what I get with
smartctl -a /dev/sdc


smartctl version 5.37 [i686-suse-linux-gnu] Copyright (C) 2002-6 Bruce Allen
Home page is http://smartmontools.sourceforge.net/

=== START OF INFORMATION SECTION ===
Model Family: Western Digital Caviar SE (Serial ATA) family
Device Model: WDC WD2500JS-00MHB1
Serial Number: WD-WMANK1457533
Firmware Version: 10.02E01
User Capacity: 250,059,350,016 bytes
Device is: In smartctl database [for details use: -P show]
ATA Version is: 7
ATA Standard is: Exact ATA specification draft version not indicated
Local Time is: Fri Oct 26 00:52:33 2007 ADT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
See vendor-specific Attribute list for marginal Attributes.

General SMART Values:
Offline data collection status:  (0x84) Offline data collection activity
                                        was suspended by an interrupting command from host.
                                        Auto Offline Data Collection: Enabled.
Self-test execution status:      (   0) The previous self-test routine completed
                                        without error or no self-test has ever
                                        been run.
Total time to complete Offline
data collection:                 (8280) seconds.
Offline data collection
capabilities:                    (0x7b) SMART execute Offline immediate.
                                        Auto Offline data collection on/off support.
                                        Suspend Offline collection upon new
                                        command.
                                        Offline surface scan supported.
                                        Self-test supported.
                                        Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine
recommended polling time:        (   2) minutes.
Extended self-test routine
recommended polling time:        (  96) minutes.
Conveyance self-test routine
recommended polling time:        (   6) minutes.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   200   200   051    Pre-fail  Always       -       0
  3 Spin_Up_Time            0x0003   206   187   021    Pre-fail  Always       -       4700
  4 Start_Stop_Count        0x0032   098   098   000    Old_age   Always       -       2646
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000f   200   200   051    Pre-fail  Always       -       0
  9 Power_On_Hours          0x0032   091   091   000    Old_age   Always       -       6993
 10 Spin_Retry_Count        0x0013   100   100   051    Pre-fail  Always       -       0
 11 Calibration_Retry_Count 0x0012   100   100   051    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   098   098   000    Old_age   Always       -       2586
190 Temperature_Celsius     0x0022   073   001   045    Old_age   Always   In_the_past 27
194 Temperature_Celsius     0x0022   123   001   000    Old_age   Always       -       27
196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0012   200   200   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0010   200   200   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       1
200 Multi_Zone_Error_Rate   0x0009   200   200   051    Pre-fail  Offline      -       0

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
No self-tests have been logged. [To run self-tests, use: smartctl -t]


SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
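
To see why only attribute 190 is flagged, here is a rough sketch that reads a few rows from the attribute table above and compares WORST against THRESH. The comparison rule (flag a row when its worst normalized value is at or below its threshold) is my assumption about how the In_the_past marker comes about, and the function names are only illustrative.

```python
# Three rows copied from the smartctl -a attribute table above.
ROWS = """\
5 Reallocated_Sector_Ct 0x0033 200 200 140 Pre-fail Always - 0
190 Temperature_Celsius 0x0022 073 001 045 Old_age Always In_the_past 27
194 Temperature_Celsius 0x0022 123 001 000 Old_age Always - 27
"""


def parse_row(line):
    """Split one attribute row into named fields, in header column order."""
    f = line.split()
    return {
        "id": int(f[0]),
        "name": f[1],
        "value": int(f[3]),   # normalized current value
        "worst": int(f[4]),   # lowest normalized value recorded
        "thresh": int(f[5]),  # failure threshold
        "when_failed": f[8],
        "raw": f[9],          # vendor raw value, kept as text
    }


def failed_in_the_past(attr):
    """Assumed rule: the worst recorded value is at or below the threshold."""
    return attr["worst"] <= attr["thresh"]


for attr in map(parse_row, ROWS.splitlines()):
    status = "worst <= thresh" if failed_in_the_past(attr) else "ok"
    print(attr["id"], attr["name"], "raw =", attr["raw"], "->", status)
```

Attribute 190 has WORST 001 against THRESH 045, so it trips the check even though its RAW_VALUE is 27. Attribute 194 has the same WORST but a threshold of 000, so it does not.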

riba43 10-26-2007 12:07 AM

Quote:

Originally Posted by x_terminat_or_3 (Post 2936686)
And if your drives are that hot, take a look at your /var/log/messages:

grep -i tempe /var/log/messages

This will show you all entries regarding temperature.

Note: SUSE may place the messages log somewhere else; consult /etc/syslog.conf if you cannot find the file.


Hi,
Interesting: in the same minute I started my computer, the temperature of my second drive was logged as:

Oct 26 06:37:27 riba smartd[4512]: Device: /dev/sdb, SMART Usage Attribute: 194 Temperature_Celsius changed from 91 to 90

That cannot be true. GKrellM shows the temperature is 33 degrees Celsius.

x_terminat_or_3 10-26-2007 01:16 AM

Quote:

That cannot be true. GKrellM shows the temperature is 33 degrees Celsius.

Ah, but it is true; this confirms it. Even though it says Temperature_Celsius, the temperature is in Fahrenheit.

91F = 32.78C

riba43 10-26-2007 01:35 AM

Quote:

Originally Posted by x_terminat_or_3 (Post 2937254)
Ah, but it is true; this confirms it. Even though it says Temperature_Celsius, the temperature is in Fahrenheit.

91F = 32.78C

Yes, but it says the temperature is in Celsius!!??

Oct 26 06:37:27 riba smartd[4512]: Device: /dev/sdb, SMART Usage Attribute: 194 Temperature_Celsius changed from 91 to 90


betamaxman 10-27-2007 11:18 PM

I think the actual temperature is not even reported; 190 must be the type of error.
Anyway, I am almost convinced it is a bug; this drive works fine otherwise.
The sda/hda confusion, I am thinking, is some quirk of my BIOS and the IDE and SATA drives being used together.
Thanks for the replies.

x_terminat_or_3 10-29-2007 05:14 PM

I realize you are not using Fedora Core, but their latest version (still 7 at the moment) started calling ALL drives sd*. Maybe SUSE is doing something similar?

x_terminat_or_3 10-29-2007 05:17 PM

Quote:

ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  3 Spin_Up_Time            0x0027   206   204   063    Pre-fail  Always       -       13009
  4 Start_Stop_Count        0x0032   253   253   000    Old_age   Always       -       1067
  5 Reallocated_Sector_Ct   0x0033   253   050   063    Pre-fail  Always   In_the_past 0
  6 Read_Channel_Margin     0x0001   253   253   100    Pre-fail  Offline      -       0
  7 Seek_Error_Rate         0x000a   253   252   000    Old_age   Always       -       0
  8 Seek_Time_Performance   0x0027   249   233   187    Pre-fail  Always       -       55136
  9 Power_On_Minutes        0x0032   207   207   000    Old_age   Always       -       806h+54m
 10 Spin_Retry_Count        0x002b   253   252   157    Pre-fail  Always       -       0
 11 Calibration_Retry_Count 0x002b   253   252   223    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   251   251   000    Old_age   Always       -       1067
192 Power-Off_Retract_Count 0x0032   253   253   000    Old_age   Always       -       0
193 Load_Cycle_Count        0x0032   253   253   000    Old_age   Always       -       0
194 Temperature_Celsius     0x0032   253   253   000    Old_age   Always       -       45

This is part of the SMART output for one of my drives. As you can see, the actual temperature in degrees Celsius is given in the RAW_VALUE column of attribute 194 (45 here).

x_terminat_or_3 10-29-2007 05:18 PM

So in your sample, the actual temperature of the drive (the RAW_VALUE of attributes 190 and 194) is 27 degrees Celsius. I don't know how I could have missed it. My apologies for scaring you like that ;)
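
To make the distinction concrete, here is a small sketch that pulls the normalized VALUE and the RAW_VALUE out of the temperature rows posted in this thread. The labels and the function name are mine. It only shows that the RAW_VALUE column carries the plausible readings (27 and 45), while VALUE is a vendor-scaled number that should not be read as degrees; how each vendor scales it is not stated anywhere in this thread.

```python
# Temperature rows copied from the two smartctl outputs in this thread.
ROWS = {
    "first drive, attr 190": "190 Temperature_Celsius 0x0022 073 001 045 Old_age Always In_the_past 27",
    "first drive, attr 194": "194 Temperature_Celsius 0x0022 123 001 000 Old_age Always - 27",
    "second drive, attr 194": "194 Temperature_Celsius 0x0032 253 253 000 Old_age Always - 45",
}


def value_and_raw(row):
    """Return (normalized VALUE, RAW_VALUE) as integers from one attribute row."""
    fields = row.split()
    return int(fields[3]), int(fields[-1])


for label, row in ROWS.items():
    value, raw = value_and_raw(row)
    print(f"{label}: VALUE={value} RAW_VALUE={raw}")
```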

betamaxman 11-02-2007 09:22 PM

No prob, I appreciate the advice.

matti3 11-05-2007 01:29 PM

It is a known Western Digital firmware bug.
From http://www.bugtrack.almico.com/view.php?id=468 :
Quote:

I contacted WD and this is their response:

The temperatures reported by all SMART monitoring software is incorrect for the WD2500KS due to a firmware bug. The drive is not defective but the temperatures that the revision of the drive you have bought report to software are incorrect. We are working on a solution.

x_terminat_or_3 11-05-2007 03:05 PM

I don't know what's more interesting, that they actually replied to your mail, or that they admit to faulty firmware.

matti3 11-05-2007 04:05 PM

It is all quite intriguing indeed :)

Currently I am searching for a way to configure the SMART daemon to either disable the temperature checking (and only that) or configure an offset à la SpeedFan. Does anyone know if this is possible?
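
One possibility, offered as an untested sketch: smartd.conf has two directives that look relevant here, `-i ID` (ignore attribute ID when checking usage attributes for failure) and `-I ID` (ignore attribute ID when tracking changes in attribute values). Assuming the drive is /dev/sdc and the misreported attributes are 190 and 194, as in the output earlier in this thread, a line like the following might silence only the temperature reports. It does not apply any offset to the reading. Check `man smartd.conf` for your smartmontools version before relying on it.

```
# /etc/smartd.conf
# Device name and attribute IDs are assumptions taken from this thread.
# -a       default set of checks
# -i 190   do not report attribute 190 as a failed usage attribute
# -I 190   do not log value changes of attribute 190
# -I 194   do not log value changes of attribute 194
/dev/sdc -a -i 190 -I 190 -I 194
```

smartd has to be restarted (or told to reload its configuration) for the change to take effect.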


All times are GMT -5.