LinuxQuestions.org


edgjerp 02-08-2007 08:55 AM

Is my drive dying?
 
On my system I have a secondary IDE controller, and recently one of the drives on it has been making a lot of "noise" in my syslog:
Code:

 
kernel: end_request: I/O error, dev hde, sector 20738099
Feb  8 01:58:18 localhost  last message repeated 4 times
Feb  8 01:58:18 localhost kernel: end_request: I/O error, dev hde, sector 225687651
Feb  8 01:58:18 localhost  last message repeated 4 times
Feb  8 01:58:18 localhost kernel: end_request: I/O error, dev hde, sector 105363555
Feb  8 01:58:18 localhost  last message repeated 4 times
Feb  8 01:58:18 localhost kernel: end_request: I/O error, dev hde, sector 44284003
Feb  8 01:58:18 localhost  last message repeated 4 times
Feb  8 01:58:18 localhost kernel: end_request: I/O error, dev hde, sector 171423843

I have a LOT more of the same stuff in the log.
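With that many messages, it can help to boil the log down to the distinct failing sectors per device. The following is only a rough sketch in Python, assuming the kernel message format shown in the excerpt above; the name `summarize_io_errors` and the shortened sample are illustrative. As a rule of thumb (not a diagnosis), errors scattered across the whole disk tend to point more towards cabling or the controller than towards one damaged area of the platter.

```python
import re
from collections import defaultdict

# Matches kernel lines such as:
#   kernel: end_request: I/O error, dev hde, sector 20738099
IO_ERROR = re.compile(r"end_request: I/O error, dev (\w+), sector (\d+)")


def summarize_io_errors(log_text):
    """Return {device: sorted list of distinct failing sectors}."""
    sectors = defaultdict(set)
    for line in log_text.splitlines():
        match = IO_ERROR.search(line)
        if match:
            sectors[match.group(1)].add(int(match.group(2)))
    return {dev: sorted(found) for dev, found in sectors.items()}


# A few lines taken from the syslog excerpt above.
sample = """\
kernel: end_request: I/O error, dev hde, sector 20738099
Feb  8 01:58:18 localhost  last message repeated 4 times
Feb  8 01:58:18 localhost kernel: end_request: I/O error, dev hde, sector 225687651
Feb  8 01:58:18 localhost kernel: end_request: I/O error, dev hde, sector 105363555
"""

for dev, found in summarize_io_errors(sample).items():
    print(dev, len(found), "distinct sectors, from", found[0], "to", found[-1])
```

On a real system the text would come from the syslog file itself rather than from a pasted sample.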

Is the drive dying, or is it something with the controller?

The controller is a little tricky to deal with, so it would be my first guess. By this I mean that if I power up the drives connected to it before its BIOS has determined that no drives are present, LILO fails, as it seems this BIOS wants to boot the system itself. My workaround is a second PSU. Also, during fs checking/mounting I get some CRC errors on drives connected to this card, which makes this part of booting a little slow.

Lenard 02-08-2007 09:37 AM

What do you mean by "noise"? If it is a clicking or grinding type of sound, then yes.

Your reported errors also suggest this.

edgjerp 02-08-2007 09:41 AM

No, digital noise, as in the example given: lots and lots of complaints (I/O errors). The physical drive is as quiet as always.

alred 02-08-2007 10:05 AM

If this only started happening recently, then do a very clean reformatting of that disk and use it for Windows systems ... Windows systems generally don't give this sort of trouble, or at least Windows systems can keep that disk going for much longer ...



marozsas 02-08-2007 10:28 AM

Do you have the smartmontools package installed? It is the best way to check whether your drive is OK. If your drive is not too old and you have S.M.A.R.T. support enabled in the BIOS, you can check the drive status by querying the self-monitoring capability built into your drive.

For example, a typical output could be:
Code:

root@babylon5:~>smartctl -H /dev/sda
smartctl version 5.37 [i686-suse-linux-gnu] Copyright (C) 2002-6 Bruce Allen
Home page is http://smartmontools.sourceforge.net/

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
Please note the following marginal Attributes:
ID# ATTRIBUTE_NAME          FLAG    VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
190 Temperature_Celsius    0x0022  051  039  045    Old_age  Always  In_the_past 49
...
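If the health check is run from a script, the exit status of smartctl carries the same verdict as a bit mask. The sketch below decodes it in Python; the bit meanings are paraphrased from the return values section of the smartctl man page as I understand it, so check them against the version you have installed. The names `EXIT_BITS` and `decode_smartctl_status` are made up for this example.

```python
# Meaning of each bit in smartctl's exit status, paraphrased from the
# smartctl man page (assumption: verify against your installed version).
EXIT_BITS = {
    0: "command line did not parse",
    1: "device open failed, no identify data, or device in low-power mode",
    2: "a SMART command to the disk failed, or a checksum error occurred",
    3: "SMART status check returned DISK FAILING",
    4: "prefail attributes found at or below threshold",
    5: "some attributes were at or below threshold in the past",
    6: "device error log contains records of errors",
    7: "self-test log contains records of errors",
}


def decode_smartctl_status(status):
    """Return the list of conditions encoded in a smartctl exit status."""
    return [text for bit, text in EXIT_BITS.items() if status & (1 << bit)]


# Example: a status with bit 3 set means the drive reports itself as failing.
for reason in decode_smartctl_status(8):
    print(reason)
```

With the subprocess module, the value to decode would be the `returncode` of a run of `smartctl -H` on the device. A marginal attribute flagged In_the_past, like the temperature in the output above, is the kind of condition bit 5 is meant to report.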

You can start a self-test on your disk with:
Code:

root@babylon5:~>smartctl -t long /dev/sda
smartctl version 5.37 [i686-suse-linux-gnu] Copyright (C) 2002-6 Bruce Allen
Home page is http://smartmontools.sourceforge.net/

=== START OF OFFLINE IMMEDIATE AND SELF-TEST SECTION ===
Sending command: "Execute SMART Extended self-test routine immediately in off-line mode".
Drive command "Execute SMART Extended self-test routine immediately in off-line mode" successful.
Testing has begun.
Please wait 56 minutes for test to complete.
Test will complete after Thu Feb  8 15:21:35 2007

Use smartctl -X to abort test.
root@babylon5:~>

And after a while, retrieve the test result:
Code:

root@babylon5:~>smartctl -l selftest /dev/sda
smartctl version 5.37 [i686-suse-linux-gnu] Copyright (C) 2002-6 Bruce Allen
Home page is http://smartmontools.sourceforge.net/

=== START OF READ SMART DATA SECTION ===
SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed without error      00%      1099        -
# 2  Short offline      Completed without error      00%        0        -

root@babylon5:~>

This is the best way I know to verify the disk integrity.

farslayer 02-08-2007 10:30 AM

You could always install smartmontools and check the health status of the drive.
If it IS failing then I wouldn't want to use it for data in Linux or Windows.

Code:

itg-debian:~# smartctl -a /dev/hda
smartctl version 5.32 Copyright (C) 2002-4 Bruce Allen
Home page is http://smartmontools.sourceforge.net/

=== START OF INFORMATION SECTION ===
Device Model:    WDC WD400BB-00DEA0
Serial Number:    WD-WMAD11145217
Firmware Version: 05.03E05
Device is:        In smartctl database [for details use: -P show]
ATA Version is:  5
ATA Standard is:  Exact ATA specification draft version not indicated
Local Time is:    Thu Feb  8 12:25:06 2007 EST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x84) Offline data collection activity
                                        was suspended by an interrupting command from host.
                                        Auto Offline Data Collection: Enabled.
Self-test execution status:      (  0) The previous self-test routine completed
                                        without error or no self-test has ever
                                        been run.
Total time to complete Offline
data collection:                (1506) seconds.
Offline data collection
capabilities:                    (0x3b) SMART execute Offline immediate.
                                        Auto Offline data collection on/off support.
                                        Suspend Offline collection upon new command.
                                        Offline surface scan supported.
                                        Self-test supported.
                                        Conveyance Self-test supported.
                                        No Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        No General Purpose Logging support.
Short self-test routine
recommended polling time:        (  2) minutes.
Extended self-test routine
recommended polling time:        (  28) minutes.
Conveyance self-test routine
recommended polling time:        (  5) minutes.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG    VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate    0x000b  200  200  051    Pre-fail  Always      -      0
  3 Spin_Up_Time            0x0007  103  093  021    Pre-fail  Always      -      2200
  4 Start_Stop_Count        0x0032  100  100  040    Old_age  Always      -      113
  5 Reallocated_Sector_Ct  0x0033  200  200  140    Pre-fail  Always      -      0
  7 Seek_Error_Rate        0x000b  200  200  051    Pre-fail  Always      -      0
  9 Power_On_Hours          0x0032  055  055  000    Old_age  Always      -      33069
 10 Spin_Retry_Count        0x0013  100  100  051    Pre-fail  Always      -      0
 11 Calibration_Retry_Count 0x0013  100  253  051    Pre-fail  Always      -      0
 12 Power_Cycle_Count      0x0032  100  100  000    Old_age  Always      -      83
196 Reallocated_Event_Count 0x0032  200  200  000    Old_age  Always      -      0
197 Current_Pending_Sector  0x0012  200  200  000    Old_age  Always      -      0
198 Offline_Uncorrectable  0x0012  200  200  000    Old_age  Always      -      0
199 UDMA_CRC_Error_Count    0x000a  200  253  000    Old_age  Always      -      2
200 Multi_Zone_Error_Rate  0x0009  200  200  051    Pre-fail  Offline      -      0

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline      Completed without error      00%      301        -

Device does not support Selective Self Tests/Logging

Once SMART is enabled it should be fairly easy to determine the health of your device.
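In a long `smartctl -a` report only a handful of attributes usually matter for this kind of question. The Python sketch below pulls the raw values out of the attribute table and reports the watched ones that are non-zero. The `WATCHED` selection is my own assumption for illustration, not an official list, and the parser assumes the ten-column table layout shown above.

```python
# Attributes often looked at first for a failing disk or cabling trouble.
# Assumption: this selection is illustrative, not an official list.
WATCHED = (
    "Reallocated_Sector_Ct",
    "Current_Pending_Sector",
    "Offline_Uncorrectable",
    "UDMA_CRC_Error_Count",
)


def parse_attributes(text):
    """Return {attribute_name: raw_value} from a smartctl attribute table."""
    raw = {}
    for line in text.splitlines():
        fields = line.split(None, 9)
        # Attribute rows have 10 columns: a numeric ID, a name, a hex flag, ...
        if len(fields) == 10 and fields[0].isdigit() and fields[2].startswith("0x"):
            first = fields[9].split()[0]  # raw value may carry trailing text
            if first.isdigit():
                raw[fields[1]] = int(first)
    return raw


def nonzero_watched(raw):
    """Watched attributes whose raw value is above zero."""
    return {name: raw[name] for name in WATCHED if raw.get(name, 0) > 0}


# A few rows copied from the report above.
sample = """\
ID# ATTRIBUTE_NAME          FLAG    VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct  0x0033  200  200  140    Pre-fail  Always      -      0
  9 Power_On_Hours          0x0032  055  055  000    Old_age  Always      -      33069
197 Current_Pending_Sector  0x0012  200  200  000    Old_age  Always      -      0
198 Offline_Uncorrectable  0x0012  200  200  000    Old_age  Always      -      0
199 UDMA_CRC_Error_Count    0x000a  200  253  000    Old_age  Always      -      2
"""

print(nonzero_watched(parse_attributes(sample)))
```

UDMA_CRC_Error_Count generally counts transfer errors on the link between drive and controller, so a value that keeps climbing would fit a cable or controller problem like the CRC errors described in the first post better than a dying platter.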

Alternatively, most drive manufacturers supply a program to test the hard drives they manufacture. You can check their support web sites for more information on these programs.

farslayer 02-08-2007 10:32 AM

Looks like marozsas types a bit faster than me :)

Well, now you have two recommendations to use SMART for testing.. heh


All times are GMT -5.