LinuxQuestions.org - Random Reboots

More info:

The error logs from smartctl show the following:

Code:

Error 269 occurred at disk power-on lifetime: 514 hours (21 days + 10 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 32 b0 42 fd e1  Error: UNC 50 sectors at LBA = 0x01fd42b0 = 33374896

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC  Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  c8 00 32 b0 42 fd e1 00      00:02:09.350  READ DMA
  c8 00 34 ae 42 fd e1 00      00:02:06.950  READ DMA
  c8 00 36 ac 42 fd e1 00      00:02:04.250  READ DMA
  c8 00 38 aa 42 fd e1 00      00:02:01.750  READ DMA
  c8 00 3a a8 42 fd e1 00      00:01:59.250  READ DMA

Error 268 occurred at disk power-on lifetime: 514 hours (21 days + 10 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 34 b0 42 fd e1  Error: UNC 52 sectors at LBA = 0x01fd42b0 = 33374896

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC  Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  c8 00 34 ae 42 fd e1 00      00:02:06.950  READ DMA
  c8 00 36 ac 42 fd e1 00      00:02:04.250  READ DMA
  c8 00 38 aa 42 fd e1 00      00:02:01.750  READ DMA
  c8 00 3a a8 42 fd e1 00      00:01:59.250  READ DMA
  c8 00 3c a6 42 fd e1 00      00:01:56.500  READ DMA

Error 267 occurred at disk power-on lifetime: 514 hours (21 days + 10 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 36 b0 42 fd e1  Error: UNC 54 sectors at LBA = 0x01fd42b0 = 33374896

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC  Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  c8 00 36 ac 42 fd e1 00      00:02:04.250  READ DMA
  c8 00 38 aa 42 fd e1 00      00:02:01.750  READ DMA
  c8 00 3a a8 42 fd e1 00      00:01:59.250  READ DMA
  c8 00 3c a6 42 fd e1 00      00:01:56.500  READ DMA
  c8 00 3e a4 42 fd e1 00      00:01:53.900  READ DMA

Error 266 occurred at disk power-on lifetime: 514 hours (21 days + 10 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 38 b0 42 fd e1  Error: UNC 56 sectors at LBA = 0x01fd42b0 = 33374896

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC  Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  c8 00 38 aa 42 fd e1 00      00:02:01.750  READ DMA
  c8 00 3a a8 42 fd e1 00      00:01:59.250  READ DMA
  c8 00 3c a6 42 fd e1 00      00:01:56.500  READ DMA
  c8 00 3e a4 42 fd e1 00      00:01:53.900  READ DMA
  c8 00 40 a2 42 fd e1 00      00:01:51.250  READ DMA

Error 265 occurred at disk power-on lifetime: 514 hours (21 days + 10 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 3a b0 42 fd e1  Error: UNC 58 sectors at LBA = 0x01fd42b0 = 33374896

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC  Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  c8 00 3a a8 42 fd e1 00      00:01:59.250  READ DMA
  c8 00 3c a6 42 fd e1 00      00:01:56.500  READ DMA
  c8 00 3e a4 42 fd e1 00      00:01:53.900  READ DMA
  c8 00 40 a2 42 fd e1 00      00:01:51.250  READ DMA
  c8 00 42 a0 42 fd e1 00      00:01:48.650  READ DMA

This makes me think the drive may be failing, or at least has a bad sector at LBA 33374896 (0x01fd42b0), which is the address reported in all five errors. However, smartctl -A shows:

Code:

smartctl version 5.33 [i686-pc-linux-gnu] Copyright (C) 2002-4 Bruce Allen
Home page is http://smartmontools.sourceforge.net/

=== START OF READ SMART DATA SECTION ===
SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000b   200   165   051    Pre-fail  Always       -       0
  3 Spin_Up_Time            0x0007   118   095   021    Pre-fail  Always       -       1475
  4 Start_Stop_Count        0x0032   100   100   040    Old_age   Always       -       697
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000b   200   200   051    Pre-fail  Always       -       0
  9 Power_On_Hours          0x0032   049   049   000    Old_age   Always       -       37725
 10 Spin_Retry_Count        0x0013   100   100   051    Pre-fail  Always       -       0
 11 Calibration_Retry_Count 0x0013   100   100   051    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       498
196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0012   200   200   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0012   200   200   000    Old_age   Always       -       0
199 UDMA_CRC_Error_Count    0x000a   200   253   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x0009   200   200   051    Pre-fail  Offline      -       0

The overall health status also reads "PASSED". As far as I can tell from the smartctl man page, none of these values indicates a problem.
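To make sure I was not misreading the table by eye, here is a minimal Python sketch of the check as I understand it. It assumes the column order shown in the listing above and treats an attribute as failed when its normalized VALUE is at or below THRESH, which is how I read the man page. The extra rule about nonzero raw counts on IDs 5, 197, and 198 is my own rule of thumb, not something smartctl reports, and the function names are just ones I made up.

```python
# A few rows copied from the smartctl -A listing above (not the full table).
SAMPLE = """\
  1 Raw_Read_Error_Rate    0x000b  200  165  051    Pre-fail  Always      -      0
  5 Reallocated_Sector_Ct  0x0033  200  200  140    Pre-fail  Always      -      0
197 Current_Pending_Sector  0x0012  200  200  000    Old_age  Always      -      0
198 Offline_Uncorrectable  0x0012  200  200  000    Old_age  Always      -      0
"""


def parse_attributes(text):
    """Parse attribute rows into dicts; lines not starting with an ID are skipped."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 10 or not fields[0].isdigit():
            continue
        rows.append({
            "id": int(fields[0]),
            "name": fields[1],
            "value": int(fields[3]),
            "worst": int(fields[4]),
            "thresh": int(fields[5]),
            "raw": fields[9],
        })
    return rows


def suspicious(rows):
    """Return the names of attributes that trip either of the two assumed checks."""
    flagged = []
    for r in rows:
        # Check 1: normalized value at or below the vendor threshold.
        failed = r["value"] <= r["thresh"]
        # Check 2 (my own rule of thumb): any nonzero sector count.
        bad_sectors = r["id"] in (5, 197, 198) and r["raw"] != "0"
        if failed or bad_sectors:
            flagged.append(r["name"])
    return flagged


rows = parse_attributes(SAMPLE)
print(suspicious(rows))
```

With the rows above nothing is flagged, which matches my reading of the table.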

I am also running periodic self-tests, which don't seem to show any problems.

Code:

SMART Self-test log structure revision number 1
Num  Test_Description    Status                   Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline       Completed without error     00%         578          -
# 2  Extended offline    Completed without error     00%         508          -
# 3  Short offline       Completed without error     00%         507          -
# 4  Short offline       Completed without error     00%         483          -
# 5  Short offline       Completed without error     00%         460          -
# 6  Short offline       Completed without error     00%         436          -
# 7  Short offline       Completed without error     00%         413          -
# 8  Short offline       Completed without error     00%         390          -
# 9  Extended offline    Completed without error     00%         346          -
#10  Short offline       Completed without error     00%         344          -
#11  Short offline       Completed without error     00%         321          -
#12  Short offline       Completed without error     00%         297          -
#13  Short offline       Completed without error     00%         273          -
#14  Short offline       Completed without error     00%         250          -
#15  Short offline       Completed without error     00%         227          -
#16  Short offline       Completed without error     00%         226          -
#17  Short offline       Completed without error     00%         203          -
#18  Extended offline    Completed without error     00%         181          -
#19  Short offline       Completed without error     00%         180          -
#20  Short offline       Completed without error     00%         156          -
#21  Short offline       Completed without error     00%         133          -

Am I reading this correctly?
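In case it helps anyone check my reading of the error log, this is how I cross-checked the LBA against the raw register dump. It is a small Python sketch that assumes the usual 28-bit LBA layout for READ DMA (SN holds the low byte, then CL, then CH, and the low nibble of DH holds the top four bits). The function name is my own.

```python
def lba28(sn, cl, ch, dh):
    """Combine ATA task-file registers into a 28-bit LBA.

    SN holds bits 0-7, CL bits 8-15, CH bits 16-23, and the low
    nibble of DH holds bits 24-27 (the high nibble carries flag bits).
    """
    return ((dh & 0x0F) << 24) | (ch << 16) | (cl << 8) | sn


# Registers after the failed command in Error 269: SN=b0 CL=42 CH=fd DH=e1
error_lba = lba28(0xB0, 0x42, 0xFD, 0xE1)
print(hex(error_lba), error_lba)

# The five READ DMA commands listed under Error 269, as (SC, SN) pairs.
# SC is the sector count; CL, CH, and DH are the same for all five.
reads = [(0x32, 0xB0), (0x34, 0xAE), (0x36, 0xAC), (0x38, 0xAA), (0x3A, 0xA8)]
for sc, sn in reads:
    start = lba28(sn, 0x42, 0xFD, 0xE1)
    last = start + sc - 1  # last sector covered by this read
    print(start, sc, last)
```

The decoded value agrees with the 33374896 that smartctl prints, and the loop shows where each of the logged reads starts and ends relative to that sector.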