LinuxQuestions.org


D1ver 08-12-2012 04:31 AM

dmesg shows ATA errors at boot, Slackware 14
 
I have a server that had been running Slackware -current happily, with 25 days of uptime. I upgraded to RC1 today, and after restarting I noticed the following messages in dmesg. I do not know whether they were present before the kernel upgrade. They now seem to appear on every boot.

Code:

[  34.026646] ata5.00: exception Emask 0x10 SAct 0x1 SErr 0x400100 action 0x6 frozen
[  34.026649] ata5.00: irq_stat 0x08000000, interface fatal error
[  34.026651] ata5: SError: { UnrecovData Handshk }
[  34.026653] ata5.00: failed command: WRITE FPDMA QUEUED
[  34.026657] ata5.00: cmd 61/c8:00:3f:b9:88/01:00:04:00:00/40 tag 0 ncq 233472 out
[  34.026658]          res 40/00:04:3f:b9:88/00:00:04:00:00/40 Emask 0x10 (ATA bus error)
[  34.026660] ata5.00: status: { DRDY }
[  34.026664] ata5: hard resetting link
[  34.330320] ata5: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
[  34.332471] ACPI Error: [DSSP] Namespace lookup failure, AE_NOT_FOUND (20110623/psargs-359)
[  34.332477] ACPI Error: Method parse/execution failed [\_SB_.PCI0.SAT0.SPT4._GTF] (Node ffff880216c58550), AE_NOT_FOUND (20110623/psparse-536)
[  34.334955] ACPI Error: [DSSP] Namespace lookup failure, AE_NOT_FOUND (20110623/psargs-359)
[  34.334961] ACPI Error: Method parse/execution failed [\_SB_.PCI0.SAT0.SPT4._GTF] (Node ffff880216c58550), AE_NOT_FOUND (20110623/psparse-536)
[  34.335183] ata5.00: configured for UDMA/133
[  34.335194] ata5: EH complete

This set of errors shows up fairly often in dmesg. It's been up for a few hours now and the system seems to be working fine. ata5 is /dev/sdc and is mounted as '/'. This drive is only about a month old.
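
To check which disk sits behind a given ataN port, one option is to resolve each disk's sysfs link and look for the port name in the path. This is only a minimal sketch: it assumes the kernel includes an ataN component in the resolved /sys/block path (older kernels may not), and the function names ata_port_from_path and list_ata_ports are made up for illustration.

```shell
# Print the libata port name (e.g. "ata5") found in a resolved sysfs path.
# Assumption: the path contains a ".../ataN/..." component; prints nothing otherwise.
ata_port_from_path() {
    printf '%s\n' "$1" | sed -n 's|.*/\(ata[0-9][0-9]*\)/.*|\1|p'
}

# Print "<port> <disk>" for every sd* disk on the running system.
list_ata_ports() {
    for dev in /sys/block/sd*; do
        [ -e "$dev" ] || continue
        port=$(ata_port_from_path "$(readlink -f "$dev")")
        printf '%s %s\n' "${port:-unknown}" "/dev/${dev##*/}"
    done
}

# Usage:
#   list_ata_ports
```

If the port shows up as "unknown", the sysfs layout on that kernel differs from what this sketch assumes, and the "ataN" lines printed during device detection in dmesg are the fallback.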


I've attached the output of 'fdisk -l' and 'smartctl -a /dev/sdc' in case it is useful (wall of text warning).
Code:

root@darkstar:/home/neil/Downloads# fdisk -l

Disk /dev/sda: 2000.4 GB, 2000398934016 bytes
255 heads, 63 sectors/track, 243201 cylinders, total 3907029168 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x6ba37a4a

  Device Boot      Start        End      Blocks  Id  System
/dev/sda1              63  3907029167  1953514552+  fd  Linux raid autodetect

Disk /dev/sdb: 2000.4 GB, 2000398934016 bytes
255 heads, 63 sectors/track, 243201 cylinders, total 3907029168 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x58dcd530

  Device Boot      Start        End      Blocks  Id  System
/dev/sdb1              63  3907029167  1953514552+  fd  Linux raid autodetect

Disk /dev/sdc: 320.1 GB, 320072933376 bytes
255 heads, 63 sectors/track, 38913 cylinders, total 625142448 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x181e7e03

  Device Boot      Start        End      Blocks  Id  System
/dev/sdc1  *          63  605473784  302736861  83  Linux
/dev/sdc2      605473785  625142447    9834331+  82  Linux swap

Disk /dev/md0: 2000.3 GB, 2000264495104 bytes
2 heads, 4 sectors/track, 488345824 cylinders, total 3906766592 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes

Code:

root@darkstar:/home/neil/Downloads# smartctl -a /dev/sdc
smartctl 5.40 2010-10-16 r3189 [x86_64-slackware-linux-gnu] (local build)
Copyright (C) 2002-10 by Bruce Allen, http://smartmontools.sourceforge.net

=== START OF INFORMATION SECTION ===
Model Family:    Western Digital Scorpio Black Serial ATA family
Device Model:    WDC WD3200BEKT-00PVMT0
Serial Number:    WD-WXG1C12E2648
Firmware Version: 01.01A01
User Capacity:    320,072,933,376 bytes
Device is:        In smartctl database [for details use: -P show]
ATA Version is:  8
ATA Standard is:  Exact ATA specification draft version not indicated
Local Time is:    Sun Aug 12 19:26:38 2012 EST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x00)        Offline data collection activity
                                        was never started.
                                        Auto Offline Data Collection: Disabled.
Self-test execution status:      (  0)        The previous self-test routine completed
                                        without error or no self-test has ever
                                        been run.
Total time to complete Offline
data collection:                  (6060) seconds.
Offline data collection
capabilities:                          (0x7b) SMART execute Offline immediate.
                                        Auto Offline data collection on/off support.
                                        Suspend Offline collection upon new
                                        command.
                                        Offline surface scan supported.
                                        Self-test supported.
                                        Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0003)        Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01)        Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine
recommended polling time:          (  2) minutes.
Extended self-test routine
recommended polling time:          (  63) minutes.
Conveyance self-test routine
recommended polling time:          (  5) minutes.
SCT capabilities:                (0x7035)        SCT Status supported.
                                        SCT Feature Control supported.
                                        SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG    VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate    0x002f  200  200  051    Pre-fail  Always      -      0
  3 Spin_Up_Time            0x0027  164  163  021    Pre-fail  Always      -      800
  4 Start_Stop_Count        0x0032  100  100  000    Old_age  Always      -      17
  5 Reallocated_Sector_Ct  0x0033  200  200  140    Pre-fail  Always      -      0
  7 Seek_Error_Rate        0x002e  200  200  000    Old_age  Always      -      0
  9 Power_On_Hours          0x0032  100  100  000    Old_age  Always      -      620
 10 Spin_Retry_Count        0x0032  100  253  000    Old_age  Always      -      0
 11 Calibration_Retry_Count 0x0032  100  253  000    Old_age  Always      -      0
 12 Power_Cycle_Count      0x0032  100  100  000    Old_age  Always      -      16
192 Power-Off_Retract_Count 0x0032  200  200  000    Old_age  Always      -      7
193 Load_Cycle_Count        0x0032  196  196  000    Old_age  Always      -      12670
194 Temperature_Celsius    0x0022  110  107  000    Old_age  Always      -      33
196 Reallocated_Event_Count 0x0032  200  200  000    Old_age  Always      -      0
197 Current_Pending_Sector  0x0032  200  200  000    Old_age  Always      -      0
198 Offline_Uncorrectable  0x0030  100  253  000    Old_age  Offline      -      0
199 UDMA_CRC_Error_Count    0x0032  200  200  000    Old_age  Always      -      0
200 Multi_Zone_Error_Rate  0x0008  100  253  000    Old_age  Offline      -      0

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline      Completed without error      00%      620        -

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

Help or suggestions are very much appreciated!

flank'er 08-12-2012 08:55 AM

Quote:

UnrecovData Handshk
http://lime-technology.com/wiki/inde...f_Drive_Issues
Quote:

"This is transmission error. Most common causes are power related or
unreliable connection especially if backplanes are involved. Is the
problem still reproducible? If so, can you please try to move it to
different power connector and SATA port and see what changes?"
... and run a long self-test:
Code:

# smartctl -t long /dev/sdc
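
The long test runs in the background (the smartctl output above estimates about 63 minutes for this drive), and the result can be read afterwards with 'smartctl -l selftest /dev/sdc'. Since the quoted advice points at a transmission problem, it may also be worth watching attribute 199 (UDMA_CRC_Error_Count), which is commonly associated with cable or connector trouble. A small sketch for pulling out that one value, assuming the attribute table layout shown above, where RAW_VALUE is the tenth column; the function name crc_error_count is hypothetical:

```shell
# Read `smartctl -A` output on stdin and print the raw value of
# attribute 199 (UDMA_CRC_Error_Count). Prints nothing if the
# attribute is not reported by the drive.
crc_error_count() {
    awk '$1 == 199 { print $10 }'
}

# Usage (needs root and smartmontools):
#   smartctl -A /dev/sdc | crc_error_count
#   smartctl -l selftest /dev/sdc    # self-test log, once the long test is done
```

A raw value that keeps climbing between checks would point towards the link rather than the platters. In the output posted above it is 0, so the counter alone does not settle the question here.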

D1ver 08-13-2012 02:49 AM

Quote:

Originally Posted by flank'er (Post 4752339)
http://lime-technology.com/wiki/inde...f_Drive_Issues


... and run a long self-test:
Code:

# smartctl -t long /dev/sdc

I've swapped the SATA cable for a spare, connected it to a different port on the motherboard, and made sure everything is seated correctly. The machine has been up for about 10 minutes since the reboot and I no longer see the error, so I'll mark this as solved and hope it was just a dodgy cable!
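
Ten minutes is a short window, so it may be worth counting the exception lines again after a day or so of uptime. A rough sketch of such a check, assuming the kernel keeps logging the error in the same "ataN.NN: exception Emask" form as in the first post; the function name count_ata_exceptions is made up for illustration:

```shell
# Read kernel log text on stdin and count libata exception lines for one port.
# $1: port name as printed by the kernel, e.g. ata5
count_ata_exceptions() {
    # grep -c prints 0 but exits non-zero when nothing matches,
    # so "|| true" keeps the function's exit status clean.
    grep -c "$1\.[0-9][0-9]*: exception Emask" || true
}

# Usage:
#   dmesg | count_ata_exceptions ata5
```

Because the drive was moved to a different motherboard port, it may no longer be ata5, so the port number should be checked against the current dmesg before running the count.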


All times are GMT -5.