LinuxQuestions.org

husarz 04-21-2017 03:50 PM

New hard drive exceptions of libata
 
Hey, I bought a brand-new hard drive for my netbook, a VPCYB2M1E (AMD E-350 on the A50M chipset). During boot it behaves really weird:

Code:

[    1.367385] ata1: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
[    1.369351] ata1.00: ATA-8: HGST HTS721010A9E630, JB0OA3U0, max UDMA/133
[    1.369479] ata1.00: 1953525168 sectors, multi 16: LBA48 NCQ (depth 31/32), AA

[    1.371779] ata1.00: configured for UDMA/133
[    1.372670] scsi 0:0:0:0: Direct-Access    ATA      HGST HTS721010A9 A3U0 PQ: 0 ANSI: 5
[    1.386000] sd 0:0:0:0: [sda] 1953525168 512-byte logical blocks: (1.00 TB/932 GiB)
[    1.386029] sd 0:0:0:0: Attached scsi generic sg0 type 0
[    1.386336] sd 0:0:0:0: [sda] 4096-byte physical blocks
[    1.386873] sd 0:0:0:0: [sda] Write Protect is off
[    1.387003] sd 0:0:0:0: [sda] Mode Sense: 00 3a 00 00
[    1.387365] sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
[    1.478022]  sda: sda1 sda2 < sda5 sda6 sda7 sda8 >
[    1.481604] sd 0:0:0:0: [sda] Attached SCSI disk

but after the sound, USB, and swapon messages, dmesg shows:

Code:

[    9.653422] EXT4-fs (sda5): re-mounted. Opts: (null)
[  13.296088] udevd (1333) used greatest stack depth: 12640 bytes left
[  13.382744] EXT4-fs (sda6): mounted filesystem with ordered data mode. Opts: (null)
[  13.542432] EXT4-fs (sda7): mounted filesystem with ordered data mode. Opts: (null)
[  13.575972] EXT4-fs (sda1): mounting ext2 file system using the ext4 subsystem
[  13.627135] EXT4-fs (sda1): mounted filesystem without journal. Opts: (null)
[  14.811173] ata1.00: exception Emask 0x10 SAct 0x20000 SErr 0x400001 action 0x6 frozen
[  14.811178] ata1.00: irq_stat 0x08000000, interface fatal error
[  14.811181] ata1: SError: { RecovData Handshk }
[  14.811187] ata1.00: failed command: WRITE FPDMA QUEUED
[  14.811196] ata1.00: cmd 61/88:88:08:e0:0b/00:00:03:00:00/40 tag 17 ncq dma 69632 out
                        res 50/00:88:08:e0:0b/00:00:03:00:00/40 Emask 0x10 (ATA bus error)
[  14.811198] ata1.00: status: { DRDY }
[  14.811206] ata1: hard resetting link
[  15.279285] ata1: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
[  15.295399] ata1.00: configured for UDMA/133
[  15.295438] ata1: EH complete
[  16.209132] ata1.00: exception Emask 0x10 SAct 0xc000 SErr 0x400000 action 0x6 frozen
[  16.209137] ata1.00: irq_stat 0x08000000, interface fatal error
[  16.209140] ata1: SError: { Handshk }
[  16.209145] ata1.00: failed command: WRITE FPDMA QUEUED
[  16.209154] ata1.00: cmd 61/d0:70:d0:60:0c/01:00:05:00:00/40 tag 14 ncq dma 237568 out
                        res 50/00:e8:c8:01:0c/00:00:07:00:00/40 Emask 0x10 (ATA bus error)
[  16.209156] ata1.00: status: { DRDY }
[  16.209159] ata1.00: failed command: WRITE FPDMA QUEUED
[  16.209166] ata1.00: cmd 61/e8:78:c8:01:0c/00:00:07:00:00/40 tag 15 ncq dma 118784 out
                        res 50/00:e8:c8:01:0c/00:00:07:00:00/40 Emask 0x10 (ATA bus error)
[  16.209168] ata1.00: status: { DRDY }
[  16.209175] ata1: hard resetting link
[  16.679279] ata1: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
[  16.711948] ata1.00: configured for UDMA/133
[  16.712038] ata1: EH complete
[  16.723332] ata1.00: exception Emask 0x10 SAct 0x60000 SErr 0x400000 action 0x6 frozen
[  16.723339] ata1.00: irq_stat 0x08000000, interface fatal error
[  16.723345] ata1: SError: { Handshk }
[  16.723354] ata1.00: failed command: WRITE FPDMA QUEUED
[  16.723371] ata1.00: cmd 61/e8:88:c8:01:0c/00:00:07:00:00/40 tag 17 ncq dma 118784 out
                        res 50/00:d0:d0:60:0c/00:01:05:00:00/40 Emask 0x10 (ATA bus error)
[  16.723375] ata1.00: status: { DRDY }
[  16.723380] ata1.00: failed command: WRITE FPDMA QUEUED
[  16.723394] ata1.00: cmd 61/d0:90:d0:60:0c/01:00:05:00:00/40 tag 18 ncq dma 237568 out
                        res 50/00:d0:d0:60:0c/00:01:05:00:00/40 Emask 0x10 (ATA bus error)
[  16.723397] ata1.00: status: { DRDY }
[  16.723408] ata1: hard resetting link
[  17.191272] ata1: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
[  17.195124] ata1.00: configured for UDMA/133
[  17.195175] ata1: EH complete
[  17.469075] ata1: limiting SATA link speed to 3.0 Gbps
[  17.469084] ata1.00: exception Emask 0x10 SAct 0x20 SErr 0x400000 action 0x6 frozen
[  17.469085] ata1.00: irq_stat 0x08000000, interface fatal error
[  17.469088] ata1: SError: { Handshk }
[  17.469094] ata1.00: failed command: WRITE FPDMA QUEUED
[  17.469103] ata1.00: cmd 61/00:28:00:e0:17/08:00:05:00:00/40 tag 5 ncq dma 1048576 out
                        res 50/00:00:00:e0:17/00:08:05:00:00/40 Emask 0x10 (ATA bus error)
[  17.469105] ata1.00: status: { DRDY }
[  17.469113] ata1: hard resetting link
[  17.935253] ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 320)
[  17.978853] ata1.00: configured for UDMA/133
[  17.978895] ata1: EH complete

then 'smartctl -a /dev/sda' returned:

Code:

smartctl 6.5 2016-05-07 r4318 [x86_64-linux-4.9.23] (local build)
Copyright (C) 2002-16, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:    HGST Travelstar 7K1000
Device Model:    HGST HTS721010A9E630
Serial Number:    JR1004D31KN31M
LU WWN Device Id: 5 000cca 8c8d61d42
Firmware Version: JB0OA3U0
User Capacity:    1,000,204,886,016 bytes [1.00 TB]
Sector Sizes:    512 bytes logical, 4096 bytes physical
Rotation Rate:    7200 rpm
Form Factor:      2.5 inches
Device is:        In smartctl database [for details use: -P show]
ATA Version is:  ATA8-ACS T13/1699-D revision 6
SATA Version is:  SATA 3.0, 6.0 Gb/s (current: 3.0 Gb/s)
Local Time is:    Fri Apr 21 22:16:31 2017 CEST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x00)        Offline data collection activity
                                        was never started.
                                        Auto Offline Data Collection: Disabled.
Self-test execution status:      (  0)        The previous self-test routine completed
                                        without error or no self-test has ever
                                        been run.
Total time to complete Offline
data collection:                (  45) seconds.
Offline data collection
capabilities:                          (0x5b) SMART execute Offline immediate.
                                        Auto Offline data collection on/off support.
                                        Suspend Offline collection upon new
                                        command.
                                        Offline surface scan supported.
                                        Self-test supported.
                                        No Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0003)        Saves SMART data before entering
                                        power-saving mode.

                                        Supports SMART auto save timer.
Error logging capability:        (0x01)        Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine
recommended polling time:          (  2) minutes.
Extended self-test routine
recommended polling time:          ( 174) minutes.
SCT capabilities:                (0x003d)        SCT Status supported.
                                        SCT Error Recovery Control supported.
                                        SCT Feature Control supported.
                                        SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG    VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate    0x000b  100  100  062    Pre-fail  Always      -      0
  2 Throughput_Performance  0x0005  100  100  040    Pre-fail  Offline      -      0
  3 Spin_Up_Time            0x0007  100  100  033    Pre-fail  Always      -      2
  4 Start_Stop_Count        0x0012  100  100  000    Old_age  Always      -      9
  5 Reallocated_Sector_Ct  0x0033  100  100  005    Pre-fail  Always      -      0
  7 Seek_Error_Rate        0x000b  100  100  067    Pre-fail  Always      -      0
  8 Seek_Time_Performance  0x0005  100  100  040    Pre-fail  Offline      -      0
  9 Power_On_Hours          0x0012  100  100  000    Old_age  Always      -      8
 10 Spin_Retry_Count        0x0013  100  100  060    Pre-fail  Always      -      0
 12 Power_Cycle_Count      0x0032  100  100  000    Old_age  Always      -      9
191 G-Sense_Error_Rate      0x000a  100  100  000    Old_age  Always      -      0
192 Power-Off_Retract_Count 0x0032  100  100  000    Old_age  Always      -      2
193 Load_Cycle_Count        0x0012  100  100  000    Old_age  Always      -      896
194 Temperature_Celsius    0x0002  187  187  000    Old_age  Always      -      32 (Min/Max 16/36)
196 Reallocated_Event_Count 0x0032  100  100  000    Old_age  Always      -      0
197 Current_Pending_Sector  0x0022  100  100  000    Old_age  Always      -      0
198 Offline_Uncorrectable  0x0008  100  100  000    Old_age  Offline      -      0
199 UDMA_CRC_Error_Count    0x000a  200  200  000    Old_age  Always      -      36
223 Load_Retry_Count        0x000a  100  100  000    Old_age  Always      -      0

SMART Error Log Version: 1
ATA Error Count: 36 (device log contains only the most recent five errors)
        CR = Command Register [HEX]
        FR = Features Register [HEX]
        SC = Sector Count Register [HEX]
        SN = Sector Number Register [HEX]
        CL = Cylinder Low Register [HEX]
        CH = Cylinder High Register [HEX]
        DH = Device/Head Register [HEX]
        DC = Device Command Register [HEX]
        ER = Error register [HEX]
        ST = Status register [HEX]
Powered_Up_Time is measured from power on, and printed as
DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,
SS=sec, and sss=millisec. It "wraps" after 49.710 days.

Error 36 occurred at disk power-on lifetime: 8 hours (0 days + 8 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  84 51 20 e0 e3 17 05  Error: ICRC, ABRT at LBA = 0x0517e3e0 = 85451744

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC  Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  61 00 28 00 e0 17 40 08      01:31:14.180  WRITE FPDMA QUEUED
  60 60 20 30 4c d3 40 08      01:31:14.171  READ FPDMA QUEUED
  60 e0 18 50 4b d3 40 08      01:31:14.171  READ FPDMA QUEUED
  60 00 10 50 4a d3 40 08      01:31:14.170  READ FPDMA QUEUED
  60 30 08 18 32 d3 40 08      01:31:14.165  READ FPDMA QUEUED

Error 35 occurred at disk power-on lifetime: 8 hours (0 days + 8 hours)
  When the command that caused the error occurred, the device was active or idle.
  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  84 51 28 88 02 0c 07  Error: ICRC, ABRT at LBA = 0x070c0288 = 118227592

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC  Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  61 d0 90 d0 60 0c 40 08      01:31:13.435  WRITE FPDMA QUEUED
  61 e8 88 c8 01 0c 40 08      01:31:13.435  WRITE FPDMA QUEUED
  ef 10 02 00 00 00 a0 08      01:31:13.434  SET FEATURES [Enable SATA feature]
  27 00 00 00 00 00 e0 08      01:31:13.434  READ NATIVE MAX ADDRESS EXT [OBS-ACS-3]
  ec 00 00 00 00 00 a0 08      01:31:13.433  IDENTIFY DEVICE

Error 34 occurred at disk power-on lifetime: 8 hours (0 days + 8 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  84 51 01 9f 62 0c 05  Error: ICRC, ABRT at LBA = 0x050c629f = 84697759

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC  Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  61 e8 78 c8 01 0c 40 08      01:31:12.920  WRITE FPDMA QUEUED
  61 d0 70 d0 60 0c 40 08      01:31:12.920  WRITE FPDMA QUEUED
  60 a0 68 28 91 90 40 08      01:31:12.912  READ FPDMA QUEUED
  60 00 60 28 90 90 40 08      01:31:12.906  READ FPDMA QUEUED
  60 78 58 e8 39 d3 40 08      01:31:12.898  READ FPDMA QUEUED

Error 33 occurred at disk power-on lifetime: 8 hours (0 days + 8 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  84 51 38 58 e0 0b 03  Error: ICRC, ABRT at LBA = 0x030be058 = 51109976

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC  Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  61 88 88 08 e0 0b 40 08      01:31:11.523  WRITE FPDMA QUEUED
  61 08 80 00 e0 0b 40 08      01:31:11.522  WRITE FPDMA QUEUED
  60 00 78 d8 a9 90 40 08      01:31:11.514  READ FPDMA QUEUED
  60 00 70 d8 a8 90 40 08      01:31:11.507  READ FPDMA QUEUED
  60 80 68 58 a8 90 40 08      01:31:11.503  READ FPDMA QUEUED

Error 32 occurred at disk power-on lifetime: 8 hours (0 days + 8 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  84 51 40 40 14 18 05  Error: ICRC, ABRT at LBA = 0x05181440 = 85464128

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC  Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  61 00 c8 80 1b 18 40 08      01:29:17.252  WRITE FPDMA QUEUED
  61 00 c0 80 13 18 40 08      01:29:17.252  WRITE FPDMA QUEUED
  ef 10 02 00 00 00 a0 08      01:29:17.251  SET FEATURES [Enable SATA feature]
  27 00 00 00 00 00 e0 08      01:29:17.251  READ NATIVE MAX ADDRESS EXT [OBS-ACS-3]
  ec 00 00 00 00 00 a0 08      01:29:17.249  IDENTIFY DEVICE

SMART Self-test log structure revision number 1
No self-tests have been logged.  [To run self-tests, use: smartctl -t]

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.


BW-userx 04-22-2017 09:51 AM

define

During boot it behaves really weird:

rknichols 04-22-2017 11:39 AM

Oh, I think a bunch of "interface fatal error" complaints from the OS together with ICRC (Interface CRC) errors logged by the drive qualifies. If this were a desktop PC or an external drive, I would strongly suspect the cable. In a netbook, about all that can be said is that the problem is somewhere in (a) the motherboard interface, (b) the connector, or (c) the interface circuitry in the drive. Since another drive was presumably working in the netbook, that suggests that the motherboard interface circuitry is fine. That leaves either poor contact at the connector or an internal problem in the drive itself as the only remaining candidates.

I would first see if the original drive still works without those error messages. If you have another system to which you could connect that new drive, I'd try that to see if the drive is OK. I'm hesitant to recommend any method of cleaning the connector because you can easily do more harm than good.
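
For anyone reading along: the `SErr` value in those exception lines is a bit mask, and the names inside `SError: { ... }` are just its set bits spelled out. Below is a minimal Python sketch of that mapping. The bit names are the ones libata prints as far as I know; the `decode_serr` helper itself is only an illustration, not anything from the kernel.

```python
# Bit position -> name, as printed by libata in "SError: { ... }" lines.
SERR_BITS = {
    0: "RecovData",
    1: "RecovComm",
    8: "UnrecovData",
    9: "Persist",
    10: "Proto",
    11: "HostInt",
    16: "PHYRdyChg",
    17: "PHYInt",
    18: "CommWake",
    19: "10B8B",
    20: "Dispar",
    21: "BadCRC",
    22: "Handshk",
    23: "LinkSeq",
    24: "TrStaTrns",
    25: "UnrecFIS",
    26: "DevExch",
}


def decode_serr(value):
    """Return the names of all bits set in an SError value, lowest bit first."""
    return [name for bit, name in sorted(SERR_BITS.items()) if value & (1 << bit)]


print(decode_serr(0x400001))  # ['RecovData', 'Handshk']
print(decode_serr(0x400000))  # ['Handshk']
```

Both values come from the dmesg output in the first post. The decoded names match what the kernel printed there, and a handshake error on the link fits an interface problem better than a media problem.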

husarz 04-22-2017 01:45 PM

Quote:

Originally Posted by BW-userx (Post 5700655)
define

During boot it behaves really weird:

I mean all those errors and hard resets.

husarz 04-22-2017 01:54 PM

Quote:

Originally Posted by rknichols (Post 5700685)
Oh, I think a bunch of "interface fatal error" complaints from the OS together with ICRC (Interface CRC) errors logged by the drive qualifies. If this were a desktop PC or an external drive, I would strongly suspect the cable. In a netbook, about all that can be said is that the problem is somewhere in (a) the motherboard interface, (b) the connector, or (c) the interface circuitry in the drive. Since another drive was presumably working in the netbook, that suggests that the motherboard interface circuitry is fine. That leaves either poor contact at the connector or an internal problem in the drive itself as the only remaining candidates.

I would first see if the original drive still works without those error messages. If you have another system to which you could connect that new drive, I'd try that to see if the drive is OK. I'm hesitant to recommend any method of cleaning the connector because you can easily do more harm than good.

Next episode of X-files: G-Sense_Error_Rate jumped from 0 to 65552!

I have performed several tests with this drive. I reconnected it to the same netbook, with the same results. The old drive works fine. Then I connected the X-files drive to a desktop machine (with AMD SB950), and there it has worked fine on a 6.0 Gb/s link without any errors from the kernel for about a day. Moreover, G-Sense_Error_Rate changed back to 0 there (wth?!).

In meantime I found: https://serverfault.com/questions/40...rive-in-centos

The guy had exactly the same errors as I do. I have appended libata.force=3.0 to the kernel parameters and the system started without any errors. But I have to run with this configuration for some time to see if it really helped.
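
One way to judge whether the change helped is to watch the raw value of UDMA_CRC_Error_Count (attribute 199) over time; if it stays at 36, the link is no longer corrupting transfers. Here is a rough Python sketch for pulling that number out of smartctl's attribute table. The `smart_raw_value` helper is hypothetical and assumes the ten-column layout shown in the first post. The sample text is copied from that output.

```python
def smart_raw_value(smartctl_output, attribute_name):
    """Return the first integer of RAW_VALUE for one attribute, or None."""
    for line in smartctl_output.splitlines():
        fields = line.split()
        # Attribute rows look like:
        # ID# NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
        if len(fields) >= 10 and fields[0].isdigit() and fields[1] == attribute_name:
            return int(fields[9])
    return None


# Two rows taken from the smartctl output earlier in the thread.
sample = """\
194 Temperature_Celsius    0x0002  187  187  000    Old_age  Always      -      32 (Min/Max 16/36)
199 UDMA_CRC_Error_Count    0x000a  200  200  000    Old_age  Always      -      36
"""

print(smart_raw_value(sample, "UDMA_CRC_Error_Count"))  # 36
```

On a live system the text would come from something like `smartctl -A /dev/sda`. Save the number now and compare it after a few days of normal use.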

Final conclusion/possible reason: even when the system is 6.0 Gb/s capable by specification (AMD A50M chipset), that does not necessarily mean the rest of the hardware (the disk connector, since there is no SATA cable) is as well. It is possible that Sony, the netbook's manufacturer, fitted a hard drive SATA connector that is only 3.0 Gb/s capable. The link then initially negotiates 6.0 Gb/s because both the chipset and the drive are capable of it, but it fails after the first data transfers and the kernel decides to renegotiate the connection with the drive at a lower speed.
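
The renegotiation can be read straight off the `SStatus` and `SControl` values in the "SATA link up" lines. Both are printed in hex, and in each register bits 4 to 7 hold the speed field: the negotiated speed in SStatus, the allowed limit in SControl. A small Python sketch of that decoding follows; the `decode_link_line` helper is just an illustration, and it assumes the message format shown in the dmesg output above.

```python
import re

# Speed field values: 1 = Gen1, 2 = Gen2, 3 = Gen3.
SPD_NAMES = {1: "1.5 Gbps", 2: "3.0 Gbps", 3: "6.0 Gbps"}

LINK_RE = re.compile(r"SStatus ([0-9A-Fa-f]+) SControl ([0-9A-Fa-f]+)")


def decode_link_line(line):
    """Return (negotiated speed, speed limit) from a 'SATA link up' dmesg line."""
    match = LINK_RE.search(line)
    if match is None:
        return None
    sstatus = int(match.group(1), 16)   # both registers are printed in hex
    scontrol = int(match.group(2), 16)
    negotiated = SPD_NAMES.get((sstatus >> 4) & 0xF, "none")
    limit = SPD_NAMES.get((scontrol >> 4) & 0xF, "no limit")
    return negotiated, limit


print(decode_link_line("ata1: SATA link up 6.0 Gbps (SStatus 133 SControl 300)"))
# ('6.0 Gbps', 'no limit')
print(decode_link_line("ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 320)"))
# ('3.0 Gbps', '3.0 Gbps')
```

So after the fourth failure the kernel wrote a Gen2 limit into SControl (300 became 320) and the link came back up at 3.0 Gbps, which is the same thing libata.force does from the start of boot.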

rknichols 04-22-2017 02:18 PM

There is no connector difference between SATA II (3.0 Gb/s) and SATA III (6.0 Gb/s). A couple of new connector types (one for 1.8 inch devices, one for slimline optical devices) are included in the SATA III standard, but the regular connector is unchanged. Cables for SATA III have better shielding, but the connectors are the same. It's possible that the chipset in the netbook supports 6.0 Gb/s, but other parts of the circuitry do not.

In your original post, I did notice another interface fatal error after the switch to 3.0 Gb/s, but perhaps that was just a hiccup in the negotiation.

FWIW, in the past I had a couple of external drives (eSATA) that also had to renegotiate to a lower speed after errors at the initial speed. I had to add that "libata.force=" to the boot parameters to make those drives connect cleanly.
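
To confirm the parameter really reached the kernel after a reboot, it is enough to look at the kernel command line. A minimal Python sketch is below. The `libata_force_entries` helper and the sample command line are made up for illustration; on a real system the string would be read from /proc/cmdline.

```python
def libata_force_entries(cmdline):
    """Return every comma-separated value passed via libata.force=."""
    prefix = "libata.force="
    entries = []
    for token in cmdline.split():
        if token.startswith(prefix):
            entries.extend(v for v in token[len(prefix):].split(",") if v)
    return entries


# Hypothetical command line; on a live system use open("/proc/cmdline").read().
sample_cmdline = "BOOT_IMAGE=/boot/vmlinuz root=/dev/sda5 ro libata.force=3.0"

print(libata_force_entries(sample_cmdline))  # ['3.0']
```

As far as I know, the kernel documentation spells the value `3.0Gbps`, and it can be restricted to one port with a prefix such as `1:3.0Gbps`, so the other ports are left alone.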

husarz 04-22-2017 02:25 PM

So far the symptoms and test results indicate that the issue is caused by something between the drive and the SATA host adapter (in the chipset). As it is a netbook, there is no standard SATA cable but some custom connector, which looks like this one: http://thumbs.ebaystatic.com/images/...sHK/s-l225.jpg

rknichols 04-22-2017 02:37 PM

That's a normal SATA connector that combines the power and data connectors in one physical piece. It's totally standard for laptops, netbooks, and the like where the drive attaches directly to the circuit board with no intervening cable. My 4-year-old laptop has one just like it.

