LinuxQuestions.org
Old 08-11-2014, 02:22 PM   #1
inachomsky
LQ Newbie
 
Registered: Aug 2014
Posts: 3

Rep: Reputation: Disabled
HDD/BUS problem ata1.00: status: { DRDY } ata1.00: hard resetting link


I have an Ubuntu server with four disks (the limit of SATA connections on my motherboard).
Last week I swapped a 2.5" 250 GB HDD for a 3.5" 3 TB HDD.

The install went fine, but I have started having problems.

The third disk has become very slow. It works, but very slowly.

The second disk sometimes freezes when I play a movie from it (over the network, on a Raspberry Pi).

The first disk (the 2.5" 250 GB HDD that holds the system) works fine, and so does the new disk, the fourth one.


I don't know what the problem is.
First I suspected temperature, but I installed hddtemp and the highest values are 43 (it could be better, but I don't think that is a problem).
Now I suspect the power supply, so I disconnected one of the data disks and tried playing a movie from the second disk again...

But I got the same error in dmesg.

I don't know what is wrong, and I don't understand the log.

Oh, one other important detail:
In the BIOS POST messages, when the BIOS detects the disks, the second disk (the slow one) shows "SMART CAPABLE but command failed", while the others show "SMART CAPABLE and OK".

But I have now disconnected that disk to try with fewer disks, and the same message now appears for the first disk (the system disk)!

All the disks "PASSED" the smartctl tests.
I can post the output, but I don't think that is the problem.

Sorry for my English, I only learned it in the Spanish public education system XD.

This is the error:

Quote:
[ 984.796023] ata1: lost interrupt (Status 0x50)
[ 984.796043] ata1.00: exception Emask 0x10 SAct 0x0 SErr 0x4050002 action 0xe frozen
[ 984.796047] ata1.00: SError: { RecovComm PHYRdyChg CommWake DevExch }
[ 984.796050] ata1.00: failed command: READ DMA EXT
[ 984.796055] ata1.00: cmd 25/00:20:60:04:5a/00:00:5a:00:00/e0 tag 2 dma 16384 in
[ 984.796055] res 40/00:01:00:00:00/00:00:00:00:00/00 Emask 0x14 (ATA bus error)
[ 984.796058] ata1.00: status: { DRDY }
[ 984.796066] ata1.00: hard resetting link
[ 985.520017] ata1.01: hard resetting link
[ 985.996057] ata1.00: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
[ 985.996068] ata1.01: SATA link down (SStatus 4 SControl 300)
[ 986.012323] ata1.00: configured for UDMA/133
[ 986.012331] ata1.00: device reported invalid CHS sector 0
[ 986.012340] ata1: EH complete
[ 1140.796033] ata1: lost interrupt (Status 0x50)
[ 1140.796052] ata1.00: exception Emask 0x10 SAct 0x0 SErr 0x4050002 action 0xe frozen
[ 1140.796056] ata1.00: SError: { RecovComm PHYRdyChg CommWake DevExch }
[ 1140.796060] ata1.00: failed command: READ DMA EXT
[ 1140.796065] ata1.00: cmd 25/00:20:20:54:27/00:00:5b:00:00/e0 tag 1 dma 16384 in
[ 1140.796065] res 40/00:01:00:00:00/00:00:00:00:00/00 Emask 0x14 (ATA bus error)
[ 1140.796067] ata1.00: status: { DRDY }
[ 1140.796076] ata1.00: hard resetting link
[ 1141.520017] ata1.01: hard resetting link
[ 1141.996059] ata1.00: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
[ 1141.996070] ata1.01: SATA link down (SStatus 4 SControl 300)
[ 1142.020333] ata1.00: configured for UDMA/133
[ 1142.020341] ata1.00: device reported invalid CHS sector 0
[ 1142.020348] ata1: EH complete
[ 1174.804032] ata1.00: exception Emask 0x10 SAct 0x0 SErr 0x4050002 action 0xe frozen
[ 1174.804038] ata1.00: SError: { RecovComm PHYRdyChg CommWake DevExch }
[ 1174.804042] ata1.00: failed command: READ DMA EXT
[ 1174.804046] ata1.00: cmd 25/00:20:00:2b:e4/00:00:9e:00:00/e0 tag 17 dma 16384 in
[ 1174.804046] res 40/00:01:01:4f:c2/00:00:00:00:00/00 Emask 0x14 (ATA bus error)
[ 1174.804049] ata1.00: status: { DRDY }
[ 1174.804057] ata1.00: hard resetting link
[ 1175.528013] ata1.01: hard resetting link
[ 1181.044012] ata1.00: link is slow to respond, please be patient (ready=0)
[ 1184.852014] ata1.00: SRST failed (errno=-16)
[ 1184.852021] ata1.00: hard resetting link
[ 1185.576016] ata1.01: hard resetting link
[ 1188.740054] ata1.00: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
[ 1188.740065] ata1.01: SATA link down (SStatus 4 SControl 300)
[ 1188.764325] ata1.00: configured for UDMA/133
[ 1188.764335] ata1: EH complete
[ 1218.908033] ata1: lost interrupt (Status 0x50)
[ 1218.908052] ata1.00: limiting SATA link speed to 1.5 Gbps
[ 1218.908057] ata1.00: exception Emask 0x10 SAct 0x0 SErr 0x4050002 action 0xe frozen
[ 1218.908061] ata1.00: SError: { RecovComm PHYRdyChg CommWake DevExch }
[ 1218.908065] ata1.00: failed command: READ DMA EXT
[ 1218.908070] ata1.00: cmd 25/00:08:f0:1f:04/00:00:a9:00:00/e0 tag 21 dma 4096 in
[ 1218.908070] res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x14 (ATA bus error)
[ 1218.908072] ata1.00: status: { DRDY }
[ 1218.908080] ata1.00: hard resetting link
[ 1219.632017] ata1.01: hard resetting link
[ 1225.148018] ata1.00: link is slow to respond, please be patient (ready=0)
[ 1228.956017] ata1.00: SRST failed (errno=-16)
[ 1228.956025] ata1.00: hard resetting link
[ 1229.680015] ata1.01: hard resetting link
[ 1230.156064] ata1.00: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
[ 1230.156074] ata1.01: SATA link down (SStatus 4 SControl 300)
[ 1230.180346] ata1.00: configured for UDMA/133
[ 1230.180352] ata1.00: device reported invalid CHS sector 0
[ 1230.180359] ata1: EH complete
[ 1473.515315] perf samples too long (2503 > 2500), lowering kernel.perf_event_max_sample_rate to 50000

Last edited by inachomsky; 08-11-2014 at 02:50 PM.
 
Old 08-13-2014, 08:20 AM   #2
inachomsky
LQ Newbie
 
Registered: Aug 2014
Posts: 3

Original Poster
Rep: Reputation: Disabled
I found this:
https://ata.wiki.kernel.org/index.ph...error_messages
Code:
     

Overview

All libata error messages produced by the kernel use a standard format:

ata3.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2 frozen
ata3.00: cmd ea/00:00:00:00:00/00:00:00:00:00/a0 tag 0
         res 40/00:00:01:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)
ata3.00: status: { DRDY }

Prefix

The prefix

ata3.00:

decodes as
ata 	prefix, indicating this is a libata port or device message
3 	port number, counting from one (1)
00 	device number, usually zero unless Port Multiplier or PATA master/slave is involved
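As a rough illustration (my own sketch, not part of the wiki page), the prefix can be pulled out of a dmesg line with a regular expression. The names `PREFIX_RE` and `parse_prefix` are made up for this example, and the sample lines are taken from the log in the first post.

```python
import re

# Matches a libata prefix such as "ata1.00:" (device message) or "ata1:" (port message).
PREFIX_RE = re.compile(r"\bata(\d+)(?:\.(\d+))?:")

def parse_prefix(line):
    """Return (port, device) from a libata log line, or None if there is no prefix.

    device is None for port-level messages like "ata1: EH complete".
    """
    m = PREFIX_RE.search(line)
    if m is None:
        return None
    port = int(m.group(1))
    device = int(m.group(2)) if m.group(2) is not None else None
    return port, device

print(parse_prefix("[ 984.796066] ata1.00: hard resetting link"))
print(parse_prefix("[ 985.996068] ata1.01: SATA link down (SStatus 4 SControl 300)"))
print(parse_prefix("[ 986.012340] ata1: EH complete"))
```

In the log above every message is on port 1, with device 00 being the disk that fails and device 01 reporting "link down".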
Exception line

The exception line gives an overview of the EH (Error Handler) state.

exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2 frozen

Emask 	Error classification bitmask (AC_ERR_xxx in source code)
SAct 	SATA SActive register
SErr 	SATA SError register
action 	ATA_EH_xxx actions, like revalidate, softreset, hardreset (see include/linux/libata.h)
frozen 	if present, indicates the port was frozen for EH
t<number> 	number of retries
Input taskfile

The "cmd" line gives the ATA command (taskfile) sent to the device:

cmd ea/00:00:00:00:00/00:00:00:00:00/a0 tag 0

This lists ATA registers in the following order:
	Command
(separator)
	Feature
	NSect
	LBA L
	LBA M
	LBA H
(separator)
	HOB Feature
	HOB NSect
	HOB LBA L
	HOB LBA M
	HOB LBA H
(separator)
	Device/Head
tag 	NCQ tag number, or listed as zero if NCQ is not active/applicable.
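A minimal sketch (mine, not from the wiki page) of splitting a "cmd" taskfile into the registers listed above. It assumes a 48-bit ("EXT") command, so the HOB bytes are the upper half of the LBA and sector count, and it assumes 512-byte sectors. It does not handle the special case where a sector count of 0 means the maximum. `decode_cmd_taskfile` is a hypothetical helper name.

```python
def decode_cmd_taskfile(taskfile):
    """Split 'CC/FF:NN:LL:MM:HH/ff:nn:ll:mm:hh/DD' (all hex) into named registers."""
    command, low, hob, device = taskfile.split("/")
    feature, nsect, lba_l, lba_m, lba_h = (int(x, 16) for x in low.split(":"))
    hob_feature, hob_nsect, hob_lba_l, hob_lba_m, hob_lba_h = (
        int(x, 16) for x in hob.split(":")
    )
    # Assumption: 48-bit addressing, HOB bytes form the upper 24 bits of the LBA.
    lba = (
        (hob_lba_h << 40) | (hob_lba_m << 32) | (hob_lba_l << 24)
        | (lba_h << 16) | (lba_m << 8) | lba_l
    )
    sectors = (hob_nsect << 8) | nsect
    return {
        "command": int(command, 16),
        "feature": (hob_feature << 8) | feature,
        "sectors": sectors,
        "lba": lba,
        "device": int(device, 16),
    }

# First failed command from the log in the first post.
regs = decode_cmd_taskfile("25/00:20:60:04:5a/00:00:5a:00:00/e0")
print(hex(regs["command"]), regs["sectors"], hex(regs["lba"]))
print(regs["sectors"] * 512)  # assumes 512-byte sectors
```

For that line the sector count is 0x20 = 32, and 32 * 512 = 16384, which agrees with the "dma 16384 in" printed at the end of the same log line. Command 0x25 is the READ DMA EXT named in the "failed command" line.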
Output taskfile, error summary

The next line contains a current dump of the ATA device's registers, along with an error summary:

res 40/00:00:01:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)

In order:
	Status
(separator)
	Error
	NSect
	LBA L
	LBA M
	LBA H
(separator)
	HOB Error
	HOB NSect
	HOB LBA L
	HOB LBA M
	HOB LBA H
(separator)
	Device/Head
Emask 	ATA command's internal error mask (AC_ERR_xxx in source code)
(summary) 	An English summary of the error, such as

    timeout
    HSM violation
    media error 

See below for a full list.
Error classes

These are the possible values for the (summary) in each error message, above.
host bus error 	Host<->chip bus error (i.e. PCI, if on PCI bus)
ATA bus error 	chip<->device bus error
timeout 	Controller failed to respond to an active ATA command. This could be any number of causes. Most often this is due to an unrelated interrupt subsystem bug (try booting with 'pci=nomsi' or 'acpi=off' or 'noapic'), which failed to deliver an interrupt when we were expecting one from the hardware.
HSM violation 	Hardware failed to respond in an expected manner. "HSM" stands for Host State Machine, a software-based finite state machine required by ATA that expects certain hardware behaviors, based on the current ATA command and other hardware-state programming details.
internal error 	Hardware flagged an impossible condition, most likely due to software misprogramming.
media error 	Software detected a media error
invalid argument 	Software marked ATA command as invalid, for some reason
device error 	Hardware indicates an error with last command. This error is delivered directly from the ATA device. If you see a lot of these, that is often an indication of a hardware problem.
unknown error 	Uncategorized error (should never happen)
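A small sketch (not from the wiki page) of turning an Emask value into the class names above. The bit positions are my reading of the AC_ERR_xxx constants in include/linux/libata.h, so treat them as an assumption and check your kernel source. `AC_ERR_BITS` and `decode_emask` are names invented for this example.

```python
# Assumed bit layout of the AC_ERR_xxx mask (verify against include/linux/libata.h).
AC_ERR_BITS = [
    (1 << 0, "device error"),
    (1 << 1, "HSM violation"),
    (1 << 2, "timeout"),
    (1 << 3, "media error"),
    (1 << 4, "ATA bus error"),
    (1 << 5, "host bus error"),
    (1 << 6, "internal error"),
    (1 << 7, "invalid argument"),
]

def decode_emask(emask):
    """Return the names of all error classes whose bit is set in emask."""
    return [name for bit, name in AC_ERR_BITS if emask & bit]

print(decode_emask(0x4))   # the wiki example, summarised as "timeout"
print(decode_emask(0x10))  # exception line in the log above
print(decode_emask(0x14))  # "res" line in the log above
```

Under that assumption, 0x14 has both the timeout and the ATA bus error bits set, while the kernel prints only one summary string for it ("ATA bus error" in the log above).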
ATA status expansion

The final line

status: { DRDY }

expands the ATA status register returned in the output taskfile into its component bits:
Busy 	Device busy (all other bits invalid)
DRDY 	Device ready. Normally 1, when all is OK.
DRQ 	Data ready to be sent/received via PIO
DF 	Device fault
ERR 	Error (see Error register for more info)
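A hedged sketch (not part of the wiki page) of expanding a raw status byte into the bits listed above. The masks are the standard ATA status register positions as I know them. Any bit that is not in the list is returned separately rather than guessed at. `decode_status` is a hypothetical name.

```python
# ATA status register masks for the bits named in the list above.
STATUS_BITS = [
    (0x80, "Busy"),
    (0x40, "DRDY"),
    (0x20, "DF"),
    (0x08, "DRQ"),
    (0x01, "ERR"),
]

def decode_status(status):
    """Return (names of known bits that are set, mask of remaining unknown bits)."""
    names = [name for mask, name in STATUS_BITS if status & mask]
    known = sum(mask for mask, _ in STATUS_BITS)
    return names, status & ~known

print(decode_status(0x40))  # first byte of the "res" line
print(decode_status(0x50))  # "lost interrupt (Status 0x50)"
```

The 0x40 in the "res" line is just DRDY, which is why the kernel prints `status: { DRDY }`. The 0x50 from the "lost interrupt" message is DRDY plus bit 0x10, which is not covered by the list above.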
ATA error expansion

If any bits in the Error register are set, the Error register contents will be expanded into its component bits, for example:

error: { ICRC ABRT }

ICRC 	Interface CRC error during Ultra DMA transfer - often either a bad cable or power problem, though possibly an incorrect Ultra DMA mode setting by the driver
UNC 	Uncorrectable error - often due to bad sectors on the disk
IDNF 	Requested address was not found
ABRT 	Command aborted - either command not supported, unable to complete, or interface CRC (with ICRC)
SATA SError expansion

If any bits in the SATA SError register are set, the SError register contents will be expanded into its component bits, for example:

SError: { PHYRdyChg CommWake }

These bits are set by the SATA host interface in response to error conditions on the SATA link. Unless a drive hotplug or unplug operation occurred, it is generally not normal to see any of these bits set. If they are, it usually points strongly toward a hardware problem (often a bad SATA cable or a bad or inadequate power supply).
RecovData 	Data integrity error occurred, but the interface recovered
RecovComm 	Communications between device and host temporarily lost, but regained
UnrecovData 	Data integrity error occurred, interface did not recover
Persist 	Persistent communication or data integrity error
Proto 	SATA protocol violation detected
HostInt 	Host bus adapter internal error
PHYRdyChg 	PhyRdy signal changed state
PHYInt 	PHY internal error
CommWake 	COMWAKE detected by PHY (PHY woken up)
10B8B 	10b to 8b decoding error occurred
Dispar 	Incorrect disparity detected
BadCRC 	Link layer CRC error occurred
Handshk 	R_ERR handshake response received in response to frame transmission
LinkSeq 	Link state machine error occurred
TrStaTrns 	Transport layer state transition error occurred
UnrecFIS 	Unrecognized FIS (frame information structure) received
DevExch 	Device presence has changed
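To check the list above against my own log, here is a minimal sketch that expands a raw SErr value into bit names. The bit positions are my understanding of the SATA SError register layout (an assumption, verify against the SATA specification or the SERR_* constants in the kernel). `SERROR_BITS` and `decode_serror` are names made up for this example.

```python
# Assumed SError bit positions (bit number -> name used in libata messages).
SERROR_BITS = {
    0: "RecovData",
    1: "RecovComm",
    8: "UnrecovData",
    9: "Persist",
    10: "Proto",
    11: "HostInt",
    16: "PHYRdyChg",
    17: "PHYInt",
    18: "CommWake",
    19: "10B8B",
    20: "Dispar",
    21: "BadCRC",
    22: "Handshk",
    23: "LinkSeq",
    24: "TrStaTrns",
    25: "UnrecFIS",
    26: "DevExch",
}

def decode_serror(value):
    """Return the names of all SError bits set in value, lowest bit first."""
    return [name for bit, name in sorted(SERROR_BITS.items()) if value & (1 << bit)]

print(decode_serror(0x4050002))
# ['RecovComm', 'PHYRdyChg', 'CommWake', 'DevExch']
```

That is the same set the kernel printed for SErr 0x4050002 in my dmesg: { RecovComm PHYRdyChg CommWake DevExch }, i.e. the link dropped and came back, which fits the cable or power explanation.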
It was very useful for understanding the dmesg output.

The fault could be in the motherboard (SATA ports), the power supply (random voltage spikes or drops), or the SATA cables.

I'm waiting to buy new SATA cables (it will take a few days for them to ship).
I think this is the fix in my case, because my SATA cables were very old and of poor quality, and the problem started when I installed the new disk and moved the cables around.

Last edited by inachomsky; 08-13-2014 at 09:32 AM.
 
  


All times are GMT -5.