LinuxQuestions.org
Old 08-09-2017, 06:19 PM   #1
Ook
Member
 
Registered: Apr 2004
Location: Hell, Arizona
Distribution: Slackware 14.1
Posts: 535

Rep: Reputation: 92
Hard drive error on different boxes.


I keep getting these errors:

Code:
[23828.353436] ata7.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
[23828.353445] ata7.00: cmd a0/00:00:00:08:00/00:00:00:00:00/a0 tag 15 pio 16392 in
                        opcode=0x4a 4a 01 00 00 10 00 00 00 08 00res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
[23828.353447] ata7.00: status: { DRDY }
[23828.353455] ata7: hard resetting link
[23828.814436] ata7: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
[23828.818477] ata7.00: configured for UDMA/100
[23828.819012] ata7: EH complete
This happens when copying a large amount of data to a hard drive. Sometimes the hard drive disconnects, and running blkid will not show it as being present at all until I reboot the box. After a reboot it is fine - for a while. If it sits idle I have no problems; it only happens when I'm copying a large amount of data to the drive.

So I moved the process to a different box, thinking I might have a bad hard drive, and I get the same exact error. Different box, different drive, the only thing they really have in common is they are both running Slackware 14.2 64 bit.

This is something new - I've been doing this process for years. It started maybe a month ago. I run slackpkg to keep the systems current. Is it possible that some kernel/driver bug was recently introduced? If that were the case, though, I would expect it to affect more people than just me.

Does this make sense, and does anyone recognize the above error and maybe have some ideas where to go? I'm pretty much stumped here.

Last edited by Ook; 08-09-2017 at 06:24 PM.
 
Old 08-09-2017, 08:14 PM   #2
Diantre
Member
 
Registered: Jun 2011
Distribution: Slackware
Posts: 515

Rep: Reputation: 224
What is the SMART status of the drives? (use smartctl or gsmartcontrol)

If the drives report no errors, try changing the SATA cables. If the errors still appear after changing cables, well, it could be a hardware problem in the motherboard. But first things first, check the drives for errors, then replace cables.
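The SMART check suggested above can be scripted. This is only a rough sketch in Python: it assumes smartmontools is installed and the script runs as root, and the function names are made up for illustration. It looks for the health line that `smartctl -H` prints for ATA drives ("SMART overall-health self-assessment test result: ...") or for SCSI drives ("SMART Health Status: ..."); anything else is reported as unknown rather than guessed.

```python
import subprocess


def parse_smart_health(output: str) -> str:
    """Return 'passed', 'failed', or 'unknown' from `smartctl -H` text output."""
    for line in output.splitlines():
        # ATA drives:  SMART overall-health self-assessment test result: PASSED
        # SCSI drives: SMART Health Status: OK
        if ("overall-health self-assessment test result" in line
                or "SMART Health Status" in line):
            verdict = line.split(":", 1)[1].strip().upper()
            return "passed" if verdict in ("PASSED", "OK") else "failed"
    return "unknown"


def smart_health(device: str) -> str:
    """Run `smartctl -H` on a device such as /dev/sda and summarize the result."""
    # smartctl's exit status is a bit mask, so a non-zero code is not
    # treated as a fatal error here; only the printed verdict is used.
    result = subprocess.run(
        ["smartctl", "-H", device], capture_output=True, text=True
    )
    return parse_smart_health(result.stdout)
```

Calling `smart_health("/dev/sda")` on each box would give a quick first answer before moving on to cables. A passing result only means the drive does not report itself as failing; it says nothing about the cable or the controller.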
 
Old 08-09-2017, 10:28 PM   #3
Ook
Member
 
Registered: Apr 2004
Location: Hell, Arizona
Distribution: Slackware 14.1
Posts: 535

Original Poster
Rep: Reputation: 92
smartctl -H <device> always showed PASSED. I even did a destructive badblocks scan, with no errors. smartctl shows no reallocated sectors at all, and a long self-test passed. I've run whatever diagnostics I can on both drives, and they always passed. I'm tending to rule out hardware failure because I did this on two separate boxes and got the same error on both of them. Either that, or this is an amazing coincidence in which the same failure is occurring on both mobos and hard drives at the same time...

I'm going to play some more with this - move the drives, replace cables, try again. This is yet another of those weird things that happens from time to time where you want to go check the phase of the moon or check cosmic ray count...
 
Old 08-09-2017, 10:33 PM   #4
upnort
Member
 
Registered: Oct 2014
Distribution: Slackware, CentOS, Ubuntu MATE
Posts: 400

Rep: Reputation: Disabled
Quote:
So I moved the process to a different box, thinking I might have a bad hard drive, and I get the same exact error. Different box, different drive, the only thing they really have in common is they are both running Slackware 14.2 64 bit.
Then likely the drives are not the problem.

Quote:
This is something new - I've been doing this process for years. It started maybe a month ago.
That was about when the kernel was updated to 4.4.74 and then a few days later to 4.4.75. If you know how, would not hurt to restore the 4.4.38 kernel and see what happens.
 
Old 08-09-2017, 10:36 PM   #5
Ook
Member
 
Registered: Apr 2004
Location: Hell, Arizona
Distribution: Slackware 14.1
Posts: 535

Original Poster
Rep: Reputation: 92
Quote:
Originally Posted by upnort
Then likely the drives are not the problem.


That was about when the kernel was updated to 4.4.74 and then a few days later to 4.4.75. If you know how, would not hurt to restore the 4.4.38 kernel and see what happens.
I think the 4.4.75 came out about June 30, and that is about the right timing. I will indeed revert back to the 4.4.38 and try again. Thanks for that, I didn't think about reverting to an older kernel, though you would think if I suspected a kernel update, I would do so...<sigh>... retirement is in 5 and a half years, and I'm going to head south of the border and leave it all behind...can't happen soon enough...

In the meantime, I'll do that, test it, and report back.
 
Old 08-11-2017, 01:28 PM   #6
Ook
Member
 
Registered: Apr 2004
Location: Hell, Arizona
Distribution: Slackware 14.1
Posts: 535

Original Poster
Rep: Reputation: 92
I reverted to the 4.4.38 kernel, and it still happened. I then disabled NCQ, and the errors dropped to about 5% of their earlier frequency, and it recovers fully each time. Very strange.
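The post does not say how NCQ was disabled. One common way is to set the drive's queue depth to 1 through sysfs, which lasts until the next reboot; another is the `libata.force=noncq` kernel boot parameter. Below is a rough Python sketch of the sysfs approach. The function name and the `sysfs_root` parameter are made up for illustration (the parameter exists only so the function can be tried against a scratch directory), and `sdb` in the usage note is a placeholder device name.

```python
from pathlib import Path


def set_queue_depth(device: str, depth: int = 1, sysfs_root: str = "/sys") -> int:
    """Write a new queue depth for a block device and return the value read back.

    A depth of 1 means only one command is outstanding at a time, which
    effectively turns NCQ off for that drive until the next reboot.
    """
    if depth < 1:
        raise ValueError("queue depth must be at least 1")
    path = Path(sysfs_root) / "block" / device / "device" / "queue_depth"
    path.write_text(f"{depth}\n")  # needs root on a real system
    return int(path.read_text().strip())
```

Usage would be `set_queue_depth("sdb")` as root. Reading the value back is a cheap way to confirm the kernel accepted it.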

A bit of research shows that this particular error tends to pop up from time to time, and has done so for many years. It is infrequent, and I haven't seen anyone come up with a definitive fix, but most blame driver-specific compatibility issues with various SATA controllers as the primary culprit.

I think I'll watch it and see what happens...
 
Old 09-01-2017, 04:35 PM   #7
Ook
Member
 
Registered: Apr 2004
Location: Hell, Arizona
Distribution: Slackware 14.1
Posts: 535

Original Poster
Rep: Reputation: 92
This continues to happen on any drive that gets a lot of write activity, but it no longer drops the drive offline. IDK why, but it doesn't seem to actually cause any problems, so I think I'm just going to sit back and ignore it for now....
 
Old 09-12-2017, 02:31 PM   #8
brobr
Member
 
Registered: Oct 2003
Location: uk
Distribution: Slackware
Posts: 401

Rep: Reputation: 74
Hi, a question: is the drive that stuff gets copied to an external one? Maybe something is wrong in the enclosure, or the cable comes loose when it gets touched. The tiny usb3-B connectors that go into external drives are awful; the newer usb3-c is much tighter. One of my external drives gave up after too many write interruptions due to such a wonky usb3-b connection....
 
Old 09-12-2017, 03:07 PM   #9
Ook
Member
 
Registered: Apr 2004
Location: Hell, Arizona
Distribution: Slackware 14.1
Posts: 535

Original Poster
Rep: Reputation: 92
Internal drives, all of them. Checked it today, and kernel log has these:

[11002.859967] ata7.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
[11002.859976] ata7.00: cmd a0/00:00:00:08:00/00:00:00:00:00/a0 tag 25 pio 16392 in
opcode=0x4a 4a 01 00 00 10 00 00 00 08 00res 40/00:03:00:00:00/00:00:00:00:00/a0 Emask 0x4 (timeout)
[11002.859977] ata7.00: status: { DRDY }
[11002.859980] ata7: hard resetting link
[11003.320968] ata7: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
[11003.325000] ata7.00: configured for UDMA/100
[11003.325520] ata7: EH complete

So after all this time, I just realized that ata7 is the DVD player, not the drives involved in the copy. And there is no disk in the drive. Yet it only does this when a copy process is occurring.

Shame on me for not noticing this sooner! DOH!
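One way to check which device sits behind a port name such as ata7 is to resolve the links under /sys/block, since on recent kernels the resolved path usually contains an ataN component. A minimal Python sketch, assuming that path layout; the function names are made up for illustration, and the path in the comment is a hypothetical example, not taken from these boxes.

```python
import os
import re
from typing import Dict, Optional

_ATA_PORT = re.compile(r"/ata(\d+)/")


def ata_port_from_sysfs_path(path: str) -> Optional[int]:
    """Extract the ATA port number from a resolved sysfs device path.

    Hypothetical example of such a path:
      /sys/devices/pci0000:00/0000:00:1f.2/ata7/host6/target6:0:0/6:0:0:0/block/sr0
    Returns None when the path has no ataN component (USB, loop, NVMe, ...).
    """
    match = _ATA_PORT.search(path)
    return int(match.group(1)) if match else None


def devices_by_ata_port(sysfs_block: str = "/sys/block") -> Dict[str, Optional[int]]:
    """Map each block device name (sda, sr0, ...) to its ATA port number."""
    mapping = {}
    for name in sorted(os.listdir(sysfs_block)):
        real = os.path.realpath(os.path.join(sysfs_block, name))
        mapping[name] = ata_port_from_sysfs_path(real)
    return mapping
```

Printing `devices_by_ata_port()` before chasing an error would show straight away whether ata7 is a hard drive (sdX) or an optical drive (srX).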

But why does a copy process from one drive to another cause this to happen to the DVD player?
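The log lines themselves carry a hint about what was being asked of the drive. ATA command 0xa0 is PACKET, the command used to pass SCSI-style commands to ATAPI devices such as optical drives, and opcode 0x4a is GET EVENT STATUS NOTIFICATION, which is used to poll a drive for events such as media changes. So the command that timed out looks like a routine status poll of the DVD drive. The thread does not establish why that poll times out during a copy. Here is a small Python sketch that pulls those fields out of the log text; the function name and the lookup tables are made up for illustration and cover only the codes seen here plus one other common opcode.

```python
import re
from typing import Dict, Optional

# Deliberately tiny lookup tables; unknown codes are shown as raw hex.
ATA_COMMANDS = {0xA0: "PACKET (ATAPI)"}
SCSI_OPCODES = {0x00: "TEST UNIT READY", 0x4A: "GET EVENT STATUS NOTIFICATION"}

_CMD = re.compile(r"(ata\d+(?:\.\d+)?): cmd ([0-9a-f]{2})/")
_OPCODE = re.compile(r"opcode=0x([0-9a-f]{2})")
_ERROR = re.compile(r"Emask 0x[0-9a-f]+ \(([^)]+)\)")  # e.g. "Emask 0x4 (timeout)"


def decode_failed_command(log_text: str) -> Dict[str, Optional[str]]:
    """Summarize the port, ATA command, SCSI opcode and error from libata EH output."""
    info: Dict[str, Optional[str]] = {
        "port": None, "ata_command": None, "scsi_opcode": None, "error": None,
    }
    match = _CMD.search(log_text)
    if match:
        info["port"] = match.group(1)
        code = int(match.group(2), 16)
        info["ata_command"] = ATA_COMMANDS.get(code, f"0x{code:02x}")
    match = _OPCODE.search(log_text)
    if match:
        code = int(match.group(1), 16)
        info["scsi_opcode"] = SCSI_OPCODES.get(code, f"0x{code:02x}")
    match = _ERROR.search(log_text)
    if match:
        info["error"] = match.group(1)
    return info
```

Feeding it the "cmd a0/..." and "opcode=0x4a ..." lines from the kernel log gives the port, the decoded command names and the error string in one place.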
 
  


All times are GMT -5. The time now is 12:44 AM.
