LinuxQuestions.org
Fedora: This forum is for the discussion of the Fedora Project.

09-20-2009, 11:16 PM   #1
NX-01
LQ Newbie
 
Registered: May 2005
Location: Boone, NC
Distribution: Slackware, Fedora, Ubuntu
Posts: 22

Rep: Reputation: 15
SATA status {DRDY}


I have a problem with Fedora 11 x64 and some SATA drives. I have six 1 TB Western Digital hard drives in a RAID 5 array created with mdadm. I've run complete hardware tests on all the drives (including full sector scans) and all come back with a clean bill of health, but if I leave the machine idle for long enough, it seems a couple of the drives fall asleep and won't wake back up:

Code:
ata9.00: status: { DRDY }
ata9: hard resetting link
ata9: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
ata9.00: configured for UDMA/33
ata9: EH complete
ata7.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
ata7.00: cmd 35/00:08:3f:59:70/00:00:74:00:00/e0 tag 0 dma 4096 out
         res 40/00:00:00:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)
ata7.00: status: { DRDY }
ata7: hard resetting link
ata6.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
ata6.00: cmd 35/00:08:3f:59:70/00:00:74:00:00/e0 tag 0 dma 4096 out
         res 40/00:00:00:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)
ata6.00: status: { DRDY }
ata6: hard resetting link
ata8.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
ata8.00: cmd 35/00:08:3f:59:70/00:00:74:00:00/e0 tag 0 dma 4096 out
         res 40/00:00:00:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)
ata8.00: status: { DRDY }
ata8: hard resetting link
ata7: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
ata6: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
ata8: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
ata7.00: configured for UDMA/33
ata7: EH complete
ata6.00: configured for UDMA/33
ata6: EH complete
ata8.00: configured for UDMA/33
ata8: EH complete
ata9.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
ata9.00: cmd 25/00:10:47:66:0c/00:00:1e:00:00/e0 tag 0 dma 8192 in
         res 40/00:00:00:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)
ata9.00: status: { DRDY }
ata9: hard resetting link
ata9: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
ata9.00: configured for UDMA/33
ata9: EH complete
ata7.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
ata7.00: cmd 35/00:08:3f:59:70/00:00:74:00:00/e0 tag 0 dma 4096 out
         res 40/00:00:00:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)
ata7.00: status: { DRDY }
ata7: hard resetting link
ata6.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
ata6.00: cmd 35/00:08:3f:59:70/00:00:74:00:00/e0 tag 0 dma 4096 out
         res 40/00:00:00:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)
ata6.00: status: { DRDY }
ata6: hard resetting link
ata8.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
ata8.00: cmd 35/00:08:3f:59:70/00:00:74:00:00/e0 tag 0 dma 4096 out
         res 40/00:00:00:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)
ata8.00: status: { DRDY }
ata8: hard resetting link
ata7: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
ata6: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
ata8: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
ata7.00: configured for UDMA/33
ata7: EH complete
ata6.00: configured for UDMA/33
ata6: EH complete
ata8.00: configured for UDMA/33
ata8: EH complete
ata9.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
ata9.00: cmd 25/00:08:3f:8c:0c/00:00:1e:00:00/e0 tag 0 dma 4096 in
         res 40/00:00:00:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)
ata9.00: status: { DRDY }
ata9: hard resetting link
ata9: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
ata9.00: configured for UDMA/33
ata9: EH complete
ata9.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
ata9.00: cmd 25/00:08:3f:8c:0c/00:00:1e:00:00/e0 tag 0 dma 4096 in
         res 40/00:00:00:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)
ata9.00: status: { DRDY }
ata9: hard resetting link
ata9: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
ata9.00: configured for UDMA/33
ata9: EH complete
ata9.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
ata9.00: cmd 25/00:08:3f:8c:0c/00:00:1e:00:00/e0 tag 0 dma 4096 in
         res 40/00:00:00:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)
ata9.00: status: { DRDY }
ata9: hard resetting link
ata9: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
ata9.00: configured for UDMA/33
ata9: EH complete
ata9.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
ata9.00: cmd 25/00:08:3f:8c:0c/00:00:1e:00:00/e0 tag 0 dma 4096 in
         res 40/00:00:00:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)
ata9.00: status: { DRDY }
ata9: hard resetting link
ata9: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
ata9.00: configured for UDMA/33
ata9: EH complete
ata9.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
ata9.00: cmd 25/00:08:3f:8c:0c/00:00:1e:00:00/e0 tag 0 dma 4096 in
         res 40/00:00:00:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)
ata9.00: status: { DRDY }
ata9: hard resetting link
ata9: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
ata9.00: configured for UDMA/33
ata9: EH complete
ata9.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
ata9.00: cmd 35/00:08:3f:59:70/00:00:74:00:00/e0 tag 0 dma 4096 out
         res 40/00:00:00:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)
ata9.00: status: { DRDY }
ata9: hard resetting link
ata9: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
ata9.00: configured for UDMA/33
ata9: EH complete
ata9.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
ata9.00: cmd 35/00:08:3f:59:70/00:00:74:00:00/e0 tag 0 dma 4096 out
         res 40/00:00:00:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)
ata9.00: status: { DRDY }
ata9: hard resetting link
ata9: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
ata9.00: configured for UDMA/33
ata9: EH complete
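The `cmd`/`res` lines above are raw ATA taskfile dumps. The Python sketch below decodes them, assuming the field order libata's error handler uses for 48-bit commands (command/feature:nsect:lbal:lbam:lbah, then the five "hob" high-order bytes, then device). The function names are hypothetical helpers, and the opcode table covers only the two commands in this log (0x25 READ DMA EXT, 0x35 WRITE DMA EXT). Decoded this way, the repeated `cmd 35/00:08:3f:59:70/00:00:74:00:00/e0` is an 8-sector (4096-byte) write to LBA 1953519935, near the end of the 1953525168-sector drives, and `res 40/...` shows a status of 0x40: DRDY set with no ERR bit, consistent with the log's `(timeout)`, where the command got no completion rather than an error reply. The same sketch splits the `SStatus`/`SControl` values into their DET, SPD and IPM fields.

```python
import re

# Only the two opcodes that appear in the log above.
OPCODES = {0x25: "READ DMA EXT", 0x35: "WRITE DMA EXT"}

# A subset of the ATA status-register bits.
STATUS_BITS = [(0x80, "BSY"), (0x40, "DRDY"), (0x20, "DF"), (0x08, "DRQ"), (0x01, "ERR")]

# Assumed layout:
# command/feature:nsect:lbal:lbam:lbah/hob_feature:hob_nsect:hob_lbal:hob_lbam:hob_lbah/device
BYTE5 = r"((?:[0-9a-f]{2}:){4}[0-9a-f]{2})"
TASKFILE_RE = re.compile(r"\b([0-9a-f]{2})/" + BYTE5 + "/" + BYTE5 + r"/([0-9a-f]{2})\b")


def decode_taskfile(text):
    """Decode the first taskfile dump ('cmd ...' or 'res ...') found in text."""
    m = TASKFILE_RE.search(text)
    if m is None:
        raise ValueError(f"no taskfile found in {text!r}")
    command = int(m.group(1), 16)
    _feature, nsect, lbal, lbam, lbah = (int(b, 16) for b in m.group(2).split(":"))
    _hob_feature, hob_nsect, hob_lbal, hob_lbam, hob_lbah = (
        int(b, 16) for b in m.group(3).split(":")
    )
    # For 48-bit commands the "hob" (high-order byte) registers hold LBA bits
    # 24..47 and the upper byte of the sector count.
    lba = (hob_lbah << 40) | (hob_lbam << 32) | (hob_lbal << 24) | (lbah << 16) | (lbam << 8) | lbal
    # A count of 0 would mean 65536 sectors; that case is not handled here.
    sectors = (hob_nsect << 8) | nsect
    return {
        "command": command,
        "name": OPCODES.get(command, "unknown"),
        "lba": lba,
        "sectors": sectors,
        "bytes": sectors * 512,  # assumes 512-byte logical sectors
    }


def decode_status(value):
    """Names of the listed status bits that are set in value."""
    return [name for bit, name in STATUS_BITS if value & bit]


def split_sata_reg(hex_text):
    """Split an SStatus/SControl value (printed in hex) into DET, SPD and IPM."""
    value = int(hex_text, 16)
    return {"det": value & 0xF, "spd": (value >> 4) & 0xF, "ipm": (value >> 8) & 0xF}


if __name__ == "__main__":
    cmd = decode_taskfile("cmd 35/00:08:3f:59:70/00:00:74:00:00/e0 tag 0 dma 4096 out")
    print(cmd["name"], cmd["lba"], cmd["bytes"])  # WRITE DMA EXT 1953519935 4096
    res = decode_taskfile("res 40/00:00:00:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)")
    print(decode_status(res["command"]))  # ['DRDY']
    print(split_sata_reg("113"), split_sata_reg("310"))
```

Per the SATA specification, `SStatus 113` splits into DET=3 (device present, PHY communication established), SPD=1 (1.5 Gbps) and IPM=1 (interface active); `SControl 310` splits into SPD=1 (link limited to 1.5 Gbps) and IPM=3 (transitions to the Partial and Slumber power states disabled).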

I have two SATA controllers on board, an nVidia MCP51 and a JMicron 20360 AHCI (the motherboard is an ASUS P5N-E SLI), plus an Adaptec 1430SA PCI Express controller. Once I get the status { DRDY } error, the RAID is inaccessible until I reboot.

I've done some Googling, and it seems this error can be caused by anything from a bad SATA cable to a kernel/chipset problem. I've tried booting the kernel with the options irqpoll, noapic and acpi=noirq, and I've also tried acpi=off on its own. None of these has completely prevented the problem, although noapic keeps it from happening while the drives are in use.
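With the array spread across three controllers, counting the error-handling events per `ataN` port can show whether they cluster on one controller. A minimal Python sketch, assuming messages formatted like the log above (optionally prefixed by a dmesg timestamp); `summarize` and the event labels are hypothetical names, and the sample lines are copied from the log.

```python
import re
from collections import Counter, defaultdict

# "ataN" or "ataN.DD" followed by ": message", optionally after a dmesg
# timestamp such as "[ 1234.567890] ".
PORT_RE = re.compile(r"\b(ata\d+)(?:\.\d+)?: (.*)$")

# Substrings marking the error-handling steps seen in the log above.
EVENTS = {
    "exception": "exception Emask",
    "hard_reset": "hard resetting link",
    "eh_complete": "EH complete",
}


def summarize(lines):
    """Return {port: Counter(event -> count)} for libata error handling."""
    per_port = defaultdict(Counter)
    for line in lines:
        m = PORT_RE.search(line)
        if m is None:
            continue  # e.g. the indented "res ..." continuation lines
        port, message = m.groups()
        for event, needle in EVENTS.items():
            if needle in message:
                per_port[port][event] += 1
    return per_port


if __name__ == "__main__":
    sample = """\
ata9.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
         res 40/00:00:00:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)
ata9: hard resetting link
ata9: EH complete
ata7.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
ata7: hard resetting link
ata7: EH complete
""".splitlines()
    summary = summarize(sample)
    # Sort numerically so ata10 comes after ata9.
    for port in sorted(summary, key=lambda p: int(p[3:])):
        print(port, dict(summary[port]))
```

To run it on real data, pass an open log file, e.g. `summarize(open("/var/log/messages"))`. The boot-time probe messages in the same log usually make it possible to tell which driver and PCI device each `ataN` port belongs to.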

I've tried turning NCQ off on all the drives, with no effect. My boot drive is a 74 GB Raptor, so the OS is not on the array. Here's the hdparm -i output for one of the WD 1 TB drives; the rest are basically identical:

Code:
/dev/sdc:

 Model=WDC, FwRev=01.00A01
 Config={ HardSect NotMFM HdSw>15uSec SpinMotCtl Fixed DTR>5Mbs FmtGapReq }
 RawCHS=16383/16/63, TrkSize=0, SectSize=0, ECCbytes=50
 BuffType=unknown, BuffSize=unknown, MaxMultSect=16, MultSect=1
 CurCHS=16383/16/63, CurSects=16514064, LBA=yes, LBAsects=1953525168
 IORDY=on/off, tPIO={min:120,w/IORDY:120}, tDMA={min:120,rec:120}
 PIO modes:  pio0 pio3 pio4 
 DMA modes:  mdma0 mdma1 mdma2 
 UDMA modes: udma0 udma1 udma2 udma3 udma4 udma5 *udma6 
 AdvancedPM=no WriteCache=enabled
 Drive conforms to: Unspecified:  ATA/ATAPI-1,2,3,4,5,6,7

 * signifies the current active mode

I'm running kernel 2.6.30.5-43.fc11.x86_64. Does anyone know how to solve this problem? I really don't want to lose any data, although I keep active backups of the important stuff. Or is it just a problem with my SATA controllers?

Thanks!
 
09-22-2009, 07:36 PM   #2
LoeschME
LQ Newbie
 
Registered: Mar 2004
Location: Vienna, Austria
Distribution: debian/testing
Posts: 6

Rep: Reputation: 1
Hey,

I came here via Google while looking for a solution to the same problem you have. It seems I (and many others) have the same problem.

I'm running Debian testing (amd64), currently with kernel 2.6.31. I'd read in some forums that updating to 2.6.28+ would solve the problem because it was caused by the old sata_mv drivers, so I updated to the newest Debian kernel, 2.6.30.1, but it seems more needs fixing: the errors are still there, though they occur less often.

The hardware is an ASUS P5Q-WS board (ICH10R) and a PCI-X 8-port SATA controller with a Marvell MV88SX5081 chipset, plus Seagate disks which, according to their SMART values, seem to be fully OK.

With the 2.6.26.2 kernel, these errors appeared in syslog all the time, and every now and then (more often under heavy load) the whole system froze for about 10-20 seconds, as if it were offline; otherwise everything worked. Switching to 2.6.30.1 helped a lot: no more 10-second lags and no syslog errors, but the system stops working at some point (I can't tell whether under load or while idle).

Adding 'libata.force=noncq noapic acpi=off' to the kernel line in grub.cfg and disabling the write cache with 'hdparm -W0 /dev/sd?', as suggested in other forums, didn't really work for me either. I have the feeling it just suppressed the error for a while longer :P
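When boot options seem to have no effect, it is worth confirming that they actually reached the running kernel, since it is easy to edit the wrong entry in the boot loader configuration. A small Python sketch that checks `/proc/cmdline` (the running kernel's command line) for the three options above; `missing_options` is a hypothetical helper, and it matches whole tokens only, so a combined or port-specific `libata.force` value would be reported as missing.

```python
from pathlib import Path

# The three boot options discussed in this thread.
WANTED = ("libata.force=noncq", "noapic", "acpi=off")


def missing_options(cmdline, wanted=WANTED):
    """Return the wanted options that are not present as whole tokens."""
    tokens = cmdline.split()
    return [opt for opt in wanted if opt not in tokens]


if __name__ == "__main__":
    proc = Path("/proc/cmdline")  # command line of the running kernel (Linux)
    if proc.exists():
        missing = missing_options(proc.read_text())
        print("missing options:", ", ".join(missing) if missing else "none")
```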

When the problem happens, the system doesn't log any error to syslog; the error is printed to the screen and that's it. Luckily I have an old IP-KVM attached and was able to capture the error in a screenshot (in the hope it helps someone):

http://666kb.com/i/bcl6itmtefbraxqr8.jpg

Because of what I've experienced, I think this is a kernel issue. I would love to submit a bug report, but I can't trace the problem in any more detail. From all the posts I've read in other forums so far, it mostly happens with Marvell-chipset SATA controllers and/or PCI-X SATA controllers in general on motherboards with ICH8/9/10.

Regards, Chris

Last edited by LoeschME; 09-22-2009 at 07:56 PM.
 
09-26-2009, 01:48 PM   #3
LoeschME
LQ Newbie
 
Registered: Mar 2004
Location: Vienna, Austria
Distribution: debian/testing
Posts: 6

Rep: Reputation: 1
Quote:
Originally Posted by LoeschME
Adding 'libata.force=noncq noapic acpi=off' to the kernel line in grub.cfg and disabling the write cache with 'hdparm -W0 /dev/sd?', as suggested in other forums, didn't really work for me either. I have the feeling it just suppressed the error for a while longer :P
I looked at it more closely and noticed that if I use all of the above at once, rather than trying them option by option, it works: it has now been stable for about 3 days.
For my last check I ran kernel 2.6.31 with libata.force=noncq noapic acpi=off. Some hours after booting, when I thought it was running well, I turned the write cache back on, and the errors occurred again after an hour or so.
So I rebooted again with the kernel options, turned off the write cache, and since then it's been running like a charm.
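Since the write-cache setting turned out to matter, it helps to check its current state on each drive. A Python sketch, assuming `hdparm -W <device>` with no value prints a line like `write-caching =  0 (off)`, and falling back to the `WriteCache=enabled` field seen in the `hdparm -i` output earlier in the thread; `write_cache_state` and `query` are hypothetical helpers, and the exact output format may differ between hdparm versions.

```python
import re
import shutil
import subprocess
import sys


def write_cache_state(hdparm_output):
    """Return True (on), False (off) or None (unknown) from hdparm output text."""
    m = re.search(r"write-caching\s*=\s*([01])", hdparm_output)
    if m:
        return m.group(1) == "1"
    # Fallback for 'hdparm -i' style output, e.g. "WriteCache=enabled".
    m = re.search(r"WriteCache=(enabled|disabled)", hdparm_output)
    if m:
        return m.group(1) == "enabled"
    return None


def query(device):
    """Run 'hdparm -W <device>' (query only, usually needs root) and parse it."""
    out = subprocess.run(
        ["hdparm", "-W", device], capture_output=True, text=True, check=True
    ).stdout
    return write_cache_state(out)


if __name__ == "__main__":
    # Example: python3 wcache.py /dev/sdb /dev/sdc
    devices = sys.argv[1:]
    if devices and shutil.which("hdparm"):
        for dev in devices:
            print(dev, query(dev))
```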

Regards, Chris
 
09-27-2009, 06:41 PM   #4
NX-01
LQ Newbie
 
Registered: May 2005
Location: Boone, NC
Distribution: Slackware, Fedora, Ubuntu
Posts: 22

Original Poster
Rep: Reputation: 15
Quote:
Originally Posted by LoeschME
I looked at it more closely and noticed that if I use all of the above at once, rather than trying them option by option, it works: it has now been stable for about 3 days.
For my last check I ran kernel 2.6.31 with libata.force=noncq noapic acpi=off. Some hours after booting, when I thought it was running well, I turned the write cache back on, and the errors occurred again after an hour or so.
So I rebooted again with the kernel options, turned off the write cache, and since then it's been running like a charm.

Regards, Chris
Thanks, I'll give it a shot! When you turn off write caching with hdparm, does it survive a reboot, or is that something I'm going to have to shove into rc.local?

Yeah, from what I've read elsewhere, it seems to be a problem with the newer WD desktop drives and newer Linux kernels (2.6.24+). I hope the kernel devs get it fixed soon!
 
09-28-2009, 06:55 AM   #5
LoeschME
LQ Newbie
 
Registered: Mar 2004
Location: Vienna, Austria
Distribution: debian/testing
Posts: 6

Rep: Reputation: 1
Quote:
Originally Posted by NX-01
Thanks, I'll give it a shot! When you turn off write caching with hdparm, does it survive a reboot, or is that something I'm going to have to shove into rc.local?
Because the system I mentioned is used as a server, I don't want to reboot it now that it's running stable, but AFAIK hdparm doesn't save its settings, so you have to add the command to the startup scripts so that it's applied after each reboot.
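Following that suggestion, here is a boot-time sketch in Python that reads the member disks of an md array from `/proc/mdstat` and disables the write cache on each with `hdparm -W0`; it could be called from rc.local. It assumes members named like `sdb1[0]` (partitions of sdX disks), the array name `md0` is a placeholder, `md_member_disks` and the `--apply` flag are hypothetical, and the script only prints the commands unless `--apply` is given.

```python
import re
import subprocess
import sys
from pathlib import Path


def md_member_disks(mdstat_text, array="md0"):
    """Return the whole-disk device paths backing `array` in /proc/mdstat text."""
    for line in mdstat_text.splitlines():
        if line.split(":")[0].strip() == array:
            # Members look like "sdb1[0]", optionally followed by "(F)" or "(S)".
            members = re.findall(r"(\w+)\[\d+\]", line)
            # Strip the partition number: "sdb1" -> "sdb" (assumes sdXN naming).
            return sorted({"/dev/" + re.sub(r"\d+$", "", m) for m in members})
    return []


def write_cache_off_commands(disks):
    """One 'hdparm -W0 <disk>' command per disk."""
    return [["hdparm", "-W0", disk] for disk in disks]


if __name__ == "__main__":
    mdstat = Path("/proc/mdstat")
    apply = "--apply" in sys.argv[1:]
    if mdstat.exists():
        for cmd in write_cache_off_commands(md_member_disks(mdstat.read_text())):
            print(" ".join(cmd))
            if apply:
                subprocess.run(cmd, check=True)
```

Deriving the disk list from the array at boot, rather than hard-coding `/dev/sdX` names, avoids missing a drive if the kernel enumerates the controllers in a different order.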

Quote:
Originally Posted by NX-01
Yeah, from what I've read elsewhere, it seems to be a problem with the newer WD desktop drives and newer Linux kernels (2.6.24+). I hope the kernel devs get it fixed soon!
Oh, I forgot to mention the disks. Strange: I'm using Seagate ST31500341AS disks with the newer (working) firmware revision CC1H, so I think it's more likely a kernel bug.
 
  


All times are GMT -5.
