Fedora: This forum is for the discussion of the Fedora Project.
I have a problem with Fedora 11 x64 and some SATA drives. I have six 1TB Western Digital hard drives in a RAID 5 array created with mdadm. I've run complete hardware tests on all the drives (including full sector scans) and all come back with a clean bill of health, but if I leave the machine idle for long enough, a couple of the drives seem to fall asleep and won't wake back up:
Code:
ata9.00: status: { DRDY }
ata9: hard resetting link
ata9: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
ata9.00: configured for UDMA/33
ata9: EH complete
ata7.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
ata7.00: cmd 35/00:08:3f:59:70/00:00:74:00:00/e0 tag 0 dma 4096 out
res 40/00:00:00:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)
ata7.00: status: { DRDY }
ata7: hard resetting link
ata6.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
ata6.00: cmd 35/00:08:3f:59:70/00:00:74:00:00/e0 tag 0 dma 4096 out
res 40/00:00:00:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)
ata6.00: status: { DRDY }
ata6: hard resetting link
ata8.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
ata8.00: cmd 35/00:08:3f:59:70/00:00:74:00:00/e0 tag 0 dma 4096 out
res 40/00:00:00:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)
ata8.00: status: { DRDY }
ata8: hard resetting link
ata7: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
ata6: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
ata8: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
ata7.00: configured for UDMA/33
ata7: EH complete
ata6.00: configured for UDMA/33
ata6: EH complete
ata8.00: configured for UDMA/33
ata8: EH complete
ata9.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
ata9.00: cmd 25/00:10:47:66:0c/00:00:1e:00:00/e0 tag 0 dma 8192 in
res 40/00:00:00:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)
ata9.00: status: { DRDY }
ata9: hard resetting link
ata9: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
ata9.00: configured for UDMA/33
ata9: EH complete
ata7.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
ata7.00: cmd 35/00:08:3f:59:70/00:00:74:00:00/e0 tag 0 dma 4096 out
res 40/00:00:00:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)
ata7.00: status: { DRDY }
ata7: hard resetting link
ata6.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
ata6.00: cmd 35/00:08:3f:59:70/00:00:74:00:00/e0 tag 0 dma 4096 out
res 40/00:00:00:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)
ata6.00: status: { DRDY }
ata6: hard resetting link
ata8.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
ata8.00: cmd 35/00:08:3f:59:70/00:00:74:00:00/e0 tag 0 dma 4096 out
res 40/00:00:00:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)
ata8.00: status: { DRDY }
ata8: hard resetting link
ata7: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
ata6: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
ata8: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
ata7.00: configured for UDMA/33
ata7: EH complete
ata6.00: configured for UDMA/33
ata6: EH complete
ata8.00: configured for UDMA/33
ata8: EH complete
ata9.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
ata9.00: cmd 25/00:08:3f:8c:0c/00:00:1e:00:00/e0 tag 0 dma 4096 in
res 40/00:00:00:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)
ata9.00: status: { DRDY }
ata9: hard resetting link
ata9: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
ata9.00: configured for UDMA/33
ata9: EH complete
ata9.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
ata9.00: cmd 25/00:08:3f:8c:0c/00:00:1e:00:00/e0 tag 0 dma 4096 in
res 40/00:00:00:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)
ata9.00: status: { DRDY }
ata9: hard resetting link
ata9: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
ata9.00: configured for UDMA/33
ata9: EH complete
ata9.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
ata9.00: cmd 25/00:08:3f:8c:0c/00:00:1e:00:00/e0 tag 0 dma 4096 in
res 40/00:00:00:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)
ata9.00: status: { DRDY }
ata9: hard resetting link
ata9: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
ata9.00: configured for UDMA/33
ata9: EH complete
ata9.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
ata9.00: cmd 25/00:08:3f:8c:0c/00:00:1e:00:00/e0 tag 0 dma 4096 in
res 40/00:00:00:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)
ata9.00: status: { DRDY }
ata9: hard resetting link
ata9: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
ata9.00: configured for UDMA/33
ata9: EH complete
ata9.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
ata9.00: cmd 25/00:08:3f:8c:0c/00:00:1e:00:00/e0 tag 0 dma 4096 in
res 40/00:00:00:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)
ata9.00: status: { DRDY }
ata9: hard resetting link
ata9: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
ata9.00: configured for UDMA/33
ata9: EH complete
ata9.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
ata9.00: cmd 35/00:08:3f:59:70/00:00:74:00:00/e0 tag 0 dma 4096 out
res 40/00:00:00:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)
ata9.00: status: { DRDY }
ata9: hard resetting link
ata9: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
ata9.00: configured for UDMA/33
ata9: EH complete
ata9.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
ata9.00: cmd 35/00:08:3f:59:70/00:00:74:00:00/e0 tag 0 dma 4096 out
res 40/00:00:00:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)
ata9.00: status: { DRDY }
ata9: hard resetting link
ata9: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
ata9.00: configured for UDMA/33
ata9: EH complete
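A log like the one above is easier to read once the exceptions are counted per port. The following is only a minimal sketch, not something from the original post: the function name `count_ata_exceptions` is made up for illustration, and it assumes the kernel messages are piped in as plain text (for example from `dmesg`). For orientation, the first byte after `cmd` is the ATA opcode: 0x25 is READ DMA EXT and 0x35 is WRITE DMA EXT, which matches the `in` / `out` at the end of those lines.

```shell
# Count libata "exception" events per port from kernel log text on stdin.
# Hypothetical helper; works with or without dmesg timestamps.
count_ata_exceptions() {
    awk 'match($0, /ata[0-9]+\.[0-9]+: exception /) {
             port = substr($0, RSTART, RLENGTH)   # e.g. "ata9.00: exception "
             sub(/\..*/, "", port)                # keep only "ata9"
             n[port]++
         }
         END { for (p in n) print p, n[p] }' | sort
}

# Example call (commented out so that sourcing this file has no side effects):
#   dmesg | count_ata_exceptions
```

If only one or two ports dominate the count, that points more towards a specific cable, drive or controller port; an even spread across ports on different controllers would fit a driver or kernel problem better.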
I have two SATA controllers on board, one an nVidia MCP51 and the other a JMicron 20360 AHCI (the motherboard is an ASUS P5N-E SLI), plus an Adaptec 1430SA PCI Express controller. Once I get the status { DRDY } error, the RAID is inaccessible until I reboot. I've done some Googling and it seems this error can be caused by anything from a bad SATA cable to a kernel/chipset problem. I've tried booting the kernel with the following options set: irqpoll, noapic and acpi=noirq. I've also tried just acpi=off. None of these options has totally prevented the problem, although the noapic option keeps it from happening while the drives are in use.
I've tried cutting NCQ off on all the drives, with no effect. My boot drive is a 74GB Raptor, so the OS is not on the array. Here's the hdparm -i output on one of the WD 1TBs, the rest are basically identical:
I'm running kernel 2.6.30.5-43.fc11.x86_64. Anyone know how to solve this problem? I really don't want to lose any data, but I keep active backups of the important stuff. Is it just a problem with my SATA controllers?
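On the NCQ point: the post doesn't say how NCQ was turned off, but one common way to do it per drive at runtime is to set the queue depth to 1 through sysfs. A rough sketch under that assumption follows; the function name `disable_ncq`, the `SYSFS_BLOCK` override and the disk names in the usage line are all hypothetical and only there for illustration.

```shell
# Disable NCQ for the named disks by setting their queue depth to 1.
# SYSFS_BLOCK is an illustrative override so the function can be tried
# against a scratch directory; on a real system leave it unset.
disable_ncq() {
    sysfs="${SYSFS_BLOCK:-/sys/block}"
    for disk in "$@"; do
        f="$sysfs/$disk/device/queue_depth"
        if [ -w "$f" ]; then
            echo 1 > "$f"
            echo "$disk: queue_depth=$(cat "$f")"
        else
            echo "$disk: $f not writable, skipped" >&2
        fi
    done
}

# Example call, as root (disk names are placeholders, check yours first):
#   disable_ncq sdb sdc sdd sde sdf sdg
```

This setting does not survive a reboot, so it would have to be reapplied at startup or replaced by a kernel option such as libata.force=noncq.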
i came here via google while looking for a solution to the same problem you've got. seems like i (and many others too) have the same problem.
i'm running debian testing (amd64), currently with kernel 2.6.31. i've read in some forums that updating to 2.6.28+ would solve the problem because of the old sata_mv drivers, so i updated to the newest debian kernel, 2.6.30.1, but it seems some more things need to be fixed: the errors are still there, they just occur less often...
hardware is an asus p5q-ws board (ICH10R) and a PCI-X 8-port sata controller with the marvell MV88SX5081 chipset. the disks are seagate and, according to their smart values, seem to be fully ok.
with the 2.6.26.2 kernel i saw these errors in syslog all the time, and every now and then (more often under heavy workloads) the whole system just got stuck for about 10-20 seconds, as if it were offline, but otherwise everything worked ok... switching to 2.6.30.1 helped a lot: no more "10-second lags" and no syslog errors, but the system stops working at some point (i can't tell whether under load or while idle).
also, adding 'libata.force=noncq noapic acpi=off' to the kernel line in grub.cfg and disabling the write cache with 'hdparm -W0 /dev/sd?', as suggested in other forums, didn't really work for me. my feeling is it just suppressed the error for a while longer :P
the system doesn't log anything to syslog when the error happens; the error is printed to the screen and that's it. luckily i have an old ip-kvm attached and was able to catch the error in a screenshot: (in the hope it helps someone)
from what i've experienced i think it's some kernel thing happening here. i would love to submit a bug report but i can't trace the problem in more detail. from all the posts i've read in other forums so far, it mostly happens with marvell-chipset sata controllers and/or PCI-X sata controllers in general where an ICH8/9/10 is on the mobo..
i had a closer look and noticed that when using everything from above at once, instead of trying option by option, it has now been working stably for about 3 days.
my last check was running kernel 2.6.31 with libata.force=noncq noapic acpi=off. some hours after booting, when i thought it was running well, i turned the write cache back on and the errors occurred again after an hour or so.
so i rebooted again with the kernel options, turned the write cache off, and since then it's been running like a charm
regards, chris
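Since the combination only held up with all three kernel options active at once, it can be worth confirming that the running kernel really was booted with them. A small sketch, assuming the options are the ones quoted above; the helper name `missing_kernel_opts` is made up for illustration.

```shell
# Print each wanted option that is absent from a kernel command line.
# $1 is the command line text (e.g. the contents of /proc/cmdline);
# the remaining arguments are the options to look for.
missing_kernel_opts() {
    cmdline=" $1 "          # pad so every option is space-delimited
    shift
    for opt in "$@"; do
        case "$cmdline" in
            *" $opt "*) ;;           # present, nothing to report
            *) echo "$opt" ;;        # absent
        esac
    done
}

# Example call (commented out so that sourcing this file prints nothing):
#   missing_kernel_opts "$(cat /proc/cmdline)" libata.force=noncq noapic acpi=off
```

No output means all listed options are on the command line. Where the options get added depends on the boot loader: with GRUB 2 that is usually GRUB_CMDLINE_LINUX_DEFAULT in /etc/default/grub followed by regenerating grub.cfg, while GRUB legacy (as on Fedora 11) takes them on the kernel line in /boot/grub/grub.conf. Check your own setup before editing either.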
Thanks, I'll give it a shot! When you cut off write caching with hdparm does it survive a reboot or is that something I'm going to have to shove into rc.local?
Yeah from what I've read other places it seems to be a problem with the newer WD desktop drives and newer Linux kernels (2.6.24+). I hope the kernel devs get it fixed soon!
Quote:
Originally Posted by NX-01
Thanks, I'll give it a shot! When you cut off write caching with hdparm does it survive a reboot or is that something I'm going to have to shove into rc.local?
because the system i mentioned is used as a server, i don't want to reboot it now that it runs stable, but afaik hdparm doesn't save its settings, so one has to add the command to the startup scripts so it is set after each reboot...
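A startup-script version of that could look roughly like the sketch below. It is an illustration only: the function name `disable_write_cache` and the `HDPARM` override are assumptions, and the `/dev/sd?` glob in the usage line is the one from the earlier post, which also matches the boot drive.

```shell
# Turn off the on-drive write cache for every device given as an argument.
# HDPARM can be overridden (e.g. HDPARM=echo) for a dry run; this override
# is an illustrative convenience, not an hdparm feature.
disable_write_cache() {
    for dev in "$@"; do
        # An unmatched glob such as /dev/sd? is passed through literally; skip it.
        [ -e "$dev" ] || continue
        "${HDPARM:-hdparm}" -W0 "$dev"
    done
}

# In rc.local (or another startup script) this would be called as, for example:
#   disable_write_cache /dev/sd?
```

Running it once with HDPARM=echo shows which devices would be touched before anything is changed.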
Quote:
Originally Posted by NX-01
Yeah from what I've read other places it seems to be a problem with the newer WD desktop drives and newer Linux kernels (2.6.24+). I hope the kernel devs get it fixed soon!
oh, i forgot to mention the disks. strange, i'm using seagate ST31500341AS disks with the newer (working) firmware revision CC1H, so i think it's rather some kernel error...