LinuxQuestions.org
Old 12-07-2006, 10:06 AM   #1
IanGrant
LQ Newbie
 
Registered: Dec 2006
Posts: 3

Rep: Reputation: 0
SATA timeout problems in 2.6.19


Hi,

I'm having some problems with my hard disks, and I'd be grateful for any help.
This has been going on for a while now -- I was using 2.6.15.6, with 3 250GB Maxtor SATA HDs on an Intel ICH6 controller (ata_piix), using software RAID.
It used to be okay, but then my computer crashed a couple of times (this was a couple of months ago, sorta hazy...), and when it came back, there was no /home or any other partition that was on the RAID array (/ is on a separate SCSI disk).
I eventually managed to reconstruct the array, but began getting kernel messages like this:

Code:
Dec  5 23:02:49 violator kernel: ata2.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2 frozen
Dec  5 23:02:49 violator kernel: ata2.00: tag 0 cmd 0xb0 Emask 0x4 stat 0x40 err 0x0 (timeout)
Dec  5 23:02:49 violator kernel: ata2: soft resetting port
Dec  5 23:02:49 violator kernel: ata2: softreset failed (port busy but CLO unavailable)
Dec  5 23:02:49 violator kernel: ata2: softreset failed, retrying in 5 secs
Dec  5 23:02:54 violator kernel: ata2: hard resetting port
Dec  5 23:03:01 violator kernel: ata2: port is slow to respond, please be patient (Status 0x80)
Dec  5 23:03:24 violator kernel: ata2: port failed to respond (30 secs, Status 0x80)
Dec  5 23:03:24 violator kernel: ata2: COMRESET failed (device not ready)
Dec  5 23:03:24 violator kernel: ata2: hardreset failed, retrying in 5 secs
Dec  5 23:03:29 violator kernel: ata2: hard resetting port
Dec  5 23:03:30 violator kernel: ata2: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
Dec  5 23:03:30 violator kernel: ata2.00: configured for UDMA/133
Dec  5 23:03:30 violator kernel: ata2: EH complete
Dec  5 23:03:30 violator kernel: SCSI device sdb: 781422768 512-byte hdwr sectors (400088 MB)
Dec  5 23:03:30 violator kernel: sdb: Write Protect is off
Dec  5 23:03:30 violator kernel: sdb: Mode Sense: 00 3a 00 00
Dec  5 23:03:30 violator kernel: SCSI device sdb: drive cache: write back
Whilst this is happening, disk access is delayed (for instance, you can't ls directories on that partition from a shell).
This repeats for all three SATA disks, with the 'configured for XXX' line changing through UDMA/133, UDMA/100, UDMA/66, UDMA/44, UDMA/33, UDMA/25, UDMA/16, PIO4, PIO3, PIO2, PIO1, PIO0.
After PIO0 you get messages like this:

Code:
Dec  6 01:21:46 violator kernel: ata1.00: speed down requested but no transfer mode left
It then keeps reconfiguring for PIO0, with worsening disk performance (e.g. hdparm -t reports 3MB/s), and messages like this appear:

Code:
Dec  6 02:08:18 violator kernel: raid5: Disk failure on dm-2, disabling device. Operation continuing on 2 devices
Dec  6 02:08:18 violator kernel: raid5: Disk failure on dm-1, disabling device. Operation continuing on 1 devices
Dec  6 02:08:18 violator kernel: raid5: Disk failure on dm-0, disabling device. Operation continuing on 0 devices
Dec  6 02:08:18 violator kernel: Buffer I/O error on device dm-4, logical block 3832
Dec  6 02:08:18 violator kernel: lost page write due to I/O error on dm-4
Dec  6 02:08:18 violator kernel: Buffer I/O error on device dm-4, logical block 3833
Dec  6 02:08:18 violator kernel: lost page write due to I/O error on dm-4
Dec  6 02:08:18 violator kernel: Buffer I/O error on device dm-4, logical block 3834
Dec  6 02:08:18 violator kernel: lost page write due to I/O error on dm-4
Dec  6 04:02:11 violator kernel: ReiserFS: dm-6: warning: vs-13050: reiserfs_update_sd: i/o failure occurred trying to update [1 2 0x0 SD] stat data
Dec  6 04:02:13 violator kernel: Buffer I/O error on device dm-6, logical block 7667
Dec  6 04:02:13 violator kernel: lost page write due to I/O error on dm-6
Dec  6 04:02:13 violator kernel: Buffer I/O error on device dm-6, logical block 7668
Dec  6 04:02:13 violator kernel: lost page write due to I/O error on dm-6
Dec  6 04:02:13 violator kernel: REISERFS: abort (device dm-6): Journal write error in flush_commit_list
Dec  6 04:02:13 violator kernel: REISERFS: Aborting journal for filesystem on dm-6
Dec  6 04:02:14 violator kernel: I/O error in filesystem ("dm-8") meta-data dev dm-8 block 0x1226038 ("xfs_trans_read_buf") error 5 buf count 8192
Dec  6 04:02:14 violator kernel: I/O error in filesystem ("dm-8") meta-data dev dm-8 block 0x2690810       ("xfs_trans_read_buf") error 5 buf count 8192
Dec  6 04:02:14 violator kernel: I/O error in filesystem ("dm-8") meta-data dev dm-8 block 0x385fd08       ("xfs_trans_read_buf") error 5 buf count 8192
Dec  6 04:02:14 violator kernel: I/O error in filesystem ("dm-8") meta-data dev dm-8 block 0x5d905e8       ("xfs_trans_read_buf") error 5 buf count 8192
Dec  6 04:02:14 violator kernel: I/O error in filesystem ("dm-8") meta-data dev dm-8 block 0x6de31d8       ("xfs_trans_read_buf") error 5 buf count 4096
Dec  6 04:02:14 violator kernel: ReiserFS: dm-4: warning: vs-13070: reiserfs_read_locked_inode: i/o failure occurred trying to find stat data of [2 71721 0x0 SD]
By that time, the RAID array has failed, obviously (although the data is okay if I reboot and can get it to reconstruct).
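
For anyone trying to follow the same pattern in their own logs, the downgrade sequence described above (UDMA/133 stepping down to PIO0, then "no transfer mode left") can be tracked per device with a short script. This is only a rough sketch: it assumes the log lines look like the excerpts in this post, and the function names are made up for illustration.

```python
import re
from collections import defaultdict

# Patterns modeled on the log excerpts quoted in this post; other kernel
# versions may word these messages differently (assumption).
CONFIGURED = re.compile(r"(ata\d+\.\d+): configured for (\S+)")
EXHAUSTED = re.compile(r"(ata\d+\.\d+): speed down requested but no transfer mode left")


def transfer_mode_history(lines):
    """Return {device: [modes in order seen]}, collapsing consecutive repeats."""
    history = defaultdict(list)
    for line in lines:
        match = CONFIGURED.search(line)
        if not match:
            continue
        device, mode = match.groups()
        if not history[device] or history[device][-1] != mode:
            history[device].append(mode)
    return dict(history)


def exhausted_devices(lines):
    """Return the sorted devices that reported 'no transfer mode left'."""
    found = set()
    for line in lines:
        match = EXHAUSTED.search(line)
        if match:
            found.add(match.group(1))
    return sorted(found)


def summarize(path):
    """Print the mode history for a log file (hypothetical helper)."""
    with open(path, errors="replace") as handle:
        lines = handle.readlines()
    for device, modes in sorted(transfer_mode_history(lines).items()):
        print(device, " -> ".join(modes))
    for device in exhausted_devices(lines):
        print(device, "has run out of transfer modes")
```

Calling summarize() on the syslog file (the path depends on the distro, e.g. /var/log/messages on many systems) gives one line per device, which makes it easier to see whether all disks degrade together or just one.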

Since then, I have replaced the SATA cables and the disks. I also tried replacing the controller with a Silicon Image 3124 PCI-X card (sata_sil24 driver), but this gave slightly different error messages (ata HSM violation). Although it seemed more stable (it didn't crash), it turned out that data was corrupted when written to it: FLAC files I copied across no longer decompressed, and many files had different MD5 sums from their counterparts on the older array (I had both in, side by side, on ICH6 and sata_sil24).
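
The MD5 comparison mentioned above can be scripted so that every file on the suspect array is checked against its counterpart on the known-good one. The following is a minimal sketch, not the exact procedure used here; the function names and mount points are hypothetical.

```python
import hashlib
from pathlib import Path


def file_md5(path, chunk_size=1 << 20):
    """Hash a file in chunks so large files don't need to fit in memory."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_mismatches(reference_root, copy_root):
    """Return relative paths whose copy is missing or has a different MD5."""
    reference_root = Path(reference_root)
    copy_root = Path(copy_root)
    mismatches = []
    for source in sorted(reference_root.rglob("*")):
        if not source.is_file():
            continue
        relative = source.relative_to(reference_root)
        target = copy_root / relative
        if not target.is_file() or file_md5(source) != file_md5(target):
            mismatches.append(str(relative))
    return mismatches
```

For example, find_mismatches("/mnt/old_array", "/mnt/new_array") (both paths are placeholders) returns the list of files that differ. Re-reading the copy after dropping caches or remounting is worth considering, since a read served from the page cache may not reflect what actually reached the disk.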
I tried upgrading to 2.6.19 (through various 2.6.17-18...), and using ahci instead of ata_piix for the ICH6 controller, but I still get the error messages above (in fact, the above error messages are 2.6.19/ahci, so they're indicative of my problem as it is now, disregarding using the sata_sil24 controller).

Here's the final twist: the disks seem pretty stable using sysresccd, which is 2.6.16.10 (and ahci), and when booted into an older kernel (2.6.9, using ata_piix). Performance still suffers, as a lag develops when logging in, but I don't see those error messages in the kernel (I think they were introduced with 2.6.18), and it more or less stays up.

I would really like to use 2.6.19, but this problem is really vexing me, especially as I don't really know what to do anymore -- I think I've ruled out any hardware problems, but basically I'm flummoxed.
I've tried searching; there is some stuff on LKML with similar error messages, but nothing quite like my problem.
If anyone has any suggestions, I'd be most grateful.
 
Old 02-22-2007, 08:38 PM   #2
pgf111000
LQ Newbie
 
Registered: Feb 2007
Posts: 1

Rep: Reputation: 0
I second Ian's question....

I am experiencing a very similar problem; maybe it's arcmsr, maybe not.... Although since your controller is Intel, that suggests Areca may not be the cause. If anyone has any suggestions....

Last edited by pgf111000; 02-22-2007 at 08:39 PM.
 
Old 03-01-2008, 11:21 AM   #3
krizzz
Member
 
Registered: Oct 2004
Location: NY
Distribution: Slackware
Posts: 198

Rep: Reputation: 30
Same problem here. I have a Sony VGN-S580 laptop. I bought it new around 2 years ago and since then I haven't been able to install ANY Linux distro on it. F.... SATA problem. I have no idea why, but the kernel developers and libata module developers just don't do anything about it. There seem to be quite a lot of people experiencing this with different SATA controllers, mostly on laptops. This thing has been driving me crazy. I tried all the solutions proposed on different forums (disabling ACPI, passing some other parameters to the kernel) and nothing worked for me. Somehow I managed to install Fedora Core 8 on it. The installation went smoothly, but now the system has the same problem. Very surprising, since the kernel used during the installation is exactly the same as the one installed... I just ran updatedb on it and it didn't hang... However, it has already frozen a couple of times. I give up.
 
  


All times are GMT -5. The time now is 09:38 PM.
