LinuxQuestions.org


^andrea^ 03-31-2011 07:53 PM

Ubuntu 10.04 - JMicron JMB363 SATA controller - kernel freeze
 
Hi everybody,

I'm struggling with a quite annoying problem.

Let me explain my configuration quickly first.

OS: Ubuntu 10.04
kernel: 2.6.32-30-generic
CPU: Intel(R) Pentium(R) D CPU 3.20GHz
MB: Asus P5VD2-X
GPU: GeForce 7300 (PCI-E 16x)

The two 1.5 Gbps SATA ports work perfectly. I have never had a single issue with them.

The one 3.0 Gbps SATA port is driven by a JMicron JMB363 SATA controller, and it freezes the computer under heavy load (copying/rsyncing GBs of files).
If I leave the computer idle or only do basic tasks, it works fine.

Sometimes, just before crashing/freezing, it leaves some errors in the messages and kern.log logs.

Messages like:
ata8.00: exception Emask 0x33 SAct 0xf SErr 0x0 action 0xe frozen
ata8.00: irq_stat 0xffffffff, unknown FIS 00000000 00000000 00000000 00000000, host bus
ata8.00: failed command: READ FPDMA QUEUED
ata8.00: cmd 60/60:00:49:21:3d/00:00:01:00:00/40 tag 0 ncq 49152 in
res 40/00:04:49:21:3d/00:00:01:00:00/40 Emask 0x32 (host bus error)
ata8.00: status: { DRDY }
ata8.00: failed command: READ FPDMA QUEUED
ata8.00: cmd 60/80:08:81:21:3d/00:00:01:00:00/40 tag 1 ncq 65536 in
res 40/00:04:49:21:3d/00:00:01:00:00/40 Emask 0x32 (host bus error)
ata8.00: status: { DRDY }
ata8.00: failed command: READ FPDMA QUEUED
ata8.00: cmd 60/80:10:81:20:3d/00:00:01:00:00/40 tag 2 ncq 65536 in
res 40/00:04:49:21:3d/00:00:01:00:00/40 Emask 0x32 (host bus error)
ata8.00: status: { DRDY }
ata8.00: failed command: READ FPDMA QUEUED
ata8.00: cmd 60/38:18:11:21:3d/00:00:01:00:00/40 tag 3 ncq 28672 in
res 40/00:04:49:21:3d/00:00:01:00:00/40 Emask 0x32 (host bus error)
ata8.00: status: { DRDY }
ata8: hard resetting link
ata8: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
ata8.00: configured for UDMA/133
ata8: EH complete


I've been trying to sort out this problem for quite a while now. At the beginning I didn't know the problem was the JMicron SATA port/controller. I switched disks and cables a few times, and the problem always stayed on the same port. That's why I'm sure the JMicron controller is the issue.
I tried changing the BIOS settings to put the JMicron controller in AHCI mode (rather than IDE mode), but nothing changed.
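To see at a glance which port the failures cluster on, the "failed command" lines can be counted per ATA port. This is only a sketch: the function name ata_failed_by_port is made up here, and it assumes the log lines have the same shape as the excerpt above.

```shell
# Count libata "failed command" lines per ATA port and command.
# Reads kernel log text from stdin. ata_failed_by_port is a hypothetical name.
ata_failed_by_port() {
    sed -n 's/.*\(ata[0-9][0-9]*\)\.[0-9][0-9]*: failed command: \(.*\)$/\1 \2/p' |
        sort | uniq -c
}

# Usage (log path as mentioned in the post; adjust for other systems):
#   ata_failed_by_port < /var/log/kern.log
```

If every count lands on the same port regardless of which disk or cable is attached, that supports the conclusion that the port or controller is at fault rather than the drive.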


I have to say something more about my configuration.
The system is running on Linux software RAID (mdadm).
RAID1 for the boot partition and RAID5 for the rest.
At the moment the RAID array is in degraded mode, because connecting the 3rd hard drive (the one on the JMicron SATA port) would limit what I can do without freezing the system.
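The degraded state can be confirmed from /proc/mdstat, where a missing member shows up as an underscore in the status field (for example [UU_]). A small sketch, with the function name degraded_md_arrays invented for illustration:

```shell
# Print the names of md arrays whose member-status field (e.g. [UU_])
# contains an underscore, i.e. a missing or failed member.
# Reads mdstat-formatted text from stdin. degraded_md_arrays is a hypothetical name.
degraded_md_arrays() {
    awk '
        /^md[0-9]+ *:/ { name = $1 }          # remember the array being described
        match($0, /\[[U_]+\]/) {              # status field such as [UU] or [UU_]
            if (name != "" && index(substr($0, RSTART, RLENGTH), "_") > 0)
                print name
        }
    '
}

# Usage:
#   degraded_md_arrays < /proc/mdstat
```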

One last piece of information, which I guess is very important.
When I installed this Ubuntu (10.04) I did it on a single disk (NO RAID) on one of my "normal" (1.5 Gbps) SATA ports (SO NO JMICRON EITHER).
When I decided to put a RAID system in place I copied the files manually and it worked fine. I have done it before.

Now the problem here might be that, since the system was installed with no JMicron controller in use, the Ubuntu installation didn't load the kernel modules needed.
Now, after moving the OS to other hard drives (one of which is on the JMicron controller), those modules are still not loaded.
Or maybe some others are loaded and are causing conflicts.

I'm not sure all of this makes sense. That's why I'm here hoping someone could help me to shed some light on it. :-)

Here are the modules loaded:
Module Size Used by
btrfs 462090 0
zlib_deflate 19568 1 btrfs
crc32c 2519 1
libcrc32c 875 1 btrfs
ufs 72774 0
qnx4 6484 0
hfsplus 70800 0
hfs 40754 0
minix 25197 0
ntfs 94791 0
vfat 8933 0
msdos 6392 0
fat 47767 2 vfat,msdos
jfs 172461 0
xfs 514940 0
exportfs 3437 1 xfs
reiserfs 225481 0
usb_storage 39841 1
binfmt_misc 6587 1
snd_hda_codec_realtek 203408 1
snd_hda_intel 22037 2
snd_hda_codec 74201 2 snd_hda_codec_realtek,snd_hda_intel
snd_hwdep 5412 1 snd_hda_codec
snd_pcm_oss 35308 0
snd_mixer_oss 13746 1 snd_pcm_oss
snd_pcm 70694 3 snd_hda_intel,snd_hda_codec,snd_pcm_oss
snd_seq_dummy 1338 0
snd_seq_oss 26722 0
snd_seq_midi 4557 0
snd_rawmidi 19056 1 snd_seq_midi
snd_seq_midi_event 6003 2 snd_seq_oss,snd_seq_midi
ppdev 5259 0
snd_seq 47263 6 snd_seq_dummy,snd_seq_oss,snd_seq_midi,snd_seq_midi_event
snd_timer 19098 2 snd_pcm,snd_seq
snd_seq_device 5700 5 snd_seq_dummy,snd_seq_oss,snd_seq_midi,snd_rawmidi,snd_seq
snd 54180 16 snd_hda_codec_realtek,snd_hda_intel,snd_hda_codec,snd_hwdep,snd_pcm_oss,snd_mixer_oss,snd_pcm,snd_seq_oss,snd_rawmidi,snd_seq,snd_timer,snd_seq_device
hwmon_vid 2298 0
fbcon 35102 71
tileblit 2031 1 fbcon
font 7557 1 fbcon
bitblit 4707 1 fbcon
softcursor 1189 1 bitblit
joydev 8740 0
parport_pc 25962 1
asus_atk0110 9017 0
soundcore 6620 1 snd
snd_page_alloc 7076 2 snd_hda_intel,snd_pcm
i2c_viapro 5573 0
psmouse 63245 0
serio_raw 3978 0
nvidia 9961216 28
vga16fb 11385 1
lp 7028 0
vgastate 8961 1 vga16fb
shpchp 28835 0
parport 32635 3 ppdev,parport_pc,lp
via_agp 5310 1
agpgart 31724 2 nvidia,via_agp
raid10 20629 0
raid456 52379 1
async_raid6_recov 4871 1 raid456
async_pq 3026 2 raid456,async_raid6_recov
raid6_pq 80029 2 async_raid6_recov,async_pq
async_xor 2382 3 raid456,async_raid6_recov,async_pq
xor 15028 1 async_xor
async_memcpy 1065 2 raid456,async_raid6_recov
async_tx 1996 5 raid456,async_raid6_recov,async_pq,async_xor,async_memcpy
raid1 20101 1
raid0 6804 0
hid_sunplus 1259 0
r8169 34108 0
usbhid 36110 0
hid 67096 2 hid_sunplus,usbhid
floppy 53016 0
mii 4381 1 r8169
multipath 6009 0
pata_jmicron 1843 0
ahci 32200 3
sata_via 7009 8
pata_via 7272 0
linear 3874 0

Does anyone see any modules that should be loaded to make the JMicron controller work properly?
Or any modules that should not be loaded?

If you need me to run any other command please let me know.

I really hope someone can give me a good hint.
Cheers,
Andrea

Sjonnie48 04-02-2011 06:37 AM

I have a Gigabyte mainboard with exactly the same controller, running Ubuntu 10.04, and I have never had difficulties with it.
But I got the same error messages for a hard disk on a different controller, and it seems to me that that hard disk is defective.
Please post the output of dmesg | grep ata8 after a restart.

^andrea^ 04-02-2011 09:35 AM

Thanks Sjonnie48 for your help.

First of all, here is the command "dmesg | grep ata8" after having restarted:
[ 0.940333] ata8: PATA max UDMA/133 cmd 0x170 ctl 0x376 bmdma 0xe408 irq 15

I also ran this other one for a general overview of the disks, in case it is needed:
"dmesg | grep ata[0-9]"
[ 0.890884] ata1: PATA max UDMA/100 cmd 0xcc00 ctl 0xc800 bmdma 0xbc00 irq 29
[ 0.890888] ata2: PATA max UDMA/100 cmd 0xc400 ctl 0xc000 bmdma 0xbc08 irq 29
[ 0.914381] ata3: SATA max UDMA/133 cmd 0xfc00 ctl 0xf800 bmdma 0xec00 irq 21
[ 0.914385] ata4: SATA max UDMA/133 cmd 0xf400 ctl 0xf000 bmdma 0xec08 irq 21
[ 0.936855] ata5: SATA max UDMA/133 abar m8192@0xdfefe000 port 0xdfefe100 irq 28
[ 0.936861] ata6: SATA max UDMA/133 abar m8192@0xdfefe000 port 0xdfefe180 irq 28
[ 0.940329] ata7: PATA max UDMA/133 cmd 0x1f0 ctl 0x3f6 bmdma 0xe400 irq 14
[ 0.940333] ata8: PATA max UDMA/133 cmd 0x170 ctl 0x376 bmdma 0xe408 irq 15
[ 1.104460] ata7.01: ATAPI: HL-DT-STDVD-RAM GH22NP20, 1.01, max UDMA/66
[ 1.120370] ata7.01: configured for UDMA/66
[ 1.128019] ata3: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
[ 1.256029] ata5: SATA link down (SStatus 0 SControl 300)
[ 1.292372] ata3.00: ATA-8: SAMSUNG HD204UI, 1AQ10001, max UDMA/133
[ 1.292377] ata3.00: 3907029168 sectors, multi 16: LBA48 NCQ (depth 0/32)
[ 1.300390] ata3.00: configured for UDMA/133
[ 1.420031] ata6: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
[ 1.426182] ata6.00: ATA-8: SAMSUNG HD204UI, 1AQ10001, max UDMA/133
[ 1.426189] ata6.00: 3907029168 sectors, multi 0: LBA48 NCQ (depth 31/32), AA
[ 1.432430] ata6.00: configured for UDMA/133
[ 1.504026] ata4: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
[ 1.668364] ata4.00: ATA-8: SAMSUNG HD204UI, 1AQ10001, max UDMA/133
[ 1.668370] ata4.00: 3907029168 sectors, multi 16: LBA48 NCQ (depth 0/32)
[ 1.676394] ata4.00: configured for UDMA/133
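A listing like the one above can be boiled down to one line per port showing whether a SATA link came up and at what speed. This is a rough sketch that assumes the "SATA link up/down" message format shown above; the function name ata_link_status is made up for illustration.

```shell
# Summarise SATA link state per port from dmesg-style text on stdin.
# Prints e.g. "ata3 up 1.5 Gbps" or "ata5 down". ata_link_status is a hypothetical name.
ata_link_status() {
    sed -n \
        -e 's/.*\(ata[0-9][0-9]*\): SATA link up \([0-9.]* Gbps\).*/\1 up \2/p' \
        -e 's/.*\(ata[0-9][0-9]*\): SATA link down.*/\1 down/p'
}

# Usage:
#   dmesg | ata_link_status
```

On this boot the only 3.0 Gbps link in the listing is ata6, while ata8 is reported as a PATA channel, so the ataN number of the JMicron SATA port is not necessarily the same from one boot to the next.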


There are some "interesting" updates.
My three 2TB hard drives are SAMSUNG HD204UI and I just found out that there is a firmware patch for them:
http://www.samsung.com/global/busine...bbs_msg_id=386
I also use SMART to monitor their health, and that doesn't seem to help with the firmware problem:
http://sourceforge.net/apps/trac/sma...gF4EGBadBlocks

I haven't applied the patch yet but I will soon-ish.
Anyway I'm not sure how related it is, so in the meantime, I'm doing some other tests to better understand when this issue happens.

I tried writing data to and reading data from the disk on the JMicron controller, and I was surprised to notice that the computer seems to experience the problem ONLY when it is READING data.

I copied 1TB+ to the disk (write action) and everything went fine. When I read the data back (trying to copy it to another hard disk), the first time it crashed after about 10-20 GB without logging any error, whilst the second time it started logging errors in kern.log, so I interrupted it immediately.
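The read-only failure described here could be exercised in isolation with a plain sequential read that throws the data away, so that no second disk is involved. A minimal sketch, assuming GNU dd; the function name sequential_read_test and the /dev/sdX placeholder are illustrative and not from the thread.

```shell
# Read a file or block device from start to end and discard the data,
# so that only the read path is exercised.
# sequential_read_test is a hypothetical name.
sequential_read_test() {
    dd if="$1" of=/dev/null bs=1M
}

# Usage (placeholder device name; reading a raw disk needs root):
#   sequential_read_test /dev/sdX
```

When the source is a regular file rather than the raw device, data that is already in the page cache may be served from memory without touching the disk, so a raw-device read is the more reliable test.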


Here is the /var/log/kern.log when the error happened:
Apr 2 14:11:03 tux kernel: [230635.996684] ata8: illegal qc_active transition (00000003->ffffffff)
Apr 2 14:11:03 tux kernel: [230635.996735] ata8.00: exception Emask 0x2 SAct 0x3 SErr 0x0 action 0x6 frozen
Apr 2 14:11:03 tux kernel: [230635.996742] ata8.00: failed command: READ FPDMA QUEUED
Apr 2 14:11:03 tux kernel: [230635.996749] ata8.00: cmd 60/00:00:80:68:b5/01:00:02:00:00/40 tag 0 ncq 131072 in
Apr 2 14:11:03 tux kernel: [230635.996751] res 40/00:0c:80:69:b5/00:00:02:00:00/40 Emask 0x2 (HSM violation)
Apr 2 14:11:03 tux kernel: [230635.996754] ata8.00: status: { DRDY }
Apr 2 14:11:03 tux kernel: [230635.996758] ata8.00: failed command: READ FPDMA QUEUED
Apr 2 14:11:03 tux kernel: [230635.996764] ata8.00: cmd 60/00:08:80:69:b5/01:00:02:00:00/40 tag 1 ncq 131072 in
Apr 2 14:11:03 tux kernel: [230635.996766] res 40/00:0c:80:69:b5/00:00:02:00:00/40 Emask 0x2 (HSM violation)
Apr 2 14:11:03 tux kernel: [230635.996769] ata8.00: status: { DRDY }
Apr 2 14:11:03 tux kernel: [230635.996776] ata8: hard resetting link
Apr 2 14:11:03 tux kernel: [230636.480041] ata8: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
Apr 2 14:11:03 tux kernel: [230636.492474] ata8.00: configured for UDMA/133
Apr 2 14:11:03 tux kernel: [230636.492493] ata8: EH complete

Does anyone think the firmware patch would solve this problem?

Cheers,
Andrea

Sjonnie48 04-02-2011 10:39 AM

Quote:

I copied 1TB+ to the disk (write action) and everything went fine. When I read the data back (trying to copy it to another hard disk), the first time it crashed after about 10-20 GB without logging any error, whilst the second time it started logging errors in kern.log, so I interrupted it immediately.

Did your system freeze again? I ask this because I'm not certain about the cause of your system freezing; it's something usually caused by a memory failure. You won't find that in the logs.
Also I did not see any alarming lines in your latest dmesg.

^andrea^ 04-02-2011 03:36 PM

Yes, if I READ (NOT if I WRITE) data from that disk (connected through the JMicron controller), the system either freezes outright or logs the errors shown above and then freezes.
To me it sounds like the same problem, also because when I use the system without touching that SATA port it's all super stable.

I tried to do some "manual" stress tests and even when the load average got to 6 with CPU idle time at 0%, the system was still fine and responding.

Anyway, if it was a memory failure as you say, how do I debug it?

Thanks again for having a look into this problem.

Cheers,
Andrea

Sjonnie48 04-02-2011 09:41 PM

For testing the memory: http://www.memtest.org/

H_TeXMeX_H 04-03-2011 04:46 AM

I've had a Gigabyte board with similar problems and have had a really hard time with it. I eventually just disabled the JMicron controller and used the other one, which luckily had 4 ports, plenty for what I need.

There may be a way around this, so can you post the output of:

Code:

cat /proc/interrupts
lspci -k


^andrea^ 04-03-2011 05:03 AM

Thanks Sjonnie48, I'll give memtest a try. I'm sure one more test does not hurt.. :-)

Hi H_TeXMeX_H,
unfortunately my motherboard has only 2 SATA ports (and two IDE ports) plus one more from the JMicron controller.

I know this controller supports SATA port multipliers so, in the future (if I can make it work properly), I might get one of those and extend my computer's life by another few years... :-)

Anyway, back to my problem.

Here is the output of the commands you asked for.
"cat /proc/interrupts":
CPU0 CPU1
0: 281 0 IO-APIC-edge timer
1: 4 0 IO-APIC-edge i8042
4: 2 0 IO-APIC-edge
6: 5 0 IO-APIC-edge floppy
7: 0 0 IO-APIC-edge parport0
8: 1 0 IO-APIC-edge rtc0
9: 0 0 IO-APIC-fasteoi acpi
12: 6 0 IO-APIC-edge i8042
14: 1068064 0 IO-APIC-edge pata_via
15: 0 0 IO-APIC-edge pata_via
17: 5722 1491 IO-APIC-fasteoi HDA Intel
20: 367507 0 IO-APIC-fasteoi uhci_hcd:usb2, eth0
21: 11863 4866062 IO-APIC-fasteoi ehci_hcd:usb1, uhci_hcd:usb4, sata_via
22: 0 0 IO-APIC-fasteoi uhci_hcd:usb3
23: 0 0 IO-APIC-fasteoi uhci_hcd:usb5
24: 286580 674216 IO-APIC-fasteoi nvidia
28: 6629 0 IO-APIC-fasteoi ahci
29: 0 0 IO-APIC-fasteoi pata_jmicron
NMI: 0 0 Non-maskable interrupts
LOC: 4128771 3340953 Local timer interrupts
SPU: 0 0 Spurious interrupts
PMI: 0 0 Performance monitoring interrupts
PND: 0 0 Performance pending work
RES: 65103 72309 Rescheduling interrupts
CAL: 1209662 20174 Function call interrupts
TLB: 24906 24494 TLB shootdowns
TRM: 0 0 Thermal event interrupts
THR: 0 0 Threshold APIC interrupts
MCE: 0 0 Machine check exceptions
MCP: 234 234 Machine check polls
ERR: 1
MIS: 0
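To pick out the shared interrupt lines from a table like the one above, it is enough to look for rows with more than one handler, since handlers are listed comma-separated at the end of each row. A small sketch; shared_irqs is a name invented for this example.

```shell
# Print IRQ lines that have more than one handler registered.
# Reads /proc/interrupts-formatted text from stdin.
# shared_irqs is a hypothetical name.
shared_irqs() {
    awk '/^ *[0-9]+:/ {
        for (i = 2; i <= NF; i++) {
            if ($i ~ /,$/) {                  # first handler of a comma-separated list
                out = $1
                for (j = i; j <= NF; j++) out = out " " $j
                print out
                break
            }
        }
    }'
}

# Usage:
#   shared_irqs < /proc/interrupts
```

Applied to the table above, this would list IRQ 20 (uhci_hcd:usb2, eth0) and IRQ 21 (ehci_hcd:usb1, uhci_hcd:usb4, sata_via), while IRQ 28 has ahci as its only handler.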

"lspci -k":
00:00.0 Host bridge: VIA Technologies, Inc. P4M890 Host Bridge
Kernel driver in use: agpgart-via
Kernel modules: via-agp
00:00.1 Host bridge: VIA Technologies, Inc. P4M890 Host Bridge
00:00.2 Host bridge: VIA Technologies, Inc. P4M890 Host Bridge
00:00.3 Host bridge: VIA Technologies, Inc. P4M890 Host Bridge
00:00.4 Host bridge: VIA Technologies, Inc. P4M890 Host Bridge
00:00.5 PIC: VIA Technologies, Inc. P4M890 I/O APIC Interrupt Controller
00:00.6 Host bridge: VIA Technologies, Inc. P4M890 Security Device
00:00.7 Host bridge: VIA Technologies, Inc. P4M890 Host Bridge
00:01.0 PCI bridge: VIA Technologies, Inc. VT8237/VX700 PCI Bridge
Kernel modules: shpchp
00:02.0 PCI bridge: VIA Technologies, Inc. P4M890 PCI to PCI Bridge Controller
Kernel driver in use: pcieport
Kernel modules: shpchp
00:03.0 PCI bridge: VIA Technologies, Inc. P4M890 PCI to PCI Bridge Controller
Kernel driver in use: pcieport
Kernel modules: shpchp
00:0f.0 IDE interface: VIA Technologies, Inc. Device 5337 (rev 80)
Kernel driver in use: sata_via
Kernel modules: sata_via
00:0f.1 IDE interface: VIA Technologies, Inc. VT82C586A/B/VT82C686/A/B/VT823x/A/C PIPC Bus Master IDE (rev 07)
Kernel driver in use: pata_via
Kernel modules: pata_via
00:10.0 USB Controller: VIA Technologies, Inc. VT82xxxxx UHCI USB 1.1 Controller (rev a0)
Kernel driver in use: uhci_hcd
00:10.1 USB Controller: VIA Technologies, Inc. VT82xxxxx UHCI USB 1.1 Controller (rev a0)
Kernel driver in use: uhci_hcd
00:10.2 USB Controller: VIA Technologies, Inc. VT82xxxxx UHCI USB 1.1 Controller (rev a0)
Kernel driver in use: uhci_hcd
00:10.3 USB Controller: VIA Technologies, Inc. VT82xxxxx UHCI USB 1.1 Controller (rev a0)
Kernel driver in use: uhci_hcd
00:10.4 USB Controller: VIA Technologies, Inc. USB 2.0 (rev 86)
Kernel driver in use: ehci_hcd
00:11.0 ISA bridge: VIA Technologies, Inc. VT8237A PCI to ISA Bridge
Kernel modules: i2c-viapro
00:11.7 Host bridge: VIA Technologies, Inc. VT8251 Ultra VLINK Controller
00:13.0 Host bridge: VIA Technologies, Inc. VT8237A Host Bridge
00:13.1 PCI bridge: VIA Technologies, Inc. VT8237A PCI to PCI Bridge
02:00.0 VGA compatible controller: nVidia Corporation G72 [GeForce 7300 LE] (rev a1)
Kernel driver in use: nvidia
Kernel modules: nvidia-current, nvidiafb, nouveau
03:00.0 SATA controller: JMicron Technology Corp. JMB362/JMB363 Serial ATA Controller (rev 02)
Kernel driver in use: ahci
Kernel modules: ahci
03:00.1 IDE interface: JMicron Technology Corp. JMB362/JMB363 Serial ATA Controller (rev 02)
Kernel driver in use: pata_jmicron
Kernel modules: pata_jmicron
04:07.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL-8110SC/8169SC Gigabit Ethernet (rev 10)
Kernel driver in use: r8169
Kernel modules: r8169
80:01.0 Audio device: VIA Technologies, Inc. VT1708/A [Azalia HDAC] (VIA High Definition Audio Controller) (rev 10)
Kernel driver in use: HDA Intel
Kernel modules: snd-hda-intel

Thanks once again everyone for having a look.

Cheers,
Andrea

H_TeXMeX_H 04-03-2011 05:16 AM

One thing I just remembered, look on the HDD plugged into the JMicron SATA, and use the jumpers to limit the speed to 1.5 Gbps. This may solve the problem.
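If the drive has no speed jumper, the link speed can also be capped from the kernel side with the libata.force boot parameter. A minimal sketch, assuming the GRUB 2 setup of Ubuntu 10.04; the existing "quiet splash" value is an assumption, so keep whatever options are already in the file and only append the new one.

```shell
# /etc/default/grub (fragment). Keep the options already present and append
# libata.force; "quiet splash" is only assumed to be the existing value.
GRUB_CMDLINE_LINUX_DEFAULT="quiet splash libata.force=1.5Gbps"

# Afterwards regenerate grub.cfg and reboot:
#   sudo update-grub
```

Written without a port prefix, the limit applies to every libata port, which should make no difference for ports that already run at 1.5 Gbps.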

If it doesn't, try putting the VIA controller in AHCI mode.

Do you have any IDE drives plugged in? It seems like you do.

^andrea^ 04-03-2011 05:46 AM

Unfortunately I haven't found how to set the SAMSUNG HD204UI to 1.5Gbps with jumpers.
I read somewhere that it should be done via firmware... :-/ I'll have to read a bit more about it...
I'd probably try the firmware patch first...

There is no option in the BIOS to set the VIA controller to AHCI mode.
The only options related are:
- SATA controller: Enabled (options: Disabled, Enabled)
- SATA controller mode: IDE (options: IDE, RAID)

and another for the JMicron controller:
- JMicron RAID controller: IDE (options: Disabled, IDE, RAID, AHCI)
I tried setting the JMicron controller to AHCI but nothing changed. Same issue...

No, I don't have any IDE disks plugged in.
There are three 2TB SAMSUNG connected to SATA1, SATA2 and SATA_RAID1 (JMicron) and one 1TB external connected via USB.
Maybe you have seen the USB one...

Cheers,
Andrea

H_TeXMeX_H 04-03-2011 06:56 AM

That's strange because there's 'sata_via' for the SATA and 'pata_via' for the IDE. I don't see why pata_via is being used here, maybe it should be blacklisted.

^andrea^ 04-04-2011 11:20 AM

I think pata_via might be used for the CD/DVD writer since it is connected to an IDE port. I forgot to mention that... :-/

H_TeXMeX_H 04-04-2011 12:33 PM

Ah, ok, then just leave it. Unfortunately, I don't know of anything else to do. Just try to put it in 1.5 Gbps mode.

^andrea^ 04-05-2011 04:52 PM

Thanks H_TeXMeX_H anyway.

This evening I set the disk on the JMicron controller to SATA 150. (Samsung has a nice tool to do that).
Unfortunately it didn't help. Same kind of errors:

Apr 5 23:23:38 tux kernel: [ 1605.853722] ata8.00: exception Emask 0x30 SAct 0x2 SErr 0x0 action 0xe frozen
Apr 5 23:23:38 tux kernel: [ 1605.853731] ata8.00: irq_stat 0x2b42866d, host bus error, connection status changed
Apr 5 23:23:38 tux kernel: [ 1605.853739] ata8.00: failed command: READ FPDMA QUEUED
Apr 5 23:23:38 tux kernel: [ 1605.853748] ata8.00: cmd 60/00:08:80:21:90/01:00:21:00:00/40 tag 1 ncq 131072 in
Apr 5 23:23:38 tux kernel: [ 1605.853751] res 40/00:0c:80:21:90/00:00:21:00:00/40 Emask 0x30 (host bus error)
Apr 5 23:23:38 tux kernel: [ 1605.853756] ata8.00: status: { DRDY }
Apr 5 23:23:38 tux kernel: [ 1605.853765] ata8: hard resetting link
Apr 5 23:23:38 tux kernel: [ 1606.740035] ata8: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
Apr 5 23:23:38 tux kernel: [ 1606.752436] ata8.00: configured for UDMA/133
Apr 5 23:23:38 tux kernel: [ 1606.752452] ata8: EH complete
Apr 5 23:32:53 tux kernel: [ 2160.986527] ata8.00: exception Emask 0x33 SAct 0x3 SErr 0x0 action 0xe frozen
Apr 5 23:32:53 tux kernel: [ 2160.986535] ata8.00: irq_stat 0xffffffff, unknown FIS 00000000 00000000 00000000 00000000, host bus
Apr 5 23:32:53 tux kernel: [ 2160.986543] ata8.00: failed command: READ FPDMA QUEUED
Apr 5 23:32:53 tux kernel: [ 2160.986553] ata8.00: cmd 60/00:00:80:e6:c4/01:00:23:00:00/40 tag 0 ncq 131072 in
Apr 5 23:32:53 tux kernel: [ 2160.986555] res 40/00:0c:80:e7:c4/00:00:23:00:00/40 Emask 0x32 (host bus error)
Apr 5 23:32:53 tux kernel: [ 2160.986561] ata8.00: status: { DRDY }
Apr 5 23:32:53 tux kernel: [ 2160.986565] ata8.00: failed command: READ FPDMA QUEUED
Apr 5 23:32:53 tux kernel: [ 2160.986574] ata8.00: cmd 60/00:08:80:e7:c4/01:00:23:00:00/40 tag 1 ncq 131072 in
Apr 5 23:32:53 tux kernel: [ 2160.986576] res 40/00:0c:80:e7:c4/00:00:23:00:00/40 Emask 0x32 (host bus error)
Apr 5 23:32:53 tux kernel: [ 2160.986581] ata8.00: status: { DRDY }
Apr 5 23:32:53 tux kernel: [ 2160.986590] ata8: hard resetting link
Apr 5 23:32:54 tux kernel: [ 2161.872054] ata8: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
Apr 5 23:32:54 tux kernel: [ 2161.884466] ata8.00: configured for UDMA/133
Apr 5 23:32:54 tux kernel: [ 2161.884482] ata8: EH complete
Apr 5 23:35:55 tux kernel: [ 2342.846149] ata7: exception Emask 0x33 SAct 0x0 SErr 0xcc1a6dc action 0xe frozen
Apr 5 23:35:55 tux kernel: [ 2342.846157] ata7: irq_stat 0xd6e5447d, unknown FIS 00000000 00000000 00000000 00000000, host bus
Apr 5 23:35:55 tux kernel: [ 2342.846164] ata7: SError: { Persist Proto PHYRdyChg Handshk LinkSeq DevExch }
Apr 5 23:35:55 tux kernel: [ 2342.846178] ata7: hard resetting link
Apr 5 23:35:55 tux kernel: [ 2343.572042] ata7: SATA link down (SStatus 0 SControl 300)
Apr 5 23:35:55 tux kernel: [ 2343.572060] ata7: EH complete

As you can see also ata8 is now recognized as "SATA link up 1.5 Gbps" and not 3.0 as earlier...

There is one thing I've never seen before: there is an "ata7" error now. ata7?!?
There is no device called ata7.

In fact, also running "dmesg | grep ata[0-9]" gives me:
[ 0.867603] ata1: PATA max UDMA/100 cmd 0xcc00 ctl 0xc800 bmdma 0xbc00 irq 29
[ 0.867607] ata2: PATA max UDMA/100 cmd 0xc400 ctl 0xc000 bmdma 0xbc08 irq 29
[ 0.874109] ata3: SATA max UDMA/133 cmd 0xfc00 ctl 0xf800 bmdma 0xec00 irq 21
[ 0.874113] ata4: SATA max UDMA/133 cmd 0xf400 ctl 0xf000 bmdma 0xec08 irq 21
[ 0.907104] ata5: PATA max UDMA/133 cmd 0x1f0 ctl 0x3f6 bmdma 0xe400 irq 14
[ 0.907108] ata6: PATA max UDMA/133 cmd 0x170 ctl 0x376 bmdma 0xe408 irq 15
[ 0.936299] ata7: SATA max UDMA/133 abar m8192@0xdfefe000 port 0xdfefe100 irq 28
[ 0.936304] ata8: SATA max UDMA/133 abar m8192@0xdfefe000 port 0xdfefe180 irq 28
[ 1.088463] ata5.01: ATAPI: HL-DT-STDVD-RAM GH22NP20, 1.01, max UDMA/66
[ 1.092029] ata3: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
[ 1.104349] ata5.01: configured for UDMA/66
[ 1.256364] ata3.00: ATA-8: SAMSUNG HD204UI, 1AQ10001, max UDMA/133
[ 1.256370] ata3.00: 3907029168 sectors, multi 16: LBA48 NCQ (depth 0/32)
[ 1.260045] ata7: SATA link down (SStatus 0 SControl 300)
[ 1.264387] ata3.00: configured for UDMA/133
[ 1.424033] ata8: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
[ 1.430177] ata8.00: ATA-8: SAMSUNG HD204UI, 1AQ10001, max UDMA/133
[ 1.430184] ata8.00: 3907029168 sectors, multi 0: LBA48 NCQ (depth 31/32), AA
[ 1.436426] ata8.00: configured for UDMA/133
[ 1.468016] ata4: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
[ 1.632363] ata4.00: ATA-8: SAMSUNG HD204UI, 1AQ10001, max UDMA/133
[ 1.632369] ata4.00: 3907029168 sectors, multi 16: LBA48 NCQ (depth 0/32)
[ 1.640470] ata4.00: configured for UDMA/133
[ 1605.853722] ata8.00: exception Emask 0x30 SAct 0x2 SErr 0x0 action 0xe frozen
[ 1605.853731] ata8.00: irq_stat 0x2b42866d, host bus error, connection status changed
[ 1605.853739] ata8.00: failed command: READ FPDMA QUEUED
[ 1605.853748] ata8.00: cmd 60/00:08:80:21:90/01:00:21:00:00/40 tag 1 ncq 131072 in
[ 1605.853756] ata8.00: status: { DRDY }
[ 1605.853765] ata8: hard resetting link
[ 1606.740035] ata8: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
[ 1606.752436] ata8.00: configured for UDMA/133
[ 1606.752452] ata8: EH complete
[ 2160.986527] ata8.00: exception Emask 0x33 SAct 0x3 SErr 0x0 action 0xe frozen
[ 2160.986535] ata8.00: irq_stat 0xffffffff, unknown FIS 00000000 00000000 00000000 00000000, host bus
[ 2160.986543] ata8.00: failed command: READ FPDMA QUEUED
[ 2160.986553] ata8.00: cmd 60/00:00:80:e6:c4/01:00:23:00:00/40 tag 0 ncq 131072 in
[ 2160.986561] ata8.00: status: { DRDY }
[ 2160.986565] ata8.00: failed command: READ FPDMA QUEUED
[ 2160.986574] ata8.00: cmd 60/00:08:80:e7:c4/01:00:23:00:00/40 tag 1 ncq 131072 in
[ 2160.986581] ata8.00: status: { DRDY }
[ 2160.986590] ata8: hard resetting link
[ 2161.872054] ata8: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
[ 2161.884466] ata8.00: configured for UDMA/133
[ 2161.884482] ata8: EH complete
[ 2342.846149] ata7: exception Emask 0x33 SAct 0x0 SErr 0xcc1a6dc action 0xe frozen
[ 2342.846157] ata7: irq_stat 0xd6e5447d, unknown FIS 00000000 00000000 00000000 00000000, host bus
[ 2342.846164] ata7: SError: { Persist Proto PHYRdyChg Handshk LinkSeq DevExch }
[ 2342.846178] ata7: hard resetting link
[ 2343.572042] ata7: SATA link down (SStatus 0 SControl 300)
[ 2343.572060] ata7: EH complete

If you read a few lines above it says:
"[ 1.260045] ata7: SATA link down (SStatus 0 SControl 300)"
Am I missing something simple?!? :-/

Another thing I tried is disabling the smartd service which might give problems with these disks but nothing changed.

I'm running out of options here... :-/

H_TeXMeX_H 04-06-2011 04:59 AM

Yeah, I think it's a problem with interrupts. Notice that ata7 and ata8 are on IRQ 28, so even though there's no device at ata7, interrupts are being generated that give errors referring to it.
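The IRQ sharing mentioned above can be read straight out of the boot messages by grouping the "ataN: ... irq M" lines by IRQ number. A rough sketch, assuming the message format shown in the dmesg output earlier in the thread; ata_ports_by_irq is a made-up name.

```shell
# Group ATA ports by the IRQ announced in their boot message.
# Reads dmesg-style text from stdin. ata_ports_by_irq is a hypothetical name.
ata_ports_by_irq() {
    awk '
        {
            port = ""; irq = ""
            for (i = 1; i <= NF; i++) {
                # "ata7:" names a port; "ata7.00:" (a device) is ignored
                if ($i ~ /^ata[0-9]+:$/) { port = $i; sub(/:$/, "", port) }
                # the IRQ number follows the bare word "irq" (not "irq_stat")
                if ($i == "irq" && i < NF) irq = $(i + 1)
            }
            if (port != "" && irq != "") {
                if (!(irq in ports)) { order[++n] = irq; ports[irq] = port }
                else ports[irq] = ports[irq] " " port
            }
        }
        END { for (k = 1; k <= n; k++) print "irq " order[k] ": " ports[order[k]] }
    '
}

# Usage:
#   dmesg | ata_ports_by_irq
```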

Try putting it in SATA mode. If that doesn't work, all I can say is to disable the JMicron controller and find another solution. That's the only solution that worked for me. It's just a bad controller.

Maybe you could try a new kernel version, maybe they have some workarounds for this issue.


All times are GMT -5.