Kernel panic due to DMA problems?
Hello there!
I've installed Mandrake 8.2 and I'm having serious trouble. At least once a day I get a kernel panic that, most of the time, freezes the machine. It was very difficult to trace the source of the problem because after a reboot none of the logs gave me a clue. Fortunately, I got to see a semi-crash where the system managed to write something to the logs. Here is the result:
Jun 12 19:26:25 orky kernel: Unable to handle kernel NULL pointer dereference at virtual address 00000036
Jun 12 19:26:25 orky kernel: printing eip:
Jun 12 19:26:25 orky kernel: c01a149a
Jun 12 19:26:25 orky kernel: *pde = 00000000
Jun 12 19:26:25 orky kernel: Oops: 0000
Jun 12 19:26:25 orky kernel: CPU: 0
Jun 12 19:26:25 orky kernel: EIP: 0010:[ide_build_sglist+154/400] Not tainted
Jun 12 19:26:25 orky kernel: EIP: 0010:[<c01a149a>] Not tainted
Jun 12 19:26:25 orky kernel: EFLAGS: 00010202
Jun 12 19:26:25 orky kernel: eax: 0000002a ebx: cff6212c ecx: 00000000 edx: 00000002
Jun 12 19:26:25 orky kernel: esi: 00000009 edi: cff6212c ebp: 00000021 esp: ca229da8
Jun 12 19:26:25 orky kernel: ds: 0018 es: 0018 ss: 0018
Jun 12 19:26:25 orky kernel: Process proftpd (pid: 2606, stackpage=ca229000)
Jun 12 19:26:25 orky kernel: Stack: 0000000f cff62000 cff65000 0000d008 00000000 c02e7314 c01a1761 c02e7314
Jun 12 19:26:25 orky kernel: cff536e0 00000000 00000000 c02e7314 0000d008 00000000 c02e7358 c01a1d33
Jun 12 19:26:25 orky kernel: c02e7358 00000000 cf23f8fc 00000008 00d879cc cff536e0 c02e7314 000000e0
Jun 12 19:26:25 orky kernel: Call Trace: [ide_build_dmatable+113/496] [ide_dmaproc+227/624] [do_rw_disk+714/1296] [tcp_push_one+122/272] [ide_wait_stat+192/272]
Jun 12 19:26:25 orky kernel: Call Trace: [<c01a1761>] [<c01a1d33>] [<c01ad16a>] [<c01fe32a>] [<c0196bd0>]
Jun 12 19:26:25 orky kernel: [start_request+425/528] [ide_do_request+660/736] [do_ide_request+15/32] [generic_unplug_device+30/48] [__run_task_queue+72/96] [block_sync_page+25/32]
Jun 12 19:26:25 orky kernel: [<c0196f69>] [<c01972d4>] [<c019735f>] [<c018159e>] [<c011b678>] [<c01385b9>]
Jun 12 19:26:25 orky kernel: [inet_sendmsg+53/64] [__lock_page+94/144] [lock_page+20/32] [do_generic_file_read+675/1104] [generic_file_read+124/304] [file_read_actor+0/96]
Jun 12 19:26:25 orky kernel: [<c020f2d5>] [<c012723e>] [<c0127284>] [<c0127953>] [<c0127dcc>] [<c0127cf0>]
Jun 12 19:26:25 orky kernel: [sys_read+150/256] [sys_alarm+50/80] [system_call+51/64]
Jun 12 19:26:25 orky kernel: [<c01345e6>] [<c011edf2>] [<c0106f23>]
Jun 12 19:26:25 orky kernel:
Jun 12 19:26:25 orky kernel: Code: 3b 42 34 74 e6 b9 05 00 00 00 89 df 31 c0 f3 ab 89 2b 89 73
Can anyone help me interpret the problem? Thanks! |
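The symbolic entries in an oops like the one above have the form [function+offset/size], and listing the function names in order makes the path into the crash easier to follow. What follows is a minimal Python sketch, assuming klogd's symbolic format exactly as it appears in this log. The sample lines are copied from the oops; `symbols` and `ide_frames` are illustrative names, not part of any existing tool.

```python
import re

# A few lines copied from the oops above (klogd's symbolic form).
OOPS = """\
Jun 12 19:26:25 orky kernel: EIP: 0010:[ide_build_sglist+154/400] Not tainted
Jun 12 19:26:25 orky kernel: Call Trace: [ide_build_dmatable+113/496] [ide_dmaproc+227/624] [do_rw_disk+714/1296] [tcp_push_one+122/272] [ide_wait_stat+192/272]
Jun 12 19:26:25 orky kernel: [start_request+425/528] [ide_do_request+660/736] [do_ide_request+15/32] [generic_unplug_device+30/48] [__run_task_queue+72/96] [block_sync_page+25/32]
"""

# Matches entries such as [ide_dmaproc+227/624]: symbol, offset, function size.
SYMBOL = re.compile(r"\[([A-Za-z_]\w*)\+(\d+)/(\d+)\]")


def symbols(text):
    """Return (function, offset, size) tuples in the order they appear."""
    return [(name, int(off), int(size)) for name, off, size in SYMBOL.findall(text)]


def ide_frames(text):
    """Return only the function names that belong to the IDE layer."""
    return [name for name, _, _ in symbols(text) if "ide_" in name]


if __name__ == "__main__":
    for name, off, size in symbols(OOPS):
        print(f"{name:24s} +{off}/{size}")
```

On these lines the faulting function is ide_build_sglist and the next two entries are ide_build_dmatable and ide_dmaproc, all in the IDE DMA setup code. The trace on kernels of this era is most likely a scan of the stack rather than a verified chain of callers, so an entry such as tcp_push_one may be a leftover and should be treated as a hint only.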
You forgot to mention which kernel you're using.
Regards |
If it's stock Mandy 8.2, then it's 2.4.18... I can't see him downgrading kernels as that's still current.
What's the rest of the hardware like? Hopefully it isn't anything as simple as the AMD AGP bug. Cheers, Finegan |
Additional data ...
Yes, Finegan, you are right: this is the 2.4.18 kernel, unmodified from the original distro.
Also, this machine ran Mandy 8.0 perfectly, and there has been no hardware change since then. The record back then was 100+ days without a reboot. So I guess it doesn't come from a hardware problem, unless something has broken. That could be, but nothing makes me think so. This is my /var/log/dmesg:
Linux version 2.4.18-6mdk (quintela@bi.mandrakesoft.com) (gcc version 2.96 20000731 (Mandrake Linux 8.2 2.96-0.76mdk)) #1 Fri Mar 15 02:59:08 CET 2002
BIOS-provided physical RAM map:
BIOS-e820: 0000000000000000 - 000000000009fc00 (usable)
BIOS-e820: 000000000009fc00 - 00000000000a0000 (reserved)
BIOS-e820: 00000000000f0000 - 0000000000100000 (reserved)
BIOS-e820: 0000000000100000 - 000000000fff0000 (usable)
BIOS-e820: 000000000fff0000 - 000000000fff3000 (ACPI NVS)
BIOS-e820: 000000000fff3000 - 0000000010000000 (ACPI data)
BIOS-e820: 00000000ffff0000 - 0000000100000000 (reserved)
hm, page 0fff0000 reserved twice.
On node 0 totalpages: 65520
zone(0): 4096 pages.
zone(1): 61424 pages.
zone(2): 0 pages.
Kernel command line: auto BOOT_IMAGE=linux ro root=306 devfs=mount
Local APIC disabled by BIOS -- reenabling.
Found and enabled local APIC!
Initializing CPU#0
Detected 850.057 MHz processor.
Console: colour VGA+ 80x25
Calibrating delay loop... 1697.38 BogoMIPS
Memory: 255444k/262080k available (1170k kernel code, 6248k reserved, 332k data, 260k init, 0k highmem)
Dentry-cache hash table entries: 32768 (order: 6, 262144 bytes)
Inode-cache hash table entries: 16384 (order: 5, 131072 bytes)
Mount-cache hash table entries: 4096 (order: 3, 32768 bytes)
Buffer-cache hash table entries: 16384 (order: 4, 65536 bytes)
Page-cache hash table entries: 65536 (order: 6, 262144 bytes)
CPU: Before vendor init, caps: 0183fbff c1c7fbff 00000000, vendor = 2
CPU: L1 I Cache: 64K (64 bytes/line), D cache 64K (64 bytes/line)
CPU: L2 Cache: 64K (64 bytes/line)
CPU: After vendor init, caps: 0183fbff c1c7fbff 00000000 00000000
Intel machine check architecture supported.
Intel machine check reporting enabled on CPU#0.
CPU: After generic, caps: 0183fbff c1c7fbff 00000000 00000000
CPU: Common caps: 0183fbff c1c7fbff 00000000 00000000
CPU: AMD Duron(tm) Processor stepping 01
Enabling fast FPU save and restore... done.
Checking 'hlt' instruction... OK.
POSIX conformance testing by UNIFIX
mtrr: v1.40 (20010327) Richard Gooch (rgooch@atnf.csiro.au)
mtrr: detected mtrr type: Intel
PCI: PCI BIOS revision 2.10 entry at 0xfb430, last bus=1
PCI: Using configuration type 1
PCI: Probing PCI hardware
Unknown bridge resource 0: assuming transparent
Unknown bridge resource 1: assuming transparent
Unknown bridge resource 2: assuming transparent
PCI: Using IRQ router VIA [1106/0686] at 00:07.0
Applying VIA southbridge workaround.
PCI: Disabling Via external APIC routing
isapnp: Scanning for PnP cards...
isapnp: No Plug & Play device found
Linux NET4.0 for Linux 2.4
Based upon Swansea University Computer Society NET3.039
Initializing RT netlink socket
apm: BIOS version 1.2 Flags 0x07 (Driver version 1.16)
Starting kswapd
VFS: Diskquotas version dquot_6.5.0 initialized
devfs: v1.10 (20020120) Richard Gooch (rgooch@atnf.csiro.au)
devfs: boot_options: 0x1
pty: 256 Unix98 ptys configured
Serial driver version 5.05c (2001-07-08) with HUB-6 MANY_PORTS MULTIPORT SHARE_IRQ SERIAL_PCI ISAPNP enabled
ttyS00 at 0x03f8 (irq = 4) is a 16550A
ttyS01 at 0x02f8 (irq = 3) is a 16550A
block: 128 slots per queue, batch=32
RAMDISK driver initialized: 16 RAM disks of 32000K size 1024 blocksize
Uniform Multi-Platform E-IDE driver Revision: 6.31
ide: Assuming 33MHz system bus speed for PIO modes; override with idebus=xx
VP_IDE: IDE controller on PCI bus 00 dev 39
VP_IDE: chipset revision 6
VP_IDE: not 100% native mode: will probe irqs later
ide: Assuming 33MHz system bus speed for PIO modes; override with idebus=xx
VP_IDE: VIA vt82c686b (rev 40) IDE UDMA100 controller on pci00:07.1
ide0: BM-DMA at 0xd000-0xd007, BIOS settings: hda:DMA, hdb:DMA
ide1: BM-DMA at 0xd008-0xd00f, BIOS settings: hdc:DMA, hdd:DMA
hda: QUANTUM FIREBALLP AS20.5, ATA DISK drive
hdb: IBM-DTLA-307020, ATA DISK drive
hdc: IC35L040AVER07-0, ATA DISK drive
hdd: ATAPI CDROM 48X, ATAPI CD/DVD-ROM drive
ide0 at 0x1f0-0x1f7,0x3f6 on irq 14
ide1 at 0x170-0x177,0x376 on irq 15
hda: 40132503 sectors (20548 MB) w/1902KiB Cache, CHS=2498/255/63, UDMA(33)
hdb: 40188960 sectors (20577 MB) w/1916KiB Cache, CHS=2501/255/63, UDMA(33)
hdc: 80418240 sectors (41174 MB) w/1916KiB Cache, CHS=79780/16/63, UDMA(33)
hdd: ATAPI 193X CD-ROM drive, 128kB Cache, UDMA(33)
Uniform CD-ROM driver Revision: 3.12
Partition check:
/dev/ide/host0/bus0/target0/lun0: p1 < p5 p6 >
/dev/ide/host0/bus0/target1/lun0: p1
/dev/ide/host0/bus1/target0/lun0: p1 < p5 >
Floppy drive(s): fd0 is 1.44M
FDC 0 is a post-1991 82077
md: md driver 0.90.0 MAX_MD_DEVS=256, MD_SB_DISKS=27
md: Autodetecting RAID arrays.
md: autorun ...
md: ... autorun DONE.
NET4: Linux TCP/IP 1.0 for NET4.0
IP Protocols: ICMP, UDP, TCP, IGMP
IP: routing cache hash table of 2048 buckets, 16Kbytes
TCP: Hash tables configured (established 16384 bind 16384)
Linux IP multicast router 0.06 plus PIM-SM
NET4: Unix domain sockets 1.0/SMP for Linux NET4.0.
RAMDISK: Compressed image found at block 0
Uncompressing....done.
Freeing initrd memory: 74k freed
VFS: Mounted root (ext2 filesystem).
Mounted devfs on /dev
Freeing unused kernel memory: 260k freed
Real Time Clock Driver v1.10e
Adding Swap: 401552k swap-space (priority -1)
ip_conntrack (2047 buckets, 16376 max)
ip_tables: (C) 2000-2002 Netfilter core team
ne2k-pci.c:v1.02 10/19/2000 D. Becker/P. Gortmaker
http://www.scyld.com/network/ne2k-pci.html
PCI: Found IRQ 11 for device 00:0d.0
PCI: Sharing IRQ 11 with 00:0f.0
eth0: RealTek RTL-8029 found at 0xdc00, IRQ 11, 00:20:18:2B:BC:59.
PCI: Found IRQ 11 for device 00:0f.0
PCI: Sharing IRQ 11 with 00:0d.0
3c59x: Donald Becker and others. www.scyld.com/network/vortex.html
00:0f.0: 3Com PCI 3c900 Boomerang 10Mbps Combo at 0xe000. Vers LK1.1.16
The motherboard is an Abit KT7A, but it was running fine until now. Do you need more info? Thanks for helping! |
Interesting, it can't be the Athlon AGP bug, as that had to do with a caching issue the Durons don't get... odd too, because I've got Mandy 8.2 running unreliably as well on the same mobo, but I just chalked it up to the fact that I had probably corrupted ext3 with all of the crashes I put that thing through while hacking on MPlayer. I haven't worked on my issue much, as it only happens in X and I have 4 other boots to go into on the machine.
Two NICs means this is probably the house NAT/MASQ crate, so you need it to be reliable... if you weren't using it for much else, I'm beginning to call Mandy 8.2 a stinker anyway. I'll post back if I find anything fun. I don't think it's anything HD related, as all of that is probably paging from ext3(?). The only line of that panic that looks suspicious is:
Jun 12 19:26:25 orky kernel: Process proftpd (pid: 2606, stackpage=ca229000)
Offhand, and I wish I knew klogd better so this is just guess-land, try disabling proftpd for the time being and see if it hiccups again. Cheers, Finegan |
The whole story /\/\Oo/\/\ ...
Yes,
Finegan, when this first occurred I almost immediately focused on proftpd. I checked the version provided by the distro and it turned out to be a release candidate of proftpd (as always with Mandy distros). I saw in that a quick answer to my problem, but once proftpd was upgraded to the 1.2.5 stable release (the latest), the situation was the same. Very disappointing. So the theory of a guilty piece of software was starting to fall apart. What's up, then? Here is a short account of how the kernel panic comes about, because it's easily reproducible :) When there is a big transfer between a LAN station and the Linux box, the kernel panic comes quickly. It occurs when the hard drive is used intensively, say as soon as 100 MB have been transferred. And the strangest part is that after such a kernel panic, during the reboot, when the hard drive checks are forced, I get a new kind of kernel panic (typed by hand):
.......
.......
/dev/hdc5 was not cleanly unmounted, check forced
Unable to handle kernel paging request at virtual address 20a722cf................../9.3%
*pde = 00000000
.............
..........
..........
Process swapper (pid0, stackpage = c0279000)
stack = .........
Call Trace = .............
Code : BAD EIP VALUE
........That's all..........
And when I reboot again, it may work fine, or crash at the same 9.3%, or crash somewhere else, at 63% for example. Well, this is not a proftpd issue, is it? What do you think it could be? Thanks! |
No, if this happens during transfers, this isn't proftpd... the filesystem? Ext3 went off the EXPERIMENTAL listing at exactly 2.4.18. I'm almost certain if you back up and re-install to ext2 it'll be fine, or better yet, if Mandrake still gives you the option, ReiserFS is really stable. Between LAN station and Linux box, do you mean over Samba? That could be a testbed for filesystem conflicts.
Proftpd was basically the only non-sleeping process during that panic then, but it's running fine... filesystem, memory, hard drive, but I doubt that... that's all I can really think of. Luck, Finegan |
ext3?
Finegan,
your ext3 theory looks interesting, but how can I be sure that my current drives use this filesystem? Thanks again! |
I almost forgot ...
When I upgraded to 8.2, the only drive that was formatted was the root filesystem, on which there is nothing else, and I see this in /var/log/dmesg:
VFS: Mounted root (ext2 filesystem).
So is the ext3 line of inquiry still valid? Can't wait for new clues ;) |
Okay,
here is some more info:
-> I've tested local transfers from one hard drive to another: it crashes.
-> Local transfers from a hard drive to the same drive (whichever one it is) crash too.
I am planning to test the memory now, even though it has been running fine until now. What I don't understand is that this may be a kernel/driver issue, yet the previous 8.0 showed no such fault with exactly the same hardware. I have been downloading the latest Debian distro for a few hours now. Can anyone suggest an alternative ending? |
What do you use to do your file transfers? I've had problems with Konqueror getting messed up when I do large transfers (over 100 MB). It doesn't result in a kernel panic, though. I've had that on every distro that was on my box for any length of time (Conectiva, SuSE and Debian) and with different kernels. I never found out what causes it.
|
Transfer method ...
Well,
to test the file transfer I just use a console window with the following commands:
between drives:
cp /hdd0/bigfile.xx /hdd1
on the same local drive:
cp /hdd0/bigfile.xxx /hdd0/bigfilecopy.xxx
That's it. Thanks for helping ;) |
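The two cp tests above can be approximated by a small script that writes a large file, copies it, and compares checksums, which makes the test repeatable and also flags silent corruption rather than only outright crashes. This is a rough Python sketch under stated assumptions: `copy_and_verify`, `sha256_of`, and the file names are hypothetical, and the directory and size are whatever you choose to pass in.

```python
import hashlib
import os
import shutil
import tempfile


def sha256_of(path, chunk=1 << 20):
    """Hash a file in 1 MiB chunks so large files do not fill memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(chunk), b""):
            digest.update(block)
    return digest.hexdigest()


def copy_and_verify(directory, megabytes):
    """Write `megabytes` MiB of random data, copy it, and compare hashes."""
    src = os.path.join(directory, "bigfile.src")    # hypothetical file name
    dst = os.path.join(directory, "bigfile.copy")   # hypothetical file name
    with open(src, "wb") as f:
        for _ in range(megabytes):
            f.write(os.urandom(1 << 20))
    shutil.copyfile(src, dst)
    return sha256_of(src) == sha256_of(dst)


if __name__ == "__main__":
    # Small default so the sketch is quick to try; the thread reports
    # trouble starting at around 100 MB.
    with tempfile.TemporaryDirectory() as scratch:
        print(copy_and_verify(scratch, 4))
```

One caveat: the read-back may be served from the page cache rather than from the disk, so a matching checksum does not prove the data reached the drive intact. Using a file larger than RAM makes that less likely.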
I'll try it on my box that way and see what happens.
|
I sleep in a lot on days off. Er... look at /etc/fstab and the type of filesystem should be printed next to the mount point.
Cheers, Finegan |
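The /etc/fstab check suggested above comes down to reading the third whitespace-separated field of each non-comment line. A minimal Python sketch of that, assuming the standard fstab layout (device, mount point, type, options, dump, pass); the sample contents are hypothetical, since the real file from this machine was never posted, and `fstab_types` is an illustrative name.

```python
# Hypothetical /etc/fstab contents for illustration only; the device and
# mount point pairings are assumptions, not taken from the thread.
SAMPLE_FSTAB = """\
# device      mount point  type  options   dump pass
/dev/hda6     /            ext2  defaults  1 1
/dev/hda5     swap         swap  defaults  0 0
/dev/hdc5     /hdd1        ext3  defaults  1 2
"""


def fstab_types(text):
    """Map each mount point to the filesystem type in the third field."""
    types = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.split()
        if len(fields) >= 3:
            types[fields[1]] = fields[2]
    return types


if __name__ == "__main__":
    for mount_point, fs_type in fstab_types(SAMPLE_FSTAB).items():
        print(f"{mount_point:10s} {fs_type}")
```

To run it against a real system, pass the contents of the file instead of the sample, for example `fstab_types(open("/etc/fstab").read())`. Note that fstab shows what is configured, not necessarily what is currently mounted.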
Thanks, Finegan!
All my disks use the ext2 filesystem ... Cheers too :) |