LinuxQuestions.org
Old 08-20-2004, 08:55 AM   #1
kklier
Newbie
 
Registered: Aug 2003
Posts: 6

Rep: Reputation: 0
How To interpret kernel stack trace


I have been unable to find any kind of tutorial or clue on how to determine why a crash occurred in the kernel. I am running a Red Hat EE 3.0 kernel
and received the following crash, which appears to be in kswapd:
Code:
Aug 19 17:30:57 host1 login(pam_unix)[9816]: session closed for user someuser
Aug 19 17:30:59 host1 kernel: Unable to handle kernel NULL pointer dereference at virtual address 00000107
Aug 19 17:30:59 host1 kernel:  printing eip:
Aug 19 17:30:59 host1 kernel: c017c767
Aug 19 17:30:59 host1 kernel: *pde = 00003001
Aug 19 17:30:59 host1 kernel: *pte = 00000000
Aug 19 17:30:59 host1 kernel: Oops: 0000
Aug 19 17:30:59 host1 kernel: nfs nfsd lockd sunrpc lp parport autofs tg3 e100 floppy sg microcode keybdev mousedev hid input usb-ohci usbcore ext3 jbd mptscsih mptbase sd_mod scsi_mod
Aug 19 17:30:59 host1 kernel: CPU:    1
Aug 19 17:30:59 host1 kernel: EIP:    0060:[<c017c767>]    Not tainted
Aug 19 17:30:59 host1 kernel: EFLAGS: 00010202
Aug 19 17:30:59 host1 kernel:
Aug 19 17:30:59 host1 kernel: EIP is at iput [kernel] 0x37 (2.4.21-9.ELsmp/i686)
Aug 19 17:30:59 host1 kernel: eax: 000000ef   ebx: f3120a80   ecx: f3120a90   edx: e8b72c80
Aug 19 17:30:59 host1 kernel: esi: 000000ef   edi: df7ba800   ebp: 0000035e   esp: f7fa3f6c
Aug 19 17:30:59 host1 kernel: ds: 0068   es: 0068   ss: 0068
Aug 19 17:30:59 host1 kernel: Process kswapd (pid: 7, stackpage=f7fa3000)
Aug 19 17:31:00 host1 kernel: Stack: 00000000 c0179610 f8d99ad7 e8b72c98 e8b72c80 f3120a80 c0179b1a f3120a80
Aug 19 17:31:00 host1 kernel:        f3120a80 c03a3d00 00003108 00000040 000001d0 c0179ee8 0000038b 00000040
Aug 19 17:31:00 host1 kernel:        c015388a 00000006 000001d0 00000014 0000312c 00000000 00040f42 ffffffff
Aug 19 17:31:00 host1 kernel: Call Trace:   [<c0179610>] dput [kernel] 0x30 (0xf7fa3f70)
Aug 19 17:31:00 host1 kernel: [<f8d99ad7>] nfs_dentry_iput [nfs] 0x57 (0xf7fa3f74)
Aug 19 17:31:00 host1 kernel: [<c0179b1a>] prune_dcache [kernel] 0x18a (0xf7fa3f84)
Aug 19 17:31:00 host1 kernel: [<c0179ee8>] shrink_dcache_memory [kernel] 0x68 (0xf7fa3fa0)
Aug 19 17:31:00 host1 kernel: [<c015388a>] do_try_to_free_pages_kswapd [kernel] 0x13a (0xf7fa3fac)
Aug 19 17:31:00 host1 kernel: [<c0153a38>] kswapd [kernel] 0x68 (0xf7fa3fd0)
Aug 19 17:31:00 host1 kernel: [<c01539d0>] kswapd [kernel] 0x0 (0xf7fa3fe4)
Aug 19 17:31:00 host1 kernel: [<c010958d>] kernel_thread_helper [kernel] 0x5 (0xf7fa3ff0)
Aug 19 17:31:00 host1 kernel:
Aug 19 17:31:00 host1 kernel: Code: 8b 46 18 85 c0 0f 85 b1 02 00 00 c7 44 24 04 9c 86 3a c0 8d
Aug 19 17:31:00 host1 kernel:
Aug 19 17:31:00 host1 kernel: Kernel panic: Fatal exception
Aug 19 17:31:00 host1 kernel:
But, I cannot figure out why this happened.

The load was pretty high at:
Code:
            kbmemfree kbmemused  %memused kbmemshrd kbbuffers  kbcached kbswpfree kbswpused  %swpused
16:50:00       391080   3734860     90.52         0    230052   2293084   1569572   6816220     81.28
17:00:02       329112   3796828     92.02         0    230080   2295696   1623204   6762588     80.64
17:10:01       324004   3801936     92.15         0    230084   2295600   1623480   6762312     80.64
17:20:01       322776   3803164     92.18         0    230104   2295768   1623524   6762268     80.64
But it does not look like all the resources were completely exhausted.

Any clue or any pointers to howto info would be great help.

Thanks!

Last edited by kklier; 08-22-2004 at 11:55 AM.
 
Old 08-20-2004, 11:34 AM   #2
chort
Senior Member
 
Registered: Jul 2003
Location: Silicon Valley, USA
Distribution: OpenBSD 4.6, OS X 10.6.2, CentOS 4 & 5
Posts: 3,660

Rep: Reputation: 69
I'm no kernel guru, but it does appear that your system was attempting to free some swap space to allocate it to NFS. Perhaps the pointer being referred to was supposed to point to the next free block of memory, or something like that. In any case, null pointer dereferences are quite bad and IMHO that shows a bug in the kernel.

You'll see a similar report here that has a lot of similarities (minus NFS, but otherwise the branch followed by kswapd looks almost identical). That was in 2002 and there's a post by Andrew Morton that most of the developers thought it was just bad RAM, but due to the overwhelming number of reports they were getting he was starting to think it was a kernel bug.

Sounds like your best bet is to get the most recent kernel. If the problems persist, test your RAM with memtest86 and/or consider swapping out the RAM sticks for known good RAM. Another thing to point out is that you had nearly exhausted your swap space, which should really never happen. It seems like one or more of the applications you're running has some severe memory leaks in it. Another option would be to create more swap space.
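To put numbers on "nearly exhausted", here is a minimal Python sketch that recomputes the usage percentages from two rows of the sar output in the first post. The helper names (`percent_used`, `summarize`) are made up for illustration, and it assumes the kb* columns are in kilobytes, as their names suggest.

```python
# Two rows copied from the sar output in the first post (values assumed to be kB).
SAR_ROWS = {
    "16:50:00": {"kbmemfree": 391080, "kbmemused": 3734860,
                 "kbswpfree": 1569572, "kbswpused": 6816220},
    "17:20:01": {"kbmemfree": 322776, "kbmemused": 3803164,
                 "kbswpfree": 1623524, "kbswpused": 6762268},
}


def percent_used(free_kb, used_kb):
    """Return used / (free + used) as a percentage, rounded to 2 places."""
    return round(100.0 * used_kb / (free_kb + used_kb), 2)


def summarize(row):
    """Hypothetical helper: condense one sar row into a few headline figures."""
    return {
        "mem_pct": percent_used(row["kbmemfree"], row["kbmemused"]),
        "swap_pct": percent_used(row["kbswpfree"], row["kbswpused"]),
        "swap_total_gib": round(
            (row["kbswpfree"] + row["kbswpused"]) / (1024 * 1024), 2),
    }


if __name__ == "__main__":
    for timestamp, row in SAR_ROWS.items():
        print(timestamp, summarize(row))
```

The recomputed values agree with the %memused and %swpused columns that sar printed: roughly 80% of about 8 GiB of swap was in use, which is heavy but still leaves around 1.5 GiB free in the last samples before the crash.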
 
Old 08-20-2004, 01:38 PM   #3
kklier
Newbie
 
Registered: Aug 2003
Posts: 6

Original Poster
Rep: Reputation: 0
Quote:
Originally posted by chort
I'm no kernel guru,....
Thanks chort. Any bit of info helps. We are limited in the kernels that we can use. We are forced to use the updates from Red Hat as they come out. I will however be switching from 2.4.21-9.EL to 2.4.21-15.0.2.EL, the one provided in Update 2.

Now to find out if this was fixed or not in the newer kernel!

Korey
 
Old 08-20-2004, 10:36 PM   #4
chort
Senior Member
 
Registered: Jul 2003
Location: Silicon Valley, USA
Distribution: OpenBSD 4.6, OS X 10.6.2, CentOS 4 & 5
Posts: 3,660

Rep: Reputation: 69
It should be noted that even if the new kernel solves the crash issue, you're going to need a lot more RAM to continue running that load since you're swapping out a ton of memory. Like I said, one of your applications probably is leaking memory.
 
Old 08-21-2004, 10:52 PM   #5
kklier
Newbie
 
Registered: Aug 2003
Posts: 6

Original Poster
Rep: Reputation: 0
Quote:
Originally posted by chort
It should be noted that even if the new kernel solves the crash issue, you're going to need a lot more RAM to continue running that load since you're swapping out a ton of memory. Like I said, one of your applications probably is leaking memory.
Speaking of RAM... these are dual-processor Xeons with 4GB of RAM and 8GB of swap. As near as I can tell there were two simulations running (one on each processor), but we are not sure whether the remaining RAM or swap space was used up during the last 10 minutes before sar stopped reporting. These sims are known to eat RAM, so no surprise.

Can both processors address 4GB of physical RAM, or is that bound by the kernel's addressing capabilities? We were using the SMP kernel from Red Hat.
 
Old 08-22-2004, 01:40 AM   #6
chort
Senior Member
 
Registered: Jul 2003
Location: Silicon Valley, USA
Distribution: OpenBSD 4.6, OS X 10.6.2, CentOS 4 & 5
Posts: 3,660

Rep: Reputation: 69
Whoops, shows how much I was paying attention... Now that I looked at the numbers, yes that's quite an impressive battery of RAM.

So once again, I'm not that great with Linux kernel internals, but from what I can tell the limit is 4GB per process. Apparently the memory limitation doesn't have anything to do with the number of CPUs; it's either what the kernel's max is, or what the hardware memory controller can handle.
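The 4GB figure is what a 32-bit pointer can reach, independent of how many CPUs share the memory. A small Python sketch of that arithmetic follows. The 36-bit case corresponds to PAE physical addressing on 32-bit x86; whether the Red Hat SMP kernel in this thread was built to use it is not stated here, so treat that line as an assumption.

```python
GIB = 1024 ** 3


def addressable_gib(address_bits):
    """GiB reachable with byte addressing and the given address width."""
    return (2 ** address_bits) / GIB


if __name__ == "__main__":
    # 32-bit virtual addresses, as on the i686 kernel named in the oops.
    print("32-bit:", addressable_gib(32), "GiB")
    # 36-bit physical addresses (PAE); assumed, not confirmed in this thread.
    print("36-bit:", addressable_gib(36), "GiB")
```

On 32-bit Linux the virtual range is further split between user space and the kernel, so a single process sees less than the full 4 GiB.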
 
Old 08-22-2004, 09:24 AM   #7
frob23
Senior Member
 
Registered: Jan 2004
Location: Roughly 29.467N / 81.206W
Distribution: Ubuntu, FreeBSD, NetBSD
Posts: 1,449

Rep: Reputation: 47
Okay, we are going to need to play around a little here. The problem may have started in kswapd but we can tell exactly where it actually happened.
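As a starting point, the Call Trace lines from the first post can be pulled apart mechanically. The sketch below is Python with a made-up helper name (`parse_frames`). It assumes each entry has the form `[<address>] symbol [module] offset (stack slot)`. Reading the value in parentheses as the stack address where the entry was found is an inference from the dump itself (0xf7fa3f70 is one word above esp, where the Stack: line shows c0179610), not something the thread confirms.

```python
import re

# Call Trace lines copied from the oops in the first post (syslog prefix removed).
CALL_TRACE = """\
[<c0179610>] dput [kernel] 0x30 (0xf7fa3f70)
[<f8d99ad7>] nfs_dentry_iput [nfs] 0x57 (0xf7fa3f74)
[<c0179b1a>] prune_dcache [kernel] 0x18a (0xf7fa3f84)
[<c0179ee8>] shrink_dcache_memory [kernel] 0x68 (0xf7fa3fa0)
[<c015388a>] do_try_to_free_pages_kswapd [kernel] 0x13a (0xf7fa3fac)
[<c0153a38>] kswapd [kernel] 0x68 (0xf7fa3fd0)
[<c01539d0>] kswapd [kernel] 0x0 (0xf7fa3fe4)
[<c010958d>] kernel_thread_helper [kernel] 0x5 (0xf7fa3ff0)
"""

FRAME_RE = re.compile(
    r"\[<(?P<addr>[0-9a-f]{8})>\]\s+(?P<sym>\S+)\s+\[(?P<mod>[^\]]+)\]\s+"
    r"0x(?P<off>[0-9a-f]+)\s+\(0x(?P<slot>[0-9a-f]+)\)"
)


def parse_frames(text):
    """Hypothetical helper: turn trace text into a list of dicts, in log order."""
    frames = []
    for m in FRAME_RE.finditer(text):
        frames.append({
            "addr": int(m.group("addr"), 16),
            "symbol": m.group("sym"),
            "module": m.group("mod"),
            "offset": int(m.group("off"), 16),
            "stack_slot": int(m.group("slot"), 16),
        })
    return frames


if __name__ == "__main__":
    frames = parse_frames(CALL_TRACE)
    # The log lists the innermost entry first; reverse it to read top-down.
    for frame in reversed(frames):
        start = frame["addr"] - frame["offset"]
        print(f'{frame["symbol"]:<28} [{frame["module"]}] starts at {start:#010x}')
    print("modules involved:", sorted({f["module"] for f in frames}))
```

Read from the bottom of the log upward, the path is kswapd shrinking the dentry cache and ending in NFS code just before the fault in iput. Traces from kernels of this era may include stale stack words that merely look like code addresses, so an individual entry (dput, for example) is a hint rather than proof of the exact call chain.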

We are going to have to use gdb (and I am not positive off the top of my head if we need to take action for a compressed kernel... I'll check on that).

gdb -k /path/to/kernel

This should spit out the introduction and leave you with a prompt of:
(kgdb)

Now, try
(kgdb) disas 0xc017c767

That number is the address of the instruction pointer where the problem occurred. It should spit out the function -- starting from the top in assembly. The assembly might not help you but at least you will know the name of the function that "broke."
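The oops in the first post already names the function on its "EIP is at" line, so the start address that a disassembly listing would begin from can be worked out by hand. A minimal Python sketch, with a hypothetical helper name (`function_start`), using only values from the posted log:

```python
import re

# Both values are copied from the oops in the first post.
EIP_LINE = "EIP is at iput [kernel] 0x37 (2.4.21-9.ELsmp/i686)"
EIP_ADDR = 0xC017C767  # the address printed under "printing eip:"


def function_start(eip, line):
    """Hypothetical helper: return (symbol, module, start address of symbol)."""
    m = re.search(r"EIP is at (\S+) \[([^\]]+)\] 0x([0-9a-f]+)", line)
    if m is None:
        raise ValueError("not an 'EIP is at' line")
    symbol, module = m.group(1), m.group(2)
    offset = int(m.group(3), 16)
    # The faulting instruction sits `offset` bytes into the function.
    return symbol, module, eip - offset


if __name__ == "__main__":
    symbol, module, start = function_start(EIP_ADDR, EIP_LINE)
    print(f"{symbol} [{module}] starts at {start:#010x}, fault at +0x37")
```

So the function that "broke" is iput, and the faulting instruction is 0x37 bytes past its first instruction.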

If you have a core dump and a debugging kernel there is a lot more we can do. With a proper core dump we can examine the exact data that caused the problem and the exact state of the machine. Sadly, it is far more likely you don't have a core dump (I've been bitten more than once and every time fate conspires to do it when I have the core dump ability turned off).
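Even without a core dump, the register values and the Code: bytes in the posted oops allow one sanity check. The Python sketch below hand-decodes only the first instruction (8b 46 18), assuming the Code: line starts at the faulting EIP, and handles just this one encoding: a 32-bit MOV load with a register base and an 8-bit displacement. `decode_mov_load` is a made-up name, not a real disassembler API.

```python
# Register values and the first Code: bytes, copied from the oops in the first post.
REGS = {"eax": 0x000000EF, "ebx": 0xF3120A80, "ecx": 0xF3120A90, "edx": 0xE8B72C80,
        "esi": 0x000000EF, "edi": 0xDF7BA800, "ebp": 0x0000035E, "esp": 0xF7FA3F6C}
CODE_BYTES = bytes.fromhex("8b 46 18 85 c0")
FAULT_ADDR = 0x00000107  # "NULL pointer dereference at virtual address 00000107"

# x86 register numbering used in the ModRM byte (32-bit operand size).
REG_NAMES = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"]


def decode_mov_load(code, regs):
    """Decode `mov r32, [base + disp8]` and return (dest, base, disp, address)."""
    opcode, modrm, disp = code[0], code[1], code[2]
    mod, reg, rm = modrm >> 6, (modrm >> 3) & 7, modrm & 7
    # Only opcode 0x8b with mod=01 (disp8) and no SIB byte is handled here.
    if opcode != 0x8B or mod != 0b01 or rm == 0b100:
        raise ValueError("encoding not handled by this sketch")
    if disp >= 0x80:  # sign-extend the 8-bit displacement
        disp -= 0x100
    base = REG_NAMES[rm]
    address = (regs[base] + disp) & 0xFFFFFFFF
    return REG_NAMES[reg], base, disp, address


if __name__ == "__main__":
    dest, base, disp, address = decode_mov_load(CODE_BYTES, REGS)
    print(f"mov {dest}, [{base}{disp:+#x}] reads address {address:#010x}")
    print("matches reported fault address:", address == FAULT_ADDR)
```

Under those assumptions the instruction loads from esi + 0x18, and with esi = 0xef that is 0x107, the address in the oops message. That suggests esi held a bogus small value rather than a valid kernel pointer; which structure field it was supposed to be needs the disassembly or the kernel source to tell.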

I have done some very brief looking about the compressed kernel question but don't have the ability to try anything at work. For all I know, it could be a non-issue. It won't hurt anything to try the steps above.

Also... a very minor thing... when posting output could you please use the [.code.] and [./code.] tags around the output? (without the .'s) My window here is very small and it wraps lines in horrible places... and messes with the format in other subtle ways. It is a minor thing but it makes the output easier to read.
 
  



All times are GMT -5. The time now is 12:25 PM.
