LinuxQuestions.org
Old 06-17-2012, 02:17 PM   #1
rrlangly
Member
 
Registered: Dec 2009
Posts: 47

How to troubleshoot 'kernel paging request' oops


I've been unable to troubleshoot a problem in my kernel module for a few weeks now.

I've put together a networking kernel module (KM) that I'm testing on two VM guests. The KMs on both guests seem to run fine as I transfer simple messages from one node to the other.

After I send a message from one guest and receive it on the second, the guest kernel just idles. I have /var/log/messages tailed via netconsole to my host OS. Usually about 2 minutes after I send a message, while the VM is idle, the following output appears in /var/log/messages. I'm having difficulty tracing this because the traceback doesn't "seem" to originate in my KM (though I know it does). It doesn't happen while my KM functions are running; it only appears several minutes after the run.

Any help much appreciated.

Code:
[  217.952082] BUG: unable to handle kernel paging request at 000000011e0fc6c0
[  217.953026] IP: [<ffffffff814e0d73>] nf_nat_cleanup_conntrack+0x4a/0x71
[  217.953026] PGD 1e564067 PUD 0
[  217.953026] Oops: 0002 [#1] SMP
[  217.953026] CPU 0
[  217.953026] Modules linked in: testkm1(O) testkm2(O)
[  217.953026]
[  217.953026] Pid: 0, comm: swapper/0 Tainted: G           O 3.2.1-gentoo-r2 #2 Bochs Bochs
[  217.953026] RIP: 0010:[<ffffffff814e0d73>]  [<ffffffff814e0d73>] nf_nat_cleanup_conntrack+0x4a/0x71
[  217.953026] RSP: 0018:ffff88001fa03d70  EFLAGS: 00010246
[  217.953026] RAX: 0000000000000000 RBX: ffff88001e1367f8 RCX: ffffffff81053f1f
[  217.953026] RDX: 000000011e0fc6c0 RSI: 0000000000000006 RDI: ffffffff81c79bd8
[  217.953026] RBP: ffff88001fa03d80 R08: ffff88001fa0d980 R09: 0000000000000001
[  217.953026] R10: ffff88001fa03f08 R11: ffff88001fa0d900 R12: ffff88001e1367e1
[  217.953026] R13: ffff88001e082138 R14: ffff88001fa03e90 R15: ffffffff81a01fd8
[  217.953026] FS:  0000000000000000(0000) GS:ffff88001fa00000(0000) knlGS:0000000000000000
[  217.953026] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[  217.953026] CR2: 000000011e0fc6c0 CR3: 000000001f748000 CR4: 00000000000006f0
[  217.953026] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  217.953026] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[  217.953026] Process swapper/0 (pid: 0, threadinfo ffffffff81a00000, task ffffffff81a0d020)
[  217.953026] Stack:
[  217.953026]  ffff88001fa12bc0 ffffffff81c77dc8 ffff88001fa03db0 ffffffff81498e3f
[  217.953026]  7fffffffffffffff ffff88001e082138 ffffffff81c76280 0000000000000100
[  217.953026]  ffff88001fa03dd0 ffffffff814945db ffff88001e082138 ffffffff81c76280
[  217.953026] Call Trace:
[  217.953026]  <IRQ>
[  217.953026]  [<ffffffff81498e3f>] __nf_ct_ext_destroy+0x3b/0x53
[  217.953026]  [<ffffffff814945db>] nf_conntrack_free+0x20/0x4f
[  217.953026]  [<ffffffff814946b2>] destroy_conntrack+0xa8/0xad
[  217.953026]  [<ffffffff8149122c>] nf_conntrack_destroy+0x16/0x18
[  217.953026]  [<ffffffff81493a5a>] nf_ct_put+0x18/0x1a
[  217.953026]  [<ffffffff81494a64>] death_by_timeout+0x22/0x26
[  217.953026]  [<ffffffff8107aba4>] run_timer_softirq+0x1c6/0x295
[  217.953026]  [<ffffffff81494a42>] ? nf_ct_delete_from_lists+0x89/0x89
[  217.953026]  [<ffffffff81090013>] ? ktime_get+0x59/0x93
[  217.953026]  [<ffffffff81073b6f>] __do_softirq+0xc8/0x1a4
[  217.953026]  [<ffffffff8108c3e1>] ? hrtimer_interrupt+0x10d/0x19f
[  217.953026]  [<ffffffff815b6b2c>] call_softirq+0x1c/0x30
[  217.953026]  [<ffffffff81035a99>] do_softirq+0x41/0x7e
[  217.953026]  [<ffffffff81073932>] irq_exit+0x44/0xb4
[  217.953026]  [<ffffffff8104c4b9>] smp_apic_timer_interrupt+0x86/0x94
[  217.953026]  [<ffffffff815b539e>] apic_timer_interrupt+0x6e/0x80
[  217.953026]  <EOI>
[  217.953026]  [<ffffffff810535c4>] ? native_safe_halt+0x6/0x8
[  217.953026]  [<ffffffff8103b460>] default_idle+0x4b/0x85
[  217.953026]  [<ffffffff81033dd4>] cpu_idle+0x6e/0xa5
[  217.953026]  [<ffffffff8158edd9>] rest_init+0x6d/0x6f
[  217.953026]  [<ffffffff81aa9bcc>] start_kernel+0x350/0x35b
[  217.953026]  [<ffffffff81aa92b1>] x86_64_start_reservations+0xb8/0xbc
[  217.953026]  [<ffffffff81aa93b6>] x86_64_start_kernel+0x101/0x110
[  217.953026] Code: c2 85 d2 74 49 0f b6 58 11 48 01 c3 74 40 48 83 7b 20 00 74 39 48 c7 c7 d8 9b c7 81 e8 0c d6 0c 00 48 8b 03 48 8b 53 08 48 85 c0 <48> 89 02 74 04 48 89 50 08 48 bf 00 02 20 00 00 00 ad de 48 89 
[  217.953026] RIP  [<ffffffff814e0d73>] nf_nat_cleanup_conntrack+0x4a/0x71
[  217.953026]  RSP <ffff88001fa03d70>
[  217.953026] CR2: 000000011e0fc6c0
[  218.039169] ---[ end trace c8420f05dc384e8a ]---
[  218.040439] Kernel panic - not syncing: Fatal exception in interrupt
[  218.042159] Pid: 0, comm: swapper/0 Tainted: G      D    O 3.2.1-gentoo-r2 #2
[  218.044068] Call Trace:
[  218.044717]  <IRQ>  [<ffffffff815ac249>] panic+0x8c/0x19e
[  218.046245]  [<ffffffff815af0a4>] oops_end+0xb1/0xc1
[  218.047590]  [<ffffffff81057b76>] no_context+0x202/0x211
[  218.049023]  [<ffffffff81053f1f>] ? pvclock_clocksource_read+0x4b/0xb4
[  218.050765]  [<ffffffff81057d3e>] __bad_area_nosemaphore+0x1b9/0x1d9
[  218.052471]  [<ffffffff81053592>] ? kvm_clock_read+0x19/0x1b
[  218.053992]  [<ffffffff81057d6c>] bad_area_nosemaphore+0xe/0x10
[  218.055580]  [<ffffffff815b1205>] do_page_fault+0x1c1/0x389
[  218.057077]  [<ffffffff81053f1f>] ? pvclock_clocksource_read+0x4b/0xb4
[  218.058822]  [<ffffffff81053592>] ? kvm_clock_read+0x19/0x1b
[  218.060383]  [<ffffffff81069c68>] ? enqueue_task_fair+0x2ab/0x414
[  218.061992]  [<ffffffff815b0c49>] do_async_page_fault+0x49/0x6b
[  218.063607]  [<ffffffff815ae7b5>] async_page_fault+0x25/0x30
[  218.065132]  [<ffffffff81053f1f>] ? pvclock_clocksource_read+0x4b/0xb4
[  218.066874]  [<ffffffff814e0d73>] ? nf_nat_cleanup_conntrack+0x4a/0x71
[  218.068696]  [<ffffffff81498e3f>] __nf_ct_ext_destroy+0x3b/0x53
[  218.070290]  [<ffffffff814945db>] nf_conntrack_free+0x20/0x4f
[  218.071831]  [<ffffffff814946b2>] destroy_conntrack+0xa8/0xad
[  218.073378]  [<ffffffff8149122c>] nf_conntrack_destroy+0x16/0x18
[  218.074995]  [<ffffffff81493a5a>] nf_ct_put+0x18/0x1a
[  218.076357]  [<ffffffff81494a64>] death_by_timeout+0x22/0x26
[  218.077889]  [<ffffffff8107aba4>] run_timer_softirq+0x1c6/0x295
[  218.079475]  [<ffffffff81494a42>] ? nf_ct_delete_from_lists+0x89/0x89
[  218.081202]  [<ffffffff81090013>] ? ktime_get+0x59/0x93
[  218.082611]  [<ffffffff81073b6f>] __do_softirq+0xc8/0x1a4
[  218.084060]  [<ffffffff8108c3e1>] ? hrtimer_interrupt+0x10d/0x19f
[  218.085698]  [<ffffffff815b6b2c>] call_softirq+0x1c/0x30
[  218.087121]  [<ffffffff81035a99>] do_softirq+0x41/0x7e
[  218.088503]  [<ffffffff81073932>] irq_exit+0x44/0xb4
[  218.089879]  [<ffffffff8104c4b9>] smp_apic_timer_interrupt+0x86/0x94
[  218.091591]  [<ffffffff815b539e>] apic_timer_interrupt+0x6e/0x80
[  218.093218]  <EOI>  [<ffffffff810535c4>] ? native_safe_halt+0x6/0x8
[  218.094945]  [<ffffffff8103b460>] default_idle+0x4b/0x85
[  218.096378]  [<ffffffff81033dd4>] cpu_idle+0x6e/0xa5
[  218.097719]  [<ffffffff8158edd9>] rest_init+0x6d/0x6f
[  218.099099]  [<ffffffff81aa9bcc>] start_kernel+0x350/0x35b
[  218.100575]  [<ffffffff81aa92b1>] x86_64_start_reservations+0xb8/0xbc
[  218.102320]  [<ffffffff81aa93b6>] x86_64_start_kernel+0x101/0x110
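
For readers decoding the dump above, here is a minimal sketch, assuming the standard x86 page-fault error-code bit layout and 48-bit (4-level) addressing, of how the value after "Oops:" and the faulting address in CR2 can be read. The helper names (decode_oops_code, is_kernel_half) are hypothetical and not part of any kernel tool.

```python
# Bit layout of the x86 page-fault error code printed after "Oops:".
# Each entry: (mask, meaning if the bit is set, meaning if it is clear).
PF_BITS = [
    (0x01, "protection violation", "page not present"),
    (0x02, "write access", "read access"),
    (0x04, "user mode", "kernel mode"),
    (0x08, "reserved bit set in page table", None),
    (0x10, "instruction fetch", None),
]

# With 48-bit virtual addresses, kernel-half addresses start here.
KERNEL_HALF_START = 0xFFFF800000000000


def decode_oops_code(code_hex):
    """Return a list of human-readable facts encoded in the Oops code."""
    code = int(code_hex, 16)
    facts = []
    for mask, if_set, if_clear in PF_BITS:
        if code & mask:
            facts.append(if_set)
        elif if_clear is not None:
            facts.append(if_clear)
    return facts


def is_kernel_half(address):
    """True if the address lies in the kernel half of the address space."""
    return address >= KERNEL_HALF_START


if __name__ == "__main__":
    print(decode_oops_code("0002"))
    # CR2 from the log, compared with RBX from the same register dump.
    print(is_kernel_half(0x000000011E0FC6C0))
    print(is_kernel_half(0xFFFF88001E1367F8))
```

Applied to this log, 0002 decodes to a write, from kernel mode, to a page that is not present. The CR2 value 000000011e0fc6c0 (also held in RDX) falls outside the kernel half of the address space, unlike the neighbouring ffff88... pointers. That could point to a corrupted pointer, though the dump alone does not prove it.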
 
Old 06-22-2012, 09:25 PM   #2
sundialsvcs
LQ Guru
 
Registered: Feb 2004
Location: SE Tennessee, USA
Distribution: Gentoo, LFS
Posts: 10,649
Blog Entries: 4

If you follow the traceback from the bottom up, you can more or less see where the exception happened. You can also see that it happened as a result of a timer interrupt, which is why it shows up sporadically and well after your module's own functions have finished, rather than while they are running. The reference to death_by_timeout suggests that what happened next was destroy_conntrack and, shortly thereafter, a page fault occurred (in nf_nat_cleanup_conntrack, per the IP line) which, we must presume, shouldn't have been possible. In most debugging situations of this kind, the root cause of the problem happened (or is indicated) fairly early on in the traceback, and the entire rest of it reflects the system crashing to the ground.
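
Reading the trace bottom-up can be sketched in a few lines of Python. This is only an illustration, assuming the frame format shown in the log in post #1; the function names (parse_frames, chronological_call_chain) are made up for this example. Frames marked with "?" are addresses the kernel found on the stack but could not confirm as part of the call chain, so the sketch leaves them out.

```python
import re

# Matches "[<address>] symbol+0xoff/0xsize", with an optional "? " marker.
FRAME_RE = re.compile(
    r"\[<[0-9a-f]+>\]\s+(\?\s+)?([A-Za-z0-9_.]+)\+0x[0-9a-f]+/0x[0-9a-f]+"
)

# A few frames copied from the first call trace in the log.
SAMPLE = """\
[  217.953026]  [<ffffffff81498e3f>] __nf_ct_ext_destroy+0x3b/0x53
[  217.953026]  [<ffffffff814945db>] nf_conntrack_free+0x20/0x4f
[  217.953026]  [<ffffffff81494a64>] death_by_timeout+0x22/0x26
[  217.953026]  [<ffffffff8107aba4>] run_timer_softirq+0x1c6/0x295
[  217.953026]  [<ffffffff81494a42>] ? nf_ct_delete_from_lists+0x89/0x89
[  217.953026]  [<ffffffff81073b6f>] __do_softirq+0xc8/0x1a4
"""


def parse_frames(log_text):
    """Return (symbol, reliable) pairs in the order the log prints them."""
    frames = []
    for line in log_text.splitlines():
        match = FRAME_RE.search(line)
        if match:
            reliable = match.group(1) is None  # no "?" marker present
            frames.append((match.group(2), reliable))
    return frames


def chronological_call_chain(log_text):
    """Oldest caller first: the log lists the most recent frame first."""
    reliable = [name for name, ok in parse_frames(log_text) if ok]
    return list(reversed(reliable))


if __name__ == "__main__":
    for name in chronological_call_chain(SAMPLE):
        print(name)
```

Run on the full trace, the same approach gives the chain from the idle loop through the timer interrupt and softirq down to the conntrack teardown, which matches the reading described above.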
 
  

