LinuxQuestions.org
Old 07-03-2015, 03:12 PM   #1
alirezan1
Member
 
Registered: Nov 2004
Location: Vancouver
Distribution: Ubuntu, CentOS, Mandriva, Gentoo, RedHat, Fedora, Knoppix
Posts: 150

Rep: Reputation: 15
Reboot on corruption/panic not working?!


Hi guys

I have a headless embedded system running Debian wheezy with kernel 3.12.

I have (or so I thought) set it up so that on kernel panic or oops it restarts after 5 seconds:

Quote:
sysctl -w kernel.panic=5
sysctl -w kernel.panic_on_oops=1
sysctl -w vm.panic_on_oom=1
These are set in rcS.

Also I have the following in /etc/sysctl.conf:

Quote:
kernel.panic = 5
kernel.panic_on_oops=1
vm.panic_on_oom=1

But to my surprise, I came back to a dead system. I plugged in my debugger, looked at the console logs, and found it stuck in this state:


Quote:
[ 326.179980] CPU: 0 PID: 4987 Comm: ifconfig Not tainted 3.12.10-svn14 #5
[ 326.187033] Backtrace:
[ 326.189621] <c00179fc> (dump_backtrace+0x0/0x10c) from <c0017b98> (show_stack+0x18/0x1c)
[ 326.198505] r6:c087b838 r5:c087b838 r4:c087bb20 r3:00000000
[ 326.204501] <c0017b80> (show_stack+0x0/0x1c) from <c05d5ee4> (dump_stack+0x20/0x28)
[ 326.212948] <c05d5ec4> (dump_stack+0x0/0x28) from <c0094894> (rcu_check_callbacks+0x2a0/0x6f8)
[ 326.222402] <c00945f4> (rcu_check_callbacks+0x0/0x6f8) from <c0051b60> (update_process_times+0x44/0x70)
[ 326.232678] <c0051b1c> (update_process_times+0x0/0x70) from <c00849fc> (tick_sched_handle+0x50/0x5c)
[ 326.242660] r7:ddf17c80 r6:c087b340 r5:00020fcd r4:9bcc8cae
[ 326.248657] <c00849ac> (tick_sched_handle+0x0/0x5c) from <c0084bbc> (tick_sched_timer+0x48/0x78)
[ 326.258287] <c0084b74> (tick_sched_timer+0x0/0x78) from <c0065f8c> (__run_hrtimer.isra.22+0x60/0xfc)
[ 326.268276] r7:00000000 r6:c0084b74 r5:c0879e00 r4:c087b340
[ 326.274272] <c0065f2c> (__run_hrtimer.isra.22+0x0/0xfc) from <c0066770> (hrtimer_interrupt+0x108/0x2f0)
[ 326.284535] r6:c0879e00 r5:00020fcd r4:9bcc8919 r3:00020fcd
[ 326.290533] <c0066668> (hrtimer_interrupt+0x0/0x2f0) from <c002b678> (omap2_gp_timer_interrupt+0x2c/0x3c)
[ 326.300984] <c002b64c> (omap2_gp_timer_interrupt+0x0/0x3c) from <c00766d4> (handle_irq_event_percpu+0x54/0x1b8)
[ 326.311982] <c0076680> (handle_irq_event_percpu+0x0/0x1b8) from <c0076890> (handle_irq_event+0x58/0x80)
[ 326.322249] <c0076838> (handle_irq_event+0x0/0x80) from <c0078eb4> (handle_level_irq+0x90/0x108)
[ 326.331871] r5:00000054 r4:dd806cc0
[ 326.335658] <c0078e24> (handle_level_irq+0x0/0x108) from <c0075f70> (generic_handle_irq+0x28/0x38)
[ 326.345464] r4:00000054 r3:c0078e24
[ 326.349250] <c0075f48> (generic_handle_irq+0x0/0x38) from <c00156c0> (handle_IRQ+0x38/0x8c)
[ 326.358415] r4:c0882f44 r3:00000110
[ 326.362200] <c0015688> (handle_IRQ+0x0/0x8c) from <c00087cc> (omap3_intc_handle_irq+0x68/0x7c)
[ 326.371633] r6:c08b6970 r5:ddf17c80 r4:fa200000 r3:00000080
[ 326.377628] <c0008764> (omap3_intc_handle_irq+0x0/0x7c) from <c05da580> (__irq_svc+0x40/0x74)
[ 326.386968] Exception stack(0xddf17c80 to 0xddf17cc8)
[ 326.392297] 7c80: 00000000 c08b7a40 00000000 00000100 00000202 00000054 c08b7a84 c08b7a80
[ 326.400914] 7ca0: ddf16000 ddf16000 00000001 ddf17d14 ddf17cc8 ddf17cc8 c004b300 c004b314
[ 326.409523] 7cc0: 20000113 ffffffff
[ 326.413197] r7:ddf17cb4 r6:ffffffff r5:20000113 r4:c004b314
[ 326.419193] <c004b288> (__do_softirq+0x0/0x1c4) from <c004b4ec> (do_softirq+0x54/0x60)
[ 326.427906] <c004b498> (do_softirq+0x0/0x60) from <c004b79c> (irq_exit+0xac/0xf4)
[ 326.436149] r4:ddf16000 r3:00000000
[ 326.439936] <c004b6f0> (irq_exit+0x0/0xf4) from <c00156c4> (handle_IRQ+0x3c/0x8c)
[ 326.448186] r4:c0882f44 r3:00000110
[ 326.451974] <c0015688> (handle_IRQ+0x0/0x8c) from <c00087cc> (omap3_intc_handle_irq+0x68/0x7c)
[ 326.461407] r6:c08b6970 r5:ddf17d88 r4:fa200000 r3:00000080
[ 326.467399] <c0008764> (omap3_intc_handle_irq+0x0/0x7c) from <c05da580> (__irq_svc+0x40/0x74)
[ 326.476749] Exception stack(0xddf17d88 to 0xddf17dd0)
[ 326.482072] 7d80: ddd6ec40 00000000 fa1cc000 c03a7c00 ddd6e800 ddd6ec40
[ 326.490689] 7da0: 00000001 00040081 00008914 ddf16000 00000001 ddf17de4 ddf17d98 ddf17dd0
[ 326.499300] 7dc0: c03a7c38 c03a7580 60000013 ffffffff
[ 326.504616] r7:ddf17dbc r6:ffffffff r5:60000013 r4:c03a7580
[ 326.510612] <c03a74e0> (c_can_close+0x0/0x114) from <c04f2bc0> (__dev_close_many+0x90/0xd8)
[ 326.519771] r5:ddf17e00 r4:ddd6e800
[ 326.523555] <c04f2b30> (__dev_close_many+0x0/0xd8) from <c04f2c38> (__dev_close+0x30/0x48)
[ 326.532621] r5:000000c0 r4:ddd6e800
[ 326.536408] <c04f2c08> (__dev_close+0x0/0x48) from <c04f70ac> (__dev_change_flags+0x90/0x140)
[ 326.545761] <c04f701c> (__dev_change_flags+0x0/0x140) from <c04f71ec> (dev_change_flags+0x18/0x50)
[ 326.555567] r7:00000000 r6:00000000 r5:00040081 r4:ddd6e800
[ 326.561562] <c04f71d4> (dev_change_flags+0x0/0x50) from <c0567634> (devinet_ioctl+0x620/0x6d8)
[ 326.571001] r6:00000000 r5:ddcdfe8c r4:00000000 r3:00008914
[ 326.576993] <c0567014> (devinet_ioctl+0x0/0x6d8) from <c056862c> (inet_ioctl+0x1b4/0x1c8)
[ 326.585981] <c0568478> (inet_ioctl+0x0/0x1c8) from <c04e16a0> (sock_ioctl+0x70/0x29c)
[ 326.594603] <c04e1630> (sock_ioctl+0x0/0x29c) from <c00f0c30> (do_vfs_ioctl+0x84/0x5c4)
[ 326.603395] r6:00008914 r5:ddb1b8c0 r4:00000000 r3:c04e1630
[ 326.609390] <c00f0bac> (do_vfs_ioctl+0x0/0x5c4) from <c00f11e4> (SyS_ioctl+0x74/0x84)
[ 326.618018] <c00f1170> (SyS_ioctl+0x0/0x84) from <c00147c0> (ret_fast_syscall+0x0/0x30)


and it didn't restart.

Any ideas why it didn't restart, and how I can ensure it does restart in situations like this?
 
Old 07-03-2015, 10:09 PM   #2
syg00
LQ Veteran
 
Registered: Aug 2003
Location: Australia
Distribution: Lots ...
Posts: 21,127

Rep: Reputation: 4120
Check what's actually there
Code:
cat /proc/sys/kernel/panic
cat /proc/sys/kernel/panic_on_oops
My reading of the kernel parameters suggests "oops=panic", not a time_value or boolean. Note also the short name.
I'm not a Debian user and I haven't played with this, so this is merely conjecture on my part.
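
To go one step beyond the two cat commands above, here is a minimal sketch that compares every "key = value" line in a sysctl.conf-style file with the live value under /proc/sys. The function name check_sysctl_conf and its two arguments are made up for illustration, and it assumes simple values like the ones in this thread (whitespace is stripped before comparing).

```shell
#!/bin/sh
# Hypothetical helper: compare "key = value" lines in a sysctl.conf-style
# file against the live values under /proc/sys. Both paths are arguments
# so the check can also be pointed at a test directory.
check_sysctl_conf() {
    conf="${1:-/etc/sysctl.conf}"
    proc_root="${2:-/proc/sys}"
    status=0
    # Split each line on the first "="; also handle a last line
    # that has no trailing newline.
    while IFS='=' read -r key value || [ -n "$key" ]; do
        key=$(printf '%s' "$key" | tr -d ' \t')
        value=$(printf '%s' "$value" | tr -d ' \t')
        case "$key" in
            ''|'#'*|';'*) continue ;;   # skip blank lines and comments
        esac
        # kernel.panic_on_oops -> <proc_root>/kernel/panic_on_oops
        path="$proc_root/$(printf '%s' "$key" | tr . /)"
        if [ ! -r "$path" ]; then
            echo "MISSING  $key"
            status=1
            continue
        fi
        live=$(tr -d ' \t\n' < "$path")
        if [ "$live" = "$value" ]; then
            echo "OK       $key = $live"
        else
            echo "MISMATCH $key: conf=$value live=$live"
            status=1
        fi
    done < "$conf"
    return $status
}
```

Calling check_sysctl_conf with no arguments checks /etc/sysctl.conf against /proc/sys. A MISMATCH or MISSING line for kernel.panic or kernel.panic_on_oops would suggest the settings never took effect on the running system.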
 
Old 07-03-2015, 10:15 PM   #3
alirezan1
Member
 
Registered: Nov 2004
Location: Vancouver
Distribution: Ubuntu, CentOS, Mandriva, Gentoo, RedHat, Fedora, Knoppix
Posts: 150

Original Poster
Rep: Reputation: 15
Thanks for the reply. I will check the outputs and will get back with the results shortly.

I don't believe it is Debian specific. Reading the following link, I think the arguments I set are correct. Please correct me if I am misunderstanding:

https://www.kernel.org/doc/Documenta...ctl/kernel.txt

Maybe I need to set panic_on_stackoverflow as well?
 
Old 07-03-2015, 10:56 PM   #4
syg00
LQ Veteran
 
Registered: Aug 2003
Location: Australia
Distribution: Lots ...
Posts: 21,127

Rep: Reputation: 4120
Forget I said anything - the sysctl will only accept "1", not "panic".
Sorry, can't help.
 
Old 07-08-2015, 06:23 PM   #5
alirezan1
Member
 
Registered: Nov 2004
Location: Vancouver
Distribution: Ubuntu, CentOS, Mandriva, Gentoo, RedHat, Fedora, Knoppix
Posts: 150

Original Poster
Rep: Reputation: 15
Thanks guys!



I had another unit fail with the same problem. On closer inspection, I found a line that I missed copying the first time, which is a huge clue as to what happened:



Quote:

[ 17.375650] INFO: rcu_preempt self-detected stall on CPU { 0} (t=54852456 jiffies g=5901 c=5900 q=35)

[ 17.385483] CPU: 0 PID: 2988 Comm: ifconfig Not tainted 3.12.10-svn14 #5
[ 17.392539] Backtrace:
[ 17.395129] [<c00179fc>] (dump_backtrace+0x0/0x10c) from [<c0017b98>] (show_stack+0x18/0x1c)
[ 17.404023] r6:c087b838 r5:c087b838 r4:c087bb20 r3:00000000
[ 17.410018] [<c0017b80>] (show_stack+0x0/0x1c) from [<c05d5ee4>] (dump_stack+0x20/0x28)
[ 17.418467] [<c05d5ec4>] (dump_stack+0x0/0x28) from [<c0094894>] (rcu_check_callbacks+0x2a0/0x6f8)
[ 17.427922] [<c00945f4>] (rcu_check_callbacks+0x0/0x6f8) from [<c0051b60>] (update_process_times+0x44/0x70)
[ 17.438192] [<c0051b1c>] (update_process_times+0x0/0x70) from [<c00849fc>] (tick_sched_handle+0x50/0x5c)
[ 17.448176] r7:dc0e3c80 r6:c087b340 r5:0000f987 r4:0b91cbf7
[ 17.454173] [<c00849ac>] (tick_sched_handle+0x0/0x5c) from [<c0084bbc>] (tick_sched_timer+0x48/0x78)
[ 17.463856] [<c0084b74>] (tick_sched_timer+0x0/0x78) from [<c0065f8c>] (__run_hrtimer.isra.22+0x60/0xfc)
[ 17.473887] r7:00000000 r6:c0084b74 r5:c0879e00 r4:c087b340
[ 17.479906] [<c0065f2c>] (__run_hrtimer.isra.22+0x0/0xfc) from [<c0066770>] (hrtimer_interrupt+0x108/0x2f0)
[ 17.490221] r6:c0879e00 r5:0000f987 r4:0b91c863 r3:0000f987
[ 17.496244] [<c0066668>] (hrtimer_interrupt+0x0/0x2f0) from [<c002b678>] (omap2_gp_timer_interrupt+0x2c/0x3c)
[ 17.506747] [<c002b64c>] (omap2_gp_timer_interrupt+0x0/0x3c) from [<c00766d4>] (handle_irq_event_percpu+0x54/0x1b8)
[ 17.517799] [<c0076680>] (handle_irq_event_percpu+0x0/0x1b8) from [<c0076890>] (handle_irq_event+0x58/0x80)
[ 17.528116] [<c0076838>] (handle_irq_event+0x0/0x80) from [<c0078eb4>] (handle_level_irq+0x90/0x108)
[ 17.537778] r5:00000054 r4:dd806cc0
[ 17.541580] [<c0078e24>] (handle_level_irq+0x0/0x108) from [<c0075f70>] (generic_handle_irq+0x28/0x38)
[ 17.551426] r4:00000054 r3:c0078e24
[ 17.555229] [<c0075f48>] (generic_handle_irq+0x0/0x38) from [<c00156c0>] (handle_IRQ+0x38/0x8c)
[ 17.564438] r4:c0882f44 r3:00000110
[ 17.568242] [<c0015688>] (handle_IRQ+0x0/0x8c) from [<c00087cc>] (omap3_intc_handle_irq+0x68/0x7c)
[ 17.577675] r6:c08b6970 r5:dc0e3c80 r4:fa200000 r3:00000080
[ 17.583669] [<c0008764>] (omap3_intc_handle_irq+0x0/0x7c) from [<c05da580>] (__irq_svc+0x40/0x74)
[ 17.593023] Exception stack(0xdc0e3c80 to 0xdc0e3cc8)
[ 17.598353] 3c80: 00000000 c08b7a40 00000000 00000100 00000202 00000054 c08b7a84 c08b7a80
[ 17.606980] 3ca0: dc0e2000 dc0e2000 00000001 dc0e3d14 dc0e3cc8 dc0e3cc8 c004b300 c004b314
[ 17.615594] 3cc0: 20000113 ffffffff
[ 17.619265] r7:dc0e3cb4 r6:ffffffff r5:20000113 r4:c004b314
[ 17.625262] [<c004b288>] (__do_softirq+0x0/0x1c4) from [<c004b4ec>] (do_softirq+0x54/0x60)
[ 17.633977] [<c004b498>] (do_softirq+0x0/0x60) from [<c004b79c>] (irq_exit+0xac/0xf4)
[ 17.642228] r4:dc0e2000 r3:00000000
[ 17.646016] [<c004b6f0>] (irq_exit+0x0/0xf4) from [<c00156c4>] (handle_IRQ+0x3c/0x8c)
[ 17.654267] r4:c0882f44 r3:00000110
[ 17.658054] [<c0015688>] (handle_IRQ+0x0/0x8c) from [<c00087cc>] (omap3_intc_handle_irq+0x68/0x7c)
[ 17.667488] r6:c08b6970 r5:dc0e3d88 r4:fa200000 r3:00000080
[ 17.673481] [<c0008764>] (omap3_intc_handle_irq+0x0/0x7c) from [<c05da580>] (__irq_svc+0x40/0x74)
[ 17.682827] Exception stack(0xdc0e3d88 to 0xdc0e3dd0)
[ 17.688152] 3d80: ddd5ec40 00000000 fa1cc000 c03a7c00 ddd5e800 ddd5ec40
[ 17.696771] 3da0: 00000001 00040081 00008914 dc0e2000 00000001 dc0e3de4 dc0e3d98 dc0e3dd0
[ 17.705386] 3dc0: c03a7c38 c03a7580 60000013 ffffffff
[ 17.710708] r7:dc0e3dbc r6:ffffffff r5:60000013 r4:c03a7580
[ 17.716705] [<c03a74e0>] (c_can_close+0x0/0x114) from [<c04f2bc0>] (__dev_close_many+0x90/0xd8)
[ 17.725866] r5:dc0e3e00 r4:ddd5e800
[ 17.729651] [<c04f2b30>] (__dev_close_many+0x0/0xd8) from [<c04f2c38>] (__dev_close+0x30/0x48)
[ 17.738720] r5:000000c0 r4:ddd5e800
[ 17.742506] [<c04f2c08>] (__dev_close+0x0/0x48) from [<c04f70ac>] (__dev_change_flags+0x90/0x140)
[ 17.751862] [<c04f701c>] (__dev_change_flags+0x0/0x140) from [<c04f71ec>] (dev_change_flags+0x18/0x50)
[ 17.761663] r7:00000000 r6:00000000 r5:00040081 r4:ddd5e800
[ 17.767660] [<c04f71d4>] (dev_change_flags+0x0/0x50) from [<c0567634>] (devinet_ioctl+0x620/0x6d8)
[ 17.777096] r6:00000000 r5:ddcdfe8c r4:00000000 r3:00008914
[ 17.783091] [<c0567014>] (devinet_ioctl+0x0/0x6d8) from [<c056862c>] (inet_ioctl+0x1b4/0x1c8)
[ 17.792087] [<c0568478>] (inet_ioctl+0x0/0x1c8) from [<c04e16a0>] (sock_ioctl+0x70/0x29c)
[ 17.800710] [<c04e1630>] (sock_ioctl+0x0/0x29c) from [<c00f0c30>] (do_vfs_ioctl+0x84/0x5c4)
[ 17.809505] r6:00008914 r5:dc140b40 r4:00000000 r3:c04e1630
[ 17.815502] [<c00f0bac>] (do_vfs_ioctl+0x0/0x5c4) from [<c00f11e4>] (SyS_ioctl+0x74/0x84)
[ 17.824125] [<c00f1170>] (SyS_ioctl+0x0/0x84) from [<c00147c0>] (ret_fast_syscall+0x0/0x30)

The first line:
[ 17.375650] INFO: rcu_preempt self-detected stall on CPU { 0} (t=54852456 jiffies g=5901 c=5900 q=35)

shows that an RCU stall was detected when the system locked up. I read a lot about it, and there is a way to suppress the detection. I'm not 100% sure whether this is a must-have feature, what disabling it could do to a system, or what issues it could cause.
Here is what I found that suppresses it:

echo 1 > /sys/module/rcutree/parameters/rcu_cpu_stall_suppress
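
The module directory that holds this parameter is not the same on every kernel: the path above uses rcutree, while other kernels expose it under rcupdate. As a hedged sketch, the hypothetical helper below (show_rcu_stall_suppress is an illustrative name, not a standard tool) globs for the parameter instead of hard-coding the directory and prints its current value. Note that this setting only silences the stall warning messages. It does not remove whatever is keeping the CPU busy, which in the trace above appears to be c_can_close.

```shell
#!/bin/sh
# Hypothetical helper: locate rcu_cpu_stall_suppress under a sysfs module
# tree and print "path value" for each match. The tree root is an argument
# so the function can be tried on a test directory.
show_rcu_stall_suppress() {
    sys_root="${1:-/sys/module}"
    found=1
    for p in "$sys_root"/*/parameters/rcu_cpu_stall_suppress; do
        # An unmatched glob stays literal and fails this readability test.
        [ -r "$p" ] || continue
        echo "$p $(cat "$p")"
        found=0
    done
    # Exit status 0 if at least one parameter file was found, 1 otherwise.
    return $found
}
```

A printed value of 0 means stall warnings are still enabled. If nothing is printed, the running kernel does not expose the parameter under the given tree.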







Can someone help me understand this feature, why I might need it turned on, and what I could run into if I suppress it?

Also, if someone knows how to force a system reboot on RCU stall detection, that would be great!
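
One possible direction, offered as a sketch rather than a confirmed fix: newer kernels have a kernel.panic_on_rcu_stall sysctl (to my knowledge added around 4.9, so a 3.12 kernel like the one in this thread would most likely not have it). Where it exists, setting it to 1 turns a detected RCU stall into a panic, and kernel.panic=5 would then trigger the reboot. The function name enable_panic_on_rcu_stall below is made up for illustration. On kernels without the knob, a hardware watchdog is the usual fallback, which is not shown here.

```shell
#!/bin/sh
# Hypothetical helper: enable panic-on-RCU-stall if the running kernel
# offers the knob. proc_root is an argument so the function can be
# exercised on a test tree instead of the real /proc/sys.
enable_panic_on_rcu_stall() {
    proc_root="${1:-/proc/sys}"
    knob="$proc_root/kernel/panic_on_rcu_stall"
    if [ -w "$knob" ]; then
        echo 1 > "$knob"
        echo "enabled: $knob"
        return 0
    fi
    # Assumption: a missing file means the kernel predates this sysctl.
    echo "not available: $knob (kernel too old or not writable)"
    return 1
}
```

Run as root with no arguments to act on the live system. A "not available" result on a 3.12 kernel would be expected.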



Thanks!
 
  

