LinuxQuestions.org

LinuxQuestions.org (/questions/)
-   Linux - Kernel (https://www.linuxquestions.org/questions/linux-kernel-70/)
-   -   8250 serial driver causing NMI 2.6.35.14 smp (https://www.linuxquestions.org/questions/linux-kernel-70/8250-serial-driver-causing-nmi-2-6-35-14-smp-944131/)

helpmelearn 05-09-2012 04:54 PM

8250 serial driver causing NMI 2.6.35.14 smp
 
Hi,
I am trying to understand if this is a known issue and/or how I could go about debugging this.

The moment I have lot of data going through the serial console (klogd -2 -c 7), I see NMI watchdog kick in.

[ 132.925785] Freeing cmd ffff88023d343050 (tag 1155072)
[ 132.925793] Calling target's on_free_cmd(ffff88023d343050)
[ 132.925806] Calling xmit_response(ffff88023eaee730)
[ 132.925832] tmp ffff88023ea32da0, cache num 0, pages 1, sg_count 1
[ 132.925840] Adding in ffff88013a6138a0 (list ffff88013a6138a0)
[ 169.128288] INFO: rcu_sched_state detected stall on CPU 0 (t=36552 jiffies)
[ 169.128951] INFO: rcu_sched_state detected stalls on CPUs/tasks: { 0} (detected by 1, t=36552 jiffies)
[ 169.128955] sending NMI to all CPUs:
[ 169.128961] NMI backtrace for cpu 0
[ 169.128962] CPU 0
[ 169.128963] Modules linked in: ipmi_devintf ipmi_si ipmi_msghandler iscsi_scst crc32c_intel iptable_filter bonding xt_NOTRACK iptable_nat nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 iptable_raw ip_tables x_tables nf_conntrack ib_srpt ib_uverbs ib_umad ib_cm ib_sa mlx4_ib ib_mad mlx4_core ib_core qla2x00tgt qla2xxx scsi_transport_fc scst_vdisk scst vtms softdog coretemp hwmon dm_mod video backlight output serio_raw rtc_cmos rtc_core rtc_lib button ixgbe i2c_i801 i2c_core uhci_hcd pcspkr
[ 169.128984]
[ 169.128987] Pid: 3440, comm: scstd15 Not tainted 2.6.35-14EIsmp g6os X8DTH/X8DTH-i/6/iF/6F
[ 169.128988] RIP: 0010:[<ffffffff81260347>] [<ffffffff81260347>] io_serial_in+0xc/0x10
[ 169.128996] RSP: 0018:ffff880002403be0 EFLAGS: 00000002
[ 169.128997] RAX: 000000000000f100 RBX: ffffffff821b3020 RCX: 0000000000000000
[ 169.128999] RDX: 00000000000003fd RSI: 0000000000000005 RDI: ffffffff821b3020
[ 169.129001] RBP: 0000000000000000 R08: 000000004be38846 R09: 0000000000000000
[ 169.129002] R10: 0000000000000006 R11: ffffffff811e7b2a R12: 00000000000025fd
[ 169.129004] R13: 0000000000000020 R14: 000000000000004f R15: ffffffff81ca4bd2
[ 169.129006] FS: 0000000000000000(0000) GS:ffff880002400000(0000) knlGS:0000000000000000
[ 169.129008] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[ 169.129009] CR2: 00000000006bb424 CR3: 0000000218ec2000 CR4: 00000000000006f0
[ 169.129011] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 169.129012] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[ 169.129014] Process scstd15 (pid: 3440, threadinfo ffff880104328000, task ffff8801043be2c0)
[ 169.129016] Stack:
[ 169.129016] ffffffff8125e6bf ffffffff0000ffff ffffffff821b3020 0000000000000002
[ 169.129018] <0> 0000000000000020 ffffffff821b3020 ffffffff8125e74d 0000000000000003
[ 169.129020] <0> ffffffff81ca4bd4 ffffffff8125e736 ffffffff8125b693 0000000000000005
[ 169.128963] Modules linked in: ipmi_devintf ipmi_si ipmi_msghandler iscsi_scst crc32c_intel iptable_filter bonding xt_NOTRACK iptable_nat nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 iptable_raw ip_tables x_tables nf_conntrack ib_srpt ib_uverbs ib_umad ib_cm ib_sa mlx4_ib ib_mad mlx4_core ib_core qla2x00tgt qla2xxx scsi_transport_fc scst_vdisk scst vtms softdog coretemp hwmon dm_mod video backlight output serio_raw rtc_cmos rtc_core rtc_lib button ixgbe i2c_i801 i2c_core uhci_hcd pcspkr
[ 169.129023] Call Trace:
[ 169.627756] <IRQ> [<ffffffff8125e6b2>] ? wait_for_xmitr+0x43/0x9b
[ 169.627759] [<ffffffff8125e72e>] ? serial8250_console_putchar+0x17/0x2b
[ 169.627762] [<ffffffff8125e717>] ? serial8250_console_putchar+0x0/0x2b
[ 169.627764] [<ffffffff8125b693>] ? uart_console_write+0x36/0x44
[ 169.627766] [<ffffffff81260243>] ? serial8250_console_write+0xa3/0xe6
[ 169.627769] [<ffffffff8103a6b5>] ? __call_console_drivers+0x64/0x75
[ 169.627771] [<ffffffff8103aabe>] ? release_console_sem+0x111/0x18e
[ 169.627773] [<ffffffff8103b046>] ? vprintk+0x2e3/0x316
[ 169.627775] [<ffffffff8103b0c7>] ? printk+0x4e/0x56
[ 169.627777] [<ffffffff8103b0c7>] ? printk+0x4e/0x56
[ 169.627779] [<ffffffff81053057>] ? enqueue_hrtimer+0x93/0xa2

from gdb, I see the culprit is:
http://lxr.linux.no/#linux+v2.6.35.1...l/8250.c#L1855

(gdb) list *(wait_for_xmitr+0x43)
0x5bb is in wait_for_xmitr (drivers/serial/8250.c:1862).
1857 {
1858 unsigned int status, tmout = 10000;
1859
1860 /* Wait up to 10ms for the character(s) to be sent. */
1861 do {
1862 status = serial_in(up, UART_LSR);
1863
1864
1865 up->lsr_saved_flags |= status & LSR_SAVE_FLAGS;
1866 touch_nmi_watchdog();

The NMI said CPU stuck for 36552 jiffies while this loop only loops for 10ms. I even added a few touch_nmi_watchdog() in wait_for_xmitr() loop but still hit NMI in the same loop.

I am lost. Could someone please help? How could I debug this? Or does it make sense to report to kernel.org bugzilla?

Thanks.

nini09 05-14-2012 02:23 PM

If you check 2.6.31.13 linux kernel, the kernel doesn't touch NMI during 10ms loop. But when doing 1s loop, NMI is touched.


All times are GMT -5. The time now is 07:57 PM.