LinuxQuestions.org


Phaethar 02-16-2012 11:32 AM

Network/kernel crash on database server
 
Hey all,

I'm looking for some help figuring out why a CentOS 6.2 system has had its network service crash completely since I applied the new kernel (2.6.32-220.4.2.el6.x86_64) on Tuesday.

The system is a database server, so it's pretty hefty:

(2) Xeon X5650 6-core CPUs (12 cores total, 24 threads)
Asus Z8NA-D6C Server board
48GB ECC Registered Memory
LSI 9260-8i storage controller, SAS array

The system had been running fine for a while up until Tuesday. When the new kernel came out, I applied it to the system and rebooted without any issues. Since then, however, the system has appeared to go down overnight on each of the past two nights. As it turns out, it's the network that's down: the system is still up, but no traffic will pass on the network adapter at all. Restarting the network service does not fix it either; a full reboot is required to get the system back up.

Here is the kernel trace that shows up in the logs:

Code:

Feb 15 21:57:37 MySQL kernel: ------------[ cut here ]------------
Feb 15 21:57:37 MySQL kernel: WARNING: at net/sched/sch_generic.c:261 dev_watchdog+0x26d/0x280() (Not tainted)
Feb 15 21:57:37 MySQL kernel: Hardware name: System Product Name
Feb 15 21:57:37 MySQL kernel: NETDEV WATCHDOG: eth0 (e1000e): transmit queue 0 timed out
Feb 15 21:57:37 MySQL kernel: Modules linked in: autofs4 sunrpc cpufreq_ondemand acpi_cpufreq freq_table mperf ipt_REJECT nf_conntrack_ipv4 nf_defrag_ipv4 iptable_filter ip_tables ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 xt_state nf_conntrack ip6table_filter ip6_tables ipv6 microcode i2c_i801 i2c_core sg iTCO_wdt iTCO_vendor_support e1000e snd_hda_intel snd_hda_codec snd_hwdep snd_seq snd_seq_device snd_pcm snd_timer snd soundcore snd_page_alloc ioatdma dca i7core_edac edac_core shpchp ext4 mbcache jbd2 sd_mod crc_t10dif sr_mod cdrom megaraid_sas ahci dm_mirror dm_region_hash dm_log dm_mod [last unloaded: scsi_wait_scan]
Feb 15 21:57:37 MySQL kernel: Pid: 0, comm: swapper Not tainted 2.6.32-220.4.2.el6.x86_64 #1
Feb 15 21:57:37 MySQL kernel: Call Trace:
Feb 15 21:57:37 MySQL kernel: <IRQ>  [<ffffffff81069a17>] ? warn_slowpath_common+0x87/0xc0
Feb 15 21:57:37 MySQL kernel: [<ffffffff81069b06>] ? warn_slowpath_fmt+0x46/0x50
Feb 15 21:57:37 MySQL kernel: [<ffffffff8144a4fd>] ? dev_watchdog+0x26d/0x280
Feb 15 21:57:37 MySQL kernel: [<ffffffff8108b3fd>] ? insert_work+0x6d/0xb0
Feb 15 21:57:37 MySQL kernel: [<ffffffff8107bbe5>] ? internal_add_timer+0xb5/0x110
Feb 15 21:57:37 MySQL kernel: [<ffffffff8144a290>] ? dev_watchdog+0x0/0x280
Feb 15 21:57:37 MySQL kernel: [<ffffffff8107c7f7>] ? run_timer_softirq+0x197/0x340
Feb 15 21:57:37 MySQL kernel: [<ffffffff810a0a10>] ? tick_sched_timer+0x0/0xc0
Feb 15 21:57:37 MySQL kernel: [<ffffffff8102ad6d>] ? lapic_next_event+0x1d/0x30
Feb 15 21:57:37 MySQL kernel: [<ffffffff81072001>] ? __do_softirq+0xc1/0x1d0
Feb 15 21:57:37 MySQL kernel: [<ffffffff81095610>] ? hrtimer_interrupt+0x140/0x250
Feb 15 21:57:37 MySQL kernel: [<ffffffff8100c24c>] ? call_softirq+0x1c/0x30
Feb 15 21:57:37 MySQL kernel: [<ffffffff8100de85>] ? do_softirq+0x65/0xa0
Feb 15 21:57:37 MySQL kernel: [<ffffffff81071de5>] ? irq_exit+0x85/0x90
Feb 15 21:57:37 MySQL kernel: [<ffffffff814f4d70>] ? smp_apic_timer_interrupt+0x70/0x9b
Feb 15 21:57:37 MySQL kernel: [<ffffffff8100bc13>] ? apic_timer_interrupt+0x13/0x20
Feb 15 21:57:37 MySQL kernel: <EOI>  [<ffffffff812c49de>] ? intel_idle+0xde/0x170
Feb 15 21:57:37 MySQL kernel: [<ffffffff812c49c1>] ? intel_idle+0xc1/0x170
Feb 15 21:57:37 MySQL kernel: [<ffffffff813f9ef7>] ? cpuidle_idle_call+0xa7/0x140
Feb 15 21:57:37 MySQL kernel: [<ffffffff81009e06>] ? cpu_idle+0xb6/0x110
Feb 15 21:57:37 MySQL kernel: [<ffffffff814e5ebc>] ? start_secondary+0x202/0x245
Feb 15 21:57:37 MySQL kernel: ---[ end trace 3d978b072936f556 ]---
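
For anyone who wants to find out how often this watchdog timeout has fired, here is a minimal Python sketch that picks the "NETDEV WATCHDOG" lines out of syslog-style text and pulls out the interface, driver, and queue number. It only assumes the line format shown in the trace above; the names WATCHDOG_RE and find_watchdog_events are made up for illustration and are not part of any tool mentioned in this thread.

```python
import re

# Matches the watchdog line format seen in the trace above, e.g.
# "NETDEV WATCHDOG: eth0 (e1000e): transmit queue 0 timed out"
WATCHDOG_RE = re.compile(
    r"NETDEV WATCHDOG: (?P<iface>\S+) \((?P<driver>[^)]+)\): "
    r"transmit queue (?P<queue>\d+) timed out"
)


def find_watchdog_events(lines):
    """Return (timestamp, iface, driver, queue) for each watchdog timeout line."""
    events = []
    for line in lines:
        match = WATCHDOG_RE.search(line)
        if match:
            # Syslog lines start with "Mon DD HH:MM:SS"; keep those three fields.
            stamp = " ".join(line.split()[:3])
            events.append(
                (stamp, match.group("iface"), match.group("driver"),
                 int(match.group("queue")))
            )
    return events


# Small demonstration using one line copied from the trace above.
sample = [
    "Feb 15 21:57:37 MySQL kernel: NETDEV WATCHDOG: eth0 (e1000e): "
    "transmit queue 0 timed out",
    "Feb 15 21:57:37 MySQL kernel: Call Trace:",
]
for event in find_watchdog_events(sample):
    print(event)
```

To scan a real log, pass an open file object instead of the sample list. On CentOS the kernel messages usually land in /var/log/messages, but that depends on how syslog is configured.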

I'm not much of a debugger, so I'm not really sure what to make of this report. I do seem to recall that the counts for errors, drops, and overruns get pretty high after this happens. I have yet to check whether they start rising before things crash. Right now, after a reboot, everything looks great.
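
One way to check whether those counters climb before the hang is to snapshot them periodically and log the differences. Below is a rough Python sketch, assuming the usual Linux sysfs layout under /sys/class/net/<iface>/statistics/. The function names and the choice of counters are illustrative only; as far as I know, the fifo counters are what ifconfig reports as overruns.

```python
import os

# Counter files that normally exist under /sys/class/net/<iface>/statistics/.
COUNTERS = (
    "rx_errors", "tx_errors",
    "rx_dropped", "tx_dropped",
    "rx_fifo_errors", "tx_fifo_errors",
)


def read_counters(iface, base="/sys/class/net"):
    """Read the selected error/drop counters for one interface from sysfs."""
    stats_dir = os.path.join(base, iface, "statistics")
    result = {}
    for name in COUNTERS:
        with open(os.path.join(stats_dir, name)) as handle:
            result[name] = int(handle.read().strip())
    return result


def counter_deltas(before, after):
    """Return only the counters that changed between two snapshots."""
    return {
        name: after[name] - before[name]
        for name in before
        if after[name] != before[name]
    }
```

Calling read_counters("eth0") every few minutes (from cron, for example) and writing the output of counter_deltas to a log would show whether errors start piling up in the hours before the network stops passing traffic.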

I have replaced the network cable and moved to a different port on the switch.

I've also applied this new kernel to other systems without any issues, so it seems to be related to this system only. However, this is the only system running this kind of hardware.

Any thoughts or suggestions on what I might be able to look for to solve this?

Thanks!

cbtshare 02-16-2012 03:06 PM

Your solution is here; it seems you need to change the initmode (presumably the e1000e driver's IntMode module parameter):

http://sourceforge.net/tracker/index...02&atid=447449


All times are GMT -5.