LinuxQuestions.org


DavidDiggs 06-01-2009 11:01 PM

Crashing "BUG: soft lockup - CPU#1 stuck for 11s"
 
I was running Ubuntu 8.04.2 and it started crashing after 12 hours or more of operation, so I switched to Debian 5.0.1, only to find the same errors.

Currently running Debian 5.0.1 amd64 with the default kernel

kern.log shows numerous errors like the one below.
After a crash, the first line of the error repeats with the same CPU# and what I assume is the process name and PID.

With each crash, the CPU and process listed may be different.

I've updated the BIOS to the latest version and updated all my packages.

Currently I'm running memtest to see if it's a memory issue. Any help would be wonderful.

Attached is a more detailed listing of the errors

Code:

May 29 10:36:11 Server-x kernel: [30105.568899] BUG: soft lockup - CPU#1 stuck for 11s! [gnome-screensav:10913]
May 29 10:36:11 Server-x kernel: [30105.568905] CPU 1:
May 29 10:36:11 Server-x kernel: [30105.568907] Modules linked in: vmnet vsock(F) vmci vmmon rfcomm l2cap bluetooth nfsd lockd nfs_acl auth_rpcgss sunrpc exportfs ppdev ipv6 cpufreq_conservative cpufreq_stats cpufreq_userspace cpufreq_ondemand freq_table cpufreq_powersave sbs video output sbshc dock container battery iptable_filter ip_tables x_tables reiserfs ac it87 hwmon_vid sbp2 lp loop af_packet snd_hda_intel snd_pcm_oss snd_mixer_oss snd_pcm snd_page_alloc snd_hwdep snd_seq_dummy snd_seq_oss snd_seq_midi snd_rawmidi snd_seq_midi_event snd_seq snd_timer snd_seq_device serio_raw snd psmouse i2c_nforce2 button parport_pc parport shpchp pci_hotplug k8temp i2c_core soundcore evdev pcspkr ext3 jbd mbcache sg sr_mod cdrom usbhid hid sd_mod pata_amd sata_nv floppy sata_sil24 ohci_hcd ehci_hcd ohci1394 ata_generic ieee1394 forcedeth usbcore pata_acpi libata scsi_mod raid10 raid456 async_xor async_memcpy async_tx xor raid1 raid0 multipath linear md_mod thermal processor fan fbcon tileblit font bitblit softcursor fuse
May 29 10:36:11 Server-x kernel: [30105.569001] Pid: 10913, comm: gnome-screensav Tainted: GF      2.6.24-24-generic #1
May 29 10:36:11 Server-x kernel: [30105.569003] RIP: 0010:[do_page_fault+0xc0/0x840]  [do_page_fault+0xc0/0x840] do_page_fault+0xc0/0x840
May 29 10:36:11 Server-x kernel: [30105.569010] RSP: 0018:ffff810100007d58  EFLAGS: 00000246
May 29 10:36:11 Server-x kernel: [30105.569012] RAX: ffffffff80629640 RBX: ffff81016fce86e0 RCX: 0000000000001000
May 29 10:36:11 Server-x kernel: [30105.569014] RDX: ffff810080a12000 RSI: 0000000000000000 RDI: ffff810100007e58
May 29 10:36:11 Server-x kernel: [30105.569016] RBP: 0000000000000018 R08: 0000000000001000 R09: 000000000000000a
May 29 10:36:11 Server-x kernel: [30105.569018] R10: 0000000000000000 R11: 0000000000000246 R12: ffff810100007cd8
May 29 10:36:11 Server-x kernel: [30105.569020] R13: 0000000000000246 R14: 0000000000000010 R15: ffffffff8029644a
May 29 10:36:11 Server-x kernel: [30105.569023] FS:  00007f1716c677a0(0000) GS:ffff81017b401900(0000) knlGS:0000000000000000
May 29 10:36:11 Server-x kernel: [30105.569025] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
May 29 10:36:11 Server-x kernel: [30105.569027] CR2: 00007f171362ca52 CR3: 000000015cbf7000 CR4: 00000000000006e0
May 29 10:36:11 Server-x kernel: [30105.569029] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
May 29 10:36:11 Server-x kernel: [30105.569031] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
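Since the CPU and process named in the first line change from crash to crash, it can help to tally them across the whole log and see whether one CPU dominates. Below is a minimal sketch in Python that does this, assuming every report starts with a line in the same format as the "BUG: soft lockup" line above. The function names and the /var/log/kern.log default path are illustrative assumptions, not something from the original post.

```python
import re
from collections import Counter

# Matches the first line of a soft lockup report, in the format shown in the
# log excerpt above: "BUG: soft lockup - CPU#1 stuck for 11s! [name:pid]".
LOCKUP_RE = re.compile(
    r"BUG: soft lockup - CPU#(?P<cpu>\d+) stuck for (?P<secs>\d+)s! "
    r"\[(?P<comm>.+):(?P<pid>\d+)\]"
)


def summarize_lockups(lines):
    """Count soft lockup reports per (CPU number, process name)."""
    counts = Counter()
    for line in lines:
        match = LOCKUP_RE.search(line)
        if match:
            counts[(int(match.group("cpu")), match.group("comm"))] += 1
    return counts


def print_summary(path="/var/log/kern.log"):
    """Print one line per (CPU, process) pair, most frequent first.

    The default path is an assumption; point it at wherever kern.log lives.
    """
    with open(path, errors="replace") as log:
        for (cpu, comm), n in summarize_lockups(log).most_common():
            print(f"CPU#{cpu}  {comm:<16} {n}")
```

Calling print_summary() on the log prints one row per CPU and process pair. If nearly all reports name the same CPU, that points toward that CPU or its socket; an even spread would be more consistent with a board, memory, or power problem, though the tally alone cannot prove either.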


Matir 06-03-2009 09:13 PM

At work we had an xSeries 460 throwing the same sort of errors: turned out to be a bad CPU. May or may not be the same issue, but ours was fixed upon removing that CPU.

DavidDiggs 06-05-2009 12:43 AM

Sigh. No joy. Memtest gave no errors.

Removed one CPU and still got the "cpu stuck" message. Swapped CPUs, and now the system just hangs shortly after the GRUB screen.

And the latest Debian and Ubuntu installers both crash immediately and the error messages scroll too fast to be of any use.

I have a feeling it's the motherboard; it's been RMA'd 3 times and cost me nearly $200 in shipping. It's an obsolete AMD FX system, so I'm going to give up on it for now, unless I can find a replacement mobo for super cheap.

Sigh, $1200 in parts wasting away, for now.


All times are GMT -5.