More and more crashes. PLEASE tell me what this error means!

jeffreybluml · 05-26-2004, 05:44 PM

Seriously, this is getting bad. I just had to try to reboot 14 times before it succeeded. This was after an out-of-the-blue system freeze. This is all I ever find in /var/log/messages:

May 26 04:02:03 localhost kernel: smb_request: result -104, setting invalid
May 26 04:02:03 localhost kernel: smb_retry: successful, new pid=4373, generation=2
May 26 04:05:09 localhost kernel: ------------[ cut here ]------------
May 26 04:05:09 localhost kernel: kernel BUG at page_alloc.c:235!
May 26 04:05:09 localhost kernel: invalid operand: 0000
May 26 04:05:09 localhost kernel: cmpci gameport soundcore agpgart nvidia parport_pc lp parport autofs smbfs via-rhine mii ipt_REJECT ipt_state ip_conntrack iptable_filter ip_tables floppy sg
May 26 04:05:09 localhost kernel: CPU: 0
May 26 04:05:09 localhost kernel: EIP: 0060:[<c013b0ad>] Tainted: P
May 26 04:05:09 localhost kernel: EFLAGS: 00010202
May 26 04:05:09 localhost kernel:
May 26 04:05:09 localhost kernel: EIP is at rmqueue [kernel] 0x1fd (2.4.22-1.2188.nptl)
May 26 04:05:09 localhost kernel: eax: 00000040 ebx: c1091af0 ecx: 00001000 edx: 0000308f
May 26 04:05:09 localhost kernel: esi: c033bc58 edi: c033bc90 ebp: c033bc58 esp: d9371e5c
May 26 04:05:09 localhost kernel: ds: 0068 es: 0068 ss: 0068
May 26 04:05:09 localhost kernel: Process prelink (pid: 7314, stackpage=d9371000)
May 26 04:05:09 localhost kernel: Stack: 00000002 c1091af0 d9371e80 0000208f 00000296 00000000 c033bc58 c033bc58
May 26 04:05:09 localhost kernel: c033be2c 00000001 d8f28dfc c013b344 006033d4 00000011 d9371f0c e0afd408
May 26 04:05:09 localhost kernel: dfb90000 006033d4 c033bc58 c033be28 00000000 000001d2 c63d8800 00104025
May 26 04:05:09 localhost kernel: Call Trace: [<c013b344>] __alloc_pages [kernel] 0x64 (0xd9371e88)
May 26 04:05:09 localhost kernel: [<e0afd408>] _nv000183rm [nvidia] 0x750 (0xd9371e98)
May 26 04:05:09 localhost kernel: [<c012ff9c>] do_anonymous_page [kernel] 0x5c (0xd9371ec8)
May 26 04:05:09 localhost kernel: [<c01302a7>] handle_mm_fault [kernel] 0x77 (0xd9371ee0)
May 26 04:05:09 localhost kernel: [<c01178b8>] do_page_fault [kernel] 0x128 (0xd9371f0c)
May 26 04:05:09 localhost kernel: [<e099d37e>] _nv000897rm [nvidia] 0x4e (0xd9371f34)
May 26 04:05:09 localhost kernel: [<c0137bf3>] do_mremap [kernel] 0x6e3 (0xd9371f48)
May 26 04:05:09 localhost kernel: [<e097e9d9>] nv_kern_isr [nvidia] 0x1d (0xd9371f60)
May 26 04:05:09 localhost kernel: [<c0137cec>] sys_mremap [kernel] 0x7c (0xd9371fa0)
May 26 04:05:09 localhost kernel: [<c0117790>] do_page_fault [kernel] 0x0 (0xd9371fb0)
May 26 04:05:09 localhost kernel: [<c01096e8>] error_code [kernel] 0x34 (0xd9371fb8)
May 26 04:05:09 localhost kernel:
May 26 04:05:09 localhost kernel:
May 26 04:05:09 localhost kernel: Code: 0f 0b eb 00 e3 c0 27 c0 8b 43 18 a9 80 00 00 00 74 08 0f 0b
May 26 10:18:46 localhost modprobe: modprobe: Can't locate module sound-slot-1
May 26 10:18:46 localhost modprobe: modprobe: Can't locate module sound-service-1-0
May 26 10:18:46 localhost modprobe: modprobe: Can't locate module sound-slot-1
May 26 10:18:46 localhost modprobe: modprobe: Can't locate module sound-service-1-0
May 26 13:41:50 localhost modprobe: modprobe: Can't locate module sound-slot-1
May 26 13:41:50 localhost modprobe: modprobe: Can't locate module sound-service-1-0
May 26 13:41:50 localhost modprobe: modprobe: Can't locate module sound-slot-1
May 26 13:41:50 localhost modprobe: modprobe: Can't locate module sound-service-1-0
May 26 13:46:07 localhost modprobe: modprobe: Can't locate module sound-slot-1
May 26 13:46:07 localhost modprobe: modprobe: Can't locate module sound-service-1-0
May 26 13:46:07 localhost modprobe: modprobe: Can't locate module sound-slot-1
May 26 13:46:07 localhost modprobe: modprobe: Can't locate module sound-service-1-0
May 26 13:46:18 localhost modprobe: modprobe: Can't locate module sound-slot-1
May 26 13:46:18 localhost modprobe: modprobe: Can't locate module sound-service-1-0
May 26 13:46:18 localhost modprobe: modprobe: Can't locate module sound-slot-1
May 26 13:46:18 localhost modprobe: modprobe: Can't locate module sound-service-1-0
May 26 13:46:23 localhost modprobe: modprobe: Can't locate module sound-slot-1
May 26 13:46:23 localhost modprobe: modprobe: Can't locate module sound-service-1-0
May 26 13:46:23 localhost modprobe: modprobe: Can't locate module sound-slot-1
May 26 13:46:23 localhost modprobe: modprobe: Can't locate module sound-service-1-0
May 26 13:51:05 localhost kernel: <5>smb_trans2_request: result=-104, setting invalid
May 26 13:51:05 localhost kernel: smb_retry: successful, new pid=4373, generation=3

BTW, my sound is working fine...so I don't know what the heck that modprobe error is about.

PLEASE help out here, folks. I finally dumped Windows altogether and was doing well with my Fedora Core 1...until now...

sharper · 05-26-2004, 10:36 PM

Winging it here, but it sounds like your system was working and all of a sudden started hanging.

If so, did you make any changes to the system shortly before the problem started? Like recompiling the kernel, adding or removing some packages, or changing your hardware?

If you have, that would be a place to start looking.

darthtux · 05-26-2004, 10:53 PM

I think sharper is on the right track in asking whether you recompiled your kernel. Did you try to update the sound drivers? Install a new PCI card?

jeffreybluml · 05-27-2004, 09:57 AM

Near as I can tell, the only thing I did was install some new gtk2 themes from freshmeat.

Could this mess things up? Can a theme be made poorly and create problems?

Thanks for replying...

nwhite · 05-27-2004, 06:19 PM

OK, I'm new to all of this myself, so don't take my word as gospel.
Have you checked that you only have the sound cards/services you need in /etc/modules.conf? Commenting out those you don't need might be an idea.
I don't know why this could cause a system crash, but then, there's a lot I don't know...

Hope this helped.

darthtux · 05-27-2004, 06:44 PM

If you Google "Can't locate module sound-slot-1" there is some info on that. I don't know if that is causing the problem. If you know someone who has an nvidia driver, you might see if they are getting messages similar to the nvidia lines above. I don't have your driver, but I don't have any messages like that. It seems to be an error code while loading nvidia.

TigerOC · 05-28-2004, 02:07 AM

Quote:

Originally posted by jeffreybluml

May 26 04:05:09 localhost kernel: ------------[ cut here ]------------
May 26 04:05:09 localhost kernel: kernel BUG at page_alloc.c:235!
May 26 04:05:09 localhost kernel: invalid operand: 0000
May 26 04:05:09 localhost kernel: cmpci gameport soundcore agpgart nvidia parport_pc lp parport autofs smbfs via-rhine mii ipt_REJECT ipt_state ip_conntrack iptable_filter ip_tables floppy sg
May 26 04:05:09 localhost kernel: CPU: 0
May 26 04:05:09 localhost kernel: EIP: 0060:[<c013b0ad>] Tainted: P
May 26 04:05:09 localhost kernel: EFLAGS: 00010202
May 26 04:05:09 localhost kernel:
May 26 04:05:09 localhost kernel: EIP is at rmqueue [kernel] 0x1fd (2.4.22-1.2188.nptl)
May 26 04:05:09 localhost kernel: eax: 00000040 ebx: c1091af0 ecx: 00001000 edx: 0000308f
May 26 04:05:09 localhost kernel: esi: c033bc58 edi: c033bc90 ebp: c033bc58 esp: d9371e5c
May 26 04:05:09 localhost kernel: ds: 0068 es: 0068 ss: 0068
May 26 04:05:09 localhost kernel: Process prelink (pid: 7314, stackpage=d9371000)
May 26 04:05:09 localhost kernel: Stack: 00000002 c1091af0 d9371e80 0000208f 00000296 00000000 c033bc58 c033bc58
May 26 04:05:09 localhost kernel: c033be2c 00000001 d8f28dfc c013b344 006033d4 00000011 d9371f0c e0afd408
May 26 04:05:09 localhost kernel: dfb90000 006033d4 c033bc58 c033be28 00000000 000001d2 c63d8800 00104025
May 26 04:05:09 localhost kernel: Call Trace: [<c013b344>] __alloc_pages [kernel] 0x64 (0xd9371e88)
May 26 04:05:09 localhost kernel: [<e0afd408>] _nv000183rm [nvidia] 0x750 (0xd9371e98)
May 26 04:05:09 localhost kernel: [<c012ff9c>] do_anonymous_page [kernel] 0x5c (0xd9371ec8)
May 26 04:05:09 localhost kernel: [<c01302a7>] handle_mm_fault [kernel] 0x77 (0xd9371ee0)
May 26 04:05:09 localhost kernel: [<c01178b8>] do_page_fault [kernel] 0x128 (0xd9371f0c)
May 26 04:05:09 localhost kernel: [<e099d37e>] _nv000897rm [nvidia] 0x4e (0xd9371f34)
May 26 04:05:09 localhost kernel: [<c0137bf3>] do_mremap [kernel] 0x6e3 (0xd9371f48)
May 26 04:05:09 localhost kernel: [<e097e9d9>] nv_kern_isr [nvidia] 0x1d (0xd9371f60)
May 26 04:05:09 localhost kernel: [<c0137cec>] sys_mremap [kernel] 0x7c (0xd9371fa0)
May 26 04:05:09 localhost kernel: [<c0117790>] do_page_fault [kernel] 0x0 (0xd9371fb0)
May 26 04:05:09 localhost kernel: [<c01096e8>] error_code [kernel] 0x34 (0xd9371fb8)

The report is that there is a kernel bug:

May 26 04:05:09 localhost kernel: ------------[ cut here ]------------
May 26 04:05:09 localhost kernel: kernel BUG at page_alloc.c:235!
May 26 04:05:09 localhost kernel: invalid operand: 0000

This should be submitted as a bug report. The messages lower down that I have highlighted I have also never seen before.

Which kernel is this?

I Googled around and it seems this could be a hardware problem, possibly memory; have a look at this thread.
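For anyone who wants to pull the relevant bits out of /var/log/messages without reading the whole dump, here is a rough Python sketch. It looks for "kernel BUG at" lines and counts which module each call-trace frame is attributed to. The function name and the regular expressions are my own guesses based on the lines quoted in this thread, not any standard tool. Also note that, as far as I know, a 2.4 call trace lists every address on the stack that looks like kernel code, so a module showing up in the tally is a hint at most, not proof that it caused the crash.

```python
import re
from collections import Counter

# Patterns are assumptions derived from the 2.4-style log lines in this thread.
BUG_RE = re.compile(r"kernel BUG at (\S+?):(\d+)!")
FRAME_RE = re.compile(r"\[<[0-9a-f]+>\]\s+(\S+)\s+\[(\w+)\]")


def summarize_oops(lines):
    """Return (bug locations, per-module frame counts) for syslog lines."""
    bugs = []
    owners = Counter()
    for line in lines:
        match = BUG_RE.search(line)
        if match:
            bugs.append((match.group(1), int(match.group(2))))
        # Each frame looks like: [<address>] symbol [module] offset (...)
        for _symbol, owner in FRAME_RE.findall(line):
            owners[owner] += 1
    return bugs, owners


# A few lines copied from the log in the first post.
SAMPLE = [
    "May 26 04:05:09 localhost kernel: kernel BUG at page_alloc.c:235!",
    "May 26 04:05:09 localhost kernel: Call Trace: [<c013b344>] __alloc_pages [kernel] 0x64 (0xd9371e88)",
    "May 26 04:05:09 localhost kernel: [<e0afd408>] _nv000183rm [nvidia] 0x750 (0xd9371e98)",
    "May 26 04:05:09 localhost kernel: [<c012ff9c>] do_anonymous_page [kernel] 0x5c (0xd9371ec8)",
]

bugs, owners = summarize_oops(SAMPLE)
print(bugs)
print(dict(owners))
```

To run it against the real log, something like summarize_oops(open("/var/log/messages")) should work, since the function only needs an iterable of lines.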

jeffreybluml · 05-28-2004, 11:00 AM

Thanks for the insight. How do I handle submitting a bug report?

I haven't made any hardware changes since I installed Fedora back in Feb. Strangely, I also haven't had this problem re-occur since the 25th (it's now the 28th).
I had tried moving around some configuration files from my home directory (in response to another thread I had posted) and things seem back to normal. I never really ended up changing anything, so I don't know what the heck "fixed" the problem I was having....Man I hate when that happens...

Anyways, thanks for replying. Please advise on how to submit the bug report, and whatever else you think I ought to do here.

Thanks again...

TigerOC · 05-28-2004, 12:01 PM

First off, I do not think this is a software problem. I deal a lot with hardware, and I would guess that you may have a problem with your power supply. Most modern processors place a very high demand on the 5V line, especially during the boot phase. A modern processor can draw very high currents at 100% load, and if there is a problem on the 5V line it leads to computational errors. Under normal circumstances you would not see any problems because the processor never approaches 100% usage.

What I find is that PSUs can degrade over about a year, and then funny things start happening that cannot really be explained. It happened in my own system recently.

Ideally you need to monitor your system with an app like gkrellm while putting it under full load with an app like cpuburn. If your 5V line starts to fall below 4.85V then more than likely you have a problem. If you don't have gkrellm, go into your BIOS and have a look at the health status. If you are getting idle readings around 4.9-4.95V I would be very suspicious. You can also run checks on your memory with memtest86.
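If you end up logging readings by hand, the rule of thumb above can be written down as a small Python sketch. The function name and the result labels are made up for illustration; the 4.85V (under load) and 4.95V (idle) cut-offs are just the figures from this post, not values from any specification.

```python
def assess_5v(reading_volts, under_load):
    """Classify a 5V rail reading using the rule of thumb from this post."""
    if under_load:
        # Under full load: anything below 4.85V is treated as a likely fault.
        return "likely problem" if reading_volts < 4.85 else "ok"
    # At idle: readings of about 4.95V or lower are treated as suspicious.
    return "suspicious" if reading_volts <= 4.95 else "ok"


# Hypothetical readings, purely to show how the function is called.
for volts, loaded in [(4.80, True), (4.90, True), (4.92, False), (5.00, False)]:
    state = "load" if loaded else "idle"
    print(f"{volts:.2f}V ({state}): {assess_5v(volts, loaded)}")
```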

jeffreybluml · 05-28-2004, 12:25 PM

Wow, that's the kind of in-depth analysis I like to see!

I'm an electronic tech, so here's a question for you...

Can you imagine any adverse effects from sticking my DMM (digital multimeter) on the 5V line and monitoring that way? It's a bit more real-time, and I'd trust it more than the interpretation of a program.

GREAT reply, and makes a tremendous amount of sense to me. Good to see somebody thinking outside the usual...

Thanks!

Genesee · 05-28-2004, 12:29 PM

TigerOC - excellent post, thanks for the insight

jeffreybluml · 05-28-2004, 12:53 PM

Oh boy, that's not what I was expecting...

Got my DMM sitting on there now. Currently, just after a reboot, it's at 5.06V. The problem is, in the early stages of the boot process (from POST to new hardware check), it was ramping up and down from ~7.5V all the way to ~13V!! Any ideas as to what could be causing this? Are there some bidirectional level shifters on the motherboard that are getting set in the wrong direction? Any other thoughts?

It has yet to dip below 5, so I'm ruling out supply droop. This line being pumped up by something has me concerned though. Having never dealt with this much, I don't know whether my concerns are warranted or not. I know that the line will obviously have a good amount of ESD protection, so I'm not really worried about it damaging anything, but I'm worried that it is a symptom of something else.

I'd really appreciate anybody/everybody's experience here. Let me know what y'all think.

THANKS again to TigerOC for leading me into this!

TigerOC · 05-28-2004, 02:20 PM

If you don't have monitoring software around, then using a multimeter you can get readings off the MOSFETs. The MOSFETs are the black things with 3 legs on them, and they are in pairs around the area of the processor. This is NOT A RECOMMENDED THING TO DO - IF YOU CAUSE A SHORT CIRCUIT YOU COULD DAMAGE M/BOARD & PROCESSOR. Apply one probe to the right leg and the other to the centre leg (facing the legs). If the reading is negative, switch the probes around. The MOSFETs stabilise the current to the processor.
If you are getting massive swings in the 12V line, this can also be a major problem because that feeds your hard drives and can cause corruption.

jeffreybluml · 05-28-2004, 02:29 PM

Hmmm, my 12V line is at a constant 12.46V. A bit high, obviously, but is it a problem?

BTW, I'm just probing on the extra power connector (like the one that goes onto a CD-ROM drive) to get the voltages. I assume these are all more or less in parallel with everything on the board, so is this a sufficient spot to probe?

Thanks again...

TigerOC · 05-28-2004, 02:37 PM

As long as the voltages are within 10% of spec then it's not a problem. If you were getting readings of 7.5-13V on the 5V line then there is something seriously wrong. There is nothing that would feed back on the line. The PSU is designed to stabilise the supply within limits. I have done a lot of overclocking, and this is the one area that is critical to getting max performance out of a processor. I have even played with 2 PSUs feeding a m/board, one for the hard drives and the other just for the CPU. One of the biggest failures of commercial manufacturers is the lack of decent PSUs. If you are running an Athlon or P4, the PSU should have a minimum of 200W on the 3.3V and 5V lines.
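To put numbers on the "within 10% of spec" check, here is a quick Python sketch using the readings reported earlier in this thread. The function names are made up for illustration, and the 10% default simply mirrors the figure in this post. If I remember the ATX design guide correctly, it asks for a tighter ±5% on the +5V and +12V rails, so treat the tolerance as an adjustable assumption.

```python
def deviation_percent(measured, nominal):
    """Signed deviation of a reading from the nominal rail voltage, in percent."""
    return (measured - nominal) / nominal * 100.0


def within_tolerance(measured, nominal, tolerance=0.10):
    """True if the reading is within +/- tolerance (a fraction) of nominal."""
    return abs(measured - nominal) <= tolerance * nominal


# (nominal rail, measured value) pairs taken from the posts above.
READINGS = [(5.0, 5.06), (12.0, 12.46), (5.0, 7.5), (5.0, 13.0)]

for nominal, measured in READINGS:
    verdict = "ok" if within_tolerance(measured, nominal) else "OUT OF RANGE"
    print(f"{nominal:g}V rail at {measured}V: "
          f"{deviation_percent(measured, nominal):+.1f}% -> {verdict}")
```

With either a 10% or a 5% tolerance, the steady 5.06V and 12.46V readings pass, while 7.5V or 13V on the 5V rail is far outside the range.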