LinuxQuestions.org
Linux - General: This Linux forum is for general Linux questions and discussion.
If it is Linux-related and doesn't seem to fit in any other forum, then this is the place.

10-26-2003, 11:40 PM   #1
JordanH
Member
 
Registered: Oct 2003
Location: Toronto, Canada
Distribution: Ubuntu, FC3, RHEL 3-4 AS Retired: SuSE 9.1 Pro, RedHat 6-9, FC1-2
Posts: 360

Rep: Reputation: 30
Kernel bug or hardware problem? What do you think?


uh-oh.

My RH 9 box has frozen twice in the past two days. That's two more freezes than I've ever had from Linux before, and I don't even know where to start solving this problem. Please help me track it down.

Firstly, it froze last night at almost 1am, then it froze again today sometime between 3pm and 1am.

Secondly, the messages in /var/log/messages before the crash indicated that there was a kernel bug - I'll post the text in another message in this thread.

Thirdly, I tried compiling another version of my current kernel, 2.4.20-8, but it failed with errors. (Something about devlist.h not being found although names.o needed it. I haven't had time to research that one yet.)

Lastly, after rebooting yet again, I noticed that my memory check only counts up to ~383MB, but this machine has 512MB.

Now here are some possibilities...
1. I screwed something up when compiling a new kernel, which made my existing kernel unstable.
2. My RAM is starting to burn out, wreaking havoc on my system.
3. syslogd bailed and took out the whole box (errrr.... long shot)
4. Alien hackers used their freeze-death-ray on my poor linux router leaving no trace behind them.

*AH* Where do I start to fix this problem? H/W? Kernel??
Any help is appreciated,
J.

(text from logs to follow)
 
10-26-2003, 11:45 PM   #2
JordanH
Member
 
Registered: Oct 2003
Location: Toronto, Canada
Distribution: Ubuntu, FC3, RHEL 3-4 AS Retired: SuSE 9.1 Pro, RedHat 6-9, FC1-2
Posts: 360

Original Poster
Rep: Reputation: 30
Last and first messages from reboot today...

Oct 26 14:39:14 Alpha syslogd 1.4.1: restart.
Oct 27 01:14:26 Alpha syslogd 1.4.1: restart.
Oct 27 01:14:26 Alpha syslog: syslogd startup succeeded
 
10-26-2003, 11:47 PM   #3
JordanH
Member
 
Registered: Oct 2003
Location: Toronto, Canada
Distribution: Ubuntu, FC3, RHEL 3-4 AS Retired: SuSE 9.1 Pro, RedHat 6-9, FC1-2
Posts: 360

Original Poster
Rep: Reputation: 30
Log from last night...

Oct 26 00:21:59 Alpha kernel: Unable to handle kernel NULL pointer dereference at virtual address 00000074
Oct 26 00:21:59 Alpha kernel: printing eip:
Oct 26 00:21:59 Alpha kernel: c0140d9b
Oct 26 00:21:59 Alpha kernel: *pde = 00000000
Oct 26 00:21:59 Alpha kernel: Oops: 0000
Oct 26 00:21:59 Alpha kernel: udf ipt_limit sg emu10k1 ac97_codec sound soundcore printer sr_mod agpgart nvidia parport_pc lp parport autofs ipt_MASQUERADE ipt_state ipt_LOG iptable_mangle
Oct 26 00:21:59 Alpha kernel: CPU: 0
Oct 26 00:21:59 Alpha kernel: EIP: 0060:[<c0140d9b>] Tainted: P
Oct 26 00:21:59 Alpha kernel: EFLAGS: 00210202
Oct 26 00:21:59 Alpha kernel:
Oct 26 00:21:59 Alpha kernel: EIP is at page_referenced [kernel] 0x227 (2.4.20-8)
Oct 26 00:21:59 Alpha kernel: eax: c1000030 ebx: 00000001 ecx: 00000000 edx: 00000001
Oct 26 00:21:59 Alpha kernel: esi: 0000000d edi: dc893a40 ebp: 00000001 esp: dffb3f84
Oct 26 00:21:59 Alpha kernel: ds: 0068 es: 0068 ss: 0068
Oct 26 00:21:59 Alpha kernel: Process kscand/Normal (pid: 7, stackpage=dffb3000)
Oct 26 00:21:59 Alpha kernel: Stack: dc893680 00000000 00000000 dffb3fb4 c1532650 c1532650 c0303a0c c11116dc
Oct 26 00:21:59 Alpha kernel: 00000003 c0139ade dffb2000 c0124b2c 00000001 00000003 dffb2000 c0303900
Oct 26 00:21:59 Alpha kernel: dffb2000 c013a924 c0303900 00000003 00000001 c025618c 000009c4 c013a868
Oct 26 00:21:59 Alpha kernel: Call Trace: [<c0139ade>] scan_active_list [kernel] 0x36 (0xdffb3fa8))
Oct 26 00:21:59 Alpha kernel: [<c0124b2c>] process_timeout [kernel] 0x0 (0xdffb3fb0))
Oct 26 00:21:59 Alpha kernel: [<c013a924>] kscand [kernel] 0xbc (0xdffb3fc8))
Oct 26 00:21:59 Alpha kernel: [<c013a868>] kscand [kernel] 0x0 (0xdffb3fe0))
Oct 26 00:21:59 Alpha kernel: [<c0107389>] kernel_thread_helper [kernel] 0x5 (0xdffb3ff0))
Oct 26 00:21:59 Alpha kernel:
Oct 26 00:21:59 Alpha kernel:
Oct 26 00:21:59 Alpha kernel: Code: 8b 41 74 39 41 60 0f 43 54 24 04 45 4e 89 54 24 04 0f 89 3e
Oct 26 00:31:49 Alpha modprobe: modprobe: Can't locate module sound-slot-1
Oct 26 00:31:49 Alpha modprobe: modprobe: Can't locate module sound-service-1-0
Oct 26 00:31:49 Alpha modprobe: modprobe: Can't locate module sound-slot-1
Oct 26 00:31:49 Alpha modprobe: modprobe: Can't locate module sound-service-1-0
Oct 26 00:47:10 Alpha kernel: ------------[ cut here ]------------
Oct 26 00:47:10 Alpha kernel: kernel BUG at page_alloc.c:139!
Oct 26 00:47:10 Alpha kernel: invalid operand: 0000
Oct 26 00:47:10 Alpha kernel: udf ipt_limit sg emu10k1 ac97_codec sound soundcore printer sr_mod agpgart nvidia parport_pc lp parport autofs ipt_MASQUERADE ipt_state ipt_LOG iptable_mangle
Oct 26 00:47:10 Alpha kernel: CPU: 0
Oct 26 00:47:10 Alpha kernel: EIP: 0060:[<c013b57d>] Tainted: P
Oct 26 00:47:10 Alpha kernel: EFLAGS: 00210282
Oct 26 00:47:10 Alpha kernel:
Oct 26 00:47:10 Alpha kernel: EIP is at __free_pages_ok [kernel] 0xdd (2.4.20-8)
Oct 26 00:47:10 Alpha kernel: eax: 01000018 ebx: c1532650 ecx: c1000030 edx: dc893a40
Oct 26 00:47:10 Alpha kernel: esi: 00000000 edi: 00000000 ebp: 00000000 esp: d43bddec
Oct 26 00:47:10 Alpha kernel: ds: 0068 es: 0068 ss: 0068
Oct 26 00:47:10 Alpha kernel: Process wish (pid: 4180, stackpage=d43bd000)
Oct 26 00:47:10 Alpha kernel: Stack: 000075ff 00200296 c0303b84 00200296 c0303900 c1038030 c0303b0c cf806374
Oct 26 00:47:10 Alpha kernel: cf806374 00100000 c1532650 cf806374 00100000 17c1c045 c012c6c8 c1532650
Oct 26 00:47:10 Alpha kernel: 00093000 c012eab7 c4afc0c0 08893000 cf806374 c0118ce7 00000094 08c00000
Oct 26 00:47:10 Alpha kernel: Call Trace: [<c012c6c8>] __free_pte [kernel] 0x4c (0xd43bde24))
Oct 26 00:47:10 Alpha kernel: [<c012eab7>] zap_pte_range [kernel] 0x12f (0xd43bde30))
Oct 26 00:47:10 Alpha kernel: [<c0118ce7>] sys_sched_yield [kernel] 0x73 (0xd43bde40))
Oct 26 00:47:10 Alpha kernel: [<c012cd1b>] zap_page_range [kernel] 0xc7 (0xd43bde58))
Oct 26 00:47:10 Alpha kernel: [<c012ffcf>] exit_mmap [kernel] 0xb3 (0xd43bde98))
Oct 26 00:47:10 Alpha kernel: [<c01196bb>] mmput [kernel] 0x47 (0xd43bdebc))
Oct 26 00:47:10 Alpha kernel: [<c011e991>] do_exit [kernel] 0xf1 (0xd43bdecc))
Oct 26 00:47:10 Alpha kernel: [<c011ec08>] do_group_exit [kernel] 0x50 (0xd43bdee8))
Oct 26 00:47:10 Alpha kernel: [<c012674d>] get_signal_to_deliver [kernel] 0x19d (0xd43bdef8))
Oct 26 00:47:10 Alpha kernel: [<c0109184>] do_signal [kernel] 0x68 (0xd43bdf20))
Oct 26 00:47:11 Alpha kernel: [<e081d03d>] ext3_file_write [ext3] 0x39 (0xd43bdf78))
Oct 26 00:47:11 Alpha kernel: [<c01268f8>] sys_rt_sigprocmask [kernel] 0xc8 (0xd43bdf94))
Oct 26 00:47:11 Alpha kernel: [<c01093ec>] signal_return [kernel] 0x14 (0xd43bdfc0))
Oct 26 00:47:11 Alpha kernel:
Oct 26 00:47:11 Alpha kernel:
Oct 26 00:47:11 Alpha kernel: Code: 0f 0b 8b 00 9b 61 25 c0 8b 43 18 89 f9 89 de 83 e0 eb 89 43
Oct 26 00:47:16 Alpha gdm(pam_unix)[2445]: session closed for user <name removed>
Oct 26 00:47:18 Alpha su(pam_unix)[6145]: session closed for user <name removed>
Oct 26 00:47:18 Alpha su(pam_unix)[14803]: session closed for user <name removed>
Oct 26 00:47:20 Alpha gdm[2445]: gdm_slave_xioerror_handler: Fatal X error - Restarting :0
Oct 26 00:49:16 Alpha gconfd (<name removed>-5848): GConf server is not in use, shutting down.
Oct 26 00:49:16 Alpha gconfd (<name removed>-5848): Exiting
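The two oopses above can be partly decoded by hand from their "Code:" lines. The following is a minimal Python sketch, assuming the 2.4 i386 conventions that the 20 bytes after "Code:" start at EIP and that BUG() is emitted as ud2 (0f 0b) followed by a 16-bit source line number and a 32-bit pointer to the file name. The helper names (parse_code_line, decode_mov_load, decode_bug) are hypothetical. It shows that the first oops's faulting instruction (8b 41 74, i.e. mov 0x74(%ecx),%eax) read 0x74 bytes past ECX, which was 0, matching "NULL pointer dereference at virtual address 00000074"; and that in the second oops the bytes after 0f 0b (8b 00) encode line 139, matching "kernel BUG at page_alloc.c:139!".

```python
# Sketch: decode the two "Code:" lines from the log above.
# Assumptions (2.4 i386): "Code:" bytes start at EIP, and BUG() compiles to
# ud2 (0f 0b) + .word __LINE__ + .long __FILE__.

REG_NAMES = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"]

# Copied from the 00:21:59 oops (NULL pointer dereference in page_referenced).
OOPS1_CODE = "Code: 8b 41 74 39 41 60 0f 43 54 24 04 45 4e 89 54 24 04 0f 89 3e"
OOPS1_REGS = {"eax": 0xC1000030, "ebx": 0x00000001, "ecx": 0x00000000,
              "edx": 0x00000001, "esi": 0x0000000D, "edi": 0xDC893A40,
              "ebp": 0x00000001, "esp": 0xDFFB3F84}

# Copied from the 00:47:10 oops (kernel BUG at page_alloc.c:139).
OOPS2_CODE = "Code: 0f 0b 8b 00 9b 61 25 c0 8b 43 18 89 f9 89 de 83 e0 eb 89 43"


def parse_code_line(line):
    """Turn 'Code: 8b 41 74 ...' into a bytes object."""
    return bytes(int(tok, 16) for tok in line.split("Code:", 1)[1].split())


def decode_mov_load(code, regs):
    """Handle only 'mov disp8(%base),%dest' (opcode 8b, ModRM mod=01).

    Returns (dest_reg, base_reg, effective_address), or None if the first
    instruction is anything else.
    """
    if len(code) < 3 or code[0] != 0x8B:
        return None
    modrm = code[1]
    mod, reg, rm = modrm >> 6, (modrm >> 3) & 7, modrm & 7
    if mod != 1 or rm == 4:  # rm == 4 would need a SIB byte; not handled here
        return None
    disp = int.from_bytes(code[2:3], "little", signed=True)
    base = REG_NAMES[rm]
    return REG_NAMES[reg], base, (regs[base] + disp) & 0xFFFFFFFF


def decode_bug(code):
    """Return (line, file_pointer) if the code starts with a 2.4-style BUG()."""
    if len(code) < 8 or code[:2] != b"\x0f\x0b":
        return None
    line = int.from_bytes(code[2:4], "little")
    file_ptr = int.from_bytes(code[4:8], "little")
    return line, file_ptr


if __name__ == "__main__":
    dest, base, addr = decode_mov_load(parse_code_line(OOPS1_CODE), OOPS1_REGS)
    print(f"oops 1: load into %{dest} from %{base}+disp8 -> address {addr:#010x}")
    line, file_ptr = decode_bug(parse_code_line(OOPS2_CODE))
    print(f"oops 2: BUG() at line {line}, __FILE__ string at {file_ptr:#010x}")
```

Run as a script, it reports address 0x00000074 for the first oops and line 139 for the second. Decoding only shows where the kernel tripped, not why: both faults are in page-management code (page_referenced, __free_pages_ok), which is consistent with, but not proof of, corrupted memory contents.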
 
10-27-2003, 10:28 AM   #4
tgflynn
Member
 
Registered: Oct 2003
Location: Rochester, New York (USA)
Distribution: Debian
Posts: 119

Rep: Reputation: 15
If you only ran make bzImage (I saw this in your post on the USB thread), it wouldn't have affected your running kernel. (If you had run make install, it might have.)

My guess would be bad RAM.

Where are you seeing the memory size message that changed? Try running the command free and check whether the total on the Mem: line agrees with the amount of installed memory.
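The total that free reports comes from the MemTotal line of /proc/meminfo, so the check above can be sketched in Python. This is a minimal sketch, not a diagnostic tool: the helper names are hypothetical, INSTALLED_MB is an assumption you set yourself, and the "half a module" threshold is an arbitrary rule of thumb chosen for illustration.

```python
# Sketch: compare MemTotal (what free shows as "total") with installed RAM.
# INSTALLED_MB and the half-a-module threshold are hypothetical assumptions.

INSTALLED_MB = 512  # assumption: set this to what is physically installed


def parse_memtotal_kb(meminfo_text):
    """Extract MemTotal (in kB) from the contents of /proc/meminfo."""
    for line in meminfo_text.splitlines():
        if line.startswith("MemTotal:"):
            return int(line.split()[1])
    raise ValueError("no MemTotal line found")


def read_memtotal_kb(path="/proc/meminfo"):
    """Read and parse MemTotal from the running kernel."""
    with open(path) as f:
        return parse_memtotal_kb(f.read())


def shortfall_mb(installed_mb, memtotal_kb):
    """MB of installed RAM that is not being reported."""
    return installed_mb - memtotal_kb / 1024


def looks_like_missing_module(installed_mb, memtotal_kb, module_mb):
    """Hypothetical heuristic: being a few MB short is normal (the kernel keeps
    some memory for itself), but a gap of half a module or more suggests a
    module, or part of one, is not being detected."""
    return shortfall_mb(installed_mb, memtotal_kb) >= module_mb / 2


if __name__ == "__main__":
    total_kb = read_memtotal_kb()
    print(f"MemTotal: {total_kb} kB, short by "
          f"{shortfall_mb(INSTALLED_MB, total_kb):.1f} MB of {INSTALLED_MB} MB")
```

Applied to the numbers in this thread, a count of roughly 393,000 kB is about 384 MB, a full 128 MB short of 512 MB, while the ~501 MB that free later reports is only about 11 MB short, a gap plausibly accounted for by memory the kernel reserves for itself.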

There's a program called memtest that runs thorough tests on your memory. I can't find a homepage for it, but here's the Freshmeat URL:

http://freshmeat.net/projects/memtest/?topic_id=136

If memory serves, you need to install it on a floppy and then boot from the floppy. I think the tarball contains detailed instructions.

If it does turn out to be a RAM problem, you might want to try reseating the DIMMs before buying new memory.

Tim
 
10-27-2003, 12:57 PM   #5
JordanH
Member
 
Registered: Oct 2003
Location: Toronto, Canada
Distribution: Ubuntu, FC3, RHEL 3-4 AS Retired: SuSE 9.1 Pro, RedHat 6-9, FC1-2
Posts: 360

Original Poster
Rep: Reputation: 30
Thanks for the reply.

Currently, RAM is my best guess too, since the memory check at bootup only sees 393,### kB (~383MB), and that check should be independent of the kernel.

However, the timing of the memory, kernel BUG and kernel compilation is too close for comfort...

Looking forward, is there anything special I need to do to the kernel if I decide to add or remove RAM? If I'm forced to run 3x128MB instead of the expected 4x128MB, is there anything I need to change or recompile? (arg, I can't believe I have to ask this question. I feel 'new' all over again.)
 
10-27-2003, 01:28 PM   #6
tgflynn
Member
 
Registered: Oct 2003
Location: Rochester, New York (USA)
Distribution: Debian
Posts: 119

Rep: Reputation: 15
The kernel bug may very well be a symptom of bad memory.

Again, if all you did was compile (no install), that really shouldn't have affected anything.

Did you check the memory size given by free ?

You don't have to do anything to the kernel if you change RAM DIMMs. It's purely a hardware matter.

Tim
 
10-27-2003, 01:31 PM   #7
JordanH
Member
 
Registered: Oct 2003
Location: Toronto, Canada
Distribution: Ubuntu, FC3, RHEL 3-4 AS Retired: SuSE 9.1 Pro, RedHat 6-9, FC1-2
Posts: 360

Original Poster
Rep: Reputation: 30
Sorry, I'm still at work and won't be able to check the memory or run free until I'm home this evening. I'll let you know my results ASAP. I'll also be pulling, pushing, prodding, shoving, nudging, reseating and swearing at the RAM and will post those results as well (perhaps not the swearing).

Thanks for clearing up the RAM question... I figured as much but wasn't sure if that was a possible cause for the Kernel Bug.

I will keep you posted on my trials and tribulations.
J.
 
10-27-2003, 07:14 PM   #8
JordanH
Member
 
Registered: Oct 2003
Location: Toronto, Canada
Distribution: Ubuntu, FC3, RHEL 3-4 AS Retired: SuSE 9.1 Pro, RedHat 6-9, FC1-2
Posts: 360

Original Poster
Rep: Reputation: 30
Oh wow... totally pooched.

It turns out I have 2x256MB of RAM... How only 128MB of it failed to register is a new one to me. However, after my poking and prodding, both chips registered correctly and the memory check showed the right amount.

Now it gets fun...

I ran free, but the total RAM was about 501MB... OK, so maybe 11MB is off hiding someplace. Either way, I downloaded and tried to run that memtest from Freshmeat (as per above)... BIG MISTAKE. I installed and ran it from /tmp/memtest/ and it corrupted the whole /tmp tree! *AH* I have no idea what else it has corrupted, but fsck just went NUTS when running in maintenance mode. The errors were too numerous to list here...

Several reboots and different attempts later, I can't boot into X. I've had at least one kernel panic, and right now it's just blinking the nVidia splash screen as if it is trying to reload itself every time it crashes. Oy.

Another couple tries and then a re-install... I hope I didn't lose any /etc configs or /home data... o_O *eek*
 
10-27-2003, 08:23 PM   #9
JordanH
Member
 
Registered: Oct 2003
Location: Toronto, Canada
Distribution: Ubuntu, FC3, RHEL 3-4 AS Retired: SuSE 9.1 Pro, RedHat 6-9, FC1-2
Posts: 360

Original Poster
Rep: Reputation: 30
edit: removed my comments. The problem has been narrowed down to a partially working stick of RAM.

Last edited by JordanH; 10-27-2003 at 08:49 PM.
 
10-27-2003, 08:53 PM   #10
tgflynn
Member
 
Registered: Oct 2003
Location: Rochester, New York (USA)
Distribution: Debian
Posts: 119

Rep: Reputation: 15
I'm very sorry about memtest. It turns out I pointed you to the wrong program.

The program I was thinking of is now called memtest86 (I think it used to be called just memtest). memtest86 doesn't even run under Linux: it's a standalone program that you boot from a floppy and that does nothing but test RAM.

I should have read the Freshmeat description more carefully, but it never occurred to me that there would be an entirely different program with such a similar name.

Tim
 
10-27-2003, 09:24 PM   #11
moeminhtun
Member
 
Registered: Dec 2002
Location: Singapore
Distribution: Fedora Core 6
Posts: 647

Rep: Reputation: 30
I'm also having some problems with Red Hat 9.0 Personal used as a server. It has already hung twice.
I've found that its memory usage gradually increases. I'm only running the default applications and servers that come with Red Hat 9, and I don't know which application or server has the memory leak. I'm still trying to find out.
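One way to narrow down which process is growing is to sample each process's resident set size (the VmRSS line in /proc/<pid>/status) twice and compare. This is a minimal Python sketch; rss_snapshot and growth are hypothetical helper names. A PID can be reused by a different process between samples, so treat the result as a lead, not proof. Also note that a rising "used" figure in free is not by itself a leak, since Linux uses otherwise idle memory for buffers and page cache.

```python
import os


def rss_snapshot(proc="/proc"):
    """Map pid -> (name, VmRSS in kB) for every process that reports VmRSS.

    Kernel threads have no VmRSS line and are skipped.
    """
    snap = {}
    for entry in os.listdir(proc):
        if not entry.isdigit():
            continue
        name = rss_kb = None
        try:
            with open(os.path.join(proc, entry, "status")) as f:
                for line in f:
                    if line.startswith("Name:"):
                        name = line.partition(":")[2].strip()
                    elif line.startswith("VmRSS:"):
                        rss_kb = int(line.split()[1])
        except OSError:
            continue  # the process exited while we were scanning
        if name is not None and rss_kb is not None:
            snap[int(entry)] = (name, rss_kb)
    return snap


def growth(before, after, top=5):
    """(delta_kb, pid, name) for pids present in both snapshots, largest growth first."""
    deltas = [(after[pid][1] - before[pid][1], pid, after[pid][0])
              for pid in before.keys() & after.keys()]
    deltas.sort(reverse=True)
    return deltas[:top]
```

In use, take one snapshot, wait an hour or so under normal load, take a second, and print growth(before, after). A process that climbs steadily across several such intervals is the likeliest leak candidate.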
 
10-27-2003, 10:42 PM   #12
JordanH
Member
 
Registered: Oct 2003
Location: Toronto, Canada
Distribution: Ubuntu, FC3, RHEL 3-4 AS Retired: SuSE 9.1 Pro, RedHat 6-9, FC1-2
Posts: 360

Original Poster
Rep: Reputation: 30
Don't be sorry 'bout the memtest, I should have read about it before just blasting it at my system.

I was able to save my /home & /etc directories, however, /tmp was beyond repair (how that happened, I don't understand). There was something wrong with the /home tree as well and I had to cp the directories to a new directory before I could get a successful tarball. What a PITA.

Time to blow away the machine and start again... maybe I'll try Fedora Core or United. *sigh*
 
10-28-2003, 12:55 AM   #13
Robert0380
LQ Guru
 
Registered: Apr 2002
Location: Atlanta
Distribution: Gentoo
Posts: 1,280

Rep: Reputation: 47
GENTOO!!!!!
 
10-28-2003, 06:42 AM   #14
tgflynn
Member
 
Registered: Oct 2003
Location: Rochester, New York (USA)
Distribution: Debian
Posts: 119

Rep: Reputation: 15
Quote:
Originally posted by JordanH


I was able to save my /home & /etc directories, however, /tmp was beyond repair (how that happened, I don't understand).

Well, that memtest program is designed to stress-test the kernel's memory management system. Doing this with bad RAM is probably a good recipe for making the kernel misbehave badly, and file system corruption is certainly a possibility. It's the last thing you'd want to run in such a situation.

I think I'll try to contact the maintainer of memtest to see if he'd be willing to put a big visible warning in the README about this not being memtest86 and not for testing RAM. Maybe that would help keep this kind of thing from happening to someone else.

Tim
 
  


All times are GMT -5.
