Kernel bug or hardware problem? What do you think?
Uh-oh.
My RH 9 box has frozen twice in the past two days. That's two more freezes than I've ever had from Linux, and I don't even know where to start solving this problem. Please help me track it down.
Firstly, it froze last night at almost 1am, then it froze again today sometime between 3pm and 1am.
Secondly, the messages in /var/log/messages before the crash indicated a kernel BUG. I'll post the text in another message in this thread.
Thirdly, I tried compiling another build of my current kernel (2.4.20-8), but it failed with errors and did not complete. (Something about devlist.h not being found even though names.o needed it. I haven't had time to research that one yet.)
Lastly, after rebooting yet again, I noticed that the memory check only counts up to ~383MB, although this machine has 512MB.
Now here are some possibilities...
1. I screwed something up when compiling a new kernel, which made my existing kernel unstable.
2. My RAM is starting to burn out, wreaking havoc on my system.
3. syslogd bailed and took out the whole box (errrr.... long shot)
4. Alien hackers used their freeze-death-ray on my poor Linux router, leaving no trace behind them.
*AH* Where do I start to fix this problem? H/W? Kernel??
Any help is appreciated,
J.
If you only ran make bzImage (I saw this in your post on the USB thread), it wouldn't have affected your running kernel. (If you had run make install, it might have.)
My guess would be bad RAM.
Where are you seeing the memory size message that changed? Try running the free command and check whether the total memory line agrees with the amount of installed memory.
There's a program called memtest that runs thorough tests on your memory. I can't find a homepage for it, but here's the Freshmeat URL:
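For reference, free takes its "total" figure from the MemTotal line of /proc/meminfo, which is usable RAM after firmware reservations and the kernel's own code have been subtracted. So a few MB short of the installed amount is normal, while a whole module's worth missing is not. A minimal Python sketch of that comparison, assuming 512MB is installed (the helper names mem_total_kib and shortfall_mib are hypothetical, not part of any tool):

```python
import os
import re


def mem_total_kib(meminfo_text: str) -> int:
    """Return the MemTotal value, in kB as /proc/meminfo reports it."""
    match = re.search(r"^MemTotal:\s+(\d+)\s+kB", meminfo_text, re.MULTILINE)
    if match is None:
        raise ValueError("MemTotal line not found")
    return int(match.group(1))


def shortfall_mib(meminfo_text: str, installed_mib: int) -> float:
    """How many MB the kernel reports below the installed amount."""
    return installed_mib - mem_total_kib(meminfo_text) / 1024


if __name__ == "__main__" and os.path.exists("/proc/meminfo"):
    with open("/proc/meminfo") as f:
        text = f.read()
    installed = 512  # assumption: physically installed RAM, in MB
    print(f"MemTotal: {mem_total_kib(text) / 1024:.0f} MB, "
          f"short by {shortfall_mib(text, installed):.0f} MB")
```

A shortfall of around 128MB (one module) would point at RAM the system isn't detecting; a gap of a few MB is usually just the kernel's own reservations.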
Currently, RAM is my best guess too, since the memory check at bootup only sees 393,###kb (~383MB), which should be independent of the kernel.
However, the timing of the memory problem, the kernel BUG and the kernel compilation is too close for comfort...
Looking forward, is there anything special I need to do to the kernel if I decide to add or remove RAM? If I'm forced to run 3x128MB instead of the expected 4x128MB, is there anything I need to change or recompile? (arg, I can't believe I have to ask this question. I feel 'new' all over again.)
Sorry, I'm still at work and won't be able to check the memory or run free until I'm home this evening. I'll let you know my results ASAP. I'll also be pulling, pushing, prodding, shoving, nudging, reseating and swearing at the RAM and will post those results as well (perhaps not the swearing).
Thanks for clearing up the RAM question... I figured as much but wasn't sure whether that could have caused the kernel BUG.
I will keep you posted on my trials and tribulations.
J.
It turns out I have 2x256MB of RAM... How only 128MB of it failed to register is a new one to me. However, after my poking and prodding, both chips registered correctly and the memory check showed the correct total.
Now it gets fun...
I ran free, but the total RAM was about 501MB... OK, so maybe 11MB is off hiding someplace. Either way, I downloaded and tried to run that memtest from Freshmeat (as per above)... BIG MISTAKE. I installed and ran it from /tmp/memtest/ and it corrupted the whole /tmp tree! *AH* I have no idea what else it has corrupted, but fsck just went NUTS when running in maintenance mode. The errors were too numerous to list here...
Several reboots and different attempts later, I can't boot into X. I've had at least one kernel panic, and right now it's just blinking the nVidia splash screen, as if it is trying to reload itself every time it crashes. Oye.
A couple more tries and then a reinstall... I hope I didn't lose any /etc configs or /home data... o_O *eek*
I'm very sorry about memtest. It turns out I pointed you to the wrong program.
The program I was thinking of is now called memtest86 (I think it used to be called just memtest). memtest86 doesn't even run under Linux. It's a standalone program you boot from a floppy that just runs RAM tests.
I should have read the Freshmeat description more carefully, but it never occurred to me there would be an entirely different program with such a similar name.
I'm also having some problems using Red Hat 9.0 Personal as a server. It has already hung twice.
I've found that its memory usage is gradually increasing. I'm only running the default applications and servers that come with Red Hat 9. I don't know which application or server has a memory leak. Still trying to find out.
Don't be sorry 'bout the memtest, I should have read about it before just blasting it at my system.
I was able to save my /home & /etc directories, however, /tmp was beyond repair (how that happened, I don't understand). There was something wrong with the /home tree as well and I had to cp the directories to a new directory before I could get a successful tarball. What a PITA.
Time to blow away the machine and start again... maybe I'll try Fedora Core or United. *sigh*
Well, that memtest program is designed to stress-test the kernel's memory-management system. Running it on bad RAM is probably a good recipe for making the kernel misbehave badly, and file-system corruption is certainly a possibility. It's the last thing you'd want to run in such a situation.
I think I'll try to contact the maintainer of memtest to see if he'd be willing to put a big visible warning in the README about this not being memtest86 and not for testing RAM. Maybe that would help keep this kind of thing from happening to someone else.