We are running CentOS 4.4 (64 bit), kernel 2.6.9-34.ELsmp. There are two RAID arrays on the box: a 72 GB RAID level 1 array holding the root partition (/) and a 1.8 TB RAID level 5 array.
Approximately every 2-3 weeks we get a kernel panic. Originally we were having drive issues, but we have since swapped in newer drives, and according to the RAID controller the arrays are now optimal. The kernel panic keeps occurring anyway.
The box does a lot of I/O as it is a production web backup machine.
I have not yet had the chance to set up a netdump server, so I don't have the whole console output, but I do have a screenshot of the final lines. It appears to be a syncing issue within ext3. BTW, LVM is in use, if that is important.
Here is a link to the console image.
Any help is GREATLY appreciated. Even if it is directing me to a different place to ask the question.
From what I can see, it looks like a paging error.
1- Are your swap areas properly initialized?
Not quite sure I know how to tell. I let CentOS handle that configuration when I installed it. It has 2 gigs of swap space. The swap space is part of LVM however. Not sure if that is a good idea or not.
Quote:
Originally Posted by macemoneta
2- Did you run out of swap?
How would I know that? Is there something I can use to monitor swap usage? When I run swapon -s I see that only 160 is in use, but the machine is running fine right now. The machine locks up hard when it panics and you have to power cycle it, so postmortem info is hard to come by. I am really not that familiar with what to do in this case anyway.
If there was an interruption during the install and you restarted, the installer may think the swap is already initialized and not complete that step if you re-run the install. In that case, you need to manually re-initialize the swap space:
Code:
swapoff -a
mkswap swapdevice   # replace swapdevice with the path of your swap device
swapon -a
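As for how to tell whether swap is active in the first place: one option is to read /proc/swaps, which is the same table that swapon -s prints. This is only a rough sketch; the function name list_swaps and its optional file argument are made up for illustration.

```shell
#!/bin/sh
# list_swaps (hypothetical helper): print each active swap area with its
# size and current use in kB. An optional first argument names another
# file in /proc/swaps format; the default is /proc/swaps itself.
list_swaps() {
    # Skip the header line; fields are filename, type, size, used, priority.
    awk 'NR > 1 { printf "%s size=%skB used=%skB\n", $1, $3, $4 }' \
        "${1:-/proc/swaps}"
}

# No output at all means that no swap area is currently active.
if [ -r /proc/swaps ]; then
    list_swaps
fi
```

If the swap logical volume is missing from that list after a reboot, that would point at an initialization or /etc/fstab problem rather than at swap filling up.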
There are many ways to monitor swap. The simplest is to just open a terminal and run:
I will try to get a netdump server configured and a hole punched through the firewall so I can get the full console output when it crashes.
You can also force the system to reboot cleanly. Some one-time preparation is required. Add to /etc/sysctl.conf:
Code:
kernel.sysrq = 1
and also run (as root):
Code:
echo "1" > /proc/sys/kernel/sysrq
This will enable the magic sysrq function. The next time the system panics, press and hold Alt and Sys Req (usually the same key as PrtSc). While holding those down, press the following keys in this sequence, with about a 5 second delay between keys: reisub
If the magic sysrq function was able to get control, the system will reboot cleanly at the end of that sequence. With any luck, you will have your panic in /var/log/messages, or at least some more data to go on.
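Before relying on the key sequence, it may help to confirm that the setting took effect and to know what each key asks the kernel to do. A small sketch follows; the function names sysrq_state and sysrq_key are invented here, and the key meanings are the usual ones from the kernel's SysRq documentation.

```shell
#!/bin/sh
# sysrq_state (hypothetical helper): print the current magic SysRq setting.
# 0 means disabled and 1 means every function is enabled; on kernels that
# support it, other values are a bitmask enabling only some functions.
sysrq_state() {
    cat "${1:-/proc/sys/kernel/sysrq}"
}

# sysrq_key (hypothetical helper): describe what one key of "reisub" does.
sysrq_key() {
    case "$1" in
        r) echo "unRaw: take the keyboard back from X" ;;
        e) echo "tErm: send SIGTERM to all processes except init" ;;
        i) echo "kIll: send SIGKILL to all processes except init" ;;
        s) echo "Sync: flush dirty data on all mounted filesystems" ;;
        u) echo "Unmount: remount all filesystems read-only" ;;
        b) echo "reBoot: reboot at once, without syncing or unmounting" ;;
        *) echo "not part of the reisub sequence" ;;
    esac
}

# Print the whole sequence in order.
for key in r e i s u b; do
    printf '%s  %s\n' "$key" "$(sysrq_key "$key")"
done
```

Running sysrq_state with no argument should print 1 once the setting above is in place. The pause between keys gives each step, the sync in particular, time to finish before the reboot.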
Last edited by macemoneta; 10-11-2007 at 09:56 PM.
Thank you very much. I had never heard of that magic sequence before; I will give it a try. I reinitialized swap. I will set up a monitoring script and record the info somewhere so that the next time it crashes I can tell whether swap was full or not.
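A monitoring script along those lines could be as small as the sketch below. It assumes a kernel that exposes SwapTotal and SwapFree in /proc/meminfo; the function names and the log path in the comment are hypothetical.

```shell
#!/bin/sh
# swap_used_kb (hypothetical helper): SwapTotal minus SwapFree, in kB,
# read from /proc/meminfo or from another file in the same format.
swap_used_kb() {
    awk '/^SwapTotal:/ { total = $2 }
         /^SwapFree:/  { free = $2 }
         END { print total - free }' "${1:-/proc/meminfo}"
}

# log_swap_once (hypothetical helper): append one timestamped sample
# to the log file named by the first argument.
log_swap_once() {
    echo "$(date '+%Y-%m-%d %H:%M:%S') swap_used_kb=$(swap_used_kb)" >> "$1"
}

# Example call; the path is an assumption, pick any writable location:
# log_swap_once /var/log/swap-usage.log
```

Called from cron once a minute, this leaves a history on disk that can be read after the power cycle. Samples written in the last few seconds before a hard lockup may not have reached the disk, so the final entry is a lower bound rather than an exact reading.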
Mmmm - if swap filling was the issue you'd probably hear from the oom_killer. I suspect you have other problems.
CentOS should have sysstat installed - see if you have that running; sar will give you all the info you'll need, and a good history. Running vmstat in the background is another option in addition to the commands above.
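For the vmstat route, the output is only useful after a crash if it has been written to disk with timestamps. A minimal sketch, assuming a plain vmstat with no timestamp option of its own; the helper name stamp_lines and the log path are made up.

```shell
#!/bin/sh
# stamp_lines (hypothetical helper): prefix every line read from stdin
# with the current date and time, so that samples can be matched to the
# time of a crash afterwards.
stamp_lines() {
    while IFS= read -r line; do
        printf '%s %s\n' "$(date '+%Y-%m-%d %H:%M:%S')" "$line"
    done
}

# Sample every 60 seconds in the background. Left commented out on
# purpose; the log path is an assumption.
# vmstat 60 | stamp_lines >> /var/log/vmstat.log &
```

In the resulting log, the si and so columns show pages swapped in and out, and the first sample is an average since boot rather than a current reading.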
Quote:
We are running CentOS 4.4 (64 bit) kernel 2.6.9-34.ELsmp.
Back then there was a bug in SATA support that only affected 64 bit versions. I don't know that this bug affected anything other than SATA (and have no idea whether it affected LVM and/or SCSI), but I'd do a bit of research on that and see if CentOS offers any newer/patched kernels and/or bug reports just in case it had wider ramifications.
From memory, the bug was fixed around 2.6.10, and it wouldn't let you install 64 bit/SATA at all, so may not be relevant in any way to your circumstance, but I'd still want to spend a few minutes checking into it.
Thank you for that tidbit. I will take the time to investigate. One of our options was to wipe and reinstall CentOS 5.0. I would rather just upgrade if that path is available to me.
I have never used any of these commands before. I don't seem to have systat, but I do have sar and vmstat. What would sar tell me? Can I look back in time with it? vmstat seems to give up-to-the-minute info; if the kernel panics, how would it help then?
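On looking back in time: sar can read the daily data files that the sysstat cron job collects, so the samples taken before a panic are still there after the reboot. A sketch, assuming the Red Hat/CentOS default directory /var/log/sa; the helper names sa_file and show_day are invented, and which report carries the swap columns depends on the sysstat version (older releases include them in -r).

```shell
#!/bin/sh
# sa_file (hypothetical helper): path of the sysstat data file for a day
# of the month, assuming the default /var/log/sa directory.
sa_file() {
    day=${1#0}    # drop a leading zero so that 08 is not read as octal
    printf '/var/log/sa/sa%02d\n' "$day"
}

# show_day (hypothetical helper): print memory and swapping history for
# one day, if sar and that day's data file are both available.
show_day() {
    f=$(sa_file "$1")
    if command -v sar >/dev/null 2>&1 && [ -r "$f" ]; then
        sar -r -f "$f"    # memory utilisation (plus swap on older sysstat)
        sar -W -f "$f"    # pages swapped in and out per second
    else
        echo "sar or $f not available" >&2
    fi
}

# Data file that today's samples go into:
sa_file "$(date +%d)"
```

For example, show_day 11 would report on the 11th of the month. The last sample before a gap in the data marks roughly when the machine went down.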