Today my server completely froze and required a hard reboot. The /var/log/messages log has the following traces:
Code:
Feb 26 14:45:39 decatur kernel: SKB BUG: Invalid truesize (488) len=16384, sizeof(sk_buff)=232
Feb 26 14:48:29 decatur kernel: httpd[5675]: segfault at 00007fff96e54fc8 rip 00002aaab0ae127a rsp 00007fff96e54fd0 error 6
The first occurs every so often, and the second occurs very often, up to 5-6 times a minute. Before reboot, the console had very many lines of the first error message, but would not respond to any key prompts. The second message did not appear on the console.
The server is running Fedora Core 5 with Apache 2.2, and is a 64-bit machine.
Can someone please recommend some steps to further diagnose this issue? Running ksymoops seems like an option, but from what I understand, that is for soft kernel panics, and this one definitely seems like a hard one (machine totally frozen). Any suggestions would be most appreciated.
According to that second line, it looks like Apache is the program that is segfaulting, not the kernel itself.
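To get a little more out of that segfault line, its fields can be pulled apart. The following is only a sketch: the regular expression is written against the one line quoted above, the function name `decode_segfault` is made up for illustration, and the error number is read as hex on the assumption that this is how the kernel prints it (for a value of 6 it makes no difference). The bit meanings are the standard x86 page-fault error code bits.

```python
import re

# Pattern written against the single segfault line quoted in this thread;
# other kernel versions may format the line differently.
SEGFAULT_RE = re.compile(
    r"(?P<prog>\S+)\[(?P<pid>\d+)\]: segfault at (?P<addr>[0-9a-f]+) "
    r"rip (?P<rip>[0-9a-f]+) rsp (?P<rsp>[0-9a-f]+) error (?P<err>[0-9a-f]+)"
)

def decode_segfault(line):
    """Return the decoded fields of a segfault log line, or None if it does not match."""
    m = SEGFAULT_RE.search(line)
    if m is None:
        return None
    addr = int(m.group("addr"), 16)
    rsp = int(m.group("rsp"), 16)
    err = int(m.group("err"), 16)  # assumption: printed in hex
    return {
        "program": m.group("prog"),
        "pid": int(m.group("pid")),
        # x86 page-fault error code: bit 0 = protection fault (else page not present),
        # bit 1 = write access (else read), bit 2 = fault happened in user mode.
        "cause": "protection violation" if err & 1 else "page not present",
        "access": "write" if err & 2 else "read",
        "mode": "user" if err & 4 else "kernel",
        # How far the faulting address is from the stack pointer, in bytes.
        "addr_minus_rsp": addr - rsp,
    }

line = ("Feb 26 14:48:29 decatur kernel: httpd[5675]: segfault at "
        "00007fff96e54fc8 rip 00002aaab0ae127a rsp 00007fff96e54fd0 error 6")
info = decode_segfault(line)
for key, value in info.items():
    print(key, value)
```

For the line above this reports a user-mode write to a page that is not mapped, at an address 8 bytes below the stack pointer. A write just under rsp is often what a stack that has run out looks like (deep recursion in a module, for example), but that is a guess from one log line, not a diagnosis.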
Try shutting Apache down, and see if you still see the log filling up with those error messages. If not, you will at least know where to start your search for the problem.
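To compare the log before and after stopping Apache, it may help to tally both messages per minute instead of eyeballing the file. This is a rough sketch; `tally_per_minute` is a made-up helper, and it assumes the classic syslog timestamp layout seen in the lines posted above.

```python
from collections import Counter

def tally_per_minute(lines):
    """Count SKB BUG and segfault messages per (minute, kind)."""
    counts = Counter()
    for line in lines:
        if "SKB BUG" in line:
            kind = "skb_bug"
        elif "segfault at" in line:
            kind = "segfault"
        else:
            continue
        # "Feb 26 14:45:39 ..." -> "Feb 26 14:45" (first 12 characters)
        minute = line[:12]
        counts[(minute, kind)] += 1
    return counts

# The two lines from the original post, used here as sample input.
sample = [
    "Feb 26 14:45:39 decatur kernel: SKB BUG: Invalid truesize (488) "
    "len=16384, sizeof(sk_buff)=232",
    "Feb 26 14:48:29 decatur kernel: httpd[5675]: segfault at "
    "00007fff96e54fc8 rip 00002aaab0ae127a rsp 00007fff96e54fd0 error 6",
]
counts = tally_per_minute(sample)
for (minute, kind), n in sorted(counts.items()):
    print(minute, kind, n)

# On the server itself (reading the log usually needs root):
#   with open("/var/log/messages") as f:
#       counts = tally_per_minute(f)
```

If the segfault count drops to zero with Apache stopped but the SKB BUG lines keep arriving, that points away from Apache as the source of the SKB messages.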
Is it actually possible that a segfaulting Apache could bring the whole machine down, causing it to freeze as I mentioned? If not, how about an SKB bug (a problem with the Linux network buffers, from what I understand)? I'd like to target the freezing culprit first, and then tackle the remaining issue(s) afterward.
While Linux is generally very stable, it is still possible for a malfunctioning application to bring the whole machine down, or at least to drive CPU usage so high that the server is, for all intents and purposes, unable to function and must be powered down manually.
Or it could be that the SKB bug is actually what is causing Apache to segfault in the first place, and there is actually nothing wrong with Apache. That sounds like what could be happening, given what you said:
Quote:
Before reboot, the console had very many lines of the first error message, but would not respond to any key prompts. The second message did not appear on the console.
If the SKB bug is the problem, then I am not sure where you would want to go from there. As I understand it, the cause could be in the kernel itself or in a buggy network driver. If that is the case, you could first try running with another NIC that uses a different driver (if that is possible in your situation), and if all else fails you could try switching to another kernel version.
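Before swapping cards, it is worth knowing which driver the current NIC is using. A minimal sketch follows, assuming the kernel exposes each interface under /sys/class/net with a `device/driver` symlink (I believe 2.6 kernels do, but I have not checked it on Fedora Core 5). `nic_drivers` is a hypothetical helper name.

```python
import os

def nic_drivers(sysfs_net="/sys/class/net"):
    """Map each network interface to the driver bound to it, where sysfs shows one."""
    drivers = {}
    if not os.path.isdir(sysfs_net):
        return drivers  # no sysfs here, nothing to report
    for iface in sorted(os.listdir(sysfs_net)):
        link = os.path.join(sysfs_net, iface, "device", "driver")
        # Virtual interfaces such as lo have no device/driver link and are skipped.
        if os.path.islink(link):
            drivers[iface] = os.path.basename(os.readlink(link))
    return drivers

for iface, driver in nic_drivers().items():
    print(iface, driver)
```

The driver name printed for the interface carrying the web traffic is the one to search for alongside "SKB BUG: Invalid truesize", and the one to avoid when picking a replacement card.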
You may also want to try running the machine with a live CD for a few hours (if you can manage the downtime for the server) to see if the error shows up there. That could help rule out a hardware issue at least.
I thought of something else that may have some bearing:
Today we're doing a pretty good amount of traffic (approx 20MB/sec). Is it possible that Apache getting more requests than available threads could cause the machine to completely lock up (I've already bumped up the MaxClients and ThreadsPerChild just in case)? By the way, CPU and memory are doing alright, with lots left of each.