Linux - Server This forum is for the discussion of Linux Software used in a server related context. |
|
 |
|
11-30-2010, 04:27 PM
|
#16
|
LQ Guru
Registered: Dec 2007
Distribution: Centos
Posts: 5,286
|
Quote:
Originally Posted by nomb
If you are referring to the memory.txt I posted, it had stayed hung. I had to reboot the box.
|
I looked at "Slab: 805164 kB" in the second to last line of the file you posted and "Slab: 104040 kB" in the last line. Did I misunderstand: Was that last line after reboot?
Staying hung does not necessarily mean the resource hasn't been released. Enough damage may have been done to the system state while the kernel was out of memory, that releasing the memory does not make the system functional again.
What applications are running on the system? You may want to experimentally kill processes yourself to find out which one is driving the memory leak.
If you suspect a specific process, you might also try
ls -l /proc/pid/fd/
replacing pid with the pid of the process you suspect.
That will tell you the open files of that process, which may directly or indirectly tell you what resource it might be leaking.
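If no single process stands out as a suspect, the same idea can be applied to every process at once. The following is a minimal sketch, not a tested diagnostic: count_fds and fd_report are hypothetical helper names, and the PROC_ROOT override exists only so the functions can be tried against a fake directory tree. Run it as root, otherwise the fd directories of other users' processes are unreadable and get skipped.

```shell
#!/bin/sh
# Count the open file descriptors of one process.
# Prints nothing if the process is gone or its fd directory is unreadable.
count_fds() {
    fd_dir="${PROC_ROOT:-/proc}/$1/fd"
    [ -r "$fd_dir" ] || return 0
    n=$(ls "$fd_dir" 2>/dev/null | wc -l)
    echo "$((n + 0))"
}

# Print "fd_count pid command" for every process, busiest first (top 20).
fd_report() {
    for dir in "${PROC_ROOT:-/proc}"/[0-9]*; do
        pid=${dir##*/}
        n=$(count_fds "$pid")
        [ -n "$n" ] || continue
        # cmdline is NUL-separated; kernel threads have an empty one.
        cmd=$(cat "$dir/cmdline" 2>/dev/null | tr '\0' ' ' | cut -c1-60)
        echo "$n $pid $cmd"
    done | sort -rn | head -n 20
}
```

Calling fd_report a few minutes apart and comparing the output is the point: a process whose count only ever climbs is a candidate for the leak.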
Last edited by johnsfine; 11-30-2010 at 04:33 PM.
|
|
|
11-30-2010, 07:04 PM
|
#17
|
Member
Registered: Jan 2008
Distribution: RHEL, CentOS, Ubuntu
Posts: 379
Rep:
|
I may be completely talking out my back end but here goes...
We had a problem running Apache on a Linux system about 6-7 years ago. It was running Red Hat on a non-x86 architecture (maybe Alpha?). We found a semaphore problem that we could watch killing the machine using ipcs -a: the Apache process would create "Semaphore Arrays" until the machine choked and died. Every couple of weeks we removed the accumulated semaphore arrays with ipcrm -s semid to keep the machine from killing itself, until we just moved the service elsewhere. We never found a solution.
This is similar to the problem we had. http://archive.apache.org
Good luck, sorry if this is totally misguided.
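For anyone wanting to check for the same symptom, here is a rough sketch that tallies semaphore arrays per owner. It assumes the usual Linux ipcs -s layout (key, semid, owner, perms, nsems, with keys printed as 0x...); the function names are made up for illustration, and nothing here removes anything.

```shell
#!/bin/sh
# Tally semaphore arrays per owner from `ipcs -s` output on stdin.
# Data rows are recognised by a key column that starts with "0x".
sem_arrays_by_owner() {
    awk '$1 ~ /^0x/ { count[$3]++ }
         END { for (owner in count) print count[owner], owner }' | sort -rn
}

# Print the semids that belong to one owner, one per line.
semids_for_owner() {
    awk -v owner="$1" '$1 ~ /^0x/ && $3 == owner { print $2 }'
}

# Typical use:
#   ipcs -s | sem_arrays_by_owner
#   ipcs -s | semids_for_owner apache
```

If one owner's count keeps growing between runs, the semids printed by the second function are the ones that would be passed to ipcrm -s, after checking that nothing still uses them.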
|
|
|
11-30-2010, 07:30 PM
|
#18
|
LQ Veteran
Registered: Aug 2003
Location: Australia
Distribution: Lots ...
Posts: 21,439
|
Been busy while I was asleep I see.
You cannot do anything about slab errors. This is *always* a kernel bug. I see there was a problem in 2.6.13 that mirrors this pretty well. What kernel are you using?
There have been a lot of changes to the slab allocator (now SLUB for most users) - mostly efficiency of packing and reclaim. I'd recommend as recent a kernel as you can get.
|
|
|
11-30-2010, 07:37 PM
|
#19
|
LQ Guru
Registered: Dec 2007
Distribution: Centos
Posts: 5,286
|
Quote:
Originally Posted by nomb
RHEL 5.5
|
That is more recent than my Centos version. I don't know the exact kernel version of RHEL 5.5 nor how recent it is. But I sure hope syg00 is not saying RHEL 5.5 isn't recent enough.
Quote:
Originally Posted by syg00
You cannot do anything about slab errors. This is *always* a kernel bug.
|
I'm pretty sure that is incorrect. Open files and many other kinds of "resource" at the application level take up memory in the kernel slab allocator.
If an application leaks such resources, then the kernel slab allocation may grow unacceptably large. No kernel bug is needed. If you stop the application resource leak, you fix the excess use of slab memory.
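To see which kind of kernel object is eating the slab memory, the caches can be ranked by size. This is only a sketch: top_slab_caches is a hypothetical helper, and it assumes the slabinfo 2.x column order (name, active_objs, num_objs, objsize, ...). The size is approximated as num_objs times objsize, which ignores per-slab overhead.

```shell
#!/bin/sh
# Rank slab caches by approximate memory use (in kB), largest first.
# Reads slabinfo text on stdin; the optional argument is how many rows to show.
top_slab_caches() {
    awk '/^slabinfo/ || /^#/ { next }          # skip the two header lines
         NF >= 4 { printf "%d %s\n", ($3 * $4) / 1024, $1 }' |
        sort -rn | head -n "${1:-10}"
}

# Typical use (as root):
#   top_slab_caches 10 < /proc/slabinfo
```

A cache full of file or dentry related objects would point toward something at the application level holding resources; that is an inference to check, not something the numbers prove by themselves.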
Quote:
Originally Posted by syg00
I see there was a problem in 2.6.13 that mirrors this pretty well.
|
Are you talking about the problem that got a lot of discussion five years ago, including this link which is on top of the google hits for this topic (if you don't specifically ask for recent)?
http://lkml.indiana.edu/hypermail/li...10.0/1540.html
That sure would sound relevant if I believed RHEL 5.5 includes a bug that got a lot of discussion five years ago.
The dramatic lack of such discussion when you tell google to restrict to this year, is probably a hint that someone did a good job of fixing that problem. So I don't think RHEL 5.5 still (or even again) includes that bug. So despite superficial similarity, I think the problem in this thread has some other cause.
Last edited by johnsfine; 11-30-2010 at 07:57 PM.
|
|
|
12-01-2010, 11:52 AM
|
#20
|
Member
Registered: Jan 2006
Distribution: Debian Testing
Posts: 675
Original Poster
Rep:
|
I deleted the contents because it didn't really flow with the below.
Last edited by nomb; 12-01-2010 at 12:33 PM.
|
|
|
12-01-2010, 12:22 PM
|
#21
|
Member
Registered: Jan 2006
Distribution: Debian Testing
Posts: 675
Original Poster
Rep:
|
Wow, somehow I missed a whole page of replies...
Quote:
Originally Posted by johnsfine
I looked at "Slab: 805164 kB" in the second to last line of the file you posted and "Slab: 104040 kB" in the last line. Did I misunderstand: Was that last line after reboot?
|
Yes the very last line was right after the reboot.
Quote:
Originally Posted by johnsfine
What applications are running on the system? You may want to experimentally kill processes yourself to find out which one is driving the memory leak.
|
Really nothing, at this point it is just a RHEL installation with IA done to it. Things like auditing, password requirements, etc.
Quote:
Originally Posted by syg00
I'd recommend as recent a kernel as you can get.
|
I am running 2.6.28-194.26.1.el5PAE.
Quote:
Originally Posted by johnsfine
If an application leaks such resources, then the kernel slab allocation may grow unacceptably large. No kernel bug is needed. If you stop the application resource leak, you fix the excess use of slab memory.
|
I think that is why systemtap was recommended to me. I am supposed to be able to use it to, I think, look into the names_cache slab and determine which userspace process is leaking memory. Problem is I have no experience with it.
This was given to me in IRC; it prints all of the gets and puts of namei.c. But it was uncertain whether that would show the problem or not. And to be honest I'm not sure what I'm looking for.
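A lower-tech check that needs no systemtap is to log the names_cache object counts over time and see whether they only ever grow. This is a sketch under assumptions: slab_objs and watch_slab are hypothetical names, the columns are assumed to follow the slabinfo 2.x layout (name, active_objs, num_objs, ...), and /proc/slabinfo normally needs root to read. It shows that the cache grows, not which process is responsible.

```shell
#!/bin/sh
# Print "active_objs num_objs" for one named cache, reading slabinfo on stdin.
slab_objs() {
    awk -v cache="$1" '$1 == cache { print $2, $3 }'
}

# Log a timestamped sample of one cache every INTERVAL seconds until stopped.
watch_slab() {
    cache=$1
    interval=${2:-60}
    while :; do
        echo "$(date '+%F %T') $cache $(slab_objs "$cache" < /proc/slabinfo)"
        sleep "$interval"
    done
}

# Typical use (as root), stopped with Ctrl-C:
#   watch_slab names_cache 60 >> names_cache.log
```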
Last edited by nomb; 12-01-2010 at 12:31 PM.
|
|
|
12-01-2010, 01:40 PM
|
#22
|
LQ Guru
Registered: Dec 2007
Distribution: Centos
Posts: 5,286
|
Quote:
Originally Posted by nomb
Really nothing, at this point it is just a RHEL installation with IA done to it. Things like auditing, password requirements, etc.
|
Now you're making it sound a lot more like a kernel bug and especially like that kernel bug in the five year old discussion I linked above.
I can't believe it is that same five year old bug, but programmers sometimes reintroduce bugs. So it could be a new kernel bug similar to that old one.
If whoever is supporting you from Red Hat believed it was a kernel bug, they could escalate it to someone at Red Hat who could diagnose that bug. (If it is a resource leak in one of your application programs, Red Hat would never get real work done if they routinely escalated such problems to someone who could diagnose a kernel bug).
|
|
|
12-01-2010, 01:57 PM
|
#23
|
Member
Registered: Jan 2006
Distribution: Debian Testing
Posts: 675
Original Poster
Rep:
|
Quote:
Originally Posted by johnsfine
Now you're making it sound a lot more like a kernel bug and especially like that kernel bug in the five year old discussion I linked above.
I can't believe it is that same five year old bug, but programmers sometimes reintroduce bugs. So it could be a new kernel bug similar to that old one.
If whoever is supporting you from Red Hat believed it was a kernel bug, they could escalate it to someone at Red Hat who could diagnose that bug. (If it is a resource leak in one of your application programs, Red Hat would never get real work done if they routinely escalated such problems to someone who could diagnose a kernel bug).
|
The problem with that though is that that bug seems to be only affected by auditing? I turned auditing off and the problem still persists.
|
|
|
12-01-2010, 05:45 PM
|
#24
|
LQ Veteran
Registered: Aug 2003
Location: Australia
Distribution: Lots ...
Posts: 21,439
|
Systemtap puts probes into the kernel itself. Not generally recommended for production environments - but there are exceptions. Like here.
You'll need to install systemtap itself, and you'll also need a kernel with debug info compiled in - Red Hat provides them for all their supported kernels. It'll be big - better than 200 Meg on my F12 system.
<Edit:> The size comment is only relevant for download - you don't boot this kernel, it merely needs to be available for the debug info </Edit:>
Save that stp file - it'll show all the entry and exit calls to the module that was compiled from the code in namei.c. It'll indent the print-out so you can see nested calls and their returns. It has no end test, so you'll need to stop it yourself - <Ctrl>-C if running foreground.
That will produce a *lot* of output - I just tested it on a quiet system running Firefox and claws client. Three seconds generated 375 lines - 225 calls, but only 150 returns. So a simple mismatch isn't necessarily going to indicate a problem - you'd have to have enough data (across enough time) to smooth out the calls. Even 30 seconds showed the same disparity.
Run the stp (as root) by
Code:
stap names1-trace-indent.stp > trace.out
You may be able to analyze it, or see if your friendly RH support folks want it.
Last edited by syg00; 12-01-2010 at 06:01 PM.
|
|
|
12-01-2010, 07:40 PM
|
#25
|
LQ Veteran
Registered: Aug 2003
Location: Australia
Distribution: Lots ...
Posts: 21,439
|
Just for interest, some quick awk over the (roughly) 30 sec sample shows this
Code:
calls
303 irqbalance(1326):
87 clock-applet(2513):
21 nautilus(2238):
42 udisks-daemon(2262):
9 firefox(6641):
24 upowerd(1931):
3 firefox(4109):
126 gnome-panel(2224):
87 gvfs-afc-volume(2499):
24 gconfd-2(2206):
18 sendmail(4033):
96 gnome-settings-(2214):
36 hald(1509):
1890 firefox(2564):
3 firefox(4110):
42 gnome-screensav(2519):
returns
202 irqbalance(1326):
58 clock-applet(2513):
14 nautilus(2238):
28 udisks-daemon(2262):
6 firefox(6641):
16 upowerd(1931):
2 firefox(4109):
84 gnome-panel(2224):
58 gvfs-afc-volume(2499):
16 gconfd-2(2206):
12 sendmail(4033):
64 gnome-settings-(2214):
24 hald(1509):
1260 firefox(2564):
2 firefox(4110):
28 gnome-screensav(2519):
The summation would be better done in the stap code - saves writing boat loads of output. I may have time to knock something up later.
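In the meantime, a tally like the one above can be produced from trace.out with a few lines of awk. This is a sketch rather than the exact command used: tally_trace is a hypothetical name, and it assumes the trace lines follow the thread_indent style, that is "usecs execname(pid): -> function" for a call and "<-" for a return. Process names that contain spaces would break the field positions.

```shell
#!/bin/sh
# Tally calls ("->") and returns ("<-") per process from a trace on stdin.
# Output lines are "calls returns execname(pid):", busiest process first.
tally_trace() {
    awk '$3 == "->" { calls[$2]++ }
         $3 == "<-" { rets[$2]++ }
         END { for (p in calls) printf "%d %d %s\n", calls[p], rets[p], p }' |
        sort -rn
}

# Typical use:
#   tally_trace < trace.out
```

Putting calls and returns on the same line makes it easier to spot a process whose ratio differs from the rest, which is more telling than the raw mismatch.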
|
|
|
All times are GMT -5.