LinuxQuestions.org
Old 11-30-2010, 05:27 PM   #16
johnsfine
Guru
 
Registered: Dec 2007
Distribution: Centos
Posts: 5,139

Rep: Reputation: 1127

Quote:
Originally Posted by nomb
If you are referring to the memory.txt I posted, it had stayed hung. I had to reboot the box.
I looked at "Slab: 805164 kB" in the second to last line of the file you posted and "Slab: 104040 kB" in the last line. Did I misunderstand: Was that last line after reboot?
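
The comparison of Slab values above could be scripted roughly like this. It is only a sketch: it assumes memory.txt is a series of plain /proc/meminfo snapshots appended to one file (the thread does not say how the file was collected), and slab_history is a made-up helper name.

```shell
# Print every "Slab:" value found in a log of /proc/meminfo snapshots,
# in file order, with the change from the previous snapshot.
# Assumption: each snapshot is the unmodified output of `cat /proc/meminfo`.
# slab_history is a hypothetical helper name, not an existing tool.
slab_history() {
    awk '$1 == "Slab:" {
        if (seen) printf "%d kB (change %d kB)\n", $2, $2 - prev
        else      printf "%d kB\n", $2
        prev = $2
        seen = 1
    }' "${1:-/proc/meminfo}"
}
```

Run as `slab_history memory.txt`. A steady climb followed by a large negative change would be consistent with a leak that only a reboot cleared.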

Staying hung does not necessarily mean the resource hasn't been released. Enough damage may have been done to the system state while the kernel was out of memory that releasing the memory does not make the system functional again.

What applications are running on the system? You may want to experimentally kill processes yourself to find out which one is driving the memory leak.

If you suspect a specific process, you might also try
Code:
ls -l /proc/pid/fd/
replacing pid with the pid of the process you suspect.
That will tell you the open files of that process, which may directly or indirectly tell you what resource it might be leaking.
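
If no single process is suspected yet, the same /proc/pid/fd/ idea can be applied to every process at once. The following is a rough sketch only; fd_counts is a made-up helper name, and it needs to run as root to see other users' processes.

```shell
# Print "count pid name" for every process under a /proc-style tree,
# highest open-descriptor count first.
# The process name is read from the "Name:" line of /proc/<pid>/status.
# fd_counts is a hypothetical helper name, not an existing tool.
fd_counts() {
    procroot="${1:-/proc}"
    for dir in "$procroot"/[0-9]*; do
        [ -d "$dir/fd" ] || continue
        n=$(( $(ls "$dir/fd" 2>/dev/null | wc -l) ))
        name=$(awk '/^Name:/ { print $2; exit }' "$dir/status" 2>/dev/null)
        echo "$n ${dir##*/} ${name:-?}"
    done | sort -rn
}
```

`fd_counts | head` shows the heaviest users. Running it a few minutes apart and comparing the counts may point at a process whose descriptor count only ever grows.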

Last edited by johnsfine; 11-30-2010 at 05:33 PM.
 
Old 11-30-2010, 08:04 PM   #17
frndrfoe
Member
 
Registered: Jan 2008
Distribution: RHEL, CentOS
Posts: 375

Rep: Reputation: 38
I may be completely talking out my back end but here goes...
We had a problem running Apache on a Linux system about 6-7 years ago. It was running Red Hat on a non-x86 architecture (maybe Alpha?). We found a semaphore problem that we could watch killing the machine using ipcs -a. The Apache process would create "Semaphore Arrays" until the machine choked and died. We could remove the large number of semaphore arrays with ipcrm -s semid once every couple of weeks to keep the machine from killing itself, until we just moved the service elsewhere. We never found a solution.
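
For anyone wanting to check for the same symptom, a sketch like the one below counts semaphore arrays per owner. It assumes the usual util-linux `ipcs -s` layout, where data rows start with a hex key and the owner is the third column; sem_arrays_by_owner is a made-up helper name.

```shell
# Count System V semaphore arrays per owner from `ipcs -s` output on stdin.
# Assumption: data rows start with a hex key such as 0x00000000 and the
# owner is the third column (the util-linux ipcs layout).
# sem_arrays_by_owner is a hypothetical helper name, not an existing tool.
sem_arrays_by_owner() {
    awk '$1 ~ /^0x/ { count[$3]++ }
         END { for (owner in count) print count[owner], owner }' | sort -rn
}
```

Usage would be `ipcs -s | sem_arrays_by_owner`. A count for one owner that keeps growing between runs would match the behaviour described above.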

This is similar to the problem we had. http://archive.apache.org

Good luck, sorry if this is totally misguided.
 
Old 11-30-2010, 08:30 PM   #18
syg00
LQ Veteran
 
Registered: Aug 2003
Location: Australia
Distribution: Lots ...
Posts: 12,491

Rep: Reputation: 1077
Been busy while I was asleep I see.
You cannot do anything about slab errors. This is *always* a kernel bug. I see there was a problem in 2.6.13 that mirrors this pretty well. What kernel are you using?

There have been a lot of changes to the slab allocator (now SLUB for most users) - mostly efficiency of packing and reclaim. I'd recommend as recent a kernel as you can get.
 
Old 11-30-2010, 08:37 PM   #19
johnsfine
Guru
 
Registered: Dec 2007
Distribution: Centos
Posts: 5,139

Rep: Reputation: 1127
Quote:
Originally Posted by nomb
RHEL 5.5
That is more recent than my Centos version. I don't know the exact kernel version of RHEL 5.5 nor how recent it is. But I sure hope syg00 is not saying RHEL 5.5 isn't recent enough.

Quote:
Originally Posted by syg00
You cannot do anything about slab errors. This is *always* a kernel bug.
I'm pretty sure that is incorrect. Open files and many other kinds of "resource" at the application level take up memory in the kernel slab allocator.

If an application leaks such resources, then the kernel slab allocation may grow unacceptably large. No kernel bug is needed. If you stop the application resource leak, you fix the excess use of slab memory.
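
To see which slab cache is actually growing, something like the sketch below can rank the caches in /proc/slabinfo. It assumes the version 2.1 column layout (name, active_objs, num_objs, objsize, ...) and estimates each cache as num_objs times objsize, which ignores per-slab overhead; slab_top_kb is a made-up helper name. Reading /proc/slabinfo normally requires root.

```shell
# Rank slab caches by approximate size in kB, largest first.
# Assumption: /proc/slabinfo version 2.1 layout, where column 3 is
# <num_objs> and column 4 is <objsize> in bytes. The estimate
# num_objs * objsize ignores per-slab overhead.
# slab_top_kb is a hypothetical helper name, not an existing tool.
slab_top_kb() {
    awk '$1 != "slabinfo" && $1 != "#" {
        printf "%d %s\n", $3 * $4 / 1024, $1
    }' "${1:-/proc/slabinfo}" | sort -rn | head -n "${2:-10}"
}
```

`slab_top_kb` with no arguments prints the ten largest caches. The slabtop utility gives a similar live view, if it is installed.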

Quote:
Originally Posted by syg00
I see there was a problem in 2.6.13 that mirrors this pretty well.
Are you talking about the problem that got a lot of discussion five years ago, including this link, which is at the top of the Google hits for this topic (if you don't specifically ask for recent results)?
http://lkml.indiana.edu/hypermail/li...10.0/1540.html

That sure would sound relevant if I believed RHEL 5.5 includes a bug that got a lot of discussion five years ago.

The dramatic lack of such discussion when you tell Google to restrict results to this year is probably a hint that someone did a good job of fixing that problem. So I don't think RHEL 5.5 still (or even again) includes that bug. Despite the superficial similarity, I think the problem in this thread has some other cause.

Last edited by johnsfine; 11-30-2010 at 08:57 PM.
 
Old 12-01-2010, 12:52 PM   #20
nomb
Member
 
Registered: Jan 2006
Distribution: Debian Testing
Posts: 675

Original Poster
Rep: Reputation: 58
I deleted the contents because it didn't really flow with the below.

Last edited by nomb; 12-01-2010 at 01:33 PM.
 
Old 12-01-2010, 01:22 PM   #21
nomb
Member
 
Registered: Jan 2006
Distribution: Debian Testing
Posts: 675

Original Poster
Rep: Reputation: 58
Wow, somehow I missed a whole page of replies...

Quote:
Originally Posted by johnsfine
I looked at "Slab: 805164 kB" in the second to last line of the file you posted and "Slab: 104040 kB" in the last line. Did I misunderstand: Was that last line after reboot?
Yes the very last line was right after the reboot.

Quote:
Originally Posted by johnsfine
What applications are running on the system? You may want to experimentally kill processes yourself to find out which one is driving the memory leak.
Really nothing. At this point it is just a RHEL installation with IA done to it: things like auditing, password requirements, etc.

Quote:
Originally Posted by syg00
I'd recommend as recent a kernel as you can get.
I am running 2.6.18-194.26.1.el5PAE.

Quote:
Originally Posted by johnsfine
If an application leaks such resources, then the kernel slab allocation may grow unacceptably large. No kernel bug is needed. If you stop the application resource leak, you fix the excess use of slab memory.
I think that is why systemtap was recommended to me. I am supposed to be able to use it to, I think, look into the names_cache slab and determine which userspace process is leaking memory. The problem is that I have no experience with it.

This was given to me on IRC; it prints all of the gets and puts of namei.c. But it was uncertain whether that would show the problem or not. And to be honest, I'm not sure what I'm looking for.

Last edited by nomb; 12-01-2010 at 01:31 PM.
 
Old 12-01-2010, 02:40 PM   #22
johnsfine
Guru
 
Registered: Dec 2007
Distribution: Centos
Posts: 5,139

Rep: Reputation: 1127
Quote:
Originally Posted by nomb
Really nothing. At this point it is just a RHEL installation with IA done to it: things like auditing, password requirements, etc.
Now you're making it sound a lot more like a kernel bug and especially like that kernel bug in the five year old discussion I linked above.

I can't believe it is that same five year old bug, but programmers sometimes reintroduce bugs. So it could be a new kernel bug similar to that old one.

If whoever is supporting you from Red Hat believed it was a kernel bug, they could escalate it to someone at Red Hat who could diagnose that bug. (If it is a resource leak in one of your application programs, Red Hat would never get real work done if they routinely escalated such problems to someone who could diagnose a kernel bug).
 
Old 12-01-2010, 02:57 PM   #23
nomb
Member
 
Registered: Jan 2006
Distribution: Debian Testing
Posts: 675

Original Poster
Rep: Reputation: 58
Quote:
Originally Posted by johnsfine
Now you're making it sound a lot more like a kernel bug and especially like that kernel bug in the five year old discussion I linked above.

I can't believe it is that same five year old bug, but programmers sometimes reintroduce bugs. So it could be a new kernel bug similar to that old one.

If whoever is supporting you from Red Hat believed it was a kernel bug, they could escalate it to someone at Red Hat who could diagnose that bug. (If it is a resource leak in one of your application programs, Red Hat would never get real work done if they routinely escalated such problems to someone who could diagnose a kernel bug).
The problem with that, though, is that the old bug seems to be triggered only by auditing. I turned auditing off and the problem still persists.
 
Old 12-01-2010, 06:45 PM   #24
syg00
LQ Veteran
 
Registered: Aug 2003
Location: Australia
Distribution: Lots ...
Posts: 12,491

Rep: Reputation: 1077
Systemtap puts probes into the kernel itself. Not generally recommended for production environments - but there are exceptions. Like here.
You'll need to install systemtap itself, and you'll also need a kernel with debug info compiled in - Red Hat provides them for all their supported kernels. It'll be big - better than 200 Meg on my F12 system.
<Edit:> The size comment is only relevant for download - you don't boot this kernel, it merely needs to be available for the debug info</Edit:>

Save that stp file - it'll show all the entry and exit calls to the module that was compiled from the code in namei.c. It'll indent the print-out so you can see nested calls and their returns. It has no end test, so you'll need to stop it yourself - <Ctrl>-C if running in the foreground.
That will produce a *lot* of output - I just tested it on a quiet system running Firefox and the Claws mail client. Three seconds generated 375 lines - 225 calls, but only 150 returns. So a simple mismatch isn't necessarily going to indicate a problem - you'd have to have enough data (across enough time) to smooth out the calls. Even 30 seconds showed the same disparity.
Run the stp (as root) by
Code:
stap names1-trace-indent.stp > trace.out
You may be able to analyze it, or see if your friendly RH support folks want it.

Last edited by syg00; 12-01-2010 at 07:01 PM.
 
Old 12-01-2010, 08:40 PM   #25
syg00
LQ Veteran
 
Registered: Aug 2003
Location: Australia
Distribution: Lots ...
Posts: 12,491

Rep: Reputation: 1077
Just for interest, some quick awk over the (roughly) 30 sec sample shows this
Code:
calls

303	irqbalance(1326):
87	clock-applet(2513):
21	nautilus(2238):
42	udisks-daemon(2262):
9	firefox(6641):
24	upowerd(1931):
3	firefox(4109):
126	gnome-panel(2224):
87	gvfs-afc-volume(2499):
24	gconfd-2(2206):
18	sendmail(4033):
96	gnome-settings-(2214):
36	hald(1509):
1890	firefox(2564):
3	firefox(4110):
42	gnome-screensav(2519):

returns

202	irqbalance(1326):
58	clock-applet(2513):
14	nautilus(2238):
28	udisks-daemon(2262):
6	firefox(6641):
16	upowerd(1931):
2	firefox(4109):
84	gnome-panel(2224):
58	gvfs-afc-volume(2499):
16	gconfd-2(2206):
12	sendmail(4033):
64	gnome-settings-(2214):
24	hald(1509):
1260	firefox(2564):
2	firefox(4110):
28	gnome-screensav(2519):
The summation would be better done in the stap code - saves writing boat loads of output. I may have time to knock something up later.
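
In the meantime, the per-process tally above can be reproduced from trace.out with something like the sketch below. This is post-processing with awk, not the in-stap summation mentioned above. It assumes each trace line looks like `     0 firefox(2564): -> some_function`, with `->` marking a call and `<-` a return (the usual thread_indent style); trace_summary is a made-up helper name.

```shell
# Tally calls and returns per process from an indented stap trace.
# Assumption: whitespace-separated fields are
#   <timestamp> <execname(pid):> <"->" or "<-"> <function>
# A process name containing spaces would break this field split.
# trace_summary is a hypothetical helper name, not an existing tool.
trace_summary() {
    awk '$3 == "->" { calls[$2]++;   seen[$2] = 1 }
         $3 == "<-" { returns[$2]++; seen[$2] = 1 }
         END {
             for (p in seen)
                 printf "%s calls=%d returns=%d\n", p, calls[p], returns[p]
         }' "${1:-trace.out}" | sort
}
```

`trace_summary trace.out` prints one line per process, so a process whose calls keep pulling ahead of its returns over a long sample is easier to spot than in the raw trace.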
 
  

