memory leak...
Hello:
I just upgraded a server from Fedora Core 2 to Red Hat Enterprise Linux 5 and am experiencing a rather bizarre memory leak with one of my processes. While the process is running, the total memory used (as reported by top) keeps climbing, but what's rather interesting is that the memory attributed to this particular process is not changing (I've watched VIRT, RES, SHR, SWAP, CODE and DATA). I'm fairly sure the problem is related to this process, because if/when I shut it down, the total memory used stays fairly constant. If I leave the process running, the system eventually runs out of memory and needs to be restarted. The same binary works fine on the FC2 server. The issues are:
* Why is the memory of the process not increasing, yet the total memory used is?
* Could this be related to the different versions of the libraries the code links against (see below)?
* What do I do next, given that I need this code operational? I suppose copying the libraries over from the old server to the new one might be an option?
I can note that I did run the binary under valgrind to try to see what's going on, and it did report a few errors, but they are the same whether it runs on FC2 or on RHEL5.
Here is some info regarding the library versions that are used:

oldHost (FC2 - working fine):
----------------------------
ldd output for the process:
    libpthread.so.0 => /lib/tls/libpthread.so.0 (0x009d6000)
    libstdc++.so.5 => /usr/lib/libstdc++.so.5 (0x001de000)
    libm.so.6 => /lib/tls/libm.so.6 (0x0080b000)
    libgcc_s.so.1 => /lib/libgcc_s.so.1 (0x001d4000)
    libc.so.6 => /lib/tls/libc.so.6 (0x006e7000)
    /lib/ld-linux.so.2 => /lib/ld-linux.so.2 (0x006ce000)
ls -la /lib/tls/libpthread.so.0
    lrwxrwxrwx 1 root root 18 Nov 12 2004 /lib/tls/libpthread.so.0 -> libpthread-0.61.so
ls -la /usr/lib/libstdc++.so.5
    lrwxrwxrwx 1 root root 18 Nov 9 2004 /usr/lib/libstdc++.so.5 -> libstdc++.so.5.0.5
ls -la /lib/tls/libm.so.6
    lrwxrwxrwx 1 root root 13 Nov 12 2004 /lib/tls/libm.so.6 -> libm-2.3.3.so
ls -la /lib/libgcc_s.so.1
    lrwxrwxrwx 1 root root 28 Nov 9 2004 /lib/libgcc_s.so.1 -> libgcc_s-3.3.3-20040413.so.1
ls -la /lib/tls/libc.so.6
    lrwxrwxrwx 1 root root 13 Nov 12 2004 /lib/tls/libc.so.6 -> libc-2.3.3.so
ls -la /lib/ld-linux.so.2
    lrwxrwxrwx 1 root root 11 Nov 12 2004 /lib/ld-linux.so.2 -> ld-2.3.3.so

newHost (RHEL5 - memory leaking):
----------------------------------
ldd output for the process:
    linux-gate.so.1 => (0x00ed4000)
    libpthread.so.0 => /lib/libpthread.so.0 (0x00405000)
    libstdc++.so.5 => /usr/lib/libstdc++.so.5 (0x00cea000)
    libm.so.6 => /lib/libm.so.6 (0x003d6000)
    libgcc_s.so.1 => /lib/libgcc_s.so.1 (0x00765000)
    libc.so.6 => /lib/libc.so.6 (0x00297000)
    /lib/ld-linux.so.2 (0x00275000)
ls -la /lib/libpthread.so.0
    lrwxrwxrwx 1 root root 17 May 7 07:12 /lib/libpthread.so.0 -> libpthread-2.5.so
ls -la /usr/lib/libstdc++.so.5
    lrwxrwxrwx 1 root root 18 May 7 07:18 /usr/lib/libstdc++.so.5 -> libstdc++.so.5.0.7
ls -la /lib/libm.so.6
    lrwxrwxrwx 1 root root 11 May 7 07:12 /lib/libm.so.6 -> libm-2.5.so
ls -la /lib/libgcc_s.so.1
    lrwxrwxrwx 1 root root 1 May 7 07:12 /lib/libgcc_s.so.1 -> libgcc_s-4.1.1-20070105.so.1
ls -la /lib/libc.so.6
    lrwxrwxrwx 1 root root 11 May 7 07:12 /lib/libc.so.6 -> libc-2.5.so
ls -la /lib/ld-linux.so.2
    lrwxrwxrwx 1 root root 9 May 7 07:12 /lib/ld-linux.so.2 -> ld-2.5.so

thank you.
It may not be leaking.
There was a point at which threads were not displayed individually: top essentially reported the memory of the processes it saw (like you'd see with ps -ef). At some point, though, the "Linux community" decided to show all threads as if they were individual processes. So ps -ef on later systems lists each thread, and if you add up the memory of these apparent processes it looks like they are using the number of threads times the memory per thread, when in fact all the threads of a given process share the same memory. We saw this with some processes we were running on a RHEL 2.1 system vs. on a RHEL 3 system. (Of course, how I could possibly have had two different versions installed at the same time is a mystery to me to this day.)

You CAN make the older system show the threads by using the "-m" option with ps. This all has to do with NPTL (the Native POSIX Thread Library).

To verify that the "threads" (a.k.a. clone processes) are actually using the same memory, you can cat /proc/<PID>/maps into a file for each PID, then diff the output files. When I did this in the setup described above for two separate sets of processes, I actually found two different maps: there is a parent process that has its own map, and then multiple children that all share a single map.

The best way to see memory usage on Linux is with the "free" command rather than top.
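The maps comparison above can be sketched in a few lines of shell. The two PIDs are placeholders (they default to comparing this shell against itself, purely to demonstrate the mechanics); substitute the real parent and child PIDs:

```shell
#!/bin/sh
# Sketch: diff the /proc maps of two PIDs to see whether they share one
# address space (as NPTL threads of a single process do).
# PID1/PID2 default to this shell's own PID just for demonstration.
PID1=${1:-$$}
PID2=${2:-$$}

cat "/proc/$PID1/maps" > "/tmp/maps.$PID1"
cat "/proc/$PID2/maps" > "/tmp/maps.$PID2"

if diff -q "/tmp/maps.$PID1" "/tmp/maps.$PID2" >/dev/null 2>&1; then
    verdict="identical maps: same address space"
else
    verdict="maps differ: separate address spaces"
fi
echo "$verdict"
```

With two PIDs that are really threads of one process, the maps should come out identical; two unrelated processes will differ.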
Yeah, a lot of people get hung up on this one. Linux is showing you how much memory the process could take up in "virtual" memory, as if that process were the only one running on the system. Every process gets the benefit of using all memory except the bit allocated to the kernel. Then all the threads get reported separately, and all the shared memory, forks, and child processes, even the shared libraries linked to the process, get reported again and again for each thread and process, even though a shared library only occupies one spot in actual memory.

Basically, if you have a real memory leak it will cause degraded performance and heavy swap use, and eventually memory thrashing as swap fills up and kswapd starts taking up all the processor time.
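One way to tell whether that thrashing stage is approaching is to watch the kernel's swap counters rather than top. A minimal sketch, assuming a Linux /proc (the counters exist even when no swapping is happening):

```shell
#!/bin/sh
# Sketch: sample the kernel's cumulative swap-in/swap-out page counters
# twice; a growing delta means the box is actively swapping, which is
# the precursor to the thrashing described above.
swap_pages() {
    awk 'BEGIN { s = 0 } /^pswpin|^pswpout/ { s += $2 } END { print s }' /proc/vmstat
}

a=$(swap_pages)
sleep 1
b=$(swap_pages)

if [ "$b" -gt "$a" ]; then
    echo "swapping: $((b - a)) pages moved in the last second"
else
    echo "no swap traffic in the last second"
fi
```

The same numbers show up as the si/so columns of `vmstat 1`; sustained nonzero values there mean real memory pressure rather than harmless cache growth.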
thanks for your replies.
I realize that the kernel does a lot of magic behind the scenes with how it allocates memory, but ultimately what happens here is that the system runs out, so something must be leaking. Once it runs out, the console is spammed with output like:

cpu 1 hot: high 186, batch 31 used:98
cpu 1 cold: high 62, batch 15 used:12
cpu 2 hot: high 186, batch 31 used:145
cpu 2 cold: high 62, batch 15 used:14
cpu 3 hot: high 186, batch 31 used:156
cpu 3 cold: high 62, batch 15 used:9
Free pages: 2934312kB (2929472kB HighMem)
Active:48013 inactive:3586 dirty:0 writeback:5 unstable:0 free:733578 slab:217462 mapped:5762 pagetables:809
DMA free:3548kB min:68kB low:84kB high:100kB active:16kB inactive:0kB present:16384kB pages_scanned:791917820 all_unreclaimable? yes
lowmem_reserve[]: 0 0 880 4080
DMA32 free:0kB min:0kB low:0kB high:0kB active:0kB inactive:0kB present:0kB pages_scanned:0 all_unreclaimable? no
lowmem_reserve[]: 0 0 880 4080
Normal free:1292kB min:3756kB low:4692kB high:5632kB active:152kB inactive:128kB present:901120kB pages_scanned:2272739400 all_unreclaimable? yes
lowmem_reserve[]: 0 0 0 25600
HighMem free:2929472kB min:512kB low:3928kB high:7344kB active:191780kB inactive:14320kB present:3276800kB pages_scanned:0 all_unreclaimable? no
lowmem_reserve[]: 0 0 0 0
DMA: 3*4kB 0*8kB 1*16kB 0*32kB 1*64kB 1*128kB 1*256kB 0*512kB 1*1024kB 1*2048kB 0*4096kB = 3548kB
DMA32: empty
Normal: 1*4kB 1*8kB 0*16kB 0*32kB 0*64kB 0*128kB 1*256kB 0*512kB 1*1024kB 0*2048kB 0*4096kB = 1292kB
HighMem: 10382*4kB 13681*8kB 12750*16kB 11271*32kB 8651*64kB 5506*128kB 2346*256kB 559*512kB 61*1024kB 3*2048kB 0*4096kB = 2929472kB
Swap cache: add 24, delete 24, find 0/0
[...]

It might be that what's growing is really cache (this process is actually doing I/O non-stop):

procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu------
 r  b   swpd    free    buff    cache   si   so   bi   bo   in   cs us sy id wa st
 0  0      0 3210664  153984 4435668    0    0    1    7   74   74  1  2 97  0  0
...
If someone could confirm that the kernel does not account cache against any one process, then the business of an individual process's memory not growing while the total free memory shrinks would make sense. ... I'm still not sure why it runs out in the end, though.
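That much can be checked directly: the page cache is accounted kernel-wide in /proc/meminfo, not against any process's VIRT/RES. A minimal sketch, assuming a Linux /proc:

```shell
#!/bin/sh
# Sketch: the page cache is kernel-wide accounting in /proc/meminfo and
# is never charged to a single process, which is why a caching workload
# grows "used" memory without growing any one process in top.
free_kb=$(awk    '$1 == "MemFree:" { print $2 }' /proc/meminfo)
buffers_kb=$(awk '$1 == "Buffers:" { print $2 }' /proc/meminfo)
cached_kb=$(awk  '$1 == "Cached:"  { print $2 }' /proc/meminfo)

# Roughly what `free` reports on the "-/+ buffers/cache" line:
echo "free+reclaimable: $((free_kb + buffers_kb + cached_kb)) kB (page cache alone: $cached_kb kB)"
```

If most of the "missing" memory shows up in Cached/Buffers, it's reclaimable cache, not a leak; memory that vanishes without appearing there (e.g. in slab) is the suspicious kind.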
I'm not exactly sure what you mean, but if this helps: the Resident Set Size (RSS)
tracks the number of in-core (resident) pages belonging to the virtual address space of a process. When kswapd runs around deciding which pages to free, it does so according to RSS. Beyond that, the VM system will cache any and all file data held in memory pages, as well as anything else ejected back to disk; it will even cache data for unaltered swap pages on disk until memory is full, so as not to waste it. Eventually this has to lead to memory pressure and the system cleaning itself up, but that's not a problem at all; it makes the system run better.

You didn't say whether you have a swap partition, or whether it is filling up.
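For reference, the RSS figure described above can be read straight out of /proc for any PID. A small sketch (the PID defaults to the script's own shell purely as a demonstration; substitute the suspect process's PID):

```shell
#!/bin/sh
# Sketch: read a process's virtual size and resident set size (the
# in-core page total that kswapd's reclaim decisions are based on)
# straight from /proc.  PID defaults to this shell as a demo.
PID=${1:-$$}

size_kb=$(awk '$1 == "VmSize:" { print $2 }' "/proc/$PID/status")
rss_kb=$(awk  '$1 == "VmRSS:"  { print $2 }' "/proc/$PID/status")

echo "PID $PID: VmSize=${size_kb} kB VmRSS=${rss_kb} kB"
```

If VmRSS of the suspect process stays flat while the system-wide free total falls, the growth is happening outside that process's address space, in kernel-side memory such as cache or slab.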
Cache isn't the problem, but you've got the right idea.
Have a read of this; it's worth checking the whole thread out. (Forgot to mention: if you want to keep an eye on slab allocation, try slabtop.)
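A slabtop-style overview can also be approximated straight from /proc. A rough sketch (note that /proc/slabinfo is often readable only by root, so this falls back gracefully):

```shell
#!/bin/sh
# Sketch: poor man's slabtop -- list the five slab caches holding the
# most memory (active objects x object size, in bytes).
# /proc/slabinfo is frequently root-only, hence the fallback.
if [ -r /proc/slabinfo ]; then
    # slabinfo data columns: name active_objs num_objs objsize ...
    out=$(awk 'NR > 2 { print $2 * $4, $1 }' /proc/slabinfo | sort -rn | head -5)
else
    out="/proc/slabinfo not readable here; run slabtop as root instead"
fi
echo "$out"
```

A slab cache that grows without bound across samples (here, the slab:217462 figure in the console spam is already suspiciously large) points at a kernel-side leak rather than a userspace one.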