memory leak...
Hello:
I just upgraded a server from Fedora Core 2 to Red Hat Enterprise Linux 5 and am experiencing a rather bizarre memory leak with one of my processes. While the process is running, the total memory used (as reported by top) keeps climbing, but what's rather interesting is that the memory attributed to this particular process is not changing (I've watched VIRT, RES, SHR, SWAP, CODE and DATA). I'm fairly sure the problem is related to this process, because if/when I shut it down, the total memory used stays fairly constant. If I leave the process running, the system eventually runs out of memory and needs to be restarted. The same binary works fine on the FC2 server. The issues are:
* Why is the memory of the process not increasing, yet the total memory used is?
* Could this be related to the different versions of the libraries the code links against (see below)?
* What do I do next, given that I need this code operational? I suppose copying the libraries over from the old server to the new one might be an option?
I can note that I did run the binary under valgrind to try to see what's going on, and it did report a few errors, but they are the same whether it runs on FC2 or on RHEL5.
Here is some info regarding the library versions that are used:

oldHost (FC2 - working fine):
----------------------------
ldd output for the process:
    libpthread.so.0 => /lib/tls/libpthread.so.0 (0x009d6000)
    libstdc++.so.5 => /usr/lib/libstdc++.so.5 (0x001de000)
    libm.so.6 => /lib/tls/libm.so.6 (0x0080b000)
    libgcc_s.so.1 => /lib/libgcc_s.so.1 (0x001d4000)
    libc.so.6 => /lib/tls/libc.so.6 (0x006e7000)
    /lib/ld-linux.so.2 => /lib/ld-linux.so.2 (0x006ce000)
ls -la /lib/tls/libpthread.so.0
    lrwxrwxrwx 1 root root 18 Nov 12 2004 /lib/tls/libpthread.so.0 -> libpthread-0.61.so
ls -la /usr/lib/libstdc++.so.5
    lrwxrwxrwx 1 root root 18 Nov 9 2004 /usr/lib/libstdc++.so.5 -> libstdc++.so.5.0.5
ls -la /lib/tls/libm.so.6
    lrwxrwxrwx 1 root root 13 Nov 12 2004 /lib/tls/libm.so.6 -> libm-2.3.3.so
ls -la /lib/libgcc_s.so.1
    lrwxrwxrwx 1 root root 28 Nov 9 2004 /lib/libgcc_s.so.1 -> libgcc_s-3.3.3-20040413.so.1
ls -la /lib/tls/libc.so.6
    lrwxrwxrwx 1 root root 13 Nov 12 2004 /lib/tls/libc.so.6 -> libc-2.3.3.so
ls -la /lib/ld-linux.so.2
    lrwxrwxrwx 1 root root 11 Nov 12 2004 /lib/ld-linux.so.2 -> ld-2.3.3.so

newHost (RHEL5 - memory leaking):
----------------------------------
ldd output for the process:
    linux-gate.so.1 => (0x00ed4000)
    libpthread.so.0 => /lib/libpthread.so.0 (0x00405000)
    libstdc++.so.5 => /usr/lib/libstdc++.so.5 (0x00cea000)
    libm.so.6 => /lib/libm.so.6 (0x003d6000)
    libgcc_s.so.1 => /lib/libgcc_s.so.1 (0x00765000)
    libc.so.6 => /lib/libc.so.6 (0x00297000)
    /lib/ld-linux.so.2 (0x00275000)
ls -la /lib/libpthread.so.0
    lrwxrwxrwx 1 root root 17 May 7 07:12 /lib/libpthread.so.0 -> libpthread-2.5.so
ls -la /usr/lib/libstdc++.so.5
    lrwxrwxrwx 1 root root 18 May 7 07:18 /usr/lib/libstdc++.so.5 -> libstdc++.so.5.0.7
ls -la /lib/libm.so.6
    lrwxrwxrwx 1 root root 11 May 7 07:12 /lib/libm.so.6 -> libm-2.5.so
ls -la /lib/libgcc_s.so.1
    lrwxrwxrwx 1 root root 1 May 7 07:12 /lib/libgcc_s.so.1 -> libgcc_s-4.1.1-20070105.so.1
ls -la /lib/libc.so.6
    lrwxrwxrwx 1 root root 11 May 7 07:12 /lib/libc.so.6 -> libc-2.5.so
ls -la /lib/ld-linux.so.2
    lrwxrwxrwx 1 root root 9 May 7 07:12 /lib/ld-linux.so.2 -> ld-2.5.so

thank you.
It may not be leaking.
There was a point at which threads were not displayed individually: top essentially reported the memory of the processes it saw (like you'd see with ps -ef). At some point, though, the "Linux community" decided to show all threads as if they were individual processes. So ps -ef on later systems lists each thread, and if you add up the memory of these apparent processes it looks like they are using the number of threads times the memory per thread, when in fact all the threads of a given process share the same memory. We saw this with some processes we were running on a RHEL 2.1 system vs. on a RHEL 3 system. (Of course, how I could possibly have had two different versions installed at the same time is a mystery to me to this day.)

You CAN make the older system show the threads by using the "-m" option with ps. This all has to do with NPTL (the Native POSIX Thread Library).

To verify that the "threads" (a.k.a. clone processes) are actually using the same memory, you can cat /proc/<PID>/maps into a file for each PID, then diff the output files. When I did this in the setup described above for two separate sets of processes, I actually found two different maps: there is a parent process that has its own map, and then multiple children that all share a single map.

The best way to see memory usage on Linux is with the "free" command rather than top.
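The maps comparison above can be sketched in a few lines of shell. The two PIDs are placeholders (they default to comparing this shell against itself, purely to demonstrate the mechanics); substitute the real parent and child PIDs:

```shell
#!/bin/sh
# Sketch: diff the /proc maps of two PIDs to see whether they share one
# address space (as NPTL threads of a single process do).
# PID1/PID2 default to this shell's own PID just for demonstration.
PID1=${1:-$$}
PID2=${2:-$$}

cat "/proc/$PID1/maps" > "/tmp/maps.$PID1"
cat "/proc/$PID2/maps" > "/tmp/maps.$PID2"

if diff -q "/tmp/maps.$PID1" "/tmp/maps.$PID2" >/dev/null 2>&1; then
    verdict="identical maps: same address space"
else
    verdict="maps differ: separate address spaces"
fi
echo "$verdict"
```

With two PIDs that are really threads of one process, the maps should come out identical; two unrelated processes will differ.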
Yeah, a lot of people get hung up on this one. Linux is showing you how much memory the process could take up in "virtual" memory, as if that process were the only one running on the system. Every process gets the benefit of using all memory except the bit allocated to the kernel. Then all the threads get reported separately, and all the shared memory, forks, and child processes, even the shared libraries linked to the process, get reported again and again for each thread and process, even though a shared library only occupies one spot in actual memory.

Basically, if you have a real memory leak it will cause degraded performance and heavy swap use, and eventually memory thrashing as swap fills up and kswapd starts taking up all the processor time.
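One way to tell whether that thrashing stage is approaching is to watch the kernel's swap counters rather than top. A minimal sketch, assuming a Linux /proc (the counters exist even when no swapping is happening):

```shell
#!/bin/sh
# Sketch: sample the kernel's cumulative swap-in/swap-out page counters
# twice; a growing delta means the box is actively swapping, which is
# the precursor to the thrashing described above.
swap_pages() {
    awk 'BEGIN { s = 0 } /^pswpin|^pswpout/ { s += $2 } END { print s }' /proc/vmstat
}

a=$(swap_pages)
sleep 1
b=$(swap_pages)

if [ "$b" -gt "$a" ]; then
    echo "swapping: $((b - a)) pages moved in the last second"
else
    echo "no swap traffic in the last second"
fi
```

The same numbers show up as the si/so columns of `vmstat 1`; sustained nonzero values there mean real memory pressure rather than harmless cache growth.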
thanks for your replies.
I realize that the kernel does a lot of magic behind the scenes with how it allocates memory, but ultimately what happens here is that the system runs out, so something must be leaking. Once it runs out, the console is spammed with output like:

cpu 1 hot: high 186, batch 31 used:98
cpu 1 cold: high 62, batch 15 used:12
cpu 2 hot: high 186, batch 31 used:145
cpu 2 cold: high 62, batch 15 used:14
cpu 3 hot: high 186, batch 31 used:156
cpu 3 cold: high 62, batch 15 used:9
Free pages: 2934312kB (2929472kB HighMem)
Active:48013 inactive:3586 dirty:0 writeback:5 unstable:0 free:733578 slab:217462 mapped:5762 pagetables:809
DMA free:3548kB min:68kB low:84kB high:100kB active:16kB inactive:0kB present:16384kB pages_scanned:791917820 all_unreclaimable? yes
lowmem_reserve[]: 0 0 880 4080
DMA32 free:0kB min:0kB low:0kB high:0kB active:0kB inactive:0kB present:0kB pages_scanned:0 all_unreclaimable? no
lowmem_reserve[]: 0 0 880 4080
Normal free:1292kB min:3756kB low:4692kB high:5632kB active:152kB inactive:128kB present:901120kB pages_scanned:2272739400 all_unreclaimable? yes
lowmem_reserve[]: 0 0 0 25600
HighMem free:2929472kB min:512kB low:3928kB high:7344kB active:191780kB inactive:14320kB present:3276800kB pages_scanned:0 all_unreclaimable? no
lowmem_reserve[]: 0 0 0 0
DMA: 3*4kB 0*8kB 1*16kB 0*32kB 1*64kB 1*128kB 1*256kB 0*512kB 1*1024kB 1*2048kB 0*4096kB = 3548kB
DMA32: empty
Normal: 1*4kB 1*8kB 0*16kB 0*32kB 0*64kB 0*128kB 1*256kB 0*512kB 1*1024kB 0*2048kB 0*4096kB = 1292kB
HighMem: 10382*4kB 13681*8kB 12750*16kB 11271*32kB 8651*64kB 5506*128kB 2346*256kB 559*512kB 61*1024kB 3*2048kB 0*4096kB = 2929472kB
Swap cache: add 24, delete 24, find 0/0
[...]

It might be that what's growing is really cache (this process is actually doing I/O non-stop):

procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu------
 r  b   swpd    free    buff    cache   si   so   bi   bo   in   cs us sy id wa st
 0  0      0 3210664  153984 4435668    0    0    1    7   74   74  1  2 97  0  0
...
If someone could confirm that the kernel does not account cache against any one process, then the business of an individual process's memory not growing while the total free memory shrinks would make sense. ... I'm still not sure why it runs out in the end, though.
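That much can be checked directly: the page cache is accounted kernel-wide in /proc/meminfo, not against any process's VIRT/RES. A minimal sketch, assuming a Linux /proc:

```shell
#!/bin/sh
# Sketch: the page cache is kernel-wide accounting in /proc/meminfo and
# is never charged to a single process, which is why a caching workload
# grows "used" memory without growing any one process in top.
free_kb=$(awk    '$1 == "MemFree:" { print $2 }' /proc/meminfo)
buffers_kb=$(awk '$1 == "Buffers:" { print $2 }' /proc/meminfo)
cached_kb=$(awk  '$1 == "Cached:"  { print $2 }' /proc/meminfo)

# Roughly what `free` reports on the "-/+ buffers/cache" line:
echo "free+reclaimable: $((free_kb + buffers_kb + cached_kb)) kB (page cache alone: $cached_kb kB)"
```

If most of the "missing" memory shows up in Cached/Buffers, it's reclaimable cache, not a leak; memory that vanishes without appearing there (e.g. in slab) is the suspicious kind.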
I'm not exactly sure what you mean, but if this helps: the Resident Set Size (RSS)
tracks the number of in-core (resident) pages belonging to the virtual address space of a process. When kswapd runs around deciding which pages to free, it does so according to RSS. Beyond that, the VM system will cache any and all file data held in memory pages, as well as anything else ejected back to disk; it will even cache data for unaltered swap pages on disk until memory is full, so as not to waste it. Eventually this has to lead to memory pressure and the system cleaning itself up, but that's not a problem at all; it makes the system run better.

You didn't say whether you have a swap partition, or whether it is filling up.
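For reference, the RSS figure described above can be read straight out of /proc for any PID. A small sketch (the PID defaults to the script's own shell purely as a demonstration; substitute the suspect process's PID):

```shell
#!/bin/sh
# Sketch: read a process's virtual size and resident set size (the
# in-core page total that kswapd's reclaim decisions are based on)
# straight from /proc.  PID defaults to this shell as a demo.
PID=${1:-$$}

size_kb=$(awk '$1 == "VmSize:" { print $2 }' "/proc/$PID/status")
rss_kb=$(awk  '$1 == "VmRSS:"  { print $2 }' "/proc/$PID/status")

echo "PID $PID: VmSize=${size_kb} kB VmRSS=${rss_kb} kB"
```

If VmRSS of the suspect process stays flat while the system-wide free total falls, the growth is happening outside that process's address space, in kernel-side memory such as cache or slab.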
Cache isn't the problem, but you've got the right idea.
Have a read of this; it's worth checking the whole thread out. (Forgot to mention: if you want to keep an eye on slab allocation, try slabtop.)
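A slabtop-style overview can also be approximated straight from /proc. A rough sketch (note that /proc/slabinfo is often readable only by root, so this falls back gracefully):

```shell
#!/bin/sh
# Sketch: poor man's slabtop -- list the five slab caches holding the
# most memory (active objects x object size, in bytes).
# /proc/slabinfo is frequently root-only, hence the fallback.
if [ -r /proc/slabinfo ]; then
    # slabinfo data columns: name active_objs num_objs objsize ...
    out=$(awk 'NR > 2 { print $2 * $4, $1 }' /proc/slabinfo | sort -rn | head -5)
else
    out="/proc/slabinfo not readable here; run slabtop as root instead"
fi
echo "$out"
```

A slab cache that grows without bound across samples (here, the slab:217462 figure in the console spam is already suspiciously large) points at a kernel-side leak rather than a userspace one.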