Programming: This forum is for all programming questions.
The question does not have to be directly related to Linux and any language is fair game.
Do you know of any good tools or methods for detecting "temporary" memory leaks: allocations that are not freed as early as they ought to be, so they accumulate to an excessive size during the run of a program, but are freed before the program exits?
I need this for a program that has thousands of unimportant memory leaks totaling several MB of memory. Unimportant memory leaks are allocations that are strictly bounded in size and/or needed all the way up to the end of processing, and simply aren't deleted by the program before exiting (so they are reclaimed by the OS when the program exits). The unimportant leaks might need significant bookkeeping to track in order to delete them, and all that coding and overhead would have no benefit in the ordinary operation of the program. But unimportant memory leaks can be a massive source of noise when you try to use standard tools to find the important ones.
I can run the program in a way that causes the memory leaks I'm looking for to dominate the process's memory use. So I do know a way to track them down, but it would take a large amount of custom coding and other effort. So I wonder if I would be "reinventing a wheel". Does the diagnostic I might code (or some other valid approach) already exist?
My basic concept is to replace malloc and in that replacement, keep extra info on every allocation (to enable the extra processing described below for each deallocation).
Also classify the allocations into a meaningful finite set that can be traced back to the source code, and keep a map with an entry for each member of that set giving the total outstanding size of all allocations in that set. Then when one set gets unreasonably large at any time during execution, it can be reported.
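The scheme in the last two paragraphs could be sketched roughly as follows. This is a hypothetical sketch, not a drop-in malloc replacement: `tracked_alloc`/`tracked_free`, the maps, and `REPORT_THRESHOLD` are names invented here, and the caller address comes from the GCC/Clang builtin `__builtin_return_address`.

```cpp
#include <cstdio>
#include <cstdlib>
#include <unordered_map>
#include <utility>

// Hypothetical sketch: wrappers that classify each allocation by the
// address of the code that requested it, and keep a running total of
// outstanding bytes per call site.  A real diagnostic would hang these
// totals off the malloc replacement itself.

struct SiteStats {
    size_t outstanding_bytes = 0;  // bytes currently allocated from this site
    size_t live_allocations  = 0;  // chunks not yet freed
};

// caller address -> running totals
static std::unordered_map<void*, SiteStats> g_sites;

// pointer -> (caller, size), so tracked_free can decrement the right bucket
static std::unordered_map<void*, std::pair<void*, size_t>> g_chunks;

static const size_t REPORT_THRESHOLD = 1 << 20;  // report sites above 1 MiB

void* tracked_alloc(size_t size) {
    // GCC/Clang builtin: the address this call will return to, i.e. the caller
    void* caller = __builtin_return_address(0);
    void* p = malloc(size);
    if (!p) return nullptr;
    SiteStats& s = g_sites[caller];
    s.outstanding_bytes += size;
    s.live_allocations++;
    g_chunks[p] = {caller, size};
    // a real tool would rate-limit this report rather than print every time
    if (s.outstanding_bytes > REPORT_THRESHOLD)
        std::fprintf(stderr, "site %p: %zu bytes in %zu live allocations\n",
                     caller, s.outstanding_bytes, s.live_allocations);
    return p;
}

void tracked_free(void* p) {
    auto it = g_chunks.find(p);
    if (it != g_chunks.end()) {
        SiteStats& s = g_sites[it->second.first];
        s.outstanding_bytes -= it->second.second;
        s.live_allocations--;
        g_chunks.erase(it);
    }
    free(p);
}
```

Since these wrappers call libc malloc directly rather than interposing it, the maps' own allocations don't recurse into the tracking; interposing malloc itself would need the recursion guard discussed below.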
In an optimized build with symbols, the address of the caller of malloc is often good enough, or almost good enough, to classify the allocation. In a debug build there is no inlining, so the caller of malloc is likely to be a std function that is too general to indicate the source of the problem. Even in an optimized build, the problem caller of malloc might be too basic a function, making it necessary to rerun, classifying by the caller of the caller of malloc.
When allocating, your malloc replacement can know its caller's address (though knowing the caller of the caller can be trickier in an optimized build, and is nowhere near high enough up the stack in a debug build). But we need to keep the running totals up to date when freeing, and it is the caller of malloc, not the caller of free, that we classify by. So we need that extra info per chunk to remember how its allocation was classified.
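One common way to carry that per-chunk classification from malloc to free without a side lookup table is to over-allocate and prepend a small header. A minimal sketch (the names `tagged_alloc`/`tagged_free` are hypothetical):

```cpp
#include <cstddef>
#include <cstdlib>
#include <cstring>

// Hypothetical sketch: prepend a header to every chunk so the free path
// can recover which call site allocated it, and how big it was, without
// consulting a side table.

struct ChunkHeader {
    void*  caller;  // classification recorded at allocation time
    size_t size;    // requested size, for adjusting running totals
};

// round the header up to the strictest alignment so user data stays aligned
constexpr size_t HEADER =
    (sizeof(ChunkHeader) + alignof(std::max_align_t) - 1)
    / alignof(std::max_align_t) * alignof(std::max_align_t);

void* tagged_alloc(size_t size) {
    char* raw = static_cast<char*>(malloc(HEADER + size));
    if (!raw) return nullptr;
    ChunkHeader h = { __builtin_return_address(0), size };
    std::memcpy(raw, &h, sizeof h);   // memcpy avoids alignment assumptions
    return raw + HEADER;
}

// returns the recorded size so the allocating site's bucket can be decremented
size_t tagged_free(void* p) {
    if (!p) return 0;
    char* raw = static_cast<char*>(p) - HEADER;
    ChunkHeader h;
    std::memcpy(&h, raw, sizeof h);
    free(raw);
    return h.size;
}
```

The cost is a fixed per-chunk overhead (one cache line at most), which for this use is usually cheaper than a pointer-keyed map probed on every free.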
That map is easier to maintain if built with standard containers, which themselves use malloc, so we need a recursion guard to keep the diagnostic malloc from trying to track its own use of itself.
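A recursion guard of that kind can be as simple as a thread-local flag; a sketch, with hypothetical names:

```cpp
#include <cstdlib>

// Hypothetical sketch of a recursion guard: while the diagnostic
// bookkeeping is running (and may itself allocate through standard
// containers), any re-entrant allocation bypasses the tracking entirely.

static thread_local bool g_in_tracker = false;
static size_t g_tracked_calls = 0;   // how many allocations were recorded

static void record_allocation(size_t size) {
    // bookkeeping that may allocate (e.g. inserting into a std::map);
    // any malloc it triggers sees g_in_tracker == true and is skipped
    (void)size;
    ++g_tracked_calls;
}

void* guarded_malloc(size_t size) {
    void* p = malloc(size);
    if (p && !g_in_tracker) {
        g_in_tracker = true;   // re-entrant calls fall through untracked
        record_allocation(size);
        g_in_tracker = false;
    }
    return p;
}
```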
All these things are things I know how to do, but they add up to quite a bit of work, and aren't that different from what ordinary memory leak detection tools need to do.
An ordinary memory leak detection tool does almost what I need, but as far as I understand, it misses by a fundamental enough difference that it would not be usable.
Also, I have both a Windows and a Linux version of this program, but this test is a lot easier for me to run in Windows. However, replacing malloc with my own function is a lot easier in Linux than in Windows. In Linux, simply linking my replacement malloc ahead of the relevant lib when linking the main executable is enough (even though the process then contains both a .so with the standard malloc and more .so files linked against that .so). In Windows, I know from earlier experiments that malloc cannot be replaced that simply, and I don't really know how to do it. Does anyone have experience replacing malloc in a Windows program that uses many DLLs? Or do I need to do this testing in Linux? The experts who write ordinary memory leak detection tools know how to intercept malloc in Windows or Linux without even relinking the application. I assume that would be too difficult for me; I've never seen even a description of how it is done. But if you know an easy way to do that, it would help a lot.
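For what it's worth, on Linux the interception that the ready-made tools do without relinking is usually dynamic-linker interposition: define malloc in a shared object, load it ahead of libc with LD_PRELOAD, and forward to the real implementation via dlsym(RTLD_NEXT, ...). A minimal, not thread-safe sketch (glibc-specific; a real tool must also wrap calloc and realloc, and handle the bootstrap more robustly):

```cpp
#ifndef _GNU_SOURCE
#define _GNU_SOURCE            // for RTLD_NEXT
#endif
#include <dlfcn.h>
#include <cstddef>
#include <cstdlib>

static void* (*real_malloc)(size_t) = nullptr;
static void  (*real_free)(void*)    = nullptr;
static size_t g_malloc_calls = 0;    // the "hook": replace with real tracking

// tiny static arena to satisfy any allocation dlsym itself performs while
// we are still resolving the real functions (a known bootstrap problem);
// overflow of the arena is not handled in this sketch
alignas(std::max_align_t) static char g_boot[4096];
static size_t g_boot_used = 0;
static bool   g_resolving = false;

extern "C" void* malloc(size_t size) noexcept {
    if (!real_malloc) {
        if (g_resolving) {           // dlsym re-entered us: use the arena
            void* p = g_boot + g_boot_used;
            g_boot_used += (size + 15) & ~size_t(15);
            return p;
        }
        g_resolving = true;
        real_malloc = (void* (*)(size_t))dlsym(RTLD_NEXT, "malloc");
        real_free   = (void  (*)(void*))dlsym(RTLD_NEXT, "free");
        g_resolving = false;
    }
    ++g_malloc_calls;                // classification/tracking would go here
    return real_malloc(size);
}

extern "C" void free(void* p) noexcept {
    // arena chunks have no real heap backing and are simply leaked
    if (p >= (void*)g_boot && p < (void*)(g_boot + sizeof g_boot))
        return;
    if (real_free && p) real_free(p);
}
```

Built with something like `g++ -shared -fPIC shim.cpp -o shim.so -ldl` and run under `LD_PRELOAD=./shim.so ./your_program`, this sees every malloc in the process, including calls from other .so files; the same code linked directly into the main executable also works, via the link-order replacement described above.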
"are deleted before the program exits?" - I am not sure this is clear enough. That is, I do not understand how to determine that the program is about to exit and then tell the checking tool to report.
Or you mean that allocations/deallocations are simply logged, and then are post-processed ?
And a drastic tentative proposal: if you are ready to work under Linux, you could consider compiling a custom kernel with a modified 'malloc' (whatever it is called in the kernel) that can selectively be switched into some debug mode. E.g., you could also create a system call which turns debug mode on/off, and only the program being debugged would turn it on.
Maybe this way you'll eventually have even more control with less coding overhead.
I typically replace calls to large, complex functions with calls to dummy code that mimics their behavior (albeit much more simplistically). In the end you're swapping out parts of the program with dummy bits and watching the excess memory usage during execution. When the excess memory usage stops or at least reduces, you know that you just swapped out a function that had a memory leak in it.
This does take quite a bit of overhead to track down though, maybe not the solution you're looking for. With a bit of time invested it's always found my memory leaks though.
Last edited by suicidaleggroll; 11-11-2012 at 05:38 PM.
"are deleted before the program exits?" - I am not sure this is clear enough. That is, I do not understand how to determine that the program is about to exit and then tell the checking tool to report.
I meant that part only as an extra reason that traditional memory leak detection wouldn't work.
I don't care to detect allocations based on whether they are deleted at the end or not.
I only want to detect when the total outstanding allocation from one "caller" is very large, and when that is detected report some stats about it (which caller, number of outstanding allocations, average size of them, etc.) As mentioned earlier "caller" might mean the caller of malloc, except that when the caller of malloc is too basic a routine it might mean the caller of the caller.
Quote:
Or you mean that allocations/deallocations are simply logged, and then are post-processed ?
I probably can't afford anything that large:
The legitimate operation of the program uses hundreds of MB of ram, with a very large number of allocations and deallocations, probably many millions of them. The memory leak seems to be a few MB per iteration of a very complicated portion of the program. It takes at least hundreds of iterations of that complicated loop before the cumulative leak would even stand out against the background of an enormous variety of allocations and deallocations.
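One cheap way to watch that few-MB-per-iteration growth against the noisy background, without tracking individual allocations, is to sample the allocator's own statistics once per iteration of the suspect loop. On glibc 2.33+ this is mallinfo2() (older glibc only has the int-sized mallinfo(), which overflows past 2 GB); steady growth of uordblks per iteration is the leak signal, though it cannot say which call site is responsible:

```cpp
#include <malloc.h>   // glibc-specific: mallinfo2()
#include <cstdlib>

// Sample glibc's view of in-use heap bytes, e.g. once per iteration of a
// suspect loop.  uordblks is the total of currently-allocated bytes as the
// allocator itself sees them, which the OS-level process size cannot show.
// Requires glibc 2.33 or later.

size_t heap_in_use() {
    struct mallinfo2 mi = mallinfo2();
    return mi.uordblks;
}
```

Logging `heap_in_use()` at the top of each iteration of the complicated loop and diffing successive samples would confirm (or rule out) where in the run the few MB per iteration is accumulating, before investing in the full per-caller machinery.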
Quote:
And a drastic tentative proposal - if you are ready to work under Linux, you can consider compiling a custom kernel with modified 'malloc'
The interface between the OS and the process for memory use is too coarse. Useful detail of malloc can only be seen inside the process, not from the OS.
Quote:
Maybe this way you'll eventually have even more control with less coding overhead.
If your suggestion points to some method to make this simpler or more effective, I can't see it. Maybe there is something about Linux memory management that you know and I am overlooking. But I think it is the opposite. I think involving the kernel can only make this task harder.
Quote:
I typically replace calls to large, complex functions with calls to dummy code that mimics their behavior (albeit much more simplistically). In the end you're swapping out parts of the program with dummy bits and watching the excess memory usage during execution. When the excess memory usage stops or at least reduces, you know that you just swapped out a function that had a memory leak in it.
This is an incredibly complicated program worked on by dozens of programmers for over 10 years. The approach you describe is not even conceivable in this context.
I need something that can automatically track tens of thousands of different allocation call points totalling millions of allocations and deallocations.
...
Quote:
If your suggestion points to some method to make this simpler or more effective, I can't see it. Maybe there is something about Linux memory management that you know and I am overlooking. But I think it is the opposite. I think involving the kernel can only make this task harder.
All I know is that with my approach you resolve the problem regardless of the libraries or your program.
And, of course, I didn't mean going through the modified 'malloc' log manually; I meant writing a script which filters out matching malloc/free pairs and leaves only the 'malloc's unmatched by a 'free'.
Memory management is PID-aware, so I think what I've suggested is feasible.
Quote:
All I know is that with my approach you resolve the problem regardless of the libraries or your program.
If we are still talking about changing malloc in the kernel as opposed to changing it in the main executable, I still don't see any way that makes sense.
In normal use, malloc within the process divides very large allocations from the kernel into the allocations the program needs and it merges free requests into contiguous freed areas and reuses that memory. The kernel only sees very rare malloc requests that happen to exhaust the process's internal pool of free memory. Then the process's malloc asks the kernel for a much larger chunk than it currently needs so it can satisfy many future requests before it asks the kernel for more.
So the kernel just doesn't see the info I need to track.
Quote:
And, of course, I didn't mean to go through modified 'malloc' log manually, I meant writing a script which filters out matching malloc/free pairs and leaves the unmatched by 'free' 'malloc's.
I understood that. I was worried about all the millions of tiny allocate/free pairs in the normal operation of the program. My idea of looking them up in a map (by caller) and increasing/decreasing a running total stored in that map would add a lot of overhead to each allocate/free call. But I think logging the calls to a file would add an even larger overhead. I was worried about the increase in overhead.
But post processing a log vs. finding the problems on the fly does partition and slightly simplify the overall task. I wouldn't need to store extra data with every allocation in the real program (that would be done with a second map in the post processor program). The important map would also be in the dedicated post processor program rather than in the program being tested (which simplifies it). I could consistently store both caller address and some reasonable attempt at caller's caller and sort that out in post processing, rather than need to rerun the main test after deciding which callers with high totals are too basic in function to lead me to the source position of the problem.
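The post-processing pass described above could look roughly like this. The log format here is invented for the sketch (one line per event: `m <ptr> <size> <caller>` for an allocation, `f <ptr>` for a free); matched pairs cancel, and whatever remains is summed per caller:

```cpp
#include <cstdint>
#include <map>
#include <sstream>
#include <string>

// Post-process an allocation log offline.  Assumed (hypothetical) format,
// one event per line:
//   m <ptr> <size> <caller>   -- malloc returned <ptr>
//   f <ptr>                   -- free of <ptr>
// Matched malloc/free pairs cancel out; what is left is summed per caller.

struct Leak {
    uint64_t bytes = 0;   // outstanding bytes attributed to this caller
    uint64_t count = 0;   // outstanding allocations from this caller
};

std::map<std::string, Leak> summarize(std::istream& log) {
    // ptr -> (caller, size) for allocations not yet matched by a free
    std::map<std::string, std::pair<std::string, uint64_t>> live;
    std::string op, ptr, caller;
    uint64_t size;
    while (log >> op) {
        if (op == "m") {
            log >> ptr >> size >> caller;
            live[ptr] = {caller, size};
        } else if (op == "f") {
            log >> ptr;
            live.erase(ptr);          // matched pair: drop it
        }
    }
    std::map<std::string, Leak> per_caller;
    for (auto& kv : live) {
        Leak& l = per_caller[kv.second.first];
        l.bytes += kv.second.second;
        l.count += 1;
    }
    return per_caller;
}
```

Because all the maps live in this separate program, the instrumented build only has to append one line per malloc/free, and the "caller vs. caller's caller" decision can be made after the fact by logging both addresses.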
So if I end up coding this myself, I might choose post processing a log vs. more processing on the fly. But I still would prefer if someone tells me a usable form of this tool already exists.
Ordinary memory leak detectors output the set of allocations that are not freed at the end of the program. But do any of them have a mode that instead outputs the raw log of all allocation and free requests in some format that wouldn't be too hard to post process?
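glibc actually ships such a raw-log mode: mtrace(). Set MALLOC_TRACE to a file name, call mtrace() at startup, and every malloc/realloc/free is appended to that file as a text line; the accompanying mtrace(1) script (or a custom post-processor) then cancels matched pairs. One caveat: on glibc 2.34 and later this only takes effect with libc_malloc_debug.so preloaded. A sketch:

```cpp
#include <mcheck.h>   // glibc-specific: mtrace()/muntrace()
#include <cstdlib>

// Turn on glibc's built-in allocation tracer for a region of the run.
// Every malloc/realloc/free in between is appended as a text line to the
// file named by MALLOC_TRACE.  On glibc 2.34+ this requires running with
// LD_PRELOAD=libc_malloc_debug.so; on older glibc it works as-is.

void trace_region(const char* logfile) {
    setenv("MALLOC_TRACE", logfile, 1);  // must be set before mtrace()
    mtrace();                            // start logging

    void* leaked = malloc(512);          // shows up unmatched in the log
    void* paired = malloc(128);
    free(paired);                        // matched pair, cancelled out later

    muntrace();                          // stop logging and flush the file
    (void)leaked;
}
```

Running `mtrace ./prog malloc.log` on a binary built with -g then resolves the unmatched entries back to source lines, which is close to the post-processable raw log asked about above.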