Old 01-20-2010, 03:34 AM   #1
johan162
LQ Newbie
 
Registered: Jan 2009
Location: Stockholm
Distribution: SuSE 11.1
Posts: 27

Rep: Reputation: 0
Excessive build-up of virtual memory by program


I've written a daemon (C, gcc) which is fairly standard. It sits in the background listening on a TCP/IP port for client(s) to connect.

Each connected client gets its own thread. The client can then give various commands, and some commands will cause external program images to be run. These programs are started via a standard vfork()/exec() combination.

(I use vfork() rather than fork() to avoid the unnecessary demand for virtual memory before the new program image has been created with the exec() call.)

In the parent process thread I then do a wait4() on the forked process in order to know when the external program is done.

(The external program itself requires quite a lot of real memory but that's not really a problem.)

The issue I have is that after the daemon has been running for a couple of days I can see a build-up of virtual memory. The resident and shared memory, on the other hand, stay pretty much constant.

This makes me believe that somehow the vfork()/exec() pair's use of virtual memory is recorded in the parent process. This doesn't really make any sense, but I cannot see any other reason, since by its nature the daemon itself allocates very little dynamic memory (which is also guarded by the use of both MALLOC_CHECK and a compile-time -D_FORTIFY_SOURCE=2).

The most likely explanation is that I have made some fundamental mistake, but I've been writing daemons for years and never had this problem. Needless to say, the external programs terminate fine. So right now I'm a bit clueless about how to proceed to fully understand this issue.

While I don't expect anyone to pinpoint what is wrong, the more general question that perhaps someone can hint at is:

What, in general, might make a process consume a lot of virtual memory (but not real memory)?

P.S. I'm thinking of trying posix_spawn() instead of a vfork()/exec() pair, since the POSIX method was designed exactly to solve some of the inherent virtual memory issues that may arise in the use of vfork()/exec().
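
For concreteness, this is roughly what I have in mind with posix_spawn() (just a sketch; the function name and the command are placeholders, not the actual daemon code):

Code:
#include <spawn.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>

extern char **environ;

/* Spawn an external command and wait for it to finish (placeholder names). */
static int run_external(const char *path, char *const argv[])
{
    pid_t pid;
    int status;

    /* NULL file_actions/attr: the child inherits the parent's descriptors */
    int rc = posix_spawn(&pid, path, NULL, NULL, argv, environ);
    if (rc != 0) {
        fprintf(stderr, "posix_spawn failed: %s\n", strerror(rc));
        return -1;
    }
    if (waitpid(pid, &status, 0) == -1)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
It would be called with something like: char *args[] = { "sleep", "1", NULL }; run_external("/bin/sleep", args);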
 
Old 01-20-2010, 06:18 AM   #2
neonsignal
Senior Member
 
Registered: Jan 2005
Location: Melbourne, Australia
Distribution: Debian Bookworm (Fluxbox WM)
Posts: 1,391
Blog Entries: 54

Rep: Reputation: 360
Quote:
Originally Posted by johan162 View Post
This makes me believe that somehow the vfork()/exec() pair's use of virtual memory is recorded in the parent process. This doesn't really make any sense...
You are correct; the child process's use of memory will not affect the parent. A simple test is the following, which starts 1000 child processes staggered 1 ms apart, then continues to start new ones as old ones finish (indefinitely).

Code:
#include <stdio.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/time.h>
#include <sys/resource.h>
#include <sys/wait.h>

int main()
{
  unsigned int i;
  for (i = 0; ; ++i)
  {
    int status;
    if (i < 1000)
      usleep(1000);                 /* stagger the first 1000 children 1 ms apart */
    else
      wait4(-1, &status, 0, NULL);  /* then reap one finished child per new one */
    if (vfork() == 0)
    {
      execl("/bin/sleep", "/bin/sleep", "1", (char *)0);
      _exit(127);                   /* only reached if the exec fails */
    }
    printf("%7d\r", i);
    fflush(stdout);
  }
}
This example shows no memory leak over millions of iterations. Of course, if the parent process doesn't do the wait, then the 'defunct' children will continue to hang around and consume memory, but this should be obvious from the process list.

Quote:
Originally Posted by johan162 View Post
which is also guarded by the use of both MALLOC_CHECK and a compile-time -D_FORTIFY_SOURCE=2)
These checks don't prevent memory leaks. MALLOC_CHECK is used to protect against gross allocation bugs, and _FORTIFY_SOURCE against some forms of buffer overflows. Memory leaks are more subtle (either allocated blocks that are not freed when the pointer is discarded, or data structures that grow indefinitely). Perhaps valgrind would be helpful to find these leaks (you would have to terminate the daemon, e.g. after x connections have been made).

Quote:
Originally Posted by johan162 View Post
What in general might make a process consume a lot of virtual memory (but not real memory) ?
Any memory leak will do this, because the memory that has not been freed will eventually be swapped out (because it is no longer being accessed).

This would suggest that either the child processes are not terminating completely, or there is a memory leak in the parent daemon. It isn't necessarily a leak in one of your data structures; it could be for example a system object that is not being freed (such as a file being left open).
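
As a trivial, self-contained illustration of the latter (not taken from your daemon, purely to show the shape of the problem): a loop that keeps calling fopen() without fclose() grows the address space through the stdio buffers, even though no malloc() appears anywhere in the code.

Code:
#include <stdio.h>
#include <unistd.h>

int main(void)
{
    int i;
    /* Each iteration leaks a FILE* and the stdio buffer behind it; the loop
     * stops once the descriptor limit is reached, by which time a few MB of
     * heap have quietly been used up. */
    for (i = 0; ; ++i) {
        FILE *f = fopen("/dev/null", "w");
        if (f == NULL)
            break;
        fprintf(f, "request %d\n", i);   /* forces the buffer allocation */
        /* fclose(f) deliberately missing */
    }
    pause();   /* keep the process alive so /proc/<pid>/status can be inspected */
    return 0;
}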
 
Old 01-20-2010, 07:00 AM   #3
johan162
LQ Newbie
 
Registered: Jan 2009
Location: Stockholm
Distribution: SuSE 11.1
Posts: 27

Original Poster
Rep: Reputation: 0
Memory leaks

Thanks for the reply.

While I'm aware of the infinite possibilities for memory leaks, the daemon is very defensively coded.

What I originally suspected was that, since fork()/vfork() inherits file handles, some were left open. But the first thing I always do in the child is close all file handles (except 0, 1, 2).

I have also checked (with lsof(1) and netstat(1)) for any open handles or ports which shouldn't be there but I didn't come up with anything.

Nevertheless, I'm getting quite confident that this is a system object issue. The reason is that one of the commands a client can give is to read a video stream from a HW MPEG decoder card (which appears as a /dev/ file that allows reading and control). That data is then copied to a plain file (using a standard select()/read()/write() construct). This produces a couple of GB worth of data, and it seems like after each copy is done and the thread exits, the virtual memory increases.
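
For reference, the copy loop is essentially of this shape (a simplified sketch only; the device path, buffer size and error handling are illustrative, not the actual tvpvrd code):

Code:
#include <stdlib.h>
#include <unistd.h>
#include <fcntl.h>
#include <errno.h>
#include <sys/types.h>
#include <sys/select.h>

#define BUF_SIZE (200*1024)

/* Copy the MPEG stream from the capture device to a plain file until EOF. */
static int copy_stream(const char *devpath, const char *outpath)
{
    int in = open(devpath, O_RDONLY);
    int out = open(outpath, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    char *buf = malloc(BUF_SIZE);
    int rc = -1;

    if (in < 0 || out < 0 || buf == NULL)
        goto cleanup;

    for (;;) {
        fd_set rfds;
        ssize_t n;
        FD_ZERO(&rfds);
        FD_SET(in, &rfds);
        if (select(in + 1, &rfds, NULL, NULL, NULL) < 0) {
            if (errno == EINTR)
                continue;
            goto cleanup;
        }
        n = read(in, buf, BUF_SIZE);
        if (n < 0)
            goto cleanup;
        if (n == 0)                 /* end of stream */
            break;
        if (write(out, buf, (size_t)n) != n)
            goto cleanup;
    }
    rc = 0;

cleanup:
    free(buf);                      /* the only dynamic allocation in the thread */
    if (in >= 0)  close(in);
    if (out >= 0) close(out);
    return rc;
}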

This will be very interesting to track down, since in that thread there are only two buffers used, and they are both allocated and correctly de-allocated within the thread. So the only thing that can leak the amount of memory I'm seeing (~100 times the buffer size) is system objects connected with the copy operation.

I'm sure there is a lesson here for me (and perhaps others) to learn once I've identified my mistake.

BTW, the daemon is available on SourceForge as "tvpvrd".
 
Old 01-20-2010, 08:14 AM   #4
neonsignal
Senior Member
 
Registered: Jan 2005
Location: Melbourne, Australia
Distribution: Debian Bookworm (Fluxbox WM)
Posts: 1,391
Blog Entries: 54

Rep: Reputation: 360
No doubt you are aware that the child process of a vfork shouldn't change any data before it calls exec (and if it exits instead, it should call _exit).
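
That is, the only pattern that is strictly safe in the child is roughly this (a sketch with a placeholder command, not your code):

Code:
#include <unistd.h>
#include <sys/types.h>
#include <sys/wait.h>

int main(void)
{
    /* The child does nothing except exec (or _exit on failure). */
    pid_t pid = vfork();
    if (pid == 0) {
        execl("/bin/true", "true", (char *)0);  /* placeholder command */
        _exit(127);                             /* never call exit() here */
    } else if (pid > 0) {
        int status;
        waitpid(pid, &status, 0);
    }
    return 0;
}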

Last edited by neonsignal; 01-20-2010 at 08:15 AM.
 
Old 01-20-2010, 03:11 PM   #5
johan162
LQ Newbie
 
Registered: Jan 2009
Location: Stockholm
Distribution: SuSE 11.1
Posts: 27

Original Poster
Rep: Reputation: 0
Thanks for your good points. (It's good to have someone to bounce ideas off.)

Yes, I'm aware of that "drawback" with vfork() vs. fork(). But I don't believe I'm doing anything that isn't allowed; for example, the following short extract shows the code to spawn an ffmpeg transcoding process from the daemon.

The thing that really has me stumped is that any data structure I have is only a few KB in size, I see a virtual memory increase of ~60-80 MB within a few hours, and there are not enough calls to malloc() or open() that (as far as I can see) come even close to reserving this kind of memory.

The only thing that can remotely explain this is if some child process or thread is holding on to some file buffers after the thread/process has ended. But even that is far-fetched, since the file buffer is only 200 KB and is only allocated once for each process (and suitably de-allocated). Even if this were not freed, it would only leak ~0.2 MB per child.

Code:
    
    pid_t pid;
    if ((pid = vfork()) == 0) {
        // In the child process
        // Make absolutely sure everything is cleaned up except the standard
        // descriptors
        for (int i = getdtablesize(); i > 2; --i) {
            (void) close(i);
        }

        // Since the ffmpeg command is run as a child process (via the sh command)
        // we need to make sure all of this is in the same process group. This is
        // done so that we can kill the ffmpeg command if the server is stopped
        // by the user. The pid returned by the vfork() will not be the same
        // process as the one running the 'ffmpeg' command!
        setpgid(getpid(), 0); // This sets the PGID to be the same as the PID
        if (-1 == nice(20)) {
            logmsg(LOG_ERR, "Error when calling 'nice()' : ( %d : %s )", errno, strerror(errno));
            
            // We are not guaranteed that _exit() will close open handles
            for(int i=2; i >= 0; --i )
                (void)close(i);
            
            _exit(EXIT_FAILURE);
        }
        if( -1 == execl("/bin/sh", "sh", "-c", cmdbuff, (char *) 0) ) {
            logmsg(LOG_ERR, "Error when calling execl() '/bin/sh/%s' : ( %d : %s )", cmdbuff,errno, strerror(errno));
            
            // We are not guaranteed that _exit() will close open handles
            for(int i=2; i >= 0; --i )
                (void)close(i);
            _exit(EXIT_FAILURE);
        }
    } else if (pid < 0) {
Unfortunately, since the daemon makes use of a number of threads as well as spawning child processes, this will take some time to resolve.

I will see if some "clever" use of pmap(1) will help me pinpoint the issues.

I will report back to this thread since it will either turn out to be something really, really stupid or indeed something very interesting that is of use for others.

(I've been doing Unix programming for almost 20 years now, and this is really the first time I've come across this kind of problem where I have not been able to identify my mistake fairly quickly - so this is bound to be a learning experience.)

/J
 
Old 01-20-2010, 03:22 PM   #6
johnsfine
LQ Guru
 
Registered: Dec 2007
Distribution: Centos
Posts: 5,286

Rep: Reputation: 1197
Have you looked at /proc/pid/smaps for the pid of the daemon?

That may be too much detail to wade through to find your problem. But you haven't left much room in your question for a better starting point.

I just now read the documentation for vfork(); I had never read it before seeing this thread, so obviously I'm not an expert. But your code certainly looks to me like it violates the rules of what you can do between the vfork and the exec.

Last edited by johnsfine; 01-20-2010 at 03:27 PM.
 
Old 01-20-2010, 04:57 PM   #7
johan162
LQ Newbie
 
Registered: Jan 2009
Location: Stockholm
Distribution: SuSE 11.1
Posts: 27

Original Poster
Rep: Reputation: 0
Hi,
That is a good tip, and also what I had hoped to get more out of using the pmap(1) analysis (which uses the /proc tables to present the data in a slightly more readable form).

The reason for using vfork() is to avoid the situation where fork() could potentially yield an OOM (out of memory) condition, even though the actual program image used by the following exec() call is quite small and fits well within the remaining memory.

This is because fork() will initially create an entire copy of the parent process. With vfork() the child shares the address space until the exec() call is made, at which time the new image replaces the existing one.

However, there is a major drawback to using vfork() together with POSIX threads in that there is, depending on your program's library handling, actually a possibility of a deadlock (loosely speaking, this is because the dynamic linker requires a lock before resolving the exec() function in a dynamic library environment). However, in this particular server there can never be any other simultaneous call to vfork(), so from this point of view it is safe.

The way I have previously interpreted the restriction on vfork() (and used it) is that, since the child shares the address space with the parent, no data structures may be modified and no functions should be called, since that could lead to a situation where the parent and child simultaneously try to, for example, execute the same function using the same stack and data image => corruption. So this would revolve around corruption and not memory leakage per se. However, error checking which ends in an _exit() should be safe. But just as an experiment I will also try a plain old fork().

An alternative which I will try out as well is the less common posix_spawn() function which is actually better than the traditional fork()/exec() pair, in that it avoids some pitfalls that a user space program cannot avoid.

The story continues ...
 
Old 01-20-2010, 06:18 PM   #8
neonsignal
Senior Member
 
Registered: Jan 2005
Location: Melbourne, Australia
Distribution: Debian Bookworm (Fluxbox WM)
Posts: 1,391
Blog Entries: 54

Rep: Reputation: 360
You shouldn't be closing file handles inside the vfork child, even if it is just the plain exit case.

In theory, this will lead to undefined behaviour.

In practice, you are probably closing the handles of the main process. This might not matter in the case of the extra handles (assuming you aren't using them again), but closing the stdio handles on an error is asking for trouble. This is why they say that vfork must use _exit, not exit (so that the handles don't get closed).

Technically you can't even use the (shared) stack, but since the parent is held up until the exec is called, it isn't usually a problem (and you have to use it for the exec call anyway).

Under Linux, the plain fork is not as expensive as you might think. It does a copy on write of the memory, which means that although it allocates virtual memory for a copy, the memory doesn't occupy any real space until it is used. The only cost is the page table. It will be significantly slower, but the exec (especially of a shell) is not fast either...

Some people argue that there is no need for vfork (the man page has the following line under BUGS: "It is rather unfortunate that Linux revived this spectre from the past.")! POSIX doesn't actually require that it do anything different from fork.

posix_spawn is intended for use on embedded systems (especially those where the processor doesn't have a memory management unit). I'm not saying you shouldn't use it, but there may not be much advantage.

Last edited by neonsignal; 01-20-2010 at 06:32 PM.
 
1 member found this post helpful.
Old 01-22-2010, 05:08 AM   #9
johan162
LQ Newbie
 
Registered: Jan 2009
Location: Stockholm
Distribution: SuSE 11.1
Posts: 27

Original Poster
Rep: Reputation: 0
Results and conclusion

I can now conclude that the virtual memory build-up is due to POSIX threads behaviour and not a problem in the program itself.

Each time a new thread is created (with pthread_create()) it will reserve a chunk of ~8 MB of memory, which is not returned when the thread ends via pthread_exit().

This can easily be demonstrated by the following "nonsense" program, which just creates three threads with some time in between to give room for monitoring the memory build-up.

Running this program and monitoring the virtual memory, one can easily see that it increases by ~8 MB each time a thread is created (for example by monitoring the VmData: field in /proc/<pid>/status). This memory is not returned when the thread exits.

Code:
// We want the full POSIX and C99 standard
#define _GNU_SOURCE

// Standard UNIX includes
#include <stdio.h>
#include <errno.h>
#include <stdlib.h>
#include <unistd.h>
#include <pthread.h>

#define NTHREADS 3

void *
dummy_thread(void *arg) {
    int id = *(int *)arg;
    printf("Started thread: %d\n",id);
    sleep(120);
    printf("Finished thread: %d\n",id);
    pthread_exit(NULL);
    return (void *) 0;
}

int
main(int argc,char *argv[]) {
    pthread_t t[NTHREADS];
    int idx[NTHREADS];
    int sleeptime[] = {30,30,60};

    printf("Program started with pid=%d. Waiting 30 s to start first thread.\n",getpid());
    for(int i=0; i < NTHREADS; ++i) {
        sleep(sleeptime[i]);
        idx[i] = i+1;
        (void)pthread_create(&t[i], NULL, dummy_thread, (void *) &idx[i]);
    }
    sleep(240);
    exit(EXIT_SUCCESS);
}
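(If anyone wants to reproduce this without an external monitor, a small helper along these lines can print the VmData: line from inside the test program itself - just a convenience sketch; the figures quoted in this thread were read from /proc directly.)

Code:
#include <stdio.h>
#include <string.h>

/* Print the VmData: line from /proc/self/status; call it e.g. after each sleep(). */
static void print_vmdata(void)
{
    char line[256];
    FILE *f = fopen("/proc/self/status", "r");
    if (f == NULL)
        return;
    while (fgets(line, sizeof(line), f) != NULL) {
        if (strncmp(line, "VmData:", 7) == 0) {
            fputs(line, stdout);
            break;
        }
    }
    fclose(f);
}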
This clearly indicates that I need to read up more on threads. The only speculation I can offer for this behavior is that the thread library keeps the allocated memory (up to some specified limit) for later use in order to avoid excessive memory fragmentation so that the next time a thread is created it will re-use this memory.

However, I'm sure someone with more POSIX thread experience can shed some light on this behavior.
 
Old 01-22-2010, 06:46 AM   #10
neonsignal
Senior Member
 
Registered: Jan 2005
Location: Melbourne, Australia
Distribution: Debian Bookworm (Fluxbox WM)
Posts: 1,391
Blog Entries: 54

Rep: Reputation: 360
Quote:
the thread library keeps the allocated memory (up to some specified limit) for later use
The memory being kept is the exit status of the thread. Because you have not joined or detached the thread, you are getting a resource leak.

If you use pthread_detach (which can be called as soon as the thread starts) you will find that the behaviour is different. Or you can use pthread_join at some later stage.
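
A minimal sketch of both options (the names here are arbitrary; compile with gcc -pthread):

Code:
#include <pthread.h>
#include <unistd.h>

static void *worker(void *arg)
{
    (void)arg;
    /* Option 1: detach ourselves; the thread's resources are then
     * released automatically when it returns. */
    pthread_detach(pthread_self());
    return NULL;
}

static void *joined_worker(void *arg)
{
    (void)arg;
    return NULL;
}

int main(void)
{
    pthread_t t1, t2;

    pthread_create(&t1, NULL, worker, NULL);   /* never joined: relies on detach */

    pthread_create(&t2, NULL, joined_worker, NULL);
    /* Option 2: join at some later stage to reclaim the exit status. */
    pthread_join(t2, NULL);

    sleep(1);   /* give the detached thread time to finish before exiting */
    return 0;
}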
 
Old 01-22-2010, 09:49 AM   #11
johan162
LQ Newbie
 
Registered: Jan 2009
Location: Stockholm
Distribution: SuSE 11.1
Posts: 27

Original Poster
Rep: Reputation: 0
True, so true! That was stupid of me. Thanks for the reminder. In my real program I actually create the threads detached from the beginning (via the attribute in pthread_create()), which makes me wonder whether that is enough.

Correcting the "nonsense" example above so that the threads are detached with a

Code:
pthread_detach(pthread_self());
doesn't make any difference. Tracing the VmData field in /proc/<pid>/status (a.k.a. the swap field in top(1)) gives the following memory usage profile as the threads are started and stopped:

Code:
Running ...
Fri Jan 22 16:23:32 2010: 36 kB (1 threads)
Fri Jan 22 16:24:02 2010: 8364 kB (2 threads)
Fri Jan 22 16:24:32 2010: 16560 kB (3 threads)
Fri Jan 22 16:25:02 2010: 24756 kB (4 threads)
Fri Jan 22 16:25:32 2010: 32952 kB (5 threads)
Fri Jan 22 16:26:02 2010: 32952 kB (5 threads)
Fri Jan 22 16:26:32 2010: 32952 kB (4 threads)
Fri Jan 22 16:27:02 2010: 32952 kB (3 threads)
Fri Jan 22 16:27:32 2010: 32952 kB (2 threads)
Fri Jan 22 16:28:02 2010: 32952 kB (1 threads)
Fri Jan 22 16:28:32 2010: 32952 kB (1 threads)
Fri Jan 22 16:29:02 2010: 32952 kB (1 threads)
Fri Jan 22 16:29:32 2010: 32952 kB (1 threads)
So clearly this isn't enough; the build-up is still there.
It seems I have some more reading up on POSIX threads to do.
 
Old 01-22-2010, 05:24 PM   #12
neonsignal
Senior Member
 
Registered: Jan 2005
Location: Melbourne, Australia
Distribution: Debian Bookworm (Fluxbox WM)
Posts: 1,391
Blog Entries: 54

Rep: Reputation: 360
Looking at the heap space for just a single set of threads running at the same time doesn't tell you much (it just shows you the peak usage). What you need to know is whether that memory has been made available for the next set of threads.

If you start another set of 5 threads after the first set have finished, does it reuse the existing memory or does it allocate more? If it reuses the same space, then you don't have a leak.
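
For example, something along these lines (a sketch, not your actual test) would show it: run two identical batches and compare the VmData figure after each.

Code:
#include <pthread.h>
#include <stdio.h>
#include <unistd.h>

#define BATCH 5

static void *dummy(void *arg)
{
    (void)arg;
    sleep(30);
    return NULL;
}

/* Run two identical batches of threads and pause in between, so the VmData
 * figure can be compared after each batch. If the second batch does not add
 * another ~BATCH x 8 MB, the thread stacks are being reused.
 * Compile with: gcc -std=gnu99 -pthread ... */
int main(void)
{
    pthread_t t[BATCH];

    for (int batch = 0; batch < 2; ++batch) {
        for (int i = 0; i < BATCH; ++i)
            pthread_create(&t[i], NULL, dummy, NULL);
        for (int i = 0; i < BATCH; ++i)
            pthread_join(t[i], NULL);
        printf("batch %d done, check VmData now (pid=%d)\n",
               batch + 1, (int)getpid());
        sleep(60);
    }
    return 0;
}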
 
Old 01-23-2010, 10:40 PM   #13
ta0kira
Senior Member
 
Registered: Sep 2004
Distribution: FreeBSD 9.1, Kubuntu 12.10
Posts: 3,078

Rep: Reputation: Disabled
Isn't vfork an alias for fork in Linux? Also, why not run it through valgrind? Lastly, maybe you've made a libc call somewhere that returns a dynamic allocation where you thought it was a pointer to a static buffer.
Kevin Barry

P.S. POSIX.1-2008 removed vfork, though I might be mistaken about it already being an alias.

Last edited by ta0kira; 01-23-2010 at 10:51 PM.
 
Old 01-24-2010, 01:12 PM   #14
johan162
LQ Newbie
 
Registered: Jan 2009
Location: Stockholm
Distribution: SuSE 11.1
Posts: 27

Original Poster
Rep: Reputation: 0
I now believe that the explanation is very simple. The working memory page set (as recorded by the swap field in top(1) or the VmData field in /proc/<pid>/status) stays at its maximum for quite some time.

This means that if the process once had, say, 50 threads, then the virtual memory needed (~0.5 GB) will be kept as the working set for quite some time before it is released. So even if the overall process has released the memory used and now has only, say, 2 threads running, the virtual memory will still show ~0.5 GB for some time. (This can easily be demonstrated with the simple test program included in the previous post.)

I also have to confess to two errors of my own:
  1. One small (only ~8 bytes) leak in my code for each thread (I can't blame anyone except my own stupidity for not having clarified in my mind where the responsibility for releasing the memory lay). This leak could have meant that the pages were tagged as dirty and kept; hence the build-up.
  2. A typo in one of the pthread_create() calls for the attribute field, which had the unfortunate effect of not detaching the thread (and hence not releasing the memory once the thread exited); see the sketch below for what the call should look like.
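
For completeness, this is roughly what the corrected call looks like when the thread is created detached via the attribute (a minimal sketch, not the actual tvpvrd code):

Code:
#include <pthread.h>

static void *client_thread(void *arg)
{
    (void)arg;
    /* ... handle the client ... */
    return NULL;
}

/* Create a client thread that is detached from the start, so its resources
 * are released as soon as it exits. */
static int spawn_client_thread(void)
{
    pthread_t tid;
    pthread_attr_t attr;
    int rc;

    pthread_attr_init(&attr);
    pthread_attr_setdetachstate(&attr, PTHREAD_CREATE_DETACHED);
    rc = pthread_create(&tid, &attr, client_thread, NULL);
    pthread_attr_destroy(&attr);
    return rc;
}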

Thanks everyone for this interesting discussion.
 
  

