Old 08-26-2011, 10:55 AM   #1
linuxdev817
LQ Newbie
 
Registered: Aug 2011
Posts: 19

Rep: Reputation: Disabled
system() calls return "cannot allocate memory"


I have a program that pre-allocates ~55GB of the system's 64GB of memory. After allocating the memory, I am noticing that some of the calls I make to system() from my code return -1, with errno reporting "cannot allocate memory" (ENOMEM). The two calls to system() I am having trouble with currently are system("pgrep ntpd > pgrep_output") and system("dmidecode -s system-serial-number > system_serial").

The output from free -m gives me:

Code:
             total       used       free     shared    buffers     cached
Mem:         64556      56419       8136          0         44         65
-/+ buffers/cache:      56310       8245
Swap:         2368          0       2368

I am a little confused as to why memory cannot be allocated when there appears to be more than enough free. Furthermore, if I open another terminal on the same machine while the program is running and run the commands from the prompt, they execute correctly. I am assuming this means the issue is with the process that allocated the ~55GB, but this behavior is beyond my knowledge of Linux.

Last edited by linuxdev817; 08-26-2011 at 01:00 PM.
 
Old 08-26-2011, 11:48 AM   #2
ta0kira
Senior Member
 
Registered: Sep 2004
Distribution: FreeBSD 9.1, Kubuntu 12.10
Posts: 3,078

Rep: Reputation: Disabled
I'm assuming system() calls fork, which will duplicate the current process's address space before running the command. vfork is meant to allow forking without duplicating the process space; however, on Linux I believe it just calls fork. You might fork a "worker process" at the beginning of your program that you can send system commands to via a pipe from the main process. In other words, pipe and fork before you reserve memory: fprintf command lines from the main process, have the worker process fgets those lines and pass them to system, and send the resulting output back to the main process with another pipe (or use socketpair for two-way IPC). This is probably a bit nebulous if you haven't dealt with fork, so let me know if you want more specific info.
Kevin Barry

edit:
Quote:
Originally Posted by linuxdev817 View Post
The two system calls I am having trouble with currently are system("pgrep ntpd > pgrep_output") and system("dmidecode -s system-serial-number > system_serial").
Although these are technically calls to system, a "system call" is a call to the kernel, such as open, read, write, or fork. You might get unexpected answers if you refer to what you're doing as system calls.

Last edited by ta0kira; 08-26-2011 at 12:09 PM.
 
Old 08-26-2011, 01:03 PM   #3
linuxdev817
LQ Newbie
 
Registered: Aug 2011
Posts: 19

Original Poster
Rep: Reputation: Disabled
You are right. I edited my post. It is not system calls I am referring to, but calls to the system() function.
 
Old 08-26-2011, 01:08 PM   #4
johnsfine
LQ Guru
 
Registered: Dec 2007
Distribution: Centos
Posts: 5,286

Rep: Reputation: 1197
This may be one of those situations in which it is simplest to just provide a bunch of swap space that won't actually be used, but whose existence will solve the problem.

From the symptoms, I expect you have a lot less than 64GB of free swap space. Probably you think you're using less than 64GB of "memory" so you don't need any swap space.

I think having 64GB of free swap space would fix the problem, even though that swap space wouldn't actually get used.

If I'm right about the situation, there are ways to deal with it that don't require swap space. But those are more complicated. Disk space is cheap. Swap space is nice insurance against a number of kinds of failure. If you aren't seriously short of disk space, just create and enable a swap file.

Quote:
Originally Posted by linuxdev817 View Post
I am a little confused as to why memory cannot be allocated when there appears to be more than enough free. Furthermore, if I open another terminal on the same machine while the program is running and run the commands from the prompt, they execute correctly. I am assuming this means the issue is with the process that allocated the ~55GB, but this behavior is beyond my knowledge of Linux.
ta0kira partially explained it. The main issue is the control over "committed" memory. You can think of that as memory Linux has promised to some process, but in most cases that process hasn't (and won't) actually use the memory.

The system function needs to start by forking, which needs to ask Linux for a promise of enough memory to duplicate all the anonymous writable mappings of the calling program (in your program almost all of the virtual memory of the process is anonymous writable).

The system function then immediately throws away that promise and uses memory in a way unrelated to the calling program's memory. But you have a problem, because for a moment Linux needs to make a promise it could never keep (another 55GB after you've already tied up 55GB).

The default overcommit setting allows Linux to make multiple such promises even when it couldn't keep them all, but not to make a single impossible promise (if it has 10GB to hand out, it could promise 9GB each to 50 different processes as long as none of them actually use that memory, but it can't promise 11GB to one process).

What it has in that "to hand out" pool I mentioned is only part of the free physical RAM (some must be reserved for other things) but all of the free swap.
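
If you want to watch that accounting directly, the kernel exposes it in /proc/meminfo. Here is a minimal sketch (just an illustration, not something your program needs) that prints the commit limit and the amount already promised:

Code:
#include <stdio.h>
#include <string.h>

/* Print the kernel's commit accounting from /proc/meminfo:
   Committed_AS is the memory already promised to processes,
   CommitLimit is the ceiling used for strict overcommit accounting. */
int main(void)
{
        FILE *meminfo = fopen("/proc/meminfo", "r");
        char line[256];

        if (!meminfo) { perror("/proc/meminfo"); return 1; }

        while (fgets(line, sizeof line, meminfo))
                if (!strncmp(line, "CommitLimit:", 12) ||
                    !strncmp(line, "Committed_AS:", 13))
                        fputs(line, stdout);

        fclose(meminfo);
        return 0;
}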

Last edited by johnsfine; 08-26-2011 at 01:26 PM.
 
Old 08-26-2011, 01:20 PM   #5
ta0kira
Senior Member
 
Registered: Sep 2004
Distribution: FreeBSD 9.1, Kubuntu 12.10
Posts: 3,078

Rep: Reputation: Disabled
Quote:
Originally Posted by linuxdev817 View Post
You are right. I edited my post. It is not system calls I am referring to, but calls to the system() function.
Thanks. I wasn't confused myself by the wording, though. My previous comments still apply to your situation. More explicitly, I think system is trying to create a second 55GB process as a result of trying to duplicate the main process (via fork) before calling the shell (via execve). That's how processes create new processes on Linux.
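
Stripped of the signal handling and error checking a real libc does, system is roughly equivalent to something like this (a simplified sketch, not the actual glibc source), which is why the fork is unavoidable:

Code:
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Rough sketch of what system(command) does internally. */
static int rough_system(const char *command)
{
        pid_t child = fork();           /* duplicates the calling process */
        int status = -1;

        if (child < 0) return -1;       /* with ~55GB committed, this is where ENOMEM shows up */

        if (child == 0)
        {
                execl("/bin/sh", "sh", "-c", command, (char *) NULL);
                _exit(127);             /* exec failed */
        }

        if (waitpid(child, &status, 0) < 0) return -1;
        return status;
}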
Kevin Barry

edit:
Quote:
Originally Posted by johnsfine View Post
ta0kira partially explained it. The main issue is the control over "committed" memory. You can think of that as memory Linux has promised to some process, but in most cases that process hasn't (and won't) actually use the memory.
To me "pre-allocates" sounds like the memory is effectively in use by the process, e.g. a massive 55GB malloc or a 55GB mlock.

Last edited by ta0kira; 08-26-2011 at 01:35 PM.
 
Old 08-26-2011, 01:27 PM   #6
linuxdev817
LQ Newbie
 
Registered: Aug 2011
Posts: 19

Original Poster
Rep: Reputation: Disabled
Yeah, if I had spent some more time reading the man page for system() I probably would have realized what both of you have just explained. I did not realize it would fork. I apologize for that.

I have worked with pipes/sockets/fork before so I could probably put something together that would work. The swap file idea also seems like it would suit my needs. Honestly though, I would prefer to find a solution in which I did not have to call system(). I have always tried to avoid calls to system(). Currently, pgrep and dmidecode are the only two I am using. Maybe I should be looking for alternatives to using those rather than trying to correct the current situation.
 
Old 08-26-2011, 01:38 PM   #7
linuxdev817
LQ Newbie
 
Registered: Aug 2011
Posts: 19

Original Poster
Rep: Reputation: Disabled
Quote:
Originally Posted by ta0kira View Post
Thanks. I wasn't confused myself by the wording, though. My previous comments still apply to your situation. More explicitly, I think system is trying to create a second 55GB process as a result of trying to duplicate the main process (via fork) before calling the shell (via execve). That's how processes create new processes on Linux.
Kevin Barry

edit:To me "pre-allocates" sounds like the memory is effectively in use by the process, e.g. a massive 55GB malloc or a 55GB mlock.
I have several caches of memory blocks/objects. Each block/object is created with malloc() when the program starts.
 
Old 08-26-2011, 01:39 PM   #8
ta0kira
Senior Member
 
Registered: Sep 2004
Distribution: FreeBSD 9.1, Kubuntu 12.10
Posts: 3,078

Rep: Reputation: Disabled
Quote:
Originally Posted by linuxdev817 View Post
I have worked with pipes/sockets/fork before so I could probably put something together that would work. The swap file idea also seems like it would suit my needs. Honestly though, I would prefer to find a solution in which I did not have to call system(). I have always tried to avoid calls to system(). Currently, pgrep and dmidecode are the only two I am using. Maybe I should be looking for alternatives to using those rather than trying to correct the current situation.
If your system commands are as simple as you've posted (and don't need to be parsed by the program), you can fork, open+dup2 to handle the output redirection, and execvp in lieu of system (provided you address the memory problems).
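
For example, a bare-bones sketch for the pgrep case might look like this (illustrative only; note that it still forks, so it doesn't by itself solve the commit problem):

Code:
#include <fcntl.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Equivalent of system("pgrep ntpd > pgrep_output") without the shell. */
static int run_pgrep(void)
{
        pid_t child = fork();
        int status;

        if (child < 0) return -1;

        if (child == 0)
        {
                char *argv[] = { "pgrep", "ntpd", NULL };
                int out = open("pgrep_output", O_WRONLY | O_CREAT | O_TRUNC, 0644);

                if (out < 0) _exit(127);
                dup2(out, STDOUT_FILENO);       /* redirect stdout to the output file */
                close(out);
                execvp("pgrep", argv);
                _exit(127);                     /* exec failed */
        }

        return (waitpid(child, &status, 0) < 0) ? -1 : status;
}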
Kevin Barry
 
Old 08-26-2011, 02:01 PM   #9
linuxdev817
LQ Newbie
 
Registered: Aug 2011
Posts: 19

Original Poster
Rep: Reputation: Disabled
I guess by "find an alternative" I mean that maybe I should consider analyzing the running processes for ntpd in a more manual fashion and not use pgrep. The dmidecode call is a one-time thing that I can do before the memory is allocated. What is the best approach to finding a running daemon without using pgrep and system()?

Last edited by linuxdev817; 08-26-2011 at 02:02 PM.
 
Old 08-26-2011, 02:23 PM   #10
ta0kira
Senior Member
 
Registered: Sep 2004
Distribution: FreeBSD 9.1, Kubuntu 12.10
Posts: 3,078

Rep: Reputation: Disabled
Quote:
Originally Posted by linuxdev817 View Post
I guess by "find an alternative" I mean that maybe I should consider analyzing the running processes for ntpd in more of a manual fashion and not use pgrep. The dmidecode call is a one time thing that I can do before the memory is allocated. What is the best approach to finding a running daemon without using pgrep and system().
The next-best thing, besides execvp, would be to write something like pgrep yourself and integrate it into your code. That would be a lot of work since it would involve manual traversal of /proc. Dealing with directories in C is a pain as it is.
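
For what it's worth, a bare-bones version of that traversal might look something like this (a sketch only; it assumes /proc/<pid>/comm is available, which fairly recent kernels provide — otherwise you would parse /proc/<pid>/stat instead):

Code:
#include <ctype.h>
#include <dirent.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Return the first PID whose /proc/<pid>/comm matches 'name', or -1 if none. */
static long find_process(const char *name)
{
        DIR *proc = opendir("/proc");
        struct dirent *entry;
        long found = -1;

        if (!proc) return -1;

        while (found < 0 && (entry = readdir(proc)))
        {
                char path[64], comm[64];
                FILE *file;

                if (!isdigit((unsigned char) entry->d_name[0])) continue;      /* not a PID directory */

                snprintf(path, sizeof path, "/proc/%s/comm", entry->d_name);
                if (!(file = fopen(path, "r"))) continue;                      /* process may have exited */

                if (fgets(comm, sizeof comm, file))
                {
                        comm[strcspn(comm, "\n")] = '\0';
                        if (!strcmp(comm, name)) found = strtol(entry->d_name, NULL, 10);
                }

                fclose(file);
        }

        closedir(proc);
        return found;
}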

I couldn't help but test out the "worker process" idea:
Code:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <errno.h>
#include <unistd.h>
#include <fcntl.h>
#include <sys/types.h>
#include <sys/socket.h>


/*the loop run by the "worker process" to execute 'system' commands*/
static int worker_loop(int fFile)
{
        if (fFile < 0) return -1;

        FILE *command_file = fdopen(fFile, "r+");

        if (!command_file) return -1;

        static char command_buffer[1024];

        while (fgets(command_buffer, sizeof command_buffer, command_file))
        {
        int outcome = system(command_buffer);
        fprintf(command_file, "%i\n", outcome);
        /*flush before reading again: the socket stream is fully buffered*/
        fflush(command_file);
        if (outcome < 0) break;
        }

        fclose(command_file);
        return 0;
}


/*a replacement for 'system' in the main process*/
static int fake_system(FILE *fFile, const char *cCommand)
{
        static char return_buffer[16];

        if (!fFile) return -1;
        fprintf(fFile, "%s\n", cCommand);
        /*flush so the worker sees the command before we block waiting for the reply*/
        fflush(fFile);

        if (fgets(return_buffer, sizeof return_buffer, fFile))
        return strtol(return_buffer, NULL, 10); /*(assumes only digits are read)*/

        else return -1;
}


int main(int argc, char *argv[])
{
        /*create a pair of sockets for IPC between the main and worker processes*/

        int duplex[2] = { -1, -1 };

        if (socketpair(PF_LOCAL, SOCK_STREAM, 0, duplex) == -1)
        {
        fprintf(stderr, "%s: couldn't create socket pair: %s\n", argv[0], strerror(errno));
        return 1;
        }

        fcntl(duplex[0], F_SETFD, fcntl(duplex[0], F_GETFD) | FD_CLOEXEC);
        fcntl(duplex[1], F_SETFD, fcntl(duplex[1], F_GETFD) | FD_CLOEXEC);


        /*create the worker process up front while memory usage is low*/
        pid_t worker = fork();


        if (worker < 0)
        {
        fprintf(stderr, "%s: couldn't create worker process: %s\n", argv[0], strerror(errno));
        close(duplex[0]);
        close(duplex[1]);
        return 1;
        }


        /*execute the worker-process loop*/
        if (worker == 0)
        {
        close(duplex[0]);
        return worker_loop(duplex[1]);
        }


        /*cleanup/preparation in the main process*/

        close(duplex[1]);

        FILE *worker_file = fdopen(duplex[0], "r+");
        if (!worker_file) return 1;

        int outcome;


        /*proceed with the rest of the program and call 'fake_system' as needed*/

        while ((outcome = fake_system(worker_file, "echo the time is: `date`")) >= 0)
        {
        fprintf(stderr, "[outcome=%i]\n", outcome);
        sleep(5);
        }

        fclose(worker_file);
        return 0;
}
Something like that might serve as a solution while you figure out how best to eliminate system.
Kevin Barry

Last edited by ta0kira; 08-26-2011 at 02:28 PM.
 
Old 08-26-2011, 03:15 PM   #11
linuxdev817
LQ Newbie
 
Registered: Aug 2011
Posts: 19

Original Poster
Rep: Reputation: Disabled
This is cool. Maybe I will go ahead and implement something like this in the event I have this same problem in the future.

One question: won't the worker loop terminate once it reads to the end of the stream? I don't see anything that it will loop on.
 
Old 08-26-2011, 04:43 PM   #12
Nominal Animal
Senior Member
 
Registered: Dec 2010
Location: Finland
Distribution: Xubuntu, CentOS, LFS
Posts: 1,723
Blog Entries: 3

Rep: Reputation: 948
The Linux kernel uses copy-on-write when cloning (forking) a process; it does not allocate any new memory before one of the copies is changed. Even then it does the copying on a page-by-page basis. (Page size is typically 4096, 8192, or 2097152 bytes; use sysconf(_SC_PAGESIZE) to find it out in C.)

You are therefore not really running out of memory, just hitting the overcommit limit.
While vfork() does suspend the parent in Linux (and does differ from fork()), it does not help you in this case. (I don't think it is possible to work around this using the clone() system call (which underlies fork()) either; I believe it will always duplicate the address space anyway.)

You could just change the overcommit ratio, although it is kind of a big hammer to use. It is much simpler than setting up a large swap area, though. File /proc/sys/vm/overcommit_ratio defines the amount of overcommit allowed as percentage; the default is 50, which means you can allocate half as much again as you have RAM and swap total. In your case, you'd want to set it to a bit over one hundred, maybe 120 or so.

Your best bet is to fork a child before allocating any memory, and communicate using a two-way socket created using socketpair(). Let the initial child read the command and parameters from the socket, and fork to run the external command. Let it detect when the (other end of the) socket closes, and exit; that way it will automatically exit when your program exits. If you want to be able to run more than one external command in parallel, let the initial child create a new pipe for each external command, and pass the read-end descriptor to the main program. See cmsg man page for a descriptor passing example using sendmsg and recvmsg. (The cmsg man page shows how to create the message header structure, to pass the descriptor(s) between processes.)

If you want, I could probably whip up an example C code to show how it is done.
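
In the meantime, the descriptor-passing part on its own looks roughly like this (an illustrative sketch along the lines of the cmsg man page; send_fd/recv_fd are made-up helper names and error handling is minimal):

Code:
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>

/* Pass one open descriptor to the peer of a UNIX-domain socket. */
static int send_fd(int sock, int fd)
{
        char byte = 0;
        union { char buffer[CMSG_SPACE(sizeof(int))]; struct cmsghdr align; } control;
        struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
        struct msghdr msg = { .msg_iov = &iov, .msg_iovlen = 1,
                              .msg_control = control.buffer,
                              .msg_controllen = sizeof control.buffer };
        struct cmsghdr *cmsg = CMSG_FIRSTHDR(&msg);

        cmsg->cmsg_level = SOL_SOCKET;
        cmsg->cmsg_type  = SCM_RIGHTS;
        cmsg->cmsg_len   = CMSG_LEN(sizeof(int));
        memcpy(CMSG_DATA(cmsg), &fd, sizeof(int));

        return (sendmsg(sock, &msg, 0) == 1) ? 0 : -1;
}

/* Receive a descriptor sent with send_fd; returns the new descriptor or -1. */
static int recv_fd(int sock)
{
        char byte;
        union { char buffer[CMSG_SPACE(sizeof(int))]; struct cmsghdr align; } control;
        struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
        struct msghdr msg = { .msg_iov = &iov, .msg_iovlen = 1,
                              .msg_control = control.buffer,
                              .msg_controllen = sizeof control.buffer };
        struct cmsghdr *cmsg;
        int fd = -1;

        if (recvmsg(sock, &msg, 0) < 1) return -1;

        cmsg = CMSG_FIRSTHDR(&msg);
        if (cmsg && cmsg->cmsg_level == SOL_SOCKET && cmsg->cmsg_type == SCM_RIGHTS)
                memcpy(&fd, CMSG_DATA(cmsg), sizeof(int));

        return fd;
}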
 
Old 08-26-2011, 05:47 PM   #13
johnsfine
LQ Guru
 
Registered: Dec 2007
Distribution: Centos
Posts: 5,286

Rep: Reputation: 1197
Quote:
Originally Posted by Nominal Animal View Post
You could just change the overcommit ratio, although it is kind of a big hammer to use. It is much simpler than setting up a large swap area, though. File /proc/sys/vm/overcommit_ratio defines the amount of overcommit allowed as percentage; the default is 50, which means you can allocate half as much again as you have RAM and swap total. In your case, you'd want to set it to a bit over one hundred, maybe 120 or so.
One could argue that messing with the overcommit settings is "simpler" than adding swap space. If you fully understood the overcommit settings, that would even be true.

The reason I said adding swap was simpler is that it is too easy to misunderstand the overcommit settings, as you (Nominal Animal) have demonstrated.

Yes, the problem could be fixed by messing with overcommit settings. No, it could not be fixed by the specific change suggested by Nominal Animal.

Fixing it by an algorithm change (to avoid the need to fork while owning a lot of memory) may ultimately be better. But if I had code that would be working now if I just added some swap space, I'd add some swap space, rather than recode it.
 
Old 08-26-2011, 07:51 PM   #14
ta0kira
Senior Member
 
Registered: Sep 2004
Distribution: FreeBSD 9.1, Kubuntu 12.10
Posts: 3,078

Rep: Reputation: Disabled
Quote:
Originally Posted by linuxdev817 View Post
One question, won't the worker loop terminate once it reads to the end of the stream? I don't see anything that it will loop on.
It will terminate when EOF is reached for the socket, which is why you leave the socket open in the main process until it no longer needs the worker process. Until then it will block on fgets until a newline is read, enter the loop body, then block again.

Note that this method will fail miserably if the command sent to the worker has a newline. This can be fixed by replacing or removing newline characters before fprintfing in fake_system, if it concerns you.
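
For example, something small like this (a hypothetical helper, to be called on a writable copy of the command before the fprintf in fake_system):

Code:
#include <string.h>

/* Replace embedded newlines with spaces so a multi-line command cannot be
   split into several commands by the worker's line-based protocol. */
static void strip_newlines(char *command)
{
        char *position = command;

        while ((position = strchr(position, '\n')))
                *position = ' ';
}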
Kevin Barry

Last edited by ta0kira; 08-26-2011 at 08:07 PM.
 
Old 08-26-2011, 08:16 PM   #15
linuxdev817
LQ Newbie
 
Registered: Aug 2011
Posts: 19

Original Poster
Rep: Reputation: Disabled
Quote:
Originally Posted by Nominal Animal View Post
The Linux kernel uses copy-on-write when cloning (forking) a process; it does not allocate any new memory before one of the copies is changed. Even then it does the copying on a page-by-page basis. (Page size is typically 4096, 8192, or 2097152 bytes; use sysconf(_SC_PAGESIZE) to find it out in C.)

You are therefore not really running out of memory, just hitting the overcommit limit.
While vfork() does suspend the parent in Linux (and does differ from fork()), it does not help you in this case. (I don't think it is possible to work around this using the clone() system call (which underlies fork()) either; I believe it will always duplicate the address space anyway.)

You could just change the overcommit ratio, although it is kind of a big hammer to use. It is much simpler than setting up a large swap area, though. File /proc/sys/vm/overcommit_ratio defines the amount of overcommit allowed as percentage; the default is 50, which means you can allocate half as much again as you have RAM and swap total. In your case, you'd want to set it to a bit over one hundred, maybe 120 or so.

Your best bet is to fork a child before allocating any memory, and communicate using a two-way socket created using socketpair(). Let the initial child read the command and parameters from the socket, and fork to run the external command. Let it detect when the (other end of the) socket closes, and exit; that way it will automatically exit when your program exits. If you want to be able to run more than one external command in parallel, let the initial child create a new pipe for each external command, and pass the read-end descriptor to the main program. See cmsg man page for a descriptor passing example using sendmsg and recvmsg. (The cmsg man page shows how to create the message header structure, to pass the descriptor(s) between processes.)

If you want, I could probably whip up an example C code to show how it is done.
I believe ta0kira already posted something similar to this.
 
  

