LinuxQuestions.org

bharatbsharma 04-09-2010 04:48 AM

fork: not enough memory in tcl/expect
 
Hello

I have a TCL framework for automating my test cases. It has been working for the last year, but yesterday it exited with an exception when a function was called.
The exception was:

fork: not enough memory
while executing
"spawn bash"

(procedure "runtrigger" line 36)
invoked from within
"runtrigger $fnAfter $sid_l $mapver"
("trigger" arm line 6)
invoked from within

Can you tell me what might be the reason?

johnsfine 04-10-2010 07:43 AM

I don't know what less common causes might exist. But the most common cause is that your system's RAM and swap space are nearly full.

The fork operation is "committing" memory that it won't actually use (because I'm assuming bash is smaller than the process calling bash). But the Linux kernel at that moment does not know the memory won't be used.

With the default settings for managing memory overcommit, a nearly full system needs enough free swap space to cover the commit of each process as it is created. It does not need anywhere near enough for the total of all memory that is committed but not used; it just needs enough for the commit of the current process.

If your memory is so full that you don't have the swap space needed for the commit level of one more process, I would expect that you are also close to having other failures due to lack of swap space.

So I think you need significantly more swap space.

But it is hard to be sure of any of the above without knowing details of your memory use.

An entirely different possibility is that the memory limit being hit is virtual and within one process, rather than physical and system-wide. Combined with the info you posted, that would imply your TCL process has a memory leak, so that after too many iterations of whatever action leaks, it needs to be killed and restarted.

bharatbsharma 04-11-2010 12:48 AM

Thanks for the quick initial response. I would like to give a few more details that may help pinpoint the root cause.

I call a proc in my TCL/Expect framework. Every time this proc is called, I spawn a bash. That is, if I execute 1000 test cases, this proc will be called 1000 times.

proc run_tc {

spawn bash
do some steps
}

while <> {

run_tc

}

Is the error mentioned above due to not managing "spawn bash" properly? If yes, how can this be avoided?

Valery Reznic 04-11-2010 01:12 AM

Available memory (i.e. RAM + swap) is shared among all processes.
That is, in the classic case the total memory used by all processes must be less than (RAM + swap).
Linux (at least recent enough kernels) allows memory overcommit.
That is, total memory usage CAN be greater than (RAM + swap).
But memory overcommit is not unlimited. When total memory usage exceeds available memory + swap by some threshold, the kernel will refuse to provide more memory.
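
One way to see how close a system is to that point is to compare the kernel's own commit accounting. The Python sketch below is only an illustration: it parses text in the format of /proc/meminfo and subtracts Committed_AS from CommitLimit. The function names and the sample numbers are made up for the example, not taken from the poster's machine. Note that the kernel enforces CommitLimit strictly only when vm.overcommit_memory is set to 2; with the default heuristic mode, treat the result as a rough indicator.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of integer kB values."""
    values = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        fields = rest.split()
        if sep and fields and fields[0].isdigit():
            values[name.strip()] = int(fields[0])
    return values


def commit_headroom_kb(meminfo):
    """Return CommitLimit - Committed_AS in kB; negative means overcommitted."""
    return meminfo["CommitLimit"] - meminfo["Committed_AS"]


# Illustrative numbers only (hypothetical machine with 2 GB RAM, 1 GB swap).
SAMPLE_MEMINFO = """\
MemTotal:        2048000 kB
SwapTotal:       1024000 kB
CommitLimit:     2048000 kB
Committed_AS:    2300000 kB
"""

headroom = commit_headroom_kb(parse_meminfo(SAMPLE_MEMINFO))
print(headroom, "kB of commit headroom")
```

On a real system the same functions can be fed the contents of /proc/meminfo instead of the sample string.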

So there may be nothing wrong with your program; maybe your system just runs too close to this threshold and your program happened to cross it.

Could you try to run your program on a system where nothing else is running?


On the other hand, how many of the shells you spawned run simultaneously?
Did one have a chance to finish its job before another started?
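
To answer that question on a live system, one option is to count the children of the Expect process while the tests run. The Python sketch below assumes output in the format of `ps -eo pid,ppid,stat,comm`; the function name, PIDs and sample listing are hypothetical. It counts running and zombie children separately, since a steadily rising count of either would suggest the spawned shells are not being cleaned up. (In Expect, as far as I know, a spawned process normally has to be released with `close` and `wait` once the script is done with it; whether that is the cause here is an assumption, not something confirmed in this thread.)

```python
def count_children(ps_text, parent_pid):
    """Count (live, zombie) children of parent_pid in ps-style output.

    Expects the columns PID, PPID, STAT, COMMAND with one header row.
    """
    live = zombies = 0
    for line in ps_text.splitlines()[1:]:  # skip the header row
        fields = line.split(None, 3)
        if len(fields) < 4 or not fields[1].isdigit():
            continue
        if int(fields[1]) != parent_pid:
            continue
        if fields[2].startswith("Z"):  # STAT beginning with Z marks a zombie
            zombies += 1
        else:
            live += 1
    return live, zombies


# Hypothetical listing: an expect process (PID 4000) with three bash children.
SAMPLE_PS = """\
  PID  PPID STAT COMMAND
 4000     1 Ss   expect
 4001  4000 Ss+  bash
 4002  4000 Ss+  bash
 4003  4000 Z    bash <defunct>
 5000     1 S    sshd
"""

print(count_children(SAMPLE_PS, 4000))
```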

johnsfine 04-11-2010 09:25 AM

Quote:

Originally Posted by bharatbsharma (Post 3931480)
I would like to give a few more details that may help pinpoint the root cause.

If you have a system running in nearly the same condition as it was when it failed, you can use tools such as top and free to get an idea of the basic nature of the failure.

Obviously there is some circular reasoning there. You don't know what mattered to the failure so you don't really know what "near the same condition" means in my instructions above. Sometimes you just need to guess and look and think about what you see.

In top you might try typing Fo to bring processes with very high VIRT to the top. Is your tcl process one of them? Does its VIRT keep growing as it runs?
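
If watching top by hand is inconvenient, the same check can be scripted. The Python sketch below is an illustration under assumptions: it reads the VmSize line from text in the format of /proc/<pid>/status (the value top reports as VIRT) and tests whether a series of samples keeps increasing. The function names and sample values are hypothetical.

```python
def vm_size_kb(status_text):
    """Return the VmSize value in kB from /proc/<pid>/status text, or None."""
    for line in status_text.splitlines():
        if line.startswith("VmSize:"):
            return int(line.split()[1])
    return None


def keeps_growing(samples_kb):
    """True if every sample is strictly larger than the one before it."""
    pairs = zip(samples_kb, samples_kb[1:])
    return len(samples_kb) > 1 and all(later > earlier for earlier, later in pairs)


# Hypothetical status excerpt for the process running the test framework.
SAMPLE_STATUS = "Name:\texpect\nVmPeak:\t  120000 kB\nVmSize:\t  118000 kB\n"

print(vm_size_kb(SAMPLE_STATUS))
print(keeps_growing([100000, 118000, 150000]))
```

Sampling the real file every few test cases and passing the collected values to keeps_growing would show whether the process size levels off or climbs without bound.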

You will also see how much swap space you have and how much of that is free. If you have a few GB of swap space free or a large fraction of your Mem shown as "cached", a system-wide commit limit problem is unlikely and your problem is more likely a memory leak inside the TCL code. If you have very little swap space free, you might want to increase swap space anyway for safety and/or it might be the fix to your problem.
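
The two numbers mentioned above can also be pulled out of /proc/meminfo directly. The Python sketch below is a rough illustration: it returns free swap in GB and the cached share of total RAM from meminfo-style text. The function name and the sample figures are hypothetical, and no thresholds are built in, since "a few GB" and "a large fraction" are judgment calls.

```python
def swap_and_cache_summary(meminfo_text):
    """Return (free swap in GB, cached fraction of RAM) from meminfo-style text."""
    values = {}
    for line in meminfo_text.splitlines():
        name, _, rest = line.partition(":")
        fields = rest.split()
        if fields and fields[0].isdigit():
            values[name.strip()] = int(fields[0])  # values are in kB
    swap_free_gb = values["SwapFree"] / (1024 * 1024)
    cached_fraction = values["Cached"] / values["MemTotal"]
    return swap_free_gb, cached_fraction


# Illustrative numbers only (hypothetical machine with 4 GB RAM, 2 GB swap).
SAMPLE_MEMINFO_SWAP = """\
MemTotal:        4194304 kB
MemFree:          524288 kB
Cached:          2097152 kB
SwapTotal:       2097152 kB
SwapFree:        1048576 kB
"""

print(swap_and_cache_summary(SAMPLE_MEMINFO_SWAP))
```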

Quote:

I call a proc in my TCL/Expect framework. Every time this proc is called, I spawn a bash. That is, if I execute 1000 test cases, this proc will be called 1000 times. ...
Is the error mentioned above due to not managing "spawn bash" properly?

I barely know anything about TCL, certainly not enough to help you.

If the problem is a memory leak in the TCL stuff, rather than a system-wide commit limit issue, you'll need help from someone who knows more about TCL.

Quote:

Originally Posted by Valery Reznic (Post 3931490)
On the other hand, how many of the shells you spawned run simultaneously?
Did one have a chance to finish its job before another started?

Or maybe that's the answer.

Maybe you are simply starting too many of those bash instances at once, whatever they do. Memory could be exhausted for that reason, or you might exhaust some other kernel resource that couldn't even be covered by adding swap space but might still be reported as a memory failure.

Maybe when it works, you're just lucky that the TCL code spawning things doesn't get enough CPU time (competing against all the things it already spawned) to start too many more before some early ones finish. Maybe it fails if the TCL code happens to get a slightly bigger share of CPU time.


All times are GMT -5.