Linux - Newbie (LinuxQuestions.org): This Linux forum is for members that are new to Linux.
I am running Fedora on an AMD64 machine (so I can easily use long doubles!) and writing my own C code for physics research. I have an application that won't run consistently in the background, although it runs fine in the foreground. It uses the GNU Scientific Library and has a long running time, typically several hours. I've tried compiling it both with optimization level 3 (-O3) and with the default optimization plus debugging information (-g); neither option seems to make any difference. The critical difference seems to be whether I run it in the foreground or the background. In my current version, I print a couple of lines every iteration, with 10,000 iterations in total. In the background version I redirect standard output and standard error together into a file:
./a.out >run000.txt 2>&1 &
In the foreground version, I don't redirect but just let the output come to the terminal window.
When it runs in the background, I occasionally display the end of the output file to see how much progress is being made.
Different runs, which should be deterministic and identical, will stop after anywhere from 32 to 9700 iterations, with no apparent pattern to the length of time before the program stops outputting data. On termination, there is no message of any kind in the output/error file.
While looking for answers, I ran "top" and discovered that before the process actually died, it sat in the "Sleeping" state for several minutes, maybe as much as an hour, before it finally disappeared. I don't know whether the program dies this way every time, but I did catch it once, thanks to the lucky timing of my checks on "top".
The code is long but fairly simple: I am not explicitly calling any sleep()-type routines or using multiple threads. Of course, a library I call might be doing that without my knowledge. I use some variable-length memory structures, but they total only a few tens of megabytes on a machine with 4 gigabytes of RAM.
Does any of this ring a bell? The only explanations I can come up with are these:
1) There is a timeout on file I/O: at some point a write to standard output times out and the program sleeps waiting for the write to complete.
2) There is a user-activity timeout that treats foreground and background tasks differently and eventually kills the background job because there is too little terminal (standard input) activity.
3) There is a bug in the code that makes it crash randomly, and I have just been unlucky that it has so far occurred only in background runs and never in foreground runs.
I need other ideas about possible causes and solutions, or at least strategies for figuring out what is going on. I am also the (inexperienced) sysadmin for two of these machines and support a small group of researchers who use them as remote computing servers. I am waiting for feedback from them on any timeout or sleeping problems they have had, but their usage is so light that they may not have stressed the machines the way I have.
Before I start "instrumenting the code" to pinpoint the error, I wanted to post here to see whether there are system administration issues to consider, such as activity timeouts, power conservation settings, or I/O buffer lengths.
Thanks in advance for reading this and thinking about how to help.