Need help running program in background in BASH
Somehow, I managed to create a program - in C, if that matters - which solves a differential equation (don't worry if you don't understand that). We need the program to run for a very long time (multiple weeks!) on a cluster. I need to figure out how to run it in the background, since my fine ISP doesn't seem to want to give me a consistent internet connection (and I want to disconnect and watch the new Arrested Development during the next few weeks). I am trying to do so using the "&" and "screen" features of BASH, but having very little luck. I am connecting to the cluster using SSH from a Cygwin terminal on my fine Windows Vista PC.
First off, how do you get the "&" feature (I don't know what it's called) to work right? I'm told the "&" after a command should execute that command in the background. But when I try to use that, the background job stops the moment I try to do something else. Then, I have to call the job back into the foreground (using "fg"), but then it's no longer running in the background and I have to wait for it to finish. For example: Code:
prompt>mpirun 4 MYPROG 6 0 &
Also, what happens if a background program has some output, or requires input? I can find no documentation on "&" and could use a good, clear explanation.
Second (is this post too long?), how do I use the "screen" command? I have searched the web far and wide and I see that you create a new screen with "screen -S SCNNAME", which works fine. I checked that it works by adding $STY to my prompt. But then everything I read says that to switch screens you type "Ctrl-a c" or "Ctrl-a n". None of those Ctrl-a sequences are working for me. Am I using screen in an environment that doesn't support it? Am I reading the right instructions? Am I doing something else wrong? Please tell me how to use "screen" and where I can get good documentation for it (not the "screen --help" or "man screen" crypto-help).
So how can I get my long-running program to run so that I can disconnect and come back later, and still see the relevant output? TIA. |
man page:
Screen does not understand the prefix "C-" to mean control, although this notation is used in this manual for readability. Please use the caret notation ("^A" instead of "C-a") as arguments to e.g. the escape command or the -e option. Screen will also print out control characters in caret notation. |
I'm not sure what you're trying to tell me. Can you clarify please?
|
If you're going to run a program in the background via '&', you have to redirect any output not already directed to a file (e.g. stdout and stderr) to a file.
It should NOT require any input, or it will likely hang waiting for the input. You'd have to bring it back to the foreground to supply input. If you want it to continue after you have logged out, you need to prefix it with 'nohup', e.g. Code:
nohup ./myprog >myprog.log 2>&1 &
|
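As a minimal sketch of the nohup command line above, the following uses a short stand-in command in place of the real program (the `sh -c` command, its messages, and the `wait` are assumptions for illustration only; a real long-running job would simply be left alone). It shows that both output streams end up in the log file while the job runs in the background:

```shell
# Stand-in for the real program: writes one line to stdout and one to stderr.
nohup sh -c 'echo "step 1 done"; echo "warning: example" >&2' >myprog.log 2>&1 &
job_pid=$!          # PID of the background job, handy for checking on it later

# Only for this demo: wait for the short job so the log can be inspected.
wait "$job_pid"

# Both the stdout line and the stderr line are in the log.
cat myprog.log
```

With a real job you would skip the `wait`, log out, and later inspect the log with something like `tail -f myprog.log`.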
I am using "screen" in place of "nohup". I understand where you put ">myprog.log" at the end of the execution statement - that is to send output to the file myprog.log. What does "2>&1" do? What do I search on to read about that?
|
File handle numbers 0, 1 and 2 are respectively the standard input, standard output and standard error. These are created by default and are available to every process.
In your command line, the output (file handle 1) is redirected to myprog.log. 2>&1 then redirects the standard error messages to the standard output, which is already redirected to myprog.log. So myprog.log contains BOTH the standard output and the standard error. Thus all output is redirected and the program won't wait indefinitely for a non-existent output device.
Questions to answer: (1) What's the role of & in 2>&1? (2) Would 2>myprog.log work as well? OK |
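The point that 2>&1 sends stderr to wherever stdout "already stands redirected" means the order of the two redirections matters. A small sketch, assuming a hypothetical helper function `emit` and made-up file names, illustrates this:

```shell
# emit: hypothetical helper that writes one line to stdout and one to stderr.
emit() { echo "to stdout"; echo "to stderr" >&2; }

# stdout is sent to the file first, then stderr is pointed at the same place.
emit >both.log 2>&1

# Reversed order: stderr is pointed at the current stdout (not the file),
# and only afterwards is stdout sent to the file.
emit 2>&1 >only_stdout.log

wc -l <both.log          # both lines were captured
wc -l <only_stdout.log   # only the stdout line was captured
```

In the second call the "to stderr" line still appears on the terminal, because stderr was duplicated before stdout was moved to the file.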
Thanks, Anantha.
The output is not working quite right. I'm running the program as Code:
mpirun -machinefile $HOME/utils/Host_file -np 30 MYPROG 6 4 >MYPROG.log 2>&1
To answer your questions: (1) The role of "&" is obviously some type of delimiter which tells linux not to interpret "1" as a file name. (2) My first impression is that would work, but I'm guessing this is a trick question and there would be some file conflict issues. By redirecting standard error to standard output, it would combine the two outputs. |
In file descriptors/redirections, '&' represents a 'file duplicator'. In natural language terms it could be translated as 'the same place as'.
>MYPROG.log 2>&1 means "send stdout to MYPROG.log, and also send stderr to the same place as stdout's current setting" (i.e. also to MYPROG.log). File descriptors are defined for the main command process launched on the line, mpirun in this case.
redirections and file descriptors explained
Since you've already mentioned screen though, I think you should forget about backgrounding anything and just use a dedicated screen session for it. Set it running, detach it, and open a new terminal for general use. You can re-attach to it at any time for control and monitoring. It's exactly the kind of thing screen was designed for. (Don't ask me how to do it though, I don't have much experience with screen either. ;)) |
If you are using a cluster, then normally the cluster configuration includes a batch system that will do this for you (though this depends on the cluster - a cheap, thrown-together cluster might not have one... but then it also isn't really a cluster, as it is more just a bunch of nodes on a net).
You can also read the man pages on "batch" and "cron" (batch provides a simple batch queuing system; on most Linux systems it is run by the at daemon, atd, while some Unix systems run it through cron). |
Quote:
In the meantime, do you have any idea why MYPROG.log is not receiving output? It seems like the first few lines are sent to that file, but no further output. It seems like the OS might be buffering what it writes to MYPROG.log. But it's now 20 hours since I submitted the job - it is clearly still running because it is creating new output files - but there are no more messages being sent to MYPROG.log. If the file is being buffered, then it will be complete when the job finishes (which will be about 30 hours from now). But if the job hangs - like it did last time - I get no further output. Any idea what I can do about that? I hope I'm describing it well.
As to using screen (whoever is reading this thread), if I submit the job in the foreground, how do I get out of the current screen? Using Ctrl-a c isn't working for me, and I don't understand what linosaurusroot, in the second post, was trying to tell me. |
Quote:
Yes, the MPI (Message Passing Interface) submits the job to all of the cluster's processors. But the command line that issues the mpi command is: Code:
>mpirun -machinefile $HOME/utils/Host_file -np 30 MYPROG 6 4 >MYPROG.log 2>&1
The file MYPROG.log has received the first few lines of the standard output, but nothing more. Also, if the job hangs - which it did after 30 hours the last time I ran it - I never get the rest of the output. Is it buffering the output before writing to MYPROG.log? Did it stop sending output to MYPROG.log when I changed the job to the background (using Ctrl-z)? Thanks. |
Most clusters will be running a batch system, something like PBS or LSF. There are others:
http://en.wikipedia.org/wiki/Job_scheduler specifically the section on queuing for HPC clusters.
If you just use Ctrl-z, you didn't put the process in the background - you just suspended it. If you want it to continue running, you need to use the "bg" command to resume it. |
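The difference between a suspended job and one running in the background can be sketched in a script. This is only an illustration under stated assumptions: `sleep 30` stands in for the real job, and `kill -STOP` is used in place of Ctrl-z (which sends SIGTSTP from the terminal) because it has the same visible effect in a non-interactive script; `kill -CONT` sends the same signal that "bg" uses to resume a job.

```shell
sleep 30 &                     # stand-in for a long-running job
pid=$!

kill -STOP "$pid"              # like Ctrl-z: the process is stopped, not running
sleep 1
state_stopped=$(ps -o stat= -p "$pid" | tr -d ' ')
echo "after stop: $state_stopped"        # state code begins with T (stopped)

kill -CONT "$pid"              # what "bg" does: let the job continue
sleep 1
state_running=$(ps -o stat= -p "$pid" | tr -d ' ')
echo "after continue: $state_running"    # no longer T; the job runs again

kill "$pid"                    # clean up the demo job
```

At an interactive prompt the equivalent sequence is: start the command, press Ctrl-z, then type "bg", and check the result with "jobs".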
Well, it seems to keep running after I used Ctrl-z. Using redirection and "&" didn't supply me with adequate output (Ctrl-z may not, either).
|
The only process suspended is the one attached to the terminal. Any processes spawned by mpirun before the suspension will still be running. But the status data that is still being sent back to the mpirun control process will go unprocessed (and, if sent over UDP, could be dropped).
It is for the logging that you want one of the cluster-based batch systems - errors from remote processes may get lost otherwise. |
I think I get the first paragraph - basically, my terminal is suspended, but not running in the background. The cluster's processors are running the "mpispawn" jobs in the background and sending output back to the suspended terminal. The terminal stores the standard output and standard error so that when I call it to the foreground, I can see it. But if the job dies before I return it to the foreground, I will miss the important why-it-died information.
Can you clarify your second paragraph? |