Old 12-03-2004, 11:41 AM   #1
ashlock
Member
 
Registered: Jul 2003
Distribution: Fedora Core 2, RH 9
Posts: 33

Rep: Reputation: 15
how to schedule an "at" job to start when another finishes?


I set up a numerical analysis program to run groups of related jobs using the at daemon, like so:

"at -f run.bat now"

where the file run.bat contains the following:
-----
./program               # run the job; the shell waits for it to finish
mv program ../otherdir1 # then move the executable to the next input directory
cd ../otherdir1
./program
mv program ../otherdir2
cd ../otherdir2
./program
# ...and so on through the remaining directories
-----


While those jobs are running, I will be setting up new input data for more runs, which I would like to start automatically when the above at-job finishes (usually in the wee hours of the morning). Currently I estimate when the at-job will finish from the time each ./program run takes, and start the next batch of jobs using, e.g.

"at -f run_another.bat now+4hours"

but this is not always accurate, as some jobs take longer than others.

Is there any way to start an at-job when another job in the at queue finishes? I don't think appending to the run.bat file would work, as run.bat is only read when the at-job is submitted. If I use the batch option of the at command, I think I would end up with both at-jobs running at the same time after the first ./program completes.

Last edited by ashlock; 12-03-2004 at 11:49 AM.
 
Old 12-03-2004, 04:25 PM   #2
Hko
Senior Member
 
Registered: Aug 2002
Location: Groningen, The Netherlands
Distribution: Debian
Posts: 2,536

Rep: Reputation: 111Reputation: 111
Try using the "batch" command instead of "at".

From "man batch" (or "man at"):
Quote:
batch executes commands when system load levels permit; in other words, when the load average drops below 1.5, or the value specified in the invocation of atrun.
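For the setup above that would be, e.g.:

"batch -f run.bat"

The job then sits in the batch queue and runs only when the load average permits.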
 
Old 12-03-2004, 04:27 PM   #3
nilleso
Member
 
Registered: Nov 2004
Location: ON, CANADA
Distribution: ubuntu, RHAS, and other unmentionables
Posts: 372

Rep: Reputation: 31
I don't think at commands are best suited to what you're trying to achieve. You could quite easily do this within a simple script with a few loops and if-tests (google for shell script loop syntax).

Also have a look at command list constructs. For example:

cmd1 && cmd2 && cmd3 && cmd4 && cmd5
If cmd1 succeeds, the shell runs cmd2. If cmd2 succeeds, the shell runs cmd3, and on through the series until a command fails or the last command ends. (If any command fails, the shell stops executing the command line).

cmd1 || cmd2
If cmd1 fails, then the shell runs cmd2. If cmd1 succeeds, the shell stops executing the command line.

Something like:
./program && mv program ../otherdir1 && cd ../otherdir1 && ./program && etc...
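Expanded into a full script in the style of run.bat above (a rough sketch, reusing the directory names from the first post):

-----
#!/bin/sh
# each step runs only if the previous one succeeded
./program &&
mv program ../otherdir1 &&
cd ../otherdir1 &&
./program &&
mv program ../otherdir2 &&
cd ../otherdir2 &&
./program
-----

A line ending in && continues onto the next line, and if any run fails everything after it is skipped, so a crashed job won't march on into the wrong directory.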

cheers.
 
Old 12-03-2004, 05:31 PM   #4
ashlock
Member
 
Registered: Jul 2003
Distribution: Fedora Core 2, RH 9
Posts: 33

Original Poster
Rep: Reputation: 15
Quote:
Originally posted by Hko
Try using the "batch" command instead of "at".

From "man batch" (or "man at"):

"man batch" on my system gives the same man page as "man at" which has a batch option that starts another at job when the load limit falls below a certain value. As I mentioned above, this might result in the second at job being started during the interim when the executable is being moved to the second directory.

I will give it a try, though. Thanks for the suggestion.
 
Old 12-03-2004, 05:50 PM   #5
ashlock
Member
 
Registered: Jul 2003
Distribution: Fedora Core 2, RH 9
Posts: 33

Original Poster
Rep: Reputation: 15
Quote:
Originally posted by nilleso
I don't think at commands are best suited to what you're trying to achieve. You could quite easily do this within a simple script with a few loops and if-tests (google for shell script loop syntax).

Also have a look at command list constructs. For example:

cmd1 && cmd2 && cmd3 && cmd4 && cmd5
If cmd1 succeeds, the shell runs cmd2. If cmd2 succeeds, the shell runs cmd3, and on through the series until a command fails or the last command ends. (If any command fails, the shell stops executing the command line).

cmd1 || cmd2
If cmd1 fails, then the shell runs cmd2. If cmd1 succeeds, the shell stops executing the command line.

Something like:
./program && mv program ../otherdir1 && cd ../otherdir1 && ./program && etc...

cheers.

Actually, you've hit on something here. I originally tried using shell scripts before I knew of the && operator (which I learned about later when compiling kernels). Anyway, I separated the ./program and cd commands with ";" or carriage returns in the shell script file, which resulted in the shell not waiting for programs to complete before moving the executable and marching through the 36 or so directories.

I started using the at-queue since it naturally waits for completion of the command on each line of the text file used as input, but I think using && as you mentioned would work great.

The only problem is that I would start a shell script, say with "sh ./run.sh", where run.sh contains:

-----
./program && mv program ../dir2 && cd ../dir2 && ./program
-----

although I'm not sure if this would "follow" the directory changes the way my current at-queue approach does (I'll have to try it).
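(It should: a cd inside a script changes the working directory of the script's own shell, and every command after it in that script runs from the new directory. A quick sketch to convince yourself:)

-----
#!/bin/sh
pwd           # prints the starting directory
cd ../dir2    # affects only this script's shell, not the login shell
pwd           # now prints the dir2 path; ./program here would run from dir2
-----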

But I still don't see how I can start the first glob of jobs with run.sh, go prepare more input data, come back two hours later, and tell another glob of jobs in another directory, with another script run2.sh, to start when the two-hour-old run.sh process finishes some hours from now.

In other words, when I run the first shell script, the second glob of input files doesn't exist, so I can't put them in the first run.sh with more &&'s.

Am I making sense?
 
Old 12-03-2004, 06:28 PM   #6
ashlock
Member
 
Registered: Jul 2003
Distribution: Fedora Core 2, RH 9
Posts: 33

Original Poster
Rep: Reputation: 15
OK, so far using

"batch -f run.bat"

on four different batch files simultaneously is producing the desired result: only two at-jobs run at a time, each one giving an instance of ./program 99% of one processor, and the other two at-jobs wait, even while the executables are being moved.

Thanks for the help!
 
Old 12-04-2004, 06:12 AM   #7
Hko
Senior Member
 
Registered: Aug 2002
Location: Groningen, The Netherlands
Distribution: Debian
Posts: 2,536

Rep: Reputation: 111Reputation: 111
Quote:
and the other two at-jobs are waiting, even during moving of the executables.
That doesn't happen, because the at daemon watches the load average, which does not drop abruptly but merely peters out.

In case this does happen, you can lower the load-average through arguments when the "atd" daemon is started (/etc/init.d/atd or so).

From "man atd":
Quote:
-l   Specifies a limiting load factor, over which batch jobs should not be run, instead of the compile-time choice of 1.5. For an SMP system with n CPUs, you will probably want to set this higher than n-1.
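For a machine with two CPUs that suggests something like (a sketch):

-----
/usr/sbin/atd -l 3    # allow batch jobs to start until the load average exceeds 3
-----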
 
Old 12-05-2004, 12:30 AM   #8
ashlock
Member
 
Registered: Jul 2003
Distribution: Fedora Core 2, RH 9
Posts: 33

Original Poster
Rep: Reputation: 15
I'm having trouble coaxing two jobs to run at once. Sometimes one job will run on each processor, but right now only one processor is being used, with a load average of 0.99. Typing /usr/sbin/atd -l 1.5 doesn't result in another batch job starting.

With one job on each processor, the load average is around 2-4, but /usr/sbin/atd -l 3 and /usr/sbin/atrun -l 3 don't cause more jobs to start (even as root). Why don't these commands work as documented?
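(One possible explanation, offered as a sketch rather than a certainty: atd writes a PID file and will not start a second instance, so invoking "/usr/sbin/atd -l 1.5" while the daemon is already up changes nothing, and atrun only makes a single pass over the queue. Something like this, as root, may be needed whenever the limit changes:)

-----
ps ax | grep atd       # check whether atd is already running
/etc/init.d/atd stop   # stop the existing daemon first (path varies)
/usr/sbin/atd -l 3     # restart it with the new load limit
-----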
 
Old 12-05-2004, 02:56 AM   #9
dustu76
Member
 
Registered: Sep 2004
Distribution: OpenSuSe
Posts: 153

Rep: Reputation: 30
Quote:
But I still don't see how I can start the first glob of jobs with run.sh, go prepare more input data, come back two hours later, and tell another glob of jobs in another directory, with another script run2.sh, to start when the two-hour-old run.sh process finishes some hours from now.

In other words, when I run the first shell script, the second glob of input files doesn't exist, so I can't put them in the first run.sh with more &&'s.
I'm not familiar with at. From what I understand, the flow is:

1. Start first job
2. After 2 hours start the second job
3. The second job should check the status of the first job:
If completed --> start its processing
If not --------> maybe sleep for half an hour and then check again.

This being the case, why can't it be handled by one main script which calls the two scripts as below:

[pseudocode]
first_script.sh &   # puts the first script in the background
backproc=$!         # PID of the background job
# Let us assume that first_script.sh creates a file called /tmp/done.dat
# upon completion
sleep 7200          # wait the initial 2 hours
while [ "$(ps -p $backproc 2>/dev/null | grep -v CMD)X" != "X" ] ; do
    sleep 1800      # or anything else you might want
done
second_script.sh
[/pseudocode]
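(If both scripts are launched from the same shell, the shell built-in "wait" can replace the polling loop entirely; a sketch:)

-----
#!/bin/sh
first_script.sh &    # run the first batch in the background
backproc=$!
# ...prepare the new input data here...
wait $backproc       # block until the background job finishes
second_script.sh     # then start the second batch
-----

Note that "wait" only works on children of the same shell, which is why the ps polling above is the more general approach.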

Correct me if I'm missing something here.

HTH.
 
Old 12-05-2004, 02:58 AM   #10
dustu76
Member
 
Registered: Sep 2004
Distribution: OpenSuSe
Posts: 153

Rep: Reputation: 30
Ignore that /tmp/done.dat etc. comment... that was part of a less elegant solution...
 
Old 01-13-2005, 11:12 PM   #11
ashlock
Member
 
Registered: Jul 2003
Distribution: Fedora Core 2, RH 9
Posts: 33

Original Poster
Rep: Reputation: 15
atd is still not working as intended. When I start many batch jobs from different directories and want them to execute two at a time, I have been using /usr/sbin/atd -l 3 or /usr/sbin/atrun -l 3.

This works for the first two batch files; that is, one batch file (containing many sequential program executions) runs on each processor.

However, once the first two batches finish, the rest of the batch queue is dispatched only one at a time, so only one processor is used for every batch entry from the third to the last.

If I log in and run "/usr/sbin/atrun -l 3" or "/usr/sbin/atd -l 3" again, I get one job on each processor. After those finish, it goes back to a single processor finishing up the batch queue.

The man page for atd mentions that atrun processes the batch queue only once, but shouldn't atd start a daemon that continually monitors the batch queue? Instead, it seems to process the queue only once.

I have searched and searched for a solution, but I can find no configuration options for the at daemon.

There is something called the Portable Batch System (PBS) for farming jobs out to hundreds or thousands of machines, but that seems like overkill when the batch daemon ought to handle this.

Do I need to enable the batch queue to migrate across processors? That is, if I type "batch -f run.bat now" in a bunch of directories within the span of a few minutes, the second through last batch jobs are essentially submitted by one processor, since the first processor is at 100% after the first batch job is submitted. When the first two batches finish, are the remaining jobs only allowed to start on the processor that was used to submit them?

Essentially, if I could permanently set the load-level threshold to 3, as happens only temporarily with "atd -l 3", I might get the rest of the batch queue to execute on two processors instead of one. Currently, "atd -l 3" only seems to work for one pass through the batch queue.
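(One way to make the limit stick, offered as a sketch: pass -l at boot from the init script. The exact file and startup line vary by distribution, so treat these as placeholders:)

-----
# in /etc/init.d/atd (or wherever the daemon is launched),
# change the startup line from something like
#   daemon /usr/sbin/atd
# to
daemon /usr/sbin/atd -l 3
-----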

Last edited by ashlock; 01-13-2005 at 11:21 PM.
 
  

