LinuxQuestions.org

Ztole 08-15-2014 06:33 PM

Global Variable updating from multithread jobs
 
So there is probably a much better way to do this but I am not seeing it.

My end goal is to have a shell script run multiple jobs in parallel while limiting the total number of jobs running at once. I am hoping to avoid relying on searching for processes, as it seems silly not to be able to count within the script. So here is a sample to illustrate:
------------------
#!/bin/bash

function setvariables {
testarray=("test 1" "test 2" "test 3" "test 4" "test 5")
testvar=0
}

function main {
for t in "${testarray[@]}"
do
while [ $testvar -gt 2 ]
do
sleep 1
echo while loop shows $testvar
done
(testing $t; testvar=$(($testvar - 1)) ; echo $testvar) &
testvar=$(($testvar + 1))
sleep 1
done
wait
}

function testing() {
echo $1
sleep 3
}

old_IFS=$IFS
IFS=$'\n'

setvariables
main

IFS=${old_IFS}

exit 0
------------------

My hope was that this would print one string from the array every second until $testvar was incremented to 3, and then be stuck in the while loop until $testvar dropped back below 3. The problem is getting the variable decreased only AFTER one of the threaded jobs finishes, i.e. "(testing $t; testvar=$(($testvar - 1)) ; echo $testvar) &"

With this test code the output basically shows that the $testvar set in main increases to 3 and never drops back down, while the subshell for each thread decrements only its own copy of the variable, which starts from the parent's value at the time the subshell is launched.
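
The behaviour described above can be reduced to a few lines. This is only a minimal sketch (the variable name counter is made up for illustration): a command group run with ( ... ) & executes in a forked copy of the shell, so an assignment made there never reaches the parent.

```bash
#!/bin/bash
# A ( ... ) & group runs in a forked copy of the shell, so it only changes its own copy.
counter=3

( counter=$((counter - 1)); echo "subshell sees counter=$counter" ) &
wait

# The parent's variable was never touched by the subshell.
echo "parent still sees counter=$counter"
```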

I appreciate any ideas out there!

-Will

rigor 08-17-2014 08:07 PM

Hi Ztole!

I can't tell from what you've posted which of this you're already familiar with and which you're not,
so pardon me if I'm telling you things you already know.

I've put together three simple bash scripts, which I hope will illustrate various concepts
that might be related/helpful in what you're doing.

The following is the code for the various scripts.

prog1.bash:

Code:

#!/bin/bash

a_variable=1


echo -e '\nprog1:  The value of a_variable, after I set to it 1 ='  $a_variable
echo 'prog1:  The value of BASH_SUBSHELL ='  $BASH_SUBSHELL

echo -e '\nprog1:  NOT exporting the value of a_variable'

( echo -e '\nsubshell in prog1, to run prog2:  The value of BASH_SUBSHELL ='  $BASH_SUBSHELL  ;  ./prog2.bash ) &

echo "prog1:  ran prog2 in pid $!"

wait

echo -e '\nprog1:  After run of prog2, the value of a_variable ='  $a_variable
echo 'prog1:  The value of BASH_SUBSHELL ='  $BASH_SUBSHELL

echo -e '\nprog1:  NOT exporting the value of a_variable'

(  echo -e '\nsubshell in prog1, to run prog3:  The value of BASH_SUBSHELL ='  $BASH_SUBSHELL  ;  ./prog3.bash ) &

echo -e "\nprog1:  ran prog3 in pid $!"

wait

echo -e '\nprog1:  After run of prog3, the value of a_variable ='  $a_variable
echo 'prog1:  The value of BASH_SUBSHELL ='  $BASH_SUBSHELL


prog2.bash:

Code:

#!/bin/bash



echo -e "\nprog2:  running as pid $BASHPID"

echo -e "\nprog2(PID:$BASHPID)  The value of a_variable, before I set it = $a_variable"
echo "prog2(PID:$BASHPID):  The value of BASH_SUBSHELL = $BASH_SUBSHELL"


a_variable=5

export a_variable

echo -e "\nprog2(PID:$BASHPID)  The value of a_variable, after I set it to 5 = $a_variable"

echo -e "\nprog2(PID:$BASHPID)  EXPORTED the value of a_variable = $a_variable"

( echo -e "\nsubshell in prog2(PID:$BASHPID), to run prog3:  The value of BASH_SUBSHELL = $BASH_SUBSHELL"  ;  ./prog3.bash ) &

echo -e "\nprog2(PID:$BASHPID):  ran prog3 in pid $!"

wait

echo -e "\nprog2(PID:$BASHPID):  After run of prog3, the value of a_variable = $a_variable"
echo "prog2(PID:$BASHPID):  The value of BASH_SUBSHELL = $BASH_SUBSHELL"

and

prog3.bash:

Code:

#!/bin/bash



echo -e "\nprog3:  running as pid $BASHPID"

echo "prog3(PID:$BASHPID):  The value of a_variable, before I set it = $a_variable"
echo "prog3(PID:$BASHPID):  The value of BASH_SUBSHELL = $BASH_SUBSHELL"


a_variable=9

echo "prog3(PID:$BASHPID):  The value of a_variable, after I set it to 9 = $a_variable"


When I run ./prog1.bash the output looks like this:

Code:

prog1:  The value of a_variable, after I set to it 1 = 1
prog1:  The value of BASH_SUBSHELL = 0

prog1:  NOT exporting the value of a_variable
prog1:  ran prog2 in pid 4684

subshell in prog1, to run prog2:  The value of BASH_SUBSHELL = 1

prog2:  running as pid 4685

prog2(PID:4685)  The value of a_variable, before I set it =
prog2(PID:4685):  The value of BASH_SUBSHELL = 0

prog2(PID:4685)  The value of a_variable, after I set it to 5 = 5

prog2(PID:4685)  EXPORTED the value of a_variable = 5

prog2(PID:4685):  ran prog3 in pid 4686

subshell in prog2(PID:4686), to run prog3:  The value of BASH_SUBSHELL = 1

prog3:  running as pid 4687
prog3(PID:4687):  The value of a_variable, before I set it = 5
prog3(PID:4687):  The value of BASH_SUBSHELL = 0
prog3(PID:4687):  The value of a_variable, after I set it to 9 = 9

prog2(PID:4685):  After run of prog3, the value of a_variable = 5
prog2(PID:4685):  The value of BASH_SUBSHELL = 0

prog1:  After run of prog2, the value of a_variable = 1
prog1:  The value of BASH_SUBSHELL = 0

prog1:  NOT exporting the value of a_variable

prog1:  ran prog3 in pid 4688

subshell in prog1, to run prog3:  The value of BASH_SUBSHELL = 1

prog3:  running as pid 4689
prog3(PID:4689):  The value of a_variable, before I set it =
prog3(PID:4689):  The value of BASH_SUBSHELL = 0
prog3(PID:4689):  The value of a_variable, after I set it to 9 = 9

prog1:  After run of prog3, the value of a_variable = 1
prog1:  The value of BASH_SUBSHELL = 0

I hope they show these concepts:
  • Environment. You can almost think of the environment as where the variables are that you called "global". Each shell has its own "environment". So even if you define a variable in one shell, if that shell runs another shell without using the BASH export to "copy" the value of the variable to the new shell's environment, the new shell usually won't be able to see the value set by the first shell. Values set in the environment of a second shell run by a first shell are not set in the environment of the first shell; the first shell has its own values. You can see that in the output I've included.
  • Process usage, and accessing process IDs from BASH. When you run a second BASH script from a first BASH script in the way you did, you more or less create a process, which then creates another process to run the bash script. But keeping the relationship of those processes in mind, one way or another, you can access the Process IDs from the BASH scripts. A BASH script can access its own Process IDentifier, the PID of the process that ran it ( not necessarily the BASH program that ran it ), and the PIDs of processes that it runs.
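
As a compact illustration of the two points above, here is a minimal sketch. The variable names plain_var and shared_var are made up for the example; $$, $PPID and $! are the standard BASH parameters for the script's own PID, its parent's PID, and the PID of the most recent background job.

```bash
#!/bin/bash
# Sketch: which values a child bash process can see, and which PIDs are available.
plain_var="not exported"
export shared_var="exported"

# A separate process started with "bash -c" only inherits exported variables.
child_view=$(bash -c 'echo "plain=[$plain_var] shared=[$shared_var]"')
echo "child sees:     $child_view"

# A $( ... ) or ( ... ) subshell is a fork of this shell, so it sees both, but only as copies.
subshell_view=$( echo "plain=[$plain_var] shared=[$shared_var]" )
echo "subshell sees:  $subshell_view"

sleep 1 &
echo "my PID: $$   parent PID: $PPID   last background PID: $!"
wait
```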

If you are new to Linux, you might want to keep things simple, and perhaps try to keep track of the number of processes your BASH program has running, using BASH's built-in jobs command.

Just as a trivial example, if I run three sleep commands in the background, telling them each to "sleep" for a different number of seconds,
I can see them using the jobs command:

Code:

> sleep 180 &
[1] 5061
> sleep 240 &
[2] 5065
> sleep 360 &
[3] 5069
> jobs -l
[1]  5061 Running                sleep 180 &
[2]-  5065 Running                sleep 240 &
[3]+  5069 Running                sleep 360 &

This can work not just from an interactive/command-line BASH session, but also from running BASH scripts. Then too you can read and write files from a BASH script, so you could also track processes that way; keep the PIDs of currently running jobs in files, or a single file, if you are careful how and when you write to the single file. But you'd also have to be very careful to make sure that if a program fails, something still knows the program has completed.
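
To show the jobs builtin working from a script rather than an interactive session, here is a minimal sketch. The names pids, running and remaining are made up for the example; "jobs -pr" prints only the PIDs of background jobs that are still running.

```bash
#!/bin/bash
# Sketch: the jobs builtin also works in a non-interactive script.
sleep 1 &
sleep 1 &
sleep 1 &

pids=( $(jobs -pr) )          # PIDs of background jobs that are still running
running=${#pids[@]}
echo "running now: $running"

wait                          # returns once every background job has exited

pids=( $(jobs -pr) )
remaining=${#pids[@]}
echo "running after wait: $remaining"
```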

If you don't mind getting more deeply into Linux, there are ways processes can talk to one another. So it would be possible to have one program monitor another, etc.

If you can give us some idea of which BASH capabilities you're familiar with, maybe we can give you a better idea how to approach what you're trying to do, using concepts with which you are comfortable. For example, were you aware of the issue with the use of the export command, and the use of the environment?

HTH.

Ztole 08-17-2014 08:31 PM

Thanks for the reply Rigor! I definitely follow what you have demonstrated, which is the crux of my problem. I have been working on a few alternatives which rely on tracking either the pid or job count, however it just feels overly complicated. I was hoping there was a way to wait for a return value or to set it explicitly, instead of tracking the jobs. In the long run there may be no difference, but I am imagining something such as the system being run out of resources (by someone else's out of control code of course!) causing the script to process in error simply due to an invalid response.
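
One way to wait for a job to finish without a polling interval is the "wait -n" builtin, which blocks until any one background job exits and returns that job's exit status. The following is only a sketch and assumes bash 4.3 or newer (older versions do not have "wait -n"); work, max_jobs, launched and peak are made-up names, and work stands in for the real job.

```bash
#!/bin/bash
# Sketch only: assumes bash 4.3 or newer, where "wait -n" is available.
max_jobs=2
items=("test 1" "test 2" "test 3" "test 4" "test 5")
launched=0
peak=0

work() {                      # hypothetical stand-in for the real job
    sleep 1
    echo "finished $1"
}

for item in "${items[@]}"; do
    running=( $(jobs -pr) )
    while (( ${#running[@]} >= max_jobs )); do
        # Blocks until any one background job exits; "|| true" ignores a failing job's status.
        wait -n || true
        running=( $(jobs -pr) )
    done
    work "$item" &
    launched=$((launched + 1))

    # Record the highest number of jobs seen running at once.
    running=( $(jobs -pr) )
    if (( ${#running[@]} > peak )); then
        peak=${#running[@]}
    fi
done

wait                          # let the last jobs finish
echo "launched $launched jobs, at most $peak at a time"
```

The job count is still taken from jobs -pr, so the counting stays inside the parent shell; wait -n only replaces the fixed sleep between checks.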

Also, it is hard to quickly demonstrate knowledge but I will say I am an experienced systems engineer who has written MANY scripts for automation purposes. That being said I am certainly NOT a coder/developer.

I am fortunate to be on vacation this week so if I have a slow reply to anything please forgive me :)

Thanks!
-Will

rigor 08-23-2014 12:24 AM

Hi Will,

Normally, when someone asks a question involving BASH, I'm expecting that BASH is what they prefer to use, to do whatever it is that they are doing.
I'm expecting that if they know for example, PERL, and they wanted to use PERL, they would have asked the question in terms of PERL. So if they ask about something in terms of BASH, and I have anything to contribute on the subject, I'll try to give them something in terms of BASH.

I could be missing something here, but very roughly, I suspect it might be about fair to say that if I listed a few languages that can monitor/control processes, and ordered the list from least direct/simple/complete control available in the language, to most direct/simple/complete control, the list would be BASH, GAWK, PERL, C. In a Unix or Unix-like environment, C tends to have most of the process monitoring/controlling capabilities that the Kernel does. As a result, of those languages, it would be my first choice for any detailed process monitoring/controlling.

So I wasn't asking you to demonstrate knowledge. Instead I was wondering with what features of BASH, you might be comfortable, or, if you are in a position to accomplish your goal using some language other than BASH script.

If you are just talking about executing programs, and limiting the number of programs executing, you could do something simple in BASH, such as:

Code:

job_list=(  './prog_a.bash'  './prog_b.bash'  './prog_c.bash'  './prog_d.bash'  './prog_e.bash'  './prog_f.bash'  './prog_g.bash'  './prog_h.bash'  ) ;

next_job=0 ;
jobs_in_list=8

job_count_limit=3 ;

job_count=`jobs -p | wc -l` ;

while :
    do

        job_count=`jobs -p | wc -l` ;

        echo "`date +'%Y/%m/%d@%H:%M:%S'`  Count of jobs running:  $job_count"

        while  [[  $job_count  -lt  $job_count_limit  ]]
            do

                echo "`date +'%Y/%m/%d@%H:%M:%S'`  Running ${job_list[ $next_job ]}"
                (  ${job_list[ $next_job ]}  )  &
                next_job=$(( $next_job + 1 ))
                next_job=$(( $next_job % $jobs_in_list ))
                sleep 3
                job_count=`jobs -p | wc -l` ;
                echo "`date +'%Y/%m/%d@%H:%M:%S'`  Count of jobs running:  $job_count"

            done

        sleep 60

    done

Naturally the "round robin" job list is an oversimplification, unless you're trying to write something like cron.

It now sounds almost as if you're trying to write something like an at Daemon.

If you wanted almost anything beyond simply limiting the number of programs running, you might get into trouble, or at least deep Kludge, pretty quickly with BASH.

For example, if you wanted the programs run by the BASH script to be killed off, when the BASH script is killed off, you might add one or more traps to the BASH script.

Something like this code:

Code:

running_job_sets=`jobs | awk -F'[\\\]\\\[]' ' { printf "%%%s " , $2 ; } END { print "" ; } '`  ;
kill  $running_job_sets  ;

can work from an "interactive" BASH shell; it can kill off not only something run from the BASH script, but also other programs that those programs run.

But run it from a BASH script running in the "background", and you'll tend to only kill the programs run directly by the BASH script.
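
A trap of the kind mentioned above might be sketched as follows. This is an illustration under assumptions, not a complete solution: cleanup is a made-up function name, and, as noted, the kill only reaches the jobs started directly by the script, not anything those jobs start themselves.

```bash
#!/bin/bash
# Sketch: kill any background jobs that are still running when this script exits.
cleanup() {
    local pids
    pids=$(jobs -pr)
    # $pids is left unquoted on purpose so that each PID becomes a separate argument.
    if [ -n "$pids" ]; then
        kill $pids 2>/dev/null
    fi
    return 0
}

trap cleanup EXIT
trap 'exit 130' INT     # make Ctrl-C go through the EXIT trap
trap 'exit 143' TERM    # same for a plain "kill" of the script

sleep 300 &
sleep 300 &
echo "started background jobs: $(jobs -p | tr '\n' ' ')"
```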

If you run BASH in the background, you can pass the argument to BASH itself to tell it that you want to run BASH interactively, and even try to connect BASH to pseudo-tty's to make BASH "think" it's interactive.

You can try to make use of other Inter-Process Communication facilities in Linux, for use with your BASH script.

Yet in a sense with those approaches, you're more or less trying to add-on, or make facilities available to your BASH environment, that are already directly available in C.

If you are worried about the System being run out of resources, you might want to look into placing resource limits on the jobs that you will be running. Unless your goal is ultimately to monitor resource usage and react accordingly, rather than impose limits; there are facilities in C for monitoring resource usage of child processes.
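
From BASH, one way to place such limits is the ulimit builtin, applied inside a subshell so that only the job being started is affected. A minimal sketch, where run_limited is a made-up helper name and the 60 second CPU limit is an arbitrary example value ("help ulimit" lists the other limits that can be set):

```bash
#!/bin/bash
# Sketch: apply a limit inside a subshell so only that job (and its children) is affected.
run_limited() {
    (
        ulimit -t 60      # at most 60 seconds of CPU time for this job
        "$@"
    )
}

before=$(ulimit -t)
inside=$(run_limited bash -c 'ulimit -t')
after=$(ulimit -t)

echo "CPU limit inside the job:        $inside"
echo "CPU limit in the calling script: $after"
```

A limit cannot be raised above the hard limit already in force, so on a system that is already restricted the ulimit call may be refused.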

Although I have repeatedly been impressed with how quickly some language processors have been able to run programs written in "scripting languages", a similarly written compiled program running "native code" is still hard to beat.

Imagine a so called "Rabbit Job" that starts quickly spawning one process after another process. If by the time you've gotten a particular PID and tried to kill off one process, it's gone, and there's a different process with a different PID in its place, naturally the program trying to control that situation needs to be pretty speedy.

That's another thing to recommend C if you are dealing with monitoring of processes that might not be "well behaved". Naturally process states can change rather quickly, so you don't want your program to be chasing its own tail; if it doesn't react sufficiently fast, it can effectively create its own "race conditions".

If I still don't have a good idea of what you're trying to accomplish, maybe you could give us some additional details, to help us, help you better.

HTH.

Ztole 08-26-2014 05:55 PM

Hi Rigor, good info and I appreciate the help. I ended up relying on a job count which to be honest I saw here http://stackoverflow.com/questions/1...oncurrent-jobs. This feels backwards to me but I just got back from vacation and now I am on a time crunch ha!

Here is the base I added:

for path in "${sourcefolders[@]}"
do
lognum=$((lognum + 1 ))
mkdir -p "$path/large_files"
joblist=($(jobs -p))
while (( ${#joblist[*]} >= 20 ))
do
sleep 5
joblist=($(jobs -p))
done

filelist $path $lognum &
done
wait

This is ultimately what I landed on to traverse a file system, find files over 15GB in size and move them to a new folder based on their original location. Source directory paths have been changed to protect their identity :)

Code:

#!/bin/bash

function cleanup {
mkdir ./tmp>/dev/null 2>&1
rm -f ./tmp/* >/dev/null 2>&1
}

function setvariables {
sourcefolders=("<directory_path>" "<directory_path>" "<directory_path>" "<directory_path>" "<directory_path>" "<directory_path>" "<directory_path>" "<directory_path>" "<directory_path>" "<directory_path>" "<directory_path>" "<directory_path>" "<directory_path>" "<directory_path>" "<directory_path>" "<directory_path>" "<directory_path>" "<directory_path>" "<directory_path>" "<directory_path>" "<directory_path>" "<directory_path>" "<directory_path>" "<directory_path>" "<directory_path>" "<directory_path>" "<directory_path>" "<directory_path>" "<directory_path>" "<directory_path>")
lognum=0
}

function main {
wctest=${#sourcefolders[@]}
echo "------------------------------------------------------------">>auto_mc.log
echo "Processing $wctest Source Directory Streams...">>auto_mc.log
for path in "${sourcefolders[@]}"
  do
    lognum=$((lognum + 1 ))
    mkdir -p "$path/large_files"
    joblist=($(jobs -p))
      while (( ${#joblist[*]} >= 20 ))
        do
          sleep 5
          joblist=($(jobs -p))
        done
    filelist $path $lognum &
  done
wait
echo ""
if [ -e ./tmp/files_moved ]
  then
    echo -e "File\tSize (Bytes)">files_moved.csv
    sort ./tmp/files_moved>>files_moved.csv
  else
    echo "No files were moved">>auto_mc.log
fi
echo ""
}

function filelist() {
ls -Ra "$1" | awk '/:$/&&f{s=$0;f=0}/:$/&&!f{sub(/:$/,"");s=$0;f=1;next}NF&&f{ print s"/"$0 }' | grep -v "$1/large_files" | sed -e '/\/\.$/d' -e '/\/\.\.$/d' | sed -r '/.DS_Store|\/\.\_/d'>./tmp/$2
filemover $1 $2
wcint=$(($wcint + 1))
echo "Source Directory $1...Done">>auto_mc.log
}

function filemover() {
while read -r fullpath
  do
    filesize=$(ls -ld --time-style=long-iso "$fullpath" | grep -v ^d | awk '{print $5}')
    if [[ ! -z "$filesize" ]] && [[ "$filesize" -gt 16106127360 ]]
      then
        echo -e "$fullpath\t$filesize">>./tmp/files_moved
        mv "$fullpath" "$1/large_files/"
    fi
  done<"./tmp/$2"
}

old_IFS=$IFS
IFS=$'\n'

cleanup
setvariables
main

IFS=${old_IFS}

exit 0
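
For comparison, the "find files over a size and move them" step could also be sketched with find instead of parsing ls output. This is a swapped-in technique, not what the script above does, and it assumes GNU find and GNU mv (for "mv -t"); move_large_files is a made-up name. Like the script above, it puts every match directly into large_files, so two files with the same name in different subdirectories would collide.

```bash
#!/bin/bash
# Sketch: move regular files larger than a byte threshold into <dir>/large_files.
# Assumes GNU find and GNU mv; "move_large_files" is a hypothetical name.
move_large_files() {
    local dir=$1 min_bytes=$2
    mkdir -p "$dir/large_files"
    # -prune skips the destination folder; "-size +Nc" matches files of more than N bytes.
    find "$dir" -path "$dir/large_files" -prune -o \
         -type f -size +"${min_bytes}c" \
         -exec mv -t "$dir/large_files" -- {} +
}

# 16106127360 bytes = 15 GiB, the threshold used in the script above.
# Example call (the path is a placeholder):
# move_large_files "/some/source/dir" 16106127360
```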


