LinuxQuestions.org - killing child processes of a bash script results in strange random kills

Hi. I've been doing bash programming for quite a while, but I really can't figure out what the problem is here. This is what I'm trying to do:

I have multiple instances of a bash script running; let's say the script is named "myparent". Each instance launches external commands like tail or cat, and both the myparent processes and their child processes run continuously (that is, myparent doesn't exit after forking subshells, and the external commands don't exit either; they behave like tail -f). In other words, the process tree looks like this:

myparent
\_ child1
\_ child2
myparent
\_ child1
\_ child2
[...]

So far so good.

Now I have a block of code that is supposed to terminate all of this by finding the parent processes, then their children, and killing them. Here it is:

Code:

# find the pids of myparent processes (based on the NAMES)
PARENTS="$(ps axu | grep -e "myparent" | grep -v "grep" | awk '{ print $"2" }')"

# find the pids of the child processes.
for CANDIDATE_PROC in ${PARENTS}
do
    #  lets get its children (the syntax should work with all ps's)
    CHILDREN_AND_PARENTS="$(ps ax --format pid,ppid,command | grep -e ${CANDIDATE_PROC} | grep -v "grep" | awk '{ print $"1" }')"
    # lets sum parents and children
    PROCS_TO_KILL="${PROCS_TO_KILL} ${CHILDREN_AND_PARENTS}"
done

# now lets kill all the processes
for ONE_PROC in ${PROCS_TO_KILL}
do
    kill -0 "${ONE_PROC}" 2> /dev/null && kill "${ONE_PROC}"
done

And it works, most of the time. But sometimes, when I run the above code, it kills processes that aren't related to the myparent instances or their children, taking down other apps (like the messenger, the terminal, or even the X session).

Notes
------
1. PROCS_TO_KILL: it is undeclared when first used, but I don't think this is the problem here.
2. I can't use useful tricks in myparent like: child1 & SOME_PID=$!
3. Of course there are some preliminary tests before the above code, to see if myparent is running, but I didn't include them here.
4. I thought that garbled variable content might sometimes be passed to kill (like \n or other stuff), but using kill "$(echo ${ONE_PROC})" doesn't solve the problem either.
5. Could this happen because in some situations awk may exit before ps/grep has finished outputting?

Any ideas why this happens and why kill chooses "random" targets (processes)?

Maybe use session IDs?

Code:

echo "Hello, I am $$."; for pid in `pgrep -s $$`; do
 [ $pid -ne $$ ] && echo "child $pid : `readlink -f /proc/$pid/exe`"
done

Excellent. Thank you, you gave me two good ideas: to use pgrep (or pkill directly), which will reduce all the above code to maybe only 2 lines, and to use session IDs, so that only the processes in that session will be killed. I'm quite sure this will fix it. Too bad I can't find a decent explanation of why the old code was failing *sometimes* on Linux (I use a hardened kernel, btw: PAX+GRSEC with filesystem and proc protection and randomisation). Thanks again.
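
A rough sketch of what that shorter pgrep/pkill version might look like. It assumes the procps pgrep and pkill are installed, and that the script is started through its shebang so that its process name really is "myparent" (if it is started as "bash myparent", something like pgrep -f with a suitable pattern would probably be needed instead). The helper name kill_with_children is made up for illustration.

```shell
#!/bin/bash
# Sketch only: assumes procps pgrep/pkill are available.

# Kill one process and its direct children (hypothetical helper name).
kill_with_children() {
    local parent="$1"
    # Children first: once the parent is gone they are re-parented and
    # can no longer be found through its PID. -P compares the PPID field
    # exactly, so 2451 never matches 12451 or 24513.
    # pkill exits non-zero when nothing matched; that is not an error here.
    pkill -P "$parent" || true
    kill "$parent" 2>/dev/null
}

# "myparent" is the script name from the post; -x requires an exact name match.
for parent in $(pgrep -x myparent); do
    kill_with_children "$parent"
done
```

Note that pkill -P only reaches direct children; grandchildren would need a recursive walk or the session ID approach.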

Quote:

Too bad I can't find a decent explanation of why the old code was failing *sometimes* on Linux (I use a hardened kernel, btw: PAX+GRSEC with filesystem and proc protection and randomisation).

Well, you could do some debug runs (set -x) and look at the output; maybe it will explain why. If not, maybe post the output from a failed debug run here (in BB "code" tags please).

I strongly doubt GRSec or PAX are to blame for any of this.
If they were, you'd have seen something in the logs, or so I'd hope.
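
A debug run along those lines could look something like the sketch below. It is a dry run: it traces the loop with set -x and prints what would be killed instead of killing it. The TRACE_LOG location and the stand-in value for PROCS_TO_KILL are assumptions made for the example; the real script would build the list as in the first post.

```shell
#!/bin/bash
# Sketch: trace the kill loop without killing anything.
# TRACE_LOG is a hypothetical log location.
TRACE_LOG="${TMPDIR:-/tmp}/killtrace.$$.log"

(
    PS4='+ line ${LINENO}: '   # prefix traced commands with the line number
    set -x
    PROCS_TO_KILL="$$"         # stand-in; the real script builds this list
    for ONE_PROC in ${PROCS_TO_KILL}
    do
        # Dry run: show what would be killed instead of killing it.
        ps -o pid=,ppid=,args= -p "${ONE_PROC}"
    done
) 2> "${TRACE_LOG}"

echo "trace written to ${TRACE_LOG}"
```

Comparing the traced PIDs with the command names printed by ps should show quickly whether unrelated processes end up in the list.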

One explanation I can think of is related to:

Code:

grep -e ${CANDIDATE_PROC}

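This will also match any PID that *contains* ${CANDIDATE_PROC}, i.e. "grep -e 2451" will also return 24513 or 12451. Which unrelated processes get hit then depends on which PIDs happen to exist at that moment, which would explain why the kills look random. One workaround would be to surround the PID with whitespace: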

Code:

grep -e " ${CANDIDATE_PROC} "
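
To illustrate the difference, the sketch below runs the filter against canned ps-style output (the PIDs and commands are made up). The awk variant is my own suggested alternative, not something from the posts above: it compares only the PID and PPID fields, so it does not depend on how ps pads its columns and cannot match a number that appears in a command's arguments.

```shell
#!/bin/bash
# Made-up sample in "pid ppid command" layout.
SAMPLE=' 2451     1 /bin/bash ./myparent
 2460  2451 tail -f /var/log/messages
12451     1 /usr/bin/messenger
24513   900 xterm'

CANDIDATE_PROC=2451

# Substring match, as in the original code: also picks up 12451 and 24513.
LOOSE="$(echo "${SAMPLE}" | grep -e "${CANDIDATE_PROC}" | awk '{ print $1 }')"

# Exact comparison on the PID and PPID fields only.
EXACT="$(echo "${SAMPLE}" | awk -v p="${CANDIDATE_PROC}" '$1 == p || $2 == p { print $1 }')"

echo "loose:" ${LOOSE}    # prints: loose: 2451 2460 12451 24513
echo "exact:" ${EXACT}    # prints: exact: 2451 2460
```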

Thanks.

Quote:

grep -e "myparent" | grep -v "grep"

You can get around this like so:

grep -e "[m]yparent"

It seems you are going to a lot of trouble here. For what reason? ;)
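
A small sketch of why the bracket trick works, using made-up ps-style lines rather than live ps output. The regular expression [m]yparent matches the text "myparent", but the grep process's own command line contains the literal text "[m]yparent", which the expression does not match, so no second grep -v is needed.

```shell
#!/bin/bash
# Made-up ps-style lines: one real match and the grep process itself.
SAMPLE='1001 /bin/bash ./myparent
1002 grep -e [m]yparent'

# Only the first line contains the text "myparent"; in the second line
# the "m" is followed by "]", so the regex finds nothing there.
MATCHES="$(echo "${SAMPLE}" | grep -e "[m]yparent" | awk '{ print $1 }')"

echo "${MATCHES}"    # prints 1001
```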

Quote:

Originally Posted by bigearsbilly

You can get around this like so:

grep -e "[m]yparent"

That was smart, thanks.

Quote:

Originally Posted by bigearsbilly

It seems you are going to a lot of trouble here. For what reason? ;)

There are many answers to this, depending on what you were actually asking. :cool: