I am in an interesting situation with a program that I use and that has no support.
This program has a very bad habit of not terminating correctly and ends up leaving orphaned child processes and potentially a zombie of itself. If not removed, when the parent process is restarted these will cause a failure of the entire program and unfortunately these child processes require a SIGKILL command to be sent to clear them out.
I am trying to figure out a way to script this in Bash, but I am not having much luck.
Here is my current attempt and idea:
- The shutdown script is initialized
- Get the process tree from the parent PID using 'pgrep'
- Use the normal program commands to attempt a termination
- Once the program has been "stopped", search for orphans and the parent process
- Kill any process that is found to still be running
This is the code I have so far:
Code:
#!/bin/bash
index=0
quit=0
ids[0]=$(pgrep ****)
while [ $quit -eq 0 ]
do
    ((index++))
    # get all child processes spawned by this/these ppid/s
    ids[$index]=$(ps -o pid --ppid ${ids[$index-1]} | \pcregrep '\d+' | tr \\n ' ')
    # if no child processes found
    if [ ! "${ids[$index]}" ]
    then
        # quit
        ((quit++))
    fi
done
echo -n $"Shutting down... "
<program shutdown command>
sleep 60s
echo -n $"Stopping Databases... "
<Database shutdown command>
sleep 30s
echo -n $"Killing any remaining processes... "
# kill process from parent to all child processes
for i in $(seq 0 ${#ids[@]})
do
    if [ "${ids[$i]}" ]
    then
        kill -9 ${ids[$i]}
    fi
done
echo -n $"Shutdown Complete."
RETVAL=$?
[ $RETVAL -eq 0 ]
;;
restart)
    $0 stop
    sleep 90s
    $0 start
    ;;
*)
    echo $"Usage: $0 {start|stop|restart}"
    RETVAL=1
    ;;
esac
exit $RETVAL
However all this does is spit back errors about being unable to kill processes.
I am in an interesting situation with a program that I use and that has no support.
This program has a very bad habit of not terminating correctly and ends up leaving orphaned child processes and potentially a zombie of itself. If not removed, when the parent process is restarted these will cause a failure of the entire program and unfortunately these child processes require a SIGKILL command to be sent to clear them out.
This is in fact more common than you might think.
What you can do is start the program in a new session using setsid program [args...]. The session ID and the process group ID will both be the PID of the initial program. Then you can kill the entire session at once using kill -s TERM -PID or kill -s KILL -PID, which signals all processes in that group, including any child processes started.
The only real trick is to save the session ID or process group ID. The easiest way is to grab the PID right after the setsid call, via a wrapper script, say /usr/local/bin/savepid:
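The wrapper's body is not shown in this post, so the following is only a minimal sketch of the idea, not the original script. It assumes the util-linux setsid command is available; start_in_session and its SIDFILE argument are hypothetical names, and an inline bash -c stands in for /usr/local/bin/savepid.

```shell
#!/bin/bash
# start_in_session SIDFILE COMMAND [ARGS...]   (hypothetical helper name)
# Runs COMMAND as the leader of a new session and records its PID in SIDFILE.
start_in_session() {
    local sidfile="$1"
    shift
    # Inside the inline wrapper, $0 is SIDFILE and "$@" is the command.
    # setsid makes that shell the session leader, so its $$ is both the
    # session ID and the process group ID; exec keeps the same PID.
    setsid bash -c 'echo "$$" > "$0" && exec "$@"' "$sidfile" "$@" &
}
```

Under those assumptions, start_in_session /var/run/program.sid program args would start the service, and kill -s KILL -- "-$(cat /var/run/program.sid)" would later signal the whole group.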
You could replace the wrapper script with a C program that retains the CAP_NET_BIND_SERVICE capability (the ability to bind to ports < 1024, see man capabilities) while otherwise switching to an unprivileged user, starting a new session, saving the session/process group ID, and running the actual program. Something like that is very useful for running Java-based services without giving them extra privileges.
To see if the process is still alive, using Bash:
Code:
# Read the saved session ID; a missing or empty file evaluates to 0
PID=$(($(exec 2>/dev/null ; cat /var/run/program.sid)))
if [ $PID -lt 1 ]; then
    echo "Not started: no /var/run/program.sid file." >&2
# ps pads its columns with spaces, so strip them before matching
elif [ -z "$(ps ax -o sid= | tr -d ' ' | grep -e "^$PID\$")" ]; then
    echo "Exited cleanly. Removing /var/run/program.sid file." >&2
    rm -f /var/run/program.sid
elif [ "$PID" = "$(ps -o sid= -p $PID | tr -d ' ')" ]; then
    echo "Still running nicely." >&2
else
    echo "Leader died. Killing the group $PID." >&2
    kill -s KILL -$PID
    rm -f /var/run/program.sid
fi
It should not be too difficult to translate that into a service script?
Last edited by Nominal Animal; 05-22-2012 at 04:01 PM.
Thanks for the reply. Unfortunately I was a little vague initially, since the program gets forked to a service when it is started. However, I see where you are going with the idea and think I can adapt it.
@cheesus
I can tell that the processes are being killed, because with the current loop I get errors when it goes through the array and tries to kill processes that are already dead. I am trying to clean it up so that these errors are not shown, since they clog up the logfiles.
Thanks for the reply. Unfortunately I was a little vague initially, since the program gets forked to a service when it is started.
It should still work, because -pid actually refers to process group pid. Even if the leader (the one with pid pid) forks and exits, the process group will still remain (under the same name), and all its children will still belong to that process group. All you need to change is the logic on how to detect whether the master is still alive or not. The killing part should work just fine either way.
To kill only still alive pids without error messages, you can use
Code:
kill -KILL $(ps -o pid= -p $pidlist) 2>/dev/null
or, if you want to be careful, and make sure you only kill those processes if they still belong in the original process group pid,
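The snippet for the careful variant is missing from this post. A rough sketch of what it could look like, assuming procps ps; kill_in_group is a hypothetical helper name, not something from the original post:

```shell
#!/bin/bash
# kill_in_group PGID PID...   (hypothetical helper name)
# Sends SIGKILL only to those PIDs that are still alive AND still
# members of process group PGID.
kill_in_group() {
    local pgid="$1" pidlist targets
    shift
    [ $# -gt 0 ] || return 0
    # ps -p accepts a comma-separated PID list
    pidlist=$(IFS=,; echo "$*")
    # Print "pid pgid" for each live process and keep the matching ones
    targets=$(ps -o pid= -o pgid= -p "$pidlist" 2>/dev/null |
              awk -v g="$pgid" '$2 == g { print $1 }')
    [ -n "$targets" ] || return 0
    # $targets is deliberately unquoted: one argument per PID.
    # Errors are discarded because a process may exit between ps and kill.
    kill -KILL $targets 2>/dev/null || true
}
```

For example, kill_in_group "$pid" $pidlist would leave alone any PID that has since been recycled into a different process group.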
Since there is a race window (the process might exit between the ps and the kill commands), it is best to discard all error messages and instead check afterwards whether the processes refuse to die.
Using a delay, say calling a suitable ps command in a loop with .1 seconds in between (to see if the processes still exist), you won't need a several-second fixed delay (which would be annoying!), but can still detect the case where something just won't die (say, in fifteen seconds?).
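As a sketch of such a polling loop (wait_for_exit is a hypothetical name; it assumes procps ps and a sleep that accepts fractional seconds, and it counts zombies as already dead, which is my own choice rather than something stated above):

```shell
#!/bin/bash
# wait_for_exit TIMEOUT_SECONDS PID...   (hypothetical helper name)
# Polls every 0.1 s. Returns 0 as soon as none of the PIDs is running
# any more, or 1 if some are still around after roughly TIMEOUT_SECONDS.
wait_for_exit() {
    local timeout="$1" tries pidlist alive
    shift
    tries=$(( timeout * 10 ))
    pidlist=$(IFS=,; echo "$*")
    while [ "$tries" -gt 0 ]; do
        # List process states, ignoring zombies (state starts with Z)
        alive=$(ps -o stat= -p "$pidlist" 2>/dev/null | awk '$1 !~ /^Z/')
        [ -z "$alive" ] && return 0
        sleep 0.1
        tries=$(( tries - 1 ))
    done
    return 1
}
```

After sending the signal, something like wait_for_exit 15 $pidlist || echo "Something just won't die." >&2 would cover the fifteen-second case.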
Anton Rapp, a smart guy, and I once discussed what I call "graceful killing": killing a process and all its children, even though killing is not atomic, so the processes can spawn yet more children while they are being killed.
I have a working algorithm, but Anton came up with a much simpler and more elegant idea. I think it's simply brilliant.
It is known that children inherit the environment (the variables, that is) from their parent, and that they can modify it. However, it's very unlikely that child processes will touch environment variables they don't use.
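The launch command itself is not shown in this post. Presumably it just sets the variable for that one command; a minimal sketch, where start_marked and the marker value are hypothetical:

```shell
#!/bin/bash
# start_marked VALUE COMMAND [ARGS...]   (hypothetical helper name)
# Starts COMMAND in the background with ___MARKER_VAR___=VALUE added to
# its environment; the caller's own environment is left unchanged.
start_marked() {
    local value="$1"
    shift
    env ___MARKER_VAR___="$value" "$@" &
}
```

For example: start_marked "service-42" /badly/behaving/process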
After launching /badly/behaving/process with ___MARKER_VAR___ set to a known value in its environment, that process and all its children will have the ___MARKER_VAR___ environment variable set to the known value.
When the need arises to kill all the children, they are selected by examining the environment of every process (this is possible at least under Linux) and choosing only those processes which have the ___MARKER_VAR___ environment variable set to the known value.
Killing should be done in a loop until there are no more processes to kill - again, because killing is not atomic.
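A sketch of such a loop for Linux, reading /proc/PID/environ. The name kill_marked and the MAX_ROUNDS safety limit are my own assumptions, not part of Anton's idea as described:

```shell
#!/bin/bash
# kill_marked VALUE [MAX_ROUNDS]   (hypothetical helper; Linux /proc only)
# Repeatedly SIGKILLs every process whose environment contains
# ___MARKER_VAR___=VALUE, until a full pass finds nothing to kill.
kill_marked() {
    local value="$1" rounds="${2:-100}" f pid found
    while [ "$rounds" -gt 0 ]; do
        found=0
        for f in /proc/[0-9]*/environ; do
            pid=${f#/proc/}
            pid=${pid%/environ}
            # Never target the killer itself.
            [ "$pid" = "$$" ] && continue
            # environ is NUL separated; unreadable or vanished files
            # simply fail to match and are skipped.
            if { tr '\0' '\n' < "$f" |
                 grep -qxF "___MARKER_VAR___=$value"; } 2>/dev/null; then
                kill -KILL "$pid" 2>/dev/null && found=1
            fi
        done
        # Nothing killed in this pass: no marked processes remain.
        [ "$found" -eq 0 ] && return 0
        rounds=$(( rounds - 1 ))
        sleep 0.1
    done
    return 1
}
```

Only processes whose environ file is readable by the caller can be found this way, so in practice the killer would run as the same user or as root.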
It is known that children inherit the environment (the variables, that is) from their parent, and that they can modify it. However, it's very unlikely that child processes will touch environment variables they don't use.
That's a sharp observation, and a very, very good idea.
In Linux, /proc/pid/environ contains the (exported) environment variables for the process pid as ASCII NUL separated data. Simply checking whether \0___MARKER_VAR___= (or, if killing the descendants of a specific marked process, \0___MARKER_VAR___=PID\0) exists in the file is enough. It is something a killer can do very, very fast.
I seem to recall a similar technique by D.J.Bernstein, but using high-numbered descriptors? Ah yes, the fghack utility. The usage was basically the opposite, to let the parent process know when there are no more child processes left. Unless a process goes out of its way to close the extra file descriptors, the descriptor is inherited by each child, and is only closed after the final child process exits. Along those lines, one could use a high-numbered fixed descriptor, say the equivalent of exec 511<>/var/lib/process-tracer/marker, and then simply stat() all /proc/*/fd/511 and kill the processes if st_dev and st_ino match the marker file. This should be even faster than manipulating the environment.
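A rough sketch of the descriptor variant in Bash, assuming Linux /proc and a stat that supports -L and -c (GNU coreutils does). kill_by_fd is a hypothetical name; the marker path and descriptor 511 are just the examples from above:

```shell
#!/bin/bash
# kill_by_fd MARKERFILE [FD]   (hypothetical helper; Linux /proc only)
# SIGKILLs every process whose descriptor FD (default 511) refers to the
# same device and inode as MARKERFILE.
kill_by_fd() {
    local marker="$1" fd="${2:-511}" want have link pid
    want=$(stat -L -c '%d:%i' "$marker") || return 1
    for link in /proc/[0-9]*/fd/"$fd"; do
        # Processes without that descriptor, or whose fd directory we
        # may not read, are skipped.
        have=$(stat -L -c '%d:%i' "$link" 2>/dev/null) || continue
        [ "$have" = "$want" ] || continue
        pid=${link#/proc/}
        pid=${pid%%/*}
        # Never target the killer itself.
        [ "$pid" = "$$" ] && continue
        kill -KILL "$pid" 2>/dev/null || true
    done
    return 0
}
```

The service would be started with the descriptor already open, for example ( exec 511<>/var/lib/process-tracer/marker; exec program ) &, and as with the environment method the pass would have to be repeated until nothing is left to kill.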
For normal services, a process group is optimal. It is the fastest option, because the kernel handles the targeting details; all you need is the process group ID. However, process groups are trivial to "escape" even for unprivileged processes, since there is no restriction on who can fork and call setsid(), or call setpgid() to start a new process group of their own.
A process which executes another process using execve() or execle() often clears out the environment, only populating it with known safe values. A typical one is Apache, via its SuEXEC mechanism. It is also possible for a process to scan through its environ array and unsetenv() those it does not want to keep, but I have not seen that in real life. It is also rare but not impossible for a process to opportunistically close all file descriptors (or all above 2, anyway).
Was there a reason why you and Anton Rapp could not rely on process groups or sessions? I guess the use of screen and similar tools would make process groups ineffective? Or was the graceful killing for potentially hostile processes? (I guess typical ones would be those started by inquisitive students, who want to know exactly what effects a fork bomb has.)
Note: I would not recommend combining the process group killing method with the other methods. If a process escapes its process group, then it might be able to escape to an existing process group, with important processes you do not want to be killed. (It depends which user rights the processes have, mostly.) You could end up killing innocent bystander processes.
...
Was there a reason why you and Anton Rapp could not rely on process groups or sessions?
...
We were just discussing the issue, and he came up with his idea after hearing my algorithm.
My algorithm parses the process tree and deals with the non-atomicity. I am a lazy guy who doesn't like chasing moving targets, and the algorithm reflects that.
Then (lazily and gracefully) I kill the children (and the father) one by one.
If, however, the process is evil enough to mask SIGSTOP, my algorithm is in trouble. Luckily, that wasn't the case in the practical cases I had to deal with (EDA software).
...
I think there is no 100% bullet-proof solution (because of the non-atomicity and the possible signal masking or environment purging problems), but many pretty reliable practical ones exist.
...
A process which executes another process using execve() or execle() often clears out the environment, only populating it with known safe values. A typical one is Apache, via its SuEXEC mechanism. It is also possible for a process to scan through its environ array and unsetenv() those it does not want to keep, but I have not seen that in real life.
...
If, for example, the processes in question respect the TMP/TEMP environment variables, those can be used as the marker, i.e. the father process is started with an individual temporary directory pointed to by TMP/TEMP.