Catching signals of a running process with bash

mashiox · 11-16-2009, 01:18 PM

Hello all!

I'm writing a script that will catch when an outside process crashes (SIGHUP, right?) without having to loop into infinity.
With that in mind, I came across the trap utility and thought it could be used to monitor a process other than its own. But from what I've read, I'm thinking it might be more limited than I initially believed.

Could someone point me in the right direction?

This is what I have on trap so far:
http://linux.die.net/man/1/trap
http://tldp.org/LDP/Bash-Beginners-G...ect_12_02.html

bigearsbilly · 11-16-2009, 02:00 PM

hmm well.

I've played an awful lot with trap in the past.
I've used it so much that I rarely bother with it now.
it's ok for EXIT.

you can't catch a signal sent to another process - full stop.
they don't propagate.

the shell trap system is very very weak.
for instance the ERR, DEBUG and RETURN traps ain't inherited by functions in the same script unless you set -E (ERR) or -T (DEBUG, RETURN).
remember shell scripting has limits.

if you look at waitpid(2) you'll see that the status it returns is a bitmask
which shows how the process exited. I'm not sure if this can be
examined in a shell with $?.
on my system HUP produces 129 and TERM produces 143 (128 + the signal number), so it can.

perl's system() lets you query it.
or write a little C program to run it.
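A minimal bash sketch of the $? point, assuming a Linux system where SIGHUP is signal 1: bash reports a child killed by signal N as exit status 128 + N, and the bash builtin `kill -l` can turn such a status back into a signal name. `sleep 30` is only a stand-in for the watched program.

```shell
#!/usr/bin/env bash
# Start a stand-in child process (sleep) in the background.
sleep 30 &
pid=$!              # PID of the most recent background job

kill -HUP "$pid"    # send SIGHUP (signal 1) to the child
wait "$pid"         # reap it; wait returns the child's status
status=$?

echo "exit status: $status"
if (( status > 128 )); then
    # kill -l accepts an exit status above 128 and prints the signal name
    echo "killed by signal: $(kill -l "$status")"
fi
```

With SIGTERM (signal 15) the same script would report 143; 127 is the status bash uses for "command not found".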

mashiox · 11-16-2009, 03:17 PM

So if I use waitpid, does the process I'm watching for crashes need to be a child process of the script?

bigearsbilly · 11-16-2009, 05:26 PM

no, waitpid is a C function.

and SIGHUP is not a crash.

what exactly are you trying to achieve?

mashiox · 11-17-2009, 06:27 AM

Here it is:

The script starts by launching another process.
It grabs that process's PID using pgrep.
Then I want it to wait and see whether the process it started has crashed.
When it crashes, I want the script to restart the process.
Otherwise, keep waiting for it to crash.

Concise enough? Let me know if I got vague anywhere.

Guttorm · 11-17-2009, 06:39 AM

Hi

It could be done easily without pids, signals or traps. But you will have to figure out what is a crash and what is a normal exit. If the process writes some pid file or something you could maybe just check if it still exists?

Code:

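while true
do
   start-process-command &
   wait $!          # block until the child exits; $? is now its exit status
   status=$?
   # assumption: exit status 0 means a clean exit; anything else
   # (including 128+N when killed by signal N) is treated as a crash
   if [ "$status" -eq 0 ]
   then
       break
   fi
done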
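The pid-file check mentioned above could be sketched like this. The pid-file path is a hypothetical placeholder. `kill -0` sends no signal; it only tests whether the PID exists and can be signalled. A stale pid file whose PID has been reused by some other process would still pass, so this is only a rough check.

```shell
#!/usr/bin/env bash
# Hypothetical pid file written by the monitored program.
pidfile=${PIDFILE:-/tmp/example-server.pid}

# Succeeds if the pid file names a live process we can signal.
is_running() {
    local pid
    [ -r "$pidfile" ] || return 1        # no pid file: assume not running
    read -r pid < "$pidfile"             # first line of the file
    [[ $pid =~ ^[0-9]+$ ]] || return 1   # empty or garbage contents
    kill -0 "$pid" 2>/dev/null           # signal 0: existence check only
}

if is_running; then
    echo "running"
else
    echo "not running"
fi
```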

ntubski · 11-17-2009, 08:25 AM

Why not just:

Code:

# rerun the command until it exits with status 0; ':' is a no-op
while ! start-process-command
do :
done

You said before

Quote:

Originally Posted by mashiox

without having to loop into infinity.

But I can't see why a loop is bad here. If by crash you mean only termination from SIGHUP, maybe what you want here is nohup?
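If SIGHUP really were the only problem, the nohup suggestion can be sketched as follows: `nohup` starts the command with SIGHUP ignored, and that setting survives the exec, so a later `kill -HUP` leaves it running. `sleep 30` stands in for the real program, and the one-second pause is only there so nohup has set up the ignore before the signal arrives.

```shell
#!/usr/bin/env bash
# Start a stand-in program under nohup; output goes to /dev/null,
# so nohup does not create nohup.out.
nohup sleep 30 >/dev/null 2>&1 &
pid=$!
sleep 1                       # let nohup ignore SIGHUP and exec sleep

kill -HUP "$pid"              # would normally terminate the process
sleep 1

if kill -0 "$pid" 2>/dev/null; then
    alive=yes
else
    alive=no
fi
echo "alive after SIGHUP: $alive"

kill "$pid" 2>/dev/null       # clean up with SIGTERM, which nohup does not block
```

Note that nohup only protects against SIGHUP; a crash (for example SIGSEGV) still needs a restart loop.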

mashiox · 11-17-2009, 07:17 PM

Well, I'd be a fool not to toy around with what I've been given.
As far as loops go, I thought that if it was constantly checking for a contingency it would run the processor into the ground.

Thinking about it, it was silly to think that would be the case in this situation. But we'll see. I likely need to apply the rule of optimization more, rather than trying to get it 100% right the first few times.

Also adderek on unix.com suggested I use fork to make the program a child process, and monitor it that way. Does fork have unintended consequences/issues?
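In a bash script there is no separate fork command to call: running a program with `&` is how the shell forks it off as a child, and `$!` holds the child's PID, so pgrep isn't needed for a process the script started itself. A small sketch of the main restriction, which matches the earlier waitpid point: `wait` only works on the shell's own children. PID 1 is used here merely as an example of a process that is not a child.

```shell
#!/usr/bin/env bash
# '&' forks a child process; $! is its PID.
( sleep 1; exit 3 ) &
child=$!

wait "$child"                 # blocks (no CPU spinning) until the child exits
child_status=$?               # the child's exit status, here 3
echo "child exited with $child_status"

# Waiting on a process that is not our child fails immediately.
wait 1 2>/dev/null            # bash: "pid 1 is not a child of this shell"
other_status=$?               # 127
echo "wait on non-child returned $other_status"
```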

chrism01 · 11-17-2009, 07:33 PM

The processor thrashing issue depends on how soon after a failure you need(!) the program to restart.
If you can manage for a few minutes, then you could use cron to run a watchdog script that checks/restarts the program every n minutes.
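A minimal sketch of such a cron watchdog, assuming the server's process name is `game-server` and its binary is `/usr/local/bin/game-server` (both hypothetical, overridable via the environment). `pgrep -x` matches the exact process name, so unrelated processes whose names merely contain it don't count.

```shell
#!/usr/bin/env bash
# watchdog.sh: hypothetical watchdog meant to be run from cron, e.g.
#   */5 * * * * /path/to/watchdog.sh
# WATCH_NAME / WATCH_CMD are placeholders for the real server.
name=${WATCH_NAME:-game-server}
cmd=${WATCH_CMD:-/usr/local/bin/game-server}

# Restart the program if no process with exactly this name exists.
# Returns 0 if it was already running, 1 if a restart was attempted.
check_and_restart() {
    if pgrep -x "$name" >/dev/null; then
        return 0
    fi
    echo "$(date): $name not running, restarting" >&2
    # nohup + & so the server outlives this short-lived cron job
    nohup "$cmd" >/dev/null 2>&1 &
    return 1
}

check_and_restart
```

On Linux, pgrep matches the kernel's process name, which is truncated to 15 characters, so a long binary name needs a shorter pattern or `pgrep -f` against the full command line.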

catkin · 11-18-2009, 01:27 AM

Quote:

Originally Posted by chrism01

The processor thrashing issue depends on how soon after a failure you need(!) the program to restart.
If you can manage for a few minutes, then you could use cron to run a watchdog script that checks/restarts the program every n minutes.

Or you could put a sleep in the loop.
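A minimal sketch of the sleep-in-the-loop idea: `until` reruns the command while it exits non-zero, and the `sleep` keeps a program that crashes instantly from being restarted in a tight loop. `start_server` is a hypothetical stand-in that fails twice and then succeeds, so the sketch can run on its own.

```shell
#!/usr/bin/env bash
attempts=0

# Hypothetical stand-in for the real server: fails twice, then exits cleanly.
start_server() {
    attempts=$((attempts + 1))
    echo "start attempt $attempts"
    (( attempts >= 3 ))
}

# Restart on every non-zero exit, pausing between attempts
# (1 second here to keep the demo short; a real server might use more).
until start_server; do
    echo "exited with status $?, restarting in 1s" >&2
    sleep 1
done
echo "clean exit after $attempts attempts"
```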

bigearsbilly · 11-18-2009, 04:18 AM

what is this vital program?

out of interest.

mashiox · 11-18-2009, 10:04 PM

Quote:

Originally Posted by bigearsbilly

what is this vital program?

out of interest.

An RPG game server.
Keep the daemon up, keep the players from squawking too loudly.
I got the whole concept from my father explaining his wastewater SCADA system, which he's been working on for the better part of a decade.
He's not the developer, just an end user and a wastewater supervisor at a small town's public works, but the idea fit UNIX system control well.
And yes, I have heard of webmin. I just love my shell much much more.

But the bigger picture is that eventually I'd like to code this in C, and go much larger and much more extensible: checking the system for errors, anticipating and fixing problems, and communicating with the sysop on duty. (I got sendmail working, yay!)
But for prototyping, I'll stick with bash. She's been good to me.

ta0kira · 11-19-2009, 10:54 AM

Quote:

Originally Posted by ntubski

Why not just:

Code:

while ! start-process-command
do :
done

You said before

But I can't see why a loop is bad here. If by crash you mean only termination from SIGHUP, maybe what you want here is nohup?

Maybe this, just to be safe:

Code:

while [[ -x "$(command -v start-process-command)" ]] && ! start-process-command; do :; done

Kevin Barry

mashiox · 11-30-2009, 11:13 PM

I did some digging and came up with this.

Code:

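# start the application in the background so the script can keep checking
startcore() {
    /path/to/application &
}

tcore=$(pgrep -x process-name)
while [ -z "$tcore" ]
do
    startcore
    sleep 5    # assumption: give it a few seconds to come up before re-checking
    tcore=$(pgrep -x process-name)
done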

Gonna play with some more functions...