I've written two scripts: one that starts the process if it hasn't been started already, and another that, once started, waits for the process to crash and then restarts it.
The latter is the one causing problems.
Its form is nearly identical to its sister script's, but a little slimmer because I didn't want this one to do as much.
On execution, this script makes the CPU spike to 100% and stay there.
Since the two scripts are so alike, I don't know where the problem is.
I've tried both the sleep command and echo "" > /dev/null to keep it from sucking up half my resources, but writing to /dev/null in a loop still eats resources, and the script won't actually do anything if I use sleep.
I've thought about using the wait command, but I'm unsure how to use it here.
Here's the script.
Code:
#!/bin/bash
tcore=`pgrep AppName`
clown="email@domain.com"
cnt=0
subcnt=0
if [ -n $tcore ]
then
    until [ -z "$tcore" ]
    do
        if [[ ! -x /path/to/application ]]
        then
            echo "AppName could not recover from a crash" | mail $clown
        elif [[ -x /path/to/application && -z "$tcore" ]]
        then
            echo "AppName recovered from a crash" | mail $clown
            /path/to/application
            let subcnt=0
        fi
        if [ $subcnt = 0 ] && [ -z "$tcore" ]
        then
            echo "AppName has crashed. Attempting to restart..." | mail $clown
            let subcnt=subcnt+1
        fi
    done
fi
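The CPU spike most likely comes from tcore being filled in only once, before the loop: nothing inside the until loop ever re-runs pgrep, so if AppName is running when the script starts, the loop condition can never change and the loop spins flat out. Adding sleep slows the spinning but can't make the script react, because the condition still never changes. Below is a minimal sketch of the same watcher with pgrep re-run on every pass and a sleep between checks. APPNAME, APPPATH, CLOWN, INTERVAL, notify, check_app and watch_app are hypothetical names introduced here; the mail call mirrors the one in the script above. It uses functions only so the pieces can be read and tested separately, even though the original deliberately stays flat.

```shell
#!/bin/bash
# Sketch of the watcher with the polling bug fixed.
# All names below are hypothetical placeholders.
APPNAME=${APPNAME:-AppName}
APPPATH=${APPPATH:-/path/to/application}
CLOWN=${CLOWN:-email@domain.com}
INTERVAL=${INTERVAL:-5}

notify() {
    # Same mechanism as the original script: pipe the message into mail.
    echo "$1" | mail "$CLOWN"
}

check_app() {
    # Re-run pgrep on every call. The original ran it once, before the
    # loop, so $tcore never changed. -x matches the process name exactly.
    pgrep -x "$APPNAME" >/dev/null && return 0
    if [[ -x $APPPATH ]]; then
        notify "$APPNAME has crashed. Attempting to restart..."
        # Start it in the background so the watcher keeps running.
        "$APPPATH" >/dev/null 2>&1 &
        return 1
    fi
    notify "$APPNAME could not recover from a crash"
    return 2
}

watch_app() {
    while :; do
        check_app
        # Give up if the binary is missing or not executable.
        [[ $? -eq 2 ]] && return 1
        # Sleeping between checks is what keeps the CPU idle.
        sleep "$INTERVAL"
    done
}
```

To run it, append something like watch_app & at the end of the file. The watcher then wakes up every INTERVAL seconds (5 by default), which is usually frequent enough for a crash monitor while keeping CPU usage negligible.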
I think that, unless you're bent on reinventing the wheel over and over again, searching LQ for similar threads, and your distro repos, Freshmeat, Sourceforge, Nongnu and Berlios for similar apps, should turn up usable ones (Monit? Mon?), as this has been done time and time again. You could reduce it somewhat to something like
, use a cronjob if the interval is sufficient, make /sbin/init restart a dying application if it isn't (though structurally fixing the underlying, recurring problem would be "better"), or use inotify (or Auditd) to watch for cores and act on that, or, if you want to issue a command periodically, have a look at, say, Shelldorado's 'periodic': http://www.shelldorado.com/scripts/c...e/periodic.txt.
@gnashley: Yes, I could; however, as previously mentioned, adding the sleep statement DOES cut resource usage down to an acceptable level, but it doesn't let the script do anything.
@unSpawn:
Yes, I COULD use Monit, and I'm familiar with it.
I could also use Webmin, init.d and numerous others, but I'm choosing to write my own because I want to get better at this sort of thing, and I find that I can prototype a project in shell script and then port it to another language.
In this case I want to use C eventually.
I'm also not writing in-script functions because I want to keep things very flat for now.
I've written earlier versions of this script, and those used crontab.
Not knocking cron, but I want the script to be monitoring the process in question at all times.
I picked apart your code, and some stuff caught my fancy.
Code:
if [[ ! -x /path/to/app ]]
then
    if [[ ! -e /path/to/app ]]
    then
        echo "AppName does not exist!" | mail $clown
    elif [[ -e /path/to/app ]]
    then
        echo "AppName is not executable" | mail $clown
    fi
fi
# I'll use this because file tests are sexy and I figure it
# wastes less time if the machine can diagnose the problem.
And this:
Code:
>/dev/null 2>&1
I understand it writes to /dev/null, but what's the 2>&1 doing?
Quote:
Originally Posted by mashiox
@gnashley: Yes, I could; however, as previously mentioned, adding the sleep statement DOES cut resource usage down to an acceptable level, but it doesn't let the script do anything.
That's the whole point. You can't have your cake and eat it too.
Quote:
Originally Posted by mashiox
I'm also not writing in-script functions because I want to keep things very flat for now.
OK. Then I suggest you drop the "-e" test from the code because "-x" basically implies "-e" and besides, if you expect to run your restart script (shipped with?) then you could also expect /path/to/app is installed (else why run it anyway?), right? Going one step further in the "mean and lean" dept., if you can expect your restart script to be run and /path/to/app to be previously installed then wouldn't it be fair to assert it's installed octal mode 0[5,7].* anyway? As in ditching the "-x" test as well? I mean basically the core functionality will only need to be one or another form of doing only two tests like 'pgrep /path/to/app || { [ -e core ] && doEmail; /path/to/app >/dev/null 2>&1; }', right?
Quote:
Originally Posted by mashiox
I understand it writes to /dev/null, but what's the 2>&1 doing?
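It's the "show me no output whatsoever" redirection: >/dev/null sends stdout (file descriptor 1) to the bit bucket, and 2>&1 then sends stderr (file descriptor 2) to wherever stdout currently points, i.e. the bit bucket as well.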
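A quick way to see that the order matters, since redirections are applied left to right. noisy is a hypothetical helper that writes one line to each stream:

```shell
#!/bin/bash
# Hypothetical helper: one line on stdout, one on stderr.
noisy() {
    echo "to stdout"
    echo "to stderr" >&2
}

# stdout -> /dev/null first, then stderr -> where stdout now points
# (/dev/null): nothing is printed.
noisy >/dev/null 2>&1

# stderr -> where stdout points *now* (the terminal or pipe), then
# stdout -> /dev/null: "to stderr" still shows up.
noisy 2>&1 >/dev/null
```

Only the first form silences everything, which is why it is the usual idiom for running a command quietly.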
You are getting what you want. That's why the CPU usage goes to 100%. If the CPU usage were less than 100%, it would not always be monitoring.
--- rod.
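Continuous monitoring does not necessarily have to cost 100% CPU, though. If the watcher starts the application itself, the wait builtin asked about in the original post can block until that child exits, using no CPU in the meantime. Note that wait only works on children of the current shell, so it cannot watch a process started elsewhere. A minimal sketch; supervise and its optional max-restarts argument are hypothetical:

```shell
#!/bin/bash
# Hypothetical supervisor: supervise APP [MAX]; MAX=0 (default) means forever.
supervise() {
    local app=$1 max=${2:-0} restarts=0 status
    while :; do
        "$app" &          # start the app as a child of this shell
        wait "$!"         # block, without polling, until that child exits
        status=$?
        echo "$app exited with status $status"
        restarts=$((restarts + 1))
        if (( max > 0 && restarts >= max )); then
            return "$status"
        fi
        # Brief pause so an app that crashes on startup
        # doesn't turn into a tight restart loop.
        sleep 1
    done
}
```

For example, supervise /path/to/application & restarts the application indefinitely, pausing one second between restarts, while supervise false 2 runs false twice and returns its exit status of 1.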
That makes sense, but the other script also monitors the application's status, and until the condition is true it idles. It stays "alive" but does not hog the processor.
Concisely:
The script I'm inquiring about initializes with this:
Code:
if [ -n $tcore ]
then
until [ -z "$tcore" ]
do
<relops>
done
fi
The one that works (gives me what I actually want) initializes with this:
Code:
if [ -z "$tcore" ]
then
until [ -n "$tcore" ]
do
<relops>
done
fi
I hope that makes the picture a bit clearer, if you were confused.
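The difference between the two snippets comes down to quoting. When AppName isn't running, pgrep prints nothing and tcore is empty. Unquoted, the empty variable disappears from the test entirely, and a test with a single argument only checks whether that argument (here the string -n itself) is non-empty, so it is always true. A small sketch demonstrating it, with tcore set by hand to simulate pgrep finding nothing:

```shell
#!/bin/bash
tcore=""   # simulates `pgrep AppName` finding no running process

# Unquoted: this becomes `[ -n ]`, which is true because "-n" is a
# non-empty string.
if [ -n $tcore ]; then echo "unquoted -n: true"; else echo "unquoted -n: false"; fi

# Quoted: this becomes `[ -n "" ]`, which really tests the empty string.
if [ -n "$tcore" ]; then echo "quoted -n: true"; else echo "quoted -n: false"; fi
```

This prints "unquoted -n: true" followed by "quoted -n: false", so quoting the variable consistently ([ -n "$tcore" ]) is the safe habit.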
Quote:
Originally Posted by unSpawn
Quote:
Originally Posted by mashiox
I'm also not writing in-script functions because I want to keep things very flat for now.
OK. Then I suggest you drop the "-e" test from the code because "-x" basically implies "-e" and besides, if you expect to run your restart script (shipped with?) then you could also expect /path/to/app is installed (else why run it anyway?), right? Going one step further in the "mean and lean" dept., if you can expect your restart script to be run and /path/to/app to be previously installed then wouldn't it be fair to assert it's installed octal mode 0[5,7].* anyway? As in ditching the "-x" test as well? I mean basically the core functionality will only need to be one or another form of doing only two tests like 'pgrep /path/to/app || { [ -e core ] && doEmail; /path/to/app >/dev/null 2>&1; }', right?
Yes, simplification is DEFINITELY on the to-do list. If I were already concerning myself with simplification, don't you think I'd have made /path/to/app just $apppath by now?
As for the wall of code, I'm following the Rule of Repair, among others, as articulated so well by Eric S. Raymond.
I still say put a sleep in there; even half a second will stop the CPU from racing, and it will not completely disable your script. Either that, or write in *lots* of extra statements checking -e, -x, -s, -L and the kitchen sink so that the thing doesn't run so fast. Even without a sleep you are not *continuously* checking, because the routine takes some time to run, even if only milliseconds. Even 'sleep .1' will stop it from racing.
Try trapping the SIGCHLD signal (CHLD in bash's trap; there is no SIGCHILD). The handler runs automatically when a child process dies.
Here we'll use read with a timeout as the sleeper.
Code:
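shopt -s extglob
QUITKEYPATTERN='@(q|Q)'
COUNTER=0
TIMEOUT=1
APPPID=
# <add more global variables here>
startapp() {
    /path/to/application &   # background it so the loop below keeps running
    APPPID=$!
    let ++COUNTER
    # ... other bookkeeping
}
restartapp() {
    # The trap fires for *every* child that exits, so act only if the app is gone
    checkapp && return
    # send a message that the application crashed
    startapp
}
checkapp() {
    # return 0 if the application is still running, 1 if not
    [[ -n $APPPID ]] && kill -0 "$APPPID" 2>/dev/null
}
stopapp() {
    trap - CHLD    # don't let the trap restart the app we're stopping
    [[ -n $APPPID ]] && kill "$APPPID"
}
trap restartapp CHLD   # SIGCHLD also works; SIGCHILD is not a valid name
startapp
# read -t doubles as the sleep; options must come before the variable name.
# If stdin is at EOF (e.g. /dev/null), read returns at once and this spins.
until read -t "$TIMEOUT" KEY && [[ $KEY == $QUITKEYPATTERN ]]; do
    checkapp || restartapp   # polling fallback in case the trap is missed
done
stopapp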
When the program crashes, restartapp() is executed automatically. Bear in mind that bash runs a CHLD trap for every child that exits, not just the application, so to make sure only one function, or only one instance of the application, runs at a time, use flag variables to mark them.
Note: if you're adding more subprocesses, you could use an array to record their PIDs and tell which of them are still running.
Code:
N=1234
PIDS[N]=$N   # register the app, using its PID as both index and value
[[ -n ${PIDS[N]} ]] && : the process is still registered, so presumably still running
unset 'PIDS[N]'   # mark the app as no longer running; the single quotes stop PIDS[N] from being treated as a glob pattern
for N in "${PIDS[@]}"; do echo "still registered: $N"; done   # process all sub applications
Last edited by konsolebox; 12-19-2009 at 04:51 AM.