Bash script overloads CPU on run

mashiox · 12-12-2009, 12:59 AM

I've written two scripts: one starts the process if it hasn't been started already, and the other waits for the process to crash and then restarts it.
The one that causes problems is the latter.
Its form is almost identical to its sister script's, but a little slimmer, because I didn't want this one to do as much.

On execution, this script causes the CPU to spike and hold at 100%.
Since the two scripts are very alike, I don't know where the problem is.
I've tried both sleep and echo "" > /dev/null inside the loop to keep it from eating half my resources, but writing to /dev/null in a loop still eats them, and the script doesn't actually do anything if I use sleep.
I've thought about using the wait command, but I'm unsure of how to use it here.

Here's the script.

Code:

#!/bin/bash
tcore=`pgrep AppName`
clown="email@domain.com"
cnt=0
subcnt=0

if [ -n $tcore ] 
then
	until [ -z "$tcore" ]
	do
		if [[ ! -x /path/to/application ]]
		then
			echo "AppName could not recover from a crash" | mail $clown
		elif [[ -x /path/to/application && -z "$tcore" ]]
		then
			echo "AppName recovered from a crash" | mail $clown
			/path/to/application
			let subcnt=0
		fi
			if [ $subcnt = 0 ] && [ -z "$tcore" ]
			then
				echo "AppName has crashed. Attempting to restart..." | mail $clown
				let subcnt=subcnt+1
			fi
	done
fi

gnashley · 12-12-2009, 02:28 AM

You need to put a sleep statement in there so that it doesn't start up and simply loop as fast as the CPU will let it.
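
One caveat worth spelling out: in the script as posted, tcore is assigned once, before the loop, and is never refreshed, so the until condition cannot change however long the loop sleeps. Below is a minimal sketch of a polling loop that re-runs the check on every pass and sleeps in between. The helper name poll_until_fails and the one-second interval are assumptions for illustration, not something from the thread.

```shell
#!/bin/bash
# Hypothetical helper (not from the thread): re-run a check command every
# $interval seconds and return once the check fails.
poll_until_fails() {
    local interval=$1
    shift
    while "$@" >/dev/null 2>&1; do
        sleep "$interval"   # yield the CPU between checks
    done
}

# Example wiring (commented out so the sketch does not block when sourced):
# poll_until_fails 1 pgrep AppName
# echo "AppName has crashed. Attempting to restart..." | mail "$clown"
```

pgrep exits non-zero when nothing matches, so the loop ends once AppName is gone. Because the check is executed again on each pass, the sleep only slows the polling down and does not stop the script from reacting.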

unSpawn · 12-12-2009, 03:01 AM

I think that, unless you're always bent on reinventing the wheel over and over again, searching LQ for similar threads and your distro repos, Freshmeat, Sourceforge, Nongnu, Berlios for similar apps should return usable ones (Monit?, Mon?), as this has been done many times before. You could reduce it somewhat to something like

Code:

startApp() { [ -e /path/to/core ] && { doSomething; }; /path/to/application --args >/dev/null 2>&1 || startApp; }; startApp &

, use a cronjob if the interval is sufficient, make /sbin/init restart a dying application (but structurally fixing the underlying, recurring problem would be "better") if it isn't, or use inotify (or Auditd) to watch for cores and act on that or, if you want to issue a command periodically, have a look at say Shelldorado's 'periodic': http://www.shelldorado.com/scripts/c...e/periodic.txt.

mashiox · 12-14-2009, 02:44 PM

@gnashley: Yes, I could, however as previously mentioned, adding the sleep statement DOES cut down the resources to an acceptable level, but doesn't allow the script to do anything.

@unSpawn:
Yes, I COULD use Monit, and I'm familiar with it. I could also use webmin, init.d, and numerous others, but I'm choosing to write my own because I want to get better at this sort of thing, and I find that I can prototype a project in shell script, then port it over to another language.
In this case I want to use C eventually.
I'm also not writing in-script functions because I want to keep things very flat for now.

I have written prior versions of this script before, and I used crontab.
Not knocking cron, but I want the script to always be monitoring the process in question.

I picked apart your code and some stuff caught my fancy.

Code:

		if [[ ! -x /path/to/app ]]
		then
			if [[ ! -e /path/to/app ]]
			then
				echo "AppName does not exist!" | mail $clown
			elif [[ -e /path/to/app ]]
			then
				echo "AppName is not executable" | mail $clown
			fi
		fi
# I'll use this because file tests are sexy and I figure it 
# wastes less time if the machine can diagnose the problem.

And this:

Code:

>/dev/null 2>&1

I understand it writes to /dev/null, but what's the 2>&1 doing?

tuxdev · 12-14-2009, 03:08 PM

Quote:

@gnashley: Yes, I could, however as previously mentioned, adding the sleep statement DOES cut down the resources to an acceptable level, but doesn't allow the script to do anything.

That's the whole point. You can't have your cake and eat it too.

unSpawn · 12-14-2009, 03:58 PM

Quote:

Originally Posted by mashiox

I'm also not writing in-script functions because I want to keep things very flat for now.

OK. Then I suggest you drop the "-e" test from the code because "-x" basically implies "-e" and besides, if you expect to run your restart script (shipped with?) then you could also expect /path/to/app is installed (else why run it anyway?), right? Going one step further in the "mean and lean" dept., if you can expect your restart script to be run and /path/to/app to be previously installed then wouldn't it be fair to assert it's installed octal mode 0[5,7].* anyway? As in ditching the "-x" test as well? I mean basically the core functionality will only need to be one or another form of doing only two tests like 'pgrep /path/to/app || { [ -e core ] && doEmail; /path/to/app >/dev/null 2>&1; }', right?

Quote:

Originally Posted by mashiox

I understand it writes to /dev/null, but what's the 2>&1 doing?

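It's the "show me no output whatsoever" redirection: >/dev/null sends stdout to the bitbucket, and 2>&1 then points stderr (file descriptor 2) at wherever stdout (file descriptor 1) currently goes, so both are discarded.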
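
The order of the two redirections matters, which a small experiment can show. This is only an illustrative sketch: noisy is a made-up stand-in for /path/to/application that writes one line to each stream.

```shell
#!/bin/bash
# noisy is a hypothetical stand-in for /path/to/application.
noisy() {
    echo "to stdout"
    echo "to stderr" >&2
}

# stdout goes to /dev/null first, then stderr is pointed at the same place.
# The outer 2>&1 would capture anything that still leaked onto stderr.
silent=$( { noisy >/dev/null 2>&1; } 2>&1 )

# Reversed order: stderr is pointed at the ORIGINAL stdout (the capture),
# and only afterwards is stdout sent to /dev/null.
swapped=$( noisy 2>&1 >/dev/null )

echo "silent=[$silent] swapped=[$swapped]"
```

With >/dev/null 2>&1 nothing is captured, while 2>&1 >/dev/null still lets the stderr line through, because redirections are processed left to right.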

theNbomr · 12-14-2009, 07:05 PM

Quote:

I want the script to always be monitoring the process in question.

You are getting what you want. That's why the CPU usage goes to 100%. If the CPU usage were less than 100%, it would not always be monitoring.
--- rod.

mashiox · 12-14-2009, 10:33 PM

Quote:

Originally Posted by theNbomr

You are getting what you want. That's why the CPU usage goes to 100%. If the CPU usage were less than 100%, it would not always be monitoring.
--- rod.

That makes sense, but the other script monitors the application's status and until the contingency is true, it idles. But it stays "alive" and does not rape the processor.

Concisely:

The script I'm inquiring about initializes with this:

Code:

if [ -n $tcore ] 
then
	until [ -z "$tcore" ]
	do
	<relops>
	done
fi

The one that works, (gives me what I actually want) initializes with this:

Code:

if [ -z "$tcore" ]
then
        until [ -n "$tcore" ]
        do
	<relops>
	done
fi

I hope I'm giving the picture a bit more clarity, if you were confused.
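
One visible difference between the two openings is quoting: the first tests $tcore unquoted, the second quotes it. When tcore is empty, [ -n $tcore ] collapses to [ -n ], which only asks whether the string "-n" is non-empty and therefore succeeds. A small demonstration, with the variable deliberately left empty for illustration:

```shell
#!/bin/bash
# tcore is deliberately empty here, as it would be if pgrep found nothing.
tcore=""

# Unquoted: the empty expansion disappears, leaving [ -n ], which is true.
if [ -n $tcore ]; then unquoted="true"; else unquoted="false"; fi

# Quoted: the empty string is kept as an argument, so -n correctly fails.
if [ -n "$tcore" ]; then quoted="true"; else quoted="false"; fi

echo "unquoted: $unquoted, quoted: $quoted"
```

So with the unquoted form the outer if is entered even when no process was found; quoting the variable makes the test behave as intended.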

Quote:

Originally Posted by unSpawn

Quote:

Originally Posted by mashiox

I'm also not writing in-script functions because I want to keep things very flat for now.

OK. Then I suggest you drop the "-e" test from the code because "-x" basically implies "-e" and besides, if you expect to run your restart script (shipped with?) then you could also expect /path/to/app is installed (else why run it anyway?), right? Going one step further in the "mean and lean" dept., if you can expect your restart script to be run and /path/to/app to be previously installed then wouldn't it be fair to assert it's installed octal mode 0[5,7].* anyway? As in ditching the "-x" test as well? I mean basically the core functionality will only need to be one or another form of doing only two tests like 'pgrep /path/to/app || { [ -e core ] && doEmail; /path/to/app >/dev/null 2>&1; }', right?

Yes, simplification is DEFINITELY on the to-do list. If I were already concerning myself with simplification, don't you think I'd have made /path/to/app just $apppath by now?

As far as the wall-of-code goes, I'm running with the Rule of Repair among others as articulated so well by Eric S. Raymond.

gnashley · 12-15-2009, 12:58 AM

I still say put a sleep in there; even a half-second will stop the over-racing of the CPU. It will not completely disable your script. Either that or write in *lots* of extra statements checking -e, -x, -s, -L and the kitchen sink so that the thing doesn't run so fast. Even though you don't sleep, you are still not *continuously* checking, because the routine does take some time to run, even though it may be just milliseconds. Even 'sleep .1' will stop it from racing.

smeezekitty · 12-15-2009, 12:42 PM

Quote:

rape the processor

Nice choice of words.
Try sleep 0.2, because it's just going to keep racing without anything to slow it down.

konsolebox · 12-19-2009, 03:12 AM

Try trapping the SIGCHLD signal in bash (the name is SIGCHLD, not SIGCHILD). The trap handler runs automatically whenever a child process exits.

Here we'll use read as a sleeper.

Code:

shopt -s extglob

QUITKEYPATTERN=@(q|Q)
COUNTER=0
TIMEOUT=1
<add more global variables here>

startapp() {
    /path/to/application &   # background it so the read loop below is reached
    let ++COUNTER
    ...
}

restartapp() {
    # send a message that the application crashed
    startapp
}

checkapp() {
    # do some checks if application is still running
    return <return 0 if application is still running or 1 if not>
}

stopapp() {
    # stop the application
}

trap restartapp SIGCHLD  # bash's name for the signal is SIGCHLD; SIGCHILD is not accepted

startapp

until read -t "$TIMEOUT" KEY && [[ $KEY == $QUITKEYPATTERN ]]; do
    checkapp || restartapp
done

stopapp

When the program crashes, restartapp() is executed automatically. To make sure only one function or one instance of the application runs at a time, use flag variables to mark them.

Note, if you're adding more subprocesses, perhaps you can use an array and mark the PIDs of the subprocesses to tell which of them is still running or not.

Code:

N=1234
PIDS[N]=$N   # register the app
[[ -n ${PIDS[N]} ]] && : the process still runs somehow
unset 'PIDS[N]' # remove the app as running, the single quotes are needed to prevent the argument from being parsed for pathname expansion
for N in ${PIDS[@]}; do ...  # process all sub applications
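
The opening post asked how the wait command could be used here. One possibility, sketched below under the assumption that the monitoring script itself launches the application (wait only works on the shell's own children), is to start the program in the background and block in wait until it exits. The function name supervise, the restart limit, and the one-second pause are illustrative choices, not something from the thread.

```shell
#!/bin/bash
# Hypothetical supervisor (names are illustrative, not from the thread):
# run a command as a background child, block in `wait` until it exits,
# and restart it after a non-zero exit, at most max_restarts times.
supervise() {
    local max_restarts=$1
    shift
    local restarts=0 status
    while :; do
        "$@" &
        status=0
        wait "$!" || status=$?      # blocks here; no polling, no CPU spin
        if [ "$status" -eq 0 ]; then
            break                   # clean exit: nothing to restart
        fi
        if [ "$restarts" -ge "$max_restarts" ]; then
            break                   # give up after too many failures
        fi
        restarts=$((restarts + 1))
        echo "exited with status $status; restart $restarts of $max_restarts" >&2
        sleep 1                     # brief pause so a crash loop cannot race
    done
    return "$status"
}

# Example wiring (commented out so the sketch does not block when sourced):
# supervise 5 /path/to/application
```

Because wait blocks until the child exits, the script uses essentially no CPU while the application is healthy, and it reacts as soon as the child dies rather than on the next polling pass. The mail notifications from the original script could replace the echo line.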