bash scripts which go bump in the night and the system resources who hate them

dehuszar · 07-07-2004, 11:58 AM

I have written two bash scripts. They function well, in a fashion, but over time they do not operate the way I had hoped.

The basic idea behind these scripts is that our office's intranet cannot have any direct connection to the internet, but certain conveniences like being able to read email, get virus scanner updates, etc. should still be available. So we picked up an A/B ethernet switch and cobbled together a mail server which connects to one or the other network (internet-A/intranet-B). As I mentioned at the beginning, I've devised two scripts: one to send outgoing mail and fetch any new messages (internet), and one to gather new outgoing mail and deliver incoming mail (intranet). I use qmail to relay our messages.

The first script, qflushIN delivers incoming mail and gathers outgoing mail:

#!/bin/bash
flushin=`ping -c 1 10.0.0.1 | grep "1 packets received"`

until [ -n "$flushin" ]
do
echo "no connectivity, I will try again in 2 minutes"
sleep 120
done

/var/qmail/bin/qmail-tcpok
qmailctl flush
bash /home/<myusername>/temp/qflushOUT &

exit 0

The second script, qflushOUT then delivers all outgoing mail and checks for incoming:

#!/bin/bash
flushout=`ping -c 1 <my ISP's mailserver IP> | grep "1 packets received"`

until [ -n "$flushout" ]
do
echo "Waiting for internet connectivity. I will try again in 2 minutes"
sleep 120
done

/var/qmail/bin/qmail-tcpok
qmailctl flush

until [ -z "$flushout" ]
do
fetchmail -f /etc/fetchmail.conf
echo "Just checked mail. Will check again in 2 minutes."
sleep 120
done

bash /home/<myusername>/temp/qflushIN &

exit 0

Now what's happening is that the scripts will run great for a couple of turns of the knob. They'll deliver almost immediately upon the turn of the switch and I can do it all day to my heart's content. But once I leave it running for a day or two, I come back to find that once I turn the switch again, it won't work anymore, or it will leave instances of qflushIN or qflushOUT running in memory while opening new instances. I imagine there's some sort of ulimit-like file whose parameters I need to alter in order to smooth out operation, but I'm fairly stuck at the moment.

Does anyone have any ideas? Is there a more elegant way to do this that doesn't involve using 2 scripts? I'm also using Daniel Bernstein's daemontools for qmail and I was using it for fetchmail but found it a little too much for me to take on at this particular moment. But anyone who is knowledgeable in the "djb way", as it's called, and who can make some suggestions would make my day. I'd very much like to go that way eventually as time and budget permit me to do more homework.

Thanks in advance,
Sam

Dark_Helmet · 07-07-2004, 12:14 PM

Well, the scripts are broken if I am interpreting them right (or you didn't post all of them).

In both scripts, you check for an internet connection by ping'ing an address and making sure you get a response. What happens when the ping fails? You enter a loop that says you will try again, but that never happens. The script looks at the contents of a variable to control when the loop should quit, but you never actually change the variable in the loop itself. The condition of the loop will never change, and you'll be stuck in it forever. All the script will do is say "trying again in 2 minutes", wait two minutes, and repeat that it's waiting again.

I'm guessing the problem is that the connection goes down sometime, causing the script to enter the loop. The loops should be something like:

Code:

#!/bin/bash
flushin=`ping -c 1 10.0.0.1 | grep "1 packets received"`

until [ -n "$flushin" ]
do
  echo "no connectivity, I will try again in 2 minutes"
  sleep 120
  flushin=`ping -c 1 10.0.0.1 | grep "1 packets received"`
done

with a similar change to the second script.

<edit>
Also, I noticed that one script calls the other. So, you only need to start one of them once, and that's it. If you have a cron job set up to launch them, then that's bad. You'll have multiple copies running all the time. If you want to use cron for it, then you would need to get rid of the until loops. If you want to use the until loops, then you ought to get rid of cron...
</edit>

dehuszar · 07-07-2004, 04:30 PM

Well, certainly they aren't working as I'd like, but your suggestion confuses me. (not surprising really, I'm a total bash noob)

To my understanding, the flushin= line is just declaring the variable; it's not actually doing any pinging. It was my (perhaps incorrect) assumption that the $flushin loop would test the variable (i.e. ping 10.0.0.1) once every 2 minutes until it is not null, at which point it will break out of the loop and continue to the next part of the script. While it remained null, it would echo the "couldn't connect" message, sleep for 2 minutes and then try again. At least that's what various bash tutorials led me to believe. I don't see how repeating the variable declaration in the loop would do anything (constructive anyway).

Dunno, maybe I'm wrong.

Sam

keefaz · 07-07-2004, 04:56 PM

You may have confused the two bash syntaxes, `` and $().

Here is my version of your first script :

Code:

flushin=$(ping -c 1 10.0.0.1 | grep "1 received")

until [ -n "$flushin" ]
do
    echo "no connectivity, I will try again in 2 minutes"
    sleep 120
done

Dark_Helmet · 07-07-2004, 05:05 PM

It declares the variable, but does so by assigning a value to it. The backticks used ( ` ` ) tell the script that you want to execute the text inside as a command, and assign the command's output to the variable.

Referencing the variable (like "$flushin" for instance) will not re-execute the command; it just means "use the contents of the variable here". The contents of the variable is the text result from the ping-grep command earlier. I'm absolutely positive on this one.

For example:

Code:

#!/bin/bash

loop_count=1
command_var=`echo $loop_count | tee test_output`

while [ $loop_count -lt 4 ]
do
  let loop_count=loop_count+1
  if [ ! $command_var -eq $loop_count ] ; then
    echo command_var and loop_count are out of sync
  fi
done

exit 0

If command_var is executed every time $command_var is referenced, then you should never see the "command_var and loop_count are out of sync" message, right? If you run the script, you'll see that message is printed three times. If you check the contents of test_output, you'll see that it contains a single character: 1

The $() syntax keefaz mentioned is another way of writing the pair of backquotes. Both are command substitution: the command runs once, when the assignment is made, so it'll behave the same way as the original.
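To make the equivalence concrete, here is a minimal sketch. The variable names are only illustrative and do not come from the scripts above. Both forms capture the command's output once; the practical difference is that $() can be nested without escaping.

```bash
#!/bin/bash
# Both forms capture the output of echo once, when the assignment runs.
old_style=`echo hello`
new_style=$(echo hello)

# $() can be nested without escaping the inner substitution.
nested=$(echo "outer $(echo inner)")

echo "backticks:     ${old_style}"
echo "dollar-parens: ${new_style}"
echo "nested:        ${nested}"
```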

keefaz · 07-07-2004, 05:08 PM

Yes, I have tested it more thoroughly and you're right: ping only runs once.

[edit]
Why use a variable anyway ?

Code:

until ping -c 1 10.0.0.1 | grep "1 received" 1> /dev/null; do
    echo "no connectivity, I will try again in 2 minutes"
    sleep 120
done

This one works for sure

dehuszar · 07-07-2004, 05:23 PM

Cause I'm learning from books and webpages and that's how most of them tell you to do it. I'm sure once I get further on I'll be able to make such decisions without training wheels, but I'm still just trying to get a few things working before I'll actually HAVE some time to do more studying.

BTW, the advice from both of you has been very helpful. So far it seems to work fine.

Sam

keefaz · 07-07-2004, 05:25 PM

Well, I edited the script above and checked that it works well.
I changed the wrong condition:
until $(ping -c 1 10.0.0.1 | grep "1 received")
to:
until ping -c 1 10.0.0.1 | grep "1 received" 1> /dev/null
so it works now. The $() form tried to run grep's output as a command, which never succeeds. The 1> /dev/null hides the matched ping line when the ping succeeds.
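A possible variation, sketched under a few assumptions: the posts above grep for both "1 packets received" and "1 received", which suggests the summary line differs between ping versions. On the ping implementations I am aware of, the exit status is 0 when a reply arrives, so the loop can test ping directly. The function name wait_for_host, the PING_CMD override (there only so the loop can be exercised without a network) and the optional retry limit are all hypothetical additions.

```bash
#!/bin/bash
# wait_for_host ADDRESS [DELAY_SECONDS] [MAX_TRIES]
# Re-runs the probe on every pass of the loop, so the condition can change.
# MAX_TRIES of 0 (the default) means "retry forever".
wait_for_host() {
  local address=$1 delay=${2:-120} max_tries=${3:-0} tries=0

  # PING_CMD is a hypothetical override; by default the probe is "ping -c 1".
  until ${PING_CMD:-ping -c 1} "$address" > /dev/null 2>&1; do
    tries=$((tries + 1))
    if [ "$max_tries" -gt 0 ] && [ "$tries" -ge "$max_tries" ]; then
      return 1
    fi
    echo "no connectivity, I will try again in ${delay} seconds"
    sleep "$delay"
  done
  return 0
}
```

It would be called as, for example, wait_for_host 10.0.0.1 120 before the qmail-tcpok and qmailctl flush lines.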

Dark_Helmet · 07-07-2004, 05:40 PM

Keefaz's suggestion of putting the command in the condition is a good one. It guarantees the command will be executed, but might prove troublesome later on. If the script needs to know the success or failure of the ping command more than once, then you'd need to store something into a variable. For instance, say you had a script determine if a command failed, and then needed to interpret the output of the error and take appropriate action. In that case, you would need the contents of the error twice. Depending on the command, you may not be guaranteed to get the error again between runs (like a first ping fails because of net congestion, but the second succeeds because the traffic cleared in the time it took to get to the second ping).
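A minimal sketch of that situation, assuming a stand-in function check_host (hypothetical, it simulates a failing "ping -c 1 <address>" so the example runs anywhere). The command runs exactly once; its output and exit status are saved and can then be used as many times as needed.

```bash
#!/bin/bash
# check_host is a hypothetical stand-in for a real probe such as ping.
check_host() {
  echo "simulated failure: network unreachable"
  return 1
}

# Run the probe once and keep both its output and its exit status.
probe_output=$(check_host 2>&1)
probe_status=$?

# Both saved values can now be reused without running the probe again.
if [ "$probe_status" -ne 0 ]; then
  echo "probe failed with status ${probe_status}"
  echo "reason: ${probe_output}"
fi
```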

I'm not saying Keefaz is wrong or that there's anything bad about his suggestion. It just depends on what a particular script needs.

keefaz · 07-07-2004, 05:50 PM

Well, I think as you do: looping every two minutes on a ping command is bad,

but I was curious to resolve this sort of loop problem in bash for learning purposes. It is obvious that this sort of script can't work in a production environment.

dehuszar · 07-08-2004, 11:12 AM

Well, the hope is to get this working and then make it elegant. So far it seems to be doing the job. Given the criteria of what the project is designed to do, can you think of a better way to automatically negotiate the switch between networks? (and I intend the tone for that last statement to ring with curiosity and not pomp) The goal is to not require any of the co-workers to have to use Linux to pump mail internally. The switch is hardware, and not operable from any sort of timer, and the intranet CANNOT have any direct internet connection.

I know that may sound a little unreasonable, but it's the boss's orders, and given some of the work we do, it's best not left to chance.

Is there something else you would recommend that might be more stable in the long run? Timed cron schedules might work except that given how the office sometimes gets, it wouldn't get done on time very often.

Anyway, I'd love some input from more seasoned people. How would you approach the project?

Thanks in advance, and thanks for the help already provided.

Sam

keefaz · 07-08-2004, 11:36 AM

I realised that I was a little closed-minded in saying the script is not suitable for a production environment. That is because, for me, bash scripts are for small tasks and I always try to shorten their runtime.
Why not use cron, say every quarter of an hour, to check and send mail? The advantage is that a script does not run indefinitely if it fails (it just waits 1/4 hour and comes round again). You can also use a file in which you record whether the mail was updated fine or not, and then a second script will check the value in this file and run accordingly.
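The status-file idea could look something like this. It is only a sketch: the path in STATUS_FILE and the function names record_status and last_status are hypothetical, and the values "ok" and "failed" are an arbitrary convention.

```bash
#!/bin/bash
# STATUS_FILE is a hypothetical path shared by both scripts.
STATUS_FILE=${STATUS_FILE:-/tmp/qflush.status}

# Called by the mail script after each attempt: record_status ok|failed
record_status() {
  echo "$1" > "$STATUS_FILE"
}

# Called by the second script: prints the last recorded value,
# or "unknown" if nothing has been recorded yet.
last_status() {
  if [ -f "$STATUS_FILE" ]; then
    cat "$STATUS_FILE"
  else
    echo "unknown"
  fi
}
```

The second script would then branch on the result, for example: if [ "$(last_status)" = "failed" ]; then ... fi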

dehuszar · 07-08-2004, 11:54 AM

What if I were to set up an 'if' conditional in cron saying (in cron speak of course): if multiple instances of qflushIN/OUT are running, kill all instances; then, if neither qflushIN nor qflushOUT is running, ping 10.0.0.1, and if it's reachable, run qflushIN, else run qflushOUT? Then I can have that script check up on qflush* 3-4 times a day.

That way if it's running fine, it will be uninterrupted, otherwise, it will clean the slate and start over.
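The decision logic described here can be written down on its own, separate from the commands that would gather its inputs. This is only a sketch of the rules as stated: plan_actions is a hypothetical name, the first argument is the number of qflushIN/qflushOUT instances found, and the second is "yes" or "no" depending on whether 10.0.0.1 answered. Counting the instances and doing the ping are left out.

```bash
#!/bin/bash
# plan_actions RUNNING_COUNT INTERNAL_REACHABLE
# Prints, one per line, the actions the watchdog should take.
plan_actions() {
  local running=$1 internal_reachable=$2

  # Exactly one instance: everything is fine, do not interrupt it.
  if [ "$running" -eq 1 ]; then
    echo "leave running"
    return
  fi

  # More than one instance: clean the slate first.
  if [ "$running" -gt 1 ]; then
    echo "kill all"
  fi

  # Nothing is (or will be) running, so start the script for this network.
  if [ "$internal_reachable" = "yes" ]; then
    echo "start qflushIN"
  else
    echo "start qflushOUT"
  fi
}

plan_actions 3 yes
```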

The next thing I need to learn is how to create and set up a detailed log creation, and have it emailed to me, so I can be made aware of any undesirable activity. I assume that O'Reilly's Using Bash will get there eventually, but if you know any good sources I'm all ears.
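As a starting point for the logging part, here is a minimal sketch. LOG_FILE and the log function are hypothetical names, and the commented-out last line assumes a mail or mailx command is installed and that the address (a placeholder) is replaced with a real one.

```bash
#!/bin/bash
# LOG_FILE is a hypothetical location; pick one the script can write to.
LOG_FILE=${LOG_FILE:-/tmp/qflush.log}

# Append one timestamped line to the log.
log() {
  echo "$(date '+%Y-%m-%d %H:%M:%S') $*" >> "$LOG_FILE"
}

log "qflushIN started"
log "qmail queue flushed"

# Assumption: mail(1) or mailx is available; the address is a placeholder.
# mail -s "qflush log" you@example.com < "$LOG_FILE"
```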

Thanks again.
Sam

keefaz · 07-08-2004, 12:19 PM

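You can add this at the top of your qflushIN and qflushOUT scripts:

Code:

echo $$ > "/var/run/$(basename "$0").pid"

and this at the bottom:

Code:

rm -f "/var/run/$(basename "$0").pid"

$$ : PID of the current script
$0 : name of the current script as it was invoked, possibly with a path (your scripts call each other by full path); basename strips the directory part so the file lands directly in /var/run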

So you have a way to know if script is running by (for example with qflushIN) :

Code:

if [ -f /var/run/qflushIN.pid ]; then
    echo "script qflushIN is running, now I will kill it"
    kill $(cat /var/run/qflushIN.pid)
    rm -f /var/run/qflushIN.pid
fi

Maybe with a third script
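One weakness of a plain PID file is that it can be left behind if the script dies before reaching its last line. The sketch below is one possible way around that, not a tested drop-in: PID_FILE, is_running and claim_pid_file are hypothetical names, kill -0 only checks whether the recorded process still exists, and the trap removes the file when the script exits.

```bash
#!/bin/bash
# PID_FILE is a hypothetical path; /var/run would normally need root.
PID_FILE=${PID_FILE:-/tmp/qflushIN.pid}

# True only if the file exists AND the process recorded in it is alive.
# kill -0 sends no signal; it just tests whether the PID can be signalled.
is_running() {
  [ -f "$PID_FILE" ] && kill -0 "$(cat "$PID_FILE")" 2>/dev/null
}

# Record our own PID unless a live instance already holds the file.
# The trap removes the file on exit, even if the script fails early.
claim_pid_file() {
  if is_running; then
    return 1
  fi
  echo $$ > "$PID_FILE"
  trap 'rm -f "$PID_FILE"' EXIT
}
```

A script would start with something like: claim_pid_file || { echo "already running"; exit 1; }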

Dark_Helmet · 07-08-2004, 01:40 PM

Now you've got me thinking about this, and I'm the type that can't rest until I come up with some kind of solution. Curse you both!

This is how I would do things. I'm sure there are untold other ways that make more/less sense and are more/less efficient (my usual disclaimer).

One script - two basic usages:
qflush local [local_ip]
qflush external [mailserver_ip]

Code:

#!/bin/bash

# default ip address for internal ping target
default_internal_ip="10.0.0.1"

# default ip address for external mailserver
# !!!! this needs to be changed to work with your ISP !!!!
default_external_ip="12.34.56.789"

# These define the keywords that tell the script how to operate
local_mode="local"
external_mode="external"

######################################################################
# Function: display_usage
######################################################################
function display_usage
{
  command_base=$(basename ${0})
  cat << EOF
============================================= 
Basic usage statement: ${command_base} (${local_mode}|${external_mode}) [ip_address]
============================================= 
Mode: ${local_mode}
  This mode performs internal network mail maintenance
  The optional ip_address field specifies the local server to contact
    default: ${default_internal_ip}

Mode: ${external_mode}
  This mode contacts an external mail server to send/receive mail
  The optional ip_address field specifies the external mail server to contact
    default: ${default_external_ip}
EOF
}

######################################################################
# Function: execute_local_mode
######################################################################
function execute_local_mode
{
  ping -c 1 ${server_address} | grep "1 packets received" 1>/dev/null 2>/dev/null

  # As the last command in the pipe sequence, grep is responsible for
  # returning exit status.
  #   0 = match found
  #   anything else = no match or error
  # Exit status is stored in $?
  command_result=$?

  if [ ${command_result} -ne 0 ] ; then
    echo "Error: Could not ping ${server_address}."
    echo "  This attempt has failed."
    exit 2
  else
    /var/qmail/bin/qmail-tcpok
    qmailctl flush
  fi
}

######################################################################
# Function: execute_external_mode
######################################################################
function execute_external_mode
{
  ping -c 1 ${server_address} | grep "1 packets received" 1>/dev/null 2>/dev/null

  # As the last command in the pipe sequence, grep is responsible for
  # returning exit status.
  #   0 = match found
  #   anything else = no match or error
  # Exit status is stored in $?
  command_result=$?

  if [ ${command_result} -ne 0 ] ; then
    echo "Error: Could not ping ${server_address}."
    echo "  This attempt has failed."
    exit 2
  else
    /var/qmail/bin/qmail-tcpok
    qmailctl flush
    fetchmail -f /etc/fetchmail.conf
  fi
}

######################################################################
# Start of the main script
######################################################################

# Check that we were given at least a mode of operation
if [ $# -eq 0 ] ; then
  display_usage
  exit 1
fi

# Command line looks to be ok. Pull out the mode of operation from
# the first argument
op_mode=${1}

# Does the second argument exist? Use it if it does. Otherwise,
# use the default value based on the mode chosen
if [ ! -z "${2}" ] ; then
  server_address="${2}"
else
  if [ "${op_mode}" = "${local_mode}" ] ; then
    server_address=${default_internal_ip}
  else
    server_address=${default_external_ip}
  fi
fi

# Basically a big if statement. Execute the actions based on the
# chosen mode of operation.
# If the mode on the command line does not match a known mode,
# print an error and inform the user of the usage.
case ${op_mode} in

  ${local_mode})
    execute_local_mode
    ;;

  ${external_mode})
    execute_external_mode
    ;;

  *)
    echo "Error: Unrecognized mode (${op_mode}). Displaying usage and exiting."
    display_usage
    exit 1
    ;;

esac

exit 0

Ok, you can give that a go if you like, but I make no guarantees... mainly because I don't have your environment to test on. So the script might need a little tweaking. Hopefully it's fairly easy to follow (if not overkill for its purpose).

To use this script, I suggest setting up two cron jobs. One for local and one for external. Something like this in the crontab file:

Code:

SHELL=/bin/bash

0-50/10 * * * * /path/to/script local
5-55/10 * * * * /path/to/script external

That will run the "local" version every 10 minutes at :00, :10, :20, etc. It will run the "external" version every 10 minutes as well, but at :05, :15, :25, etc. That way the two jobs never start at the same moment, which should prevent any possible confusion between them.
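Staggering the start times does not help if one run takes longer than five minutes (fetchmail on a slow link, say). A simple guard, sketched here with hypothetical names (LOCK_DIR, acquire_lock, release_lock, run_exclusively), relies on mkdir either creating the directory or failing, so only one run can hold the lock. Note that a run killed part-way through would leave the directory behind, and it would have to be removed by hand.

```bash
#!/bin/bash
# LOCK_DIR is a hypothetical path; mkdir creates it or fails, never both.
LOCK_DIR=${LOCK_DIR:-/tmp/qflush.lock}

acquire_lock() { mkdir "$LOCK_DIR" 2>/dev/null; }
release_lock() { rmdir "$LOCK_DIR" 2>/dev/null; }

# run_exclusively COMMAND [ARGS...]
# Runs the command only if no other run holds the lock, and passes
# the command's exit status back to the caller.
run_exclusively() {
  if ! acquire_lock; then
    echo "another run is still in progress, skipping"
    return 1
  fi
  "$@"
  local status=$?
  release_lock
  return "$status"
}
```

In a wrapper script called from cron, the last line would be something like: run_exclusively /path/to/script local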