Bash scripts which go bump in the night, and the system resources that hate them
I have written two bash scripts, and they function well, in a fashion, but over time they stop operating the way I had hoped.
The basic idea behind these scripts is that our office's intranet cannot have any direct connection to the internet, but certain conveniences, like being able to read email, get virus scanner updates, etc., should still be available. So we picked up an A/B ethernet switch and cobbled together a mail server which connects to one network or the other (internet-A/intranet-B). As I mentioned at the beginning, I've devised two scripts: one to send outgoing mail and fetch any new messages (internet), and one to gather new outgoing mail and deliver incoming mail (intranet). I use qmail to relay our messages.

The first script, qflushIN, delivers incoming mail and gathers outgoing mail:

Code:
#!/bin/bash
flushin=`ping -c 1 10.0.0.1 | grep "1 packets received"`
until [ -n "$flushin" ]
do
    echo "no connectivity, I will try again in 2 minutes"
    sleep 120
done
/var/qmail/bin/qmail-tcpok
qmailctl flush
bash /home/<myusername>/temp/qflushOUT &
exit 0

The second script, qflushOUT, then delivers all outgoing mail and checks for incoming:

Code:
#!/bin/bash
flushout=`ping -c 1 <my ISP's mailserver IP> | grep "1 packets received"`
until [ -n "$flushout" ]
do
    echo "Waiting for internet connectivity. I will try again in 2 minutes"
    sleep 120
done
/var/qmail/bin/qmail-tcpok
qmailctl flush
until [ -z "$flushout" ]
do
    fetchmail -f /etc/fetchmail.conf
    echo "Just checked mail. Will check again in 2 minutes."
    sleep 120
done
bash /home/<myusername>/temp/qflushIN &
exit 0

Now what's happening is that the scripts will run great for a couple of turns of the knob. They'll deliver almost immediately upon the turn of the switch, and I can do it all day to my heart's content. But once I leave it running for a day or two, I come back to find that once I turn the switch again, it won't work anymore, or it will leave instances of qflushIN or OUT running in memory while opening new instances.
I imagine there's some sort of ulimit-like file whose parameters I need to alter in order to smooth out operation, but I'm fairly stuck at the moment. Does anyone have any ideas? Is there a more elegant way to do this that doesn't involve using 2 scripts? I'm also using Daniel Bernstein's daemontools for qmail, and I was using it for fetchmail but found it a little too much for me to take on at this particular moment. But anyone who is knowledgeable in the "djb way", as it's called, who can make some suggestions would make my day. I'd very much like to go that way eventually, as time and budget permit me to do more homework. Thanks in advance, Sam |
Well, the scripts are broken if I am interpreting them right (or you didn't post all of them).
In both scripts, you check for an internet connection by pinging an address and making sure you get a response. What happens when the ping fails? You enter a loop that says you will try again, but that never happens. The script looks at the contents of a variable to control when the loop should quit, but you never actually change the variable in the loop itself. The condition of the loop will never change, and you'll be stuck in it forever. All the script will do is say "trying again in 2 minutes", wait two minutes, and repeat that it's waiting again. I'm guessing the problem is that the connection goes down at some point, causing the script to enter the loop. The loops should be something like: Code:
#!/bin/bash
flushin=`ping -c 1 10.0.0.1 | grep "1 packets received"`
until [ -n "$flushin" ]
do
    echo "no connectivity, I will try again in 2 minutes"
    sleep 120
    flushin=`ping -c 1 10.0.0.1 | grep "1 packets received"`
done

<edit> Also, I noticed that one script calls the other. So, you only need to start one of them once, and that's it. If you have a cron job set up to launch them, then that's bad. You'll have multiple copies running all the time. If you want to use cron for it, then you would need to get rid of the until loops. If you want to use the until loops, then you ought to get rid of cron... </edit> |
Well, certainly they aren't working as I'd like, but your suggestion confuses me. (not surprising really, I'm a total bash noob)
To my understanding, the flushin= line is just declaring the variable; it's not actually doing any pinging. It was my (perhaps incorrect) assumption that the $flushin loop would test the variable (i.e. ping 10.0.0.1) once every 2 minutes until it is not null, at which point it would break out of the loop and continue to the next part of the script. While it remained null, it would echo the "couldn't connect" message, sleep for 2 minutes, and then try again. At least that's what various bash tutorials led me to believe. I don't see how repeating the variable declaration in the loop would do anything (constructive anyway). Dunno, maybe I'm wrong. Sam
|
You may have confused the bash syntaxes `` and $().
Here is my version of your first script: Code:
flushin=$(ping -c 1 10.0.0.1 | grep "1 received") |
It declares the variable, but does so by assigning a value to it. The backticks ( ` ` ) tell the script that you want to execute the text inside as a command, and assign the command's output to the variable.
Referencing the variable (like "$flushin" for instance) will not re-execute the command; it just means "use the contents of the variable here". The contents of the variable are the text output from the ping-grep command earlier. I'm absolutely positive on this one :) For example: Code:
#!/bin/bash
now=`date`
echo "$now"
sleep 2
echo "$now"

The second echo prints exactly the same text; referencing $now does not re-run date. The syntax keefaz mentioned is an alias for the pair of backquotes. It'll behave the same way as the original; it's just another way of saying the same thing. |
Yes, I have tested it more deeply; you're right, ping runs only once ;)
[edit] Why use a variable anyway? Code:
until ping -c 1 10.0.0.1 | grep "1 received" 1> /dev/null; do
    echo "no connectivity, I will try again in 2 minutes"
    sleep 120
done
|
'Cause I'm learning from books and webpages, and that's how most of them tell you to do it. I'm sure once I get further along I'll be able to make such decisions without training wheels, but right now I'm just trying to get a few things working until I actually HAVE some time to do more studying.
BTW, the advice from both of you has been very helpful. So far it seems to work fine. Sam |
Well, I edited the script above and checked that it works well.
I changed the wrong condition:

until $(ping -c 1 10.0.0.1 | grep "1 received")

to:

until ping -c 1 10.0.0.1 | grep "1 received" 1> /dev/null

so it works now. The 1> /dev/null is there so that ping produces no output on success. |
Keefaz's suggestion of putting the command in the condition is a good one. It guarantees the command will be executed, but might prove troublesome later on. If the script needs to know the success or failure of the ping command more than once, then you'd need to store something into a variable. For instance, say you had a script determine if a command failed, and then needed to interpret the output of the error and take appropriate action. In that case, you would need the contents of the error twice. Depending on the command, you may not be guaranteed to get the error again between runs (like a first ping fails because of net congestion, but the second succeeds because the traffic cleared in the time it took to get to the second ping).
I'm not saying Keefaz is wrong or that there's anything bad about his suggestion. It just depends on what a particular script needs. |
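A tiny sketch of the trade-off Matir describes (the `flaky_command` function is hypothetical, not from the thread): capture the output once into a variable when you need to both test it and report it later, since re-running the command might give a different answer.

```shell
#!/bin/bash
# Hypothetical stand-in for a failing ping: prints details, exits nonzero.
flaky_command() { echo "simulated: 0 packets received"; return 1; }

result=$(flaky_command)    # the command runs exactly once

if echo "$result" | grep -q "1 received"; then
    echo "success"
else
    # the same captured output is reused here for the error report,
    # with no risk of the command behaving differently a second time
    echo "failure, details: $result"
fi
```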
Well, I think as you do: looping every two minutes on a ping command is bad ;) but I was curious to solve this sort of loop problem in bash for learning purposes. It is obvious that this sort of script can't work in a production environment.
|
Well, the hope is to get this working and then make it elegant. So far it seems to be doing the job. Given the criteria of what the project is designed to do, can you think of a better way to automatically negotiate the switch between networks? (and I intend the tone for that last statement to ring with curiosity and not pomp) The goal is to not require any of the co-workers to have to use Linux to pump mail internally. The switch is hardware, and not operable from any sort of timer, and the intranet CANNOT have any direct internet connection.
I know that may sound a little unreasonable, but it's the boss's orders, and given some of the work we do, it's best not left to chance. Is there something else you would recommend that might be more stable in the long run? Timed cron schedules might work, except that given how the office sometimes gets, the switch wouldn't get flipped on time very often. Anyway, I'd love some input from more seasoned people. How would you approach the project? Thanks in advance, and thanks for the help already provided. Sam |
I realise that I was being a little closed-minded by saying the script is not suitable for a production environment. That's because, for me, bash scripts are for little tasks and I always try to keep their runtime short.
Why not use cron, say every quarter of an hour, to check and send mail? The advantage is that a script doesn't run indefinitely if it fails (it just waits a quarter hour and tries again). You could also use a file into which you write a value recording whether the mail run went fine or not, and then a second script would check the value in this file and act accordingly. |
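A minimal sketch of keefaz's status-file idea (the path and the `run_mail` stand-in are invented, not from the thread): the cron-driven script records how the last run went, and any other script can read that record and act on it.

```shell
#!/bin/bash
STATUS=/tmp/qflush.status     # invented path for the sketch

run_mail() { true; }          # stand-in for "qmailctl flush && fetchmail ..."

# First script: do the work and record the result.
if run_mail; then
    echo "ok" > "$STATUS"
else
    echo "failed" > "$STATUS"
fi

# Second script (or a later cron run): act on the recorded result.
last=$(cat "$STATUS")
echo "last mail run: $last"
```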
What if I were to set up a cron job with an 'if' conditional saying (in cron-speak, of course): if multiple instances of qflushIN/OUT are running, kill all instances; then, if neither qflushIN nor qflushOUT is running, ping 10.0.0.1; if it's reachable, run qflushIN, else run qflushOUT? Then I can have that script check up on qflush* 3-4 times a day.
That way if it's running fine, it will be uninterrupted; otherwise, it will clean the slate and start over. The next thing I need to learn is how to set up detailed log creation, and have the logs emailed to me, so I can be made aware of any undesirable activity. I assume that O'Reilly's Using Bash will get there eventually, but if you know any good sources I'm all ears. Thanks again. Sam |
You can add this at the top of your qflushIN and qflushOUT scripts:
Code:
echo $$ > /var/run/$0.pid

and this at the end:
Code:
rm -f /var/run/$0.pid

($0 is the current script's name.) So you have a way to know if a script is running, for example with qflushIN:
Code:
if [ -f /var/run/qflushIN.pid ]; then
    echo "qflushIN is already running"
    exit 1
fi
|
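One caveat worth hedging against (this refinement is not from the thread): a crash or reboot can leave a stale .pid file behind and lock the script out forever. Checking whether the recorded PID is still alive with `kill -0` avoids that:

```shell
#!/bin/bash
PIDFILE=/tmp/qflushIN.pid   # /var/run in the thread; /tmp here for the sketch

# Refuse to start only if the recorded PID is still a live process;
# a stale file left behind by a crash is silently replaced instead.
if [ -f "$PIDFILE" ] && kill -0 "$(cat "$PIDFILE")" 2>/dev/null; then
    echo "qflushIN already running as PID $(cat "$PIDFILE")"
else
    echo $$ > "$PIDFILE"
    echo "started, recorded PID $$"
    # ... mail work would go here ...
    rm -f "$PIDFILE"
fi
```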
Now you've got me thinking about this, and I'm the type that can't rest until I come up with some kind of solution. Curse you both! :)
This is how I would do things. I'm sure there are untold other ways that make more/less sense and are more/less efficient (my usual disclaimer). One script, two basic usages:

qflush local [local_ip]
qflush external [mailserver_ip]

Code:
#!/bin/bash

To use this script, I suggest setting up two cron jobs, one for local and one for external. Something like this in the crontab file: Code:
SHELL=/bin/bash |
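A combined script along the lines Matir describes could look something like the sketch below. This is a guess at the shape of the approach, not his actual code: the qmail/fetchmail commands are replaced with echos so the control flow can be followed without a mail setup, and every path is a placeholder.

```shell
#!/bin/bash
# Sketch: one script, two modes, as described in the post above:
#   qflush local    [local_ip]
#   qflush external [mailserver_ip]
qflush() {
    local mode=${1:-local} target=${2:-127.0.0.1}
    local pidfile=/tmp/qflush.pid          # placeholder path

    # only one copy at a time (keefaz's pidfile idea)
    [ -f "$pidfile" ] && return 1
    echo $$ > "$pidfile"

    # wrong side of the A/B switch? give up quietly; cron retries later
    if ! ping -c 1 "$target" > /dev/null 2>&1; then
        echo "no route to $target, waiting for the next cron run"
        rm -f "$pidfile"
        return 0
    fi

    echo "would run: /var/qmail/bin/qmail-tcpok && qmailctl flush"
    if [ "$mode" = external ]; then
        echo "would run: fetchmail -f /etc/fetchmail.conf"
    fi
    rm -f "$pidfile"
}

qflush "$@"
```

Driven from cron roughly as the crontab stub above suggests, e.g. two entries such as `*/15 * * * * /usr/local/bin/qflush local 10.0.0.1` plus a matching `external` line (the schedule and install path are invented here): whichever network the switch is actually on does the work, and the other invocation exits at the ping check.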