[SOLVED] Shell Scripting: Scraping Public IP and Emailing

zer0signal · 07-03-2011, 01:24 PM

Ok, so I wrote a simple script to scrape whatismyip.org for my public IP address and then email it to me if it has changed. I set this up as a cron job every 30 minutes.

The problem is, it works for about 18 hours and then fails with "line 33: [: too many arguments".

Line 33 happens to be my if statement. Like I said, it works fine for about 18 hours. I run the script with 2>&1 redirected to a .out file so I can see what it's doing...

This is the contents of the .out file (the real file shows the IP where you see "xxx"):

Code:



xxx.xxx.xxx.xxx 
already in file 
xxx.xxx.xxx.xxx 
already in file 
xxx.xxx.xxx.xxx 
already in file 
xxx.xxx.xxx.xxx 
already in file 
xxx.xxx.xxx.xxx 
already in file 
xxx.xxx.xxx.xxx 
already in file 
 
/opt/public-ip/public-ip.sh: line 33: [: too many arguments 
[<-] 220 mx.google.com ESMTP r33sm4143833qcs.42

here is the code for my script

Code:



 23  
 24  
 25 # Obtain the current public IP Address 
 26  
 27 IP=`curl -s http://www.whatismyip.org` 
 28 echo $IP 
 29  
 30  
 31 #  see if new ip is already in the file 
 32  
 33         if [ $IP = `grep $IP /opt/public-ip/currentip` ];then 
 34                 echo "already in file" 
 35         else 
 36                 echo $IP >> /opt/public-ip/currentip 
 37  
 38 #               ssmtp -vvv no@no.com < /opt/public-ip/currentip 
 39 #               ssmtp -vvv no@no.com < /opt/public-ip/currentip     
 40 #               ssmtp -vvv no@no.com < /opt/public-ip/currentip 
 41         fi 
 42  
 43 exit 0

So it is something with my if statement... I don't know what it could be. Like I said, it works right for 18 hours, then has a hiccup, shows "too many arguments", and emails me.

Could it be that my curl statement is returning something that does not match what's in the currentip file? If so, what kind of logic can I use to weed that out?

Please any suggestions are very welcome...

Thanks!

Tinkster · 07-03-2011, 01:53 PM

I'll hazard a guess and say it (curl) didn't return anything, as
the line above the error (where you echo $IP) is empty.

zer0signal · 07-03-2011, 02:14 PM

So if I put an if statement right after curl, like below, and have it exit if the result is "" (blank), that should weed out my empty returns, right?

When I look through the log for a few days, I don't see any letters or digits when it fails...

But what if the output is 3 spaces and not just ""? I don't think that if statement would work then.

Code:



IP=`curl -s http://www.whatismyip.org`
echo $IP 

if [ $IP == "" ];then
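A minimal sketch of a check that treats both "" and whitespace-only output as "nothing returned". The helper name is_blank is made up for illustration, and the hard-coded IP value stands in for the curl call so the sketch runs without network access:

```shell
#!/bin/sh
# Sketch only: is_blank is a hypothetical helper, not part of the original script.

# Succeed when the argument is empty or consists only of whitespace.
# tr deletes every whitespace character; -z then tests for an empty result.
is_blank() {
    [ -z "$(printf '%s' "$1" | tr -d '[:space:]')" ]
}

# In the real script this would be: IP=$(curl -s http://www.whatismyip.org)
IP="   "

if is_blank "$IP"; then
    echo "no address returned, skipping this run"   # the real script could exit here
else
    echo "got: $IP"
fi
```

Note that the variable is quoted everywhere it is used. Unquoted, an empty or all-space $IP simply disappears from the command line, which changes the number of arguments the test receives.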

Reuti · 07-03-2011, 02:14 PM

Quote:

Originally Posted by zer0signal

Code:

 33         if [ $IP = `grep $IP /opt/public-ip/currentip` ];then

Won’t it return more than one line in case you have:

Code:

$ cat file
1.2.3.1
1.2.3.11
1.2.3.111
$ grep 1.2.3.1 file
1.2.3.1
1.2.3.11
1.2.3.111
$ grep -e "^1.2.3.1$" file
1.2.3.1

I would even replace the comparison inside the if statement by number of found occurrences:

Code:

$ if [ $(grep -c -e "^$IP$" file) -gt 0 ]; then
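One caveat with the anchored pattern is that the dots are still regular-expression wildcards. A sketch of the counting idea using grep's -F (fixed string) and -x (whole line) options instead, assuming a throwaway temp file in place of the real list; count_loose and count_exact are names invented for this example:

```shell
#!/bin/sh
# Sketch only: count_loose and count_exact are hypothetical helpers, and the
# addresses are the sample values from the post above.

# Count lines that contain the pattern anywhere (dots act as regex wildcards).
count_loose() { grep -c -e "$1" "$2"; }

# Count lines that equal the string exactly:
#   -F  treat the pattern as a fixed string, not a regular expression
#   -x  match whole lines only
count_exact() { grep -c -F -x -e "$1" "$2"; }

list=$(mktemp)
printf '%s\n' 1.2.3.1 1.2.3.11 1.2.3.111 > "$list"

echo "loose: $(count_loose 1.2.3.1 "$list")"   # counts all three lines
echo "exact: $(count_exact 1.2.3.1 "$list")"   # counts only the first line

if [ "$(count_exact 1.2.3.1 "$list")" -gt 0 ]; then
    echo "already in file"
fi

rm -f "$list"
```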

zer0signal · 07-03-2011, 02:45 PM

@Reuti - I get what you are saying, I did not think of that. I will adjust my code to compensate for multiple close entries, and I do like the idea of going with a greater than count!

Thanks!

grail · 07-03-2011, 07:21 PM

Why not just let if do the work for you:

Code:

if grep -q $IP /opt/public-ip/currentip
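A sketch of how that could look as a complete check, with the variable quoted and an exact-line match so that 1.2.3.1 does not match 1.2.3.11. The function name record_ip and the temp file are assumptions for the demo; the real script would use /opt/public-ip/currentip:

```shell
#!/bin/sh
# Sketch only: record_ip and the temporary file are hypothetical stand-ins
# for the script's logic and /opt/public-ip/currentip.

# Append the address to the file unless an identical line is already there.
# Prints what it did, mirroring the messages in the original script.
record_ip() {
    ip=$1
    file=$2
    if [ -z "$ip" ]; then
        echo "empty address, nothing recorded"
    elif grep -q -F -x -e "$ip" "$file"; then
        echo "already in file"
    else
        printf '%s\n' "$ip" >> "$file"
        echo "added"
    fi
}

f=$(mktemp)
record_ip 1.2.3.11 "$f"   # added
record_ip 1.2.3.1  "$f"   # added (1.2.3.11 is not an exact match)
record_ip 1.2.3.1  "$f"   # already in file
record_ip ""       "$f"   # empty address, nothing recorded
rm -f "$f"
```

The empty check comes first because an empty pattern would otherwise match every line of the file.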

Nominal Animal · 07-03-2011, 07:44 PM

Quote:

Originally Posted by zer0signal

Code:

 33         if [ $IP = `grep $IP /opt/public-ip/currentip` ];then

You are missing a space between the semicolon and then.

David the H. · 07-03-2011, 11:48 PM

This might help you.

I wrote a function for a script a while back that queries my external ip. It hits several sites on a random basis, and you can easily add or subtract from the list as you find them. It currently has seven working sites. All you need is a URL that sends back the address as a simple text string.

Code:

get_current_ip() {
     local -a ipsite
     local -i xc num i

     ipsite=(
               "http://automation.whatismyip.com/n09230945.asp"
               "http://showip.codebrainz.ca/"
               "http://cfaj.freeshell.org/ipaddr.cgi"
               "http://icanhazip.com/"
               "http://wooledge.org/myip.cgi"
               "http://ifconfig.me/ip"
               "https://secure.informaction.com/ipecho/"
            )

     xc=1
     num=${#ipsite[@]}

     until (( xc == 0 )) ; do
          (( i = RANDOM % num ))
          ip=$( wget -t 1 -T 5 -q -O- "${ipsite[i]}" )
          xc=$?
     done

     echo -n "${ip//[^0-9.]}"
}

Edit: note that this is assuming you are using bash. And speaking of which...

1.) $(..) is highly recommended over `..`

2.) Consider using bash's new "[[" test instead of "[".

Nominal Animal · 07-04-2011, 02:31 AM

Quote:

Originally Posted by David the H.

2.) Consider using bash's new "[[" test instead of "[".

I must say I consider that extremely bad advice. I do realise you did say you assume Bash is used, but still.

First, original poster did not specify Bash, and the script name implies a generic POSIX shell (.sh suffix). Second, I believe [[ to be harmful.

[[ is Bash-specific. It is not a POSIX shell feature; you cannot assume it is implemented in anything but Bash. In many distros nowadays /bin/sh is symlinked to dash (which is a good idea, in my opinion, since it is an actual POSIX shell), which does not support [[ .
Therefore, you can only rely on it being available if you use Bash explicitly (#!/bin/bash or equivalent).

The bigger reason is that the word-splitting no-escaping feature of [[ is a very close logical cousin of the magic quotes concept in PHP. Over a few years it was found that magic quotes in PHP produce security problems, since they allowed developers to ignore proper quoting and escaping rules. This, in my opinion, was a major reason why PHP was/is perceived as an inherently insecure language. Like [[, magic quotes were heavily recommended for use early on; even being enabled by default for PHP installations at one point.

Magic quotes in PHP are being phased out, both in the language and configuration files. The PHP site documents all of them to be deprecated, and states that the configuration options will definitely be removed in future versions.

I fail to see any significant differences between these two features. Both were intended to ease the rules for script writers, by overriding standard quoting and escaping rules. (Well, magic quotes were marketed as a security feature; perhaps some consider that a big difference.) Therefore, I suspect history will repeat itself, and that teaching users to prefer [[ over [ will lead to significant problems later on -- specifically, in a failure to understand proper quoting and escaping rules, this time in shell scripts. Personally, I'd recommend biting the bullet and learning them early on, and always applying them, even when technically not required.

Pattern matching can just as easily be done using a case statement, but I do like the regular expression matching [[ has.
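As an illustration of the case approach in plain POSIX sh, here is a sketch that checks whether a string has the shape of a dotted-quad address. The function name looks_like_ipv4 is invented for this example, and it only checks the shape, not that each group is in the 0-255 range:

```shell
#!/bin/sh
# Sketch only: looks_like_ipv4 is a hypothetical helper. It checks the shape
# (four dot-separated groups of digits), not that each group is 0-255.
looks_like_ipv4() {
    case "$1" in
        *[!0-9.]*)   return 1 ;;  # any character other than digits and dots
        *.*.*.*.*)   return 1 ;;  # more than three dots
        .*|*.|*..*)  return 1 ;;  # empty group at the start, end or middle
        *.*.*.*)     return 0 ;;  # exactly four non-empty groups
        *)           return 1 ;;  # fewer than three dots, or empty
    esac
}

for candidate in 192.0.2.7 "" "   " 1.2.3 "1.2.3.4 5" 1..2.3; do
    if looks_like_ipv4 "$candidate"; then
        echo "ok:  '$candidate'"
    else
        echo "bad: '$candidate'"
    fi
done
```

Only the first candidate is accepted; the empty string, the run of spaces, and the malformed addresses all fall through to a failure branch.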

konsolebox · 07-04-2011, 03:25 AM

Quote:

Originally Posted by zer0signal

Code:

if [ $IP = `grep $IP /opt/public-ip/currentip` ];then

you just have to place them inside quotes to fix it

Code:

if [ "$IP" = "`grep "$IP" /opt/public-ip/currentip`" ]; then
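A small sketch of why the quotes matter to [ . The two helper functions and the sample values are made up for the demo; FOUND imitates a grep that matched two lines:

```shell
#!/bin/sh
# Sketch only: both helpers are hypothetical and exist just for this demo.

# Compare without quotes: the arguments are word-split before [ sees them.
unquoted_test() {
    if [ $1 = $2 ] 2>/dev/null; then echo "equal"; else echo "not equal or error"; fi
}

# Compare with quotes: each side reaches [ as exactly one argument.
quoted_test() {
    if [ "$1" = "$2" ]; then echo "equal"; else echo "not equal"; fi
}

IP="1.2.3.1"
# Imitates a grep that matched two lines of the file.
FOUND="1.2.3.1
1.2.3.11"

unquoted_test "$IP" "$FOUND"   # [ receives four words plus ], which is an error
quoted_test   "$IP" "$FOUND"   # valid test, but the strings differ
quoted_test   "$IP" "$IP"      # valid test, strings match
```

Quoting removes the "too many arguments" error, but when grep returns several lines the comparison is simply false, so the address would be appended again. Matching whole lines only in grep avoids that case.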

konsolebox · 07-04-2011, 04:08 AM

Quote:

Originally Posted by Nominal Animal

Second, I believe [[ to be harmful.

The bigger reason is that the word-splitting no-escaping feature of [[ is a very close logical cousin of the magic quotes concept in PHP. Over a few years it was found that magic quotes in PHP produce security problems, since they allowed developers to ignore proper quoting and escaping rules. This, in my opinion, was a major reason why PHP was/is perceived as an inherently insecure language. Like [[, magic quotes were heavily recommended for use early on; even being enabled by default for PHP installations at one point.

Magic quotes in PHP are being phased out, both in the language and configuration files. The PHP site documents all of them to be deprecated, and states that the configuration options will definitely be removed in future versions.

Are they really that similar? Aren't there some differences in how scripts are handled? I also believe it just depends on the version's script or code, and parser or generator of bash.

Quote:

I fail to see any significant differences between these two features.

Lots of good differences, but those differences don't really matter anymore once you get used to the two and know how to take advantage of them, in the proper way that is. What I like about [[ though is the speed in parsing and the cleaner syntax, e.g. [[ NUMBER -op N || $VAR = "$VAR2" ]].

Quote:

Pattern matching can just as easily be done using a case statement, but I do like the regular expression matching [[ has.

Depends. Sometimes it's a lot easier with [[. But then, parsing with case statements is quicker. Again, it is in how you take advantage of their features.

sh is a universal shell but bash is still the most popular or most distributed shell around and easier (and safer) to code in my opinion.

also, you can't do this properly in sh:

Code:

VALUES='1 2 3 4 * abc[1] 1234?'
for A in $VALUES; do
    echo "$A"
done

to yield:

Code:

1
2
3
4
* <- expect different output
...

when there's a file around. do we have to use sed to fix that?

Code:

while read -d ' '; do echo "$REPLY"; done <<< "$VALUES "
read -a VALUES_A <<< "$VALUES"; IFS=$'\n' eval "echo \"\${VALUES_A[*]}\""

` also has some issues. i already forgot them though.

some scripts also do this:

Code:

NEWLINE="
"
IFS="$NEWLINE"; for A in `grep exp BIGFILE.TXT`; do ... ; done

which is very expensive, since it reads all the output of `..` into memory at once, splits it according to the value of IFS, and parses it as one command. It may also cause problems if BIGFILE.TXT contains glob expressions.

Code:

while read A; do ...; done < <(exec grep exp BIGFILE.TXT)

David the H. · 07-04-2011, 08:04 AM

@NominalAnimal:

That's why I deliberately used the word "consider". It all depends on how portable the script needs to be, and what the coder is willing to deal with. The link I gave fully details the positives and negatives of both forms, so it shouldn't be hard to decide which suits you the best. The very last paragraph even says it explicitly:

Quote:

When should the new test command [[ be used, and when the old one [? If portability to the BourneShell is a concern, the old syntax should be used. If on the other hand the script requires BASH or KornShell, the new syntax is much more flexible.

This applies to all shell features, really. If you can be reasonably certain that a script will never be executed in a non-bash environment, then there's no real reason to stick to POSIX-only syntax. I say take full advantage of what your shell has to offer whenever you can, and don't arbitrarily limit yourself to a less convenient subset of features. POSIX-compliance mode will still be there for when you really need it.

Bash features and their posix equivalents

Now I have no idea exactly what the "magic quotes" thing in PHP is, but from your description I don't see the same level of worry happening here. They aren't going out of their way to override the need for quoting everywhere, they simply defined the [[ keyword so that it doesn't perform globbing or word-splitting after expansions. A variable or other substitution is always treated as a single element when inside the double-brackets, no matter what the contents. That's all. It's a specific, localized parsing rule that helps to avoid a lot of syntax problems that plague older tests. It doesn't mean you don't still have to be very careful everywhere else.

Don't read me wrong, either. Of course I agree with you that it's important to learn proper quoting. I just doubt very highly that this single exception is going to lead to the quoting/security slippery-slope-disaster you envision.

"$()" is specified by posix, by the way, so as long as you aren't using a truly ancient shell there's no real reason to use anything else.

Nominal Animal · 07-04-2011, 05:37 PM

(I managed to confuse this thread with the thread at hand; therefore the edit. The other thread talks about system-level scripting: startup scripts, service scripts, cron jobs, for which I recommend the POSIX shell, namely dash . For general scripting, I much prefer Bash, and am happy to use bash-specific features. Except for [[ .)

I would prefer not to advocate the use of [[ in bash for novice users. I personally avoid it in all my scripts, because the more traditional [ and case work well for my needs.

I do prefer bash over any other shell for general utility scripts. If you check out the shell scripts I've written here, they almost invariably use explicitly Bash. My use of `..` is an anachronism I'd prefer to get rid of, but the only big-endian machine I have access to runs SunOS 5.10, which has an ancient sh that does not support $(..). There are members here in a similar situation, so I tend to use `..` instead of $(..).

Quote:

Originally Posted by konsolebox

also, you can't do this properly in sh:

Code:

VALUES='1 2 3 4 * abc[1] 1234?'
for A in $VALUES; do
    echo "$A"
done

to yield:

Code:

1
2
3
4
* <- expect different output

Sure you can. You simply add a set -f before the for loop to disable pathname expansion. You can add set +f as the first thing in the loop body, if you need pathname expansion in the loop body. Remember to add set +f after the loop to turn pathname expansion back on.
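A minimal sketch of that approach in POSIX sh: set -f switches pathname expansion off, and set +f switches it back on. The function name split_words and the scratch directory are invented for the demo; the directory is there so that * and abc[1] would have something to match if globbing were active:

```shell
#!/bin/sh
# Sketch only: split_words and the scratch directory are hypothetical.

# Print each whitespace-separated word of $1 on its own line, without
# letting the shell expand glob characters such as * ? and [..].
split_words() {
    set -f                # disable pathname expansion
    for word in $1; do    # word splitting still happens, globbing does not
        printf '%s\n' "$word"
    done
    set +f                # re-enable pathname expansion
}

dir=$(mktemp -d)
# Run in a subshell so the caller's working directory is left alone.
( cd "$dir" && : > abc1 && : > file1 && split_words '1 2 3 4 * abc[1] 1234?' )
rm -rf "$dir"
```

Each of the seven words is printed literally, including * and abc[1], even though matching files exist in the directory.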

But I'm still not advocating any POSIX shell over bash in general. (I just recommend using dash for startup scripts and such.)

Quote:

Originally Posted by konsolebox

` also has some issues. i already forgot them though.

Absolutely. I don't advocate its use either. $(..) is superior over `..` , and happens to be standard in POSIX shells too. No reason to not use $(..).

Unless you do use an ancient version of sh, like I do.
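One place where the difference shows is nesting. A short sketch, with function names invented for the example, that prints the name of the directory containing a path in both styles:

```shell
#!/bin/sh
# Sketch only: both functions are hypothetical; they print the name of the
# directory that contains the given path.

# $(..) nests without any escaping, and the inner quotes are independent.
parent_name_new() {
    basename "$(dirname "$1")"
}

# The same thing with backticks: the inner pair has to be backslash-escaped.
parent_name_old() {
    echo `basename \`dirname $1\``
}

parent_name_new /opt/public-ip/currentip
parent_name_old /opt/public-ip/currentip
```

Both calls print public-ip. The backtick version gets harder to read with each additional level of nesting, and adding quotes inside it requires further escaping.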

Quote:

Originally Posted by konsolebox

some scripts also do this:

Code:

NEWLINE="
"
IFS="$NEWLINE"; for A in `grep exp BIGFILE.TXT`; do ... ; done

which is very expensive, since it reads all the output of `..` into memory at once, splits it according to the value of IFS, and parses it as one command. It may also cause problems if BIGFILE.TXT contains glob expressions.

Code:

while read A; do ...; done < <(exec grep exp BIGFILE.TXT)

The obvious alternative in POSIX shells,

Code:

grep exp BIGFILE.TXT | while read -r LINE ; do ... ; done

does stream the grep results line by line, but the loop body is a subshell (making it difficult to pass results outside). The -r parameter tells the shell to not interpret backslash escapes.
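The result-passing problem can be seen with a simple counter. In this sketch the function names are made up and printf stands in for grep; the second function shows one POSIX-only workaround, a here-document fed by a command substitution, which keeps the loop in the current shell at the cost of buffering all the input first:

```shell
#!/bin/sh
# Sketch only: both functions are hypothetical; printf stands in for grep.

# Loop on the right-hand side of a pipe. In bash and dash that side runs
# in a subshell, so the increments to n are lost when the loop ends.
# (ksh and zsh run the last pipeline stage in the current shell.)
count_via_pipe() {
    n=0
    printf '%s\n' "$@" | while read -r LINE; do n=$((n + 1)); done
    echo "$n"
}

# Loop fed by a here-document. No pipe is involved, so the loop runs in
# the current shell and n keeps its final value.
count_via_heredoc() {
    n=0
    while read -r LINE; do n=$((n + 1)); done <<EOF
$(printf '%s\n' "$@")
EOF
    echo "$n"
}

echo "pipe:     $(count_via_pipe one two three)"
echo "heredoc:  $(count_via_heredoc one two three)"
```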

I personally like the Bash-specific <( list ) a lot. It solves very cleanly the aforementioned result-passing problem. However, the way it does it -- the expression resolves to a file name, with the "file" containing the data -- is perhaps a bit surprising. For example, this command will work just fine:

Code:

dd if=<( for A in one two three ; do echo "$A" ; done )

It is not an actual file, but a path to the read end of the pipe, usually in /proc/self/fd/. For example, command

Code:

echo <( true )

will echo the path to the read end of the pipe instead of accessing the pipe.

Quote:

Originally Posted by David the H.

@NominalAnimal:

That's why I deliberately used the word "consider". It all depends on how portable the script needs to be, and what the coder is willing to deal with.

I misread it as a polite recommendation. I didn't notice the reservation, sorry.

Quote:

Originally Posted by David the H.

The link I gave fully details the positives and negatives of both forms, so it shouldn't be hard to decide which suits you the best. The very last paragraph even says it explicitly [.]

Here I disagree strongly.

The linked page does not even acknowledge the risk in not understanding proper escaping and quoting rules, it just tells users to "do this, and you don't need to worry about it".

(That is exactly what happened with magic quotes in PHP. Magic quotes were a feature that was supposed to help prevent SQL injection attacks. Whatever input a PHP script received from a HTML form, had single quotes ('), double quotes ("), backslashes (\) and NULs (zero byte) automatically escaped with a backslash. Sounds perfectly reasonable, doesn't it? And yet, it ended up causing a lot of grief instead.)

I personally would prefer users learned the quoting and escaping rules first (Bash uses POSIX rules AFAIK), and apply them always, even when not technically required. That solves a huge number of problems at once, from whitespace in file names to non-ASCII support.

In all the cases I'm aware of, creating exceptions to common rules to ease programming has resulted in more problems than it has solved. I hate seeing an error repeated.

Quote:

Originally Posted by David the H.

They aren't going out of their way to override the need for quoting everywhere, they simply defined the [[ keyword so that it doesn't perform globbing or word-splitting after expansions. A variable or other substitution is always treated as a single element when inside the double-brackets, no matter what the contents. That's all. It's a specific, localized parsing rule that helps to avoid a lot of syntax problems that plague older tests. It doesn't mean you don't still have to be very careful everywhere else.

You may be right, but I'm personally not convinced. I believe pointing new users at [[ will result in even more fragile scripts in the future, since fewer users will consider quoting rules at all.

I deal with a lot of (job submission) scripts written by a lot of different users, mostly in Bash. They're typically very fragile, mostly due to total ignorance of quoting rules. Most work only because all our path components happen to consist of just letters and numbers. The scripts tend to break at the smallest hint of change. Fortunately, most users seem to reuse known working scripts, so script failures are not common enough to be considered a problem; only when any kind of changes occur.

With very little effort, the scripts could be robust. Being aware of [[ first will reduce the incentive to understand quoting and escaping rules to practically nil. In beginner scripts, I mostly see commands, variable assignments, and the if conditional. Often beginners perceive the need to quote command parameters as specific to that command -- and of course it is not, it is needed for and used by the shell to determine where the parameter boundaries are. Natively, each parameter is a separate string.
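A tiny sketch of that point: the shell decides the parameter boundaries before any command runs, so the same rule applies to every command. The helper count_args and the file name are invented for illustration:

```shell
#!/bin/sh
# Sketch only: count_args is a hypothetical helper that reports how many
# separate parameters the shell passed to it.
count_args() {
    echo "$#"
}

NAME="monthly report.txt"

count_args $NAME      # unquoted: split on the space into two parameters
count_args "$NAME"    # quoted: one parameter, space preserved
```

The first call prints 2 and the second prints 1. Any command given the unquoted form would receive two separate names, neither of which is the intended file.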

Quote:

Originally Posted by David the H.

Don't read me wrong, either. Of course I agree with you that it's important to learn proper quoting. I just doubt very highly that this single exception is going to lead to the quoting/security slippery-slope-disaster you envision.

Sure, and that's ok. I'm often wrong. I did react strongly, because I do find your advice to others helpful, trustworthy and quite valuable.

chrism01 · 07-04-2011, 09:02 PM

I like the comments here http://tldp.org/LDP/abs/html/testcon...ml#DBLBRACKETS and certainly [[ ]] is avail in ksh as well as bash; possibly others.
The main rule I recommend for any prod scripts is to always specify the desired shell in the #! line, so that if not found the script simply dies immediately, instead of defaulting to the 'current' shell and possibly doing unexpected things.

grail · 07-04-2011, 10:28 PM

Quote:

Originally Posted by Nominal Animal

The linked page does not even acknowledge the risk in not understanding proper escaping and quoting rules, it just tells users to "do this, and you don't need to worry about it".

I am not sure where you got this sentiment from? As with many others I often like / respect anything you have to write,
but here I was not sure we are necessarily looking at the same link??

The page makes constant reference to the fact that if you wish to remain POSIX compliant that you should use [ over [[, eg.

Quote:

[[ is a new improved version of it, which is a keyword, not a program. This has beneficial effects on the ease of use, as shown below. [[ is understood by KornShell and BASH (e.g. 2.03), but not by the older POSIX or BourneShell.

#And

When should the new test command [[ be used, and when the old one [? If portability to the BourneShell is a concern, the old syntax should be used.

It does also mention that there are subtle 'differences' and identifies that quoting need not be done when using [[,
but I did not read this as "should not be done".

Personally I go with the old adage, use the right tool for the job. So to the OP, I will reiterate what others have said
(aside from the above discussion), if POSIX compliance is a must then I agree that [ should be the choice and many have
offered appropriate solutions to work out your issue, but if not, then letting 'if' do the work or [[ take away
the worry of using an empty variable can be alternatives.