Regex in Linux does not work
Hi All,
Here's the script I was testing. In Linux my shell environment is bash. The objective is to test whether my input is a whole number like 1, 52 or 1000; if it is, the script says nothing, as expected. For any other input that doesn't match the criteria it says "error: Not a number" and quits.
#!/bin/bash
re='^[0-9]+$'
printf "`echo -n Enter a number or anything to test:` \n"
read char
if ! [[ "$char" =~ "$re" ]]; then
    echo "error: Not a number !!!" >&2; exit 1
fi
FYI the above script works fine on one Linux PC in a bash shell. But the same script, when used on another Linux PC which uses the Bourne shell (sh), does not work :( For all the whole numbers and everything else it gives the error message. Can somebody please help shed some light?
thanks, frodobag |
What distro is on that other PC?
|
Ubuntu
|
Quote:
In particular, I think the [[ ... ]] construct doesn't exist in the Bourne shell. What is the error message? EDIT: Your program doesn't work because of the following (from the bash reference guide): Quote:
If char has a value of, say, 'fdg^[0-9]+$lk', the expression [[ $char =~ "$re" ]] will be true. Thus, to check if char is a number, remove the quotes around $re. You can also remove the quotes around $char, since they are not needed inside [[ ... ]]. |
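The difference between the quoted and unquoted forms can be seen in a minimal sketch. It assumes bash 3.2 or later, where a quoted right-hand side of =~ is matched as a literal string; the function names are made up for illustration.

```bash
#!/bin/bash
re='^[0-9]+$'

# Unquoted $re: the right-hand side is treated as a regular expression.
is_whole_number() {
    [[ $1 =~ $re ]]
}

# Quoted "$re": the right-hand side is matched as a literal string.
matches_literally() {
    [[ $1 =~ "$re" ]]
}

for value in 52 abc 'fdg^[0-9]+$lk'; do
    is_whole_number "$value" && echo "regex match:   $value"
    matches_literally "$value" && echo "literal match: $value"
done
```

Only 52 should be reported as a regex match, and only the string that contains the pattern text itself should be reported as a literal match.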
No error message. It's just that when I run the script and type in 1, 52 or 1000, it erroneously outputs "error: Not a number !!!" instead of outputting nothing and quietly exiting, as would be expected.
|
Nope that didn't work. But oddly enough I was trying some variations and with re='^[0-9]+$' and now it works! But thanks all, at least it jogs my thoughts a bit.
|
A side note: I am using the above test script as part of a more complex script that reads a log file. When I type whole numbers on the keyboard, I guess the regex recognises them as numbers, which is correct. But when my more complex script extracts a value from the log file using grep, cut and sed, is it possible that the regex comparison above "sees" the value as text, even though it looks like a number to me? Maybe that's why it says "not a number"?
|
I don't use Bourne shell, but the bash abs guide notes that both the [[ ... ]] extended test and regex match aren't supported in Bourne and are portability issues.
|
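For a machine where the script has to run under plain sh, one possible workaround is a case pattern instead of [[ ... ]] and =~. This is only a sketch that restricts itself to POSIX shell features; the function name is made up for illustration.

```bash
#!/bin/sh
# Portable whole-number test: no [[ ... ]] and no =~ needed.
is_whole_number() {
    case $1 in
        '' | *[!0-9]*) return 1 ;;   # empty, or contains a non-digit
        *) return 0 ;;               # one or more digits only
    esac
}

for value in 52 1000 THISTEXT '1234-456:7:8' ''; do
    if is_whole_number "$value"; then
        echo "whole number: '$value'"
    else
        echo "error: Not a number: '$value'" >&2
    fi
done
```

The first pattern rejects the empty string and anything containing a character outside 0-9, so only strings made up entirely of digits reach the second branch.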
Yeah, the original sh shell is less capable than bash (hence the name ;) ).
I'd stick with the latter, unless of course you want to move up to eg Perl (which is red hot on regexes...) |
Quote:
Your bash fragment above says "not a number" because when your $re is surrounded by quotes, you match $char against a mere string, not a regular expression. Since 12345 doesn't contain the string ^[0-9]+$, the test fails. When you remove the quotes, $re is interpreted as a regexp and the test succeeds. You say "nope it doesn't work", but I wonder what it is that doesn't work? I am curious to see your code, your input and your output. Edit: I participate here in part because I learn. I didn't know about these details of =~ and would like to gain an even deeper understanding. I don't insist for insistence's sake. |
looks like you need to use:
Code:
# instead of "$re" |
There's a couple of good explanations/HOWTOs here
http://www.tldp.org/LDP/abs/html/regexp.html http://www.itworld.com/article/26933...pressions.html - this one has an example of your problem :) |
Thanks guys for the hints.
Here's a snippet of the other script:
#!/bin/bash
#re='^\d(\d)?(\d)?(\d)?(\d)?$ '
re='^[0-9]+$'
char=` ...just grepping some whole numbers from a log file here, like 1234 or 56, etc...`
if ! [[ $char =~ $re ]] ; then
    echo "error: Not a number !!!"
else
    echo " Whole number - good"
fi
I've tried...
if ! [[ "$char" =~ $re ]] ; then
...as well, along with other regexes, but the output was "error: Not a number" even when the char value was something like 1234, when I expected it to say " Whole number - good" instead. Only when the char value is text like THISTEXT, or contains special characters like 1234-456:7:8, should it say "error: Not a number". But as of now it says "error" for all 3 examples, so at this point it still doesn't work. I probably need to find an alternative to the =~ operator and [[..]] |
I think I should elaborate on char=` ...just grepping some whole numbers from a log file here, like 1234 or 56, etc...`
The way I use grep might be the problem. I use...
tac logfile | grep "(1.)" |grep -E '[0-9]{1,4}' | head -1
So a line like this gets selected from the logfile, for example:
(1.) This is a line 1234 and that's it. Date: 12-12-2015
So it will pick out 1234 correctly, but the extra grep -E command still doesn't help. While the grep result of 1234 is correct, the result of the comparison operator =~ is not; it always says "error: Not a number". |
Quote:
It would help if your "Not a number" message included the string (wrapped in quotes) that was rejected. |
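One way to follow that suggestion is sketched below. It assumes bash, whose printf builtin has a %q format that shell-quotes its argument and so makes stray whitespace or control characters visible; the function name and the sample values are hypothetical.

```bash
#!/bin/bash
re='^[0-9]+$'

# Reject anything that is not a whole number, and show exactly what was rejected.
check_number() {
    if ! [[ $1 =~ $re ]]; then
        # %q prints the value in shell-quoted form, exposing hidden characters.
        printf 'error: Not a number: %q\n' "$1" >&2
        return 1
    fi
}

# Hypothetical inputs: a clean number, a number with a trailing carriage
# return, and a whole line of text that merely contains a number.
for value in 1234 $'1234\r' 'a line 1234 here'; do
    check_number "$value" || true   # keep going after a rejection
done
```

A value that looks like 1234 on the terminal but carries an invisible extra character is then printed in a form that reveals it.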
Quote:
Code:
(1.) This is a line 1234 and that's it. Date: 12-12-2015
Try echoing the $char variable just before entering the if:
Code:
...
Edit: just noticed that rknichols already suggested the same, oh well... |
A remark in addition to rknichols' comment about grep: To debug your program, make ample use of echo, writing the contents of the variables in question to the screen. You can also switch debugging on and off using set -x and set +x, so that you see what happens in crucial parts of your program.
Had you done that, you would have seen that $char is indeed not a number and not wasted your time writing a post. Edit: Looks like this is at least the 3rd time this suggestion is made... |
Thanks guys,
I would love to post a sample of the log file, but unfortunately it's "classified". Hence I gave the example above just to illustrate. All I can say is that it's a bunch of text, numbers and special characters bunched up together. Besides the tac, grep and head commands I actually had to use sed and cut to "pluck out" the numbers I wanted. But unfortunately, depending on the time, it also plucks out text and special characters. Hence the need to compare using a regex to see whether it's a number, which I then want to print out. If it's text or something else, then discard it.
Oh, thanks berndbausch for the set -x suggestion, I was wondering about ways to debug. I will give that a try. |
here are the results of the debug
+ char=$'25064\001'
+ echo $'25064\001'
25064
+ [[ 25064 =~ ^[0-9]+$ ]]
+ echo 'error: Not a number !!!'
error: Not a number !!!
+ exit 1
So the odd thing is: what is that \001 doing there? Otherwise it seems to give out 25064 as a number |
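The trailing \001 is a control character, which is not a digit, so the regex is right to reject the value. One possible fix, sketched here under the assumption that control characters are the only unwanted extras, is to strip them before testing. The variable name clean is made up for illustration.

```bash
#!/bin/bash
re='^[0-9]+$'
char=$'25064\001'                 # the value shown in the trace above

# Remove every control character before testing (bash pattern substitution).
clean=${char//[[:cntrl:]]/}

if [[ $clean =~ $re ]]; then
    echo "Whole number - good: $clean"
else
    printf 'error: Not a number: %q\n' "$clean" >&2
fi
```

Piping the value through tr -d '[:cntrl:]' would be an equivalent approach inside the extraction pipeline itself.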
I did not initialise $char.
It's basically used as something like below to grep the number from a logfile:
char=`tac logfile | grep "(1.)" |grep -E '[0-9]{1,4}' | head -1` |
Quote:
As rknichols said, grep returns the whole line, not just the number. |
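This can be checked with the sample line from earlier in the thread. The sketch below assumes GNU grep and sed; the sed pattern is only an illustration that relies on the number following the word "line", as it does in that sample.

```bash
#!/bin/bash
line="(1.) This is a line 1234 and that's it. Date: 12-12-2015"

# grep prints every matching line in full, so the variable holds the whole line.
whole=$(printf '%s\n' "$line" | grep -E '[0-9]{1,4}' | head -n 1)

# sed with a capture group prints only the captured digits.
number=$(printf '%s\n' "$line" | sed -r -n 's/.*line ([0-9]+).*/\1/p')

printf 'grep gives: %s\n' "$whole"
printf 'sed gives:  %s\n' "$number"
```

The grep result is the complete line, which can never match ^[0-9]+$, while the sed result is just 1234.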
You should give us a usable sample so that we can help you construct a usable solution.
Personally I suggest you do the following:
Code:
char=$(awk '
/^\(1\.\).*[0-9]{1,4}/ {
    # parse lines, fetch relevant data
}
END {
    # print that value
}
' logfile)
On the other hand, the error message is correct: you tried to match a string which contained not only digits, but something else too. awk will also ensure you have a valid number. |
Thanks, I was thinking of a sample similar to the log lines I am facing:
(123456:789.123)-{ABCDE.12345=456789:1234:ABCD.FGHJK:1111-CVBN543TGYH10:4564611:12312:5645=POIJKJH}
(123456:890.456)-{POIU.12345=456711:567834:ABCD.FGHJK9:2223-YYN543TGYH10:46646PPOUY^^&%:5775=POIJKJH}
(123456:990.888)-{POIU.12775=456709:1234:ABCD.FGHJK8:223-YYN543TGUU%10:466PPOUY%^%11:10777975=POIJKJH}
I am trying to pick out the numbers after "11:". So in the first line I want the numbers 123 from "11:123", in the second line I want 5678 from "11:5678" and in the third I want 107779 from "11:107779". As you can see, the "11:" jumps between various positions as the log progresses; in this example it moves randomly among the same 3 positions. Not sure how I can achieve that with just awk. That's why I was using a bunch of tools like tac, sed, cut, etc. to pick out the latest one whenever I run the script. But I'm happy to explore any options. |
You can do that quite easily with sed:
Code:
sed -r -n 's/.*11:([0-9]+).*/\1/p' |
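A sketch of how that sed command might slot into the rest of the script follows. It assumes GNU tac and sed; the function names and the log file argument are hypothetical, and the result is validated with the unquoted regex discussed earlier.

```bash
#!/bin/bash
re='^[0-9]+$'

# Print the digits after the last "11:" on the newest matching line.
# $1 is the path to the log file (a hypothetical argument).
latest_value() {
    tac "$1" | sed -r -n 's/.*11:([0-9]+).*/\1/p' | head -n 1
}

# Validate the extracted value before using it.
report_latest() {
    local value
    value=$(latest_value "$1")
    if [[ $value =~ $re ]]; then
        echo "$value"
    else
        printf 'error: Not a number: %q\n' "$value" >&2
        return 1
    fi
}
```

Calling report_latest logfile would then print the number, or fail with the rejected value shown in quoted form if nothing usable was found.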
Many thanks rknichols! It worked! :) I tested many times and it has worked flawlessly so far; when the numbers moved from 4 digits to 5 digits it was correct each time. Thanks, now I can proceed with the rest of the script.
|
I thought I'd have a look at what would happen if an '11:' also appeared near the start of the first record
Code:
(123456:789.123)-{ABCDE.12345=456711:1234:ABCD.FGHJK:1111-CVBN543TGYH10:4564611:12312:5645=POIJKJH}
Could you elucidate? Thx |
Greediness,
Pretty dodgy to rely on it tho' ... |
Hey syg00,
how are you ? ;) So you're saying the first '.*' effectively causes it to match the last occurrence? I did a quick couple of tests and that seems to be the case. How would you specify the first match then? (this is worrying; I used to be ok at regexes...) |
Yep - it matches the entire text. Then the regex engine starts working backwards until the next regex element matches.
Rinse, shake, repeat. Of course, regex ain't regex. :p |
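The backtracking described above can be observed with the record that contains two occurrences of "11:". This is a small sketch assuming GNU sed; the variable names are made up.

```bash
#!/bin/bash
# The record from the post above, with "11:" occurring twice.
rec='(123456:789.123)-{ABCDE.12345=456711:1234:ABCD.FGHJK:1111-CVBN543TGYH10:4564611:12312:5645=POIJKJH}'

# The leading .* is greedy: it first swallows the whole line, then backs off
# until "11:" plus digits can match, so the last occurrence wins.
last=$(printf '%s\n' "$rec" | sed -r -n 's/.*11:([0-9]+).*/\1/p')
echo "$last"    # prints 12312
```

The earlier "11:1234" is skipped because the leading .* has already consumed it by the time a match is found.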
True; I need to re-read Friedl's book. Funnily enough I was planning to revise that today, hence the interest.
So (as per my last qn) how would you force it to match 1st occurrence instead? EDIT: oh yeah; just started doing it in Perl : with the front & back wildcards you get the last one; without wildcards you get the first. Still working on the sed version (I always was a bit iffy with sed...) |
Sorry - forgot that q.
You can't in sed. You need non-greedy quantifiers, which, last I looked, sed doesn't support. perlre is simplest soln. |
That explains why I'm banging my head on the desk.... its so nice to stop :)
|
what about this?
sed -r -n 's/11:([0-9]+).*/11:\1/;s/.*11://p' |
You can't generalise it for an indeterminate number of occurrences of the required string (in sed)
|
Obviously not, but it can find the first occurrence
|
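Both the two-step sed command and a bash-only alternative can be tried on the record with two occurrences of "11:". The sketch assumes GNU sed and bash; the bash variant relies on =~ reporting the leftmost match, with the captured group stored in BASH_REMATCH. The variable names are made up.

```bash
#!/bin/bash
rec='(123456:789.123)-{ABCDE.12345=456711:1234:ABCD.FGHJK:1111-CVBN543TGYH10:4564611:12312:5645=POIJKJH}'

# Two-step sed from the post above: cut the line off after the first
# "11:<digits>", then strip everything up to that "11:".
first_sed=$(printf '%s\n' "$rec" | sed -r -n 's/11:([0-9]+).*/11:\1/;s/.*11://p')

# bash alternative: with no leading wildcard, the leftmost match is captured.
first_bash=''
if [[ $rec =~ 11:([0-9]+) ]]; then
    first_bash=${BASH_REMATCH[1]}
fi

echo "$first_sed"     # prints 1234
echo "$first_bash"    # prints 1234
```

Neither variant generalises to picking an arbitrary n-th occurrence, which is the limitation raised above.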
Now that would impress even Linus
|