[SOLVED] Regex in Linux does not work

rknichols · 11-10-2015, 10:12 PM

Quote:

Originally Posted by frodobag

grep "(1.)" |grep -E '[0-9]{1,4}'

That grep is returning the whole line that contains the match, not just the part that matches. Maybe you have a line that looks like it contains just a number, but any trailing white space will be included. You need to include the "-o" (--only-matching) option to ensure that just the part of the line that matches is returned.

It would help if your "Not a number" message included the string (wrapped in quotes) that was rejected.
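To illustrate the difference, here is a small sketch using the example line from this thread. Note that "-o" prints every match on its own line, so the pattern also has to be specific enough. The `line [0-9]{1,4}` pattern below is only an assumption about where the wanted number sits, not something taken from the real log.

```bash
#!/usr/bin/env bash
line="(1.) This is a line 1234 and that's it. Date: 12-12-2015"

# Without -o: grep prints the whole line that contains a match.
whole=$(printf '%s\n' "$line" | grep -E '[0-9]{1,4}')

# With -o: grep prints each matching piece on its own line.
# Here that is five pieces (1, 1234, 12, 12, 2015), not just the wanted one.
parts=$(printf '%s\n' "$line" | grep -oE '[0-9]{1,4}')

# Assumption: the wanted number follows the word "line". Matching that
# context first, then keeping only the digits, leaves a single value.
number=$(printf '%s\n' "$line" | grep -oE 'line [0-9]{1,4}' | grep -oE '[0-9]+')

printf 'whole:  "%s"\n' "$whole"
printf 'number: "%s"\n' "$number"
```

Wrapping the values in quotes when printing makes any trailing whitespace visible.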

Diantre · 11-10-2015, 10:33 PM

Quote:

Originally Posted by frodobag

The way I use to grep might be the problem.

I use... tac logfile | grep "(1.)" |grep -E '[0-9]{1,4}' | head -1

So the line in the logfile gets selected for example.... (1.) This is a line 1234 and that's it. Date: 12-12-2015

Could you post a sample of your log file? I agree that the grep line could be causing the problems. If your input line is

Code:

(1.) This is a line 1234 and that's it. Date: 12-12-2015

and you pipe it to that grep command, you'll get the whole line, not only the integers.

Try echoing the $char variable just before entering the if:

Code:

...
echo "char: $char"
if ! [[ "$char" =~ $re ]] ; then
...

This way you can verify what exactly is in $char. Another debugging method is to use the "set -x" command at the beginning of the script.
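A rough sketch of the "set -x" approach, limited to the part of the script that matters. The value of char here is a made-up example with a trailing space, not something from the real log, and `msg` is just a name chosen for this illustration.

```bash
#!/usr/bin/env bash
re='^[0-9]+$'
char='1234 '          # hypothetical value: digits plus a trailing space
msg='ok'

set -x                # from here on, bash prints each command to stderr before running it
if ! [[ "$char" =~ $re ]]; then
    msg="error: Not a number: '$char'"
fi
set +x                # tracing off again

echo "$msg"
```

The trace lines start with "+" and show the expanded value of each variable, which is usually enough to spot stray characters.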

Edit: just noticed that rknichols already suggested the same, oh well...

berndbausch · 11-10-2015, 10:44 PM

A remark in addition to rknichols' comment about grep: To debug your program, make ample use of echo, writing the contents of the variables in question to the screen. You can also switch debugging on and off using set -x and set +x, so that you see what happens in crucial parts of your program.

Had you done that, you would have seen that $char is indeed not a number and not wasted your time writing a post.

Edit: Looks like this is at least the 3rd time this suggestion is made...

frodobag · 11-10-2015, 10:59 PM

Thanks guys,

I would love to post a sample of the log file, but unfortunately it's "classified", hence the example above, just to illustrate. All I can say is that it's a bunch of text, numbers and special characters bunched up together. Besides the tac, grep and head commands, I actually had to use sed and cut to "pluck out" the numbers I wanted. Unfortunately, depending on the time, it also plucks out text and special characters. Hence the need to compare against a regex to see whether it's a number, which I then want to print. If it's text or something else, discard it.

Oh, and thanks berndbausch for the set -x suggestion; I was wondering how to debug this. I will give that a try.

frodobag · 11-10-2015, 11:15 PM

here are the results of the debug

+ char=$'25064\001'
+ echo $'25064\001'
25064
+ [[ 25064 =~ ^[0-9]+$ ]]
+ echo 'error: Not a number !!!'
error: Not a number !!!
+ exit 1

So the odd thing is what is that \001 doing there ?
Otherwise it seems to give out 25064 as a number

berndbausch · 11-10-2015, 11:33 PM

Quote:

Originally Posted by frodobag

here are the results of the debug

+ char=$'25064\001'
+ echo $'25064\001'
25064
+ [[ 25064 =~ ^[0-9]+$ ]]
+ echo 'error: Not a number !!!'
error: Not a number !!!
+ exit 1

So the odd thing is what is that \001 doing there ?
Otherwise it seems to give out 25064 as a number

How did you initialize $char?

frodobag · 11-10-2015, 11:40 PM

I did not initialise $char.
It's basically used like this, to grep the number from a logfile:
char=`tac logfile | grep "(1.)" |grep -E '[0-9]{1,4}' | head -1`

Diantre · 11-10-2015, 11:58 PM

Quote:

Originally Posted by frodobag

So the odd thing is what is that \001 doing there ?
Otherwise it seems to give out 25064 as a number

"\001" is octal 1, ASCII character #1, SOH (start of heading). Perhaps it's in your log file?
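If it does come from the log, one way to confirm it and work around it is sketched below. The value is recreated by hand from the trace above, and stripping all control characters is an assumption about what is safe for this data.

```bash
#!/usr/bin/env bash
# Recreate the value from the trace: digits followed by SOH (octal 001).
char=$'25064\001'
re='^[0-9]+$'

# %q prints the value with non-printing characters shown as escapes,
# so the stray byte becomes visible.
printf 'raw:    %q\n' "$char"

if [[ $char =~ $re ]]; then before="number"; else before="not a number"; fi

# Delete control characters, then test again.
clean=$(printf '%s' "$char" | tr -d '[:cntrl:]')
if [[ $clean =~ $re ]]; then after="number"; else after="not a number"; fi

printf 'before: %s\nafter:  %s (%s)\n' "$before" "$after" "$clean"
```

A plain echo hides the SOH byte on most terminals, which is why the value looked like a clean number in the output.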

berndbausch · 11-11-2015, 12:36 AM

Quote:

Originally Posted by frodobag

I did not initialise $char.
It's basically used like this, to grep the number from a logfile:
char=`tac logfile | grep "(1.)" |grep -E '[0-9]{1,4}' | head -1`

That's initialization.

As rknichols said, grep returns the whole line, not just the number.

pan64 · 11-11-2015, 02:13 AM

You should give us a usable sample so that we can help you construct a usable solution.
Personally, I suggest you do the following:

Code:

char=$(awk '/^\(1\.\).*[0-9]{1,4}/ { value = $0 }  # placeholder: parse the line and fetch the relevant data here
            END { print value }' logfile)

No sed, no grep, no tac, no head or cut or a lot of different tricks, just a single awk (or perl/python/whatever).

On the other hand, the error message is correct: you tried to test a string that contained not only digits but something else too.
awk can also ensure you have a valid number.

frodobag · 11-11-2015, 05:42 PM

Thanks. Here is a made-up sample that is similar to the log lines I am dealing with:

(123456:789.123)-{ABCDE.12345=456789:1234:ABCD.FGHJK:1111-CVBN543TGYH10:4564611:12312:5645=POIJKJH}

(123456:890.456)-{POIU.12345=456711:567834:ABCD.FGHJK9:2223-YYN543TGYH10:46646PPOUY^^&%:5775=POIJKJH}

(123456:990.888)-{POIU.12775=456709:1234:ABCD.FGHJK8:223-YYN543TGUU%10:466PPOUY%^%11:10777975=POIJKJH}

I am trying to pick out the numbers after "11:". So in the first line I want the numbers 123 from "11:123", in the second line I want 5678 from "11:5678", and in the third I want 107779 from "11:107779".

As you can see, the "11:" jumps between positions as the log progresses; in this example it lands in the same 3 positions, at random. I'm not sure how I can achieve that with just awk. That's why I was using a bunch of tools like tac, sed, cut, etc. to pick out the latest one whenever I run the script. But I'm happy to explore any options.

rknichols · 11-11-2015, 06:52 PM

You can do that quite easily with sed:

Code:

sed -r -n 's/.*11:([0-9]+).*/\1/p'

The "-r" option says to use extended regular expressions. The "-n" inhibits the default printing of all lines. The expression looks for lines that contain "11:" followed by a string of one or more digits and possibly followed by more characters, replaces the entire line with just that string of digits from the parenthesized sub-expression, and prints the result.
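As a quick check, here is that command run against the three sample lines posted above. The temporary file only stands in for the real log, and taking the last result with tail is an assumption about what "the latest one" means.

```bash
#!/usr/bin/env bash
# Put the three sample lines from this thread into a scratch file.
logfile=$(mktemp)
cat > "$logfile" <<'EOF'
(123456:789.123)-{ABCDE.12345=456789:1234:ABCD.FGHJK:1111-CVBN543TGYH10:4564611:12312:5645=POIJKJH}
(123456:890.456)-{POIU.12345=456711:567834:ABCD.FGHJK9:2223-YYN543TGYH10:46646PPOUY^^&%:5775=POIJKJH}
(123456:990.888)-{POIU.12775=456709:1234:ABCD.FGHJK8:223-YYN543TGUU%10:466PPOUY%^%11:10777975=POIJKJH}
EOF

# One number per matching line: 12312, 567834, 10777975.
all=$(sed -r -n 's/.*11:([0-9]+).*/\1/p' "$logfile")

# The most recent entry is the last line of the log.
latest=$(printf '%s\n' "$all" | tail -n 1)

printf '%s\n' "$all"
echo "latest: $latest"
rm -f "$logfile"
```

Because the result contains only digits, it can go straight into the `^[0-9]+$` test without any further trimming.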

frodobag · 11-11-2015, 08:33 PM

Many thanks rknichols! It worked !

I tested it many times and it has worked flawlessly so far; when the numbers moved from 4 digits to 5 digits, it was correct each time. Thanks, now I can proceed with the rest of the script.

chrism01 · 11-11-2015, 08:50 PM

I thought I'd have a look at what would happen if an '11:' also appeared near the start of the first rec

Code:

(123456:789.123)-{ABCDE.12345=456711:1234:ABCD.FGHJK:1111-CVBN543TGYH10:4564611:12312:5645=POIJKJH}

but when I ran your sed it still picks out the 2nd occurrence. I can't seem to figure out why

Could you elucidate?
Thx

syg00 · 11-11-2015, 09:31 PM

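Greediness: the leading ".*" matches as much of the line as it can, so the last "11:" followed by digits is the one that gets captured.
Pretty dodgy to rely on it tho' ...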
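A small sketch of the effect, using the record from the post above. The grep/head variant is just one possible way to take the first occurrence instead; which occurrence is the right one depends on the real log format.

```bash
#!/usr/bin/env bash
rec='(123456:789.123)-{ABCDE.12345=456711:1234:ABCD.FGHJK:1111-CVBN543TGYH10:4564611:12312:5645=POIJKJH}'

# Greedy: the leading .* grabs as much as it can, so the LAST "11:<digits>" wins.
last=$(printf '%s\n' "$rec" | sed -r -n 's/.*11:([0-9]+).*/\1/p')

# To get the FIRST occurrence instead, list every match and keep the first one,
# then strip the "11:" prefix with a parameter expansion.
first=$(printf '%s\n' "$rec" | grep -oE '11:[0-9]+' | head -n 1)
first=${first#11:}

echo "last=$last first=$first"
```

Being explicit about which occurrence is wanted avoids depending on how the regex engine backtracks.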