LinuxQuestions.org

exact match in awk (https://www.linuxquestions.org/questions/programming-9/exact-match-in-awk-903799/)

David the H. 09-19-2011 10:00 AM

I recall reading somewhere that when you use a variable on the right-hand side of a match test, its contents are treated as a regex, meaning you should store the entire pattern in the variable and leave off the /.../ delimiters. I can't locate any clear statement of this in the gawk manual, but it does appear to be true in testing:

Code:

$ x=78

$ awk -v y="^$x" '$0 ~ y { print }' file.txt
78 foo

$ awk -v y="^$x" 'match( $0 , y ) { print }' file.txt
78 foo
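A minimal sketch for reproducing this. The sample file is made up for illustration (the thread's real file.txt isn't shown) and the variable names are arbitrary. It sticks to plain awk features, so it shouldn't depend on gawk:

```shell
# Hypothetical sample data; the thread's real file.txt isn't shown.
sample=$(mktemp)
printf '%s\n' '78 foo' '100 foo 78' '50 foo 78 bar' '178 baz' > "$sample"

x=78

# The string in y is used as a regex because it sits on the right of ~.
starts_with=$(awk -v y="^$x" '$0 ~ y { print }' "$sample")
printf '%s\n' "$starts_with"

# An unanchored string matches anywhere in the line, including inside 178.
anywhere=$(awk -v y="$x" '$0 ~ y { n++ } END { print n + 0 }' "$sample")
printf '%s lines contain %s somewhere\n' "$anywhere" "$x"

rm -f "$sample"
```

With this data, only "78 foo" starts with 78, while all four lines contain 78 somewhere, which is why the anchoring has to go into the variable along with the value.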

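Note, though, that a variable can't be used alone as a matching pattern; you have to use a full matching expression of some kind. A bare variable is evaluated as a true/false value rather than as a regex, so the following doesn't do a regex match at all (a non-empty, non-numeric string like this one counts as true, so every line gets printed):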

Code:

$ awk -v y="^$x" 'y { print }' file.txt
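To see what the bare variable actually does, here is a rough sketch with made-up sample data. As far as I can tell, awk treats a lone expression pattern as a true/false test rather than a regex, so the result depends only on whether y is empty or zero, not on the contents of the line:

```shell
# Hypothetical sample data for illustration only.
sample=$(mktemp)
printf '%s\n' '78 foo' '100 foo 78' 'no number here' > "$sample"

# y is a non-empty, non-numeric string, so the pattern is true for every line.
bare_nonempty=$(awk -v y="^78" 'y { n++ } END { print n + 0 }' "$sample")

# An empty string is false, so no line is selected.
bare_empty=$(awk -v y="" 'y { n++ } END { print n + 0 }' "$sample")

# The explicit match operator is what makes y act as a regex.
matched=$(awk -v y="^78" '$0 ~ y { n++ } END { print n + 0 }' "$sample")

printf 'bare non-empty: %s, bare empty: %s, with ~: %s\n' \
    "$bare_nonempty" "$bare_empty" "$matched"

rm -f "$sample"
```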
Edit: When using \y and similar operators in these patterns, you'll have the added problem of backslashes being condensed, first by the shell and then by awk's own escape processing of -v values. You need to use three backslashes inside double quotes to ensure that one will remain in the final regex.
Code:

$ awk -v y="\\\y$x\\\y" '$0 ~ y {print}' file.txt
78 foo
100 foo 78
50 foo 78 bar
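A quick way to check how many backslashes survive each stage, sketched with printf rather than a real match so that it doesn't depend on \y support. The `a\\.b` value in the second step is just a stand-in chosen for illustration:

```shell
x=78

# Stage 1: what the shell hands to awk. Inside double quotes, \\ collapses
# to \ while \y is left alone, so three backslashes become two.
after_shell=$(printf '%s\n' "\\\y$x\\\y")
printf '%s\n' "$after_shell"

# Stage 2: awk's -v assignment collapses \\ to \ once more.
# Single quotes keep the shell out of it here.
after_awk=$(awk -v y='a\\.b' 'BEGIN { print y }')
printf '%s\n' "$after_awk"
```

So the triple-backslash form reaches awk with two backslashes in front of each y, and the -v processing reduces that to the single backslash the regex needs.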

Edit2: I just noticed the comment mentioning mawk. Unfortunately, it appears that mawk doesn't support the \y operator. It looks ugly, but how about a regex that tests for the value at the beginning or end of the line, or surrounded by whitespace? This works with mawk on my machine:
Code:

$ mawk -v y="(^$x|$x$|[ \t]$x[ \t])" '$0 ~ y { print }' file.txt
78 foo
100 foo 78
50 foo 78 bar
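Another option that sidesteps regex support differences altogether is to compare whole fields instead of matching a pattern. This is only a sketch and assumes the value is whitespace-delimited, as in the output above. The extra 789 and 178 lines are made up to show where the alternation regex overmatches (^78 also matches 789, and 78$ also matches 178):

```shell
# Hypothetical sample data; the last two lines are added for illustration.
sample=$(mktemp)
printf '%s\n' '78 foo' '100 foo 78' '50 foo 78 bar' '789 foo' 'foo 178' > "$sample"

x=78

# Print a line if any whole field equals x. Appending "" forces a string
# comparison, so 78.0 or 078 would not count as equal to 78.
exact=$(awk -v x="$x" '{
    for (i = 1; i <= NF; i++)
        if (($i "") == (x "")) { print; next }
}' "$sample")
printf '%s\n' "$exact"

# For comparison, the alternation regex also picks up 789 and 178.
loose=$(awk -v y="(^$x|$x\$|[ \t]$x[ \t])" '$0 ~ y { n++ } END { print n + 0 }' "$sample")
printf 'alternation regex matched %s lines\n' "$loose"

rm -f "$sample"
```

The field loop needs no regex features at all, so it should behave the same in gawk, nawk and mawk.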


colucix 09-19-2011 10:13 AM

Ok. This is a mawk problem, since word boundaries are a GNU awk (gawk) extension. You can try something different, like:
Code:

awk '/[^0-9]'$x'[^0-9]/' FILE
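This requires a non-digit character immediately before and after $x, so it won't match when $x is part of a longer number. (It also won't match $x at the very start or end of a line, since there is no character there for [^0-9] to match.) You can extend the character list to also exclude the alphabetic characters a-z and A-Z, which is done more concisely with the [:alnum:] character class, e.g.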
Code:

awk '/[^[:alnum:]]'$x'[^[:alnum:]]/' FILE
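One thing to watch with this kind of pattern, as far as I can tell: the bracket expression has to consume a real character on each side, so a value at the very start or end of a line is skipped. A sketch of the difference, using made-up sample lines and an anchored variant that also accepts the line edges (not tested on mawk 1.3.3):

```shell
# Hypothetical sample data for illustration only.
sample=$(mktemp)
printf '%s\n' '78 foo' '100 foo 78' '50 foo 78 bar' '789 foo' 'foo 178' > "$sample"

x=78

# Needs a non-digit on both sides, so only the mid-line 78 qualifies.
strict=$(awk '/[^0-9]'"$x"'[^0-9]/' "$sample")
printf '%s\n' "$strict"

# Allow start of line or end of line in place of the neighbouring character.
anchored=$(awk '/(^|[^0-9])'"$x"'([^0-9]|$)/' "$sample")
printf '%s\n' "$anchored"

rm -f "$sample"
```

The shell variable is double-quoted where it is spliced into the program so that a value containing spaces or glob characters can't split the awk script.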

grail 09-19-2011 11:18 AM

Well, the first suggestion I would have is to remove mawk and install gawk. It sounds harsh, but mawk has a number of limitations that I have run into (not really sure why it was created, actually).

All of colucix's suggestions are good and can be used with a variable and computed regex as well (at least I hope so).
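For what it's worth, a sketch of how such a pattern could be built from an awk variable instead of splicing the shell variable into the program text. The pieces are joined by string concatenation and the result is used as a dynamic regex on the right of ~. The sample lines and the name re are made up for illustration:

```shell
# Hypothetical sample data for illustration only.
sample=$(mktemp)
printf '%s\n' '50 foo 78 bar' 'x78y' 'a5789b' > "$sample"

x=78

# Build the regex as a string inside awk, then match against it.
computed=$(awk -v x="$x" '
    BEGIN { re = "[^0-9]" x "[^0-9]" }
    $0 ~ re { print }
' "$sample")
printf '%s\n' "$computed"

rm -f "$sample"
```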

David the H. 09-19-2011 11:28 AM

I'm not getting any output with character classes in mawk. The same patterns do work in gawk and nawk, so it looks like mawk just doesn't support them either.

colucix 09-19-2011 12:16 PM

Quote:

Originally Posted by David the H. (Post 4476209)
I'm not getting any output with character classes in mawk. The same patterns do work in gawk and nawk, so it looks like mawk just doesn't support them either.

It works for me with mawk 1.3.4 on both OpenSuSE and CentOS. I can't test on Ubuntu, though.
Code:

$ mawk '/[^[:alnum:]]'$x'[^[:alnum:]]/' testfile
  tP      75 P4  76 P4 1    77 P4 2    78 P4 3    89 P422  90 P42 1 2


David the H. 09-19-2011 01:31 PM

Hmm. The version is listed as 1.3.3 here. The exact same command and data aren't giving me anything. Again, nawk and gawk do just fine.

The release page does say that some unspecified "improvements" to the regex engine were added recently.

http://freshmeat.net/projects/mawk/releases

Debian's always behind the times. :(


All times are GMT -5.