LinuxQuestions.org - exact match in awk

I recall reading somewhere that when you use a variable in a match test, its contents are treated as a regex, meaning you should store the entire pattern in the variable and leave off the /.../ delimiters. I can't locate any clear statement for it in the gawk manual, but it does appear to be true in testing:

Code:

$ x=78



$ awk -v y="^$x" '$0 ~ y { print }' file.txt

78 foo



$ awk -v y="^$x" 'match( $0 , y ) { print }' file.txt

78 foo
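
Since the variable is treated as a regex, any metacharacters in it are live too. Here is a minimal sketch of the difference between a regex match and a plain substring search with index(). The sample data and the variable names (tmp, regex_hits, literal_hits) are made up for illustration and are not the file from this thread.

```shell
# Hypothetical sample data, not the file.txt used in the thread
tmp=$(mktemp)
printf '%s\n' '78 foo' '7x8 bar' '7.8 baz' > "$tmp"

x='7.8'

# On the right-hand side of ~ the variable is used as a regex,
# so the "." matches any single character
regex_hits=$(awk -v y="$x" '$0 ~ y' "$tmp")

# index() does a plain substring search with no regex interpretation
literal_hits=$(awk -v y="$x" 'index($0, y)' "$tmp")

printf 'regex:\n%s\n' "$regex_hits"
printf 'literal:\n%s\n' "$literal_hits"
rm -f "$tmp"
```

If the value comes from user input and is meant literally, index() or a string comparison is probably the safer choice.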

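Note though that a variable can't be used alone as a matching pattern; you have to use a full match expression of some kind. A bare variable is only evaluated as a true/false condition and is never matched against the line, so a non-empty string like ^78 is simply true for every line. This does not do any matching: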

Code:

$ awk -v y="^$x" 'y { print }' file.txt
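
A quick way to see what a bare variable pattern really does is to count the lines it selects. This is only a sketch with invented sample data. In awk a pattern that is just an expression is tested for truth, and a non-empty string that doesn't look like a number counts as true.

```shell
# Hypothetical sample data
tmp=$(mktemp)
printf '%s\n' '78 foo' '100 foo 78' '12 bar' > "$tmp"

x=78

# Bare variable: tested for truth, never matched against the line
bare_count=$(awk -v y="^$x" 'y { n++ } END { print n + 0 }' "$tmp")

# Explicit match of the record against the regex stored in y
match_count=$(awk -v y="^$x" '$0 ~ y { n++ } END { print n + 0 }' "$tmp")

printf 'bare: %s, explicit: %s\n' "$bare_count" "$match_count"
rm -f "$tmp"
```

On this three-line sample the bare variable selects all 3 lines, while the explicit match selects only the 1 line that starts with 78.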

Edit: When using \y and similar operators in these patterns, you'll have the added problem of backslashes being consumed twice: once by the shell inside double quotes, and once more by awk, which processes escape sequences in -v assignments. You need three backslashes inside double quotes to ensure that one remains in the final regex.

Code:

$ awk -v y="\\\y$x\\\y" '$0 ~ y {print}' file.txt

78 foo

100 foo 78

50 foo 78 bar
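
To see where the backslashes go, it helps to print the string at each stage. This is a small sketch; the variable names (shell_view, awk_view, env_view) are just for illustration. The last line shows an alternative that I'd assume is easier to read: values passed through the environment are not escape-processed by awk, so single quotes and one backslash are enough.

```shell
x=78

# What the shell hands to awk after double-quote processing
shell_view=$(printf '%s\n' "\\\y$x\\\y")

# What awk ends up with after -v has processed escape sequences
awk_view=$(awk -v y="\\\y$x\\\y" 'BEGIN { print y }')

# ENVIRON values are taken as-is, so one backslash in single quotes suffices
env_view=$(y='\y'"$x"'\y' awk 'BEGIN { print ENVIRON["y"] }')

printf '%s\n' "$shell_view" "$awk_view" "$env_view"
```

The first line printed still has two backslashes per \y; the other two have one, which is what the regex needs.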

Edit2: I just noticed the comment mentioning mawk. Unfortunately it appears that mawk doesn't support the \y operator. It looks ugly, but how about a regex that tests for the value at the beginning or end of the line, or surrounded by whitespace? Be aware that the first two alternatives only check one side, so ^78 would also accept a line starting with 780. This works with mawk on my machine:

Code:

$ mawk -v y="(^$x|$x$|[ \t]$x[ \t])" '$0 ~ y { print }' file.txt

78 foo

100 foo 78

50 foo 78 bar
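
If the value is always a whole whitespace-separated field, a different technique avoids regexes altogether: loop over the fields and compare each one as a string. This is a sketch of that alternative, not the regex approach above, and the sample data is invented. It uses only standard awk features, so it shouldn't depend on which awk is installed.

```shell
# Hypothetical sample data, including some near misses
tmp=$(mktemp)
printf '%s\n' '78 foo' '100 foo 78' '50 foo 78 bar' '780 foo' '178 bar' 'foo 078' > "$tmp"

x=78

field_hits=$(awk -v x="$x" '{
    for (i = 1; i <= NF; i++)
        if (($i "") == (x "")) {   # appending "" forces a string comparison
            print
            next
        }
}' "$tmp")

printf '%s\n' "$field_hits"
rm -f "$tmp"
```

Forcing a string comparison matters here: compared as numbers, 078 would be equal to 78.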

Ok. This is a mawk problem, since word boundaries are a GNU awk (gawk) extension. You can try something different, like:

Code:

awk '/[^0-9]'"$x"'[^0-9]/' FILE

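This requires a non-digit character immediately before and after $x, so longer numbers that merely contain it are excluded. Note that it also means $x won't be found at the very start or end of a line, where there is no neighbouring character. You can extend the character list to exclude alphabetic characters a-z and A-Z as well. This can be written more concisely using the [:alnum:] character class, e.g.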

Code:

awk '/[^[:alnum:]]'"$x"'[^[:alnum:]]/' FILE
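
Because a pattern of this shape needs a character on both sides of the value, one possible workaround is to pad the line with a space at each end before matching. The sketch below does that and spells the class out as [^0-9A-Za-z] so that it doesn't rely on [:alnum:] support. The sample data and the names re and padded_hits are hypothetical.

```shell
# Hypothetical sample data
tmp=$(mktemp)
printf '%s\n' '78 foo' '100 foo 78' 'a78b' 'x,78,y' '780 foo' > "$tmp"

x=78

padded_hits=$(awk -v x="$x" '
    # Build the regex once: non-alphanumeric, the value, non-alphanumeric
    BEGIN { re = "[^0-9A-Za-z]" x "[^0-9A-Za-z]" }
    # Pad the record so a value at the start or end still has a neighbour
    (" " $0 " ") ~ re
' "$tmp")

printf '%s\n' "$padded_hits"
rm -f "$tmp"
```

The lines a78b and 780 foo are rejected, while a value at the start of the line, at the end, or between commas is accepted.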

Well, the first suggestion I would have is to remove mawk and install gawk. It sounds harsh, but mawk has a number of limitations that I have run into (not really sure why it was created, actually).

All of colucix's suggestions are good and can be used with a variable and computed regex as well (at least I hope so).

I'm not getting any output with character classes in mawk. The same patterns do work in gawk and nawk, so it looks like mawk just doesn't support them either.

Quote:

Originally Posted by David the H. (Post 4476209)

I'm not getting any output with character classes in mawk. The same patterns do work in gawk and nawk, so it looks like mawk just doesn't support them either.

It works for me with mawk 1.3.4 on both OpenSuSE and CentOS. I can't test on Ubuntu, though.

Code:

$ mawk '/[^[:alnum:]]'$x'[^[:alnum:]]/' testfile

  tP      75 P4  76 P4 1    77 P4 2    78 P4 3    89 P422  90 P42 1 2

Hmm. The version is listed as 1.3.3 here. The exact same command and data aren't giving me anything. Again, nawk and gawk do just fine.
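
Since the behaviour seems to differ between versions, a small probe can tell you whether a given awk recognizes character classes before you rely on them. This is just a sketch: AWK_BIN is a made-up variable for choosing which binary to test, and it falls back to plain awk.

```shell
# AWK_BIN is a hypothetical override, e.g. AWK_BIN=mawk
awk_bin=${AWK_BIN:-awk}

# Feed a single alphanumeric character and see whether the class matches it
if printf 'x\n' | "$awk_bin" '/^[[:alnum:]]$/ { found = 1 } END { exit !found }'; then
    class_support=yes
else
    class_support=no
fi

printf '%s supports [:alnum:]: %s\n' "$awk_bin" "$class_support"
```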

The release page does say that some unspecified "improvements" to the regex engine were added recently.

http://freshmeat.net/projects/mawk/releases

Debian's always behind the times. :(