exact match in awk
Dear Experts,
I have a file that contains the numbers "78" and "178" at different, unknown positions in the text. I want to match the lines that include "78" and do something to the strings in those lines. When I use
Code:
awk 'match($0, "78") {action}' FILENAME
lines containing "178" are matched as well. I should not use the pattern
Code:
$Field == "78"
So, how can I match exactly "78" but not "178" with "match"? Any help would be greatly appreciated. Thanks a lot for your time! |
You can try word boundaries. In GNU awk (gawk) they are specified by the \y operator, and you need to use a regular expression (enclosed in slashes) instead of a string constant:
Code:
awk 'match($0, /\y78\y/)' file |
OK, I found a solution myself:
Code:
awk 'match($0, /^78/) {action}' FILENAME
Code:
awk '/^78/ {action}' FILENAME
But now I want to incorporate this code in a bash script, where the 78 will be replaced by a variable, say x. I tried
Code:
x=78
awk -v y=$x '/y/ {action}' FILE
Sorry for the further questions. I really hope to solve this completely and make things clear. Any help would be appreciated! |
Try to keep $x outside single quotes:
Code:
awk 'match($0, /^'$x'/)' FILE |
I find it interesting that in your first post you say the position of 78 is unknown, and yet the solution is to say it is at the start of the line. I would consider this knowing the position. Also, have you considered (or maybe you do not need to) what happens if you have 785 at the start of a line? Lastly, match is not required, as you save no data from the match, so a computed regex will do fine:
Code:
awk -v y=$x '$0 ~ "^"y {action}' FILE
Code:
awk -v y=$x '$1 == y {action}' FILE |
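For anyone wondering why the earlier `/y/` attempt fails: inside slashes the letters are literal, so `/y/` looks for the character "y" and never consults the variable, while `$0 ~ y` uses the variable's contents as the regex. A minimal sketch, assuming a throwaway sample file (the temporary file and its three lines are made up for illustration, not the poster's data):

```shell
# Hypothetical sample data, not the poster's real file.
tmp=$(mktemp)
printf '%s\n' '78 first' '178 second' 'yes third' > "$tmp"

x=78

# /y/ is a regex constant: it matches lines containing the letter "y".
literal=$(awk -v y="$x" '/y/ { print $2 }' "$tmp")

# $0 ~ ("^" y) builds the regex from the variable at run time.
dynamic=$(awk -v y="$x" '$0 ~ ("^" y) { print $2 }' "$tmp")

echo "literal: $literal"
echo "dynamic: $dynamic"
rm -f "$tmp"
```

With this sample, the literal form picks up only the "yes third" line, and the computed form picks up only the line that starts with 78.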
perfect answer
785 might also appear, so I need to avoid it as well. I only need the lines that have "78"; they can contain any other characters, which I do not care about, as long as "78" is there. That's why I need an EXACT match. Thanks a lot for your code. The first one is exactly what I am looking for! It works like a champ and fits the whole script perfectly, so I do not have to change anything else. Thanks a lot! |
I am happy you have a solution, but you have contradicted yourself in the answer. If my solutions work, then 78 must be at the start of a line. It will not find 78 anywhere else, such as:
Quote:
Good luck. |
But thanks anyway, I learnt a lot from you. |
I wonder why
Code:
awk '/\y78\y/' FILE
Code:
x=78 |
So, still unsolved.
If 78 is not at the beginning of the line, nothing works.
So I still need an answer for the EXACT match. Sorry for the useless and misleading replies I have posted. My fault. If anyone still has an idea on this topic, please post a comment. Any answer would be appreciated. Sorry again for taking your time, and thanks all the same. |
I would stick with the regexp with word boundary solution and try to understand why it didn't work for you. Which version of awk are you running, and on which *nix OS?
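Besides asking for the version (typically `gawk --version` for GNU awk or `mawk -W version` for mawk), one way to find out whether the installed awk understands `\y` is to probe it directly. A small sketch, assuming only that `awk` is on the PATH; the variable name `result` is made up for illustration:

```shell
# Probe: does this awk treat \y as a word boundary (a gawk extension)?
# The awk program exits 0 only if the pattern matched the test line.
if printf 'a 78 b\n' | awk '/\y78\y/ { ok = 1 } END { exit !ok }' 2>/dev/null; then
  result="supported"
else
  result="unsupported"
fi
echo "word boundaries with \\y: $result"
```

If this reports "unsupported", the word-boundary patterns in this thread will silently match nothing, and a portable pattern is needed instead.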
|
The real file looks like:
Code:
|
I use Ubuntu 10.04. The awk is, I believe, mawk. Do you have any further comments? Thanks again! |
I recall reading somewhere that when you use a variable on the right-hand side of a match test (~), its contents are treated as a regex, meaning you should store the entire pattern in the variable and leave the /../ delimiters off. I can't locate any clear statement of it in the gawk manual, but it does appear to be true in testing:
Code:
$ x=78
$ awk -v y="^$x" '$0 ~ y { print }' file.txt
Code:
$ awk -v y="\\\y$x\\\y" '$0 ~ y { print }' file.txt
Code:
$ mawk -v y="(^|[ \t])$x([ \t]|\$)" '$0 ~ y { print }' file.txt |
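If the numbers are always separated by whitespace, another option that avoids regexes entirely is to compare each field to the variable as a string. This is only a sketch under that assumption; the sample lines and the `hits` variable are invented for the example:

```shell
# Hypothetical sample data; fields are separated by spaces.
tmp=$(mktemp)
printf '%s\n' 'a 78 b' 'a 178 b' '785 c' 'd 78' > "$tmp"

x=78

# Walk the fields of every line and compare each one to y.
# Appending "" forces a string comparison, so 78.0 would not equal 78.
hits=$(awk -v y="$x" '
{
  for (i = 1; i <= NF; i++)
    if (($i "") == (y "")) {
      out = out (out == "" ? "" : " ") NR   # collect matching line numbers
      break
    }
}
END { print out }' "$tmp")

echo "matching lines: $hits"
rm -f "$tmp"
```

Because no regex is involved, this should behave the same in mawk, gawk and nawk, but it will miss a 78 that is glued to punctuation such as "78," or "(78)".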
Ok. This is a mawk problem, since word boundaries are a GNU awk (gawk) extension. You can try something different, like:
Code:
awk '/[^0-9]'$x'[^0-9]/' FILE
Code:
awk '/[^[:alnum:]]'$x'[^[:alnum:]]/' FILE |
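One caveat with the two patterns above is that they require a character on both sides of the number, so a 78 at the very start or end of a line is missed. A possible workaround, sketched here with made-up sample lines, is to pad the line with a space on each side before testing it (the variable name `matched` is just for illustration):

```shell
# Hypothetical sample data covering start, end, middle and false positives.
tmp=$(mktemp)
printf '%s\n' '78 at start' 'ends with 78' 'mid 78 dle' \
              'has 178 in it' '785 at start' 'x78y glued' > "$tmp"

x=78

# Pad $0 with spaces so [^0-9] always has a character to match,
# then build the regex from the variable.
matched=$(awk -v y="$x" '
(" " $0 " ") ~ ("[^0-9]" y "[^0-9]") {
  n = n (n == "" ? "" : " ") NR   # collect matching line numbers
}
END { print n }' "$tmp")

echo "matching lines: $matched"
rm -f "$tmp"
```

Note that with `[^0-9]` only digits count as part of the number, so "x78y" still matches; whether that is wanted depends on the real data.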
Well, the first suggestion I would have is to remove mawk and install gawk. It sounds harsh, but mawk has a number of limitations that I have run into (not really sure why it was created, actually).
All of colucix's suggestions are good and can be used with a variable and computed regex as well (at least I hope so) |
I'm not getting any output with character classes in mawk. The same patterns do work in gawk and nawk, so it looks like mawk just doesn't support them either.
|
Quote:
Code:
$ mawk '/[^[:alnum:]]'$x'[^[:alnum:]]/' testfile |
Hmm. Version listed as 1.3.3 here. The exact same command and data aren't giving me anything. Again, nawk and gawk do just fine.
The release page does say that some unspecified "improvements" to the regex engine were added recently. http://freshmeat.net/projects/mawk/releases Debian's always behind the times. :( |