LinuxQuestions.org


cristalp 09-19-2011 04:31 AM

exact match in awk
 
Dear Experts,

I have a file containing the numbers "78" and "178" at different, unknown positions in the text.

I want to match the lines containing "78" and do something to the strings in those lines.

When I use
Code:

awk 'match($0, "78"){action}' FILENAME
I always get the action applied both to the lines containing "78" and to those containing "178".

I cannot use a pattern like
Code:

$Field == "78"
for searching, as the position of "78" is not known in advance.

So, how can I match exactly "78" but not "178" with "match"?

Any help would be greatly appreciated. Thanks a lot for your time!
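
A minimal sketch of why the first attempt also hits "178": match() looks for the pattern anywhere in the string, so "78" is found inside "178" as well. The two sample lines and the variable name out are made up for illustration.

```shell
# Two made-up sample lines: one with 78, one with 178.
# RSTART is the position where match() found the pattern.
out=$(printf 'a 78 b\nc 178 d\n' | awk 'match($0, "78") { print RSTART ": " $0 }')
printf '%s\n' "$out"
```

Both lines are printed, because match() returns a non-zero position for each of them.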

colucix 09-19-2011 04:46 AM

You can try word boundaries. In GNU awk (gawk) they are specified by the \y operator, and you need to use a regular expression (enclosed in slashes) instead of a string constant:
Code:

awk 'match($0, /\y78\y/)' file

cristalp 09-19-2011 05:04 AM

Quote:

Originally Posted by colucix (Post 4475833)
You can try word boundaries. In GNU awk (gawk) they are specified by the \y operator, and you need to use a regular expression (enclosed in slashes) instead of a string constant:
Code:

awk 'match($0, /\y78\y/)' file

Thanks for the help. I tried your code and added a print action, but I cannot get any output. I do not know what could be wrong. Thanks anyway.

cristalp 09-19-2011 05:28 AM

OK, I found a solution myself:

Code:

awk 'match($0, /^78/) {action}' FILENAME
or
Code:

awk '/^78/ {action}' FILENAME
I've run several tests, and it seems all right.

But now I want to incorporate this code into a bash script, where the 78 will be replaced by a variable, say x.

Then I tried
Code:

x=78
awk -v y=$x 'match($0, /^y/) {action}' FILE

and
Code:

x=78
awk -v y=$x '/^y/ {action}' FILE

and even just:
Code:

awk -v y=$x '/y/ {action}' FILE
None of them works. The reason is that anything between the "/" delimiters is taken literally, so awk no longer recognizes y as a variable. So how can I avoid the "/" problem and use a variable in an exact match?

Sorry for the additional questions. I really hope to solve this completely and make things clear. Any help would be appreciated!

colucix 09-19-2011 07:13 AM

Try to keep $x outside single quotes:
Code:

awk 'match($0, /^'$x'/)' FILE
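
A small sketch of what this quoting trick does, assuming a POSIX shell: the shell glues the two single-quoted pieces and the value of x together before awk ever runs, so awk receives a plain regex constant. The variable names prog and out and the sample lines are made up; putting "$x" in double quotes is an addition here, not part of the command above.

```shell
x=78
# The shell joins three pieces: 'match($0, /^' + the value of x + '/)'.
# Printing the joined string shows the program text awk receives.
prog='match($0, /^'"$x"'/)'
printf '%s\n' "$prog"

# Run it on two made-up lines; only the one starting with 78 is printed.
out=$(printf '78 first\n178 second\n' | awk "$prog")
printf '%s\n' "$out"
```

Because the value of x becomes part of the awk program text, any regex metacharacters or slashes in it would be interpreted by awk; passing the value with -v avoids that.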

grail 09-19-2011 08:48 AM

I find it interesting that in your first post you say the position of 78 is unknown, and yet your solution assumes it is at the start of the line. I would consider that knowing the position.

Also have you considered (or maybe you do not need to) what happens if you have 785 at the start of a line?

Lastly, match() is not required, as you do not use any data from the match, so a computed regex will do fine:
Code:

awk -v y=$x '$0 ~ "^"y{action}' FILE
Or, if the line starts with exactly 78 and your delimiter comes next (the code below assumes the default FS):
Code:

awk -v y=$x '$1 == y{action}' FILE
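
To make the difference between the two suggestions concrete, here is a rough sketch on three made-up lines (the names input, regex and field are illustrative only). It shows that the anchored regex still accepts 785, that the field comparison does not, and that both look only at the start of the line.

```shell
x=78
# Made-up input: 78 first, 785 first, and 78 in the middle of a line.
input=$(printf '78 a\n785 b\nc 78 d\n')

# Anchored dynamic regex: matches any line that starts with the digits 78.
regex=$(printf '%s\n' "$input" | awk -v y="$x" '$0 ~ "^" y')

# Field comparison: matches only when the first field is exactly 78.
field=$(printf '%s\n' "$input" | awk -v y="$x" '$1 == y')

printf 'regex:\n%s\nfield:\n%s\n' "$regex" "$field"
```

Neither command prints the line "c 78 d", where 78 is not the first thing on the line.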

cristalp 09-19-2011 09:03 AM

perfect answer
 
Quote:

Originally Posted by grail (Post 4476053)
I find it interesting that in your first post you say the position of 78 is unknown, and yet your solution assumes it is at the start of the line. I would consider that knowing the position.

Also have you considered (or maybe you do not need to) what happens if you have 785 at the start of a line?

Lastly, match() is not required, as you do not use any data from the match, so a computed regex will do fine:
Code:

awk -v y=$x '$0 ~ "^"y{action}' FILE
Or, if the line starts with exactly 78 and your delimiter comes next (the code below assumes the default FS):
Code:

awk -v y=$x '$1 == y{action}' FILE

Hi grail, thanks for your detailed answer and kind help! No, 78 is not at the start of the line. In some files it is in the middle; in others it might be at the start or the end. I need a general way to locate the lines that include "78".

785 might also appear, so I need to avoid that as well. I only need the line to contain "78"; it can contain any other characters, which I do not care about, as long as "78" is there. That's why I need an EXACT match.

Thanks a lot for your code. The first one is exactly what I am looking for! It works like a champ and fits perfectly into the whole script, so I do not have to change anything else. Thanks a lot!

grail 09-19-2011 09:16 AM

I am happy you have a solution, but you have contradicted yourself in your answer. If my solutions work, then 78 must be at the start of the line. They will not find 78 anywhere else, which is at odds with:
Quote:

No, 78 is not at the start of the line. In some files it is in the middle; in others it might be at the start or the end.
The solutions can work only for files where 78 is at the start of the line. The regex version also does not avoid the problem of 785.

Good luck.

cristalp 09-19-2011 09:30 AM

Quote:

Originally Posted by grail (Post 4476083)
I am happy you have a solution, but you have contradicted yourself in your answer. If my solutions work, then 78 must be at the start of the line. They will not find 78 anywhere else.

The solutions can work only for files where 78 is at the start of the line. The regex version also does not avoid the problem of 785.

Good luck.

Oops, yes. I made a mistake! I ran it on my test file, not the real one I am going to work on. 78 is at the beginning of a line in that test file. I was just in too much of a hurry to draw a conclusion.

But thanks anyway, I learnt a lot from you.

colucix 09-19-2011 09:40 AM

I wonder why
Code:

awk '/\y78\y/' FILE
or
Code:

x=78
awk '/\y'$x'\y/' FILE

don't work. Please, can you post some lines of the real file to let us test the suggested solutions?

cristalp 09-19-2011 09:41 AM

So, still unsolved.

If 78 is not at the beginning of the line, nothing works.

Code:


x=78
awk -v y=$x '/^y/ {action}' FILE

will not work either.

So I still need an answer for the EXACT match. Sorry for the useless and misleading replies I have posted. My fault.
If anyone still has an idea on this topic, please post a comment. Any answer would be appreciated. Sorry again for taking your time, and thanks all the same.

colucix 09-19-2011 09:44 AM

I would stick with the word-boundary regexp solution and try to understand why it didn't work for you. Which version of awk are you running, and on which *nix OS?

cristalp 09-19-2011 09:47 AM

The real file looks like:
Code:


P      1 P1
  mP      3 P2  4 P2 1 
 mC mI    5 C2
  oP      16 P222  17 P222 1    18 P2 1 2 1 2  19 P2 1 2 1 2 1 
  oC      21 C222  20 C222 1 
  oF      22 F222
  oI      23 I222  24 I2 1 2 1 2 1 
  tP      75 P4  76 P4 1    77 P4 2    78 P4 3    89 P422  90 P42 1 2
          91 P4 1 22  92 P4 1 2 1 2  93 P4 2 22  94 P4 2 2 1 2
          95 P4 3 22  96 P4 3 2 1 2
  tI      79 I4  80 I4 1    97 I422  98 I4 1 22
  hP      143 P3  144 P3 1    145 P3 2    149 P312  150 P321  151 P3 1 12
          152 P3 1 21  153 P3 2 12  154 P3 2 21  168 P6  169 P6 1 
          170 P6 5    171 P6 2    172 P6 4    173 P6 3    177 P622
          178 P6 1 22  179 P6 5 22  180 P6 2 22  181 P6 4 22  182 P6 3 22
  hR      146 R3  155 R32
  cP      195 P23  198 P2 1 3  207 P432  208 P4 2 32  212 P4 3 32
          213 P4 1 32

I hope this helps to illustrate things more clearly.

cristalp 09-19-2011 09:52 AM

Quote:

Originally Posted by colucix (Post 4476121)
I would stick with the word-boundary regexp solution and try to understand why it didn't work for you. Which version of awk are you running, and on which *nix OS?

Thanks for your answer and kind suggestion. I am now trying to understand why.
I use Ubuntu 10.04. I believe the awk is mawk. Do you have any further comments?

Thanks again!
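
One likely explanation, offered as an assumption rather than a confirmed diagnosis: \y is a GNU awk (gawk) extension, and mawk does not implement it. A sketch of a boundary test that uses only POSIX regex syntax follows. It treats "no digit directly before or after the number" as the boundary, which is an assumption about what "exact" means here. The two input lines are adapted from the file posted above, with the whitespace condensed.

```shell
x=78
# Two lines adapted from the posted file: one contains 78, one contains 178.
input=$(printf 'tP 77 P4 2 78 P4 3 89 P422\n178 P6 1 22 179 P6 5 22\n')

# (^|[^0-9]) : start of line, or a non-digit before the number
# ([^0-9]|$) : a non-digit after the number, or end of line
out=$(printf '%s\n' "$input" | awk -v y="$x" '$0 ~ "(^|[^0-9])" y "([^0-9]|$)"')
printf '%s\n' "$out"
```

Only the line containing a standalone 78 is printed; the line with 178 is skipped because a digit precedes the 78.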

cristalp 09-19-2011 09:55 AM

Quote:

Originally Posted by colucix (Post 4476111)
I wonder why
Code:

awk '/\y78\y/' FILE
or
Code:

x=78
awk '/\y'$x'\y/' FILE

don't work. Please, can you post some lines of the real file to let us test the suggested solutions?

I tried both of them on my real file again. Neither of them works. Even on my test file, in which 78 and 178 are at the start of the line, they do not work. That's what I've confirmed so far.
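
Since the numbers in the posted file are separated by whitespace, another portable sketch is to compare every field with the variable instead of using a regex. This assumes the number always stands as its own whitespace-separated field, which holds for the sample posted but is not confirmed for every file. The first two input lines are adapted from the posted file; the third (with 785) is made up.

```shell
x=78
input=$(printf 'tP 75 P4 76 P4 1 77 P4 2 78 P4 3\nhP 177 P622 178 P6 1 22\ncP 785 X\n')

out=$(printf '%s\n' "$input" | awk -v y="$x" '
{
    # Walk over every field; print the line once if any field equals y.
    for (i = 1; i <= NF; i++)
        if ($i == y) { print; next }
}')
printf '%s\n' "$out"
```

Note that awk compares numeric-looking fields as numbers, so a field such as 078 would also count as equal to 78.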


All times are GMT -5.