LinuxQuestions.org


udiubu 09-05-2017 08:15 AM

AWK - skip line if line contains pattern and print next line
 
Dear experts,

From a txt file in input:

0 this_pattern
1
2
3
4
0 this_pattern
5
6
7
8
9
10
11
0 this_pattern
etc.

I would like to find matching strings (e.g. "this_pattern") and print each matching line along with the next FIVE lines. Those five lines must not contain "this_pattern": if one of them does, it has to be skipped and the next line printed instead.

My output should be the following:

0 this_pattern
1
2
3
4
5
0 this_pattern
5
6
7
8
9
etc.

So far this is my AWK command:

awk '/ this_pattern / {nr[NR]; nr[NR+1]; nr[NR+2]; nr[NR+3]; nr[NR+4]; nr[NR+5]} ; NR in nr'

However, this does not skip lines containing "this_pattern" among the following five.

I essentially only need to say somehow that if NR+n has the pattern "this_pattern", it has to be ignored.

Any help or different approach would be highly appreciated.

Sincerely,

Udiubu

Turbocapitalist 09-05-2017 08:21 AM

I would have the pattern increment a counter. Then I would have a second statement print a line, and increment the counter, if the counter is already greater than zero. Then if the counter is greater than 5, reset it to zero.

Turbocapitalist 09-05-2017 08:30 AM

I'd also anchor the pattern to the beginning of the line or to the beginning of the field. Use a ^ for that.

udiubu 09-05-2017 11:26 AM

Dear Turbocapitalist,

Thanks for your reply.
It's not entirely clear to me what you exactly mean, though.
Maybe a command line would be more helpful to get your point.
Udiubu

MadeInGermany 09-05-2017 01:58 PM

The elegant way sets a counter that is decremented until 0
Code:

awk '/this_pattern/ {cnt=6} (cnt && cnt--)'
This prints 6 lines including the matching line.
By switching the order one can print the (5) following lines
Code:

awk '(cnt && cnt--); /this_pattern/ {cnt=5}'
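The guard in (cnt && cnt--) matters: the decrement only runs while cnt is non-zero, so the counter stops at 0 and never goes negative. A small sketch comparing the two orderings, run on made-up sample input (the variable names input, with_match and after_match are only for illustration):

```shell
# Made-up sample: one unrelated line, one match, seven following lines
input='x
0 this_pattern
1
2
3
4
5
6
7'

# Counter set before the test: the matching line is printed too (6 lines)
with_match=$(printf '%s\n' "$input" | awk '/this_pattern/ {cnt=6} (cnt && cnt--)')

# Test before the counter is set: only the 5 following lines are printed
after_match=$(printf '%s\n' "$input" | awk '(cnt && cnt--); /this_pattern/ {cnt=5}')

printf '%s\n' "$with_match" "---" "$after_match"
```

In both cases lines 6 and 7 are dropped because the counter has already reached 0.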

Turbocapitalist 09-05-2017 02:14 PM

The first line above produces the output shown as an example in #1 above.

It is much smoother than what I proposed. However, since this is the newbie subforum, it is worth pointing out the shortcuts that awk takes: if the action is left off after a pattern, a print is assumed, and if the print has no parameters then $0 is assumed.

So the following is the same as the first line above, but shows the 'hidden' print statement:

Code:

awk '/this_pattern/ {cnt=6}; (cnt && cnt--) {print}'
About anchoring the pattern, you can make the pattern apply only to the second column using a tilde. You can make the pattern start matching only from the start of the column using a caret.

Code:

awk '$2 ~ /^this_pattern/ {cnt=6}; (cnt && cnt--) {print}'
The $2 stands for the second column.
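To see what the anchoring changes, here is a small sketch on made-up input; the line "0 not_this_pattern" is hypothetical and only there to show the difference:

```shell
input='0 this_pattern
1
0 not_this_pattern
2'

# Unanchored: matches "this_pattern" anywhere in the line
unanchored=$(printf '%s\n' "$input" | awk '/this_pattern/')

# Anchored: the second column has to start with "this_pattern"
anchored=$(printf '%s\n' "$input" | awk '$2 ~ /^this_pattern/')

printf '%s\n' "$unanchored" "---" "$anchored"
```

The unanchored version also picks up "0 not_this_pattern", while the anchored one only keeps "0 this_pattern".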

udiubu 09-05-2017 02:56 PM

Thanks to both of you for the great suggestions.
I got the point of the commands.
The problem is that this does not solve my issue.
I am using

awk '/this_pattern/ {cnt=6} (cnt && cnt--)'

but I still get the following :

0 this_pattern
1
2
3
4
0 this_pattern
5
6
7
8
9
0 this_pattern

After each "this pattern" line I would need FIVE lines not containing the string "this_pattern", which can simply be ignored.

0 this_pattern
1
2
3
4
5
0 this_pattern
5
6
7
8
9
0 this_pattern

In this sense, when the first "this_pattern" matches, the next five lines need to be printed, but the second "this_pattern" has to be skipped.
Then, when the second "this_pattern" matches, its next five lines need to be printed. Or is it the case that once a line is skipped, it cannot be retrieved again? An ideal solution would be to say that the following lines must not contain alphabetic strings: if a line does, skip it.

Hope this helps!
I thank you so much for your valuable help.

Best,
Udiubu

MadeInGermany 09-05-2017 03:05 PM

The following skips printing and decrementing when it meets another match while the counter is still running:
Code:

awk '/this_pattern/ {if (cnt) next; cnt=6} (cnt && cnt--)'
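A quick way to check what this does, sketched on sample input modelled on the first post (the variable names are only for illustration):

```shell
input='0 this_pattern
1
2
3
4
0 this_pattern
5
6
7
8
9'

# A match seen while cnt is still non-zero is skipped by "next"
out=$(printf '%s\n' "$input" | awk '/this_pattern/ {if (cnt) next; cnt=6} (cnt && cnt--)')
printf '%s\n' "$out"
```

The first group now gets its five lines (1 to 5), but the second "0 this_pattern" is skipped for good and does not start a group of its own.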

Turbocapitalist 09-05-2017 10:01 PM

Wait. We're missing an explanation for where the extra "5" comes from. It is not in the input you have shown. But it is in the output:

Quote:

After each "this pattern" line I would need FIVE lines not containing the string "this_pattern", which can simply be ignored.
Code:

0 this_pattern
1
2
3
4
5
0 this_pattern
5
6
7
8
9
0 this_pattern
. . .


Do you need some missing numbers filled in by the script so as to always have exactly five lines after the pattern? If so, how should the numbers be calculated?

grail 09-06-2017 12:57 AM

@Turbocapitalist - I think the '5' you mention actually comes from the line after this_pattern in the input file, hence the solution is a little more complicated. Not only do you need to provide data from additional lines but you also then need a rewind type option.

If I am understanding correctly, and the OP may correct me if not, I think the following example is a little clearer:
Code:

0 this_pattern
1
2
0 this_pattern
3
4
0 this_pattern
5
6
7
8
9

So assuming the above is the input, I suggest the output is as follows:
Code:

0 this_pattern
1
2
3 # this after second match
4 # this after second match
5 # this after third match
0 this_pattern # this IS second match at line 4 of input
3
4
5 # up to next 0 are after third match
6
7
0 this_pattern # this IS third match at line 7
5
6
7
8
9

As you can see, you are not only printing additional lines that do not include the pattern, but you also have to save any pattern lines found along the way and rewind back to them.

So my suggestion would be to create an array to store the required lines and, when you hit the limit (in this case 5), print out the array that has that many items.

pan64 09-06-2017 01:46 AM

yes, would be nice to see a better input/output example
What I can imagine is to store the state somehow and print lines according to that state:
Code:

awk '
# but obviously this line does not meet the requirements
# so need to be improved
/pattern/ { state=found; nr=NR }
state==found { skip this line, but remember to print next 5 }
"within next 5 and /pattern/" { skip this line, recalculate "next 5" }
"within next 5 and no pattern" { print }
'

but still not sure if this was the real goal (or something else)

udiubu 09-06-2017 03:20 AM

Thanks everyone for suggestions:

@grail: you got exactly the point and your example is perfect to test.
There should indeed be a rewind back to any pattern lines found along the way.
However I honestly do not really know how to implement it.

@MadeInGermany: this command works, but indeed once you skip a matched string along the next lines, it is simply lost and not recoverable.
awk '/this_pattern/ {if (cnt) next; cnt=6} (cnt && cnt--)'

Thanks for helping!

MadeInGermany 09-06-2017 04:46 AM

Yes, information from a past cycle needs to be saved.
For example
Code:

awk '/this_pattern/ {if (cnt) {save=$0; saved=1; next} else cnt=6} { if (cnt) {cnt--; print} else if (saved) {print "saved:"save; saved=0; print; cnt=6}}'
For demonstration I have added "saved:".
This might be close to your requirement (that I have still not got in full).

grail 09-06-2017 05:47 AM

Ok, so not pretty, but this is what I came up with so far:
Code:

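# Needs GNU awk 4.0 or later (arrays of arrays, PROCINFO["sorted_in"]).
BEGIN{
  # "for (x in array)" has no guaranteed order unless one is requested
  PROCINFO["sorted_in"] = "@ind_num_asc"
}

/pattern/{
  c++    # every matching line opens a new group

  a[c][0] = $0

  next
}

{
  for(i in a){
    # append the current line to every group that is still open
    a[i][length(a[i])+1] = $0

    # match line plus 5 following lines: print the group and drop it
    if(length(a[i]) > 5){
      for(j in a[i])
        print a[i][j]

      delete a[i]
    }
  }
}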

You can write it on one line, just remember the semi-colons :)

Turbocapitalist 09-06-2017 06:43 AM

Very cool. I was wondering how to push something onto an array.

Isn't something needed at the end for a catch-all? There might be an array or two left over with fewer than 5 elements.

Code:

END {
  for(i in a){
    for(j in a[i])
      print a[i][j]
  }
}
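For awk implementations without arrays of arrays, the same buffering idea can be sketched in POSIX awk. This is only an illustration under assumptions: the names buf, cnt, first, last and the n variable are made up here, each group is kept as one newline-joined string, the END flush is included, and the sample input is grail's example from earlier in the thread.

```shell
input='0 this_pattern
1
2
0 this_pattern
3
4
0 this_pattern
5
6
7
8
9'

out=$(printf '%s\n' "$input" | awk -v n=5 '
  BEGIN { first = 1 }                 # index of the oldest group still open
  /this_pattern/ {
    buf[++last] = $0                  # every match opens a new group
    cnt[last] = 0
    next
  }
  {
    for (i = first; i <= last; i++) { # feed the line to every open group
      buf[i] = buf[i] "\n" $0
      if (++cnt[i] == n) {            # group has its n following lines
        print buf[i]
        delete buf[i]
        delete cnt[i]
        first = i + 1
      }
    }
  }
  END {                               # flush groups that never filled up
    for (i = first; i <= last; i++)
      print buf[i]
  }
')
printf '%s\n' "$out"
```

Groups are numbered in the order their matches appear, and older groups always fill up first, so a plain numeric loop from first to last keeps the output in input order without relying on "for (i in a)".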


