LinuxQuestions.org
Linux - Newbie This Linux forum is for members that are new to Linux.
Just starting out and have a question? If it is not in the man pages or the how-to's this is the place!

Old 09-05-2017, 08:15 AM   #1
udiubu
Member
 
Registered: Oct 2011
Posts: 73

Rep: Reputation: Disabled
AWK - skip line if line contains pattern and print next line


Dear experts,

Given a text file as input:

0 this_pattern
1
2
3
4
0 this_pattern
5
6
7
8
9
10
11
0 this_pattern
etc.

I would like to find lines matching a string (e.g. "this_pattern") and print each of them along with the FIVE lines that follow. It is mandatory that those following lines do not contain "this_pattern": such a line has to be ignored, and the next line has to be printed instead.

My output should be the following:

0 this_pattern
1
2
3
4
5
0 this_pattern
5
6
7
8
9
etc.

So far this is my AWK command:

awk '/this_pattern/ {nr[NR]; nr[NR+1]; nr[NR+2]; nr[NR+3]; nr[NR+4]; nr[NR+5]}; NR in nr'

However, this does not skip lines containing "this_pattern" when they fall within those five lines.

Essentially, I only need to say somehow that if line NR+n contains "this_pattern", it has to be ignored.

Any help or different approach would be highly appreciated.

Sincerely,

Udiubu

Last edited by udiubu; 09-05-2017 at 11:20 AM.
 
Old 09-05-2017, 08:21 AM   #2
Turbocapitalist
LQ Guru
 
Registered: Apr 2005
Distribution: Linux Mint, Devuan, OpenBSD
Posts: 5,611
Blog Entries: 3

Rep: Reputation: 2859
I would have the pattern increment a counter. Then I would have a second statement print a line, and increment the counter, if the counter is already greater than zero. Then if the counter is greater than 5, reset it to zero.
 
Old 09-05-2017, 08:30 AM   #3
Turbocapitalist
LQ Guru
 
Registered: Apr 2005
Distribution: Linux Mint, Devuan, OpenBSD
Posts: 5,611
Blog Entries: 3

Rep: Reputation: 2859
I'd also anchor the pattern to the beginning of the line or to the beginning of the field. Use a ^ for that.
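As a small illustration only (the sample lines and the helper name anchored_field are invented for this sketch), anchoring to the start of the second field keeps a line such as "1 not_this_pattern" from matching. A bare /^this_pattern/ would test the start of the whole line instead, which in the sample data is the leading number.

```shell
# anchored_field is a made-up helper name for this illustration.
anchored_field() {
    # $2 ~ /^.../ tests only the second field, starting at its first character
    awk '$2 ~ /^this_pattern/' "$@"
}

# Invented sample input: only the first and third lines should be printed.
printf '%s\n' '0 this_pattern' '1 not_this_pattern' '2 this_pattern_b' | anchored_field
```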
 
Old 09-05-2017, 11:26 AM   #4
udiubu
Member
 
Registered: Oct 2011
Posts: 73

Original Poster
Rep: Reputation: Disabled
Dear Turbocapitalist,

Thanks for your reply.
It's not entirely clear to me what exactly you mean, though.
Maybe a command line would make your point easier to follow.
Udiubu
 
Old 09-05-2017, 01:58 PM   #5
MadeInGermany
Senior Member
 
Registered: Dec 2011
Location: Simplicity
Posts: 1,741

Rep: Reputation: 790
The elegant way sets a counter that is decremented until it reaches 0:
Code:
awk '/this_pattern/ {cnt=6} (cnt && cnt--)'
This prints 6 lines including the matching line.
By switching the order, one can print only the 5 following lines, without the matching line:
Code:
awk '(cnt && cnt--); /this_pattern/ {cnt=5}'

Last edited by MadeInGermany; 09-05-2017 at 02:00 PM.
 
Old 09-05-2017, 02:14 PM   #6
Turbocapitalist
LQ Guru
 
Registered: Apr 2005
Distribution: Linux Mint, Devuan, OpenBSD
Posts: 5,611
Blog Entries: 3

Rep: Reputation: 2859
The first line above produces the output shown as an example in #1 above.

It is much smoother than what I proposed. However, since this is the newbie subforum, it is worth pointing out the shortcuts that awk takes: if the action is left off after a pattern, a print is assumed, and if the print has no parameters then $0 (the whole line) is assumed.

So the following is the same as the first line above, but shows the 'hidden' print statement:

Code:
awk '/this_pattern/ {cnt=6}; (cnt && cnt--) {print}'
About anchoring the pattern, you can make the pattern apply only to the second column using a tilde. You can make the pattern start matching only from the start of the column using a caret.

Code:
awk '$2 ~ /^this_pattern/ {cnt=6}; (cnt && cnt--) {print}'
The $2 stands for the second column.
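For comparison, the longer counter approach described in #2 might be written roughly like this. It is only a sketch: the variable c and the helper name six_lines are invented here, a match sets the counter to 1 instead of incrementing it so that a match inside a running window restarts the count, and the reset threshold is 6 rather than 5 because the matching line itself is counted.

```shell
# six_lines is a made-up helper name; c is the counter from #2.
six_lines() {
    awk '
        /this_pattern/ { c = 1 }        # a match starts (or restarts) the counter
        c > 0          { print; c++ }   # while counting, print the line
        c > 6          { c = 0 }        # match plus 5 lines printed: stop
    ' "$@"
}

# Invented sample input: prints the match and the lines 1 to 5.
printf '%s\n' '0 this_pattern' 1 2 3 4 5 6 7 | six_lines
```

Like the shorter commands above, this does not skip a pattern line that turns up inside the five lines.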
 
Old 09-05-2017, 02:56 PM   #7
udiubu
Member
 
Registered: Oct 2011
Posts: 73

Original Poster
Rep: Reputation: Disabled
Thanks to both of you for the great suggestions.
I got the point of the commands.
The problem is that this does not solve my issue.
I am using

awk '/this_pattern/ {cnt=6} (cnt && cnt--)'

but I still get the following :

0 this_pattern
1
2
3
4
0 this_pattern
5
6
7
8
9
0 this_pattern

After each "this pattern" line I would need FIVE lines not containing the string "this_pattern", which can simply be ignored.

0 this_pattern
1
2
3
4
5
0 this_pattern
5
6
7
8
9
0 this_pattern

In other words, when the first "this_pattern" matches, the next five lines need to be printed, and the second "this_pattern" line that falls among them has to be skipped.
Then, when the second "this_pattern" matches, its own next five lines need to be printed as well. Or is it the case that once a line is skipped, it cannot be retrieved again? An ideal solution would be to say that the following lines must not contain alphabetic strings: any line that does is ignored.

Hope this helps!
I thank you so much for your valuable help.

Best,
Udiubu
 
Old 09-05-2017, 03:05 PM   #8
MadeInGermany
Senior Member
 
Registered: Dec 2011
Location: Simplicity
Posts: 1,741

Rep: Reputation: 790
If the following meets another match while the counter is still running, it skips that line without printing or decrementing:
Code:
awk '/this_pattern/ {if (cnt) next; cnt=6} (cnt && cnt--)'
 
Old 09-05-2017, 10:01 PM   #9
Turbocapitalist
LQ Guru
 
Registered: Apr 2005
Distribution: Linux Mint, Devuan, OpenBSD
Posts: 5,611
Blog Entries: 3

Rep: Reputation: 2859
Wait. We're missing an explanation for where the extra "5" comes from. It is not in the input you have shown. But it is in the output:

Quote:
After each "this pattern" line I would need FIVE lines not containing the string "this_pattern", which can simply be ignored.
Code:
0 this_pattern
1
2
3
4
5
0 this_pattern
5
6
7
8
9
0 this_pattern
. . .
Do you need some missing numbers filled in by the script so as to always have exactly five lines after the pattern? If so, how should the numbers be calculated?
 
Old 09-06-2017, 12:57 AM   #10
grail
LQ Guru
 
Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 9,818

Rep: Reputation: 3083
@Turbocapitalist - I think the '5' you mention actually comes from the line after this_pattern in the input file, hence the solution is a little more complicated. Not only do you need to provide data from additional lines but you also then need a rewind type option.

If I am understanding correctly, and the OP may correct me if not, I think the following example is a little clearer:
Code:
0 this_pattern
1
2
0 this_pattern
3 
4
0 this_pattern
5
6
7
8
9
So assuming the above is the input, I suggest the output is as follows:
Code:
0 this_pattern
1
2
3 # this after second match
4 # this after second match
5 # this after third match
0 this_pattern # this IS second match at line 4 of input
3
4
5 # up to next 0 are after third match
6
7
0 this_pattern # this IS third match at line 7
5
6
7
8
9
As you can see, you are not only printing additional lines that do not include the pattern, but you also have to save, and later rewind back to, any pattern lines found along the way.

So my suggestion would be to create an array to store the required lines and, when a stored block reaches the limit (in this case 5 following lines), print it out.
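One possible way to get that rewind, sketched here under the assumption that the whole file fits in memory: keep every line in an array and, in the END block, walk forward from each pattern line until five non-pattern lines have been printed. The names line, n and blocks_buffered are placeholders made up for this sketch.

```shell
# blocks_buffered is a made-up helper name for this sketch.
blocks_buffered() {
    awk '
        { line[NR] = $0 }                       # remember every input line
        END {
            for (i = 1; i <= NR; i++) {
                if (line[i] !~ /this_pattern/) continue
                print line[i]                   # the matching line itself
                n = 0                           # following lines printed so far
                for (j = i + 1; j <= NR && n < 5; j++)
                    if (line[j] !~ /this_pattern/) { print line[j]; n++ }
            }
        }
    ' "$@"
}

# The example input from above (without the stray trailing space).
printf '%s\n' '0 this_pattern' 1 2 '0 this_pattern' 3 4 \
              '0 this_pattern' 5 6 7 8 9 | blocks_buffered
```

A block near the end of the file that has fewer than five following lines is printed as far as it goes.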
 
Old 09-06-2017, 01:46 AM   #11
pan64
LQ Guru
 
Registered: Mar 2012
Location: Hungary
Distribution: debian/ubuntu/suse ...
Posts: 16,490

Rep: Reputation: 5532
Yes, it would be nice to see a better input/output example.
What I can imagine is to store the state somehow and print lines according to that state:
Code:
awk '
# but obviously this line does not meet the requirements
# so need to be improved
/pattern/ { state=found; nr=NR }
state==found { skip this line, but remember to print next 5 }
"within next 5 and /pattern/" { skip this line, recalculate "next 5" }
"within next 5 and no pattern" { print }
'
But I am still not sure if this was the real goal (or something else).
 
Old 09-06-2017, 03:20 AM   #12
udiubu
Member
 
Registered: Oct 2011
Posts: 73

Original Poster
Rep: Reputation: Disabled
Thanks everyone for suggestions:

@grail: you got the point exactly, and your example is perfect for testing.
There should indeed be a rewind back to any pattern lines found along the way.
However, I honestly do not know how to implement it.

@MadeInGermany: this command works, but indeed once a matching line is skipped among the next lines, it is simply lost and cannot be recovered:
awk '/this_pattern/ {if (cnt) next; cnt=6} (cnt && cnt--)'

Thanks for helping!
 
Old 09-06-2017, 04:46 AM   #13
MadeInGermany
Senior Member
 
Registered: Dec 2011
Location: Simplicity
Posts: 1,741

Rep: Reputation: 790
Yes, information from a past cycle needs to be saved.
For example
Code:
awk '/this_pattern/ {if (cnt) {save=$0; saved=1; next} else cnt=6} { if (cnt) {cnt--; print} else if (saved) {print "saved:"save; saved=0; print; cnt=4}}'
For demonstration I have added "saved:".
This might be close to your requirement (that I have still not got in full).
 
Old 09-06-2017, 05:47 AM   #14
grail
LQ Guru
 
Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 9,818

Rep: Reputation: 3083
Ok, so not pretty, but this is what I came up with so far:
Code:
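# Needs gawk 4.0 or later (arrays of arrays, PROCINFO["sorted_in"]).
BEGIN{
  # visit indices in ascending numeric order so lines print in input order
  PROCINFO["sorted_in"] = "@ind_num_asc"
}

/pattern/{
  c++                 # each match opens a new block

  a[c][0] = $0        # the matching line is the first element

  next
}

{
  for(i in a){        # append this line to every block still open
    a[i][length(a[i])+1] = $0

    if(length(a[i]) > 5){   # match line plus 5 more: print and drop the block
      for(j in a[i])
        print a[i][j]

      delete a[i]
    }
  }
}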
You can write it on one line; just remember the semicolons.
 
1 members found this post helpful.
Old 09-06-2017, 06:43 AM   #15
Turbocapitalist
LQ Guru
 
Registered: Apr 2005
Distribution: Linux Mint, Devuan, OpenBSD
Posts: 5,611
Blog Entries: 3

Rep: Reputation: 2859
Very cool. I was wondering how to push something onto an array.

Isn't something needed at the end for a catch-all? There might be an array or two left over with fewer than 5 elements.

Code:
END {
  for(i in a){
    for(j in a[i])
      print a[i][j]
  }
}
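Arrays of arrays need gawk 4 or later. For other awks, the same idea, catch-all included, might be sketched with plain arrays and a compound subscript. This is an assumption-laden sketch rather than a drop-in replacement: emit, first, last, head, count, body and blocks_streaming are all names invented here.

```shell
# blocks_streaming is a made-up helper name for this sketch.
blocks_streaming() {
    awk '
        function emit(b,    k) {                # print block b, then forget it
            print head[b]
            for (k = 1; k <= count[b]; k++) {
                print body[b, k]
                delete body[b, k]
            }
            delete head[b]; delete count[b]
        }
        BEGIN { first = 1 }                     # oldest block still open
        /this_pattern/ { head[++last] = $0; count[last] = 0; next }
        {
            for (i = first; i <= last; i++)     # feed every open block
                body[i, ++count[i]] = $0
            while (first <= last && count[first] == 5)
                emit(first++)                   # the oldest block fills up first
        }
        END { while (first <= last) emit(first++) }   # incomplete leftovers
    ' "$@"
}

# The example input from #10 (without the stray trailing space).
printf '%s\n' '0 this_pattern' 1 2 '0 this_pattern' 3 4 \
              '0 this_pattern' 5 6 7 8 9 | blocks_streaming
```

Because blocks are numbered in the order they were opened, the leftovers in END also come out in input order.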
 
  

