LinuxQuestions.org


Thirumala! 11-18-2015 04:42 AM

Deleting n number of consecutive occurrences of a pattern
 
Hello All,

I want to delete a particular number of consecutive occurrences of a pattern from the file using awk. Please help me with the same.

Example of the file contents

0000
0010
0011
0000
0000
0000
0000
0000
1111
1111
0010
0000

Now I want to delete only the block where 0000 is repeated 5 times consecutively and keep the other 0000 lines unchanged. How can I do this using awk?

Thanks in advance,
Thirumala

syg00 11-18-2015 05:07 AM

So that's what you want. What have you attempted?
You make the effort, we'll help when you run into trouble.

Thirumala! 11-18-2015 05:13 AM

Hey syg00,

I have tried the below command

cat temp | awk 'N&&sub(PAT,REPL){N--};1' N=291 PAT="0000" REPL="" > temp1
cat temp1 | sed '/^$/d' > temp2

This command deletes the first 291 occurrences, but I want to delete only 291 consecutive occurrences.

Thanks,
Thirumala

berndbausch 11-18-2015 05:55 AM

Try this:

When the input line matches the pattern, remember the line in an array and count down. If the counter reaches 0, throw the array away and set the counter back to N.
When it doesn't match the pattern:
- if the array isn't empty, fewer than N patterns were in a row, so write the array out. Clear the array. Set the counter back to N.
- print the current line

I wonder if it can be done with other commands.

grail 11-18-2015 06:05 AM

Another thing to consider would be, what if there are more than 5 in a row? Do you delete if it is 6? Or only if there is another 5, i.e. 10?

Thirumala! 11-18-2015 06:09 AM

It should not replace if there are more than n occurrences. And it should replace only if the next set of occurrences is n again.

syg00 11-18-2015 06:32 AM

Nope, we are not going to write it for you.
You have been given some hints - incorporate them in your code. The countdown is a good idea, use it to also test if the current record is equal to the previous.

berndbausch 11-18-2015 07:05 PM

Quote:

Originally Posted by berndbausch (Post 5451391)
Try this:

When the input line matches the pattern, remember the line in an array and count down. If the counter reaches 0, throw the array away and set the counter back to N.
When it doesn't match the pattern:
- if the array isn't empty, fewer than N patterns were in a row, so write the array out. Clear the array. Set the counter back to N.
- print the current line

Sorry I couldn't resist the itch and ended up writing it. Why not share it then:
Code:

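#!/usr/bin/awk -f

BEGIN  { N=5; PAT="0000"; ix=0 }            # N counts down from 5 to 0
$0==PAT { saved[ix] = $0; ix++;             # hold the matching line back
          N--
          if (N==0) { delete saved; N=5 }   # a full block of 5: discard it
          next                              }

        { for (i in saved) print saved[i]   # a shorter run ended: print it
          delete saved
          N=5
          print                          }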

Adding this condition is left as an exercise:
Quote:

Originally Posted by Thirumala! (Post 5451397)
It should not replace if there are more than n occurrences. And it should replace only if the next set of occurrences is n again.

By the way, now I notice that I forgot to reset the index variable ix. Thanks to the associative nature of awk arrays, this doesn't seem to be a problem.
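
One possible way to add the condition left as an exercise, assuming it means "delete a run only when it is exactly N lines long, and keep shorter and longer runs untouched". That reading is an assumption; the request could also be read differently. The function name drop_exact_runs and the variable names are made up for this sketch. Because every held line equals the pattern, the sketch only counts the run instead of storing it.

```shell
# Hypothetical helper: drop_exact_runs N PAT [file...]
# Reads the named files, or standard input when no file is given.
drop_exact_runs() {
    n=$1; pat=$2; shift 2
    awk -v N="$n" -v PAT="$pat" '
        # Print the pending run unless it is exactly N lines long.
        function flush(   i) {
            if (run != N)
                for (i = 0; i < run; i++) print PAT
            run = 0
        }
        $0 == PAT { run++; next }   # only count; every held line equals PAT
        { flush(); print }          # a different line ends the run
        END { flush() }             # the file may end inside a run
    ' "$@"
}
```

It could be called as: drop_exact_runs 5 0000 temp > temp1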

grail 11-19-2015 12:05 AM

@berndbausch - just remember that now this user may expect to be told answers without doing any work in the future too :(

But, as you have let the cat out of the bag, here are 2 points of interest:

1. What happens if the last 3 entries in the file are the pattern?

2. If you rethink your use of N, you could reduce it to only being needed once outside the definition ;) (hint: consider ix values)

Thirumala! 11-19-2015 02:07 AM

Hello All,

Thanks for the help. This is the first time I am using awk, so I needed more help. :)
And rest assured that I will not expect any ready answers from you guys.

Thanks,
Thirumala

berndbausch 11-19-2015 03:20 AM

Quote:

Originally Posted by grail (Post 5451893)
@berndbausch - just remember that now this user may expect to be told answers without doing any work in the future too :(

But, as you have let the cat out of the bag, here are 2 points of interest:

1. What happens if the last 3 entries in the file are the pattern?

2. If you rethink your use of N, you could reduce it to only being needed once outside the definition ;) (hint: consider ix values)

Polishing is an exercise for the reader, and if somebody has wrong expectations, they can be reset quickly.
Well, when I have a little more time I may do the polishing just to prove my value :)

berndbausch 11-19-2015 03:23 AM

Quote:

Originally Posted by grail (Post 5451893)
@berndbausch - just remember that now this user may expect to be told answers without doing any work in the future too :(

But, as you have let the cat out of the bag, here are 2 points of interest:

1. What happens if the last 3 entries in the file are the pattern?

2. If you rethink your use of N, you could reduce it to only being needed once outside the definition ;) (hint: consider ix values)

Well, an END clause can take care of #1, and my brain is full so no rethinking #2 for now.
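
A rough sketch of what that could look like, not the poster's own follow-up. The END clause prints whatever is still held when the file ends (#1), and the index doubles as the counter so that N appears only once in the program (one guess at what hint #2 is after). The names drop_blocks, held and flush are invented here. Like the original script, it discards every complete block of N, so a run of 6 leaves one line behind.

```shell
# Hypothetical helper: drop_blocks N PAT, reading standard input.
drop_blocks() {
    awk -v N="$1" -v PAT="$2" '
        # Print the held lines in the order they were stored.
        function flush(   i) {
            for (i = 0; i < held; i++) print saved[i]
            held = 0
        }
        $0 == PAT {
            saved[held++] = $0        # the index is also the run length
            if (held == N) held = 0   # a full block of N: discard it
            next
        }
        { flush(); print }            # a shorter run ended: print it
        END { flush() }               # handles a run at the end of the file
    '
}
```

For example: drop_blocks 5 0000 < temp > temp1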

syg00 11-19-2015 03:34 AM

Quote:

Originally Posted by berndbausch (Post 5451765)
Sorry I couldn't resist the itch and ended up writing it. Why not share it then:

:p
Quote:

Thanks to the associative nature of awk arrays, this doesn't seem to be a problem.
They have lots of unexpected behaviours - one of the most notable being that they don't guarantee order.
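
To illustrate the ordering point: when the order of stored lines matters, a counting loop over numeric indexes is predictable, whereas the order of for (i in array) is left to the awk implementation. This is a minimal sketch; the function name print_in_order and the array name line are invented for the example.

```shell
# Hypothetical helper: store every input line, then print them back
# in the order they arrived.
print_in_order() {
    awk '
        { line[n++] = $0 }                              # store under 0, 1, 2, ...
        END { for (i = 0; i < n; i++) print line[i] }   # walk the indexes in order
    '
}
```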

MadeInGermany 11-19-2015 07:00 AM

Not all awk versions print a
Code:

for (i in array)
in the correct order.
Because the order has to be kept, we can store the lines in a string instead
Code:

awk '
{ buf=buf sep $0; sep=RS }  # add sep and $0 to buf; undefined variables are "" in string context; RS is newline
$0!="0000" { print buf; f=0; buf=sep=""; next }  # print and clear buffer; "next" skips the following code
++f==5 { f=0; buf=sep="" }  # if 5 found then clear buffer; an undefined variable is 0 in number context
END {if (f>0) print buf}  # print a remaining buffer
' temp


grail 11-19-2015 07:41 AM

I think some of you might be getting a little too carried away with the order stuff. Try to remember what is being stored in the array, i.e. it is only ever the same pattern (0000), so really order here is pretty irrelevant ;)


All times are GMT -5.