LinuxQuestions.org


vikas027 03-27-2012 08:50 AM

SED/AWK - Delete all lines until empty line is found after pattern match
 
Dear All,

I am sitting on a CentOS 5.4 server and trying to delete all lines in a file until a blank line is found after a string is matched.

For example, I have a file named 123:
Code:

$ cat 123
hi

#hello
how
are

break

hellos
one two three \
five \
six seven

eight
nine ten
eleven
hello

abcd
efgh
hello
commuted

one two three

I need to modify this file as shown below, so that every line from one matching the string hello up to the next blank line is deleted (the blank line itself is kept):
Code:

hi


break


eight
nine ten
eleven


abcd
efgh


one two three

I have googled and tried to match lines after string hello until a blank line is found.
Code:

$ sed -n '/hello/,/^$/p' 123
But I need just the inverse of it.

Any suggestions? Thanks in advance.

vikas027 03-27-2012 09:03 AM

Hi All,

I have found a partial solution.

I redirected the matched lines to a second file and used an awk one-liner against it, but it does NOT preserve blank lines:
Code:

$ sed -n '/hello/,/^$/p' 123 > 456
$ awk 'FNR==NR{f[$1];next}(!($1 in f)) ' 456 123
hi
break
eight
nine ten
eleven
abcd
efgh
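
A likely reason the blank lines disappear: the filter keys on the first field ($1). Every blank line written to 456 has an empty first field, so f[""] exists and every blank line in 123 is filtered out too. The same keying would explain why the final "one two three" line is missing, since "one" becomes a key. A small sketch that lists the keys, using shortened, made-up data and hypothetical variable names (range, keys):

```shell
# Shortened stand-in for file 456 (hypothetical data, not the full file).
range=$(mktemp)
printf '%s\n' 'hello' '' 'hellos' 'one two three \' 'six seven' '' > "$range"

# Collect the distinct first fields, exactly as the f[$1] array does,
# and show each one in brackets so the empty key is visible.
keys=$(awk '{ f[$1] } END { for (k in f) print "[" k "]" }' "$range" | LC_ALL=C sort)

printf '%s\n' "$keys"
rm -f "$range"
```

The empty brackets in the listing are the key that removes blank lines.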


grail 03-27-2012 09:19 AM

How about:
Code:

awk '/hello/{while(getline && $0 != ""){}}1' 123

firstfire 03-27-2012 09:36 AM

Hi.

Code:

$ sed  '/hello/,/^$/{/^$/!d}' test.dat
hi


break


eight
nine ten
eleven

abcd
efgh

one two three


danielbmartin 03-27-2012 09:37 AM

Quote:

Originally Posted by grail (Post 4637681)
Code:

awk '/hello/{while(getline && $0 != ""){}}1' 123

I admire this concise and loop-less solution. It would help awk-learners such as me to have a few words of explanation. Thanks!

Daniel B. Martin

vikas027 03-27-2012 09:54 AM

Fantastic
 
Quote:

Originally Posted by grail (Post 4637681)
How about:
Code:

awk '/hello/{while(getline && $0 != ""){}}1' 123

Awesome Grail.
You have been my saviour a number of times. Many, many thanks for this perfect one-liner.

grail 03-27-2012 10:00 AM

@Daniel - sorry about that ... I sometimes forget, as I just come up with a solution and move on (my bad).

A while loop must have a body, so the empty set of braces simply does nothing on each iteration. The loop ends when getline reads an empty line, so when control leaves the loop the 1 at the end prints that empty line (which is what the user wanted).
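
For anyone following along, the one-liner can be spread over several lines with comments. This is only a sketch: the sample data and the shell variable names (sample, result) are made up for illustration and are not from the thread.

```shell
# Small sample in a temp file (hypothetical data, modelled on the OP's file 123).
sample=$(mktemp)
printf '%s\n' 'hi' '' '#hello' 'how' 'are' '' 'break' > "$sample"

# Same logic as the one-liner, with comments.
result=$(awk '
/hello/ {
    # Keep reading lines; each getline overwrites $0.
    # The empty braces discard every non-empty line that is read.
    while (getline && $0 != "") { }
    # On leaving the loop, $0 holds the blank line that ended the block.
}
1   # a pattern that is always true: print the current $0
' "$sample")

printf '%s\n' "$result"
rm -f "$sample"
```

The lines #hello, how and are are swallowed inside the loop, and the blank line that stopped the loop is printed by the trailing 1.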

Please let me know if you need further information and thanks again for the reminder ;)

danielbmartin 03-27-2012 11:23 AM

Quote:

Originally Posted by grail (Post 4637730)
Please let me know if you need further information and thanks again for the reminder ;)

This is helpful. Allow me to rephrase and elaborate the operation of the awk and then you add/correct as need be.
Code:

awk '/hello/{while(getline && $0 != ""){}}1' 123
The awk spins down the "123" input file, printing each line as it goes, until it encounters a line containing the search string hello. At that point it enters a while loop inside the {braced clause}. Each iteration of the while uses a getline to read another line from "123" and tests the newly-read line with $0 != "" to see if it is *not* a null string. If not a null string, it executes a no-op (the empty braces). If a null string, it leaves the braced clause, prints the line, and resumes searching for hello.

My Dougherty & Robbins "sed & awk" book doesn't tell about getline. In context, it reads the next line. How and why is the getline and-ed with the logical result of $0 != ""?

Daniel B. Martin

firstfire 03-27-2012 12:49 PM

Hi, Daniel.

Quote from `info gawk getline':
Quote:

The `getline' command returns one if it finds a record and zero if
it encounters the end of the file. If there is some error in getting a
record, such as a file that cannot be opened, then `getline' returns
-1. In this case, `gawk' sets the variable `ERRNO' to a string
describing the error that occurred.
So the while loop stops either at end of file or at empty line.
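
One consequence of the quoted return values: -1 counts as true in a bare `getline &&` test, so a read error would not end the loop. A common defensive idiom is to compare the result with 0. The sketch below assumes a made-up four-line sample and a hypothetical variable name (guarded); it is a variation on the posted one-liner, not a replacement anyone in the thread proposed.

```shell
sample=$(mktemp)
printf '%s\n' 'eleven' 'hello' '' 'abcd' > "$sample"

# (getline) > 0 is true only when a record was actually read;
# both 0 (end of file) and -1 (read error) end the loop.
guarded=$(awk '/hello/ { while ((getline) > 0 && $0 != "") { } } 1' "$sample")

printf '%s\n' "$guarded"
rm -f "$sample"
```

On well-formed input this behaves the same as the original one-liner.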

danielbmartin 03-27-2012 05:31 PM

Quote:

Originally Posted by firstfire (Post 4637906)
So the while loop stops either at end of file or at empty line.

Thank you, firstfire, that makes sense.

Please consider this input file which is the same as that posted by OP but for two lines added at the end.
Code:

how
are

break

hellos
one two three \
five \
six seven

eight
nine ten
eleven
hello

abcd
efgh
hello
commuted

one two three
hello
bogus

OP wants to suppress all lines beginning with one containing hello and up to but not including a blank line. The last record (bogus) should not appear in the output but it does. Am I being too picky?
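
A minimal reproduction of this case, assuming a shortened, made-up input that ends with hello followed by a non-blank line (the variable names sample and tail_out are hypothetical). As far as I understand it, when getline reaches end of file it returns 0 and leaves $0 holding the last line it read, so the trailing 1 prints that line.

```shell
sample=$(mktemp)
printf '%s\n' 'one two three' 'hello' 'bogus' > "$sample"

# The one-liner from the thread, unchanged.
tail_out=$(awk '/hello/{while(getline && $0 != ""){}}1' "$sample")

printf '%s\n' "$tail_out"
rm -f "$sample"
```

The hello line is suppressed, but the last line read before end of file still reaches the output.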

Daniel B. Martin

Tinkster 03-27-2012 08:11 PM

Quote:

Originally Posted by danielbmartin (Post 4638114)
OP wants to suppress all lines beginning with one containing hello and up to but not including a blank line. The last record (bogus) should not appear in the output but it does. Am I being too picky?

But there's no blank line after bogus, so this case isn't covered by the OP's spec for deletion...

danielbmartin 03-27-2012 08:20 PM

Quote:

Originally Posted by Tinkster (Post 4638243)
But there's no blank line after bogus, so this case isn't covered by the OP's spec for deletion...

You are right. I interpret the OP's requirement in an algorithmic fashion:
1) Start suppressing printing whenever you see hello.
2) Stop suppressing printing whenever you see a blank line.

There was no assurance that each and every hello had a blank line "partner."
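
The two-step reading above maps directly onto a flag in awk. This is a sketch of that interpretation only, with made-up sample data and hypothetical names (sample, flagged, skip); it is not the solution posted earlier in the thread.

```shell
sample=$(mktemp)
printf '%s\n' 'one two three' 'hello' 'bogus' '' 'abcd' 'hello' 'tail' > "$sample"

flagged=$(awk '
/hello/ { skip = 1 }   # 1) start suppressing at a line containing hello
/^$/    { skip = 0 }   # 2) stop suppressing at a blank line
!skip                  # print only while not suppressing
' "$sample")

printf '%s\n' "$flagged"
rm -f "$sample"
```

Because the flag is never cleared after the final hello, the trailing line is suppressed even though no blank line follows it.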

This is another grain of evidence which shows that writing an airtight spec is more difficult than it might seem.

Daniel B. Martin

grail 03-28-2012 01:42 AM

Quote:

There was no assurance that each and every hello had a blank line "partner."
Actually I would disagree with your assumption, due to the following entries:
Quote:

I am sitting on a CentOS 5.4 server and trying to delete all lines in a file until a blank line is found after a string is matched.

# and

such that all lines are deleted after matching string hello until a blank line is found (not delete the blank line)
The second implies not only that there is a matching blank line but also that it is not to be removed.

The other point to remember is that any program is only ever as good as the information supplied; hence, based on the example data and information given, the current script is a worthy solution.
Of course, anything could be added to the data to cause any one solution to fail.

danielbmartin 03-28-2012 08:33 AM

Quote:

Originally Posted by grail (Post 4638390)
... anything could be added to the data to cause any one solution to fail.

You are right, and I'm not trying to be annoying. Real-world data often has defects in format or content. It's impossible to code for every possible "glitch" but it's also a mistake to imagine there will be none.

Daniel B. Martin


All times are GMT -5.