LinuxQuestions.org

ThinkLinux 02-24-2010 02:49 AM

awk, sed, grep and paragraphs
 
Hi,

I need to extract paragraphs that are more than 4 lines long from a text file.
The paragraph lengths vary depending on the output of a wget request. The paragraphs are separated by blank lines, and I need the entire contents of each matching paragraph returned so that I can follow the redirects.

What would be the best way of doing this?

Thanks

Tinkster 02-24-2010 04:08 AM

Hi,

welcome to LQ!

The quick & easy way:
Code:

awk 'BEGIN{RS=ORS="\n\n";FS=OFS="\n"}NF>=4' file
[edit]
What this does is quite simple: awk normally treats each
line (\n) as a record and splits it into fields on runs of
whitespace. Here we tell it instead that fields are separated
by a line-end (FS="\n", so each line is one field) and that
records are separated by two consecutive line-ends (RS="\n\n",
i.e. the empty line between paragraphs). The rest is even
simpler: if NF (the number of fields, i.e. the number of lines
in the paragraph) is greater than or equal to 4, awk performs
the default action, which is print (we lazily omitted it).
Setting RS=ORS and FS=OFS keeps awk from switching the output
to its "standard" separators: each printed paragraph is again
followed by a blank line, and the lines would stay on separate
lines even if a field were modified.
[/edit]
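A variant worth trying, sketched under the assumption that your awk follows POSIX paragraph mode: setting RS to the empty string makes awk separate records on one or more blank lines and ignore leading and trailing newlines in the input. A multi-character RS such as "\n\n" is unspecified by POSIX (some awks use only its first character), and even in awks that treat it as a regular expression, such as gawk, a paragraph preceded by two blank lines, or a last paragraph that still ends in its newline, can pick up an empty extra field, so NF comes out higher than the real line count. The sketch also uses NF > 4, which matches "more than 4 lines" literally, whereas NF>=4 selects four or more. The printf input is a made-up sample, not real wget output.

```shell
# Made-up sample input: paragraphs of 2, 5 and 4 lines,
# with two blank lines before the last paragraph.
# RS=""   : paragraph mode, records are separated by one or more blank lines
# FS="\n" : each line of a paragraph is one field, so NF = number of lines
# ORS     : put the blank line back after each printed paragraph
# NF > 4  : keep paragraphs with more than 4 lines (default action: print)
printf 'one\ntwo\n\nl1\nl2\nl3\nl4\nl5\n\n\nw\nx\ny\nz\n' |
  awk 'BEGIN { RS = ""; FS = "\n"; ORS = "\n\n" } NF > 4'
```

Only the 5-line paragraph is printed, followed by a blank line; the 4-line paragraph at the end is skipped. To run it on a real file, drop the printf and put the file name after the awk program, as in the one-liner above.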


Cheers,
Tink

Tinkster 02-25-2010 07:02 PM

OP, did you find the explanation satisfactory? Nothing left unclear?

Star_Gazer 04-09-2010 01:22 PM

Quote:

Originally Posted by Tinkster (Post 3877043)
OP, did you find the explanation satisfactory? Nothing left unclear?

It educated me some!

Not sure if the OP is aware of what "OP" means - depends on whether they are "forum-savvy" or not. :)

Clifton


All times are GMT -5.