LinuxQuestions.org


czezz 01-08-2020 03:45 AM

[BASH/SHELL] grep/extract/display lines under specific strings (irregular number of lines)
 
Hi,

What I want to do here is display all the lines under the ones beginning with capital letters.
Initially I thought I could do something like this:
Code:

cat file.out | egrep -A 3 "AAA-1|DFG-54" | egrep -v "AAA-1|DFG-54"
However, the downside of this method is that I need to give a static number of lines to display, and it is not always 3.
I was hoping that someone here could suggest a better way to do that.

file.out
Code:

AAA-1
| - blablabla1
| - blablabla2
| - blablabla4
EDF-2
| - ertgertg
| - werwerwe
| - wet4rt4
| - erkhrg34
IDW-34
| - ewerewr
| - werrfgerwe
DFG-54
| - eweerterewr
| - werwerwe
| - w44dfgdf
| - ewee453
| - werertre
| - w44derterfgdf


syg00 01-08-2020 05:20 AM

You need something with a bit of logic plus regex - awk, perl, python, whatever you're comfortable with.
Find your header lines, set a flag and get the next record - print while you have the flag set. When you reach another (non-wanted) header turn the flag off.

Standard stuff with the right tool.
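
The flag logic described above can be sketched in plain bash. This is only a minimal illustration under a few assumptions: the header names are the two from the first post, sub-items always start with "|", and the function name print_sections is made up for this example.

```shell
#!/usr/bin/env bash
# Shortened sample of the file.out from the first post.
cat > file.out <<'EOF'
AAA-1
| - blablabla1
| - blablabla2
EDF-2
| - ertgertg
DFG-54
| - eweerterewr
| - werwerwe
EOF

# print_sections is a hypothetical helper name, not something from the thread.
print_sections() {
    local flag=0 line
    while IFS= read -r line; do
        case $line in
            AAA-1|DFG-54)                 # wanted header: turn the flag on
                flag=1 ;;
            '|'*)                         # sub-item: print only while the flag is set
                if [ "$flag" -eq 1 ]; then printf '%s\n' "$line"; fi ;;
            *)                            # any other header: turn the flag off
                flag=0 ;;
        esac
    done < "$1"
}

print_sections file.out
```

A shell loop like this is slow on large files, so awk or perl is probably the better tool once the idea is clear.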

NevemTeve 01-08-2020 05:21 AM

'file.out' is an example input? Then what is the expected output?

czezz 01-08-2020 05:38 AM

Expected output:

Code:

| - blablabla1
| - blablabla2
| - blablabla4
| - eweerterewr
| - werwerwe
| - w44dfgdf
| - ewee453
| - werertre
| - w44derterfgdf

If I run: cat file.out | egrep -A 3 "AAA-1|DFG-54" | egrep -v "AAA-1|DFG-54" the result is more or less what I need, although it is limited to 3 lines, whereas I need all the lines in each selected section.

individual 01-08-2020 06:01 AM

Edit: Never mind, I misunderstood the original question. If you only want specific groups (rather than "all lines under ones starting with capital letters" as you state in the OP), you will need something with "looping" capabilities.

grail 01-08-2020 06:12 AM

I am with syg00, pick your poison on which best suits you, but awk would be a doddle

boughtonp 01-08-2020 06:28 AM

This can be done with grep and a simple regex - you just need a couple of things:
* A lookbehind to locate the headers without matching (lookbehinds require Perl regex, so -P)
* Matching each section in one go means crossing lines, so -z stops grep from splitting the input on newlines.

Then the regex is straightforward:
Code:

grep -Poz '(?<=AAA-1\n|DFG-54\n)(\n?\| - [^\n]+)+' file.out
| - blablabla1
| - blablabla2
| - blablabla4
| - eweerterewr
| - werwerwe
| - w44dfgdf
| - ewee453
| - werertre
| - w44derterfgdf

The lookbehind part (?<=...) contains each of the required headers/prefixes (including a newline for each one to prevent blank lines), and is easy to add new headers to: (?<=AAA-1\n|DFG-54\n|ANOTHER-1\n)

The second part matches as many sub-items as possible by checking for their literal '| - ' prefix, with an optional newline that matches where it is needed (but not directly after headers). The [^\n]+ part could be replaced with a specific sub-pattern if further filtering is needed.

syg00 01-08-2020 06:45 AM

Nice. I use perlre so rarely these days, I've nearly forgotten it all.

czezz 01-08-2020 06:54 AM

Thank you Boughtonp for solution and description!
I had to modify it on my system to:
Code:

grep -Poz '(?<=AAA-1\n|DFG-54)(\n?\| - [^\n]+)+' file.out
Otherwise the first line of the next section (DFG-54) was displayed on the same line as the last line of the first section (AAA-1). It's probably something specific to my grep version.

Code:

| - blablabla4| - eweerterewr
Thanks again :)
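
A likely explanation for the merged lines is the -z option itself: with -z, GNU grep terminates each output record with a NUL byte instead of a newline, so the two matches are printed back to back. A minimal sketch, assuming GNU grep built with PCRE support and a standard tr, keeps the original regex and converts the NUL terminators afterwards. The function name extract_sections is made up for this example.

```shell
#!/usr/bin/env bash
# Shortened sample of the file.out from the first post.
printf '%s\n' 'AAA-1' '| - blablabla1' '| - blablabla4' \
    'EDF-2' '| - ertgertg' \
    'DFG-54' '| - eweerterewr' '| - ewee453' > file.out

# extract_sections is a hypothetical helper name.
# The regex is the one posted above; tr turns each NUL record
# terminator written by "grep -z" into a newline.
extract_sections() {
    grep -Poz '(?<=AAA-1\n|DFG-54\n)(\n?\| - [^\n]+)+' "$1" | tr '\0' '\n'
}

extract_sections file.out
```

With this approach the order of the sections in the file should not matter, since no match has to carry its own leading newline.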

boughtonp 01-08-2020 07:21 AM

Quote:

Originally Posted by czezz (Post 6075998)
...It's probably something specific to my grep version.

Hrm, actually I think it was the system I tested on - I checked with a newer grep and also get the merged lines you mentioned.

Your modification will be fine if AAA-1 is the first section, but otherwise you might want to remove the newlines from all the headers, make the optional one mandatory, then just trim the first line, i.e.:
Code:

grep -Poz '(?<=AAA-1|DFG-54)(\n\| - [^\n]+)+' file.out | sed 1d

grail 01-08-2020 08:07 AM

Here is the simple awk (could perhaps be smaller if more is known about data)
Code:

awk 'x{if(/^\|/)print;else x=0}/AAA-1|DFG-54/{x=1}' file.out

MadeInGermany 01-10-2020 02:54 AM

The same idea, but my way of coding:
Code:

awk -v search="AAA-1|DFG-54" '$0!~/^\|/ {prt=0} $0~search {prt=1} prt' file.out
There is a criterion for "stop printing, prt=0" and a criterion for "start printing, prt=1", and at the appropriate place there is "prt" meaning "print if true".
In this case, knowing that "search" won't start with a | character, one can condense it to
Code:

awk -v search="AAA-1|DFG-54" '$0!~/^\|/ {prt=($0~search)} prt' file.out
I just saw that you do not want to print the header, so the "prt" must be moved:
Code:

awk -v search="AAA-1|DFG-54" '$0!~/^\|/ {prt=0} prt; $0~search {prt=1}' file.out
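
One caveat with an unanchored search pattern is that it matches anywhere in the line, so "AAA-1" would also select a header such as AAA-10. A hedged variant, assuming each header occupies the whole line, anchors the pattern. The AAA-10 section, the sample.out file name and the wanted_lines function name are made up for this illustration.

```shell
#!/usr/bin/env bash
# Small made-up sample with a look-alike header (AAA-10).
printf '%s\n' 'AAA-1' '| - blablabla1' \
    'AAA-10' '| - not wanted' \
    'DFG-54' '| - eweerterewr' > sample.out

# wanted_lines is a hypothetical helper name.
# ^( ... )$ lets the search match a header only when it is the whole line.
wanted_lines() {
    awk -v search='^(AAA-1|DFG-54)$' \
        '$0!~/^\|/ {prt=0} prt; $0~search {prt=1}' "$1"
}

wanted_lines sample.out
```

The escaped \| in the first pattern matters: an unescaped | is the regex alternation operator rather than a literal pipe character.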


All times are GMT -5.