[BASH/SHELL] grep/extract/display lines under specific strings (irregular number of lines)
Hi,
What I want to do is display all the lines under the ones beginning with capital letters. Initially I thought I could do something like this:
Code:
cat file.out | egrep -A 3 "AAA-1|DFG-54" | egrep -v "AAA-1|DFG-54"
I was hoping someone here could suggest a better way to do that.
file.out
Code:
AAA-1 |
You need something with a bit of logic plus regex - awk, perl, python, whatever you're comfortable with.
Find your header lines, set a flag and get the next record - print while you have the flag set. When you reach another (non-wanted) header turn the flag off. Standard stuff with the right tool. |
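For illustration, the flag logic described above could be sketched in plain bash roughly like this. The function name print_sections is hypothetical, and the sketch assumes that sub-items always start with a literal '|' and that every other line is a header of some kind.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the '|' lines that follow a wanted header.
# $1 is an extended regex of wanted headers; input is read from stdin.
print_sections() {
    local wanted=$1 flag=0 line
    while IFS= read -r line || [[ -n $line ]]; do
        if [[ $line == '|'* ]]; then
            # Sub-item: print it only while the flag is set.
            (( flag )) && printf '%s\n' "$line"
        elif [[ $line =~ $wanted ]]; then
            flag=1   # wanted header: start printing from the next line
        else
            flag=0   # any other header: stop printing
        fi
    done
    return 0
}
```

It would be called as print_sections '^(AAA-1|DFG-54)' < file.out. A shell loop like this is slow on large files, so awk or perl is usually the better fit; the loop just makes the flag handling explicit.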
'file.out' is an example input? Then what is the expected output?
|
Expected output:
Code:
| - blablabla1 |
Edit: Nevermind, I misunderstood the original question. If you only want specific groups (rather than "all lines under ones starting with capital letters" as you state in the OP), you will need something with "looping" capabilities.
|
I am with syg00, pick your poison on which best suits you, but awk would be a doddle
|
This can be done with grep and a simple regex - you just need a couple of things:
* A lookbehind to locate the headers without matching them (lookbehinds require Perl regex, so -P)
* The ability to match each section in one go, which means crossing lines, so -z prevents grep splitting on newline.
Then the regex is straightforward:
Code:
grep -Poz '(?<=AAA-1\n|DFG-54\n)(\n?\| - [^\n]+)+' file.out
The second part matches as many sub-items as possible by checking for their literal '| - ' prefix, with an optional newline to match when it needs to (but not after headers). The [^\n]+ part could be replaced with a specific sub-pattern if further filtering is needed.
|
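One side effect of -z is that grep also terminates each output record with a NUL byte instead of a newline, so separate matches can appear run together on a terminal. A small sketch of one way to deal with that, assuming GNU grep built with PCRE support; the wrapper name sections_with_grep is hypothetical:

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: same regex as above, reading stdin.
# With -z, each match printed by -o ends in a NUL byte, so tr
# converts those NULs into newlines to give one sub-item per line.
sections_with_grep() {
    grep -Poz '(?<=AAA-1\n|DFG-54\n)(\n?\| - [^\n]+)+' | tr '\0' '\n'
}
```

Usage would be sections_with_grep < file.out. The tr step only changes the separators between matches; the matching itself is untouched.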
Nice. I use perlre so rarely these days, I've nearly forgotten it all.
|
Thank you Boughtonp for the solution and description!
I had to modify it on my system to:
Code:
grep -Poz '(?<=AAA-1\n|DFG-54)(\n?\| - [^\n]+)+' file.out
Code:
| - blablabla4| - eweerterewr |
Quote:
Your modification will be fine if AAA-1 is the first section, but otherwise you might want to remove all newlines from the headers, make the optional one mandatory, then just trim the first line, i.e.:
Code:
grep -Poz '(?<=AAA-1|DFG-54)(\n\| - [^\n]+)+' file.out | sed 1d
|
Here is the simple awk (could perhaps be smaller if more is known about data)
Code:
awk 'x{if(/^\|/)print;else x=0}/AAA-1|DFG-54/{x=1}' file.out
|
The same idea, but my way of coding:
Code:
awk -v search="AAA-1|DFG-54" '$0!~/^\|/ {prt=0} $0~search {prt=1} prt' file.out
In this case, knowing that "search" won't start with a | character, one can condense it to
Code:
awk -v search="AAA-1|DFG-54" '$0!~/^\|/ {prt=($0~search)} prt' file.out
Both of these also print the matching header lines. To print only the lines under them, test the flag before setting it:
Code:
awk -v search="AAA-1|DFG-54" '$0!~/^\|/ {prt=0} prt; $0~search {prt=1}' file.out
|
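What decides whether the header lines themselves get printed is where the bare prt pattern sits relative to the rule that sets the flag. A commented sketch of the header-excluding order, wrapped in a hypothetical shell function lines_under; the pipe is written as \| here so that awk matches it literally rather than treating it as alternation:

```shell
#!/usr/bin/env bash
# Hypothetical wrapper around the flag-based awk approach; reads stdin.
# $1 is an extended regex naming the wanted headers.
lines_under() {
    awk -v search="$1" '
        $0 !~ /^\|/ { prt = 0 }   # any line not starting with "|" ends the section
        prt                       # print while the flag is set
        $0 ~ search { prt = 1 }   # wanted header: print from the next line on
    '
}
```

Called as lines_under "AAA-1|DFG-54" < file.out. Because the flag is set only after the print test has run, a header line is never printed; moving the last rule above the bare prt would include the headers in the output.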