LinuxQuestions.org
01-08-2020, 03:45 AM #1
czezz
Member
Registered: Nov 2004
Distribution: Slackware/Solaris
Posts: 924
[BASH/SHELL] grep/extract/display lines under specific strings (irregular number of lines)


Hi,

What I want to do is display all the lines under the ones beginning with capital letters.
Initially I thought I could do something like this:
Code:
cat file.out | egrep -A 3 "AAA-1|DFG-54" | egrep -v "AAA-1|DFG-54"
However, the downside of this method is that I have to give a fixed number of lines to display, and it is not always 3.
Is there a better way to do this?

file.out
Code:
AAA-1
| - blablabla1
| - blablabla2
| - blablabla4
EDF-2
| - ertgertg
| - werwerwe
| - wet4rt4
| - erkhrg34
IDW-34
| - ewerewr
| - werrfgerwe
DFG-54
| - eweerterewr
| - werwerwe
| - w44dfgdf
| - ewee453
| - werertre
| - w44derterfgdf
 
01-08-2020, 05:20 AM #2
syg00
LQ Veteran
Registered: Aug 2003
Location: Australia
Distribution: Lots ...
Posts: 21,140
You need something with a bit of logic plus regex - awk, perl, python, whatever you're comfortable with.
Find your header lines and set a flag, then read the following records, printing each one while the flag is set. When you reach another (unwanted) header, turn the flag off.

Standard stuff with the right tool.
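The flag approach described above could look like the following minimal sketch in awk, wrapped in a bash function. The function name `print_sections` and the sample lines are hypothetical; the sketch assumes sub-item lines start with `|`, every other line is a header, and the wanted headers are matched as exact whole lines rather than as regexes.

```shell
#!/usr/bin/env bash
# print_sections is a hypothetical helper name, not an existing command.
# It prints the sub-item lines ("| - ...") under each wanted header,
# using a flag that is set or cleared every time a header line is read.
# Usage: print_sections FILE HEADER...
print_sections() {
    local file=$1
    shift
    awk '
        BEGIN {
            # The remaining arguments are header names, not files: store
            # them as a set, then shrink ARGC so awk reads only ARGV[1].
            for (i = 2; i < ARGC; i++) want[ARGV[i]] = 1
            ARGC = 2
        }
        /^\|/ { if (flag) print; next }   # sub-item: print only while the flag is set
        { flag = ($0 in want) }           # header: flag on if wanted, off otherwise
    ' "$file" "$@"
}

# Demo on a shortened, made-up copy of the file.out layout.
sample=$(mktemp)
printf '%s\n' 'AAA-1' '| - blablabla1' '| - blablabla2' 'EDF-2' '| - ertgertg' \
    'DFG-54' '| - eweerterewr' '| - werwerwe' > "$sample"
print_sections "$sample" AAA-1 DFG-54
rm -f "$sample"
```

Looking the headers up in an awk array avoids regex quoting problems with header names; the trade-off is that pattern-based header matching is not possible with this version.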
 
01-08-2020, 05:21 AM #3
NevemTeve
Senior Member
Registered: Oct 2011
Location: Budapest
Distribution: Debian/GNU/Linux, AIX
Posts: 4,871
Blog Entries: 1
Is 'file.out' an example input? If so, what is the expected output?
 
01-08-2020, 05:38 AM #4
czezz
Member
Registered: Nov 2004
Distribution: Slackware/Solaris
Posts: 924
Original Poster
Expected output:

Code:
| - blablabla1
| - blablabla2
| - blablabla4
| - eweerterewr
| - werwerwe
| - w44dfgdf
| - ewee453
| - werertre
| - w44derterfgdf
If I run: cat file.out | egrep -A 3 "AAA-1|DFG-54" | egrep -v "AAA-1|DFG-54" I get roughly what I need, but it is limited to 3 lines per section, whereas I need all the lines in each selected section.
 
01-08-2020, 06:01 AM #5
individual
Member
Registered: Jul 2018
Posts: 315
Blog Entries: 1
Edit: Never mind, I misunderstood the original question. If you only want specific groups (rather than "all lines under ones starting with capital letters", as you state in the OP), you will need something with "looping" capabilities.

Last edited by individual; 01-08-2020 at 06:21 AM.
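For the broader reading of the question (every sub-item under every header), no flag or loop is needed, since in `file.out` every sub-item line starts with `|`. A minimal sketch under that assumption, using a made-up shortened sample in place of `file.out`:

```shell
#!/usr/bin/env bash
# Shortened, made-up stand-in for file.out.
sample=$(mktemp)
printf '%s\n' 'AAA-1' '| - blablabla1' 'EDF-2' '| - ertgertg' > "$sample"

# "[|]" is a bracket expression, so the pipe is literal in both basic and
# extended regex syntax: keep every line that starts with "|", drop headers.
subitems=$(grep '^[|]' "$sample")
printf '%s\n' "$subitems"

rm -f "$sample"
```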
 
01-08-2020, 06:12 AM #6
grail
LQ Guru
Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 10,008
I am with syg00: pick whichever tool suits you best, but awk would make this a doddle.
 
01-08-2020, 06:28 AM #7
boughtonp
Senior Member
Registered: Feb 2007
Location: UK
Distribution: Debian
Posts: 3,616
This can be done with grep and a simple regex - you just need a couple of things:
* A lookbehind to locate the headers without including them in the match (lookbehinds require Perl regex, hence -P).
* Matching each section in one go means crossing line boundaries, so -z is used to stop grep from splitting the input on newlines.

Then the regex is straightforward:
Code:
grep -Poz '(?<=AAA-1\n|DFG-54\n)(\n?\| - [^\n]+)+' file.out
| - blablabla1
| - blablabla2
| - blablabla4
| - eweerterewr
| - werwerwe
| - w44dfgdf
| - ewee453
| - werertre
| - w44derterfgdf
The lookbehind part (?<=...) contains each of the required headers/prefixes (including a newline after each one to prevent blank lines), and new headers are easy to add: (?<=AAA-1\n|DFG-54\n|ANOTHER-1\n)

The second part matches as many sub-items as possible by checking for their literal '| - ' prefix, with an optional newline that matches when needed (but not directly after a header). The [^\n]+ part could be replaced with a more specific sub-pattern if further filtering is needed.

Last edited by boughtonp; 01-08-2020 at 06:31 AM.
 
01-08-2020, 06:45 AM #8
syg00
LQ Veteran
Registered: Aug 2003
Location: Australia
Distribution: Lots ...
Posts: 21,140
Nice. I use perlre so rarely these days, I've nearly forgotten it all.
 
01-08-2020, 06:54 AM #9
czezz
Member
Registered: Nov 2004
Distribution: Slackware/Solaris
Posts: 924
Original Poster
Thank you, boughtonp, for the solution and the description!
I had to modify it on my system to:
Code:
grep -Poz '(?<=AAA-1\n|DFG-54)(\n?\| - [^\n]+)+' file.out
Otherwise, the first line of the next section (DFG-54) was displayed on the same line as the last line of the first section (AAA-1). It's probably something specific to my grep version.

Code:
| - blablabla4| - eweerterewr
Thanks again
 
01-08-2020, 07:21 AM #10
boughtonp
Senior Member
Registered: Feb 2007
Location: UK
Distribution: Debian
Posts: 3,616
Quote:
Originally Posted by czezz
...It's probably something specific to my grep version.
Hrm, actually I think it was the system I tested on: I checked with a newer grep and also got the merged lines you mentioned.

Your modification will be fine as long as AAA-1 is the first section, but otherwise you might want to remove the newlines from all the headers, make the optional newline mandatory, and then just trim the first (empty) line, i.e.:
Code:
grep -Poz '(?<=AAA-1|DFG-54)(\n\| - [^\n]+)+' file.out | sed 1d
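GNU grep documents -z as treating both input and output data as NUL-terminated lines, so each match printed by -o ends in a NUL byte rather than a newline. A terminal shows nothing for that byte, which would explain the merged "| - blablabla4| - eweerterewr" line. A minimal sketch, assuming GNU grep built with PCRE support: keep the original regex from post #7 and translate the NULs back into newlines with tr (the sample file is a shortened, made-up stand-in for file.out).

```shell
#!/usr/bin/env bash
sample=$(mktemp)
printf '%s\n' 'AAA-1' '| - blablabla1' '| - blablabla4' 'EDF-2' '| - ertgertg' \
    'DFG-54' '| - eweerterewr' '| - werwerwe' > "$sample"

# -P: Perl regex (needed for the lookbehind); -o: print only the matched text;
# -z: the file contains no NUL bytes, so it is read as one record and a match
#     can span lines. Each match is printed followed by a NUL, which tr turns
#     into a newline so consecutive sections no longer run together.
sections=$(grep -Poz '(?<=AAA-1\n|DFG-54\n)(\n?\| - [^\n]+)+' "$sample" | tr '\0' '\n')
printf '%s\n' "$sections"

rm -f "$sample"
```

If this holds on your grep, the sed 1d trim should no longer be needed, because every match is separated by a newline whichever section comes first.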
 
01-08-2020, 08:07 AM #11
grail
LQ Guru
Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 10,008
Here is the simple awk version (it could perhaps be shorter if more were known about the data):
Code:
awk 'x{if(/^\|/)print;else x=0}/AAA-1|DFG-54/{x=1}' file.out
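One refinement worth considering, offered as a hedged sketch: /AAA-1|DFG-54/ is unanchored, so it would also fire on a header such as AAA-10 (a hypothetical header, not in the original sample). Anchoring the alternation to the whole line avoids that. The sample file below is a made-up stand-in for file.out.

```shell
#!/usr/bin/env bash
sample=$(mktemp)
# Hypothetical input: AAA-10 is not wanted, but the unanchored /AAA-1/ would match it.
printf '%s\n' 'AAA-1' '| - wanted1' 'AAA-10' '| - unwanted' \
    'DFG-54' '| - wanted2' > "$sample"

# x is the "inside a wanted section" flag: while it is set, print sub-items
# (lines starting with "|") and clear it at the next header line.
# ^(...)$ requires the whole line to equal one of the wanted headers.
result=$(awk 'x{if(/^\|/)print;else x=0}/^(AAA-1|DFG-54)$/{x=1}' "$sample")
printf '%s\n' "$result"

rm -f "$sample"
```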
 
01-10-2020, 02:54 AM #12
MadeInGermany
Senior Member
Registered: Dec 2011
Location: Simplicity
Posts: 2,806
The same idea, but my way of coding:
Code:
awk -v search="AAA-1|DFG-54" '$0!~/^\|/ {prt=0} $0~search {prt=1} prt' file.out
There is a criterion for "stop printing, prt=0" and a criterion for "start printing, prt=1", and at the appropriate place there is "prt" meaning "print if true".
In this case, knowing that "search" won't start with a | character, one can condense it to:
Code:
awk -v search="AAA-1|DFG-54" '$0!~/^\|/ {prt=($0~search)} prt' file.out
I just noticed that you do not want to print the headers, so the "prt" must be moved between the two criteria:
Code:
awk -v search="AAA-1|DFG-54" '$0!~/^\|/ {prt=0} prt; $0~search {prt=1}' file.out

Last edited by MadeInGermany; 01-10-2020 at 03:06 AM.
 
  


All times are GMT -5.