LinuxQuestions.org - [SOLVED] Grep varying no. of lines between two patterns

Page 1 of 2

- Linux - Newbie (https://www.linuxquestions.org/questions/linux-newbie-8/)

- - Grep varying no. of lines between two patterns (https://www.linuxquestions.org/questions/linux-newbie-8/grep-varying-no-of-lines-between-two-patterns-874313/)

Tauro

04-11-2011 08:44 AM

Grep varying no. of lines between two patterns

I have a file that goes like this
>pattern 1
xyz
xyz
abc
asdfg
>pattern 2
xyz
>pattern 1
adbf
sfni
>pattern 2
bla bla
xyz

I need to grep the lines between pattern 1 and pattern 2, but not the lines following pattern 2. I cannot use grep -A NUM, because the number of lines following pattern 1 varies. I also tried awk one-liners, but the results were wrong.

I'll be glad if someone comes up with a good one-liner for this.

sycamorex

04-11-2011 09:03 AM

Hi and welcome to LQ.

Try using sed to accomplish the task. If you get stuck at any point, feel free to post your code. We'll be happy to assist you.

MTK358

04-11-2011 09:46 AM

AWK code:

Code:

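BEGIN {
    inside = 0;    # not between the two markers yet
}

# Opening marker: start tracking
/>pattern 1/ {
    inside = 1;
}

# Closing marker: stop tracking
/>pattern 2/ {
    inside = 0;
}

# Act only on matching lines that fall between the markers
/your pattern/ && inside {
    print    # replace with whatever you need to do
}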
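A concrete version of the flag approach above might look like the following sketch. It assumes the markers are literally ">pattern 1" and ">pattern 2" at the start of a line and that the marker lines themselves are not wanted. The function names sample_input and between_flags are made up for illustration.

```shell
# Hypothetical sample input, mirroring the layout in the first post.
sample_input() {
    cat <<'EOF'
>pattern 1
xyz
xyz
abc
asdfg
>pattern 2
xyz
>pattern 1
adbf
sfni
>pattern 2
bla bla
xyz
EOF
}

# between_flags: read stdin, print lines strictly between the two markers.
# "next" skips the marker lines so they are never printed.
between_flags() {
    awk '
        /^>pattern 1/ { inside = 1; next }   # opening marker: start printing
        /^>pattern 2/ { inside = 0; next }   # closing marker: stop printing
        inside                               # print while the flag is set
    '
}

sample_input | between_flags
```

Because the flag is updated before the final rule runs, the order of the three rules matters: moving the bare "inside" rule to the top would print the closing marker line as well.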

sycamorex

04-11-2011 10:07 AM

Ok, since we have started giving solutions, the sed one (if I understand the problem correctly) would be as follows:

Code:

sed -n '/>pattern 1/,/>pattern 2/p' infile

grail

04-11-2011 10:26 AM

Assuming the header and footer lines are also not wanted:

Code:

awk '/>pattern 1/,/>pattern 2/{if(!/pattern/)print}' file

Or maybe:

Code:

awk '!(NR % 2)' RS=">pattern [12]\n" ORS="" file

Tauro

04-12-2011 01:18 AM

@sycamorex
Thanks :)
I used a combination of sed and grep, as I do not need the line containing pattern 2.
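The exact commands were not posted, so the following is only a guess at what such a sed and grep combination could look like: sed prints each range including both marker lines, and grep -v then drops the markers. The function names are hypothetical, and the second function shows a sed-only variant for comparison.

```shell
# Hypothetical sample input, same layout as in the first post.
sample_input() {
    cat <<'EOF'
>pattern 1
xyz
xyz
abc
asdfg
>pattern 2
xyz
>pattern 1
adbf
sfni
>pattern 2
bla bla
xyz
EOF
}

# sed prints every pattern 1 .. pattern 2 range (markers included);
# grep -v then removes the marker lines themselves.
sed_then_grep() {
    sed -n '/^>pattern 1/,/^>pattern 2/p' | grep -v '^>pattern'
}

# Same result with sed alone: inside the range, print only the lines
# that are not markers.
sed_only() {
    sed -n '/^>pattern 1/,/^>pattern 2/{/^>pattern/!p;}'
}

sample_input | sed_then_grep
```

Note that grep -v '^>pattern' removes both marker lines, not just the pattern 2 line. To keep the pattern 1 header, filter on '^>pattern 2' instead.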

grail

04-12-2011 02:39 AM

Don't forget to mark as SOLVED once you have a solution.

Tauro

04-12-2011 03:14 AM

Also, in the same file I have certain patterns that go on this way.

>pattern1
xyz
zz
sss
dd
>pattern2
ggg
ddd
aa
>pattern1
cwefw
swd
>pattern1 pattern2
ggg
ss
aaa
s
>pattern2

In this case, the sed one-liner won't pick up the lines following ">pattern1 pattern2", based on sed -n '/pattern1/,/pattern2/p' file.

mayursingru

04-12-2011 07:01 AM

Hi Tauro,
Try this out

Code:

sed -n '/pattern1/,/pattern2/p;/pattern1 pattern2/,/pattern2/p' file

Regards,
Mayur Singru

kurumi

04-12-2011 07:15 AM

using Ruby

Code:

$ ruby -0777 -ne 'puts $_.scan(/pattern 1(.*?)pattern 2/m)' file

xyz
xyz
abc
asdfg
>

adbf
sfni
>

sycamorex

04-12-2011 07:57 AM

Quote:

Originally Posted by Tauro (Post 4321950)

What about:

Code:

sed -n '/pattern1/,/>pattern2/p' file
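To see why the stricter end address helps, here is a small sketch that runs both range expressions over the sample from Tauro's post. With the loose form, the header ">pattern1 pattern2" contains "pattern2" and closes the range early. With ">pattern2" as the end address it no longer matches, so the range stays open until the real closing line. The function names are made up for illustration.

```shell
# Sample taken from the post that introduced the ">pattern1 pattern2" case.
sample_input() {
    cat <<'EOF'
>pattern1
xyz
zz
sss
dd
>pattern2
ggg
ddd
aa
>pattern1
cwefw
swd
>pattern1 pattern2
ggg
ss
aaa
s
>pattern2
EOF
}

# Loose end address: any line containing "pattern2" closes the range.
loose_range() {
    sed -n '/pattern1/,/pattern2/p'
}

# Stricter end address: only a line containing ">pattern2" closes it.
strict_range() {
    sed -n '/pattern1/,/>pattern2/p'
}

# Count the printed lines for each variant (every sample line is non-empty).
sample_input | loose_range  | grep -c .   # 10: stops at ">pattern1 pattern2"
sample_input | strict_range | grep -c .   # 15: continues to the last ">pattern2"
```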

grail

04-12-2011 08:07 AM

Quote:

In this case, the sed one-liner won't pick up the lines following ">pattern1 pattern2", based on sed -n '/pattern1/,/pattern2/p' file.

There are plenty of patterns that will not fit your original query. Also, you would have to explain again what you want the output to be, i.e. should it display the single space between '>pattern1 pattern2', or should it now display everything until '>pattern2' is found at the start of a line?

sycamorex

04-12-2011 08:11 AM

As grail pointed out, it'd be helpful if you could provide us with more specific information (ideally posting the exact input file and what the output should look like).

Tauro

04-12-2011 12:15 PM

@grail and sycamorex
Alright.. Here is what I specifically want. Below is 0.1% of my data.

>Q53HC2_HUMAN/218-253 PF10417.3;1-cysPrx_C;
ALQYVETHGEVCPANWTPDSPTIKPSPAASKEYFQK

>A4JFS8_BURVG/507-580 PF12796.1;Ank_2;
ACDAGDHYPLHLLVWKNDYRQLEKELQGQNVEAVDPRGRTLLHLAVSLGH
LESARVLLRHKADVTKENRQGWTVLHEAVSTGDPEMVYTVLQHRDYHNTS

>B4DZA5_HUMAN/287-857 PF04547.6;Anoctamin;
IRKYYGEKIGIYFAWLGYYTQMLLLAAVVGVACFLYGYLNQDNCTWSKEV
CHPDIGGKIIMCPQCDRLCPFWKLNITCESSKKLCIFDSFGTLVFAVFMG
VWVTLFLEFWKRRQAELEYEWDTVELQQEEQARPEYEARCTHVVIDEITQ
EEERIPFTAWGKCIRITLCASAVFFWILLIIASVIGIIVYRLSVFIVFSA

>ANFC_HUMAN/94-126 PF00212.12;ANP;
NARKYKGANKKGLSKGCFGLKLDRIGSMSGLGC

I need the lines containing HUMAN and the lines following each of them, up to the next line starting with ">".
When the third record is considered here, the sed one-liner picks up '>B4DZA5_HUMAN ... >ANFC_HUMAN' but not the line following ANFC_HUMAN.
I think I have made it clear now.
:)
Thanks in advance for helping.
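One possible way to handle this layout (a sketch, not necessarily the answer the thread eventually settled on) is to re-evaluate a flag at every header line instead of using a sed range. A sed range consumes the next ">" header as its end line and does not reconsider that same line as a new start, which matches the behaviour described above. The function names below are hypothetical. The sketch assumes headers start with ">" and that blank separator lines are not wanted in the output.

```shell
# Sample data copied from the post above.
sample_input() {
    cat <<'EOF'
>Q53HC2_HUMAN/218-253 PF10417.3;1-cysPrx_C;
ALQYVETHGEVCPANWTPDSPTIKPSPAASKEYFQK

>A4JFS8_BURVG/507-580 PF12796.1;Ank_2;
ACDAGDHYPLHLLVWKNDYRQLEKELQGQNVEAVDPRGRTLLHLAVSLGH
LESARVLLRHKADVTKENRQGWTVLHEAVSTGDPEMVYTVLQHRDYHNTS

>B4DZA5_HUMAN/287-857 PF04547.6;Anoctamin;
IRKYYGEKIGIYFAWLGYYTQMLLLAAVVGVACFLYGYLNQDNCTWSKEV
CHPDIGGKIIMCPQCDRLCPFWKLNITCESSKKLCIFDSFGTLVFAVFMG
VWVTLFLEFWKRRQAELEYEWDTVELQQEEQARPEYEARCTHVVIDEITQ
EEERIPFTAWGKCIRITLCASAVFFWILLIIASVIGIIVYRLSVFIVFSA

>ANFC_HUMAN/94-126 PF00212.12;ANP;
NARKYKGANKKGLSKGCFGLKLDRIGSMSGLGC
EOF
}

# human_records: keep each record whose header line contains HUMAN.
human_records() {
    awk '
        /^>/ { keep = ($0 ~ /HUMAN/) }   # every header decides afresh
        keep && NF                       # print kept, non-blank lines
    '
}

sample_input | human_records
```

Since the flag is only changed on header lines, the letters H, U, M, A, N appearing inside a sequence line cannot switch it on by accident. Drop the "&& NF" part if the blank separator lines should be preserved.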

sycamorex

04-12-2011 01:59 PM

Is there any common pattern in the pattern 2 lines (A4JFS8_BURVG/507-580 PF12796.1;Ank_2;)?

All times are GMT -5.