LinuxQuestions.org


Tauro 04-11-2011 08:44 AM

Grep varying no. of lines between two patterns
 
I have a file that goes like this
>pattern 1
xyz
xyz
abc
asdfg
>pattern 2
xyz
>pattern 1
adbf
sfni
>pattern 2
bla bla
xyz

I need to grep the lines between pattern 1 and pattern 2, but not the lines following pattern 2. I cannot use grep -A(num), as a varying number of lines follows pattern 1. I have also tried awk one-liners, but the results were erroneous.

I'll be glad if someone comes up with a good one-liner for this.

sycamorex 04-11-2011 09:03 AM

Hi and welcome to LQ.

Try using sed to accomplish the task. If you get stuck at any point, feel free to post your code. We'll be happy to assist you.

MTK358 04-11-2011 09:46 AM

AWK code:

Code:

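BEGIN {
    inside = 0;    # 1 while we are between the two marker lines
}

# Opening marker: start of a block
/>pattern 1/ {
    inside = 1;
}

# Closing marker: end of a block
/>pattern 2/ {
    inside = 0;
}

# Lines inside a block that also match your own pattern
/your pattern/ && inside {
    print;    # placeholder action: replace with your own processing
}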


sycamorex 04-11-2011 10:07 AM

Ok, since we have started giving solutions, the sed one (if I understand the problem correctly) would be as follows:

Code:

sed -n '/>pattern 1/,/>pattern 2/p' infile

grail 04-11-2011 10:26 AM

Assuming the header and footer lines are also not wanted:
Code:

awk '/>pattern 1/,/>pattern 2/{if(!/pattern/)print}' file
Or maybe:
Code:

awk '!(NR % 2)' RS=">pattern [12]\n" ORS="" file

Tauro 04-12-2011 01:18 AM

@sycamorex
Thanks :)
I used a combination of sed and grep, as I do not need the line containing pattern 2.
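
Tauro's exact command is not shown in the thread. A minimal sketch of one such sed and grep combination, assuming the markers are written exactly as in the first post (the temporary file and the variable names `sample` and `between` are only illustrative):

```shell
# Sample input from the first post, written to a temporary file.
sample=$(mktemp)
cat > "$sample" <<'EOF'
>pattern 1
xyz
xyz
abc
asdfg
>pattern 2
xyz
>pattern 1
adbf
sfni
>pattern 2
bla bla
xyz
EOF

# sed prints every block from ">pattern 1" through ">pattern 2" inclusive;
# grep -v then drops the closing ">pattern 2" lines.
between=$(sed -n '/>pattern 1/,/>pattern 2/p' "$sample" | grep -v '>pattern 2')
printf '%s\n' "$between"

rm -f "$sample"
```

The ">pattern 1" lines are still printed here; extending the grep pattern would drop them as well if they are not wanted.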

grail 04-12-2011 02:39 AM

Don't forget to mark as SOLVED once you have a solution.

Tauro 04-12-2011 03:14 AM

Also, in the same file I have certain patterns that go on this way.

>pattern1
xyz
zz
sss
dd
>pattern2
ggg
ddd
aa
>pattern1
cwefw
swd
>pattern1 pattern2
ggg
ss
aaa
s
>pattern2

In this case, the sed one-liner won't pick up the lines following ">pattern1 pattern2" when it is based on sed -n '/pattern1/,/pattern2/p' file.

mayursingru 04-12-2011 07:01 AM

Hi Tauro,
Try this out
Code:

sed -n '/pattern1/,/pattern2/p;/pattern1 pattern2/,/pattern2/p' file


Regards,
Mayur Singru

kurumi 04-12-2011 07:15 AM

using Ruby

Code:

$ ruby -0777 -ne 'puts $_.scan(/pattern 1(.*?)pattern 2/m)' file

xyz
xyz
abc
asdfg
>

adbf
sfni
>


sycamorex 04-12-2011 07:57 AM

Quote:

Originally Posted by Tauro (Post 4321950)
Also, in the same file I have certain patterns that go on this way.

>pattern1
xyz
zz
sss
dd
>pattern2
ggg
ddd
aa
>pattern1
cwefw
swd
>pattern1 pattern2
ggg
ss
aaa
s
>pattern2

In this case, the sed one-liner won't pick up the lines following ">pattern1 pattern2" when it is based on sed -n '/pattern1/,/pattern2/p' file.


What about:

Code:

sed -n '/pattern1/,/>pattern2/p' file
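
To see why the stricter end address helps, here is a small sketch that runs both commands on the sample quoted above. It assumes the data looks exactly like that sample; the temporary file and the variable names `loose` and `strict` are only illustrative.

```shell
# Second sample from the thread, written to a temporary file.
sample=$(mktemp)
cat > "$sample" <<'EOF'
>pattern1
xyz
zz
sss
dd
>pattern2
ggg
ddd
aa
>pattern1
cwefw
swd
>pattern1 pattern2
ggg
ss
aaa
s
>pattern2
EOF

# Loose end address: the line ">pattern1 pattern2" also matches /pattern2/,
# so it closes the second range early and the lines after it are lost.
loose=$(sed -n '/pattern1/,/pattern2/p' "$sample")

# Stricter end address: only a line containing ">pattern2" closes the range.
strict=$(sed -n '/pattern1/,/>pattern2/p' "$sample")

echo "--- loose ---";  printf '%s\n' "$loose"
echo "--- strict ---"; printf '%s\n' "$strict"

rm -f "$sample"
```

With the stricter address, the block that starts at the second ">pattern1" runs through to the final ">pattern2" line, so the lines after ">pattern1 pattern2" are included.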

grail 04-12-2011 08:07 AM

Quote:

In this case, the sed one-liner won't pick up the lines following ">pattern1 pattern2" when it is based on sed -n '/pattern1/,/pattern2/p' file.
There are plenty of patterns that will not fit your original query. Also, you would have to explain again what you want the output to be, i.e. should it display the single space between '>pattern1 pattern2', or should it now display everything until '>pattern2' is found at the start of the line?

sycamorex 04-12-2011 08:11 AM

As grail pointed out, it would be helpful if you could provide us with more specific information (ideally by posting the exact input file and what the output should look like).

Tauro 04-12-2011 12:15 PM

@grail and sycamorex
Alright.. Here is what I specifically want. Below is 0.1% of my data.

>Q53HC2_HUMAN/218-253 PF10417.3;1-cysPrx_C;
ALQYVETHGEVCPANWTPDSPTIKPSPAASKEYFQK

>A4JFS8_BURVG/507-580 PF12796.1;Ank_2;
ACDAGDHYPLHLLVWKNDYRQLEKELQGQNVEAVDPRGRTLLHLAVSLGH
LESARVLLRHKADVTKENRQGWTVLHEAVSTGDPEMVYTVLQHRDYHNTS

>B4DZA5_HUMAN/287-857 PF04547.6;Anoctamin;
IRKYYGEKIGIYFAWLGYYTQMLLLAAVVGVACFLYGYLNQDNCTWSKEV
CHPDIGGKIIMCPQCDRLCPFWKLNITCESSKKLCIFDSFGTLVFAVFMG
VWVTLFLEFWKRRQAELEYEWDTVELQQEEQARPEYEARCTHVVIDEITQ
EEERIPFTAWGKCIRITLCASAVFFWILLIIASVIGIIVYRLSVFIVFSA

>ANFC_HUMAN/94-126 PF00212.12;ANP;
NARKYKGANKKGLSKGCFGLKLDRIGSMSGLGC

I need the lines containing HUMAN and the lines following each of them, until the next line starting with ">".
When the third record is considered here, the sed one-liner picks up ' >B4DZA5_HUMAN... >ANFC_HUMAN' but not the line following ANFC_HUMAN.
I think I have made it clear now.
:)
Thanks in advance for helping.
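
The requirement described above (keep every record whose ">" header line contains HUMAN, together with the lines that follow it up to the next ">" line) can be sketched with a small awk flag. This is not a solution posted in the thread; it is a sketch that assumes every record starts with a line beginning with ">", and the names `sample`, `keep` and `human` are only illustrative.

```shell
# The data excerpt from the post above, written to a temporary file.
sample=$(mktemp)
cat > "$sample" <<'EOF'
>Q53HC2_HUMAN/218-253 PF10417.3;1-cysPrx_C;
ALQYVETHGEVCPANWTPDSPTIKPSPAASKEYFQK

>A4JFS8_BURVG/507-580 PF12796.1;Ank_2;
ACDAGDHYPLHLLVWKNDYRQLEKELQGQNVEAVDPRGRTLLHLAVSLGH
LESARVLLRHKADVTKENRQGWTVLHEAVSTGDPEMVYTVLQHRDYHNTS

>B4DZA5_HUMAN/287-857 PF04547.6;Anoctamin;
IRKYYGEKIGIYFAWLGYYTQMLLLAAVVGVACFLYGYLNQDNCTWSKEV
CHPDIGGKIIMCPQCDRLCPFWKLNITCESSKKLCIFDSFGTLVFAVFMG
VWVTLFLEFWKRRQAELEYEWDTVELQQEEQARPEYEARCTHVVIDEITQ
EEERIPFTAWGKCIRITLCASAVFFWILLIIASVIGIIVYRLSVFIVFSA

>ANFC_HUMAN/94-126 PF00212.12;ANP;
NARKYKGANKKGLSKGCFGLKLDRIGSMSGLGC
EOF

# On each header line (one starting with ">"), set keep to 1 if the header
# contains HUMAN and to 0 otherwise. The bare pattern "keep" then prints
# every line, header included, for as long as the flag is set.
human=$(awk '/^>/ { keep = /HUMAN/ } keep' "$sample")
printf '%s\n' "$human"

rm -f "$sample"
```

Because the decision is made afresh at every header, two HUMAN records in a row are both kept, which is where a sed range from one pattern to the next falls short. Blank separator lines that follow a kept record are printed too.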

sycamorex 04-12-2011 01:59 PM

Is there any common pattern in the pattern 2 lines (A4JFS8_BURVG/507-580 PF12796.1;Ank_2;)?


All times are GMT -5.