Grep varying no. of lines between two patterns
I have a file that goes like this:
>pattern 1
xyz
xyz
abc
asdfg
>pattern 2
xyz
>pattern 1
adbf
sfni
>pattern 2
bla
bla
xyz
I need to grep the lines between pattern 1 and pattern 2, but not the lines following pattern 2. I cannot use grep -A(num), because the number of lines following pattern 1 varies. I also tried awk one-liners, but the results were erroneous. I'd be glad if someone could come up with a good one-liner for this.
|
Hi and welcome to LQ.
Try using SED to accomplish the task. If you are stuck at any point, feel free to post your code. We'll be happy to assist you. |
AWK code:
Code:
BEGIN { |
Ok, since we have started giving solutions, the sed one (if I understand the problem correctly) would be as follows:
Code:
sed -n '/>pattern 1/,/>pattern 2/p' infile
|
Assuming the header and footer lines are also not wanted:
Code:
awk '/>pattern 1/,/>pattern 2/{if(!/pattern/)print}' file
Code:
awk '!(NR % 2)' RS=">pattern [12]\n" ORS="" file
|
@sycamorex
Thanks :) I used a combination of sed and grep, as I do not need the line containing pattern 2.
|
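For reference, dropping the closing line does not strictly need a second grep pass. The following is only a sketch, assuming the markers sit at the start of a line as in the first post; the sample file is a shortened, hypothetical copy of that layout, and the names `sample` and `result` are made up for the illustration.

```shell
# Hypothetical sample file, shortened from the layout shown in the first post.
sample=$(mktemp)
cat > "$sample" <<'EOF'
>pattern 1
xyz
abc
>pattern 2
xyz
>pattern 1
adbf
sfni
>pattern 2
bla
EOF

# Select each range from ">pattern 1" to ">pattern 2", then print every
# line in the range except the closing ">pattern 2" line.
result=$(sed -n '/^>pattern 1/,/^>pattern 2/{/^>pattern 2/!p;}' "$sample")
printf '%s\n' "$result"
rm -f "$sample"
```

The inner `/^>pattern 2/!p` is what replaces the grep step: it prints only the lines of the range that do not match the end marker.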
Don't forget to mark as SOLVED once you have a solution.
|
Also, in the same file I have certain patterns that go on this way.
>pattern1
xyz
zz
sss
dd
>pattern2
ggg
ddd
aa
>pattern1
cwefw
swd
>pattern1 pattern2
ggg
ss
aaa
s
>pattern2
In this case the sed one-liner, sed -n '/pattern1/,/pattern2/p' file, won't pick up the lines following ">pattern1 pattern2".
|
Hi Tauro,
Try this out:
Code:
sed -n '/pattern1/,/pattern2/p;/pattern1 pattern2/,/pattern2/p' file
Regards, Mayur Singru
|
using Ruby
Code:
$ ruby -0777 -ne 'puts $_.scan(/pattern 1(.*?)pattern 2/m)' file |
Quote:
What about:
Code:
sed -n '/pattern1/,/>pattern2/p' file
|
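To see why the end address matters, here is a small comparison. It is a sketch under one assumption that the thread has not confirmed: that a block should run until the next line that starts with ">pattern2". The sample is a shortened, hypothetical version of the second layout, and `loose` and `anchored` are illustrative names only.

```shell
# Hypothetical sample, shortened from the second layout in the thread.
sample=$(mktemp)
cat > "$sample" <<'EOF'
>pattern1
xyz
>pattern2
ggg
>pattern1
cwefw
>pattern1 pattern2
ggg
ss
>pattern2
EOF

# Unanchored: the range opened by the second ">pattern1" closes as soon as
# "pattern2" appears anywhere on a line, i.e. on ">pattern1 pattern2".
loose=$(sed -n '/pattern1/,/pattern2/p' "$sample")

# Anchored: only a line that starts with ">pattern2" closes the range.
anchored=$(sed -n '/^>pattern1/,/^>pattern2/p' "$sample")

printf 'unanchored:\n%s\n\nanchored:\n%s\n' "$loose" "$anchored"
rm -f "$sample"
```

With the anchored form the lines after ">pattern1 pattern2" are kept, because that line no longer counts as an end marker.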
Quote:
between '>pattern1 pattern2' or should it now display until '>pattern2' is found at the start of the line. |
As grail pointed out, it would be helpful if you could give us more specific information (ideally by posting the exact input file and what the output should look like).
|
@grail and sycamorex
Alright. Here is what I specifically want. Below is 0.1% of my data.
>Q53HC2_HUMAN/218-253 PF10417.3;1-cysPrx_C;
ALQYVETHGEVCPANWTPDSPTIKPSPAASKEYFQK
>A4JFS8_BURVG/507-580 PF12796.1;Ank_2;
ACDAGDHYPLHLLVWKNDYRQLEKELQGQNVEAVDPRGRTLLHLAVSLGH
LESARVLLRHKADVTKENRQGWTVLHEAVSTGDPEMVYTVLQHRDYHNTS
>B4DZA5_HUMAN/287-857 PF04547.6;Anoctamin;
IRKYYGEKIGIYFAWLGYYTQMLLLAAVVGVACFLYGYLNQDNCTWSKEV
CHPDIGGKIIMCPQCDRLCPFWKLNITCESSKKLCIFDSFGTLVFAVFMG
VWVTLFLEFWKRRQAELEYEWDTVELQQEEQARPEYEARCTHVVIDEITQ
EEERIPFTAWGKCIRITLCASAVFFWILLIIASVIGIIVYRLSVFIVFSA
>ANFC_HUMAN/94-126 PF00212.12;ANP;
NARKYKGANKKGLSKGCFGLKLDRIGSMSGLGC
I need the lines containing HUMAN and the lines that follow each of them, up to the next line starting with ">". When the third record is considered here, the sed one-liner picks up '>B4DZA5_HUMAN ... >ANFC_HUMAN' but not the line following ANFC_HUMAN. I think I have made it clear now. :) Thanks in advance for helping.
|
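One possible approach to the requirement as stated, offered as a sketch rather than a tested answer for the full data set: let every header line switch printing on or off, so that two HUMAN records in a row cannot swallow each other the way a sed range does. It assumes headers always start with ">" and that "_HUMAN" appears only in the headers that are wanted. The file below is a trimmed excerpt of the posted data (one sequence line per record), and `fasta`, `keep` and `human` are illustrative names.

```shell
# Trimmed excerpt of the data posted above, one sequence line per record.
fasta=$(mktemp)
cat > "$fasta" <<'EOF'
>Q53HC2_HUMAN/218-253 PF10417.3;1-cysPrx_C;
ALQYVETHGEVCPANWTPDSPTIKPSPAASKEYFQK
>A4JFS8_BURVG/507-580 PF12796.1;Ank_2;
ACDAGDHYPLHLLVWKNDYRQLEKELQGQNVEAVDPRGRTLLHLAVSLGH
>B4DZA5_HUMAN/287-857 PF04547.6;Anoctamin;
IRKYYGEKIGIYFAWLGYYTQMLLLAAVVGVACFLYGYLNQDNCTWSKEV
>ANFC_HUMAN/94-126 PF00212.12;ANP;
NARKYKGANKKGLSKGCFGLKLDRIGSMSGLGC
EOF

# On every header line, set the flag according to whether it names a HUMAN
# entry; every line (header or sequence) is printed while the flag is set.
human=$(awk '/^>/ { keep = ($0 ~ /_HUMAN/) } keep' "$fasta")
printf '%s\n' "$human"
rm -f "$fasta"
```

The difference from the sed range is that the flag is re-evaluated on every ">" line, so the header that ends one HUMAN record can also begin the next one.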
Is there any common pattern in the pattern 2 lines (A4JFS8_BURVG/507-580 PF12796.1;Ank_2;)?
|