LinuxQuestions.org


kmkocot 12-26-2011 04:42 PM

Need help with sed command: if a line contains >2 colons (:) delete it and line above
 
Hi all,

I'm trying to put together a simple sed script, but I need some help with a regexp. I want to delete each line that contains more than two colon characters (lines with exactly two colons are OK), AND delete the line directly above each such line. The lines containing colons are full of other ASCII characters.

Example input file:
Code:

>E52LN6201CQPC2 length=145 xy=1007_0596 region=1 run=R_2008_02_25_06_58_44_tetranucleotide 29 : 128 -- length 100 score 61 unit AGAT
TTCCTACCTACCTATCCATCTATCTCTC:CTATCTATCTATCTATCCATCTATCTATCTATCTATCTATCTATCTATCCATCTATCTATCTATCTGTCTATCTATCCATCTATCTACCTATCTATCTAT:TTTTCTCCTTCTCTTCT
>E52LN6201CJ7P2 length=134 xy=0933_0904 region=1 run=R_2008_02_25_06_58_44_tetranucleotide 29 : 128 -- length 100 score 61 unit AGAT
TTCCTACCTACCTATCCATCTATCTCTC:CTATCTATCTATCTATCCATCTATCTATCTATCTATCTATCTATCTATCCATCTATCTATCTATCTGTCTATCTATCCATCTATCTACCTATCTATCTAT:TTTTTC
>E52LN6201BHNP5 length=220 xy=0494_0203 region=1 run=R_2008_02_25_06_58_44_tetranucleotide 81 : 220 -- length 140 score 136 unit unk
AAT:TAAAAAGATTTCTAAAATGTGTAT:ATACATAAAGGTAGGCAGTGTGTGAAAG:AAAGAAAGAATGAGATGGTAGAGAA:AGAGAGAGAGAGAGAGAGAGAGAGAGAGAGAGAGAGAGAGAGAGAGAGAGAG

In the example above, the last sequence contains 4 colons, so I would want it and the header line above it deleted. The desired output would be the following:
Code:

>E52LN6201CQPC2 length=145 xy=1007_0596 region=1 run=R_2008_02_25_06_58_44_tetranucleotide 29 : 128 -- length 100 score 61 unit AGAT
TTCCTACCTACCTATCCATCTATCTCTC:CTATCTATCTATCTATCCATCTATCTATCTATCTATCTATCTATCTATCCATCTATCTATCTATCTGTCTATCTATCCATCTATCTACCTATCTATCTAT:TTTTCTCCTTCTCTTCT
>E52LN6201CJ7P2 length=134 xy=0933_0904 region=1 run=R_2008_02_25_06_58_44_tetranucleotide 29 : 128 -- length 100 score 61 unit AGAT
TTCCTACCTACCTATCCATCTATCTCTC:CTATCTATCTATCTATCCATCTATCTATCTATCTATCTATCTATCTATCCATCTATCTATCTATCTGTCTATCTATCCATCTATCTACCTATCTATCTAT:TTTTTC
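To check which sequence lines break the two-colon limit before filtering anything, awk can report the colon count of every non-header line. This is a minimal sketch, assuming header lines always begin with `>` as they do in the example; `colon_report` is a hypothetical helper name, not an existing command.

```shell
# colon_report (hypothetical helper): print "line-number: colon-count" for
# every line that does not start with ">" (i.e. every sequence line).
colon_report() {
  # gsub(/:/, ":") replaces each colon with itself, so the line is unchanged,
  # but its return value is the number of colons on the line.
  awk '!/^>/ { n = gsub(/:/, ":"); print NR ": " n }' "$@"
}
```

Running `colon_report in.txt` on the example input should print `2: 2`, `4: 2` and `6: 4` (line number, then colon count), which flags line 6 as the only sequence to drop.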

Here's what I have so far for the sed command (where out.txt would be the file I want to keep):
Code:

sed -n '/????/{n;x;d;};x;1d;p; ${x;p}' in.txt > out.txt
The ???? placeholder is the part that I don't know how to specify. Any help or suggested reading would be greatly appreciated!
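One way to fill in the ???? placeholder is the basic regular expression `:.*:.*:`, which matches any line containing three or more colons and needs no `-r`/`-E` flag. The sketch below wraps the command from the post in a shell function; `drop_colon_records_sed` is a hypothetical name, and the sketch assumes strictly two-line records (one header followed by exactly one sequence line) where no header itself contains three or more colons.

```shell
# drop_colon_records_sed (hypothetical name): the sed script from the post with
# the placeholder filled in. ':.*:.*:' matches any line containing 3+ colons.
# Assumes every record is exactly one header line plus one sequence line.
drop_colon_records_sed() {
  # Each cycle, x swaps the current line into the hold space and prints the
  # previous one, so output lags one line behind input (1d drops the initial
  # empty hold space; ${x;p;} flushes the final line).
  # On a matching sequence line: n discards it and loads the next header,
  # x brings out the bad record's pending header, and d drops that header.
  sed -n '/:.*:.*:/{n;x;d;};x;1d;p;${x;p;}' "$@"
}
```

Usage would then be `drop_colon_records_sed in.txt > out.txt`.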

Thanks,
Kevin

crts 12-27-2011 08:51 AM

Hi,

Try this:
Code:

sed -r '/^>/{h;d};/([^:]+:){3}/d;x;p;x'
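A note on the regular expression: `([^:]+:){3}` needs at least one non-colon character before each of the three colons, so a line with adjacent colons such as `AT::GC:TA` (three colons) would not match and would be kept. The sketch below uses the same hold-space idea with the plain BRE `:.*:.*:` instead, which matches three or more colons anywhere on the line. `drop_colon_records` is a hypothetical function name, and like the command above it assumes each header is followed by exactly one sequence line.

```shell
# drop_colon_records (hypothetical name): keep a header/sequence pair only if
# the sequence line has at most two colons.
drop_colon_records() {
  # /^>/{h;d;}   copy the header into the hold space and print nothing yet
  # /:.*:.*:/d   three or more colons anywhere: drop the sequence line; its
  #              header stays in the hold space until the next header replaces it
  # x;p;x        print the held header, then swap the sequence line back so
  #              sed prints it automatically at the end of the cycle
  sed '/^>/{h;d;};/:.*:.*:/d;x;p;x' "$@"
}
```

Usage: `drop_colon_records in.txt > out.txt`.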
Awk is also well suited to this kind of task:
Code:

awk -F ":" '{if (/^>/){a=$0;next;}if (NF<4){print a;print}}' in.txt > out.txt
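The same awk logic spread over several lines with comments might look like the sketch below. It counts colons directly with `gsub` instead of testing `NF<4` (with `-F ":"`, `NF<4` is equivalent to "at most two colons"). `drop_colon_records_awk` is a hypothetical function name, and the sketch assumes each header is followed by exactly one sequence line.

```shell
# drop_colon_records_awk (hypothetical name): commented awk version.
drop_colon_records_awk() {
  awk '
    /^>/ { header = $0; next }   # remember the latest header, print nothing yet
    {
      # gsub returns how many colons it replaced; replacing ":" with ":"
      # leaves the line itself unchanged.
      if (gsub(/:/, ":") <= 2) {
        print header             # header of this record
        print                    # the sequence line itself
      }
    }
  ' "$@"
}
```

Usage: `drop_colon_records_awk in.txt > out.txt`.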


All times are GMT -5.