kmkocot
12-26-2011 04:42 PM
Need help with sed command: if a line contains >2 colons (:) delete it and line above
Hi all,
I'm trying to put together a simple sed script, but I need some help with a regexp. I want to delete every line that contains more than 2 colon characters (exactly 2 colons is OK), and also delete the line directly above each such line. The lines containing colons are full of other ASCII characters.
Example input file:
Code:
>E52LN6201CQPC2 length=145 xy=1007_0596 region=1 run=R_2008_02_25_06_58_44_tetranucleotide 29 : 128 -- length 100 score 61 unit AGAT
TTCCTACCTACCTATCCATCTATCTCTC:CTATCTATCTATCTATCCATCTATCTATCTATCTATCTATCTATCTATCCATCTATCTATCTATCTGTCTATCTATCCATCTATCTACCTATCTATCTAT:TTTTCTCCTTCTCTTCT
>E52LN6201CJ7P2 length=134 xy=0933_0904 region=1 run=R_2008_02_25_06_58_44_tetranucleotide 29 : 128 -- length 100 score 61 unit AGAT
TTCCTACCTACCTATCCATCTATCTCTC:CTATCTATCTATCTATCCATCTATCTATCTATCTATCTATCTATCTATCCATCTATCTATCTATCTGTCTATCTATCCATCTATCTACCTATCTATCTAT:TTTTTC
>E52LN6201BHNP5 length=220 xy=0494_0203 region=1 run=R_2008_02_25_06_58_44_tetranucleotide 81 : 220 -- length 140 score 136 unit unk
AAT:TAAAAAGATTTCTAAAATGTGTAT:ATACATAAAGGTAGGCAGTGTGTGAAAG:AAAGAAAGAATGAGATGGTAGAGAA:AGAGAGAGAGAGAGAGAGAGAGAGAGAGAGAGAGAGAGAGAGAGAGAGAGAG
In the example above, the last sequence contains 4 colons, so I want it deleted along with the header line above it. The desired output would be the following:
Code:
>E52LN6201CQPC2 length=145 xy=1007_0596 region=1 run=R_2008_02_25_06_58_44_tetranucleotide 29 : 128 -- length 100 score 61 unit AGAT
TTCCTACCTACCTATCCATCTATCTCTC:CTATCTATCTATCTATCCATCTATCTATCTATCTATCTATCTATCTATCCATCTATCTATCTATCTGTCTATCTATCCATCTATCTACCTATCTATCTAT:TTTTCTCCTTCTCTTCT
>E52LN6201CJ7P2 length=134 xy=0933_0904 region=1 run=R_2008_02_25_06_58_44_tetranucleotide 29 : 128 -- length 100 score 61 unit AGAT
TTCCTACCTACCTATCCATCTATCTCTC:CTATCTATCTATCTATCCATCTATCTATCTATCTATCTATCTATCTATCCATCTATCTATCTATCTGTCTATCTATCCATCTATCTACCTATCTATCTAT:TTTTTC
Here's what I have so far for the sed command (where out.txt would be the file I want to keep):
Code:
sed -n '/????/{n;x;d;};x;1d;p; ${x;p}' in.txt > out.txt
The placeholder (????) is the part that I don't know how to specify. Any help or suggested reading would be greatly appreciated!
Thanks,
Kevin
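One pattern that could fill the ???? slot is `:.*:.*:`, which matches any line holding at least three colons with arbitrary characters between them. The sketch below drops it into the command from the post and runs it on a small made-up input. The record names seq1 to seq4 and the shortened sequences are hypothetical stand-ins for the real data, and it assumes that header and sequence lines strictly alternate and that header lines never contain more than 2 colons.

```shell
# Hypothetical sample input: four header/sequence pairs.
# seq2 has 3 colons and seq4 has 4 colons, so both pairs should be dropped.
cat > in.txt <<'EOF'
>seq1 29 : 128 -- length 100 score 61 unit AGAT
TTCC:CTATCTAT:TTTT
>seq2 29 : 128 -- length 100 score 61 unit AGAT
TT:CC:CTATCTAT:TTTTTC
>seq3 29 : 128 -- length 100 score 61 unit AGAT
TTCC:CTATCTAT:TTTTTC
>seq4 81 : 220 -- length 140 score 136 unit unk
AAT:TAAA:ATAC:AAAG:AGAG
EOF

# ':.*:.*:' = a colon, anything, a colon, anything, a colon (3 or more colons).
# The rest is the command from the post; the only other change is a ';'
# before the final '}', which some non-GNU seds require.
sed -n '/:.*:.*:/{n;x;d;};x;1d;p;${x;p;}' in.txt > out.txt

cat out.txt
```

On this sample, out.txt ends up with only the seq1 and seq3 pairs. Lines with exactly 2 colons pass through because the pattern needs a third colon to match.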