LinuxQuestions.org


kmkocot 08-23-2015 07:34 PM

Need sed help - how to delete all but first two occurrences of a regexp per line
 
Hi all,

I have a file with many lines that look like this:
Code:

>3|HEMI_Tbah|example|nonsense
MTDFFEKTENQQLVILVTPAGLLQPQLEWPSNLKQKAVYFTRKTKDAVQKDNIRNVLAYGDLSYSPLEQLSALVDEVLVPLLSNPRNHEQWPHVVSQDVLR
>3|HEMI_Tbah|m.2826
ADLKLDLGVQYMKAGVKNIGTVFLMTDAQVADEKFLVLINDLLASGEIPDLFPDEEVENILAGVKNEVKGMGIQDTRENCWKFFIERVRRQLKVVLCFSPVGNTLRVRSRKFPAVVNCTCIDWFHEWPEAALMSVSQRFLEEIDLLDAELKESVAQFMSFVHQSVNEISKVYLANERRYNYTTPKSFLEQIKLYDNLLEMKKKELLQKMDRLENGLTKLQSTASQVDDLKAKLAAQEVELTQKNEDADKLIQIVGVETEKVSKEKAIADDEEKKVAVIAEEVGRKQRDCEADLAKAEPALLAAQEALNTLNKNNLTELKSFGSPPEAVVSVVASVMVLLAPNGKVPKDRSWKAGKIMM
>3|HEMI_Tbah|m.6826
TIPLFPAAVLSYDGKIMMGKVDAFLDQLINYDKENVHENSLKAIRPYLNDPNFEPDFIRNKSGAAAGLCSWVINVIRFYEVYCDVEPKRLALNQANSDLASAQDKLATIKSKITELDANLAELTAKFEAATAAKLKCQQEAESTAKTIELANRLVGGLASENVRWAEAVANFKEQEKTLPGDVLLITAFVSYSGCFIKSYRMELMDEKWLVFLKELKPPIPITENLDPLSLLTDDAAIASWNNEGLPSDRMSTENATILSNCERWPLMIDPQLQGIKWIKKKYGEDLRLVRLGQRGYLDVIERAISSGDTVLIENLEEEMD
>3|HEMI_Tbah|m.20815
AITAGEWPLDKMALQCDVTKKSKEDFSGAPREGSYVHGLYMEGARWDTQTGMLAESRLKELTPAMPVIFIKAIPVDKMETRNIYECPVYKTKDRGPTYVWTFNLKSRDKAARWILGGVALILQV
>3|HEMI_Tbah|m.20028
PILVQRHLSKLFDNMAKLKFEGEAEGEEEEIDSETKVALGMFSKEGEYCDFDNPCECTGQVEVWLNRLQDTMRSTVKFNFSEAVISYEEKPRDQWLFDYAAQVAL
>3|ECHI_Ajap|m.18262
FNPQSFLTAIMQSMARKNEWPLDKMCLQCDVTKKNKEDINSPPREGSYVHGLFMEGARWDTQTGMIADARLKELTPNMPVIFIRAIPVDKQDTRNIYQCPVYKTKQRGPTFVWTFNPKTKEKAAKWTL
>3|ONYC_Oope|cds.c68866_g1_i2|m.4812
KVTAVKIDEARELYRPAAARSSLLYFILGDLYKINPIYQFSLRAFSVVFHKAIERAEQADEVLARVNNLIDCITFSVYIYTTRGLFECDKLIFAAQMTFLILTMAKLIDPQELVIY

The lines that begin with ">" are headers. I want to replace all but the first two occurrences of the pipe symbol ("|") on the headers with underscores ("_"). I know this should be easy with sed and the right combination of 1, 2, and !, but I can't figure it out. Any assistance would be greatly appreciated.

Desired output:
Code:

>3|HEMI_Tbah|example_nonsense
MTDFFEKTENQQLVILVTPAGLLQPQLEWPSNLKQKAVYFTRKTKDAVQKDNIRNVLAYGDLSYSPLEQLSALVDEVLVPLLSNPRNHEQWPHVVSQDVLR
>3|HEMI_Tbah|m.2826
ADLKLDLGVQYMKAGVKNIGTVFLMTDAQVADEKFLVLINDLLASGEIPDLFPDEEVENILAGVKNEVKGMGIQDTRENCWKFFIERVRRQLKVVLCFSPVGNTLRVRSRKFPAVVNCTCIDWFHEWPEAALMSVSQRFLEEIDLLDAELKESVAQFMSFVHQSVNEISKVYLANERRYNYTTPKSFLEQIKLYDNLLEMKKKELLQKMDRLENGLTKLQSTASQVDDLKAKLAAQEVELTQKNEDADKLIQIVGVETEKVSKEKAIADDEEKKVAVIAEEVGRKQRDCEADLAKAEPALLAAQEALNTLNKNNLTELKSFGSPPEAVVSVVASVMVLLAPNGKVPKDRSWKAGKIMM
>3|HEMI_Tbah|m.6826
TIPLFPAAVLSYDGKIMMGKVDAFLDQLINYDKENVHENSLKAIRPYLNDPNFEPDFIRNKSGAAAGLCSWVINVIRFYEVYCDVEPKRLALNQANSDLASAQDKLATIKSKITELDANLAELTAKFEAATAAKLKCQQEAESTAKTIELANRLVGGLASENVRWAEAVANFKEQEKTLPGDVLLITAFVSYSGCFIKSYRMELMDEKWLVFLKELKPPIPITENLDPLSLLTDDAAIASWNNEGLPSDRMSTENATILSNCERWPLMIDPQLQGIKWIKKKYGEDLRLVRLGQRGYLDVIERAISSGDTVLIENLEEEMD
>3|HEMI_Tbah|m.20815
AITAGEWPLDKMALQCDVTKKSKEDFSGAPREGSYVHGLYMEGARWDTQTGMLAESRLKELTPAMPVIFIKAIPVDKMETRNIYECPVYKTKDRGPTYVWTFNLKSRDKAARWILGGVALILQV
>3|HEMI_Tbah|m.20028
PILVQRHLSKLFDNMAKLKFEGEAEGEEEEIDSETKVALGMFSKEGEYCDFDNPCECTGQVEVWLNRLQDTMRSTVKFNFSEAVISYEEKPRDQWLFDYAAQVAL
>3|ECHI_Ajap|m.18262
FNPQSFLTAIMQSMARKNEWPLDKMCLQCDVTKKNKEDINSPPREGSYVHGLFMEGARWDTQTGMIADARLKELTPNMPVIFIRAIPVDKQDTRNIYQCPVYKTKQRGPTFVWTFNPKTKEKAAKWTL
>3|ONYC_Oope|cds.c68866_g1_i2_m.4812
KVTAVKIDEARELYRPAAARSSLLYFILGDLYKINPIYQFSLRAFSVVFHKAIERAEQADEVLARVNNLIDCITFSVYIYTTRGLFECDKLIFAAQMTFLILTMAKLIDPQELVIY

Thank you!
Kevin

syg00 08-23-2015 08:10 PM

You are probably overthinking it. See the note in the GNU sed documentation on combining a number with the "g" flag of the s command.
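To spell out the hint, here is a minimal sketch that assumes GNU sed. In GNU sed, `s/regexp/replacement/Ng` skips the first N-1 matches and replaces the Nth match and every one after it. POSIX leaves this combination unspecified, so other sed implementations may behave differently. The function name `fix_headers` and the two sample lines are made up for illustration. The `/^>/` address restricts the substitution to header lines.

```shell
# Assumes GNU sed: with "3g", matches 1 and 2 are left alone and
# the 3rd match plus all later ones are replaced.
# fix_headers is a hypothetical helper name, not from the thread.
fix_headers() {
    # /^>/ limits the substitution to lines that start with ">"
    sed '/^>/ s/|/_/3g' "$@"
}

# Sample input: one header with three pipes, one non-header line
printf '%s\n' '>3|HEMI_Tbah|example|nonsense' 'MTD|FF|EK|TE' | fix_headers
# prints:
# >3|HEMI_Tbah|example_nonsense
# MTD|FF|EK|TE
```

To run it on a file instead, pass the file name as an argument, for example `fix_headers input.fasta > output.fasta` (both file names are placeholders).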

syg00 08-25-2015 03:42 AM

Have you made any progress?
Show us your code.

