LinuxQuestions.org
Old 08-23-2015, 07:34 PM   #1
kmkocot
Member
 
Registered: Dec 2007
Location: Queensland, Australia
Posts: 122

Rep: Reputation: 15
Need sed help - how to delete all but first two occurrences of a regexp per line


Hi all,

I have a file with many lines that look like this:
Code:
>3|HEMI_Tbah|example|nonsense
MTDFFEKTENQQLVILVTPAGLLQPQLEWPSNLKQKAVYFTRKTKDAVQKDNIRNVLAYGDLSYSPLEQLSALVDEVLVPLLSNPRNHEQWPHVVSQDVLR
>3|HEMI_Tbah|m.2826
ADLKLDLGVQYMKAGVKNIGTVFLMTDAQVADEKFLVLINDLLASGEIPDLFPDEEVENILAGVKNEVKGMGIQDTRENCWKFFIERVRRQLKVVLCFSPVGNTLRVRSRKFPAVVNCTCIDWFHEWPEAALMSVSQRFLEEIDLLDAELKESVAQFMSFVHQSVNEISKVYLANERRYNYTTPKSFLEQIKLYDNLLEMKKKELLQKMDRLENGLTKLQSTASQVDDLKAKLAAQEVELTQKNEDADKLIQIVGVETEKVSKEKAIADDEEKKVAVIAEEVGRKQRDCEADLAKAEPALLAAQEALNTLNKNNLTELKSFGSPPEAVVSVVASVMVLLAPNGKVPKDRSWKAGKIMM
>3|HEMI_Tbah|m.6826
TIPLFPAAVLSYDGKIMMGKVDAFLDQLINYDKENVHENSLKAIRPYLNDPNFEPDFIRNKSGAAAGLCSWVINVIRFYEVYCDVEPKRLALNQANSDLASAQDKLATIKSKITELDANLAELTAKFEAATAAKLKCQQEAESTAKTIELANRLVGGLASENVRWAEAVANFKEQEKTLPGDVLLITAFVSYSGCFIKSYRMELMDEKWLVFLKELKPPIPITENLDPLSLLTDDAAIASWNNEGLPSDRMSTENATILSNCERWPLMIDPQLQGIKWIKKKYGEDLRLVRLGQRGYLDVIERAISSGDTVLIENLEEEMD
>3|HEMI_Tbah|m.20815
AITAGEWPLDKMALQCDVTKKSKEDFSGAPREGSYVHGLYMEGARWDTQTGMLAESRLKELTPAMPVIFIKAIPVDKMETRNIYECPVYKTKDRGPTYVWTFNLKSRDKAARWILGGVALILQV
>3|HEMI_Tbah|m.20028
PILVQRHLSKLFDNMAKLKFEGEAEGEEEEIDSETKVALGMFSKEGEYCDFDNPCECTGQVEVWLNRLQDTMRSTVKFNFSEAVISYEEKPRDQWLFDYAAQVAL
>3|ECHI_Ajap|m.18262
FNPQSFLTAIMQSMARKNEWPLDKMCLQCDVTKKNKEDINSPPREGSYVHGLFMEGARWDTQTGMIADARLKELTPNMPVIFIRAIPVDKQDTRNIYQCPVYKTKQRGPTFVWTFNPKTKEKAAKWTL
>3|ONYC_Oope|cds.c68866_g1_i2|m.4812
KVTAVKIDEARELYRPAAARSSLLYFILGDLYKINPIYQFSLRAFSVVFHKAIERAEQADEVLARVNNLIDCITFSVYIYTTRGLFECDKLIFAAQMTFLILTMAKLIDPQELVIY
The lines that begin with ">" are headers. I want to replace all but the first two occurrences of the pipe symbol ("|") in each header with an underscore ("_"). I know this should be easy with sed and the right combination of 1, 2, and !, but I can't figure it out. Any assistance would be greatly appreciated.

Desired output:
Code:
>3|HEMI_Tbah|example_nonsense
MTDFFEKTENQQLVILVTPAGLLQPQLEWPSNLKQKAVYFTRKTKDAVQKDNIRNVLAYGDLSYSPLEQLSALVDEVLVPLLSNPRNHEQWPHVVSQDVLR
>3|HEMI_Tbah|m.2826
ADLKLDLGVQYMKAGVKNIGTVFLMTDAQVADEKFLVLINDLLASGEIPDLFPDEEVENILAGVKNEVKGMGIQDTRENCWKFFIERVRRQLKVVLCFSPVGNTLRVRSRKFPAVVNCTCIDWFHEWPEAALMSVSQRFLEEIDLLDAELKESVAQFMSFVHQSVNEISKVYLANERRYNYTTPKSFLEQIKLYDNLLEMKKKELLQKMDRLENGLTKLQSTASQVDDLKAKLAAQEVELTQKNEDADKLIQIVGVETEKVSKEKAIADDEEKKVAVIAEEVGRKQRDCEADLAKAEPALLAAQEALNTLNKNNLTELKSFGSPPEAVVSVVASVMVLLAPNGKVPKDRSWKAGKIMM
>3|HEMI_Tbah|m.6826
TIPLFPAAVLSYDGKIMMGKVDAFLDQLINYDKENVHENSLKAIRPYLNDPNFEPDFIRNKSGAAAGLCSWVINVIRFYEVYCDVEPKRLALNQANSDLASAQDKLATIKSKITELDANLAELTAKFEAATAAKLKCQQEAESTAKTIELANRLVGGLASENVRWAEAVANFKEQEKTLPGDVLLITAFVSYSGCFIKSYRMELMDEKWLVFLKELKPPIPITENLDPLSLLTDDAAIASWNNEGLPSDRMSTENATILSNCERWPLMIDPQLQGIKWIKKKYGEDLRLVRLGQRGYLDVIERAISSGDTVLIENLEEEMD
>3|HEMI_Tbah|m.20815
AITAGEWPLDKMALQCDVTKKSKEDFSGAPREGSYVHGLYMEGARWDTQTGMLAESRLKELTPAMPVIFIKAIPVDKMETRNIYECPVYKTKDRGPTYVWTFNLKSRDKAARWILGGVALILQV
>3|HEMI_Tbah|m.20028
PILVQRHLSKLFDNMAKLKFEGEAEGEEEEIDSETKVALGMFSKEGEYCDFDNPCECTGQVEVWLNRLQDTMRSTVKFNFSEAVISYEEKPRDQWLFDYAAQVAL
>3|ECHI_Ajap|m.18262
FNPQSFLTAIMQSMARKNEWPLDKMCLQCDVTKKNKEDINSPPREGSYVHGLFMEGARWDTQTGMIADARLKELTPNMPVIFIRAIPVDKQDTRNIYQCPVYKTKQRGPTFVWTFNPKTKEKAAKWTL
>3|ONYC_Oope|cds.c68866_g1_i2_m.4812
KVTAVKIDEARELYRPAAARSSLLYFILGDLYKINPIYQFSLRAFSVVFHKAIERAEQADEVLARVNNLIDCITFSVYIYTTRGLFECDKLIFAAQMTFLILTMAKLIDPQELVIY
Thank you!
Kevin
 
Old 08-23-2015, 08:10 PM   #2
syg00
LQ Veteran
 
Registered: Aug 2003
Location: Australia
Distribution: Lots ...
Posts: 15,535

Rep: Reputation: 2041
You are probably over-thinking it. See the note in the GNU sed documentation about combining a number with the "g" flag on the s command.
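For anyone following along, here is a minimal sketch of what that hint might look like in practice. It assumes GNU sed, where a number and "g" given together on the s command mean "skip the matches before the Nth, then replace the Nth and every later one"; other sed implementations may treat that combination differently. The function name fix_headers is made up for illustration and is not from the original posts.

```shell
# fix_headers (hypothetical name): on lines starting with ">",
# replace the 3rd and every later "|" with "_".
# Assumes GNU sed: the combined "3g" flag is a GNU extension.
fix_headers() {
    sed '/^>/ s/|/_/3g' "$@"
}

# Quick check against two headers from the question; the sequence
# line does not start with ">" so it passes through untouched.
printf '%s\n' \
    '>3|HEMI_Tbah|example|nonsense' \
    'MTDFFEKTENQQLVILVTPAGLL' \
    '>3|ONYC_Oope|cds.c68866_g1_i2|m.4812' | fix_headers
```

The `/^>/` address restricts the substitution to header lines, so a "|" that ever turned up elsewhere in the file would be left alone. To run it on a real file, pass the file name as an argument, e.g. `fix_headers input.fasta > output.fasta` (file names here are placeholders).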
 
Old 08-25-2015, 03:42 AM   #3
syg00
LQ Veteran
 
Registered: Aug 2003
Location: Australia
Distribution: Lots ...
Posts: 15,535

Rep: Reputation: 2041
Have you made any progress?
Show us your code.
 
  

