09-01-2009, 11:54 AM
#1
Member
Registered: Dec 2007
Posts: 79
How to grep lines containing a certain string PLUS the line following that line?
Hi all,
I have a dataset (see example below) that I would like to go through, copying every line that contains a certain string ("LGIG") plus the line immediately following it to a new file. I have no problem grepping the lines containing the string LGIG, but I'm lost on how to turn each match into a line number and then also pull the next line for each instance of that string.
Thanks!
Kevin
Example input file:
>14219|CAP|227704
MEFHLPLNRDDLRQRSALNHYVVEEIYPIRIIPSKLHEFNAALRAQGAKCILSHFDVLFSVLLHGANLQPELRSQAWDYLMKVVLKLNTPLVEALEGDTI SSDNCDLLNILKMSVYLLCQFIETFEGEVSKPTVVGATGKGRQKKAQKQAFAELAHLDWDDERERAVRALLQLLQLHLQQLWSPPVVEEDFVNLVTCCCY KMLENQDVVKNKTTRDTIFQVLGVLVKKYNHALGCSLKFIQLLQHFEHLASPLALAVSVFATDYGIKSVAAEIMREIGNMDSKDLARDTSATRAYATFLV ELAEKIPCVMLPSISVLLCLLDGESYSMRNSVLGVLGEMVIRVLSKEELDAKQKCTRDQFLDKLEDHLHDVHAFTRSKTLQIWLAIVNEKALPLPRQHQL LDLVIGRLQDKSSSVRKCALQLLTAVLRMNPFAAKLPLEELKEGYEKESAKLREMQPEEPQLTAADIAKQELEKDWMRMHRGIKKAIALREEKEEDEEEE KGSDEEKVISDEDTVESVKSQIIAALKDGDYLRCVLLFTAAREEWPTDPAFVCSIPMQLDDEEDEGERKNQEVIKCIRNLFFDEVLHSNAFKEALQPQDL DSSRRSSHDTSLVNELTKQQVLVQYLKDSMNFVGQVQQSVPIVCQLLGSKNISDVLESVNFFVTGFKFGVSNSMTGIRRMLVLVWSKEQGVRDAVVEAYR NLYLNLEENNPRLRALAVVNNLTALSLGASLGDLTSLEELVCEFVRSDDLDTHVVQLLWERFAMKIPNTSAAESRAALVLLGMAAGAKVDIVRSNVDVLV KEGLGPRGELDLMLVRDTCSALCKLVPKNKDKTGVAKEPFRFPQDHEAFKRLEFILQNSVSHLESRFWVPMAEQAVNVIYSLAEHPDAITASIIKNVAKQ VVLLERFPLRRETEGEGPLCSPESPTKGVKCPTAYVTRLLSLVGHTALRQLIHLDVSTFGEMKRRHQMQEGKGAKQGTSASARNRSKNNGSSSDDEEDDM GIGGAIAEDAEAEFILKVTEKEVVGGEGLLAALQPLLVGICSNQSKYPDPELQAAASLALAKYMLVSSEFCESHLQLLFTILERSPHAVIRANTIIAMGD LTFRFPNLIEPWTPNLYARLRDPSPQVRKNTLMVLTHLILNDMVKVKGQISELASCIVDDDTRITGLAKLFFHELSRKGNAIYNIMPDMVSRLSDTEVGV DEGNFRVIMKQVECNSEIQKFILCFFCYRYLFSFIQKDRQCESLVEKLCHRFRVTRVERQWRDLAFCLSMLSYSDKSIRKLQENVGCFSDKMADEDVYNS FVTIMNNAKKFAKPETKSLVEEFEQKLEQYHSKGVEDADAEDKAAKSKRSPGRRKAKGRTPGTSRARRQRRSGGDDERDFLGNDSHPTEAGDKPRPKPAI TFDSDDSDIELFKVQEDKAPTQTSQLPSDLENSDPNVSFESPGLRRLKRRHKSVNNKNIVASSSQSSSRTTRARHRNQ
>14219|LGIG|61640
MSFEFTIPINLDCLLSKTNVSQYVVEEVLPLRIIPGAVQDFKFAVRNDNFAVLNHFDTAYSLLSLQKDFEDAIKEEMWDILLQVGQCVTNEIAHGLEDPE LTPDFKLQLLNTLKMTCYLICQFIDMFEVEDTKPGIQINGRGKSKKTTVNKSGRDWEKEKKRGVQTLLNIIQPNLNRLWDPPIAEEEFVNLVSNCCYRLL ENPAIVRTKDIRDVISQLLGVLIKKYNHSLGASLKIMQLLQHFEHLVVPIVQILEIFVNQYQDKSIISELMREIGRLDSRDIAKDTSGTRILAQFLVELS HQLPAAMLPTISVLIGHLDGESYTLRNGVLSVIGEMLVKVLSQENLEDKLKSTRECFLDKLEDHIHDVNAFVRSRVLQIWLHIVNEKCLPLPRQENLVNL ILGRLQDKSSQVRKYAIQLMIALMKNNPFASKLPVEELQASYEKEKEKLKEMTPEETDISDLEECWEAVEKKLRDHVFNHDEDEAETPEEEPSTCTENTQ EELNAEKLCLYKILIMVFREKDLNSEEELEEESERELEIENEGIISCLKDIYLVQKKSEVLSSDPASQPDTSVLNDITKQQVLVQYLKDSTTFADHIQQA VPIICQLLGSKTTSDVFEAIEFFVTGFEFGVTATMLGIRRMLVLIWSSEETIKESVVNAYKRLYLNPNAGGNQRTVALAIVKNLSALLQGASLGDITSFD ALIQQFVKSDDIGQTVIQVLWEKFIPKIPNTTIEESRAALLLLSMIAGAKPEIVKSNIDVLVNEGLGMRAETDMLLAQITCRTLLKLAAGKKTKGEVAAE PFKFSESHEMFQRLSFLLVHCKNTETKVWVPFAEQAINVIYKLSEQPDIIAGEIIKNIAKEVVKSYQSIDIDQPISTVSTLVLTRLLSVSGHVALRHLVH LDSNVFGEMKRRRAIQEEKKEKEQANKKSTRISENIEDELGLAGAAAEDAEAEYIRRICETDIVTGENLLSTLHPLIVAVCTDSTKYPDTQLRTAATLAL AKFTMVSSEFCDAHLQLLFTILEKSPNPAIRANTIIALGDLSFRFPNLIEPWTPHLYGRLRDESAQVRKNTLQVLTHLILNDMVKVKGQISEL
>14237|CAP|174526
MEGPLAVNHHSSSSSSSQEPAPPPAPSLAPAPLAIPQQQVNAASLCRIGQELVHDIVLKASEIFVLLKNMQLPNGATTSQPQNYPETKAKLDDNMKQMLM NFKKLKILYIKVHEHTANLDSRPIEEMLPIEVNGEVKGEGPKLTDEVKYASEEHKEVISQLRNRNRELKQIIDKLRLTVWEINTMLATRKS
>14237|LGIG|86853
PPAGPQQPMVSPNKIVNAATFCRFGQEYIHEIITKATEIFGSRGSFWKCSKFLFQLTNYQDRKTKLEEELKTLCVTFKKLRAIYDKVRETFGGEVETLPV SEVYPYVLEEHRELVETFAHRPRLEPTTTVNMVFFFIFQQVRVKNQQLKEIIDQIRSINWEINTMITMR
>14237|HROB|164180
MSLIVKFENQFHETGVTDVFTEPVDIIPWKFFLLHLFSVQLDSIFDEYLCARINVRDLNLMVIYRSPNSSNDNNDNLIGVLNEFGDLSGRHLVLGYFNFP DIDWKLRVCAGSDYKIENRFLQLIDDRFWLQHVNKPTRYGTNSCPHVLDLIITSEDCVSDLQYSSPLGRKPKKMMSNSQSITNMSQAASSAPPGQTPNQS SSIAINFESPSKKLTPVAMCKYGQELVQEIMHKSYDIFGQLKSIQLVPNERKSRLEDSLHKIEIAFQRLHYVYDSVCNMTKYLEHKPIESYLPMKDKCQD FTTQNIRSDEESKLTEQIQAKNIELKELIDLMRTILWDIDTMMALRKT
>14237|NVEC|200470
MSPLTYTFIAGSSMDSWPSLQSSVMSALKEEEKHLPQPKPPAKPVENPITLTIQGQRTVQEIADKAIELFRKLQNARLSTDTTSRQSQAQTKDIRDQLNT LKGLLAKLRNVYNDTKRAVTIPAGENVENLIPLECSPRNEMDTSDGGGPVEQERAELQKKLKEKNDQLKEVIDKLRIAVWDINTMIMLKPS
>14286|LGIG|234779
MYIASFVLKMVSNRFLVKVAIGGAIFTLTSISGMKIYIENKFQRQDFYLKSMDLLRNYEPAEERLGKPIIDRTINIGELHHNYTDGINARLRIPLKGSIT NGALYVLASRETPKESWHIDKLDLEIPSHFQRWTFYYDPRDSSKVKITGGSDEADETSIIDNNAVQDTQAS
Desired output file for this example:
>14219|LGIG|61640
MSFEFTIPINLDCLLSKTNVSQYVVEEVLPLRIIPGAVQDFKFAVRNDNFAVLNHFDTAYSLLSLQKDFEDAIKEEMWDILLQVGQCVTNEIAHGLEDPE LTPDFKLQLLNTLKMTCYLICQFIDMFEVEDTKPGIQINGRGKSKKTTVNKSGRDWEKEKKRGVQTLLNIIQPNLNRLWDPPIAEEEFVNLVSNCCYRLL ENPAIVRTKDIRDVISQLLGVLIKKYNHSLGASLKIMQLLQHFEHLVVPIVQILEIFVNQYQDKSIISELMREIGRLDSRDIAKDTSGTRILAQFLVELS HQLPAAMLPTISVLIGHLDGESYTLRNGVLSVIGEMLVKVLSQENLEDKLKSTRECFLDKLEDHIHDVNAFVRSRVLQIWLHIVNEKCLPLPRQENLVNL ILGRLQDKSSQVRKYAIQLMIALMKNNPFASKLPVEELQASYEKEKEKLKEMTPEETDISDLEECWEAVEKKLRDHVFNHDEDEAETPEEEPSTCTENTQ EELNAEKLCLYKILIMVFREKDLNSEEELEEESERELEIENEGIISCLKDIYLVQKKSEVLSSDPASQPDTSVLNDITKQQVLVQYLKDSTTFADHIQQA VPIICQLLGSKTTSDVFEAIEFFVTGFEFGVTATMLGIRRMLVLIWSSEETIKESVVNAYKRLYLNPNAGGNQRTVALAIVKNLSALLQGASLGDITSFD ALIQQFVKSDDIGQTVIQVLWEKFIPKIPNTTIEESRAALLLLSMIAGAKPEIVKSNIDVLVNEGLGMRAETDMLLAQITCRTLLKLAAGKKTKGEVAAE PFKFSESHEMFQRLSFLLVHCKNTETKVWVPFAEQAINVIYKLSEQPDIIAGEIIKNIAKEVVKSYQSIDIDQPISTVSTLVLTRLLSVSGHVALRHLVH LDSNVFGEMKRRRAIQEEKKEKEQANKKSTRISENIEDELGLAGAAAEDAEAEYIRRICETDIVTGENLLSTLHPLIVAVCTDSTKYPDTQLRTAATLAL AKFTMVSSEFCDAHLQLLFTILEKSPNPAIRANTIIALGDLSFRFPNLIEPWTPHLYGRLRDESAQVRKNTLQVLTHLILNDMVKVKGQISEL
>14237|LGIG|86853
PPAGPQQPMVSPNKIVNAATFCRFGQEYIHEIITKATEIFGSRGSFWKCSKFLFQLTNYQDRKTKLEEELKTLCVTFKKLRAIYDKVRETFGGEVETLPV SEVYPYVLEEHRELVETFAHRPRLEPTTTVNMVFFFIFQQVRVKNQQLKEIIDQIRSINWEINTMITMR
>14286|LGIG|234779
MYIASFVLKMVSNRFLVKVAIGGAIFTLTSISGMKIYIENKFQRQDFYLKSMDLLRNYEPAEERLGKPIIDRTINIGELHHNYTDGINARLRIPLKGSIT NGALYVLASRETPKESWHIDKLDLEIPSHFQRWTFYYDPRDSSKVKITGGSDEADETSIIDNNAVQDTQAS
09-01-2009, 12:17 PM
#2
Guru
Registered: Jun 2004
Location: Osaka, Japan
Distribution: Debian sid + kde 3.5 & 4.4
Posts: 5,314
Try this:
Code:
sed -n '/LGIG/,+1p' file.txt >newfile.txt
The -n option suppresses sed's default behavior of printing every line. The address /LGIG/ matches any line with "LGIG" in it, and +1 extends the range to include the following line (the addr,+N form is a GNU sed extension). Finally, 'p' prints the lines in that range.
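To see what that command does without touching the real dataset, here is a minimal sketch on a made-up four-record sample (the record names and sequences are invented for illustration). It assumes GNU sed; the awk line is a suggested equivalent for systems where sed lacks the ,+N address form, not something from the original post.

```shell
# Build a tiny FASTA-like sample (hypothetical records, not the poster's data).
sample=$(mktemp)
printf '%s\n' \
  '>1|CAP|100'  'AAAA' \
  '>1|LGIG|200' 'CCCC' \
  '>2|HROB|300' 'DDDD' \
  '>2|LGIG|400' 'EEEE' > "$sample"

# GNU sed: print each line matching LGIG plus the one line after it.
result=$(sed -n '/LGIG/,+1p' "$sample")

# awk alternative: on a match, set a counter to 2, then print
# while the counter is still positive (the match and the next line).
result_awk=$(awk '/LGIG/ { n = 2 } n-- > 0' "$sample")

printf '%s\n' "$result"
rm -f "$sample"
```

On this sample both commands print the two LGIG headers, each followed by its sequence line, and nothing else.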
Edit: P.S. You should use [code][/code] tags around your code and data to protect the formatting, both of it and the page it's posted on.
Last edited by David the H.; 09-01-2009 at 12:29 PM .
09-01-2009, 12:59 PM
#3
Moderator
Registered: Apr 2002
Location: in a fallen world
Distribution: slackware by choice, others too :} ... android.
Posts: 22,618
grep -A1 pattern file > newfile
09-01-2009, 01:33 PM
#4
Guru
Registered: Jun 2004
Location: Osaka, Japan
Distribution: Debian sid + kde 3.5 & 4.4
Posts: 5,314
Quote:
Originally Posted by Tinkster
grep -A1 pattern file > newfile
I thought about that, but grep puts a "--" group separator between blocks of matches, and I'm not aware of any way to change it or turn it off. I doubt he'd want any of those in his file.
Of course, you could always run it through grep again to filter them out.
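The two-pass grep idea could look something like this. It is only a sketch on an invented sample file, and it assumes no real data line consists of exactly "--" (true for FASTA-style headers and sequences, but worth checking on other data).

```shell
# Hypothetical sample with two non-adjacent LGIG records.
sample=$(mktemp)
printf '%s\n' \
  '>1|CAP|100'  'AAAA' \
  '>1|LGIG|200' 'CCCC' \
  '>2|HROB|300' 'DDDD' \
  '>2|LGIG|400' 'EEEE' > "$sample"

# First pass: each match plus one line of trailing context.
# grep inserts a "--" line between the two groups here.
raw=$(grep -A1 'LGIG' "$sample")

# Second pass: drop lines that are exactly "--".
filtered=$(grep -A1 'LGIG' "$sample" | grep -v '^--$')

printf '%s\n' "$filtered"
rm -f "$sample"
```

Recent GNU grep releases also document a --no-group-separator option that avoids the second pass; whether your installed grep has it is something to confirm in its man page.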
09-01-2009, 03:30 PM
#5
Moderator
Registered: Apr 2002
Location: in a fallen world
Distribution: slackware by choice, others too :} ... android.
Posts: 22,618
Well spotted; I hadn't thought of that, and my sample data only had one match for the search =}
Your sed solution is cleaner & faster.
09-01-2009, 03:54 PM
#6
Member
Registered: Dec 2007
Posts: 79
Original Poster
You guys are great!
Just a note: I had to change the search term to LGIG| so that it didn't also match sequence lines that happen to contain the amino acid string LGIG.
Thanks,
Kevin
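A small sketch of why the trailing | matters, again on invented data. When a sequence line contains LGIG, the bare pattern starts the range on the wrong line and the real record's sequence is lost. This assumes GNU sed with its default basic regular expressions, where | is a literal character; with sed -E or grep -E it would need escaping as \| because | means alternation there.

```shell
# Hypothetical sample: the CAP sequence happens to contain "LGIG".
sample=$(mktemp)
printf '%s\n' \
  '>1|CAP|100'  'AALGIGAA' \
  '>1|LGIG|200' 'CCCC' \
  '>2|HROB|300' 'DDDD' > "$sample"

# Bare pattern: the range starts on the CAP sequence line, so the
# output is that line plus the LGIG header, and "CCCC" is dropped.
loose=$(sed -n '/LGIG/,+1p' "$sample")

# Pattern with the literal field separator: only the header matches.
strict=$(sed -n '/LGIG|/,+1p' "$sample")

printf 'loose:\n%s\n\nstrict:\n%s\n' "$loose" "$strict"
rm -f "$sample"
```

Here the strict pattern returns the LGIG header with its own sequence line, which is the behavior wanted for the real file.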