LinuxQuestions.org
09-01-2009, 12:54 PM   #1
kmkocot
Member
 
Registered: Dec 2007
Location: Queensland, Australia
Posts: 112

Rep: Reputation: 15
How to grep lines containing a certain string PLUS the line following that line?


Hi all,

I have a dataset (see example below) and would like to copy every line containing a certain string ("LGIG"), plus the line immediately following it, to a new file. I have no problem grepping the lines containing LGIG, but I'm lost on how to translate each match into a line number and then step forward one line to pick up the line after it.

Thanks!
Kevin


Example input file:
>14219|CAP|227704
MEFHLPLNRDDLRQRSALNHYVVEEIYPIRIIPSKLHEFNAALRAQGAKCILSHFDVLFSVLLHGANLQPELRSQAWDYLMKVVLKLNTPLVEALEGDTI SSDNCDLLNILKMSVYLLCQFIETFEGEVSKPTVVGATGKGRQKKAQKQAFAELAHLDWDDERERAVRALLQLLQLHLQQLWSPPVVEEDFVNLVTCCCY KMLENQDVVKNKTTRDTIFQVLGVLVKKYNHALGCSLKFIQLLQHFEHLASPLALAVSVFATDYGIKSVAAEIMREIGNMDSKDLARDTSATRAYATFLV ELAEKIPCVMLPSISVLLCLLDGESYSMRNSVLGVLGEMVIRVLSKEELDAKQKCTRDQFLDKLEDHLHDVHAFTRSKTLQIWLAIVNEKALPLPRQHQL LDLVIGRLQDKSSSVRKCALQLLTAVLRMNPFAAKLPLEELKEGYEKESAKLREMQPEEPQLTAADIAKQELEKDWMRMHRGIKKAIALREEKEEDEEEE KGSDEEKVISDEDTVESVKSQIIAALKDGDYLRCVLLFTAAREEWPTDPAFVCSIPMQLDDEEDEGERKNQEVIKCIRNLFFDEVLHSNAFKEALQPQDL DSSRRSSHDTSLVNELTKQQVLVQYLKDSMNFVGQVQQSVPIVCQLLGSKNISDVLESVNFFVTGFKFGVSNSMTGIRRMLVLVWSKEQGVRDAVVEAYR NLYLNLEENNPRLRALAVVNNLTALSLGASLGDLTSLEELVCEFVRSDDLDTHVVQLLWERFAMKIPNTSAAESRAALVLLGMAAGAKVDIVRSNVDVLV KEGLGPRGELDLMLVRDTCSALCKLVPKNKDKTGVAKEPFRFPQDHEAFKRLEFILQNSVSHLESRFWVPMAEQAVNVIYSLAEHPDAITASIIKNVAKQ VVLLERFPLRRETEGEGPLCSPESPTKGVKCPTAYVTRLLSLVGHTALRQLIHLDVSTFGEMKRRHQMQEGKGAKQGTSASARNRSKNNGSSSDDEEDDM GIGGAIAEDAEAEFILKVTEKEVVGGEGLLAALQPLLVGICSNQSKYPDPELQAAASLALAKYMLVSSEFCESHLQLLFTILERSPHAVIRANTIIAMGD LTFRFPNLIEPWTPNLYARLRDPSPQVRKNTLMVLTHLILNDMVKVKGQISELASCIVDDDTRITGLAKLFFHELSRKGNAIYNIMPDMVSRLSDTEVGV DEGNFRVIMKQVECNSEIQKFILCFFCYRYLFSFIQKDRQCESLVEKLCHRFRVTRVERQWRDLAFCLSMLSYSDKSIRKLQENVGCFSDKMADEDVYNS FVTIMNNAKKFAKPETKSLVEEFEQKLEQYHSKGVEDADAEDKAAKSKRSPGRRKAKGRTPGTSRARRQRRSGGDDERDFLGNDSHPTEAGDKPRPKPAI TFDSDDSDIELFKVQEDKAPTQTSQLPSDLENSDPNVSFESPGLRRLKRRHKSVNNKNIVASSSQSSSRTTRARHRNQ
>14219|LGIG|61640
MSFEFTIPINLDCLLSKTNVSQYVVEEVLPLRIIPGAVQDFKFAVRNDNFAVLNHFDTAYSLLSLQKDFEDAIKEEMWDILLQVGQCVTNEIAHGLEDPE LTPDFKLQLLNTLKMTCYLICQFIDMFEVEDTKPGIQINGRGKSKKTTVNKSGRDWEKEKKRGVQTLLNIIQPNLNRLWDPPIAEEEFVNLVSNCCYRLL ENPAIVRTKDIRDVISQLLGVLIKKYNHSLGASLKIMQLLQHFEHLVVPIVQILEIFVNQYQDKSIISELMREIGRLDSRDIAKDTSGTRILAQFLVELS HQLPAAMLPTISVLIGHLDGESYTLRNGVLSVIGEMLVKVLSQENLEDKLKSTRECFLDKLEDHIHDVNAFVRSRVLQIWLHIVNEKCLPLPRQENLVNL ILGRLQDKSSQVRKYAIQLMIALMKNNPFASKLPVEELQASYEKEKEKLKEMTPEETDISDLEECWEAVEKKLRDHVFNHDEDEAETPEEEPSTCTENTQ EELNAEKLCLYKILIMVFREKDLNSEEELEEESERELEIENEGIISCLKDIYLVQKKSEVLSSDPASQPDTSVLNDITKQQVLVQYLKDSTTFADHIQQA VPIICQLLGSKTTSDVFEAIEFFVTGFEFGVTATMLGIRRMLVLIWSSEETIKESVVNAYKRLYLNPNAGGNQRTVALAIVKNLSALLQGASLGDITSFD ALIQQFVKSDDIGQTVIQVLWEKFIPKIPNTTIEESRAALLLLSMIAGAKPEIVKSNIDVLVNEGLGMRAETDMLLAQITCRTLLKLAAGKKTKGEVAAE PFKFSESHEMFQRLSFLLVHCKNTETKVWVPFAEQAINVIYKLSEQPDIIAGEIIKNIAKEVVKSYQSIDIDQPISTVSTLVLTRLLSVSGHVALRHLVH LDSNVFGEMKRRRAIQEEKKEKEQANKKSTRISENIEDELGLAGAAAEDAEAEYIRRICETDIVTGENLLSTLHPLIVAVCTDSTKYPDTQLRTAATLAL AKFTMVSSEFCDAHLQLLFTILEKSPNPAIRANTIIALGDLSFRFPNLIEPWTPHLYGRLRDESAQVRKNTLQVLTHLILNDMVKVKGQISEL
>14237|CAP|174526
MEGPLAVNHHSSSSSSSQEPAPPPAPSLAPAPLAIPQQQVNAASLCRIGQELVHDIVLKASEIFVLLKNMQLPNGATTSQPQNYPETKAKLDDNMKQMLM NFKKLKILYIKVHEHTANLDSRPIEEMLPIEVNGEVKGEGPKLTDEVKYASEEHKEVISQLRNRNRELKQIIDKLRLTVWEINTMLATRKS
>14237|LGIG|86853
PPAGPQQPMVSPNKIVNAATFCRFGQEYIHEIITKATEIFGSRGSFWKCSKFLFQLTNYQDRKTKLEEELKTLCVTFKKLRAIYDKVRETFGGEVETLPV SEVYPYVLEEHRELVETFAHRPRLEPTTTVNMVFFFIFQQVRVKNQQLKEIIDQIRSINWEINTMITMR
>14237|HROB|164180
MSLIVKFENQFHETGVTDVFTEPVDIIPWKFFLLHLFSVQLDSIFDEYLCARINVRDLNLMVIYRSPNSSNDNNDNLIGVLNEFGDLSGRHLVLGYFNFP DIDWKLRVCAGSDYKIENRFLQLIDDRFWLQHVNKPTRYGTNSCPHVLDLIITSEDCVSDLQYSSPLGRKPKKMMSNSQSITNMSQAASSAPPGQTPNQS SSIAINFESPSKKLTPVAMCKYGQELVQEIMHKSYDIFGQLKSIQLVPNERKSRLEDSLHKIEIAFQRLHYVYDSVCNMTKYLEHKPIESYLPMKDKCQD FTTQNIRSDEESKLTEQIQAKNIELKELIDLMRTILWDIDTMMALRKT
>14237|NVEC|200470
MSPLTYTFIAGSSMDSWPSLQSSVMSALKEEEKHLPQPKPPAKPVENPITLTIQGQRTVQEIADKAIELFRKLQNARLSTDTTSRQSQAQTKDIRDQLNT LKGLLAKLRNVYNDTKRAVTIPAGENVENLIPLECSPRNEMDTSDGGGPVEQERAELQKKLKEKNDQLKEVIDKLRIAVWDINTMIMLKPS
>14286|LGIG|234779
MYIASFVLKMVSNRFLVKVAIGGAIFTLTSISGMKIYIENKFQRQDFYLKSMDLLRNYEPAEERLGKPIIDRTINIGELHHNYTDGINARLRIPLKGSIT NGALYVLASRETPKESWHIDKLDLEIPSHFQRWTFYYDPRDSSKVKITGGSDEADETSIIDNNAVQDTQAS

Desired output file for this example:
>14219|LGIG|61640
MSFEFTIPINLDCLLSKTNVSQYVVEEVLPLRIIPGAVQDFKFAVRNDNFAVLNHFDTAYSLLSLQKDFEDAIKEEMWDILLQVGQCVTNEIAHGLEDPE LTPDFKLQLLNTLKMTCYLICQFIDMFEVEDTKPGIQINGRGKSKKTTVNKSGRDWEKEKKRGVQTLLNIIQPNLNRLWDPPIAEEEFVNLVSNCCYRLL ENPAIVRTKDIRDVISQLLGVLIKKYNHSLGASLKIMQLLQHFEHLVVPIVQILEIFVNQYQDKSIISELMREIGRLDSRDIAKDTSGTRILAQFLVELS HQLPAAMLPTISVLIGHLDGESYTLRNGVLSVIGEMLVKVLSQENLEDKLKSTRECFLDKLEDHIHDVNAFVRSRVLQIWLHIVNEKCLPLPRQENLVNL ILGRLQDKSSQVRKYAIQLMIALMKNNPFASKLPVEELQASYEKEKEKLKEMTPEETDISDLEECWEAVEKKLRDHVFNHDEDEAETPEEEPSTCTENTQ EELNAEKLCLYKILIMVFREKDLNSEEELEEESERELEIENEGIISCLKDIYLVQKKSEVLSSDPASQPDTSVLNDITKQQVLVQYLKDSTTFADHIQQA VPIICQLLGSKTTSDVFEAIEFFVTGFEFGVTATMLGIRRMLVLIWSSEETIKESVVNAYKRLYLNPNAGGNQRTVALAIVKNLSALLQGASLGDITSFD ALIQQFVKSDDIGQTVIQVLWEKFIPKIPNTTIEESRAALLLLSMIAGAKPEIVKSNIDVLVNEGLGMRAETDMLLAQITCRTLLKLAAGKKTKGEVAAE PFKFSESHEMFQRLSFLLVHCKNTETKVWVPFAEQAINVIYKLSEQPDIIAGEIIKNIAKEVVKSYQSIDIDQPISTVSTLVLTRLLSVSGHVALRHLVH LDSNVFGEMKRRRAIQEEKKEKEQANKKSTRISENIEDELGLAGAAAEDAEAEYIRRICETDIVTGENLLSTLHPLIVAVCTDSTKYPDTQLRTAATLAL AKFTMVSSEFCDAHLQLLFTILEKSPNPAIRANTIIALGDLSFRFPNLIEPWTPHLYGRLRDESAQVRKNTLQVLTHLILNDMVKVKGQISEL
>14237|LGIG|86853
PPAGPQQPMVSPNKIVNAATFCRFGQEYIHEIITKATEIFGSRGSFWKCSKFLFQLTNYQDRKTKLEEELKTLCVTFKKLRAIYDKVRETFGGEVETLPV SEVYPYVLEEHRELVETFAHRPRLEPTTTVNMVFFFIFQQVRVKNQQLKEIIDQIRSINWEINTMITMR
>14286|LGIG|234779
MYIASFVLKMVSNRFLVKVAIGGAIFTLTSISGMKIYIENKFQRQDFYLKSMDLLRNYEPAEERLGKPIIDRTINIGELHHNYTDGINARLRIPLKGSIT NGALYVLASRETPKESWHIDKLDLEIPSHFQRWTFYYDPRDSSKVKITGGSDEADETSIIDNNAVQDTQAS
 
09-01-2009, 01:17 PM   #2
David the H.
Bash Guru
 
Registered: Jun 2004
Location: Osaka, Japan
Distribution: Debian sid + kde 3.5 & 4.4
Posts: 6,823

Rep: Reputation: 1950
Try this:
Code:
sed -n '/LGIG/,+1p' file.txt >newfile.txt
The -n option suppresses sed's default printing of every line. The address /LGIG/ matches any line containing "LGIG", and ,+1 extends the range to include the following line (the +N form is a GNU sed extension). Finally, the p command prints the selected lines.
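To see the range address at work on something small, here is a minimal sketch. The file names (sample.txt, lgig_pairs.txt) and the four short records are made up for illustration and are not the poster's data; it assumes GNU sed, since the ,+1 form is a GNU extension.

```shell
# Build a tiny FASTA-style sample (hypothetical records, not the original data).
cat > sample.txt <<'EOF'
>1|CAP|100
AAAA
>1|LGIG|200
BBBB
>2|LGIG|300
CCCC
>2|HROB|400
DDDD
EOF

# -n        : do not print lines automatically
# /LGIG/,+1 : each line containing LGIG plus the one line after it (GNU sed)
# p         : print the lines selected by that range
sed -n '/LGIG/,+1p' sample.txt > lgig_pairs.txt

# Show the result: the two LGIG headers, each followed by its sequence line.
cat lgig_pairs.txt
```

Only the two LGIG headers and the lines directly below them end up in lgig_pairs.txt; the CAP and HROB records are skipped.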

Edit: P.S. You should use [code][/code] tags around your code and data to protect the formatting of both the data and the page it's posted on.

Last edited by David the H.; 09-01-2009 at 01:29 PM.
 
09-01-2009, 01:59 PM   #3
Tinkster
Moderator
 
Registered: Apr 2002
Location: in a fallen world
Distribution: slackware by choice, others too :} ... android.
Posts: 23,004
Blog Entries: 11

Rep: Reputation: 903
grep -A1 pattern file > newfile
 
09-01-2009, 02:33 PM   #4
David the H.
Bash Guru
 
Registered: Jun 2004
Location: Osaka, Japan
Distribution: Debian sid + kde 3.5 & 4.4
Posts: 6,823

Rep: Reputation: 1950
Quote:
Originally Posted by Tinkster
grep -A1 pattern file > newfile

I thought about that, but grep puts a "--" group separator between non-adjacent blocks of matches, and I'm not aware of any way to change it or turn it off. I doubt he'd want any of those in his file.

Of course, you could always run the output through grep again to filter them out.
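The "run it through grep again" idea could look roughly like the sketch below. The file names (sample2.txt, raw.txt, filtered.txt) and records are hypothetical, and GNU grep is assumed. Newer GNU grep releases also document a --no-group-separator option, which, if your version has it, would make the second grep unnecessary; that is not relied on here.

```shell
# Hypothetical sample where the two LGIG records are NOT adjacent,
# so grep -A1 emits a "--" separator between the two groups.
cat > sample2.txt <<'EOF'
>1|LGIG|200
BBBB
>1|CAP|100
AAAA
>2|LGIG|300
CCCC
EOF

# Raw context output: includes a "--" line between the two match groups.
grep -A1 'LGIG' sample2.txt > raw.txt

# Second pass: drop lines that consist of exactly "--".
# Caveat: a real data line that is exactly "--" would be dropped as well.
grep -A1 'LGIG' sample2.txt | grep -v '^--$' > filtered.txt

cat filtered.txt
```

Anchoring the filter as ^--$ keeps it from removing data lines that merely contain two dashes somewhere.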
 
09-01-2009, 04:30 PM   #5
Tinkster
Moderator
 
Registered: Apr 2002
Location: in a fallen world
Distribution: slackware by choice, others too :} ... android.
Posts: 23,004
Blog Entries: 11

Rep: Reputation: 903
Well spotted; I hadn't thought of that, and my sample data only had one match for the search =}

Your sed solution is cleaner & faster.
 
09-01-2009, 04:54 PM   #6
kmkocot
Member
 
Registered: Dec 2007
Location: Queensland, Australia
Posts: 112

Original Poster
Rep: Reputation: 15
You guys are great!

Just a note: I had to change the search term to LGIG| so that it didn't also match sequence lines that happen to contain the amino acids LGIG.
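A small sketch of why the trailing | matters, using made-up records and file names (sample3.txt, loose.txt, strict.txt) and assuming GNU sed. In sed's default basic regex syntax a bare | is a literal pipe character; with -E it would be an alternation operator and would need escaping.

```shell
# Hypothetical sample: the CAP record's sequence happens to contain "LGIG".
cat > sample3.txt <<'EOF'
>1|CAP|100
MKLGIGAA
>1|LGIG|200
BBBB
>2|HROB|400
DDDD
EOF

# Loose pattern: the range starts on the CAP sequence line, so the output is
# that sequence plus the LGIG header, and the LGIG sequence itself is lost.
sed -n '/LGIG/,+1p' sample3.txt > loose.txt

# Stricter pattern: "LGIG|" only matches the header field, because the
# literal pipe never appears in the sequence lines.
sed -n '/LGIG|/,+1p' sample3.txt > strict.txt

echo "loose:";  cat loose.txt
echo "strict:"; cat strict.txt
```

The loose run shows the failure mode: once a range has started on a sequence line, the header that follows is consumed as the "+1" line and its own sequence is never printed.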

Thanks,
Kevin
 
  

