LinuxQuestions.org
Old 07-29-2011, 04:52 PM   #1
Tauro
LQ Newbie
 
Registered: Apr 2011
Posts: 24

Rep: Reputation: 1
Awk varying patterns to different file


File1
Code:
>ENSP00000202967 pep:known gene:ENSG00000089163 
MKMSFALTFRSAKGRWIANPSQPCSKASIGLFVPASPPLDPEKVKELQRFITLSKRLLVM
TGAGISTESGIPDYRSEKVGLYARTDRRPIQHGDFVRSAPIRQRYWARNFVGWPQFSSHQ
PNPAHWALSTWEKLV

>ENSP00000282074 pep:known  gene:ENSG00000152253 
MVEDELALFDKSINEFWNKFKSTDTSCQMAGLRDTYKDSIKAFAEKLSVKLKEEERMVEM

>ENSP00000397517 pep:known gene:ENSG00000100429 
MGTALVYHEDMTATRLLWDDPECEIERPERLTAALDRLRQRGLEQRCLRLSAREASEEEL
GLVHRVPFTARGWPQGLDCSWWTLCSLELCKMGLP

>ENSP00000216271 pep:known gene:ENSG00000100429 
MGTALVYHEDMTATRLLWDDPECEIERPERLTAALDRLRQRGLEQRCLRLSAREASEEEL
GLVHSPEYVSLVRETQVLGKEELQALSGQFDAIYFHPSTFHCARLAAGAGLQLVDAVLTG
AVQNGLALVRPPGHHGQRAAANGFCVFNNVAIAAAHAKQKHGLHRILVVDWDVHHGQGIQ
YLFEDDPSVLYFSWHRYEHGRFWPFLRESDADAVGRGQGLGFTVNLPWNQVGMGNADYVA
What I want to do: when records have an identical $3 (i.e. the same gene: value), I want to put them together in a file named $3.out, along with the sequence lines below each header.

I tried extracting $3 into a separate file with grep first, then taking each line of that file as a pattern and pulling out the matching records with awk.
However, I ran into problems writing the records out to $3.out.

Help needed.
 
Old 07-29-2011, 05:02 PM   #2
colucix
LQ Guru
 
Registered: Sep 2003
Location: Bologna
Distribution: CentOS 6.5 OpenSuSE 12.3
Posts: 10,509

Rep: Reputation: 1976
Code:
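# Header lines contain "gene": take the third field as the output name.
/gene/ {
  filename = $3
  # Strip the "gene:" prefix so the file is named e.g. ENSG00000089163.out
  sub(/gene:/,"",filename)
}

# No pattern: this rule runs for every line, so each header and the
# lines below it go to the file selected by the most recent header.
{
  print > ( filename ".out" )
}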

Last edited by colucix; 07-29-2011 at 05:07 PM. Reason: code simplified
 
Old 07-29-2011, 05:12 PM   #3
Tauro
LQ Newbie
 
Registered: Apr 2011
Posts: 24

Original Poster
Rep: Reputation: 1
This gets me only the line containing $3, not the lines below it.
I had already achieved that much using grep.
I need the line containing $3 and the lines below it as well.
 
Old 07-29-2011, 05:26 PM   #4
colucix
LQ Guru
 
Registered: Sep 2003
Location: Bologna
Distribution: CentOS 6.5 OpenSuSE 12.3
Posts: 10,509

Rep: Reputation: 1976
Strange. The print statement is applied to every record, not only to the header lines. The first awk rule above (run when $0 matches "gene") just sets the output filename.
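To see both rules at work, here is a small sketch run on a cut-down copy of the sample. The sequences are shortened and the file name sample.fa is an assumption for illustration, not something from the original post. The header rule only updates filename; the second rule has no pattern, so it writes every line, header or sequence, to whichever file is current.

```shell
# Cut-down sample: two records share one gene, a third has a different one.
# sample.fa is a hypothetical file name chosen for this sketch.
cat > sample.fa <<'EOF'
>ENSP00000202967 pep:known gene:ENSG00000089163
MKMSFALTFRSAKGRW
PNPAHWALSTWEKLV
>ENSP00000397517 pep:known gene:ENSG00000100429
MGTALVYHEDMTATRL
>ENSP00000216271 pep:known gene:ENSG00000100429
MGTALVYHEDMTATRL
AVQNGLALVRPPGHHG
EOF

awk '
# Header line: remember which file the following lines belong to.
/gene/ { filename = $3; sub(/gene:/, "", filename) }
# Every line, header or sequence: write it to the current file.
       { print > (filename ".out") }
' sample.fa

# Show the file that should hold both records for the shared gene.
cat ENSG00000100429.out
```

After this runs, ENSG00000100429.out should contain both records that share that gene, headers and sequence lines alike, while ENSG00000089163.out holds the single remaining record.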
 
Old 07-29-2011, 05:34 PM   #5
Tauro
LQ Newbie
 
Registered: Apr 2011
Posts: 24

Original Poster
Rep: Reputation: 1
Colucix, you modified the code after I ran the previous version.
Thanks for the help. Could you tell me what sub does?
 
Old 07-29-2011, 05:55 PM   #6
colucix
LQ Guru
 
Registered: Sep 2003
Location: Bologna
Distribution: CentOS 6.5 OpenSuSE 12.3
Posts: 10,509

Rep: Reputation: 1976
Code:
sub(/gene:/,"",filename)
This simply removes the "gene:" part from the variable filename by substituting the null string for it. It is not actually needed if you want to keep "gene:" in the output file names.
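As a quick illustration of that behaviour, the sketch below feeds one header line from the sample through awk. The variable names n and result are only for this example. It shows that sub() edits the variable passed as its third argument, returns the number of replacements it made, and leaves $3 itself untouched.

```shell
# Feed one header line from the sample through awk and capture what it prints.
result=$(printf '%s\n' '>ENSP00000282074 pep:known gene:ENSG00000152253' |
awk '{
    filename = $3                   # work on a copy of the third field
    n = sub(/gene:/, "", filename)  # replaces the first match inside filename
    print n, filename, $3           # n is the number of replacements made
}')
echo "$result"
# prints: 1 ENSG00000152253 gene:ENSG00000152253
```

Because the substitution is applied to the copy, dropping the sub() call would simply leave the "gene:" prefix in place, giving file names such as gene:ENSG00000152253.out.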

Running the code on your example, I get:
Code:
ENSG00000152253.out
ENSG00000100429.out
ENSG00000089163.out
 
Old 07-29-2011, 05:57 PM   #7
Tauro
LQ Newbie
 
Registered: Apr 2011
Posts: 24

Original Poster
Rep: Reputation: 1
Thanks
 
  


All times are GMT -5.