LinuxQuestions.org
Old 10-29-2012, 06:25 PM   #1
progchi
LQ Newbie
 
Registered: Oct 2012
Posts: 4

write to file under certain conditions


Hello,

I have a file with 2 columns looking like this:

>GENE1 ACGGTTAGAGCCCAGAGTTGAGACCCGTGGAG
>GENE2 NACCCCGATCGTACGRRSTVACCCGA
>GENE3 TGCGAGCNNTTTSSR
>GENE4 CGATGCTGCGCGATCTCTAGAGAGCCCAG

I want to split it into 2 files: one with the rows where column 2 contains only A's, C's, T's or G's, and another with the rows where column 2 also contains characters other than A, C, T or G.
So in this case:
File 1:
>GENE1 ACGGTTAGAGCCCAGAGTTGAGACCCGTGGAG
>GENE4 CGATGCTGCGCGATCTCTAGAGAGCCCAG

File 2:
>GENE2 NACCCCGATCGTACGRRSTVACCCGA
>GENE3 TGCGAGCNNTTTSSR


I really tried several things, but nothing worked :-(.

Thanks in advance!
 
Old 10-29-2012, 11:38 PM   #2
rknichols
Senior Member
 
Registered: Aug 2009
Distribution: CentOS
Posts: 2,959

Code:
egrep '^[^ ]+ +[ACTG]+ *$'  # A, C, T, G only
egrep -v '^[^ ]+ +[ACTG]+ *$'  # lines not matching the above
The regular expression matches one or more non-space characters at the beginning of the line, followed by one or more spaces, followed by one or more of the characters ACTG, and then optional trailing spaces up to the end of the line. The second command simply uses the "-v" option to invert the match. A shortcoming of that second command is that it also prints any line that doesn't fit the two-column format at all (a blank line, say, or a line with only one field). A better, but more complex, command for that second case would be:
Code:
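egrep '^[^ ]+ +[^ ]*[^ACTG ][^ ]* *$'
That one looks for a 2nd field that consists of any number of non-space characters, followed by one character that is neither a space nor one of ACTG, followed by any number of non-space characters. It will print only lines with exactly two fields where the 2nd field contains a character that is not ACTG. (The space inside the bracket matters: without it, an extra space between or after the columns would itself count as the "not ACTG" character.)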
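For anyone who wants to try this end to end, here is a minimal sketch that runs both patterns against the sample data from post #1 and writes the two output files. The names genes.txt, file1, file2 and file2_inverted are just placeholders, and the extra "malformed" line is something I added to show the difference between the "-v" approach and the stricter pattern. I've written grep -E, which is the same thing as egrep. Best run in an empty scratch directory, since it overwrites those file names.

```shell
#!/bin/sh
# Sample data from post #1, plus one line that has no 2nd column
# (the "malformed" line is an addition for illustration only).
cat > genes.txt <<'EOF'
>GENE1 ACGGTTAGAGCCCAGAGTTGAGACCCGTGGAG
>GENE2 NACCCCGATCGTACGRRSTVACCCGA
>GENE3 TGCGAGCNNTTTSSR
>GENE4 CGATGCTGCGCGATCTCTAGAGAGCCCAG
malformed
EOF

# Rows whose 2nd column is made up of A, C, T, G only.
grep -E '^[^ ]+ +[ACTG]+ *$' genes.txt > file1

# Rows with exactly two columns whose 2nd column has at least one
# character that is not A, C, T, G (and not a space).
grep -E '^[^ ]+ +[^ ]*[^ACTG ][^ ]* *$' genes.txt > file2

# For comparison: inverting the first pattern also lets the
# malformed line through.
grep -E -v '^[^ ]+ +[ACTG]+ *$' genes.txt > file2_inverted

# Show how many lines ended up in each file.
wc -l file1 file2 file2_inverted
```

With this input, file1 should hold the GENE1 and GENE4 rows, file2 the GENE2 and GENE3 rows, and file2_inverted those same two rows plus the malformed line.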

Last edited by rknichols; 10-29-2012 at 11:40 PM.
 
Old 10-30-2012, 04:13 AM   #3
colucix
LQ Guru
 
Registered: Sep 2003
Location: Bologna
Distribution: CentOS 6.5 OpenSuSE 12.3
Posts: 10,509

Another suggestion using awk:
Code:
awk '{ if ($2 ~ /[^ACGT]/) print > "file2"; else print > "file1" }' file
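If the input might contain lines that don't have exactly two fields, the same idea can be stretched a little. This is only a sketch under my own assumptions: the NF check and the output name "rejects" are additions for illustration, not part of the one-liner above, and genes.txt is a placeholder input file with one deliberately bad line added to the sample data.

```shell
#!/bin/sh
# Sample data from post #1, plus one line with only a single field
# (added for illustration only).
cat > genes.txt <<'EOF'
>GENE1 ACGGTTAGAGCCCAGAGTTGAGACCCGTGGAG
>GENE2 NACCCCGATCGTACGRRSTVACCCGA
>GENE3 TGCGAGCNNTTTSSR
>GENE4 CGATGCTGCGCGATCTCTAGAGAGCCCAG
malformed
EOF

# One pass over the input, three possible destinations.
awk '
    NF != 2        { print > "rejects"; next }  # not a two-column row
    $2 ~ /[^ACGT]/ { print > "file2";   next }  # 2nd column has some other character
                   { print > "file1" }          # 2nd column is A, C, G, T only
' genes.txt

# Show how many lines ended up in each file.
wc -l file1 file2 rejects
```

Since awk splits fields on runs of whitespace by default, leading, trailing or repeated spaces should not affect which file a row lands in.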
 
Old 10-30-2012, 11:48 AM   #4
progchi
LQ Newbie
 
Registered: Oct 2012
Posts: 4

Original Poster
Thank you both very much. This helped me a lot!
 
  


All times are GMT -5.
