LinuxQuestions.org
Old 08-12-2008, 05:52 AM   #1
bioinformatics_guy
Member
 
Registered: Aug 2008
Posts: 54

Rep: Reputation: 15
SED/AWK help


I have a file of the following format

>SC_12345_1
AACTGCTGATGATCGTAGTCGTAGTCGTCGTAGTCGTGACTGCTCG
>SC_12345_2
ATCGTAGCTGATCGATGCTAGCTGCTGCATGTCGTACGTAGCTAGTGCTAGCTACGTAC
>SC_12345_3
ACTGCTAGCTGATCGTACGTACGTCAGTCG

and so on...

What I want to do is break this file down into individual files, each named after its >SC... identifier, with that >SC... line as the first line and the sequence underneath it. So for this example there would be 3 files, named >SC...1, >SC...2 and >SC...3, containing:


>SC_12345_1
AACTGCTGATGATCGTAGTCGTAGTCGTCGTAGTCGTGACTGCTCG

>SC_12345_2
ATCGTAGCTGATCGATGCTAGCTGCTGCATGTCGTACGTAGCTAGTGCTAGCTACGTAC

>SC_12345_3
ACTGCTAGCTGATCGTACGTACGTCAGTCG

Any suggestions? The name of the file is really unimportant; I just need to know the order of the files (basically the last number in the identifier, ^>SC_\d+_(\d+)).
 
Old 08-12-2008, 06:07 AM   #2
colucix
LQ Guru
 
Registered: Sep 2003
Location: Bologna
Distribution: CentOS 6.5 OpenSuSE 12.3
Posts: 10,509

Rep: Reputation: 1976
It is not a good idea to have the ">" character in a file name. However, if the name of the file is not important, you can do something like this:
Code:
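awk '/^>SC_/{suffix=gensub(/.*_/,"","g");  # trailing number of the header (gensub needs gawk)
                   print > ("file_" suffix);      # write the header line
                   getline;                       # read the sequence line that follows
                   print > ("file_" suffix)}' file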
The gensub call (a GNU awk extension) strips out the longest leading string that ends with an underscore, so the suffix is the trailing number, whatever its length. Redirection then writes the header line and the sequence line that follows it to the output file.
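If a sequence may wrap over several lines, or gawk is not available, a variant along the following lines might be worth trying. It is only a sketch under a few assumptions: the input name sequences.fa and the scratch directory split_demo are made up for the demonstration, the second sample record has been wrapped onto two lines on purpose, and the trailing number is assumed to be unique per header (a repeated number would overwrite the earlier file).

```shell
# Work in a scratch directory (hypothetical name) so the generated files stay together.
mkdir -p split_demo && cd split_demo

# Sample input in the format from the first post; record 2 is wrapped
# over two lines here purely to exercise the multi-line case.
cat > sequences.fa <<'EOF'
>SC_12345_1
AACTGCTGATGATCGTAGTCGTAGTCGTCGTAGTCGTGACTGCTCG
>SC_12345_2
ATCGTAGCTGATCGATGCTAGCTGCTGCATG
TCGTACGTAGCTAGTGCTAGCTACGTAC
>SC_12345_3
ACTGCTAGCTGATCGTACGTACGTCAGTCG
EOF

awk '
/^>SC_/ {
    if (out != "") close(out)   # finish the previous record so open files do not pile up
    suffix = $0
    sub(/.*_/, "", suffix)      # same greedy strip as the gensub call, using POSIX sub()
    out = "file_" suffix
}
out != "" { print > out }       # header and every following sequence line go to the current file
' sequences.fa
```

Run on the sample, this should leave file_1, file_2 and file_3 in the scratch directory, each starting with its >SC... header. Closing each file before opening the next is a design choice that keeps the number of open files at one, which may matter for inputs with many records.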
 
  


All times are GMT -5.
