LinuxQuestions.org
02-04-2015, 08:57 AM   #1
nouse
LQ Newbie
Registered: Sep 2013
Posts: 21

Merging two lines in a text file if they start with the same expression


Hi!

I have one file that looks like this:

Code:
>Unc14086 
AGAGUUUGAU CCUGGCUCAG AAUCAACGCU GGCGGCGUGC CUAACACAUG
CAAGUCGAAC GAGAAAGUGG AGCAAUCCAU GAGUAAAGUG GCGCACGGGU
GAGUAACACG UGACUAACCU ACCCUUGAGU GGGGGAUAAC UGAGGGAAAC
>Unc35443
GCACGAGAAA GUGGAGCAAU CCAUGAGUAA AGUGGCGUAC GGGUGAGUAA
CACGUGACUA ACCUACCCUC GAGUGGGGAA UAACUUCGGG AAACCGGAGC
UAAUACCGCA UAACACCUAC GGGUCAAAGG AGCAAUUCGC UUGAGGAGGG
So every n lines (n may vary) there is a line starting with ">", which marks the beginning of a new block of information.

I have another tab-delimited file:

Code:
Unc14086 InformationalTextExample
Unc35443 InformationalTextExampleII
My goal is to look up the identifiers from the lines starting with ">" in the first file against the second file. Whenever an identifier matches, I want to append the corresponding text (e.g. "InformationalTextExample") to that line, possibly separated by "_":


Code:
>Unc14086_InformationalTextExample
AGAGUUUGAU CCUGGCUCAG AAUCAACGCU GGCGGCGUGC CUAACACAUG
CAAGUCGAAC GAGAAAGUGG AGCAAUCCAU GAGUAAAGUG GCGCACGGGU
GAGUAACACG UGACUAACCU ACCCUUGAGU GGGGGAUAAC UGAGGGAAAC
>Unc35443_InformationalTextExampleII
GCACGAGAAA GUGGAGCAAU CCAUGAGUAA AGUGGCGUAC GGGUGAGUAA
CACGUGACUA ACCUACCCUC GAGUGGGGAA UAACUUCGGG AAACCGGAGC
UAAUACCGCA UAACACCUAC GGGUCAAAGG AGCAAUUCGC UUGAGGAGGG
How would that be possible?

Thank you!
 
02-04-2015, 09:15 AM   #2
J Martin Rushton
Member
Registered: Jan 2015
Location: England
Distribution: Mainly CentOS
Posts: 31

If I were tackling that, I'd fall back on AWK. Read in the second file and save the data in an array keyed by identifier (either use a program switch or do the read in the BEGIN block):
Code:
names[$1] = $2
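As a rough sketch of the BEGIN-block option, the lookup file could be read with getline before any sequence data is processed. The helper name lookup_name and the variables mapfile and id are made up for illustration; the sketch only loads the array and prints the text stored for one identifier, and it assumes the fields are separated by tabs or spaces.

```sh
# Hypothetical helper: load MAPFILE inside BEGIN, then print the text stored for ID.
# Usage: lookup_name MAPFILE ID
lookup_name() {
    awk -v mapfile="$1" -v id="$2" '
        BEGIN {
            # getline returns 1 per line read, 0 at end of file, -1 on error.
            while ((getline line < mapfile) > 0) {
                split(line, field)          # default splitting: tabs or spaces
                names[field[1]] = field[2]
            }
            close(mapfile)
            if (id in names) print names[id]
        }
    '
}
```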
The main body is then trivially:
Code:
substr($0, 1, 1) == ">" { $0 = $0 "_" names[substr($0, 2)] }
{print}
NB, code not checked/debugged, I leave that as an exercise for the reader!
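For anyone who wants something to paste and try, here is a minimal sketch that assembles the two pieces above into one script. It is a variant built on assumptions, not the poster's tested code: the wrapper name annotate_headers is made up, the lookup file is read first using the common NR == FNR idiom rather than a BEGIN block, and fields are split on any whitespace (so informational text containing spaces would be cut at the first space). It also strips trailing whitespace after the identifier, since the sample header ">Unc14086 " ends in a space, and it prints headers without a match unchanged instead of appending a bare "_".

```sh
# Hypothetical wrapper; the function name and file names are illustrative only.
# Usage: annotate_headers MAPFILE SEQFILE
annotate_headers() {
    awk '
        # First file (the lookup table): remember the text for each identifier.
        NR == FNR { names[$1] = $2; next }

        # Second file: header lines start with ">".
        /^>/ {
            id = substr($0, 2)
            sub(/[ \t\r]+$/, "", id)    # drop trailing whitespace, e.g. ">Unc14086 "
            if (id in names)
                $0 = ">" id "_" names[id]
        }

        # Print every line of the second file, modified or not.
        { print }
    ' "$1" "$2"
}
```

It could be called as annotate_headers names.txt sequences.txt > annotated.txt, where both file names are placeholders. Note that the NR == FNR idiom misbehaves if the lookup file is empty, because the sequence file is then treated as the first file.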
 
  

