04-15-2011, 02:44 PM   #1
LQ Newbie
Registered: Apr 2011
Posts: 4

Rep: Reputation: 0
Sed, Awk, Perl - Merge lines unless they match a certain string

What is the best way to merge lines, in sed, awk or perl, that occur between certain strings?
I'm new to sed scripting and I have been working on this for some time now.
I have a large file (sample below) that I need to edit.


What I need looks something like this.

I'm working with a very large file, so simply merging all the lines and then adding a newline character before ">contig" and after "translated" won't work, at least not with sed.
04-15-2011, 02:51 PM   #2
Sergei Steshenko
Senior Member
Registered: May 2005
Posts: 4,481

Rep: Reputation: 454
This is very straightforward in Perl. I do not understand where the capacity problem comes from: you just write the lines (sequences) between the '>contig...........translated' lines to your output file, stripping the "\n" from the former (the 'chomp' function in Perl).
04-15-2011, 04:23 PM   #3
Nominal Animal
Senior Member
Registered: Dec 2010
Location: Finland
Distribution: Xubuntu, CentOS, LFS
Posts: 1,723
Blog Entries: 3

Rep: Reputation: 946
Originally Posted by Dsw0002:
What is the best way to merge lines, in sed, awk or perl, that occur between certain strings?
I don't know about best, but one possibility is this awk script (it needs GNU awk, because RT is a gawk extension):
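awk 'BEGIN      { RS="[\r\n]+" ; nl="" ; sep=" " }
     # Header line: end any merged line, then print the header with its own terminator (RT).
     /^>contig/ { printf("%s%s%s", nl, $0, RT); nl="" ; sp="" ; next }
     # Any other line: append it to the merged line and remember its terminator.
                { printf("%s%s", sp, $0) ; nl=RT ; sp=sep }
     # Terminate the last merged line, if there was one.
     END        { printf("%s", nl) }' file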
which streams the input file (only one line or so in memory at any time) quite efficiently, and keeps whatever newline convention you might be using.
The sep=" " in the first line specifies the delimiter that replaces the newlines in merged lines. Use sep="" if you want to merge the lines without any intervening separator.

This awk script is a bit more complex than absolutely necessary, but I only recently found out how to retain the newline convention efficiently, and wanted to apply that here.
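For comparison, a simpler variant is sketched below. It is a sketch under stated assumptions, not a replacement: it gives up preserving the input's newline convention (output always uses "\n", and a trailing "\r" on CRLF input would be kept as data) in exchange for using no gawk-only features such as RT. The function name merge_contigs_awk and the variable pending are made up for this example; sep and sp play the same role as in the script above.

```shell
# Hypothetical helper: same merge, written for any POSIX awk.
merge_contigs_awk() {
    awk '
        BEGIN      { sep = " " }   # separator placed between merged lines
        # Header line: end any unfinished merged line, then print the header.
        /^>contig/ { if (pending) printf "\n"; print; pending = 0; sp = ""; next }
        # Any other line: append it, with the separator before all but the first.
                   { printf "%s%s", sp, $0; pending = 1; sp = sep }
        # End the last merged line, if there was one.
        END        { if (pending) printf "\n" }
    ' "$@"
}
```

As with the original, setting sep = "" joins the lines without any separator.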

If you need something even more efficient, or want to work with unlimited-length lines (not having to read even a single complete line into RAM), I'd write a small utility in C. I think it'd only take about a hundred lines of code, even if you used unistd.h low-level I/O for maximum efficiency.

Last edited by Nominal Animal; 04-15-2011 at 04:25 PM.
04-16-2011, 03:07 AM   #4
LQ Guru
Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 9,467

Rep: Reputation: 2856
Seems I am on the same wavelength as Nominal:
awk '/contig/{ret="\n";if(a)nl=ret}{printf "%s%s%s",nl,$0,ret;a=1;nl=ret=""}' file
04-16-2011, 04:12 AM   #5
Registered: Apr 2010
Posts: 228

Rep: Reputation: 45
$ ruby -ne 'print /contig/? "\n"+$_: $_.chomp' file
04-16-2011, 05:20 AM   #6
Senior Member
Registered: Jan 2010
Posts: 1,608

Rep: Reputation: 449
sed  ':a />contig/! {$bb;N;ba};:b s/\n//g;1!s/>contig/\n&/' file
If your last line is '>contig...', then the jump point ':b' in the above command is not needed.
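The one-liner above is dense, so here it is again spread over several lines with comments. This is only a sketch of the same logic, wrapped in a made-up function name (merge_contigs_sed). The one deliberate change is that the newline in the replacement is written as a backslash followed by a real line break, which also works in sed implementations that do not understand "\n" in that position.

```shell
# Hypothetical helper: the sed one-liner, expanded and commented.
merge_contigs_sed() {
    sed '
        :a
        # No header in the pattern space yet: keep appending input lines.
        />contig/!{
            # On the last input line there is nothing left to append.
            $bb
            N
            ba
        }
        :b
        # Join the collected lines by deleting the embedded newlines.
        s/\n//g
        # Except on line 1, put the header back on a line of its own.
        1!s/>contig/\
&/
    ' "$@"
}
```

The "&/" has to start in the first column, because any spaces before it would become part of the replacement text.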

