Sed, Awk, Perl - Merge lines unless they match a certain string
What is the best way to merge lines, in sed, awk or perl, that occur between certain strings?
I'm new to sed scripting and I have been working on this for some time now. I have a large file (sample below) that I need to edit.

>contig...........translated
(sequences)
(sequences)
(sequences)
>contig...........translated
(sequences)
.
.
.

What I need looks something like this.

>contig...........translated
(sequences)(sequences)(sequences)
>contig...........translated
(sequences).............

I'm working with a very large file so simply merging all the lines then adding a new line character before ">contig" and after "translated" won't work, at least not with sed.
It can be done in a very straightforward manner in Perl. I do not understand where the capacity problem comes from: you just read the input line by line and write each (sequences) line that falls between the '>contig...........translated' headers to your output file, stripping its trailing "\n" (the 'chomp' function in Perl).
Code:
awk 'BEGIN { RS="[\r\n]+" ; nl="" ; sep=" " }

The sep=" " in the first line specifies the delimiter that replaces the newlines in merged lines. Use sep="" if you want to merge the lines without any intervening separator.

This awk script is a bit more complex than absolutely necessary, but I only recently found out how to retain the newline convention efficiently, and wanted to apply that ;)

If you need something even more efficient, or want to work with unlimited-length lines (not having to read even a single complete line into RAM), I'd write a small utility in C. I think it'd only take about a hundred lines of code, even if you used unistd.h low-level I/O for maximum efficiency.
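For anyone who wants to experiment with the separator idea, here is a separate minimal sketch. It is not the script from this post: the function name merge_with_sep and the pending flag are invented for illustration, it assumes headers start with ">contig", and it normalizes CRLF input to LF instead of retaining the newline convention.

```shell
# merge_with_sep: hypothetical helper.
#   $1   = separator placed between merged lines ("" for none)
#   rest = input files (stdin if none are given)
merge_with_sep() {
  sep_arg=$1
  shift
  awk -v sep="$sep_arg" '
    { sub(/\r$/, "") }            # drop a trailing CR, if any
    /^>contig/ {
      if (pending) printf "\n"    # finish the previous merged line
      print                       # header on a line of its own
      pending = 0
      next
    }
    {
      # first sequence line gets no separator, later ones do
      printf "%s%s", (pending ? sep : ""), $0
      pending = 1
    }
    END { if (pending) printf "\n" }
  ' "$@"
}
```

Calling merge_with_sep " " file joins sequence lines with a space, and merge_with_sep "" file joins them directly.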
Seems I am on the same wavelength as Nominal:
Code:
awk '/contig/{ret="\n";if(a)nl=ret}{printf "%s%s%s",nl,$0,ret;a=1;nl=ret=""}' file
Code:
$ ruby -ne 'print /contig/? "\n"+$_: $_.chomp' file
Code:
sed ':a />contig/! {$bb;N;ba};:b s/\n//g;1!s/>contig/\n&/' file