Quote:
Originally Posted by Dsw0002
What is the best way to merge lines, in sed, awk or perl, that occur between certain strings?
I don't know about best, but one possibility is this awk script,
Code:
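awk 'BEGIN { RS="[\r\n]+" ; nl="" ; sp="" ; sep=" " }             # a record ends at any run of CR/LF characters
/^>contig/ { printf("%s%s%s", nl, $0, RT); nl="" ; sp="" ; next }  # header: end the pending merged line, print header with its own terminator
{ printf("%s%s", sp, $0) ; nl=RT ; sp=sep }                       # other lines: append to the current merged line
END { printf("%s", nl) }' file                                    # terminate the last merged line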
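which streams the input file quite efficiently (only about one line is held in memory at any time) and keeps whatever newline convention the file uses, because each line's actual terminator is saved from RT and written back out. Note that a regular-expression RS and the RT variable are GNU awk (gawk) extensions, so other awk implementations may not handle this correctly.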
The sep=" " in the first line specifies the delimiter that replaces the newlines in merged lines. Use sep="" if you want to merge the lines without any intervening separator.
This awk script is a bit more complex than strictly necessary, but I only recently found out how to retain the newline convention efficiently and wanted to apply that.
If you need something even more efficient, or want to work with lines of unlimited length (without ever reading a complete line into RAM), I'd write a small utility in C. I think it would only take about a hundred lines of code, even if you used the low-level unistd.h I/O (read() and write()) for maximum efficiency.