LinuxQuestions.org
04-15-2011, 03:44 PM   #1
Dsw0002
LQ Newbie
 
Registered: Apr 2011
Posts: 4

Sed, Awk, Perl - Merge lines unless they match a certain string


What is the best way to merge lines, in sed, awk or perl, that occur between certain strings?
I'm new to sed scripting and I have been working on this for some time now.
I have a large file (sample below) that I need to edit.

>contig...........translated
(sequences)
(sequences)
(sequences)
>contig...........translated
(sequences)
.
.
.

What I need looks something like this.
>contig...........translated
(sequences)(sequences)(sequences)
>contig...........translated
(sequences).............

I'm working with a very large file, so simply merging all the lines and then adding a newline character before ">contig" and after "translated" won't work, at least not with sed.
 
04-15-2011, 03:51 PM   #2
Sergei Steshenko
Senior Member
 
Registered: May 2005
Posts: 4,481

It can be done in a very straightforward manner in Perl. I do not understand where the capacity problem comes from: you just write the (sequences) lines that fall between the '>contig...........translated' lines to your output file, stripping the "\n" from each sequence line (the 'chomp' function in Perl).
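
The thread contains no code for this approach, so here is a minimal sketch of it. It is written in Ruby (the other scripting language used in this thread) rather than Perl, and it is not the poster's own script. The method name merge_sequences and the sample data are made up for illustration, and header lines are assumed to start with ">contig".

```ruby
require "stringio"

# Stream the input line by line: header lines are written on a line of
# their own, sequence lines are written with their newline removed.
# Assumption: a header is any line that starts with ">contig".
def merge_sequences(input, output)
  pending = false  # true while merged sequence data still lacks its newline
  input.each_line do |line|
    if line.start_with?(">contig")
      output.write("\n") if pending   # finish the previous merged line
      output.write(line.chomp + "\n")
      pending = false
    else
      output.write(line.chomp)        # chomp strips the trailing newline
      pending = true
    end
  end
  output.write("\n") if pending       # finish the last merged line
end

# Small demonstration on made-up data.
sample = StringIO.new(">contig1 translated\nAAA\nBBB\n>contig2 translated\nCCC\n")
merge_sequences(sample, $stdout)
# prints:
# >contig1 translated
# AAABBB
# >contig2 translated
# CCC
```

Only one line is held in memory at a time, so the size of the file should not matter. To process a real file, pass an open File object as input in place of the StringIO.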
 
04-15-2011, 05:23 PM   #3
Nominal Animal
Senior Member
 
Registered: Dec 2010
Location: Finland
Distribution: Xubuntu, CentOS, LFS
Posts: 1,723
Blog Entries: 3

Quote:
Originally Posted by Dsw0002
What is the best way to merge lines, in sed, awk or perl, that occur between certain strings?
I don't know about best, but one possibility is this awk script,
Code:
awk 'BEGIN      { RS="[\r\n]+" ; nl="" ; sep=" " }
     /^>contig/ { printf("%s%s%s", nl, $0, RT); nl="" ; sp="" ; next }
                { printf("%s%s", sp, $0) ; nl=RT ; sp=sep }
     END        { printf("%s", nl) }' file
which streams the input file (only one line or so in memory at any time) quite efficiently, and keeps whatever newline convention you might be using. Note that RT is a GNU awk (gawk) extension, so the script needs gawk.
The sep=" " in the first line specifies the delimiter that replaces the newlines in merged lines. Use sep="" if you want to merge the lines without any intervening separator.
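
For comparison, the same bookkeeping (a pending line ending, plus a separator that is only emitted between merged lines) can be sketched in Ruby. This is a rough illustration, not a port of the awk script: the names merge_keep_eol and sep are hypothetical, and it only recognises "\n" and "\r\n" line endings, whereas the awk RS pattern treats any run of CR/LF characters as one terminator.

```ruby
# Merge sequence lines while keeping the file's own line endings.
# Assumptions: headers start with ">contig"; lines end in "\n" or "\r\n".
def merge_keep_eol(input, output, sep: " ")
  pending_eol = ""  # line ending owed to the merged line being built
  lead = ""         # separator to print before the next sequence line
  input.each_line do |line|
    eol = line[/\r?\n\z/] || ""               # this line's own terminator
    text = line[0, line.length - eol.length]  # the line without it
    if text.start_with?(">contig")
      output.write(pending_eol + text + eol)  # close previous line, print header
      pending_eol = ""
      lead = ""
    else
      output.write(lead + text)
      pending_eol = eol                       # remember how to end this line
      lead = sep                              # later lines get the separator
    end
  end
  output.write(pending_eol)
end
```

Calling it with sep: "" merges the lines without any intervening separator, which matches the output asked for in the first post.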

This awk script is a bit more complex than absolutely necessary, but I only recently found out how to retain the newline convention efficiently, and wanted to apply that.

If you need something even more efficient, or want to work with unlimited-length lines (not having to read even a single complete line into RAM), I'd write a small utility in C. I think it'd only take about a hundred lines of code, even if you used unistd.h low-level I/O for maximum efficiency.
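
No such utility is posted in the thread. As a rough sketch of the chunked idea only, written in Ruby rather than C and therefore without the performance benefit, the input can be read in fixed-size blocks so that no complete line ever has to be in memory. The names merge_chunked and chunk_size are hypothetical. The sketch assumes that any line starting with ">" is a header (matching the full ">contig" across a chunk boundary would need extra state) and that lines end in "\n".

```ruby
# Merge sequence lines while reading fixed-size chunks instead of lines.
# Assumptions: a header is any line whose first byte is ">"; LF line endings.
def merge_chunked(input, output, chunk_size: 65_536)
  at_line_start = true   # the next byte is the first byte of a line
  in_header = false      # the current line is a header line
  need_newline = false   # sequence data written without a final newline
  while (chunk = input.read(chunk_size))
    pos = 0
    while pos < chunk.length
      if at_line_start
        in_header = (chunk[pos] == ">")
        if in_header && need_newline
          output.write("\n")           # finish the previous merged line
          need_newline = false
        end
        at_line_start = false
      end
      nl = chunk.index("\n", pos)      # end of the current line, if in chunk
      stop = nl || chunk.length
      piece = chunk[pos...stop]
      output.write(piece)
      need_newline = true if !in_header && !piece.empty?
      if nl
        output.write("\n") if in_header  # only headers keep their newline
        at_line_start = true
        pos = nl + 1
      else
        pos = stop                       # line continues in the next chunk
      end
    end
  end
  output.write("\n") if need_newline
end
```

The state carried between chunks is just three flags, which is the part that would map most directly onto a C version.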

Last edited by Nominal Animal; 04-15-2011 at 05:25 PM.
 
04-16-2011, 04:07 AM   #4
grail
Guru
 
Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 7,692

Seems I am on the same wavelength as Nominal:
Code:
awk '/contig/{ret="\n";if(a)nl=ret}{printf "%s%s%s",nl,$0,ret;a=1;nl=ret=""}' file
 
04-16-2011, 05:12 AM   #5
kurumi
Member
 
Registered: Apr 2010
Posts: 223

Code:
$ ruby -ne 'print /contig/? "\n"+$_: $_.chomp' file
 
04-16-2011, 06:20 AM   #6
crts
Senior Member
 
Registered: Jan 2010
Posts: 1,604

Code:
sed  ':a />contig/! {$bb;N;ba};:b s/\n//g;1!s/>contig/\n&/' file
If your last line is '>contig...', then the label ':b' (and the '$bb' branch to it) in the above command is unnecessary.
 
  


All times are GMT -5.
