LinuxQuestions.org
Old 12-05-2012, 06:40 AM   #1
sarenace
Member
 
Registered: Feb 2012
Posts: 57

Rep: Reputation: Disabled
concatenating non blank lines with sed


Hello,
I have a question so easy that I'm embarrassed to write it up as a post on this forum. I want sed to substitute the newline character with the tab character on any non-blank line. Feeling like that was WELL within my admittedly beginner level bash scripting abilities, I typed

Quote:
sed '/^$/!s/\n/\t/'
The first regexp tells sed to look for blank lines, and the ! instructs it NOT to perform the substitution on them, so \n should be replaced with \t only on non-blank lines. I really wish the problem were more complicated than this, but it's not. It seems like it should be relatively easy and straightforward; however, the above command simply doesn't perform as expected: it does not affect the output at all. Any thoughts?

As an addendum, I really do try to solve these for myself before coming to this forum; and it pains me to come to you guys with such a simple problem, but I can't figure it out.
 
Old 12-05-2012, 07:45 AM   #2
firstfire
Member
 
Registered: Mar 2006
Location: Ekaterinburg, Russia
Distribution: Debian, Ubuntu
Posts: 623

Rep: Reputation: 364
Hi.

How about awk:
Code:
$ echo -e '1\n2\n3\n\n4' | awk '$1=$1' RS='\n\n' OFS='\t'
1       2       3
4
It is not as straightforward in sed as you think, because sed is line-oriented. You'd need to collect a block of lines up to an empty line (using N or H) and then do s/\n/\t/g.
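To see why the original command has no effect, here is a minimal sketch (the sample input, the X placeholder and the function names are illustrative, not from the thread). sed removes the trailing newline from each line before the script runs, so a single line in the pattern space never contains \n; an embedded newline only appears after a command such as N has appended another line.

```shell
# One line at a time: the pattern space is "a", then "b"; there is no \n to match.
one_line() { printf 'a\nb\n' | sed 's/\n/X/'; }

# N appends the next line, so the pattern space becomes "a\nb" before s runs.
two_lines() { printf 'a\nb\n' | sed 'N;s/\n/X/'; }

one_line    # prints a and b on separate lines, unchanged
two_lines   # prints aXb
```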
 
Old 12-05-2012, 08:31 AM   #3
markush
Senior Member
 
Registered: Apr 2007
Location: Germany
Distribution: Slackware
Posts: 3,970

Rep: Reputation: 848
Code:
sed -e :a -e 'N; s/\n/\t/; ta' <filename
This works. But I did not really understand why

N appends the next input line to the pattern space, which holds the current line.

BTW: a very interesting problem when done with sed

Markus
 
Old 12-05-2012, 08:44 AM   #4
millgates
Member
 
Registered: Feb 2009
Location: 192.168.x.x
Distribution: Slackware
Posts: 651

Rep: Reputation: 269
Quote:
Originally Posted by markush
Code:
sed -e :a -e 'N; s/\n/\t/; ta' <filename
This works. But I did not really understand why
Markus
Not quite. This just replaces all newlines with tabs, not just after non-blank lines as required. Also, I believe the label can be put into the same expression, making it a little shorter:

Code:
sed ':a N; s/\n/\t/; ta'
I was trying to figure this out and I came up with this:

Code:
sed -n '$!{/^$/{x;s/\n/\t/g;s/^\t//;p;g;/./p;b};H;b};p'
But I think it's still very ugly and I hope somebody can put together a better solution.
 
Old 12-05-2012, 08:54 AM   #5
markush
Senior Member
 
Registered: Apr 2007
Location: Germany
Distribution: Slackware
Posts: 3,970

Rep: Reputation: 848
millgates, thanks for correcting me
Code:
sed ':a /^$/!N; s/\n/\t/; ta'
this should work then.

Markus
 
Old 12-05-2012, 09:45 AM   #6
danielbmartin
Senior Member
 
Registered: Apr 2010
Location: Apex, NC, USA
Distribution: Ubuntu
Posts: 1,084

Rep: Reputation: 287
Quote:
Originally Posted by sarenace
I want sed to substitute the newline character with the tab character on any non-blank line.
This problem statement is not entirely clear. Perhaps English is not OP's first language, and for that we should make allowances. For better comprehension, I reword the problem as follows:

"I want to substitute a tab character for the newline character on any non-blank line."

If this interpretation is correct, all blank lines in an input file should be left unchanged. Let's remember that a "blank line" could consist of zero or more blank characters.

I learn by reading forum posts and solving when I can. I also test proposed solutions offered by others. This is a test file constructed for the purpose:
Code:
This is a non-blank line with no trailing blanks.
This is a non-blank line with two trailing blanks.  
The following line is blank.
  
The preceding line was blank.
The following THREE lines are blank.
  

   
The preceding THREE lines were blank.
This is the last line in the file.
Using this test file, all previous solutions fail.

Daniel B. Martin
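One possible reading of this interpretation can be sketched in awk, under the assumption that whitespace-only lines count as blank and must pass through untouched while each run of non-blank lines is joined with tabs. The function name join_blocks and the variables buf and have are hypothetical, not from the thread.

```shell
join_blocks() {
  awk '
    NF == 0 {                        # empty or whitespace-only line
      if (have) { print buf; have = 0 }
      print                          # pass the blank line through unchanged
      next
    }
    { buf = (have ? buf "\t" $0 : $0); have = 1 }   # grow the current block
    END { if (have) print buf }      # flush a block that ends the file
  '
}

printf 'one\ntwo\n \n\nthree\nfour\n' | join_blocks
```

With the default field separator, NF is 0 for both empty and whitespace-only lines, so no explicit regular expression is needed for the blank-line test.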
 
Old 12-05-2012, 09:57 AM   #7
millgates
Member
 
Registered: Feb 2009
Location: 192.168.x.x
Distribution: Slackware
Posts: 651

Rep: Reputation: 269
Quote:
Originally Posted by markush
millgates, thanks for correcting me
Code:
sed ':a /^$/!N; s/\n/\t/; ta'
this should work then.

Markus
This still does not work for me:

Code:
$ echo -e "a\nb\nc\n\nd e\n\n\nf"|sed ':a /^$/!N;s/\n/\t/; ta'
a       b       c               d e                     f
I think the problem is that the /^$/ test matches against the entire pattern space, not just the newly appended line. So, let's take a file like this:
Code:
a
b

c
Now let's feed it to sed:

1) sed reads the first line: a
2) /^$/ does not match, so !N appends the next line to the pattern space.
3) The pattern space now contains "a\nb".
4) s/\n/\t/ replaces the \n with \t, and ta jumps back to the beginning of the expression.
5) /^$/ again does not match; the next line (blank) is read and appended (with a newline) to the pattern space.
6) The pattern space now contains "a\tb\n", and the substitution succeeds again.
7) /^$/ still does not match (it never will unless the first line is blank); the last line is read and appended to the pattern space, and the newline gets substituted.
8) sed reaches the end of input, and the entire pattern space is sent to stdout: a\tb\t\tc\n
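The walk-through above can be replayed as a small sketch, assuming GNU sed (the \t in the replacement and the printing of the pattern space when N runs out of input are GNU behaviours). The script is the same as the quoted one, split across -e options, and tabs are turned into | so they are visible; the function name trace_demo is illustrative.

```shell
trace_demo() {
  printf 'a\nb\n\nc\n' |
    sed -e ':a' -e '/^$/!N' -e 's/\n/\t/' -e 'ta' |
    tr '\t' '|'
}

trace_demo   # with GNU sed this prints a|b||c
```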
 
Old 12-05-2012, 12:13 PM   #8
firstfire
Member
 
Registered: Mar 2006
Location: Ekaterinburg, Russia
Distribution: Debian, Ubuntu
Posts: 623

Rep: Reputation: 364
Hi.

Here is a space-inefficient sed solution:
Code:
$ echo -e '1aaa\n2bb\n3cc\n\n\n4dd\n\n5\n6' | sed -nr 'H; ${x; s/\n\n/&\n/g;s/([^\n])\n/\1\t/g; s/^\n//p}'
1aaa    2bb     3cc


4dd

5       6
It reads the whole file into the hold space and then substitutes tabs for newlines. The s/\n\n/&\n/g part (shown in red in the original post) keeps the number of blank lines; otherwise this number is decreased by 1 (which I believe is what OP wanted, because otherwise there is no way to get output without blank lines).

Last edited by firstfire; 12-05-2012 at 01:16 PM. Reason: Fixed solution.
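For readers who want to follow the one-liner step by step, here is the same script laid out with comments. It is a sketch assuming GNU sed (for -E, \t and \n inside brackets); the function name slurp_join is hypothetical.

```shell
slurp_join() {
  sed -nE '
    H                      # append every input line to the hold space
    ${
      x                    # on the last line, bring the whole file into the pattern space
      s/\n\n/&\n/g         # add one newline per pair so blank-line counts survive
      s/([^\n])\n/\1\t/g   # a newline that follows a non-newline character becomes a tab
      s/^\n//p             # drop the leading newline left by the first H, then print
    }
  '
}

printf '1aaa\n2bb\n3cc\n\n\n4dd\n\n5\n6\n' | slurp_join | tr '\t' '|'
```

Piping through tr makes one detail visible: the last line of each block keeps a trailing tab when a blank line follows it, because its own newline is also replaced.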
 
Old 12-05-2012, 01:40 PM   #9
grail
Guru
 
Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 7,485

Rep: Reputation: 1890
Don't have much in the way of sedfu, but the following awk seems to work:
Code:
 awk '!NF{printf "\n";if(x)next;x=1}NF{x=0}ORS=NF?"\t":"\n"'
 
Old 12-05-2012, 02:01 PM   #10
markush
Senior Member
 
Registered: Apr 2007
Location: Germany
Distribution: Slackware
Posts: 3,970

Rep: Reputation: 848
Quote:
Originally Posted by grail
Don't have much in the way of sedfu, but the following awk seems to work:
Code:
 awk '!NF{printf "\n";if(x)next;x=1}NF{x=0}ORS=NF?"\t":"\n"'
It doesn't work here, the newlines are printed as newlines.

Markus
 
Old 12-05-2012, 03:15 PM   #11
millgates
Member
 
Registered: Feb 2009
Location: 192.168.x.x
Distribution: Slackware
Posts: 651

Rep: Reputation: 269
Also, since I don't have anything better to do, what about perl?

Code:
perl -pe 'chomp;$_.=$_?"\t":"\n";'
or

Code:
perl -0777 -pe 's/(?<=[^\n])\n(?=[^\n])/\t/g;'

Last edited by millgates; 12-05-2012 at 03:17 PM.
 
Old 12-06-2012, 01:21 PM   #12
grail
Guru
 
Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 7,485

Rep: Reputation: 1890
Quote:
Originally Posted by markush
It doesn't work here, the newlines are printed as newlines.
Not sure I follow. Yes, it does leave any blank line intact ... I thought this was a prerequisite?
As per OP:
Quote:
I want sed to substitute the newline character with the tab character on any non-blank line.
I tested with the examples provided and it seems to produce the desired output?
 
Old 12-06-2012, 02:26 PM   #13
markush
Senior Member
 
Registered: Apr 2007
Location: Germany
Distribution: Slackware
Posts: 3,970

Rep: Reputation: 848
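grail, I think you're right. Now I tried Perl:
Code:
perl -pe 's/(.*\S.*)\n/$1\t/;'
It looks for a line with at least one non-whitespace character and then replaces its \n with \t.
In a complete script that reads the whole file at once, $/ has to be undefined:
Code:
#!/usr/bin/perl

use strict ;
use warnings ;

# With $/ undefined, a single read returns the whole file.
local $/ ;
open my $fh, '<', 'file.txt' or die "Cannot open file.txt: $!" ;
my $file = <$fh> ;
close $fh ;
# Replace the newline after every line that holds a non-whitespace character.
$file =~ s/(.*\S.*)\n/$1\t/g ;
print $file ;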
Markus

Last edited by markush; 12-06-2012 at 02:30 PM.
 
Old 12-06-2012, 02:34 PM   #14
David the H.
Bash Guru
 
Registered: Jun 2004
Location: Osaka, Japan
Distribution: Debian sid + kde 3.5 & 4.4
Posts: 6,823

Rep: Reputation: 1947
Sections 4.25 and 4.26 of the sedfaq have solutions for concatenating lines based on their contents:

http://sed.sourceforge.net/sedfaq4.html#s4.25

To concat all non-blank lines, just use a generic matching pattern, such as '.$'.
 
Old 12-06-2012, 02:41 PM   #15
markush
Senior Member
 
Registered: Apr 2007
Location: Germany
Distribution: Slackware
Posts: 3,970

Rep: Reputation: 848
Quote:
Originally Posted by David the H.
...
To concat all non-blank lines, just use a generic matching pattern, such as '.$'.
But what does the '.' match in this case? In Perl, . matches anything but \n.
I also had some difficulty, with both sed and Perl, matching blank lines that contain whitespace. I'm somewhat confused here.

Markus
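On the question of what '.' matches: in sed, as in Perl, it matches any single character, spaces and tabs included, so a whitespace-only line counts as non-blank for a pattern like '.$'. A small sketch of the difference, using an illustrative three-line sample and hypothetical function names:

```shell
# A text line, a whitespace-only line, and an empty line.
sample() { printf 'text\n  \n\n'; }

count_dot()      { sample | sed -n '/./p' | wc -l; }
count_nonspace() { sample | sed -n '/[^[:space:]]/p' | wc -l; }

count_dot        # 2: "." also matches the spaces
count_nonspace   # 1: only the line with visible text
```

Using [^[:space:]] (or \S in Perl) in place of '.' is one way to treat whitespace-only lines as blank.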
 
  


All times are GMT -5.
