LinuxQuestions.org


sarenace 12-05-2012 06:40 AM

concatenating non blank lines with sed
 
Hello,
I have a question so easy that I'm embarrassed to write it up as a post on this forum. I want sed to substitute the newline character with the tab character on any non-blank line. Feeling like that was WELL within my admittedly beginner level bash scripting abilities, I typed

Quote:

sed '/^$/!s/\n/\t/'
The first regexp tells sed to look for blank lines, and the ! instructs it NOT to substitute \t for \n on them. I really wish the problem were more complicated than this, but it's not. It seems like it should be relatively easy and straightforward; however, the above command simply doesn't perform as expected: it literally does not affect the output. Any thoughts?

As an addendum, I really do try to solve these for myself before coming to this forum; and it pains me to come to you guys with such a simple problem, but I can't figure it out.

firstfire 12-05-2012 07:45 AM

Hi.

How about awk:
Code:

$ echo -e '1\n2\n3\n\n4' | awk '$1=$1' RS='\n\n' OFS='\t'
1      2      3
4

It is not as straightforward in sed as you think, because sed is line-oriented. You'd need to collect a block of lines up to an empty line (using N or H) and then do s/\n/\t/g.
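
To illustrate the point about sed being line-oriented, here is a small sketch (assuming GNU sed, which accepts \t in the replacement; the variable names are only for illustration). sed removes the trailing newline before the script runs, so a per-line s/\n/\t/ has nothing to match. Only after N pulls in the next line does the pattern space contain a newline.

```shell
# Assumes GNU sed. Variable names are illustrative only.

# Per-line substitution: the pattern space never contains \n, so nothing changes.
per_line=$(printf 'a\nb\n\nc\n' | sed '/^$/!s/\n/\t/')

# After N the pattern space is "a\nb", so the same substitution now matches.
joined=$(printf 'a\nb\n' | sed 'N; s/\n/\t/')

printf '%s\n' "$per_line"
printf '%s\n' "$joined"
```

The first result is identical to the input; the second is the two lines joined by a tab.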

markush 12-05-2012 08:31 AM

Code:

sed -e :a -e 'N; s/\n/\t/; ta' <filename
This works. But I did not really understand why ;)

N appends the next input line to the pattern space, which holds the current line.

BTW: a very interesting problem when done with sed

Markus

millgates 12-05-2012 08:44 AM

Quote:

Originally Posted by markush (Post 4843225)
Code:

sed -e :a -e 'N; s/\n/\t/; ta' <filename
This works. But I did not really understand why ;)
Markus

Not quite. This just replaces all newlines with tabs, not just after non-blank lines as required. Also, I believe the label can be put into the same expression, making it a little shorter:

Code:

sed ':a N; s/\n/\t/; ta'
I was trying to figure this out and I came up with this:

Code:

sed -n '$!{/^$/{x;s/\n/\t/g;s/^\t//;p;g;/./p;b};H;b};p'
But I think it's still very ugly and I hope somebody can put together a better solution.

markush 12-05-2012 08:54 AM

millgates, thanks for correcting me
Code:

sed ':a /^$/!N; s/\n/\t/; ta'
this should work then.

Markus

danielbmartin 12-05-2012 09:45 AM

Quote:

Originally Posted by sarenace (Post 4843167)
I want sed to substitute the newline character with the tab character on any non-blank line.

This problem statement is not entirely clear. Perhaps English is not OP's first language, and for that we should make allowances. For better comprehension, I reword the problem as follows:

"I want to substitute a tab character for the newline character on any non-blank line."

If this interpretation is correct, all blank lines in an input file should be left unchanged. Let's remember that a "blank line" could consist of zero or more blank characters.

I learn by reading forum posts and solving when I can. I also test proposed solutions offered by others. This is a test file constructed for the purpose:
Code:

This is a non-blank line with no trailing blanks.
This is a non-blank line with two trailing blanks. 
The following line is blank.
 
The preceding line was blank.
The following THREE lines are blank.
 

 
The preceding THREE lines were blank.
This is the last line in the file.

Using this test file, all previous solutions fail.

Daniel B. Martin
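
Following the point that a blank line may contain only whitespace, one way to honour that definition is to test awk's NF, which is 0 for empty and whitespace-only lines alike. The sketch below is not from the thread; join_nonblank is a hypothetical helper name, and it takes the literal reading of the request: a non-blank line gets a tab in place of its newline, and a blank line is passed through untouched.

```shell
# join_nonblank is a hypothetical helper name, not something from the thread.
# NF is the number of whitespace-separated fields, so it is 0 for lines that
# are empty or contain only blanks.
join_nonblank() {
    awk '{ printf "%s%s", $0, (NF ? "\t" : "\n") }' "$@"
}

# The third input line holds two spaces and is treated as blank.
printf 'a\nb\n  \nc\n' | join_nonblank | od -c
```

Under this reading the newline of a blank line ends the preceding joined line, so whether the blank line should also remain visible as a separate line is a choice the original question leaves open.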

millgates 12-05-2012 09:57 AM

Quote:

Originally Posted by markush (Post 4843239)
millgates, thanks for correcting me
Code:

sed ':a /^$/!N; s/\n/\t/; ta'
this should work then.

Markus

This still does not work for me:

Code:

$ echo -e "a\nb\nc\n\nd e\n\n\nf"|sed ':a /^$/!N;s/\n/\t/; ta'
a      b      c              d e                    f

I think the problem is that the /^$/ test matches the entire pattern space, not just the new line. So,
let's have a file like this:
Code:

a
b

c

Now let's feed it to sed:

1) sed reads the first line: a
2) /^$/ will not match, so !N will append the next line to the pattern space.
3) The contents of the pattern space are now "a\nb".
4) s/\n/\t/ will replace the \n with \t, and ta will return the flow to the beginning of the expression.
5) /^$/ will not match again, so the next line (blank) is read and appended (with a newline) to the pattern space.
6) The pattern space now contains "a\tb\n" and the substitution will again succeed.
7) /^$/ will not match (it never will, unless the first line is blank), so the last line is read and appended to the pattern space, and its newline gets substituted.
8) sed reaches the end of input and the entire pattern space is sent to stdout: a\tb\t\tc\n
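
The steps above can be traced with sed's l command, which prints the pattern space with escapes made visible. This is only a sketch, assuming GNU sed; the label is passed in its own -e so the example does not depend on how a particular sed version parses ':a' followed by a space.

```shell
# Assumes GNU sed. -n suppresses normal output so only the l traces appear.
# l runs right after each N, i.e. just before each substitution.
trace=$(printf 'a\nb\n\nc\n' |
    sed -n -e ':a' -e '/^$/!N' -e 'l' -e 's/\n/\t/' -e 'ta')

printf '%s\n' "$trace"
```

Each trace line ends in $, which marks the end of the pattern space. The three lines printed correspond to steps 3, 6 and 7: the pattern space keeps growing and /^$/ never gets a chance to match.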

firstfire 12-05-2012 12:13 PM

Hi.

Here is a space-inefficient sed solution:
Code:

$ echo -e '1aaa\n2bb\n3cc\n\n\n4dd\n\n5\n6' | sed -nr 'H; ${x; s/\n\n/&\n/g;s/([^\n])\n/\1\t/g; s/^\n//p}'
1aaa    2bb    3cc


4dd

5      6

It reads the whole file into the hold space and then substitutes tabs for newlines. The s/\n\n/&\n/g part (highlighted in red in the original post) preserves the number of blank lines; without it this number is decreased by 1 (which I believe is what OP wanted, because otherwise there is no way to get output without blank lines).
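
A quick way to see what the s/\n\n/&\n/g step contributes is to run the command with and without it and count the blank lines in each result. A sketch, assuming GNU sed (for -r and \t); the variable names are illustrative.

```shell
# Assumes GNU sed. Same input as in the post: two blank lines, then one.
input=$(printf '1aaa\n2bb\n3cc\n\n\n4dd\n\n5\n6')

# With the padding step, as posted.
with_pad=$(printf '%s\n' "$input" |
    sed -nr 'H; ${x; s/\n\n/&\n/g; s/([^\n])\n/\1\t/g; s/^\n//p}')

# Without the padding step.
without_pad=$(printf '%s\n' "$input" |
    sed -nr 'H; ${x; s/([^\n])\n/\1\t/g; s/^\n//p}')

# Count the blank lines in each result.
printf '%s\n' "$with_pad" | grep -c '^$'
printf '%s\n' "$without_pad" | grep -c '^$'
```

The input has three blank lines in total. The first count is 3 and the second is 1, because each run of blank lines loses one line when the padding is left out.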

grail 12-05-2012 01:40 PM

Don't have much in the way of sedfu, but the following awk seems to work:
Code:

awk '!NF{printf "\n";if(x)next;x=1}NF{x=0}ORS=NF?"\t":"\n"'

markush 12-05-2012 02:01 PM

Quote:

Originally Posted by grail (Post 4843384)
Don't have much in the way of sedfu, but the following awk seems to work:
Code:

awk '!NF{printf "\n";if(x)next;x=1}NF{x=0}ORS=NF?"\t":"\n"'

It doesn't work here, the newlines are printed as newlines.

Markus

millgates 12-05-2012 03:15 PM

Also, since I don't have anything better to do, what about perl?

Code:

perl -pe 'chomp;$_.=$_?"\t":"\n";'
or

Code:

perl -0777 -pe 's/(?<=[^\n])\n(?=[^\n])/\t/g;'

grail 12-06-2012 01:21 PM

Quote:

Originally Posted by markush
It doesn't work here, the newlines are printed as newlines.

Not sure I follow. Yes, it does leave any blank line intact ... I thought this was a prerequisite?
As per OP:
Quote:

I want sed to substitute the newline character with the tab character on any non-blank line.
I tested with the examples provided and it seems to produce the desired output?

markush 12-06-2012 02:26 PM

grail, I think you're right, now I tried perl
Code:

perl -pe 's/(.+[a-zA-Z].+)\n/$1\t/g;'
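It is meant to look for "at least one nonwhitespace character"; as written, it requires a letter with at least one character on each side of it. It then substitutes \t for the \n.
When one writes a complete script, it is necessary to undef $/ so that the whole file is read at once.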
Code:

#!/usr/bin/perl

use strict ;
use warnings ;

undef $/ ;
open FILE, "<", "file.txt" or die "Cannot open file.txt: $!" ;
my $file = <FILE> ;
$file =~ s/(.+[a-zA-Z].+)\n/$1\t/g ;
print $file ;

Markus

David the H. 12-06-2012 02:34 PM

The sedfaq sections 4.25 and 4.26 have solutions for concatenating lines based on their contents:

http://sed.sourceforge.net/sedfaq4.html#s4.25

To concat all non-blank lines, just use a generic matching pattern, such as '.$'.

markush 12-06-2012 02:41 PM

Quote:

Originally Posted by David the H. (Post 4844046)
...
To concat all non-blank lines, just use a generic matching pattern, such as '.$'.

But what does the '.' match in this case? In Perl, . matches anything but \n.
With both sed and Perl I had some difficulty matching blank lines that contain whitespace. I'm somewhat confused here.

Markus
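
On the question of what '.' matches: in sed's regular expressions '.' matches any single character, spaces and tabs included, so /./ also selects lines that contain only whitespace. A small sketch of the difference, using the POSIX class [[:space:]] to ask for a non-whitespace character instead (the variable names are illustrative, not from the thread):

```shell
# Three lines: one holding a single space, one empty, one with text.
sample=$(printf ' \n\nx')

# '.' matches the space, so the space-only line is selected as well.
any_char=$(printf '%s\n' "$sample" | sed -n '/./p')

# Requiring a non-whitespace character selects only the text line.
non_space=$(printf '%s\n' "$sample" | sed -n '/[^[:space:]]/p')

printf '%s\n' "$any_char" | od -c
printf '%s\n' "$non_space" | od -c
```

The inverse test, /^[[:space:]]*$/, selects lines that are empty or whitespace-only, which is the broader definition of a blank line used earlier in the thread.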


All times are GMT -5.