[SOLVED] sed

danielbmartin · 01-22-2012, 08:22 PM

Each line in an input file consists of a line number followed by a variable number of blank-delimited words.
Example:

Quote:

000006 only this and nothing more

The desired transformation inserts the line number between each pair of words.
Example:

Quote:

000006 only 000006 this 000006 and 000006 nothing 000006 more

This sed ...

Code:

|sed 's/\([0-9]*\) \([a-z]*\) \([a-z]*\) \([a-z]*\) \([a-z]*\) \([a-z]*\)/\1 \2 \1 \3 \1 \4 \1 \5 \1 \6/'

... works for the 5-word example but I don't know how to generalize it for lines of any length.

A data snippet to play with...

Quote:

000001 once upon a midnight dreary while i pondered weak and weary
000002 over many a quaint and curious volume of forgotten lore
000003 while i nodded nearly napping suddenly there came a tapping
000004 as of some one gently rapping rapping at my chamber door
000005 tis some visitor i muttered tapping at my chamber door
000006 only this and nothing more

Ideas?

Daniel B. Martin

grail · 01-22-2012, 11:45 PM

Quote:

Ideas?

Yes, use the right tool for the job:

Code:

awk '{OFS = " "$1" ";$1 = "";$2 = $2;sub(/^ /,"")}1' file

If there is a common delimiter, then awk is nearly always a better choice than sed.
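
For readers new to awk, the same transformation can also be written as an explicit loop over the fields. This is only a sketch of an equivalent approach, not grail's one-liner: the function name interleave is made up for illustration, and it assumes the line number is always the first blank-delimited field.

```shell
# Hypothetical helper: rebuild each line as
#   number word1 number word2 ... number wordN
interleave() {
    awk '{
        out = $1                      # start with the line number
        for (i = 2; i <= NF; i++) {
            # first word follows the number directly;
            # every later word gets the number inserted before it
            out = out (i > 2 ? " " $1 : "") " " $i
        }
        print out
    }' "$@"
}

printf '%s\n' '000006 only this and nothing more' | interleave
```

The loop is longer than the OFS trick, but it does not depend on awk rebuilding $0, and a line holding only the number is printed unchanged.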

firstfire · 01-23-2012, 12:09 AM

Hi.

Here is a quite ugly solution:

Code:

$ sed -r 's/ +/\n/g; :a; s/\n/ /; s/(([0-9]+) [^\n]*)\n/\1 \2\n/; ta;' infile.txt
000001 once 000001 upon 000001 a 000001 midnight 000001 dreary 000001 while 000001 i 000001 pondered 000001 weak 000001 and 000001 weary
000002 over 000002 many 000002 a 000002 quaint 000002 and 000002 curious 000002 volume 000002 of 000002 forgotten 000002 lore
000003 while 000003 i 000003 nodded 000003 nearly 000003 napping 000003 suddenly 000003 there 000003 came 000003 a 000003 tapping
000004 as 000004 of 000004 some 000004 one 000004 gently 000004 rapping 000004 rapping 000004 at 000004 my 000004 chamber 000004 door
000005 tis 000005 some 000005 visitor 000005 i 000005 muttered 000005 tapping 000005 at 000005 my 000005 chamber 000005 door
000006 only 000006 this 000006 and 000006 nothing 000006 more

Make sure there are no trailing spaces in the data; a trailing space produces an extra line number at the end of the line.
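
If trailing spaces cannot be ruled out, one possible variant strips them before the loop starts. This is a sketch that assumes GNU sed (for -r and the \n handling used above); the wrapper name interleave_sed is hypothetical.

```shell
# Same script as above with one extra leading substitution, s/ +$//,
# which removes trailing spaces so no stray number is appended.
# Assumes GNU sed.
interleave_sed() {
    sed -r 's/ +$//; s/ +/\n/g; :a; s/\n/ /; s/(([0-9]+) [^\n]*)\n/\1 \2\n/; ta' "$@"
}

printf '%s\n' '000006 only this and nothing more   ' | interleave_sed
```

The rest of the script is unchanged: words are first separated by newlines, then each pass of the loop turns one newline back into a space and appends the line number before the next one.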

danielbmartin · 01-23-2012, 09:05 AM

Quote:

Originally Posted by grail

Yes, use the right tool for the job:

Code:

awk '{OFS = " "$1" ";$1 = "";$2 = $2;sub(/^ /,"")}1' file

If there is a common delimiter, then awk is nearly always a better choice than sed.

Thank you for this concise solution. I've barely scratched the surface of awk but am coming to understand the wisdom of your "right tool" maxim.

Daniel B. Martin

danielbmartin · 01-23-2012, 09:10 AM

Quote:

Originally Posted by firstfire

Here is a quite ugly solution:

Code:

$ sed -r 's/ +/\n/g; :a; s/\n/ /; s/(([0-9]+) [^\n]*)\n/\1 \2\n/; ta;' infile.txt

Beauty is in the eye of the beholder. Your solution is complicated but instructive, and therefore beautiful. Thank you.

Daniel B. Martin