
danielbmartin 01-24-2012 11:03 AM

sed - loop construct for text processing
I'm learning the sed loop construct and having difficulty.

The input file consists of a variable number of blank-delimited fields.
Sample input file:

000001 now is the time
000002 for all good men
000003 to come to the aid
000004 of their country

The desired transformation replaces the 2nd, 3rd, ... nth field
with whatever appears in the first field.
In this example the first field is a line number but it could be
anything (color, city, car model, etc.)
Sample output file:

000001 000001 000001 000001 000001
000002 000002 000002 000002 000002
000003 000003 000003 000003 000003 000003
000004 000004 000004 000004

Is sed-with-loop a suitable technique?
If so, please show how it is done.

Daniel B. Martin

druuna 01-24-2012 11:47 AM


I'm not sure I would use sed for this; awk comes to mind:

awk '{ for ( z = 1; z <= NF; z++ ) printf("%s ",$1) }{ print "" } ' infile
NF is an awk built-in variable that holds the number of fields on the line being processed. The loop uses it to print the first field once per field. Note that the printf leaves a trailing blank on each output line.
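A variant sketch, not from the original post: assigning to each field instead of printing it avoids the trailing blank that printf("%s ") produces, because awk rebuilds the line with single blanks between fields. The function name fill_fields is hypothetical and used only for illustration.

```shell
# fill_fields is a hypothetical helper name; any POSIX awk should do.
fill_fields() {
  # Overwrite fields 2..NF with field 1; awk then rebuilds $0 using OFS (one blank).
  awk '{ for (i = 2; i <= NF; i++) $i = $1; print }' "$@"
}

# Usage: reads stdin, or a file name passed as an argument.
printf '000001 now is the time\n' | fill_fields
```

Lines with a single field pass through unchanged, since the loop body never runs.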

Hope this helps.

firstfire 01-24-2012 06:06 PM

Hi again.

Here is a commented sed solution that I'm relatively satisfied with (put it in a file and make it executable):

#!/bin/sed -rf

# squeeze spaces
s/ +/ /g
# place newline (used as a marker) after second word
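s/ /\n/2

# start of the loop
:a
# print intermediate steps for debugging (uncomment the next line)
#l

# w1 w2\n -> w1 w1\n
s/([^ ]+) ([^ ]+)\n/\1 \1\n/

# if newline is at the end of string -- remove marker and go to the end
/\n$/ {s/\n//; b}

# advance marker
s/\n([^ ]+) ?/ \1\n/
# loop again while a substitution succeeded
# (lines with fewer than three fields are left unchanged)
ta
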

Here is my previous attempt if you are interested:

sed -r 's/$/ =/; h; s/ .*$//;x; s/ +/\n/g; :a; s/[^\n]*\n//; G; /^=/be; ba; :e; s/\n/ /g; s/= //'
Here = is a marker symbol. It is not unique: a field that begins with = makes the loop exit early, so this may cause problems.
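For readers who find the one-liner dense, here is the same hold-space loop spelled out one command per line with comments. It is only a sketch of the command above, assuming GNU sed; the wrapper name fill_with_first is hypothetical.

```shell
# fill_with_first is a hypothetical wrapper name; assumes GNU sed (-r, \n).
fill_with_first() {
  sed -r '
    # append the end marker " =" and save a copy of the line in hold space
    s/$/ =/
    h
    # reduce the line to its first field, then swap:
    # hold space = first field, pattern space = full line with marker
    s/ .*$//
    x
    # put every field on its own line
    s/ +/\n/g
    :a
    # drop the leading field, then append a copy of the first field
    s/[^\n]*\n//
    G
    # stop once the marker has reached the front, otherwise loop
    /^=/be
    ba
    :e
    # join the pieces with blanks again and strip the marker
    s/\n/ /g
    s/= //
  '
}

printf '000001 now is the time\n' | fill_with_first
```

Each pass removes one original field from the front and adds one copy of the first field at the back, so the field count is preserved.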

crts 01-24-2012 09:31 PM


You could try this:

sed -r ':a;s/([0-9]+) [a-zA-Z]+(.*)/\1 \1\2/g;ta'
However, it has limits. It only works if the first field is numeric and the other fields are purely alphabetic, with no numbers in them. E.g., this will fail:

000001 now is the 3rd time
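One possible way around that restriction, sketched here under the assumption of GNU sed: mark the position after the first field with a newline and match fields as "anything but a blank" rather than by character class. This is a variation on the same t-loop idea, not a tested replacement; the name fill_any is hypothetical.

```shell
# fill_any is a hypothetical name; assumes GNU sed (-r and \n in patterns).
fill_any() {
  sed -r '
    # squeeze runs of blanks
    s/ +/ /g
    # put a newline marker after the first field: "w1\nf2 f3 ..."
    s/^([^ ]+) /\1\n/
    :a
    # replace the field right after the marker with a copy of field 1
    # and move the marker past it
    s/^([^ \n]+)(.*)\n[^ ]+ ?/\1\2 \1\n/
    ta
    # nothing left after the marker: remove it
    s/\n$//
  '
}

printf '000001 now is the 3rd time\n' | fill_any
```

Because no field is matched by content, the first field can be any blank-free string (a color, a city, and so on).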

danielbmartin 01-24-2012 10:38 PM


Originally Posted by firstfire (Post 4583517)
Here is a commented sed solution ...

Thank you for this impressive solution, and for commenting so well. It will take me a while to digest and understand it.

Daniel B. Martin

danielbmartin 01-24-2012 10:42 PM


sed -r ':a;s/([0-9]+) [a-zA-Z]+(.*)/\1 \1\2/g;ta'
Thank you for providing this solution. It is relatively concise, and that's a "plus" for comprehension. The restriction is tolerable for my purposes. I'll test all proposed solutions with larger files than the original sample.

Daniel B. Martin
