Regex or Sed

grob115 · 10-02-2012, 10:14 AM

Hi, I need to perform a simple substitution, replacing whitespace with a character, but only within a specific, identifiable part of each line. For example, if I have the following:
Name, Sex, Address
Tom, M, 15 Broadway
Mary, F, 80 Maple Street

I need to transform it into the following (on the command line please, not in Excel) by adding hyphens to the address field. The replacement character could be anything; the hyphen is just an example.
Name, Sex, Address
Tom, M, 15-Broadway
Mary, F, 80-Maple-Street

I need to have the flexibility of specifying some type of pattern to frame where I want the translation or replacement to take place. In this case:
Start Pattern = ^.*,\s(M|F),\s
Stop Pattern = $

Can someone please show me how this can be done? Thanks. I thought about using sed, but as far as I know it can only operate on the whole line and not on a portion of it. I'm not sure how to restrict a regex replacement to the text between a start and a stop pattern.

whizje · 10-02-2012, 11:22 AM

Code:

echo "Mary, F, 80 Maple Street" | sed -r 's/([^,])[ ]/\1-/g'
Mary, F, 80-Maple-Street

Code:

-r   - use extended regular expressions (so ( ) and | need no backslashes)
's   - substitute
/    - start of the regular expression to be replaced; in this case we want to replace a space unless it directly follows a comma
(    - start of capture group
[^,] - the character before the space can be anything except a comma
)    - end of capture group; we save this character, otherwise it gets replaced too, e.g.:

Code:

echo "Mary, F, 80 Maple Street" | sed -r 's/([^,])[ ]/-/g'
Mary, F, 8-Mapl-Street

Code:

[ ]  - a space; you can use [ ]+ instead to match a run of several spaces
/    - end of the regular expression, start of the replacement part
\1   - insert the character saved by the capture group
-    - insert a hyphen
/    - end of the replacement
g    - do this globally; otherwise only the first matching space is converted, e.g.:

Code:

echo "Mary, F, 80 Maple Street" | sed -r 's/([^,])[ ]/\1-/'
Mary, F, 80-Maple Street
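
To try the same substitution on the whole sample, including runs of several spaces, here is a minimal sketch. The function name hyphenate_spaces is made up for illustration, and -E is used in place of -r (GNU sed treats the two as synonyms). It assumes exactly one space after each comma; with two spaces after a comma, the first space itself counts as a "non-comma" character and a stray hyphen would appear.

```shell
# Hypothetical helper: replace every run of spaces that does not
# directly follow a comma with a single hyphen.
hyphenate_spaces() {
    # ([^,]) captures the character before the spaces so that \1 can
    # put it back; " +" matches one or more spaces.
    sed -E 's/([^,]) +/\1-/g'
}

printf '%s\n' \
    'Name, Sex, Address' \
    'Tom, M, 15  Broadway' \
    'Mary, F, 80 Maple Street' | hyphenate_spaces
# Name, Sex, Address
# Tom, M, 15-Broadway
# Mary, F, 80-Maple-Street
```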

firstfire · 10-02-2012, 11:30 AM

Hi.

Here is an awk approach:

Code:

$ cat in 
Name, Sex, Address
Tom, M, 15 Broadway
Mary, F, 80 Maple Street
$ awk  '{gsub(" +", "-", $3);}1' FS=' *, *' OFS=, in
Name, Sex, Address
Tom,M,15-Broadway
Mary,F,80-Maple-Street

The magic `1' here is a "pattern" that always evaluates to TRUE; because it has no associated action, the action defaults to 'print $0'. $3 means that we want to perform the substitution only on the 3rd field (fields are split on commas with optional surrounding spaces, as set by FS).
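
For anyone who finds the bare `1' too cryptic, the same program can be written with the print step spelled out. This is only a sketch: the function name hyphenate_third_field is made up, and it reads standard input instead of the file `in'. FS and OFS are the same as in the one-liner, so the spaces after the commas are dropped on any line where the third field changes.

```shell
# Hypothetical helper: same field separators as the one-liner,
# but with an explicit print instead of the bare 1 pattern.
hyphenate_third_field() {
    awk -F' *, *' -v OFS=',' '
        { gsub(/ +/, "-", $3) }   # replace each run of spaces in field 3
        { print }                 # the step that the bare 1 performs implicitly
    '
}

printf '%s\n' 'Tom, M, 15 Broadway' 'Mary, F, 80 Maple Street' | hyphenate_third_field
# Tom,M,15-Broadway
# Mary,F,80-Maple-Street
```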

danielbmartin · 10-02-2012, 06:54 PM

OP has this input file:

Code:

$ cat in 
Name, Sex, Address
Tom, M, 15 Broadway
Mary, F, 80 Maple Street

firstfire, your awk ...

Code:

$ awk  '{gsub(" +", "-", $3);}1' FS=' *, *' OFS=, in

... produced this...

Code:

Name, Sex, Address
Tom,M,15-Broadway
Mary,F,80-Maple-Street

... but OP wanted this ...

Code:

Name, Sex, Address
Tom, M, 15-Broadway
Mary, F, 80-Maple-Street

Easy fix:

Code:

awk '{gsub(" +", "-", $3);}1' FS=' *, *' OFS=', ' $InFile

Daniel B. Martin

Tinkster · 10-02-2012, 08:04 PM

Quote:

Originally Posted by whizje

Code:

echo "Mary, F, 80 Maple Street" | sed -r 's/([^,])[ ]/\1-/g'
Mary, F, 80-Maple-Street


You can easily enough incorporate the OP's "condition", too:

Code:

echo "Name, Sex, Street Address
Tom, M, 15 Broadway
Mary, F, 80 Maple Street
" |  sed -r '/^[^ ]+, (M|F),/ s/([^,])[ ]/-/g'
Name, Sex, Street Address
Tom, M, 1-Broadway
Mary, F, 8-Mapl-Street

I only chucked the "Street" into the header line to illustrate
that the condition works: the header doesn't match it, so it is left alone ;}

Cheers,
Tink

danielbmartin · 10-02-2012, 08:29 PM

A couple of the proposed solutions in this thread took "80 Maple" and turned it into "8-Mapl". A 0 and an e were lost. That isn't right, is it?

Daniel B. Martin

Tinkster · 10-02-2012, 09:00 PM

No, no it's not ... and my apologies for not actually checking
the output of the command I quoted against the input, and blindly
assuming it did what was needed :)

Code:

echo "Name, Sex, Street Address
Tom, M, 15 Broadway
Mary, F, 80 Maple Street
" |  sed -r '/^[^ ]+, (M|F),/ s/([a-zA-Z0-9]+) +/\1-/g'
Name, Sex, Street Address
Tom, M, 15-Broadway
Mary, F, 80-Maple-Street

This seems to do better :)

Cheers,
Tink
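
A variant closer to the "start pattern / stop pattern" idea from the first post is to anchor the substitution itself on the start pattern and repeat it until nothing is left to replace. This is a sketch under assumptions, not a tested production recipe: the function name hyphenate_after_sex is made up, -E stands in for -r, and it expects exactly one space after each comma as in the sample data. Unlike the character-class approach, it leaves spaces inside the name field alone.

```shell
# Hypothetical helper: hyphenate spaces only after the "Name, M|F, " prefix.
hyphenate_after_sex() {
    # Each pass replaces the first run of spaces following the prefix;
    # "t again" jumps back to the label while a substitution succeeded.
    sed -E -e ':again' \
           -e 's/^([^,]*, (M|F), [^ ]*) +/\1-/' \
           -e 't again'
}

printf '%s\n' \
    'Name, Sex, Street Address' \
    'Tom, M, 15 Broadway' \
    'Mary Ann, F, 80 Maple Street' | hyphenate_after_sex
# Name, Sex, Street Address
# Tom, M, 15-Broadway
# Mary Ann, F, 80-Maple-Street
```

The header line is untouched because "Sex" is neither M nor F, so the start pattern never matches it.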