sed add , before .[0-9]

donnied · 09-29-2007, 07:08 PM

I would like to use sed to replace 1979 with ,1979

I was thinking something along the lines of:

sed 's/*.^[0-9]/,/' mycoolfile

but that's not quite right. How do I specify replacing the space before something?

jschiwal · 09-29-2007, 09:05 PM

Code:

sed 's/1979/,1979/' yourcoolfile

This is for your first line. I don't understand the last question. If you want to replace a space before any 4 digit number,

Code:

$ cat sample
abcd 1234 2007 3456 11111
abcde 111 111 22 33 1267

jschiwal@hpmedia ~
$ sed 's/ \([[:digit:]]\{4\}[^[:digit:]]\)/,\1/g' sample
abcd,1234 2007,3456 11111
abcde 111 111 22 33 1267

Guess what, it's wrong. This rule doesn't work if the 4-digit number is at the end of a line. My bad, but it illustrates the importance of testing sed scripts before using them.

Code:

$ cat sample
abcd 1234 2007 3456 11111
abcde 111 111 22 33 1267

jschiwal@hpmedia ~
$ sed 's/ \([[:digit:]]\{4\}[^[:digit:]]\)/,\1/g;s/ \([[:digit:]]\{4\}\)$/,\1/' sample
abcd,1234 2007,3456 11111
abcde 111 111 22 33,1267

Here I added a second rule. You can have more than one rule on the same line by separating them with a semicolon. For more complicated sed instructions, put them in a separate script file and run it with sed -f.

You didn't make clear whether the number needs to be exactly 4 digits. You only gave one example, which resembled a year. When using sed, awk, or any regular expression, it is important to be as precise as you need to be. Otherwise you will either miss some replacements, like my first attempt, or get a false positive match, which could cause a replacement you don't want.

angrybanana · 09-30-2007, 02:46 AM

Quote:

Originally Posted by jschiwal

Code:

$ cat sample
abcd 1234 2007 3456 11111
abcde 111 111 22 33 1267

jschiwal@hpmedia ~
$ sed 's/ \([[:digit:]]\{4\}[^[:digit:]]\)/,\1/g;s/ \([[:digit:]]\{4\}\)$/,\1/' sample
abcd,1234 2007,3456 11111
abcde 111 111 22 33,1267

Shouldn't 2007 have a ',' before it? Running the sed expression twice should fix this issue.
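
Instead of repeating the rule by hand, sed can loop until the substitution stops matching. The following is only a sketch, assuming the same two sample lines as above and that the space before every exactly-4-digit number should become a comma; the label name `again` is arbitrary.

```shell
# The two sample lines from the thread.
input='abcd 1234 2007 3456 11111
abcde 111 111 22 33 1267'

# ':again' defines a label; 't again' jumps back to it whenever the
# preceding s command changed the line, so a number whose leading space
# was consumed by the previous match (2007 here) is caught on the next pass.
# The final s command handles a 4-digit number at the end of the line.
loop_result=$(printf '%s\n' "$input" | sed \
    -e ':again' \
    -e 's/ \([0-9]\{4\}\)\([^0-9]\)/,\1\2/' \
    -e 't again' \
    -e 's/ \([0-9]\{4\}\)$/,\1/')

printf '%s\n' "$loop_result"
```

The loop ends on its own because each successful substitution removes the space it matched, so the same spot cannot match again.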

donnied · 09-30-2007, 10:20 AM

Quote:

Originally Posted by jschiwal

Code:

sed 's/1979/,1979/' yourcoolfile

If you want to replace a space before any 4 digit number,

Thank you. I thought of that later. However, you're right. My true intent is to place a comma before a four digit string of numbers.

Quote:

Originally Posted by jschiwal

Code:

$ sed 's/ \([[:digit:]]\{4\}[^[:digit:]]\)/,\1/g;s/ \([[:digit:]]\{4\}\)$/,\1/' sample

I hadn't seen posix character classes until last night. It looks like the way to go. I was able to identify four digit years with [0000-9999]; however, the replacement part didn't work out well. I can specify

Code:

sed 's/ .^[0000-9999]/ ,

but how do I keep the same numbers when I replace?

Quote:

Originally Posted by jschiwal

You didn't make clear whether the number needs to be exactly 4 digits. You only gave one example, which resembled a year. When using sed, awk, or any regular expression, it is important to be as precise as you need to be. Otherwise you will either miss some replacements, like my first attempt, or get a false positive match, which could cause a replacement you don't want.

Thank you again. This was helpful. I'll try to deconstruct it.

On a side note:
What if I wanted the comma before any size string of numbers? Is there a way without :digit: or setting variables?
how do I specify a line break? (I want to replace a line break and three tabs with the last entry on the line that did not consist of tabs.)

Some Guy- wrote this book 2007
{tabx3} wrote this book 2006
{tabx3} wrote another book 2005

then becomes

Some Guy, book a, 2007
Some Guy, book b, 2006
Some Guy, book c, 2005

Thanks again. I've been reading through O'Reilly's "Learning Sed and Awk", man pages, and some online articles, but the answers aren't always obvious (to me). Is there another resource that would be worth looking at?

angrybanana · 09-30-2007, 07:32 PM

Quote:

Originally Posted by donnied

Code:

sed 's/ .^[0000-9999]/ ,

but how do I keep the same numbers when I replace?

You keep the number by using groups. Enclose whatever you want to keep in '\(' and '\)', then call it back using '\1'. Multiple groups get multiple numbers: \1, \2, \3, etc. For example:

Code:

$ echo "foobar"|sed 's/.*\(oo\).*/m\1/'
moo

Here 'oo' is group 1, but the whole match is replaced with 'm' + group 1, hence 'moo'.

Here's a more relevant example.

Code:

echo "book 1979"|sed 's/ \([0-9]\{4\}\)/, \1/'
book, 1979

This matches *space* then 0-9 (4 times). Only the 4 numbers are put into group one. The whole match (space + number) is replaced with ', '+group 1

Hope that makes sense..
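
To illustrate the "multiple groups" point, here is a small sketch with two groups that are swapped in the replacement. The input string is the same made-up "book 1979" as above.

```shell
# Group 1 is everything before the last " NNNN"; group 2 is the 4 digits.
# The replacement puts group 2 first, then a comma, then group 1.
swapped=$(echo "book 1979" | sed 's/\(.*\) \([0-9]\{4\}\)/\2, \1/')
echo "$swapped"
```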

Quote:

Originally Posted by donnied

What if I wanted the comma before any size string of numbers? Is there a way without :digit: or setting variables?
how do I specify a line break? (I want to replace a line break and three tabs with the last entry on the line that did not consist of tabs.)

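1. [[:digit:]] == [0-9]. Doing it without either of those two will just be difficult. To match 1 or more repetitions of [0-9], use [0-9]\{1,\}. In sed's default (basic) syntax a bare + is a literal plus sign, so [0-9]+ only works with extended regular expressions (sed -E, or in awk).
2. ^ matches the start of a line; $ matches the end of a line. '^some guy' matches lines that start with 'some guy'.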
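
Both points can be tried out quickly. This is a minimal sketch with made-up input strings, using only basic regular expressions.

```shell
# \{1,\} means "one or more" in basic regular expressions (sed's default),
# so this puts a comma before a run of digits of any length.
any_length=$(echo "abcde 111 22 1267" | sed 's/ \([0-9]\{1,\}\)/,\1/g')
echo "$any_length"

# ^ anchors the match at the start of the line: only the first line prints.
starts=$(printf '%s\n' 'Some Guy- a' 'not Some Guy' | sed -n '/^Some Guy/p')
echo "$starts"
```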

I'm not too good with awk/sed, but I'll try to give you an awk solution for your example in a bit...if i figure it out

Quote:

Originally Posted by donnied

Thanks again. I've been reading through O'Reilly's "Learning Sed and Awk", man pages, and some online articles, but the answers aren't always obvious (to me). Is there another resource that would be worth looking at?

sed = http://sed.sf.net/sedfaq.html | http://xrl.us/sedintro#uh-0 | http://xrl.us/sedstd | http://www.gnu.org/software/sed/manual/
awk = http://www.gnu.org/software/gawk/manual/ | http://catonmat.net/download/awk.cheat.sheet.txt

Those are the topics of the #awk and #sed channels on irc.freenode.net, which are another great source of info if you're ever stuck trying to figure something out.

Edit:
woohoo! I did it.

Code:

$ cat sample
Some Guy- wrote this book 2007
                        wrote this book 2006
                        wrote another book 2005
other guy- wrote this book 2008
                        book b 2009

$ awk -F'- ' 'BEGIN {OFS="- "}
{if (!/^\t\t\t/) name=$1;
else {sub("^\t\t\t", "", $0);$2=$0;$1=name}}
{gsub(" [0-9][0-9][0-9][0-9]", ",&",$2);print $0}' "sample"

Some Guy- wrote this book, 2007
Some Guy- wrote this book, 2006
Some Guy- wrote another book, 2005
other guy- wrote this book, 2008
other guy- book b, 2009
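
The ",&" in the gsub call above is what keeps the year: in awk's sub and gsub, & in the replacement stands for the text that was matched. A stripped-down sketch on a single made-up line:

```shell
# The pattern matches " 2007"; & re-inserts that matched text after the comma.
with_comma=$(echo "wrote this book 2007" |
    awk '{ gsub(/ [0-9][0-9][0-9][0-9]/, ",&"); print }')
echo "$with_comma"
```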

ghostdog74 · 09-30-2007, 08:27 PM

Quote:

Originally Posted by donnied

My true intent is to place a comma before a four digit string of numbers.

assuming this is really what you want. data tested is from jschiwal's post.

Code:

awk '{
        for(i=1;i<=NF;i++){
         if( $i+0 && length($i) == 4  ) {
           $i = ","$i
         }
         printf "%s " ,$i
        }
        printf "\n"
}' sample

output:

Code:

# ./test.sh
abcd ,1234 ,2007 ,3456 11111
abcde 111 111 22 33 ,1267
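
A note on precision: the test `$i+0 && length($i) == 4` also accepts a field like 12ab (which converts to 12) and rejects 0000 (which converts to 0). A stricter variant, sketched here, tests each field against a regular expression instead. The extra fields 12ab and 0000 are made up for illustration and are not in the original sample.

```shell
strict_result=$(printf '%s\n' \
    'abcd 1234 2007 3456 11111' \
    'abcde 111 111 22 33 12ab 0000' |
awk '{
    for (i = 1; i <= NF; i++)
        # Only fields that are exactly four digits get a comma.
        if ($i ~ /^[0-9][0-9][0-9][0-9]$/)
            $i = "," $i
    print
}')
printf '%s\n' "$strict_result"
```

Assigning to a field makes awk rebuild the line with single spaces between fields, so no trailing space is printed.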

Quote:

I've been reading through O'Reilly's "Learning Sed and Awk", man pages, and some online articles, but the answers aren't always obvious (to me). Is there another resource that would be worth looking at?

are you looking for the answers in that book? or are you learning how to get to the answers?

donnied · 10-01-2007, 06:11 AM

Quote:

Originally Posted by angrybanana

You keep the number by using groups. Enclose whatever you want to keep in '\(' and '\)', then call it back using '\1'. Multiple groups get multiple numbers: \1, \2, \3, etc. For example:

I'm not too good with awk/sed, but I'll try to give you an awk solution for your example in a bit...if i figure it out

sed = http://sed.sf.net/sedfaq.html | http://xrl.us/sedintro#uh-0 | http://xrl.us/sedstd | http://www.gnu.org/software/sed/manual/
awk = http://www.gnu.org/software/gawk/manual/ | http://catonmat.net/download/awk.cheat.sheet.txt

Wow! Thank you for the information. You explained really well. That was an amazing amount of work you did. I appreciate it and it has helped my understanding.

jschiwal · 10-02-2007, 02:26 AM

I'll have to look at the example I posted to see why the 2007 isn't handled. I think that the "[^[:digit:]]" gobbles up the space before the next number. Adding the first command again solves the problem.

Code:

sed 's/ \([0-9]\{4\}[^0-9]\)/,\1/g;s/ \([0-9]\{4\}[^0-9]\)/,\1/g;s/ \([0-9]\{4\}\)$/,\1/' sample