LinuxQuestions.org


jimieee 10-25-2010 04:34 AM

Using sed to search and replace backwards
 
Hi All,

I'm trying to use sed to search and replace backwards. I have a shell script that needs to put commas into big numbers, for example:

9999999 as 9,999,999

I've tried a few things, but none seem to work:

Code:

$ echo 9999999 | sed -e 's/\([0-9]\{3\}\)/,\1/g'
,999,9999

$ echo 9999999 | sed -e 's/\([0-9]\{3\}\)$/\1,/g' -e 's/\([0-9]\{3\}\)/\1,/g'
999,999,9,

$ echo 9999999 | sed -e 's/\([0-9]\{3\}\)$/\1,/g' -e 's/\([0-9]\{3\}\)/,\1/g'
,999,9999,

$ echo 9999999 | sed -e 's/\([0-9]\{3\}\)$/,\1/g' -e 's/\([0-9]\{3\}\)/,\1/g'
,9999,,999

It would be much easier if I could search backwards! For example, in the style of Bash parameter substitution:

Code:

$ echo 9999999 | sed -e 's%\([0-9]\{3\}\)%,\1%g'
Or maybe someone has a better way to do this altogether...

colucix 10-25-2010 04:44 AM

Keep it simple:
Code:

printf "%'d\n" 9999999

jimieee 10-25-2010 05:04 AM

This doesn't seem to work for me.

Code:

printf "%'d\n" 9999999
9999999

What version of bash is this that you are using? This is mine...

Code:

$ bash --version
GNU bash, version 3.00.16(1)-release (i386-pc-solaris2.10)
Copyright (C) 2004 Free Software Foundation, Inc.


colucix 10-25-2010 05:09 AM

It should depend on your current locale, in particular on the LC_NUMERIC variable. For example, the POSIX locale does not have a thousands separator, while en_US or en_GB should work. What is the output of the locale command (without arguments it should list your current setup)?

In any case something like:
Code:

env LC_NUMERIC=en_GB printf "%'d\n" 9999999
should work.

jimieee 10-25-2010 05:21 AM

Code:

LC_NUMERIC="C"
It still doesn't work as you described though:

Code:

$ env LC_NUMERIC=en_GB printf "%'d\n" 9999999
'd

$ LC_NUMERIC=en_GB printf "%'d\n" 9999999
9999999


jimieee 10-25-2010 05:27 AM

I've found out how to do this in sed now:

Code:

sed -e :a -e 's/\(.*[0-9]\)\([0-9]\{3\}\)/\1,\2/;ta'
Thanks to command line fu

http://www.commandlinefu.com/command...or-within-pipe
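
For anyone wondering why the loop is needed: the greedy `.*` pushes the match as far right as it can, so each pass places only one comma, and `ta` jumps back to `:a` until the substitution stops matching. A minimal sketch that wraps the command above in a shell function (the name `add_commas` is made up for illustration):

```shell
# add_commas is a hypothetical helper name; the sed command is the one quoted above.
add_commas() {
    printf '%s\n' "$1" | sed -e :a -e 's/\(.*[0-9]\)\([0-9]\{3\}\)/\1,\2/;ta'
}

add_commas 9999999   # pass 1: 9999,999  pass 2: 9,999,999  pass 3: no match, loop ends
add_commas 123       # fewer than four digits in a row: printed unchanged
```

So it is not really a backwards search, but the greedy match gives the same right-to-left effect.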

jimieee 10-25-2010 05:34 AM

I've marked this as 'solved' although I would still be interested to see how I can make the printf solution work and how to actually get sed to search backwards (if this is possible).

GrapefruiTgirl 10-25-2010 05:42 AM

Not sure exactly how to ask sed to search backwards; however, here's a workaround for reading a variable backwards:

The `rev` tool:
Code:

root@reactor: var=123456
root@reactor: echo "$var" | rev
654321

So you could do:
Code:

echo "$var" | rev | sed '/blah blah blah/' | rev
However, the solution in post #6 looks nice and neat in the meantime. Try this for fun sometime too:
Code:

tac some-file | rev
The locale-based printf doesn't work for me either. My default LC_NUMERIC is "POSIX" but I tried "C", "en_GB" and "en_US" and none changed the output.
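
To fill in the `blah blah blah` placeholder for the comma case, here is a rough sketch of the `rev` idea. It assumes `rev` is installed (it ships with util-linux on Linux and may be missing elsewhere); the function name `add_commas_rev` is made up. Once the digits are reversed, a plain left-to-right `g` substitution groups them from what was the right-hand end:

```shell
# add_commas_rev is a hypothetical name; requires the rev utility.
add_commas_rev() {
    printf '%s\n' "$1" |
        rev |                                            # 1234567 -> 7654321
        sed -e 's/[0-9]\{3\}/&,/g' -e 's/,$//' |         # comma after each group of 3, drop a trailing one
        rev                                              # back to normal order
}

add_commas_rev 9999999   # 9,999,999
add_commas_rev 123456    # 123,456
```

The second sed expression only matters when the number of digits is a multiple of three, where the first one leaves a comma at the end.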

colucix 10-25-2010 06:00 AM

Glad to see you found a solution. Regarding the printf issue, does the same happen in awk?
Code:

echo 9999999 | env LC_NUMERIC=en_US awk '{printf "%'\''d\n", $1}'

GrapefruiTgirl 10-25-2010 06:07 AM

@ colucix,

post #9 above does work for me; and, I've discovered that while my shell's `printf` does not work for this, I can use /bin/printf and it works:
Code:

root@reactor: LC_NUMERIC=en_US /bin/printf "%'d\n" 9999999
9,999,999
root@reactor:


colucix 10-25-2010 07:13 AM

@ GrapefruiTgirl,

thank you for reporting that. I also tried on a Solaris machine and I cannot make it work in any way (even with the en_US locale set for all applications). I suspect the "%'d" format was introduced in a C language specification which has not been adopted by all systems. Maybe it is better to stick with the regular expression solution!

jimieee 10-25-2010 07:33 AM

I can see that there are a number of ways to achieve this.

I wanted something quick and easy in sed, or something simple using printf.

I am aware of the more convoluted methods using perl and awk, for example:

Code:

echo "123456" | perl -nwe 'chomp; print reverse($_) . "\n"' | sed -e 's/\([0-9]\{3\}\)/\1,/g' | perl -nwe 'chomp; print reverse($_) . "\n"' | sed -e 's/^,//'
Or just pure perl

Code:

$ echo "12345679" | perl -nwe 'chomp; $reverse=reverse($_); $reverse =~ s/(\d{3})/$1,/g; $reverse =~ s/,$//; print reverse($reverse) . "\n"'
12,345,679

It's not very satisfactory though, because I can't see a good reason why one shouldn't be able to search/replace in reverse order with sed.
Sorry if that makes me sound ignorant - I don't mean to be!


My results with awk:

Code:

$ echo 9999999 | env LC_NUMERIC=en_US awk '{printf "%'\''d\n", $1}'
9999999

tac doesn't exist in my environment (I'm using Solaris), but I guess I could do the same thing with perl...


Using /bin/printf
Code:

$ LC_NUMERIC=en_GB /bin/printf "%'d\n" 9999999
'd

Maybe there's something specific to Solaris that I'm missing here? It's good to know that it works on Linux (I assume that's where you're testing it)!

colucix 10-25-2010 07:45 AM

Quote:

Originally Posted by jimieee (Post 4138553)
Maybe there's something specific to Solaris that I'm missing here?

I just checked the GNU coreutils and GNU C library documentation, and it clearly states that the format flag for the thousands separator is a GNU-specific extension. So there is no way to make it work with the native Solaris utilities. Here is the relevant link.
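
Since the `'` flag cannot be relied on everywhere, a script that has to run on both kinds of system could try printf first and fall back to the sed loop from earlier in the thread. This is only a sketch: the function name `group_digits` is made up, and `en_US.UTF-8` is an assumed locale name that may not be installed on a given machine.

```shell
# group_digits is a hypothetical name. It tries the %'d flag via an external
# printf (found through env) and falls back to the sed loop if no comma appears.
group_digits() {
    # en_US.UTF-8 is an assumption; errors from a missing locale or an
    # unsupported flag are discarded.
    out=$(env LC_ALL=en_US.UTF-8 printf "%'d" "$1" 2>/dev/null)
    case $out in
        *,*) printf '%s\n' "$out" ;;
        *)   printf '%s\n' "$1" |
                 sed -e :a -e 's/\(.*[0-9]\)\([0-9]\{3\}\)/\1,\2/;ta' ;;
    esac
}

group_digits 9999999
```

Numbers below 1000 never contain a comma, so they always take the sed branch, which leaves them unchanged.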

jimieee 10-25-2010 08:14 AM

It's a pity.

(Generally) you don't want to actually update the value in your variable, as you might want to go on and use it for mathematical purposes. Most tools will interpret the value as a string if it contains characters other than [0-9].

Being able to printf the thousands separator would be the perfect solution... Maybe I need to ask my SA to install GNU printf...
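
A small sketch of that point, with made-up variable names and assuming a shell that supports `$(( ))` arithmetic: the plain digits stay in the variables used for the maths, and only a separate display copy gets the commas.

```shell
# Hypothetical variable names; only the displayed copy is formatted.
bytes_used=1234567
bytes_free=7654321
total=$((bytes_used + bytes_free))   # arithmetic on the plain digits

pretty=$(printf '%s\n' "$total" |
    sed -e :a -e 's/\(.*[0-9]\)\([0-9]\{3\}\)/\1,\2/;ta')

echo "Total: $pretty"                # Total: 8,888,888
```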

grail 10-25-2010 09:37 AM

Thought I would give you a prettied up version of your sed too, only slightly shorter:
Code:

sed -r ':a s/([0-9]+)([^,]{3})/\1,\2/;ta'
-r is your friend when getting rid of pesky backslashes :)
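
A variant sketch of the same idea, not the command as posted: it uses `-E`, which both GNU and BSD sed accept for extended regular expressions (older Solaris sed most likely has neither `-E` nor `-r`), and it matches `[0-9]{3}` for the second group so that only digits are ever grouped. The function name `add_commas_ere` is made up.

```shell
# add_commas_ere is a hypothetical name; needs a sed that supports -E.
add_commas_ere() {
    printf '%s\n' "$1" |
        sed -E -e :a -e 's/([0-9]+)([0-9]{3})/\1,\2/' -e ta
}

add_commas_ere 9999999   # 9,999,999
```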


All times are GMT -5.