LinuxQuestions.org


olliesa 10-14-2009 04:26 AM

Sed and regex: how to replace up to a certain string
 
Hi,
I'm trying to use sed to match a particular string and everything up until some other string. For example, I want to replace "foo" and everything up until "XX" with "bar". However, if "XX" is not in the string, then it should replace everything up until the end of the line.

example:
"foo blah blah XX more text" should become "bar XX more text"
"foo blah blah more text" should become "bar"

Here is what I have so far:

sed 's/foo\(.\(\?!XX\)\)*/bar/g' < input.txt

Thanks for your help!

vonbiber 10-14-2009 05:02 AM

I ran the following test.
1. I wrote this script:
<code>
#!/bin/sh

cat > input.txt <<EOF
foo blah blah XX more text
foo blah blah more text
blah blah more text
XX foo blah blah more text
EOF

cat input.txt
echo '-----------------'
sed 's?XX?°?g' input.txt | \
sed 's?^[^°]*$?&°?' |
sed 's?^\([^°]*\)°\(.\)?bar XX\2?g' |
sed 's?^[^°]*°$?bar?'
</code>
2. Then I ran the script and got the output below:
<code>
$ ./bogus.sh
foo blah blah XX more text
foo blah blah more text
blah blah more text
XX foo blah blah more text
-----------------
bar XX more text
bar
bar
bar XX foo blah blah more text
</code>

Is that what you're looking for?

The first sed command replaces XX with the degree character (°), which
shouldn't be present in your input file.
The second appends a degree character to every line that
doesn't already contain one.
The third replaces everything up to and including the degree character
with 'bar XX' on lines where the degree character is followed by at
least one character.
The fourth replaces every line that has a single degree
character, at the very end, with 'bar'.
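
For comparison, the same placeholder idea can probably be folded into a single sed invocation. This is only a sketch and it assumes GNU sed, which accepts \n both in the replacement text and inside a bracket expression. It uses a newline as the placeholder, since a newline cannot occur inside a line that sed has just read. The function name replace_foo is made up for illustration. Unlike the pipeline above, it only changes lines that actually contain "foo".

```shell
#!/bin/sh
# Sketch only: assumes GNU sed (\n in the replacement and in [^\n]).
# replace_foo is a hypothetical helper name, not part of any tool.
replace_foo() {
    # 1. mark every XX with a newline placeholder
    # 2. "foo" up to the next placeholder becomes "bar " (placeholder kept)
    # 3. "foo" with no placeholder after it is replaced through end of line
    # 4. turn the placeholders back into XX
    sed -e 's/XX/\n/g' \
        -e 's/foo[^\n]*\n/bar \n/g' \
        -e 's/foo[^\n]*$/bar/' \
        -e 's/\n/XX/g'
}

printf '%s\n' \
    'foo blah blah XX more text' \
    'foo blah blah more text' \
    'blah blah more text' \
    'XX foo blah blah more text' | replace_foo
```

With GNU sed the four test lines should come out as "bar XX more text", "bar", "blah blah more text" and "XX bar".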

olliesa 10-21-2009 02:49 AM

Thanks for the help. That will work, but it is a little cumbersome... I thought there would be a way to do this using negative lookahead in the sed regular expression.

