[SOLVED] sed

danielbmartin · 01-16-2012, 05:18 PM

I want to replace OLD with NEW but only where OLD is somewhere between double quotes. I think this is done with regions but haven't been able to dope it out.

Example:

Quote:

Even in his old age Emerson said, "The old houses are better."

would become:

Quote:

Even in his old age Emerson said, "The new houses are better."

Daniel B. Martin

arshadul · 01-16-2012, 06:01 PM

try the following:

sed -e 's/\(\".*\)old\(.*\"\)/\1new\2/g'

as in

$ echo "Even in his old age Emerson said, \"The old houses are better.\"" | sed -e 's/\(\".*\)old\(.*\"\)/\1new\2/g'
Even in his old age Emerson said, "The new houses are better."

danielbmartin · 01-16-2012, 09:47 PM

Quote:

Originally Posted by arshadul

try the following:
sed -e 's/\(\".*\)old\(.*\"\)/\1new\2/g'

Thank you, arshadul, for your prompt response. It works for the given example, but not in all cases. Allow me to restate the question, hoping to clarify.

I want to replace OLD with NEW anywhere and everywhere that OLD is somewhere between double quotes.

Examples:

Quote:

Even in his old age Emerson said, "The old houses are better."
old dogs old habits "old houses old cars" old men old computers

would become:

Quote:

Even in his old age Emerson said, "The new houses are better."
old dogs old habits "new houses new cars" old men old computers

Your suggestion handles the first line correctly but not the second.
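
A small sketch of why the one-pass substitution falls short on the second line. It assumes the suggested expression is meant to have two backslash-parenthesis capture groups around old; the function name replace_once is made up for illustration. Each match has to run from a quote to a later quote, and the greedy .* leaves only the last old before the closing quote for the pattern to hit, so one old per quoted stretch gets replaced even with the g flag.

```shell
# Hypothetical helper wrapping the one-pass substitution (assumed reading of the
# suggested command: two capture groups, one on each side of "old").
replace_once() {
    sed -e 's/\(".*\)old\(.*"\)/\1new\2/g'
}

echo 'old dogs old habits "old houses old cars" old men old computers' | replace_once
# prints: old dogs old habits "old houses new cars" old men old computers
```

Only "old cars" changes; "old houses" is swallowed by the first .* and never gets a second look.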

Daniel B. Martin

grail · 01-16-2012, 11:07 PM

I believe you would need to use one of sed's branching commands, either 'b' or 't', to process this correctly (could be wrong of course).
As an alternative, awk can do this rather easily:

Code:

awk 'BEGIN{ RS="\"" }{ORS = RT}!(NR%2){gsub(/old/,"new")}1' file
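
A note on how the one-liner above appears to work: with RS set to a double quote, the input is read in quote-delimited chunks, so the even-numbered records are the ones inside quotes, and RT (a gawk extension) carries the matched separator back into ORS so the quotes are written out again. For an awk without RT, a rough line-by-line equivalent is to split each line on quotes and edit only the even-numbered fields. The sketch below is that alternative, not the command above, and the function name is made up. Unlike the RS version it does not follow a quoted string across a line break.

```shell
# Hypothetical helper: replace "old" only inside quote-delimited fields.
# Fields 2, 4, 6, ... lie between a pair of quotes; the i < NF bound skips
# a trailing field that was opened by a quote but never closed.
replace_in_quotes() {
    awk -F'"' -v OFS='"' '{ for (i = 2; i < NF; i += 2) gsub(/old/, "new", $i) } 1'
}

echo 'old dogs "old habits" old sayings "old men old coins" old houses' | replace_in_quotes
# prints: old dogs "new habits" old sayings "new men new coins" old houses
```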

David the H. · 01-17-2012, 08:41 AM

Yes, a "t" loop probably is the best way to solve the above.

Code:

sed -r -e ':loop; s/["](.*)\bold\b(.*)["]/"\1new\2"/ ; t loop'

Using the -r option and [..] character class brackets also avoids the need for backslashing everything, making it a bit more readable.

One more minor issue is that "old" would also match sub-strings in words like "cold" and "olden", so I added the \b word boundary anchor at each end. To still match word variations like old/older/oldest, you can stick in yet another set of capture parentheses.

Code:

sed -r -e ':loop; s/["](.*)\bold(er|est)?\b(.*)["]/"\1new\2\3"/ ; t loop'
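
For anyone unfamiliar with the "t" loop, here is the first command from this post spread over several lines with each step commented. It is only a sketch: it assumes GNU sed (-r and \b are GNU extensions), and the function name is made up. Each pass replaces one old, and t jumps back to the label for as long as the last s command changed something.

```shell
# Hypothetical helper; the sed script is the first loop above, one step per line.
loop_replace() {
    sed -r -e '
        # label to jump back to
        :loop
        # replace one whole-word "old" that sits between a quote and a later quote
        s/["](.*)\bold\b(.*)["]/"\1new\2"/
        # if that substitution changed the line, branch back and try again
        t loop
    '
}

echo 'old dogs old habits "old houses old cars" old men old computers' | loop_replace
# prints: old dogs old habits "new houses new cars" old men old computers
```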

danielbmartin · 01-17-2012, 09:14 AM

Thank you, David the H., for your timely response. With further testing a flaw is detected. This flaw may be attributed to an ambiguity in the phrase "between double quotes."

Input line:

Quote:

old dogs "old habits" old sayings "old men old coins" old houses

Result:

Quote:

old dogs "new habits" new sayings "new men new coins" old houses

Desired result:

Quote:

old dogs "new habits" old sayings "new men new coins" old houses

Allow me to restate the question (again), to sharpen the spec.

I want to replace OLD with NEW anywhere and everywhere that OLD is somewhere between *pairs of* double quotes.

Daniel B. Martin

David the H. · 01-17-2012, 10:22 AM

Hmm, I see. That is a problem. And yes, I understand the requirements.

However, technically, I think that is what sed's doing. The "old sayings" string is between a pair of double-quotes, so the loop is affecting that too.
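
One way to see this is to mark what the pattern actually matches instead of replacing anything. The sketch below assumes GNU sed and uses a made-up function name; & in the replacement stands for the whole matched text.

```shell
# Hypothetical helper: wrap the text matched by the greedy pattern in < >.
show_span() {
    sed -r -e 's/["](.*)\bold\b(.*)["]/<&>/'
}

echo 'old dogs "old habits" old sayings "old men old coins" old houses' | show_span
# prints: old dogs <"old habits" old sayings "old men old coins"> old houses
```

The match runs from the first quote on the line to the last one, so every old in between is fair game for the loop.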

The thing is, handling matched pairs of characters like this, quotes, parentheses, whatever, has always been a very tricky thing to deal with. There have been quite a few long threads here discussing such things.

Ok, here's one more try that works on your sample text, at least.

Code:

sed -r -e ':loop; s/\B["]([^"]*)\bold(er|est)?\b([^"]*)["]\B/"\1new\2\3"/ ; t loop'

Overall, there probably isn't any single sed solution that would be able to handle every possible variation of text. You may have to go with awk or perl instead, like with grail's above, and just tackle each specific situation as it comes up.
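
As a rough check of the last command against the sample line (GNU sed assumed; the function name is made up for illustration): [^"]* cannot cross a quote, so each match stays within a single quote-to-quote stretch, and the \B on either side rejects a quote whose outer neighbour is a word character. In the sample that is what rules out the stretch running from the closing quote after "habits" to the opening quote before "old men". Text where quotes are surrounded by spaces on both sides would likely still fool it.

```shell
# Hypothetical helper wrapping the command from the post above.
pair_replace() {
    sed -r -e ':loop; s/\B["]([^"]*)\bold(er|est)?\b([^"]*)["]\B/"\1new\2\3"/ ; t loop'
}

echo 'old dogs "old habits" old sayings "old men old coins" old houses' | pair_replace
# prints: old dogs "new habits" old sayings "new men new coins" old houses
```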

grail · 01-17-2012, 12:08 PM

The awk solution still works for the current example. If, as David said, words that merely contain "old" should not be changed, then simply use \<old\> for the regex.

theNbomr · 01-17-2012, 01:00 PM

I think the problem is with the description of the objective, which I believe would be better stated as 'I want to replace OLD with NEW anywhere and everywhere that OLD is somewhere between balanced double quotes.' Even that is somewhat ambiguous for some cases.
--- rod.

danielbmartin · 01-17-2012, 09:21 PM

Quote:

Originally Posted by David the H.

Code:

sed -r -e ':loop; s/\B["]([^"]*)\bold(er|est)?\b([^"]*)["]\B/"\1new\2\3"/ ; t loop'

I'm happy with this and modified it to suit the actual application. Thank you, David the H., for your effort and advice. Let's mark this one SOLVED!

Daniel B. Martin