[SOLVED] sed - text substitution in certain places
I want to replace OLD with NEW but only where OLD is somewhere between double quotes. I think this is done with regions but haven't been able to dope it out.
Example:
Quote:
Even in his old age Emerson said, "The old houses are better."
would become:
Quote:
Even in his old age Emerson said, "The new houses are better."
Try the following:
Code:
sed -e 's/\(\".*\)old\(.*\"\)/\1new\2/g'
For example:
Code:
$ echo "Even in his old age Emerson said, \"The old houses are better.\"" | sed -e 's/\(\".*\)old\(.*\"\)/\1new\2/g'
Even in his old age Emerson said, "The new houses are better."
Thank you, arshadul, for your prompt response. It works for the given example, but not in all cases. Allow me to restate the question, hoping to clarify.
I want to replace OLD with NEW anywhere and everywhere that OLD is somewhere between double quotes.
Examples:
Quote:
Even in his old age Emerson said, "The old houses are better."
old dogs old habits "old houses old cars" old men old computers
would become:
Quote:
Even in his old age Emerson said, "The new houses are better."
old dogs old habits "new houses new cars" old men old computers
Your suggestion handles the first line correctly but not the second.
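To see why the first suggestion misses the second line, it may help to run it. The following is only a rough sketch, assuming GNU sed; `first_try` is a made-up wrapper name, and the backslashes before the double quotes are dropped because they are not needed. The `.*` after the opening quote is greedy, so it swallows everything up to the last "old" that still has a quote after it. Only that one occurrence is replaced, and the `g` flag finds nothing further to match on the rest of the line.

```shell
# Hypothetical wrapper around the first suggested command.
first_try() {
    sed -e 's/\(".*\)old\(.*"\)/\1new\2/g'
}

printf '%s\n' 'old dogs old habits "old houses old cars" old men old computers' | first_try
# prints: old dogs old habits "old houses new cars" old men old computers
```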
I believe you would need sed's branching commands, 'b' or 't', for sed to process this correctly (I could be wrong, of course).
As an alternative, awk can do this rather easily:
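The awk code posted at this point is not preserved in the text above. As a stand-in, here is a minimal sketch of one common awk approach, which is not necessarily the one grail posted: split each line on double quotes so that the even-numbered fields are the quoted parts, and substitute only in those fields. The function name `awk_fix` is made up for illustration.

```shell
# Hypothetical helper: replace "old" with "new" only inside quoted fields.
awk_fix() {
    awk -F'"' -v OFS='"' '{
        # With the double quote as separator, fields 2, 4, 6, ... lie between a pair of quotes.
        # Stopping before NF skips any text that follows an unmatched opening quote.
        for (i = 2; i < NF; i += 2)
            gsub(/old/, "new", $i)
        print
    }'
}

printf '%s\n' 'old dogs "old habits" old sayings "old men old coins" old houses' | awk_fix
# prints: old dogs "new habits" old sayings "new men new coins" old houses
```

Note that the plain /old/ pattern also changes substrings such as the one in "cold"; it is used here only to keep the sketch portable across awk implementations.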
Yes, a "t" loop probably is the best way to solve the above.
Code:
sed -r -e ':loop; s/["](.*)\bold\b(.*)["]/"\1new\2"/ ; t loop'
Using the -r option and [..] character class brackets also avoids the need for backslashing everything, making it a bit more readable.
One more minor issue is that "old" would also match sub-strings in words like "cold" and "olden", so I added the \b word boundary anchor at each end. To still match word variations like old/older/oldest, you can stick in yet another set of capture parentheses.
Code:
sed -r -e ':loop; s/["](.*)\bold(er|est)?\b(.*)["]/"\1new\2\3"/ ; t loop'
Last edited by David the H.; 01-17-2012 at 08:48 AM.
Reason: minor code correction
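As a quick check of the word-boundary behaviour described above, the second command can be wrapped in a function and fed a line that mixes the variants. This is a sketch assuming GNU sed (for `-r` and `\b`); `loop_fix` is a hypothetical name. Each pass of the loop replaces one occurrence, starting from the last one, until the substitution fails and `t` stops branching.

```shell
# Hypothetical wrapper around the old/older/oldest loop from the post above.
loop_fix() {
    sed -r -e ':loop; s/["](.*)\bold(er|est)?\b(.*)["]/"\1new\2\3"/ ; t loop'
}

printf '%s\n' 'He said, "the oldest, the older, the old and the cold."' | loop_fix
# prints: He said, "the newest, the newer, the new and the cold."
```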
Thank you, David the H., for your timely response. With further testing a flaw is detected. This flaw may be attributed to an ambiguity in the phrase "between double quotes."
Input line:
Quote:
old dogs "old habits" old sayings "old men old coins" old houses
Result:
Quote:
old dogs "new habits" new sayings "new men new coins" old houses
Desired result:
Quote:
old dogs "new habits" old sayings "new men new coins" old houses
Allow me to restate the question (again), to sharpen the spec.
I want to replace OLD with NEW anywhere and everywhere that OLD is somewhere between *pairs of* double quotes.
Hmm, I see. That is a problem. And yes, I understand the requirements.
However, technically, I think that is what sed's doing. The "old sayings" string is between a pair of double-quotes, so the loop is affecting that too.
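The behaviour can be reproduced with the reported input. In this sketch (GNU sed assumed, `loop_fix` is a hypothetical name for the earlier loop), the pattern only asks for some double quote before the word and some double quote after it, so the text between the closing quote of one pair and the opening quote of the next qualifies as well.

```shell
# Hypothetical wrapper around the earlier loop, to reproduce the reported flaw.
loop_fix() {
    sed -r -e ':loop; s/["](.*)\bold(er|est)?\b(.*)["]/"\1new\2\3"/ ; t loop'
}

printf '%s\n' 'old dogs "old habits" old sayings "old men old coins" old houses' | loop_fix
# prints: old dogs "new habits" new sayings "new men new coins" old houses
```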
The thing is, handling matched pairs of characters like this, quotes, parentheses, whatever, has always been a very tricky thing to deal with. There have been quite a few long threads here discussing such things.
Ok, here's one more try that works on your sample text, at least.
Code:
sed -r -e ':loop; s/\B["]([^"]*)\bold(er|est)?\b([^"]*)["]\B/"\1new\2\3"/ ; t loop'
Overall, there probably isn't any single sed solution that would be able to handle every possible variation of text. You may have to go with awk or perl instead, like with grail's above, and just tackle each specific situation as it comes up.
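To illustrate both points, here is a small sketch (GNU sed assumed, `pair_fix` is a hypothetical name) that runs the last command on the sample line and then on a line it gets wrong. The `\B` anchors only check that the character outside each quote is not a word character, so quoted text with spaces just inside the quotes can still fool it.

```shell
# Hypothetical wrapper around the \B version of the loop.
pair_fix() {
    sed -r -e ':loop; s/\B["]([^"]*)\bold(er|est)?\b([^"]*)["]\B/"\1new\2\3"/ ; t loop'
}

printf '%s\n' 'old dogs "old habits" old sayings "old men old coins" old houses' | pair_fix
# prints: old dogs "new habits" old sayings "new men new coins" old houses

# Here "old" sits between two pairs, yet it is still replaced.
printf '%s\n' '"a " old " b"' | pair_fix
# prints: "a " new " b"
```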
The awk solution still works for the current example. If the requirement is, as David has said, that words which merely contain "old" should not be changed, then simply use \<old\> for the regex (these word-boundary operators are a GNU awk feature).
I think the problem is with the description of the objective, which I believe would be better stated as 'I want to replace OLD with NEW anywhere and everywhere that OLD is somewhere between balanced double quotes.' Even that is somewhat ambiguous for some cases.
--- rod.
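One way to read "balanced" is to count quotes from the start of the line and treat the first and second as a pair, the third and fourth as the next pair, and so on. Under that assumption, and it is only one possible reading, a sed loop can be anchored at the start of the line so that it skips whole pairs before looking inside the next one. This is a sketch that was not posted in the thread; it assumes GNU sed, and `balanced_fix` is a made-up name.

```shell
# Hypothetical sketch: replace old/older/oldest only inside pairs counted from line start.
balanced_fix() {
    # \1 = any complete "..." pairs, then text up to and past the next opening quote
    # \3 = optional er/est suffix, \4 = rest of the quoted text plus its closing quote
    sed -r -e ':loop; s/^(([^"]*"[^"]*")*[^"]*"[^"]*)\bold(er|est)?\b([^"]*")/\1new\3\4/ ; t loop'
}

printf '%s\n' 'old dogs "old habits" old sayings "old men old coins" old houses' | balanced_fix
# prints: old dogs "new habits" old sayings "new men new coins" old houses
```

A word after an opening quote that is never closed is left alone, since the pattern requires the closing quote. Quotes that span several lines or escaped quotes are not handled.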