[SOLVED] sed: spaces, quotes, alternative patterns, substitution
Hi all,
I've been struggling with sed for over 2 hours now and thought I'd post my problem.
I need each string on the left changed into the string on the right:
Code:
THIS --> THAT
"hello " --> "hello" (quotations included)
" hello" --> "hello" (quotations always included)
" hello " --> "hello" ...
" John F. Kennedy " --> "John F. Kennedy"
" Secret Agent 007 " --> "Secret Agent 007"
"(space)+(anything but a space)+((space)?(not a space)+)*(space)+" --> "(anything but a space)+((space)?(not a space)+)*"
I just can't figure it out with sed! I've done it in SQL scripts, where I can group patterns with parentheses, but I'm not sure how to do the same in sed.
Code:
sed -r 's/"[[:blank:]]*(.+[^[:blank:]])[[:blank:]]*"/"\1"/g' file
The [^[:blank:]] followed by [[:blank:]]* at the end of the pattern is mandatory to take into account spaces in the middle of the string. It means a non-blank character followed by a sequence of blanks (if any) immediately before the closing quote.
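For anyone who wants to try the expression above on the sample strings from the first post, here is a minimal sketch. The function name trim_quoted is only a hypothetical wrapper for illustration, and -r assumes GNU sed (as used throughout this thread).

```shell
# trim_quoted: hypothetical wrapper around the expression posted above.
# It reads lines on stdin and strips blanks just inside the double quotes.
trim_quoted() {
  sed -r 's/"[[:blank:]]*(.+[^[:blank:]])[[:blank:]]*"/"\1"/g'
}

# Sample strings taken from the original question.
printf '%s\n' '" hello "' '"hello "' '" John F. Kennedy "' | trim_quoted
```

Because .+ may contain blanks, the spaces inside "John F. Kennedy" are kept and only the leading and trailing blanks are dropped.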
Hi guys,
Thanks for the replies. Dark_Helmet's solution works with the g option but not without it, whereas colucix's works with or without it. I don't know why.
Hi. The solution by Dark_Helmet requires the g option to make more than one substitution per line, since it uses alternation: it matches one pattern OR the other, and each quoted string needs two separate matches (one at the opening quote, one at the closing quote). Without g, only the first match is substituted and the rest of the line is left alone. My solution uses a single pattern that spans the whole quoted string, so one substitution is enough.
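The effect of the g option can be seen with a small sketch that runs Dark_Helmet's expression on one of the sample strings, once without g and once with it. The variable names are only illustrative.

```shell
line='" hello "'

# Without g: only the first match (opening quote plus following spaces) is replaced.
without_g=$(printf '%s\n' "$line" | sed -r 's@("[[:space:]]+|[[:space:]]+")@"@')

# With g: the second match (spaces plus closing quote) is replaced as well.
with_g=$(printf '%s\n' "$line" | sed -r 's@("[[:space:]]+|[[:space:]]+")@"@g')

printf '%s\n' "$without_g"   # prints "hello "
printf '%s\n' "$with_g"      # prints "hello"
```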
Furthermore, I noticed that Dark's solution makes extra (and maybe unwanted) substitutions if the quoted text is inside a longer line, e.g.
Code:
$ echo 'Io li vidi da lontano e dissi " Ciao ragazzi!! " e lei si voltò verso di me' | sed -r 's@("[[:space:]]+|[[:space:]]+")@"@g'
Io li vidi da lontano e dissi" Ciao ragazzi!!" e lei si voltò verso di me
Notice that the space before the opening quote has been removed and the space after it has been preserved. Just for the sake of exactness!
Both colucix's solution and mine operate on assumptions regarding the data set. Both assumptions are valid given the sample data you provided.
As colucix correctly pointed out, a string outside of those assumptions will give unexpected/undesirable results. The same is true for his solution as well.
Code:
echo 'He said, " where are they? ", and she responded with, " right there! "' | sed -r 's/"[[:blank:]]*(.+[^[:blank:]])[[:blank:]]*"/"\1"/g'
He said, "where are they? ", and she responded with, " right there!"
My solution assumes that there will be no instance where spaces need to be removed both before and after a double quote, whereas colucix's solution assumes there is only one double-quote pair on the line.
That's the thing about regular expressions. The more detail you give about the data set, the more accurate the solution.
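If the data can contain several quoted strings per line, one possible variant is sketched below. It was not posted in this thread: it replaces .+ with [^"]* so that a match cannot run past the next double quote. It assumes the quotes are balanced, that no quoted string contains an escaped double quote, and that every quoted string has at least one non-blank character. The function name trim_each_quoted is hypothetical.

```shell
# trim_each_quoted: hypothetical helper, not from the thread.
# [^"]* stops at the next double quote, so each quote pair on the line
# is trimmed on its own; spaces outside the quotes are left untouched.
trim_each_quoted() {
  sed -r 's/"[[:blank:]]*([^"]*[^"[:blank:]])[[:blank:]]*"/"\1"/g'
}

printf '%s\n' 'He said, " where are they? ", and she responded with, " right there! "' | trim_each_quoted
```

On that line it should print: He said, "where are they?", and she responded with, "right there!"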