LinuxQuestions.org
Old 03-12-2012, 04:42 PM   #1
rm_-rf_windows
Member
 
Registered: Jun 2007
Location: Europe
Distribution: Ubuntu
Posts: 292

Rep: Reputation: 27
sed: spaces, quotes, alternative patterns, substitution


Hi all,

I've been struggling with sed for over 2 hours now and thought I'd post my problem.

I need each string on the left changed into the string on the right:
Code:
THIS --> THAT
"hello       " --> "hello" (quotations included)
"       hello" --> "hello" (quotations always included)
"    hello    " --> "hello" ...
"      John F. Kennedy    " --> "John F. Kennedy"
"  Secret Agent 007    " --> "Secret Agent 007" 
"(space)+(anything but a space)+((space)?(not a space)+)*(space)+" --> "(anything but a space)+((space)?(not a space)+)*"
I just can't figure it out with sed! I've done it in SQL scripts, where I can group patterns with parentheses, but I'm not sure how to do the same with sed.

Many thanks,

rm
 
Old 03-12-2012, 04:52 PM   #2
Dark_Helmet
Senior Member
 
Registered: Jan 2003
Posts: 2,786

Rep: Reputation: 374
Code:
sed -r 's@("[[:space:]]+|[[:space:]]+")@"@g'
For example:
Code:
user@localhost$ echo '"hello       "' | sed -r 's@("[[:space:]]+|[[:space:]]+")@"@g'
"hello"
user@localhost$ echo '"    hello    "' | sed -r 's@("[[:space:]]+|[[:space:]]+")@"@g'
"hello"
user@localhost$ echo '"      John F. Kennedy    "' | sed -r 's@("[[:space:]]+|[[:space:]]+")@"@g'
"John F. Kennedy"
 
Old 03-12-2012, 04:54 PM   #3
colucix
LQ Guru
 
Registered: Sep 2003
Location: Bologna
Distribution: CentOS 6.5 OpenSuSE 12.3
Posts: 10,509

Rep: Reputation: 1983
Tried this?
Code:
sed -r 's/"[[:blank:]]*(.+[^[:blank:]])[[:blank:]]*"/"\1"/g' file
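The trailing part of the pattern, [^[:blank:]])[[:blank:]]*", is mandatory to take into account spaces in the middle of the string. It means a non-blank character followed by a sequence of blanks (if any) immediately before the closing quote. Because the group must end on a non-blank character, the trailing blanks stay outside it.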
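As a rough illustration of why that trailing non-blank matters, here is a sketch that compares the pattern with and without it. The function names with_anchor and without_anchor are made up for this example, and GNU sed is assumed for the -r option. Without [^[:blank:]], the greedy .+ swallows the trailing blanks, so they land inside the group and are printed back.

```shell
#!/bin/sh
# Pattern from the post above: the group must end on a non-blank character,
# so trailing blanks are matched by the final [[:blank:]]* and dropped.
with_anchor() {
  sed -r 's/"[[:blank:]]*(.+[^[:blank:]])[[:blank:]]*"/"\1"/'
}

# Hypothetical variant without the non-blank anchor: the greedy .+ takes
# the trailing blanks into the group, so they survive the substitution.
without_anchor() {
  sed -r 's/"[[:blank:]]*(.+)[[:blank:]]*"/"\1"/'
}

echo '"  Secret Agent 007    "' | with_anchor      # "Secret Agent 007"
echo '"  Secret Agent 007    "' | without_anchor   # "Secret Agent 007    "
```

Leading blanks are removed in both cases, because the first [[:blank:]]* comes before the group and gets first pick.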
 
Old 03-12-2012, 06:41 PM   #4
rm_-rf_windows
Member
 
Registered: Jun 2007
Location: Europe
Distribution: Ubuntu
Posts: 292

Original Poster
Rep: Reputation: 27
Ciao ragazzi,

Thanks for the replies. Dark_Helmet's solution works with the g option but not without it, while colucix's works with or without it. I don't know why.

Code:
$ echo '"     Ciao ragazzi!!     "' | sed -r 's@("[[:space:]]+|[[:space:]]+")@"@g'
"Ciao ragazzi!!"
$ echo '"     Ciao ragazzi!!     "' | sed -r 's@("[[:space:]]+|[[:space:]]+")@"@'
"Ciao ragazzi!!     "
$ echo '"     Ciao ragazzi!!     "' |  sed -r 's/"[[:blank:]]*(.+[^[:blank:]])[[:blank:]]*"/"\1"/'
"Ciao ragazzi!!"
$ echo '"     Ciao ragazzi!!     "' |  sed -r 's/"[[:blank:]]*(.+[^[:blank:]])[[:blank:]]*"/"\1"/g'
"Ciao ragazzi!!"
$
In any case, the problem is solved and this thread can be closed. I don't know how to mark this "[SOLVED]". That was quick and effective.

Many thanks!

rm
 
Old 03-13-2012, 03:20 AM   #5
colucix
LQ Guru
 
Registered: Sep 2003
Location: Bologna
Distribution: CentOS 6.5 OpenSuSE 12.3
Posts: 10,509

Rep: Reputation: 1983
Quote:
Originally Posted by rm_-rf_windows
Ciao ragazzi,
Thanks for the replies. Dark_Helmet's solution works with the g option but not without it, while colucix's works with or without it. I don't know why.
Ciao. The solution by Dark_Helmet requires the g option because it needs more than one substitution per line: it uses alternation, that is, it matches one pattern OR another (a quote followed by spaces, or spaces followed by a quote). Without g, only the first match is substituted and the rest of the line is left alone. My solution uses a single pattern that spans the whole quoted string, so one substitution is enough.
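To make the two separate matches visible, here is a small sketch that wraps each match in square brackets instead of replacing it with a quote. The helper names mark_all and mark_first are invented for this illustration, and GNU sed is assumed for -r. In the replacement, & stands for the matched text.

```shell
#!/bin/sh
# Same alternation as Dark_Helmet's command, but each match is wrapped
# in [ ] so it can be seen. With g, every match on the line is marked.
mark_all() {
  sed -r 's@("[[:space:]]+|[[:space:]]+")@[&]@g'
}

# Without g, only the first match on the line is marked.
mark_first() {
  sed -r 's@("[[:space:]]+|[[:space:]]+")@[&]@'
}

echo '"  hi  "' | mark_all     # ["  ]hi[  "]
echo '"  hi  "' | mark_first   # ["  ]hi  "
```

The opening quote with its spaces and the closing quote with its spaces are two distinct matches, which is why the trailing spaces survive when g is left out.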

Furthermore, I noticed that Dark's solution makes extra (and maybe unwanted) substitutions if the quoted text is inside a longer line, e.g.
Code:
$ echo 'Io li vidi da lontano e dissi "   Ciao ragazzi!! " e lei si voltò verso di me' | sed -r 's@("[[:space:]]+|[[:space:]]+")@"@g'
Io li vidi da lontano e dissi"   Ciao ragazzi!!" e lei si voltò verso di me
Notice that the space before the opening quote has been removed and the spaces after it have been preserved. Just for the sake of exactness!
 
Old 03-13-2012, 06:45 AM   #6
Dark_Helmet
Senior Member
 
Registered: Jan 2003
Posts: 2,786

Rep: Reputation: 374
Both colucix's solution and mine operate on assumptions regarding the data set. Both assumptions are valid given the sample data you provided.

As colucix correctly pointed out, a string outside of those assumptions will give unexpected or undesirable results. The same is true for his solution.

Code:
echo 'He said,   "    where are they?  ", and she responded with, "   right there!  "' | sed -r 's/"[[:blank:]]*(.+[^[:blank:]])[[:blank:]]*"/"\1"/g'
He said,   "where are they?  ", and she responded with, "   right there!"
My solution assumes that there will be no instance where spaces need to be removed both before and after a double quote, whereas colucix's solution assumes there is only one double quote pair on the line.

That's the thing about regular expressions. The more detail you give about the data set, the more accurate the solution.
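For what it's worth, here is one possible variant that relaxes both assumptions. It is only a sketch, not something either of us posted above, and it comes with assumptions of its own: quotes on a line are balanced, there are no escaped quotes inside a quoted string, and no quoted string is empty or made of blanks only. The function name trim_quoted is made up, and GNU sed is assumed for -r. The idea is to replace .+ with [^"]* so that a match can never run past a closing quote.

```shell
#!/bin/sh
# Trim blanks just inside each pair of double quotes, pair by pair.
# [^"]* keeps the match inside one pair; [^"[:blank:]] forces the group
# to end on a non-blank, non-quote character.
trim_quoted() {
  sed -r 's/"[[:blank:]]*([^"]*[^"[:blank:]])[[:blank:]]*"/"\1"/g'
}

echo 'He said,   "    where are they?  ", and she responded with, "   right there!  "' | trim_quoted
# He said,   "where are they?", and she responded with, "right there!"

echo 'I said "   Ciao ragazzi!! " and she turned' | trim_quoted
# I said "Ciao ragazzi!!" and she turned
```

Spaces outside the quotes are left alone, and the g option is needed again here because each quote pair is a separate match.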
 
Old 03-13-2012, 08:46 AM   #7
colucix
LQ Guru
 
Registered: Sep 2003
Location: Bologna
Distribution: CentOS 6.5 OpenSuSE 12.3
Posts: 10,509

Rep: Reputation: 1983
Quote:
Originally Posted by Dark_Helmet
colucix's solution assumes there is only one double quote pair on the line.

That's the thing about regular expressions. The more detail you give about the data set, the more accurate the solution.
Totally agreed!
 
  


All times are GMT -5. The time now is 03:09 PM.
