LinuxQuestions.org


StupidNewbie 03-21-2012 09:07 AM

Sed/awk/cut to pull a repeating string out of a longer string
 
Hi everyone,

I'm looking for a sed or awk statement to do something like the following:

With a string like dc=some,dc=domain,dc=dot,dc=com I can use tr to change all the commas to dots very easily, but then I'm left with dc=some.dc=domain.dc=dot.dc=com. Now I want to remove all the "dc=" to be left with some.domain.dot.com.

When I used sed e.g.
Code:

sed 's/[dc=]//'
it did not replace all of the instances, and it also replaces the characters individually, so a "c" or a "d" anywhere in the domain name will break it. I could use cut with "dc=" as the delimiter, but cut doesn't support multi-character delimiters.

The other caveat is that I am doing this for a list of strings like this, and they might not all have 4 domain components. Some might have 6, some might have just 2. It really is unknown how many there will be as it depends on the domain.

Can anyone offer some insight on how I can dynamically cut these things out? The ultimate goal is to attach this to the end of a url e.g. http://www.example.some.domain.dot.com.

Thanks!

catkin 03-21-2012 09:13 AM

sed 's/dc=//g'
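
For the whole task from the first post, here is a minimal sketch. It assumes every line contains only lowercase "dc=" components separated by commas. The function name dn_to_domain is made up for illustration. Both substitutions fit in one sed call, so the separate tr step is not needed, and the number of components per line does not matter:

```shell
# Hypothetical helper: reads strings on stdin, one per line,
# and prints them with "dc=" removed and commas turned into dots.
dn_to_domain() {
    # First expression deletes every literal "dc=",
    # second expression replaces every comma with a dot.
    sed -e 's/dc=//g' -e 's/,/./g'
}

# Two sample lines with a different number of components.
printf '%s\n' 'dc=some,dc=domain,dc=dot,dc=com' 'dc=example,dc=org' | dn_to_domain
```

With the two sample lines this should print some.domain.dot.com and example.org.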

StupidNewbie 03-21-2012 09:46 AM

That did it! thanks!

David the H. 03-21-2012 01:47 PM

To clarify your mistake, the [] brackets in a regular expression enclose a list of individual characters, any one of which can match at that position. On its own, the whole bracket expression matches only a single character on the line.

Code:

sed 's/^[lLbB]oobar/Foobar/g' infile
This will match any line starting with "loobar", "Loobar", "boobar", or "Boobar", and change that string to "Foobar". But it will not match "bBoobar" or any other multi-character combination.
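
A quick way to see this for yourself, as a small sketch that feeds three made-up sample lines through the same expression instead of reading from infile:

```shell
# Three sample lines; only the first two consist of one bracket
# character followed directly by "oobar" at the start of the line.
result=$(printf '%s\n' 'loobar' 'Boobar' 'bBoobar' | sed 's/^[lLbB]oobar/Foobar/')

# The first two lines become "Foobar"; "bBoobar" is printed unchanged.
printf '%s\n' "$result"
```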

Similarly, 's/[dc=]//' will match the first "d", "c", or "=" on the line (whichever is encountered first) and remove it. It will not match the whole string "dc=", only one of its individual characters.

I say only the first, because you need to add the "g" option at the end of the s/// command to do multiple substitutions on a single line.
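
To compare the three variants side by side, here is a small sketch using the sample string from the first post (the variable names are just for illustration):

```shell
input='dc=some,dc=domain,dc=dot,dc=com'

# Bracket expression without g: only the first matching character (the "d") is removed.
first_only=$(printf '%s\n' "$input" | sed 's/[dc=]//')

# Bracket expression with g: every "d", "c" and "=" is removed,
# including the ones that belong to the domain names.
every_char=$(printf '%s\n' "$input" | sed 's/[dc=]//g')

# Literal string with g: only the "dc=" sequences are removed.
literal=$(printf '%s\n' "$input" | sed 's/dc=//g')

printf '%s\n' "$first_only" "$every_char" "$literal"
# c=some,dc=domain,dc=dot,dc=com
# some,omain,ot,om
# some,domain,dot,com
```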


I suggest you take some time to really learn how to use regular expressions. You'll be glad you did. Here are a couple of tutorials:

http://mywiki.wooledge.org/RegularExpression
http://www.grymoire.com/Unix/Regular.html


And more about using sed here:

http://www.grymoire.com/Unix/Sed.html
http://sed.sourceforge.net/grabbag/
http://sed.sourceforge.net/sedfaq.html
http://sed.sourceforge.net/sed1line.txt


All times are GMT -5.