RegEx is not working.
Hi, I am trying to extract the string "SOA_BLD02_CAS" from regex.txt, which contains:
<ias-instance id="SOA_BLD02_CAS.bld02cashost01.myorganization.com.au" name="SOA_BLD02_CAS.bld02cashost01.myorganization.com.au">
My regex is cat regex.txt | sed 's;\(.*name="\)\(.*[^[:lower:]]\);\2;' and it is not working :(
The output is as follows: SOA_BLD02_CAS.bld02cashost01.myorganization.com.au">
but that is not what I want :( Please help. Thanks |
You don't need to cat regex.txt into sed. Just add regex.txt to the end of the sed command. The sed command will not modify the file unless you use the '-i' option (see man sed).
As for your regular expression... Your attempt to use '[^[:lower:]]' is (1) not accompanied by a quantifier (e.g. an asterisk or plus), so it matches exactly one character, and (2) ineffective because of the greedy nature of regular expressions. The '.*' in the second set of parentheses is greedy, matching as much of the line as it can. It "overpowers" your not-in-lower-set expression, leaving the not-in-lower-set to match only the last character on the line, the closing '>'. Feel free to tinker with your expression to make it work, but the way I approached it: Code:
sed 's@.*name="\([^.]\+\)\..*@\1@' regex.txt |
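To make the greedy-match point concrete, here is a small sketch that runs the original attempt and the corrected expression on the same input. It assumes GNU sed (the \+ in a basic regular expression is a GNU extension), and the variable names line, greedy and fixed are only for illustration.

```shell
line='<ias-instance id="SOA_BLD02_CAS.bld02cashost01.myorganization.com.au" name="SOA_BLD02_CAS.bld02cashost01.myorganization.com.au">'

# Original attempt: the greedy .* in the second group runs to the end of the
# line, then gives back one character so [^[:lower:]] can match the final '>'.
greedy=$(printf '%s\n' "$line" | sed 's;\(.*name="\)\(.*[^[:lower:]]\);\2;')
echo "$greedy"

# Corrected: [^.]\+ cannot cross a period, so the group stops at the first one.
fixed=$(printf '%s\n' "$line" | sed 's@.*name="\([^.]\+\)\..*@\1@')
echo "$fixed"
```

The first echo should reproduce the unwanted output from the question, the second should print only SOA_BLD02_CAS.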
Firstly, cat is not required as sed can read a file.
You are testing for "not lowercase" after you have already said "give me everything", i.e. .* What I think you are looking for is "not a period (.)", so something like: Code:
sed -r 's/.*name="([^.]*)\..*/\1/' regex.txt |
First of all, many thanks to both of you. Both solutions work and give me exactly what I am looking for. Fantastic!
If you have a few minutes, could you please explain what magic made it work? I am still learning and finding it hard to understand.
@Dark_Helmet sed 's@.*name="\([^.]\+\)\..*@\1@' regex.txt From what I understand, first it looks for any characters up to name=" but we are not keeping them in a register. After that, I could not understand \([^.]\+\)\..* although I am pretty sure you have used a register to keep something in memory to refer to later.
@grail sed -r 's/.*name="([^.]*)\..*/\1/' regex.txt From what I understand, first it looks for any characters up to name=" but we are not keeping them in a register. After that, I could not understand ([^.]*)\..* Interestingly, you use () for the register. I was thinking a register is written as \( \), so now I am confused about which way a register should be written.
Please help me understand the concept. Thank you very much. Cheers
---------- Post added 02-13-12 at 11:46 PM ----------
The thread is solved, but I am trying to understand the concept. |
The use of the -r switch with sed means you do not have to escape the parentheses when saving a register.
As for ([^.]*)\..* ... this is 2 parts: 1. ([^.]*) - This says to store everything (zero or more characters) that is not a period ... a caret, ^, as the first character inside square brackets negates what you are looking for. Also, a period, . , does not have to be escaped inside square brackets to be treated as a literal period. 2. \..* - This essentially matches everything we did not previously save, from the period until the end of the line. Hope that helps. PS. If you look through DH's solution you should now be able to figure out his as well, as it is almost exactly the same :) |
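The two parts described above can be made visible by capturing the second part as well and printing both. This is only a sketch, assuming GNU sed with -r; the extra second group and the kept=/dropped= labels are added for illustration and are not part of either solution in this thread.

```shell
line='<ias-instance id="SOA_BLD02_CAS.bld02cashost01.myorganization.com.au" name="SOA_BLD02_CAS.bld02cashost01.myorganization.com.au">'

# Group 1 is the ([^.]*) part; group 2 wraps the \..* part so it can be shown.
parts=$(printf '%s\n' "$line" | sed -r 's/.*name="([^.]*)(\..*)/kept=[\1] dropped=[\2]/')
echo "$parts"

# Inside square brackets a period is literal; outside it matches any character.
literal_dot=$(echo 'a.b' | sed -r 's/[.]/-/')
any_char=$(echo 'a.b' | sed -r 's/./-/')
echo "$literal_dot $any_char"
```

Everything before name=" is matched by the leading .* and never appears in the replacement, which is why it vanishes from the output.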
Many thanks grail for your help, it is much appreciated.
One question though, as far as DH's solution goes. DH says sed 's@.*name="\([^.]\+\)\..*@\1@' regex.txt As far as \..* is concerned, I got that one, as you explained it nicely. But does [^.]\+ mean "store one or more characters which are not a period"? What does \+ mean? |
I am not sure why DH escaped the +, which does mean one or more of the preceding pattern. It could just be a better-safe-than-sorry touch :)
|
In short, sed will not match the expected pattern without the \+. Try it:
Code:
echo '<ias-instance id="SOA_BLD02_CAS.bld02cashost01.myorganization.com.au" name="SOA_BLD02_CAS.bld02cashost01.myorganization.com.au">' | sed 's@.*name="\([^.]+\)\..*@\1@'
As I understand it, without the backslash to escape it, sed will interpret the + as a literal character to match, not a pattern modifier.
EDIT: I ran some tests, and the unescaped + looked odd at first, but as far as I can tell it is simply treated as a literal +: the expression only matches when the input actually contains a + at that position, and otherwise sed prints the line unchanged. For instance: Code:
$ echo "D_Hnope" | sed 's@\([^n]+\).*@\1@'
/EDIT
EDIT2: One instance of my alleged "documentation out there" appears to be this page. /EDIT2
This is a consequence of invoking sed without and with the -r option. This is the same reason why the parentheses are escaped in my expression; otherwise sed will want to match a literal open/close parenthesis. Using '-r' tells sed to use "extended regular expressions." I won't pretend to know all the differences between them, but it appears that one key difference is that, with basic regular expressions, literal text is assumed for characters more often than not.
My preference is to have something to catch my attention in an expression if I'm doing something that is not a literal match. The escapes are that flag for me. But if you're the type of person that wants uncluttered expressions and prefers to escape the metacharacters to match their literal values, then the '-r' option is probably more your style. |
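For anyone who wants to see the basic vs. extended difference side by side, here is a short sketch. It assumes GNU sed (\+ in a basic regular expression is a GNU extension and may not work elsewhere); the sample input aaa+b and the variable names are made up for illustration.

```shell
# Basic regular expressions (no -r): a bare + is a literal plus sign,
# while \+ means "one or more of the preceding item".
bre_plain=$(echo 'aaa+b' | sed 's/a+/X/')
bre_escaped=$(echo 'aaa+b' | sed 's/a\+/X/')

# Extended regular expressions (-r): the roles are swapped.
ere_plain=$(echo 'aaa+b' | sed -r 's/a+/X/')
ere_escaped=$(echo 'aaa+b' | sed -r 's/a\+/X/')

echo "BRE: a+ -> $bre_plain, a\\+ -> $bre_escaped"
echo "ERE: a+ -> $ere_plain, a\\+ -> $ere_escaped"

# The D_Hnope test from above: the input has no literal +, so the unescaped
# form leaves the line alone, and the escaped form keeps the part before 'n'.
unescaped=$(echo 'D_Hnope' | sed 's@\([^n]+\).*@\1@')
escaped=$(echo 'D_Hnope' | sed 's@\([^n]\+\).*@\1@')
echo "$unescaped $escaped"
```

When the pattern does not match at all, sed prints the input line unchanged, which can look like a partial match if you are not expecting it.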
Re: types of regex. As pointed out in this excellent (imho) book, http://regex.info/, regex is just a concept and many (all?) tools have their own regex engine quirks.
The Perl regex engine is very powerful, so some langs/tools also have a 'pcre' option to make them work more like Perl... YHBW :) |
Thanks DH :) Maybe I need to revisit this again in a few days.
Cheers! |