LinuxQuestions.org

sysmicuser 02-13-2012 02:24 AM

RegEx is not working.
 
Hi, I am trying to extract the string "SOA_BLD02_CAS" from regex.txt, which contains this line:
<ias-instance id="SOA_BLD02_CAS.bld02cashost01.myorganization.com.au" name="SOA_BLD02_CAS.bld02cashost01.myorganization.com.au">

My regex is
cat regex.txt |sed 's;\(.*name="\)\(.*[^[:lower:]]\);\2;'

and it is not working:(:(

Output is as follows:
SOA_BLD02_CAS.bld02cashost01.myorganization.com.au">

but that is not what I want :(

Please help.

Thanks

Dark_Helmet 02-13-2012 02:59 AM

You don't need to cat regex.txt into sed. Just add regex.txt to the end of the sed command. The sed command will not modify the file unless you use the '-i' option (see man sed).

As for your regular expression...

Your attempt to use '[^[:lower:]]' is (1) not accompanied by a modifier (e.g. an asterisk or plus) and (2) ineffective because of the greedy nature of regular expressions. The '.*' in the second set of parentheses is greedy--matching as much of the line as it can. It "overpowers" your not-in-lower-set expression--leaving the not-in-lower-set to match only the last non-lowercase character on the line (here, the closing '>').

Feel free to tinker with your expression to make it work, but the way I approached it:
Code:

sed 's@.*name="\([^.]\+\)\..*@\1@' regex.txt
EDIT: Ha! grail and I were synchronized down to the minute and with nearly identical regex and similar responses.
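
To see the greediness in action, here is a small sketch that runs the original attempt and the bounded version side by side. The variable names (line, greedy, bounded) are just for illustration, and it assumes GNU sed, since \+ in a basic regular expression is a GNU extension.

```shell
# Sample line from the original question, held in a variable for the demo.
line='<ias-instance id="SOA_BLD02_CAS.bld02cashost01.myorganization.com.au" name="SOA_BLD02_CAS.bld02cashost01.myorganization.com.au">'

# Original attempt: the greedy .* runs as far as it can, so [^[:lower:]]
# is left to match only the last non-lowercase character (the closing >).
greedy=$(printf '%s\n' "$line" | sed 's;\(.*name="\)\(.*[^[:lower:]]\);\2;')

# Bounded version: [^.]\+ cannot cross a period, so it stops at the first one.
bounded=$(printf '%s\n' "$line" | sed 's@.*name="\([^.]\+\)\..*@\1@')

printf 'greedy:  %s\n' "$greedy"
printf 'bounded: %s\n' "$bounded"
```

The first result still carries the whole hostname and the trailing ">, while the second stops at the first period.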

grail 02-13-2012 02:59 AM

Firstly, cat is not required as sed can read a file.

You are testing for not lower after you say give me everything, i.e. .*

What I think you are looking for is not a period (.), so something like:
Code:

sed -r 's/.*name="([^.]*)\..*/\1/' regex.txt
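
Both replies note that cat is unnecessary and that sed leaves the file alone without -i. A quick way to check that, as a rough sketch: the temporary file created with mktemp stands in for regex.txt, and the variable names are made up for the demo.

```shell
# Temporary stand-in for regex.txt (hypothetical; removed at the end).
tmp=$(mktemp)
printf '%s\n' '<ias-instance id="SOA_BLD02_CAS.bld02cashost01.myorganization.com.au" name="SOA_BLD02_CAS.bld02cashost01.myorganization.com.au">' > "$tmp"

before=$(cat "$tmp")

# Piping through cat and passing the file name give the same result.
from_pipe=$(cat "$tmp" | sed -r 's/.*name="([^.]*)\..*/\1/')
from_file=$(sed -r 's/.*name="([^.]*)\..*/\1/' "$tmp")

# Without -i, sed only writes to standard output; the file is unchanged.
after=$(cat "$tmp")
rm -f "$tmp"

printf 'pipe: %s\nfile: %s\n' "$from_pipe" "$from_file"
```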

sysmicuser 02-13-2012 06:45 AM

First of all, many thanks to both of you. Both solutions work and give me exactly what I am looking for. Fantastic!

If you have a few minutes, could you please explain what magic it was that worked? I am still learning and finding it hard to understand.

@Dark_Helmet
sed 's@.*name="\([^.]\+\)\..*@\1@' regex.txt
From what I understand, first it looks for any characters up to name=" but we are not keeping them in a register.
Later \([^.]\+\)\..* I could not understand. I am pretty sure you have used a register to keep something in memory to refer to later.


@grail
sed -r 's/.*name="([^.]*)\..*/\1/' regex.txt
From what I understand, first it looks for any characters up to name=" but we are not keeping them in a register.
Later ([^.]*)\..* I could not understand. Interestingly enough, you use ( ) for the register. I was thinking a register is written with \( \), so now I am confused: which way should a register be written?

Please help me understand the concept.

Thank you very much.

Cheers

---------- Post added 02-13-12 at 11:46 PM ----------

Thread is solved but I am trying to understand the concept.

grail 02-13-2012 09:12 AM

Using the -r switch with sed means you do not have to escape the parentheses when saving a register.

As for ([^.]*)\..* ... this is 2 parts:

1. ([^.]*) - This says to store everything (zero or more characters) that is not a period ... a caret, ^, as the first character inside square brackets negates what you are looking for. Also, a period, . , does not have to be escaped when inside square brackets to be accepted as a literal period.

2. \..* - This is essentially everything from the first period, which we did not save, until the end of the line.
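
One way to see the two parts separately is to run each on its own. This is only a sketch: the shortened input line and the variable names are made up for the demo, and the square brackets in the first replacement are there just to mark what was captured.

```shell
# Shortened, hypothetical input so the output is easy to read.
line='name="SOA_BLD02_CAS.bld02cashost01.myorganization.com.au">'

# Part 1 alone: ([^.]*) captures everything up to the first period.
part1=$(printf '%s\n' "$line" | sed -r 's/name="([^.]*)/[\1]/')

# Part 2 alone: \..* matches from the first period to the end of the line.
part2=$(printf '%s\n' "$line" | sed -r 's/\..*//')

# Both parts together, as in the full expression.
both=$(printf '%s\n' "$line" | sed -r 's/.*name="([^.]*)\..*/\1/')

printf '%s\n%s\n%s\n' "$part1" "$part2" "$both"
```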

Hope that helps.

PS. If you look through DH's solution, you should now be able to figure out his as well, as it is almost exactly the same :)

sysmicuser 02-14-2012 05:20 AM

Many thanks grail for your help, it is much appreciated.

One question though, as far as DH's solution goes.
DH says

sed 's@.*name="\([^.]\+\)\..*@\1@' regex.txt
As far as \..* is concerned, I got that one, as you explained it nicely. But does [^.]\+ mean store one or more characters which are not a period? What does \+ mean?

grail 02-14-2012 07:08 AM

I am not sure why DH escaped the + which does mean one or more of the preceding pattern. It could just be a better safe than sorry touch :)

Dark_Helmet 02-14-2012 11:25 AM

In short, sed will not match the expected pattern without the backslash in \+. Try it:
Code:

echo '<ias-instance id="SOA_BLD02_CAS.bld02cashost01.myorganization.com.au" name="SOA_BLD02_CAS.bld02cashost01.myorganization.com.au">'| sed 's@.*name="\([^.]+\)\..*@\1@'
On my system, that command re-prints the input--no modifications. In other words, sed saw there was no matching pattern for its substitution.

As I understand it, without the backslash to escape it, sed will interpret the + as a literal character to match--not a pattern modifier.

EDIT:
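I ran some tests, and the unescaped + causes results that look odd at first. It can appear to act as both a literal + and a pattern modifier at the same time, but the output below is consistent with a plain literal +: in the second command, [^n] matches the single character H, + matches the literal plus sign, and the leading D_ is left untouched.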

For instance:
Code:

$ echo "D_Hnope" | sed 's@\([^n]+\).*@\1@'
D_Hnope
$ echo "D_H+nope" | sed 's@\([^n]+\).*@\1@'
D_H+

I'm sure there's some documentation out there that explains it. Though, there seem to be too many types/groups of regular expressions. Basic shell regular expressions, extended, Perl, and who knows what else.
/EDIT
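
Marking the captured text makes the two tests above easier to read. In this sketch (assuming GNU sed; the variable names are made up), the replacement wraps \1 in square brackets so you can see exactly what the group matched with and without the backslash.

```shell
# Unescaped +: a literal plus sign in a basic regular expression.
# [^n]+ means "one non-n character followed by a +".
lit_noplus=$(printf '%s\n' 'D_Hnope'  | sed 's@\([^n]+\).*@[\1]@')   # no + in input, no match
lit_plus=$(printf '%s\n' 'D_H+nope' | sed 's@\([^n]+\).*@[\1]@')     # match starts at H

# Escaped \+: "one or more" of the preceding bracket expression.
rep_noplus=$(printf '%s\n' 'D_Hnope'  | sed 's@\([^n]\+\).*@[\1]@')
rep_plus=$(printf '%s\n' 'D_H+nope' | sed 's@\([^n]\+\).*@[\1]@')

printf '%s\n' "$lit_noplus" "$lit_plus" "$rep_noplus" "$rep_plus"
# D_Hnope
# D_[H+]
# [D_H]
# [D_H+]
```

Without the brackets in the replacement, the second and fourth commands both print D_H+, which is why the unescaped + can look like it is doing two jobs at once.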

EDIT2:
One instance of my alleged "documentation out there" appears to be this page.
/EDIT2

This is a consequence of invoking sed without versus with the -r option. It is the same reason why the parentheses are escaped in my expression--otherwise sed will want to match a literal open/close parenthesis.

Using '-r' tells sed to use "extended regular expressions." I won't pretend to know all the differences between them, but it appears that one key difference is that, with basic regular expressions, literal text is assumed for characters more often than not.

My preference is to have something to catch my attention in an expression if I'm doing something that is not a literal match. The escapes are that flag for me. But if you're the type of person that wants uncluttered expressions and prefers to escape the metacharacters to match their literal values, then the '-r' option is probably more your style.
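
A side-by-side sketch of the two styles, assuming GNU sed. The shortened input line, the f(x) string, and the variable names are all made up for the demo. The backslash simply flips meaning between the two modes: what is special when escaped in basic mode is special when unescaped with -r, and the other way around.

```shell
# Shortened, hypothetical input line.
line='name="SOA_BLD02_CAS.bld02cashost01">'

# Basic mode: escape ( ) and + to make them special.
bre=$(printf '%s\n' "$line" | sed 's@.*name="\([^.]\+\)\..*@\1@')

# Extended mode (-r): ( ) and + are special without escapes.
ere=$(printf '%s\n' "$line" | sed -r 's@.*name="([^.]+)\..*@\1@')

# Matching literal parentheses: unescaped in basic mode, escaped with -r.
paren_bre=$(printf '%s\n' 'f(x)' | sed 's@(x)@[x]@')
paren_ere=$(printf '%s\n' 'f(x)' | sed -r 's@\(x\)@[x]@')

printf '%s\n' "$bre" "$ere" "$paren_bre" "$paren_ere"
```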

chrism01 02-15-2012 12:16 AM

Re Types of regex; as pointed out in this excellent (imho) book http://regex.info/, regex is just a concept and many (all?) tools have their own regex engine quirks.
The Perl regex engine is very powerful, so some langs/tools also have a 'pcre' option to make them work more like Perl...
YHBW :)

sysmicuser 02-16-2012 08:21 AM

Thanks DH :) Maybe I need to revisit this in a few days.

Cheers!


All times are GMT -5.