LinuxQuestions.org
Old 02-13-2012, 03:24 AM   #1
sysmicuser
Member
 
Registered: Mar 2010
Posts: 332

Rep: Reputation: 0
RegEx is not working.


Hi, I am trying to extract the string "SOA_BLD02_CAS" from regex.txt, which contains this line:
<ias-instance id="SOA_BLD02_CAS.bld02cashost01.myorganization.com.au" name="SOA_BLD02_CAS.bld02cashost01.myorganization.com.au">

My regex is
cat regex.txt |sed 's;\(.*name="\)\(.*[^[:lower:]]\);\2;'

and it is not working

Output is as follows:
SOA_BLD02_CAS.bld02cashost01.myorganization.com.au">

but that is not what I want.

Please help.

Thanks

Last edited by sysmicuser; 02-13-2012 at 03:31 AM.
 
Old 02-13-2012, 03:59 AM   #2
Dark_Helmet
Senior Member
 
Registered: Jan 2003
Posts: 2,786

Rep: Reputation: 370
You don't need to cat regex.txt into sed. Just add regex.txt to the end of the sed command. The sed command will not modify the file unless you use the '-i' option (see man sed).
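
A quick way to check this, as a sketch only (the scratch file and the hello/goodbye text are made up for illustration and are not the poster's regex.txt):

```shell
# Hypothetical scratch file used only for this demonstration.
tmp=$(mktemp)
printf 'hello world\n' > "$tmp"

# sed opens the file itself, so no cat is needed.
# The edited text goes to stdout; the file is not touched.
printed=$(sed 's/hello/goodbye/' "$tmp")
on_disk=$(cat "$tmp")

echo "printed: $printed"   # printed: goodbye world
echo "on disk: $on_disk"   # on disk: hello world

rm -f "$tmp"
```

The file still holds the original text afterwards because no in-place option was given.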

As for your regular expression...

Your attempt to use '[^[:lower:]]' is (1) not accompanied by a quantifier (e.g. an asterisk or plus), so it matches exactly one character, and (2) ineffective because of the greedy nature of regular expressions. The '.*' in the second set of parentheses is greedy: it matches as much of the line as it can. It "overpowers" your not-in-lower-set expression, leaving it to match only the last non-lowercase character on the line, which here is the closing '>'.
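
A small experiment makes the greedy behaviour visible. This is only a sketch with a made-up input string, assuming a POSIX-style sed; the square brackets in the replacement just mark what the group captured:

```shell
# .* first swallows the whole string, then backs off only far enough
# for [^[:lower:]] to match one character: the LAST non-lowercase one.
greedy=$(echo 'AB.cd.EF.gh' | sed 's/\(.*[^[:lower:]]\).*/[\1]/')
echo "$greedy"     # [AB.cd.EF.]

# With no leading .* and a star on the class itself, the match
# stops at the first lowercase letter instead.
anchored=$(echo 'AB.cd.EF.gh' | sed 's/^\([^[:lower:]]*\).*/[\1]/')
echo "$anchored"   # [AB.]
```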

Feel free to tinker with your expression to make it work, but the way I approached it:
Code:
sed 's@.*name="\([^.]\+\)\..*@\1@' regex.txt
EDIT: Ha! grail and I were synchronized down to the minute and with nearly identical regex and similar responses.

Last edited by Dark_Helmet; 02-13-2012 at 04:00 AM.
 
1 members found this post helpful.
Old 02-13-2012, 03:59 AM   #3
grail
LQ Guru
 
Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 9,247

Rep: Reputation: 2684
Firstly, cat is not required as sed can read a file.

You are testing for "not lower" after you have already said "give me everything", i.e. .*

What I think you are looking for is "not a period" (.), so something like:
Code:
sed -r 's/.*name="([^.]*)\..*/\1/' regex.txt
 
1 members found this post helpful.
Old 02-13-2012, 07:45 AM   #4
sysmicuser
Member
 
Registered: Mar 2010
Posts: 332

Original Poster
Rep: Reputation: 0
First of all, many thanks to both of you. Both solutions work and give me exactly what I am looking for. Fantastic!

If you have a few minutes, could you please explain what the magic was that made it work? I am still learning and finding it hard to understand.

@Dark_Helmet
sed 's@.*name="\([^.]\+\)\..*@\1@' regex.txt
From what I understand, first it looks for any characters up to name=" but we are not keeping that in a register.
Then comes \([^.]\+\)\..* which I could not understand. I am pretty sure you have used a register to keep something in memory to refer to later.


@grail
sed -r 's/.*name="([^.]*)\..*/\1/' regex.txt
From what I understand, first it looks for any characters up to name=" but we are not keeping that in a register.
Then comes ([^.]*)\..* which I could not understand. Interestingly, you use () for the register. I was thinking a register is written as \( \), so now I am confused: which way should a register be written?

Please help me understand the concept.

Thank you very much.

Cheers

---------- Post added 02-13-12 at 11:46 PM ----------

Thread is solved but I am trying to understand the concept.
 
Old 02-13-2012, 10:12 AM   #5
grail
LQ Guru
 
Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 9,247

Rep: Reputation: 2684
Using the -r switch with sed means you do not have to escape the parentheses when saving a register.

As for ([^.]*)\..* ... this is 2 parts:

1. ([^.]*) - This says to store everything (zero or more characters) that is not a period ... a caret, ^, as the first character inside square brackets negates what you are looking for. Also, a period, . , does not have to be escaped when inside square brackets to be accepted as a literal period.

2. \..* - This essentially matches everything from the period that we did not previously save until the end of the line.
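
The remark about the period inside square brackets can be checked quickly. A minimal sketch with a made-up input, 'a.b':

```shell
# Outside brackets an unescaped . matches any character, so the
# first character of the line is the one that gets replaced.
any_char=$(echo 'a.b' | sed 's/./X/')
echo "$any_char"               # X.b

# Inside brackets, or when escaped, the . is a literal period.
in_brackets=$(echo 'a.b' | sed 's/[.]/X/')
escaped=$(echo 'a.b' | sed 's/\./X/')
echo "$in_brackets $escaped"   # aXb aXb
```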

Hope that helps.

PS. If you look through DH's solution you should now be able to figure out his as well, as it is almost exactly the same.
 
1 members found this post helpful.
Old 02-14-2012, 06:20 AM   #6
sysmicuser
Member
 
Registered: Mar 2010
Posts: 332

Original Poster
Rep: Reputation: 0
Many thanks grail for your help, it is much appreciated.

One question though, as far as DH's solution goes.
DH says

sed 's@.*name="\([^.]\+\)\..*@\1@' regex.txt
As far as \..* is concerned, I got that one as you explained it nicely, but does [^.]\+ mean store one or more characters that are not a period? What does \+ mean?
 
Old 02-14-2012, 08:08 AM   #7
grail
LQ Guru
 
Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 9,247

Rep: Reputation: 2684
I am not sure why DH escaped the + which does mean one or more of the preceding pattern. It could just be a better safe than sorry touch
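
The practical difference between "zero or more" (*) and "one or more" (+) shows up when nothing precedes the period. A sketch assuming GNU sed's -r option and a made-up input line; the brackets mark the captured text:

```shell
line='name=".example"'

# * accepts an empty match, so the substitution succeeds
# with nothing captured.
star=$(echo "$line" | sed -r 's/.*name="([^.]*)\..*/[\1]/')
echo "$star"   # []

# + needs at least one non-period character, so nothing matches
# and sed prints the line unchanged.
plus=$(echo "$line" | sed -r 's/.*name="([^.]+)\..*/[\1]/')
echo "$plus"   # name=".example"
```

For the line in the original question both forms capture the same text, since there is at least one character before the first period.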
 
Old 02-14-2012, 12:25 PM   #8
Dark_Helmet
Senior Member
 
Registered: Jan 2003
Posts: 2,786

Rep: Reputation: 370
In short, sed will not match the expected pattern without the \+. Try it:
Code:
echo '<ias-instance id="SOA_BLD02_CAS.bld02cashost01.myorganization.com.au" name="SOA_BLD02_CAS.bld02cashost01.myorganization.com.au">'| sed 's@.*name="\([^.]+\)\..*@\1@'
On my system, that command re-prints the input--no modifications. In other words, sed saw there was no matching pattern for its substitution.

As I understand it, without the backslash to escape it, sed will interpret the + as a literal character to match--not a pattern modifier.

EDIT:
I ran some tests, and the unescaped + causes some odd results. It appears to act as both: a literal + and a pattern modifier at the same time.

For instance:
Code:
$ echo "D_Hnope" | sed 's@\([^n]+\).*@\1@'
D_Hnope
$ echo "D_H+nope" | sed 's@\([^n]+\).*@\1@'
D_H+
I'm sure there's some documentation out there that explains it. Though, there seem to be too many types/groups of regular expressions: basic shell regular expressions, extended, Perl, and who knows what else.
/EDIT
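
The two results in the EDIT are also consistent with the unescaped + being purely literal in a basic regular expression: [^n]+ would then mean "one non-n character followed by a plus sign". A sketch, assuming GNU sed, that brackets the captured group to show this:

```shell
# Only "H+" is captured: one non-n character and a literal +.
# "D_" is left alone because the match starts at the H.
literal=$(echo 'D_H+nope' | sed 's@\([^n]+\).*@[\1]@')
echo "$literal"      # D_[H+]

# With the GNU \+ quantifier the group takes every leading
# non-n character instead.
quantified=$(echo 'D_H+nope' | sed 's@\([^n]\+\).*@[\1]@')
echo "$quantified"   # [D_H+]
```

In the first command the leading "D_" sits outside the match, which is why the earlier output looked as if the + were doing two jobs at once.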

EDIT2:
One instance of my alleged "documentation out there" appears to be this page.
/EDIT2

This is a consequence of invoking sed without and with the -r option. This is the same reason why the parentheses are escaped in my expression--otherwise sed will want to match a literal open/close parenthesis.

Using '-r' tells sed to use "extended regular expressions." I won't pretend to know all the differences between them, but it appears that one key difference is that, with basic regular expressions, literal text is assumed for characters more often than not.
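
A side-by-side sketch of the two syntaxes, assuming GNU sed (the -r option and \+ are GNU features) and made-up inputs:

```shell
# Grouping and "one or more": escaped in basic, bare in extended.
bre=$(echo 'caaat' | sed 's/\(a\+\)/[\1]/')
ere=$(echo 'caaat' | sed -r 's/(a+)/[\1]/')
echo "$bre $ere"           # c[aaa]t c[aaa]t

# Literal parentheses: bare in basic, escaped in extended.
bre_lit=$(echo 'f(x)' | sed 's/(x)/[x]/')
ere_lit=$(echo 'f(x)' | sed -r 's/\(x\)/[x]/')
echo "$bre_lit $ere_lit"   # f[x] f[x]
```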

My preference is to have something to catch my attention in an expression if I'm doing something that is not a literal match. The escapes are that flag for me. But if you're the type of person that wants uncluttered expressions and prefers to escape the metacharacters to match their literal values, then the '-r' option is probably more your style.

Last edited by Dark_Helmet; 02-14-2012 at 02:33 PM.
 
1 members found this post helpful.
Old 02-15-2012, 01:16 AM   #9
chrism01
LQ Guru
 
Registered: Aug 2004
Location: Sydney
Distribution: Centos 6.8, Centos 5.10
Posts: 17,240

Rep: Reputation: 2324
Re Types of regex; as pointed out in this excellent (imho) book http://regex.info/, regex is just a concept and many (all?) tools have their own regex engine quirks.
The Perl regex engine is very powerful, so some langs/tools also have a 'pcre' option to make them work more like Perl...
YHBW
 
Old 02-16-2012, 09:21 AM   #10
sysmicuser
Member
 
Registered: Mar 2010
Posts: 332

Original Poster
Rep: Reputation: 0
Thanks DH. Maybe I need to revisit this again in a few days.

Cheers!
 
  


All times are GMT -5. The time now is 02:31 PM.
