LinuxQuestions.org
Old 02-01-2011, 07:24 AM   #1
hashbang#!
Member
 
Registered: Aug 2009
Location: soon to be independent Scotland
Distribution: Debian
Posts: 120

Rep: Reputation: 17
regular expression query


Sorry about the rather unspecific title.


A string can assume the following valid values:
"(a, b)" "(a)" "(b)"
Any ideas how I can express this as a regular expression that disallows empty brackets "()"?

(I am using Python.)
 
Old 02-01-2011, 07:58 AM   #2
AlucardZero
Senior Member
 
Registered: May 2006
Location: USA
Distribution: Debian
Posts: 4,824

Rep: Reputation: 615
If your only goal is to disallow empty brackets:
Code:
\([ab ,]+\)
If your goal is to only allow the stated values:
Code:
\((a, b|a|b)\)
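A minimal sketch of how the two patterns differ in practice. It assumes Python 3.4+ (for re.fullmatch), and the sample strings are made up for illustration:

```python
import re

# Pattern 1: any non-empty mix of 'a', 'b', space and comma inside brackets.
loose = re.compile(r"\([ab ,]+\)")
# Pattern 2: exactly the three stated values.
strict = re.compile(r"\((a, b|a|b)\)")

samples = ["(a, b)", "(a)", "(b)", "()", "(b, a)", "(,)"]

# fullmatch() requires the whole string to match, not just a prefix.
loose_ok = [s for s in samples if loose.fullmatch(s)]
strict_ok = [s for s in samples if strict.fullmatch(s)]

print(loose_ok)   # the three valid values plus "(b, a)" and "(,)"
print(strict_ok)  # only the three valid values
```

Both reject "()", but the character class also lets through strings such as "(,)" that were never listed as valid.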
 
Old 02-01-2011, 08:50 AM   #3
Julian Andrews
LQ Newbie
 
Registered: Jan 2011
Distribution: Ubuntu
Posts: 21

Rep: Reputation: 13
AlucardZero's answer is exactly what I would do. The only thing I would add is that if you're using Python, you'll need to either write the regex as a raw string or escape the backslashes. So, either
Code:
regex = re.compile(r"\((a, b|a|b)\)")
or
Code:
regex = re.compile("\\((a, b|a|b)\\)")
I mention this because it can be one of the more confusing features of Python's regex handling.
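A quick way to convince yourself that the two spellings are the same string (a small sketch; the variable names are only for illustration):

```python
import re

raw = r"\((a, b|a|b)\)"
escaped = "\\((a, b|a|b)\\)"

# Both literals produce the same string, so they compile to the same pattern.
same = raw == escaped
print(same)

# In either spelling, the escaped bracket is two characters: a backslash and '('.
print(len(r"\("), len("\\("))

matched = re.compile(raw).fullmatch("(a, b)") is not None
print(matched)
```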
 
Old 02-01-2011, 09:23 AM   #4
hashbang#!
Member
 
Registered: Aug 2009
Location: soon to be independent Scotland
Distribution: Debian
Posts: 120

Original Poster
Rep: Reputation: 17
Quote:
Originally Posted by AlucardZero
Code:
\((a, b|a|b)\)
That's what I suspected. Not very pretty considering that 'a' and 'b' are regular expressions themselves:

Code:
# a
\d+\-\d+
# b
(\d\-)?\dpa\+?
Examples:
(10-30, 4-5pa)
(10-30, 5pa+)
(3pa+)
(10-50)

And it will get really nasty once I try to extract the numbers, which is the ultimate objective.

Last edited by hashbang#!; 02-01-2011 at 09:31 AM.
 
Old 02-01-2011, 10:33 AM   #5
hashbang#!
Member
 
Registered: Aug 2009
Location: soon to be independent Scotland
Distribution: Debian
Posts: 120

Original Poster
Rep: Reputation: 17
I think this one does it without repetition of "a" and "b":

Code:
regex = re.compile(r'(\(a(,|\)))?((\(| )b\))?\)?')
Julian, thanks for pointing out the raw strings.
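
As a rough check of the pattern above, here is a sketch that runs it against a few strings. It assumes Python 3.4+ for re.fullmatch, uses literal 'a' and 'b' in place of the real sub-expressions, and the helper name accepts is made up:

```python
import re

# The pattern from this post, with literal 'a' and 'b' standing in
# for the real sub-expressions.
regex = re.compile(r'(\(a(,|\)))?((\(| )b\))?\)?')

def accepts(s):
    """Return True if the whole string matches the pattern."""
    return regex.fullmatch(s) is not None

for s in ["(a, b)", "(a)", "(b)", "()", "", "(a,"]:
    print(repr(s), accepts(s))
```

The three valid values match and "()" is rejected. Because every part is optional, the empty string and a truncated "(a," also match, which may or may not matter depending on how well formed the input is.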
 
Old 02-02-2011, 09:07 AM   #6
Julian Andrews
LQ Newbie
 
Registered: Jan 2011
Distribution: Ubuntu
Posts: 21

Rep: Reputation: 13
You can also use Python's string formatting to make your expressions a little more readable:

Code:
import re

a = "Some RE string"
b = "Some other RE string"
# The inner group keeps the alternation inside the escaped brackets.
regex = re.compile("\\((%s, %s|%s|%s)\\)" % (a, b, a, b))
Especially if your regex gets complicated and long, building individually testable components like a and b can help a lot.
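For instance, here is a sketch using the 'a' and 'b' sub-expressions posted earlier in the thread. It assumes Python 3.4+ for re.fullmatch, and it swaps in a non-capturing (?:...) group for b's optional prefix, which is a choice made here and not something from the original post:

```python
import re

# Components taken from the earlier post; (?:...) keeps b's optional
# prefix from creating an extra capturing group.
a = r"\d+-\d+"
b = r"(?:\d-)?\dpa\+?"

# Each component can be tested on its own.
assert re.fullmatch(a, "10-30")
assert re.fullmatch(b, "4-5pa")
assert re.fullmatch(b, "3pa+")

# The group around the alternation keeps the brackets outside of it.
regex = re.compile(r"\((?:%s, %s|%s|%s)\)" % (a, b, a, b))

examples = ["(10-30, 4-5pa)", "(10-30, 5pa+)", "(3pa+)", "(10-50)"]
results = [bool(regex.fullmatch(s)) for s in examples]
empty_ok = bool(regex.fullmatch("()"))
print(results, empty_ok)
```

All four example strings match and "()" does not.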
 
Old 02-02-2011, 09:47 AM   #7
hashbang#!
Member
 
Registered: Aug 2009
Location: soon to be independent Scotland
Distribution: Debian
Posts: 120

Original Poster
Rep: Reputation: 17
Quote:
Originally Posted by Julian Andrews
You can also use Python's string formatting to make your expressions a little more readable:

Code:
import re

a = "Some RE string"
b = "Some other RE string"
# The inner group keeps the alternation inside the escaped brackets.
regex = re.compile("\\((%s, %s|%s|%s)\\)" % (a, b, a, b))
I had already done that using str.replace(). But this is so much neater! I am still new to Python (not to programming) and it's great when you pick up wee nuggets of information like that in passing.

Quote:
Originally Posted by Julian Andrews
Especially if your regex gets complicated and long, building individually testable components like a and b can help a lot.
My little snippet above was just one of seven elements of a complex regular expression from which I am extracting 12 named groups:

Code:
re_element = r'(?:(?:%s)(?: |\Z))?'
re_template = '^' + re_element * 6 + r'(?:(?:%s))?' + '$'
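To show how the template expands, here is a sketch that fills the seven slots with placeholder element patterns. The element patterns, group names f0 to f6, and the sample string are hypothetical and stand in for the real ones:

```python
import re

re_element = r'(?:(?:%s)(?: |\Z))?'
re_template = '^' + re_element * 6 + r'(?:(?:%s))?' + '$'

# Seven %s slots in total: six repeated elements plus the final one.
slots = re_template.count('%s')

# Hypothetical element patterns, one named group each, purely for illustration.
elements = tuple(r'(?P<f%d>[a-z]%d)' % (i, i) for i in range(7))
regex = re.compile(re_template % elements)

# Elements are optional, so any subset may appear as long as order is kept.
m = regex.match('a0 a2 a6')
found = {k: v for k, v in m.groupdict().items() if v is not None}
print(slots, found)
```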
In the end I fudged my "(m, n)" element to avoid duplicating my named groups (m and n each consist of 2 groups).
Code:
regex = re.compile(r'\(m?(, )?n?\)')
I might end up with empty brackets but it's not too much of a problem as long as I can extract the values I am after. The string I am parsing is auto-generated, so should be pretty well formed.
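
Roughly, the extraction looks like the sketch below. The group names (m_lo, m_hi, n_lo, n_hi) and the helper extract are made up for illustration, the sub-patterns are modelled on the 'a' and 'b' examples earlier in the thread, and re.fullmatch assumes Python 3.4+:

```python
import re

# Hypothetical named sub-patterns, modelled on the 'a' and 'b' examples
# earlier in the thread; each one contributes two named groups.
m = r'(?P<m_lo>\d+)-(?P<m_hi>\d+)'
n = r'(?:(?P<n_lo>\d)-)?(?P<n_hi>\d)pa\+?'

# Every part is optional, so each named group appears only once in the
# pattern (Python's re rejects duplicate group names).
regex = re.compile(r'\((?:%s)?(?:, )?(?:%s)?\)' % (m, n))

def extract(s):
    """Return the named groups that matched, or None if s does not match."""
    match = regex.fullmatch(s)
    if match is None:
        return None
    return {k: v for k, v in match.groupdict().items() if v is not None}

for s in ['(10-30, 4-5pa)', '(3pa+)', '(10-50)', '()']:
    print(s, extract(s))
```

The empty brackets match but yield no values, which is the trade-off described above.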

Just ran this monster over 340,000 records and it does the trick.

Last edited by hashbang#!; 02-02-2011 at 10:07 AM.
 
  

