LinuxQuestions.org


hashbang#! 02-01-2011 07:24 AM

regular expression query
 
Sorry about the rather unspecific title.


A string can assume the following valid values:
"(a, b)" "(a)" "(b)"
Any ideas how I can express this as a regular expression that disallows empty brackets "()"?

(I am using Python.)

AlucardZero 02-01-2011 07:58 AM

If your only goal is to disallow empty brackets:
Code:

\([ab ,]+\)
If your goal is to only allow the stated values:
Code:

\((a, b|a|b)\)
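
A quick way to see the difference between the two patterns is to run both over a few candidate strings. This is only a sketch: the names `loose`, `strict` and `candidates` are made up for illustration, and it assumes Python 3.4 or later for `re.fullmatch`.

```python
import re

loose = re.compile(r"\([ab ,]+\)")        # anything non-empty built from a, b, space, comma
strict = re.compile(r"\((a, b|a|b)\)")    # only the three stated values

candidates = ["(a, b)", "(a)", "(b)", "()", "(b, a)", "(, )"]

# Map each candidate to (accepted by loose, accepted by strict).
table = {s: (bool(loose.fullmatch(s)), bool(strict.fullmatch(s))) for s in candidates}

for text, (is_loose, is_strict) in table.items():
    print("%-8s loose=%-5s strict=%s" % (text, is_loose, is_strict))
```

Both patterns reject "()", but the character class also lets through strings such as "(b, a)" and "(, )".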

Julian Andrews 02-01-2011 08:50 AM

Alucard's answer is exactly what I would do. The only thing I would add is that if you're using Python, you'll need to either provide the regex as a raw string or escape the backslashes. So, either
Code:

regex = re.compile(r"\((a, b|a|b)\)")
or
Code:

regex = re.compile("\\((a, b|a|b)\\)")
I mention this because it can be one of the more confusing features of Python's regex handling.
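
To make the raw-string point concrete, here is a small sketch. The variable names and test strings are illustrative, not from the thread, and `re.fullmatch` assumes Python 3.4 or later.

```python
import re

raw = r"\((a, b|a|b)\)"          # raw string: backslashes are kept as written
escaped = "\\((a, b|a|b)\\)"     # normal string: each backslash is doubled

# Both spellings produce exactly the same pattern text.
same_text = raw == escaped

regex_raw = re.compile(raw)
regex_escaped = re.compile(escaped)

# Each entry: (string, accepted by raw version, accepted by escaped version)
checks = [(s, bool(regex_raw.fullmatch(s)), bool(regex_escaped.fullmatch(s)))
          for s in ["(a, b)", "(a)", "()"]]
print(same_text, checks)
```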

hashbang#! 02-01-2011 09:23 AM

Quote:

Originally Posted by AlucardZero (Post 4244598)
Code:

\((a, b|a|b)\)

That's what I suspected. Not very pretty considering that 'a' and 'b' are regular expressions themselves:

Code:

#a
\d+\-\d+
#b
(\d\-)?\dpa\+?

example
(10-30, 4-5pa)
(10-30, 5pa+)
(3pa+)
(10-50)

And it will get really nasty once I try to extract the numbers, which is the ultimate objective.

hashbang#! 02-01-2011 10:33 AM

I think this one does it without repetition of "a" and "b":

Code:

regex = re.compile(r'(\(a(,|\)))?((\(| )b\))?\)?')
Julian, thanks for pointing out the raw strings.
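
For anyone trying this pattern, a quick check may be useful. It is only a sketch: it treats `a` and `b` as literal characters as in the post, the test strings are my own, and `re.fullmatch` assumes Python 3.4 or later.

```python
import re

regex = re.compile(r'(\(a(,|\)))?((\(| )b\))?\)?')

valid = ["(a, b)", "(a)", "(b)"]
other = ["()", "", "(a,)"]

# fullmatch: the whole string must be consumed by the pattern.
accepted = {s: bool(regex.fullmatch(s)) for s in valid + other}

# match: only anchors at the start, so an empty match counts as success.
prefix_match_on_empty_brackets = bool(regex.match("()"))

print(accepted)
print(prefix_match_on_empty_brackets)
```

Because every part of the pattern is optional, it accepts the three valid values but also the empty string and "(a,)". "()" is rejected only when the whole string has to match; with `re.match` it still succeeds on an empty prefix.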

Julian Andrews 02-02-2011 09:07 AM

You can also use Python's string formatting to make your expressions a little more readable:

Code:

import re

a = "Some RE string"
b = "Some other RE string"
# (?:...) keeps the literal brackets outside the alternation
regex = re.compile(r"\((?:%s, %s|%s|%s)\)" % (a, b, a, b))

Especially if your regex gets complicated and long, building individually testable components like a and b can help a lot.
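
As a rough illustration of the component approach, here is a sketch using the `a` and `b` sub-patterns from the earlier post. Two things are assumptions on my part: the opening parenthesis in `b` (the posted version was unbalanced), and the non-capturing group around the alternatives, which makes the literal brackets apply to every alternative. `re.fullmatch` assumes Python 3.4 or later.

```python
import re

# Sub-patterns modelled on the earlier post; b's optional "digit-" prefix
# is an assumed reading of the unbalanced original.
a = r"\d+-\d+"
b = r"(?:\d-)?\dpa\+?"

# Each component can be tested on its own before it is combined.
a_ok = bool(re.fullmatch(a, "10-30"))
b_ok = bool(re.fullmatch(b, "4-5pa"))

# (?:...) groups the alternatives so both brackets belong to all of them.
regex = re.compile(r"\((?:%s, %s|%s|%s)\)" % (a, b, a, b))

samples = ["(10-30, 4-5pa)", "(10-30, 5pa+)", "(3pa+)", "(10-50)", "()"]
matches = [bool(regex.fullmatch(s)) for s in samples]
print(list(zip(samples, matches)))
```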

hashbang#! 02-02-2011 09:47 AM

Quote:

Originally Posted by Julian Andrews (Post 4245887)
You can also use Python's string formatting to make your expressions a little more readable:

Code:

import re

a = "Some RE string"
b = "Some other RE string"
# (?:...) keeps the literal brackets outside the alternation
regex = re.compile(r"\((?:%s, %s|%s|%s)\)" % (a, b, a, b))


I had already done that using str.replace(). But this is so much neater! I am still new to Python (not to programming) and it's great when you pick up wee nuggets of information like that in passing.

Quote:

Originally Posted by Julian Andrews (Post 4245887)
Especially if your regex gets complicated and long, building individually testable components like a and b can help a lot.

My little snippet above was just one of seven elements of a complex regular expression, from which I am extracting 12 named groups:

Code:

re_element = r'(?:(?:%s)(?: |\Z))?'
re_template = '^' + re_element * 6 + r'(?:(?:%s))?' + '$'
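
To show how the template gets filled in, here is a sketch with single letters standing in for the seven element patterns. The letters and the test strings are placeholders for illustration only, not the real elements.

```python
import re

re_element = r'(?:(?:%s)(?: |\Z))?'
re_template = '^' + re_element * 6 + r'(?:(?:%s))?' + '$'

# Hypothetical stand-ins for the seven real element patterns.
elements = ("A", "B", "C", "D", "E", "F", "G")
regex = re.compile(re_template % elements)

# Elements are optional but must appear in order, separated by single spaces.
in_order = regex.match("A C G") is not None
ends_early = regex.match("A B") is not None
out_of_order = regex.match("B A") is not None
print(in_order, ends_early, out_of_order)
```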

In the end I fudged my "(m, n)" element to aid duplicating my named groups (m and n each consist of 2 groups).
Code:

regex = re.compile(r'\(m?(, )?n?\)')
I might end up with empty brackets but it's not too much of a problem as long as I can extract the values I am after. The string I am parsing is auto-generated, so should be pretty well formed.
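
A sketch of how this fudge might look with named groups filled in. The sub-patterns and group names (`m_low`, `m_high`, `n_low`, `n_high`) are hypothetical, modelled on the `a` and `b` examples earlier in the thread, and `re.fullmatch` assumes Python 3.4 or later.

```python
import re

# Hypothetical sub-patterns, each wrapped in (?:...) so a trailing "?" applies to the whole thing.
m = r"(?:(?P<m_low>\d+)-(?P<m_high>\d+))"
n = r"(?:(?:(?P<n_low>\d)-)?(?P<n_high>\d)pa\+?)"

# Each named group appears only once, so the pattern compiles.
# The trade-off is that "()" is accepted as well.
regex = re.compile(r"\(%s?(, )?%s?\)" % (m, n))

def extract(text):
    """Return the named groups as a dict, or None if text does not match."""
    match = regex.fullmatch(text)
    return match.groupdict() if match else None

print(extract("(10-30, 4-5pa)"))
print(extract("(3pa+)"))
print(extract("()"))
```

Groups that did not take part in the match come back as None, so empty brackets simply yield a dict with no values.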

Just ran this monster over 340,000 records and it does the trick.

