LinuxQuestions.org

mwwynne 12-10-2012 10:35 PM

Python Regex Question
 
If I have a line of text with only 2 fields that I want to extract, how can I essentially ignore what is between the 2 fields, and only extract what I want?

eg.

in the line:

field1: ignore, Version: 150, field1: ignore, field 2, ignore, Thread: www.google.com

Thread: (can be any hostname or IP address)

All I want to extract is "Version: 150" and "Thread: www.google.com". I've been trying to find a way to do it with re.findall (and would prefer to do it that way if possible) but haven't been able to get it working.

Edit: I should mention that 150 and 20 are variable here, in case that wasn't obvious...

Any help is appreciated.

Thanks!

firstfire 12-10-2012 11:32 PM

Hi.

How about
Code:

import re
s = 'Version: 150, field1: ignore, field 2, ignore, Thread: 20'
# The greedy .* swallows everything between the first and the last comma
re.split(r',.*, *', s)  #=> ['Version: 150', 'Thread: 20']

Or if you prefer findall:
Code:

re.findall(r'(Version|Thread): (\d*)', s)  #=> [('Version', '150'), ('Thread', '20')]

mwwynne 12-10-2012 11:42 PM

Sorry, I forgot to add the fact that there is text in front of the first field I want to extract. I edited the original post to show what the line should look like.

firstfire 12-10-2012 11:59 PM

Code:

s='field1: ignore, Version: 150, field1: ignore, field 2, ignore, Thread: www.google.com'
re.findall('(Version|Thread): ([^ ,]+)', s)  #=> [('Version', '150'), ('Thread', 'www.google.com')]
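
A minimal sketch of how the result above might be used, assuming each label appears only once per line. Because the pattern has two groups, re.findall returns (label, value) tuples, and dict() can turn those into a lookup table. The names line, pairs and fields are just illustrative.

```python
import re

line = ('field1: ignore, Version: 150, field1: ignore, '
        'field 2, ignore, Thread: www.google.com')

# Two capture groups, so each match comes back as a (label, value) tuple.
# [^ ,]+ takes everything up to the next space or comma, so it accepts
# numbers, hostnames and dotted IP addresses alike.
pairs = re.findall(r'(Version|Thread): ([^ ,]+)', line)

# dict() converts the list of 2-tuples into a label -> value mapping.
fields = dict(pairs)

print(fields['Version'])  # 150
print(fields['Thread'])   # www.google.com
```

If a label can occur more than once on a line, the dict keeps only the last value, so in that case it is probably safer to work with the list of tuples directly.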


mwwynne 12-11-2012 12:04 AM

Thank you so much! I would +rep but I don't seem to be able to.

mwwynne 12-11-2012 12:05 AM

Any tips on learning regular expressions? Websites, tutorials, etc.?

firstfire 12-11-2012 12:58 AM

I'm sure you can find lots of regex-related tutorials online. Creating a particular regular expression often requires some trial and error, so I use little sed one-liners, for example
Code:

$ echo 'field1: ignore, Version: 150, field1: ignore, field 2, ignore, Thread: www.google.com' | sed -r 's/(Version|Thread): [^ ,]*/[&]/g'
field1: ignore, [Version: 150], field1: ignore, field 2, ignore, [Thread: www.google.com]

or use IPython if I need a Python solution. This way I can try different ideas and approaches very quickly. It is also very instructive to read the manual and info pages, which are probably already installed on your system: man sed, info sed, man awk, info gawk, man grep, man perlre, man perlretut (from the perl-doc package on Ubuntu), etc. Of course they cover different tools and languages, but the core regular expression syntax is largely the same across them.
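
For the Python side of that trial-and-error loop, here is a rough equivalent of the sed one-liner, offered as a sketch rather than the only way to do it. The helper name mark_matches is made up for illustration; it relies only on re.sub from the standard library and wraps each match in brackets, the way sed's [&] does.

```python
import re

def mark_matches(pattern, text):
    """Return text with every match of pattern wrapped in [ ]."""
    # m.group(0) is the whole match, the counterpart of & in sed.
    return re.sub(pattern, lambda m: '[' + m.group(0) + ']', text)

sample = ('field1: ignore, Version: 150, field1: ignore, '
          'field 2, ignore, Thread: www.google.com')

marked = mark_matches(r'(Version|Thread): [^ ,]*', sample)
print(marked)
# field1: ignore, [Version: 150], field1: ignore, field 2, ignore, [Thread: www.google.com]
```

Pasting the function into an interactive session and calling it with different patterns makes it easy to see at a glance what each attempt actually matches.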


All times are GMT -5.