LinuxQuestions.org
Programming This forum is for all programming questions.
The question does not have to be directly related to Linux and any language is fair game.

12-10-2012, 09:35 PM   #1
mwwynne
LQ Newbie
 
Registered: Aug 2012
Distribution: Ubuntu 12.04
Posts: 20

Rep: Reputation: Disabled
Python Regex Question


If I have a line of text with only 2 fields that I want to extract, how can I essentially ignore what is between the 2 fields, and only extract what I want?

E.g., in the line:

field1: ignore, Version: 150, field1: ignore, field 2, ignore, Thread: www.google.com

Thread: (can be any hostname or IP address)

All I want to extract is "Version: 150" and "Thread: www.google.com". I've been trying to find a way to do it with re.findall (and would prefer to do it that way if possible) but haven't been able to get it working.

Edit: I should mention that 150 and 20 are variable here, in case that wasn't obvious...

Any help is appreciated.

Thanks!

Last edited by mwwynne; 12-10-2012 at 10:55 PM. Reason: Correct input line
 
12-10-2012, 10:32 PM   #2
firstfire
Member
 
Registered: Mar 2006
Location: Ekaterinburg, Russia
Distribution: Debian, Ubuntu
Posts: 709

Rep: Reputation: 428
Hi.

How about
Code:
import re
s = 'Version: 150, field1: ignore, field 2, ignore, Thread: 20'
re.split(r',.*, *', s)  #=> ['Version: 150', 'Thread: 20']
Or if you prefer findall:
Code:
re.findall(r'(Version|Thread): (\d+)', s)  #=> [('Version', '150'), ('Thread', '20')]

Last edited by firstfire; 12-10-2012 at 10:36 PM.
 
12-10-2012, 10:42 PM   #3
mwwynne
LQ Newbie
 
Registered: Aug 2012
Distribution: Ubuntu 12.04
Posts: 20

Original Poster
Rep: Reputation: Disabled
Sorry, I forgot to add the fact that there is text in front of the first field I want to extract. I edited the original post to show what the line should look like.
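The leading text matters for the split approach suggested above: the pattern starts matching at the first comma in the line, so the Version field gets swallowed along with everything else in the middle. A quick check, assuming the edited input line from the first post:

```python
import re

s = 'field1: ignore, Version: 150, field1: ignore, field 2, ignore, Thread: www.google.com'

# The match begins at the first comma and .* is greedy, so it runs up to
# the last ", " in the line. "Version: 150" ends up inside the separator.
parts = re.split(r',.*, *', s)
print(parts)  # ['field1: ignore', 'Thread: www.google.com']
```

So with text in front of the first wanted field, matching the fields directly (findall) is probably a better fit than splitting on what lies between them.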
 
12-10-2012, 10:59 PM   #4
firstfire
Member
 
Registered: Mar 2006
Location: Ekaterinburg, Russia
Distribution: Debian, Ubuntu
Posts: 709

Rep: Reputation: 428
Code:
import re
s = 'field1: ignore, Version: 150, field1: ignore, field 2, ignore, Thread: www.google.com'
re.findall(r'(Version|Thread): ([^ ,]+)', s)  #=> [('Version', '150'), ('Thread', 'www.google.com')]
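Note that findall returns tuples here because the pattern has two capturing groups. If the whole strings "Version: 150" and "Thread: www.google.com" are wanted instead, a non-capturing group is one way to get them. A minimal sketch on the same input line (the variable names are only for illustration):

```python
import re

s = 'field1: ignore, Version: 150, field1: ignore, field 2, ignore, Thread: www.google.com'

# Two capturing groups: findall returns one (name, value) tuple per match,
# which converts directly into a dict.
pairs = re.findall(r'(Version|Thread): ([^ ,]+)', s)
fields = dict(pairs)
print(fields)  # {'Version': '150', 'Thread': 'www.google.com'}

# Non-capturing group (?:...): findall returns the whole matched text.
whole = re.findall(r'(?:Version|Thread): [^ ,]+', s)
print(whole)  # ['Version: 150', 'Thread: www.google.com']
```

Both variants assume the value never contains a space or a comma, since [^ ,]+ stops at the first one.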
 
12-10-2012, 11:04 PM   #5
mwwynne
LQ Newbie
 
Registered: Aug 2012
Distribution: Ubuntu 12.04
Posts: 20

Original Poster
Rep: Reputation: Disabled
Thank you so much! I would +rep but I don't seem to be able to.
 
12-10-2012, 11:05 PM   #6
mwwynne
LQ Newbie
 
Registered: Aug 2012
Distribution: Ubuntu 12.04
Posts: 20

Original Poster
Rep: Reputation: Disabled
Any tips on learning regular expressions? Websites, tutorials, etc.?
 
12-10-2012, 11:58 PM   #7
firstfire
Member
 
Registered: Mar 2006
Location: Ekaterinburg, Russia
Distribution: Debian, Ubuntu
Posts: 709

Rep: Reputation: 428
I'm sure you can find lots of regex-related tutorials online. Creating a particular regular expression often requires some trial and error, so I use little sed one-liners, for example:
Code:
$ echo 'field1: ignore, Version: 150, field1: ignore, field 2, ignore, Thread: www.google.com' | sed -r 's/(Version|Thread): [^ ,]*/[&]/g'
field1: ignore, [Version: 150], field1: ignore, field 2, ignore, [Thread: www.google.com]
or use ipython if I need a Python solution. This way I can try different ideas and approaches very quickly. It is also very instructive to read the manual and info pages, which are probably already installed on your system: man sed, info sed, man awk, info gawk, man grep, man perlre, man perlretut (from the perl-doc package on Ubuntu), etc. Of course they are all about different tools and languages, but the regular expression syntax is almost the same across them.
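For trying things out in ipython, a rough Python counterpart of the sed one-liner above is re.sub with \g<0>, which stands for the whole match much like & does in sed. A small sketch, assuming the same input line:

```python
import re

s = 'field1: ignore, Version: 150, field1: ignore, field 2, ignore, Thread: www.google.com'

# Wrap every match in brackets to see exactly what the pattern picks up.
# \g<0> in the replacement refers to the entire matched text.
marked = re.sub(r'(Version|Thread): [^ ,]*', r'[\g<0>]', s)
print(marked)
# field1: ignore, [Version: 150], field1: ignore, field 2, ignore, [Thread: www.google.com]
```

Changing the pattern and re-running the substitution gives quick visual feedback on what matches and what does not.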
 
  



Tags
python, regex, regular expression


All times are GMT -5.
