LinuxQuestions.org
Old 12-19-2015, 06:23 PM   #1
call_krushna
Member
 
Registered: Aug 2007
Location: India
Distribution: Ubuntu
Posts: 173

Rep: Reputation: 1
Need a regex to find a string using Python


Hi Team,

I have a file with the contents below.

test.txt
------------------------------------------------------
obId>TX-4440</jobId><user>sysadmin</user><started>2014-10-02T19:18:41.000Z</started><status>FINISHED</status><type>EXPORT</type><priority>MEDIUM</priority></job><job><jobId>TX-4445</jobId><user>sysadmin</user><started>2014-10-02T19:17:42.000Z</started><status>FINISHED</status><type>EXPORT</type><priority>MEDIUM</priority></job><job><jobId>TX-4455</jobId><user>sysadmin</user><started>2014-10-03T06:37:49.000Z</started><status>FINISHED</status><type>EXPORT</type><priority>MEDIUM</priority></job><job><jobId>TX-4456</jobId><user>sysadmin</user><started>2014-10-03T06:38:06.000Z</started><status>FINISHED</status><type>EXPORT</type><priority>MEDIUM</priority></job><job><jobId>TX-4458</jobId><user>sysadmin</user><started>2014-10-03T06:55:41.000Z</started><status>FINISHED</status><type>EXPORT</type><priority>MEDIUM</priority></job><job><jobId>TX-4922</jobId><user>sysadmin</user><started>2014-10-05T00:39:40.000Z</started><status>FINISHED</status><type>EXPORT</type><priority>MEDIUM</priority></job><job><jobId>TX-5020</jobId>

----------------------- file ends here ------------------------



I need a regex to find all the job IDs that start with TX (e.g. TX-4440, TX-4456).

I tried it the following way:

reg = re.compile('[TX]+-+[0-9]+')

reg.search(mystring).group()


I am able to get the very first value, i.e. TX-4440, but I need all the values as a list.


Any help is highly appreciated.

Thanks & Regards
 
Old 12-19-2015, 07:32 PM   #2
norobro
Member
 
Registered: Feb 2006
Distribution: Debian Sid
Posts: 792

Rep: Reputation: 331
This example in the python docs should get you started.
 
Old 12-19-2015, 11:43 PM   #3
grail
LQ Guru
 
Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 10,005

Rep: Reputation: 3191
You might also wish to check your current regex, as it will also match the following: TXTXTXTX-----0

I'm guessing you would not wish to match such a combination.
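
To see why, note that [TX] is a character class meaning "either T or X", not the literal string "TX", and -+ allows any number of hyphens. A quick sketch comparing the original pattern with a tighter one (the sample strings and the names loose and strict are made up for illustration):

```python
import re

# Pattern from the original post: [TX] is a character class (T or X),
# not the literal "TX", and -+ accepts one or more hyphens.
loose = re.compile(r'[TX]+-+[0-9]+')
# Tighter alternative: literal "TX", exactly one hyphen, one or more digits.
strict = re.compile(r'TX-[0-9]+')

samples = ['TX-4440', 'TXTXTXTX-----0', 'X-1', 'T--99']  # made-up test strings

for s in samples:
    # fullmatch() succeeds only if the whole string fits the pattern
    print(s, bool(loose.fullmatch(s)), bool(strict.fullmatch(s)))
```

The loose pattern accepts all four samples, while the strict one accepts only TX-4440.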
 
Old 12-20-2015, 12:45 AM   #4
dugan
LQ Guru
 
Registered: Nov 2003
Location: Canada
Distribution: distro hopper
Posts: 11,219

Rep: Reputation: 5309
Code:
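import re

# Read the whole file into one string
with open('test.txt') as f:
    text = f.read()

# findall returns every match as a list; the parentheses keep only the ID
print(re.findall(r'>(TX-[0-9]{4})<', text))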
That said, did you consider fixing the XML file so that it's proper XML, and then using an XML parser? The Python standard library includes one. Most people would use lxml.
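
For what it's worth, here is a rough sketch of the parser route using xml.etree.ElementTree from the standard library. It assumes the file has already been repaired into well-formed XML. The <jobs> root element and the shortened two-record sample are assumptions, since the posted file is only a fragment.

```python
import xml.etree.ElementTree as ET

# Hypothetical repaired document: the <jobs> root element is an assumption,
# and only two of the posted <job> records are reproduced here.
xml_text = (
    '<jobs>'
    '<job><jobId>TX-4440</jobId><user>sysadmin</user><status>FINISHED</status></job>'
    '<job><jobId>TX-4445</jobId><user>sysadmin</user><status>FINISHED</status></job>'
    '</jobs>'
)

root = ET.fromstring(xml_text)
# iter('jobId') walks the whole tree and yields every <jobId> element
job_ids = [el.text for el in root.iter('jobId')
           if el.text and el.text.startswith('TX-')]
print(job_ids)
```

With a parser you select elements by name, so the result does not depend on what characters happen to surround the ID.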
 
Old 12-23-2015, 07:43 PM   #5
call_krushna
Member
 
Registered: Aug 2007
Location: India
Distribution: Ubuntu
Posts: 173

Original Poster
Rep: Reputation: 1
Thanks Dugan,

That re code quickly resolved the issue. Can you explain what the code below is actually doing?

re.findall('>(TX-[0-9]{4})<', text)

Thanks and regards
 
Old 12-23-2015, 07:51 PM   #6
dugan
LQ Guru
 
Registered: Nov 2003
Location: Canada
Distribution: distro hopper
Posts: 11,219

Rep: Reputation: 5309
Quote:
Originally Posted by call_krushna
Thanks Dugan,

That re code quickly resolved the issue. Can you explain what the code below is actually doing?

re.findall('>(TX-[0-9]{4})<', text)

Thanks and regards
Well, I assume you can look up what "re.findall" does. Don't dispel that assumption.

I saw that all of the instances you wanted were ">TX-####<": a tag-closing angle bracket, the constant "TX-", 4 digits, and a tag-opening angle bracket. So we're searching for strings between ">" and "<". Hence: ">(...)<". The parentheses specify a group, and when a pattern has a group, re.findall returns only what the group matched. So inside the group, I have "TX-". That's a constant. It is followed by "[0-9]", a digit, followed by "{4}", which repeats that digit 4 times.
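
A small sketch of the difference the parentheses make. The sample string is a shortened piece of the posted file, and the variable names are only for illustration:

```python
import re

text = '<jobId>TX-4445</jobId><user>sysadmin</user><jobId>TX-4455</jobId>'

# No group: findall returns the whole match, angle brackets included
whole = re.findall(r'>TX-[0-9]{4}<', text)
print(whole)

# One group: findall returns only what the parentheses captured
captured = re.findall(r'>(TX-[0-9]{4})<', text)
print(captured)
```

The ">" and "<" still have to be present for a match, but they are left out of the returned strings.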

See how unwise it is to ask someone to "explain" a regular expression?

EDIT: it's unwise because it's really difficult to explain them in a way that's easy to follow and that doesn't come off as word salad.

Last edited by dugan; 12-24-2015 at 12:57 PM. Reason: The last line needed an explanation
 
Old 12-23-2015, 08:23 PM   #7
call_krushna
Member
 
Registered: Aug 2007
Location: India
Distribution: Ubuntu
Posts: 173

Original Poster
Rep: Reputation: 1
Hi Dugan,

Many thanks for the explanation. We are using {4}, which matches exactly 4 digits. It won't work if the ID has 5 digits, e.g. TX-12345. Is there any way we can search for

i) digits repeated any number of times, something like {*}

ii) digits repeated 4 or more times, something like {4+}

Thanks and regards
 
Old 12-24-2015, 02:32 AM   #8
grail
LQ Guru
 
Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 10,005

Rep: Reputation: 3191
{4,} will allow for 4 or more, assuming you do not want fewer than 4; if you do, you would just use * (zero or more) or + (one or more).
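
As a quick sketch, here are the four quantifiers side by side. The IDs in the sample string are made up to cover different lengths, and the patterns dictionary is just for illustration:

```python
import re

# Made-up IDs of varying length, including one with no digits at all
text = '>TX-1<>TX-123<>TX-4440<>TX-12345<>TX-<'

patterns = {
    '{4}': r'>(TX-[0-9]{4})<',    # exactly 4 digits
    '{4,}': r'>(TX-[0-9]{4,})<',  # 4 or more digits
    '+': r'>(TX-[0-9]+)<',        # 1 or more digits
    '*': r'>(TX-[0-9]*)<',        # 0 or more digits (also matches a bare "TX-")
}

results = {name: re.findall(p, text) for name, p in patterns.items()}
for name, found in results.items():
    print(name, found)
```

Note that * also accepts "TX-" with no digits at all, so + is usually the safer choice when at least one digit is expected.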
 
Old 12-24-2015, 02:34 AM   #9
call_krushna
Member
 
Registered: Aug 2007
Location: India
Distribution: Ubuntu
Posts: 173

Original Poster
Rep: Reputation: 1
Many thanks, Grail.

Thanks and Regards
 
  

