LinuxQuestions.org
Old 03-01-2009, 05:02 PM   #1
donnied
Member
 
Registered: Oct 2006
Distribution: Debian x64
Posts: 197

Rep: Reputation: 30
Python scriptutil for patterns that don't match


I was trying to think of a way to use scriptutil.freplace to delete lines that don't match a pattern. For the moment I've had to settle for a short if/then statement. However, I feel I would be a lot better off if I knew how to do a 'does not match' search.
I'm having a small problem with parentheses and brackets.
I have: ([0-9]{6}),([0-9]{4}[A-Z]{0,2}),([0-9]{1,3}),(.*?),(.*?),([0-9]{1,3}),(.*?),(.*)
I would like to eliminate lines that don't fit the pattern, for example those that don't have all eight comma-delimited fields.
What about something like
[^(.*?,)(.*?,)(.*?,)(.*?,)(.*?,)(.*?,)(.*?,)(.*?,)(.*?)] ?
Or how would I include the fields I've already specified:
([0-9]{6}),([0-9]{4}[A-Z]{0,2}),([0-9]{1,3}) etc?
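
A note on the syntax asked about above: square brackets in a regex form a character class, so [^...] means "one character that is not in this set" and cannot negate a whole pattern. Two common ways to get "does not match" in plain Python are to test each line with the re module and keep or drop it, or to wrap the pattern in a negative lookahead. The sketch below uses only the standard re module, not scriptutil (whose freplace signature is not shown in this thread), and the sample lines are invented for illustration.

```python
import re

# The eight-field pattern quoted in the post above.
RECORD = re.compile(
    r"([0-9]{6}),([0-9]{4}[A-Z]{0,2}),([0-9]{1,3}),"
    r"(.*?),(.*?),([0-9]{1,3}),(.*?),(.*)"
)

# A regex that matches only lines that do NOT fit RECORD, built with a
# negative lookahead: (?!...) succeeds when the inner pattern fails.
NOT_RECORD = re.compile(r"^(?!(?:" + RECORD.pattern + r")$).*$")


def keep_matching(lines):
    """Return the lines that fit RECORD from start to end."""
    return [line for line in lines if RECORD.fullmatch(line)]


def non_matching(lines):
    """Return the lines that NOT_RECORD matches, i.e. the rejects."""
    return [line for line in lines if NOT_RECORD.match(line)]


# Hypothetical sample data, not taken from the original poster's file.
sample = [
    "123456,1234AB,12,foo,bar,7,baz,qux",   # fits the pattern
    "123456,1234AB,12,foo,bar",             # too few fields
    "123456,1234AB,12,foo,bar,x,baz,qux",   # sixth field is not a number
]

print(keep_matching(sample))
print(non_matching(sample))
```

If the replace function accepts any regex, a pattern like NOT_RECORD could in principle be used to replace rejected lines with an empty string, but whether freplace behaves that way is an assumption to check.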
 
Old 03-03-2009, 02:19 AM   #2
ghostdog74
Senior Member
 
Registered: Aug 2006
Posts: 2,695
Blog Entries: 5

Rep: Reputation: 237
You don't have to make it that complicated if you are using Python. Show samples of the text you want to match, and describe more clearly what you want to get.
 
Old 03-03-2009, 06:29 PM   #3
donnied
Member
 
Registered: Oct 2006
Distribution: Debian x64
Posts: 197

Original Poster
Rep: Reputation: 30
Here I'm thinking specifically that I don't want lines that don't have eight different fields separated by commas, or possibly lines where the sixth field is not a number.
I think knowing how to state [^foo] could be helpful.

I'm also curious why if I specify seven fields
(.*?),(.*?),(.*?),(.*?),(.*?),(.*?),(.*?) and replace with \1\2\3\4\5\6, the 7th field is still tacked on. To delete the 7th field I used the back references to insert foo between \6 and \7 and then deleted what came after foo. This seems a bit unnecessary, and I don't remember scriptutil always behaving this way.
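
A likely explanation, assuming scriptutil.freplace hands the pattern to Python's re.sub more or less unchanged (the thread does not show its internals): the last (.*?) is non-greedy and nothing follows it, so it is satisfied by matching zero characters. The match therefore ends right after the sixth comma, and the seventh field is never part of the matched text, so it is left in place after the replacement. Anchoring the pattern with $ forces the last group to run to the end of the line. The sample line below is made up.

```python
import re

line = "a,b,c,d,e,f,g"  # hypothetical seven-field line

seven = r"(.*?),(.*?),(.*?),(.*?),(.*?),(.*?),(.*?)"

# Without an anchor the last lazy group matches the empty string, so the
# match stops after the sixth comma and "g" is left untouched.
unanchored = re.sub(seven, r"\1\2\3\4\5\6", line)

# With "$" the last group has to extend to the end of the line, so the
# seventh field is consumed by the match and dropped by the replacement.
anchored = re.sub(seven + "$", r"\1\2\3\4\5\6", line)

print(unanchored)  # abcdefg
print(anchored)    # abcdef
```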
 
Old 03-03-2009, 09:03 PM   #4
beiller
LQ Newbie
 
Registered: Nov 2008
Posts: 22

Rep: Reputation: 15
Hey There

Dunno why you would use (.*?),(.*?)

* and ? are both modifiers. (.*),(.*) is more accurate, as the * is the Kleene star. It means 0 or more, which kind of implies it's optional (?). Correct me if I am wrong...
 
Old 03-04-2009, 04:10 AM   #5
yassen
LQ Newbie
 
Registered: Sep 2008
Posts: 3

Rep: Reputation: 0
donnied:
Here's some advice (not a direct reply to your question, but you might find this VERY useful, as I did):

Download this regexp editor-tester app (QuickREx):

http://sourceforge.net/projects/quickrex/

If you do not have/use eclipse, download the stand-alone application. You need to have Java installed to get that running, and possibly the java ./bin/ directory added to your PATH.

If you get it running, here comes the fun part: paste some test lines of your input data and type a regular expression into the corresponding text field; it will immediately show you in real time whether it matches, what the groups are, etc. Works great for me, and the "JDK regexp" seems to completely match the Python regexp behavior.

Hope this will help you,
Cheers!
yassen
 
Old 03-04-2009, 04:35 AM   #6
yassen
LQ Newbie
 
Registered: Sep 2008
Posts: 3

Rep: Reputation: 0
And also, how about skipping ('continue' in the loop) lines that have line.count(',') != 8?
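
A minimal sketch of that loop, with one caveat: eight fields are separated by seven commas, so the count to test against would be 7 rather than 8 (assuming no field contains a comma itself). Splitting the line gives the same check and also makes it easy to reject lines whose sixth field is not a number. The function name and sample lines are invented for illustration.

```python
def eight_field_lines(lines):
    """Keep lines with exactly eight comma-separated fields
    whose sixth field consists only of digits."""
    kept = []
    for line in lines:
        fields = line.rstrip("\n").split(",")
        if len(fields) != 8:          # same as line.count(",") != 7
            continue
        if not fields[5].isdigit():   # sixth field must be a number
            continue
        kept.append(line)
    return kept


# Hypothetical sample data.
sample = [
    "123456,1234AB,12,foo,bar,7,baz,qux",
    "123456,1234AB,12,foo,bar",
    "123456,1234AB,12,foo,bar,x,baz,qux",
]

print(eight_field_lines(sample))
```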
 
Old 03-04-2009, 05:52 PM   #7
donnied
Member
 
Registered: Oct 2006
Distribution: Debian x64
Posts: 197

Original Poster
Rep: Reputation: 30
Quote:
Originally Posted by beiller
Dunno why you would use (.*?),(.*?)

* and ? are both modifiers. (.*),(.*) is more accurate, as the * is the Kleene star. It means 0 or more, which kind of implies it's optional (?). Correct me if I am wrong...
It's my understanding that the '?' makes the regex 'non-greedy', so it limits the match to the first occurrence of a pattern. If there is a comma, .*? will stop at the first one and not go beyond it, whereas .* will take everything up to the last comma (including the other commas along the way).
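
The difference is easy to see with Python's re module. One thing worth adding: a non-greedy .*? only prefers the shortest match, and it will still run past a comma if that is the only way the rest of the pattern can succeed. A class like [^,]* is what actually forbids commas inside a field. The strings below are made up for the demonstration.

```python
import re

line = "a,b,c"

# Greedy: the first group grabs as much as it can, up to the last comma.
greedy = re.match(r"(.*),(.*)", line).groups()

# Non-greedy: the first group stops at the first comma.
lazy = re.match(r"(.*?),(.*)", line).groups()

# Non-greedy is a preference, not a rule: here the first group is forced
# past the first comma because the second field must be digits.
forced = re.fullmatch(r"(.*?),(\d+)", "a,b,3").groups()

# [^,]* can never contain a comma, so the same input is rejected.
strict = re.fullmatch(r"([^,]*),(\d+)", "a,b,3")

print(greedy)  # ('a,b', 'c')
print(lazy)    # ('a', 'b,c')
print(forced)  # ('a,b', '3')
print(strict)  # None
```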
 
Old 03-04-2009, 05:53 PM   #8
donnied
Member
 
Registered: Oct 2006
Distribution: Debian x64
Posts: 197

Original Poster
Rep: Reputation: 30
Quote:
Originally Posted by yassen
And also, how about skipping ('continue' in the loop) lines that have line.count(',') != 8?
Yeah, that's pretty much what I did. I was hoping to skip the 'if,then' loop.
 
Old 03-05-2009, 12:06 PM   #9
beiller
LQ Newbie
 
Registered: Nov 2008
Posts: 22

Rep: Reputation: 15
Match all 8 comma separated fields

Yes, I realized that ? is non-greedy. Doh.

maybe ^.*?(,.*?){7}$

Last edited by beiller; 03-05-2009 at 12:14 PM.
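
A quick check of that suggestion in Python: because . also matches a comma, the pattern accepts any line with at least seven commas, so a nine-field line gets through as well. If the fields themselves never contain commas (an assumption about the data), replacing .*? with [^,]* gives an exactly-eight-fields test. The names LOOSE and STRICT and the sample lines are hypothetical.

```python
import re

LOOSE = re.compile(r"^.*?(,.*?){7}$")       # pattern as posted above
STRICT = re.compile(r"^[^,]*(,[^,]*){7}$")  # variant: no commas inside a field

seven = "a,b,c,d,e,f,g"
eight = "a,b,c,d,e,f,g,h"
nine = "a,b,c,d,e,f,g,h,i"

for name, text in (("seven", seven), ("eight", eight), ("nine", nine)):
    # bool() turns the match object (or None) into True/False.
    print(name, bool(LOOSE.match(text)), bool(STRICT.match(text)))
```

Lines that fail STRICT.match would then be the ones to delete.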
 
  


Tags
dont, expressions, match, python, regular