LinuxQuestions.org
Programming: This forum is for all programming questions.
The question does not have to be directly related to Linux and any language is fair game.

09-13-2007, 04:33 AM   #1
baddah
Member
Registered: Feb 2006
Location: Cape Town,South Africa
Distribution: Fedora Core 8
Posts: 188

Rep: Reputation: 30
Perl RegExpr Find Best Matches


Hi,

I have the following problem with a Perl script I'm working on.
Say I have a code 002577.

I want to match it against a large number of regular expressions and find the best match. Say the best matches are the following:

002577[6-9][0-5]
00257795[5-9]
002577[6-9][6-9]

My problem is that 002577 has fewer digits than each of these three regular expressions. Thus something like

Code:
"002577" =~ /^002577[6-9][0-5]/
"002577" =~ /^00257795[5-9]/
"002577" =~ /^002577[6-9][6-9]/

does not match in any case. I would like my script to realize that 002577 matches the first part of each regular expression and return the number of digits that match.

I have a general script that matches codes against a lot of regular expressions. Most of the codes I want to match are longer than the regular expressions, so it mostly works, but in cases like the one above I run into problems.

Any idea how I can fix this? Any help will be really appreciated.
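A possible way to get "the number of digits that match" is to compare the code with the pattern one digit position at a time instead of running the whole regex. The sketch below assumes the patterns only contain literal digits and `[...]` character classes, as in every example in this thread; `atoms` and `matching_digits` are hypothetical helper names.

```perl
use strict;
use warnings;

# Split a pattern such as "002577[6-9][0-5]" into one atom per digit
# position: a bracketed character class or a single literal character.
# Assumption: patterns use only literal digits and [...] classes.
sub atoms {
    my ($pattern) = @_;
    return $pattern =~ /(\[[^\]]*\]|.)/g;
}

# Count how many leading digits of $code match the corresponding
# positions of $pattern, stopping at the first mismatch or when
# either the code or the pattern runs out.
sub matching_digits {
    my ($code, $pattern) = @_;
    my @atoms = atoms($pattern);
    my $count = 0;
    for my $i (0 .. $#atoms) {
        last if $i >= length $code;
        my $digit = substr($code, $i, 1);
        my $atom  = $atoms[$i];
        last unless $digit =~ /^(?:$atom)$/;
        $count++;
    }
    return $count;
}

for my $pattern ('002577[6-9][0-5]', '00257795[5-9]', '002577[6-9][6-9]') {
    printf "%-20s %d\n", $pattern, matching_digits('002577', $pattern);
}
```

If the returned count equals `length($code)`, the whole short code agrees with the start of the pattern; a smaller count shows where the two first differ.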
09-14-2007, 12:57 AM   #2
chrism01
LQ Guru
Registered: Aug 2004
Location: Sydney
Distribution: Rocky 9.2
Posts: 18,355

Rep: Reputation: 2751
I'm not entirely clear about what you mean by "return me with the amount of digits matching".
However, to match at the start of a string, use the '^' anchor, e.g.

if( $var =~ /^002577/ )
means: if $var begins with '002577', then ...

HTH
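A quick illustration of the difference the anchor makes (the two numbers are made-up examples):

```perl
use strict;
use warnings;

for my $n ('0027215555555', '5550027') {
    my $anywhere = $n =~ /0027/  ? 'yes' : 'no';   # 0027 may occur anywhere
    my $at_start = $n =~ /^0027/ ? 'yes' : 'no';   # 0027 must be the first digits
    print "$n: contains 0027=$anywhere, begins with 0027=$at_start\n";
}
```

Without `^`, 5550027 would also be accepted, which is usually wrong for dialling codes.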
09-14-2007, 01:22 AM   #3
baddah
Member
Registered: Feb 2006
Location: Cape Town,South Africa
Distribution: Fedora Core 8
Posts: 188

Original Poster
Rep: Reputation: 30
Hi,

Thanks for the reply. Let me explain in more detail. I have a table of telephone codes, all written as regular expressions (e.g. 0027[7-8] is South Africa). My script takes a number, say 0027215555555, looks it up in the table and finds a match like so:

Code:
"0027215555555" =~ /^0027[1-2]/
This finds a match because the whole regular expression (0027[1-2]) matches the start of the number (0027215555555).
So my script normally runs fine and matches the telephone numbers against the table of regular-expression codes.
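For this normal direction (long number, shorter patterns), "best match" usually means the pattern that covers the most leading digits. A minimal sketch, assuming a hypothetical table in which only `0027[7-8]` comes from this thread; `best_match` is a made-up helper, and `$+[0]` is Perl's end offset of the last successful match, i.e. how many digits the anchored pattern covered:

```perl
use strict;
use warnings;

# Hypothetical table: pattern => description. Only 0027[7-8] is from the thread.
my %table = (
    '0027'      => 'hypothetical country entry',
    '0027[7-8]' => 'hypothetical range entry',
    '002721'    => 'hypothetical area entry',
);

# Return the pattern that matches the most leading digits of $number,
# or undef if none match. Ties go to the first pattern in @patterns.
sub best_match {
    my ($number, @patterns) = @_;
    my ($best, $best_len) = (undef, -1);
    for my $p (@patterns) {
        next unless $number =~ /^(?:$p)/;
        my $len = $+[0];    # end offset of this match = digits covered
        ($best, $best_len) = ($p, $len) if $len > $best_len;
    }
    return $best;
}

my $number = '0027215555555';
my $best   = best_match($number, sort keys %table);
print defined $best ? "$number -> $best ($table{$best})\n" : "$number: no match\n";
```

Sorting the keys only makes tie-breaking repeatable; hash order in Perl is otherwise not fixed.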

Here's my problem now.

I have been supplied with a list of codes from a provider, and I need to match these against my regular expressions. Their codes are different from the ones I have, i.e. for South Africa they might only have 0027. So now I want to find all the matches for this 0027.

So, as in the example above, 0027215555555 becomes just 0027. If I then do

Code:
"0027" =~ /^0027[7-8]/
It does not find a match, because the whole regular expression (0027[7-8]) cannot be matched within the string. I want the script to realize that 0027 is only 4 digits long and therefore compare it only against the first 4 digit positions of the regular expression.

So I want 0027 to match all of the following:

Code:
0027[7-8]             # first 4 positions of the regex match 0027
0027[1]               # first 4 positions of the regex match 0027
0027[21][5-6]         # first 4 positions of the regex match 0027
00[2-6][7-9]          # first 4 positions of the regex also match 0027
etc
I hope this is clearer. Thanks for the help.
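One possible approach (not one proposed in the thread) is to turn each table pattern into a regex that matches any leading part of it, by nesting the remaining digit positions in optional groups, and then test the short code against that. This sketch assumes, as in the examples above, that patterns are built only from literal digits and `[...]` classes; `atoms` and `prefix_regex` are hypothetical helper names:

```perl
use strict;
use warnings;

# Split a pattern into per-digit atoms: a [...] class or a single character.
# Assumption: patterns use only literal digits and bracketed classes.
sub atoms {
    my ($pattern) = @_;
    return $pattern =~ /(\[[^\]]*\]|.)/g;
}

# Build a regex that matches any leading part of $pattern, e.g.
# "0027[7-8]" becomes ^(?:0(?:0(?:2(?:7(?:[7-8])?)?)?)?)?$
sub prefix_regex {
    my ($pattern) = @_;
    my $re = '';
    for my $atom (reverse atoms($pattern)) {
        $re = "(?:$atom$re)?";
    }
    return qr/^$re$/;
}

my @patterns = ('0027[7-8]', '0027[1]', '0027[21][5-6]', '00[2-6][7-9]', '0044[1-2]');
my $short    = '0027';

for my $p (@patterns) {
    print "$short matches the start of $p\n" if $short =~ prefix_regex($p);
}
```

The empty string matches every such regex, and a code longer than the pattern does not match at all; for that case the ordinary `$number =~ /^$pattern/` test still applies.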

Last edited by baddah; 09-14-2007 at 01:27 AM.
09-14-2007, 12:33 PM   #4
Linux_in_NH
Member
Registered: Jan 2004
Location: NH
Distribution: Mandrake, Gentoo, Ubuntu
Posts: 105

Rep: Reputation: 15
I am not sure I follow completely, but maybe you are looking for something like this:

if( $var =~ /^0027[78]?/ )

This says that the string must start with 0027, followed by an optional 7 or 8.

The second example would be

if( $var =~ /^0027[1]?/ )

But keep in mind that because the 7 or 8 is optional in the first check, the string will still match without it as long as the first 4 digits are correct. It really sounds to me like you are simply trying to match the first four digits, and the rest is irrelevant.
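To see what the optional class actually does, here is a small check (the sample codes are made up):

```perl
use strict;
use warnings;

# /^0027[78]?/ needs 0027 at the start; the [78] digit is optional,
# so it is included in the match when present and skipped otherwise.
for my $code ('0027', '00277', '00275', '0028') {
    if ($code =~ /^(0027[78]?)/) {
        print "$code: matched '$1'\n";
    } else {
        print "$code: no match\n";
    }
}
```

For 00275 the match is just '0027': the optional digit never causes a failure, which is why this pattern effectively only tests the first four digits.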
09-18-2007, 10:19 AM   #5
theNbomr
LQ 5k Club
Registered: Aug 2005
Distribution: OpenSuse, Fedora, Redhat, Debian
Posts: 5,399
Blog Entries: 2

Rep: Reputation: 908
How about composing your regex iteratively, growing it to include more digits on each iteration? When the regex fails to match, the previous version is the 'best' one. This exploits the fact that Perl can use a scalar variable as the regular expression to match against.
Code:
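# Untested...
use strict;
use warnings;

# Hypothetical stand-in: replace with wherever your code really comes from.
sub getCodeFromSomePlace { return "0027215555555"; }

my $code      = getCodeFromSomePlace();
my $regex     = "";
my $bestMatch = "";
my $bestRegex = "";
foreach my $c ( "1", "2", "3", "4" ){   # placeholder components to append
  $regex .= $c;                          # grow the outer $regex ('my' here would shadow it)
  if( $code =~ m/$regex/ ){ $bestMatch = $&; $bestRegex = $regex; }
  else{ last; }                          # previous version was the best
}
print "$bestRegex: $bestMatch, length = ", length( $bestMatch ), "\n";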
Pre-compose a list of components to append and/or substitute into your regex on each iteration.
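Applied to the original question, the pre-composed components can be the digit positions of a table pattern, with the regex anchored at the start so only leading digits count. The anchoring, the helper name `best_prefix` and the sample components are choices made for this sketch, not part of the post above:

```perl
use strict;
use warnings;

# Grow an anchored regex one component at a time; the last version that
# still matches $code is the best one.
# Returns (best regex, matched text); the regex is undef if nothing matched.
sub best_prefix {
    my ($code, @components) = @_;
    my ($regex, $best_regex, $best_match) = ('^', undef, '');
    for my $c (@components) {
        $regex .= $c;
        last unless $code =~ /($regex)/;
        ($best_regex, $best_match) = ($regex, $1);
    }
    return ($best_regex, $best_match);
}

# Components: the digit positions of the pattern 0027[7-8] from this thread.
my ($re, $match) = best_prefix('0027', '0', '0', '2', '7', '[7-8]');
print "$re: $match, length = ", length($match), "\n";
```

With the short code 0027 this reports `^0027` and a length of 4: the first four positions agree, and the fifth, `[7-8]`, has no digit left to match.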

--- rod.

Similar Threads
Thread | Thread Starter | Forum | Replies | Last Post
Perl can't find a module that exists | Poetics | Programming | 2 | 08-02-2007 04:14 PM
Help me find a Perl script | rumblestrut | Linux - Server | 1 | 05-25-2007 09:46 AM
Perl regexs: How to recover an unknown number of matches? | enemorales | Programming | 6 | 07-06-2006 10:59 AM
Where Can I Find perl-base? | JLuv3k7 | Fedora | 4 | 04-01-2006 04:57 PM
bash: routine outputting both matches and non-matches separately??? | Bebo | Programming | 8 | 07-19-2004 06:52 AM

All times are GMT -5.