[SOLVED] regex match string from start to find unique combinations

fukawi2 · 02-11-2010, 06:12 AM

Well it's late, and I'm way too inexperienced with perl/regex to figure this out on my own...

I'm writing a Perl script that accepts input (commands) from the user. I want to implement a 'closest match' type scheme for accepting that input.

Example:
- Valid commands are 'update' and 'upload'.
- The user should be able to type 'upd' or 'update' etc. to execute the 'update' command. 'upte' is not valid.
- The user should be able to type 'upl' or 'uplo' etc. to execute the 'upload' command. 'upod' is not valid.
- The input 'up' is ambiguous, so it can't be matched to a unique command.

I'm using the following regex at the moment:

Code:

/^upd?a?t?e?/
/^upl?o?a?d?/

This works EXCEPT for treating 'upte' and 'upod' as matches.

I think I need something in the regex similar to ?, except that it says "match the preceding character, or else stop matching here" rather than "match the preceding character, or skip it and carry on".

Any ideas folks?

neonsignal · 02-11-2010, 07:08 AM

You can bracket regular expressions, eg

Code:

/^upd(a(t(e)?)?)?/

Or you could just match on the first three characters and then do a second check that what they entered matches the start of the full command string.
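
The second suggestion (a minimum length plus a check that the input is the start of the full command) might look something like the sketch below. The helper name is_abbreviation_of and the default minimum of three characters are assumptions made for illustration, not anything from the original post.

```perl
use strict;
use warnings;

# Hypothetical helper: true when $input has at least $min_len characters
# and is a leading substring of $command.
sub is_abbreviation_of {
    my ($input, $command, $min_len) = @_;
    $min_len = 3 unless defined $min_len;    # assumed default: three characters

    return 0 if length($input) < $min_len;

    # index() returns 0 only when $input occurs at the very start of $command.
    return index($command, $input) == 0 ? 1 : 0;
}
```

With this, is_abbreviation_of('upd', 'update') is true, while 'up' is rejected as too short and 'upte' is rejected because it is not the start of 'update'.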

ashok.g · 02-11-2010, 07:14 AM

I think this will work fine for you.

Code:

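# Read one command and strip the trailing newline.
# ($a is reserved for sort, so use a different name.)
my $input = <STDIN>;
chomp $input;
# Anchor with $ so that trailing junk such as 'updxyz' is rejected.
if ($input =~ /^(upda?|updat?|update?)$/)
{
    print "UPDATE\n";
}
elsif ($input =~ /^(uplo?|uploa?|upload?)$/)
{
    print "UPLOAD\n";
}
else
{
    print "NONE\n";
}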

bartonski · 02-11-2010, 08:11 AM

Just out of curiosity, why aren't 'upte' and 'upod' valid matches? If it's 'closest match', anything that could uniquely match would seem to be valid.

I think that I would use a soundex algorithm, and be done with it.

jschiwal · 02-11-2010, 10:36 AM

You could use the patterns in case statements instead of a string of if/then/else statements.
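
In Perl, one way to get a case-like layout without relying on the experimental given/when is a table of pattern/name pairs that is scanned in order. This is only a sketch of that idea: the names @CASES and classify are made up for illustration, and the anchored patterns are one possible spelling of the abbreviations discussed in this thread.

```perl
use strict;
use warnings;

# Each entry pairs an anchored pattern with the command it selects.
# The nested optional groups accept 'upd', 'upda', 'updat', 'update', etc.
my @CASES = (
    [ qr/^upd(?:a(?:te?)?)?$/, 'update' ],
    [ qr/^upl(?:o(?:ad?)?)?$/, 'upload' ],
);

# Hypothetical helper: return the name of the first matching case,
# or 'none' when nothing matches.
sub classify {
    my ($input) = @_;
    for my $case (@CASES) {
        my ($pattern, $name) = @$case;
        return $name if $input =~ $pattern;
    }
    return 'none';
}
```

Adding a command then means adding one row to the table rather than another elsif branch.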

tuxdev · 02-11-2010, 10:45 AM

I would consider approaching the problem from the other direction. If, say, the user typed "up", use the regex "^up.*" on each valid command. Since that regex matches more than one command, the input is ambiguous (and you can create a nice error message listing the possibilities). If the user typed "upd", then the regex "^upd.*" would only match "update", so that must be the desired command.
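
A minimal sketch of that approach, assuming the two commands from the original question. The names @COMMANDS, candidates_for and describe_match are hypothetical, and the message wording is just an example. \Q...\E quotes the user's input so that any regex metacharacters in it are treated literally.

```perl
use strict;
use warnings;

my @COMMANDS = qw(update upload);    # the commands from the original question

# Return every command that starts with what the user typed.
sub candidates_for {
    my ($input) = @_;
    return grep { /^\Q$input\E/ } @COMMANDS;
}

# Hypothetical helper: turn the candidate list into a message.
sub describe_match {
    my ($input) = @_;
    my @found = candidates_for($input);

    return "unknown command '$input'" if @found == 0;
    return "run $found[0]"            if @found == 1;
    return "ambiguous command '$input': " . join(', ', @found);
}
```

Because the commands live in one list, adding a new command needs no new regex, and the ambiguity check keeps working as the list grows.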

fukawi2 · 02-11-2010, 05:32 PM

Quote:

Originally Posted by neonsignal

You can bracket regular expressions, eg

Code:

/^upd(a(t(e)?)?)?/

That was my other thought, but it didn't seem 'graceful' enough, lol

Now that I'm a bit more awake, my Googling skills are working better, and I think I've found my solution here:
http://docstore.mik.ua/orelly/perl/cookbook/ch06_21.htm
http://perldoc.perl.org/Text/Abbrev.html
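
For anyone finding this later, here is a rough sketch of how the Text::Abbrev module linked above can be used for this. abbrev() builds a hash that maps every unambiguous abbreviation (and each full word) to its command. The names %command_for and resolve_command are my own, purely for illustration.

```perl
use strict;
use warnings;
use Text::Abbrev;

# Maps 'upd', 'upda', 'updat', 'update' => 'update' and
# 'upl', 'uplo', 'uploa', 'upload' => 'upload'.
# Ambiguous prefixes such as 'u' and 'up' are left out of the table.
my %command_for = abbrev(qw(update upload));

# Hypothetical helper: return the full command name, or undef when the
# input is ambiguous or not an abbreviation of any command.
sub resolve_command {
    my ($input) = @_;
    return $command_for{$input};
}
```

With this table, 'upd' resolves to 'update', while 'up', 'upte' and 'upod' all come back undefined.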

Thanks for all the suggestions, folks.