LinuxQuestions.org
Programming
This forum is for all programming questions.
The question does not have to be directly related to Linux and any language is fair game.

07-10-2003, 01:58 AM #1
rch
Member
Registered: Feb 2003
Location: Santa Clara,CA
Distribution: Mandriva
Posts: 909
Reputation: 48
Perl Regular Expression


I would like to match two words:
one would be any word (say abcde),
and the other would be
the same word but with '.'s placed arbitrarily, like
a.bcd.e
or
ab.cd.e
or
abc.de
or
abcd.e
The length of the first word is not known in advance.
I thought that I could do it with ?, but ? matches only 0 or 1 occurrence of the previous character (* matches 0 or more), and the real problem is that the length is not fixed.
Is it possible?
I thought about splitting the first word into individual characters and then trying, but that does not solve the problem either.
So the problem is:
given any word (say from <STDIN>) of arbitrary length, I have to match it against another word which contains '.'s that may be placed anywhere, and whose number of repetitions is not known in advance!
I know that Perl can do it, but I have no idea how!
Could anybody help?
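On the ? idea: in a Perl regex, ? allows zero or one occurrence of the preceding item, while * allows zero or more. The short check below uses a hypothetical two-letter word "ab" only to show the difference; the letters are still hard-coded, so by itself it does not solve the unknown-length problem.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# '?' allows zero or one '.' in the gap; '*' allows zero or more.
sub matches_with_q    { return $_[0] =~ /^a\.?b$/ ? 1 : 0 }
sub matches_with_star { return $_[0] =~ /^a\.*b$/ ? 1 : 0 }

for my $s ('ab', 'a.b', 'a..b') {
    printf "%-5s ?: %d  *: %d\n", $s, matches_with_q($s), matches_with_star($s);
}
```

Only "a..b" separates the two: ? rejects it, * accepts it.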
 
07-10-2003, 05:02 AM #2
powerplane
LQ Newbie
Registered: Apr 2003
Location: P.R. China
Distribution: FreeBSD 4.8 stable
Posts: 26
Reputation: 15
Try this:
$> perl -e 'while(<STDIN>) { if (/\.[\d\w]+/) {print "matched:$_\n";} }'
 
07-10-2003, 06:58 AM #3
powerplane
LQ Newbie
Registered: Apr 2003
Location: P.R. China
Distribution: FreeBSD 4.8 stable
Posts: 26
Reputation: 15
Quote:
Originally posted by powerplane
Try this:
$> perl -e 'while(<STDIN>) { if (/\.[\d\w]+/) {print "matched:$_\n";} }'
I was wrong.....
Change it to:
Code:
> perl -ne 'chomp; print "matched:$_\n" if /^a[\.]*b[\.]*c[\.]*d[\.]*e$/'

Last edited by powerplane; 07-10-2003 at 09:16 AM.
 
07-10-2003, 07:25 AM #4
dharmender_rai
Member
Registered: Aug 2002
Location: Pune,India
Posts: 39
Reputation: 15
If the "." is placed arbitrarily, then it could also precede or follow "abcde", as in ".abcde" or "abcde.". Hence the general pattern would be:
if(/[\.]*a[\.]*b[\.]*c[\.]*d[\.]*e[\.]*/) {
## your logic goes here
}
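Note that the pattern above has no ^ or $ anchors, so its leading and trailing [\.]* make no difference, and it also matches strings with extra characters around the word. A small comparison with an anchored version (the test strings are made up for illustration):

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Unanchored: finds the dotted "abcde" anywhere inside the string.
my $unanchored = qr/[\.]*a[\.]*b[\.]*c[\.]*d[\.]*e[\.]*/;
# Anchored with ^ and $: the whole string must be the dotted word.
my $anchored = qr/^\.*a\.*b\.*c\.*d\.*e\.*$/;

sub matches {
    my ($re, $s) = @_;
    return $s =~ $re ? 1 : 0;
}

for my $s ('ab.cd.e', '.abcde.', 'xxab.cdeyy') {
    printf "%-12s unanchored: %d  anchored: %d\n",
        $s, matches($unanchored, $s), matches($anchored, $s);
}
```

Only "xxab.cdeyy" differs: the unanchored pattern accepts it, the anchored one rejects it.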
 
07-10-2003, 09:30 AM #5
rch
Member
Registered: Feb 2003
Location: Santa Clara,CA
Distribution: Mandriva
Posts: 909
Original Poster
Reputation: 48
Quote:
Originally posted by dharmender_rai
If the "." is placed arbitrarily, then it could also precede or follow "abcde", as in ".abcde" or "abcde.". Hence the general pattern would be:
if(/[\.]*a[\.]*b[\.]*c[\.]*d[\.]*e[\.]*/) {
## your logic goes here
}
Thanks for all the replies. I definitely knew about this one, but I think that you did not understand my question properly.
So I will rephrase the question:
Given an input from <STDIN> (say I store it with
$y=<STDIN>, since it can be anything), I now have to match that input, say abcde, against a word in a file, which may be
a.bcd.e
or
ab.cd.e
or
abc.de
or
abcd.e
Now powerplane, your RE would match any word containing . (or the word abcde), not $y.
Any ideas, please?
Any Perl gurus here?
 
07-10-2003, 09:48 AM #6
rch
Member
Registered: Feb 2003
Location: Santa Clara,CA
Distribution: Mandriva
Posts: 909
Original Poster
Reputation: 48
Adding to this: I thought of splitting $y into individual characters and doing exactly as you suggested, but the real problem is that I never know the length of $y, so the number of characters is not known in advance. There is a Perl RE option with which whitespace is ignored; I wondered whether, instead of whitespace, it could ignore '.'s.
This would be like how we can change the separator (which determines the records or lines of a file) by doing
undef $/
and
then setting $/ as required.
If I try the first approach, the number of lines of code would be enormous, so why would anyone say that Perl is the best language for parsing?
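The /x modifier only makes Perl ignore whitespace in the pattern itself, not in the string being matched, so it does not help here. One way to get the "split $y into characters" idea without knowing the length in advance is to build the pattern at run time: escape each character with quotemeta and join them with \.* (zero or more literal dots). A minimal sketch; the sub name dotted_word_regex is hypothetical:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Build an anchored regex for $word that allows any number of literal
# dots before, between and after its characters.
sub dotted_word_regex {
    my ($word) = @_;
    # quotemeta escapes regex metacharacters such as '+' or '*' in the input
    my $body = join '\.*', map { quotemeta($_) } split //, $word;
    return qr/^\.*$body\.*$/;
}

my $y  = 'abcde';    # in the thread this would be a chomped line from <STDIN>
my $re = dotted_word_regex($y);

for my $candidate (qw(a.bcd.e ab.cd.e abc.de abcd.e abcde abxde)) {
    print "$candidate: ", ($candidate =~ $re ? "match" : "no match"), "\n";
}
```

Because the pattern is compiled once with qr//, it can be reused against every line of a large file read with while(<FH>) without being rebuilt.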
 
07-10-2003, 11:19 AM #7
powerplane
LQ Newbie
Registered: Apr 2003
Location: P.R. China
Distribution: FreeBSD 4.8 stable
Posts: 26
Reputation: 15
You just want to replace '.' with another character? Use s///.
Say, to replace '.' with '!':
Code:
$y =~ s/\./\!/g;
My fault..., I need to get some sleep.....

Last edited by powerplane; 07-10-2003 at 01:56 PM.
 
07-10-2003, 12:45 PM #8
TheLinuxDuck
Member
Registered: Sep 2002
Location: Tulsa, OK
Distribution: Slack, baby!
Posts: 349
Reputation: 33
Ok. Let's assume that the user enters a random name of any length. We don't know what characters this name will contain, but we do know that it may match one of a series of filenames that may or may not contain arbitrarily placed '.'s. From this we can deduce:

If all '.'s are removed from both the user input and from the filename, a match is possible.

Let's assume that the filenames in question are in an array @filenames. I'm going to fill it in with dummy names.

Code:
#!/usr/bin/perl
use strict;
use warnings;

my(@filenames) = qw(bing bing.txt bingdink.donk bi.ng.txt b.i.n.g);

my($userInput);
my($matched) = 0;  # no matches yet
print "Enter filename:\n";
chomp($userInput = <STDIN>);

# remove all .'s from the user's input
$userInput =~ s/\.//g;
for my $filename (@filenames) {
  # remove .'s from a copy of the filename
  (my $tempFN = $filename) =~ s/\.//g;
  if ($tempFN eq $userInput) {
    print "$filename!\n";
    $matched++;
    # add 'last;' here to stop after the first match
  }
}

print "$matched matches were found\n" if $matched > 0;
And its output:
Code:
~/perl/rch> ./match.dot.pl
Enter filename:
bing.txt
bing.txt!
bi.ng.txt!
2 matches were found
Maybe not the most elegant way, but it works. (=

Last edited by TheLinuxDuck; 07-10-2003 at 12:46 PM.
 
07-10-2003, 12:48 PM #9
TheLinuxDuck
Member
Registered: Sep 2002
Location: Tulsa, OK
Distribution: Slack, baby!
Posts: 349
Reputation: 33
Btw, this very same thing could be done with a bash shell script, too, if perl isn't required.
 
07-10-2003, 02:01 PM #10
powerplane
LQ Newbie
Registered: Apr 2003
Location: P.R. China
Distribution: FreeBSD 4.8 stable
Posts: 26
Reputation: 15
Just using s/PATTERN_TO_FIND/REPLACEMENT/g is quite enough.
Here is a neat example that replaces all '.' with '_':

> perl -e 'while(<STDIN>) {$_ =~ s/\./_/g; print "$_\n";}'
aa.sdf.werw.sdf.sdfsdf
aa_sdf_werw_sdf_sdfsdf

werwer.dsfdsl.sdfsdf.
werwer_dsfdsl_sdfsdf_

werwer.sdfsdfklwerlwe.sdfwerwpok-erwerlw;lrlwer
werwer_sdfsdfklwerlwe_sdfwerwpok-erwerlw;lrlwer

Last edited by powerplane; 07-10-2003 at 02:02 PM.
 
07-10-2003, 02:54 PM #11
TheLinuxDuck
Member
Registered: Sep 2002
Location: Tulsa, OK
Distribution: Slack, baby!
Posts: 349
Reputation: 33
Quote:
Originally posted by powerplane
Just using s/PATTERN_TO_FIND/REPLACEMENT/g is quite enough.
If you are referring to my post, then you will see that this is exactly what I do in my code, except that I replace the '.' with nothing, because of the matching that needs to be done.

If not, that doesn't solve the original dilemma of needing to match some given text to other text that may or may not have '.'s placed arbitrarily through it.
 
07-11-2003, 12:02 AM #12
rch
Member
Registered: Feb 2003
Location: Santa Clara,CA
Distribution: Mandriva
Posts: 909
Original Poster
Reputation: 48
Thanks all again for the replies. First of all, I must say that the answer I am seeking is not among the replies you have sent. I understand (and *know*, see my geek code) that you could replace '.' with nothing, with awk (bash or shell) or with s///, but I still think that you have not understood the problem properly.
So, for the last time, I am trying to explain the problem.
1. There is a file, say a dictionary file, which contains tens of thousands of words. Each word in the file contains '.'s placed arbitrarily. The words are not of fixed length.
2. The user has to input through <STDIN> a word which has to be matched in the dictionary file.
3. I could do it by splitting the word given by the user, inserting '.'s at different places (with substr), and searching the file for each resulting word.
4. Since approach 3 is long, I wonder whether there is a simple RE which makes it shorter.
5. I could add every word in the dictionary file to an array and remove the '.'s with your approach, but I don't think that is resource-efficient or good practice.
I just wonder if anybody could help.
Finally, TheLinuxDuck, thanks for writing all the code, but I only require a RE, nothing else. I think that I can write the rest of the code.
 
07-11-2003, 09:32 AM #13
TheLinuxDuck
Member
Registered: Sep 2002
Location: Tulsa, OK
Distribution: Slack, baby!
Posts: 349
Reputation: 33
rch:

I must not get it, then... because the code I posted seems to meet every aspect of your needs: matching user input to another word that may or may not have arbitrarily placed .'s in it.

As far as splitting is concerned, and placing .'s into the user's input in various places, I think the code will take much longer to execute, because it will have to try all the different combinations of . placement on every word in the dictionary. By simply removing the .'s, you only have one test, and that one test will tell you if it matches or not.
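To put numbers on the cost of the splitting approach: a word of n characters has n-1 gaps between its letters. Assuming, as in rch's examples, at most one '.' per gap and none at either end, the number of dotted spellings that would have to be generated for each input word is:

```latex
\#\text{variants} = 2^{\,n-1}, \qquad \texttt{abcde}:\ n = 5 \;\Rightarrow\; 2^{4} = 16
```

The count doubles with every extra letter and becomes unbounded if runs of dots are allowed, whereas stripping the dots from each dictionary word needs only one comparison per word.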

And as far as slurping all the dictionary contents into an array goes, that's easy enough to fix... don't slurp them all at once.

Code:
my($item);
open IN, "dictionary" or die "Can't open: $!\n";
while(<IN>) {
  /^$/ and next;
  chomp;
  $item = $_;
  $item =~ s/\.//g;

  ...
  
}
And presto, no more slurping.

Evidently, though, there is something I'm missing, because you say this won't work.

I hope that you find an expedient solution!
 
07-11-2003, 11:48 PM #14
gdrobson
Member
Registered: Jun 2002
Location: Ontario, Canada
Distribution: RH8.0
Posts: 65
Reputation: 15
Or you could eliminate the use of a regex by using a hash.

For example:
# where $dict = "/home/mydictionaryfile.txt";

open (DICT, "< $dict") or die "Can't Open File: $dict $!\n";
while(<DICT>) {
chomp($_);
# store each word with its '.'s deleted (tr///, not a regex), so a.bcd.e is stored as abcde
(my $key = $_) =~ tr/.//d;
$dict{$key} = "Filler";
}
close(DICT);

$mySTDIN = <STDIN>;
chomp($mySTDIN);
# delete '.'s from the input too, so both sides are compared the same way
$mySTDIN =~ tr/.//d;

if (defined ($dict{$mySTDIN})){
# Is there an entry in the hash with the same word?
print "Match located\n";

} else {
print "No match found\n";
}

Should work; not sure if it's what you want, as it doesn't involve a regex.
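Several dotted dictionary entries can collapse to the same dot-free word (a.bcd.e, ab.cd.e and abc.de all become abcde), so one variation on the hash idea is to keep a list of the original spellings under each dot-free key. A sketch that reads from a small in-memory sample in place of the real dictionary file; the sample data and the sub name lookup are hypothetical:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical sample standing in for the dictionary file.
my $sample = "a.bcd.e\nab.cd.e\nxy.z\nabc.de\n";
open my $dict_fh, '<', \$sample or die "Can't open sample: $!\n";

# Index: dot-free key => list of the original (dotted) spellings.
my %index;
while (my $line = <$dict_fh>) {
    chomp $line;
    next if $line eq '';
    (my $key = $line) =~ tr/.//d;    # delete every '.'
    push @{ $index{$key} }, $line;
}
close $dict_fh;

# Return every dictionary spelling that matches $word once dots are ignored.
sub lookup {
    my ($word) = @_;
    (my $key = $word) =~ tr/.//d;
    return @{ $index{$key} || [] };
}

print join(', ', lookup('abcde')), "\n";
```

Building the index costs one pass over the file; after that each lookup is a single hash access, which pays off when many words are checked against the same dictionary.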
 
07-12-2003, 12:00 AM #15
rch
Member
Registered: Feb 2003
Location: Santa Clara,CA
Distribution: Mandriva
Posts: 909
Original Poster
Reputation: 48
Thanks everyone (especially TheLinuxDuck and gdrobson for your ideas). I should kick my own ass for not seeing such a simple answer to the query. I have to modify TheLinuxDuck's code a little bit, but I can still use his RE.
Well, I did not tell you that the dictionary words all start with @ (but that's hardly a problem), and 3 out of 4 words there are in this format:
@n <requiredword> where n is any integer, but I think that I can do (and have done) the rest (matching the @/spaces/integers along with the word).
So again, thanks a million.
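For the @n <requiredword> format, a minimal sketch of pulling the dot-free word out of a dictionary line. It assumes a line is @, optionally an integer followed by whitespace, then the word. The format of the remaining quarter of the lines is not given in the thread, so treating them as @ followed directly by the word is a guess, and dictionary_key is a hypothetical name:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Assumed formats: "@<n> <word>" or (a guess) "@<word>".
# Returns the word with its dots removed, or undef if the line fits neither.
sub dictionary_key {
    my ($line) = @_;
    $line =~ /^\@(?:\d+\s+)?(\S+)\s*$/ or return;
    (my $word = $1) =~ tr/.//d;
    return $word;
}

for my $line ('@3 a.bcd.e', '@abc.de', '@12 x.y.z', 'no at-sign') {
    my $key = dictionary_key($line);
    print "$line => ", (defined $key ? $key : '(skipped)'), "\n";
}
```

The returned key can then be compared with the user's input after its dots are removed the same way.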
 
  


All times are GMT -5.