07-10-2003, 01:58 AM | #1
Member; Registered: Feb 2003; Location: Santa Clara, CA; Distribution: Mandriva; Posts: 909
Perl Regular Expression
I would like to match two words: one is any word (say abcde), and the other is the same word with '.'s placed arbitrarily in it, like
a.bcd.e
or
ab.cd.e
or
abc.de
or
abcd.e
The length of the first word is not known in advance.
I thought that with * (which matches zero or more occurrences of the previous character) I could do it, but the real problem is that the length is not fixed.
Is it possible?
I thought about splitting the first word into individual characters and then trying, but that does not solve the problem either.
So the problem is:
given any word (say from <STDIN>) of arbitrary length, I have to match it against another word which contains '.'s that may be placed anywhere, and the number of '.'s is not known in advance!
I know that Perl can do it, but I have no idea how!
Could anybody help?

07-10-2003, 05:02 AM | #2
LQ Newbie; Registered: Apr 2003; Location: P.R. China; Distribution: FreeBSD 4.8 stable; Posts: 26
Try this:
$> perl -e 'while(<STDIN>) { if (/\.[\d\w]+/) {print "matched:$_\n";} }'

07-10-2003, 06:58 AM | #3
LQ Newbie; Registered: Apr 2003; Location: P.R. China; Distribution: FreeBSD 4.8 stable; Posts: 26
Quote:
Originally posted by powerplane
Try this:
$> perl -e 'while(<STDIN>) { if (/\.[\d\w]+/) {print "matched:$_\n";} }'

I was wrong.....
Change it to:
Code:
> perl -e 'while(<STDIN>) { if (/^a[\.]*b[\.]*c[\.]*d[\.]*e$/ && /\./) { print "matched:$_"; } }'
Last edited by powerplane; 07-10-2003 at 09:16 AM.

07-10-2003, 07:25 AM | #4
Member; Registered: Aug 2002; Location: Pune, India; Posts: 39
If the '.'s are placed arbitrarily, they could also precede or follow "abcde", as in ".abcde" or "abcde.". Hence the general pattern would be:
if (/[\.]*a[\.]*b[\.]*c[\.]*d[\.]*e[\.]*/) {
    ## your logic goes here
}
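A minimal sketch of one caveat with the pattern above: since it has no ^ or $ anchors, it can match part of a longer word, so a word like "xxabcdeyy" is accepted too. The test words are made up for illustration, and the anchored variant is one possible fix, not something posted in the thread.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# The pattern from the post above (unanchored) and an anchored variant
# that must match the whole word, still allowing leading/trailing dots.
my $unanchored = qr/[\.]*a[\.]*b[\.]*c[\.]*d[\.]*e[\.]*/;
my $anchored   = qr/^\.*a\.*b\.*c\.*d\.*e\.*$/;

for my $word (qw(a.bcd.e .abcde. xxabcdeyy)) {
    printf "%-10s unanchored: %-3s anchored: %s\n", $word,
        ($word =~ $unanchored ? 'yes' : 'no'),
        ($word =~ $anchored   ? 'yes' : 'no');
}
```

Only xxabcdeyy differs between the two: the unanchored pattern accepts it, the anchored one rejects it.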

07-10-2003, 09:30 AM | #5
Member (Original Poster); Registered: Feb 2003; Location: Santa Clara, CA; Distribution: Mandriva; Posts: 909
Quote:
Originally posted by dharmender_rai
if the "." is placed arbitrarily then it would preceed or succeed "abcde" like ".abcde" or "abcde." . hence the general one would be :
if(/[\.]*a[\.]*b[\.]*c[\.]*d[\.]*e[\.]*/) {
## your logic goes here
}

Thanks for all the replies. I definitely knew about this one, but I think you did not understand my question properly.
So I will rephrase the question:
Given an input from <STDIN> (say I do $y=<STDIN>, since the input can be anything), I now have to match that input, say abcde, with a word in a file which may be
a.bcd.e
or
ab.cd.e
or
abc.de
or
abcd.e
Now powerplane, your RE would match any word containing '.' (or the word abcde), not $y.
Any ideas please?
Any Perl gurus here!?

07-10-2003, 09:48 AM | #6
Member (Original Poster); Registered: Feb 2003; Location: Santa Clara, CA; Distribution: Mandriva; Posts: 909
Adding to this: I thought of splitting $y into individual characters and doing exactly as you suggested, but the real problem is that I never know the length of $y, so the number of characters is not known in advance. There is a Perl RE option with which whitespace is ignored; I wondered if, instead of whitespace, it could ignore '.'s.
It would be like changing the record separator ($/, which determines what counts as a record or line when reading a file) by doing
undef $/
and then setting $/ as required.
If I try the first approach the number of lines of code would be enormous, and then why would anyone say that Perl is the best language for parsing?
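Note that the /x modifier mentioned above only ignores whitespace inside the pattern itself, not in the string being matched, so it cannot be told to skip '.'s. What does handle a word of unknown length is building the pattern at runtime from $y: split it into characters, escape each with quotemeta, and join them with \.* (zero or more literal dots). A minimal sketch; the sub name dotted_pattern and the sample words are hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Build a regex matching $word with zero or more '.'s between any two
# of its characters. Works for a word of any length.
sub dotted_pattern {
    my ($word) = @_;
    # quotemeta escapes regex metacharacters in each character, and
    # '\.*' (single-quoted, so the backslash is kept) means "zero or
    # more literal dots".
    my $body = join '\.*', map { quotemeta($_) } split //, $word;
    return qr/^$body$/;
}

my $y = 'abcde';    # in practice: chomp(my $y = <STDIN>);
my $re = dotted_pattern($y);
for my $candidate (qw(a.bcd.e ab.cd.e abc.de abcd.e abcde abxde)) {
    printf "%-8s %s\n", $candidate,
        ($candidate =~ $re ? 'match' : 'no match');
}
```

The pattern is built once per user word, however long that word is; each dictionary line would then be chomped and tested with $line =~ $re.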

07-10-2003, 11:19 AM | #7
LQ Newbie; Registered: Apr 2003; Location: P.R. China; Distribution: FreeBSD 4.8 stable; Posts: 26
You just want to replace '.' with another character? Use s///.
Say, replace '.' with '!':
My fault..., I need to get some sleep.....
Last edited by powerplane; 07-10-2003 at 01:56 PM.

07-10-2003, 12:45 PM | #8
Member; Registered: Sep 2002; Location: Tulsa, OK; Distribution: Slack, baby!; Posts: 349
OK. Let's assume that the user enters a name of any length. We don't know what characters it will contain, but we do know that it may match any of a given series of filenames that may or may not contain arbitrarily placed '.'s. From this we can deduce:
If all '.'s are removed from both the user input and the filename, the two can be compared directly, and a match means they are the same word apart from the '.'s.
Let's assume that the filenames in question are in an array @filenames. I'm going to fill it in with dummy names.
Code:
#!/usr/bin/perl
use strict;
use warnings;
# dummy filenames to match against
my @filenames = qw(bing bing.txt bingdink.donk bi.ng.txt b.i.n.g);
my $matched = 0;    # number of matches found
print "Enter filename:\n";
chomp(my $userInput = <STDIN>);
# remove all .'s from the user's input
$userInput =~ s/\.//g;
for my $filename (@filenames) {
    # compare against a copy of the filename with its .'s removed
    (my $tempFN = $filename) =~ s/\.//g;
    next unless $tempFN eq $userInput;
    print "$filename!\n";
    $matched++;
    # add 'last;' here to stop after the first match
}
print "$matched matches were found\n" if $matched > 0;
And its output:
Code:
~/perl/rch> ./match.dot.pl
Enter filename:
bing.txt
bing.txt!
bi.ng.txt!
2 matches were found
Maybe not the most elegant way, but it works. (=
Last edited by TheLinuxDuck; 07-10-2003 at 12:46 PM.

07-10-2003, 12:48 PM | #9
Member; Registered: Sep 2002; Location: Tulsa, OK; Distribution: Slack, baby!; Posts: 349
Btw, this very same thing could be done with a bash shell script, too, if perl isn't required.

07-10-2003, 02:01 PM | #10
LQ Newbie; Registered: Apr 2003; Location: P.R. China; Distribution: FreeBSD 4.8 stable; Posts: 26
Just using s/PATTERN_TO_FIND/REPLACEMENT/g is enough.
I will give you a neat example, say replacing all '.' with '_':
> perl -e 'while(<STDIN>) { s/\./_/g; print; }'
aa.sdf.werw.sdf.sdfsdf
aa_sdf_werw_sdf_sdfsdf
werwer.dsfdsl.sdfsdf.
werwer_dsfdsl_sdfsdf_
werwer.sdfsdfklwerlwe.sdfwerwpok-erwerlw;lrlwer
werwer_sdfsdfklwerlwe_sdfwerwpok-erwerlw;lrlwer
Last edited by powerplane; 07-10-2003 at 02:02 PM.

07-10-2003, 02:54 PM | #11
Member; Registered: Sep 2002; Location: Tulsa, OK; Distribution: Slack, baby!; Posts: 349
Quote:
Originally posted by powerplane
just use s/PATTERN_TO_FIND/REPLACEMENT/g is quite enough

If you are referring to my post, then you will see that this is exactly what I do in my code, except that I replace the '.'s with nothing, because of the matching that needs to be done.
If not, that doesn't solve the original dilemma of needing to match some given text against other text that may or may not have '.'s arbitrarily placed in it.

07-11-2003, 12:02 AM | #12
Member (Original Poster); Registered: Feb 2003; Location: Santa Clara, CA; Distribution: Mandriva; Posts: 909
Thanks all again for the replies. First of all I must say that the answer I am seeking is not among the replies you have sent. I understand (and *know*, see my geek code) that you could replace '.' with nothing, with awk (bash or shell) or with s///, but I still think that you have not understood the problem properly.
So for the last time I am trying to explain the problem.
1. There is a file, say a dictionary file, which contains tens of thousands of words. Each word in the file contains '.'s placed arbitrarily. The words are not of fixed length.
2. The user has to input through <STDIN> a word which has to be matched in the dictionary file.
3. I could do it by splitting the word given by the user, inserting '.'s at different places (with substr), and searching the file for each resulting word.
4. Since approach 3 is long, I wonder whether there is a simple RE which makes it shorter.
5. I could put every word in the dictionary file into an array and remove the '.'s with your approach, but I don't think that is resource-efficient or good practice.
I just wonder if anybody could help.
Finally, TheLinuxDuck, thanks for writing all the code; I only need the RE, nothing else. I think I can write the rest of the code.
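For point 4, a minimal sketch of a single-RE approach: build one pattern from the user's word (its characters joined by \.*), then read the dictionary one line at a time and keep the lines that match, so nothing is slurped into an array. The sub name find_dotted and the in-memory sample data are assumptions for illustration; with a real file you would pass a handle from open(my $fh, '<', $path).

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Return every line from $fh that spells $word with any number of
# '.'s between its characters. Lines are read and tested one at a
# time, so the dictionary is never held in memory all at once.
sub find_dotted {
    my ($fh, $word) = @_;
    my $body = join '\.*', map { quotemeta($_) } split //, $word;
    my $re   = qr/^$body$/;     # compiled once per user word
    my @hits;
    while (my $line = <$fh>) {
        chomp $line;
        push @hits, $line if $line =~ $re;
    }
    return @hits;
}

# Hypothetical stand-in for the dictionary file: an in-memory filehandle.
my $dictionary = "a.bcd.e\nab.cd.e\nxyz\nabc.de\nabcdef\n";
open my $fh, '<', \$dictionary or die "Can't open in-memory file: $!";
print "$_\n" for find_dotted($fh, 'abcde');
close $fh;
```

Like stripping the '.'s, this does one test per dictionary line, so it avoids trying every possible '.' placement as in approach 3.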

07-11-2003, 09:32 AM | #13
Member; Registered: Sep 2002; Location: Tulsa, OK; Distribution: Slack, baby!; Posts: 349
rch:
I must not get it, then... because the code I posted seems to meet every aspect of your needs: matching user input to another word that may or may not have arbitrarily placed '.'s in it.
As far as splitting is concerned, and placing '.'s into the user's input in various places, I think that code will take much longer to execute, because it has to try all the different combinations of '.' placement against every word in the dictionary. By simply removing the '.'s, you only have one test, and that one test tells you whether it matches or not.
And as far as slurping all the dictionary contents into an array goes, that's easy enough to fix... don't slurp them all at once.
Code:
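my $item;
open my $in, '<', 'dictionary' or die "Can't open: $!\n";
while (<$in>) {
    next if /^$/;               # skip blank lines
    chomp;
    ($item = $_) =~ s/\.//g;    # copy of the word with its .'s removed
    ...                         # compare $item with the user's input here
}
close $in;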
And presto, no more slurping.
Evidently, though, there is something I'm missing, because you say this won't work.
I hope that you find an expedient solution!

07-11-2003, 11:48 PM | #14
Member; Registered: Jun 2002; Location: Ontario, Canada; Distribution: RH8.0; Posts: 65
Or you could avoid a regex entirely by using a hash, for example:
my $dict = "/home/mydictionaryfile.txt";
my %dict;
open(my $fh, '<', $dict) or die "Can't open file: $dict $!\n";
while (<$fh>) {
    chomp;
    $dict{$_} = 1;    # only the key matters; the value is a placeholder
}
close $fh;
chomp(my $mySTDIN = <STDIN>);
if (exists $dict{$mySTDIN}) {
    # Is there an entry in the hash with the same filename?
    print "Match located\n";
} else {
    print "No match found\n";
}
Should work; not sure if it's what you want, as it doesn't involve a regex.
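As posted, the hash lookup is an exact match, so typing abcde would not find a.bcd.e in the dictionary. A minimal sketch of one way to combine it with the '.'-stripping idea from earlier in the thread: key each word by its '.'-free form and keep the original spellings in a list. The sub name build_index and the in-memory sample data are hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Index each dictionary word under a key with its '.'s removed, so one
# hash lookup finds every dotted spelling of the user's word.
sub build_index {
    my ($fh) = @_;
    my %index;
    while (my $word = <$fh>) {
        chomp $word;
        next if $word eq '';               # skip blank lines
        (my $key = $word) =~ tr/.//d;      # delete every '.'
        push @{ $index{$key} }, $word;     # keep the original spellings
    }
    return \%index;
}

# Hypothetical stand-in for the dictionary file: an in-memory filehandle.
my $data = "a.bcd.e\nab.cd.e\nxyz\nabc.de\n";
open my $fh, '<', \$data or die "Can't open in-memory file: $!";
my $index = build_index($fh);
close $fh;

my $query = 'abcde';
$query =~ tr/.//d;                         # the user may type '.'s too
if (my $hits = $index->{$query}) {
    print "Match located: @$hits\n";
} else {
    print "No match found\n";
}
```

Building the index costs one pass over the file and memory for every word; after that each query is a single hash lookup, which mainly pays off when many words are looked up in one run.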

07-12-2003, 12:00 AM | #15
Member (Original Poster); Registered: Feb 2003; Location: Santa Clara, CA; Distribution: Mandriva; Posts: 909
Thanks everyone (especially TheLinuxDuck and gdrobson for your ideas). I should kick my ass for not seeing such a simple answer to the query. I have to modify TheLinuxDuck's code a little, but I can still use his RE.
Well, I did not tell you that the dictionary words all start with @ (but that's hardly a problem), and 3 out of 4 words there are in this format:
@n <requiredword> where n is any integer. But I think I can (and have) done the rest (matching the @/spaces/integers along with the word).
So again, thanks a million.
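The thread doesn't show the exact line format, so the following is only a guess: it assumes each dictionary line is either "@word" or "@n word" (n an integer, separated from the word by whitespace), pulls out the word with one regex, then strips its '.'s. The sub name dictionary_key and the sample lines are hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Assumed line formats (a guess, not confirmed in the thread):
#   "@word"  or  "@n word"  where n is an integer.
# Returns the word with its '.'s removed, or undef if the line doesn't fit.
sub dictionary_key {
    my ($line) = @_;
    return unless $line =~ /^\@(?:\d+\s+)?(\S+)/;
    (my $word = $1) =~ tr/.//d;
    return $word;
}

for my $line ('@a.bcd.e', '@12 ab.cd.e', '@7   abc.de', 'no-at-sign') {
    my $key = dictionary_key($line);
    printf "%-14s => %s\n", $line, defined $key ? $key : '(skipped)';
}
```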

All times are GMT -5.