LinuxQuestions.org

LinuxQuestions.org (/questions/)
-   Programming (https://www.linuxquestions.org/questions/programming-9/)
-   -   Perl Regular Expression dilemma (https://www.linuxquestions.org/questions/programming-9/perl-regular-expression-dilemma-163185/)

GATTACA 03-27-2004 07:19 PM

Perl Regular Expression dilemma
 
Hello.
I hope I'm not posting a topic that has already been beaten to death and back again, but if it has I apologize.

Lets say I have a string of text like this:
$var = "ATATATATAGGG";

I want to count the frequency of each 3 letter "word" possible in this string (for instance, the first one would be: ATA, the first 3 characters in the string. There are 2 ways I know of doing this and they each give different results. Here they are:
# Method 1: While-loop
$freq = 0;
while($var =~ m/ATA/g) { $freq++; }

This will assign $freq the value of 2;

The second method is a "sliding window that moves across the string one character at a time:
# Method 2: sliding-window
$freq = 0;
for($i = 0; $i < length($var) - 3; $i++) {
$word = substr($var, $i, 3);
if($word =~ m/ATA/) { $freq++; }
}

This will give $freq the value of 4.

The sliding window approach is what I want to use (it can potentially give me higher frequencies). The problem is that Method 2 is significantly slower than Method 1 for strings that are really long (I'm planning on using this on strings of length 500,000).

So I want to have my cake and eat it too. Is there a way I can get the speed of Method 1 with the higher counting method of Method 2? (ie: Is there a perl syntax that I can use in Method 1 that will give me the same number of counts in $freq as Method 2? ) Alternatively is there a way I can speed up Method 2?

Thanks for any suggestions in advance.
All help is appreciated.

GATTACA

Muzzy 03-27-2004 07:48 PM

If you want optimised searching you could use a technique like 'The Boyer-Moore Fast String Searching Algorithm' http://www.cs.utexas.edu/users/moore...ing/index.html

You will have to adapt this because it only finds the first match, but it should be easy to change it so it finds all matches. It is sublinear (not all characters need to be checked and the longer your search string, the faster the algorithm performs).

I do not know of a fancy way to implement this in perl.

Hope this help. Good luck!
Muzzy.


All times are GMT -5. The time now is 11:05 PM.