LinuxQuestions.org
Welcome to the most active Linux Forum on the web.
Go Back   LinuxQuestions.org > Forums > Non-*NIX Forums > Programming
User Name
Password
Programming This forum is for all programming questions.
The question does not have to be directly related to Linux and any language is fair game.

Notices

Reply
 
Search this Thread
Old 03-27-2004, 07:19 PM   #1
GATTACA
Member
 
Registered: Feb 2002
Location: USA
Distribution: Fedora, CENTOS
Posts: 201

Rep: Reputation: 32
Perl Regular Expression dilemma


Hello.
I hope I'm not posting a topic that has already been beaten to death and back again, but if it has I apologize.

Lets say I have a string of text like this:
$var = "ATATATATAGGG";

I want to count the frequency of each 3 letter "word" possible in this string (for instance, the first one would be: ATA, the first 3 characters in the string. There are 2 ways I know of doing this and they each give different results. Here they are:
# Method 1: While-loop
$freq = 0;
while($var =~ m/ATA/g) { $freq++; }

This will assign $freq the value of 2;

The second method is a "sliding window that moves across the string one character at a time:
# Method 2: sliding-window
$freq = 0;
for($i = 0; $i < length($var) - 3; $i++) {
$word = substr($var, $i, 3);
if($word =~ m/ATA/) { $freq++; }
}

This will give $freq the value of 4.

The sliding window approach is what I want to use (it can potentially give me higher frequencies). The problem is that Method 2 is significantly slower than Method 1 for strings that are really long (I'm planning on using this on strings of length 500,000).

So I want to have my cake and eat it too. Is there a way I can get the speed of Method 1 with the higher counting method of Method 2? (ie: Is there a perl syntax that I can use in Method 1 that will give me the same number of counts in $freq as Method 2? ) Alternatively is there a way I can speed up Method 2?

Thanks for any suggestions in advance.
All help is appreciated.

GATTACA
 
Old 03-27-2004, 07:48 PM   #2
Muzzy
Member
 
Registered: Mar 2004
Location: Denmark
Distribution: Gentoo, Slackware
Posts: 333

Rep: Reputation: 30
If you want optimised searching you could use a technique like 'The Boyer-Moore Fast String Searching Algorithm' http://www.cs.utexas.edu/users/moore...ing/index.html

You will have to adapt this because it only finds the first match, but it should be easy to change it so it finds all matches. It is sublinear (not all characters need to be checked and the longer your search string, the faster the algorithm performs).

I do not know of a fancy way to implement this in perl.

Hope this help. Good luck!
Muzzy.
 
  


Reply


Thread Tools Search this Thread
Search this Thread:

Advanced Search

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is Off
HTML code is Off


Similar Threads
Thread Thread Starter Forum Replies Last Post
perl regular expression problem true_atlantis Programming 4 05-27-2009 06:35 AM
Perl regular expression issue zikhermm Programming 7 09-23-2005 03:48 PM
Having trouble with a perl regular expression... jayemef Programming 3 08-25-2005 11:00 PM
Perl Regular Expression rch Programming 14 07-11-2003 11:00 PM
using a perl regular expression in php markus1982 Programming 5 11-18-2002 02:31 PM


All times are GMT -5. The time now is 10:53 AM.

Main Menu
My LQ
Write for LQ
LinuxQuestions.org is looking for people interested in writing Editorials, Articles, Reviews, and more. If you'd like to contribute content, let us know.
Main Menu
Syndicate
RSS1  Latest Threads
RSS1  LQ News
Twitter: @linuxquestions
identi.ca: @linuxquestions
Facebook: linuxquestions Google+: linuxquestions
Open Source Consulting | Domain Registration