Latest LQ Deal: Linux Power User Bundle
Go Back > Forums > Non-*NIX Forums > Programming
User Name
Programming This forum is for all programming questions.
The question does not have to be directly related to Linux and any language is fair game.


  Search this Thread
Old 03-27-2004, 08:19 PM   #1
Registered: Feb 2002
Location: USA
Distribution: Fedora, CENTOS
Posts: 209

Rep: Reputation: 32
Perl Regular Expression dilemma

I hope I'm not posting a topic that has already been beaten to death and back again, but if it has I apologize.

Lets say I have a string of text like this:

I want to count the frequency of each 3 letter "word" possible in this string (for instance, the first one would be: ATA, the first 3 characters in the string. There are 2 ways I know of doing this and they each give different results. Here they are:
# Method 1: While-loop
$freq = 0;
while($var =~ m/ATA/g) { $freq++; }

This will assign $freq the value of 2;

The second method is a "sliding window that moves across the string one character at a time:
# Method 2: sliding-window
$freq = 0;
for($i = 0; $i < length($var) - 3; $i++) {
$word = substr($var, $i, 3);
if($word =~ m/ATA/) { $freq++; }

This will give $freq the value of 4.

The sliding window approach is what I want to use (it can potentially give me higher frequencies). The problem is that Method 2 is significantly slower than Method 1 for strings that are really long (I'm planning on using this on strings of length 500,000).

So I want to have my cake and eat it too. Is there a way I can get the speed of Method 1 with the higher counting method of Method 2? (ie: Is there a perl syntax that I can use in Method 1 that will give me the same number of counts in $freq as Method 2? ) Alternatively is there a way I can speed up Method 2?

Thanks for any suggestions in advance.
All help is appreciated.

Old 03-27-2004, 08:48 PM   #2
Registered: Mar 2004
Location: Denmark
Distribution: Gentoo, Slackware
Posts: 333

Rep: Reputation: 30
If you want optimised searching you could use a technique like 'The Boyer-Moore Fast String Searching Algorithm'

You will have to adapt this because it only finds the first match, but it should be easy to change it so it finds all matches. It is sublinear (not all characters need to be checked and the longer your search string, the faster the algorithm performs).

I do not know of a fancy way to implement this in perl.

Hope this help. Good luck!


Thread Tools Search this Thread
Search this Thread:

Advanced Search

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is Off
HTML code is Off

Similar Threads
Thread Thread Starter Forum Replies Last Post
perl regular expression problem true_atlantis Programming 4 05-27-2009 07:35 AM
Perl regular expression issue zikhermm Programming 7 09-23-2005 04:48 PM
Having trouble with a perl regular expression... jayemef Programming 3 08-26-2005 12:00 AM
Perl Regular Expression rch Programming 14 07-12-2003 12:00 AM
using a perl regular expression in php markus1982 Programming 5 11-18-2002 03:31 PM > Forums > Non-*NIX Forums > Programming

All times are GMT -5. The time now is 05:02 PM.

Main Menu
Write for LQ is looking for people interested in writing Editorials, Articles, Reviews, and more. If you'd like to contribute content, let us know.
Main Menu
RSS1  Latest Threads
RSS1  LQ News
Twitter: @linuxquestions
Facebook: linuxquestions Google+: linuxquestions
Open Source Consulting | Domain Registration