LinuxQuestions.org
04-18-2013, 10:56 AM   #46
millgates
Member
 
Registered: Feb 2009
Location: 192.168.x.x
Distribution: Slackware
Posts: 840

Rep: Reputation: 380

Quote:
Originally Posted by ntubski
When you give grep a list of regexps it checks each one for every line, so the runtime is O(Pn) (P is the number of patterns, n is number of lines to search in). This will be much faster with -F because then grep knows it has just plain strings and uses a much faster algorithm which is O(P+n). However, since we want to find occurrences only at the beginning of lines we can't use that in this case.

Here is an awk program which combines all the keywords into a single regexp so that the search should be O(P+n):

Code:
#!/usr/bin/awk -f

NR == FNR {
    for (i = 1; i <= length($0); i++) {
        char = substr($0, i, 1);
        if (!index(charsets[i], char))
            charsets[i] = charsets[i] char;
    }
}

function regexp_range(charset,    i, c, reg_range) {
    for (i = 1; i <= length(charset); i++) {
        c = substr(charset, i, 1);
        if (index("\\]-^", c))
            reg_range = reg_range "\\" c;
        else
            reg_range = reg_range c;
    }
    return "[" reg_range "]";
}

NR != FNR && !kw_regexp {
    kw_regexp = "^";
    for (i = 1; i in charsets; i++) {
        kw_regexp = kw_regexp regexp_range(charsets[i])
    }
    # print kw_regexp ; exit
}

NR != FNR && match($0, kw_regexp) {
    kw[substr($0, RSTART, RLENGTH)]++;
}

END {
    for(w in kw) {print w, kw[w];}
}
I'm not sure I fully understand your code, but I think your logic is flawed. If, for example, the keywords searched for are "ab" and "cde", it will create the regex ^[ac][bd][e], which matches "abe", "ade" and "cbe", which we don't want, while not matching "ab", which is in our keyword list.
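
To make the problem concrete, here is a minimal sketch in Python rather than awk (the character-class syntax involved is the same). The helper name positional_charset_regex is made up for this illustration; it only mimics what the quoted script appears to do, namely build one character class per column from all keywords.

```python
import re

def positional_charset_regex(keywords):
    """Mimic the quoted awk script: collect one character class per column."""
    charsets = []
    for kw in keywords:
        for i, ch in enumerate(kw):
            if i == len(charsets):
                charsets.append("")      # first keyword long enough to reach this column
            if ch not in charsets[i]:
                charsets[i] += ch        # remember each distinct character once
    # Escape each character so that ], \, ^ and - stay literal inside the class.
    return "^" + "".join("[" + re.escape(cs) + "]" for cs in charsets)

pattern = positional_charset_regex(["ab", "cde"])
print(pattern)

# "abe", "ade" and "cbe" are not keywords but match; "ab" is a keyword but does not.
for line in ["abe", "ade", "cbe", "ab", "cde"]:
    print(line, bool(re.match(pattern, line)))
```

The per-column classes forget which characters belonged together in one keyword, and the pattern always demands as many characters as the longest keyword has.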
 
04-18-2013, 11:52 AM   #47
ntubski
Senior Member
 
Registered: Nov 2005
Distribution: Debian, Arch
Posts: 3,256

Rep: Reputation: 1421
Quote:
Originally Posted by millgates
I'm not sure I fully understand your code, but I think your logic is flawed. If, for example, the keywords searched for are "ab" and "cde", it will create the regex ^[ac][bd][e], which matches "abe", "ade" and "cbe", which we don't want, while not matching "ab", which is in our keyword list.
You're right. Perhaps in a language with compiled regexps, the keywords could instead be combined into keyword1|keyword2|keyword3|... and the regexp engine would compile that into something efficient. awk would be forced to recompile the regexp every time, so it wouldn't work out there.
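
A rough sketch of that keyword1|keyword2|... idea in Python, as one example of a language where the regexp is compiled once and reused. The function name count_leading_keywords and the sample data are made up for illustration. Note the assumption about efficiency: Python's re module is a backtracking engine, so this only shows the compile-once structure and does not guarantee O(P+n) behaviour.

```python
import re
from collections import Counter

def count_leading_keywords(keywords, lines):
    """Count how often each keyword occurs at the very start of a line."""
    # Try longer keywords first so that "abc" wins over "ab" if both are listed.
    ordered = sorted(keywords, key=len, reverse=True)
    # re.escape keeps regexp metacharacters inside keywords literal.
    pattern = re.compile("|".join(re.escape(k) for k in ordered))
    counts = Counter()
    for line in lines:
        m = pattern.match(line)   # match() only succeeds at the start of the line
        if m:
            counts[m.group(0)] += 1
    return counts

sample_lines = ["abe foo", "ade", "cde bar", "ab", "xab"]
result = count_leading_keywords(["ab", "cde"], sample_lines)
for word, n in sorted(result.items()):
    print(word, n)
```

Unlike the per-column character classes, the alternation keeps each keyword intact, so "ade" and "xab" are not counted, while "abe foo" counts for "ab" because the line begins with that keyword.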
 
  

