LinuxQuestions.org
Old 02-29-2012, 01:57 PM   #16
grail
LQ Guru
 
Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 10,005

Rep: Reputation: 3191

This version removes duplicates. I also dropped the function, as calling it accounted for a large share of the run time:
Code:
#!/usr/bin/awk -f

BEGIN{ OFS = FS = "\t" }

FILENAME ~ /file/{
    if(FILENAME ~ /1/)
        pat1=pat1 (pat1?"|":"\\<(") $0
    else
        pat2=pat2 (pat2?"|":"\\<(") $0

    next
}

!x{ pat1 = pat1 ")\\>"
    pat2 = pat2 ")\\>"
    x=1
}

{
    str = $3
    $3 = $4 = ""
    l = k = 0

    while(match(substr(str,k), pat1, f)){
        if($3 !~ f[0])
            $3 = $3 ($3?",":"") f[0]
        k = k + f[0,"start"] + f[0,"length"]
    }
    if($3){
        while(match(substr(str,l), pat2, f)){
            if($4 !~ f[0])
                $4 = $4 ($4?",":"") f[0]
            l = l + f[0,"start"] + f[0,"length"]
        }
        if($4)
            print
    }
}
 
1 members found this post helpful.
Old 02-29-2012, 02:11 PM   #17
firstfire
Member
 
Registered: Mar 2006
Location: Ekaterinburg, Russia
Distribution: Debian, Ubuntu
Posts: 709

Rep: Reputation: 428
I found that mawk runs much faster than gawk. The last solution could probably be made even faster, but unfortunately mawk does not support match() with three arguments.
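For anyone who wants to try an awk without the three-argument match(), here is a minimal sketch of the same "collect unique whole-word matches" loop using only the two-argument match() with RSTART and RLENGTH. The `\<` and `\>` word-boundary operators are GNU extensions as well, so the sketch checks the neighbouring characters by hand. The names `uniq_words` and `collect` are made up for illustration and are not part of grail's script, and the pattern is assumed never to match an empty string.

```shell
# uniq_words PATTERN: for every input line, print the unique whole-word
# matches of PATTERN, comma-separated (hypothetical helper, not grail's code).
uniq_words() {
    awk -v pat="$1" '
    # collect(): walk through str with the two-argument match(), which sets
    # RSTART and RLENGTH, and keep each whole-word match once.
    function collect(str, pat,    out, seen, pos, s, w, before, after) {
        pos = 1
        while (pos <= length(str) && match(substr(str, pos), pat)) {
            s = pos + RSTART - 1                 # absolute start of the match
            w = substr(str, s, RLENGTH)
            before = (s > 1) ? substr(str, s - 1, 1) : ""
            after  = substr(str, s + RLENGTH, 1)
            if (RLENGTH > 0 && before !~ /[A-Za-z0-9_]/ && after !~ /[A-Za-z0-9_]/) {
                if (!(w in seen)) {
                    seen[w] = 1
                    out = out (out == "" ? "" : ",") w
                }
                pos = s + RLENGTH                # continue after the match
            } else {
                pos = s + 1                      # not a whole word: retry one char on
            }
        }
        return out
    }
    { print collect($0, pat) }
    '
}

printf 'foo bar foo foobar baz\n' | uniq_words 'foo|baz'   # prints: foo,baz
```

Tracking seen words in an array also avoids treating a name as a duplicate merely because it is a substring of one already collected.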
 
Old 02-29-2012, 07:47 PM   #18
Trd300
Member
 
Registered: Feb 2012
Posts: 89

Original Poster
Rep: Reputation: Disabled
%%%%%

Last edited by Trd300; 05-01-2012 at 04:32 AM.
 
Old 02-29-2012, 08:04 PM   #19
Trd300
Member
 
Registered: Feb 2012
Posts: 89

Original Poster
Rep: Reputation: Disabled
%%%%%

Last edited by Trd300; 05-01-2012 at 04:32 AM.
 
Old 02-29-2012, 11:03 PM   #20
grail
LQ Guru
 
Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 10,005

Rep: Reputation: 3191
Firstly, to run the script you only need to make it executable (chmod +x search.awk) and invoke it like so:
Code:
./search.awk file1.tab file2.tab bigdb.tab
Please also find out which awk implementation /usr/bin/awk points to:
Code:
ls -l /usr/bin/awk
 
Old 02-29-2012, 11:34 PM   #21
Trd300
Member
 
Registered: Feb 2012
Posts: 89

Original Poster
Rep: Reputation: Disabled
Yep, I forgot to change the permissions!


My version of awk:
Code:
-rwxr-xr-x  1 root  wheel  238400 25 Jun  2010 /usr/bin/awk

Last edited by Trd300; 02-29-2012 at 11:40 PM.
 
Old 02-29-2012, 11:40 PM   #22
Trd300
Member
 
Registered: Feb 2012
Posts: 89

Original Poster
Rep: Reputation: Disabled
Still the same error though:
Code:
./search.awk file1.tab file2.tab bigdb.tab
/usr/bin/awk: syntax error at source line 24 source file ./search.awk
 context is
        (match(substr(str,k), >>>  pat1, <<<  f)){
/usr/bin/awk: illegal statement at source line 25 source file ./search.awk
/usr/bin/awk: illegal statement at source line 25 source file ./search.awk
 
Old 02-29-2012, 11:53 PM   #23
grail
LQ Guru
 
Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 10,005

Rep: Reputation: 3191
Hmmm ... is there any chance you have gawk installed as well? By the look of it, plain awk does not support some of the features I have used.
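A quick way to check whether a given awk understands the three-argument match() is to probe for the feature directly rather than guess from the binary name. This is only a sketch: `has_match3` is a hypothetical helper name, and it relies on other awks rejecting the three-argument call as a syntax error, as the error output above shows.

```shell
# has_match3 AWK: succeed if the given awk accepts match() with three arguments
# (hypothetical helper; a syntax error or a missing binary counts as "no").
has_match3() {
    "$1" 'BEGIN { exit !match("abc", /b/, m) }' 2>/dev/null
}

if has_match3 awk; then
    echo "awk supports three-argument match()"
else
    echo "awk lacks three-argument match(); try gawk instead"
fi
```

If the probe fails for awk but succeeds for gawk, either change the first line of the script to point at gawk or run it as `gawk -f search.awk file1.tab file2.tab bigdb.tab`.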
 
1 members found this post helpful.
Old 02-29-2012, 11:56 PM   #24
Trd300
Member
 
Registered: Feb 2012
Posts: 89

Original Poster
Rep: Reputation: Disabled
I don't have gawk installed.

Which version would suit?
 
Old 03-01-2012, 12:01 AM   #25
grail
LQ Guru
 
Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 10,005

Rep: Reputation: 3191
Anything from gawk 3.1.8 up to the latest release.
 
Old 03-01-2012, 10:14 AM   #26
grail
LQ Guru
 
Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 10,005

Rep: Reputation: 3191
Hey, I am not sure whether you have ruby installed, but I thought the following solution was sweet:
Code:
#!/usr/bin/ruby

BEGIN{ $, = $; = "\t" }

while gets
    if $FILENAME[/file/]
        if $FILENAME[/1/]
            pat1 = "#{pat1}#{pat1 ? "|" : ""}#{$_.chomp}"
        else
            pat2 = "#{pat2}#{pat2 ? "|" : ""}#{$_.chomp}"
        end
    end

    if $FILENAME[/big/]
        line = $_.split

        puts line.join if (line[3] = line[2].scan(/\b(#{pat2})\b/).uniq.join(",")) && !line[3].empty? &&
                          (line[2] = line[2].scan(/\b(#{pat1})\b/).uniq.join(",")) && !line[2].empty?
    end
end
It does run a little slower than the gawk solution, but it is also quite a bit tidier (although I am sure one of the ruby pros could probably do more).

It is run the same way as the gawk solution.

Last edited by grail; 03-01-2012 at 10:19 AM.
 
Old 03-04-2012, 08:13 PM   #27
Trd300
Member
 
Registered: Feb 2012
Posts: 89

Original Poster
Rep: Reputation: Disabled
Sorry for the late response.

It works perfectly with gawk, and it is pretty fast.

Thanks a lot, grail!

Thanks, firstfire!
 
Old 03-04-2012, 11:20 PM   #28
Trd300
Member
 
Registered: Feb 2012
Posts: 89

Original Poster
Rep: Reputation: Disabled
%%%%%

Last edited by Trd300; 05-01-2012 at 04:32 AM.
 
Old 03-05-2012, 03:06 AM   #29
grail
LQ Guru
 
Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 10,005

Rep: Reputation: 3191
The main idea for the first part of the script is that it needs to know which files hold the names you are looking for. To make that easier, we can make the following change, so that the two name lists are identified by their position on the command line rather than by their file names:
Code:
BEGIN{ OFS = FS = "\t"
       if(ARGC != 4){
           print "Wrong number of arguments" > "/dev/stderr"
           exit 1
       }

       file1 = ARGV[1]
       file2 = ARGV[2]
}

FILENAME == file1 || FILENAME == file2{
    if( FILENAME == file1 )
        pat1=pat1 (pat1?"|":"\\<(") $0
    else
        pat2=pat2 (pat2?"|":"\\<(") $0

    next
}
As long as you now place the two files used for the checking first on the command line, you should be fine.
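To see the effect of matching on ARGV positions in isolation, here is a small sketch that only builds the two alternation lists and prints them when the third file starts. The file names and the `build_patterns` wrapper are invented for the demonstration, and the word-boundary wrapping from the real script is left out to keep it short.

```shell
# build_patterns LIST1 LIST2 DB: print the alternations built from the two
# name lists (hypothetical wrapper for demonstration only).
build_patterns() {
    awk '
    BEGIN { list1 = ARGV[1]; list2 = ARGV[2] }
    # Lines of the first two files are collected and then skipped.
    FILENAME == list1 { pat1 = pat1 (pat1 == "" ? "" : "|") $0; next }
    FILENAME == list2 { pat2 = pat2 (pat2 == "" ? "" : "|") $0; next }
    # First line of the remaining file: both lists are complete by now.
    FNR == 1 { print "pat1=" pat1; print "pat2=" pat2 }
    ' "$@"
}

# Throwaway input files whose names contain neither "file" nor "1".
demo_dir=$(mktemp -d)
printf 'alpha\nbeta\n'       > "$demo_dir/names_a.txt"
printf 'gamma\n'             > "$demo_dir/names_b.txt"
printf 'x\ty\talpha gamma\n' > "$demo_dir/db.tab"

build_patterns "$demo_dir/names_a.txt" "$demo_dir/names_b.txt" "$demo_dir/db.tab"
# prints:
# pat1=alpha|beta
# pat2=gamma
```

The order of the arguments is what matters here: swapping the first two files swaps the roles of the two lists.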
 
1 members found this post helpful.
Old 03-05-2012, 07:41 PM   #30
Trd300
Member
 
Registered: Feb 2012
Posts: 89

Original Poster
Rep: Reputation: Disabled
%%%%%

Last edited by Trd300; 05-01-2012 at 04:33 AM.
 
  

