LinuxQuestions.org
Programming This forum is for all programming questions.
The question does not have to be directly related to Linux and any language is fair game.

Old 12-19-2006, 12:19 PM   #1
kshkid
Member
 
Registered: Dec 2005
Distribution: RHEL3, FC3
Posts: 383

Rep: Reputation: 30
Improving the performance !!!


Hi,

I am working with a small piece of code to find common elements between two files.

Code:
#!/opt/third-party/bin/perl

open(sf1, "small") || die "couldn't open the small file!";

$cnt = 1;
@file = ();

while ( $record1 = <sf1> ) {
  if( $cnt <= 1000 ) {
    push(@file, $record1);
    $cnt += 1;
  }

  else {
    $cnt = 1;
    open(tf1, "big") || die "couldn't open the big file!";
    while ( $record2 = <tf1> ) {
        foreach(@file) {
          if( $record2 eq $_ ) {
            print "Got it : $record2" ;
          }
        }
    }
    close(tf1);
    @file = ($record1);
  }
}

# compare any leftover records in the final partial batch,
# otherwise the last (up to 1000) lines of small are never checked
if (@file) {
  open(tf1, "big") || die "couldn't open the big file!";
  while ( $record2 = <tf1> ) {
    foreach (@file) {
      print "Got it : $record2" if $record2 eq $_;
    }
  }
  close(tf1);
}
close(sf1);

print "\n";
exit 0;
and here is the other version, using grep:

Code:
#!/opt/third-party/bin/zsh

while read line
do
  # -q: quiet; use grep's exit status instead of redirecting output away
  if grep -q "$line" big
  then
    echo "$line"
  fi
done < small

exit 0

The grep version seems to outperform the Perl one. I am surprised: in the Perl version the file contents are read into memory and the comparisons are made against in-memory elements, so I expected Perl to be faster.
But in all the samples of different record counts that I have run, grep beats Perl.

Is there any way to improve the performance of the Perl version?
Thanks in advance for your input!!!
 
Old 12-19-2006, 12:48 PM   #2
jim mcnamara
Member
 
Registered: May 2002
Posts: 964

Rep: Reputation: 34
If you want fast:
Code:
grep -f small big
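If the lines of small are meant to match whole lines of big as literal strings (not regexps), grep's -F and -x flags are also worth trying. A sketch with made-up sample data standing in for the thread's files:

```shell
# hypothetical stand-ins for the thread's "small" and "big" files
printf 'foo\nbar\n' > small
printf 'foo\nbaz\nbar\nfoofoo\n' > big
# -F: treat patterns as fixed strings, -x: match whole lines only,
# -f: read the patterns from a file
grep -F -x -f small big
```

With -x, "foofoo" is not reported even though it contains "foo", since the whole line must match a pattern.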
 
Old 12-19-2006, 02:22 PM   #3
schneidz
Senior Member
 
Registered: May 2005
Location: boston, usa
Distribution: fc-15/ fc-20-live-usb/ aix
Posts: 4,104

Rep: Reputation: 636
Quote:
Originally Posted by jim mcnamara
If you want fast:
Code:
grep -f small big
another experiment that is related to the original post:

i noticed that if you do
Code:
egrep "(item1|item2)" file.lst
against a million-record file, it is a big bottleneck compared to putting the quoted items into a small file and using grep -f.

why is that?
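One plausible explanation (an observation, not something the thread confirms): with -F and a pattern file, GNU grep can use specialized multi-string matching algorithms (the Aho-Corasick / Commentz-Walter family), whereas a large alternation typically goes through the general regex engine. The alternation can be rewritten as a fixed-string pattern file like this (item1/item2 and file.lst are the names from the post; the data is made up):

```shell
# pattern file equivalent to egrep "(item1|item2)"
printf 'item1\nitem2\n' > items.txt
printf 'item1 here\nitem3\nprefix item2\n' > file.lst
# -F switches grep to fixed-string (non-regex) matching
grep -F -f items.txt file.lst
```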
 
Old 12-19-2006, 05:09 PM   #4
chrism01
Guru
 
Registered: Aug 2004
Location: Sydney
Distribution: Centos 6.5, Centos 5.10
Posts: 16,301

Rep: Reputation: 2037
For a start, use a hash instead of @file and say:

Code:
    while ( $record2 = <tf1> ) 
    {
        if( exists($file_hash{$record2}) )
        {
            print "Got it : $record2" ;
        }
    }
http://perldoc.perl.org/search.html?q=hash+example

Last edited by chrism01; 12-19-2006 at 05:10 PM.
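The same single-pass hash-lookup idea can also be sketched in awk, for comparison (filenames follow the thread; the sample data is made up):

```shell
# hypothetical sample inputs standing in for the thread's files
printf 'and then\ncode\ndo\n' > small
printf 'adding\ndo\ncode\nlike to addd\n' > big
# first pass (NR==FNR) hashes every line of small; second pass prints
# each line of big that appears verbatim in the hash
awk 'NR==FNR { seen[$0] = 1; next } ($0 in seen)' small big
```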
 
Old 12-20-2006, 03:01 AM   #5
kshkid
Member
 
Registered: Dec 2005
Distribution: RHEL3, FC3
Posts: 383

Original Poster
Rep: Reputation: 30
Quote:
Originally Posted by jim mcnamara
If you want fast:
Code:
grep -f small big

This is actually slower than the two snippets I posted - it takes almost 10 times as long as the others.
 
Old 12-20-2006, 10:21 AM   #6
jim mcnamara
Member
 
Registered: May 2002
Posts: 964

Rep: Reputation: 34
Define "seems".

Try running the scripts with time ./scriptname to get real answers about the run times.

And you could be right - if small has thousands of lines in it, the regexp built from it could be snail slow. You can do things to "tune" a regexp. Are the contents of your "small" file whole lines? Try prepending ^ to each line in small:
Code:
This is a line
^This is a line
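The prepending step can be done with a sed one-liner (filenames follow the thread; the sample data here is made up):

```shell
# hypothetical sample data for "small" and "big"
printf 'foo\nbar\n' > small
printf 'foo\nnotfoo\nbar\n' > big
# prepend ^ to every pattern line, then use the anchored file with -f
sed 's/^/^/' small > small.anchored
grep -f small.anchored big
```

With the anchors, "notfoo" is no longer reported, because ^foo requires the match to start at the beginning of the line.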
 
Old 12-20-2006, 10:46 AM   #7
kshkid
Member
 
Registered: Dec 2005
Distribution: RHEL3, FC3
Posts: 383

Original Poster
Rep: Reputation: 30
Quote:
Originally Posted by jim mcnamara
Define "seems".

Try running the scripts with time ./scriptname to get real answers about the run times.

And you could be right - if small has thousands of lines in it, the regexp built from it could be snail slow. You can do things to "tune" a regexp. Are the contents of your "small" file whole lines? Try prepending ^ to each line in small:
Code:
This is a line
^This is a line
That's great, Jim!

Prepending ^ noticeably improves the performance.

>> grep -f small big

small - 1000 lines
big - 24000 lines

without prepending:
>> time grep -f small big   (16.647 sec)

after prepending ^:
>> time grep -f small big   (7.575 sec)

But what's the magic in prepending? Is it that the regexp engine can jump straight to testing the anchored pattern ^word at the start of each line?
Could you please explain?

many thanks once again!!!
 
Old 12-20-2006, 11:04 AM   #8
tuxdev
Senior Member
 
Registered: Jul 2005
Distribution: Slackware
Posts: 2,014

Rep: Reputation: 115
^ matches the beginning of a line. So if the pattern doesn't match right at the start, grep immediately skips to the next line instead of trying the pattern at every position in the line. For example, a line containing "foobar" matches both "foo" and "bar", and matches "^foo", but not "^bar". For "^bar", grep sees that the first char is 'f', not 'b', so the line cannot match the pattern.
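The anchoring behavior described above can be checked directly with grep -c, which counts matching lines (demo.txt and its contents are made up):

```shell
printf 'foobar\n' > demo.txt     # a one-line demo file
grep -c 'bar' demo.txt           # unanchored: "bar" found mid-line, prints 1
grep -c '^foo' demo.txt          # anchored: the line starts with "foo", prints 1
grep -c '^bar' demo.txt || true  # anchored: no line starts with "bar", prints 0
```

(The `|| true` is only there because grep exits with a non-zero status when nothing matches.)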
 
Old 12-20-2006, 11:12 AM   #9
kshkid
Member
 
Registered: Dec 2005
Distribution: RHEL3, FC3
Posts: 383

Original Poster
Rep: Reputation: 30
Quote:
Originally Posted by chrism01
For a start, use a hash instead of @file and say:

Code:
    while ( $record2 = <tf1> ) 
    {
        if( exists($file_hash{$record2}) )
        {
            print "Got it : $record2" ;
        }
    }
http://perldoc.perl.org/search.html?q=hash+example
Thanks for the reply!
I tried the following code, but its output seems to differ from the other implementations.

perl code:

Code:
#! /opt/third-party/bin/perl

open(fh, "s") || die "unable to open the file <small>";

%fileHash = (-100, 'somejunk');
$i = 1;

while( $content = <fh> )
{
  if( $i <= 2 ) {
    $fileHash{($content)} = $i;
    $i++;
  }
  else {
    print "the count is $i\n";
    foreach $k1 ( sort keys (%fileHash) ) {
      print "key is $k1 and value is $fileHash{$k1}";
    }
    $i = 1;
    open(file, "b") || die "Unable to open the file <big>";
    while ( $rec = <file> ) {
        print "record is $rec\n";
        print "Got it:$rec" if exists $fileHash{$rec} ;
    }
    close(file);
    %fileHash = ();
    $fileHash{($content)} = $i;
  }
}
close(fh);

print "i val is $i\n";

open(file, "b") || die "Unable to open the file <big>";
while ( $rec = <file> ) {
  print "$rec" if exists $fileHash{$rec} ;
}
close(file);

%fileHash = ();

exit 0
Here are the sample files I used.

Code:
>cat s
and then
code
adding
do
something extra here
i would
peculiardino

Code:
>cat b
adding
do
adding
code
do
i would
like to addd
do
i would
like to addd
more
with this i test
wow is that working
adding
and then
code
like to addd
more
with this i test
wow is that working
Code:
>perl file.pl | grep "Got it" | sort -u
Got it:adding
Got it:and then
Got it:code
Code:
>grep -f s b | sort -u
adding
and then
code
do
i would
What I am actually trying to do is emulate grep -f <file1> <file2> in Perl.

I don't see anything obviously wrong in the code. Two questions:

a) Why are the two outputs not the same?
b) How can I perform the equivalent of sort -u within the Perl code?

Many thanks in advance
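One likely source of the mismatch (an observation, not from the thread): grep -f treats each line of the pattern file as a pattern that can match anywhere inside a line, while a hash lookup compares whole lines for exact equality. The substring semantics, plus a sort -u pass, can be sketched in awk (s and b follow the thread's sample names; the data here is a made-up subset):

```shell
# made-up subset of the thread's "s" (patterns) and "b" (data)
printf 'do\ncode\nand then\n' > s
printf 'do\ni would\ncode\nand then\nmore\n' > b
# first pass stores each pattern; second pass prints a line of b when any
# pattern occurs anywhere in it (substring match, like grep -F -f),
# then sort -u gives the unique sorted output the thread compares against
awk 'NR==FNR { pats[$0]; next } { for (p in pats) if (index($0, p)) { print; next } }' s b | sort -u
```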
 
Old 12-20-2006, 12:23 PM   #10
kshkid
Member
 
Registered: Dec 2005
Distribution: RHEL3, FC3
Posts: 383

Original Poster
Rep: Reputation: 30
Rather, with a slight modification:

Code:
#! /opt/third-party/bin/perl

open(fh, "small") || die "unable to open the file <small>";

%fileHash = ();

$i = 1;
while ( $content = <fh> )
{
  # chomp inside the loop: chomp() in the while condition returns the
  # number of characters removed, so a final line with no trailing
  # newline would return 0 and silently end the loop early
  chomp $content;
  $fileHash{$content} = $i;
  $i += 1;
}
close(fh);

open(file, "big") || die "Unable to open the file <big>";
while ( $rec = <file> ) {
  chomp $rec;
  print "\nMatch:$rec" if exists $fileHash{$rec} ;
}
close(file);

%fileHash = ();

exit 0
But I am afraid: can I create such large hashes?

I still need to figure out why the previous Perl implementation is not working.
 
  

