I am working with a small piece of code to find common elements between two files.
Code:
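#! /opt/third-party/bin/perl
use strict;
use warnings;

# Read "small" in batches of 1000 lines and, for each batch, scan the
# whole of "big", printing every line of big that equals a batch line.
my $batch_size = 1000;

sub scan_big {
    my @batch = @_;
    open(my $tf1, '<', 'big') or die "couldn't open the big file: $!";
    while ( my $record2 = <$tf1> ) {
        foreach my $wanted (@batch) {
            print "Got it : $record2" if $record2 eq $wanted;
        }
    }
    close($tf1);
}

open(my $sf1, '<', 'small') or die "couldn't open the small file: $!";
my @batch = ();
while ( my $record1 = <$sf1> ) {
    push(@batch, $record1);
    if ( @batch == $batch_size ) {
        scan_big(@batch);
        @batch = ();
    }
}
# Scan whatever is left over in the last, partial batch.
scan_big(@batch) if @batch;
close($sf1);
print "\n";
exit 0;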
And another piece, using grep:
Code:
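#! /opt/third-party/bin/zsh
# For each line of "small", print it if grep finds it anywhere in "big".
# Note: grep treats $line as a regular expression, matched anywhere in a line.
while IFS= read -r line
do
    if grep -q -- "$line" big 2>/dev/null
    then
        echo "$line"
    fi
done < small
exit 0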
The grep version seems to outperform the Perl one. I am surprised, because in Perl the file contents are read into memory and the comparisons are made against those in-memory elements, which I supposed would be faster.
But in every sample of different records that I ran, grep outperformed Perl.
Is there any way to improve the performance of the Perl version?
Thanks in advance for your input!
This is much slower than the two code snippets I posted; it takes almost 10 times as long as either of them.
Try running the script with time ./scriptname to get real answers about performance.
And you could be right: if small has thousands of lines in it, the regexp built from it could be very slow. There are ways to "tune" a regexp. Are the contents of your "small" file whole lines? If so, try prepending ^ to each line in small:
Code:
This is a line
^This is a line
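Rather than editing small by hand, sed can add the `^` to every line. Below is a minimal sketch; the helper name `anchor_patterns` and the output file name `small.anchored` are hypothetical. The lines are still treated as regular expressions, so characters such as `.`, `*` or `[` in small keep their special meaning.

```shell
#!/bin/sh
# anchor_patterns FILE (hypothetical helper name): print FILE with a
# leading ^ on every line. In the sed pattern ^ is the start-of-line
# anchor; in the replacement it is a literal ^ character.
anchor_patterns() {
    sed 's/^/^/' "$1"
}

# Example:
#   anchor_patterns small > small.anchored
#   grep -f small.anchored big
```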
That's great, Jim!
Prepending ^ does seem to improve the performance of
>> grep -f small big
small - 1000 lines
big - 24000 lines
Without prepending:
>> time grep -f small big (16.647 sec)
After prepending ^:
>> time grep -f small big (7.575 sec)
But what's the magic in prepending?
Is the regexp able to go straight to a pattern like ^word?
Could you please explain that?
^ matches the beginning of a line. So if the rest of the pattern doesn't match right there, grep can move on to the next line immediately instead of trying to match at every other position in the line. For example, the line "foobar" matches both "foo" and "bar", and it matches "^foo" but not "^bar". For "^bar", grep sees that the first character is 'f', not 'b', so the line cannot match the pattern.
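The "foobar" example can be checked directly with `grep -c`, which counts matching lines. The sketch also covers the case where every line of small is a literal, complete line. There, the standard grep options `-F` (fixed strings, no regex) and `-x` (the pattern must match the whole line) state that intent directly. Any speedup from them depends on the grep implementation and is worth measuring with `time` on the real files. The helper name `exact_common_lines` is hypothetical.

```shell
#!/bin/sh
# Unanchored: "bar" may match anywhere in the line, so "foobar" counts.
printf 'foobar\n' | grep -c 'bar'     # prints 1
# Anchored: "^bar" must match at the start; the first char is 'f', so no match.
printf 'foobar\n' | grep -c '^bar'    # prints 0
printf 'foobar\n' | grep -c '^foo'    # prints 1

# exact_common_lines SMALL BIG (hypothetical helper name): literal,
# whole-line matching of every line of SMALL against BIG.
# -F: patterns are fixed strings; -x: a pattern must match the entire line.
exact_common_lines() {
    grep -F -x -f "$1" "$2"
}
```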
Thanks for the reply!
I tried the following code, but its output seems to differ from that of the other implementations.
Perl code:
Code:
#! /opt/third-party/bin/perl
open(fh, "s") || die "unable to open the file <small>";
%fileHash = (-100, 'somejunk');
$i = 1;
while( $content = <fh> )
{
    if( $i <= 2 ) {
        $fileHash{($content)} = $i;
        $i++;
    }
    else {
        print "the count is $i\n";
        foreach $k1 ( sort keys (%fileHash) ) {
            print "key is $k1 and value is $fileHash{$k1}";
        }
        $i = 1;
        open(file, "b") || die "Unable to open the file <big>";
        while ( $rec = <file> ) {
            print "record is $rec\n";
            print "Got it:$rec" if exists $fileHash{$rec} ;
        }
        close(file);
        %fileHash = ();
        $fileHash{($content)} = $i;
    }
}
close(fh);
print "i val is $i\n";
open(file, "b") || die "Unable to open the file <big>";
while ( $rec = <file> ) {
    print "$rec" if exists $fileHash{$rec} ;
}
close(file);
%fileHash = ();
exit 0
These are the sample files I used:
Code:
>cat s
and then
code
adding
do
something extra here
i would
peculiardino
Code:
>cat b
adding
do
adding
code
do
i would
like to addd
do
i would
like to addd
more
with this i test
wow is that working
adding
and then
code
like to addd
more
with this i test
wow is that working
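Part of the difference is in what each version reports. Reading the Perl script above, it does several things the other versions don't:

- It prints its debugging lines ("the count is", "key is", "record is", "i val is") alongside the results.
- It reports matches one batch of s at a time, with a fresh scan of b for each batch. Results therefore come out grouped by batch rather than in the order of b.
- It uses the "Got it:" prefix for every batch except the last.
- Batch sizes are uneven. Because `$i` is reset to 1 after a line is stored in the `else` branch, later batches hold three lines rather than two.

The other two approaches differ from each other as well:

- `grep -f` prints every matching line of b, including repeats. In b, `adding` and `do` each occur three times.
- The zsh loop prints each line of small at most once. It matches that line as a regular expression anywhere in a line of b, not as a whole line.

Below is a minimal sketch of the two well-defined questions, assuming exact whole-line matches; the function names are hypothetical.

```shell
#!/bin/sh
# matches_in_big SMALL BIG (hypothetical name): every line of BIG that
# also occurs in SMALL, in BIG's order, repeated as often as in BIG.
matches_in_big() {
    grep -F -x -f "$1" "$2"
}

# distinct_common_lines SMALL BIG (hypothetical name): each distinct line
# common to both files, reported once, in sorted order.
# comm -12 keeps only lines present in both inputs and needs sorted input.
distinct_common_lines() {
    dcl_dir=$(mktemp -d) || return 1
    sort -u "$1" > "$dcl_dir/a"
    sort -u "$2" > "$dcl_dir/b"
    comm -12 "$dcl_dir/a" "$dcl_dir/b"
    rm -rf "$dcl_dir"
}
```

On the sample files, `matches_in_big s b` prints 11 lines. `distinct_common_lines s b` prints five: adding, and then, code, do, i would. Those are the same five lines the zsh loop finds, though the loop reports them in the order of s.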