Improving the performance !!!
Hi,
I am working with a small piece of code to find the common elements between two files. Code:
#! /opt/third-party/bin//perl
Code:
#! /opt/third-party/bin//zsh
The grep version seems to outperform the Perl one. I am surprised, because in the Perl version the file contents are read into memory and the comparisons are made against the in-memory elements, so I supposed Perl would be faster. But in all the samples with different numbers of records that I ran, grep outperformed Perl. Is there any way to improve the performance of the Perl version? Thanks in advance for your inputs! |
If you want fast:
Code:
grep -f small big |
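A minimal sketch of what that command does, on hypothetical sample data (the file names small and big come from the thread; the contents and the temporary directory are made up for illustration). The -F variant is an extra suggestion that only applies if the entries in small are literal strings rather than regular expressions.

```shell
# Hypothetical sample data; only the names "small" and "big" are from the thread.
workdir=$(mktemp -d)
printf '%s\n' apple cherry > "$workdir/small"
printf '%s\n' apple banana cherry date > "$workdir/big"

# -f reads one pattern per line from small and prints every line of big
# that matches at least one of those patterns.
common=$(grep -f "$workdir/small" "$workdir/big")
printf '%s\n' "$common"

# Assumption: the entries are plain strings. -F treats each pattern as a
# fixed string instead of a regular expression, which is often faster.
common_fixed=$(grep -F -f "$workdir/small" "$workdir/big")

rm -rf "$workdir"
```

Running both variants under time on the real files is the only reliable way to tell which one is faster for a given data set.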
Quote:
I noticed that if you do Code:
egrep "(item1|item2)" file.lst
why is that? |
For a start, use a hash instead of @file and say:
Code:
while ( $record2 = <tf1> ) |
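The Perl code is not shown in full here, so the following is only a sketch of the hash-lookup idea, written with awk's associative arrays instead of a Perl hash. It assumes that "common elements" means lines that are identical in both files; the sample data is hypothetical.

```shell
# Hypothetical sample data.
workdir=$(mktemp -d)
printf '%s\n' apple cherry > "$workdir/small"
printf '%s\n' apple banana cherry date > "$workdir/big"

# NR == FNR holds only while awk reads the first file (small):
# store each of its lines as a key in the associative array "seen".
# For the second file (big), print a line only if it is a key in "seen".
common_hash=$(awk 'NR == FNR { seen[$0] = 1; next } $0 in seen' \
    "$workdir/small" "$workdir/big")
printf '%s\n' "$common_hash"

rm -rf "$workdir"
```

Each line of big costs one hash lookup, instead of one comparison against every element of an array, which is the point of the suggestion above.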
Quote:
This is much slower than the two code snippets I posted; it takes almost 10 times as long as the others. This seems to be really slow. |
Define seems.
Try running the script with time ./scriptname to get real answers about run times. And you could be right: if small has thousands of lines in it, the regexp created could be snail slow. You can do things to "tune" a regexp. Are the contents of your "small" whole lines? Try prepending ^ to each line in small: Code:
This is a line |
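One way to do the prepending without editing small by hand is sed. This is a sketch on hypothetical data; the output file name small.anchored is made up for illustration.

```shell
# Hypothetical sample data.
workdir=$(mktemp -d)
printf '%s\n' 'This is a line' 'Another line' > "$workdir/small"

# In the sed pattern, ^ means "start of line"; in the replacement it is a
# literal caret, so every line gets a ^ put in front of it.
sed 's/^/^/' "$workdir/small" > "$workdir/small.anchored"

anchored_patterns=$(cat "$workdir/small.anchored")
printf '%s\n' "$anchored_patterns"

rm -rf "$workdir"
```

The anchored file can then be passed to grep -f in place of small. Note that the anchored patterns only find entries that start at the beginning of a line in big.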
Quote:
Prepending ^ seems to improve the performance of grep -f small big (small - 1000, big - 24000):
without prepending: time grep -f small big (16.647 sec)
after prepending ^: time grep -f small big (7.575 sec)
But what's the magic in prepending? Is the regexp able to go straight to the pattern ^word? Could you please explain that? Many thanks once again! |
^ matches the beginning of a line. So if the pattern does not match starting at the first character, grep immediately skips to the next line instead of trying to match at every possible position on the line. For example, a line with "foobar" matches both "foo" and "bar", and matches "^foo", but not "^bar". For "^bar", it sees that the first char is 'f', not 'b', so the line cannot match the pattern.
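The "foobar" example can be checked directly. In this small sketch, the helper function matches is a made-up name for illustration.

```shell
line='foobar'

# Hypothetical helper: prints "yes" if the pattern in $1 matches $line.
matches() {
    if printf '%s\n' "$line" | grep -q -e "$1"; then
        echo yes
    else
        echo no
    fi
}

foo_unanchored=$(matches 'foo')   # "foo" occurs at the start of the line
bar_unanchored=$(matches 'bar')   # "bar" occurs later in the line
foo_anchored=$(matches '^foo')    # the line starts with "foo"
bar_anchored=$(matches '^bar')    # the line starts with 'f', not 'b'

echo "foo=$foo_unanchored bar=$bar_unanchored ^foo=$foo_anchored ^bar=$bar_anchored"
```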
|
Quote:
I tried the following code, but its output seems to differ from the other implementations. Perl code: Code:
#! /opt/third-party/bin/perl
Code:
>cat s
Code:
>cat b
Code:
>perl file.pl | grep "Got it" | sort -u
Code:
>grep -f s b | sort -u
I don't see anything weird in the code. Two questions: a) why are the two outputs not the same, and b) how can I do the equivalent of sort -u within the Perl code? Many thanks in advance :) |
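One possible cause of the difference, offered only as a guess because the Perl script and the contents of s and b are not shown: grep -f treats each line of s as a regular expression and accepts a match anywhere in a line of b, whereas a script that compares whole lines for equality reports only exact matches. The sketch below uses hypothetical contents for s and b to show how the two notions of "common" can diverge.

```shell
# Hypothetical contents; only the file names s and b are from the thread.
workdir=$(mktemp -d)
printf '%s\n' cat dog > "$workdir/s"
printf '%s\n' cat concatenate dog cat > "$workdir/b"

# Pattern match anywhere in the line: "concatenate" is reported too,
# because it contains "cat".
regex_hits=$(grep -f "$workdir/s" "$workdir/b" | sort -u)

# -F: treat entries as fixed strings; -x: the whole line must match.
exact_hits=$(grep -x -F -f "$workdir/s" "$workdir/b" | sort -u)

printf 'regex:\n%s\nexact:\n%s\n' "$regex_hits" "$exact_hits"

rm -rf "$workdir"
```

If the two commands give different results on the real s and b, substring matching is a likely explanation for at least part of the mismatch.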
Or rather, with a slight modification:
Code:
#! /opt/third-party/bin/perl
I still need to figure out why the previous Perl implementation is not working. :confused: |