LinuxQuestions.org


Lost_Oracle 06-17-2011 10:47 AM

Perl - Making this code more efficient
 
Hello everyone,

I am fairly new to Perl but have a fairly strong background in C++, so my Perl usually looks like C++ when I'm done with it. This program works just fine; I was just wondering if anyone had any tips or ideas for making it more compact or efficient.

The following code reads a text file line by line. Entries are separated by spaces, and the file looks like this:

Code:

Number of Files    IDnumber    TYPE
2                  001001     
1                  001001      type1
1                  001001      type2
2                  001001      type3
3                  001002      type2
4                  001002      type3
.
.
etc.

A file ID may appear up to four times: once with each of the three types (denoted by the literal strings "type1", "type2", and "type3") and possibly once with no type. The text file is ordered by IDnumber, so all 001001 entries appear consecutively.

The goal is to read in all of this information and output it like so:

Code:

IDnumber  type1  type2  type3  total
001001    1      1      2      4
Percent   25     25     50     100
.
.

Here is the current code:

Code:

#!/usr/bin/perl

#Open input and output files
open FILE, "input.txt" or die $!;
open ( OUT, '>>output.txt') or die $!;
#print headings
print OUT "IDnumber, type1, type2, type3, total\n";
#skip first line of file
$line = <FILE>;
#read in first line
$line = <FILE>;
@data = split (" ", $line );
$type1_count = 0;
$type2_count = 0;
$type3_count = 0;
$notype_count = 0;
$old_id = 0;
$old_num = 0;
$old_type = "";
$new_num = $data[0];
$new_id = $data[1];
$new_type = $data[2];
if ( $new_type =~ "type2")
{
        $type2_count = $new_num;
}
elsif ( $new_type =~ "type1")
{
        $type1_count = $new_num;
}
elsif ( $new_type =~ "type3" )
{
        $type3_count = $new_num;
}
else
{
        $notype_count = $new_num;
}

while ( $line = <FILE>)
{
        $old_num = $new_num;
        $old_id = $new_id;
        $old_type = $new_type;
        @data = split(" ", $line);
        $new_num = $data[0];
        $new_id = $data[1];
        $new_type = $data[2];

        if ( $new_id!=$old_id )
        {
                $sum = $type1_count + $type2_count + $type3_count;
                if ( $type1_count == 0 )
                {
                        $type1_Per = 0;
                }
                else
                {
                        $type1_Per = $type1_count/$sum*100;
                }
                if ( $type2_count == 0 )
                {
                        $type2_Per = 0;
                }
                else
                {
                        $type2_Per =  $type2_count/$sum*100;
                }
                if ( $type3_count == 0 )
                {
                        $type3_Per = 0;
                }
                else
                {
                        $type3_Per = $type3_count/$sum*100;
                }
                $Per_sum= $type1_Per+$type2_Per+$type3_Per;
                if ( $sum != 0 )
                {
                        print OUT "$old_id, $type1_count, $type2_count, $type3_count, $sum\n";
                        print OUT "% Count, $type1_Per%, $type2_Per%, $type3_Per%, $Per_sum%\n";
                }       
                $type1_count = 0;
                $type2_count = 0;
                $type3_count = 0;
                $other_count = 0;
                $sum = 0;
        }

        if ( $new_type =~ "type2")
        {
                $type2_count = $new_num;
        }
        elsif ( $new_type =~ "type1")
        {
                $type1_count = $new_num;
        }
        elsif ( $new_type =~ "type3" )
        {
                $type3_count = $new_num;
        }
        else
        {
                $other_count = $new_num;
        }
}
        $old_num = $new_num;
        $old_id = $new_id;
        $old_type = $new_type;
        $sum = $type1_count + $type2_count + $type3_count ;

        if ( $type1_count == 0 )
        {
                $type1_Per = 0;
        }
        else
        {
                $type1_Per = $type1_count/$sum*100;
        }
        if ( $type2_count == 0 )
        {
                $type2_Per = 0;
        }
        else
        {
                $type2_Per =  $type2_count/$sum*100;
        }
        if ( $type3_count == 0 )
        {
                $type3_Per = 0;
        }
        else
        {
                $type3_Per = $type3_count/$sum*100;
        }
        $Per_sum= $type1_Per+$type2_Per+$type3_Per;

        if ( $sum != 0 )
        {
                print OUT "$old_id, $type1_count, $type2_count, $type3_count, $sum\n";
                print OUT "% Count, $type1_Per%, $type2_Per%, $type3_Per%, $Per_sum%\n";
        }


markush 06-17-2011 11:50 AM

You should read about hashes in Perl, in particular the "hash of hashes". The outer hash is keyed by IDnumber, and the value for each IDnumber is itself a hash with the entries type1, type2 and so on.

For the input:
Code:

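open IN, "input.txt" or die $!;
my $header = <IN>;    # skip the heading line
while (<IN>) {
        # split " " splits on runs of whitespace and ignores the trailing newline
        my ($number, $idnumber, $type) = split(" ", $_);
        $id_hash{$idnumber}{$type} += $number;
        # and so on
}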

Not tested!
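To show where the "and so on" could lead, here is a fuller sketch of the hash-of-hashes idea. It is only one way to do it: the sub names tally and report are made up for this example, rows without a type are skipped (an assumption, based on the original script leaving them out of the total), and the percentages are rounded to whole numbers. The sample rows are the ones from the first post; a real script would read them from input.txt instead.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Sum the file counts per ID and type.
# Rows without a type are skipped (assumption: they do not count toward the total).
sub tally {
    my @lines = @_;
    my %count;    # $count{$id}{$type} = number of files
    for my $line (@lines) {
        my ($number, $id, $type) = split ' ', $line;
        next unless defined $type;
        $count{$id}{$type} += $number;
    }
    return \%count;
}

# Build the two report lines for one ID: the counts, then the percentages.
sub report {
    my ($id, $types) = @_;
    my @n   = map { $types->{$_} || 0 } qw(type1 type2 type3);
    my $sum = $n[0] + $n[1] + $n[2];
    return () unless $sum;    # nothing to print, and no division by zero
    my @pct = map { sprintf '%.0f', $_ / $sum * 100 } @n;
    # The last percent column is written as 100 directly; the rounded
    # values themselves may add up to 99 or 101.
    return (join(', ', $id, @n, $sum),
            join(', ', 'Percent', @pct, 100));
}

# Sample rows from the first post, heading line already removed.
my @sample = (
    '2                  001001',
    '1                  001001      type1',
    '1                  001001      type2',
    '2                  001001      type3',
    '3                  001002      type2',
    '4                  001002      type3',
);

my $count = tally(@sample);
for my $id (sort keys %$count) {
    print "$_\n" for report($id, $count->{$id});
}
```

For ID 001001 this prints "001001, 1, 1, 2, 4" followed by "Percent, 25, 25, 50, 100". Because everything is collected in the hash first, the input no longer has to be ordered by IDnumber, and the duplicated percentage code after the loop goes away.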

Markus

Lost_Oracle 06-17-2011 12:52 PM

Thanks for the tip, markush. I had thought about trying to implement a hash, but I didn't think about a hash of hashes. Also, I didn't know you could use that form of syntax for the split function. That should help compress the code a bit.
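For anyone else who hasn't seen that form, here is a small sketch of split assigning straight into a list of variables. The sub name parse_row and the 'none' label for rows without a type are made up for this example.

```perl
use strict;
use warnings;

# split ' ' (a string holding one space) is a special case: it skips
# leading whitespace and splits on runs of whitespace, so the column
# padding and the trailing newline do not produce empty fields.
sub parse_row {
    my ($line) = @_;
    my ($number, $id, $type) = split ' ', $line;
    $type = 'none' unless defined $type;    # hypothetical label for untyped rows
    return ($number, $id, $type);
}

my @typed   = parse_row("1                  001001      type1\n");
my @untyped = parse_row("2                  001001     \n");

print join('|', @typed),   "\n";    # prints 1|001001|type1
print join('|', @untyped), "\n";    # prints 2|001001|none
```

A row with no type leaves the third variable undefined, so it is worth checking for that before using it as a hash key.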

I'm still open to more suggestions, so I won't mark this thread as closed just yet. Also, does anyone have comments performance wise on hashes of hashes vs. arrays the way I originally implemented them? I'm going to switch to hashes unless someone comes up with something even better, but I'm just curious what people have to say on the matter.

markush 06-17-2011 04:03 PM

Quote:

Originally Posted by Lost_Oracle (Post 4388755)
...Also, does anyone have comments performance wise on hashes of hashes vs. arrays the way I originally implemented them? ...

Hash lookups in Perl are fast (constant time on average), so a hash of hashes will not slow this program down. Plain array indexing is slightly cheaper per access, but for a job like this the difference is negligible. In general, regular expressions and hashes yield very fast Perl code, and you should prefer them where they fit.
Perl's motto is TMTOWTDI (there's more than one way to do it), so in most cases there is no single "best" solution. Perl code can be very short, and often the shorter code is also the more efficient one. The downside of short code is readability. I have used Perl since version 4 but haven't done very much Perl programming, and when I read code that I wrote some months ago, I often have difficulty understanding it. This is the reason why I do more coding in Ruby. Otherwise Perl is a great language, and there are many people around who have written brilliant Perl code; it is a very nice community.
I'd recommend reading the Camel Book. It is very well written and gives deep insight not only into Perl but also into Unix/Linux in general: http://oreilly.com/catalog/9780596000271
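On the hashes-versus-arrays question, the most reliable answer is to measure it on your own machine. Here is a minimal sketch using the core Benchmark module; the data and the sub names sum_hash and sum_array are invented for the example, and the numbers will differ from system to system.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

my @types    = qw(type1 type2 type3);
my %by_name  = (type1 => 1, type2 => 1, type3 => 2);    # counts keyed by type name
my @by_index = (1, 1, 2);                                # same counts, by position

# Add up the three counts through hash lookups.
sub sum_hash {
    my $s = 0;
    $s += $by_name{$_} for @types;
    return $s;
}

# Add up the three counts through array indexing.
sub sum_array {
    my $s = 0;
    $s += $by_index[$_] for 0 .. $#by_index;
    return $s;
}

# A negative count tells Benchmark to run each sub for at least that many
# CPU seconds; cmpthese then prints a table comparing their rates.
cmpthese(-1, { hash => \&sum_hash, array => \&sum_array });
```

For a file of this kind, reading the input will most likely take far longer than either kind of lookup, so readability is probably the better reason to choose one over the other.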

Markus


All times are GMT -5.