LinuxQuestions.org

PB0711 08-07-2006 04:48 PM

Perl: HOA and redundancy
 
Hi all,

I have a hash of arrays (HOA), and the arrays contain a lot of redundancy (duplicate entries). I want to take all of it out. How can I do that? I know I could go through each array, match the first element with a regex, and then use if statements to check whether it appears again, but won't this take, like, ages?

Maybe someone knows of a good CPAN module to help me. I looked but couldn't see anything. :study:
Cheers,
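For reference, the usual Perl idiom for this is a %seen hash combined with grep. It makes one pass over each array and needs no regex. This is a minimal sketch that assumes the hash is keyed on protein name with array references as values. %HOA_protein and its contents are hypothetical sample data:

```perl
use strict;
use warnings;

# Hypothetical sample data: protein name => list of peptide sequences.
my %HOA_protein = (
    protA => [qw(PEPTIDEK LLSAK PEPTIDEK GGR LLSAK)],
    protB => [qw(MMK MMK)],
);

# For every protein, keep only the first occurrence of each peptide.
# %seen is re-created for each key, so duplicates are detected per array.
for my $name (keys %HOA_protein) {
    my %seen;
    @{ $HOA_protein{$name} } = grep { !$seen{$_}++ } @{ $HOA_protein{$name} };
}

for my $name (sort keys %HOA_protein) {
    print "$name: @{ $HOA_protein{$name} }\n";
}
```

Because the membership check is a hash lookup, each array is handled in a single pass, and the first occurrence of each peptide keeps its original position.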

chrism01 08-07-2006 08:55 PM

Given that Perl is compiled before it runs, it shouldn't take that long unless you have enormous amounts of data.
We really need to see a minimal version of the code to make a judgement, though.
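If the data set does get large, one option is to drop duplicates during the parsing pass itself so they are never stored. This is a sketch under two assumptions. First, each <prot_desc> and <pep_seq> element sits on its own line, as the regexes in the original poster's script expect. Second, the input here is a hypothetical in-memory sample, not the real file:

```perl
use strict;
use warnings;

# Hypothetical test input in the one-tag-per-line layout the regexes expect.
my $xml = <<'END';
<prot_desc>protA</prot_desc>
<pep_seq>PEPTIDEK</pep_seq>
<pep_seq>LLSAK</pep_seq>
<pep_seq>PEPTIDEK</pep_seq>
<prot_desc>protB</prot_desc>
<pep_seq>MMK</pep_seq>
<pep_seq>MMK</pep_seq>
END

# Read the string as if it were a file.
open my $fh, '<', \$xml or die "cannot open in-memory file: $!";

my (%HOA_protein, %seen, $pro_name);
while (my $line = <$fh>) {
    if ($line =~ /<prot_desc>(.*)<\/prot_desc>/) {
        $pro_name = $1;
        $HOA_protein{$pro_name} ||= [];    # protein with no peptides still gets an entry
    } elsif ($line =~ /<pep_seq>(.*)<\/pep_seq>/ && defined $pro_name) {
        # Store the peptide only the first time it is seen for this protein.
        push @{ $HOA_protein{$pro_name} }, $1 unless $seen{$pro_name}{$1}++;
    }
}
close $fh;
```

Because %seen is keyed on protein name and then peptide, the same peptide can still be listed under two different proteins.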

PB0711 08-07-2006 09:31 PM

Sorry, here is the script.
Code:

while (my $line=<Prot>){
        if ($line=~/<prot_desc>(.*)<\/prot_desc>/){ # finds protein name
                $pro_name=$1;
                @pep=();   # start each protein with an empty peptide list
        }elsif($line=~/<pep_seq>(.*)<\/pep_seq>/){# finds pep sequence
                push (@pep, $1);
        }
        $HOA_protein{$pro_name}=[@pep]; # makes a HOA
}

#the above parses the file into a Hash of Arrays!

for my $i (0 .. $#{$HOA_protein{$pro_name}}){
        if ($HOA_protein{$pro_name}[$i] eq $HOA_protein{$pro_name}[$i+1]){
                $HOA_protein{$pro_name}[$i] = undef;
        }
}
#above sorts out the redundancy

However, the if that is supposed to remove the redundancy doesn't work.
Yeah, I tried it and it goes pretty quickly; it's an XML file. Building the HOA was a lot quicker than I thought it would be. :rolleyes:
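The de-duplication loop fails for several reasons:
- It runs once, after the while loop has finished, so $pro_name only holds the last protein read.
- It compares each element only with the next one, so it only catches duplicates that happen to sit next to each other.
- Setting an element to undef leaves a hole instead of shortening the array.
- On the last iteration, $i+1 is past the end of the array.

One way to keep the compare-with-the-neighbour idea is to sort each array first and to apply the check to every key. This is a sketch with hypothetical sample data:

```perl
use strict;
use warnings;

# Hypothetical data standing in for the parsed %HOA_protein.
my %HOA_protein = (
    protA => [qw(LLSAK PEPTIDEK LLSAK GGR PEPTIDEK)],
    protB => [qw(MMK MMK)],
);

# Loop over every protein, not just the last $pro_name from the parse.
for my $name (keys %HOA_protein) {
    # Sorting puts equal peptides next to each other, which an
    # element-versus-neighbour comparison relies on.
    my @sorted = sort @{ $HOA_protein{$name} };
    my @unique;
    for my $pep (@sorted) {
        # Keep a peptide only if it differs from the last one kept; this
        # drops duplicates instead of leaving undef holes in the array.
        push @unique, $pep unless @unique && $unique[-1] eq $pep;
    }
    $HOA_protein{$name} = \@unique;
}
```

This returns each protein's peptides in sorted order rather than file order. If file order matters, an order-preserving approach such as a %seen hash is the better fit.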

bigearsbilly 08-08-2006 08:42 AM

I don't see where the real problem is...

Is there still one?
Some test data would be nice.

PB0711 08-08-2006 04:49 PM

So I got hold of both perldocs:
perldoc -q redundancy
and List::MoreUtils

At the moment I'm using uniq(), a function that List::MoreUtils supplies.
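For comparison, here is a sketch of applying uniq to every array in the hash. It assumes uniq from the core List::Util module, version 1.45 or later, which the use line enforces. List::Util's uniq is called the same way as List::MoreUtils::uniq, so swap the use line if List::MoreUtils is what is installed. The sample data is hypothetical:

```perl
use strict;
use warnings;
# Alternative if List::MoreUtils is installed: use List::MoreUtils qw(uniq);
use List::Util 1.45 qw(uniq);

# Hypothetical data standing in for the parsed %HOA_protein.
my %HOA_protein = (
    protA => [qw(PEPTIDEK LLSAK PEPTIDEK GGR LLSAK)],
    protB => [qw(MMK MMK)],
);

# uniq returns the list with later duplicates removed, keeping first
# occurrences in their original order; store the result as a new array ref.
$HOA_protein{$_} = [ uniq @{ $HOA_protein{$_} } ] for keys %HOA_protein;
```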

