perl duplicates hash array

casperdaghost · 09-24-2012, 10:42 PM

So I have a file, let's call it coocoo:

GRNB is missing a regsho on capser0001
SAN-F is missing a regsho on capser0001
PSA-V is missing a regsho on capser0001
BK-C is missing a regsho on capser0001
GRNB is missing a regsho on capser0040
SAN-F is missing a regsho on capser0040
PSA-V is missing a regsho on capser0040
BK-C is missing a regsho on capser0040

All I want is the first column (but I don't want to use awk).

So I made a script in Perl:

Code:

#!/usr/bin/perl
my $column; 
my $first_column

open REGSHO, "/tmp/coocoo" or die $!;
while (<REGSHO>) {
        chomp $_ ;
        my @column = split /\s/, $_;
        push @first_column , $column[0];


}
foreach my $elem (@first_column )  {
                print "$elem\n"
}

which gives me this

Code:

GRNB
SAN-F
PSA-V
BK-C
GRNB
SAN-F
PSA-V
BK-C

YAY!!!!!!!!!!!!

However, I want to keep this unique, so I put in a hash to get rid of the dupes.

Code:

#!/usr/bin/perl
my %seen;
my $column; 
my $first_column

open REGSHO, "/tmp/coocoo" or die $!;
while (<REGSHO>) {
        chomp $_ ;
        my @column = split /\s/, $_;
        push @first_column , $column[0];


}

foreach my $elem (@first_column )  {
        next if $seen{ $elem }++;
        push @first, $elem;
}

foreach my $first (@first) {
        print $first;
        print "\n" ;
}

which gives me this --- BOOOOOO!!!!

Code:

GRNB
SAN-F
PSA-V
BK-C
GRNB

How do I get rid of the duplicate GRNB?

This is so, so easy in bash.

amboxer21 · 09-24-2012, 11:55 PM

Why don't you want to use Awk?

Code:

awk '{print $1}' coocoo.txt | awk '!a[$0]++'

pan64 · 09-25-2012, 01:16 AM

In Perl you should add -w to the first line and also add use strict, so your script should begin with:

Code:

#!/usr/bin/perl -w
use strict;

....
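As a rough illustration of what use strict buys you, here is a minimal sketch that compiles the same push statement with and without strict and reports what Perl says. compile_error() is a hypothetical helper made up for this demo, not something from the original script.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# compile_error() is a hypothetical helper for this demo only.
# It compiles and runs a snippet via string eval and returns the
# error text, or '' if the snippet compiled and ran cleanly.
sub compile_error {
    my ($code) = @_;
    my $ok = eval "$code; 1";
    return $ok ? '' : $@;
}

# Undeclared array under strict: rejected at compile time.
my $err = compile_error(q{use strict; push @first_column, 'GRNB';});
print "with strict:    $err";

# The same statement without strict is accepted silently.
my $quiet = compile_error(q{no strict; push @first_column, 'GRNB';});
print "without strict: ", ($quiet eq '' ? "no complaint\n" : $quiet);
```

With strict, the error names the undeclared @first_column, which is exactly the kind of mistake that otherwise goes unnoticed.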

You will find a lot of problems, like:
my $first_column declares a scalar that is never used (the script actually uses the array @first_column, which is never declared), and there is no ; at the end of this line (there are other problems too). You introduced a hash, so use it:

Code:

open REGSHO, "/tmp/coocoo" or die $!;
while (<REGSHO>) {
        chomp $_ ;
        my @column = split /\s/, $_;
        $seen{$column[0]} = 1;
}
close REGSHO or die $!;

# set the output field separator ($, is a global special variable, so it cannot be declared with my)
$, = "\n";
print keys %seen;

(not tested)
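One thing to keep in mind: keys %seen comes back in no particular order, so the output may not match the order in the file. If the original order matters, a sketch like the following keeps the first occurrence of each name. The sub name unique_first_fields() and the inline sample lines are my own assumptions for illustration; in the real script the lines would come from the file handle.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# unique_first_fields() is a hypothetical name for this sketch.
# It returns the first whitespace-separated field of each line,
# keeping only the first occurrence and preserving input order.
sub unique_first_fields {
    my (@lines) = @_;
    my (%seen, @unique);
    for my $line (@lines) {
        my ($field) = split ' ', $line;   # ' ' also skips leading whitespace
        next unless defined $field;       # skip blank lines
        push @unique, $field unless $seen{$field}++;
    }
    return @unique;
}

# Sample lines taken from the file shown in the first post.
my @sample = (
    "GRNB is missing a regsho on capser0001\n",
    "SAN-F is missing a regsho on capser0001\n",
    "PSA-V is missing a regsho on capser0001\n",
    "BK-C is missing a regsho on capser0001\n",
    "GRNB is missing a regsho on capser0040\n",
    "SAN-F is missing a regsho on capser0040\n",
);

print "$_\n" for unique_first_fields(@sample);
```

The hash is only used as a "have I seen this?" check, while the array remembers the order.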

grail · 09-25-2012, 06:19 AM

And if we were really going to use awk, definitely no need to call it twice:

Code:

awk '!_[$1]++{print $1}' file

markush · 09-25-2012, 03:14 PM

Hi,

With Perl you don't even need an array:

Code:

#!/usr/bin/perl

use strict ;
use warnings ;

my %seen ;

open REGSHO, "./coocoo" or die $!;
while (<REGSHO>) {
        $seen{ (split /\s/, $_ )[0] } = 1; # list slice: split returns a list here, and [0] picks its first field
}
close REGSHO ;

# set separator
$, = "\n";
print keys %seen;
print "\n" ;

Also, you don't need chomp: chomp removes the linefeed at the end of a line, but you're only dealing with the first column (which of course has no linefeed).
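A small caveat on the split pattern, sketched below: split /\s/ produces an empty first field when a line starts with whitespace, whereas split ' ' (a single space in quotes) behaves like awk and skips leading whitespace. Whether that matters for coocoo is only a guess on my part, since the file itself is not available here. The two sub names are hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helpers for comparison only.

# First field using the regex /\s/ (splits on every single whitespace char).
sub first_field_regex {
    my ($line) = @_;
    my $first = (split /\s/, $line)[0];
    return $first;
}

# First field using the special ' ' pattern (awk-style splitting).
sub first_field_awk {
    my ($line) = @_;
    my $first = (split ' ', $line)[0];
    return $first;
}

for my $line ("GRNB is missing\n", "  GRNB is missing\n") {
    printf "/\\s/ -> [%s]   ' ' -> [%s]\n",
        first_field_regex($line), first_field_awk($line);
}
```

For the indented line, the /\s/ version prints empty brackets while the ' ' version still prints GRNB.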

As pan64 wrote, use strict and warnings!

Markus