LinuxQuestions.org (https://www.linuxquestions.org/questions/)
-   Programming (https://www.linuxquestions.org/questions/programming-9/)
-   -   Perl: check if data exists in the array before adding new data (https://www.linuxquestions.org/questions/programming-9/perl-check-if-data-are-exist-in-the-array-before-adding-new-data-636059/)

ufmale 04-17-2008 03:20 PM

Perl: check if data exists in the array before adding new data
 
I am working on a Perl script to store data in an array.
This array should not have any duplicate entries.
Is there another data structure in Perl I should use, or is there a way to quickly check whether an entry already exists in the array before adding new data?

krizzz 04-17-2008 04:07 PM

If you are too lazy to write a simple loop, you could use grep.
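
For example, assuming an array @vals holds the entries and $x is the candidate value (illustrative names), a grep-based check might look like this:

Code:

# Add $x only if no existing element matches it exactly
# (use == instead of eq if the entries are numbers)
push(@vals, $x) unless grep { $_ eq $x } @vals;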

indienick 04-17-2008 04:38 PM

Code:

#!/usr/bin/perl -w

# Assuming an array called '@vals' has been created
# Assuming '$x' has been assigned a value

my $addp = 1;                  # assume $x is not yet in @vals

foreach my $i (@vals) {
    $addp = 0 if $i == $x;     # found a duplicate, so don't add it
                               # (use 'eq' instead of '==' for string values)
}

# To add it to the right of the array
push(@vals, $x) if $addp;

# Or to add it to the left of the array
# unshift(@vals, $x) if $addp;

Please be warned, I have NOT written in Perl for a very long time (I generally write in Common Lisp and Java). I couldn't remember Perl's boolean operators (after years of Lisp's T or NIL), and I believe unshift() is the complement to push().

osor 04-18-2008 05:24 PM

Quote:

Originally Posted by ufmale (Post 3124444)
Is there another data structure in Perl I should use

A good choice of data structure here will save you some extra work later on. Usually, when you are looking for a data structure with uniqueness in its entries, you use a hash in Perl (with the data in question as the hash key). This makes it very easy to check whether the data is already in the hash:
Code:

if($hash{$data})
But in most cases you don't even have to check, since assigning to the same key will overwrite the previous entry (leaving you with only one entry).
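
For instance, assuming the incoming values are in @input (an illustrative name), the whole deduplication can be a single assignment loop:

Code:

my %seen;
$seen{$_} = 1 for @input;    # repeated values just overwrite the same key
my @unique = keys %seen;     # unique entries, but in no particular order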

Unfortunately, it will also destroy any ordering implicit in using an array. There are workarounds for this. For example, you can make each hash value an ordinal: the index the entry would have had in an array holding the same data minus the duplicates. This is where checking before assigning (or not) determines the final order of a repeated entry: if you check, the order corresponds to the initial position of any repeated data in your input; if you don't check, it corresponds to the final position. Sequential access is pretty slow, though, since you have to sort the hash keys by value whenever you need to access the 'nth' one.

So it basically comes down to this: which is more important in your situation—order preservation or uniqueness of entries? If uniqueness of entries, use hashes.

bigearsbilly 04-19-2008 08:21 AM

You should use a hash if you don't want duplicates.

matthewg42 04-19-2008 08:48 AM

osor's point is a good one. Unless I know for certain that a data set will be very small, I use hashes to test for membership of a set. You should be using strict for most programs (to improve reliability). In this case, osor's code fragment should be changed to use the defined() function, e.g.:

Code:

#!/usr/bin/perl

use strict;
use warnings;

my %set_of_stuff;

foreach my $key (qw(you can add a bunch of strings as keys to the has like this)) {
    $set_of_stuff{$key} = 1;
}

# Now test to see if some words are in the set_of_stuff
foreach my $key (qw(i can has cheezeburger)) {
    if (defined($set_of_stuff{$key})) {  print "FOUND:    $key\n"; }
    else { print "NOT FOUND: $key\n"; }
}

The problem with this approach is that you lose the order in which the keys were added to the set, and you do not store duplicates. You can set the value of the hash for a given key to the number of occurrences by incrementing it each time you add that key... this way you store the number of duplicates, but you still lose the order.
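
A sketch of that counting variant (assuming the input words are in @words, an illustrative name):

Code:

my %count;
$count{$_}++ for @words;                       # the value becomes the number of occurrences
print "$_ => $count{$_}\n" for keys %count;    # duplicates are counted, but order is still lost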

osor 04-19-2008 02:59 PM

Quote:

Originally Posted by matthewg42 (Post 3126126)
You should be using strict for most programs (to improve reliability). In this case, osor's code fragment should be changed to use the defined() function.

As long as a nonzero value in the hash indicates membership, a simple truth test is sufficient, regardless of strictures (but defined and exists are surely useful as well).
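
The three tests only differ when a stored value is false or undef; a small illustration with hypothetical keys:

Code:

my %h = (zero => 0, undef_val => undef);
print "true\n"    if $h{zero};               # never prints: 0 is false
print "defined\n" if defined $h{zero};       # prints: 0 is defined
print "exists\n"  if exists $h{undef_val};   # prints: the key exists even though its value is undef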
Quote:

Originally Posted by matthewg42 (Post 3126126)
The problem with this approach is that you lose the ordering of the keys which you added to the set, and you do not store duplicates.

As I mentioned before, if you couple a hash with a scalar to hold the index, you have a data structure from which you can recover ordering information later on.

For example,
Code:

#!/usr/bin/perl -lw

use strict;

my @data = (1, {});    # $data[0] is the next ordinal, $data[1] maps entry => ordinal

$data[1]{$_} = $data[0]++ for qw(you can add a bunch of strings as keys to the has like this);
$data[1]{$_} = $data[0]++ for qw(i can has cheezburger);

# Recover the insertion order by sorting the keys on their ordinals
print join " ",
    sort { $data[1]{$a} <=> $data[1]{$b} }
    keys %{ $data[1] };

In the above code, I don’t do any checks for assignment, so a repeated value will have the ordinal of its final appearance (e.g., in this case “can” appears after “i” rather than after “you”). If you want a repeated value to have the ordinal of its initial appearance, just throw in a simple check in the assignment (e.g., unless($data[1]{$_})).
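
That checked variant might look like this:

Code:

# A repeated word keeps the ordinal of its first appearance
for (qw(i can has cheezburger)) {
    $data[1]{$_} = $data[0]++ unless $data[1]{$_};
}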

osor 04-19-2008 03:09 PM

Yet another way to preserve order is by using a temporary hash at the end to filter repeats:
Code:

#!/usr/bin/perl -lw

use strict;

my @data;

push @data, qw(you can add a bunch of strings as keys to the has like this);
push @data, qw(i can has cheezburger);

print join " ", do{my %hash; grep !$hash{$_}++, @data};


xinelo 11-28-2008 04:36 AM

only one loop?
 
Thanks guys,

I could use some of your suggestions, but I was after something simpler.

I have a text file with short lines. I chomp and push every line of the file into an array with a while(<F>) loop. The ideal thing would be to check whether each new line already exists in the array while I'm still in the while loop.

Something like:


Code:

my (@text);
open F, "file.txt" or die $!;
while (<F>) {
        chomp;
        push(@text, $_) unless $_ IsSetIn @text;
}
close F;

The 'IsSetIn' part is not Perl, of course; that's the part I don't know how to do :D

But judging from your comments, I guess there's no option but to push the text into the array and then open a new loop to compare.
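
(For reference, the missing check can be written inline with grep, along the lines of the sketch below; note that grep rescans @text for every line, so the hash-based answers that follow scale better for large files.)

Code:

my @text;
open my $fh, '<', 'file.txt' or die $!;    # 'file.txt' stands in for your input file
while (my $line = <$fh>) {
    chomp $line;
    push @text, $line unless grep { $_ eq $line } @text;   # skip lines already present
}
close $fh;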

Thanks a lot!
xinelo

xinelo 11-28-2008 04:37 AM

Sorry, I forgot to subscribe to the post ;)

Cheers!

Telemachos 11-28-2008 07:13 AM

A hash is the easiest way to do this, and it doesn't have to be very complex. Here's my input textfile:
Code:

this is a line
this is a line
this is a line
another line
something else
another line
foobar
something else
this is a line

As you can see, lots of repeats in no particular order. Here's my Perl script:
Code:

#!/usr/bin/perl
use strict;
use warnings;

my %lines;

open my $fh, '<', 'file'
  or die "Can't open file for reading: $!\n";

while (<$fh>) {
  chomp;
  $lines{$_} = 1 unless exists $lines{$_};
}

foreach (keys %lines) {
  print "$_\n";
}

And here's the output:
Code:

telemachus ~ $ perl hash
something else
another line
this is a line
foobar

As folks said: unique lines only, in no particular order. However, it doesn't sound like order matters much to your problem. So, it's as simple as that.

If, however, you want it slightly more complex, this version stores, as the hash value, the line number where the key was first seen. Then it uses that value later to sort the keys for output. But it's essentially the same script.
Code:

#!/usr/bin/perl
use strict;
use warnings;

my %lines;

open my $fh, '<', 'file'
  or die "Can't open file for reading: $!\n";

while (<$fh>) {
  chomp;
  $lines{$_} = $. unless exists $lines{$_};
}

close $fh;

foreach ( sort { $lines{$a} <=> $lines{$b} } keys %lines) {
  print "$_\n";
}

Sorted output:
Code:

telemachus ~ $ perl hash2
this is a line
another line
something else
foobar


Sergei Steshenko 11-28-2008 07:35 AM

Quote:

Originally Posted by ufmale (Post 3124444)
I am working on a Perl script to store data in an array.
This array should not have any duplicate entries.
Is there another data structure in Perl I should use, or is there a way to quickly check whether an entry already exists in the array before adding new data?

Code:

my @entries;    # keeps the lines in input order
my %entries;    # used as a 'seen' set to skip duplicate lines
...
while(defined(my $line = <$fh>))
  {
  chomp($line);
  unless(exists $entries{$line})
    {
    push @entries, $line;
    $entries{$line} = '';
    }
  } # while(defined(my $line = <$fh>))

Beware that the moment you write to $some_hash{$key} you create the key, so be careful when using hashes.

The above code is clean WRT hashes, i.e. keys are created as needed, not by accident.
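
One common way keys appear "by accident" is autovivification in a nested lookup; a small illustration:

Code:

my %h;
if ($h{foo}{bar}) { }                # merely reading a nested element...
print exists $h{foo}                 # ...autovivifies $h{foo} as a hash reference,
    ? "foo was created\n"            # so this prints "foo was created"
    : "foo was not created\n";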

changma_ha 07-14-2010 06:26 AM

newbie
 
Hello all,
I am a newbie in Perl.
I have a text file like:

a
2
b
4,5,6
c
d
e
45,657,-67

I want an output like:
a => [2]
b => [4,5,6]
c => []
d => []
e => [45,657,-67]

Is it possible to turn the text data into a complex hash-of-arrays structure? Please help me out.
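
A minimal sketch of one way to build that structure, assuming a non-numeric line starts a new key and any numeric line holds comma-separated values for the most recent key:

Code:

#!/usr/bin/perl
use strict;
use warnings;

my %data;
my $key;

while (my $line = <DATA>) {
    chomp $line;
    if ($line =~ /^-?\d/) {                        # a numeric line belongs to the last key
        push @{ $data{$key} }, split /,/, $line;
    } else {                                       # anything else starts a new (empty) key
        $key = $line;
        $data{$key} = [];
    }
}

print "$_ => [", join(",", @{ $data{$_} }), "]\n" for sort keys %data;

__DATA__
a
2
b
4,5,6
c
d
e
45,657,-67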

