LinuxQuestions.org
Old 04-17-2008, 02:20 PM   #1
ufmale
Perl: check if data exists in the array before adding new data


I am working on a Perl script that stores data in an array.
This array should not have any duplicate entries.
Is there another data structure in Perl I should use, or is there a way to quickly check the array before adding new data that may already exist?
 
Old 04-17-2008, 03:07 PM   #2
krizzz
If you are too lazy to write a simple loop, you could use grep.
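A minimal sketch of the grep approach, assuming the data are strings compared with eq. The names @vals and $x and the sample values are made up for illustration:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Illustrative data; @vals and $x are assumed names.
my @vals = qw(apple banana cherry);

# In boolean context grep returns the number of matches,
# so push only when no existing element equals $x.
for my $x (qw(banana date apple date)) {
    push @vals, $x unless grep { $_ eq $x } @vals;
}

print join(" ", @vals), "\n";   # apple banana cherry date
```

Note that grep scans the whole array on every insert, so this is only reasonable for small arrays.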
 
Old 04-17-2008, 03:38 PM   #3
indienick
Code:
#!/usr/bin/perl -w

# Assuming an array called '@vals' has been created
# Assuming '$x' has been assigned a value

my $addp = 1;  # 1 = add $x; cleared to 0 if $x is already present

foreach my $i (@vals) {
  # '==' compares numbers; use 'eq' instead if the data are strings
  if ($i == $x) { $addp = 0; last; }
}

# To add it to the right of the array
push(@vals, $x) if ($addp == 1);

# Or, instead of the push above, to add it to the left of the array
# unshift(@vals, $x) if ($addp == 1);
Please be warned, I have NOT written in Perl for a very long time (I generally write in Common Lisp and Java). I couldn't remember Perl's boolean operators (after years of Lisp's T or NIL), and I believe unshift() is the complement to push().

Last edited by indienick; 04-17-2008 at 03:39 PM.
 
Old 04-18-2008, 04:24 PM   #4
osor
Quote:
Originally Posted by ufmale View Post
Is there another data structure in Perl I should use
A good choice of data structure here will save you some extra work later on. Usually, when you need a data structure whose entries are unique, you use a hash in Perl (with the data in question as the hash key). This makes it very easy to check whether the data is already in the hash:
Code:
if($hash{$data})
But in most cases you don't even have to check, since assigning to the same key overwrites the previous entry (leaving you with only one entry).

Unfortunately, it will also destroy any ordering implicit in using an array. There are workarounds for this. For example, you can make each hash value an ordinal: the index the data would have had in an array holding the same input, duplicates included. Whether or not you check before assigning then determines the final order of a repeated entry. If you check, the order corresponds to the initial position of any repeated data in your input; if you don't, it corresponds to the final position. Sequential access is pretty slow, since you have to sort the hash keys by value whenever you need to access the nth one.

So it basically comes down to this: which is more important in your situation, order preservation or uniqueness of entries? If uniqueness of entries, use hashes.
 
Old 04-19-2008, 07:21 AM   #5
bigearsbilly
You should use a hash if you don't want duplicates.
 
Old 04-19-2008, 07:48 AM   #6
matthewg42
osor's point is a good one. Unless I know for certain that a data set will be very small, I use hashes to test for membership of a set. You should be using strict for most programs (to improve reliability). In this case, osor's code fragment should be changed to use the defined() function. e.g.

Code:
#!/usr/bin/perl

use strict;
use warnings;

my %set_of_stuff;

foreach my $key (qw(you can add a bunch of strings as keys to the has like this)) {
    $set_of_stuff{$key} = 1;
}

# Now test to see if some words are in the set_of_stuff
foreach my $key (qw(i can has cheezeburger)) {
    if (defined($set_of_stuff{$key})) {  print "FOUND:     $key\n"; }
    else { print "NOT FOUND: $key\n"; }
}
The problem with this approach is that you lose the ordering of the keys which you added to the set, and you do not store duplicates. You can set the value of the hash for a given key to the number of occurrences by incrementing it each time you add that key. This records the number of duplicates, but you still lose the order.
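The counting idea might look something like this. It is only a sketch: the hash name %count and the word list are made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my %count;   # key => number of times the key was added (assumed name)

# ++ on a missing key starts from undef, which is treated as 0,
# so no explicit initialisation is needed.
$count{$_}++ for qw(i can has cheezeburger can has can);

# Keys are sorted alphabetically here only to get stable output;
# the original insertion order is still lost.
for my $key (sort keys %count) {
    print "$key: $count{$key}\n";
}
```

With this input, "can" is counted three times, "has" twice, and the other words once.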
 
Old 04-19-2008, 01:59 PM   #7
osor
Quote:
Originally Posted by matthewg42 View Post
You should be using strict for most programs (to improve reliability). In this case, osor's code fragment should be changed to use the defined() function.
As long as a nonzero value in the hash indicates membership, a simple truth test is sufficient, regardless of strictures (but defined and exists are surely useful as well).
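To illustrate where the three tests differ, here is a small sketch. The hash %set and its keys are made up for illustration:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# One key per interesting case: a true value, two false-but-defined
# values, and an explicitly undefined value.
my %set = (one => 1, zero => 0, empty => '', undefined => undef);

for my $key (qw(one zero empty undefined missing)) {
    printf "%-9s true=%d defined=%d exists=%d\n",
        $key,
        ($set{$key}          ? 1 : 0),
        (defined($set{$key}) ? 1 : 0),
        (exists($set{$key})  ? 1 : 0);
}
```

Only "one" passes the truth test; "zero" and "empty" are defined but false; "undefined" exists without being defined; "missing" fails all three. If every stored value is nonzero, the three tests agree for membership purposes.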
Quote:
Originally Posted by matthewg42 View Post
The problem with this approach is that you lose the ordering of the keys which you added to the set, and you do not store duplicates.
As I mentioned before, if you couple a hash with a scalar to hold the index, you have a data structure from which you can recover ordering information later on.

For example,
Code:
#!/usr/bin/perl -lw

use strict;

my @data = (1, {});

$data[1]{$_} = $data[0]++ for qw(you can add a bunch of strings as keys to the has like this);
$data[1]{$_} = $data[0]++ for qw(i can has cheezburger);

print join " ",
sort {$data[1]{$a}<=>$data[1]{$b}}
keys %{$data[1]};
In the above code, I don’t do any checks for assignment, so a repeated value will have the ordinal of its final appearance (e.g., in this case “can” appears after “i” rather than after “you”). If you want a repeated value to have the ordinal of its initial appearance, just throw in a simple check in the assignment (e.g., unless($data[1]{$_})).
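A sketch of the variant with the check, so that a repeated value keeps the ordinal of its first appearance. It uses a plain hash and counter instead of the @data pair above; the names %first_seen and $next are made up for illustration:

```perl
#!/usr/bin/perl
use strict;
use warnings;

my %first_seen;   # word => ordinal of its first appearance (assumed name)
my $next = 1;     # next ordinal to hand out (assumed name)

for my $word (qw(you can add a bunch of strings), qw(i can has cheezburger)) {
    # Only assign an ordinal the first time a word is seen.
    $first_seen{$word} = $next++ unless exists $first_seen{$word};
}

# Recover the original order by sorting the keys on their ordinals.
my @ordered = sort { $first_seen{$a} <=> $first_seen{$b} } keys %first_seen;
print join(" ", @ordered), "\n";
# you can add a bunch of strings i has cheezburger
```

Here "can" stays after "you", because its second appearance does not overwrite the first ordinal.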
 
Old 04-19-2008, 02:09 PM   #8
osor
Yet another way to preserve order is by using a temporary hash at the end to filter repeats:
Code:
#!/usr/bin/perl -lw

use strict;

my @data;

push @data, qw(you can add a bunch of strings as keys to the has like this);
push @data, qw(i can has cheezburger);

print join " ", do{my %hash; grep !$hash{$_}++, @data};
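On newer Perl installations the same filtering is available ready-made. This sketch assumes List::Util version 1.45 or later, which provides uniq; the sample data is made up:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use List::Util qw(uniq);   # assumption: List::Util 1.45 or newer

my @data = qw(you can add i can has cheezburger has);

# uniq keeps the first occurrence of each value and preserves order.
my @unique = uniq @data;

print join(" ", @unique), "\n";   # you can add i has cheezburger
```

On older installations without uniq, the grep with a temporary hash shown above does the same job.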

Last edited by osor; 04-19-2008 at 02:25 PM. Reason: use do block
 
Old 11-28-2008, 03:36 AM   #9
xinelo
only one loop?

Thanks guys,

I could use some of your suggestions, but I was after something simpler.

I have a text with short lines. I chomp and push every new line in the text file into an array with a while(<F>). The ideal thing would be to check whether every new line exists already in the array while I'm still in the while loop.

Something like:


Code:
my (@text);
open F, "file.txt" or die $!;
while (<F>) {
	chomp;
	push(@text, $_) unless $_ IsSetIn @text; 
}
close F;
The "IsSetIn" part is not Perl, of course; that's the part I don't know how to do.

But judging from your comments I guess there's no option but to push the text in the array and then open a new loop to compare.

Thanks a lot!
xinelo
 
Old 11-28-2008, 03:37 AM   #10
xinelo
Sorry, I forgot to subscribe to the post

Cheers!
 
Old 11-28-2008, 06:13 AM   #11
Telemachos
A hash is the easiest way to do this, and it doesn't have to be very complex. Here's my input textfile:
Code:
this is a line
this is a line
this is a line
another line
something else
another line
foobar
something else
this is a line
As you can see, lots of repeats in no particular order. Here's my Perl script:
Code:
#!/usr/bin/perl
use strict;
use warnings;

my %lines;

open my $fh, '<', 'file'
  or die "Can't open file for reading: $!\n";

while (<$fh>) {
  chomp;
  $lines{$_} = 1 unless exists $lines{$_};
}

foreach (keys %lines) {
  print "$_\n";
}
And here's the output:
Code:
telemachus ~ $ perl hash 
something else
another line
this is a line
foobar
As folks said: unique lines only, in no particular order. However, it doesn't sound like order matters much to your problem. So, it's as simple as that.

If, however, you want it slightly more complex, this version stores, as the value for each hash key, the line number at which that key was first seen. Then it uses that value later to sort the hash for output. But it's essentially the same script.
Code:
#!/usr/bin/perl
use strict;
use warnings;

my %lines;

open my $fh, '<', 'file'
  or die "Can't open file for reading: $!\n";

while (<$fh>) {
  chomp;
  $lines{$_} = $. unless exists $lines{$_};
}

close $fh;

foreach ( sort { $lines{$a} <=> $lines{$b} } keys %lines) {
  print "$_\n";
}
Sorted output:
Code:
telemachus ~ $ perl hash2
this is a line
another line
something else
foobar

Last edited by Telemachos; 11-28-2008 at 06:21 AM.
 
Old 11-28-2008, 06:35 AM   #12
Sergei Steshenko
Quote:
Originally Posted by ufmale View Post
I am working on a Perl script that stores data in an array.
This array should not have any duplicate entries.
Is there another data structure in Perl I should use, or is there a way to quickly check the array before adding new data that may already exist?
Code:
my @entries;
my %entries;
...
while(defined(my $line = <$fh>))
  {
  chomp($line);
  unless(exists $entries{$line})
    {
    push @entries, $line;
    $entries{$line} = '';
    }
  } # while(defined(my $line = <$fh>))
Beware that the moment you assign to $some_hash{$key}, or reach through it to a nested element such as $some_hash{$key}{$other}, you create $key, so be careful using hashes.

The above code is clean WRT hashes, i.e. keys are created as needed, not by accident.
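A small sketch of when a key does and does not get created. A plain read of a top-level element leaves the hash untouched, while reaching through a missing key to a nested element creates the outer key (autovivification). The hash %h and its key names are made up for illustration:

```perl
#!/usr/bin/perl
use strict;
use warnings;

my %h;   # starts empty

# Testing a top-level element does not create the key.
my $found = $h{plain} ? 1 : 0;
print "plain exists: ", (exists($h{plain}) ? "yes" : "no"), "\n";   # no

# Looking at a nested element autovivifies the outer key,
# even though nothing was assigned.
my $nested = exists($h{outer}{inner}) ? 1 : 0;
print "outer exists: ", (exists($h{outer}) ? "yes" : "no"), "\n";   # yes
```

This is why the loop above tests with exists on the top-level key before assigning.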
 
Old 07-14-2010, 05:26 AM   #13
changma_ha
newbie

Hello all,
I am a newbie in Perl.
I have a text file like:

a
2
b
4,5,6
c
d
e
45,657,-67

I want an output like:
a => [2]
b => [4,5,6]
c => []
d => []
e => [45,657,-67]

Is it possible to turn the text data into a complex structure such as a hash of arrays? Please help me out as soon as possible.
 
  


All times are GMT -5.