LinuxQuestions.org
Linux - Software This forum is for Software issues.
Old 01-23-2013, 01:09 PM   #1
d072330
Member
 
Registered: Nov 2007
Location: USA
Distribution: CentOS 5/6
Posts: 186

Rep: Reputation: 6
Perl Array to remove duplicates


I have tried several different methods and cannot figure this out. I have removed duplicate entries from arrays before, but for some reason this time the duplicates are not being removed.

I am trying to get the file size of every file in a directory and then remove all duplicate entries, which should leave me with four or five file sizes that I will use later in my script.

Here are parts of the code:

Quote:
@files = `ls $indir`; # get files in the directory into an array

foreach $files (@files)
{
    chomp $files;
    if ($files)
    {
        $filesize = -s $files;

        @na = $filesize;
        @uniq = uniq @na;
        print "@uniq\n";
When I print the @uniq array I get the same output as I do by just printing the @na array. Any thoughts as to why the uniq is not stripping out the duplicate entries? I would expect the output below to just be three numbers.

Quote:
Output:
4807088
4807088
4807088
4807088
4807088
4807088
4807088
4807088
4807088
57683648
57683648
57683648
57683648
57683648
57683648
57683648
20
 
Old 01-23-2013, 02:06 PM   #2
sundialsvcs
LQ Guru
 
Registered: Feb 2004
Location: SE Tennessee, USA
Distribution: Gentoo, LFS
Posts: 10,659
Blog Entries: 4

Rep: Reputation: 3941
For all Perl-related questions, I suggest you pay a visit to http://www.perlmonks.org, where you will almost instantly get answers to questions like these.

Since I haunt both places, I can say that by far the easiest way to do this is with a hash ... an associative array. Use a statement such as:
%myhash->{$filesize} = 1;

The only purpose of this statement is to define a key corresponding to the $filesize ... you don't care at all about the value assigned (which happens to be "1"). What you do know, however, is that every key in a hash is unique. Therefore, after you have processed all your files, you can now iterate through this hash with something like:
foreach my $size (keys (%myhash)) { ... }

The loop will iterate through the list of keys that exist ... each one of these keys corresponds to a file-size that was encountered, and occurs only once. Q.E.D.

Perl is a very rich and expressive language (despite its warts, which everybody knows to tolerate), with an enormous library of tested packages in its so-called CPAN library ... including packages to iterate through file-directories and so on.

You will in time discover why Perl is referred to as "the Swiss Army® Knife of practical programming." Take the time necessary to really get to know this tool in particular.

Last edited by sundialsvcs; 01-23-2013 at 02:10 PM.
 
Old 01-23-2013, 02:24 PM   #3
fl0
Member
 
Registered: May 2010
Location: Germany
Distribution: Slackware
Posts: 105

Rep: Reputation: 34
Hi,

uniq is not a built-in Perl function. Where do you get it from? From List::MoreUtils?

Hint: try
Code:
perldoc -q duplicate
regards fl0
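For reference, the FAQ entry that perldoc -q duplicate points at (perlfaq4) boils down to the classic %seen idiom. A minimal sketch, using sample sizes borrowed from the output quoted earlier in this thread:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# perlfaq4's classic de-dup: %seen counts occurrences, and grep keeps
# only the first appearance of each value.
my @sizes = (4807088, 4807088, 57683648, 57683648, 20);
my %seen;
my @uniq = grep { !$seen{$_}++ } @sizes;

print "@uniq\n";   # 4807088 57683648 20
```

This needs no extra module and preserves first-seen order.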

Last edited by fl0; 01-23-2013 at 02:28 PM.
 
1 member found this post helpful.
Old 01-23-2013, 02:34 PM   #4
d072330
Member
 
Registered: Nov 2007
Location: USA
Distribution: CentOS 5/6
Posts: 186

Original Poster
Rep: Reputation: 6
@sundialsvcs - thanks for the reply. I have not had the need to use keys yet, so bear with me if I have more questions. Please and thank you!

@fl0 - Yes I am using the following:

Quote:
use List::MoreUtils qw(uniq);
use Data::Dumper qw(Dumper);
 
Old 01-23-2013, 02:41 PM   #5
fl0
Member
 
Registered: May 2010
Location: Germany
Distribution: Slackware
Posts: 105

Rep: Reputation: 34
OK, when I run this test code it works. Can you post more of your code? Maybe you need to move the uniq out of the foreach loop.

Code:
 
#!/usr/bin/perl

use strict;
use warnings;
use List::MoreUtils qw( uniq );

my @files = qw(a a a b b m j h g);

foreach my $file (@files){

    print "$file\n";

}

my @uniq_file = uniq @files;

print "@uniq_file\n";

Last edited by fl0; 01-23-2013 at 02:52 PM.
 
1 member found this post helpful.
Old 01-23-2013, 02:54 PM   #6
d072330
Member
 
Registered: Nov 2007
Location: USA
Distribution: CentOS 5/6
Posts: 186

Original Poster
Rep: Reputation: 6
Plugging away at this, but the first run gave me this message after adding my %hash->{$filesize} = 1;

Quote:
Using a hash as a reference is deprecated at /usr/local/bin/./segdcat.pl line 52 (#1)
    (D deprecated) You tried to use a hash as a reference, as in
    %foo->{"bar"} or %$ref->{"hello"}. Versions of perl <= 5.6.1
    used to allow this syntax, but shouldn't have. It is now deprecated,
    and will be removed in a future version.
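The warning is about the arrow: %hash->{...} treats the hash as a reference. A single element of a plain hash is written with a $ sigil and no arrow. A minimal sketch with sample sizes:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# %sizes_seen is a plain hash; one element of it is written
# $sizes_seen{...} with no arrow. The %hash->{...} form is what
# triggers the deprecation warning above.
my %sizes_seen;
for my $filesize (4807088, 4807088, 57683648, 20) {
    $sizes_seen{$filesize} = 1;
}

my @unique_sizes = sort { $a <=> $b } keys %sizes_seen;
print "@unique_sizes\n";   # 20 4807088 57683648
```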
 
Old 01-23-2013, 02:56 PM   #7
d072330
Member
 
Registered: Nov 2007
Location: USA
Distribution: CentOS 5/6
Posts: 186

Original Poster
Rep: Reputation: 6
Code:
#!/usr/bin/perl

#######################
### Script Settings ###
#######################
use diagnostics;
use strict;
use warnings;
use List::MoreUtils qw(uniq);
use Data::Dumper qw(Dumper);

########################
### Variables ###
########################
my $filebeg = "cont";
my $filext = ".sgd";
my ($indir, $filename, $startf, $endf, $start, $end, $files);
my ($zeros, $outdir, $outdisk, $filesize, $filesizechk, $filenum);
my (@files, @na, @uniq);
my $size1 = "size1.txt";
my $size2 = "size2.txt";
my $oddballs = "oddballs.txt";

####################
### Main Script ###
####################

### Get Folder name that the SGD files are in ###
$indir = "/home/prouser/concatdir/indir";

### Get the Folder name that you want to output Concats to ###
$outdir = "/home/prouser/concatdir/outdir";

### Get the starting file number without zeros, cont and .sgd ###
#print "Enter Starting File # (ex: cont000001.sgd): ";
#chomp ($filename = <stdin>);
$filename = "cont000001.sgd";

### Check the size of the first file in the range ###
$filesizechk = -s "$indir/$filename";

### Put all the files in the range into an array to work with ###
@files = `ls $indir`;

foreach $files (@files)
{
    chomp $files; ### Chomp off the carriage return ###
    if ($files)
    {
        $filesize = -s $files; ### Get the filesize of each file in the range ###

        if ($filesize == $filesizechk) ### Check if the filesize matches first filesize in range ###
        {
            ### Print file names to a file for use in ProMAX ###
            #print "size1 ::: $files\n";
            #`ls $files >> $outdir/$size1`;
        }
        elsif ($filesize != $filesizechk)
        {
            ### Print file names to a file for use in ProMAX ###
            #print "size2 ::: $files\n";
            #`ls $files >> $outdir/$size2`;
        }
        else
        {
            ### Print oddball sized files to oddball file ###
            #print "oddballs ::: $files\n";
            #`ls $files >> $outdir/$oddballs`;
        }
    }
}
#####################
### End of Script ###
#####################

Last edited by d072330; 01-23-2013 at 03:01 PM.
 
Old 01-23-2013, 02:56 PM   #8
fl0
Member
 
Registered: May 2010
Location: Germany
Distribution: Slackware
Posts: 105

Rep: Reputation: 34
Can you please post the complete code?

EDIT: OK, I was too slow.

Last edited by fl0; 01-23-2013 at 02:58 PM.
 
Old 01-23-2013, 02:57 PM   #9
d072330
Member
 
Registered: Nov 2007
Location: USA
Distribution: CentOS 5/6
Posts: 186

Original Poster
Rep: Reputation: 6
@fl0 - I did try that in a separate script already and it worked, so I have been thinking the same as you: it may need to be moved outside of the loop or something.
 
Old 01-23-2013, 03:10 PM   #10
fl0
Member
 
Registered: May 2010
Location: Germany
Distribution: Slackware
Posts: 105

Rep: Reputation: 34
OK, so is your problem solved?
 
Old 01-23-2013, 03:16 PM   #11
d072330
Member
 
Registered: Nov 2007
Location: USA
Distribution: CentOS 5/6
Posts: 186

Original Poster
Rep: Reputation: 6
nope!
 
Old 01-23-2013, 03:21 PM   #12
fl0
Member
 
Registered: May 2010
Location: Germany
Distribution: Slackware
Posts: 105

Rep: Reputation: 34
Can you post the current script? I cannot find the uniq section in the script you posted. Also, can you describe what exactly you want to do?

Last edited by fl0; 01-23-2013 at 03:26 PM.
 
Old 01-23-2013, 03:36 PM   #13
d072330
Member
 
Registered: Nov 2007
Location: USA
Distribution: CentOS 5/6
Posts: 186

Original Poster
Rep: Reputation: 6
add this under $filesize = -s $files;

Quote:
@na = $filesize;
@uniq = uniq @na;
print "@uniq\n";
I had taken it out because it reproduced the same results as print $filesize.
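Not spelled out in the thread, but the likely culprit in that snippet: assigning a scalar to an array replaces the array's entire contents with that one element on every pass of the loop, so uniq never has more than one value to look at. A sketch of the difference, using the same variable names:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# "@na = $filesize;" inside the loop REPLACES the whole array with a
# single element each iteration, so uniq has no duplicates to remove.
# push accumulates across iterations instead:
my @sizes = (4807088, 4807088, 57683648);
my @na;
for my $filesize (@sizes) {
    push @na, $filesize;    # was: @na = $filesize;
}

# de-dup once, after the loop (uniq from List::MoreUtils does the same)
my %seen;
my @uniq = grep { !$seen{$_}++ } @na;
print "@uniq\n";   # 4807088 57683648
```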
 
Old 01-23-2013, 03:40 PM   #14
d072330
Member
 
Registered: Nov 2007
Location: USA
Distribution: CentOS 5/6
Posts: 186

Original Poster
Rep: Reputation: 6
I need to sort a 1 TB drive by file size and then output the file names into text files, 9000 at a time. The program we use will only accept 9000 files at one time.

Usually there are only about three different file sizes. If I can get the file sizes correctly output to a variable, I can then format my if statements to sort 9000 files at a time into the output text files.

The current disk I am working on has 32,000 files on it and 4 different file sizes.

Clear as mud? LOL
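The 9000-at-a-time step described above can be sketched with splice, which pulls at most a fixed number of names off the front of the array per pass. This is an editor's illustration only; the file names and the "filelist_N.txt" output paths are invented, not from the thread:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical batching sketch: each pass takes up to $batch_size names
# and writes them to their own list file.
my $batch_size = 9000;
my @files      = map { sprintf "cont%06d.sgd", $_ } 1 .. 20000;

my $batch = 0;
while ( my @chunk = splice @files, 0, $batch_size ) {
    $batch++;
    open my $fh, '>', "filelist_$batch.txt" or die "open: $!";
    print {$fh} map { "$_\n" } @chunk;
    close $fh or die "close: $!";
}
print "$batch\n";   # 3 list files for 20000 names
```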
 
Old 01-23-2013, 03:41 PM   #15
fl0
Member
 
Registered: May 2010
Location: Germany
Distribution: Slackware
Posts: 105

Rep: Reputation: 34
OK, to understand what you are doing:

Why do you need to uniq the array? Are there files with the same size in the directory?


Here is a quick version for the first part of your script. Not tested, but much simpler, and it should do what you want (hopefully).



Code:
#!/usr/bin/perl

use strict;
use warnings;

use List::MoreUtils qw( uniq );
use Data::Dumper;

my $input_dir = '/home/prouser/concatdir/indir';

# get all entries in the directory
my @files = glob "$input_dir/*.sgd";

my %file_sizes;

foreach my $file ( @files ){

    # add the filename as key and the size as value
    $file_sizes{ $file } = -s $file if not -z $file;

}

print Dumper( \%file_sizes );
With this code you can sort/uniq the hash by its values, and then iterate over the hash in batches of 9000.
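One way to get the distinct sizes out of such a name => size hash (a sketch continuing the idea above; the sample data is invented, not from the thread):

```perl
#!/usr/bin/perl
use strict;
use warnings;
use List::MoreUtils qw(uniq);

# %file_sizes maps file name => size; uniq over the values gives the
# distinct sizes, and a numeric sort makes the order deterministic.
my %file_sizes = (
    'cont000001.sgd' => 4807088,
    'cont000002.sgd' => 4807088,
    'cont000003.sgd' => 57683648,
);

my @distinct_sizes = sort { $a <=> $b } uniq values %file_sizes;
print "@distinct_sizes\n";   # 4807088 57683648
```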

EDIT: OK, I removed the first_file_check; it was not working.

Last edited by fl0; 01-23-2013 at 04:02 PM.
 
1 member found this post helpful.
  

