Linux - Software: This forum is for Software issues.
I have tried several different methods and cannot figure this out. I have removed duplicate entries from arrays before, but for some reason this one will not remove the duplicates.
I am trying to get the file size of every file in a directory and then remove all duplicate entries, which should leave me with four or five file sizes that I will use later in my script.
Here are parts of the code:
Quote:
@files = `ls $indir`; # get files in the directory into array
When I print the @uniq array I get the same output as when I print the @na array. Any thoughts as to why uniq is not stripping out the duplicate entries? I would expect the output below to be just three numbers.
For all Perl-related questions, I suggest you pay a visit to http://www.perlmonks.org, where you will almost instantly obtain answers to questions like these.
Since I haunt both places, I can say that by far the easiest way to do this is with a hash ... an associative array. Use a statement such as:
$myhash{$filesize} = 1;
The only purpose of this statement is to define a key corresponding to the $filesize ... you don't care at all about the value assigned (which happens to be "1"). What you do know, however, is that every key in a hash is unique. Therefore, after you have processed all your files, you can now iterate through this hash with something like:
foreach my $size (keys (%myhash)) { ... }
The loop will iterate through the list of keys that exist ... each one of these keys corresponds to a file-size that was encountered, and occurs only once. Q.E.D.
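A minimal sketch of that hash-based approach, using the original poster's directory path as a placeholder (the variable names here are illustrative, not from the thread's script):

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Placeholder path from the thread; substitute your own directory.
my $indir = "/home/prouser/concatdir/indir";

my %seen;    # keys will be the distinct file sizes
for my $file (glob "$indir/*") {
    my $size = -s $file;          # file size in bytes
    $seen{$size} = 1 if defined $size;
}

# Each key now corresponds to one distinct size, with duplicates collapsed.
for my $size (sort { $a <=> $b } keys %seen) {
    print "distinct size: $size\n";
}
```

Because hash keys are unique, assigning into %seen once per file automatically discards duplicate sizes; no explicit uniq step is needed.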
Perl is a very rich and expressive language (despite its warts, which everybody knows to tolerate), with an enormous library of tested packages in its so-called CPAN library ... including packages to iterate through file-directories and so on.
You will in time discover why Perl is referred to as "the Swiss Army® Knife of practical programming." Take the time necessary to really get to know this tool in particular.
Last edited by sundialsvcs; 01-23-2013 at 02:10 PM.
OK, if I run this test code it works. Can you post more of your code? Maybe you need to move the uniq out of the foreach loop.
Code:
#!/usr/bin/perl
use strict;
use warnings;
use List::MoreUtils qw( uniq );
my @files = qw(a a a b b m j h g);
foreach my $file (@files){
    print "$file\n";
}
my @uniq_file = uniq @files;
print "@uniq_file\n";
Plugging away at this, but the first run gave me this message after adding my %hash->{$filesize} = 1;
Quote:
Using a hash as a reference is deprecated at /usr/local/bin/./segdcat.pl line 52 (#1)
(D deprecated) You tried to use a hash as a reference, as in %foo->{"bar"} or %$ref->{"hello"}. Versions of perl <= 5.6.1 used to allow this syntax, but shouldn't have. It is now deprecated, and will be removed in a future version.
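For reference, here is the syntax that avoids that deprecation warning. The arrow only belongs on a hash *reference*; a plain hash is accessed with the $ sigil and no arrow (values below are made up for illustration):

```perl
use strict;
use warnings;

my %myhash;              # an ordinary hash
my $filesize = 4096;     # example size in bytes

$myhash{$filesize} = 1;  # correct: plain hash element access
# NOT: %myhash->{$filesize} = 1;   # deprecated hash-as-reference syntax

# The arrow is correct when you actually have a reference:
my $href = \%myhash;
$href->{$filesize} = 1;

print "stored key: $filesize => $myhash{$filesize}\n";
```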
#######################
### Script Settings ###
#######################
use diagnostics;
use strict;
use warnings;
use List::MoreUtils qw(uniq);
use Data::Dumper qw(Dumper);
########################
### Variables ###
########################
my $filebeg = "cont";
my $filext = ".sgd";
my ($indir, $filename, $startf, $endf, $start, $end, $files);
my ($zeros, $outdir, $outdisk, $filesize, $filesizechk, $filenum);
my (@files, @na, @uniq);
my $size1 = "size1.txt";
my $size2 = "size2.txt";
my $oddballs = "oddballs.txt";
####################
### Main Script ###
####################
### Get Folder name that the SGD files are in ###
$indir = "/home/prouser/concatdir/indir";
### Get the Folder name that you want to output Concats to ###
$outdir = "/home/prouser/concatdir/outdir";
### Get the starting file number without zeros, cont and .sgd ###
#print "Enter Starting File # (ex: cont000001.sgd): ";
#chomp ($filename = <stdin>);
$filename = "cont000001.sgd";
### Check the size of the first file in the range ###
$filesizechk = -s "$indir/$filename";
### Put all the files in the range into an array to work with ###
@files = `ls $indir`;
foreach $files (@files)
{
    chomp $files; ### Chomp off the trailing newline ###
    if ($files)
    {
        ### Get the filesize of each file in the range; the directory prefix is needed because `ls` returns bare names ###
        $filesize = -s "$indir/$files";
        if ($filesize == $filesizechk) ### Check if the filesize matches first filesize in range ###
        {
            ### Print file names to a file for use in ProMAX ###
            #print "size1 ::: $files\n";
            #`ls $files >> $outdir/$size1`;
        }
        elsif ($filesize != $filesizechk)
        {
            ### Print file names to a file for use in ProMAX ###
            #print "size2 ::: $files\n";
            #`ls $files >> $outdir/$size2`;
        }
        else
        {
            ### Print oddball sized files to oddball file ###
            ### Note: this branch is unreachable, since == and != above cover every case ###
            #print "oddballs ::: $files\n";
            #`ls $files >> $outdir/$oddballs`;
        }
    }
}
#####################
### End of Script ###
#####################
@fl0 - I did try that in a separate script already and it worked, so I have been thinking the same as you: it may need to be moved outside of the loop or something.
I need to sort a 1 TB drive by file size and then output the file names into text files, 9000 at a time. The program we use will only accept 9000 files at one time.
Usually there are only about three different file sizes. If I can get the file sizes correctly output to a variable, I can then format my if statements to sort the files, 9000 at a time, into the output text files.
The current disk I am working on has 32000 files in it, with 4 different file sizes.
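The 9000-at-a-time splitting described above can be sketched with splice, which removes a batch from the front of the array each pass. This is an illustration only; the stand-in list and batch handling are assumptions, and in the real script each batch would be printed to its own output text file:

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $chunk_size = 9000;          # the downstream program accepts at most 9000 files
my @names = (1 .. 32000);       # stand-in for the 32000 file names on the disk

my @batches;
while (@names) {
    # splice removes (up to) the first 9000 entries and returns them
    push @batches, [ splice @names, 0, $chunk_size ];
}

printf "%d batches; last batch has %d entries\n",
    scalar @batches, scalar @{ $batches[-1] };
# prints: 4 batches; last batch has 5000 entries
```

In the real script, each @{ $batches[$i] } would be written out with one name per line, e.g. to filelist_1.txt, filelist_2.txt, and so on.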
Why do you need to uniq the array? Are there files with the same size in the directory?
Here is a quick version of the first part of your script. It is not tested, but it is much simpler and should do what you want (hopefully).
Code:
#!/usr/bin/perl
use strict;
use warnings;
use List::MoreUtils qw( uniq );
use Data::Dumper;
my $input_dir = '/home/prouser/concatdir/indir';
# get all entries in the directory
my @files = glob "$input_dir/*.sgd";
my %file_sizes;
foreach my $file ( @files ){
    # add the filename as key and the size as value
    $file_sizes{ $file } = -s $file if not -z $file;
}
print Dumper(\%file_sizes);
With this code you can sort / uniq the hash by its values, and iterate over the hash in chunks of 9000.
EDIT: OK, I removed the first_file_check; it was not working.
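One way to sketch that sort/uniq step: invert the filename-to-size hash into a size-to-filenames hash, so the keys give the distinct sizes and the values give the files in each size group. The sample filenames and sizes below are made up for illustration:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Stand-in for the hash built above: filename => size in bytes.
my %file_sizes = (
    'cont000001.sgd' => 1024,
    'cont000002.sgd' => 2048,
    'cont000003.sgd' => 1024,
);

# Invert into size => list of files with that size.
my %by_size;
push @{ $by_size{ $file_sizes{$_} } }, $_ for keys %file_sizes;

# The keys of %by_size are now the distinct sizes, deduplicated for free.
for my $size (sort { $a <=> $b } keys %by_size) {
    my @files = sort @{ $by_size{$size} };
    print "$size: @files\n";
}
# prints:
# 1024: cont000001.sgd cont000003.sgd
# 2048: cont000002.sgd
```

Each size group's file list can then be walked in chunks of 9000 as described earlier in the thread.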