LinuxQuestions.org
Old 07-08-2017, 09:59 AM   #46
scasey
LQ Veteran
 
Registered: Feb 2013
Location: Tucson, AZ, USA
Distribution: CentOS 7.9.2009
Posts: 5,727

Rep: Reputation: 2211

That
Code:
$this_file =~ /^(.*?)-(\d+?)-(.*?)\.ext$/
is from Laserbeak, and it's being used to test if the file in $this_file matches the pattern required to process it. You have (I think) correctly modified the regex to match the actual pattern [1 hyphen, mp4 extension].

If I understand what Laserbeak is doing (and I could be wrong about this), s/he is populating a hash with the values of the existing numbers in the file names, using the number as the key, and setting the value to 1.

Then sort the hash by the key values. Then iterate over the length of the hash and output the numbers that don't exist in the hash. I haven't tried it yet, but it should work. I admit to having some difficulty understanding/using hash processing. I'm going to play with this script for my own edification. I think there's also something for me to learn about capturing data from a regexp match. Fun!
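As a rough illustration of the hash idea described above, here is a minimal Python sketch (Python because a later post in this thread uses it). It is not Laserbeak's script. The file names, the pattern for "one hyphen, mp4 extension", and the function name are assumptions made up for the example.

```python
import re

# Hypothetical file names for illustration: one hyphen, mp4 extension.
FILES = ["Name-1.mp4", "Name-2.mp4", "Name-4.mp4", "notes.txt"]

# One capture group: the digits between the last hyphen and ".mp4".
PATTERN = re.compile(r"^.*-(\d+)\.mp4$")


def missing_numbers(names, max_value):
    """Return the numbers in 1..max_value that no file name carries."""
    seen = {}
    for name in names:
        match = PATTERN.match(name)
        if match:  # skip names that do not fit the pattern
            # The captured number is the key; the value is just 1.
            seen[int(match.group(1))] = 1
    # Walk the full range and keep whatever is absent from the dict.
    return [n for n in range(1, max_value + 1) if n not in seen]


print(missing_numbers(FILES, 5))  # prints [3, 5]
```

The dict plays the role of the Perl hash: membership is a single lookup per number, so no sorting is needed to find the gaps.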

My script captures the list of existing numbers in an array [@existnums], and captures the list of all numbers from 1 to entered value (270) in another array [@allnums]. Then iterate over the allnums array and set the entries that match existnums to 0...then print the numbers that are not 0. This is an adaptation of an application I wrote to draw cards from a tarot deck...as each card was drawn, it was saved into "drawn" array, then before the next card was drawn the drawn cards are removed from the "all cards" array -- so the app wouldn't draw the same card twice. My customer said the application gave her as random a draw as using the actual cards.
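The array approach in the paragraph above can be sketched the same way. This is only an approximation of the idea in Python, not the Perl script itself; the function name and the sample numbers are invented for the example.

```python
def missing_by_marking(exist_nums, max_value):
    """Zero out every existing number in the full list, return the rest."""
    # Index 0 is unused padding so that all_nums[n] == n for n >= 1.
    all_nums = list(range(max_value + 1))
    for nbr in exist_nums:
        if 1 <= nbr <= max_value:  # ignore numbers outside the range
            all_nums[nbr] = 0
    # No sorting needed: the list was built in sequence.
    return [n for n in all_nums if n != 0]


print(missing_by_marking([1, 2, 4], 5))  # prints [3, 5]
```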

I'm glad you have what you need now. I hope we've demonstrated the value of learning regexp. Those same patterns could be applied using sed in a bash script, but I'm even fuzzier about bash arrays and hashes.
 
1 members found this post helpful.
Old 07-08-2017, 10:05 AM   #47
Laserbeak
Member
 
Registered: Jan 2017
Location: Manhattan, NYC NY
Distribution: Mac OS X, iOS, Solaris
Posts: 508

Rep: Reputation: 143
Quote:
Originally Posted by scasey View Post
If I understand what Laserbeak is doing (and I could be wrong about this), s/he is populating a hash with the values of the existing numbers in the file names, using the number as the key, and setting the value to 1.

Then sort the hash by the key values. Then iterate over the length of the hash and output the numbers that don't exist in the hash. I haven't tried it yet, but it should work. I admit to having some difficulty understanding/using hash processing. I'm going to play with this script for my own edification. I think there's also something for me to learn about capturing data from a regexp match. Fun!
I'm a "he", and you hit the nail on the head as far as the algorithm goes. In fact, that algorithm is used all the time in real-world Perl applications; I can't imagine how many times I've used something like it. You would be wise to learn it.

Last edited by Laserbeak; 07-08-2017 at 10:06 AM.
 
Old 07-08-2017, 10:23 AM   #48
scasey
LQ Veteran
 
Registered: Feb 2013
Location: Tucson, AZ, USA
Distribution: CentOS 7.9.2009
Posts: 5,727

Rep: Reputation: 2211
Quote:
Originally Posted by Laserbeak View Post
I'm a "he", and you hit the nail on the head as far as the algorithm goes. In fact, that algorithm is used all the time in real-world Perl applications; I can't imagine how many times I've used something like it. You would be wise to learn it.
Just went over it. Very elegant. I'll add it to my library for sure. Thank you.
 
Old 07-08-2017, 10:30 AM   #49
BW-userx
LQ Guru
 
Registered: Sep 2013
Location: Somewhere in my head.
Distribution: Slackware (15 current), Slack15, Ubuntu studio, MX Linux, FreeBSD 13.1, WIn10
Posts: 10,342

Original Poster
Rep: Reputation: 2242
@Laserbeak @scasey

Well, with this exposure you two have given me to Perl, I am now reading up on it. I am up to this page as of right now:
https://www.tutorialspoint.com/perl/perl_hashes.htm

It is not much different than bash per se; just the syntax for declaring vars and populating arrays is different, and there are some funny things one can do with it. I don't know if bash can do the same with arrays - I only use arrays sparingly.

as soon as I get a little more into how to's I think I might rewrite one of my bash scripts in perl to see what I can do with it.


When I get done reading I'll go back over these two scripts to absorb their contents better,
and that algorithm.

though I still have not found an explanation of this
"my this" and "my that" when declaring or calling variables, as in
Code:
my @sortedarray
which I find strange
but
big thanks for the help!

Last edited by BW-userx; 07-08-2017 at 10:39 AM.
 
1 members found this post helpful.
Old 07-08-2017, 11:15 AM   #50
Laserbeak
Member
 
Registered: Jan 2017
Location: Manhattan, NYC NY
Distribution: Mac OS X, iOS, Solaris
Posts: 508

Rep: Reputation: 143
Quote:
Originally Posted by BW-userx View Post
though I still have not found an explanation of this
"my this" and "my that" when declaring or calling variables, as in
Code:
my @sortedarray
which I find strange
but
big thanks for the help!
It is basically declaring the variable in that scope. Without "my" all variables become global variables, and there can be conflicts, especially in large projects and those that use a lot of libraries.

When you put in "use strict;" Perl demands it or another way to specify variable scopes. That's really recommended for all code, but especially code that could be used for production.

Last edited by Laserbeak; 07-08-2017 at 11:16 AM.
 
Old 07-08-2017, 11:19 AM   #51
BW-userx
LQ Guru
 
Registered: Sep 2013
Location: Somewhere in my head.
Distribution: Slackware (15 current), Slack15, Ubuntu studio, MX Linux, FreeBSD 13.1, WIn10
Posts: 10,342

Original Poster
Rep: Reputation: 2242
Quote:
Originally Posted by Laserbeak View Post
It is basically declaring the variable in that scope. Without "my" all variables become global variables, and there can be conflicts, especially in large projects and those that use a lot of libraries.

When you put in "use strict;" Perl demands it or another way to specify variable scopes. That's really recommended for all code, but especially code that could be used for production.
So it keeps vars local. I got to this page on subroutines, which talks about sub and my when declaring functions.
https://www.tutorialspoint.com/perl/...ubroutines.htm

and looking at yours and the others script and watching Dr. Who -- multi tasking
 
Old 07-08-2017, 11:47 AM   #52
BW-userx
LQ Guru
 
Registered: Sep 2013
Location: Somewhere in my head.
Distribution: Slackware (15 current), Slack15, Ubuntu studio, MX Linux, FreeBSD 13.1, WIn10
Posts: 10,342

Original Poster
Rep: Reputation: 2242
Quote:
Originally Posted by scasey View Post
Mayhaps, here's what I just ran...you'll need to change the $working_dir value again.
Code:
#!/usr/bin/perl
## ^^ set to location of your perl

$working_dir="/run/media/userx/250GB/NumberedFiles";
##~ $working_dir=".";   # testing

if ($ARGV[0]) {
	$max = $ARGV[0];   ## get max number from the command line
	$max++;
}
else {
	print "usage is $0 maxvalue\n";
	exit 1;   ## nothing to do without a max value
}

## get list of files name in array  Names are in format of FileName-nnn-xxxxxxx.ext
@files=`find "$working_dir" -type f`;

## remove leading and trailing parts
foreach $file (@files) {
	$file =~ s/^.*?-//;    #remove from beginning to first hyphen
	$file =~ s/-.*$//;	#remove from second hyphen to end
	$existnums[$file]=$file;  #save what's left in array
}

## populate array with all the numbers: 1-input value 
for ($i = 1; $i < $max; $i++) {
    $allnums[$i] = $i;
}

## remove existing numbers from full list
foreach $nbr (@existnums) {
    $allnums[$nbr] = 0
}

## print out the remaining (i.e. missing) numbers
## note, no sorting required because the allnums array is populated in sequence
foreach $nbr  (@allnums) {
	if ($allnums[$nbr] ne 0) {      ## only print the entries that are not -0-
##~ 		print "$allnums[$nbr] ";   ## or 
		print "$allnums[$nbr]\n";  ## to do one per line
	}
}

print "\n";  ## when printing all on one line.
OK, that wasn't working. I fixed it by changing the first stripping of the string:

Code:
$file =~ s/.*-//; #removes everything up and including last hyphen
because the original only removed up to the first hyphen, and there is a hyphen within the path too, so part of the path was still attached to the filename.
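To see the difference between the two substitutions, here is a small Python sketch. The path is an assumption: a made-up file name under a directory that contains hyphens, like the one described. `re.sub` with `count=1` stands in for Perl's `s///`.

```python
import re

# Hypothetical file under a directory whose name contains hyphens.
path = "/run/media/userx/3TB-External/Files-Resampled/Name-162.mp4"

# Non-greedy: stops at the FIRST hyphen, which sits inside the path.
first = re.sub(r"^.*?-", "", path, count=1)

# Greedy: runs on to the LAST hyphen, leaving only the number and extension.
last = re.sub(r".*-", "", path, count=1)

print(first)  # External/Files-Resampled/Name-162.mp4
print(last)   # 162.mp4
```

The greedy form only works when the number follows the last hyphen in the whole string, which holds for the "one hyphen, mp4 extension" names discussed here.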

Now both yours and Laserbeak's work.
Yours:
Code:
userx%slackwhere ⚡ scripts ⚡> ./perl-find-missing-numbers.pl 270
162
170
172
173
174
175
181
186
195
196
197
198
245
Laserbeak
Code:
userx%slackwhere ⚡ scripts ⚡> ./perl-number-list
162 is missing!
170 is missing!
172 is missing!
173 is missing!
174 is missing!
175 is missing!
181 is missing!
186 is missing!
195 is missing!
196 is missing!
197 is missing!
198 is missing!
245 is missing!
they both match! woo hoo!

Last edited by BW-userx; 07-08-2017 at 11:49 AM.
 
Old 07-08-2017, 12:20 PM   #53
BW-userx
LQ Guru
 
Registered: Sep 2013
Location: Somewhere in my head.
Distribution: Slackware (15 current), Slack15, Ubuntu studio, MX Linux, FreeBSD 13.1, WIn10
Posts: 10,342

Original Poster
Rep: Reputation: 2242
Quote:
Originally Posted by Laserbeak View Post
It is basically declaring the variable in that scope. Without "my" all variables become global variables, and there can be conflicts, especially in large projects and those that use a lot of libraries.

When you put in "use strict;" Perl demands it or another way to specify variable scopes. That's really recommended for all code, but especially code that could be used for production.

Code:
#!/usr/bin/perl

use strict;
use warnings;

my $working_dir="/run/media/userx/3TB-External/Files-Resampled";

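my @files;   # declared once, outside the loop, so the names accumulate

opendir(DIR, $working_dir) || die "Can't open $working_dir: $!\n";
  # defined() so that a file named "0" cannot end the loop early
  while( defined(my $filename = readdir(DIR)) ){
    push(@files, $filename);
    }
closedir(DIR);

print join("\n", @files), "\n";   # one name per line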
There probably is a better way to populate an array in Perl, but this worked and it is interesting nonetheless. push and pop - linked list?
 
Old 07-08-2017, 09:38 PM   #54
allend
LQ 5k Club
 
Registered: Oct 2003
Location: Melbourne
Distribution: Slackware64-15.0
Posts: 6,371

Rep: Reputation: 2750
As schneidz and danielbmartin have already hinted
Code:
seq 1 270 | sort > numbers.txt; ls -1 /run/media/userx/3TB-External/Files-Resampled/*.mp4 | rev | cut -d "." -f2 | cut -d "-" -f1 | rev | sort | comm -3 numbers.txt -
 
Old 07-08-2017, 10:25 PM   #55
BW-userx
LQ Guru
 
Registered: Sep 2013
Location: Somewhere in my head.
Distribution: Slackware (15 current), Slack15, Ubuntu studio, MX Linux, FreeBSD 13.1, WIn10
Posts: 10,342

Original Poster
Rep: Reputation: 2242
Quote:
Originally Posted by allend View Post
As schneidz and danielbmartin have already hinted
Code:
seq 1 270 | sort > numbers.txt; ls -1 /run/media/userx/3TB-External/Files-Resampled/*.mp4 | rev | cut -d "." -f2 | cut -d "-" -f1 | rev | sort | comm -3 numbers.txt -
I don't think I could fit all of that on my terminal.
 
Old 07-08-2017, 11:47 PM   #56
Laserbeak
Member
 
Registered: Jan 2017
Location: Manhattan, NYC NY
Distribution: Mac OS X, iOS, Solaris
Posts: 508

Rep: Reputation: 143
Quote:
Originally Posted by BW-userx View Post
There probably is a better way to populate an array in Perl, but this worked and it is interesting nonetheless. push and pop - linked list?
The only four ways I know are using push and doing this:

Code:
# TO ADD TO THE BACK

@thearray = (@thearray, $newelement);

# OR TO PUT AT THE FRONT

@thearray = ($newelement, @thearray);

#OR

unshift @thearray,  $newelement; # places $newelement as the first element of the array

I think push and unshift are faster than the equivalent examples that manually create a new array.

Actually there is another way, but it gives a warning, because the loop starts at index 1 and leaves $array[0] undefined when join reads it:

Code:
#!/usr/bin/perl

use strict;
use warnings;

our @array = ();
for (1..10) {
  $array[$_] = $_;
}
print join("\n", @array) , "\n";
Also note that I used "our" instead of "my". That is how to explicitly declare a global (package) variable; put the declaration at the top of each file in a project and they will share that variable.

Last edited by Laserbeak; 07-09-2017 at 12:26 AM. Reason: Better print
 
Old 07-09-2017, 12:33 AM   #57
Laserbeak
Member
 
Registered: Jan 2017
Location: Manhattan, NYC NY
Distribution: Mac OS X, iOS, Solaris
Posts: 508

Rep: Reputation: 143
I meant to edit this into the last post, but it ended up as a new post.
This is a very simple example of how "our" works.


Code:
#!/usr/bin/perl

use strict;
use warnings;

our @array = ();
for (1..10) {
  $array[$_] = $_;
}

require "printest.pl";
exit(0);

#-------------------
#PRINTEST.PL TEXT:

#!/usr/bin/perl

use strict;
use warnings;

our @array;

print join ("\n", @array), "\n";

# OUTPUT

1
2
3
4
5
6
7
8
9
10

Last edited by Laserbeak; 07-09-2017 at 10:38 AM.
 
Old 07-09-2017, 04:48 PM   #58
scasey
LQ Veteran
 
Registered: Feb 2013
Location: Tucson, AZ, USA
Distribution: CentOS 7.9.2009
Posts: 5,727

Rep: Reputation: 2211
There are also excellent tutorials at http://learn.perl.org/
 
Old 07-09-2017, 06:20 PM   #59
BW-userx
LQ Guru
 
Registered: Sep 2013
Location: Somewhere in my head.
Distribution: Slackware (15 current), Slack15, Ubuntu studio, MX Linux, FreeBSD 13.1, WIn10
Posts: 10,342

Original Poster
Rep: Reputation: 2242
Quote:
Originally Posted by scasey View Post
There are also excellent tutorials at http://learn.perl.org/
bookmarked
thanks
 
Old 07-10-2017, 07:28 PM   #60
Sefyir
Member
 
Registered: Mar 2015
Distribution: Linux Mint
Posts: 634

Rep: Reputation: 316
This appears to have been [SOLVED] and the focus has been on Perl, but for good fun I implemented it in python3.
At its heart is the set.difference(set) comparison, which quickly identifies the difference between two sets (the range from 1 to the highest number found by the regex, and the numbers of all files found).
It also auto-detects the maximum range needed. This is not a perfect design: if the topmost file is deleted, it won't be detected. However, it does allow you to check n directories with n files.
I tested it on 4690 directories with 999 files each (max allowed). Some ended up being empty (pushed some kind of limit).
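A stripped-down sketch of the set.difference idea, including the blind spot for a deleted topmost file, might look like the following. It is an illustration only and not the script posted below: the function name and sample data are invented, and the padding width is taken from the names themselves, which is an assumption that differs from the full script.

```python
def missing_padded(found):
    """found: set of zero-padded number strings taken from file names."""
    if not found:
        return []
    top = max(int(s) for s in found)      # auto-detected upper bound
    width = max(len(s) for s in found)    # pad to the width used by the names
    # Build the expected set 1..top with the same zero padding.
    expected = {str(n).zfill(width) for n in range(1, top + 1)}
    return sorted(expected.difference(found))


names = {"001", "002", "004", "005"}
print(missing_padded(names))            # ['003']
print(missing_padded(names - {"005"}))  # ['003'] again: the missing 005 goes unnoticed
```

Because the upper bound comes from the highest number actually present, removing the last file simply shrinks the range. Passing the expected maximum explicitly, as the Perl scripts do, avoids that.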

Code:
for i in {1..4690}; do mkdir "${i}_a"; done
for i in */; do touch "$i"/file-{000..999}-adfg324.ext; done
for i in */; do rm $i/file-$(echo $RANDOM | cut -b1-3)-adfg324.ext; done # Randomly remove a file from each directory
Code:
$ time ./numbersequencer.py number_testing/*/ # total 4690 directories
Missing Files in /home/user/number_testing/1000_a:
239
Missing Files in /home/user/number_testing/1001_a:
223
Missing Files in /home/user/number_testing/1002_a:
882
...
real	0m8.508s
Code:
$ rm 2407_a/file-{497,987,999}-adfg324.ext
$ ./numbersequencer.py number_testing/2407_a/
Missing Files in /home/user/number_testing/2407_a:
497
987
Code:
#!/usr/bin/env python3

from __future__ import print_function, division

import argparse
import os
import re
import sys

class SequencyConsistency():
    def __init__(self, sequence, regex_def_grp=None):
        if regex_def_grp:
            self.regex_compiled, self.regex_group_num = regex_def_grp
        else:
            self.regex_compiled, self.regex_group_num = (re.compile(r'(\d+)'), 1)

        self.sequence = sequence
        self.missing_sequencies = self.__missing_number_sequence()

    def __missing_number_sequence(self):
        self.matches = {str(self.regex_compiled.search(sequence).group(self.regex_group_num))
                        for sequence in self.sequence
                        if self.regex_compiled.search(sequence) is not None}

        if self.matches:
            # Generate set of numbers from 1 to highest regex detected
            try:
                max_range = max(int(match) for match in self.matches)
            except ValueError:
                print('Regex must match a integer.', end='\n\n')
                raise

            full_range = set(str(num).rjust(len(str(max_range)), '0') 
                             for num in range(1, max_range + 1))
            return full_range.difference(self.matches)
        
    def print_missing_sequencies(self, reverse=False):
        if getattr(self, 'missing_sequencies', None):
            return sorted(self.__missing_number_sequence(), reverse=reverse)


class DirectoryConsistency(SequencyConsistency):
    def __init__(self, directory, regex_def_grp=None):
        try:
            self.sequence = os.listdir(directory)
        except (FileNotFoundError, NotADirectoryError, PermissionError) as err:
            # Exit class if directory indicated is incorrect
            print(err, file=sys.stderr)
            return None

        # Inherit from SequenceConsistency
        SequencyConsistency.__init__(self, self.sequence, regex_def_grp)

def main():
    # Create commandline flags
    parser = argparse.ArgumentParser(description='Detect missing portion of sequency in given sequency')
    parser.add_argument('-e', '--regexp',
            type=str,
            required=False,
            default=r'(\d+)',
            help=r'Set regex. Defaults to (\d+)')
    parser.add_argument('-g', '--group',
            type=int,
            required=False,
            default=1,
            help=r'Set regex matching group to identify sequence. --group 3 will match (\d+) in regexp of (\w)(-)(\d+). Each () identifies a group')
    args, other_args = parser.parse_known_args()

    directories = other_args if other_args else sys.stdin
    for directory in directories:
        current_dir = DirectoryConsistency(directory.strip(),
                                           regex_def_grp=(re.compile(args.regexp), args.group))

        missing_in_dir = current_dir.print_missing_sequencies()

        if missing_in_dir:
            print('Missing from {directory}:'.format(directory=repr(os.path.abspath(directory.strip()))))
            print('{sequences}'.format(sequences='\n'.join(seq for seq in missing_in_dir)))

if __name__ == '__main__':
    main()

Last edited by Sefyir; 07-14-2017 at 10:07 PM.
 
  

