Old 09-04-2018, 12:11 PM   #1
Entropy1024
Member
 
Registered: Dec 2012
Location: UK
Distribution: Ubuntu 16 & 17
Posts: 131

Rep: Reputation: Disabled
For loop from a certain point


I'm performing a test on some files in a directory. The test runs every minute, and there are lots and lots of files in the directory, with many more appearing as time goes on.

My script, VERY simplified, looks like this:

Code:
for f in $FILES
do
    #Do stuff
done
Problem is that as this folder grows, ALL the files in the directory are processed each time, which is very wasteful of time and resources.
What I would like to do is get it to start from where it last left off.

How can I go about this goal?

Thanks for any help.

Last edited by Entropy1024; 09-04-2018 at 12:18 PM.
 
Old 09-04-2018, 12:24 PM   #2
MensaWater
LQ Guru
 
Registered: May 2005
Location: Atlanta Georgia USA
Distribution: Redhat (RHEL), CentOS, Fedora, CoreOS, Debian, FreeBSD, HP-UX, Solaris, SCO
Posts: 7,831
Blog Entries: 15

Rep: Reputation: 1669
Is there some progression in the file names (e.g. number or date increment)?

e.g.
file1
file2
file3
...
file2900000

or:
file201809040701
file201809040721
file201809040741
...
file201903312301

If so, you could base it on that naming progression. Otherwise you could use "find" with the file creation or access time: save the time of one run into a file and have the next run read that saved time as its starting point.
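
For the time-based route, a minimal sketch (assuming GNU find's -newermt; the path and stamp file name are placeholders):
Code:
STAMP=last_run.txt   # hypothetical file holding the time of the previous run

if [ -f "$STAMP" ]; then
    # only files modified since the saved time
    NEW_FILES=$(find /path/to/dir -maxdepth 1 -name '*.jpg' -newermt "$(cat "$STAMP")")
else
    NEW_FILES=$(find /path/to/dir -maxdepth 1 -name '*.jpg')
fi

# record "now" before processing so files arriving mid-run are not lost
date '+%Y-%m-%d %H:%M:%S' > "$STAMP"

for f in $NEW_FILES   # assumes no spaces in names, like the loop above
do
    # do stuff
    :
done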
 
Old 09-04-2018, 12:42 PM   #3
Entropy1024
Member
 
Registered: Dec 2012
Location: UK
Distribution: Ubuntu 16 & 17
Posts: 131

Original Poster
Rep: Reputation: Disabled
Quote:
Originally Posted by MensaWater View Post
Is there some progression in the file names (e.g. number or date increment)? [...]
Yes, the files all have the date and time in their names, like this:
DomeCCTV_216576543_20180904183710574_MOTION_DETECTION.jpg

I know I can find the last image using:
LASTIMAGE=$(ls | tail -1)
but I don't know how to pass this to the loop so it starts from that file.
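
For what it's worth, since the embedded timestamp makes plain string comparison sort these names, something like this sketch could work (lastimage.txt is a hypothetical state file):
Code:
# name saved by the previous run; empty on the first run
LASTIMAGE=$(cat lastimage.txt 2>/dev/null)

for f in *.jpg
do
    # skip anything at or before the saved name; the embedded
    # timestamp means lexicographic order is chronological order
    [[ "$f" > "$LASTIMAGE" ]] || continue
    # do stuff
    LASTIMAGE=$f
done

# remember the newest processed name for the next run
echo "$LASTIMAGE" > lastimage.txt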
 
Old 09-04-2018, 12:47 PM   #4
rtmistler
Moderator
 
Registered: Mar 2011
Location: USA
Distribution: MINT Debian, Angstrom, SUSE, Ubuntu, Debian
Posts: 9,882
Blog Entries: 13

Rep: Reputation: 4930
Moving this to the Programming forum to give it better exposure.

Perhaps use the file date or last-modified timestamp.

But it depends on how the FILES list is constructed.

Memory of when the script was last run could be as simple as a specific file, visible or hidden, which you touch each time to record the last time and day you processed the directory.
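
For example (the marker path is hypothetical; post #8 below develops the same idea):
Code:
MARK=~/.cctv_lastrun                                    # hypothetical marker file
find /path/to/dir -name '*.jpg' -newer "$MARK" -print   # files since the last run
touch "$MARK"                                           # move the marker to "now"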
 
Old 09-04-2018, 03:20 PM   #5
Beryllos
Member
 
Registered: Apr 2013
Location: Massachusetts
Distribution: Debian
Posts: 529

Rep: Reputation: 319
I have done something like that in bash, by storing a list of already-processed files and using diff to find the new files.

Example:
Code:
# list all files in directory which match the pattern
ls *.jpg > allfiles.txt

# create oldfiles.txt if it does not already exist
touch oldfiles.txt

# find files in allfiles.txt which are not in oldfiles.txt, create newfiles.txt
diff allfiles.txt oldfiles.txt | grep "<" | cut -d " " -f 2 > newfiles.txt

# process list
for item in $(cat newfiles.txt)
do
    # do stuff
    
    # append filename to oldfiles.txt
    echo $item >> oldfiles.txt
done

# sort oldfiles.txt (necessary if files are created out of ls order)
sort oldfiles.txt > temp.txt
mv temp.txt oldfiles.txt
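
As an aside, the diff | grep | cut pipeline can also be written with comm(1), which expects sorted input; a sketch using the same file names:
Code:
touch oldfiles.txt
ls *.jpg | sort > allfiles.txt
sort -o oldfiles.txt oldfiles.txt
# lines only in allfiles.txt, i.e. not yet processed
comm -23 allfiles.txt oldfiles.txt > newfiles.txt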
 
Old 09-04-2018, 03:45 PM   #6
Beryllos
Member
 
Registered: Apr 2013
Location: Massachusetts
Distribution: Debian
Posts: 529

Rep: Reputation: 319
Since you are running the script every minute, you may wonder what happens if the run time is more than one minute. This could happen if there are many new files since the last script execution. You would end up with two or more simultaneously running scripts, which would duplicate the processing and leave duplicate entries in the list of old files. There is even a small chance it could garble the file lists. Not good!

The way I avoid this is by running a test at the beginning of the script:

Example:
Code:
# check and exit if task is already running

# extract script name (the bit after the last slash character)
thiscom=$(echo "$0" | rev | cut -d'/' -f 1 | rev)

# count processes associated with that name (plus one for the final newline, apparently)
if [ $(ps -C "$thiscom" -o comm= | wc -l) -gt 2 ]
then
    # exit due to duplicate process
    exit
fi

# here start the file listing and processing...
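
An alternative guard, if flock(1) from util-linux is available (the lock file path is an assumption, adjust to taste):
Code:
# open a file descriptor on a lock file and try to take the lock
exec 200>/tmp/cctv-script.lock
flock -n 200 || exit 0   # another instance holds it: quit quietly

# ... file listing and processing; the lock is released when the script exits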

Last edited by Beryllos; 09-04-2018 at 03:47 PM.
 
Old 09-04-2018, 04:20 PM   #7
Entropy1024
Member
 
Registered: Dec 2012
Location: UK
Distribution: Ubuntu 16 & 17
Posts: 131

Original Poster
Rep: Reputation: Disabled
Originally I had the FILES list created by using:
FILES=/home/tim/Videos/domecctv/$DATENOW/*.jpg

Now I'm using:
FILES=$(ls /home/tim/Videos/domecctv/$DATENOW/*.jpg | tail -40)

Therefore, instead of running through the entire list of possibly 100+ images, it just reads the last 40 every minute.
Not perfect, but more efficient.
 
Old 09-04-2018, 05:12 PM   #8
lougavulin
Member
 
Registered: Jul 2018
Distribution: Slackware,x86_64,current
Posts: 279

Rep: Reputation: 100
Well, there is also something like this:
Code:
for f in $(find /home/tim/Videos/domecctv/ -newer ${HOME}/tmp/videos.ref -print)
do
  #Do stuff
done
touch ${HOME}/tmp/videos.ref
It doesn't have to be ${HOME}/tmp/videos.ref; choose whatever place and filename are convenient.

The first run will pick up every file, so you should probably touch the reference file before running it for the first time:
Code:
touch -t <timestamp 2 minutes before> ${HOME}/tmp/videos.ref
It is another way to process only new files, like Beryllos's diff approach in post #5.
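
One caveat: files that appear while the loop is running fall between the find and the final touch and would be skipped forever. A variant that stamps before scanning avoids that (the .next suffix is just an illustration, and the reference file is assumed to exist already, per the touch above):
Code:
REF=${HOME}/tmp/videos.ref

# stamp "now" before scanning, so files created during the loop
# are picked up on the next run instead of being missed
touch "${REF}.next"

for f in $(find /home/tim/Videos/domecctv/ -type f -newer "$REF" -print)
do
    # do stuff
    :
done

mv "${REF}.next" "$REF"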
 
Old 09-05-2018, 09:41 PM   #9
MadeInGermany
Senior Member
 
Registered: Dec 2011
Location: Simplicity
Posts: 2,790

Rep: Reputation: 1201
A shorter and safer version of post #5 (it allows special characters in file names):
Code:
donefile=oldfiles.txt
for item in *.jpg
do
    # skip non-files and files already done
    if [ ! -f "$item" ] || fgrep -qx "$item" "$donefile"
    then
        echo "skipping $item"
        continue
    fi
    # do stuff
    
    # append filename to done files
    echo "$item" >> $donefile
done
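
One small wrinkle: on the very first run fgrep complains that the done file does not exist (the file is still processed correctly). Touching it up front keeps the output clean:
Code:
donefile=oldfiles.txt
touch "$donefile"   # ensure fgrep has a file to search on the first run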
 
1 member found this post helpful.
Old 09-06-2018, 10:36 AM   #10
Beryllos
Member
 
Registered: Apr 2013
Location: Massachusetts
Distribution: Debian
Posts: 529

Rep: Reputation: 319
Looks good. Thanks for the several improvements.
 
Old 09-08-2018, 08:46 PM   #11
Field95
LQ Newbie
 
Registered: Sep 2018
Location: xmpp:zemri@dismail.de
Posts: 13

Rep: Reputation: Disabled
Thought I'd take a go at this with python.
It monitors the directories you specify by checking each directory's modification date, then checking whether each file has already been processed in that directory.
If something has changed, it'll run the command on that file. So something like
Code:
./monitor_directory.py "echo -e .\t" foobar
./monitor_directory.py script.sh foobar
with script.sh containing:
Code:
echo -e ".\t" "$1"
would do this:
Code:
.    foobar/file1
.    foobar/file2
.    ...
You can run it in daemon mode with -d or --daemon, where it simply checks every second whether something has happened.

Code:
usage: monitor_directory.py [-h] [-d] [--temp-file TEMP_FILE] [--trim-cache]
                            [--include INCLUDE] [--exclude EXCLUDE]
                            command [directories [directories ...]]

Monitors directory for changes and runs command on new files

positional arguments:
  command               Run command or script on each file: ./script
                        file_foobar
  directories

optional arguments:
  -h, --help            show this help message and exit
  -d, --daemon
  --temp-file TEMP_FILE
                        Location of cache file, default /tmp
  --trim-cache          Remove irrelevant directories from the cache
  --include INCLUDE     glob file matching, can be invoked multiple times
  --exclude EXCLUDE     glob file matching, can be invoked multiple times
monitor_directory.py
Code:
#!/usr/bin/env python3

import argparse
import pathlib
import json
import os
import subprocess
import tempfile
import time


def main():
    parser = argparse.ArgumentParser(description="Monitors directory for changes and runs command on new files")
    parser.add_argument("-d", "--daemon",
                        action="store_true",
                        )
    parser.add_argument("--temp-file",
                        default=os.path.join(tempfile.gettempdir(), "monitor.tmp"),
                        help="Location of cache file, default {}".format(tempfile.gettempdir())
                        )
    parser.add_argument("--trim-cache",
                        action="store_true",
                        help="Remove irrelevant directories from the cache")
    parser.add_argument("--include",
                        action="append",
                        help="glob file matching, can be invoked multiple times",
                        )
    parser.add_argument("--exclude",
                        action="append",
                        help="glob file matching, can be invoked multiple times",
                        )
    parser.add_argument("command",
                        nargs=1,
                        help="Run command or script on each file: ./script file_foobar",
                        )
    parser.add_argument("directories",
                        default=[os.getcwd()],
                        nargs='*',
                        )
    args = parser.parse_args()

    # If is file, treat it as a script. If not, run as a command
    # Get scripts absolute path
    if os.path.isfile(os.path.abspath(args.command[0])):
        command = os.path.abspath(args.command[0])
    else:
        command = args.command[0]

    temp_file = args.temp_file
    args.directories = [os.path.abspath(directory) for directory in args.directories]

    # Manage loading of cache_directories.
    # Used to know when things have been processed or not.
    # If it doesn't exist, create one
    if os.path.isfile(temp_file):
        with open(temp_file) as f:
            cached_directories = json.load(f)

            # --trim-cache option
            # If the directory isn't specified and is in the cache, remove it
            if args.trim_cache is True:
                non_existing_directories = list()
                for directory in cached_directories:
                    if directory not in args.directories:
                        non_existing_directories.append(directory)
                for directory in non_existing_directories:
                    cached_directories.pop(directory)
    else:
        cached_directories = dict()

    # Decide to run as daemon or script.
    # Having updated values will trigger a write to the temporary file
    if args.daemon is True:
        while True:
            try:
                result_cached_directories = process_files_command(command,
                                                                  cached_directories,
                                                                  args.directories,
                                                                  include=args.include,
                                                                  exclude=args.exclude)
                if result_cached_directories is True:
                    write_json_file(cached_directories, temp_file)
                time.sleep(1)
            except KeyboardInterrupt:
                write_json_file(cached_directories, temp_file)
                break
    else:
        result_cached_directories = process_files_command(command,
                                                          cached_directories,
                                                          args.directories,
                                                          include=args.include,
                                                          exclude=args.exclude)
        if result_cached_directories is True:
            write_json_file(cached_directories, temp_file)


def process_files_command(command, cache_dictionary, directories, include=None, exclude=None):
    def run_command(directory, file, command):
        subprocess.run(command.split(" ") + [os.path.join(directory, file)])
        cache_dictionary[directory][1].add(file)

    # Track whether anything changed so the caller knows to rewrite the cache
    updated = False
    for directory in cache_dictionary:
        # Convert list of processed files to a set
        cache_dictionary[directory][1] = set(cache_dictionary[directory][1])
    for directory in directories:
        if directory in cache_dictionary:
            cached_directory_time = cache_dictionary[directory][0]
            current_directory_time = os.stat(directory).st_mtime
            if current_directory_time > cached_directory_time:
                if include or exclude:
                    directory_files = file_include_exclude(directory=directory, include=include, exclude=exclude)
                else:
                    directory_files = (file
                                       for file in os.listdir(directory)
                                       if os.path.isfile(os.path.join(directory, file)))
                # Check to see if the file is in the cache.
                # Ignore if so.
                for file in directory_files:
                    if file not in cache_dictionary[directory][1]:
                        # run_command also records the file in the cache set
                        run_command(directory, file, command)
                cache_dictionary[directory][0] = current_directory_time
                # A change occurred, so the cache file should be written
                updated = True
        else:
            cache_dictionary[directory] = [os.stat(directory).st_mtime, set()]
            if include or exclude:
                directory_files = file_include_exclude(directory=directory, include=include, exclude=exclude)
            else:
                directory_files = (file
                                   for file in os.listdir(directory)
                                   if os.path.isfile(os.path.join(directory, file)))
            for file in directory_files:
                run_command(directory, file, command)
            updated = True
    # Return only after every directory has been examined; True tells the
    # caller that the cache changed and should be rewritten
    return updated


def write_json_file(dictionary, temp_file):
    cached_directories = dictionary
    with open(temp_file, "w") as temp_file_write_object:
        for directory in cached_directories:
            # Convert the set of processed files back to a list so JSON can encode it
            cached_directories[directory][1] = list(cached_directories[directory][1])
        json.dump(cached_directories, temp_file_write_object, indent=4, sort_keys=True)


def file_include_exclude(*, directory, include, exclude):
    files = os.listdir(directory)
    if include:
        included_filenames = {file for glob_match in include
                              for file in files
                              if pathlib.PurePath(file).match(glob_match)}
    else:
        included_filenames = set()
    if exclude:
        excluded_filenames = {file for glob_match in exclude
                              for file in files
                              if pathlib.PurePath(file).match(glob_match)}
    else:
        excluded_filenames = set()

    for file in files:
        if file in included_filenames:
            yield file
        elif exclude and file not in excluded_filenames:
            # exclude mode: pass everything the exclude globs did not match
            yield file


if __name__ == '__main__':
    main()
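
If polling every second ever becomes a concern, an event-driven watcher is possible with inotifywait from inotify-tools; a shell sketch (the path and per-file command are assumptions):
Code:
# emit one line per file created in (or moved into) the directory
inotifywait -m -e create -e moved_to --format '%w%f' /path/to/dir |
while read -r f
do
    case $f in
        *.jpg) ./script.sh "$f" ;;   # hypothetical per-file command
    esac
done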
I'll paste the source link once I can post links without being flagged.
 
  

