LinuxQuestions.org

LinuxQuestions.org (/questions/)
-   Linux - Newbie (https://www.linuxquestions.org/questions/linux-newbie-8/)
-   -   Linux Newbie with issues with find (https://www.linuxquestions.org/questions/linux-newbie-8/linux-newbie-with-issues-with-find-4175519222/)

Elrique64 09-17-2014 07:37 PM

Linux Newbie with issues with find
 
I've got almost 20 years working with computers. It's all been with Windows/DOS machines, though. Now I find myself with a Linux-flavored NAS and having issues getting things working the way I want.

Some things I have had issues with, I've found answers for here. (cronjobs to keep a drive enclosure alive, scripting to check an application.) Other things I've had some problems finding information on.

Specifically, I'm downloading files from a local server. Some of these files are pdf's, others are text, docs, etc. They are all placed in /download. Sometimes in that folder, sometimes in folders under it. (/download/testing/, various other folders, each named uniquely.)

I think there must be a way to check recursively within a folder, find the files by extension, move those matching *.pdf into a folder so they don't need to be worked on, and move the others into a folder like /volume1/needswork/ so they can be checked, modified and converted to PDF.

I know this would be a 2-step or more process, and I'd need to check the original folder on approximately an hourly basis, move the files, and go on.

I expect this is a rather complex set of commands. I have already tested this command in the folder I'm searching. It doesn't work (it complains that -exec needs an argument).

find . \( -name "/download/*.pdf" \) -exec mv {} /volume1/USB1/checked/ \;

Am I missing something here?

Thanks in advance.
Mike

norobro 09-17-2014 10:02 PM

bash spits out a warning on my Debian box using the command that you posted:
Quote:

find: warning: Unix filenames usually don't contain slashes (though pathnames do). That means that '-name `/download/*.pdf'' will probably evaluate to false all the time on this system. You might find the '-wholename' test more useful, or perhaps '-samefile'.
The following worked:
Code:

find . -wholename "./download/*.pdf" -exec mv {} checked/ \;

grail 09-18-2014 02:13 AM

As you have a 2-part process which find alone would not be able to do, i.e. PDF files go one place and other files elsewhere, I would write it as a script.

Something simple like below to get you started:
Code:

#!/usr/bin/env bash

initial_dir="$1"
pdf_dir="$2"
other_dir="$3"

while read -r path_to_file
do
  case "${path_to_file##*.}" in
    pdf) dest_path=$pdf_dir ;;
      *) dest_path=$other_dir ;;
  esac

  mv "$path_to_file" "$dest_path"
done< <(find "$initial_dir" -type f)

Now this has no error checking, but it should give you the general idea :)

I have used a case statement for a little future-proofing, since you might end up with several extensions to place in alternate locations.

If you wanted to, you could also spiffy it up a little by using an associative array, which would avoid the need for the case statement.

Elrique64 09-18-2014 10:13 AM

Grail,

Your solution brings up almost as many questions as it does answers.... :)

I have NO scripting experience, and this smacks more like actual programming than anything in batch files associated with DOS. So I look at what you have and stare in wonder... :)

First is, how is path_to_file set?
initial_dir, etc are all preset in the script? ie I replace the $1 with the actual path, or?
This would get dropped into the init.d folder and named something like S##ChckMov.sh and then a cronjob added?

So far, I've copied/pasted from a couple of sources to get scripts running to check if an app is hung on the NAS. This was something I barely understood the workings of. Your script I think I understand except the points mentioned above. Thanks for taking the time.

Mike

rtmistler 09-18-2014 01:38 PM

It's an interesting problem. If I may state it again to make sure I understand:
  1. From within the /download directory, once per hour
  2. Find all PDF files and move them to a "checked" directory
  3. Move all other files and sub-directories to a "needswork" directory
Issues I see: will multiple users or automated processes be placing PDF files into /download? You might run one command to move all the PDFs, then a second command to move the remaining files; if files arrive rapidly, a new PDF can land in the directory just after the first pass, and the second pass would sweep it into the wrong place. The other issue is that a filename may refer to a file that is still being written. You can end up trying to move an unfinished file, getting an error, or moving a partial copy while some other process keeps appending to the original.
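One common hedge against in-progress files (not used in this thread's scripts; just a sketch, assuming GNU touch/find) is find's -mmin test, which skips anything modified in the last few minutes. A demo using throwaway directories instead of the real /download:
Code:

```shell
# Demo: -mmin +2 matches only files last modified more than 2 minutes ago,
# so a file still being written (fresh mtime) is left alone.
mkdir -p demo_dl demo_checked
touch -d '10 minutes ago' demo_dl/old.pdf   # finished download (old mtime)
touch demo_dl/new.pdf                       # still "in progress" (fresh mtime)
find demo_dl -type f -name "*.pdf" -mmin +2 -exec mv {} demo_checked/ \;
ls demo_checked                             # prints: old.pdf
```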

My style would be to determine the two commands that accomplish this (in my case the easier route is the find-and-exec approach plus mv) and then figure out how to run them once per hour. It will probably draw comments, but I've run scripts that loop forever, usually for testing, with a sleep 1s inside. I'm unsure whether just running a script with a sleep 60s in the loop would be considered bad form, but that would be my first, brute-force attempt before I began to refine.

I'd also move all files first, then find the PDF files in the destination and re-move those to the checked directory. Why? I'd be concerned about files arriving after I checked for and moved the PDFs but before I moved the rest. Also, since "the rest" can be any type of file or directory, that second pass isn't searching for specific extensions or names; it takes pretty much any file.

Haven't run this, but the general gist of my first attempt would be a bash script:
Code:

#!/bin/bash
# Script to move files from /download to /needswork and /checked directories

set -xv

while true; do
    mv -f /download/* /needswork/
    find /needswork -name "*.pdf" -exec mv {} /checked/ \;
    sleep 60s
done

You may need to be root or use sudo for these commands: directories like /download would normally be owned by root, and the files may also be owned by multiple users; it depends on where the files come from and how they get there. Obviously you can also do this under a /home/user directory where all the files and directories are owned by the user running the script.

I would run a script like that, during the testing phase, from a terminal so I could echo status information; the set -xv part will also make the script output debug trace. It will be helpful for testing what the script does against the flow of data entering your /download directory.

As for zero-sized files, incomplete files, or temporary files: instead of the broad-brush mv, you can detect zero-sized files and skip them. If sub-directories are associated with certain files, à la complete web-site saves, then the same thing: detect the zero-sized HTML file, detect the associated directory, and move neither. This is obviously more sophisticated than a simple find, but that's what refinement is all about. Finally, temporary files: if you're downloading a very large XZ file, say a disk image, you'll see the XZ file alongside something like ....xz.temp, a temporary file that grows as you download and ends up renamed or copied to the intended file name. You can detect the zero-sized file as well as the temp file and decide not to move them until the download is complete, i.e. some future loop iteration where the condition no longer exists.
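The zero-sized-file detection described above maps directly onto find's -empty test; a minimal sketch with a throwaway directory:
Code:

```shell
# Demo: "! -empty" restricts a find to non-zero-sized files, so a mover
# script can skip zero-sized placeholders.
mkdir -p demo_dl2
: > demo_dl2/stub.pdf                # zero-sized placeholder
echo "content" > demo_dl2/real.pdf   # non-empty file
find demo_dl2 -type f -name "*.pdf" ! -empty   # prints: demo_dl2/real.pdf
```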

All that about unresolved files I'd say is refinement, just IMHO.

Elrique64 09-18-2014 06:01 PM

You have the process I'm trying to get to down pat. I'm liking the lines of thinking this thread is going down. :)

All of the files I would be searching for, in their downloaded state, are going to have an extension of .part. They are downloaded via torrent from the server to the NAS, and partial files have a name like "abcxyz.pdf.part". Once the file is complete, the .part comes off and the file name reverts to "abcxyz.pdf", as named on the server. Since we aren't looking for a name that contains pdf but one that ends in pdf, this should work quite well. I will run some tests.
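That naming scheme lends itself to a simple negative match: find's ! -name "*.part" selects only finished files. A sketch with a throwaway directory rather than the real /download:
Code:

```shell
# Demo: exclude in-progress torrent files by their .part suffix.
mkdir -p demo_dl3
touch demo_dl3/done.pdf demo_dl3/busy.pdf.part
find demo_dl3 -type f ! -name "*.part"   # prints: demo_dl3/done.pdf
```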

Looking at your script, if the files are in a sub-folder, will that sub-folder be created in the new location? Even if they are PDFs? Or will the PDFs get moved straight into the parent /watched folder? (I'll ditz with this, too...) :) Also, could I include another while loop to check for each extension (.txt, .doc, etc.)? One for each, right? Don't move everything, just the ones I want, and when all of the .parts are done, delete the extra, extraneous stuff?

Modifying the script you provided:
Code:

#!/bin/sh
# Script to move files from /download to /needswork and /checked directories

set -xv

while [ 1 == 1 ]; do
    find /download -name "*.pdf" -exec mv {} /checked/. \;
    sleep 60s
done
while [ 1 == 1 ]; do
    find /download -name "*.txt" -exec mv {} /needswork/. \;
    sleep 60s
done
while [ 1 == 1 ]; do
    find /download -name "*.doc" -exec mv {} /needswork/. \;
    sleep 60s
done

When I get the chance to test this I will, on a dummy folder/file structure. Now, somewhere I need to check for a .part file in the directory, so I don't corrupt the torrent session as it's downloading.

Beryllos 09-18-2014 09:34 PM

Quote:

Originally Posted by Elrique64 (Post 5240533)
... Modifying the script you provided:
Code:

#!/bin/sh
# Script to move files from /download to /needswork and /checked directories

set -xv

while [ 1 == 1 ]; do
    find /download -name "*.pdf" -exec mv {} /checked/. \;
    sleep 60s
done
while [ 1 == 1 ]; do
    find /download -name "*.txt" -exec mv {} /needswork/. \;
    sleep 60s
done
while [ 1 == 1 ]; do
    find /download -name "*.doc" -exec mv {} /needswork/. \;
    sleep 60s
done


There is a small problem:
Code:

while [ 1 == 1 ]; do
    find whatever
    sleep 60s
done

This loop has no test and no way to break out. It will never get past the done statement.

If you want to execute a command every minute, cron could do that.
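For the hourly schedule the OP described, a crontab entry would look something like this (the script path here is hypothetical, not from this thread):
Code:

```
# Run the sorting script at the top of every hour (crontab -e to install).
# /volume1/scripts/sort_downloads.sh is an assumed path.
0 * * * * /volume1/scripts/sort_downloads.sh
```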

allend 09-18-2014 10:24 PM

Personally, I would prefer to use inotifywait to put a watch on the download directory and move files when they are closed.
Pinching grail's code I came up with this
Code:

#!/bin/bash

# Script to watch for creation of files in /download directory
#  and move to directories based on extension.
#
# This script uses the -m option to inotifywait and should never exit.

dl_dir="/download"
other_dir="/needswork/"
pdf_dir="/checked/"


inotifywait -mq --format '%w%f' \
  -e close_write "$dl_dir" \
  | while read -r file; do
        case "${file##*.}" in
          part) continue ;;
          pdf)  dest_path=$pdf_dir ;;
          *)    dest_path=$other_dir ;;
        esac

        mv "$file" "$dest_path"
    done


grail 09-18-2014 11:38 PM

I wasn't aware of the requirement to run this from cron or the like, but I thought I would help you understand the code a little :)

The first point is that, yes, a bash script is similar in use to a Windows batch file, but the level of power available is considerably greater.

The second point: since I thought this was a script you would call as a user, the $1, $2, etc. used in the script are parameters passed to the script when you run it. So it would look something like:
Code:

$ ./process_downloads.sh /download /checked /needswork
The comments below should hopefully help with the explanation:
Code:

#!/usr/bin/env bash

#parameters as per example above
initial_dir="$1"  # /download
pdf_dir="$2"      # /checked
other_dir="$3"    # /needswork

while read -r path_to_file
do
  case "${path_to_file##*.}" in      # strip off everything up to and including the last dot (.), leaving the extension
    pdf) dest_path=$pdf_dir ;;       # if the extension after stripping is 'pdf', set dest_path to pdf_dir's value
      *) dest_path=$other_dir ;;     # if the extension is anything else, set dest_path to other_dir's value
  esac

  mv "$path_to_file" "$dest_path"    # perform the actual move
done< <(find "$initial_dir" -type f)  # path_to_file is populated by each line returned from this find query
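The parameter expansion grail describes can be tried in isolation; a quick demo (the filenames here are hypothetical):
Code:

```shell
# Demo: "${var##*.}" deletes the longest prefix ending in a dot, leaving
# whatever follows the LAST dot, i.e. the extension the case switches on.
f="abcxyz.pdf"
echo "${f##*.}"   # prints: pdf
g="report.tar.gz"
echo "${g##*.}"   # prints: gz  (strips through the last dot)
```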

Having now had further information, namely:
Quote:

All of the files I would be searching for, in their downloaded state, are going to have an extension of .part. They are downloaded via torrent from the server to the NAS, and partial files would have a name like "abcxyz.pdf.part". Once the file is completed the .part comes off and the file name reverts to "abcxyz.pdf" as named on the server. Since we aren't looking for a contains pdf but ends pdf this should work quite well. I will run some tests.
From this I would be led to believe that only files without a '.part' extension should be touched, which also eliminates the need for something like inotify; a simple cron job can be put in place instead.
My thinking here is that the moment the '.part' is removed is the only time we need to move a file, and at that point we are also saying the file has finished any processing / downloading.
(I may be wrong and am sure allend will let us know :) )

So you can alter the above to include your new extensions like so:
Code:

#!/usr/bin/env bash

initial_dir="/download"
needs_work="/needswork"
checked="/checked"

while read -r path_to_file
do
  case "${path_to_file##*.}" in
    pdf)     dest_path=$checked ;;
    txt|doc) dest_path=$needs_work ;;
    *)       continue ;;           # leave .part files and anything else alone
  esac

  mv "$path_to_file" "$dest_path"
done< <(find "$initial_dir" -type f)

This could now be requested to run in a cron job every hour.

allend 09-19-2014 06:26 AM

Just want to say that I have edited my original post in this thread to remove some inaccurate comments in the code that I missed when I did a copy and paste from an earlier LQ post. I have also added a line to cover the creation of .part files. My intention was to demonstrate an alternative method for the OP to consider.
@OP Do as grail says. I would never presume that I could improve on what grail presents. That post to reputation points ratio is well deserved.

rtmistler 09-19-2014 08:49 AM

As Beryllos points out, the loop runs forever with no way to break out, so the second and third loops will never run. You could do it differently, such as:
Code:

while true; do
    find /download -name "*.pdf" -exec mv {} /checked/ \;
    sleep 60s
    find /download -name "*.txt" -exec mv {} /needswork/ \;
    sleep 60s
    find /download -name "*.doc" -exec mv {} /needswork/ \;
    sleep 60s
done

mv will move directories too. Note that by default mv silently overwrites an existing file of the same name at the destination (use -n to prevent that), and it cannot overwrite a non-empty directory, so duplicated sub-directory names can cause problems.
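The collision behaviour is easy to demo with mv's no-clobber flag (-n), using throwaway directories:
Code:

```shell
# Demo: mv -n refuses to overwrite an existing destination file, so a
# duplicate name stays put in the source directory.
mkdir -p demo_src demo_dest
echo "new" > demo_src/report.pdf
echo "old" > demo_dest/report.pdf
mv -n demo_src/report.pdf demo_dest/
cat demo_dest/report.pdf   # prints: old  (existing file preserved)
ls demo_src                # prints: report.pdf  (source not moved)
```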

HOWEVER!!!! Sorry for the bold but intending to get your attention:
Quote:

Originally Posted by Elrique64 (Post 5240533)
They are downloaded via torrent from the server to the NAS

MOST torrent applications have settings that let you say "place temporary files ... 'here'" and "place completed torrents ... 'here'". If that would suit you, I'd recommend you consider that alternative to what you're doing now.

And further, there are much more elegant and better-versed script writers offering comments here. Some of their suggestions go over my experience level as well, but they're worth looking at provided you can understand how the script ultimately works.

