LinuxQuestions.org
Linux - Newbie forum
Old 09-17-2014, 07:37 PM   #1
Elrique64
LQ Newbie
 
Registered: Sep 2014
Posts: 3

Rep: Reputation: Disabled
Linux Newbie with issues with find


I've got almost 20 years working with computers, all of it on Windows/DOS machines, though. Now I find myself with a Linux-flavored NAS and I'm having trouble getting things to work the way I want.

Some things I have had issues with, I've found answers for here. (cronjobs to keep a drive enclosure alive, scripting to check an application.) Other things I've had some problems finding information on.

Specifically, I'm downloading files from a local server. Some of these files are pdf's, others are text, docs, etc. They are all placed in /download. Sometimes in that folder, sometimes in folders under it. (/download/testing/, various other folders, each named uniquely.)

I think there must be a way to check recursively within a folder, find the files by extension, and move those matching *.pdf into a folder so they don't need to be worked on. Move the others into a folder like /volume1/needswork/ so they can be checked, modified and converted to PDF.

I know this would be a 2-step or more process, and it would need to check the original folder on approximately an hourly basis, move the files, and go on.

I expect this is a rather complex set of commands. I have already tested this command in the folder. It doesn't work (find complains that -exec needs an argument):

find . \( -name "/download/*.pdf" \) -exec mv {} /volume1/USB1/checked/ \;

Am I missing something here?

Thanks in advance.
Mike

Last edited by Elrique64; 09-17-2014 at 07:40 PM.
 
Old 09-17-2014, 10:02 PM   #2
norobro
Member
 
Registered: Feb 2006
Distribution: Debian Sid
Posts: 619

Rep: Reputation: 238
find spits out a warning on my Debian box with the command that you posted:
Quote:
find: warning: Unix filenames usually don't contain slashes (though pathnames do). That means that '-name `/download/*.pdf'' will probably evaluate to false all the time on this system. You might find the '-wholename' test more useful, or perhaps '-samefile'.
The following worked:
Code:
find . -wholename "./download/*.pdf" -exec mv {} checked/ \;
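For what it's worth, the same move is more commonly done by matching on the bare filename with -name, quoting the pattern so the shell doesn't expand it first. A runnable sketch in a throwaway sandbox (the temp directories below are stand-ins for the thread's real /download and checked/ paths):

```shell
# Build a disposable demo tree.
tmp=$(mktemp -d)
mkdir -p "$tmp/download/testing" "$tmp/checked"
touch "$tmp/download/a.pdf" "$tmp/download/testing/b.pdf" "$tmp/download/c.txt"

# Quoting "*.pdf" hands the pattern to find unexpanded; -name matches the
# filename itself, so PDFs in sub-folders are caught as well.
find "$tmp/download" -type f -name "*.pdf" -exec mv {} "$tmp/checked/" \;

ls "$tmp/checked"
```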
 
Old 09-18-2014, 02:13 AM   #3
grail
LQ Guru
 
Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 9,499

Rep: Reputation: 2867
As you have a two-part process that find alone would not be able to do (i.e. pdf files go here, other files go elsewhere), I would write it as a script.

Something simple like below to get you started:
Code:
#!/usr/bin/env bash

initial_dir="$1"
pdf_dir="$2"
other_dir="$3"

while read -r path_to_file
do
  case "${path_to_file##*.}" in
    pdf) dest_path=$pdf_dir ;;
    *) dest_path=$other_dir ;;
  esac

  mv "$path_to_file" "$dest_path"
done< <(find "$initial_dir" -type f)
Now this has no error checking, but it should give you the general idea.

I have used a case statement for a little future-proofing, since you might end up with several extensions and want to place them in
alternate locations.

If you wanted to, you could also spiffy it up a little by using associative arrays, which would avoid the need for the case statement.
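For the curious, that associative-array variant might look like the sketch below (requires bash 4+; the sandbox directories stand in for the real /download, /checked and /needswork):

```shell
# Disposable demo tree standing in for the real directories.
tmp=$(mktemp -d)
mkdir -p "$tmp/download" "$tmp/checked" "$tmp/needswork"
touch "$tmp/download/report.pdf" "$tmp/download/notes.txt" "$tmp/download/raw.bin"

# Map each extension straight to its destination directory.
declare -A dest=(
  [pdf]="$tmp/checked"
  [txt]="$tmp/needswork"
  [doc]="$tmp/needswork"
)

while read -r path_to_file; do
  ext=${path_to_file##*.}
  if [[ -n ${dest[$ext]:-} ]]; then   # skip extensions with no table entry
    mv "$path_to_file" "${dest[$ext]}"
  fi
done < <(find "$tmp/download" -type f)
```

Adding a new extension is then a one-line change to the table rather than a new case branch.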
 
1 member found this post helpful.
Old 09-18-2014, 10:13 AM   #4
Elrique64
LQ Newbie
 
Registered: Sep 2014
Posts: 3

Original Poster
Rep: Reputation: Disabled
Grail,

Your solution brings up almost as many questions as it does answers....

I have NO scripting experience, and this smacks more of actual programming than anything in the batch files associated with DOS. So I look at what you have and stare in wonder...

First: how is path_to_file set?
initial_dir, etc. are all preset in the script? I.e. do I replace the $1 with the actual path, or?
This would get dropped into the init.d folder, named something like S##ChckMov.sh, and then a cronjob added?

So far, I've copied/pasted from a couple of sources to get scripts running to check if an app is hung on the NAS. This was something I barely understood the workings of. Your script I think I understand except the points mentioned above. Thanks for taking the time.

Mike
 
Old 09-18-2014, 01:38 PM   #5
rtmistler
Moderator
 
Registered: Mar 2011
Location: Sutton, MA. USA
Distribution: MINT Debian, Angstrom, SUSE, Ubuntu
Posts: 5,133
Blog Entries: 10

Rep: Reputation: 1827
It's an interesting problem. If I may state it again to make sure I understand:
  1. From within the /download directory, once per hour
  2. Find all PDF files and move them to a "checked" directory
  3. Move all other files and sub-directories to a "needswork" directory
Issues I see here are whether multiple users or automated processes will be placing PDF files into /download. You may run one command to find all the PDFs and move them over, then run the next command which moves all remaining files; if things are moving into that directory rapidly, an extra PDF may land there shortly after you've moved all the PDFs. The other issue is that you can see a filename which is not yet closed, but is still in the process of being created or appended to. You can therefore end up trying to move an unfinished file, getting an error, or causing problems where you move a copy and then more of that file gets written by some other process.

My style would be to determine the two commands to accomplish this (in my case the easier ones would be the find/-exec approach plus mv) and then figure out how to run them once per hour. It'll probably be commented on, but I've run scripts that loop forever, usually for testing, with a sleep in them. Unsure if just running a script with sleep 60s in the loop would be considered bad, but that would be my first, brute-force attempt before I began to refine.

I'd also move all files first and then find only PDF files in the destination and re-move those to the checked directory. Why? I'd be concerned about files that arrive after I've checked for and moved the PDFs but before I move the rest; and since "the rest" can be any type of file or directory, that second pass isn't searching for specific extensions or names, it's just moving any files.

Haven't run this, but the general gist of my first attempt would be a bash script:
Code:
#!/bin/bash
# Script to move files from /download to /needswork and /checked directories

set -xv

while [ 1 == 1 ]; do
    # (mv moves directories as-is; there is no -r flag)
    mv -f /download/* /needswork/.
    find /needswork -name "*.pdf" -exec mv {} /checked/. \;
    sleep 60s
done
You may need to be root or use sudo for these commands, because directories like /download would normally be owned by root, and the files may also be owned by multiple users; I don't know, it depends on where these files are coming from, how they're getting there, etc. Obviously you can also do this in a /home/user directory where all the files and directories are owned by the user running the script.

I would run a script like that in the testing phase from a terminal and then I could echo status information; however the set -xv part will cause the script to output debug. It will be helpful to test what it's doing against the flow of data entering into your /download directory.

As far as zero-sized files, incomplete files, or temporary files: instead of the broad-brush mv command, you can find a way to detect non-zero-sized files and only move those. If sub-directories are associated with certain files (à la complete web-site saves), then same thing: detect a zero-sized HTML file, detect the associated directory, and don't move either of them. This is obviously more sophisticated than a simple find, but that's what refinement is all about. And finally, temporary files: for instance, while you're downloading a very large XZ file, say a disk image, you'll have the XZ file there but also something like ....xz.temp, a temporary file that grows as you download and ends up being renamed or copied to the intended file name. You can detect the zero-sized file as well as the temp file and decide not to move them until the download is complete, or rather until some future loop iteration where this condition no longer exists.

All that about unresolved files I'd say is refinement, just IMHO.
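The zero-size / temp-file refinement described above can be sketched with find's -size and -name tests (the .temp suffix and sandbox paths are illustrative; adjust for the real downloader):

```shell
# Disposable demo tree.
tmp=$(mktemp -d)
mkdir -p "$tmp/download" "$tmp/needswork"
touch "$tmp/download/empty.doc"              # zero-sized: leave in place
echo data > "$tmp/download/image.xz.temp"    # temporary file: leave in place
echo data > "$tmp/download/done.doc"         # complete file: move it

# Move only regular files that are non-empty and lack the temp suffix.
find "$tmp/download" -type f -size +0c ! -name "*.temp" \
  -exec mv {} "$tmp/needswork/" \;
```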
 
Old 09-18-2014, 06:01 PM   #6
Elrique64
LQ Newbie
 
Registered: Sep 2014
Posts: 3

Original Poster
Rep: Reputation: Disabled
You have the process that I'm trying to get to down pat. I'm liking the directions this thread is going.

All of the files I would be searching for, in their downloaded state, are going to have an extension of .part. They are downloaded via torrent from the server to the NAS, and partial files would have a name like "abcxyz.pdf.part". Once the file is completed, the .part comes off and the file name reverts to "abcxyz.pdf" as named on the server. Since we aren't looking for a name that contains pdf but one that ends in pdf, this should work quite well. I will run some tests.

Looking at your script, if the files are in a sub-folder, will that sub-folder be created in the new location, even if they are pdf's? Or will the pdf's get moved into the parent /watched folder? (I'll ditz with this, too...) Also, could I include another while loop to check for each extension (.txt, .doc, etc.), one for each, right? Don't move everything, just the ones I want, and when all of the .parts are done, delete the extra, extraneous stuff?

Modifying the script you provided:
Code:
#!/bin/sh
# Script to move files from /download to /needswork and /checked directories

set -xv

while [ 1 == 1 ]; do
    find /download -name "*.pdf" -exec mv {} /checked/. \;
    sleep 60s
done
while [ 1 == 1 ]; do
    find /download -name "*.txt" -exec mv {} /needswork/. \;
    sleep 60s
done
while [ 1 == 1 ]; do
    find /download -name "*.doc" -exec mv {} /needswork/. \;
    sleep 60s
done
When I get the chance to test this I will, on a dummy folder/file structure. Now, somewhere I need to check for a .part file in the directory, so I don't corrupt the torrent session as it's downloading.
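One sketch of that .part check: skip the whole pass while any partial download exists (GNU find's -quit stops at the first match; the sandbox paths are demo stand-ins for the real folders):

```shell
# Disposable demo tree with a download still in progress.
tmp=$(mktemp -d)
mkdir -p "$tmp/download" "$tmp/checked"
touch "$tmp/download/abcxyz.pdf.part" "$tmp/download/old.pdf"

# If any .part file exists, leave everything alone until the next pass.
if find "$tmp/download" -name "*.part" -print -quit | grep -q .; then
    echo "partial download present - skipping this pass"
else
    find "$tmp/download" -name "*.pdf" -exec mv {} "$tmp/checked/" \;
fi
```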

Last edited by Elrique64; 09-18-2014 at 06:06 PM.
 
Old 09-18-2014, 09:34 PM   #7
Beryllos
Member
 
Registered: Apr 2013
Location: Massachusetts
Distribution: Debian
Posts: 330

Rep: Reputation: 126
Quote:
Originally Posted by Elrique64 View Post
... Modifying the script you provided:
Code:
#!/bin/sh
# Script to move files from /download to /needswork and /checked directories

set -xv

while [ 1 == 1 ]; do
    find /download -name "*.pdf" -exec mv {} /checked/. \;
    sleep 60s
done
while [ 1 == 1 ]; do
    find /download -name "*.txt" -exec mv {} /needswork/. \;
    sleep 60s
done
while [ 1 == 1 ]; do
    find /download -name "*.doc" -exec mv {} /needswork/. \;
    sleep 60s
done
There is a small problem:
Code:
while [ 1 == 1 ]; do
    find whatever
    sleep 60s
done
This loop has no test and no way to break out. It will never get past the done statement.

If you want to execute a command every minute, cron could do that.
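For reference, a crontab sketch (installed via `crontab -e`; the script name and path are hypothetical):

```
# Run the mover script once per minute:
* * * * * /path/to/move_downloads.sh
# ...or once per hour, matching the original requirement:
0 * * * * /path/to/move_downloads.sh
```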

Last edited by Beryllos; 09-19-2014 at 03:41 AM. Reason: removed incorrect remark about filenames containing spaces -- not a problem here
 
Old 09-18-2014, 10:24 PM   #8
allend
Senior Member
 
Registered: Oct 2003
Location: Melbourne
Distribution: Slackware-current
Posts: 4,540

Rep: Reputation: 1421
Personally, I would prefer to use inotifywait to put a watch on the download directory and move files when they are closed.
Pinching grail's code I came up with this
Code:
#!/bin/bash

# Script to watch for creation of files in /download directory
#  and move to directories based on extension.
#
# This script uses the -m option to inotifywait and should never exit.

dl_dir="/download"
other_dir="/needswork/"
pdf_dir="/checked/"


inotifywait -mq --format '%f' \
  -e close_write "$dl_dir" \
  | while read -r file; do
        case "${file##*.}" in
          part) continue ;;
          pdf) dest_path=$pdf_dir ;;
          *) dest_path=$other_dir ;;
        esac

        # --format '%f' yields the bare filename, so prefix the watch directory
        mv "$dl_dir/$file" "$dest_path"
    done

Last edited by allend; 09-19-2014 at 06:16 AM.
 
Old 09-18-2014, 11:38 PM   #9
grail
LQ Guru
 
Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 9,499

Rep: Reputation: 2867
I wasn't aware of the requirement to run this from cron and so on, but I thought I would help you understand the code a little.

First, yes: a bash script is similar in use to a Windows batch script, but the level of power in what can be done is considerably greater.

Second, since I thought this was a script you would call as a user, the $1, $2, etc. used in the script are parameters passed to the script when you run it. So it would look something like:
Code:
$ ./process_downloads.sh /download /checked /needswork
The comments below should hopefully help with the explanation:
Code:
#!/usr/bin/env bash

# parameters as per the example call above
initial_dir="$1"  # /download
pdf_dir="$2"      # /checked
other_dir="$3"    # /needswork

while read -r path_to_file
do
  case "${path_to_file##*.}" in       # strip everything up to and including the last dot, leaving just the extension
    pdf) dest_path=$pdf_dir ;;        # if the extension is 'pdf', set dest_path to the pdf_dir value
    *) dest_path=$other_dir ;;        # any other extension: set dest_path to the other_dir value
  esac

  mv "$path_to_file" "$dest_path"     # perform the actual move
done< <(find "$initial_dir" -type f)  # path_to_file is populated by what is returned from this find query
Having now had further information, namely:
Quote:
All of the files I would be searching for, in their downloaded state, are going to have an extension of .part. They are downloaded via torrent from the server to the NAS, and partial files would have a name like "abcxyz.pdf.part". Once the file is completed the .part comes off and the file name reverts to "abcxyz.pdf" as named on the server. Since we aren't looking for a contains pdf but ends pdf this should work quite well. I will run some tests.
From this I am led to believe that only files without a '.part' extension should be touched, which would also eliminate the need for something like inotify; a simple cron job could be put in place.
My thinking here is that the moment the '.part' is removed is the only time we need to move a file, and at that point we are also saying that the file has finished any processing / downloading.
(I may be wrong, and am sure allend will let us know )

So you can alter the above to include your new extensions like so:
Code:
#!/usr/bin/env bash

initial_dir="/download"
needs_work="/needswork"
checked="/checked"

while read -r path_to_file
do
  case "${path_to_file##*.}" in
    pdf) dest_path=$checked ;;
    txt|doc) dest_path=$needs_work ;;
    *) continue ;;                    # skip anything else, including unfinished .part files
  esac

  mv "$path_to_file" "$dest_path"
done< <(find "$initial_dir" -type f)
This could now be requested to run in a cron job every hour.
 
Old 09-19-2014, 06:26 AM   #10
allend
Senior Member
 
Registered: Oct 2003
Location: Melbourne
Distribution: Slackware-current
Posts: 4,540

Rep: Reputation: 1421Reputation: 1421Reputation: 1421Reputation: 1421Reputation: 1421Reputation: 1421Reputation: 1421Reputation: 1421Reputation: 1421Reputation: 1421
Just want to say that I have edited my original post in this thread to remove some inaccurate comments in the code, which I missed when I did a copy-and-paste from an earlier LQ post. I have also added a line to cover the creation of .part files. My intention was to demonstrate an alternative method for the OP to consider.
@OP Do as grail says. I would never presume that I could improve on what grail presents. That post-to-reputation-points ratio is well deserved.
 
Old 09-19-2014, 08:49 AM   #11
rtmistler
Moderator
 
Registered: Mar 2011
Location: Sutton, MA. USA
Distribution: MINT Debian, Angstrom, SUSE, Ubuntu
Posts: 5,133
Blog Entries: 10

Rep: Reputation: 1827
As Beryllos points out, the loop runs forever with no way to break out, so the second and third loops will never be reached. You could do it differently, such as:
Code:
while [ 1 == 1 ]; do
    find /download -name "*.pdf" -exec mv {} /checked/. \;
    sleep 60s
    find /download -name "*.txt" -exec mv {} /needswork/. \;
    sleep 60s
    find /download -name "*.doc" -exec mv {} /needswork/. \;
    sleep 60s
done;
mv will move directories. If there are duplicate names already at the destination it will not move them unless you use the -f flag (and even -f cannot merge a directory onto an existing non-empty one). Therefore a problem can occur with duplicated sub-directory names.
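A quick sandbox demonstration of that duplicate sub-directory problem (the paths are throwaway demo names):

```shell
# Two trees that both contain a sub-directory named "sub".
tmp=$(mktemp -d)
mkdir -p "$tmp/src/sub" "$tmp/dst/sub"
touch "$tmp/src/sub/new.txt" "$tmp/dst/sub/old.txt"

# mv cannot merge a directory onto an existing non-empty one, even with -f.
mv -f "$tmp/src/sub" "$tmp/dst/" 2>/dev/null || echo "mv refused to merge"
```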

HOWEVER!!!! Sorry for the bold but intending to get your attention:
Quote:
Originally Posted by Elrique64 View Post
They are downloaded via torrent from the server to the NAS
MOST torrent applications have settings where you tell it "place temporary files ... 'here'" and "place completed torrents ... 'here'". If that would suit you, I'd recommend you consider that alternative versus what you're doing now.

And further, there are much more elegant and better-versed script writers offering comments here. Their suggestions go beyond my experience level as well, but they're worth looking at, provided you can understand how the script ultimately works.
 
  

