Linux - Newbie
This Linux forum is for members that are new to Linux. Just starting out and have a question? If it is not in the man pages or the how-tos, this is the place!
I've written a script that receives parameters (SOURCEDIR, TARGETDIR, FILEMASK, FILEMIN) to search for files and move them if they match the criteria. There is more to the complete script, but this is the find command I am using:
Code:
for file in `find ${SOURCEDIR} -maxdepth 1 -name "${FILEMASK}" -type f -mmin +${FILEMIN}`; do
However, it finds other files that I am NOT looking for (which are handled automatically by another process). For example, looking for a filename "*ER1_100*", it finds "EDI1160_Z_SHP_OBDLV_SAVE_REPLICA_22648345".
Am I getting the -name pattern wrong, or is there some funny way that find works that I don't understand (quite likely!)?
for file in `find ${SOURCEDIR} -maxdepth 1 -name "${FILEMASK}" -type f -mmin +${FILEMIN}`; do
There's nothing obviously wrong with that find command. Try running the script with the "-x" option ("bash -x path/to/script ...") to see how that command is actually being invoked.
You could easily use substrings with find, something like this:
Code:
#!/bin/bash
FILEMASK="ER1_100"
SOURCEDIR=
TARGETDIR=
FILEMIN=
while read f ; do
xpath=${f%/*}
xbase=${f##*/}
xfext=${xbase##*.}
xpref=${xbase%.*}
path=${xpath}
pref=${xpref}
ext=${xfext}
mkdir -p "$TARGETDIR"
# pref is just the file name minus everything else, i.e. path and extension if there is one.
# Do whatever you want with it here when it matches your substring, aka FILEMASK.
[[ $pref =~ "$FILEMASK" ]] && mv -v "$f" "$TARGETDIR"
done <<<"$(find "${SOURCEDIR}" -maxdepth 1 -type f -mmin +"${FILEMIN}")"
It is still basically doing the same thing. This way you could also set up different substrings for more than one file-type name and process them all within the same script, just by adding different substring variables and using if/else statements.
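For reference, those parameter expansions split a pathname apart like this (the example path is made up):

```shell
# Hypothetical path, just to show what each expansion yields:
f=/some/dir/ER1_100_report.txt
echo "${f%/*}"     # path:      /some/dir
b=${f##*/}
echo "$b"          # base:      ER1_100_report.txt
echo "${b##*.}"    # extension: txt
echo "${b%.*}"     # prefix:    ER1_100_report
```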
I don't think the wildcard '*' characters are going to work correctly.
As long as they are in a quoted string so that the shell won't try to expand them, they will work just fine in find.
Note that a construct like "for F in `find ...`; do ...; done" is going to misbehave quite badly if any of the found names contains embedded whitespace characters. The "for" loop is going to interpret each of those words as separate names. If the names contain any shell metacharacters, the shell will try to process those, too, before assigning the result to the variable.
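A whitespace-safe sketch of that loop, using bash and GNU find's -print0 (the parameter values here are hypothetical stand-ins so it can run on its own):

```shell
#!/bin/bash
# Stand-in values for the script's parameters (hypothetical):
SOURCEDIR=$(mktemp -d)
TARGETDIR=$(mktemp -d)
FILEMASK='*ER1_100*'
FILEMIN=2

# Seed one old matching file whose name contains a space.
touch -d '10 minutes ago' "${SOURCEDIR}/odd ER1_100 name"

# -print0 emits NUL-terminated names; read -r -d '' consumes them intact,
# so embedded whitespace and glob characters are never re-interpreted.
find "${SOURCEDIR}" -maxdepth 1 -name "${FILEMASK}" -type f -mmin +"${FILEMIN}" -print0 |
while IFS= read -r -d '' file; do
    mv -v "$file" "${TARGETDIR}/"
done
```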
I've got the output now from the SAP guys (who run the script in some weird way; it's not scheduled in cron):
Code:
[ -z ER2 ]
basename /usr/sap/ER2/interfaces/ER2/100/scripts/movefile.sh .sh
basename=movefile
dirname /usr/sap/ER2/interfaces/ER2/100/scripts/movefile.sh
BASEDIR=/usr/sap/ER2/interfaces/ER2/100/scripts
date '+%d/%m/%y %X'
DATE='16/02/18 10:08:50 AM'
awk -FER2 '{print length($0)-length($NF)+2}'
echo /usr/sap/ER2/interfaces/ER2/100/scripts
SIDPOS=29
expr substr /usr/sap/ER2/interfaces/ER2/100/scripts 29 3
CLIENT=100
BASEPATH=/usr/sap/ER2/interfaces/ER2/100
date +%Y%m
LOGFILE=/usr/sap/ER2/interfaces/ER2/100/log/movefile_201802.log
date '+%d/%m/%y %X'
echo '16/02/18 10:08:50 AM - Start process..'
1>> /usr/sap/ER2/interfaces/ER2/100/log/movefile_201802.log
[ -d /usr/sap/ER2/interfaces/ER2/100/out/ -a -d /usr/sap/ER2/interfaces/ER2/100/out/GIS/ ]
[ 2 -ge 0 ]
find /usr/sap/ER2/interfaces/ER2/100/out/ -maxdepth 1 -name '*SEPA_PAIN*' -type f -mmin +2
echo ' Moving file /usr/sap/ER2/interfaces/ER2/100/out/SEPA_PAIN03_5539.xml from /usr/sap/ER2/interfaces
1>> /usr/sap/ER2/interfaces/ER2/100/log/movefile_201802.log
mv /usr/sap/ER2/interfaces/ER2/100/out/SEPA_PAIN03_5539.xml /usr/sap/ER2/interfaces/ER2/100/out/GIS/
date '+%d/%m/%y %X'
echo '16/02/18 10:08:50 AM - End process..'
1>> /usr/sap/ER2/interfaces/ER2/100/log/movefile_201802.log
[ Y '==' Y ]
Everything seems normal to me and, in this case, it worked. I think the trouble here is going to be catching the failure, as it doesn't always happen (obviously).
It's almost like find does things in stages (like find the files older than 2 minutes and then run the mask check against them). Is this possible?
If the find failure concerns the filenames, there are not many possible sources of trouble.
It's about what ${FILEMASK} contains, so I would troubleshoot how its value is assigned: put some echo ${FILEMASK} statements through the script.
Of course it helps to use the same conditions as when the script fails (not when it's working).
The FILEMASK is not changed at any point in the script and contains the correct value at the time of the find execution (in this case "*SEPA_PAIN*").
This is an intermittent issue, so we cannot duplicate it whenever we want. The directory where we are searching for the files is dynamic (files can be added and/or removed while this script is running), so I was wondering whether find is creating a list of the files (internally) and then trying to process them, and finding that one of the files in its list is no longer there. Is this possible?
I would try find without the for loop. Use find -D (see the man page about it) if you wish to debug find (i.e. to understand how it works).
Yes, that is possible: find found a file, but it was deleted before processing.
I suppose a better phrasing of my question would be: would find list all the files internally and then try to action the parameters against that list? The thing I find hard to understand is why, when I have given the filename mask to use (in this case '*SEPA_PAIN*'), it complains about a file that it should not even be looking at (i.e. the filename does not contain the filemask specified).
I have tried the -D option, but I'm afraid I'm left none the wiser after viewing the output; it didn't make a lot of sense to me!
I've looked at the execution of find with ltrace, and I see that when it processes a directory it first reads and saves all the names unconditionally. Then, taking the names one at a time, it performs all tests on each name before moving on to the next.
By any chance do any of the names in the directory contain any embedded shell meta-characters? Any such name returned by find is going to get expanded by the shell since pathname expansion occurs after command substitution. Combine that with some embedded whitespace characters in the names and your "for ..." loop could end up doing almost anything. For example, an unlikely name "EDI* ER1_100" could satisfy the name test in your find command and would result in processing all files with names that start with "EDI". It's generally a lot safer to let find directly invoke the processing (with "-exec") rather than letting a shell process the names first.
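A hypothetical demonstration of that hazard (the filenames here are invented): a single file whose name contains a space and a "*" makes an unquoted $(find ...) loop see unrelated files.

```shell
#!/bin/bash
# Create a scratch directory with one "booby-trapped" name and two
# ordinary names that do NOT match the mask.
demo=$(mktemp -d)
cd "$demo" || exit 1
touch 'EDI* ER1_100' EDI1160_Z_SHP_OBDLV EDI2000_OTHER

# find itself matches exactly one file, but the unquoted result is
# word-split into "./EDI*" and "ER1_100", and "./EDI*" is then
# glob-expanded to every name beginning with EDI:
for f in $(find . -maxdepth 1 -name '*ER1_100*' -type f); do
    echo "loop sees: $f"
done
```

So the loop ends up "finding" EDI1160_Z_SHP_OBDLV and EDI2000_OTHER even though neither matches the mask.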
I thought it might be doing something like this but had no idea how to prove it..! So thanks for that - it explains a lot.
None of the names should be "funny" - they are all coming out from SAP so will be fairly standard creation - nothing manual!
From a "-exec" perspective, I hadn't really thought about that. So, excuse any errors in this attempt, but would it be something like this (based on the original find)?
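Something like this, perhaps; a sketch keeping the original options, with hypothetical stand-in values for the parameters so it can run standalone:

```shell
#!/bin/bash
# Hypothetical stand-ins for the real SAP paths and values:
SOURCEDIR=$(mktemp -d)   # stand-in for .../out
TARGETDIR=$(mktemp -d)   # stand-in for .../out/GIS
FILEMASK='*ER1_100*'
FILEMIN=2

# Seed one old matching file and one old non-matching file for the demo.
touch -d '10 minutes ago' "${SOURCEDIR}/ABCER1_100_TEST" "${SOURCEDIR}/EDI1160_OTHER"

# find hands each matching name straight to mv; the shell never
# word-splits or glob-expands the names.
find "${SOURCEDIR}" -maxdepth 1 -name "${FILEMASK}" -type f -mmin +"${FILEMIN}" \
    -exec mv -v {} "${TARGETDIR}/" \;
```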
Maybe the error comes not from find but from the commands that follow in the loop.
What is the exact evidence that proves that find returns wrong files in the script?
As I said in my original post..
For example, looking for a filename "*ER1_100*", it finds "EDI1160_Z_SHP_OBDLV_SAVE_REPLICA_22648345"
The error we got in the script was "file not found", as the "EDI1160" file mentioned had already been moved to another directory by another script and, in any case, wasn't a file that I wanted to perform any actions on.
As rknichols already explained, find first reads the directory and then processes that list. It is a common problem if you modify the directory during that process: find may fail simply because it cannot follow the change. For example, find may still think the first entry in a directory matches, but in the meantime you removed that file from the dir and a new one was created, so find will print the first file, which is now incorrect. I don't know the internals of find, but that is what I can imagine (or something similar).
But I have other comments too, which may help:
for i in $(find ... )
is not a good construct, because find is a loop itself; putting the result in another loop is superfluous. One thing that will definitely happen: find will run in a subshell, and the for loop will not start before find has completed.
So find starts to work on a directory, and for will process the result only after find completes; during that time the content of the dir may have changed. This is definitely unsafe. Better to use:
find ..... -exec script.sh {} \;
where this script is the same as the main part of your loop (every file will be passed to it one by one). In this case the script will start earlier (it does not need to wait for find to finish), but it is still unsafe.
If you want to be even faster, you may need to implement that find in python/perl/whatever, which can act immediately on a match (but you cannot completely eliminate this race condition that way either).
Also, you may want to drop find completely, because a shell glob like *MASK* is simply much faster (you can check other attributes afterwards), although this construct is still unsafe.
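For instance, a glob-based sketch (the values are hypothetical stand-ins, and the mtime check assumes GNU stat):

```shell
#!/bin/bash
# Hypothetical stand-ins for the real values:
SOURCEDIR=$(mktemp -d)
TARGETDIR=$(mktemp -d)
FILEMIN=2
touch -d '10 minutes ago' "${SOURCEDIR}/SEPA_PAIN03_5539.xml"

now=$(date +%s)
# The shell expands the mask itself, so no external command output
# is ever re-parsed; quoting "$f" keeps odd names intact.
for f in "${SOURCEDIR}"/*SEPA_PAIN*; do
    [ -f "$f" ] || continue                 # glob matched nothing, or file vanished
    age=$(( (now - $(stat -c %Y "$f")) / 60 ))
    [ "$age" -gt "$FILEMIN" ] || continue   # same test as find's -mmin +FILEMIN
    mv -v "$f" "${TARGETDIR}/"
done
```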