LinuxQuestions.org
Go Back   LinuxQuestions.org > Forums > Linux Forums > Linux - Newbie
02-07-2018, 11:00 AM   #1
harboa
LQ Newbie
Registered: Feb 2018
Posts: 14
Reputation: Disabled
Find command returns files I don't want


I hope this is in the correct area!

I've written a script that receives parameters (SOURCEDIR, TARGETDIR, FILEMASK, FILEMIN), searches for files and moves them if they match the criteria. There is more to the complete script, but this is the find command I am using:

Code:
for file in `find ${SOURCEDIR} -maxdepth 1 -name "${FILEMASK}" -type f -mmin +${FILEMIN}`; do
However, it finds other files that I am NOT looking for (which are handled automatically by another process). For example, when looking for filenames matching "*ER1_100*", it finds "EDI1160_Z_SHP_OBDLV_SAVE_REPLICA_22648345".

Am I getting the name matching wrong, or is there some quirk in the way find works that I don't understand (quite likely!)?

Any help would be really appreciated.
 
02-07-2018, 11:30 AM   #2
rknichols
Senior Member
Registered: Aug 2009
Distribution: Rocky Linux
Posts: 4,780
Reputation: 2213
Quote:
Originally Posted by harboa
Code:
for file in `find ${SOURCEDIR} -maxdepth 1 -name "${FILEMASK}" -type f -mmin +${FILEMIN}`; do
There's nothing obviously wrong with that find command. Try running the script with the "-x" option ("bash -x path/to/script ...") to see how that command is actually being invoked.
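A minimal sketch of the same idea applied to just the find call, in case tracing the whole script is too noisy. The function name debug_find is hypothetical, and it assumes SOURCEDIR, FILEMASK and FILEMIN are set as in post #1:

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the exact find command line (after all
# variable expansion) to stderr, then run it.
debug_find() {
    # PS4 is the prefix bash prints before each traced command;
    # ${LINENO} is expanded at the moment the trace line is written.
    local PS4='+ line ${LINENO}: '
    set -x
    find "$SOURCEDIR" -maxdepth 1 -name "$FILEMASK" -type f -mmin +"$FILEMIN"
    { set +x; } 2>/dev/null   # switch tracing off without tracing "set +x" itself
}
```

Calling it as SOURCEDIR=/some/dir FILEMASK='*ER1_100*' FILEMIN=2 debug_find prints the command with the variables already expanded, which is what you need to compare against the unexpected results.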
 
02-07-2018, 12:01 PM   #3
BW-userx
LQ Guru
Registered: Sep 2013
Location: Somewhere in my head.
Distribution: Slackware (15 current), Slack15, Ubuntu studio, MX Linux, FreeBSD 13.1, WIn10
Posts: 10,342
Reputation: 2242
You could easily use substrings with find, something like this:

Code:
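#!/bin/bash

FILEMASK="ER1_100"   # plain substring, no wildcards needed
SOURCEDIR=           # set to the directory to search
TARGETDIR=           # set to the destination directory
FILEMIN=             # minimum file age in minutes

mkdir -p "$TARGETDIR"

while IFS= read -r f ; do

	xpath=${f%/*}        # directory part
	xbase=${f##*/}       # file name including extension
	xfext=${xbase##*.}   # extension (if any)
	xpref=${xbase%.*}    # file name minus extension
	path=${xpath}
	pref=${xpref}
	ext=${xfext}

	# pref is just the file name minus everything else, i.e. path and extension if any.
	# Do whatever you want to it here when it matches your substring (aka filemask).
	# A quoted right-hand side makes =~ a plain substring test, not a regex.
	[[ $pref =~ "$FILEMASK" ]] && mv -v -- "$f" "$TARGETDIR"

done < <(find "${SOURCEDIR}" -maxdepth 1 -type f -mmin +"${FILEMIN}")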
It still does basically the same thing, but this way you could also set up different substrings for more than one file-name pattern and process them all within the same script, just by adding more substring variables and using if/else statements.
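As an illustration of the several-substrings idea, a minimal sketch using a case statement rather than a chain of if/else tests. The function name route_file, the patterns and the target directories are hypothetical placeholders:

```bash
#!/usr/bin/env bash
# Hypothetical dispatcher: decide where a file should go based on which
# substring its name contains. Prints the target directory, or returns 1.
route_file() {
    local name=${1##*/}                               # strip the directory part
    case $name in
        *ER1_100*)    echo "/path/to/er1_target" ;;   # placeholder target
        *SEPA_PAIN*)  echo "/path/to/gis_target" ;;   # placeholder target
        *)            return 1 ;;                     # no match: leave the file alone
    esac
}

# Example use inside the read loop above:
#   if target=$(route_file "$f"); then mkdir -p "$target" && mv -v -- "$f" "$target"; fi
```

The first matching pattern wins, so order the more specific substrings first if a name could contain more than one of them.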

Last edited by BW-userx; 02-07-2018 at 12:22 PM.
 
02-07-2018, 06:18 PM   #4
AwesomeMachine
LQ Guru
Registered: Jan 2005
Location: USA and Italy
Distribution: Debian testing/sid; OpenSuSE; Fedora; Mint
Posts: 5,524
Reputation: 1015
I don't think the wildcard '*' characters are going to work correctly.
 
02-07-2018, 06:51 PM   #5
rknichols
Senior Member
Registered: Aug 2009
Distribution: Rocky Linux
Posts: 4,780
Reputation: 2213
Quote:
Originally Posted by AwesomeMachine
I don't think the wildcard '*' characters are going to work correctly.
As long as they are in a quoted string so that the shell won't try to expand them, they will work just fine in find.
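A small sketch of the difference, run in a throwaway directory with made-up file names: the quoted pattern reaches the command untouched, while the unquoted one is replaced by the shell with whatever matches in the current directory.

```bash
#!/usr/bin/env bash
# Show which argument a command (e.g. find -name) would actually receive.
show_name_arg() {
    local dir
    dir=$(mktemp -d) || return
    touch "$dir/a_ER1_100_x" "$dir/b_ER1_100_y"    # hypothetical files
    (
        cd "$dir" || exit
        printf 'quoted:   %s\n' "*ER1_100*"        # one argument: the pattern itself
        printf 'unquoted: %s\n' *ER1_100*          # expanded by the shell to both names
    )
    rm -rf "$dir"
}
show_name_arg
```

With an unquoted pattern, find would receive "-name a_ER1_100_x b_ER1_100_y" and typically reject the stray second name with an error; if only one file happened to match, it would silently search for that exact name instead.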

Note that a construct like "for F in `find ...`; do ...; done" is going to misbehave quite badly if any of the found names contain embedded whitespace characters. The "for" loop is going to interpret each of those words as a separate name. If the names contain any shell meta-characters, the shell will try to process those, too, before assigning the result to the variable.
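A minimal sketch of a loop that avoids both problems, assuming bash (for read -d '') and a find that supports -print0 (GNU and BSD find both do). The function name list_matches is hypothetical; it just prints each name in brackets so you can see where one name ends and the next begins:

```bash
#!/usr/bin/env bash
# Iterate over find results safely: names are NUL-separated, so spaces,
# newlines and glob characters inside a name are passed through untouched.
list_matches() {
    local dir=$1 mask=$2 file
    while IFS= read -r -d '' file; do
        printf '[%s]\n' "$file"      # replace with the real processing
    done < <(find "$dir" -maxdepth 1 -name "$mask" -type f -print0)
}
```

Because the names never pass through word splitting or pathname expansion, a name like "with space ER1_100" arrives in $file exactly as stored on disk.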
 
02-21-2018, 03:44 AM   #6
harboa
LQ Newbie
Registered: Feb 2018
Posts: 14
Original Poster
Reputation: Disabled
Quote:
Originally Posted by rknichols
There's nothing obviously wrong with that find command. Try running the script with the "-x" option ("bash -x path/to/script ...") to see how that command is actually being invoked.
I've now got the output from the SAP guys (who run the script in some unusual way; it's not scheduled in cron):

Code:
[ -z ER2 ]
basename /usr/sap/ER2/interfaces/ER2/100/scripts/movefile.sh .sh
basename=movefile
dirname /usr/sap/ER2/interfaces/ER2/100/scripts/movefile.sh
BASEDIR=/usr/sap/ER2/interfaces/ER2/100/scripts
date '+%d/%m/%y %X'
DATE='16/02/18 10:08:50 AM'
awk -FER2 '{print length($0)-length($NF)+2}'
echo /usr/sap/ER2/interfaces/ER2/100/scripts
SIDPOS=29
expr substr /usr/sap/ER2/interfaces/ER2/100/scripts 29 3
CLIENT=100
BASEPATH=/usr/sap/ER2/interfaces/ER2/100
date +%Y%m
LOGFILE=/usr/sap/ER2/interfaces/ER2/100/log/movefile_201802.log
date '+%d/%m/%y %X'
echo '16/02/18 10:08:50 AM - Start process..'
1>> /usr/sap/ER2/interfaces/ER2/100/log/movefile_201802.log
[ -d /usr/sap/ER2/interfaces/ER2/100/out/ -a -d /usr/sap/ER2/interfaces/ER2/100/out/GIS/ ]
[ 2 -ge 0 ]
find /usr/sap/ER2/interfaces/ER2/100/out/ -maxdepth 1 -name '*SEPA_PAIN*' -type f -mmin +2
echo '                       Moving file /usr/sap/ER2/interfaces/ER2/100/out/SEPA_PAIN03_5539.xml from /usr/sap/ER2/interfaces
1>> /usr/sap/ER2/interfaces/ER2/100/log/movefile_201802.log
mv /usr/sap/ER2/interfaces/ER2/100/out/SEPA_PAIN03_5539.xml /usr/sap/ER2/interfaces/ER2/100/out/GIS/
date '+%d/%m/%y %X'
echo '16/02/18 10:08:50 AM - End process..'
1>> /usr/sap/ER2/interfaces/ER2/100/log/movefile_201802.log
[ Y '==' Y ]
Everything seems normal to me and, in this case, it worked. I think the trouble is going to be catching the failure, as it (obviously) doesn't always happen.

It's almost as if find does things in stages (e.g. finds the files older than 2 minutes and then runs the mask check against them). Is this possible?
 
02-21-2018, 04:41 AM   #7
keefaz
LQ Guru
Registered: Mar 2004
Distribution: Slackware
Posts: 6,552
Reputation: 872
If the find failure concerns the filenames, there are not many possible sources of trouble.
It's about what ${FILEMASK} contains, so I would troubleshoot how its value is assigned and put some echo ${FILEMASK} statements through the script.
Of course, it helps to use the same conditions as when the script fails (not when it's working).
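For example, a debug line using bash's printf %q makes stray whitespace, carriage returns and other invisible characters in the value visible, which a plain echo would hide. The helper name log_var is hypothetical:

```bash
#!/usr/bin/env bash
# Hypothetical debug helper: print a variable's name and value with shell
# quoting applied, so hidden characters cannot go unnoticed.
log_var() {
    local name=$1
    printf '%s=%q\n' "$name" "${!name}"   # ${!name} is bash indirect expansion
}

FILEMASK='*SEPA_PAIN*'
log_var FILEMASK
```

Glob characters show up backslash-escaped, and a value that picked up a trailing carriage return (for example from a parameter file edited on Windows) would be printed with an explicit \r instead of looking identical to the correct mask.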
 
02-21-2018, 07:07 AM   #8
harboa
LQ Newbie
Registered: Feb 2018
Posts: 14
Original Poster
Reputation: Disabled
Quote:
Originally Posted by keefaz
If the find failure concerns the filenames, there are not many possible sources of trouble.
It's about what ${FILEMASK} contains, so I would troubleshoot how its value is assigned and put some echo ${FILEMASK} statements through the script.
Of course, it helps to use the same conditions as when the script fails (not when it's working).
FILEMASK is not changed at any point in the script and contains the correct value at the time find is executed (in this case "*SEPA_PAIN*").

This is an intermittent issue, so we cannot reproduce it whenever we want. The directory where we are searching for the files is dynamic (files can be added and/or removed while this script is running), so I was wondering whether find creates a list of the files internally, then tries to process them and finds that one of the files in its list is no longer there. Is this possible?
 
02-21-2018, 08:27 AM   #9
pan64
LQ Addict
Registered: Mar 2012
Location: Hungary
Distribution: debian/ubuntu/suse ...
Posts: 21,899
Reputation: 7318
I would try find without the for loop. Use find -D (see the man page) if you wish to debug find (i.e. to understand how it works).

Yes, it is possible that find found a file but it was deleted before it was processed.
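For example, with GNU find (other implementations may not support -D at all), the tree debug option prints how find has parsed the expression before it starts searching. The wrapper name is hypothetical; the example arguments come from the trace in post #6:

```bash
#!/usr/bin/env bash
# GNU find only: -D must come before the starting point. "tree" writes the
# parsed predicate list to stderr; the normal results still go to stdout.
debug_find_tree() {
    # $1 = directory, $2 = name pattern, $3 = minimum age in minutes
    find -D tree "$1" -maxdepth 1 -name "$2" -type f -mmin +"$3"
}

# e.g. debug_find_tree /usr/sap/ER2/interfaces/ER2/100/out/ '*SEPA_PAIN*' 2
```

Running find -D help lists the other debug options your version supports.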
 
02-21-2018, 08:59 AM   #10
harboa
LQ Newbie
Registered: Feb 2018
Posts: 14
Original Poster
Reputation: Disabled
Quote:
Originally Posted by pan64
I would try find without the for loop. Use find -D (see the man page) if you wish to debug find (i.e. to understand how it works).

Yes, it is possible that find found a file but it was deleted before it was processed.
I suppose a better phrasing of my question would be: does find list all the files internally and then apply the tests to that list? The thing I find hard to understand is why, when I have given it the filename mask to use (in this case '*SEPA_PAIN*'), it complains about a file that it should not even be looking at (i.e. the filename does not contain the specified mask).

I have tried the -D option, but I'm afraid I'm none the wiser after viewing the output; it didn't make a lot of sense to me!
 
02-21-2018, 09:33 AM   #11
rknichols
Senior Member
Registered: Aug 2009
Distribution: Rocky Linux
Posts: 4,780
Reputation: 2213
I've looked at the execution of find with ltrace, and I see that when it processes a directory it first reads and saves all the names unconditionally. Then, taking the names one at a time, it performs all tests on each name before moving on to the next.

By any chance do any of the names in the directory contain any embedded shell meta-characters? Any such name returned by find is going to get expanded by the shell since pathname expansion occurs after command substitution. Combine that with some embedded whitespace characters in the names and your "for ..." loop could end up doing almost anything. For example, an unlikely name "EDI* ER1_100" could satisfy the name test in your find command and would result in processing all files with names that start with "EDI". It's generally a lot safer to let find directly invoke the processing (with "-exec") rather than letting a shell process the names first.
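The hazard described above can be reproduced in a scratch directory. This sketch compares the original for-loop construct with letting find pass each name straight to the command via -exec; the function names are hypothetical, "EDI* ER1_100" is the unlikely name suggested above, and the EDI1160 name is the one from post #1:

```bash
#!/usr/bin/env bash
# The original construct: unquoted command substitution, so each name is
# word-split on whitespace and then glob-expanded by the shell.
names_via_for_loop() {
    local file
    for file in `find "$1" -maxdepth 1 -name "$2" -type f`; do
        printf '[%s]\n' "$file"
    done
}

# Letting find hand each name straight to the command: no splitting, no globbing.
names_via_exec() {
    find "$1" -maxdepth 1 -name "$2" -type f -exec printf '[%s]\n' {} \;
}

demo=$(mktemp -d)
touch "$demo/EDI1160_Z_SHP_OBDLV_SAVE_REPLICA_22648345" "$demo/EDI* ER1_100"
echo "for loop:"; names_via_for_loop "$demo" '*ER1_100*'
echo "-exec:";    names_via_exec     "$demo" '*ER1_100*'
rm -rf "$demo"
```

In the for-loop output the EDI1160 file appears even though its name does not contain ER1_100: the one returned name is split at the space into "<dir>/EDI*" and "ER1_100", and the shell then expands "<dir>/EDI*" to every file whose name starts with EDI. The -exec version reports only the one file that actually matched.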
 
02-21-2018, 10:12 AM   #12
harboa
LQ Newbie
Registered: Feb 2018
Posts: 14
Original Poster
Reputation: Disabled
Quote:
Originally Posted by rknichols
I've looked at the execution of find with ltrace, and I see that when it processes a directory it first reads and saves all the names unconditionally. Then, taking the names one at a time, it performs all tests on each name before moving on to the next.

By any chance do any of the names in the directory contain any embedded shell meta-characters? Any such name returned by find is going to get expanded by the shell since pathname expansion occurs after command substitution. Combine that with some embedded whitespace characters in the names and your "for ..." loop could end up doing almost anything. For example, an unlikely name "EDI* ER1_100" could satisfy the name test in your find command and would result in processing all files with names that start with "EDI". It's generally a lot safer to let find directly invoke the processing (with "-exec") rather than letting a shell process the names first.
I thought it might be doing something like this but had no idea how to prove it! So thanks for that; it explains a lot.

None of the names should be "funny": they all come out of SAP, so they are created in a fairly standard way, nothing manual!

As for "-exec", I hadn't really thought about that. Excuse any errors in this attempt, but would it be something like this (based on the original find)?

Code:
find "${SOURCEDIR}" -maxdepth 1 -name "${FILEMASK}" -type f -mmin +"${FILEMIN}" -exec mv {} "${NewPath}" \;
Is that about right?
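That is essentially the approach. A slightly more defensive sketch, assuming a find that supports the "{} +" form (GNU find does) and reusing the variable names from post #1; LOGFILE stands for the log file seen in the trace in post #6, and the function name and log message format are hypothetical. Every expansion is quoted, so nothing is split or globbed, and a file that disappears between find seeing it and mv running is skipped rather than reported as an error:

```bash
#!/usr/bin/env bash
# Move matching files with find itself. Assumes SOURCEDIR, TARGETDIR,
# FILEMASK, FILEMIN and LOGFILE are already set by the calling script.
move_matches() {
    find "$SOURCEDIR" -maxdepth 1 -name "$FILEMASK" -type f -mmin +"$FILEMIN" \
        -exec sh -c '
            target=$1 log=$2; shift 2
            for f do
                # The file may have vanished since find saw it (another
                # process moved it); skip it instead of failing.
                if mv -- "$f" "$target" 2>/dev/null; then
                    echo "Moving file $f to $target" >> "$log"
                elif [ -e "$f" ]; then
                    echo "ERROR: could not move $f" >> "$log"
                fi
            done
        ' sh "$TARGETDIR" "$LOGFILE" {} +
}
```

With "{} +", find passes many names to one sh invocation instead of starting a new process per file; the simpler "-exec mv {} ... \;" form works too if you do not need the logging.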
 
02-21-2018, 10:49 AM   #13
keefaz
LQ Guru
Registered: Mar 2004
Distribution: Slackware
Posts: 6,552
Reputation: 872
Maybe the error comes not from find but from the commands that follow it in the loop.

What exact evidence shows that find returns the wrong files in the script?
 
02-21-2018, 10:56 AM   #14
harboa
LQ Newbie
Registered: Feb 2018
Posts: 14
Original Poster
Reputation: Disabled
Quote:
Originally Posted by keefaz
Maybe the error comes not from find but from the commands that follow it in the loop.

What is the exact evidence that proves that find returns wrong files in the script?
As I said in my original post:

For example, looking for a filename "*ER1_100*", it finds "EDI1160_Z_SHP_OBDLV_SAVE_REPLICA_22648345"

The error we got in the script was "file not found", because the "EDI1160" file mentioned had already been moved to another directory by another script and, in any case, wasn't a file that I wanted to perform any actions on.
 
02-21-2018, 11:21 AM   #15
pan64
LQ Addict
Registered: Mar 2012
Location: Hungary
Distribution: debian/ubuntu/suse ...
Posts: 21,899
Reputation: 7318
As rknichols already explained, find first reads the directory and then processes this list. It is a common problem if you modify the directory during that process: find may fail simply because it cannot follow that change. For example, find still thinks the first entry in a directory matches, but in the meantime you removed that file from the directory and a new one was created, so find will print the first file, which is now incorrect. I don't know the internals of find, but that is what I can imagine (or something similar).
But I have other comments too, which may help:
for i in $(find ... )
is not a good construct: find is already a loop, so feeding its result into another loop is superfluous. One thing will definitely happen: find runs in a subshell, and the for loop does not start until find has completed.
So find starts working on a directory, the for loop processes the result only after find has completed, and in the meantime the contents of that directory may have changed. This is definitely unsafe. Better to use:
find ..... -exec script.sh {} \;
where script.sh contains the body of your loop (every file is passed to it one by one). In this case each file is handled as soon as find reaches it (no need to wait for the whole list), but it is still unsafe.
If you want to be even faster, you may need to implement the search in Python, Perl or similar, acting immediately on each match (but you cannot completely eliminate the race condition this way).
You may also want to drop find completely, because ls *MASK* is simply much faster (you can check other attributes later), although this construct is still unsafe.
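Rather than parsing ls output (which has the same splitting problems as the for loop), a sketch of the same idea letting bash expand the glob itself and checking the age afterwards. It assumes GNU stat (-c %Y prints the modification time in seconds since the epoch); the function name is hypothetical, and the age test only approximates find's -mmin +N:

```bash
#!/usr/bin/env bash
# Hypothetical alternative to "ls *MASK*": bash expands the glob itself,
# so names with spaces stay intact. Prints matching regular files whose
# modification time is more than $3 minutes ago.
old_matches() {
    local dir=$1 mask=$2 min=$3 f mtime now restore
    now=$(date +%s)
    restore=$(shopt -p nullglob) || true   # remember the caller's setting
    shopt -s nullglob                      # no match -> empty loop, not the literal pattern
    for f in "$dir"/$mask; do              # $mask unquoted on purpose so it acts as a glob
        [[ -f $f ]] || continue            # regular files only, like -type f
        # The file may vanish between the glob and stat; skip it if so.
        mtime=$(stat -c %Y -- "$f" 2>/dev/null) || continue
        if (( now - mtime > min * 60 )); then
            printf '%s\n' "$f"
        fi
    done
    eval "$restore"                        # put nullglob back the way it was
}
```

Because $mask is left unquoted, it is also word-split, so this assumes masks without spaces (true for the SAP masks in this thread). As noted above, it does not remove the race either; the stat check simply skips a file that has already gone.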
 
  


All times are GMT -5.
