LinuxQuestions.org
Old 12-19-2019, 09:33 AM   #1
qombi
Member
 
Registered: Jan 2016
Posts: 34

Rep: Reputation: Disabled
Question Find missing filenames from a list recursively


Hello,

I have researched how to find missing filenames from a list recursively and I am close but not there completely.

I have files on one server that I need to verify exist on another server. I first made a list of the files with
Code:
ls > filelist
on one server. Copied that list to the other server for auditing.

I then ran this script on the server to be audited:

Code:
while IFS= read -r name; do
  [ -n "$(find / -name "$name" -print | head -n 1)" ] || printf '%s\n' "$name"
done < filelist
This printed a list of missing files, which was only partially correct: files with special characters and spaces in their names show as missing even though they exist. If I manually change an entry in the list such as
Code:
[this] file name
to
Code:
\[this\]\ file\ name
, then it works properly. I tried
Code:
' '
or
Code:
" "
but that didn't seem to work.

How do I generate the list on the originating server so that special characters and spaces in the filenames are already backslash-escaped? I found
Code:
ls -b
but that only shows
Code:
\
for the spaces not special characters.

Or is there a better way of doing this altogether? Thanks
 
Old 12-19-2019, 11:02 AM   #2
boughtonp
Senior Member
 
Registered: Feb 2007
Location: UK
Distribution: Debian
Posts: 3,601

Rep: Reputation: 2546
Try ls -Q ?
 
Old 12-19-2019, 11:49 AM   #3
grail
LQ Guru
 
Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 10,007

Rep: Reputation: 3192
My first question is, why are all the required files on the original machine in a single directory but the server has them possibly anywhere?

Spaces are not an issue, but of course [] will be expanded by the shell to look for single characters.
You could try something like:
Code:
[[ -n "$(find progs/ -name "$(sed 's/[[:punct:]]/\\&/g' <<<"$"name)" | head -n 1)" ]] || printf '%s\n' "$name"
 
Old 12-19-2019, 12:08 PM   #4
pan64
LQ Addict
 
Registered: Mar 2012
Location: Hungary
Distribution: debian/ubuntu/suse ...
Posts: 21,850

Rep: Reputation: 7309
rsync can do the job I think.
 
Old 12-19-2019, 02:51 PM   #5
qombi
Member
 
Registered: Jan 2016
Posts: 34

Original Poster
Rep: Reputation: Disabled
Thanks for the replies. The reason they are in multiple locations on the other server is they are categorized. I am just looking for a quick way to determine if they all made it to the archive server.
 
Old 12-19-2019, 02:59 PM   #6
qombi
Member
 
Registered: Jan 2016
Posts: 34

Original Poster
Rep: Reputation: Disabled
Quote:
Originally Posted by grail View Post
My first question is, why are all the required files on the original machine in a single directory but the server has them possibly anywhere?

Spaces are not an issue, but of course [] will be expanded by the shell to look for single characters.
You could try something like:
Code:
[[ -n "$(find progs/ -name "$(sed 's/[[:punct:]]/\\&/g' <<<"$"name)" | head -n 1)" ]] || printf '%s\n' "$name"
I tried this above:

Code:
[[ -n "$(find / -name "$(sed 's/[[:punct:]]/\\&/g' <<<"$"name)" | head -n 1)" ]] || printf '%s\n' "$name" < filelist
Unfortunately it did not report file names from the list that didn't exist on the server.
 
Old 12-19-2019, 03:03 PM   #7
qombi
Member
 
Registered: Jan 2016
Posts: 34

Original Poster
Rep: Reputation: Disabled
Quote:
Originally Posted by boughtonp View Post
Try ls -Q ?
I tried the
Code:
ls -Q
option, but unfortunately it didn't work with the find script.
Quote:
" "
didn't seem to work
 
Old 12-19-2019, 03:04 PM   #8
qombi
Member
 
Registered: Jan 2016
Posts: 34

Original Poster
Rep: Reputation: Disabled
Quote:
Originally Posted by pan64 View Post
rsync can do the job I think.
I'm not sure how rsync could help me in this situation with the directory structure on the other server not matching the source server.
 
Old 12-19-2019, 05:38 PM   #9
boughtonp
Senior Member
 
Registered: Feb 2007
Location: UK
Distribution: Debian
Posts: 3,601

Rep: Reputation: 2546
Ok, the way you wrote this bit was confusing:

Quote:
Originally Posted by qombi View Post
I tried
Code:

' '

or
Code:

" "

but that didn't seem to work.

Writing something like "I tried wrapping with single and double quotes but find wouldn't locate them." would have been way clearer.


The reason that doesn't work is similar to the reason brackets aren't working - as per "man find", the -name argument accepts a pattern, where '*', '?', and '[]' are metacharacters. (And the quotes were being treated as literals.)

Since there appears to be no way to pass in a fixed string instead of a pattern, those characters need escaping, so your find command probably wants to be:
Code:
find / -name "$(sed 's/[][*?\\]/\\&/g' <<<""$name"")" -print
Note that the issue with the previous sed was actually the "$"name part - i.e. [[:punct:]] would work, but no point escaping more characters than necessary.
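
To see the pattern behaviour in isolation, here is a small sketch that works in a throwaway directory. The scratch directory and the escape_glob helper name are made up for illustration; only the sed expression comes from the command above.

```shell
#!/usr/bin/env bash
# Illustration only: the scratch directory and helper name are hypothetical.
scratch=$(mktemp -d) || exit 1
touch "$scratch/[this] file name"

# Escape the characters that find's -name treats specially: [ ] * ? and \
escape_glob() {
  printf '%s\n' "$1" | sed 's/[][*?\\]/\\&/g'
}

name='[this] file name'

# Unescaped: [this] is a bracket expression matching ONE of t, h, i, s,
# so the pattern does not match the literal file name.
raw_hits=$(find "$scratch" -name "$name")

# Escaped: every metacharacter is taken literally, so the file is found.
escaped_hits=$(find "$scratch" -name "$(escape_glob "$name")")

printf 'unescaped pattern found: %s\n' "${raw_hits:-nothing}"
printf 'escaped pattern found:   %s\n' "${escaped_hits:-nothing}"

rm -rf "$scratch"
```

The unescaped search comes back empty while the escaped one prints the path, which is the same symptom as the "missing" files in the first post.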


Last edited by boughtonp; 12-19-2019 at 05:43 PM.
 
Old 12-19-2019, 06:49 PM   #10
qombi
Member
 
Registered: Jan 2016
Posts: 34

Original Poster
Rep: Reputation: Disabled
Thanks for the reply and your patience. I apologize for my hard-to-interpret post. I tried to feed my list into the command you provided, shown below:
Code:
find / -name "$(sed 's/[][*?\\]/\\&/g' <<<""$name"")" -print < filelist
Unfortunately, no luck: it did not display any results at all. I know some of the file names in the list are not present on the server, so I must still be missing something.
 
Old 12-19-2019, 07:30 PM   #11
boughtonp
Senior Member
 
Registered: Feb 2007
Location: UK
Distribution: Debian
Posts: 3,601

Rep: Reputation: 2546
You still need to loop through the generated filelist as in your first post...

Code:
while IFS= read -r name; do
	find / -name "$(sed 's/[][*?\\]/\\&/g' <<<""$name"")"
done < filelist
or:
Code:
while IFS= read -r name; do
  [ -n "$(find / -name "$(sed 's/[][*?\\]/\\&/g' <<<""$name"")")" ] || echo "$name"
done < filelist
 
Old 12-19-2019, 09:31 PM   #12
BW-userx
LQ Guru
 
Registered: Sep 2013
Location: Somewhere in my head.
Distribution: Slackware (15 current), Slack15, Ubuntu studio, MX Linux, FreeBSD 13.1, WIn10
Posts: 10,342

Rep: Reputation: 2242
I created this a while back. It creates two lists of files and compares what was processed with what was not, to give me what is left to do.
Code:
#!/usr/bin/env bash

#Friday, July 5, 2019


#read -p "Enter Path to Directory -> " path1
#[[ -z $path1 ]] && exit
#read -p "Enter Path to rar/zip Files -> " path2
#[[ -z "$path2" ]] && exit

#paths to the two dir to read out of. 
filesDone=
tarstoCheck=


save1=$HOME/bin/dir1
save2=$HOME/bin/dir2
save3=$HOME/bin/ProcessTheseFiles
save4=$HOME/bin/Files2Process
#
[[ -f "$save1" ]] && { rm $save1 ; }
[[ -f "$save2" ]] && { rm $save2 ; }
[[ -f "$save3" ]] && { rm $save3 ; }
[[ -f "$save4" ]] && { rm $save4 ; }
#get directories list1

CreateListFilesDone()
{
	while read f
	do
		dir1=${f##*/}
		dir1=${dir1#*-}
		dir1=${dir1%-*}
		echo $dir1

		echo "$dir1" >>  $save1
	done < <(find "$filesDone" -type d )
}

CreateListStorageOfRars()
{
	while read f
	do
		dir2=${f##*/}
		dir2=${dir2##*-}
		dir2=${dir2%.*}
		echo $dir2
		echo $dir2 >>  $save2
	done < <(find "$tarstoCheck" \( -type f -iname "*.rar" -o -type f -iname "*.zip" \) )
}

FindFilesNotProccesed1()
{
	while read dirs
	do
		echo "$dirs"
		echo   "$dirs" >> "$save3"
	done < <(comm -13 <(sort $save1) <(sort $save2))
}
FindFilesNotProccesed2()
{
	while read dirs
	do
		echo "$dirs"
		echo   "$dirs" >> "$save3"
	done < <(comm -23 <(sort $save1) <(sort $save2))
}


# Build the two lists and their differences first; the sort below reads "$save3"
CreateListFilesDone
CreateListStorageOfRars
FindFilesNotProccesed1
FindFilesNotProccesed2
echo;echo

sort "$save3" > "$save4"

mapfile -t CkNums < "$save4"
echo
echo "${CkNums[@]}"
echo;echo
printf "%s\n" "${CkNums[@]}"

exit

[[ -f "$save1" ]] && { rm $save1 ; }
[[ -f "$save2" ]] && { rm $save2 ; }
[[ -f "$save3" ]] && { rm $save3 ; }
[[ -f "$save4" ]] && { rm $save4 ; }
It uses comm to find the lines that differ between two files of similar strings. You should be able to point it at the path on one server to create the first list, then at the other server to create the second list, and it prints the differences at the end.

Just modify the first two functions so they collect file names and create a list for each of the two separate servers, then modify the other two to match. As you can see, I wrote this in July, so my memory of it is sketchy.

But I know this would work. Just skim or read through these two links and hopefully you'll see what this script is basically doing to get the missing files between two lists of files.

https://linux.die.net/man/1/comm
https://www.geeksforgeeks.org/comm-c...with-examples/
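
Stripped down to the comm step alone, the idea might look like the sketch below. The function name missing_from and its two arguments are hypothetical; the only assumption about the inputs is that each file holds one name per line.

```shell
#!/usr/bin/env bash
# Sketch only: function and argument names are hypothetical.
# missing_from WANTED PRESENT: print lines of WANTED that do not appear in PRESENT.
missing_from() {
  # comm needs both inputs sorted the same way; LC_ALL=C keeps the order consistent.
  # -2 drops lines unique to PRESENT, -3 drops lines common to both,
  # leaving only the lines unique to WANTED.
  LC_ALL=C comm -23 <(LC_ALL=C sort -u "$1") <(LC_ALL=C sort -u "$2")
}
```

Usage would be something like missing_from filelist present, where present is a list of bare file names collected on the archive server (with GNU find, for example, find /some/root -type f -printf '%f\n' > present, the root path being a placeholder). Because comm compares the names as plain text, brackets and spaces need no escaping here.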

Last edited by BW-userx; 12-19-2019 at 09:37 PM.
 
1 members found this post helpful.
Old 12-20-2019, 02:05 AM   #13
grail
LQ Guru
 
Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 10,007

Rep: Reputation: 3192
The extra quotes around $name are not needed as $() allows for the nesting of itself and quotes. Also, I tested on my computer without issue, so unless you have other types of filenames
I do not see why it would not return what you needed.
Perhaps you could give an example of a filename that was not returned so we could check?
 
Old 12-20-2019, 05:45 AM   #14
MadeInGermany
Senior Member
 
Registered: Dec 2011
Location: Simplicity
Posts: 2,794

Rep: Reputation: 1201
The quotes around $name are needed but the two quotes in sequence inhibit it.
It must be
Code:
while IFS= read -r name; do
  [ -n "$(find / -name "$(sed 's/[][*?\\]/\\&/g' <<<"$name")")" ] || echo "$name"
done < filelist
Have an intermediate variable, so it's easier to read/understand/debug
Code:
while IFS= read -r name; do
  # The find interprets wildcards [ ] * ? and \ in its -name argument; let's escape them
  globname=$( sed 's/[][*?\\]/\\&/g' <<< "$name" )
  [ -n "$( find / -name "$globname" -print -prune )" ] || echo "$name"
done < filelist
The -print -prune is an attempt to make it faster.
Another attempt is | head -1 but behavior of GNU find on "SIGPIPE" is sometimes not nice.
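
A third option, if GNU find is available, is the -quit action, which stops the search at the first match without involving a pipe. The sketch below is only an illustration under that assumption; the function names exists_anywhere and report_missing are made up.

```shell
#!/usr/bin/env bash
# Sketch only; assumes GNU find for the -quit action.
# exists_anywhere ROOT NAME: succeed if something named NAME exists below ROOT.
exists_anywhere() {
  local root=$1 name=$2 globname
  # Escape [ ] * ? and \ so that -name treats the name literally
  globname=$(printf '%s\n' "$name" | sed 's/[][*?\\]/\\&/g')
  # -print -quit: print the first match, then stop walking the tree
  [ -n "$(find "$root" -name "$globname" -print -quit 2>/dev/null)" ]
}

# report_missing ROOT LIST: print every name in LIST that is not found below ROOT.
report_missing() {
  local root=$1 list=$2 name
  while IFS= read -r name; do
    exists_anywhere "$root" "$name" || printf '%s\n' "$name"
  done < "$list"
}
```

For the case in this thread the call would be report_missing / filelist. It still runs one find per line of the list, so it is no faster than the loop above when a name is genuinely missing.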

Last edited by MadeInGermany; 12-20-2019 at 05:49 AM.
 
Old 12-20-2019, 08:50 AM   #15
boughtonp
Senior Member
 
Registered: Feb 2007
Location: UK
Distribution: Debian
Posts: 3,601

Rep: Reputation: 2546
Quote:
Originally Posted by grail View Post
The extra quotes around $name are not needed as $() allows for the nesting of itself and quotes.
Quote:
Originally Posted by MadeInGermany View Post
The quotes around $name are needed but the two quotes in sequence inhibit it.
Being more awake now, I would agree that $() removes the need to double the quotes, and a single set is usually required/recommended for safety.

However, it's working the same way for all three variations in my test i.e. $name and "$name" and ""$name"" are all giving the expected results.

It also works with """"""$name which is rather odd, so I went and checked the docs on <<< and they state:
Quote:
Originally Posted by Bash Reference
...undergoes tilde expansion, parameter and variable expansion, command substitution, arithmetic expansion, and quote removal. Pathname expansion and word splitting are not performed...
So I guess that explains why they all behave the same.
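
A quick way to check this is to feed the same value through each quoting variant and compare the results. This is only an illustration and needs bash for the here-string; the variable names are arbitrary.

```shell
#!/usr/bin/env bash
# Illustration only; requires bash for the <<< here-string.
name='[this] file name'

plain=$(cat <<<$name)        # unquoted
single=$(cat <<<"$name")     # one pair of quotes
doubled=$(cat <<<""$name"")  # empty string, then $name, then another empty string

# All three lines printed here should be identical.
printf '%s\n' "$plain" "$single" "$doubled"
```

Quoting "$name" is still the habit worth keeping, since the same unquoted variable in most other positions would be split and glob-expanded.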


Quote:
Originally Posted by MadeInGermany
The -print -prune is an attempt to make it faster.
My reading of the find manpage is that -print is not necessary, because it's automatically added, unless a conflicting param is used.

With regards to -prune, that raises the question of what directory behaviour should be - currently if there is a directory matching a line in the filelist then it is counted as being present, but perhaps it should explicitly be looking for files, in which case -type f should be added.
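
The difference is easy to reproduce in a scratch directory. In this sketch the layout and the name report.txt are invented purely to show the two behaviours side by side.

```shell
#!/usr/bin/env bash
# Illustration only: the scratch layout and the name are made up.
scratch=$(mktemp -d) || exit 1
# A DIRECTORY that happens to share a name with a listed file
mkdir -p "$scratch/archive/report.txt"

name='report.txt'

# Without -type f the directory counts as a match ...
any_type=$(find "$scratch/archive" -name "$name")
# ... with -type f only regular files match, so nothing is found.
files_only=$(find "$scratch/archive" -type f -name "$name")

printf 'any type:   %s\n' "${any_type:-nothing}"
printf 'files only: %s\n' "${files_only:-nothing}"

rm -rf "$scratch"
```

Which behaviour is right depends on whether the list from the source server can contain directory names; a plain ls > filelist would include them.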

Last edited by boughtonp; 12-20-2019 at 08:51 AM.
 
  

