LinuxQuestions.org


ryedunn 09-27-2004 09:02 AM

searching for multiple files
 
What command can I run to search for duplicate files within a single directory and its subdirectories?

Thank you,
R

meblost 09-27-2004 12:27 PM

How about

$> find . -type f | awk -F/ '{print $NF}' | sort | uniq -d

This will tell you which file names occur more than once. This version won't tell you where those files are, though.

chrism01 09-27-2004 12:28 PM

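If you already know the names of the file(s) and have them listed in a file, one name per line, you could use:
Code:

# read one name per line so names containing spaces stay intact
while IFS= read -r file
do
    find <topdir> -name "$file" -print 2>/dev/null
done < filelist.dat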

meblost 09-27-2004 12:32 PM

This will tell you where the files with the same name are located, but it will take longer:

# rerun find once for every file name that occurs more than once
find . -type f | awk -F/ '{print $NF}' | sort | uniq -d |
while IFS= read -r file
do
    find . -type f -name "$file"
done
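
The loop above runs find again for every duplicated name, which is why it takes longer. As a rough sketch, the same listing can probably be produced in one pass by letting awk remember the paths it has seen for each name. The function name same_name_paths is made up for this example, and it assumes no file name contains a newline or tab.

```shell
# Hypothetical helper (not from the thread): print "name<TAB>path" for every
# regular file under $1 whose file name occurs more than once.
same_name_paths() {
    find "${1:-.}" -type f | awk -F/ '
        # $NF is the last path component, i.e. the file name
        { count[$NF]++; paths[$NF] = paths[$NF] $NF "\t" $0 "\n" }
        END {
            for (name in count)
                if (count[name] > 1)
                    printf "%s", paths[name]
        }
    ' | sort    # sorting keeps files with the same name next to each other
}
```

Running same_name_paths . from the top directory should list each duplicated name together with every path where it was found.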

ryedunn 09-27-2004 03:21 PM

For future reference
 
To find duplicate files (by content, not name) in a directory tree at /home/user:

find /home/user -type f -exec md5sum {} \; | sort >dups

dups will hold the md5 checksums and names of all the files in the directory tree, sorted by checksum.
All files that have the same contents will be next to each other in dups, since they will have the same checksum.

To compare files in two or more directories, just include all the directories in the find command:

find /usr/src/linux-2.4.20-18.7 /usr/src/linux-2.4.20-19.7 -type f -exec md5sum {} \; | sort >dups

To remove the non-duplicates from a file (or output stream), you can use the uniq command. For example, to see only the lines from dups that have duplicates (matching only the checksums, which are the first 32 characters of each line):

uniq -D -w 32 dups | less
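
If you would rather skip the intermediate dups file, the steps above can be strung together. This is only a sketch: the function name dup_groups is invented here, and it assumes GNU find, md5sum and uniq (the --all-repeated=separate option is GNU-specific) and file names without newlines.

```shell
# Hypothetical wrapper (not from the thread): print groups of files with
# identical content under the given directories, one blank line between groups.
dup_groups() {
    # "{} +" passes many files to each md5sum call instead of one at a time
    find "$@" -type f -exec md5sum {} + |
        sort |
        # compare only the 32-character checksum; keep only repeated lines
        uniq -w 32 --all-repeated=separate
}
```

For example, dup_groups /home/user should print the same lines as the uniq -D command above, with a blank line between each set of identical files.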

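To learn more about sort and uniq, and other text utilities, type:

info coreutils

(On older systems, where the text utilities are still a separate package, use info textutils instead.)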


All times are GMT -5.