Linux - Software
This forum is for Software issues. Having a problem installing a new program? Want to know which application is best for the job? Post your question in this forum.
Welcome to LinuxQuestions.org, a friendly and active Linux Community.
Does anyone know of reliable software for finding duplicate files?
I'm pretty sure Freshmeat and Sourceforge would have some.
I was thinking of writing a script to md5sum all the files in a directory, put the results in a database, and then sort out the dupes that way. Would this work?
Yes, partially: say you have the same album but compressed differently; that will generate different sums. You'll still need the metadata, from a CLI tool like mp3info, to compare. Checking the MD5 sum could be the first, non-interactive pass, and using the meta information the final, interactive one, unless your regex-fu is kinda elite :-]
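For the first pass you don't even need a database; a pipeline like this sketch (untested here, function name is just mine) groups files whose MD5 sums match, since md5sum prints the 32-character hash first and uniq can compare just that prefix:

# md5_dups DIR - list groups of files under DIR with identical MD5 sums,
# one blank-line-separated group per checksum.
md5_dups() {
    find "$1" -type f -exec md5sum {} + |
    sort | uniq -w 32 --all-repeated=separate
}

Then feed the surviving groups to your interactive metadata pass.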
However, the previous solution is quite inefficient, because it hashes every file on your data storage (hard disk). Comparing files whose sizes differ is unnecessary, so I improved on this: first sort the file list by file size, then compare only the files that have the same size.
#!/bin/sh
# Write a script of commented-out "rm" commands for duplicate files
# found under the directory given as $1.
OUTF=rem-duplicates.sh
echo "#!/bin/sh" > $OUTF

Xsize1=-1           # size of the previously seen file
XfilepathNname1=""  # path of the previously seen file
Xflag=0             # >0 while inside a group of duplicates
Xcounter=0          # number of duplicates found

# Print "size path" for every file over 42k, sorted by size, so that
# files of equal size land on adjacent lines; only those need comparing.
find "$1" -type f -size +42k -printf "%s %p\n" | sort -n |
while read Xsize2 XfilepathNname2; do
    if [ "$Xsize1" = "$Xsize2" ]; then
        # Same size: confirm with a byte-by-byte comparison.
        if cmp --silent "$XfilepathNname1" "$XfilepathNname2"; then
            if [ "$Xflag" = "0" ]; then
                # First duplicate in this group: list the original too.
                echo "#rm \"$XfilepathNname1\""
            fi
            echo "#rm \"$XfilepathNname2\""
            Xflag=$(($Xflag + 1))
            Xcounter=$(($Xcounter + 1))
        fi
    else
        # Size changed: the duplicate group (if any) has ended.
        if [ "$Xflag" != "0" ]; then
            echo ""
        fi
        Xflag=0
    fi
    Xsize1=$Xsize2
    XfilepathNname1=$XfilepathNname2
done >> $OUTF

echo "exit 0;" >> $OUTF
chmod a+x $OUTF
The last script is very efficient; it should be much faster than NoClone, FSlint, and many other duplicate file finders I have used. However, one small problem remains: it gives false positives when there are hard links on your hard disk, because it will report two hard links pointing to the same file as duplicates. Fortunately, I don't use hard links in my data storage. Could someone fix this problem?
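One way the hard-link problem could be handled (a sketch, not a drop-in patch for the script above; the function name and "dup:" output format are mine): have find also print each file's inode number, and skip any candidate pair sharing one, since hard links to the same file share an inode:

# find_dups DIR - report adjacent same-size, same-content files under DIR,
# skipping pairs that are really one file reached via hard links.
find_dups() {
    find "$1" -type f -printf "%s %i %p\n" | sort -n |
    while read size inode file; do
        # Same size but a different inode: a real candidate pair.
        if [ "$psize" = "$size" ] && [ "$pinode" != "$inode" ]; then
            cmp --silent "$pfile" "$file" && echo "dup: $pfile $file"
        fi
        psize=$size; pinode=$inode; pfile=$file
    done
}

Strictly you would want the device number too (%D) to disambiguate inodes across filesystems, but for a single data disk the inode alone should do.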