Does anyone know of reliable software for finding duplicate files?
I'm pretty sure Freshmeat and Sourceforge would have some.
I was thinking of writing a script to md5sum all the files in a directory, put the results in a database, and then sort out the dupes that way. Would this work?
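Roughly this, as a first sketch (assuming GNU md5sum, sort, and uniq; sort plus uniq stand in for the database here):

    # Hash every file under the current directory, sort by digest, and
    # print the files whose MD5 sums repeat, one blank-line-separated
    # group per duplicate set (an MD5 digest is 32 hex characters).
    find . -type f -exec md5sum {} + |
        sort |
        uniq -w 32 --all-repeated=separate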
Yes, partially: say you have the same album compressed two different ways. That will generate different sums, so you'll still need the metadata, from a CLI tool like mp3info, to compare. Checking the MD5 sums could be the first, non-interactive pass, and comparing the metadata the final, interactive one, unless your regex-fu is kinda elite :-]
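For the metadata pass, something along these lines could work (a sketch only; it assumes mp3info's %a, %l, and %t format escapes for artist, album, and title, and reasonably clean tags):

    # Print "artist|album|title<TAB>path" for each MP3, then sort so
    # tracks with identical tags end up adjacent for manual review.
    find . -type f -name '*.mp3' | while read -r f; do
        printf '%s\t%s\n' "$(mp3info -p '%a|%l|%t' "$f")" "$f"
    done | sort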
However, the previous solution is very inefficient because it hashes every file on your data storage (hard disk). There is no need to compare files whose sizes differ, so I improved on this: first sort the file list by size, and only compare files that have the same size.
echo "#!/bin/sh" > $OUTF;
find "$1" -type f -size +42k -printf %s\ -print |
sort -nr | uniq -w 9 -d --all-repeated | cut -f2- -d" " |
while read i;
Xsize2=$(stat -c%s "$i")
if [ "$Xsize1" == "$Xsize2" ]; then
cmp --silent "$XfilepathNname1" "$XfilepathNname2"
if [ "$?" == "0" ]; then
if [ "$Xflag" == "0" ]; then
echo "#rm \"$XfilepathNname1\""
echo "#rm \"$XfilepathNname2\""
Xflag=$(($Xflag + 1))
if [ "$Xflag" != "0" ]; then
echo "" >> $OUTF
Xcounter=$(($Xcounter + 1))
done >> $OUTF
echo "exit 0;" >> $OUTF
chmod a+x $OUTF
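Assuming you save the above as find-dupes.sh (the name is just an example), a run looks like:

    ./find-dupes.sh /srv/music
    less rem-duplicates.sh    # uncomment the rm lines for the copies you want gone
    ./rem-duplicates.sh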
This last script is very efficient. It should be far more efficient than NoClone, FSlint, and most other duplicate file finders I have ever used. However, one small problem remains: it gives false positives when hard links are involved, because two hard links pointing at the same file will always compare as duplicates. Fortunately, I don't use hard links on my data storage. Could someone fix this?
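One way to fix it, sketched here: two paths name the same underlying file exactly when their device and inode numbers match, which GNU stat reports via its %d and %i format specifiers, so the pair can be skipped just before the cmp in the loop above:

    # Skip the pair if both paths resolve to the same device:inode,
    # i.e. they are hard links to one file, not two separate copies.
    if [ "$(stat -c '%d:%i' "$Xprev")" = "$(stat -c '%d:%i' "$Xfile")" ]; then
        continue
    fi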