Script to remove duplicate jpg files

ski_phreak · 05-25-2010, 07:47 PM

Thanks y'all for the great script and explanation. This helped a lot in my own project. I thought I'd share the efforts.

The project is this: I've got lots of duplicate JPGs because family members have saved the same photo under different names. Since md5sum generates a "fingerprint" from the file contents, not the name, I want to use the md5sum of each JPG to give each photo a unique name and, in the process, remove exact duplicates.
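To illustrate the content-based fingerprint, here is a minimal sketch, assuming GNU md5sum is installed. The file names and contents are made up for the demo; the point is only that two copies with different names hash identically.

```shell
#!/bin/bash
# Sketch: the same bytes under two names give the same md5.
# The temp directory and file names are hypothetical, created only for this demo.
tmpdir=$(mktemp -d)
printf 'same picture bytes' > "$tmpdir/beach.jpg"
cp "$tmpdir/beach.jpg" "$tmpdir/IMG_0042.jpg"
printf 'different picture bytes' > "$tmpdir/other.jpg"

# md5sum prints "<hash>  <filename>"; keep only the hash field.
hash_a=$(md5sum "$tmpdir/beach.jpg" | cut -d ' ' -f 1)
hash_b=$(md5sum "$tmpdir/IMG_0042.jpg" | cut -d ' ' -f 1)
hash_c=$(md5sum "$tmpdir/other.jpg" | cut -d ' ' -f 1)

[ "$hash_a" = "$hash_b" ] && echo "beach.jpg and IMG_0042.jpg are duplicates"
[ "$hash_a" != "$hash_c" ] && echo "other.jpg is a different file"
rm -rf "$tmpdir"
```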

It has the following flaws:
0) it doesn't handle certain non-alphanumerics
1) it keeps both photo-shopped and unaltered photos (different md5s)
2) it (currently) doesn't preserve descriptive filenames.

(For me, removal of duplicates is more important than keeping the filenames. I may change that to concatenate the md5 and the filename.)
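One way the md5-plus-filename idea might look is sketched below. This is not part of the script in this post: md5_keep_name is a hypothetical helper, and it assumes GNU md5sum. Because files named "<md5>_<name>" no longer collide on their own, the sketch checks for an existing file with the same md5 prefix before renaming.

```shell
#!/bin/bash
# Hypothetical helper: rename one file to "<md5>_<original name>", or delete it
# if a file carrying the same md5 prefix already exists in its directory.
md5_keep_name() {
  local file=$1 dir base hash existing
  dir=$(dirname -- "$file")
  base=$(basename -- "$file")
  hash=$(md5sum -- "$file" | cut -d ' ' -f 1)
  # Already renamed on an earlier run: leave it alone.
  case $base in "${hash}_"*) return 0 ;; esac
  # Look for an earlier file that already carries this hash as its prefix.
  for existing in "$dir/${hash}_"*; do
    if [ -e "$existing" ]; then
      echo "duplicate of $existing, removing $file"
      rm -- "$file"
      return 0
    fi
  done
  mv -- "$file" "$dir/${hash}_${base}"
}

# Demo on made-up files in a temp directory.
demo=$(mktemp -d)
printf 'aaa' > "$demo/beach.jpg"
printf 'aaa' > "$demo/copy_of_beach.jpg"
printf 'bbb' > "$demo/dog.jpg"
for f in "$demo"/*.jpg; do md5_keep_name "$f"; done
ls "$demo"
```

The first file seen with a given hash keeps its descriptive name; later copies are deleted, so the order in which files are processed decides which name survives.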

Please note that the commented-out "rename" command should be run first to strip non-alphanumerics from the file names, and the script should then be launched with the commented-out "find" command.

Code:

#!/bin/bash
### prepare files for renaming by removing non-alphanumerics
# rename -vf 's/[^a-zA-Z0-9]//g' *.jpg
# (this pattern also strips the dot, so photo.jpg becomes photojpg)

# then launch this script (saved as md5rename.sh) with:
# find . -maxdepth 1 -type f -name "*jpg" -exec ./md5rename.sh {} \;


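if [ -n "$1" ] ; then
  filename=$1
  # md5sum prints "<hash>  <name>"; keep only the hash
  new_filename=$(/usr/bin/md5sum "$filename" | cut -f 1 -d ' ')
  jpg_filename="${new_filename}.jpg"
  echo "mv ${filename} ${jpg_filename}"
  # an exact duplicate hashes to the same name, so mv overwrites the earlier copy
  mv "$filename" "$jpg_filename"
fi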
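Since the rename overwrites files with matching md5s, it may be worth seeing which files would collide before running it. The following is a rough dry-run sketch, not part of the script above: report_duplicates is a hypothetical helper, and it assumes bash 4 or later (for associative arrays) and GNU md5sum.

```shell
#!/bin/bash
# Dry-run sketch: report which files share an md5 without renaming or deleting.
# report_duplicates is a hypothetical helper; needs bash 4+ for declare -A.
report_duplicates() {
  local dir=$1 file hash
  declare -A first_seen   # hash -> first file seen with that hash
  for file in "$dir"/*jpg; do
    [ -f "$file" ] || continue
    hash=$(md5sum -- "$file" | cut -d ' ' -f 1)
    if [ -n "${first_seen[$hash]}" ]; then
      echo "$file duplicates ${first_seen[$hash]}"
    else
      first_seen[$hash]=$file
    fi
  done
}
```

Calling `report_duplicates .` in the photo directory prints one line per extra copy and changes nothing on disk.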

pixellany · 05-26-2010, 08:22 AM

Welcome to LQ!!

Please don't jump into six-year-old threads. I moved this post into its own thread.