Sorting pictures, removing duplicates
Tags: duplicates, picture, sort
EDIT 2012-02-03/2014-10-11:
In Debian I just discovered "duff", a tool to find duplicate files.
Unfortunately it doesn't work with symbolic links to files.
---
The four code blocks below produce, in this order:
All files with their md5sum
Two lists containing only the md5sums, one complete and one with duplicates removed (the sed pattern should be improved to handle double spaces in filenames)
md5sums of the duplicates
List of all duplicate files (sorted by their md5sum)
Code:
PICS="pics"
# one "md5sum  path" line per file below the current directory
find . -type f -exec md5sum '{}' ';' > "$PICS"
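The command above also checksums the list file itself, because the list is created inside the directory that find scans. A minimal sketch of one way around that, assuming GNU find (for `-samefile`) and md5sum; the function name `list_md5sums` and its two arguments are hypothetical and not part of the original recipe:

```shell
# Hypothetical helper: write "md5sum  path" lines for every regular file
# below directory $1 into list file $2, skipping the list file itself.
list_md5sums() {
    dir=$1
    list=$2
    # The redirection creates $list before find starts, so -samefile can
    # recognise and exclude it. "{} +" hands many files to each md5sum call.
    find "$dir" -type f ! -samefile "$list" -exec md5sum '{}' + > "$list"
}
```

With the variable from above, this would be called as `list_md5sums . "$PICS"`.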
Code:
# all md5sums, sorted
sed -r "s|^(.*)  .*|\1|" "$PICS" | sort > "${PICS}.md5sums.sorted"
# the same list with duplicates removed
sed -r "s|^(.*)  .*|\1|" "$PICS" | sort -u > "${PICS}.md5sums.sorted.uniq"
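The trouble with double spaces in filenames comes from the greedy `(.*)` in the sed pattern, which runs up to the last double space on the line. One possible way around it, sketched here under the assumption that every line starts with the checksum followed by a space (as in md5sum's default output), is to keep only the first space-separated field. The function name `md5_column` is made up for this example:

```shell
# Hypothetical helper: print only the checksum column of an md5sum list.
# cut splits at the first space, so spaces in filenames are harmless.
md5_column() {
    cut -d ' ' -f 1 "$1"
}

# Possible usage, with PICS naming the checksum list:
# md5_column "$PICS" | sort    > "${PICS}.md5sums.sorted"
# md5_column "$PICS" | sort -u > "${PICS}.md5sums.sorted.uniq"
```

Filenames containing a backslash or a newline get special treatment from GNU md5sum and are not covered by this sketch.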
Code:
# md5sums that occur more than once: the lines found only in the full list
diff "${PICS}.md5sums.sorted" "${PICS}.md5sums.sorted.uniq" | grep '^<' | sed "s|^< ||" > "${PICS}.md5sums.duplicates"
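The comparison of the two lists can also be replaced by `uniq -d`, which prints one copy of each line that is repeated in sorted input. This is a short sketch of that alternative, not the original method; `duplicate_md5sums` is a hypothetical name:

```shell
# Hypothetical helper: from a sorted list of md5sums (file $1),
# print each checksum that occurs more than once, a single time.
duplicate_md5sums() {
    uniq -d "$1"
}

# Possible usage:
# duplicate_md5sums "${PICS}.md5sums.sorted" > "${PICS}.md5sums.duplicates"
```

Unlike the diff approach, a checksum shared by three files is listed once here rather than twice. For the later grep step this makes no difference.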
Code:
# -F -f reads one fixed-string md5sum per line; an empty list matches nothing
grep -F -f "${PICS}.md5sums.duplicates" "$PICS" | sort > "${PICS}.duplicates"
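The four steps can also be condensed into a single pipeline. This is only a sketch: it assumes GNU coreutils, since `uniq -w` and `-D` are GNU extensions, and it relies on an MD5 checksum being 32 hexadecimal characters. The function name `list_duplicates` is hypothetical:

```shell
# Hypothetical helper: print "md5sum  path" lines for all files below $1
# that share their checksum with at least one other file.
# sort puts equal checksums next to each other; uniq -w 32 compares only
# the first 32 characters (the checksum) and -D prints every line of
# each repeated group.
list_duplicates() {
    find "$1" -type f -exec md5sum '{}' + |
        sort |
        uniq -w 32 -D
}
```

Calling `list_duplicates .` prints the duplicates, grouped by checksum, without creating any intermediate files.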
Comments
-
nice tip... I never would have thought this would be a problem... but then, I treat my pictures much like my music files and put them all in one directory... If I have duplicate scenes, it's because they are cropped differently or re-sized.
What would be nice is an optical recognition program that would have a tolerance to avoid the pitfalls of image compression.
Posted 11-22-2010 at 03:02 PM by lumak
-
I use md5sum for comparing the pictures, so I can only tell which files are 100% identical. What you suggest would require an application that detects the contents of pictures. I assume that would need a lot of computing power.
Posted 11-22-2010 at 03:45 PM by jere21