LinuxQuestions.org
Sorting pictures, removing duplicates

Posted 11-20-2010 at 08:37 AM by jere21
Updated 10-11-2014 at 10:02 AM by jere21

EDIT 2012-02-03/2014-10-11:
In Debian I just discovered "duff", a tool to find duplicate files -.-
Unfortunately it doesn't work with symbolic links to files.
---



All files with their md5sum
Code:
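PICS="pics"
# Skip the list file itself: the redirect creates it inside the tree being searched.
find . -type f ! -path "./$PICS" -exec md5sum '{}' + > "$PICS"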
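To get a feel for what ends up in the list, here is a small sketch against a throwaway directory. It assumes GNU coreutils; the scratch directory and the file names are made up for the illustration and have nothing to do with a real picture collection.

```shell
# Hypothetical scratch directory; the file names are invented for this sketch.
demo=$(mktemp -d)
printf 'same picture\n'  > "$demo/a.jpg"
printf 'same picture\n'  > "$demo/copy  of a.jpg"   # note the double space
printf 'other picture\n' > "$demo/b.jpg"

# With GNU md5sum each line is: 32 hex digits, two spaces, then the path.
list=$(cd "$demo" && find . -type f -exec md5sum '{}' + | sort)
printf '%s\n' "$list"

# Identical content gives identical checksums, whatever the file is called.
hash_a=$(printf '%s\n' "$list" | grep -F './a.jpg' | cut -c1-32)
hash_copy=$(printf '%s\n' "$list" | grep -F 'copy  of a.jpg' | cut -c1-32)
hash_b=$(printf '%s\n' "$list" | grep -F './b.jpg' | cut -c1-32)

rm -r "$demo"
```

The two files with the same content share a checksum and the third one differs, which is all the later steps rely on.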
Lists with only the md5sums, one of them with duplicates removed. Taking the first space-separated field with cut keeps the checksum intact even when a filename contains double spaces.
Code:
cut -d ' ' -f 1 "$PICS" | sort > "${PICS}.md5sums.sorted"
sort -u "${PICS}.md5sums.sorted" > "${PICS}.md5sums.sorted.uniq"
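The double-space problem is easy to reproduce on a single line. The sketch below feeds one sample line (a made-up path, with the well-known md5sum of an empty file as the checksum) through a greedy sed pattern and through cut, so the difference is visible. It assumes a sed that accepts -r, such as GNU sed.

```shell
# Sample line in md5sum format; the path is invented and contains a double space.
line='d41d8cd98f00b204e9800998ecf8427e  ./holiday  2010.jpg'

# Greedy pattern: (.*) runs up to the LAST double space, so part of the
# filename is kept along with the checksum.
greedy=$(printf '%s\n' "$line" | sed -r 's|^(.*)  .*|\1|')

# First space-separated field: stops at the first space, so only the checksum remains.
field=$(printf '%s\n' "$line" | cut -d ' ' -f 1)

printf 'sed: %s\ncut: %s\n' "$greedy" "$field"
```

With the greedy pattern, two copies of the same picture stored under different names no longer produce identical lines, so they would not be counted as duplicates.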
md5sums of the duplicates
Code:
diff "${PICS}.md5sums.sorted" "${PICS}.md5sums.sorted.uniq" | grep '^<' | sed 's|^< ||' > "${PICS}.md5sums.duplicates"
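As an alternative to the diff step, uniq -d prints one copy of every line that is repeated in sorted input. This is a different technique from the one used in this post, shown here only as a sketch; the values below are placeholders standing in for checksums.

```shell
# Placeholder values standing in for the sorted checksum list.
sorted=$(printf '%s\n' ccc bbb aaa bbb ccc bbb | sort)

# uniq -d needs sorted input and prints each repeated value exactly once.
dups=$(printf '%s\n' "$sorted" | uniq -d)
printf '%s\n' "$dups"
```

Applied to the real file this would be uniq -d on the sorted checksum list. A checksum that occurs three or more times then shows up only once in the result.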
List of all duplicate files (sorted by their md5sum)
Code:
# -F -f reads the checksums as fixed strings from the file; an empty file matches nothing.
grep -F -f "${PICS}.md5sums.duplicates" "$PICS" | sort > "${PICS}.duplicates"
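For comparison, the whole chain can be collapsed into one pipeline if GNU uniq is available. This is a sketch under that assumption (-w and --all-repeated are GNU extensions), not the method used above. The scratch directory and file names are made up.

```shell
# Hypothetical scratch directory with two identical files and one different file.
demo=$(mktemp -d)
printf 'same picture\n'  > "$demo/a.jpg"
printf 'same picture\n'  > "$demo/a copy.jpg"
printf 'other picture\n' > "$demo/b.jpg"

# -w32 compares only the 32-character checksum at the start of each line;
# --all-repeated=separate prints every line whose checksum occurs more than
# once, with a blank line between groups.
groups=$(cd "$demo" && find . -type f -exec md5sum '{}' + | sort | uniq -w32 --all-repeated=separate)
printf '%s\n' "$groups"

rm -r "$demo"
```

Only the two identical files are listed; the file with unique content is dropped.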

Comments

  1. Old Comment
    nice tip... I never would have thought this would be a problem... but then, I treat my pictures much like my music files and put them all in one directory... If I have duplicate scenes, it's because they are cropped differently or re-sized.

    What would be nice is an optical recognition program that would have a tolerance to avoid the pitfalls of image compression.
    Posted 11-22-2010 at 03:02 PM by lumak
  2. Old Comment
    Quote:
    Originally Posted by lumak
    What would be nice is an optical recognition program that would have a tolerance to avoid the pitfalls of image compression.
    I use md5sum for comparing the pictures, so I can only tell which files are 100% identical. What you suggest would require an application that detects the contents of pictures. I assume that would need a lot of computing power.
    Posted 11-22-2010 at 03:45 PM by jere21
 

  



All times are GMT -5.