Linux - General
This Linux forum is for general Linux questions and discussion. If it is Linux related and doesn't seem to fit in any other forum, then this is the place.
I managed to restore about 5000 photos from my corrupted hard drive using photorec. This is great news! However, I've ended up with multiple copies of the same photo, sometimes 5 or 6 copies.
Is there a command I can use to locate the duplicates and then perform some action on them, such as moving just one copy of each photo to another location?
The chance of two different photos having the same md5 sum is astronomically small, so you can assume that two photos are identical iff their md5 sums match. The rest is just a scripting exercise. It will take a wee while to run on 5000 photos, but you can speed it up if you refine it a bit. Or you could just resign yourself to leaving your machine running overnight.
Code:
: > database    # start with an empty database file
find /the/place/I/put/them -type f | while read -r photo
do
    unique_id="$(md5sum < "$photo")"
    echo "${unique_id:0:32} $photo" >> database
done
sort -u -k1,1 database > a_list_of_unique_photos
while read -r line
do
    photo="${line:33}"    # skip the 32-character hash and the space
    cp "$photo" /a/directory/of/my/choice
done < a_list_of_unique_photos
Speedups would include e.g. reading only the first 200 bytes from each file using dd. Depending on which md5 tool you have (md5sum on Linux, md5 on BSD), you can simplify the code as well. Good luck!
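The dd speedup might look something like the sketch below (the file name is hypothetical). Be aware that two different files could share an identical first 200 bytes, so it's safer to confirm any suspected duplicates with a full checksum before discarding anything:

```shell
# Hash only the first 200 bytes of a file instead of the whole thing.
photo="/the/place/I/put/them/f0001.jpg"   # hypothetical recovered file
partial_id="$(dd if="$photo" bs=200 count=1 2>/dev/null | md5sum | cut -c1-32)"
echo "$partial_id $photo"
```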
Ok, I'm definitely making forward progress on this. I've md5'd all of my images into a list and sorted it.
Now I'm trying to figure out how the -k option works, or rather what its function is. If I used sort -u -k1,32, would sort use only the first 32 characters of each line to determine uniqueness?
Edit: Ok, I think I figured it out. -kx,y ... x is the starting field and y is the ending field (fields are separated by blanks), and a character position within a field can be added as x.c, e.g. -k1.1,1.32. So -k1,1 sorts on the first field only. Right?
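A quick way to check the -k syntax is on a toy list (this data is just an illustration, not from the photo set):

```shell
# -kSTART,END where each endpoint is FIELD[.CHAR]; fields are separated
# by blanks. -k1,1 means "sort on field 1 only", so combined with -u,
# sort keeps one line per distinct first field (here, one per hash).
printf 'aaa photo1.jpg\naaa photo2.jpg\nbbb photo3.jpg\n' | sort -u -k1,1
```

Since both "aaa" lines compare equal on the key, only one of them survives, leaving two lines of output.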
P.S. It only took my computer about 1 minute to md5 all 2.3G of images.