LinuxQuestions.org
05-25-2007, 03:23 PM   #1
KevinAlaska
Member
 
Registered: May 2004
Location: Alaska, USA
Distribution: Fedora Core 4 - RedHat
Posts: 42

Rep: Reputation: 15
Question I need a command that can delete the following...


Hi everyone and thank you for reading my post.

I have been importing all my photos from about 5 different backups. I am consolidating all my photos so I don't miss a single one. The problem with this is that lots of duplicate files are being imported. The files are imported into folders by date taken (i.e. /home/myname/Photos/<year>/<month>/<day>/<photos>, so for example /home/myname/Photos/2007/05/25/<photos>).

So I have about 30,000 photos in there, and about 60 percent are probably duplicates that get renamed by F-Stop when they are imported: if 'photo123.jpg' already exists, the new file is renamed with a -1 at the end, like 'photo123-1.jpg', then the next one would be 'photo123-2.jpg', etc.

I had a command for deleting these given to me, but I can't find it for the life of me. The good news is that my real desk's desktop is now clean from looking for it.

Well, I hope I have not forgotten anything here. Thank you for all the help.

Sincerely,

Kevin in Alaska
 
05-25-2007, 03:34 PM   #2
Emerson
LQ Guru
 
Registered: Nov 2004
Location: Saint Amant, Acadiana
Distribution: Gentoo ~arch
Posts: 5,959

Rep: Reputation: Disabled
GImageView can find duplicates. There is also Dupefinder for QT and CLI. And I'm sure there are many more.
 
05-25-2007, 03:51 PM   #3
KevinAlaska
Member
 
Registered: May 2004
Location: Alaska, USA
Distribution: Fedora Core 4 - RedHat
Posts: 42

Original Poster
Rep: Reputation: 15
Thank you for the info...

I am very new to Linux and not very keen on installing stuff that isn't listed in the "Adept installer" or by "automatix2", which is also installed.

I am currently running the Kubuntu Feisty i386 build. Do you know if there is anything already installed in this distribution that just needs to be activated or downloaded via the programs listed above?

Also I am not sure what QT is and I would imagine CLI is 'command line interface'?

Thank you again.

Kevin in Alaska
 
05-25-2007, 04:26 PM   #4
pljvaldez
LQ Guru
 
Registered: Dec 2005
Location: Somewhere on the String
Distribution: Debian Wheezy (x86)
Posts: 6,094

Rep: Reputation: 271
http://ubuntu.wordpress.com/2005/10/...pies-of-files/
 
05-25-2007, 05:05 PM   #5
jschiwal
LQ Guru
 
Registered: Aug 2001
Location: Fargo, ND
Distribution: SuSE AMD64
Posts: 15,733

Rep: Reputation: 671
If they are all under the same base directory, but different subdirectories, I would use the find command with the -exec md5sum '{}' \; option to calculate the md5sum of the files. A less reliable but potentially faster way could be to use normal checksums instead of calculating the md5sums.
Code:
find photodir/ -type f -iname "*.jpg" -exec md5sum '{}' \; >photolist
sort photolist >sorted-photo-list
uniq -w32 -D sorted-photo-list >dupelist
The -w32 option for uniq limits the test to the md5sum column. The -D option lists non-unique entries. Entries with the same md5sum are identical. Their names and locations may differ.
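The three steps above can also be tried end to end on a throwaway tree before touching real photos. This is only a sketch: the scratch directory, the file names and their contents are made up for the demo, and it assumes GNU md5sum and GNU uniq (for -w and -D). It uses the '+' form of -exec, which hands many files to each md5sum call instead of starting one process per file.

```shell
#!/bin/sh
# Hypothetical demo tree: a scratch directory standing in for photodir/.
demo=$(mktemp -d)
mkdir -p "$demo/photodir/2007/05/25" "$demo/photodir/2007/05/26"
printf 'sunset' > "$demo/photodir/2007/05/25/photo123.jpg"
printf 'sunset' > "$demo/photodir/2007/05/25/photo123-1.jpg"   # same content
printf 'sunset' > "$demo/photodir/2007/05/26/photo900.jpg"     # same content, other day
printf 'moose'  > "$demo/photodir/2007/05/25/photo124.jpg"     # unique content

cd "$demo" || exit 1

# Hash every .jpg and sort so that identical hashes end up on adjacent lines.
find photodir/ -type f -iname "*.jpg" -exec md5sum '{}' + | sort > sorted-photo-list

# Compare only the first 32 characters (the hash) and print every
# member of each group of repeated hashes.
uniq -w32 -D sorted-photo-list > dupelist
cat dupelist
```

Here dupelist should contain the three 'sunset' files and leave photo124.jpg out, whatever the files happen to be called.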

You could further process the list if you wanted to group the lists.
Code:
# Note: assumes that there are not tens of thousands of duplicates.  That would overflow bash with too many arguments in the for loop.
# Get a list of unique md5sums in the dupelist by themselves
cut -d' ' -f1 dupelist | uniq >m5dupes

# cycle through the list and output all of the dupes, adding a separator line between groups
for md5item in $(cat m5dupes); do
    grep "$md5item" dupelist
    echo '------------'
done
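A possible variation on the grouping step, as a minimal sketch: because dupelist is already sorted by hash, a single pass that prints a separator whenever the hash changes gives a similar grouping without an intermediate file or one grep per hash. The hashes and paths below are invented sample data, and grouped-dupelist is just a name picked for the demo.

```shell
#!/bin/sh
demo=$(mktemp -d)
cd "$demo" || exit 1

# Hypothetical dupelist in md5sum format: 32-character hash, two spaces, path.
cat > dupelist <<'EOF'
0123456789abcdef0123456789abcdef  photodir/2007/05/25/photo123-1.jpg
0123456789abcdef0123456789abcdef  photodir/2007/05/25/photo123.jpg
fedcba9876543210fedcba9876543210  photodir/2007/05/26/photo200-1.jpg
fedcba9876543210fedcba9876543210  photodir/2007/05/26/photo200.jpg
EOF

# Read the sorted list once; print a separator each time the hash changes.
prev=""
while read -r sum path; do
    if [ -n "$prev" ] && [ "$sum" != "$prev" ]; then
        echo '------------'
    fi
    printf '%s  %s\n' "$sum" "$path"
    prev=$sum
done < dupelist > grouped-dupelist

cat grouped-dupelist
```

Unlike the for loop, this puts the separator only between groups, not after the last one.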
You might want to scan through the coreutils info pages. There are a number of utilities that come in very handy for handling text files and lists. uniq, sort, comm, grep and sed work together very nicely. I haven't really learned awk programming yet because I haven't needed it that often; piping these commands together usually solves the problem. Still, it is worth adding the "Gawk: Effective AWK Programming" info manual to your reading list alongside the coreutils manual.

One command I find very handy at work is "comm". It compares two sorted lists and prints three columns: 1) lines unique to file1, 2) lines unique to file2, 3) lines common to both. You can turn off any column you want. Sed is often used to massage the items in a list, such as removing trailing spaces, before using grep or comm.
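To make the column switches concrete, here is a small sketch of comm on two made-up lists (backup-a and backup-b are hypothetical names, standing in for file lists taken from two backups). The options -1, -2 and -3 each suppress the column with that number.

```shell
#!/bin/sh
demo=$(mktemp -d)
cd "$demo" || exit 1

# Two hypothetical file lists, e.g. the photo names found in two backups.
printf '%s\n' photo1.jpg photo2.jpg photo3.jpg > backup-a
printf '%s\n' photo2.jpg photo3.jpg photo4.jpg > backup-b

# comm needs sorted input, so sort first (already sorted here, shown for completeness).
sort -o backup-a backup-a
sort -o backup-b backup-b

comm -12 backup-a backup-b > in-both     # hide columns 1 and 2: lines common to both
comm -23 backup-a backup-b > only-in-a   # hide columns 2 and 3: lines unique to backup-a
comm -13 backup-a backup-b > only-in-b   # hide columns 1 and 3: lines unique to backup-b

cat in-both
```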

---

Note: I was in a hurry and haven't tested these lines of code. So some testing may be needed before you use them.

Last edited by jschiwal; 05-25-2007 at 05:07 PM.
 
05-25-2007, 11:12 PM   #6
chadwick
Member
 
Registered: Apr 2005
Location: At the 100th Meridian where the great plains begin
Distribution: Debian Testing on T60 laptop
Posts: 105

Rep: Reputation: 16
Here's how I'd do it.

1) First make a backup to make sure I don't hit the wrong key by accident and delete the wrong ones:
cd photodir/..
mkdir backup/
cp -r photodir/ backup/

where photodir/ would be replaced with /home/myname/Photos in your case.

2) Double check to make sure it worked

3) Then since the file names all use the same format and you can select out the ones you want to get rid of by the hyphen plus an extra character, use wildcards:

cd photodir/..
rm -f photodir/????/??/??/photo???-?.jpg

or if the naming isn't that consistent you could do

rm -f photodir/????/??/??/*-?.jpg

4) Double check to make sure everything's okay before you start doing anything to the backup.

5) Remember you can never be too careful when using rm -f
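
One way the "double check" steps could look in practice, as a rough sketch on a scratch tree (the directory and file names are invented for the demo): diff -r compares the backup against the original, and running ls with the same wildcard shows which files rm would receive before anything is deleted.

```shell
#!/bin/sh
demo=$(mktemp -d)
cd "$demo" || exit 1

# Hypothetical stand-in for the real photo tree.
mkdir -p photodir/2007/05/25
printf 'a' > photodir/2007/05/25/photo123.jpg
printf 'a' > photodir/2007/05/25/photo123-1.jpg

# Step 1: make the backup.
mkdir backup/
cp -r photodir backup/

# Step 2: diff -r prints nothing and exits 0 when the two trees match.
diff -r photodir backup/photodir && echo "backup matches"

# Step 3 preview: list exactly the files the wildcard would hand to rm.
ls photodir/????/??/??/*-?.jpg > would-delete
cat would-delete
```

Only once would-delete looks right would the same pattern go to rm -f.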

Believe me, I nonetheless wish I could have come up with something like jschiwal's. jschiwal's doesn't assume, for example, that there's no file you want to keep whose name ends in a hyphen followed by one character and then .jpg. jschiwal's has the added safety of being certain that two files are identical before deleting one of them, but it is harder to understand, so it is perhaps easier to make a mistake with.
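
The two ideas can be combined. The following is only a sketch on a scratch tree with made-up files, not something tested on a real photo collection: it uses the wildcard to pick candidates, then deletes a candidate only if the file without the -N suffix exists in the same folder and cmp reports the two as byte-for-byte identical. It assumes a single-character suffix, as the wildcard does.

```shell
#!/bin/sh
demo=$(mktemp -d)
cd "$demo" || exit 1

# Hypothetical demo files.
mkdir -p photodir/2007/05/25
printf 'sunset' > photodir/2007/05/25/photo123.jpg
printf 'sunset' > photodir/2007/05/25/photo123-1.jpg   # true duplicate
printf 'moose'  > photodir/2007/05/25/photo124.jpg
printf 'bear'   > photodir/2007/05/25/photo124-1.jpg   # same name pattern, different picture

for copy in photodir/????/??/??/*-?.jpg; do
    [ -e "$copy" ] || continue          # the pattern matched nothing
    original="${copy%-?.jpg}.jpg"       # photo123-1.jpg -> photo123.jpg
    # cmp -s is silent and succeeds only when both files have identical bytes.
    if [ -f "$original" ] && cmp -s "$original" "$copy"; then
        rm -- "$copy"
    else
        echo "kept: $copy"
    fi
done
```

In the demo, photo123-1.jpg is removed and photo124-1.jpg is kept because its content differs from photo124.jpg.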

Last edited by chadwick; 05-26-2007 at 12:48 AM.
 
  


All times are GMT -5.
