LinuxQuestions.org
Linux - Software This forum is for Software issues.
Having a problem installing a new program? Want to know which application is best for the job? Post your question in this forum.

09-11-2010, 11:45 AM #1
Completely Clueless
Member
Registered: Mar 2008
Location: Marbella, Spain
Distribution: Many and various...
Posts: 800

Rep: Reputation: 68
Managing duplicated files


Hi all,

I reckon I waste several tens of gigabytes of storage by holding unwanted copies of personal data and media files over very many partitions. Sometimes they will have identical names, dates and sizes. Other times they may be slightly different in terms of metadata but nonetheless essentially identical in respect of content. It is becoming a nightmare. Any suggestions for how best to get rid of these unwanted files?

Cheers,

CC.

09-11-2010, 12:00 PM #2
EricTRA
Guru
Registered: May 2009
Location: Gibraltar, Gibraltar
Distribution: Fedora 20 with Awesome WM
Posts: 6,805
Blog Entries: 1

Rep: Reputation: 1291
Hello,

Luckily for you, there is software to find duplicate files on your system:
fslint
fdupes
or you can write your own script, like somebody did a long time ago, as pointed out here.

Kind regards,

Eric

09-11-2010, 12:01 PM #3
druuna
LQ Veteran
Registered: Sep 2003
Posts: 10,532
Blog Entries: 7

Rep: Reputation: 2374
Hi,

There are some tools out there that can help you. Do a search for "find duplicate files linux".

Most seem to be based on calculating MD5 sums and comparing these. fdupes looks promising (fdupes man page).
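As a rough illustration of how tools of this kind avoid hashing every file, here is a sketch that mirrors the approach described in the fdupes man page: compare file sizes first, then MD5 digests of same-sized files, then confirm with a byte-by-byte comparison. The function names are hypothetical; it skips symlinks and has no error handling for unreadable files.

```python
import filecmp
import hashlib
import os
from collections import defaultdict


def md5sum(path, blocksize=65536):
    """Return the MD5 hex digest of a file, read in binary blocks."""
    h = hashlib.md5()
    with open(path, "rb") as fh:
        for block in iter(lambda: fh.read(blocksize), b""):
            h.update(block)
    return h.hexdigest()


def find_duplicates(top):
    """Return a list of groups (sorted lists of paths) with identical contents."""
    # Pass 1: group by size; a file with a unique size cannot have a duplicate.
    by_size = defaultdict(list)
    for root, _dirs, files in os.walk(top):
        for name in files:
            path = os.path.join(root, name)
            if os.path.isfile(path) and not os.path.islink(path):
                by_size[os.path.getsize(path)].append(path)

    groups = []
    for paths in by_size.values():
        if len(paths) < 2:
            continue
        # Pass 2: within a size group, only files with equal MD5 digests remain.
        by_hash = defaultdict(list)
        for path in paths:
            by_hash[md5sum(path)].append(path)
        for candidates in by_hash.values():
            if len(candidates) < 2:
                continue
            # Pass 3: confirm byte by byte against the first file, so a hash
            # collision cannot produce a false match in the reported groups.
            first = candidates[0]
            same = [first] + [p for p in candidates[1:]
                              if filecmp.cmp(first, p, shallow=False)]
            if len(same) > 1:
                groups.append(sorted(same))
    return groups
```

Usage: `for group in find_duplicates("/home"): print(group)`. Only files of equal size are ever read, which is what keeps this kind of scan tolerable on large media collections.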

Hope this helps.

09-11-2010, 12:45 PM #4
ghostdog74
Senior Member
Registered: Aug 2006
Posts: 2,697
Blog Entries: 5

Rep: Reputation: 241
Quote:
Originally Posted by Completely Clueless
Hi all,

I reckon I waste several tens of gigabytes of storage by holding unwanted copies of personal data and media files over very many partitions. Sometimes they will have identical names, dates and sizes. Other times they may be slightly different in terms of metadata but nonetheless essentially identical in respect of content. It is becoming a nightmare. Any suggestions for how best to get rid of these unwanted files?

Cheers,

CC.
The common ways to identify dupes are MD5 or SHA hashes, file sizes and content comparison (diff). You can use one of the tools others have mentioned, or just write your own. Here's a snippet in Python 3, using SHA-256:
Code:
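import hashlib
import os


def checksum(filename, blocksize=65536):
    '''Return the SHA-256 hex digest of a file, or None if it cannot be read.'''
    d = hashlib.sha256()
    try:
        # Binary mode, read in blocks: large media files never sit in memory whole.
        with open(filename, "rb") as fh:
            for block in iter(lambda: fh.read(blocksize), b""):
                d.update(block)
    except OSError as e:
        print(e)
        return None
    return d.hexdigest()


sha256 = {}  # digest -> first file seen with that digest
root = "/home"
path = os.path.join(root, "path1", "path2")
for r, d, f in os.walk(path):
    for name in f:
        filename = os.path.join(r, name)
        digest = checksum(filename)  # hash each file once, not three times
        if digest is None:
            continue  # unreadable file: skip it instead of matching other failures
        if digest in sha256:
            print("Possible duplicate: %s with %s" % (filename, sha256[digest]))
            # print("Delete here...")
        else:
            sha256[digest] = filename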
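The snippet leaves the actual deletion as a commented-out placeholder. A cautious way to fill that in, sketched here with hypothetical helper names, is to decide first which copy of each group survives and only then delete, with a dry run (print only) as the default. It assumes the duplicates have been collected as groups, i.e. lists of paths with identical content, for example by appending each path to a list per digest instead of printing it.

```python
import os


def plan_deletions(groups, keep=min):
    """For each group of identical files, pick one survivor and list the rest.

    `keep` chooses the survivor; the default keeps the path that sorts first.
    """
    to_delete = []
    for group in groups:
        survivor = keep(group)
        to_delete.extend(p for p in group if p != survivor)
    return to_delete


def delete_files(paths, dry_run=True):
    """Remove the given files; with dry_run=True only print what would happen."""
    for path in paths:
        if dry_run:
            print("would delete:", path)
        else:
            os.remove(path)
```

The point of the split is that you can read the dry-run list before anything is removed. `keep` is swappable, e.g. `keep=lambda g: min(g, key=len)` keeps the copy with the shortest path.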
 
09-11-2010, 12:57 PM #5
Completely Clueless
Member
Registered: Mar 2008
Location: Marbella, Spain
Distribution: Many and various...
Posts: 800

Original Poster
Rep: Reputation: 68
Thanks, guys. I see there is a lot of reliance here on hash comparisons, which will not work for many of my worst offenders, which are song files which I myself may have ripped, or else downloaded from iTunes, or else re-mastered from (eek!) vinyl (giving my age away). Looks like I shall need more than one tool...
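One case a hash can still catch, assuming MP3 files with ID3 tags: copies that differ only in their tags (the "slightly different in terms of metadata" case from the first post) have identical audio bytes, so hashing the file with the tags stripped makes them match. This sketch (hypothetical function names) handles only a leading ID3v2 tag and a trailing ID3v1 tag; APE tags, FLAC/Ogg metadata and the like are ignored. It does not help with separate rips, iTunes downloads versus your own rips, or vinyl transfers, because there the audio data itself differs; that needs perceptual audio fingerprinting (Chromaprint/AcoustID is one such project).

```python
import hashlib


def strip_id3(data):
    """Return MP3 bytes with a leading ID3v2 tag and a trailing ID3v1 tag removed."""
    start, end = 0, len(data)
    # ID3v2 header: "ID3", version (2 bytes), flags (1 byte), 4-byte syncsafe size.
    if data[:3] == b"ID3" and len(data) >= 10:
        size = 0
        for b in data[6:10]:
            size = (size << 7) | (b & 0x7F)  # each size byte carries 7 bits
        footer = 10 if data[5] & 0x10 else 0  # flag bit 4: a 10-byte footer follows
        start = min(10 + size + footer, end)
    # ID3v1: a fixed 128-byte block at the very end, starting with "TAG".
    if end - start >= 128 and data[end - 128:end - 125] == b"TAG":
        end -= 128
    return data[start:end]


def audio_digest(path):
    """SHA-256 of a file's contents with ID3 tags stripped (reads the whole file)."""
    with open(path, "rb") as fh:
        return hashlib.sha256(strip_id3(fh.read())).hexdigest()
```

Using `audio_digest` in place of a plain file checksum in a duplicate finder would group tag-only variants together while leaving genuinely different recordings apart.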
 
09-11-2010, 01:09 PM #6
druuna
LQ Veteran
Registered: Sep 2003
Posts: 10,532
Blog Entries: 7

Rep: Reputation: 2374
Hi,

Checking for possible duplicate names is a relatively simple find command: find . -name "*" -exec basename '{}' \; | sort > file.list

And yes, I guess you need to use more than one tool. I would run the find command after the other tool(s) have done their job.
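The find pipeline prints a sorted list of bare names, so repeats end up next to each other; appending `| uniq -d` after `sort` would print only the names that occur more than once. It does, however, throw away where each file lives. A small Python sketch (hypothetical function name) that keeps the full paths:

```python
import os
from collections import defaultdict


def duplicate_names(top):
    """Map each file name that occurs more than once under `top` to its paths."""
    by_name = defaultdict(list)
    for root, _dirs, files in os.walk(top):
        for name in files:
            by_name[name].append(os.path.join(root, name))
    # Keep only names seen in more than one place.
    return {name: sorted(paths) for name, paths in by_name.items() if len(paths) > 1}
```

Names are compared exactly, so "Song.mp3" and "song.mp3" count as different; keying on `name.lower()` instead would treat them as the same.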

Hope this helps.

09-11-2010, 01:14 PM #7
Completely Clueless
Member
Registered: Mar 2008
Location: Marbella, Spain
Distribution: Many and various...
Posts: 800

Original Poster
Rep: Reputation: 68
Quote:
Originally Posted by druuna
Hi,

Checking for possible duplicate names is a relatively simple find command: find . -name "*" -exec basename '{}' \; | sort > file.list

And yes, I guess you need to use more than one tool. I would run the find command after the other tool(s) have done their job.

Hope this helps.
If I were just checking for duplicate names, I think I would not bother with the console and would just use the find feature in Nautilus or Konqueror.

The way things are going, this has got to be an increasing problem so I guess some kind soul will come up with an all-in-one solution fairly soon. I certainly hope so, anyway.

09-11-2010, 01:22 PM #8
druuna
LQ Veteran
Registered: Sep 2003
Posts: 10,532
Blog Entries: 7

Rep: Reputation: 2374
Hi,
Quote:
Originally Posted by Completely Clueless
If I were just checking for duplicate names, I think I would not bother with the console and would just use the find feature in Nautilus or Konqueror.
I know you are not just checking for duplicate file names. The find command was meant as an addition to, for example, fdupes. fdupes and similar tools check using MD5/SHA hashes, file sizes and content. What is left can be done with find (or Konqueror/Nautilus).
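For what is left once the byte-identical copies are gone, exact name matches will miss near-misses such as "Hey Jude.mp3" versus "hey_jude.ogg". One way to surface those, sketched here with Python's standard difflib, is to compare normalised names pairwise. The normalisation rules, the 0.9 threshold and the function names are arbitrary assumptions for illustration.

```python
import difflib
import os
import re
from itertools import combinations


def normalise(name):
    """Lower-case a file name, drop its extension and keep only letters/digits."""
    stem = os.path.splitext(name)[0].lower()
    return re.sub(r"[^a-z0-9]+", "", stem)


def similar_names(names, threshold=0.9):
    """Yield (a, b, ratio) for name pairs whose normalised forms are similar enough."""
    for a, b in combinations(names, 2):
        ratio = difflib.SequenceMatcher(None, normalise(a), normalise(b)).ratio()
        if ratio >= threshold:
            yield a, b, round(ratio, 2)
```

It compares every pair, so the work grows quadratically with the number of files: fine for one music folder at a time, slow for a whole disk. Track-number prefixes such as "01 - " lower the ratio; stripping them in `normalise` is one possible tweak.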

You can always write a program yourself; it would make you and others happy.

09-11-2010, 02:06 PM #9
Completely Clueless
Member
Registered: Mar 2008
Location: Marbella, Spain
Distribution: Many and various...
Posts: 800

Original Poster
Rep: Reputation: 68
Quote:
Originally Posted by druuna
You can always write a program yourself; it would make you and others happy.
The problem is, it is so infrequently I ever need to write a program that it takes soooooo long. I have to revise the language every time before I begin. Thank God I only know C.

I see you have disabled your rep, Druuna. Are you making a personal stand against the fundamental iniquity of the axing of the "thanks" system?

09-11-2010, 02:28 PM #10
druuna
LQ Veteran
Registered: Sep 2003
Posts: 10,532
Blog Entries: 7

Rep: Reputation: 2374
Hi,
Quote:
Originally Posted by Completely Clueless
I see you have disabled your rep, Druuna. Are you making a personal stand against the fundamental iniquity of the axing of the "thanks" system?
I don't see the point in this, or any other rep system. It is too ambiguous to be of any real value. I won't go into it here, but have a look at this thread (LQ Reputation System) and one of my replies.

I'm not going to fight this feature; too many people here seem to be happy with it, and the overall LQ experience is what counts in the end. One can still click on the rep icon, but I won't notice it in any way. I'd rather people leave a follow-up message; it's much more personal.

09-11-2010, 03:56 PM #11
Completely Clueless
Member
Registered: Mar 2008
Location: Marbella, Spain
Distribution: Many and various...
Posts: 800

Original Poster
Rep: Reputation: 68
Quote:
Originally Posted by druuna
One can still click on the rep icon, but I won't notice it in any way. I'd rather people leave a follow-up message; it's much more personal.
It is not quite that simple. As a former generous "thanks-giver", I cannot rep-up anyone in this thread for their suggestions simply because you have disabled your rep and it would consequently not be fair, since you would be one of those I would have "up-repped."
I guess this is the law of unintended consequences!

09-11-2010, 04:09 PM #12
druuna
LQ Veteran
Registered: Sep 2003
Posts: 10,532
Blog Entries: 7

Rep: Reputation: 2374
Quote:
Originally Posted by Completely Clueless
It is not quite that simple. As a former generous "thanks-giver", I cannot rep-up anyone in this thread for their suggestions simply because you have disabled your rep and it would consequently not be fair, since you would be one of those I would have "up-repped."
I guess this is the law of unintended consequences!
Both the rep icon and the helpful yes/no are present and one can click on them if one wants to. Not pressing them is your choice. I do agree and realize that me opting out will probably deter some(?) people from pushing either of the icons. So be it.

Like I said, I'd rather have a follow-up message telling me what they did (not) like about my replies.

In a way you did just that: Thanks

Last edited by druuna; 09-11-2010 at 04:52 PM. Reason: Cleaned up my spelling/grammar.

09-11-2010, 09:18 PM #13
ghostdog74
Senior Member
Registered: Aug 2006
Posts: 2,697
Blog Entries: 5

Rep: Reputation: 241
Quote:
Originally Posted by Completely Clueless
which are song files which I myself may have ripped, or else downloaded from iTunes, or else re-mastered from (eek!) vinyl (giving my age away). Looks like I shall need more than one tool...
Then you will need a tool to read the songs' header (tag) information, store it and compare.
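A minimal sketch of that idea, assuming MP3 files carrying an ID3v1 tag: the fixed 128-byte block at the end of the file that starts with "TAG" and holds 30-byte title and artist fields. Files are grouped when artist and title match, ignoring case. The function names are hypothetical, and Latin-1 text is assumed. Many files only carry ID3v2 tags, which this does not read; a tag library such as mutagen handles those, but its API is not shown here.

```python
from collections import defaultdict


def read_id3v1(path):
    """Return (artist, title) from a trailing ID3v1 tag, or None if absent."""
    with open(path, "rb") as fh:
        fh.seek(0, 2)                # jump to the end to learn the file size
        if fh.tell() < 128:
            return None
        fh.seek(-128, 2)             # the ID3v1 block is the last 128 bytes
        tag = fh.read(128)
    if tag[:3] != b"TAG":
        return None

    def field(raw):
        # Fixed-width field, padded with NULs or spaces; Latin-1 is assumed.
        return raw.split(b"\x00", 1)[0].decode("latin-1").strip()

    title, artist = field(tag[3:33]), field(tag[33:63])
    return artist, title


def group_by_tags(paths):
    """Group files whose (artist, title) tags match, ignoring case."""
    groups = defaultdict(list)
    for path in paths:
        info = read_id3v1(path)
        if info and all(info):       # skip untagged files and empty fields
            groups[tuple(s.lower() for s in info)].append(path)
    return {key: found for key, found in groups.items() if len(found) > 1}
```

Matching tags only says two files claim to be the same song; a rip and a vinyl transfer grouped this way are candidates to listen to, not proof of duplication.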
 
09-12-2010, 04:04 PM #14
H_TeXMeX_H
Guru
Registered: Oct 2005
Location: $RANDOM
Distribution: slackware64
Posts: 12,928
Blog Entries: 2

Rep: Reputation: 1269
I use fdupes to remove duplicate files. But, if things are really messy, I save what I need, and wipe the drive ... it's the easiest way.
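The fdupes route can also be scripted. By default `fdupes -r DIR` prints each set of duplicates one path per line, with a blank line between sets. The sketch below (hypothetical function names; assumes fdupes is installed and on PATH, and Python 3.7+ for `capture_output`) turns that output into lists that can be fed to whatever keep/delete policy you like. File names containing newlines would confuse this simple parser. For interactive removal, fdupes itself offers `-d`, which prompts for the copies to preserve.

```python
import subprocess


def parse_fdupes(output):
    """Split fdupes' default output (blank line between sets) into groups."""
    groups, current = [], []
    for line in output.splitlines():
        if line.strip():
            current.append(line)      # keep the path exactly as printed
        elif current:
            groups.append(current)    # blank line closes the current set
            current = []
    if current:
        groups.append(current)
    return groups


def run_fdupes(directory):
    """Run `fdupes -r` on a directory and return the duplicate groups."""
    result = subprocess.run(["fdupes", "-r", directory],
                            capture_output=True, text=True, check=True)
    return parse_fdupes(result.stdout)
```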
 
  


Similar Threads
Thread | Thread Starter | Forum | Replies | Last Post
Managing a Website - Uploading Files | BionicJoe | Linux - Software | 1 | 02-11-2005 03:37 AM
Managing saved html files in easily queryable fashion? | bramadams | Linux - Software | 2 | 08-01-2004 07:32 AM
Managing Music Files. | liguorir | Linux - Software | 2 | 06-15-2003 12:03 AM
managing pictures/video files. | liguorir | Linux - Software | 1 | 06-14-2003 10:24 PM
Software suggestions (managing files) | neurotra | Linux - Software | 0 | 01-23-2003 07:01 AM


All times are GMT -5.