LinuxQuestions.org
Old 12-03-2007, 06:47 AM   #1
edgjerp
Member
 
Registered: Dec 2004
Location: Trondheim, Norway
Distribution: kubuntu 10.04
Posts: 308

Rep: Reputation: 31
clean music collection


How can I use md5 to find identical files in an mp3 collection, regardless of file name? I want the output to show the name and location of identical files (preferably with all copies grouped together in the list) while ignoring unique files.
 
Old 12-03-2007, 07:08 AM   #2
pwc101
Senior Member
 
Registered: Oct 2005
Location: UK
Distribution: Slackware
Posts: 1,847

Rep: Reputation: 128
Making the md5 sums is relatively easy (but potentially time-consuming!):
Code:
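# Append one "checksum  filename" line per file to ~/mp3s.md5
# (">>" creates the list if it does not exist yet, so no touch is needed)
for song in /path/to/mp3s/*; do
   md5sum "$song" >> ~/mp3s.md5
done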
Then you need to sort them, find the duplicates, and delete the appropriate files.

With that list, sort and awk are one way of getting there (see man sort and man awk).

edit: perhaps a find command would be better than the for loop:
Code:
# Recurses into subdirectories; ">>" creates the list if it does not exist yet
find /path/to/mp3s -type f -iname "*.mp3" -exec md5sum {} + >> ~/mp3s.md5
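
For the sorting and duplicate-finding step, here is a rough sketch rather than a definitive answer. It assumes GNU uniq (the -w and --all-repeated options are GNU extensions) and a list in the usual md5sum layout, with the 32-character checksum at the start of each line. The function name show_duplicates is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper (not from the thread): print only the lines of a
# checksum list whose MD5 sum occurs more than once, with all copies of
# the same file next to each other.
show_duplicates() {
    # An MD5 sum is 32 hex characters, so uniq compares only the first
    # 32 characters of each sorted line. Unique lines are dropped and
    # groups of duplicates are separated by a blank line.
    sort "$1" | uniq -w32 --all-repeated=separate
}

# Usage, with the list built above:
#   show_duplicates ~/mp3s.md5
```

Because the input is sorted first, every copy of a file ends up in the same group, which is the "all copies at the same place in the list" layout asked for in the first post.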

Last edited by pwc101; 12-03-2007 at 07:10 AM.
 
Old 12-03-2007, 11:57 AM   #3
edgjerp
Member
 
Registered: Dec 2004
Location: Trondheim, Norway
Distribution: kubuntu 10.04
Posts: 308

Original Poster
Rep: Reputation: 31
Is anyone familiar enough with awk to know what kind of expression I need for the next part of the operation?

Last edited by edgjerp; 12-03-2007 at 12:17 PM.
 
Old 12-03-2007, 12:24 PM   #4
pwc101
Senior Member
 
Registered: Oct 2005
Location: UK
Distribution: Slackware
Posts: 1,847

Rep: Reputation: 128
Here's a terribly inefficient way:
Code:
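# Checksums that occur more than once
awk '{print $1}' ~/mp3s.md5 | sort | uniq -d > duplicates.list
# Print every line of the list whose checksum field is one of them
while read -r line
   do awk -v sum="$line" '$1 == sum' ~/mp3s.md5
done < duplicates.list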
Or, as a horrible one-liner:
Code:
awk '{print $1}' ~/mp3s.md5 | sort | uniq -d | while read -r line
   do awk -v sum="$line" '$1 == sum' ~/mp3s.md5
done
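
A possibly less wasteful variant, offered as a sketch: instead of running awk once per duplicated checksum, awk can read the list twice, counting checksums on the first pass and printing the repeated ones on the second. It assumes the same md5sum layout (checksum in the first field); list_duplicates is a hypothetical name.

```shell
#!/bin/sh
# Hypothetical helper (not from the thread): print every line of a
# checksum list whose checksum appears more than once.
list_duplicates() {
    # The list is passed to awk twice. While NR == FNR awk is still on
    # the first copy and only counts each checksum; on the second copy
    # it prints the lines whose checksum was counted more than once.
    # The final sort puts identical checksums next to each other.
    awk 'NR == FNR { count[$1]++; next } count[$1] > 1' "$1" "$1" | sort
}

# Usage:
#   list_duplicates ~/mp3s.md5
```

Comparing the first field instead of matching a pattern against the whole line also avoids false hits when a checksum-like string happens to appear in a file name.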

Last edited by pwc101; 12-03-2007 at 12:28 PM. Reason: cleaned it up a bit
 
Old 12-03-2007, 01:55 PM   #5
MQMan
Member
 
Registered: Jan 2004
Location: Los Angeles
Distribution: Slack64 13.37
Posts: 535

Rep: Reputation: 36
Go look at md5deep. It has options for matching checksums, and only printing out the matches, or non-matches.

BTW, using a checksum will only work if it is an exact copy of the same file; it won't find duplicates of the same song that were encoded with different programs or at different bit rates.

Cheers.
 
Old 12-04-2007, 02:38 AM   #6
edgjerp
Member
 
Registered: Dec 2004
Location: Trondheim, Norway
Distribution: kubuntu 10.04
Posts: 308

Original Poster
Rep: Reputation: 31
Quote:
Originally Posted by MQMan View Post
BTW, using a checksum will only work if it is an exact copy of the same file; it won't find duplicates of the same song that were encoded with different programs or at different bit rates.
I know, but exact duplicates are the most important to get rid of. Besides, the only way to find non-exact duplicates of a song is to listen to all of them.
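
For actually getting rid of the exact duplicates, a cautious sketch is to list every copy except the first one in each group and review that list by hand. It deletes nothing. It assumes a list in md5sum format like the ~/mp3s.md5 built earlier in the thread, and list_extra_copies is a made-up name.

```shell
#!/bin/sh
# Hypothetical helper (not from the thread): print the file names of all
# copies except the first one for each checksum. Nothing is deleted.
list_extra_copies() {
    # After sorting, the "first" copy of each checksum is the one whose
    # line sorts first. seen[$1]++ is zero (false) the first time a
    # checksum appears, so only the later copies are printed. The sub()
    # strips the checksum and the two-character separator that md5sum
    # writes ("  " in text mode, " *" in binary mode).
    sort "$1" | awk 'seen[$1]++ { sub(/^[^ ]+ [ *]/, ""); print }'
}

# Usage: write the candidates to a file and review them before removing
# anything.
#   list_extra_copies ~/mp3s.md5 > ~/mp3s.extra
```

Which copy counts as the one to keep is arbitrary here (whichever path sorts first), so the output is only a starting point for a manual check.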

Last edited by edgjerp; 12-04-2007 at 02:47 AM. Reason: added info
 
  


All times are GMT -5. The time now is 09:35 PM.
