LinuxQuestions.org

matthewhardwick 10-26-2006 09:35 AM

searching for duplicate files, but named differently
 
I am looking for a way to find duplicate MP3s. I run a radio station, and we have several thousand duplicated MP3s, but the names are all different. Is there any way someone can point me in the right direction?

All the files are named Mnnnnnn.mp3 or Pnnnnnn.mp3 (where nnnnnn is a sequential number), and I want to search through them and delete the ones that are duplicated. I also need some kind of list of the files that were deleted, in either text or CSV, so I know which numbers are available to be used again.

Thanks in advance.

Matthew.

Wells 10-26-2006 10:08 AM

You are probably going to want to create a database that lists what each song is and what its md5sum is. That would give you a nice way to tell whether there are duplicates.
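
A minimal sketch of that idea, using a plain text index instead of a real database. The function names build_index and list_dupes are made up for illustration, and the sketch assumes GNU md5sum output (a 32-character sum, two separator characters, then the path) and file names without newlines or backslashes.

```shell
#!/bin/bash

# build_index DIR: print "md5sum  path" for every .mp3 under DIR,
# sorted so that files with the same checksum end up on adjacent lines.
build_index() {
    find "$1" -type f -name '*.mp3' -exec md5sum {} + | sort
}

# list_dupes: read the index on stdin and print every path whose checksum
# has already appeared on an earlier line (the first copy is not listed).
# The path starts at column 35: 32 hex digits plus two separator characters.
list_dupes() {
    awk 'seen[$1]++ { print substr($0, 35) }'
}
```

Something like build_index /path/to/mp3s > index.txt followed by list_dupes < index.txt would then give the list of candidates, where /path/to/mp3s is a placeholder. Matching checksums only show that the bytes are almost certainly identical, so a byte-for-byte comparison before deleting anything is still a good idea.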

Guttorm 10-26-2006 10:38 AM

Hi.

I had a little script to check for duplicate files.

Code:

#!/bin/bash
# Report files under the current directory that have identical contents.

last_sum=x
last_file=x
# Checksum every file; sorting puts equal checksums on adjacent lines.
find . -type f -exec md5sum {} ";" |sort >/tmp/md5sums.txt
cat /tmp/md5sums.txt |
while read line
do
        sum=$(echo $line |cut -d" " -f1)
        file=$(echo $line |cut -d" " -f2)
        if [ "$sum" = "$last_sum" ];
        then
                echo "$file looks like a duplicate with $last_file"
                # diff -q exits with status 0 only when the files are identical
                if diff -q "$file" "$last_file" >/dev/null;
                then
                    echo "$file and $last_file are the same."
                    #rm "$file"
                fi
        fi
        last_sum="$sum"
        last_file="$file"
done
rm /tmp/md5sums.txt

I added a commented-out rm line there. My script did reporting only, so if you want one of each pair of duplicates deleted, just remove the # in front of it. The script is not perfect by any means, but it did the job when I wanted to search for duplicates. I don't think it will like filenames with spaces in them and such, so use at your own risk!

Note: cd to the directory with the MP3s before you run it.
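
For the list of deleted files asked for in the first post, a variation along these lines might work. It is only a sketch: the function name remove_dupes and the CSV layout (deleted,kept) are my own choices, it really deletes files, and it assumes no file name contains a comma or a newline. It copes with spaces by letting read split off only the checksum, and it takes the directory as an argument instead of relying on cd.

```shell
#!/bin/bash

# remove_dupes DIR LOG: delete every file under DIR that is byte-identical
# to an earlier file with the same md5sum, and record each deletion in LOG
# as a CSV line "deleted,kept". Keep LOG outside DIR so it is not scanned.
remove_dupes() {
    dir=$1
    log=$2
    last_sum=""
    last_file=""
    echo "deleted,kept" > "$log"
    find "$dir" -type f -exec md5sum {} + | sort |
    while read -r sum file
    do
        # cmp -s prints nothing and returns 0 only for identical files
        if [ "$sum" = "$last_sum" ] && cmp -s "$file" "$last_file"
        then
            rm -- "$file"
            echo "$file,$last_file" >> "$log"
        else
            last_sum=$sum
            last_file=$file
        fi
    done
}
```

Called as remove_dupes /path/to/mp3s deleted.csv (both names are placeholders), it leaves the first file of each identical group in place and lists the rest in the CSV. Try it on a copy of the directory first.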

bigearsbilly 10-27-2006 04:08 AM

FYI :)


Code:

while read line
do
        sum=$(echo $line |cut -d" " -f1)
        file=$(echo $line |cut -d" " -f2)

This can be done more neatly without 'cut':

Code:

while read sum file
do
      ....


bigearsbilly 10-27-2006 04:15 AM

I have a Perl script that does a similar job to the bash one, but recurses down.
It produces a list by comparing the cksum of the files.


With MP3s, though, I should think that if the ID3 tags are different, then naturally
the cksum will be different.

see this script http://www.linuxquestions.org/questi...94#post2343194
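
One possible way around the ID3 problem is to checksum only the part of the file between the tags. The sketch below is not taken from the linked script. The helper name audio_md5 is made up, and it assumes the only tags are an ID3v2 tag at the very start (without a footer) and/or a 128-byte ID3v1 tag at the very end. Other tag formats are not handled.

```shell
#!/bin/bash

# audio_md5 FILE: print the md5sum of FILE's contents with a leading ID3v2
# tag and a trailing ID3v1 tag left out (hypothetical helper).
audio_md5() {
    f=$1
    size=$(wc -c < "$f")
    start=0
    end=$size
    # ID3v2: the file starts with "ID3" and bytes 6-9 of the header hold
    # the tag size as four 7-bit ("syncsafe") values; the header is 10 bytes.
    if [ "$(head -c 3 "$f" | tr -d '\0')" = "ID3" ]; then
        set -- $(od -An -tu1 -j6 -N4 "$f")
        start=$(( 10 + ($1 << 21) + ($2 << 14) + ($3 << 7) + $4 ))
    fi
    # ID3v1: the last 128 bytes of the file start with "TAG".
    if [ "$size" -ge 128 ] &&
       [ "$(tail -c 128 "$f" | head -c 3 | tr -d '\0')" = "TAG" ]; then
        end=$(( size - 128 ))
    fi
    # Checksum only the bytes from offset start up to offset end.
    tail -c +$(( start + 1 )) "$f" | head -c $(( end - start )) |
        md5sum | cut -d" " -f1
}
```

Two copies of the same song that differ only in their tags should then give the same audio_md5 value, but re-encoded copies of the same song will still differ.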


All times are GMT -5.