LinuxQuestions.org - searching for duplicate files, but named differently

matthewhardwick

10-26-2006 09:35 AM

searching for duplicate files, but named differently

I am looking for a way to find duplicate MP3s. I run a radio station, and we have several thousand duplicated MP3s, but the names are all different. Is there any way someone can point me in the right direction?

All the files are named Mnnnnnn.mp3 or Pnnnnnn.mp3 (where nnnnnn is a sequential number), and I want to search through them and delete the ones that are duplicated. I also need some kind of list of the files that were deleted, either as text or CSV, so I know which numbers are available to be used again.

Thanks in advance.

Matthew.

Wells

10-26-2006 10:08 AM

You will probably want to create a database that lists what each song is and what its md5sum is. Files with identical contents have identical md5sums, so this would give you a simple way to tell whether there are duplicates.
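
As a rough illustration of the md5sum listing idea, here is a minimal sketch that uses a sorted text file in place of a real database. The function names build_md5_index and list_duplicate_sums are made up for this example, and it assumes the md5sum tool from GNU coreutils and files ending in .mp3.

```shell
#!/bin/bash
# Hypothetical helper: write one "checksum  path" line per .mp3 file under
# a directory, sorted so that identical checksums end up on adjacent lines.
build_md5_index() {
    local dir="$1" index="$2"
    find "$dir" -type f -name '*.mp3' -exec md5sum {} + | sort > "$index"
}

# Hypothetical helper: print each checksum that occurs more than once
# in an index produced by build_md5_index.
list_duplicate_sums() {
    local index="$1"
    cut -d' ' -f1 "$index" | uniq -d
}
```

Running `build_md5_index /path/to/music index.txt` and then `list_duplicate_sums index.txt` should list the checksums shared by two or more files; grepping the index for one of them shows which files those are.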

Guttorm

10-26-2006 10:38 AM

Hi.

I had a little script to check for duplicate files.

Code:

#!/bin/bash

last_sum=x
last_file=x

# Checksum every file below the current directory; sorting puts
# files with equal sums on adjacent lines.
find . -type f -exec md5sum {} ";" |sort >/tmp/md5sums.txt

cat /tmp/md5sums.txt |
while read line
do
        sum=$(echo $line |cut -d" " -f1)
        file=$(echo $line |cut -d" " -f2)
        if [ "$sum" = "$last_sum" ];
        then
                echo "$file looks like a duplicate with $last_file"
                # diff -q exits with status 0 only when the files are identical
                if diff -q "$file" "$last_file" >/dev/null;
                then
                    echo "$file and $last_file are the same."
                    #rm $file
                fi
        fi
        last_sum="$sum"
        last_file="$file"
done

rm /tmp/md5sums.txt

I added a #rm $file line there in case you want one of each pair deleted. My script only did reporting, so just delete the # to have the file deleted. The script is not perfect by any means, but it did the job when I wanted to search for duplicates. I don't think it will cope with filenames that contain spaces and the like, so use it at your own risk!

Note: cd to the directory with the MP3s before you run it.
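
Since the original question also asked for a list of what was deleted, here is a hedged variation on the script above that writes a CSV log and copes with spaces in filenames. It is only a sketch: dedupe_report and its arguments are hypothetical names, it assumes md5sum and cmp are available, and it assumes filenames contain no commas, backslashes, or newlines (true for the Mnnnnnn.mp3 scheme).

```shell
#!/bin/bash
# Hypothetical helper: log byte-identical .mp3 files under a directory as
# "duplicate,kept" CSV rows. Files are only removed when the third
# argument is "yes"; otherwise this is a dry run.
dedupe_report() {
    local dir="$1" log="$2" delete="${3:-no}"
    local sum file last_sum="" last_file="" list
    list=$(mktemp) || return 1
    # Sorting by checksum puts candidate duplicates on adjacent lines.
    find "$dir" -type f -name '*.mp3' -exec md5sum {} + | sort > "$list"
    echo "duplicate,kept" > "$log"
    while read -r sum file; do
        # cmp -s confirms the match byte for byte before anything is logged.
        if [ "$sum" = "$last_sum" ] && cmp -s "$file" "$last_file"; then
            echo "$file,$last_file" >> "$log"
            if [ "$delete" = "yes" ]; then rm -- "$file"; fi
        else
            # New content: remember it as the copy to keep.
            last_sum=$sum
            last_file=$file
        fi
    done < "$list"
    rm -f "$list"
}
```

For example, `dedupe_report . removed.csv` only writes the report, while `dedupe_report . removed.csv yes` also deletes the files named in the first column. Within each group of duplicates, the file that sorts first is the one kept.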

bigearsbilly

10-27-2006 04:08 AM

FYI :)

Code:

while read line
do
        sum=$(echo $line |cut -d" " -f1)
        file=$(echo $line |cut -d" " -f2)

This can be done more neatly without 'cut':

Code:

while read sum file
do
      ....
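
To show what read does with two variable names, here is a small sketch. The function split_md5_line is a made-up name for illustration, and the sample line imitates md5sum output (checksum, two spaces, path). The first word goes into the first variable and the rest of the line, spaces included, goes into the second.

```shell
#!/bin/bash
# Hypothetical helper: split one md5sum-style line into checksum and path
# using read alone, then print them separated by "|".
split_md5_line() {
    local sum file
    # -r stops read from treating backslashes as escape characters.
    read -r sum file <<EOF
$1
EOF
    printf '%s|%s\n' "$sum" "$file"
}

split_md5_line "d41d8cd98f00b204e9800998ecf8427e  ./My Song.mp3"
# prints: d41d8cd98f00b204e9800998ecf8427e|./My Song.mp3
```

Unlike the cut version, this keeps a filename with spaces in it intact.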

bigearsbilly

10-27-2006 04:15 AM

I have a Perl script that does a similar job to the bash one but recurses down the directory tree.
It produces a list by comparing the cksums of the files.

With MP3s, though, I would think that if the ID3 tags are different then naturally
the cksums will be different too.

See this script: http://www.linuxquestions.org/questi...94#post2343194
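
On the ID3 point, one possible workaround is to checksum the file without its tag. The sketch below is not from the linked script. It assumes GNU coreutils (head -c with a negative count) and only handles an ID3v1 tag, which is a 128-byte block at the end of the file beginning with the letters TAG. ID3v2 tags at the start of the file are not handled. The name audio_md5 is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: print the md5sum of a file, ignoring a trailing
# ID3v1 tag (last 128 bytes, starting with "TAG") when one is present.
audio_md5() {
    local f="$1"
    # tr strips NUL bytes so the shell comparison sees plain text.
    if [ "$(tail -c 128 "$f" | head -c 3 | tr -d '\0')" = "TAG" ]; then
        # GNU head: a negative count means "all but the last N bytes".
        head -c -128 "$f" | md5sum | cut -d' ' -f1
    else
        md5sum < "$f" | cut -d' ' -f1
    fi
}
```

Two copies of a track that differ only in their ID3v1 tag should then produce the same audio_md5 value even though plain md5sum or cksum reports them as different.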

All times are GMT -5.