[SOLVED] BASH Script batch delete dup file in same dir
MODDED:
I took it to three loops instead of two, to try to keep the one file, have it go through the same dir comparing it to the rest of the files, then move the matched file and go to the next file.
But it is looping through more than once and getting a "file not there" error after the first match and move.
end MOD:
OK, say I've got the same file names, some with spaces between the words and some without.
The only dups in this test bed are the ones I have posted here. The absolute path is
/run/media/userx/3TB-External/Test-Duplecates-files/Xtc/Xtc Live
Code:
userx%slackwhere ⚡ Xtc Live ⚡> ls
Xtc-Are You Receiving Me?.mp3 Xtc-Burning With Optimism's Flame.mp3 Xtc-LifeBeginsAtTheHop.mp3
Xtc-AreYouReceivingMe?.mp3 Xtc-BurningWithOptimism'sFlame.mp3 Xtc-Living Through Another Cuba.mp3
Xtc-Battery Brides.mp3 Xtc-Generals And Majors.mp3 Xtc-Love At First Sight.mp3
Xtc-BatteryBrides.mp3 Xtc-Life Begins At The Hop.mp3 Xtc-LoveAtFirstSight.mp3
userx%slackwhere ⚡ Xtc Live ⚡>
My echo test results:
Code:
File: /run/media/userx/3TB-External/Test-Duplecates-files/Xtc/Xtc Live/Xtc-LoveAtFirstSight.mp3
title: Xtc-LoveAtFirstSight
title2: Xtc-LoveAtFirstSight
most inner Loop
/run/media/userx/3TB-External/Test-Duplecates-files/Xtc/Xtc Live
most inner loop
here - title2: Xtc-LoveAtFirstSight
count: 0
mkdir: created directory '/run/media/userx/3TB-External/Hold-Test-Files/Xtc'
mkdir: created directory '/run/media/userx/3TB-External/Hold-Test-Files/Xtc/Xtc Live'
'/run/media/userx/3TB-External/Test-Duplecates-files/Xtc/Xtc Live/Xtc-LoveAtFirstSight.mp3' -> '/run/media/userx/3TB-External/Hold-Test-Files/Xtc/Xtc Live/Xtc-LoveAtFirstSight.mp3'
File: /run/media/userx/3TB-External/Test-Duplecates-files/Xtc/Xtc Live/Xtc-Life Begins At The Hop.mp3
title: Xtc-LoveAtFirstSight
title2: Xtc-LifeBeginsAtTheHop
------then it loops again and no file of course --
most inner Loop
/run/media/userx/3TB-External/Test-Duplecates-files/Xtc/Xtc Live
most inner loop
here - title2: Xtc-LoveAtFirstSight
count: 0
mv: cannot stat '/run/media/userx/3TB-External/Test-Duplecates-files/Xtc/Xtc Live/Xtc-LoveAtFirstSight.mp3': No such file or directory
File: /run/media/userx/3TB-External/Test-Duplecates-files/Xtc/Xtc Live/Xtc-Burning With Optimism's Flame.mp3
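The "cannot stat" error in the output above is what you get when a later iteration tries to move a file that an earlier match already moved. One common fix is to check that the source still exists before calling mv. A minimal sketch, using a hypothetical helper name (`move_if_present` is not from the thread):

```shell
#!/bin/bash
# move_if_present SRC DEST -- hypothetical helper: only attempt the move
# when the source file still exists, so a file moved on an earlier pass
# is skipped quietly instead of making mv fail.
move_if_present() {
    local src=$1 dest=$2
    [ -e "$src" ] || return 0   # already moved earlier; nothing to do
    mv -v -- "$src" "$dest"
}
```

Dropping the same `[ -e "$file" ] || continue` test at the top of the inner loop body has the same effect without a helper function.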
It looks like it is not grabbing the actual dups like I want (the ones without spaces).
How do I search through multiple directories/subdirectories looking for such creatures:
take one file, then compare it against the other files looking for that type of match,
then just remove the file without the spaces, keep the other one, move on to the next file, and look through everything doing the same?
loop through file names reading fileone
loop through file names reading filetwo
compare fileone and filetwo
if not equal
run instructions to remove spaces from fileone
compare fileone and filetwo
if equal
move filetwo
**BREAK**???
end if
end if
end loop
end loop
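The pseudocode above could be sketched in bash roughly like this. It assumes all files live in one directory; `dedup_dir` is a hypothetical name, and the paths are placeholders for your real source and hold directories. It keeps the spaced name and moves the space-free duplicate away:

```shell
#!/bin/bash
# dedup_dir SRCDIR HOLDDIR -- hypothetical sketch of the two-loop idea:
# for each spaced name, look for its space-free twin in the same dir
# and move that twin to the hold directory.
dedup_dir() {
    local srcdir=$1 holddir=$2 fileone filetwo one two squeezed
    mkdir -p "$holddir"
    for fileone in "$srcdir"/*.mp3; do
        [ -e "$fileone" ] || continue          # may already have been moved
        one=${fileone##*/}
        squeezed=${one// /}                    # fileone's name, spaces removed
        [ "$squeezed" = "$one" ] && continue   # no spaces: this IS the short form
        for filetwo in "$srcdir"/*.mp3; do
            two=${filetwo##*/}
            [ "$two" = "$one" ] && continue    # don't compare a file with itself
            if [ "$two" = "$squeezed" ]; then
                mv -v -- "$filetwo" "$holddir/" # move the space-free duplicate
                break                           # the **BREAK**: one match per file
            fi
        done
    done
}
```

Because the comparison uses `${one// /}` on the spaced name, a plain string compare works even though the two names differ by spaces.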
If this works as expected, you may want to break out of the loop where **BREAK**??? is, to avoid scanning all files after a successful move. This will significantly speed up processing if there are many files in the directory.
I'm a simple soul, so I like simple answers.
If you are sure all the mp3s that have CamelCase names are what you want to delete, add a regex to that effect to the find, and delete them at the same time.
KISS.
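A sketch of that KISS approach, assuming every duplicate is an .mp3 whose name has no spaces but does contain a lowercase-then-uppercase pair (CamelCase). The helper name is hypothetical, and it only prints candidates so nothing is deleted until the list has been checked:

```shell
#!/bin/bash
# list_camelcase_mp3s ROOT -- hypothetical helper: print .mp3 files whose
# basename contains no spaces but has a CamelCase boundary.
# -regextype posix-extended is GNU find.
list_camelcase_mp3s() {
    find "$1" -type f -name '*.mp3' ! -name '* *' \
         -regextype posix-extended -regex '.*/[^/]*[a-z][A-Z][^/]*\.mp3'
}
# Echo first; switch find's action to -delete only once the list looks right:
# list_camelcase_mp3s '/run/media/userx/3TB-External/Test-Duplecates-files'
```

Note this finds every CamelCase name, whether or not a spaced twin actually exists, which is why printing before deleting matters.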
I doubt md5 on every file would be "faster". If the OP is happy that similarly named files are the same, simply processing the names has to be less processor-intensive.
That should be very fast, and could find names spelled incorrectly. Things like spaces don't matter. I did some testing with soundex some time ago, and it worked well as long as the language is English.
I tried it on the list in the first post. It looks like it overgenerates duplicates because of the Xtc prefix.
Quote:
cat list.txt | while read name ; do echo $(bash fuzzy.sh "$name" ; echo "$name") ; done | sort | uniq -d -w 4 -D
X321 Xtc-Battery Brides.mp3
X321 Xtc-BatteryBrides.mp3
X321 Xtc-Burning With Optimism's Flame.mp3
X321 Xtc-BurningWithOptimism'sFlame.mp3
X324 Xtc-Life Begins At The Hop.mp3
X324 Xtc-LifeBeginsAtTheHop.mp3
X324 Xtc-Living Through Another Cuba.mp3
X324 Xtc-Love At First Sight.mp3
X324 Xtc-LoveAtFirstSight.mp3
X326 Xtc-Are You Receiving Me?.mp3
X326 Xtc-AreYouReceivingMe?.mp3
So then I cut off the "Xtc-" prefix, and it works well:
Quote:
cut -d"-" -f2 list.txt| while read name ; do echo $(bash fuzzy.sh "$name" ; echo "$name") ; done | sort | uniq -d -w 4 -D
A662 Are You Receiving Me?.mp3
A662 AreYouReceivingMe?.mp3
B361 Battery Brides.mp3
B361 BatteryBrides.mp3
B655 Burning With Optimism's Flame.mp3
B655 BurningWithOptimism'sFlame.mp3
L112 Life Begins At The Hop.mp3
L112 LifeBeginsAtTheHop.mp3
L131 Love At First Sight.mp3
L131 LoveAtFirstSight.mp3
Yes, there are a bunch of duplicate finders (I prefer duff), but this is a bit different. The names (only) appear to matter.
I really like this soundex idea ...
Quote:
loop through file names reading fileone
loop through file names reading filetwo
compare fileone and filetwo
if not equal
run instructions to remove spaces from fileone
compare fileone and filetwo
if equal
move filetwo
**BREAK**???
end if
end if
end loop
end loop
If this works as expected, you may want to break out of the loop where **BREAK**??? is, to avoid scanning all files after a successful move. This will significantly speed up processing if there are many files in the directory.
That is what I started out with: 2 loops, and too many test files. I tried a break, but that was yesterday's 3-loop nonsense. Now that I'll have had some sleep, my brain might want to work better as well.
That bold part is another 'part' of the equation, to ensure I get the one file without the spaces, because a straight string compare will not work due to the spaces.
The thing I've got to look at is keeping the search within that same directory, and only that dir, not any other dir:
grab one file in a directory, then compare it to every file within that same dir; if it matches, move or delete the match; get the next file and repeat; when done with that dir, move to the next directory and repeat the entire process.
I'm going to cut back on my test files so I will not have so much to look at until I get this figured out.
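That directory-by-directory pass could be sketched as below. Both function names are hypothetical; the worker only prints the space-free twin it finds in each directory, leaving the move/delete decision for later:

```shell
#!/bin/bash
# dedup_one_dir DIR -- hypothetical worker: for each spaced .mp3 name in
# DIR, report a space-free twin in that SAME directory as a duplicate.
dedup_one_dir() {
    local dir=$1 f name squeezed
    for f in "$dir"/*.mp3; do
        [ -e "$f" ] || continue
        name=${f##*/}
        squeezed=${name// /}
        if [ "$squeezed" != "$name" ] && [ -e "$dir/$squeezed" ]; then
            printf '%s\n' "$dir/$squeezed"
        fi
    done
}

# walk_dirs PARENT -- run the worker once in every directory under PARENT,
# so comparisons never cross directory boundaries.
walk_dirs() {
    find "$1" -type d -print0 |
    while IFS= read -r -d '' dir; do
        dedup_one_dir "$dir"
    done
}
```

Because `dedup_one_dir` gets one directory at a time, a match in `Xtc/Xtc Live` can never be confused with a same-named file elsewhere in the tree.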
I'm a simple soul, so I like simple answers.
If you are sure all the mp3 that have CamelCase names are what you want to delete, add regex to that effect to the find, and delete them at the same time.
KISS.
Might behoove you to echo them first, just in case ...
This is, I think, a prior batch where I lowercased everything, and when I redid my script I wasn't checking really, really closely. They must have had the spaces removed and then been made into one file name without spaces. That is why one is spaced and the other one is not, but not all 14,000-plus files are like this. All are kept within one parent directory.