LinuxQuestions.org
Old 05-31-2017, 04:56 PM   #1
BW-userx
Senior Member
 
Registered: Sep 2013
Location: MID-SOUTH USA
Distribution: Void Linux / Slackware 14.2
Posts: 4,677

Rep: Reputation: 865
BASH Script batch delete dup file in same dir


MODDED:
I went from two loops to three, trying to hold on to one file and compare it against the rest of the files in the same directory, then move the matching file and go on to the next file.

But it loops through more than once, and I get a "No such file or directory" error after the first match and move.
END MOD

OK, say I have files with the same names, some with spaces between the words and some without.

The only dups in this test bed are the ones I have posted here. The absolute path is

/run/media/userx/3TB-External/Test-Duplecates-files/Xtc/Xtc Live


Code:
userx%slackwhere ⚡ Xtc Live ⚡> ls
Xtc-Are You Receiving Me?.mp3  Xtc-Burning With Optimism's Flame.mp3  Xtc-LifeBeginsAtTheHop.mp3         
Xtc-AreYouReceivingMe?.mp3     Xtc-BurningWithOptimism'sFlame.mp3     Xtc-Living Through Another Cuba.mp3
Xtc-Battery Brides.mp3         Xtc-Generals And Majors.mp3            Xtc-Love At First Sight.mp3        
Xtc-BatteryBrides.mp3          Xtc-Life Begins At The Hop.mp3         Xtc-LoveAtFirstSight.mp3           
userx%slackwhere ⚡ Xtc Live ⚡>
My echo test results:
Code:
File: /run/media/userx/3TB-External/Test-Duplecates-files/Xtc/Xtc Live/Xtc-LoveAtFirstSight.mp3
title: Xtc-LoveAtFirstSight
title2: Xtc-LoveAtFirstSight
most inner Loop
/run/media/userx/3TB-External/Test-Duplecates-files/Xtc/Xtc Live
most inner loop
here - title2: Xtc-LoveAtFirstSight
count: 0
mkdir: created directory '/run/media/userx/3TB-External/Hold-Test-Files/Xtc'
mkdir: created directory '/run/media/userx/3TB-External/Hold-Test-Files/Xtc/Xtc Live'
'/run/media/userx/3TB-External/Test-Duplecates-files/Xtc/Xtc Live/Xtc-LoveAtFirstSight.mp3' -> '/run/media/userx/3TB-External/Hold-Test-Files/Xtc/Xtc Live/Xtc-LoveAtFirstSight.mp3'
File: /run/media/userx/3TB-External/Test-Duplecates-files/Xtc/Xtc Live/Xtc-Life Begins At The Hop.mp3
title: Xtc-LoveAtFirstSight
title2: Xtc-LifeBeginsAtTheHop
------then it loops again and no file of course --

most inner Loop
/run/media/userx/3TB-External/Test-Duplecates-files/Xtc/Xtc Live
most inner loop
here - title2: Xtc-LoveAtFirstSight
count: 0
mv: cannot stat '/run/media/userx/3TB-External/Test-Duplecates-files/Xtc/Xtc Live/Xtc-LoveAtFirstSight.mp3': No such file or directory
File: /run/media/userx/3TB-External/Test-Duplecates-files/Xtc/Xtc Live/Xtc-Burning With Optimism's Flame.mp3
It looks like it is not grabbing the actual dups I want (the ones without spaces).

How do I search through multiple directories and subdirectories looking for such creatures?

Take one file, then compare it against the other files looking for that type of match?

Then just remove the file without the spaces, keep the other one, move on to the next file, and look through everything doing the same?

Code:
#!/bin/bash

working_dir=/run/media/userx/3TB-External/Test-Duplecates-files

move_to=/run/media/userx/3TB-External/Hold-Test-Files



count=0

while read FILENAME
do
{


f=$FILENAME
path=${f%/*}
xfile=${f##*/}
title=${xfile%.*}
ext=${xfile##*.}

echo "outter Loop: "
echo "$path"
echo "outter Loop"

while read FILE
do
{

f=$FILE
path=${f%/*}
xfile=${f##*/}
title2=${xfile%.*}
ext=${xfile##*.}

#title=${title// /}

echo "inner Loop:"
echo "$path"
echo "inner loop"



while read FILE
do
{
f=$FILE
path=${f%/*}
xfile=${f##*/}
title2=${xfile%.*}
ext=${xfile##*.}

title2=${title2// /}

echo "File: $FILE"
echo "title: $title"
echo "title2: $title2"
echo "most inner Loop"
echo "$path"
echo "most inner loop"

[[ "$title" =~ "$title2" ]] &&  (

echo "here - title2: $title2"

echo "count: $((count++))"


echo "File: $FILE" >> CountFile
echo "title: $title" >> CountFile
echo "title2: $title2" >> CountFile
echo "most inner Loop" >> CountFile
echo "$path" >> CountFile
echo "most inner loop" >> CountFile
echo "here - title2: $title2" >> CountFile
echo "count: $((count++))"  >> CountFile
echo  >> CountFile

#keep sub dir structor to mimic same from in move_to dir
getRidOf=${path/$working_dir/$move_to}

mkdir -pv "$getRidOf"
mv -v "$path/$title2.$ext" "$getRidOf"

sleep 1

)

}
done< <(find "$path" -type f \( -name "*.mp3" -o -name "*.MP3" \))
}
done< <(find "$path" -type f \( -name "*.mp3" -o -name "*.MP3" \))
}
done< <(find "$working_dir" -type f \( -name "*.mp3" -o -name "*.MP3" \))
I don't know - it is a mess.

I think my loop logic is too loopy for this to work like that. Any suggestions?
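
One detail in the script above may be worth isolating. The block after `&&` is wrapped in `( ... )`, which bash runs in a subshell, so `count++` never reaches the parent shell. That would be consistent with `count: 0` appearing twice in the echo output. A minimal sketch of the difference between `( ... )` and `{ ...; }` (the names `after_subshell` and `after_group` are made up for illustration):

```bash
#!/bin/bash
count=0

# Parentheses start a subshell: the increment happens in a copy and is lost.
true && ( count=$((count + 1)) )
after_subshell=$count        # still 0

# Braces group commands in the current shell: the increment sticks.
true && { count=$((count + 1)); }
after_group=$count           # now 1

echo "after ( ): $after_subshell, after { }: $after_group"
```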

Last edited by BW-userx; 05-31-2017 at 08:09 PM.
 
Old 06-01-2017, 12:44 AM   #2
rdgreenlaw
Member
 
Registered: May 2007
Location: Newport, Maine, USA
Distribution: Debian 8.7
Posts: 72

Rep: Reputation: 18
Here are a few suggestions that might help...

(This is pseudocode only)
Code:
loop through file names reading fileone
  loop through file names reading filetwo
    compare fileone and filetwo
      if not equal
         run instructions to remove spaces from fileone
         compare fileone and filetwo
           if equal 
             move filetwo 
             **BREAK**???
           end if
      end if
  end loop
end loop
If this works as expected, you may want to break out of the loop where **BREAK**??? is, to avoid scanning the remaining files after a successful move. This will significantly speed up processing if there are many files in the directory.
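
The pseudocode above could be sketched in bash roughly as follows. This is only one reading of it: the function name `list_nospace_matches` is made up, it prints the file that would be moved instead of moving it, and it assumes all candidates sit in a single directory.

```bash
#!/bin/bash
# list_nospace_matches DIR
# For each mp3 whose name contains spaces, print the file in the same
# directory whose name is identical once the spaces are removed.
list_nospace_matches() {
    local dir=$1 one two squeezed
    for one in "$dir"/*.[mM][pP]3; do
        [[ -f $one ]] || continue                 # glob matched nothing
        squeezed=${one##*/}
        squeezed=${squeezed// /}                  # fileone with spaces removed
        [[ $squeezed == "${one##*/}" ]] && continue   # no spaces: it would only match itself
        for two in "$dir"/*.[mM][pP]3; do
            if [[ ${two##*/} == "$squeezed" ]]; then
                printf '%s\n' "$two"              # candidate to move or delete
                break                             # stop scanning after the first hit
            fi
        done
    done
}
```

Skipping names that contain no spaces is what keeps a file from being matched against itself.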
 
Old 06-01-2017, 01:34 AM   #3
syg00
LQ Veteran
 
Registered: Aug 2003
Location: Australia
Distribution: Lots ...
Posts: 15,805

Rep: Reputation: 2168
I'm a simple soul, so I like simple answers.
If you are sure all the mp3s with CamelCase names are the ones you want to delete, add a regex to that effect to the find, and delete them at the same time.
KISS.

It might behoove you to echo them first, just in case ...
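
A minimal sketch of that idea, assuming GNU find (`-regextype` and `-regex` are GNU extensions) and a made-up function name `list_camelcase_mp3`. It only prints the candidates, which is the "echo them first" step:

```bash
#!/bin/bash
# list_camelcase_mp3 DIR
# Print (do not delete) mp3 files whose base name has no spaces and contains
# a lowercase letter directly followed by an uppercase one, e.g. "LoveAt".
list_camelcase_mp3() {
    LC_ALL=C find "$1" -type f -regextype posix-extended \
        -regex '.*/[^/ ]*[a-z][A-Z][^/ ]*\.(mp3|MP3)' -print
}

# Once the printed list looks right, the same expression could end in
# -delete instead of -print.
```

The regex is matched against the whole path, so the `.*/` prefix is what restricts the no-space test to the base name; directories such as "Xtc Live" may still contain spaces.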
 
Old 06-01-2017, 02:49 AM   #4
scasey
Member
 
Registered: Feb 2013
Location: Tucson, AZ, USA
Distribution: CentOS 5.11
Posts: 334

Rep: Reputation: 122
To follow up on syg00's comment: Do you know that all the file names without spaces are duplicates?
 
Old 06-01-2017, 03:32 AM   #5
Guttorm
Senior Member
 
Registered: Dec 2003
Location: Trondheim, Norway
Distribution: Debian and Ubuntu
Posts: 1,293

Rep: Reputation: 335
A faster way to find duplicates. The filename is ignored, and if the md5 is the same, it's considered a duplicate.
Code:
md5sum * | sort | uniq -d -w 32
MD5 collisions exist, but they are very, very unlikely. You could use "diff -q file1 file2" to be 100% sure it's a duplicate.
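
Note that `uniq -d` prints only one line per group of duplicates. A hedged sketch that lists every member of each group and then confirms a pair byte for byte could look like this. It assumes GNU coreutils (`uniq -w` and `-D`), the helper names `list_identical_files` and `confirm_identical` are made up, and `cmp -s` is used here in place of `diff -q`:

```bash
#!/bin/bash
# list_identical_files DIR
# Print "md5  name" for every regular file in DIR whose checksum occurs
# more than once.
list_identical_files() {
    local dir=$1
    (
        cd -- "$dir" || exit 1
        for f in *; do
            [[ -f $f ]] && md5sum -- "$f"
        done | sort | uniq -w 32 -D    # -w 32: compare the hash only; -D: show all members
    )
}

# confirm_identical A B
# Succeeds only if the two files are byte-for-byte identical, which rules
# out a hash collision.
confirm_identical() {
    cmp -s -- "$1" "$2"
}
```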
 
Old 06-01-2017, 03:45 AM   #6
syg00
LQ Veteran
 
Registered: Aug 2003
Location: Australia
Distribution: Lots ...
Posts: 15,805

Rep: Reputation: 2168
I doubt that running md5 on every file would be "faster". If the OP is happy that similarly named files are the same, simply processing the names would have to be less processor intensive.
 
Old 06-01-2017, 04:00 AM   #7
Guttorm
Senior Member
 
Registered: Dec 2003
Location: Trondheim, Norway
Distribution: Debian and Ubuntu
Posts: 1,293

Rep: Reputation: 335
I agree. I just pasted an alias I had for finding duplicates.

Another idea is to use something like soundex on the names instead of md5 of the file contents.

https://gist.github.com/livibetter/1997957

That should be very fast, and could find names spelled incorrectly. Things like spaces don't matter. I did some testing with soundex some time ago, and it worked well as long as the language is English.
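
For readers who have not met soundex: a name is reduced to its first letter plus three digits that stand for groups of similar-sounding consonants. The sketch below is not the script from the gist above; it is a stand-in written from the classic American Soundex rules, with made-up function names `sx_code` and `soundex`, and it assumes bash 4 or later for `${var^^}`.

```bash
#!/bin/bash
# Sketch of classic American Soundex; needs bash 4+ for ${var^^}.

# Set the caller's "code" variable to the Soundex digit for one uppercase letter.
sx_code() {
    case $1 in
        [BFPV])     code=1 ;;
        [CGJKQSXZ]) code=2 ;;
        [DT])       code=3 ;;
        L)          code=4 ;;
        [MN])       code=5 ;;
        R)          code=6 ;;
        *)          code=  ;;   # vowels, H, W and Y carry no digit
    esac
}

# soundex NAME: print the four-character code for NAME.
soundex() {
    local LC_ALL=C
    local s=${1^^} out code prev ch i
    s=${s//[^A-Z]/}                      # drop spaces, digits, punctuation
    [[ -n $s ]] || return 1
    out=${s:0:1}                         # the first letter is kept as-is
    sx_code "$out"; prev=$code
    for (( i = 1; i < ${#s}; i++ )); do
        (( ${#out} >= 4 )) && break
        ch=${s:i:1}
        [[ $ch == [HW] ]] && continue    # H and W do not break a run of equal digits
        sx_code "$ch"
        [[ -n $code && $code != "$prev" ]] && out+=$code
        prev=$code                       # a vowel resets prev, so a repeat is coded again
    done
    out+=000                             # pad short names with zeros
    printf '%s\n' "${out:0:4}"
}
```

Because spaces are dropped before coding, `soundex "Battery Brides"` and `soundex "BatteryBrides"` both print `B361`.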
 
Old 06-01-2017, 04:43 AM   #8
syg00
LQ Veteran
 
Registered: Aug 2003
Location: Australia
Distribution: Lots ...
Posts: 15,805

Rep: Reputation: 2168
What a great idea!! - Love it.

I can remember testing soundex in one of our government departments - must have been back in the late 70's. Bet things have improved since then ...
 
Old 06-01-2017, 04:43 AM   #9
Guttorm
Senior Member
 
Registered: Dec 2003
Location: Trondheim, Norway
Distribution: Debian and Ubuntu
Posts: 1,293

Rep: Reputation: 335
I tried the fuzzy.sh script on filenames in a few directories, and it looks like it's maybe a bit too "fuzzy"?

Code:
ls -1 | while read name ; do echo $(bash fuzzy.sh "$name" ; echo "$name") ; done | sort | uniq -d -w 4 -D
 
Old 06-01-2017, 05:05 AM   #10
Guttorm
Senior Member
 
Registered: Dec 2003
Location: Trondheim, Norway
Distribution: Debian and Ubuntu
Posts: 1,293

Rep: Reputation: 335
I tried it on the list in the first post. It looks like it overgenerates duplicates because of the Xtc prefix.

Quote:
cat list.txt | while read name ; do echo $(bash fuzzy.sh "$name" ; echo "$name") ; done | sort | uniq -d -w 4 -D
X321 Xtc-Battery Brides.mp3
X321 Xtc-BatteryBrides.mp3
X321 Xtc-Burning With Optimism's Flame.mp3
X321 Xtc-BurningWithOptimism'sFlame.mp3
X324 Xtc-Life Begins At The Hop.mp3
X324 Xtc-LifeBeginsAtTheHop.mp3
X324 Xtc-Living Through Another Cuba.mp3
X324 Xtc-Love At First Sight.mp3
X324 Xtc-LoveAtFirstSight.mp3
X326 Xtc-Are You Receiving Me?.mp3
X326 Xtc-AreYouReceivingMe?.mp3
So then I cut off the "Xtc-" prefix, and it works well:

Quote:
cut -d"-" -f2 list.txt| while read name ; do echo $(bash fuzzy.sh "$name" ; echo "$name") ; done | sort | uniq -d -w 4 -D
A662 Are You Receiving Me?.mp3
A662 AreYouReceivingMe?.mp3
B361 Battery Brides.mp3
B361 BatteryBrides.mp3
B655 Burning With Optimism's Flame.mp3
B655 BurningWithOptimism'sFlame.mp3
L112 Life Begins At The Hop.mp3
L112 LifeBeginsAtTheHop.mp3
L131 Love At First Sight.mp3
L131 LoveAtFirstSight.mp3
 
Old 06-01-2017, 05:13 AM   #11
syg00
LQ Veteran
 
Registered: Aug 2003
Location: Australia
Distribution: Lots ...
Posts: 15,805

Rep: Reputation: 2168
Nice - you were a few steps ahead of me.
 
Old 06-01-2017, 05:41 AM   #12
pan64
LQ Guru
 
Registered: Mar 2012
Location: Hungary
Distribution: debian/ubuntu/suse ...
Posts: 9,570

Rep: Reputation: 2811
http://doubles.sourceforge.net/
 
Old 06-01-2017, 06:11 AM   #13
syg00
LQ Veteran
 
Registered: Aug 2003
Location: Australia
Distribution: Lots ...
Posts: 15,805

Rep: Reputation: 2168
Yes, there are a bunch of duplicate finders (I prefer duff), but this is a bit different. The names (only) appear to matter.
I really like this soundex idea ...
 
Old 06-01-2017, 06:22 AM   #14
BW-userx
Senior Member
 
Registered: Sep 2013
Location: MID-SOUTH USA
Distribution: Void Linux / Slackware 14.2
Posts: 4,677

Original Poster
Rep: Reputation: 865
Quote:
Originally Posted by rdgreenlaw
Here are a few suggestions that might help...

(This is pseudocode only)
Code:
loop through file names reading fileone
  loop through file names reading filetwo
    compare fileone and filetwo
      if not equal
         run instructions to remove spaces from fileone
         compare fileone and filetwo
           if equal 
             move filetwo 
             **BREAK**???
           end if
      end if
  end loop
end loop
If this works as expected, you may want to break out of the loop where **BREAK**??? is, to avoid scanning the remaining files after a successful move. This will significantly speed up processing if there are many files in the directory.
That is what I started out with: two loops, and too many test files. I tried a break, but that was in yesterday's three-loop nonsense. Now that I've had some sleep, my brain might work better as well.

That bold part is another 'part' of the equation, to ensure I get the one file without the spaces, because a plain string compare will not work due to the spaces.

The thing I have to look at is keeping the search within that same directory, and only that directory, not any other.

Grab one file in a directory, then compare it to every file within that same dir; if there is a match, move or delete the match; get the next file and repeat. When done with that dir, move to the next directory and repeat the entire process again.
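
The per-directory process described above can be sketched without nested loops: for each spaced name, build the no-space name and test whether that file exists in the same directory. This is a rough sketch, not a drop-in replacement for the script in the first post. The function name `move_nospace_twins` and its third `dry_run` argument are made up, and it assumes a twin differs from the original only by its spaces.

```bash
#!/bin/bash
# move_nospace_twins WORKING_DIR MOVE_TO [dry_run]
# For every mp3 whose name contains a space, look in the SAME directory for a
# twin whose name is identical once the spaces are removed, and move the twin
# into a mirrored tree under MOVE_TO. With dry_run=yes (the default) it only
# reports what it would do.
move_nospace_twins() {
    local working_dir=$1 move_to=$2 dry_run=${3:-yes}
    local file dir name twin dest
    while IFS= read -r -d '' file; do
        dir=${file%/*}
        name=${file##*/}
        twin=$dir/${name// /}               # strip spaces from the base name only
        [[ -f $twin ]] || continue          # no twin in this directory
        dest=$move_to${dir#"$working_dir"}  # mirror the subdirectory structure
        if [[ $dry_run == yes ]]; then
            printf 'would move: %s -> %s/\n' "$twin" "$dest"
        else
            mkdir -p -- "$dest" && mv -- "$twin" "$dest/"
        fi
    done < <(find "$working_dir" -type f -iname '* *.mp3' -print0)
}
```

Only names containing a space are fed into the loop, so a file can never match itself, and each file is looked up once instead of being compared against every other file.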


I'm going to cut back on my test files so I will not have so much to look at until I get this figured out.

thanks
 
Old 06-01-2017, 06:25 AM   #15
BW-userx
Senior Member
 
Registered: Sep 2013
Location: MID-SOUTH USA
Distribution: Void Linux / Slackware 14.2
Posts: 4,677

Original Poster
Rep: Reputation: 865
Quote:
Originally Posted by syg00
I'm a simple soul, so I like simple answers.
If you are sure all the mp3s with CamelCase names are the ones you want to delete, add a regex to that effect to the find, and delete them at the same time.
KISS.

It might behoove you to echo them first, just in case ...
I think this is from a prior batch where I lowercased everything, and then I redid my script without checking really closely. The names must have had their spaces removed and been written out as one file name without spaces. That is why one is spaced and the other is not, but not all 14,000-plus files are like this. All are kept within one parent directory.
 
  


All times are GMT -5.
