LinuxQuestions.org


fillemon 10-08-2018 04:31 PM

find all the same named directories on a hard disk
 
Hello,

My hard disk has a great number of identically named directories scattered all over the place. Most of the time they hold the same data. I don't want to use a compare tool to go through each file... at least not at this time. I would just like to know where the same-named directories are, and merge them together.

I haven't found a tool that can do this out of the box. I guess it has to be done by scripting, and I'm not a scripting wizard. I think it might be interesting to have a script that compares one directory against the rest of the hard disk?

Can anybody point me to a solution? For example, which command should I use? I thought of the find command.

I think this might help me... but then I would need to check what percentage of the content is the same, and if a directory holds a large percentage of the same files, I would like to merge it.
Code:

find /disk -type d -name "Documents"



Thank you very much.
Kind regards

scasey 10-08-2018 04:36 PM

Yup. find will do that for you with minimal (if any) scripting. See man find
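
For example, one possible run (the 2>/dev/null just hides "permission denied" noise when scanning a whole disk):
Code:

find /disk -type d -name "Documents" 2>/dev/null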

Turbocapitalist 10-08-2018 11:25 PM

You might look at fslint or fdupes.

Or if you work with find you can use -exec to make a checksum for each file and pipe that into sort and then into uniq. With the right option(s) uniq will show duplicates.
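
For instance, a minimal fdupes run could look like this (assuming it is installed; -r recurses into subdirectories):
Code:

fdupes -r /disk

It prints groups of files whose content is identical; adding -d will additionally prompt you to choose which copy to keep in each group.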

fatmac 10-09-2018 05:07 AM

Maybe run tree & pipe it to a file to peruse later using any text program, (cat/less/more/vi/nano/etc).
(Then you could just copy them all into one directory.)

Code:

tree /home > textfile
cat textfile | grep pdf > file2


l0f4r0 10-09-2018 07:40 AM

Quote:

Originally Posted by fatmac (Post 5912719)
Maybe run tree & pipe it to a file to peruse later using any text program, (cat/less/more/vi/nano/etc).
(Then you could just copy them all into one directory.)
Code:

tree /home > textfile
cat textfile | grep pdf > file2


If I'm correct, your method is manual (grepping for strings) and therefore tedious to carry out.
And cat is unnecessary (--> grep 'pdf' textfile) ;)

@fillemon:
IMO, the simplest approach is Turbocapitalist's.
I would suggest something like:
Code:

find /disk -type d -iname '*documents*' -exec basename {} \; | sort | uniq -d
It will give you the names of the duplicate folders that contain at least "documents" in their name (case insensitive). If need be, you can then run find on each folder name found to get the paths to them:
Code:

find /disk -type d -name 'exact_folder_name'
However, I'm a bit puzzled by your original post, because you indicated:
Code:

find /disk -type d -name "Documents"
As is, it implies that you already know what to search for (directories exactly named "Documents"). So this command already gives you what you want...
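
Putting the two steps together, a rough sketch could look like this (it assumes GNU find, whose -printf '%f\n' prints just the directory name):
Code:

# list directory names that occur more than once, then show every path using each name
find /disk -type d -printf '%f\n' | sort | uniq -d | while read -r name; do
    echo "== $name =="
    find /disk -type d -name "$name"
done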

Turbocapitalist 10-09-2018 07:47 AM

Quote:

Originally Posted by l0f4r0 (Post 5912760)
I would suggest something like:

I was thinking something more along these lines to compare the content and not the file names:

Code:

find /dir01/ /dir02/ -type f -exec md5sum {} \; \
| sort | uniq --check-chars=32 -D

Other hash algorithms could be used instead, but there's a rather low chance of an accidental MD5 collision.
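
The same pipeline works with a stronger hash if you prefer; with sha256sum the only change is the hash width (64 characters instead of 32):
Code:

find /dir01/ /dir02/ -type f -exec sha256sum {} + \
| sort | uniq --check-chars=64 --all-repeated=separate

Using {} + instead of {} \; also batches many files into each checksum call, which is noticeably faster on large trees.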

syg00 10-09-2018 07:54 AM

Quote:

Originally Posted by fillemon (Post 5912564)
i don't wanna use a compare tool to go through each file... at least not at this time.

Given that the requirement is just names, I'd use locate - much less hammering of the hardware until the OP actually decides what they want to do.

Of course that may return only a subset, depending on prune options, but it should be pretty complete for most use cases.
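
A minimal sketch with locate (assuming mlocate or plocate and an up-to-date database; -r takes a regular expression, so this matches paths whose last component is exactly "Documents"):
Code:

sudo updatedb
locate -r '/Documents$'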

l0f4r0 10-09-2018 07:58 AM

Quote:

Originally Posted by Turbocapitalist (Post 5912762)
I was thinking something more along these lines to compare the content and not the file names:
Code:

find /dir01/ /dir02/ -type f -exec md5sum {} \; \
| sort | uniq --check-chars=32 -D


Okay but you are searching for files and not directories ;)

@fillemon: can you be more specific, please? Do you want to search for folders that have the same names, or for duplicate files?

BW-userx 10-09-2018 08:31 AM

You might have a two-parter here: first find the duplicate directories, then use that result to search for duplicate files inside the directories found. That goes back to @Turbocapitalist's suggestion of using fslint or fdupes, perhaps along with find.

or something "crazy" like this, NOT Completely working code, the theory is there, but needs work. (I'm not feeling like setting up a dup directories, and dup files, to test this code until I get it to work properly)
Code:

#!/bin/bash

#set -x

count1=0
count2=0

working_dir1=/run/media/userx/3TB-External
working_dir2=/media/ntfs1

# outer loop: walk every directory under the first drive
while read -r d1 ; do

        echo "outer loop $count1"
        echo "$d1"
        # inner loop: walk every directory under the second drive
        while read -r d2 ; do
                echo "inner loop $count2"
                echo "$d2"
                # compare the directory names (basenames), not the full paths
                if [[ "$(basename "$d1")" == "$(basename "$d2")" ]]; then
                        echo "match: $d1 <-> $d2"
                        break   # stop scanning working_dir2, move on to the next d1
                fi
                ((count2++))
        done <<<"$(find "$working_dir2" -type d)"

        count2=0
        ((count1++))
done <<<"$(find "$working_dir1" -type d)"

You could even extend it to move all of the files into one central location, then delete the directories they came from to clean up after everything has been moved out of them.
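
A rough sketch of that merge step, using a hypothetical target directory /merged (not from the thread); --backup=numbered keeps both copies when two files share a name instead of overwriting one:
Code:

# hypothetical helper: copy one matched directory into /merged, then remove the original
merge_into() {
    src="$1"
    dest="/merged/$(basename "$src")"
    mkdir -p "$dest"
    cp -a --backup=numbered "$src"/. "$dest"/ && rm -rf "$src"
}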

fatmac 10-09-2018 09:14 AM

@ l0f4r0

Just trying to keep things simple, as it is the OP's first post. :)

