LinuxQuestions.org


fillemon 10-08-2018 04:31 PM

find all the same named directories on a hard disk
 
Hello,

My hard disk has a great number of identically named directories scattered all over the place. Most of the time they hold the same data. I don't want to use a compare tool to go through each file... at least not at this time. I would just like to know where the same-named directories are, and merge them together.

I haven't found a tool that can do this out of the box. I guess it has to be done by scripting, and I'm not a scripting wizard. I think it might be interesting to have a script that compares one directory against the rest of the hard disk?

Can anybody point me to a solution? For example, which command should I use? I thought of the find command.

I think this might help me... but then I would need to check what percentage of the content is the same, and if a directory holds a large percentage of the same files, I would like to merge it.
Code:

find /disk -type d -name "Documents"



Thank you very much.
Kind regards

scasey 10-08-2018 04:36 PM

Yup. find will do that for you with minimal (if any) scripting. See man find
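
For example, one possible run (the 2>/dev/null just hides "permission denied" noise when scanning a whole disk):
Code:

find /disk -type d -name "Documents" 2>/dev/null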

Turbocapitalist 10-08-2018 11:25 PM

You might look at fslint or fdupes.

Or if you work with find you can use -exec to make a checksum for each file and pipe that into sort and then into uniq. With the right option(s) uniq will show duplicates.
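
For instance, a minimal fdupes run could look like this (assuming it is installed; -r recurses into subdirectories):
Code:

fdupes -r /disk

It prints groups of files whose content is identical; adding -d will additionally prompt you to choose which copy to keep in each group.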

fatmac 10-09-2018 05:07 AM

Maybe run tree & pipe it to a file to peruse later using any text program, (cat/less/more/vi/nano/etc).
(Then you could just copy them all into one directory.)

Code:

tree /home > textfile
cat textfile | grep pdf > file2


l0f4r0 10-09-2018 07:40 AM

Quote:

Originally Posted by fatmac (Post 5912719)
Maybe run tree & pipe it to a file to peruse later using any text program, (cat/less/more/vi/nano/etc).
(Then you could just copy them all into one directory.)
Code:

tree /home > textfile
cat textfile | grep pdf > file2


If I'm correct, your method is manual (grepping for strings) and therefore tedious to carry out.
And cat is unnecessary (--> grep 'pdf' textfile) ;)

@fillemon:
IMO, the simplest approach is Turbocapitalist's.
I would suggest something like:
Code:

find /disk -type d -iname '*documents*' -exec basename {} \; | sort | uniq -d
It will give you the names of the duplicate folders that contain at least "documents" in their name (case insensitive). If need be, you can then run find on each folder name found to get the paths to them:
Code:

find /disk -type d -name 'exact_folder_name'
However, I'm a bit puzzled by your original post, because you indicated:
Code:

find /disk -type d -name "Documents"
As is, it implies that you already know what to search for (directories exactly named "Documents"). So this command already gives you what you want...
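
Putting the two steps together, a rough sketch could look like this (it assumes GNU find, whose -printf '%f\n' prints just the directory name):
Code:

# list directory names that occur more than once, then show every path using each name
find /disk -type d -printf '%f\n' | sort | uniq -d | while read -r name; do
    echo "== $name =="
    find /disk -type d -name "$name"
done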

Turbocapitalist 10-09-2018 07:47 AM

Quote:

Originally Posted by l0f4r0 (Post 5912760)
I would suggest something like:

I was thinking something more along these lines to compare the content and not the file names:

Code:

find /dir01/ /dir02/ -type f -exec md5sum {} \; \
| sort | uniq --check-chars=32 -D

Other hash algorithms could be used instead, but there's a rather low chance of an accidental MD5 collision.
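
The same pipeline works with a stronger hash if you prefer; with sha256sum the only change is the hash width (64 characters instead of 32):
Code:

find /dir01/ /dir02/ -type f -exec sha256sum {} + \
| sort | uniq --check-chars=64 --all-repeated=separate

Using {} + instead of {} \; also batches many files into each checksum call, which is noticeably faster on large trees.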

syg00 10-09-2018 07:54 AM

Quote:

Originally Posted by fillemon (Post 5912564)
i don't wanna use a compare tool to go through each file... at least not at this time.

Given that the requirement is just names, I'd use locate - much less hammering of the hardware until the OP actually decides what they want to do.

Of course that may return only a subset, depending on prune options, but it should be pretty complete for most use cases.
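
A minimal sketch with locate (assuming mlocate or plocate and an up-to-date database; -r takes a regular expression, so this matches paths whose last component is exactly "Documents"):
Code:

sudo updatedb
locate -r '/Documents$'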

l0f4r0 10-09-2018 07:58 AM

Quote:

Originally Posted by Turbocapitalist (Post 5912762)
I was thinking something more along these lines to compare the content and not the file names:
Code:

find /dir01/ /dir02/ -type f -exec md5sum {} \; \
| sort | uniq --check-chars=32 -D


Okay but you are searching for files and not directories ;)

@fillemon: can you be more specific, please? Do you want to search for folders that have the same names, or for duplicate files?

BW-userx 10-09-2018 08:31 AM

You might have a two-parter here: first find the duplicate directories, then use that result to search for duplicate files inside the directories found. That goes back to @Turbocapitalist's suggestion of using fslint or fdupes, perhaps along with find.

or something "crazy" like this, NOT Completely working code, the theory is there, but needs work. (I'm not feeling like setting up a dup directories, and dup files, to test this code until I get it to work properly)
Code:

#!/bin/bash

#set -x

count1=0
count2=0

working_dir1=/run/media/userx/3TB-External
working_dir2=/media/ntfs1

# outer loop: walk every directory under the first drive
while read -r d1 ; do

        echo "outer loop $count1"
        echo "$d1"
        # inner loop: walk every directory under the second drive
        while read -r d2 ; do
                echo "inner loop $count2"
                echo "$d2"
                # compare the directory names (basenames), not the full paths
                if [[ "$(basename "$d1")" == "$(basename "$d2")" ]]; then
                        echo "match: $d1 <-> $d2"
                        break   # stop scanning working_dir2, move on to the next d1
                fi
                ((count2++))
        done <<<"$(find "$working_dir2" -type d)"

        count2=0
        ((count1++))
done <<<"$(find "$working_dir1" -type d)"

You could even extend it to move all of the files into one central location, then delete the directories they came from to clean up after everything has been moved out of them.
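
A rough sketch of that merge step, using a hypothetical target directory /merged (not from the thread); --backup=numbered keeps both copies when two files share a name instead of overwriting one:
Code:

# hypothetical helper: copy one matched directory into /merged, then remove the original
merge_into() {
    src="$1"
    dest="/merged/$(basename "$src")"
    mkdir -p "$dest"
    cp -a --backup=numbered "$src"/. "$dest"/ && rm -rf "$src"
}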

fatmac 10-09-2018 09:14 AM

@ l0f4r0

Just trying to keep things simple, as it is the OP's first post. :)

