Hello,
I have about 10 TB of Clonezilla images in archives, and I would like to bring some order to them.
I am looking for a folder/file deduplication tool that lets me set the directory as the lowest level of comparison, not the individual file.
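A cheap first pass for directory-level comparison could look like the minimal Python 3 sketch below (standard library only; the name `listing_signature` is hypothetical, not part of any existing tool). It describes a directory by the sorted list of its files' relative paths and sizes without reading any file contents. Two image directories with different listings cannot be duplicates, so with about 10 TB of images only the matching candidates would need expensive content hashing afterwards.

```python
import os
import sys
from pathlib import Path


def listing_signature(directory):
    """Cheap, order-independent description of a directory tree:
    a sorted tuple of (relative path, size in bytes) for every regular file.
    Contents are NOT read, so equal signatures only mark a candidate pair.
    Symlinks and empty subdirectories are ignored."""
    root = Path(directory)
    entries = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = Path(dirpath) / name
            if full.is_file() and not full.is_symlink():
                entries.append((full.relative_to(root).as_posix(), full.stat().st_size))
    return tuple(sorted(entries))


if __name__ == "__main__" and len(sys.argv) == 3:
    same = listing_signature(sys.argv[1]) == listing_signature(sys.argv[2])
    print("candidate duplicate" if same else "different")
```

Equal signatures say nothing about the bytes inside the files, so a content check is still needed before anything is marked.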
So,
Each of the locations listed below (/root/1st and /mnt/usbdrive) holds about 10 image directories. A few files may be repeated between specific images inside those directories, and I do not want to touch them.
When I run rmlint on this, it scans all files and removes every repeated file from each directory. That is what I want to avoid.
Instead, I would like to analyze whole directories and compare them at that level.
AND if another path contains the SAME directory with all of its contents, then the whole directory should be removed.
I understand this may look pointless, but for my problem it is the only answer.
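The directory-level comparison described above can be sketched as one fingerprint per directory tree. This is an illustration under stated assumptions, not an existing tool: it hashes each file with SHA-256 (chosen here only for illustration) and combines the sorted relative paths with those hashes, so two directories get the same fingerprint only if every file name and every byte match. Timestamps, permissions, symlinks and empty subdirectories are ignored; `file_digest` and `directory_fingerprint` are hypothetical names.

```python
import hashlib
import os
from pathlib import Path


def file_digest(path, chunk_size=1 << 20):
    """SHA-256 of one file, read in 1 MiB chunks so large image files
    never have to fit in memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def directory_fingerprint(directory):
    """One hash for a whole directory tree, built from every regular file's
    relative path plus the SHA-256 of its contents."""
    root = Path(directory)
    files = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = Path(dirpath) / name
            if full.is_file() and not full.is_symlink():
                files.append((full.relative_to(root).as_posix(), full))
    # Sort by relative path so the result does not depend on walk order.
    files.sort(key=lambda item: item[0])
    outer = hashlib.sha256()
    for rel, full in files:
        # NUL cannot appear in a path, so it safely separates name and hash.
        outer.update(rel.encode("utf-8", "surrogateescape") + b"\0")
        outer.update(file_digest(full).encode("ascii") + b"\n")
    return outer.hexdigest()
```

Renaming a file or changing a single byte anywhere in the tree changes the fingerprint, which is exactly the "same directory with all of its contents" condition.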
Example directory structure:
Assume that the root path is /root/1st:
dir1
    uniqfile1
    uniqfile2
    duplicatedfile1
    duplicatedfile2
dir2
    duplicatedfile1
    uniqfile4
    duplicatedfile2
dir3
    duplicatedfile1
    uniqfile3
    duplicatedfile2
And the second drive, /mnt/usbdrive, with similar but not identical contents:
dir4
    uniqfile1
    uniqfile2
    uniqfile6
    uniqfile7
    duplicatedfile1
    duplicatedfile2
dir5
    duplicatedfile1
    uniqfile8
    unifile10
    duplicatedfile2
dir6
    duplicatedfile1
    uniqfile9
    uniqfiles10-19
    duplicatedfile2
plus the same directories as in /root/1st:
dir1
    uniqfile1
    uniqfile2
    duplicatedfile1
    duplicatedfile2
dir2
    duplicatedfile1
    uniqfile4
    duplicatedfile2
dir3
    duplicatedfile1
    uniqfile3
    duplicatedfile2
I would like to automatically (with any script, binary, etc.) mark only these directories for deletion as duplicated content:
/mnt/usbdrive/dir1
/mnt/usbdrive/dir2
/mnt/usbdrive/dir3
BUT without marking the files inside the directories as duplicates.
After that, there will be exactly one occurrence of each image directory. Some image directories may still share a few small duplicated files, but that is not a problem for me.
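The marking step could be sketched as the following Python 3 script (standard library only; all function names are hypothetical). It compares the first-level image directories of the kept root (here /root/1st) with those of the scanned root (here /mnt/usbdrive), uses names and sizes as a cheap pre-filter, and confirms each candidate with SHA-256 content hashes. It assumes the duplicates sit directly under each root, as in the example, and it only prints results; nothing is deleted.

```python
import hashlib
import os
import sys
from pathlib import Path


def _regular_files(root):
    """(relative path, Path) for every regular, non-symlink file, sorted by relative path."""
    root = Path(root)
    found = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = Path(dirpath) / name
            if full.is_file() and not full.is_symlink():
                found.append((full.relative_to(root).as_posix(), full))
    found.sort(key=lambda item: item[0])
    return found


def listing_signature(directory):
    # Cheap pre-filter: relative names and sizes only, no contents are read.
    return tuple((rel, full.stat().st_size) for rel, full in _regular_files(directory))


def directory_fingerprint(directory, chunk_size=1 << 20):
    # Expensive check: SHA-256 over each relative path plus that file's content hash.
    outer = hashlib.sha256()
    for rel, full in _regular_files(directory):
        inner = hashlib.sha256()
        with open(full, "rb") as f:
            for chunk in iter(lambda: f.read(chunk_size), b""):
                inner.update(chunk)
        outer.update(rel.encode("utf-8", "surrogateescape") + b"\0")
        outer.update(inner.hexdigest().encode("ascii") + b"\n")
    return outer.hexdigest()


def _image_dirs(root):
    # Only first-level subdirectories (the image directories) are compared.
    return [d for d in sorted(Path(root).iterdir()) if d.is_dir() and not d.is_symlink()]


def find_duplicate_dirs(keep_root, scan_root):
    """Return (duplicate, original) pairs: directories directly under scan_root
    whose complete content matches a directory directly under keep_root.
    Individual files are never reported on their own."""
    by_listing = {}
    for d in _image_dirs(keep_root):
        by_listing.setdefault(listing_signature(d), []).append(d)

    cache = {}

    def fingerprint(d):
        # Hash each directory at most once, and only if the pre-filter matched.
        if d not in cache:
            cache[d] = directory_fingerprint(d)
        return cache[d]

    matches = []
    for d in _image_dirs(scan_root):
        for original in by_listing.get(listing_signature(d), []):
            if original.resolve() == d.resolve():
                continue  # the same directory reached twice is not a copy
            if fingerprint(original) == fingerprint(d):
                matches.append((d, original))
                break
    return matches


if __name__ == "__main__" and len(sys.argv) == 3:
    for dup, original in find_duplicate_dirs(sys.argv[1], sys.argv[2]):
        print(f"{dup}\t(same as {original})")
```

It would be run as, for example, `python3 find_dup_dirs.py /root/1st /mnt/usbdrive` (the script name is hypothetical). Because matching happens per directory, files repeated between different image directories, such as duplicatedfile1 in dir1, dir2 and dir3, are never flagged individually.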
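For the "mark for deletion" step, one cautious option (an assumption, not something the post requires) is to move each confirmed duplicate directory into a holding directory on the same drive instead of deleting it immediately. On the same filesystem `shutil.move` performs a rename, so this is fast even for large images. The function `quarantine` and the `_duplicates` directory name are hypothetical.

```python
import shutil
from pathlib import Path


def quarantine(dirs, quarantine_root, dry_run=True):
    """Move each directory in `dirs` into `quarantine_root` instead of
    deleting it, so a mistake can be undone. With dry_run=True (the default)
    only the planned (source, target) moves are returned; nothing is touched."""
    quarantine_root = Path(quarantine_root)
    planned, targets = [], set()
    for d in map(Path, dirs):
        if not d.is_dir():
            raise NotADirectoryError(str(d))
        target = quarantine_root / d.name
        # Refuse to overwrite or merge: an existing or repeated target name is an error.
        if target.exists() or target in targets:
            raise FileExistsError(str(target))
        targets.add(target)
        planned.append((d, target))
    if not dry_run:
        quarantine_root.mkdir(parents=True, exist_ok=True)
        for src, dst in planned:
            shutil.move(str(src), str(dst))
    return planned
```

For example, `quarantine(["/mnt/usbdrive/dir1", "/mnt/usbdrive/dir2", "/mnt/usbdrive/dir3"], "/mnt/usbdrive/_duplicates")` only returns the planned moves; passing `dry_run=False` performs them, and the holding directory can be deleted once the remaining copies have been checked.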
How can I achieve that?
Regards