LinuxQuestions.org
03-27-2020, 12:30 PM   #1
uyjjhak (LQ Newbie; Registered: Feb 2020; Posts: 11)
rmlint or other deduplication tools: need an option to operate at directory level


Hello,
I have about 10 TB of Clonezilla images in archives and would like to bring some order to them.
I am looking for a file/folder deduplication solution that lets me set the lowest level of comparison at the directory level, not the file level.

So:
There are about 10 image directories under each of the paths listed below (/root/1st and /mnt/usbdrive). Within each image directory a few files may be repeated between specific images; I do not want to touch those.
When I run rmlint on this, it scans all files and removes every repeated file from each directory. That is what I want to avoid.

Instead, I would like to analyse whole directories and compare at that level: if another path contains the SAME directory with all of its contents, then the whole directory should be removed.

I understand this may look senseless, but for my problem it is the only answer.

Example directory structure

Code:
Assume the root path is /root/1st
dir1/
    uniqfile1
    uniqfile2
    duplicatedfile1
    duplicatedfile2
dir2/
    duplicatedfile1
    uniqfile4
    duplicatedfile2
dir3/
    duplicatedfile1
    uniqfile3
    duplicatedfile2

and a second drive, /mnt/usbdrive,
with similar but not exactly the same content:

dir4/
    uniqfile1
    uniqfile2
    uniqfile6
    uniqfile7
    duplicatedfile1
    duplicatedfile2
dir5/
    duplicatedfile1
    uniqfile8
    uniqfile10
    duplicatedfile2
dir6/
    duplicatedfile1
    uniqfile9
    uniqfiles10-19
    duplicatedfile2

and the same dirs as in /root/1st:

dir1/
    uniqfile1
    uniqfile2
    duplicatedfile1
    duplicatedfile2
dir2/
    duplicatedfile1
    uniqfile4
    duplicatedfile2
dir3/
    duplicatedfile1
    uniqfile3
    duplicatedfile2
And I would like to automatically (with any script, binary, etc.) mark only the following as duplicated content:
/mnt/usbdrive/dir1
/mnt/usbdrive/dir2
/mnt/usbdrive/dir3

BUT not mark the files inside those directories as duplicates.
After that, there would be one occurrence of each image directory. The image directories may still share some small duplicated files between them, but that is not a problem for me.

How can I achieve that?
regards

Last edited by uyjjhak; 03-27-2020 at 12:32 PM.
 
03-29-2020, 06:12 AM   #2
pan64 (LQ Addict; Registered: Mar 2012; Location: Hungary; Distribution: debian/ubuntu/suse ...; Posts: 22,123)
Since you have special comparison criteria, you will need to implement your own solution.
First, write a compare tool that can tell whether two directories A and B are equal to each other.
Next, walk through your folders; you can find solutions for this on the net in a lot of different languages. You will need to make it skip individual files, which is most probably something you have to add yourself, but that does not look hard.
Finally, construct a report, and if you find it acceptable, remove the unwanted directories.

I would start with a much smaller structure, because you can test it much faster.

On the other hand, you could run ls -lR <dir> (or similar) and work on that text output, which may be fast enough.
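The steps above (compare tool, walk, report) could be sketched roughly like this. This is only a minimal illustration, assuming GNU coreutils/findutils (md5sum, find, sort, awk) and paths without whitespace; the function names `dir_fingerprint` and `report_duplicate_dirs` are made up for the example, and the two roots come from the original post. A directory's "fingerprint" is the hash of the sorted list of its file hashes, so two directories with identical contents get the same fingerprint wherever they live:

```shell
#!/bin/sh
# dir_fingerprint DIR: hash every file under DIR (by relative path),
# sort the list, and hash that list. Identical directory contents
# yield identical fingerprints.
dir_fingerprint() {
    ( cd "$1" && find . -type f -exec md5sum {} + | sort ) \
        | md5sum | cut -d' ' -f1
}

# report_duplicate_dirs ROOT...: fingerprint each top-level directory
# under every given root and print those whose fingerprint was already
# seen. Nothing is deleted; this only builds the report.
report_duplicate_dirs() {
    for root in "$@"; do
        for d in "$root"/*/; do
            [ -d "$d" ] || continue
            printf '%s %s\n' "$(dir_fingerprint "$d")" "$d"
        done
    done | sort | awk 'seen[$1]++ { print "duplicate dir:", $2 }'
}

# For the layout in this thread one would run:
# report_duplicate_dirs /root/1st /mnt/usbdrive
```

Review the report by hand before deleting anything: which copy of a duplicated directory gets listed depends only on sort order, not on which drive you would rather keep.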
 
  

