LinuxQuestions.org
08-10-2014, 07:35 PM   #1
fritzxy
LQ Newbie
 
Registered: Jul 2014
Posts: 6

Rep: Reputation: Disabled
Compare hard-disks and generate list of distinct (non duplicate) files


I have two hard-drives with data. There are some unique files on disk 1 and there are some unique files on disk 2, but most of the data is present on both disks.
The problem however is that not all the files have been placed in the equivalent directories.

Now I want to make sure that all files from disk 2 are stored on disk 1, without wasting space on files that are already present on that disk.

Checking all the files and directories manually is impossible; there are too many of them.
I am therefore looking for a way to automate the laborious part of this task.

If the directory structure and names on both disks were identical, then I could have used rsync to copy only the new and updated files to disk 1.
Unfortunately that is not the case, because some directories have a different name and some of the files are placed in different directories.

Example:

Code:
Disk 1/beverages/black_coffee.txt
Disk 1/cars/4x4/jeep.txt
Disk 1/cars/sports/porsche.txt
Disk 1/food/onion.txt
Disk 1/food/carrot.txt

Disk 2/drinks/black_coffee.txt
Disk 2/car/jeep.txt
Disk 2/porsche.txt
Disk 2/food/onion.txt
Disk 2/food/potatoes.txt
Disk 2/vegetables/broccoli.txt
Here the file black_coffee.txt exists on both disks, but in different directories. The same goes for jeep.txt and porsche.txt, which sit in different directories at different levels.
Only potatoes.txt and broccoli.txt are unique to disk 2. The name and absolute path of those two files should therefore be added to a list, which can then be used to copy them to a fitting directory on disk 1.

I need some way to check which files on disk 2 do not yet exist on disk 1.
For now I assume that the filenames are unique identifiers, which makes it easier to compare files. (Perhaps MD5 hashing would be an alternative, but I fear that might be too complex and too heavy for this many files.)
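
To make the filename idea concrete, here is a rough sketch of what I have in mind. It assumes GNU find (for -printf) and filenames without newlines; the function name unique_by_name and the two directory arguments are just placeholders I made up.

```shell
#!/usr/bin/env bash
# Sketch: list files under disk 2 whose name appears nowhere under disk 1.
# Assumes GNU find (-printf) and filenames that contain no newlines.
unique_by_name() {
    local disk1="$1" disk2="$2" names
    names=$(mktemp) || return 1
    # Every basename found anywhere on disk 1, one per line.
    find "$disk1" -type f -printf '%f\n' | sort -u > "$names"
    # Emit "basename/full path" for disk 2. A basename never contains "/",
    # so the first "/" cleanly separates the two parts.
    find "$disk2" -type f -printf '%f/%p\n' |
        awk -v names="$names" '
            BEGIN { while ((getline n < names) > 0) seen[n] = 1; close(names) }
            {
                i = index($0, "/")
                if (!(substr($0, 1, i - 1) in seen)) print substr($0, i + 1)
            }' | sort
    rm -f "$names"
}
```

Called as unique_by_name <mount point disk 1> <mount point disk 2>, this should print one full path per line for every disk 2 file whose name is missing from disk 1. It only compares names, so two different files that happen to share a name would be treated as duplicates.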

The intersection of both sets (the files present on both disks) covers about 80% of the data.

I am, however, interested in the relative complement of the first set in the second, that is, disk 2 minus disk 1.

That is exactly the set of files that exist only on disk 2 and not yet on disk 1.
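
If names turn out not to be reliable, the same set difference could be taken over file contents instead. The following is only a sketch of the MD5 alternative: it assumes GNU coreutils md5sum and filenames without backslashes or newlines (md5sum escapes those), and unique_by_content is a made-up name.

```shell
#!/usr/bin/env bash
# Sketch: list files under disk 2 whose content matches no file under disk 1.
# Assumes md5sum from GNU coreutils; filenames must not contain
# backslashes or newlines, because md5sum escapes those in its output.
unique_by_content() {
    local disk1="$1" disk2="$2" sums
    sums=$(mktemp) || return 1
    # The first 32 characters of each md5sum line are the hex digest.
    find "$disk1" -type f -exec md5sum {} + | cut -c1-32 | sort -u > "$sums"
    # The path starts at column 35 (digest, then two separator characters).
    find "$disk2" -type f -exec md5sum {} + |
        awk -v sums="$sums" '
            BEGIN { while ((getline h < sums) > 0) seen[h] = 1; close(sums) }
            !(substr($0, 1, 32) in seen) { print substr($0, 35) }' | sort
    rm -f "$sums"
}
```

This reads every byte on both disks, so it will be much slower than comparing names, but a renamed copy is still recognised as a duplicate.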

What would be the best approach to tackle this problem?
 
08-10-2014, 09:39 PM   #2
kilgoretrout
Senior Member
 
Registered: Oct 2003
Posts: 3,015

Rep: Reputation: 399
Have you looked at the diff command?

$ diff -qr <mount point disk 1> <mount point disk 2>

For example, assuming disk 1 is mounted on /media/disk1/ and disk 2 on /media/disk2/, you would run:

$ diff -qr /media/disk1/ /media/disk2/
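
To narrow that output down to entries that exist only on disk 2, something along these lines might help. It is a sketch that relies on the "Only in <dir>: <name>" messages GNU diff prints in the C locale; only_in_second is a made-up helper name.

```shell
#!/usr/bin/env bash
# Sketch: keep only the lines diff reports as existing solely under "$2".
# Relies on GNU diff's "Only in DIR: NAME" wording in the C locale.
only_in_second() {
    # Strip a trailing slash so the prefix matches diff's output.
    local dir1="${1%/}" dir2="${2%/}"
    LC_ALL=C diff -qr "$dir1" "$dir2" |
        grep -F -e "Only in $dir2:" -e "Only in $dir2/"
}
```

Note that diff compares by relative path, so a file that was moved to a different directory on disk 2 still shows up as "Only in", and a directory that exists only on disk 2 is reported as a single entry without listing its contents.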

Last edited by kilgoretrout; 08-10-2014 at 09:42 PM.
 
08-10-2014, 10:11 PM   #3
syg00
LQ Veteran
 
Registered: Aug 2003
Location: Australia
Distribution: Lots ...
Posts: 21,340

Rep: Reputation: 4177
rsync is designed to determine this sort of thing - you have all sorts of options for what you want to {in,ex}clude.
One of particular interest might be --dry-run.

No, I haven't tried it.
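
For reference, a dry run along those lines might look like the sketch below. The flags are standard rsync options, but the wrapper name preview_copy and the choice of --ignore-existing are assumptions about what is wanted here.

```shell
#!/usr/bin/env bash
# Sketch: preview what rsync would copy from "$1" into "$2" without
# actually transferring anything.
preview_copy() {
    local src="${1%/}" dst="${2%/}"
    # -r recurse, -n dry run, -v list each item that would be sent;
    # --ignore-existing skips files whose relative path already exists
    # on the destination.
    rsync -rnv --ignore-existing "$src/" "$dst/"
}
```

For example, preview_copy /media/disk2 /media/disk1 would list the files from disk 2 that rsync considers missing on disk 1. rsync matches by relative path, so files that were moved or renamed between the two disks would still be listed as new.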
 
  

