LinuxQuestions.org
Old 04-03-2012, 04:36 AM   #1
fergus
LQ Newbie
 
Registered: Jun 2011
Posts: 7

Rep: Reputation: Disabled
Comparing two directories


Looking through Forum and sub-Forum titles, this seems to be the only place that it's all right to ask about syntax. Is this OK? Lots of moderators can get very snippy so please let me know, and if necessary re-direct me to another forum or another site ...
I'm interested in comparing two directories dir1 and dir2 both of size about 600G (each backs up an entire resource, in fact). Essentially I need to do
diff -rq /dir1 /dir2
but diff compares file contents byte-by-byte and I really don't want, or need, to do this too often. Most of the time all I need to do is check presence/ absence/ location. That is, I just need the output from
diff -rq /dir1 /dir2 | grep "^Only in"
without the forensic effort implied by diff. Some variation on find suggests itself. I have iterated to
comm -3 <(find /dir1 | sort | sed 's/^.....//g') <(find /dir2 | sort | sed 's/^.....//g')
where the ..... just gets rid of the leading /dir?/. Actually this provides too much information, recounting not just unmatching directories but also, superfluously, all the files they contain. (But find -type d initiates too skimpy a search, because it will not identify unmatching files!) I want to abbreviate the output from this command (I dunno, kind of "zip" it?) so that it concisely provides just the minimal information that diff provides by "Only in ..".
I can't believe this is an original request, and I am quite surprised that diff does not provide it through some kind of option. Googling suggests dircmp, but that seems largely unavailable and, where it is available, highly variable in design and output. I'm sure the script above, with tinkering to achieve the concatenation required, is the answer.
Anybody out there seen this, done this?
Thank you!
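
For illustration, here is a minimal sketch of the same find/sort/comm idea that produces relative paths by changing into each directory first, rather than stripping a fixed number of characters with sed. The function name compare_trees is hypothetical, and the sketch assumes no file name contains a newline. It is still as verbose as the command above: every entry under an unmatched directory is listed.

```sh
# compare_trees DIR1 DIR2
# Hypothetical helper: list paths present in only one of the two trees.
# Only names are compared; file contents are never read.
compare_trees() (
    tmp=$(mktemp -d) || exit 1

    # Relative path lists, sorted bytewise so that comm accepts them.
    (cd "$1" && find . | LC_ALL=C sort) > "$tmp/list1" || { rm -rf "$tmp"; exit 1; }
    (cd "$2" && find . | LC_ALL=C sort) > "$tmp/list2" || { rm -rf "$tmp"; exit 1; }

    # comm -23 keeps lines unique to the first list, comm -13 those unique
    # to the second list.
    LC_ALL=C comm -23 "$tmp/list1" "$tmp/list2" |
    while IFS= read -r path; do
        printf 'Only in %s: %s\n' "$1" "${path#./}"
    done

    LC_ALL=C comm -13 "$tmp/list1" "$tmp/list2" |
    while IFS= read -r path; do
        printf 'Only in %s: %s\n' "$2" "${path#./}"
    done

    rm -rf "$tmp"
)
```

It would be called as compare_trees /dir1 /dir2.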
 
Old 04-03-2012, 06:03 AM   #2
pan64
LQ Addict
 
Registered: Mar 2012
Location: Hungary
Distribution: debian/ubuntu/suse ...
Posts: 21,842

Rep: Reputation: 7308
do ls -lR in the two directories and compare the result.
 
Old 04-03-2012, 06:56 AM   #3
fergus
LQ Newbie
 
Registered: Jun 2011
Posts: 7

Original Poster
Rep: Reputation: Disabled
Thank you, but this isn't going to fly. ls -lR gives output like
-rw-r--r-- 1 fergus hdd-7 636 Mar 31 07:43 geese
-rw-r--r-- 1 fergus hdd-7 539 Mar 31 07:43 hansen
-rw-r--r-- 1 fergus hdd-7 570 Mar 31 07:43 hundal
so any changes in timestamp between dir1 and dir2 will blur the comparison. Also filenames are listed independently of the directory that contains them. You might as well have said "do find in the two directories and compare the result" which (a) provides better location-specific output; (b) is not blurred by timestamp data; and (c) is what essentially I am doing, but need to improve. Thanks all the same.
 
Old 04-03-2012, 07:02 AM   #4
pan64
LQ Addict
 
Registered: Mar 2012
Location: Hungary
Distribution: debian/ubuntu/suse ...
Posts: 21,842

Rep: Reputation: 7308
Code:
cd <dir>; find . -type f
will give you a simple filelist, so you can compare those lists (maybe you need to sort them first)
 
Old 04-03-2012, 07:17 AM   #5
fergus
LQ Newbie
 
Registered: Jun 2011
Posts: 7

Original Poster
Rep: Reputation: Disabled
Thanks again. But your suggestion "find/ sort/ compare ... a simple filelist" is exactly what's going on in my original post. The problem is that the output from the comparison is too verbose e.g. listing the entire contents of unmatched subdirectories rather than simply stating the fact of the unmatched directory, which is all one needs to know. My post is not about how to perform a comparison, it's about how to do it quickly cleanly and completely but also concisely. Thanks all the same.
 
Old 04-03-2012, 07:47 AM   #6
pan64
LQ Addict
 
Registered: Mar 2012
Location: Hungary
Distribution: debian/ubuntu/suse ...
Posts: 21,842

Rep: Reputation: 7308
ok, maybe this time:
Code:
cd <dir>; find . -type d -exec <script> {} \;
the script should look like this:
Code:
#!/bin/sh
# Print the directory name followed by its entries, all on one line.
printf '%s ' "$1"
ls -1 "$1" | awk '{ a = a $0 " " } END { print a }'
This will generate one line for every directory, so you can compare the output of the two find runs directory by directory.

I think you need to execute find to get the list of all the directories, and then you need to generate a line of text for each of them so that you can compare the contents as you like.
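
A related directory-by-directory idea can be sketched as follows: walk one tree with find and, as soon as an entry has no counterpart in the other tree, print it and stop descending. This is only a sketch under assumptions: the function name only_in is hypothetical, file names are assumed not to contain newlines, and one small sh process is started per entry, which may be slow on a very large tree. No file contents are read.

```sh
# only_in WALK OTHER
# Hypothetical helper: print the top-most entries of WALK that are
# missing from OTHER, in the style of diff's "Only in" lines.
only_in() (
    walk=$1
    # An absolute path is needed because we change directory below.
    other=$(cd "$2" && pwd) || exit 1
    cd "$walk" || exit 1

    # The -exec test succeeds when the counterpart does not exist (the -L
    # check also catches dangling symlinks). Only then does find go on to
    # -print the entry and -prune it, so nothing below it is visited.
    find . -exec sh -c '[ ! -e "$1/$2" ] && [ ! -L "$1/$2" ]' sh "$other" {} \; \
        -print -prune |
    while IFS= read -r path; do
        printf 'Only in %s: %s\n' "$walk" "${path#./}"
    done
)
```

Running only_in /dir1 /dir2 and then only_in /dir2 /dir1 would cover both directions.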
 
Old 04-03-2012, 08:14 AM   #7
colucix
LQ Guru
 
Registered: Sep 2003
Location: Bologna
Distribution: CentOS 6.5 OpenSuSE 12.3
Posts: 10,509

Rep: Reputation: 1983
Another option is rsync (with the --dry-run option) and it should be fast:
Code:
rsync --dry-run -O -av --ignore-existing /dir1/* /dir2 | sed -n '2,/^$/s/^/Only in dir1: /p'
rsync --dry-run -O -av --ignore-existing /dir2/* /dir1 | sed -n '2,/^$/s/^/Only in dir2: /p'
This should give output similar to the "Only in" lines of the diff command. However, you have to run the command twice, swapping the two directories, to retrieve first the files only in dir1 and then the files only in dir2. The sed command drops the statistics in the last two lines of the rsync output and adds the proper "Only in" prefix. The only caveat is the blank line between the last file name and the statistics: sed uses it to mark the end of the file list, so it is printed too and has to be removed afterwards. Just to give you an idea!


Edit: a simple sed command can remove the last unwanted line:
Code:
rsync --dry-run -O -av --ignore-existing /dir1/* /dir2 | sed -n '2,/^$/s/^/Only in dir1: /p' | sed '$d'
rsync --dry-run -O -av --ignore-existing /dir2/* /dir1 | sed -n '2,/^$/s/^/Only in dir2: /p' | sed '$d'

Last edited by colucix; 04-03-2012 at 08:17 AM.
 
Old 04-03-2012, 10:56 AM   #8
fergus
LQ Newbie
 
Registered: Jun 2011
Posts: 7

Original Poster
Rep: Reputation: Disabled
Thank you. This looked really convenient. But for any non-matching folder under dir1 or dir2, whose existence is all one needs to know, rsync still provides, in addition to identifying it, a complete listing of all files and subdirectories contained in it. Again, superfluous information (and potentially many hundreds or thousands of lines of it)! I have played with all possible switches .. I think .. without managing to suppress this.
Driving me nuts ... I am wondering whether another approach would be to pipe the sorted information to a text editor, that might have the facility to in some sense "recognise headings" by their shape, and suppress all "section contents" thereunder. So for example in the listing
/dir1/a
/dir1/a/b/c/file1
/dir1/a/file1
/dir1/f/file2
/dir1/f/g/file3
the 2nd and 3rd lines would be suppressed.
(In which case one might as well revert to the output from comparing the two lists from "find" and edit that output in the same way.)
Thank you.
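
The "suppress everything under a heading" idea does not need a text editor; it can be sketched as a small awk filter. This is a minimal sketch: the function name prune_descendants is hypothetical, and it assumes one path per line with no newlines inside file names. It sorts its input bytewise so that a directory always comes before anything beneath it, then drops every path that has an already printed path as an ancestor.

```sh
# prune_descendants
# Hypothetical filter: read paths on stdin, print only the top-most ones.
prune_descendants() {
    LC_ALL=C sort | awk '
    {
        # Build each proper ancestor of the current path in turn and skip
        # the path if one of them has already been printed.
        n = split($0, part, "/")
        prefix = ""
        for (i = 1; i < n; i++) {
            prefix = (i == 1) ? part[1] : (prefix "/" part[i])
            if (prefix in seen) next
        }
        seen[$0] = 1
        print
    }'
}
```

Fed the five-line listing above, it prints /dir1/a, /dir1/f/file2 and /dir1/f/g/file3. The same filter could be placed at the end of a pipeline that compares two find listings.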
 
  

