LinuxQuestions.org
02-20-2007, 08:44 PM   #1
joeljkp
Member
 
Registered: Feb 2003
Distribution: Ubuntu
Posts: 41

Rep: Reputation: 15
Looking for directory tree comparison tool


I did a server move, and I want to make sure everything transferred over and wasn't corrupted. So I'm looking for some kind of directory tree comparison tool that will work over ssh between two remote machines (or that I can run on one in relation to the other).

I tried downloading the output of `ls -lR` from both servers and diffing the resulting files, but that flagged every little difference in modification times, etc., so it was useless to me.

I also tried rsync with -n (dry-run), but this seemed to just produce a list of files it would compare, without actually doing any comparison.
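
A note on the rsync attempt: a dry run only lists what rsync would transfer, and by default it decides that from size and modification time. Adding -c (checksum) and -i (itemize changes) turns the dry run into a content comparison. A minimal sketch, assuming rsync is installed on both machines; compare_with_rsync is a made-up helper name and the paths in the usage comment are placeholders:

```shell
# Hypothetical helper: list differences between two trees without copying.
#   -r        recurse into directories
#   -n        dry run (change nothing)
#   -c        compare by checksum instead of size and modification time
#   -i        itemize: print one line per item that differs
#   --delete  also report files that exist only on the destination side
compare_with_rsync() {
    rsync -rnci --delete "$1"/ "$2"/
}

# Usage (the destination may also be a remote path such as user@host:/srv/data):
#   compare_with_rsync /srv/data /mnt/copy
```

Files that differ in content show up as itemized lines, and files present only on the destination are reported as "deleting". Nothing is actually changed because of -n.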

I've also noticed a couple tools out there just for this, but they both require Java, and neither of the servers has Java on it.

Any suggestions?
 
02-21-2007, 01:11 AM   #2
zulfilee
Member
 
Registered: Apr 2004
Location: India
Distribution: Redhat,Fedora
Posts: 430

Rep: Reputation: 32
Take an md5sum of the files on each server and compare the results.

A simple loop would be (run from the directory that holds all the files):

Code:
for EACHFILE in `find . -iname "*"`
do
    md5sum -b $EACHFILE
done
Get the results from both servers and compare them.

- zulfi
 
02-21-2007, 12:11 PM   #3
Quigi
Member
 
Registered: Mar 2003
Location: Cambridge, MA, USA
Distribution: Ubuntu (Dapper and Heron)
Posts: 376

Rep: Reputation: 31
Quote:
Originally Posted by zulfilee
for EACHFILE in `find . -iname "*"`

Hi Zulfilee,
What's the purpose of the -iname "*" test? It seems to always succeed.

---

BTW, md5sum doesn't like to operate on directories, so -type f might be useful.

Passing all files to md5sum should be a bit more efficient than starting a new process for each file. So:
Code:
md5sum `find . -type f`
Or, if you have so many files that the argument list gets too long (or you have weird characters in file names),
Code:
find . -type f -print0 | xargs -0 md5sum
You only need the NUL delimiters (the -print0 and -0 options) if you have weird characters in your file names, e.g., spaces.
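
To see the difference the NUL delimiters make, here is a small sketch, assuming GNU find, xargs, and md5sum. The scratch directory and file names are invented for the demonstration:

```shell
# Throwaway tree with one ordinary name and one name containing a space.
demo=$(mktemp -d)
printf 'hello\n' > "$demo/plain.txt"
printf 'world\n' > "$demo/with space.txt"

# Backtick form: the shell splits "with space.txt" into two words, so
# md5sum is handed two names that do not exist and only one file is summed.
# (md5sum exits non-zero here, which is expected; its errors are discarded.)
broken=$(cd "$demo" && { md5sum `find . -type f` 2>/dev/null || true; } | wc -l)

# NUL-delimited form: every name reaches md5sum intact.
intact=$(cd "$demo" && find . -type f -print0 | xargs -0 md5sum | wc -l)

echo "backticks: $broken line(s), -print0 | xargs -0: $intact line(s)"
rm -rf "$demo"
```

On a GNU system the backtick form should yield a single checksum line and the NUL-delimited form two, one per file.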
 
02-22-2007, 12:19 AM   #4
zulfilee
Member
 
Registered: Apr 2004
Location: India
Distribution: Redhat,Fedora
Posts: 430

Rep: Reputation: 32
Yes, the -type option is the right choice.

-iname "*" matches all files and directories, which isn't needed here.
 
02-22-2007, 10:46 AM   #5
joeljkp
Member
 
Registered: Feb 2003
Distribution: Ubuntu
Posts: 41

Original Poster
Rep: Reputation: 15
All of these solutions have a flaw if there might be additional or missing files. When using diff to compare the two sets of md5sums, it can't differentiate between items from different directories.

Consider the following:

Code:
Tree 1:
dir1 \
  file1
  file2
  file2~ (for example)
  file3

Tree 2:
dir1 \
  file1
  file2
  file3

diff:
<md5sum> dir1/file1   <md5sum> dir1/file1 <- match
<md5sum> dir1/file2   <md5sum> dir1/file2 <- match
<md5sum> dir1/file2~  <md5sum> dir1/file3 <- no match!
<md5sum> dir1/file3   <md5sum> dir2/file1 <- no match!
<md5sum> dir2/file1   <md5sum> dir2/file2 <- no match!

etc.
There would need to be something that has a concept of added and removed files, directories, etc.
 
02-22-2007, 12:44 PM   #6
Quigi
Member
 
Registered: Mar 2003
Location: Cambridge, MA, USA
Distribution: Ubuntu (Dapper and Heron)
Posts: 376

Rep: Reputation: 31
Quote:
Originally Posted by joeljkp
When using diff to compare the two sets of md5sums, it can't differentiate between items from different directories.
Not sure if I understand the part "from different directories", so I'll reply to the example.

Diff is smarter than that; it re-synchronizes.
I recreated your scenario. As we have a reasonable number of files with sensible names, the simple md5sum `find dir1 -type f` is enough. The respective outputs are:
Code:
b026324c6904b2a9cb4b88d6d61c81d1  dir1/file1
26ab0db90d72e28ad0ba1e22ee510510  dir1/file2
6d7fce9fee471194aa8b5b6e47267f03  dir1/file3
and
Code:
b026324c6904b2a9cb4b88d6d61c81d1  dir1/file1
26ab0db90d72e28ad0ba1e22ee510510  dir1/file2
c193497a1a06b2c72230e6146ff47080  dir1/file2~
6d7fce9fee471194aa8b5b6e47267f03  dir1/file3
And when I run diff on them, I get
Code:
2a3
> c193497a1a06b2c72230e6146ff47080  dir1/file2~
Note: my find (GNU find version 4.2.27) outputs the file names without sorting, i.e., like ls -f does. The files may be ordered differently between the two trees, which would cause spurious differences. If that's an issue, try md5sum `find dir1 -type f | sort` to get a predictable order.
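
Putting the sorting advice together with the NUL-delimited pipeline, the whole comparison might look like the sketch below. It assumes GNU find, sort, xargs, and md5sum; make_manifest is a made-up helper name, and the two scratch trees only imitate the example in this thread:

```shell
# Hypothetical helper: print "checksum  path" for every regular file under
# the directory given as $1, in a stable order.
make_manifest() {
    (
        cd "$1" || exit 1
        # LC_ALL=C makes the sort order bytewise, so both servers agree on
        # it even if their locales differ. sort -z keeps the NUL delimiters.
        find . -type f -print0 | LC_ALL=C sort -z | xargs -0 md5sum
    )
}

# Recreate two small trees; the second one has an extra backup file.
work=$(mktemp -d)
mkdir -p "$work/tree1/dir1" "$work/tree2/dir1"
for n in 1 2 3; do
    echo "$n" > "$work/tree1/dir1/file$n"
    echo "$n" > "$work/tree2/dir1/file$n"
done
echo "backup" > "$work/tree2/dir1/file2~"

make_manifest "$work/tree1" > "$work/tree1.md5"
make_manifest "$work/tree2" > "$work/tree2.md5"

# diff exits 1 when the manifests differ; "|| true" keeps that from
# aborting a script that runs with "set -e".
result=$(diff "$work/tree1.md5" "$work/tree2.md5" || true)
echo "$result"
```

Because the paths are relative to each tree's root, the manifests can be generated on each server separately (for instance through ssh) and then copied to one place for the diff.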

BTW, initially you said
Quote:
I want to make sure everything transferred over and wasn't corrupted.
If diff is quiet, everything is well. If it reports any difference, the trees don't have identical content.
 
  


All times are GMT -5.
