LinuxQuestions.org
Old 01-22-2006, 10:25 PM   #1
mijohnst
Member
 
Registered: Nov 2003
Location: Huntsville, AL
Distribution: RHEL, Solaris, OSX, SuSE
Posts: 419

Rep: Reputation: 31
mass file compare or diff


I've been saving lots of data from one of my NAS boxes to another one, and that's been going fine... I thought. I recently found out that some of the data on my secondary NAS is different. I'm wondering what some of you might suggest for doing a mass file check of my mirror copy. Would you use diff, or would you create checksums and compare them?

My directory structure looks something like this:

Code:
NAS1:/data/testcase1/testfile1.dat
NAS1:/data/testcase1/testfile2.dat
NAS1:/data/testcase2/testfile1.dat
NAS1:/data/testcase2/testfile2.dat

Mirror copy to:

NAS2:/data/testcase1/testfile1.dat
NAS2:/data/testcase1/testfile2.dat
NAS2:/data/testcase2/testfile1.dat
NAS2:/data/testcase2/testfile2.dat
I've got hundreds of these directories with data files in them. Any recommendations for a script or something I should run to check them all and tell me which are different?

As always, thanks for the help.
 
Old 01-23-2006, 12:39 AM   #2
bulliver
Senior Member
 
Registered: Nov 2002
Location: Edmonton AB, Canada
Distribution: Gentoo x86_64; Gentoo PPC; FreeBSD; OS X 10.9.4
Posts: 3,760
Blog Entries: 4

Rep: Reputation: 78
rsync is a good choice. If you set it up to sync the good copy to the bad one, it will only transfer the files that are different.

rsync docs:
http://samba.anu.edu.au/rsync/documentation.html
 
Old 01-23-2006, 02:56 AM   #3
timmeke
Senior Member
 
Registered: Nov 2005
Location: Belgium
Distribution: Red Hat, Fedora
Posts: 1,515

Rep: Reputation: 61
"diff" is only good on text files, I believe. It may have support for binary files with an option.
You could also look at "cmp", for binary comparisons.
However, both "diff" and "cmp" read and compare every block of every file, so you may be in for a long wait.
Using checksums, like md5s, is a very reliable alternative. md5sum also has to read every file in full, but each NAS can be summed on its own and the saved list can be reused for later checks.
Check out:
man md5sum
(Don't forget the -b option for binary files, and capture all output in a file. Reporting on screen will only be time consuming.)

But, as bulliver already pointed out, rsync or some other intelligent copying
software may be your friend too.
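
A minimal sketch of the "cmp" route, assuming both trees are mounted locally. The function name compare_trees and the example paths are made up for illustration; it reads every file on both sides, so expect it to take a while on hundreds of directories.

```shell
# Sketch: list files under the source tree whose mirror copy is missing or differs.
# compare_trees is a hypothetical helper name; $1 = source dir, $2 = mirror dir.
# Runs in a subshell (note the parentheses) so the cd does not affect the caller.
compare_trees() (
    dst=$(cd "$2" && pwd) || exit 1    # make the mirror path absolute
    cd "$1" || exit 1
    # Note: file names containing newlines would confuse this simple loop.
    find . -type f | while IFS= read -r rel; do
        rel=${rel#./}
        if [ ! -e "$dst/$rel" ]; then
            echo "MISSING: $rel"
        elif ! cmp -s "$rel" "$dst/$rel"; then
            # cmp -s prints nothing; it exits non-zero when the files differ
            echo "DIFFERENT: $rel"
        fi
    done
)

# Example call (paths are assumptions based on the mount points in this thread):
# compare_trees /mnt/NAS1/data /mnt/NAS2/data > /tmp/compare.txt
```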
 
Old 01-23-2006, 11:56 AM   #4
mijohnst
Original Poster
Thanks for the responses...

So even though I have two mounted NAS drives (/NAS1 and /NAS2), rsync is still a good way to go? Please excuse my ignorance. I thought that rsync was just for remote copying. Also, if I start using rsync, is there a switch that will tell it to compare checksums or something to verify that the copy is exact? Thanks again for your help.
 
Old 01-23-2006, 03:42 PM   #5
bulliver
Quote:
I thought that rsync was just for remote copying. Also, if I start using rsync, is there a switch that will tell it to compare checksums or something to verify that the copy is exact?
It is for remote copying, but you can use it locally just as well. As for checksums, I am not sure specifically what goes on internally with rsync, but the whole point of rsync is to mirror directories of files. So yeah, it will find the files that are different, and when you're done your two directories will be identical.

Here is an rsync tutorial that may be easier to digest than the man page, and it also has an example of using rsync to copy/sync files locally:
http://www.devshed.com/c/a/Administr...on-With-Rsync/
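
On the question of a checksum switch: rsync does have one, -c (--checksum), and combined with -n (--dry-run) it can report differences without copying anything. A rough sketch, assuming both NAS boxes are mounted locally; the function name list_rsync_differences and the example paths are made up for illustration.

```shell
# Sketch: report files that differ between two local trees without copying anything.
# list_rsync_differences is a hypothetical helper name; $1 = source, $2 = mirror.
list_rsync_differences() {
    # -r  recurse into directories
    # -c  decide by checksum instead of size and modification time
    # -n  dry run: change nothing on the mirror
    # -i  itemize: print one line per file that would be updated
    # The trailing slash on the source means "the contents of this directory".
    rsync -rcni "${1%/}/" "${2%/}/"
}

# Example call (paths are assumptions based on the mount points in this thread):
# list_rsync_differences /mnt/NAS1/data /mnt/NAS2/data
```

Dropping the -n turns the same command into the actual repair run, so it is worth reading the dry-run output first.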
 
Old 01-23-2006, 10:22 PM   #6
mijohnst
Original Poster
Thanks again for your help! I use rsync for doing backups, but I never thought to use it for mirroring my data.
 
Old 01-24-2006, 05:27 PM   #7
mijohnst
Original Poster
Just to help anyone else who might need something like this: I did it in two steps to verify that the data I copied from one NAS to the other was correct. It seems to work pretty well, so I thought I'd share it.

Code:
# find /mnt/NAS1/data/ -type f -name "*.dat" -exec md5sum -b {} + > /tmp/sums.txt
The output looked something like:

Code:
1e80e8ae0e7dcad533755f60fae4ac4a */mnt/NAS1/data/testcase1/testfile1.dat
6457349bc3cd6a870fba408eff5c51b8 */mnt/NAS1/data/testcase1/testfile2.dat
3b6128efbd31b09a0b9bcbda1d58965c */mnt/NAS1/data/testcase2/testfile1.dat
8120ddefd5d971c7bb36678a58e30ad4 */mnt/NAS1/data/testcase2/testfile2.dat
I then edited the sums.txt file, changed NAS1 to NAS2 throughout, and ran 'md5sum -c /tmp/sums.txt' to get a list of good and bad files.

Code:
# md5sum -c /tmp/sums.txt

/mnt/NAS2/data/testcase1/testfile1.dat: OK
/mnt/NAS2/data/testcase1/testfile2.dat: OK
/mnt/NAS2/data/testcase2/testfile1.dat: FAILED
/mnt/NAS2/data/testcase2/testfile2.dat: OK
md5sum: WARNING: 1 of 4 computed checksums did NOT match
Hope this helps someone...

Last edited by mijohnst; 01-24-2006 at 10:28 PM.
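
One possible variation on the two steps above, sketched under the assumption that GNU md5sum is in use: if the checksums are recorded with paths relative to the data directory, the same list can be checked against the mirror without editing it. The function names make_sums and check_sums and the example paths are made up for illustration.

```shell
# Sketch: record checksums with relative paths, then verify the mirror with them.
# make_sums and check_sums are hypothetical helper names.

make_sums() {
    # $1 = source data directory, $2 = checksum list to write
    ( cd "$1" && find . -type f -name '*.dat' -exec md5sum -b {} + ) > "$2"
}

check_sums() {
    # $1 = mirror data directory, $2 = checksum list (use an absolute path,
    # because the check runs from inside the mirror directory)
    # --quiet (GNU md5sum) suppresses the "OK" lines, leaving only failures
    ( cd "$1" && md5sum -c --quiet "$2" )
}

# Example calls (paths are assumptions based on the mount points in this thread):
# make_sums  /mnt/NAS1/data /tmp/sums.txt
# check_sums /mnt/NAS2/data /tmp/sums.txt
```

check_sums exits with a non-zero status when any file fails, which makes it easy to test from a script.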
 
Old 01-25-2006, 01:45 AM   #8
timmeke
If you want to run this automatically, replace the manual editing of the MD5 checksum file (sums.txt) with:

Code:
sed -e 's|/NAS1/|/NAS2/|' sums.txt > sums.tmp
mv sums.tmp sums.txt
After that, you can put everything in one script and, for instance, make it run once per day or once per week via cron, perhaps even mailing you the results.
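
As a rough sketch of what the combined script could look like, the path rewrite can be piped straight into the check so the original list is never modified. This assumes GNU md5sum (for --quiet and for reading the list from standard input); the function name verify_mirror is made up for illustration.

```shell
# Sketch: rewrite the paths on the fly and feed the result straight to md5sum.
# verify_mirror is a hypothetical helper name.
# $1 = checksum list, $2 = path fragment to replace, $3 = its replacement.
verify_mirror() {
    # "|" as the sed delimiter avoids escaping every "/" in the paths.
    # $2 is used as a sed pattern, so keep it free of regex characters.
    # "md5sum -c -" reads the checksum list from standard input;
    # --quiet prints only the files that fail.
    sed -e "s|$2|$3|" "$1" | md5sum -c --quiet -
}

# Example call (values are assumptions based on the paths in this thread):
# verify_mirror /tmp/sums.txt /NAS1/ /NAS2/
```

The exit status is that of md5sum, so a cron script can test it to decide whether anything needs reporting.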
 
Old 01-26-2006, 06:28 PM   #9
mijohnst
Original Poster
Nice add-on, timmeke... Now we are down to one step...
 
Old 01-27-2006, 02:00 AM   #10
timmeke
What step is that?
Setting up the cron job?

That's easy:
-Add all commands to a plain text file, using your favorite text editor.
-Put your shell as interpreter at the top, using the "shebang" syntax, e.g.
#!/bin/bash
if you're using bash. (If you don't know your shell, try "echo $SHELL".)
-Make the script executable:
chmod +x /path/to/script
-Then type:
crontab -e
This opens your crontab file (which contains the cron jobs you want to run) in an editor.
-To make it run each day at a fixed time, add the following line:
min hour * * * /path/to/script
Replace "min" with the minute(s) and "hour" with the hour(s) when you want to run the script.
Ranges are allowed (e.g. 1-5); multiple entries are separated by commas (e.g. hour 1,4,6,10-14,20).
-To make it run once per week, add the following line instead:
min hour * * day_of_week /path/to/script
day_of_week consists of one or more values between 0 and 7, where both 0 and 7 mean Sunday. Names for weekdays are allowed too.
-For other cron job timing, please look at
man 1 crontab
man 5 crontab
man cron
-Don't forget to save the changes you've made to your crontab file (save your changes when you exit the editor).

Note that, for the "crontab -e" command to work, your user must be allowed to specify cron jobs. This depends on the /etc/cron.allow and /etc/cron.deny files and their contents. Please consult the man pages mentioned above for details.
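
To make the two line formats above concrete, here is what a filled-in crontab might look like. The times are arbitrary examples, /path/to/script is the same placeholder used above, and the MAILTO address is an assumption (MAILTO is honored by the common Vixie cron and cronie implementations; check "man 5 crontab" on your system).

```
# Mail any output from the jobs below to this address (assumed address)
MAILTO=you@example.com

# min hour day_of_month month day_of_week  command
# Every day at 02:30:
30 2 * * * /path/to/script

# Every Sunday at 03:00 instead:
0 3 * * 0 /path/to/script
```

Only one of the two job lines is needed, depending on whether the check should run daily or weekly.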
 
Old 01-27-2006, 04:06 AM   #11
pczou
LQ Newbie
 
Registered: Aug 2005
Posts: 14

Rep: Reputation: 0
Quote:
Originally Posted by timmeke
Check out:
man md5sum
(don't forget the -b option for binary files and to capture all output in a file. Reporting on screen will only be time consuming.)
Is the '-b' option for md5sum necessary? Linux does not distinguish between binary files and text files.
 
Old 01-27-2006, 06:32 AM   #12
timmeke
Nowadays, NAS machines based on Windows exist too (unfortunately). So you can never be too careful.

In any case, I'd advise you to use the -b option for binary files.
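
For anyone curious whether -b matters on their own machine, a quick check is to hash the same file in both modes and compare the digests. On GNU/Linux the two modes read the file identically, so the digests should match; on a system that treats text mode differently they may not. The function name same_digest_in_both_modes is made up for illustration.

```shell
# Sketch: check whether -b (binary) and -t (text) give the same digest here.
# same_digest_in_both_modes is a hypothetical helper name; $1 is any file.
same_digest_in_both_modes() {
    # The first field of md5sum's output is the digest itself
    bin=$(md5sum -b "$1" | cut -d' ' -f1)
    txt=$(md5sum -t "$1" | cut -d' ' -f1)
    [ "$bin" = "$txt" ]
}

# Example call (the path is an assumption based on this thread):
# same_digest_in_both_modes /mnt/NAS1/data/testcase1/testfile1.dat && echo "same digest"
```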
 
  


All times are GMT -5.
