Linux - Software This forum is for Software issues.

01-22-2006, 10:25 PM
#1
Member
Registered: Nov 2003
Location: Huntsville, AL
Distribution: RHEL, Solaris, OSX, SuSE
Posts: 419
mass file compare or diff
I've been saving lots of data from one of my NAS boxes to another one, and that's been going fine... or so I thought. I recently found out that some of the data on my secondary NAS is different. What would some of you suggest for doing a mass file check of my mirror copy? Would you use diff, or would you create a checksum and compare?
My directory structure looks something like this:
Code:
NAS1:/data/testcase1/testfile1.dat
NAS1:/data/testcase1/testfile2.dat
NAS1:/data/testcase2/testfile1.dat
NAS1:/data/testcase2/testfile2.dat
Mirror copy to:
NAS2:/data/testcase1/testfile1.dat
NAS2:/data/testcase1/testfile2.dat
NAS2:/data/testcase2/testfile1.dat
NAS2:/data/testcase2/testfile2.dat
I've got hundreds of these directories with data files in them. Any recommendations for a script or something I should run to check them all and tell me which ones are different?
As always, thanks for the help. 

01-23-2006, 12:39 AM
#2
Senior Member
Registered: Nov 2002
Location: British Columbia, Canada
Distribution: Gentoo x86_64; FreeBSD; OS X
Posts: 3,764
rsync is a good choice. If you set it up to sync the good copy to the bad one, it will only transfer the files that are different.
rsync docs:
http://samba.anu.edu.au/rsync/documentation.html
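As a rough sketch of what syncing the good copy onto the mirror could look like when both NAS boxes are mounted locally (the function name mirror_tree and the example paths are made up for illustration, not taken from the rsync docs):

```shell
#!/bin/bash
# Hypothetical helper: make MIRROR match GOOD using rsync.
# Usage: mirror_tree /mnt/NAS1/data /mnt/NAS2/data
mirror_tree() {
    local good=${1%/} mirror=${2%/}
    # Trailing slashes copy the *contents* of GOOD into MIRROR.
    # -a  recurse and preserve times, permissions and symlinks
    # -c  pick files to update by checksum instead of size + mtime
    #     (slower, since every file on both sides has to be read)
    rsync -a -c "$good/" "$mirror/"
}
```

Without -c, rsync's default quick check (same size and modification time) can skip a corrupted file whose size and date still match. Adding --delete would also remove files that exist only on the mirror; it is worth trying that with -n (dry run) first.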

01-23-2006, 02:56 AM
#3
Senior Member
Registered: Nov 2005
Location: Belgium
Distribution: Red Hat, Fedora
Posts: 1,515
"diff" is really meant for text files, I believe. For binary files it will just tell you that they differ (it has an option, -a, to treat them as text anyway).
You could also look at "cmp" for binary comparisons.
However, both "diff" and "cmp" essentially read and compare each block of each file, so you may be in for a long wait.
Using checksums, like MD5 sums, is a very reliable alternative. Every file still has to be read, but the sums for the good copy can be stored in a file and reused for later checks instead of rereading both copies each time.
Check out:
man md5sum
(Don't forget the -b option for binary files, and capture all output in a file: reporting on screen will only cost time.)
But, as bulliver already pointed out, rsync or some other intelligent copying software may be your friend too.
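To make the "cmp" suggestion concrete, here is a minimal sketch that walks the good tree and runs cmp -s on each file's counterpart. compare_trees is a hypothetical name; cmp -s prints nothing and returns non-zero when the files differ or can't be read.

```shell
#!/bin/bash
# Hypothetical helper: byte-compare every regular file under SRC with the
# file at the same relative path under DST, using cmp.
# Usage: compare_trees /mnt/NAS1/data /mnt/NAS2/data
compare_trees() {
    local src=${1%/} dst=${2%/} f rel status=0
    # -print0 / read -d '' keep file names with spaces intact
    while IFS= read -r -d '' f; do
        rel=${f#"$src"/}
        if [ ! -f "$dst/$rel" ]; then
            echo "MISSING: $rel"
            status=1
        elif ! cmp -s "$f" "$dst/$rel"; then
            # cmp -s is silent; non-zero means different or unreadable
            echo "DIFFERS: $rel"
            status=1
        fi
    done < <(find "$src" -type f -print0)
    return "$status"
}
```

GNU diff can produce a similar report with diff -rq /mnt/NAS1/data /mnt/NAS2/data, which prints "Files ... differ" and "Only in ..." lines.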

01-23-2006, 11:56 AM
#4
Member
Registered: Nov 2003
Location: Huntsville, AL
Distribution: RHEL, Solaris, OSX, SuSE
Posts: 419
Original Poster
Thanks for the responses...
So even though I have two mounted NAS drives (/NAS1 and /NAS2), rsync is still a good way to go? Please excuse my ignorance; I thought that rsync was just for remote copying. Also, if I start using rsync, is there a switch that will tell it to compare checksums or something to verify that the copy is exact? Thanks again for your help.

01-23-2006, 03:42 PM
#5
Senior Member
Registered: Nov 2002
Location: British Columbia, Canada
Distribution: Gentoo x86_64; FreeBSD; OS X
Posts: 3,764
Quote:
I thought that rsync was just for remote copying. Also, if I start using rsync, is there a switch in there that will tell it to compare checksums or something to verify that the copy is exact?
|
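It is for remote copying, but you can use it locally just as well. As for checksums, I am not sure specifically what goes on internally with rsync, but the whole point of rsync is to mirror directories of files. So yes, it will find the files that are different, and when you're done your two directories will be identical. (By default rsync decides that a file differs by comparing its size and modification time; the -c/--checksum option makes it compare checksums of the contents instead, which is slower but also catches files whose size and date happen to match.)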
Here is an rsync tutorial that may be easier to digest than the man page, and it also has an example of using rsync to copy/sync files locally:
http://www.devshed.com/c/a/Administr...on-With-Rsync/
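To answer the checksum question directly, rsync's -c option combined with -n (dry run) can be used purely as a checker that copies nothing. A minimal sketch, with list_differences as a hypothetical name:

```shell
#!/bin/bash
# Hypothetical helper: report files that differ between GOOD and MIRROR
# without changing anything.
# Usage: list_differences /mnt/NAS1/data /mnt/NAS2/data
list_differences() {
    local good=${1%/} mirror=${2%/}
    # -r recurse, -c compare file checksums, -n dry run (copy nothing),
    # -i itemize: print one line for each file rsync would update
    rsync -r -c -n -i "$good/" "$mirror/"
}
```

Each output line ends with the relative path of a file that is missing or different on the mirror; no output means the checksums matched. Because -a is left out, differences in permissions or timestamps alone should not be reported.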

01-23-2006, 10:22 PM
#6
Member
Registered: Nov 2003
Location: Huntsville, AL
Distribution: RHEL, Solaris, OSX, SuSE
Posts: 419
Original Poster
Thanks again for your help! I use rsync for backups, but I never thought of using it to mirror my data.

01-24-2006, 05:27 PM
#7
Member
Registered: Nov 2003
Location: Huntsville, AL
Distribution: RHEL, Solaris, OSX, SuSE
Posts: 419
Original Poster
Just to help anyone else who might need something like this: I did it in two steps to verify that the data I copied from one NAS to another was correct. It seems to work pretty well, so I thought I'd share it.
Code:
# find /mnt/NAS1/data/ -type f -name "*.dat" -exec md5sum -b {} + > /tmp/sums.txt
The output looked something like:
Code:
1e80e8ae0e7dcad533755f60fae4ac4a */mnt/NAS1/data/testcase1/testfile1.dat
6457349bc3cd6a870fba408eff5c51b8 */mnt/NAS1/data/testcase1/testfile2.dat
3b6128efbd31b09a0b9bcbda1d58965c */mnt/NAS1/data/testcase2/testfile1.dat
8120ddefd5d971c7bb36678a58e30ad4 */mnt/NAS1/data/testcase2/testfile2.dat
I then edited sums.txt, changed NAS1 to NAS2 in every path, and ran 'md5sum -c /tmp/sums.txt' to get a list of good and bad files.
Code:
# md5sum -c /tmp/sums.txt
/mnt/NAS2/data/testcase1/testfile1.dat: OK
/mnt/NAS2/data/testcase1/testfile2.dat: OK
/mnt/NAS2/data/testcase2/testfile1.dat: FAILED
/mnt/NAS2/data/testcase2/testfile2.dat: OK
md5sum: WARNING: 1 of 4 computed checksums did NOT match
Hope this helps someone...
Last edited by mijohnst; 01-24-2006 at 10:28 PM.
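A variation on the two steps above, offered as a sketch rather than the poster's method: if the sums are generated from inside the data directory, the file holds relative paths (./testcase1/...), so the same file can be checked against NAS2 without editing it. make_sums and check_sums are hypothetical names, and --quiet assumes GNU md5sum.

```shell
#!/bin/bash
# Hypothetical helpers. Use an absolute path for SUMSFILE, because
# check_sums changes directory before reading it.

# make_sums DIR SUMSFILE: checksum every .dat file under DIR, relative paths
make_sums() {
    (cd "$1" && find . -type f -name '*.dat' -exec md5sum -b {} +) > "$2"
}

# check_sums DIR SUMSFILE: verify the same relative paths under DIR
check_sums() {
    # --quiet prints only the files that FAIL; exit status is non-zero
    # if any file is missing or does not match
    (cd "$1" && md5sum -c --quiet "$2")
}
```

For example, make_sums /mnt/NAS1/data /tmp/sums.txt followed by check_sums /mnt/NAS2/data /tmp/sums.txt.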

01-25-2006, 01:45 AM
#8
Senior Member
Registered: Nov 2005
Location: Belgium
Distribution: Red Hat, Fedora
Posts: 1,515
If you want to run this automatically, try replacing the editing of the MD5 checksum file (sums.txt) with:
Code:
sed -e 's|/NAS1/|/NAS2/|' /tmp/sums.txt > /tmp/sums.tmp
mv /tmp/sums.tmp /tmp/sums.txt
After that, you can put everything in one script and, for instance, make it run once per day or once per week via cron, perhaps even mailing you the results.
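Putting the thread's steps together, a script for cron might look roughly like this. It is a sketch that assumes the /mnt/NAS1/data and /mnt/NAS2/data layout from the posts above; the function name nas_verify and its arguments are made up for illustration.

```shell
#!/bin/bash
# Hypothetical mirror check: assumes the copies live at
# BASE/NAS1/data (good) and BASE/NAS2/data (mirror).
nas_verify() {
    local base=$1 sums=$2

    # Step 1: checksum every .dat file on the good copy (NAS1)
    find "$base/NAS1/data" -type f -name '*.dat' -exec md5sum -b {} + > "$sums"

    # Step 2: point each path at the mirror (NAS2) instead of editing by hand
    sed -e 's|/NAS1/|/NAS2/|' "$sums" > "$sums.tmp" && mv "$sums.tmp" "$sums"

    # Step 3: verify the mirror; md5sum -c exits non-zero if any file fails
    md5sum -c "$sums"
}

# Call it at the bottom of the script, e.g.:
# nas_verify /mnt /tmp/sums.txt
```

When run from cron, anything the script prints is normally mailed to the crontab's owner, which covers the "mailing you the results" part.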

01-26-2006, 06:28 PM
#9
Member
Registered: Nov 2003
Location: Huntsville, AL
Distribution: RHEL, Solaris, OSX, SuSE
Posts: 419
Original Poster
Nice add-on, Timmeke... Now we are down to one step...

01-27-2006, 02:00 AM
#10
Senior Member
Registered: Nov 2005
Location: Belgium
Distribution: Red Hat, Fedora
Posts: 1,515
What step is that?
Setting up the cron job?
That's easy:
-Add all the commands to a plain text file, using your favorite text editor.
-Put your shell as the interpreter on the first line, using the "shebang" syntax,
i.e. #!/bin/bash
if you're using bash. (If you don't know your shell, try "echo $SHELL".)
-Make the script executable, so cron can run it directly:
chmod +x /path/to/script
-Then type:
crontab -e
This opens your crontab file (which contains the cron jobs you want to run) in an editor.
-To make it run each day at a fixed time, add the following line:
min hour * * * /path/to/script
Replace "min" by the minute(s) and "hour" by the hour(s) when you want the script to run.
Ranges are allowed (e.g. 1-5), and multiple entries are separated by commas (e.g. hour 1,4,6,10-14,20).
-To make it run once per week, add the following line instead:
min hour * * day_of_week /path/to/script
day_of_week consists of one or more values between 0 and 7, where both 0 and 7 mean Sunday. Weekday names are allowed too.
-For other cron job timing, please look at
man 1 crontab
man 5 crontab
man cron
-Don't forget to save your changes when you exit the editor; that is when crontab installs the new file.
Note that, for the "crontab -e" command to work, your user must be allowed to specify cron jobs. This depends on the /etc/cron.allow and /etc/cron.deny files and their contents. Please consult the man pages mentioned above for details.
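For example, entries like the following in "crontab -e" would run a script daily at 02:30, or weekly on Sunday at 03:00 (you would normally pick one). The script path and mail address are placeholders; MAILTO is supported by common implementations such as Vixie cron and cronie, but check man 5 crontab on your system.

```
# Placeholder address: where cron mails the script's output
MAILTO=you@example.com
# min hour day-of-month month day-of-week command
30 2 * * * /usr/local/bin/nas_check.sh
0  3 * * 0 /usr/local/bin/nas_check.sh
```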

01-27-2006, 04:06 AM
#11
LQ Newbie
Registered: Aug 2005
Posts: 14
Quote:
Originally Posted by timmeke
Check out:
man md5sum
(don't forget the -b option for binary files and to capture all output in a file. Reporting on screen
will only be time consuming.
|
Is the '-b' option for md5sum necessary? Linux does not distinguish between binary files and text files.

01-27-2006, 06:32 AM
#12
Senior Member
Registered: Nov 2005
Location: Belgium
Distribution: Red Hat, Fedora
Posts: 1,515
Nowadays, NAS machines based on Windows exist too (unfortunately). So you can never be too careful.
In any case, I'd advise you to use the -b option for binary files.
|
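One way to settle the -b question on a given system is to hash the same file in both modes. A minimal sketch (same_hash_both_modes is a hypothetical name): the GNU md5sum documentation says that on systems that don't distinguish binary from text, the mode only changes the marker (* versus a space) in the output, so on Linux the two hashes should match even for a file with Windows-style CRLF line endings.

```shell
#!/bin/bash
# Hypothetical check: does md5sum give the same hash in binary (-b)
# and text (-t) mode for FILE? Returns 0 if they match.
same_hash_both_modes() {
    local b t
    b=$(md5sum -b "$1" | cut -d' ' -f1)
    t=$(md5sum -t "$1" | cut -d' ' -f1)
    echo "binary: $b"
    echo "text:   $t"
    [ "$b" = "$t" ]
}
```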

All times are GMT -5.