Linux - Software
This forum is for Software issues. Having a problem installing a new program? Want to know which application is best for the job? Post your question in this forum.
I've been saving lots of data from one of my NAS boxes to another one, and that's been going fine... or so I thought. I recently found out that some of the data on my secondary NAS is different. I'm wondering what some of you might suggest for doing a mass file check of my mirror copy. Would you use diff, or would you create checksums and compare?
I've got hundreds of these directories with data files in them. Any recommendations for a script or something I should run to check them all and tell me which are different?
I believe "diff" is only good for text files, though it may have support for binary files with an option.
You could also look at "cmp", for binary comparisons.
However, both "diff" and "cmp" have to read and compare every block of each file, so you may be in for a long wait.
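For a single pair of files, cmp can be used like this (the two demo files below stand in for the NAS paths):

```shell
# Two throwaway demo files standing in for /mnt/NAS1/... and /mnt/NAS2/...
a=$(mktemp)
b=$(mktemp)
echo "payload" > "$a"
echo "payload" > "$b"

# Byte-by-byte comparison; -s suppresses output, and the exit status
# reports the result: 0 = identical, 1 = different, 2 = error.
if cmp -s "$a" "$b"; then
    echo "identical"
else
    echo "different"
fi
```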
Using checksums, like md5s, is a very reliable alternative and much faster.
Check out:
man md5sum
(Don't forget the -b option for binary files, and capture all the output in a file; reporting to the screen will only slow things down.)
But, as bulliver already pointed out, rsync or some other intelligent copying software may be your friend too.
So even though I have two mounted NAS drives (/NAS1 and /NAS2), rsync is still a good way to go? Please excuse my ignorance; I thought that rsync was just for remote copying. Also, if I start using rsync, is there a switch that will tell it to compare checksums or something to verify that the copy is exact? Thanks again for your help.
I thought that rsync was just for remote copying. Also, if I start using rsync, is there a switch in there that will tell it to compare checksums or something to verify that the copy is exact?
It is for remote copying, but you can use it locally just as well. As for checksums, I'm not sure specifically what goes on internally with rsync, but the whole point of rsync is to mirror directories of files. So yeah, it will find the files that are different, and when you're done your two directories will be identical.
Just to help anyone else that might need something like this: I did it in two steps to verify that the data I copied from one NAS to the other was correct. Seems to work pretty well, so I thought I'd share it.
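Step one, building /tmp/sums.txt, looked something like this (shown here against a throwaway demo tree rather than the real NAS paths):

```shell
# Throwaway demo tree standing in for /mnt/NAS1/data.
src=$(mktemp -d)
mkdir -p "$src/testcase1"
echo "sample" > "$src/testcase1/testfile1.dat"

# Step one: hash every file under the tree into a list. -b treats the
# files as binary; writing to a file avoids slow screen output.
find "$src" -type f -exec md5sum -b {} + > /tmp/sums.txt
cat /tmp/sums.txt
```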
I then edited the sums.txt file, switched NAS1 to NAS2 in the file, and ran 'md5sum -c /tmp/sums.txt' to get a list of good and bad files.
Code:
# md5sum -c /tmp/sums.txt
/mnt/NAS2/data/testcase1/testfile1.dat: OK
/mnt/NAS2/data/testcase1/testfile2.dat: OK
/mnt/NAS2/data/testcase2/testfile1.dat: FAILED
/mnt/NAS2/data/testcase2/testfile2.dat: OK
md5sum: WARNING: 1 of 4 computed checksums did NOT match
If you want to run this automatically, try replacing the editing of the MD5 checksum file (sums.txt)
with:
Code:
sed -e 's|/NAS1/|/NAS2/|' sums.txt > sums.tmp
mv sums.tmp sums.txt
After that, you can put everything in one script and, for instance, make it run once per day or once per week via cron, perhaps even mailing you the results.
That's easy:
-Add all the commands to a plain text file, using your favorite text editor.
-Put your shell as the interpreter at the top, using the "shebang" syntax, e.g.:
#!/bin/bash
if you're using bash. (If you don't know your shell, try "echo $SHELL".)
-Then type:
crontab -e
This opens your crontab file for editing (it contains the cron jobs you want to run).
-To make it run each day, on a fixed time, add the following line:
min hour * * * /path/to/script
Replace "min" with the minute(s) and "hour" with the hour(s) at which you want the script to run.
Ranges are allowed (e.g. 1-5), and multiple entries are separated by commas (e.g. 1,4,6,10-14,20 in the hour field).
-To make it run once per week, add the following line instead:
min hour * * day_of_week /path/to/script
day_of_week consists of one or more values between 0 and 7, where both 0 and 7 stand for Sunday. Names of weekdays are allowed too.
-For other cron job timing, please look at
man 1 crontab
man 5 crontab
man cron
-Don't forget to save the changes you've made to your crontab file when you exit the editor.
Note that, for the "crontab -e" command to work, your user must be allowed to specify cron jobs. This depends on the /etc/cron.allow and /etc/cron.deny files and their contents. Please consult the man pages mentioned above for details.
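Putting the whole thread together, here is one sketch of such a script. The verify_mirror name and the cron line are my own illustration, not from the thread, and this variant checksums paths relative to each mount, so the sed rewrite of the sums file isn't needed:

```shell
#!/bin/sh
# Sketch: verify that DST mirrors SRC by md5 checksums.
# verify_mirror is a hypothetical name; adjust paths to your setup.
verify_mirror() {
    src=$1
    dst=$2
    sums=$(mktemp) || return 1
    # Step 1: checksum every file under the source tree, using paths
    # relative to $src so the list works unchanged on the mirror.
    ( cd "$src" && find . -type f -exec md5sum -b {} + ) > "$sums"
    # Step 2: verify the same relative paths under the mirror;
    # md5sum -c exits non-zero if any file fails.
    ( cd "$dst" && md5sum -c "$sums" )
    status=$?
    rm -f "$sums"
    return $status
}

# Example call (illustrative paths):
#   verify_mirror /mnt/NAS1/data /mnt/NAS2/data
# Example crontab line, running it daily at 03:30 (cron mails the
# output to you if mail is configured):
#   30 3 * * * /path/to/verify_mirror.sh
```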
Check out:
man md5sum
(Don't forget the -b option for binary files, and capture all the output in a file; reporting to the screen will only slow things down.)
Is the '-b' option for md5sum necessary? Linux does not distinguish between binary and text files.