LinuxQuestions.org (/questions/)
-   Linux - Newbie (https://www.linuxquestions.org/questions/linux-newbie-8/)
-   -   Testing integrity of backups? (https://www.linuxquestions.org/questions/linux-newbie-8/testing-integrity-of-backups-701297/)

nyle 01-31-2009 02:17 PM

Testing integrity of backups?
 
Hello,

I am responsible for backing up a little less than a TB of data for a few people, centrally stored on machine A. So far my backup plan has been using rsync to copy the data over the network to a removable drive on machine B every so often (it's a rather static environment).

It's worked fine so far; however, I must admit to having very paranoid tendencies, and I always fear that safety/backup mechanisms will fail when I need them most ;) As such, I would like to know a simple and unobtrusive way to test the integrity of the data, so I know that what is on the removable drive is a bit-for-bit copy of what's on the central machine.

So far I have tried using df to count the number of blocks used on the central machine's data partitions and comparing that to the blocks used on the backup drive. The numbers differ but are pretty close. I get the feeling this is not an accurate way to compare, though, especially since the two drives use different filesystems (the main machine is on ReiserFS, the backup is ext3), which allocate blocks and metadata differently.
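
For the record, what I've actually been running is roughly the following (mount points made up for illustration):
Code:

df /srv/data      # on machine A
df /mnt/backup    # on machine B, where the removable drive is mounted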

Furthermore, this is ~1TB of other people's data, so it would be neither practical nor ethical for me to manually examine every single file to make sure it is intact.

I know md5sum is a way to check the integrity of a single file, but judging from the man page it doesn't have a recursive option or an obvious way to compare two trees of hashes. I found this Windows program which seems to be able to scan recursively and compare. Unfortunately it does me no good here :P

Any tips? How do you manage to sleep at night when you consider your backup scheme? :)

Thanks

GazL 01-31-2009 03:28 PM

How about something along the lines of:

Code:

# run from inside each tree so the listed paths are relative (and sorted) and therefore comparable
( cd /original && find . -type f -exec md5sum {} \; | sort -k 2 ) >/tmp/md5sums.orig
( cd /copy && find . -type f -exec md5sum {} \; | sort -k 2 ) >/tmp/md5sums.copy

Then compare the results with
Code:

md5sum /tmp/md5sums.*
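Since the find commands use relative paths and sorted output, a good copy should produce byte-identical list files, so a quick pass/fail test (a minimal sketch) is:
Code:

cmp -s /tmp/md5sums.orig /tmp/md5sums.copy && echo "backup OK" || echo "MISMATCH"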
Or, if you want more detail on what changed:
Code:

diff /tmp/md5sums.orig /tmp/md5sums.copy

Tweak to taste.

Edit: forgot to add, if you don't want the processing overhead of running md5sum against everything, then replacing md5sum with an invocation of stat such as stat -c '%n %s' in the two find commands should work reasonably well, but obviously won't be as thorough a check.
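
For example, the size-only variant might look something like this (untested; assumes GNU stat):
Code:

( cd /original && find . -type f -exec stat -c '%n %s' {} \; | sort ) >/tmp/sizes.orig
( cd /copy && find . -type f -exec stat -c '%n %s' {} \; | sort ) >/tmp/sizes.copy
diff /tmp/sizes.orig /tmp/sizes.copy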

chrism01 02-02-2009 01:54 AM

According to the man page, rsync does a checksum after it's copied each file, separate from the checksum it does to see if the file needs transferring:
Quote:

-c, --checksum
This forces the sender to checksum every regular file using a 128-bit MD4 checksum. It does this during the initial file-system scan as it builds the list of all available files. The receiver then checksums its version of each file (if it exists and it has the same size as its sender-side counterpart) in order to decide which files need to be updated: files with either a changed size or a changed checksum are selected for transfer. Since this whole-file checksumming of all files on both sides of the connection occurs in addition to the automatic checksum verifications that occur during a file's transfer, this option can be quite slow.

Note that rsync always verifies that each transferred file was correctly reconstructed on the receiving side by checking its whole-file checksum, but that automatic after-the-transfer verification has nothing to do with this option's before-the-transfer "Does this file need to be updated?" check.
The 2nd paragraph is the relevant bit: even without -c, rsync checks each transferred file's whole-file checksum after the transfer.
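
So one way to use -c for verification (a sketch; /srv/data and /mnt/backup are made-up paths) is a checksum-based dry run, which lists differing files without transferring anything:
Code:

# -r recurse, -c compare by checksum, -n dry run, -i itemize what differs
rsync -rcni /srv/data/ /mnt/backup/

If it prints nothing, the file contents on both sides match.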

