Old 01-31-2009, 02:17 PM   #1
Registered: Feb 2008
Location: Atlanta
Distribution: Ubuntu 9.10
Posts: 42

Rep: Reputation: 17
Testing integrity of backups?


I am responsible for backing up a little less than a TB of data for a few people, centrally stored on machine A. So far my backup plan has been using rsync to copy the data over the network to a removable drive on machine B every so often (it's a rather static environment).

It's worked fine so far. However, I must admit to having very paranoid tendencies, and I always fear that safety/backup mechanisms will fail when I need them most. As such, I would like to know a simple and unobtrusive way to test the integrity of the data, so I know that what is on the removable drive is a bit-for-bit copy of what's on the central machine.

So far I have tried using df to count the number of blocks used on the central machine's data partitions and compare it to the blocks used on the backup drive. The numbers differ but are pretty close. Somehow I get the feeling this is not an accurate way to compare, especially since the two drives use different filesystems (main is on ReiserFS, backup is ext3).

Furthermore, this is ~1TB of other people's data, so it would be neither practical nor ethical for me to manually examine every single file to make sure it is intact.

I know md5sum is a way to check the integrity of a single file, but judging from the man page it doesn't have a recursive option or any way to compare hashes. I found this Windows program which seems to be able to scan recursively and compare. Unfortunately it does me no good here :P

Any tips? How do you manage to sleep at night when you consider your backup scheme?

Old 01-31-2009, 03:28 PM   #2
Senior Member
Registered: May 2008
Posts: 4,370
Blog Entries: 7

Rep: Reputation: 1815
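How about something along the lines of:

(cd /original && find . -type f -exec md5sum {} + | sort -k 2) >/tmp/md5sums.orig
(cd /copy && find . -type f -exec md5sum {} + | sort -k 2) >/tmp/md5sums.copy
Running find from inside each tree keeps the recorded paths relative, and sorting by path puts both lists in the same order, so the two files are directly comparable.
Then compare the results with
md5sum /tmp/md5sums.*
Or, if you want more detail on what changed:
diff /tmp/md5sums.orig /tmp/md5sums.copy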

Tweak to taste.
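
On the original question of comparing hashes: GNU md5sum has a check mode (-c) that reads a list of checksums and re-verifies each file, so the comparison step can be left to md5sum itself. The following is only a sketch under assumptions: the function names write_manifest and verify_copy are made up for illustration, the manifest path must be absolute, and the tools are the GNU versions of find and md5sum.

```shell
#!/bin/sh
# Hypothetical helpers, not part of any existing tool.

# write_manifest SRC_DIR MANIFEST
# Record an MD5 checksum for every regular file under SRC_DIR,
# using paths relative to SRC_DIR. MANIFEST must be an absolute path.
write_manifest() {
    src=$1
    manifest=$2
    ( cd "$src" && find . -type f -exec md5sum {} + ) > "$manifest"
}

# verify_copy COPY_DIR MANIFEST
# Re-hash the files under COPY_DIR and compare them with MANIFEST.
# Prints only the files that fail or are missing; the exit status is
# non-zero if anything does not match.
verify_copy() {
    copy=$1
    manifest=$2
    ( cd "$copy" && md5sum -c --quiet "$manifest" )
}
```

Typical use would be something like write_manifest /original /tmp/md5sums.orig followed by verify_copy /copy /tmp/md5sums.orig. Note that this only checks the files named in the manifest, so extra files that exist only in the copy go unreported.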

edit: forgot to add, if you don't want the processing overhead of running md5sum against everything, then replacing md5sum with some invocation of stat, such as stat -c '%n %s', in the two find commands should work reasonably well, but obviously won't be as thorough a check.
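
A rough sketch of that size-only variant, assuming GNU stat (the -c option is not portable to BSD or macOS stat). The function name size_listing is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper, not part of any existing tool.

# size_listing DIR
# Print "relative-path size-in-bytes" for every regular file under DIR,
# sorted bytewise so that listings of two trees line up for diff.
size_listing() {
    ( cd "$1" && find . -type f -exec stat -c '%n %s' {} + | LC_ALL=C sort )
}
```

For example, size_listing /original >/tmp/sizes.orig and size_listing /copy >/tmp/sizes.copy, then diff the two files. This catches missing and truncated files quickly, but a file whose contents changed without changing its size will slip through.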

Last edited by GazL; 01-31-2009 at 03:56 PM. Reason: revised: I decided 'stat' was a more appropriate choice than ls.
Old 02-02-2009, 01:54 AM   #3
LQ Guru
Registered: Aug 2004
Location: Sydney
Distribution: Centos 6.9, Centos 7.3
Posts: 17,374

Rep: Reputation: 2383
According to the man page, rsync does a checksum after it has copied each file, separate from the checksum it does to see whether the file needs transferring:
-c, --checksum
This forces the sender to checksum every regular file using a 128-bit MD4 checksum. It does this during the initial file-system scan as it builds the list of all available files. The receiver then checksums its version of each file (if it exists and it has the same size as its sender-side counterpart) in order to decide which files need to be updated: files with either a changed size or a changed checksum are selected for transfer. Since this whole-file checksumming of all files on both sides of the connection occurs in addition to the automatic checksum verifications that occur during a file’s transfer, this option can be quite slow.

Note that rsync always verifies that each transferred file was correctly reconstructed on the receiving side by checking its whole-file checksum, but that automatic after-the-transfer verification has nothing to do with this option’s before-the-transfer "Does this file need to be updated?" check.
2nd para ...
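
Since -c makes rsync compare whole-file checksums on both sides, it can also be combined with a dry run to report differences without copying anything. This is a sketch rather than a tested recipe for this setup: the function name list_differences is made up, and any paths passed to it are placeholders for the real source and backup locations.

```shell
#!/bin/sh
# Hypothetical helper, not part of rsync itself.

# list_differences SRC DST
# Ask rsync what it would change to make DST match SRC, comparing file
# contents by checksum instead of size and modification time.
#   -r        recurse into directories
#   -c        compare whole-file checksums
#   -n        dry run: nothing is copied or deleted
#   -i        itemize each file that differs
#   --delete  also report files that exist only in DST
#             (reported as "*deleting", but not removed because of -n)
list_differences() {
    rsync -rcni --delete "${1%/}/" "${2%/}/"
}
```

No output means rsync found nothing to update, for example after list_differences /original /copy. Because every file is read in full on both sides, expect a run over ~1TB to take a while.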

