LinuxQuestions.org
12-19-2006, 11:46 AM   #1
itnaa
Member
 
Registered: Dec 2006
Distribution: CentOS 4.4 (2.6.9-42.0.2.ELsmp)
Posts: 55

Rep: Reputation: 15
Data Integrity Checks


Apologies upfront if this is a misplaced post...

I will be shuttling roughly 30-50 GB of data from one server to another, possibly over the network (LAN). It's going to be a periodic task, and I am investigating ways to verify the integrity of the transfer (since I will be erasing the data from the source afterwards).

A couple of questions:

1. Is md5sum the best means to verify the integrity of data transfers? Are there better/newer approaches?

2. Is there a function in the Perl library that can check data integrity?

I hope these aren't stupid questions. Any responses or suggestions would be much appreciated. Thanks.

Itnaa Sarakaam
 
12-20-2006, 12:23 PM   #2
Electro
Guru
 
Registered: Jan 2002
Posts: 6,042

Rep: Reputation: Disabled
Look into dar. It is like tar, but it is designed for backups. You can also use cpio with its CRC archive format (-H crc), which adds a checksum for each file. Note that this detects errors; it does not correct them.
 
12-20-2006, 01:16 PM   #3
anomie
Senior Member
 
Registered: Nov 2004
Location: Texas
Distribution: RHEL, Scientific Linux, Debian, Fedora, Lubuntu, FreeBSD
Posts: 3,930
Blog Entries: 5

Rep: Reputation: Disabled
Quote:
Is md5sum the best means to verify the integrity of data transfers?
md5 would be appropriate for this task. (Alternatives include sha1 and sha256, etc.)
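
As a rough illustration of the point above, here is a minimal Python sketch (Python is an editorial choice; the thread itself only mentions md5sum and Perl) that computes a digest of a file with the standard hashlib module. The function name digest_file and its parameters are made up for this example. The file is read in fixed-size chunks, so memory use stays small no matter how large the file is.

```python
import hashlib


def digest_file(path, algorithm="sha256", chunk_size=1024 * 1024):
    """Return the hex digest of the file at `path`.

    `algorithm` is any name accepted by hashlib.new(), for example
    "md5", "sha1" or "sha256". The name and defaults of this helper
    are assumptions made for the example.
    """
    hasher = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        # Read one chunk at a time until read() returns an empty bytes object.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest()
```

Running this on the source copy and on the destination copy and comparing the two hex strings is the same kind of check that md5sum performs. Passing "md5" or "sha1" instead of "sha256" selects the other algorithms mentioned.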
 
12-21-2006, 07:59 AM   #4
itnaa
Member
 
Registered: Dec 2006
Distribution: CentOS 4.4 (2.6.9-42.0.2.ELsmp)
Posts: 55

Original Poster
Rep: Reputation: 15
Thanks Folks. Appreciate your responses. Will follow up on your leads & suggestions.
 
12-21-2006, 03:09 PM   #5
itnaa
Member
 
Registered: Dec 2006
Distribution: CentOS 4.4 (2.6.9-42.0.2.ELsmp)
Posts: 55

Original Poster
Rep: Reputation: 15
After a bit of searching around, it appears that the key phrase is "message digest". Would these be safe for my purpose? That is, I want them to establish "uniqueness" (detecting any change in the data), as opposed to security. My impression is that the SHA-based algorithms are "better" than MD5...

If someone could point out to me the "safest" approach (as in uniqueness, as opposed to security), I would much appreciate it!!!!

A couple of links for the interested:

http://search.cpan.org/~gaas/Digest-1.15/Digest.pm
http://en.wikipedia.org/wiki/Cryptog..._hash_function

The former answers my question about Perl CPAN library functions. Both of the above also list many other algorithms, but do Linux implementations include any beyond MD5 and SHA?

An additional concern is how long it would typically take to generate the sum for a data block roughly the size of a DVD, or for a block of around 30 GB. Are there any practical tips on how to approach generating such sums for very large datasets? For example, is it wise to generate one for the whole block, or to chop it up into smaller sub-blocks and compute a series of sums?

Any thoughts/tips would be much appreciated. Thanks.
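
One way to picture the whole-block versus sub-block question is the Python sketch below. It makes a single pass over a file and produces both a digest for the whole file and one digest per fixed-size block. The function name block_digests, the 64 MiB block size and the 1 MiB read size are all assumptions chosen for illustration, not recommendations from this thread. How long a pass takes depends on the hardware, so no timings are claimed here.

```python
import hashlib


def block_digests(path, block_size=64 * 1024 * 1024, algorithm="sha256",
                  read_size=1024 * 1024):
    """Return (whole_file_hex_digest, [hex digest of each block]).

    The file is read once. Every chunk feeds both the whole-file hasher
    and the hasher of the block it belongs to. All names and sizes here
    are hypothetical defaults for this example.
    """
    whole = hashlib.new(algorithm)
    blocks = []
    with open(path, "rb") as handle:
        while True:
            block = hashlib.new(algorithm)
            remaining = block_size
            while remaining > 0:
                chunk = handle.read(min(read_size, remaining))
                if not chunk:
                    break  # end of file
                block.update(chunk)
                whole.update(chunk)
                remaining -= len(chunk)
            if remaining == block_size:
                break  # nothing was read for this block: file is finished
            blocks.append(block.hexdigest())
            if remaining > 0:
                break  # last block was short: file is finished
    return whole.hexdigest(), blocks
```

The per-block list does not make the check any stronger than the single whole-file digest. What it adds is locality: if one block differs after a transfer, only that block needs to be looked at or sent again.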
 
12-21-2006, 03:19 PM   #6
itnaa
Member
 
Registered: Dec 2006
Distribution: CentOS 4.4 (2.6.9-42.0.2.ELsmp)
Posts: 55

Original Poster
Rep: Reputation: 15
Thanks for the dar reference. I checked it out, and it uses CRC to confirm integrity. I am a bit confused about their statement:

"... thanks to CRC (cyclic redundancy checks), dar is able to detect data corruption in the archive. Only the file where data corruption occurred will not be possible to restore, but dar will restore the others even when compression or encryption (or both) is used...."

Question: after it archives the data, wouldn't it check to see if the process was truly successful? And perhaps redo it, if not?

I would hate to be told, during restoration, that a file was corrupt. I'm naive & a newbie in these matters, perhaps I've misunderstood things?
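
The quoted statement is easier to follow with a small demonstration of what a CRC can and cannot do. The Python sketch below uses zlib.crc32 from the standard library; it is not dar's implementation, and the helper names crc32_of, flip_bit and is_intact are invented for the example. A stored CRC lets a reader notice that the data changed, but it carries no information for repairing it.

```python
import zlib


def crc32_of(data):
    """Return the CRC-32 of a bytes object as an unsigned 32-bit integer."""
    return zlib.crc32(data) & 0xFFFFFFFF


def flip_bit(data, bit_index):
    """Return a copy of `data` with one bit inverted, simulating corruption."""
    corrupted = bytearray(data)
    corrupted[bit_index // 8] ^= 1 << (bit_index % 8)
    return bytes(corrupted)


def is_intact(data, stored_crc):
    """True if `data` still matches the CRC that was stored alongside it."""
    return crc32_of(data) == stored_crc
```

For example, with original = b"payload" and stored = crc32_of(original), the call is_intact(original, stored) is true, while is_intact(flip_bit(original, 3), stored) is false. The check reports the damage and nothing more, which is why verifying a copy right after writing it, while the source still exists, is worth the extra pass.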
 
12-21-2006, 07:26 PM   #7
Electro
Guru
 
Registered: Jan 2002
Posts: 6,042

Rep: Reputation: Disabled
When the backup is done, you have to test it to make sure it is correct, but do not test it on the same system. Test the backup on another, similar computer. If it restores correctly there, you can consider the backup good.
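
A hedged sketch of what such a test could look like for a plain file transfer: build a manifest of SHA-256 digests on the source machine, build another on the destination, and compare them before erasing anything. This is Python using only the standard library; the names sha256_file, build_manifest and compare_manifests are hypothetical, and SHA-256 is just one reasonable choice of digest.

```python
import hashlib
import os


def sha256_file(path, chunk_size=1024 * 1024):
    """Hex SHA-256 of one file, read in chunks to keep memory use small."""
    hasher = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest()


def build_manifest(root):
    """Map each file's path relative to `root` to its SHA-256 digest."""
    manifest = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full_path = os.path.join(dirpath, name)
            manifest[os.path.relpath(full_path, root)] = sha256_file(full_path)
    return manifest


def compare_manifests(source, destination):
    """Return (missing, mismatched): sorted lists of relative paths.

    `missing` holds files absent from the destination; `mismatched`
    holds files present on both sides whose digests differ.
    """
    missing = sorted(p for p in source if p not in destination)
    mismatched = sorted(
        p for p in source if p in destination and source[p] != destination[p]
    )
    return missing, mismatched
```

In practice the source manifest would be saved to a file and copied along with the data, so that each machine only reads its own disks. The source would be erased only when both returned lists are empty.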

Data will never be 100% perfect. No real-world hardware is perfectly reliable, so expect some corruption over time.

To keep the chances of data corruption down, put both computers on a line conditioner, and use good power supplies and ECC (or parity) memory.

What dar can do is what they have said. If CRC is used (with or without compression) and one or more areas of the dar archive are corrupted, it can carry on restoring the other files in the archive. If you also have incremental and differential backups, the damaged files can hopefully be recovered from those if the full backup cannot restore them.

Like the Nike slogan. Just do it.
 
12-22-2006, 01:28 PM   #8
itnaa
Member
 
Registered: Dec 2006
Distribution: CentOS 4.4 (2.6.9-42.0.2.ELsmp)
Posts: 55

Original Poster
Rep: Reputation: 15
thanks electro. do appreciate your suggestions & thoughts. as you note... time to take the plunge...
 
  


All times are GMT -5.
