LinuxQuestions.org: Linux - Software. This forum is for software issues.
12-19-2006, 11:46 AM #1
Member
Registered: Dec 2006
Distribution: CentOS 4.4 (2.6.9-42.0.2.ELsmp)
Posts: 55
Data Integrity Checks
Apologies upfront if this is a misplaced post...
I will be shuttling close to 30-50 GB of data from one server to another, possibly over the network (LAN). It's going to be a periodic task, and I am investigating ways of verifying the integrity of the transfer (since I will be erasing the data from the source).
A couple of questions:
1. Is md5sum the best means to verify the integrity of data transfers? Are there better/newer approaches?
2. Is there a function in the Perl library that will let me check data integrity?
I hope these aren't stupid questions. Any responses or suggestions would be much appreciated. Thanks.
Itnaa Sarakaam

12-20-2006, 12:23 PM #2
Guru
Registered: Jan 2002
Posts: 6,042
Look into dar. It is like tar, but it is designed for backups. You can also use cpio with its checksummed archive format (-H crc), which lets it detect (though not correct) errors.

12-20-2006, 01:16 PM #3
Senior Member
Registered: Nov 2004
Location: Texas
Distribution: RHEL, Debian, FreeBSD, Ubuntu (desktop)
Posts: 3,859
Quote: "Is md5sum the best means to verify the integrity of data transfers?"
MD5 would be appropriate for this task. (Alternatives include SHA-1, SHA-256, etc.)
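As a minimal sketch of what md5sum does for this job, assuming Python 3 and its standard hashlib module (the helper names file_digest and same_contents are made up for illustration), each file is read in chunks so a multi-gigabyte file never has to fit in memory:

```python
import hashlib


def file_digest(path, algorithm="md5", chunk_size=1 << 20):
    """Return the hex digest of a file, read 1 MiB at a time."""
    h = hashlib.new(algorithm)
    with open(path, "rb") as f:
        # iter() keeps calling f.read() until it returns b"" (end of file).
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def same_contents(src, dst, algorithm="md5"):
    """True if the source and the transferred copy have the same digest."""
    return file_digest(src, algorithm) == file_digest(dst, algorithm)
```

In a real transfer you would compute the digest on each server separately and compare the hex strings; passing "sha256" instead of "md5" switches algorithms with no other changes.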

12-21-2006, 07:59 AM #4
Member
Registered: Dec 2006
Distribution: CentOS 4.4 (2.6.9-42.0.2.ELsmp)
Posts: 55
Original Poster
Thanks, folks. I appreciate your responses and will follow up on your leads and suggestions.

12-21-2006, 03:09 PM #5
Member
Registered: Dec 2006
Distribution: CentOS 4.4 (2.6.9-42.0.2.ELsmp)
Posts: 55
Original Poster
After a bit of searching around, it appears that the key phrase is "message digest". Would these be safe for my purpose? That is, I want to use them to establish "uniqueness" rather than for security. My impression is that SHA-based algorithms are "better" than MD5...
If someone could point me to the "safest" approach (in terms of uniqueness, as opposed to security), I would much appreciate it!
A couple of links for the interested:
http://search.cpan.org/~gaas/Digest-1.15/Digest.pm
http://en.wikipedia.org/wiki/Cryptog..._hash_function
The former answers my question about Perl CPAN library functions. Both of the above also cover many other algorithms, but do Linux implementations include any beyond the MD5 and SHA algorithms?
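GNU coreutils ships sha1sum, sha256sum and sha512sum alongside md5sum. To try several algorithms side by side, here is a sketch assuming Python 3.6 or later, whose hashlib guarantees every name in CANDIDATES below (the list itself is just an illustrative choice); builds linked against OpenSSL may offer more through hashlib.algorithms_available:

```python
import hashlib

# Illustrative selection; all are in hashlib.algorithms_guaranteed on 3.6+.
# (shake_* variants are left out: their hexdigest() needs an output length.)
CANDIDATES = ["md5", "sha1", "sha256", "sha512", "sha3_256", "blake2b"]


def digests(data, names=CANDIDATES):
    """Map each algorithm name to the hex digest of `data` (bytes)."""
    return {name: hashlib.new(name, data).hexdigest() for name in names}


if __name__ == "__main__":
    print("guaranteed:", ", ".join(sorted(hashlib.algorithms_guaranteed)))
    for name, hexdigest in digests(b"abc").items():
        # Each hex character encodes 4 bits of the digest.
        print(f"{name:9} {len(hexdigest) * 4:4} bits  {hexdigest}")
```

For spotting accidental transfer errors any of these will flag a changed file; MD5's known weaknesses matter when someone deliberately crafts a collision, not for random corruption.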
An additional concern is how long it would typically take to generate the sum for a roughly DVD-sized block of data, and for a block of around 30 GB. Are there any practical tips on generating such sums for very large datasets? For example, is it wise to generate a single sum for the whole block, or to chop it into smaller sub-blocks and compute a series of sums?
Any thoughts or tips would be much appreciated. Thanks.
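Hashing large files is often limited by disk read speed rather than by the digest itself, so the most reliable estimate for a DVD-sized or 30 GB block is to time a run on your own hardware. A minimal sketch, assuming Python 3 (chunk_digests and mismatched_chunks are hypothetical helper names, and the 64 MiB chunk size is arbitrary), computes a whole-file digest and per-chunk digests in a single pass:

```python
import hashlib
import time


def chunk_digests(path, chunk_size=64 * 1024 * 1024, algorithm="sha256"):
    """Read the file once; return (whole-file digest, per-chunk digests, seconds)."""
    whole = hashlib.new(algorithm)
    pieces = []
    start = time.perf_counter()
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            whole.update(chunk)
            # Each chunk also gets its own digest, so a mismatch can be
            # narrowed down to one chunk instead of the whole file.
            pieces.append(hashlib.new(algorithm, chunk).hexdigest())
    elapsed = time.perf_counter() - start
    return whole.hexdigest(), pieces, elapsed


def mismatched_chunks(local, remote):
    """Indices whose chunk digests differ (assumes equal-length lists)."""
    return [i for i, (a, b) in enumerate(zip(local, remote)) if a != b]
```

Dividing the file size by the elapsed time gives a throughput figure you can scale up to 30 GB. The per-chunk list costs a second hashing pass in CPU time (not extra disk reads), in exchange for being able to re-send only the chunks that mismatched_chunks reports.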

12-21-2006, 03:19 PM #6
Member
Registered: Dec 2006
Distribution: CentOS 4.4 (2.6.9-42.0.2.ELsmp)
Posts: 55
Original Poster
Thanks for the dar reference. I checked it out, and it uses CRC to confirm integrity. I am a bit confused about this statement of theirs:
"...Thanks to CRC (cyclic redundancy checks), dar is able to detect data corruption in the archive. Only the file where data corruption occurred will not be possible to restore, but dar will restore the others even when compression or encryption (or both) is used...."
Question: after it archives the data, wouldn't it check whether the process was truly successful, and perhaps redo it if not?
I would hate to be told, during restoration, that a file was corrupt. I'm naive and a newbie in these matters; perhaps I've misunderstood things?
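The "check and redo" idea can be done by hand around any copy step. A sketch, assuming Python 3 and a locally mounted destination path (copy_and_verify is a hypothetical helper, not something dar provides): copy, re-read the copy, compare SHA-256 digests, and retry a few times before giving up.

```python
import hashlib
import shutil


def sha256_of(path, chunk_size=1 << 20):
    """Hex SHA-256 of a file, read in 1 MiB chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def copy_and_verify(src, dst, attempts=3):
    """Copy src to dst and confirm the digests match, retrying on mismatch.

    Returns the verified digest; raises OSError if every attempt fails.
    Only after this returns would it be reasonable to delete src.
    """
    expected = sha256_of(src)
    for _ in range(attempts):
        shutil.copyfile(src, dst)
        if sha256_of(dst) == expected:
            return expected
    raise OSError(f"{dst}: digest mismatch after {attempts} attempts")
```

One caveat: re-reading a file right after writing it may be served from the operating system's page cache rather than the disk, so this catches copy errors more reliably than media errors; a later check, ideally from another machine, covers the latter.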

12-21-2006, 07:26 PM #7
Guru
Registered: Jan 2002
Posts: 6,042
When the backup is done, you have to test it to make sure it is correct, but do not test it on the same system. Test the backup on another, similar computer. If it works there, you can consider the backup good.
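One common way to check data on another machine is a checksum manifest: record a digest for every file on the source, carry the manifest along with the data, and check it on the receiving system. A sketch assuming Python 3 (write_manifest and check_manifest are illustrative names); the "<hex digest>  <relative path>" line layout matches what sha256sum prints, so for ordinary file names `sha256sum -c` run from the data directory should also be able to read it:

```python
import hashlib
import os


def sha256_of(path, chunk_size=1 << 20):
    """Hex SHA-256 of a file, read in 1 MiB chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def write_manifest(root, manifest_path):
    """Write '<sha256>  <path relative to root>' for every file under root.

    Keep manifest_path outside root so the manifest does not list itself.
    """
    with open(manifest_path, "w", encoding="utf-8") as out:
        for dirpath, dirnames, filenames in os.walk(root):
            dirnames.sort()  # visit subdirectories in a stable order
            for name in sorted(filenames):
                full = os.path.join(dirpath, name)
                rel = os.path.relpath(full, root)
                out.write(f"{sha256_of(full)}  {rel}\n")


def check_manifest(root, manifest_path):
    """Return the relative paths that are missing or whose digest changed."""
    bad = []
    with open(manifest_path, encoding="utf-8") as f:
        for line in f:
            expected, rel = line.rstrip("\n").split("  ", 1)
            full = os.path.join(root, rel)
            if not os.path.isfile(full) or sha256_of(full) != expected:
                bad.append(rel)
    return bad
```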
Data will not always be 100% perfect. In the real world nothing is perfectly reliable, so expect some data corruption.
To keep the chances of data corruption down, put both computers on a line conditioner, use excellent power supplies, and use ECC or parity memory.
dar does what its documentation says: if CRC (or both compression and CRC) is used and one or more areas of the dar archive are corrupted, it can still resume restoring the other files in the archive. If you have incremental and differential backups, the affected files can hopefully be recovered from those when the full backup cannot restore them.
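The per-file idea behind that dar statement can be shown with a toy model: if each member of an archive carries its own CRC, a damaged member is detected and skipped while the rest still restore. This is a sketch assuming Python 3's zlib.crc32; the pack/restore "archive" is hypothetical and is not dar's actual on-disk format.

```python
import zlib


def pack(files):
    """Toy archive: one (name, data, crc32) record per file."""
    return [(name, data, zlib.crc32(data)) for name, data in files.items()]


def restore(archive):
    """Restore each record whose CRC still matches; list the ones that don't."""
    restored, corrupted = {}, []
    for name, data, stored_crc in archive:
        if zlib.crc32(data) == stored_crc:
            restored[name] = data
        else:
            corrupted.append(name)  # detected, but a CRC cannot repair it
    return restored, corrupted


if __name__ == "__main__":
    archive = pack({"a.txt": b"hello", "b.txt": b"world"})
    name, data, crc = archive[1]
    archive[1] = (name, b"w0rld", crc)  # simulate a damaged byte on disk
    print(restore(archive))  # a.txt restored, b.txt reported as corrupted
```

A CRC only detects accidental damage; getting the lost member back needs another copy, such as the incremental or differential backups mentioned above.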
As the Nike slogan says: just do it.

12-22-2006, 01:28 PM #8
Member
Registered: Dec 2006
Distribution: CentOS 4.4 (2.6.9-42.0.2.ELsmp)
Posts: 55
Original Poster
Thanks, electro. I do appreciate your suggestions and thoughts. As you note... time to take the plunge...