Linux - Software: This forum is for Software issues.
Having a problem installing a new program? Want to know which application is best for the job? Post your question in this forum.
I will be shuttling roughly 30-50 GB of data from one server to another over the network (LAN). This will be a periodic task, and I am investigating ways to verify the integrity of the transfer (since I will be erasing the data from the source afterwards).
A couple of questions:
1. Is md5sum the best means to verify the integrity of data transfers? Are there better/newer approaches?
2. Is there a function in a Perl library that permits checking data integrity?
I hope these aren't stupid questions. Any responses or suggestions would be much appreciated. Thanks.
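md5sum handles this well: generate a manifest of checksums on the source, ship it along with the data, and verify on the destination. A minimal sketch, assuming GNU coreutils on both machines (the /srv/data path is hypothetical):

```shell
# On the source server: checksum every file under the data tree
# and save the results into a manifest file.
cd /srv/data
find . -type f -exec md5sum {} + > manifest.md5

# Copy the data tree AND manifest.md5 to the destination, then
# on the destination server check every file against the manifest:
cd /srv/data              # the copied tree on the destination
md5sum -c manifest.md5    # prints "OK" per file, "FAILED" on mismatch
```

Only erase the source once `md5sum -c` reports OK for everything; it exits non-zero if any file fails, so it is easy to script.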
After a bit of searching around, it appears that the key phrase is "message digest". These seem to be what I need: they are used to compute "uniqueness", as opposed to focusing on security. My impression is that SHA-based algorithms are "better" than the MD5 ones...
If someone could point me to the "safest" approach (in terms of uniqueness, as opposed to security), I would much appreciate it!
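For what it's worth, the SHA tools in coreutils are drop-in replacements for md5sum. For catching accidental corruption either family is fine; SHA-256 additionally resists deliberate collisions, which MD5 does not. A sketch (the filename is hypothetical):

```shell
# Same manifest/verify workflow as md5sum, just with SHA-256:
sha256sum backup.img > backup.img.sha256   # hypothetical filename
sha256sum -c backup.img.sha256             # "backup.img: OK" on success
```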
The former answers my question about the Perl CPAN library functions. Both of the above also support many other algorithms, but do Linux implementations incorporate any others beyond the MD5 & SHA algorithms?
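Yes: beyond md5sum, GNU coreutils ships the whole SHA family (sha1sum, sha224sum, sha256sum, sha384sum, sha512sum), and newer coreutils releases also include b2sum (BLAKE2). All of them share the same command-line interface and manifest format:

```shell
# All of these take the same flags (-c, --check, etc.):
printf 'hello' | sha1sum
printf 'hello' | sha512sum
# b2sum (BLAKE2b) exists only in newer coreutils releases:
printf 'hello' | b2sum 2>/dev/null || echo "b2sum not in this coreutils"
```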
An additional concern: how long would it typically take to generate the appropriate sum for a data block roughly DVD-sized, or for a block of around 30 GB? Are there any practical tips on generating such sums for very large datasets? For example, is it wise to generate one sum for the whole block, or to chop it up into smaller sub-blocks and compute a series of sums?
Any thoughts/tips would be much appreciated. Thanks.
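Hashing is normally disk-bound, so summing a 30 GB block takes roughly as long as reading it from disk once. One argument for sub-blocks is that a mismatch then pinpoints the corrupt region instead of condemning the whole 30 GB. A sketch using GNU split (filenames are hypothetical, and the 1 GiB chunk size is an arbitrary choice):

```shell
# Chop the big image into 1 GiB pieces and checksum each one.
split -b 1G --numeric-suffixes big-backup.img chunk.
md5sum chunk.* > chunks.md5

# On the destination, split the copied image the same way and
# verify; only the mismatching chunk(s) need to be re-transferred.
md5sum -c chunks.md5
```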
Thanks for the dar reference. I checked it out and it uses CRC for confirming integrity. I am a bit confused about their statement:
"... Thanks to CRC (cyclic redundancy checks), dar is able to detect data corruption in the archive. Only the file where data corruption occurred will not be possible to restore, but dar will restore the others even when compression or encryption (or both) is used...."
Question: after it archives the data, wouldn't it check whether the process was truly successful? And perhaps redo it, if not?
I would hate to be told, during restoration, that a file is corrupt. I'm naive & a newbie in these matters; perhaps I've misunderstood things?
When the backup is done, you have to test it to make sure it is correct, but do not test it on the same system. Test the backup on another, similar computer. If it works there, then you can consider it good.
Data will not always be 100% perfect; in the real world some corruption is inevitable, so plan for it.
To keep the chances of data corruption down, put both computers on a line conditioner, use high-quality power supplies, and use ECC/parity memory.
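One convenient way to run such a test is rsync in checksum, dry-run mode against the restored copy: it compares full-file checksums without transferring anything, and empty output means the trees match. A sketch (hostname and paths are hypothetical):

```shell
# -n dry run, -a archive mode, -c compare by checksum (not size/mtime),
# -i itemize any differences found. No output means the trees match.
rsync -naci /srv/data/ backup-host:/srv/restored/
```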
What dar can do is what they have said. If CRC is used (with or without compression) and one or more areas in the dar archive are corrupted, dar can still continue restoring the remaining files in the archive. If you have incremental and differential backups as well, the corrupted files can hopefully be recovered from those when the full backup cannot restore them.
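For intuition about the CRC mechanism dar describes, coreutils' cksum computes a POSIX CRC over its input, and flipping even a single byte changes the CRC; that is essentially how corruption in an archive region gets detected. (This only illustrates the mechanism; it is not dar's internal format.)

```shell
printf 'payload data' | cksum     # prints: <CRC> <byte count>
printf 'Payload data' | cksum     # one changed byte => a different CRC
```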