I just copied a 100 GB directory that contains lots of other directories and files. How can I make sure everything was copied correctly, that nothing went wrong, and that I'm not missing any bits?
This will store the checksums of all the files inside the directory in file.md5; you can then check the copied directory against this file. Go to the location of the copied directory and issue:
Code:
md5sum -c /path/to/file.md5
2) You can install md5deep, which has a recursive option.
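For reference, here is a minimal sketch of the whole md5sum round trip, assuming GNU md5sum and find are available. The helper names make_manifest and verify_copy are made up for illustration, and the find invocation is only one common way to build the checksum file, not necessarily the command used above.

```shell
#!/bin/sh
# Sketch only: make_manifest and verify_copy are hypothetical helper names.

# Write one "<hash>  ./relative/path" line per regular file under $1 into $2.
# Keep the manifest outside the tree being hashed so it does not list itself.
make_manifest() {
    ( cd "$1" && find . -type f -exec md5sum {} + ) > "$2"
}

# Re-hash the files under $1 and compare them with the manifest $2.
# The manifest is fed on standard input, so a relative path to it still works.
# The exit status is non-zero if any file is missing or differs.
verify_copy() {
    ( cd "$1" && md5sum -c - ) < "$2"
}
```

Typical use would be 'make_manifest /path/to/original /path/to/file.md5' followed by 'verify_copy /path/to/copy /path/to/file.md5'. Because the paths in the manifest are relative, the same file can be checked against either tree.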
I typically use rsync to do the copies. This program is designed to "synchronize files and directories." One of the tricks it uses is MD5 checksums, which help it determine whether a file needs to be copied. If you use this tool, it will probably accomplish your objective for you, with no further programming tricks required.
Is there any way to make md5sum print out only the failed hashes and write them to a file? I have thousands of files, and I can't look through thousands of lines to see which ones failed.
With so many files, md5sum will take forever, and will be difficult to work with. I would also recommend rsync for this. Normally, yes I would use md5sum, but for 100 GB ... mmm I dunno.
Quote:
Originally Posted by sundialsvcs
I typically use rsync to do the copies.
This program is designed to "synchronize files and directories." One of the tricks that it uses is MD5, which it uses to determine if a file needs to be copied. If you use this tool, it will probably accomplish your objective for you, with no further programming tricks required.
Quote:
Originally Posted by H_TeXMeX_H
With so many files, md5sum will take forever, and will be difficult to work with. I would also recommend rsync for this. Normally, yes I would use md5sum, but for 100 GB ... mmm I dunno.
Thanks, I just finished the first md5sum command, working on the second one now.
Well, if you're still online you can interrupt the second one. I should have specified that they are two alternatives which give identical results. The first one greps for FAILED; the second one greps to exclude OK, which leaves the FAILED lines as well, as you required. Anyway, a double check does no harm.
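The two alternatives can be sketched as follows, assuming GNU md5sum and a checksum file that uses paths relative to the copied directory. The helper names failed_only and not_ok are hypothetical. LC_ALL=C is set so that the status words are not translated, and the summary warnings that md5sum writes to standard error are discarded.

```shell
#!/bin/sh
# Sketch only: failed_only and not_ok are hypothetical helper names.
# Both re-check the tree under $1 against the manifest $2 and print
# just the lines for files that did not verify.

# Alternative 1: keep only the lines that md5sum marked as FAILED.
failed_only() {
    ( cd "$1" && LC_ALL=C md5sum -c - 2>/dev/null ) < "$2" | grep ': FAILED'
}

# Alternative 2: drop every line that ends in ": OK".
not_ok() {
    ( cd "$1" && LC_ALL=C md5sum -c - 2>/dev/null ) < "$2" | grep -v ': OK$'
}
```

For example, 'failed_only /path/to/copy /path/to/file.md5 > failed.txt' leaves only the problem files in failed.txt. Both helpers return grep's exit status, so a status of 1 with no output means every file verified. GNU md5sum also has a '--quiet' option for '-c' that suppresses the OK lines by itself.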
Quote:
Originally Posted by SentralOrigin
How would I use rsync?
If you're new to rsync, it is difficult to explain in a few words. You may want to take a look at its documentation. Its basic usage is to synchronize the content of two directories on two different machines. For example:
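As a hedged illustration of how rsync can be used for the verification itself (not the example from the post above), the sketch below runs rsync as a dry run with checksums enabled. By default rsync decides whether to transfer a file from its size and modification time; whole-file checksums are compared only when '-c' is given. The helper name differing_files is hypothetical.

```shell
#!/bin/sh
# Sketch only: differing_files is a hypothetical helper name.
# List the files under $1 that are missing from, or differ in content
# from, the tree under $2. Nothing is copied.
#   -r  recurse into directories
#   -n  dry run: report what would be transferred, change nothing
#   -c  compare file contents by checksum instead of size and mtime
#   --itemize-changes  print one line per file that would be updated
differing_files() {
    rsync -rnc --itemize-changes "$1"/ "$2"/
}
```

An empty listing from 'differing_files /path/to/original /path/to/copy' suggests the copy matches the original. Note that this reads every file on both sides, so on 100 GB it is not faster than md5sum. Files that exist only in the copy are not reported by this form.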
Well, if you've already done the md5sum, then copy it over to the other directory and run 'md5sum -c' on it, and either output to a file or grep for FAILED.
Maybe worth noting this:
The command 'md5sum <file>' may not work for very large files. I think 32 GB is the limit (this still needs to be confirmed).
'cat <file> | md5sum' always works.
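A minimal sketch of the standard-input form, assuming GNU md5sum. The helper name md5_stdin is hypothetical. A shell redirect feeds the file to md5sum in the same way as the cat pipeline, without the extra process, and the digest is the same as when the file is named on the command line.

```shell
#!/bin/sh
# Sketch only: md5_stdin is a hypothetical helper name.
# Print just the MD5 digest of file $1, with the data fed to md5sum
# on standard input rather than opened by name.
md5_stdin() {
    md5sum < "$1" | cut -d ' ' -f 1
}
```

When md5sum reads standard input it prints '-' in place of a filename, which is why the helper keeps only the first field.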
-----------------
If you are only interested in comparing two directories and not so much in learning by doing, you can just copy and use the following Perl script. I wrote it out of daily need, and it's good enough. It is a little elaborate so that it works for all filename/dirname characters in the ASCII range [32..255]. A single newline in a filename can break simpler approaches.
Usage is: perl -w dircompare.pl <orig-dir-path> <new-dir-path>
The paths can be on different file systems.
Code:
#!/usr/bin/perl
# dircompare.pl: compare two directory trees by the MD5 sum of each file.
use strict;
use warnings;
use Cwd;

my $cwd = cwd;
print "Current directory: $cwd\n";

my ($odir, $cdir) = (shift, shift);
die "Usage: perl -w dircompare.pl <orig-dir-path> <new-dir-path>\n"
    unless defined $odir && defined $cdir;

# $hold->{$dir} maps each relative file path to its md5 checksum.
my $hold = {};
foreach my $dir ($odir, $cdir) {
    print "$dir\n";
    # "or", not "||": "chdir $dir || die" never dies, because "||"
    # binds to $dir rather than to the result of chdir.
    chdir $dir or die "Error: Could not chdir to $dir\n";
    my @list = `find . -type f -exec md5sum {} \\;`;
    my $h = {};
    foreach (@list) {
        # md5sum prints the hash, a space, a mode character (space or *),
        # then the path; skip any line that does not look like that.
        next unless /^(\S+) [ *](.*)$/;
        $h->{$2} = $1;
    }
    $hold->{$dir} = $h;
    chdir $cwd or die "Error: Could not chdir back to $cwd\n";
}

my @okeys = keys %{$hold->{$odir}};
print "Note: Original Directory has ", scalar(@okeys), " files\n";
my @ckeys = keys %{$hold->{$cdir}};
print " Compared Directory has ", scalar(@ckeys), " files\n\n\n";

my (@kdne, @md5mm, @exk);
for (@okeys) {
    if (!exists $hold->{$cdir}{$_}) {
        push @kdne, $_;      # in the original, missing from the copy
        next;
    }
    if ($hold->{$cdir}{$_} ne $hold->{$odir}{$_}) {
        push @md5mm, $_;     # present in both, checksums differ
    }
}
for (@ckeys) {
    if (!exists $hold->{$odir}{$_}) {
        push @exk, $_;       # present only in the copy
    }
}

# LOGGING:
print "Error Type A: missing file or read denied in $cdir ...\n";
if (!@kdne) {
    print " ... no errors\n";
}
else {
    print "ErrorTypeA $_\n" for @kdne;
}
print "\n\n";

print "Error Type B: md5 mismatch between $cdir and $odir ...\n";
if (!@md5mm) {
    print " ... no errors\n";
}
else {
    print "ErrorTypeB $_\n" for @md5mm;
}
print "\n\n";

print "Error Type C: Extra/modified paths in $cdir ...\n";
if (!@exk) {
    print " ... no errors\n";
}
else {
    print "ErrorTypeC $_\n" for @exk;
}
print "\n\n END OF REPORT\n\n";