Old 12-09-2008, 01:25 AM   #1
SentralOrigin
Member
 
Registered: Jul 2005
Distribution: Gentoo, Ubuntu
Posts: 318

Rep: Reputation: 30
How to create md5sum for a directory?


I just copied over a 100 GB directory with lots of other directories and files within that one directory. How can I make sure everything was copied fine and that nothing went wrong and I'm missing bits?
 
Old 12-09-2008, 04:01 AM   #2
colucix
LQ Guru
 
Registered: Sep 2003
Location: Bologna
Distribution: CentOS 6.5 OpenSuSE 12.3
Posts: 10,509

Rep: Reputation: 1983
md5sum cannot compute a checksum for a directory itself; it only reads files, so you have to check every file recursively. Two alternatives:

1) use find in conjunction with md5sum:
Code:
find directory -type f -print0 | xargs -0 md5sum >> file.md5
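This stores the checksum of every file inside the directory in file.md5 (note that >> appends, so remove any old file.md5 first). You can then check the copied directory against this file. The paths in file.md5 are recorded as directory/..., so go to the location that contains the copied directory (its parent) and issue: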
Code:
md5sum -c /path/to/file.md5
2) install md5deep, which has a recursive option (-r; the -l option records relative paths):
Code:
md5deep -rl directory > file.md5
then proceed as above.
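For reference, a minimal sketch of the whole round trip from option 1, assuming a POSIX shell plus GNU find, xargs and md5sum (--quiet is a GNU md5sum option). The function names make_manifest and verify_copy are made up for this example. The manifest is built from inside the source directory so the stored paths are relative (./sub/file) and resolve the same way inside the copy; keep the manifest file itself outside the directory being hashed.

```shell
# make_manifest SRC MANIFEST
# Record the md5 of every regular file under SRC. Running find from
# inside SRC keeps the stored paths relative ("./a/b"). -print0 / -0
# cope with spaces in names; -r skips md5sum if there are no files.
make_manifest() {
    ( cd "$1" && find . -type f -print0 | xargs -0 -r md5sum ) > "$2"
}

# verify_copy DEST MANIFEST
# Re-check the recorded files inside DEST. The manifest is fed on
# stdin, so a relative MANIFEST path still works after the cd.
# --quiet prints only the files that fail; the exit status is
# non-zero if any file is missing or differs.
verify_copy() {
    ( cd "$1" && md5sum -c --quiet ) < "$2"
}
```

Typical use would be make_manifest /path/to/original /tmp/file.md5 followed by verify_copy /path/to/copy /tmp/file.md5 (both paths are placeholders). No output and exit status 0 means every recorded file matched.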
 
Old 12-09-2008, 09:09 AM   #3
sundialsvcs
LQ Guru
 
Registered: Feb 2004
Location: SE Tennessee, USA
Distribution: Gentoo, LFS
Posts: 10,647
Blog Entries: 4

Rep: Reputation: 3933
I typically use rsync to do the copies.

This program is designed to "synchronize files and directories." By default it decides whether a file needs to be copied by comparing size and modification time; with the -c (--checksum) option it compares file checksums instead. If you use this tool, it will probably accomplish your objective for you, with no further programming tricks required.
 
Old 12-09-2008, 11:21 AM   #4
SentralOrigin
Member
 
Registered: Jul 2005
Distribution: Gentoo, Ubuntu
Posts: 318

Original Poster
Rep: Reputation: 30
Is there any way to make md5sum print out only the failed hashes and write them to a file? Because I have thousands of files and I can't look through thousands of lines to see which ones failed.
 
Old 12-09-2008, 11:49 AM   #5
colucix
LQ Guru
 
Registered: Sep 2003
Location: Bologna
Distribution: CentOS 6.5 OpenSuSE 12.3
Posts: 10,509

Rep: Reputation: 1983
Code:
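# Alternative 1: keep the lines that end in FAILED
md5sum -c file.md5 | grep 'FAILED$' > failed_hashes
# Alternative 2 (use one or the other): drop the lines that end in OK
md5sum -c file.md5 | grep -v 'OK$' > failed_hashes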
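If you want this wrapped up so a script can act on the result, here is a small sketch (list_failures is a made-up name; GNU md5sum and grep are assumed). It keeps every line that does not end in ": OK", so it catches plain FAILED lines as well as "FAILED open or read" for files that are missing or unreadable. It sets LC_ALL=C because md5sum may translate the words OK and FAILED in other locales.

```shell
# list_failures MANIFEST OUTFILE
# Run it from the directory the manifest paths are relative to.
# Writes every result line that does not end in ": OK" to OUTFILE.
# md5sum's own warnings on stderr are discarded.
# Returns 0 only if OUTFILE ends up empty (nothing failed).
list_failures() {
    LC_ALL=C md5sum -c "$1" 2>/dev/null | grep -v ': OK$' > "$2"
    [ ! -s "$2" ]
}
```

For example, list_failures file.md5 failed_hashes || echo "some files did not verify" leaves the list of problem files in failed_hashes.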
 
Old 12-09-2008, 12:35 PM   #6
malaprop
LQ Newbie
 
Registered: Dec 2008
Location: TX
Distribution: Ubuntu 8.10
Posts: 26

Rep: Reputation: 16
Why not just use diff?

Code:
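diff -r dir1 dir2
With -r it recurses into the subdirectories and gives a listing of the differences between the two directory trees (add -q to list only the names of the files that differ).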
 
Old 12-09-2008, 02:22 PM   #7
H_TeXMeX_H
LQ Guru
 
Registered: Oct 2005
Location: $RANDOM
Distribution: slackware64
Posts: 12,928
Blog Entries: 2

Rep: Reputation: 1301
With so many files, md5sum will take forever, and will be difficult to work with. I would also recommend rsync for this. Normally, yes I would use md5sum, but for 100 GB ... mmm I dunno.
 
Old 12-09-2008, 02:30 PM   #8
SentralOrigin
Member
 
Registered: Jul 2005
Distribution: Gentoo, Ubuntu
Posts: 318

Original Poster
Rep: Reputation: 30
Quote:
Originally Posted by colucix
Code:
md5sum -c file.md5 | grep FAILED$ > failed_hashes
md5sum -c file.md5 | grep -v OK$ > failed_hashes
Thanks, I just finished the first md5sum command, working on the second one now.

Quote:
Originally Posted by sundialsvcs
I typically use rsync to do the copies.

This program is designed to "synchronize files and directories." By default it decides whether a file needs to be copied by comparing size and modification time; with the -c (--checksum) option it compares file checksums instead. If you use this tool, it will probably accomplish your objective for you, with no further programming tricks required.
Quote:
Originally Posted by H_TeXMeX_H
With so many files, md5sum will take forever, and will be difficult to work with. I would also recommend rsync for this. Normally, yes I would use md5sum, but for 100 GB ... mmm I dunno.
How would I use rsync?
 
Old 12-09-2008, 04:08 PM   #9
colucix
LQ Guru
 
Registered: Sep 2003
Location: Bologna
Distribution: CentOS 6.5 OpenSuSE 12.3
Posts: 10,509

Rep: Reputation: 1983
Quote:
Originally Posted by SentralOrigin
Thanks, I just finished the first md5sum command, working on the second one now.
Well, if you're still online you can interrupt the second one. I should have specified that they are two alternatives that give practically the same result. The first one greps for lines ending in FAILED; the second one excludes the lines ending in OK, which leaves the FAILED lines (including "FAILED open or read" for files that are missing or unreadable). Either one meets your requirement. Anyway, a double check does no harm.
Quote:
Originally Posted by SentralOrigin
How would I use rsync?
If you're new to rsync, it is difficult to explain in a few words. You may take a look at the documentation. Its basic usage is to synchronize the content of two directories on two different machines. For example:
Code:
rsync -avz -e ssh /path/to/local/dir user@host:/path/to/remote/dir
but don't use such a command before reading the man page or some documentation about rsync. It can be useful for future reference, anyway.
 
Old 12-10-2008, 03:31 AM   #10
H_TeXMeX_H
LQ Guru
 
Registered: Oct 2005
Location: $RANDOM
Distribution: slackware64
Posts: 12,928
Blog Entries: 2

Rep: Reputation: 1301
Well, if you've already done the md5sum, then copy it over to the other directory and run 'md5sum -c' on it, and either output to a file or grep for FAILED.
 
Old 06-17-2009, 11:03 PM   #11
@Loss
LQ Newbie
 
Registered: Jun 2009
Posts: 1

Rep: Reputation: 0
Maybe worth noting this:
The command 'md5sum <file>' does not work for large files. I think 32GB is the limit (need to confirm the limit.)
'cat <file> | md5sum' always works.
-----------------
If you are only interested in comparing two directories and not so much in learning-by-doing, you can just copy and use the following perl script. I wrote it out of daily need, and it's good enough. It is made a little elaborate in order to work for all filename/dirname characters in ascii range [32..255]. Just a newline in filename can fail simpler approaches.

Usage is: perl -w dircompare.pl <orig-dir-path> <new-dir-path>
The paths can be on different file systems.

#! /usr/bin/perl -w
use strict;
use warnings;
use Cwd;

my $cwd = cwd;
print "Current directory: $cwd\n";
my $hold = {};    # directory => { relative path => md5 }
my ($odir, $cdir) = (shift, shift);
foreach my $dir ($odir, $cdir) {
    print "$dir\n";
    # "or" instead of "||": with "||" the die was bound to $dir and never ran
    chdir $dir or die "Error: Could not chdir to $dir\n";
    my @list = `find . -type f -exec md5sum {} \\;`;
    my $h = {};
    foreach (@list) {
        # md5sum prints "<hash>  <path>"; skip anything that does not match
        next unless /^([^ ]+) (.*)$/;
        $h->{$2} = $1;
    }
    $hold->{$dir} = $h;
    chdir $cwd;
}

my @okeys = keys %{$hold->{$odir}};
print "Note: Original Directory has ", $#okeys + 1, " files\n";
my @ckeys = keys %{$hold->{$cdir}};
print " Compared Directory has ", $#ckeys + 1, " files\n\n\n";

# @kdne: in original but not in copy; @md5mm: md5 differs; @exk: only in copy
my (@kdne, @md5mm, @exk);
for (@okeys) {
    if (!exists ${$hold->{$cdir}}{$_}) {
        push @kdne, $_;
        next;
    }
    if (${$hold->{$cdir}}{$_} ne ${$hold->{$odir}}{$_}) {
        push @md5mm, $_;
    }
}

for (@ckeys) {
    if (!exists ${$hold->{$odir}}{$_}) {
        push @exk, $_;
    }
}

#LOGGING:
print "Error Type A: missing file or read denied in $cdir ...\n";
if (!@kdne) {
    print " ... no errors\n";
}
else {
    print "ErrorTypeA $_\n" for @kdne;
}
print "\n\n";
print "Error Type B: md5 mismatch between $cdir and $odir ...\n";
if (!@md5mm) {
    print " ... no errors\n";
}
else {
    print "ErrorTypeB $_\n" for @md5mm;
}
print "\n\n";
print "Error Type C: Extra/modified paths in $cdir ...\n";
if (!@exk) {
    print " ... no errors\n";
}
else {
    print "ErrorTypeC $_\n" for @exk;
}
print "\n\n END OF REPORT\n\n";
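
For comparison, a similar report can be sketched in plain shell by building a sorted manifest for each tree and diffing the two. This is not a translation of the Perl script above, just an alternative under stated assumptions: GNU md5sum, sort, diff and mktemp are available, and the names tree_manifest and compare_trees are invented for the example.

```shell
# tree_manifest DIR
# Print "md5  ./relative/path" for every regular file under DIR,
# sorted by path so that two manifests can be compared line by line.
tree_manifest() {
    ( cd "$1" && find . -type f -exec md5sum {} + | LC_ALL=C sort -k 2 )
}

# compare_trees ORIG COPY
# Lines starting with "<" come from ORIG only (missing or changed in
# COPY); lines starting with ">" come from COPY only (extra or
# changed). A changed file therefore shows up once with each marker.
# Exit status is 0 when both trees hold the same files and contents.
compare_trees() (
    a=$(mktemp) || exit 2
    b=$(mktemp) || exit 2
    tree_manifest "$1" > "$a"
    tree_manifest "$2" > "$b"
    status=0
    diff "$a" "$b" || status=$?
    rm -f "$a" "$b"
    exit "$status"
)
```

Usage would be compare_trees <orig-dir-path> <new-dir-path>. Unlike the Perl script it does not separate the three error types, and file names containing newlines or backslashes are still not handled cleanly, because md5sum escapes them in its output.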
 
Old 03-02-2012, 12:53 AM   #12
W3ird_N3rd
LQ Newbie
 
Registered: Mar 2012
Posts: 1

Rep: Reputation: Disabled
So sorry for kicking, but this is the first hit when searching for "md5sum directory" on Google and I hope I can help the next poor fool who finds this thread, searching with the wrong keywords.

The rsync suggestion is perfect and can be executed like this:
Code:
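rsync -lrthvcn --delete /home/source/dir/ /home/destination/dir/
Note the trailing slash on the source path: it makes rsync compare the contents of the two directories, whereas without it rsync would look for /home/destination/dir/dir. Normally rsync would sync destination to be identical to source, but not in this case as it's a dry run, so it just checks whether or not there are any differences. If it outputs any filenames, that means there is a difference in them! If you get something like: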

sending incremental file list

sent 174.92K bytes received 118 bytes 143.77 bytes/sec

and no filenames between those two lines, the contents are identical.

Just so you know what you're running, -lrthvcn stands for:

-l, --links            copy symlinks as symlinks
-r, --recursive        recurse into directories
-t, --times            preserve modification times
-h, --human-readable   output numbers in a human-readable format
-v, --verbose          increase verbosity
-c, --checksum         skip based on checksum, not mod-time & size
-n, --dry-run          perform a trial run with no changes made
    --delete           delete extraneous files from destination dirs (don't worry, as it's a dry run; do note this when actually syncing and ask yourself if you want it)
 
Old 03-29-2012, 04:02 AM   #13
domsom
LQ Newbie
 
Registered: Mar 2012
Posts: 1

Rep: Reputation: Disabled
Like W3ird_N3rd, I was looking for the answer to "md5sum directory" on Google. While colucix' response provides the answer, the following basic extensions to his solution might be helpful for some:

To only get successful md5 sums into the checksum file (errors are written to the console):
Code:
find directory -type f -print0 | xargs -0 md5sum 1>> file.md5
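To only show files that differ (and avoid scanning the result list for FAILED strings), use the --quiet option of GNU md5sum, which suppresses the OK lines. (Redirecting stdout with 1> /dev/null does not work here, because the FAILED lines are written to stdout as well.)
Code:
md5sum -c --quiet /path/to/file.md5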
 
Old 04-09-2012, 12:14 PM   #14
anthalamus
LQ Newbie
 
Registered: Sep 2010
Posts: 4

Rep: Reputation: 0
Wouldn't "tar -cf - . | md5sum" do the trick?
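
One caveat with the tar idea: a tar stream also contains metadata such as timestamps and ownership, and it depends on the order in which files are archived, so two directories with identical file contents can still produce different sums. A content-only variant could look like the sketch below (dir_digest is a made-up name; GNU md5sum, sort and cut are assumed). It hashes the sorted list of per-file checksums and relative paths, so it reflects file names and contents only. Empty directories, permissions and timestamps are ignored.

```shell
# dir_digest DIR
# Print a single md5 that depends only on the relative paths and the
# contents of the regular files under DIR.
dir_digest() {
    ( cd "$1" &&
      find . -type f -exec md5sum {} + |
      LC_ALL=C sort -k 2 |
      md5sum |
      cut -d ' ' -f 1 )
}
```

Comparing the output of dir_digest for the original and for the copy then gives a quick yes/no answer. It does not tell you which file differs; for that you still need the per-file check.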
 
  

