LinuxQuestions.org
Old 12-09-2008, 01:25 AM   #1
SentralOrigin
Member
 
Registered: Jul 2005
Distribution: Gentoo, Ubuntu
Posts: 251
Thanked: 0
How to create md5sum for a directory?


I just copied a 100 GB directory containing lots of subdirectories and files. How can I make sure everything was copied correctly and that no bits are missing?
Old 12-09-2008, 04:01 AM   #2
colucix
Senior Member
 
Registered: Sep 2003
Location: Bologna, Italia
Distribution: OpenSUSE 11.1 CentOS 5.3 VectorLinux 6.0
Posts: 4,352
Thanked: 319
md5sum cannot compute a checksum for a directory as such; you have to checksum every file in it, recursively. Two alternatives:

1) use find in conjunction with md5sum:
Code:
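find directory -type f -print0 | xargs -0 md5sum > file.md5
This stores the checksums of all the files inside the directory in file.md5 (using > rather than >> so that a leftover file.md5 from an earlier run is overwritten, not appended to). The paths are recorded as directory/..., so to check the copy, go to the parent of the copied directory (the copy must have the same name) and issue: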
Code:
md5sum -c /path/to/file.md5
2) you can install md5deep, which has a recursive option
Code:
md5deep -rl directory > file.md5
then proceed as above.
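A minimal sketch of option 1 from start to finish, on throwaway directories (the names orig, copy and manifest.md5 are made up for the demo). It runs find from inside the source directory, so the recorded paths are relative (./...) and the manifest can be checked from inside the copy whatever the copy is called:

```shell
#!/bin/sh
# Demo only: build a tiny tree, copy it, and verify the copy with md5sum.
work=$(mktemp -d)
mkdir -p "$work/orig/sub dir"
printf 'hello\n' > "$work/orig/a.txt"
printf 'world\n' > "$work/orig/sub dir/b.txt"
cp -R "$work/orig" "$work/copy"

# Record checksums with paths relative to the source directory.
# The manifest is written outside the tree so it does not list itself.
(cd "$work/orig" && find . -type f -print0 | xargs -0 md5sum) > "$work/manifest.md5"

# Check the copy against the manifest; md5sum exits non-zero on any mismatch.
if (cd "$work/copy" && md5sum -c "$work/manifest.md5" > "$work/check.log"); then
    result="copy verified"
else
    result="copy differs"
fi
echo "$result"
```

The exit status of md5sum -c is enough for a yes/no answer; check.log holds the per-file lines if you need to see which file failed.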
Old 12-09-2008, 09:09 AM   #3
sundialsvcs
Senior Member
 
Registered: Feb 2004
Location: SE Tennessee, USA
Distribution: Gentoo, LFS
Posts: 2,616
Thanked: 30
I typically use rsync to do the copies.

This program is designed to "synchronize files and directories." By default it decides whether a file needs to be copied from its size and modification time, and it verifies each file it transfers with a whole-file checksum; with -c (--checksum) it also compares checksums to decide what to copy. If you use this tool, it will probably accomplish your objective for you, with no further programming tricks required.
Old 12-09-2008, 11:21 AM   #4
SentralOrigin
Member
 
Registered: Jul 2005
Distribution: Gentoo, Ubuntu
Posts: 251
Thanked: 0

Original Poster
Is there any way to make md5sum print out only the failed hashes and write them to a file? I have thousands of files and I can't look through thousands of lines to see which ones failed.
Old 12-09-2008, 11:49 AM   #5
colucix
Senior Member
 
Registered: Sep 2003
Location: Bologna, Italia
Distribution: OpenSUSE 11.1 CentOS 5.3 VectorLinux 6.0
Posts: 4,352
Thanked: 319
Code:
md5sum -c file.md5 | grep FAILED$ > failed_hashes
md5sum -c file.md5 | grep -v OK$ > failed_hashes
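To illustrate what ends up in failed_hashes, here is a small self-contained sketch (the scratch directory and the file names in it are invented for the demo). One file is overwritten after the checksums are taken, so only that file should survive the filter:

```shell
#!/bin/sh
# Demo only: damage one file after checksumming and keep just the failures.
work=$(mktemp -d)
cd "$work" || exit 1
mkdir data
printf 'good\n' > data/a.txt
printf 'also good\n' > data/b.txt
find data -type f -print0 | xargs -0 md5sum > file.md5

printf 'bit rot\n' > data/b.txt     # simulate a bad copy

# Drop every line ending in OK; whatever is left needs attention.
# md5sum's summary warning goes to stderr and is discarded here.
md5sum -c file.md5 2>/dev/null | grep -v 'OK$' > failed_hashes
cat failed_hashes
```

An empty failed_hashes means every file matched.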
Old 12-09-2008, 12:35 PM   #6
malaprop
LQ Newbie
 
Registered: Dec 2008
Location: TX
Distribution: Ubuntu 8.10
Posts: 26
Thanked: 0
Why not just use diff?

Code:
diff -r dir1 dir2
It'll give a listing of the differences between the two directories. The -r makes diff descend into subdirectories; without it only the files at the top level are compared.
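A quick sketch of what that looks like on two tiny trees (the directory and file names are made up for the demo). Adding -q makes diff report only which files differ instead of printing the differences themselves, which is usually what you want for a copy check:

```shell
#!/bin/sh
# Demo only: two small trees that differ in one nested file and one extra file.
work=$(mktemp -d)
mkdir -p "$work/dir1/sub" "$work/dir2/sub"
printf 'same\n' > "$work/dir1/top.txt"
printf 'same\n' > "$work/dir2/top.txt"
printf 'original\n' > "$work/dir1/sub/nested.txt"
printf 'changed\n'  > "$work/dir2/sub/nested.txt"
printf 'extra\n'    > "$work/dir2/sub/only-here.txt"

# -r descends into subdirectories; -q reports which files differ, not how.
report=$(cd "$work" && diff -rq dir1 dir2)
status=$?
echo "$report"
echo "diff exit status: $status"
```

diff exits with status 0 when the trees are identical and 1 when they differ, so the status alone can drive a script.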
Old 12-09-2008, 02:22 PM   #7
H_TeXMeX_H
Guru
 
Registered: Oct 2005
Location: $RANDOM
Distribution: 100% Slackware or Slamd64
Posts: 6,231
Blog Entries: 2
Thanked: 159
With so many files, md5sum will take forever, and will be difficult to work with. I would also recommend rsync for this. Normally, yes I would use md5sum, but for 100 GB ... mmm I dunno.
Old 12-09-2008, 02:30 PM   #8
SentralOrigin
Member
 
Registered: Jul 2005
Distribution: Gentoo, Ubuntu
Posts: 251
Thanked: 0

Original Poster
Quote:
Originally Posted by colucix View Post
Code:
md5sum -c file.md5 | grep FAILED$ > failed_hashes
md5sum -c file.md5 | grep -v OK$ > failed_hashes
Thanks, I just finished the first md5sum command, working on the second one now.

Quote:
Originally Posted by sundialsvcs View Post
I typically use rsync to do the copies.

This program is designed to "synchronize files and directories." By default it decides whether a file needs to be copied from its size and modification time, and it verifies each file it transfers with a whole-file checksum; with -c (--checksum) it also compares checksums to decide what to copy. If you use this tool, it will probably accomplish your objective for you, with no further programming tricks required.
Quote:
Originally Posted by H_TeXMeX_H View Post
With so many files, md5sum will take forever, and will be difficult to work with. I would also recommend rsync for this. Normally, yes I would use md5sum, but for 100 GB ... mmm I dunno.
How would I use rsync?
Old 12-09-2008, 04:08 PM   #9
colucix
Senior Member
 
Registered: Sep 2003
Location: Bologna, Italia
Distribution: OpenSUSE 11.1 CentOS 5.3 VectorLinux 6.0
Posts: 4,352
Thanked: 319
Quote:
Originally Posted by SentralOrigin View Post
Thanks, I just finished the first md5sum command, working on the second one now.
Well, if you're still online you can interrupt the second one. I should have specified that they are two alternatives that do the same job: the first greps for lines ending in FAILED, the second excludes lines ending in OK, which leaves the FAILED ones as you required (the second also keeps "FAILED open or read" lines for files that could not be read, which the first misses). Anyway, a double check does no harm.
Quote:
Originally Posted by SentralOrigin View Post
How would I use rsync?
If you're new to rsync, it is difficult to explain in a few words. You may want to take a look at the rsync documentation. Its basic usage is to synchronize the content of two directories, typically on two different machines. For example
Code:
rsync -avz -e ssh /path/to/local/dir user@host:/path/to/remote/dir
but don't use such a command before reading the man page or some documentation about rsync. It can be useful for future reference, anyway.
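For checking a copy that already exists, rsync can also be run as a dry run with checksums, so that nothing is transferred and it only reports what it would have to update. A minimal local sketch, assuming rsync is installed (the src and dst directories are invented for the demo):

```shell
#!/bin/sh
# Demo only: list files whose contents differ between two local trees,
# without copying anything.
command -v rsync >/dev/null 2>&1 || { echo "rsync is not installed" >&2; exit 0; }

work=$(mktemp -d)
mkdir -p "$work/src/sub" "$work/dst/sub"
printf 'same\n' > "$work/src/ok.txt"
printf 'same\n' > "$work/dst/ok.txt"
printf 'original\n' > "$work/src/sub/data.txt"
printf 'corrupt!\n' > "$work/dst/sub/data.txt"   # same size, different content

# -r recurse, -c compare by checksum instead of size and modification time,
# -n dry run (change nothing), -i itemize what would be updated.
# The trailing slash on src/ means "the contents of src".
changes=$(rsync -rcni "$work/src/" "$work/dst/")
echo "$changes"
```

Dropping the -n turns the same command into a repair run that re-copies only what the dry run listed. Note that -c reads every file on both sides, so on 100 GB it takes about as long as md5sum would.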
Old 12-10-2008, 03:31 AM   #10
H_TeXMeX_H
Guru
 
Registered: Oct 2005
Location: $RANDOM
Distribution: 100% Slackware or Slamd64
Posts: 6,231
Blog Entries: 2
Thanked: 159
Well, if you've already done the md5sum, then copy it over to the other directory and run 'md5sum -c' on it, and either output to a file or grep for FAILED.
Old 06-17-2009, 11:03 PM   #11
@Loss
LQ Newbie
 
Registered: Jun 2009
Posts: 1
Thanked: 0
Maybe worth noting this:
The command 'md5sum <file>' does not work for large files. I think 32GB is the limit (need to confirm the limit.)
'cat <file> | md5sum' always works.
-----------------
If you are only interested in comparing two directories and not so much in learning-by-doing, you can just copy and use the following perl script. I wrote it out of daily need, and it's good enough. It is made a little elaborate in order to work for all filename/dirname characters in ascii range [32..255]. Just a newline in filename can fail simpler approaches.

Usage is: perl -w dircompare.pl <orig-dir-path> <new-dir-path>
The paths can be on different file systems.

#! /usr/bin/perl -w
use strict;
use warnings;
use Cwd;

my $cwd = cwd;
print "Current directory: $cwd\n";
my $hold = {};
my ($odir, $cdir) = (shift, shift);
die "Usage: perl -w dircompare.pl <orig-dir-path> <new-dir-path>\n"
    unless defined $odir && defined $cdir;

# Build a { path => md5 } table for each directory.
foreach my $dir ($odir, $cdir) {
    print "$dir\n";
    # 'or' rather than '||': with '||' the die is parsed as part of
    # chdir's argument and never fires when chdir fails.
    chdir $dir or die "Error: Could not chdir to $dir\n";
    my @list = `find . -type f -exec md5sum {} \\;`;
    my $h = {};
    foreach (@list) {
        # md5sum prints "<hash>  <name>"; skip lines that do not match.
        /^([^ ]+) (.*)$/ or next;
        $h->{$2} = $1;
    }
    $hold->{$dir} = $h;
    chdir $cwd or die "Error: Could not chdir back to $cwd\n";
}

my @okeys = keys %{$hold->{$odir}};
print "Note: Original Directory has ", scalar(@okeys), " files\n";
my @ckeys = keys %{$hold->{$cdir}};
print " Compared Directory has ", scalar(@ckeys), " files\n\n\n";

# @kdne:  in the original but not in the copy
# @md5mm: in both, but the checksums differ
# @exk:   in the copy but not in the original
my (@kdne, @md5mm, @exk);
for (@okeys) {
    if (!exists $hold->{$cdir}{$_}) {
        push @kdne, $_;
        next;
    }
    if ($hold->{$cdir}{$_} ne $hold->{$odir}{$_}) {
        push @md5mm, $_;
    }
}

for (@ckeys) {
    if (!exists $hold->{$odir}{$_}) {
        push @exk, $_;
    }
}

#LOGGING:
print "Error Type A: missing file or read denied in $cdir ...\n";
if (!@kdne) {
    print " ... no errors\n";
}
else {
    print "ErrorTypeA $_\n" for @kdne;
}
print "\n\n";
print "Error Type B: md5 mismatch between $cdir and $odir ...\n";
if (!@md5mm) {
    print " ... no errors\n";
}
else {
    print "ErrorTypeB $_\n" for @md5mm;
}
print "\n\n";
print "Error Type C: Extra/modified paths in $cdir ...\n";
if (!@exk) {
    print " ... no errors\n";
}
else {
    print "ErrorTypeC $_\n" for @exk;
}
print "\n\n END OF REPORT\n\n";
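If Perl is not at hand, roughly the same comparison can be sketched in plain shell by writing a sorted checksum manifest for each tree and diffing the two. This is only an illustration under stated assumptions: the manifest helper and all directory and file names are made up for the demo.

```shell
#!/bin/sh
# Demo only: compare two trees by diffing sorted checksum manifests.
work=$(mktemp -d)
mkdir -p "$work/orig" "$work/copy"
printf 'one\n' > "$work/orig/1.txt"
printf 'two\n' > "$work/orig/2.txt"
printf 'one\n' > "$work/copy/1.txt"
printf 'TWO\n' > "$work/copy/2.txt"      # content mismatch
printf 'new\n' > "$work/copy/3.txt"      # extra file

# Print "<md5>  ./relative/path" for every file under $1, sorted by path
# so that the two manifests line up.
manifest() {
    (cd "$1" && find . -type f -exec md5sum {} + | LC_ALL=C sort -k 2)
}

manifest "$work/orig" > "$work/orig.md5"
manifest "$work/copy" > "$work/copy.md5"

# In diff's default output format, lines starting with "<" appear only in
# the original's manifest and lines starting with ">" only in the copy's.
differences=$(diff "$work/orig.md5" "$work/copy.md5")
echo "$differences"
```

A path that shows up on both sides has a different checksum in the two trees; a path that shows up on one side only is missing from the other tree.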
All times are GMT -5.

Main Menu
My LQ
Write for LQ
LinuxQuestions.org is looking for people interested in writing Editorials, Articles, Reviews, and more. If you'd like to contribute content, let us know.
Main Menu
Syndicate
RSS1  Latest Threads
RSS1  LQ News
RSS2  LQ Podcast
RSS2  LQ Radio
Twitter: @linuxquestions
identi.ca: @linuxquestions
Facebook: @linuxquestions
Open Source Consulting | Domain Registration