LinuxQuestions.org
11-07-2011, 08:49 AM #1
chris_carr
md5sum compare script


OK, so I have been working on a script for some time now, and it seems every time I get close I hit another brick wall.

What I'm trying to do:

I have DB files that have duplicate entries in them. I'm writing a script that will check the md5sum hash keys and compare them to see if they match. So far I have not been able to get the script to go into each file, look at the keys, and tell me whether they are the same.

My script:
####################################################
#!/bin/sh

if [ "$(md5sum /home/path/file1.txt)" != "$(md5sum /home/path/file2.txt)" ]; then
echo "Files are the same"
else
echo "Files are different"
fi

####################################################

I know for a fact that not all of the md5sum hash keys are the same in both of these files, but for some reason the script always says they are.


Example of a line from the two files:

d1d4c848ed02854c611f83b0125485ee /pathtofile/file1

There are two files, file1 and file2, each with 216 hash keys and file paths just like the example above, but I do not see what I'm doing wrong here.

What I want the script to do:

I want my script to look at both file1 and file2, compare each of the hash keys in each file, and tell me whether they are the same or different. If they are different, I want them output to a separate file. If anybody can help with this, please let me know.
 
11-07-2011, 09:07 AM #2
MensaWater
Code:
#!/bin/sh

if [ "$(md5sum /home/path/file1.txt |awk '{print $1}')" != "$(md5sum /home/path/file2.txt |awk '{print $1}')" ]; then
echo "Files are the same"
else
echo "Files are different"
fi
Add the awk to the output of md5sum so it limits the output to the sum value and excludes the file name. The file names are different, so your original comparison was correctly saying they weren't equal due to that difference.
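To see the effect described above, here is a small sketch (bash; the helper names `md5_only` and `md5_stdin` are hypothetical, not part of md5sum) that prints only the digest. The second variant feeds the file on standard input, in which case md5sum prints `-` in place of a file name, so the name can never leak into the comparison:

```bash
#!/bin/bash
# Print only the 32-character MD5 digest of a file, dropping the file
# name that md5sum normally prints after the hash. (Hypothetical helper.)
md5_only() {
    md5sum "$1" | awk '{print $1}'
}

# Variant: read the file on stdin; md5sum then prints "-" instead of a
# name, and cut keeps just the first space-separated field (the hash).
md5_stdin() {
    md5sum < "$1" | cut -d' ' -f1
}
```

For example, `md5_only /home/path/file1.txt` prints just the hash, so two files with identical content give identical strings even though their names differ.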
 
11-07-2011, 09:12 AM #3
chris_carr
I tried your suggestion but I still have the same issue. The script is telling me they are all the same and everything is fine when I know they are not.
 
11-07-2011, 09:22 AM #4
MensaWater
Please manually run md5sum against your files and post the command lines used and the output here. You may THINK they are different when they are not, so you may be troubleshooting the wrong issue. I suspect you may not be checking what you think you're checking.

When I checked the above syntax on my local system, it worked fine: on two files that had the same md5sum it said they were the same, and on two that were different it said they were different.
 
11-07-2011, 09:28 AM #5
chris_carr
I think this is what you were asking me to do.
[Attached screenshot: capture.jpg]
 
11-07-2011, 10:17 AM #6
catkin
Quote:
Originally Posted by chris_carr
I tried your suggestion but I still have the same issue. The script is telling me they are all the same and everything is fine when I know they are not.
There's a typo in the script. Change != to =
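With `=` the test succeeds exactly when the two digests match. A minimal sketch of the corrected check wrapped in functions (the names `same_content`, `same_bytes` and `report` are hypothetical); `cmp -s` is included as an alternative that compares the bytes directly instead of hashing:

```bash
#!/bin/bash
# Succeed (exit status 0) when two files have the same MD5 digest.
# Only the hash field is compared, so different file names don't matter.
same_content() {
    local h1 h2
    h1=$(md5sum < "$1" | cut -d' ' -f1)
    h2=$(md5sum < "$2" | cut -d' ' -f1)
    [ "$h1" = "$h2" ]      # '=' tests equality; '!=' would invert the result
}

# Alternative without hashing: cmp -s prints nothing and returns 0 when
# the files are byte-for-byte identical.
same_bytes() {
    cmp -s "$1" "$2"
}

# Same messages as the script in this thread, with the test the right way round.
report() {
    if same_content "$1" "$2"; then
        echo "Files are the same"
    else
        echo "Files are different"
    fi
}
```

`report /home/path/file1.txt /home/path/file2.txt` then prints "Files are the same" only when the two contents match.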
 
11-07-2011, 10:22 AM #7
chris_carr

Thank you, catkin,

That did make the script tell me that the files are indeed different. Now all I need to do is figure out how to make the script test each file separately and then tell me which ones are different.
 
11-07-2011, 10:39 AM #8
catkin
How about writing the md5sum and the file name to a temporary file and then sorting the file by md5sums? Something like:
Code:
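#!/bin/bash

tmp_fn=/tmp/a-unique-name        # any unique temporary file name
dir=/home/path                   # directory tree to scan
> "$tmp_fn"                      # Empty the file
# find -print0 with read -d '' copes with spaces in file names
while IFS= read -r -d '' file
do
   md5sum "$file" >> "$tmp_fn"   # one "hash  path" line per file
done < <(find "$dir" -type f -print0)
sort "$tmp_fn" > "$tmp_fn.sorted"   # lines with equal hashes become adjacent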
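After sorting, lines with the same MD5 sum (files with identical content) sit next to each other, so the duplicates can be pulled out in one step. A sketch assuming GNU `uniq` (RHEL ships GNU coreutils): `-w 32` compares only the first 32 characters of each line, i.e. the digest, and `-D` prints every line of each duplicate group. The function name `list_duplicates` is hypothetical:

```bash
#!/bin/bash
# List every file whose content duplicates another file's, given md5sum
# output ("hash  path" per line). Sorting makes equal hashes adjacent;
# GNU uniq -w 32 compares only the 32-character MD5 field, and -D
# prints all lines belonging to each repeated group.
list_duplicates() {
    sort "$1" | uniq -w 32 -D
}
```

For example, `list_duplicates /tmp/a-unique-name` would print only the entries that share a hash with at least one other entry.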
 
11-07-2011, 11:14 AM #9
chris_carr
Hello catkin,

I have already done that with md5Aoutput.txt and md5Boutput.txt. Those are the two files I'm trying to compare.

################################################################################
bash-3.2$ head -5 md5Aoutput.txt
d1d4c848ed02854c611f83b0125485ee /home/path/path/file1
3feae4ce1e2bef61a74804176849b742 /home/path/path/file2
fc27d45c1ea2ac04b96bb6ccd86312c7 /home/path/path/file3
95ae4982b20ef6adc68582dd058a3b0a /home/path/path/file4
6284a7a28f55ab1b65a001185c73f4b3 /home/path/path/file5
bash-3.2$ head -5 md5Boutput.txt
7e03a8212bd750a533f4d34a0d7ea9b7 /home/path/path/file1
3feae4ce1e2bef61a74804176849b742 /home/path/path/file2
fc27d45c1ea2ac04b96bb6ccd86312c7 /home/path/path/file3
d1ab73508a98465cf09ab386a26e6d7d /home/path/path/file4
6284a7a28f55ab1b65a001185c73f4b3 /home/path/path/file5
#################################################################################

Those are just 5 of the 216 keys in each file.
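Since both lists contain `hash  path` lines for the same paths, one quick way to see what changed is to drop the lines the two lists have in common. A sketch using `comm`, which requires sorted input (bash process substitution sorts each list on the fly); the function name `changed_entries` is hypothetical:

```bash
#!/bin/bash
# Print the lines that appear in only one of two md5sum lists.
# comm needs both inputs sorted; -3 suppresses lines common to both,
# leaving entries whose hash (or path) differs.
#   no indent  = line only in the first list
#   one tab    = line only in the second list
changed_entries() {
    comm -3 <(sort "$1") <(sort "$2")
}
```

With the five sample lines above, `changed_entries md5Aoutput.txt md5Boutput.txt > changed.txt` would keep only the file1 and file4 entries, once from each list, with the md5Boutput.txt lines indented by a tab.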
 
11-07-2011, 12:15 PM #10
rreyes79
I am not at home right now so I can't test it, but your if statement is a little convoluted, and the syntax for looking up the md5 hash is a little messed up. Also, your not-equal should be changed to equal. I would break it up into something like this...

Code:
hash1=$(md5sum /path/to/file1 | awk '{print $1}')
hash2=$(md5sum /path/to/file2 | awk '{print $1}')
if [ "$hash1" = "$hash2" ]    # string comparison; -eq only works for integers
then
  echo "Files are the same."
else
  echo "Files are different."
fi
Edit: I was responding to your original post and doing other work at the same time. It looks like other people got in before me.

Last edited by rreyes79; 11-07-2011 at 12:23 PM.
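On the comparison operator: inside `[ ]`, `-eq` compares integers while `=` compares strings. An MD5 digest such as `d1d4c848ed02854c611f83b0125485ee` is a hexadecimal string, so `-eq` typically fails with an error such as "integer expression expected" instead of comparing it. A one-line sketch (the function name `hashes_equal` is hypothetical):

```bash
#!/bin/bash
# Compare two hex digests as strings. [ a -eq b ] would instead try to
# parse both operands as integers and, on hex input, report an error
# (exit status 2) rather than a true/false answer.
hashes_equal() {
    [ "$1" = "$2" ]
}
```

For example, `hashes_equal "$hash1" "$hash2" && echo "Files are the same."` only echoes when the two digests are identical.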
 
11-07-2011, 12:20 PM #11
jmc1987
I think MensaWater meant for you to run md5sum on the files like this:

$ md5sum -c file.txt

against both of your files and check the output.
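For reference, `md5sum -c LIST` reads each `hash  path` line from LIST, recomputes the MD5 of the file currently at that path, and prints `path: OK` or `path: FAILED`. It never compares two lists with each other, so an all-OK result only means the files on disk still match that one list. A small demonstration on a temporary file (the function name `demo_check` is hypothetical):

```bash
#!/bin/bash
# Show what md5sum -c checks: the hash recorded in a list against the
# *current* contents of the file named on that line.
demo_check() {
    local dir
    dir=$(mktemp -d)
    printf 'v1\n' > "$dir/data"
    md5sum "$dir/data" > "$dir/list"   # record today's hash
    md5sum -c "$dir/list"              # prints ".../data: OK"
    printf 'v2\n' > "$dir/data"        # change the file afterwards
    md5sum -c "$dir/list"              # prints ".../data: FAILED"
    local status=$?                    # non-zero because a check failed
    rm -rf "$dir"
    return "$status"
}
```

Running `demo_check` shows one OK line followed by one FAILED line, plus a warning on stderr about the mismatch.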
 
11-07-2011, 12:45 PM #12
chris_carr
@ rreyes79 Doing it that way just gives me the same issue. It says they are exactly the same when they are not.
The way I had it afterward, I was able to get md5sum to at least say they were different... which is true.


@ jmc1987... OK, well, I did it that way last week and all 216 keys in A and B come out "OK"... so that really tells me nothing. What is md5sum comparing against when you just run -c by itself? I don't understand how it says the file is OK... OK compared to what?


Now you guys see why I have been having so much trouble. Not only has it been a big pain in the AXX, but explaining it has proven to be just as complicated, lol.


All I need is for the script to compare the 216 hash keys in md5Aoutput.txt to the 216 corresponding hash keys in md5Boutput.txt. I'm starting to think that the md5sum command is not going to be the best tool for this job.
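One way to do exactly this, sketched under the assumption that both lists use md5sum's usual `hash  path` line format: key each hash by its path, then compare the two hashes recorded for the same path. Differences go to a separate file, as asked for in the first post. The function name `compare_lists` and the output file name are hypothetical:

```bash
#!/bin/bash
# Compare two md5sum lists path by path (hypothetical helper).
#   $1 = first list   (e.g. md5Aoutput.txt)
#   $2 = second list  (e.g. md5Boutput.txt)
#   $3 = output file receiving one line per path whose hash changed
# Assumes each line is "<hash><spaces><path>" and that $1 is non-empty
# (the NR == FNR idiom only identifies the first file if it has lines).
# Paths that appear in only one of the two lists are not reported.
compare_lists() {
    : > "$3"                                  # start with an empty output file
    awk -v out="$3" '
        {
            path = $0
            sub(/^[^ \t]+[ \t]+/, "", path)   # drop the hash and the spacing
        }
        NR == FNR { first[path] = $1; next }  # remember hashes from list 1
        (path in first) {
            if (first[path] == $1) same++
            else { diff++; print path ": " first[path] " -> " $1 > out }
        }
        END { printf "%d same, %d different\n", same, diff }
    ' "$1" "$2"
}
```

`compare_lists md5Aoutput.txt md5Boutput.txt md5_differences.txt` prints a one-line summary and writes one `path: old -> new` line per changed file.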
 
11-07-2011, 12:54 PM #13
catkin
I think I have understood the requirement now ...

Code:
#!/bin/bash

# Read md5Aoutput.txt line by line and look up each hash in md5Boutput.txt
while read md5sum file
do
    matches=$( grep $md5sum md5Boutput.txt )
    [[ $matches != '' ]] && echo "$file:"$'\n'"$matches"
done < md5Aoutput.txt
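The loop above prints the hashes from md5Aoutput.txt that do occur in md5Boutput.txt. For the opposite, entries of the first list whose hash appears nowhere in the second, a sketch with awk that reads each file once instead of running grep per line (the function name `hashes_missing_from` is hypothetical):

```bash
#!/bin/bash
# Print lines of list A whose MD5 hash does not occur anywhere in list B.
#   $1 = list A, $2 = list B (md5sum "hash  path" format)
# awk reads B first (NR == FNR) and records its hashes, then prints the
# lines of A whose first field was never seen. If B is empty, NR == FNR
# also holds while reading A, so nothing is printed.
hashes_missing_from() {
    awk 'NR == FNR { seen[$1] = 1; next } !($1 in seen)' "$2" "$1"
}
```

Considering only the five sample lines from post #9, `hashes_missing_from md5Aoutput.txt md5Boutput.txt` would print the file1 and file4 lines of md5Aoutput.txt.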
 
11-07-2011, 01:39 PM #14
chris_carr
@catkin... This is the output from the script you wrote. I think this is what I'm looking for, but since I'm not an expert programmer, I'm going to take a stab at what this is. Is this output for all the files that were the same, or the files that are different?


/home/path/dir/file1:
3feae4ce1e2bef61a74804176849b742 /home/path/dir/file1
/home/path/dir/file2:
fc27d45c1ea2ac04b96bb6ccd86312c7 /home/path/dir/file2
/home/path/dir/file3:
6284a7a28f55ab1b65a001185c73f4b3 /home/path/dir/file3
/home/path/dir/file4:
9b880143ebf1171949afb6bdd7d73d0f /home/path/dir/file4
/home/path/dir/file5:
804806f933311e0b8b7b1dd15b4cd16b /home/path/dir/file5

Last edited by chris_carr; 11-07-2011 at 01:42 PM.
 
11-07-2011, 01:47 PM #15
catkin
Quote:
Originally Posted by chris_carr
Is this output for all the files that were the same? Or the files that are different?
It is files that are listed in both md5Aoutput.txt and md5Boutput.txt with the same MD5 sum.

The
Code:
while read md5sum file
do
    ...
done < md5Aoutput.txt
reads md5Aoutput.txt line by line, parsing the MD5 sum into $md5sum and the rest (split at the space) into $file.

Then
Code:
    matches=$( grep $md5sum md5Boutput.txt )
    [[ $matches != '' ]] && echo "$file:"$'\n'"$matches"
searches md5Boutput.txt for the MD5 sum, putting any matching lines in $matches, then it tests the contents of $matches and, if it is not empty, prints what you have seen.

Last edited by catkin; 11-07-2011 at 01:47 PM. Reason: typos
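To see the splitting described above on its own: with two variable names, `read` puts the first whitespace-separated word in the first variable and the rest of the line, internal spaces included, in the last one, so paths containing spaces survive. A sketch (the function name `split_line` is hypothetical, and the `-r` flag, which stops backslashes being treated as escape characters, is an addition not present in the original loop):

```bash
#!/bin/bash
# Split one md5sum output line the same way the while-read loop does:
# first word -> md5sum, remainder of the line -> file.
split_line() {
    local md5sum file
    read -r md5sum file <<< "$1"
    printf 'hash=%s\npath=%s\n' "$md5sum" "$file"
}
```

For example, `split_line 'd1d4c848ed02854c611f83b0125485ee  /home/path/my file1'` prints the hash on one line and `/home/path/my file1` on the next.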
 
All times are GMT -5.