LinuxQuestions.org

chris_carr 11-07-2011 07:49 AM

md5sum compare script
 
OK, so I have been working on a script for some time now, and it seems every time I get close I hit another brick wall.

What I'm trying to do:

I have DB files that have duplicate entries in them. I am writing a script that will check the md5sum hash keys and compare them to see if they match. So far I have not been able to get the script to jump into each file, look at the keys, and tell me if they are the same.

My script:
####################################################
#!/bin/sh

if [ "$(md5sum /home/path/file1.txt)" != "$(md5sum /home/path/file2.txt)" ]; then
echo "Files are the same"
else
echo "Files are different"
fi

####################################################

I know for a fact that not all of the md5sum hash keys are the same in both of these files but for some reason the script always says they are.


Example of the 2 files:

d1d4c848ed02854c611f83b0125485ee /pathtofile/file1

There are 2 files, file 1 and file 2, each with 216 hash keys and file paths just like the example above, but I do not see what I'm doing wrong here.

What I want the script to do:

I want my script to look at both file1 and file2, compare each of the hash keys in each file, and tell me if they are the same or different. If they are different, then I want them output to a separate file. If anybody can help with this, then please let me know.

MensaWater 11-07-2011 08:07 AM

Code:

#!/bin/sh

if [ "$(md5sum /home/path/file1.txt |awk '{print $1}')" != "$(md5sum /home/path/file2.txt |awk '{print $1}')" ]; then
echo "Files are the same"
else
echo "Files are different"
fi

Add the awk to the output of md5sum so it limits the output to the sum value and excludes the file name. The file names are different, so your original comparison was correctly saying they weren't equal due to that difference.

chris_carr 11-07-2011 08:12 AM

I tried your suggestion but I still have the same issue. The script is telling me they are all the same and everything is fine when I know they are not. :(

MensaWater 11-07-2011 08:22 AM

Please manually run md5sum against your files and post the command lines used and their output here. You may THINK the files are different when they are not, in which case you would be troubleshooting the wrong issue. I suspect you may not be checking what you think you're checking.

When I checked the above syntax on my local system it worked fine. On two files that had same md5sum it said they were the same and on two that were different it said they were different.

chris_carr 11-07-2011 08:28 AM

1 Attachment(s)
I think this is what you were asking me to do.

catkin 11-07-2011 09:17 AM

Quote:

Originally Posted by chris_carr (Post 4518019)
I tried your suggestion but I still have the same issue. The script is telling me they are all the same and everything is fine when I know they are not. :(

There's a typo in the script. Change != to =
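
With that change applied, the comparison could be wrapped up roughly as follows. This is only a sketch: same_md5 is a made-up helper name, and the extra non-empty check is an addition so that two unreadable files (both giving an empty sum) are not reported as identical.

```shell
#!/bin/sh
# Hypothetical helper: succeed only when both files yield the same MD5 sum.
same_md5() {
    # Keep only the first field of the md5sum output (the hash itself)
    sum1=$(md5sum "$1" | awk '{print $1}')
    sum2=$(md5sum "$2" | awk '{print $1}')
    # = is the string equality test; an empty sum means md5sum failed
    [ -n "$sum1" ] && [ "$sum1" = "$sum2" ]
}

# Usage, with the paths from the original script:
#   if same_md5 /home/path/file1.txt /home/path/file2.txt; then
#       echo "Files are the same"
#   else
#       echo "Files are different"
#   fi
```

Note that this compares the two files as a whole, so it only says whether the two lists are byte-for-byte identical, not which entries differ.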

chris_carr 11-07-2011 09:22 AM

Thank you catkin,

That did make the script tell me that the files are indeed different. Now all I need to do is figure out how to make the script test each file separately and then tell me which ones are different.

catkin 11-07-2011 09:39 AM

How about writing the md5sum and the file name to a temporary file and then sorting the file by md5sums? Something like:
Code:

#!/bin/bash

tmp_fn=/tmp/a-unique-name
dir=/home/path
> "$tmp_fn"    # Empty the file
while IFS= read -r -d '' file
do
  md5sum "$file" >> "$tmp_fn"
done < <(find "$dir" -type f -print0)
sort "$tmp_fn" > "$tmp_fn.sorted"
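
Once the list is sorted by MD5 sum, identical sums sit next to each other, which makes duplicates easy to pick out. A rough sketch of that follow-up step, assuming each line starts with the sum as its first field (duplicate_md5_lines is a hypothetical name, not part of the script above):

```shell
#!/bin/sh
# Hypothetical helper: print every line whose MD5 sum occurs more than once.
duplicate_md5_lines() {
    sort "$1" | awk '
        # Count each hash and remember every line in sorted order
        { count[$1]++; line[NR] = $0; hash[NR] = $1 }
        # Afterwards print only the lines whose hash was seen at least twice
        END { for (i = 1; i <= NR; i++) if (count[hash[i]] > 1) print line[i] }
    '
}

# Usage: duplicate_md5_lines /tmp/a-unique-name
```

Files with the same sum are listed on consecutive lines, so each run of equal sums is one group of duplicates.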


chris_carr 11-07-2011 10:14 AM

Hello catkin,

I have already done that with the md5Aoutput.txt and the md5Boutput.txt. Those are the 2 files I'm trying to compare.

################################################################################
bash-3.2$ head -5 md5Aoutput.txt
d1d4c848ed02854c611f83b0125485ee /home/path/path/file1
3feae4ce1e2bef61a74804176849b742 /home/path/path/file2
fc27d45c1ea2ac04b96bb6ccd86312c7 /home/path/path/file3
95ae4982b20ef6adc68582dd058a3b0a /home/path/path/file4
6284a7a28f55ab1b65a001185c73f4b3 /home/path/path/file5
bash-3.2$ head -5 md5Boutput.txt
7e03a8212bd750a533f4d34a0d7ea9b7 /home/path/path/file1
3feae4ce1e2bef61a74804176849b742 /home/path/path/file2
fc27d45c1ea2ac04b96bb6ccd86312c7 /home/path/path/file3
d1ab73508a98465cf09ab386a26e6d7d /home/path/path/file4
6284a7a28f55ab1b65a001185c73f4b3 /home/path/path/file5
#################################################################################

Those are just 5 of the 216 keys in each file.

rreyes79 11-07-2011 11:15 AM

I am not at home right now so I can't test it, but your if statement is a little convoluted, and the syntax for looking up the md5 hash is a little messed up. Also, your "not equal" should be changed to "equal". I would break it up into something like this...

Code:

hash1=`md5sum /path/to/file1 | awk '{print $1}'`
hash2=`md5sum /path/to/file2 | awk '{print $1}'`
if [ "$hash1" = "$hash2" ]   # string comparison; -eq only works on integers
then
  echo "Files are the same."
else
  echo "Files are different."
fi

Edit: I was responding to your original post and doing other work at the same time. It looks like other people got in before me :)

jmc1987 11-07-2011 11:20 AM

I think MensaWater meant for you to check the files like this

$ md5sum -c file.txt

on both of your files and look at the output.
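
For context, md5sum -c does not compare two lists with each other. It reads each hash and path from the list, recomputes the hash of the file currently at that path, and reports OK or FAILED per file. A small self-contained demo of that behaviour, assuming GNU md5sum and mktemp are available (md5_check_demo and the file names inside it are made up for illustration):

```shell
#!/bin/sh
# Demo: md5sum -c re-hashes the file found at each listed path and compares
# the result with the hash stored in the list.
# md5_check_demo is a hypothetical name; it works in a throwaway directory.
md5_check_demo() (
    dir=$(mktemp -d) || exit 1
    cd "$dir" || exit 1
    echo "original contents" > demo.txt
    md5sum demo.txt > list.txt        # record the hash and name of demo.txt
    md5sum -c list.txt                # file unchanged: reported as OK
    echo "changed contents" > demo.txt
    md5sum -c list.txt 2>/dev/null    # file changed: reported as FAILED
    cd / && rm -rf "$dir"
)

# Usage: md5_check_demo
```

So an "OK" for every entry means the files on disk still match the hashes recorded in that one list; it says nothing about a second list.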

chris_carr 11-07-2011 11:45 AM

@ rreyes79 Doing it that way just gives me the same issue. It says they are exactly the same when they are not.
The way I had it afterward, I was able to get md5 to at least say they were different...which is true.


@ jmc1987..ok...well I did it that way last week and they all came out saying all 216 keys in A and B are "OK"...so that really tells me nothing...what is md5sum comparing it to when you just run the -c by itself???? I don't understand how it says the file is ok...ok compared to what??


Now you guys see why I have been having so much trouble. Not only has it been a big pain in the AXX, but explaining it has proven to be just as complicated lol.


All I need is for the script to compare the 216 hash keys in md5Aoutput.txt to the 216 corresponding hash keys in md5Boutput.txt. I'm starting to think that the md5sum command is not going to be the best tool for this job.

catkin 11-07-2011 11:54 AM

I think I have understood the requirement now ...

Code:

#!/bin/bash

while read -r md5sum file
do
    matches=$( grep "$md5sum" md5Boutput.txt )
    [[ $matches != '' ]] && echo "$file:"$'\n'"$matches"
done < md5Aoutput.txt


chris_carr 11-07-2011 12:39 PM

@catkin....This is the output from the script you wrote. I think this is what I'm looking for. But since I'm not an expert programmer, I'm going to take a stab at what this is. Is this output for all the files that were the same? Or the files that are different?


/home/path/dir/file1:
3feae4ce1e2bef61a74804176849b742 /home/path/dir/file1
/home/path/dir/file2:
fc27d45c1ea2ac04b96bb6ccd86312c7 /home/path/dir/file2
/home/path/dir/file3:
6284a7a28f55ab1b65a001185c73f4b3 /home/path/dir/file3
/home/path/dir/file4:
9b880143ebf1171949afb6bdd7d73d0f /home/path/dir/file4
/home/path/dir/file5:
804806f933311e0b8b7b1dd15b4cd16b /home/path/dir/file5

catkin 11-07-2011 12:47 PM

Quote:

Originally Posted by chris_carr (Post 4518193)
Is this output for all the files that were the same? Or the files that are different?

It is files that are listed in both md5Aoutput.txt and md5Boutput.txt with the same MD5 sum.

The
Code:

while read -r md5sum file
do
    ...
done < md5Aoutput.txt

reads md5Aoutput.txt line by line, parsing the MD5 sum into $md5sum and the rest (split at the space) into $file.

Then
Code:

    matches=$( grep "$md5sum" md5Boutput.txt )
    [[ $matches != '' ]] && echo "$file:"$'\n'"$matches"

searches md5Boutput.txt for the MD5 sum, putting any matching lines in $matches, then it tests the contents of $matches and, if it is not empty, prints what you have seen.
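
The original request was for the entries that differ, written to a separate file. One possible sketch of that inverse check uses awk to remember the sum recorded for each path in the first list and then report paths whose sum in the second list is different. It assumes every line has the form "sum, spaces, path" as in the samples above and that the first list is not empty; list_md5_differences and differences.txt are hypothetical names.

```shell
#!/bin/sh
# Hypothetical helper: print paths whose MD5 sum differs between two lists.
# Assumes each line looks like "<md5sum><spaces><path>".
list_md5_differences() {
    awk '
        {
            path = $0
            sub(/^[^ ]+ +/, "", path)   # drop the hash and the separator
        }
        # First list: remember the hash recorded for each path
        NR == FNR { sum[path] = $1; next }
        # Second list: report paths that are in both lists with another hash
        (path in sum) && sum[path] != $1 {
            print path ": " sum[path] " != " $1
        }
    ' "$1" "$2"
}

# Usage:
#   list_md5_differences md5Aoutput.txt md5Boutput.txt > differences.txt
```

Paths that appear in only one of the two lists are silently skipped here; reporting them would need an extra pass.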


All times are GMT -5.