snowman81 10-03-2009 06:45 AM

Use uniq on first part of each line but print whole line.
I have a script that computes an md5 hash for a bunch of files in 2 folders and writes the results to a text file. I know I can run uniq to print out the, well, unique ones, but the problem is that lines with the same hash come from different folders, so the full lines never match. As an example:

d41d8cd98f00b204e9800998ecf8427e  /home/jason/Desktop/folder1/testin3
d41d8cd98f00b204e9800998ecf8427e  /home/jason/Desktop/folder2/testin2

I am going to see if I can use awk to run uniq on only the first part while still printing the whole line, but I wanted to see if anyone has a better idea. Thanks.
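
For reference, a minimal sketch of the awk idea, assuming the hash is always the first whitespace-separated field. The temporary file, the variable names, and the third sample line are made up for illustration; only the first two lines come from the example above.

```shell
# Hypothetical sample data standing in for the md5sum output file.
list=$(mktemp)
printf '%s\n' \
  'd41d8cd98f00b204e9800998ecf8427e  /home/jason/Desktop/folder1/testin3' \
  'd41d8cd98f00b204e9800998ecf8427e  /home/jason/Desktop/folder2/testin2' \
  '0cc175b9c0f1b6a831c399e269772661  /home/jason/Desktop/folder1/other' > "$list"

# Print a line only the first time its first field (the hash) is seen.
# seen[$1]++ evaluates to 0 on first sight, so !seen[$1]++ is true exactly once per hash.
first_per_hash=$(awk '!seen[$1]++' "$list")
printf '%s\n' "$first_per_hash"

rm -f "$list"
```

Unlike uniq, this does not need sorted input, and it keeps the whole line of the first occurrence of each hash.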

catkin 10-03-2009 06:54 AM

What do you want to do? Do you want to display all lines having the same md5 hash?

konsolebox 10-03-2009 06:54 AM

Won't 'uniq -w 32' do the trick?
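
A rough sketch of how that could look, assuming GNU uniq (the -w option is a GNU extension) and input sorted first, since uniq only compares adjacent lines. The temporary file, the variable names, and the third sample line are invented for illustration.

```shell
# Hypothetical sample data standing in for the md5sum output file.
list=$(mktemp)
printf '%s\n' \
  'd41d8cd98f00b204e9800998ecf8427e  /home/jason/Desktop/folder2/testin2' \
  '0cc175b9c0f1b6a831c399e269772661  /home/jason/Desktop/folder1/other' \
  'd41d8cd98f00b204e9800998ecf8427e  /home/jason/Desktop/folder1/testin3' > "$list"

# Sort so equal hashes end up adjacent, then compare only the first
# 32 characters (the length of an md5 hash) while printing whole lines.
unique_by_hash=$(sort "$list" | uniq -w 32)
printf '%s\n' "$unique_by_hash"

rm -f "$list"
```

This keeps one full line per distinct hash, namely the first one in sorted order.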

jschiwal 10-03-2009 07:04 AM

If you want to find files with duplicates, you can use
"sort list | uniq -w32 -D" to print each file and its duplicates.

You can pipe the output through the cut command to remove the md5sum field from the final list.
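
Put together, the pipeline might look like the sketch below. It assumes GNU uniq (for -w and -D) and the usual md5sum layout of a 32-character hash followed by two separator characters, so the path starts at column 35. The temporary file, the variable names, and the third sample line are made up for illustration.

```shell
# Hypothetical sample data standing in for the md5sum output file.
list=$(mktemp)
printf '%s\n' \
  'd41d8cd98f00b204e9800998ecf8427e  /home/jason/Desktop/folder1/testin3' \
  '0cc175b9c0f1b6a831c399e269772661  /home/jason/Desktop/folder1/other' \
  'd41d8cd98f00b204e9800998ecf8427e  /home/jason/Desktop/folder2/testin2' > "$list"

# -D prints every line whose first 32 characters are shared with another
# line; cut then drops the hash and separator, leaving only the paths.
dupes=$(sort "$list" | uniq -w32 -D | cut -c35-)
printf '%s\n' "$dupes"

rm -f "$list"
```

With this sample data, only the two testin paths are printed, because the third file has a hash of its own.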

sort, uniq, cut and around 100 other very useful commands are supplied by the coreutils package. I would recommend scanning through the info manual for coreutils. I even downloaded the source to build a print-worthy PDF from the .texi source and printed it as a hard copy for a 3-ring binder.

snowman81 10-03-2009 07:22 AM

Yup, you're both right. I was looking at the man page and didn't even see the -w option. I feel sheepish now :)

