I seem to have this problem often. Linux sometimes renames files automatically, so if I had 'Car_pic.png' the duplicate would be named 'Car_pic(1).png'.
I am trying to figure out how to fix this once and for all. I believe I need to pipe grep and sort together, sending the duplicates to /dev/null (deleting them) or to another directory as specified. Whether it's documents, logs, images, fonts or whatever, I always seem to end up in duplicate hell, and I'm sure most of us do. Duplicates are a nightmare, as any database admin will know.
However, if the problem is clear, as in my .png example above, and the only difference between the two names is an underscore '_', then this should be easy enough to sort out. A more thorough command might also check the file size, since one copy may contain zero data (corrupt), and we don't want to keep that one!
I would be grateful for any tips.
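For the name-based case, a minimal bash sketch might look like the following. It assumes the duplicates follow the 'name(1).png' pattern from the example above, verifies with cmp that the contents really match before deleting anything, and keeps the copy when the original turns out to be zero bytes (the pattern and names are illustrative only):
Code:
#!/bin/bash
# Remove 'file(1).png' when it duplicates 'file.png'.
for f in *'(1)'.png; do
    [ -e "$f" ] || continue         # glob matched nothing; skip
    base="${f%(1).png}.png"         # Car_pic(1).png -> Car_pic.png
    [ -f "$base" ] || continue      # no original to compare against
    if [ ! -s "$base" ] && [ -s "$f" ]; then
        mv -- "$f" "$base"          # original is empty (corrupt); keep the copy
    elif cmp -s -- "$f" "$base"; then
        rm -- "$f"                  # contents identical; drop the duplicate
    fi
done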
Last edited by smudge|lala; 10-24-2006 at 03:54 AM.
It does work, thank you. If anyone knows a command sequence or script that might do something similar, I'd be grateful for any input, as I'm sure bash can do it. Dupefinder found the duplicates in under 10 seconds, but I'm not about to go through and mark 4111 files by hand! Hence my wanting to automate it by specifying the desired outcome.
Code:
mv file \*.png file_*.png
or something like that.
Last edited by smudge|lala; 03-27-2006 at 08:34 PM.
md5sum checks that the *contents* of the files are the same, rather than the names. The pipeline then uses sort and uniq to group identical checksums and pipes the list of duplicate file names to rm.
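For what it's worth, uniq -d -w32 only prints one line per group of identical checksums, so triplicates still leave strays behind, and cut -d' ' -f3 relies on md5sum's exact spacing: the hash is followed by two spaces, so any filename containing a space gets truncated (which is probably why rm complained about `Action' below; the real name likely continues past the first space). A sketch of the same idea that keeps the first file of each group and removes all the others, assuming GNU coreutils and findutils:
Code:
# md5sum lines look like '<32-char hash>  <filename>', so the
# name starts at column 35. awk prints every line whose hash has
# been seen before; xargs -d '\n' keeps spaces and quotes intact.
md5sum -- * 2>/dev/null | sort \
    | awk 'seen[substr($0, 1, 32)]++ { print substr($0, 35) }' \
    | xargs -r -d '\n' rm --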
Running that returned:
Code:
rm: cannot remove `Action': No such file or directory
I thought I could edit the command to copy rather than remove, putting the duplicates into a new directory, and that returned:
Code:
User@localhost $ md5sum * | sort | uniq -d -w32 | cut -d' ' -f3 | xargs cp Unique/
md5sum: Unique: Is a directory
xargs: unmatched single quote; by default quotes are special to xargs unless you use the -0 option
cp: `Fanatika': specified destination directory does not exist
Try `cp --help' for more information.
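Those errors each have a mundane cause: the * glob picks up the Unique directory itself (hence 'Is a directory'), a quote character somewhere in a filename trips xargs in its default parsing mode, and xargs appends the filenames after 'cp Unique/', so the last name in the list (Fanatika, here) lands in the destination slot. With GNU cp the -t flag names the target directory up front; combined with the awk pipeline sketched above:
Code:
mkdir -p Unique
md5sum -- * 2>/dev/null | sort \
    | awk 'seen[substr($0, 1, 32)]++ { print substr($0, 35) }' \
    | xargs -r -d '\n' cp -t Unique/ --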
I know bash can do this, I just can't figure out how. Xargs is a really powerful command!
Last edited by smudge|lala; 03-27-2006 at 10:22 PM.
Thank you for your input, guys, but it still isn't working. The uniq command looks interesting, especially after md5sum. The command hangs, which makes me think an option/switch hasn't been set correctly, possibly on md5sum *, but I'm not sure. Maybe xargs isn't getting the right input to proceed?
In considering how to approach such a sort-and-purge, I suppose the system could take one file and search for a duplicate by md5 anywhere, though the same directory is the most likely place, then drop one of the two files if a duplicate is found. If that is how this command works, where does all the comparison data, all the md5sums, go? And can xargs handle input from 4000 files?
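On the 4000-file question: that is exactly what xargs is for. It packs as many names as fit under the system's argument-length limit into each invocation and runs the command repeatedly in batches, so the amount of input is not a concern. The checksums themselves just flow through the pipe; nothing needs to be stored. You can watch the batching directly (echo stands in for rm here):
Code:
# Feed 4000 dummy names through xargs in batches of 500 and
# count the resulting invocations; prints 8.
seq -f 'file%g.txt' 4000 | xargs -n 500 echo | wc -l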
I tried with only 6 files, 3 sets of duplicates. I get:
Code:
md5sum: BACKUPS: Is a directory
cp: cannot stat `XFile': No such file or directory
for each result. This is with the same cp command I issued above.
Trying again with
Code:
md5sum * | sort | uniq -d -w32 | cut -d' ' -f3 | xargs rm
I get the same error. I'm only using 6 small files to test, and whether they are binary or text they should still work, since we're comparing md5 checksums, right?
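If it helps to test in isolation, here is a throwaway setup with three duplicate pairs that the corrected pipeline should reduce to three files (the directory and names are made up for the test):
Code:
mkdir -p /tmp/duptest && cd /tmp/duptest
printf 'alpha' > a.txt; cp a.txt a_copy.txt
printf 'beta'  > b.txt; cp b.txt b_copy.txt
printf 'gamma' > c.txt; cp c.txt c_copy.txt
md5sum -- * | sort \
    | awk 'seen[substr($0, 1, 32)]++ { print substr($0, 35) }' \
    | xargs -r -d '\n' rm --
ls   # one file from each pair survives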
Perhaps, since they are named slightly differently (all my duplicates have an underscore '_', such as big_cat.png), I can do something like:
Code:
cp *.png | grep '_' dupebak/
although this doesn't work.
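cp never writes filenames to stdout, so there is nothing for grep to filter; the selection has to happen before the copy, not after it. If every duplicate genuinely has an underscore in its name, the glob can do the selecting by itself, though beware that it also catches originals whose names happen to contain an underscore (dupebak/ as in the example above):
Code:
mkdir -p dupebak
mv -- *_*.png dupebak/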