LinuxQuestions.org
LinuxQuestions.org > Forums > Linux Forums > Linux - Software
Old 07-12-2019, 06:45 PM   #1
cesarsj
LQ Newbie
 
Registered: Mar 2019
Posts: 12

Rep: Reputation: Disabled
Question: How can I find and remove duplicate files in Slackware from the terminal?


The command below saves the list of duplicate files to a file.

find . -type f -exec md5sum '{}' ';' | sort | uniq --all-repeated=separate -w 20 > /home/cesarmsj/duplicate_files.txt

Now I would like to remove the duplicate files it found. How can I do this?

I would also like to know the total size, in MB, of the duplicate files found.
 
Old 07-12-2019, 06:59 PM   #2
BW-userx
LQ Guru
 
Registered: Sep 2013
Location: Somewhere in my head.
Distribution: FreeBSD/Slackware-14.2+/ArcoLinux
Posts: 8,982

Rep: Reputation: 1875
Are the names the same in different areas? If you've got your list, just run it through a loop and call rm on each file.
Code:
while read -r f ; do rm "$f" ; done < file
That works if you have absolute paths to the files in your list.
This might give you some ideas on how to tally the total MB:
https://www.cyberciti.biz/tips/linux...-examples.html
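One wrinkle with that loop: the OP's duplicate_files.txt isn't bare paths, it's md5sum output ("checksum, two spaces, path") with each duplicate group separated by a blank line. A hedged sketch (function name `purge_dupes` is mine, not a standard tool) that keeps the first file of each group and only echoes the rm commands until you trust it:

```shell
purge_dupes() {    # usage: purge_dupes /path/to/duplicate_files.txt
    first=1
    while IFS= read -r line; do
        if [ -z "$line" ]; then    # blank line = start of next duplicate group
            first=1
            continue
        fi
        path=${line#*  }           # strip "checksum  " (md5sum uses two spaces)
        if [ "$first" -eq 1 ]; then
            first=0                # keep the first copy in each group
        else
            echo rm -- "$path"     # drop the echo once the list looks right
        fi
    done < "$1"
}
# purge_dupes /home/cesarmsj/duplicate_files.txt
```

Dry-run first, eyeball the output, then remove the echo.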
 
Old 07-12-2019, 08:13 PM   #3
scasey
Senior Member
 
Registered: Feb 2013
Location: Tucson, AZ, USA
Distribution: CentOS 7.6
Posts: 3,451

Rep: Reputation: 1157
Please post the first few lines (5-10) of duplicate_files.txt.

Are the duplicates in the same directory?
Do they have the same name?

Last edited by scasey; 07-12-2019 at 08:15 PM.
 
Old 07-12-2019, 08:18 PM   #4
syg00
LQ Veteran
 
Registered: Aug 2003
Location: Australia
Distribution: Lots ...
Posts: 17,994

Rep: Reputation: 2871
Perhaps you should use one of the many tools that do this - usually you can get it to delete, or list attributes such as size, or a bunch of other useful things. For example fdupes.
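For instance (untested sketch; flags as documented in fdupes(1), and "somedir" is a placeholder -- on Slackware, fdupes is typically built from SlackBuilds.org rather than shipped in the base system):

```shell
dir=somedir    # placeholder -- point this at the tree you want to scan
if command -v fdupes >/dev/null 2>&1 && [ -d "$dir" ]; then
    fdupes -r -S "$dir"    # recurse; list each duplicate set with file sizes
    fdupes -r -m "$dir"    # -m summarizes: how many duplicates, total size
    # fdupes -r -d "$dir"  # -d interactively picks which copies to delete
fi
```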
 
Old 07-13-2019, 10:30 AM   #5
rnturn
Senior Member
 
Registered: Jan 2003
Location: Illinois (Chicago area)
Distribution: CentOS, MacOS, [Open]SuSE, Raspian, Red Hat, Slackware, Solaris, Tru64
Posts: 1,424

Rep: Reputation: 120
Quote:
Originally Posted by cesarsj View Post
The command below saves the list of duplicate files to a file.

find . -type f -exec md5sum '{}' ';' | sort | uniq --all-repeated=separate -w 20 > /home/cesarmsj/duplicate_files.txt

Now, I would like to remove duplicate files found, how could I do this ??
I've done something like this on my systems. The exception is that I'm too paranoid about deleting the duplicates and breaking something that relies on a particular file being in a particular directory, so I replace them with symbolic links back to the first copy encountered.

I'm using a Perl script to do it (way too big to post here, as it involves stuffing checksums/filepaths into a Pg database and using SQL to pull out the information I need; I run this against several multi-TB disks and the database speeds things up considerably), but from the command line you could take the results of your checksum gathering and save them into a file, say, "file.checksums". Sort it if you like. Then extract all the checksums from that file, sort them, run them through uniq to obtain the number of occurrences, and save that result into a file ("checksums.count"). You would then scan through that list looking for any checksums that occur more than once, then grep "file.checksums" for all the records containing each such checksum. That is the list of "original + duplicates" you need to work with. In my script, I select the first occurrence as the "master" file. All of the others are files that I'll delete and replace with a symbolic link pointing to the "master". Then just continue looping through the "checksums.count" list.

It's not a one-liner so some fun with scripting is involved.
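A rough sketch of those steps (this is my own condensed version, not the poster's Perl/Pg script; it assumes file names without spaces and takes an absolute directory so the replacement symlinks resolve from anywhere):

```shell
dedup_to_symlinks() {   # usage: dedup_to_symlinks /abs/dir [true|false]
    dir=$1
    DRYRUN=${2:-true}
    # step 1: gather checksums (writes file.checksums in the current dir)
    find "$dir" -type f -exec md5sum {} \; | sort > file.checksums
    # step 2: keep only checksums that occur more than once
    awk '{print $1}' file.checksums | sort | uniq -d > checksums.dups
    # step 3: for each duplicated checksum, first path becomes the master
    while read -r sum; do
        grep "^$sum" file.checksums | cut -c35- | {
            read -r master               # first occurrence = the master
            while read -r dup; do
                if [ "$DRYRUN" = "true" ]; then
                    echo "ln -sf $master $dup"
                else
                    rm -- "$dup" && ln -s "$master" "$dup"
                fi
            done
        }
    done < checksums.dups
}
# dedup_to_symlinks "$HOME/some/dir" true    # dry run first!
```

The `cut -c35-` relies on md5sum's fixed layout: 32 hex characters, two spaces, then the path.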

While you're writing this, though, I'd include a provision to build up the commands that are going to touch the files and display the commands when some variable, say "DRYRUN", is set to "true". Closely examine the commands that are being generated and make sure they're not doing something unintended before turning them loose on your filesystem (DRYRUN = false). I.e.,
Code:
CMD=" ... "
if [ "${DRYRUN}" = "true" ]; then
    echo "${CMD}"
else
    ${CMD}
fi
I've used a bash function to provide this flexibility.

When doing the dry run, pay attention to what happens when you encounter files with spaces in the names. ($DEITY how I hate 'em.)

Quote:
I would also like to know the total in MBytes of duplicate files found
The simplest way to do this, IMHO, would be to run
Code:
wc -c filepath
before you delete each file and append the results to a log file. Obviously, you'll want to ensure that this file is empty beforehand. You can process its contents separately once the file removals are complete. You might also consider setting up your script to accumulate this data even when DRYRUN is set, so you have an idea of how much space you'll recover before even doing the removals.
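Totaling that log afterwards is a one-liner. A toy example (the two entries in "sizes.log" here are fake, standing in for the wc -c lines you appended):

```shell
# Toy stand-in for the log of "wc -c" output appended before each delete
printf '1048576 /tmp/dup1\n524288 /tmp/dup2\n' > sizes.log
# Sum the byte counts (first column) and report the total in MB
awk '{ total += $1 } END { printf "%.1f MB recovered\n", total / 1048576 }' sizes.log
```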

Good luck. And remember: Backups are your friend.

When you're all done, you'll want to ask yourself the same question I do: How the heck did I get all these duplicate files in the first place?
 
Old 07-13-2019, 11:14 AM   #6
BW-userx
LQ Guru
 
Registered: Sep 2013
Location: Somewhere in my head.
Distribution: FreeBSD/Slackware-14.2+/ArcoLinux
Posts: 8,982

Rep: Reputation: 1875
As far as total MB goes, check my math, but this might work:
Code:
#!/bin/bash

path=$HOME/bin

total=0
while read -r f
do
	total=$((total + f))
done < <(find "$path" -type f -exec du -k {} \; | awk '{print $1}' )
echo "$((total/1024)) MB"	# du -k gives KB, so /1024 yields MB
If you've got your files listed with absolute paths, you can just change how you read in the files; the awk is there to grab the first column so the numbers can be tallied.
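Along those lines, a sketch that reads the OP's duplicate_files.txt directly (assuming md5sum's "checksum, two spaces, path" format with blank lines between groups; `tally_dupes` is a name I made up):

```shell
tally_dupes() {   # usage: tally_dupes /path/to/duplicate_files.txt
    total=0
    while read -r line; do
        [ -n "$line" ] || continue     # skip the blank group separators
        f=${line#*  }                  # drop the "checksum  " prefix
        [ -f "$f" ] && total=$((total + $(wc -c < "$f")))
    done < "$1"
    echo "$((total / 1024 / 1024)) MB in duplicates"
}
# tally_dupes /home/cesarmsj/duplicate_files.txt
```

Note this counts every copy in each group; subtract one file per group if you only want the reclaimable space.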

Last edited by BW-userx; 07-13-2019 at 11:26 AM.
 
  



Tags
clean, duplicate, slackware 14.2


