LinuxQuestions.org
Old 05-29-2022, 04:08 PM   #1
Faki
Member
 
Removing duplicate files


I am passing files and directories as arguments so I can grep them for certain patterns.

I would like to remove duplicate files. I am using the following, which fails to catch duplicates when a file inside a supplied directory also appears among the file names passed directly in "$@".

Code:
    declare -A tag
    for fda in "$@"; do
      [[ -f $fda ]] || [[ -d $fda ]] || continue  # invalid entry
      [[ ${tag[comint:${fda}]+E} ]]  && continue  # existing entry
      tag[comint:${fda}]=1
      fdir+=( "$fda" )
      [[ -f $fda ]] && fla+=( "$fda" )
      [[ -d $fda ]] && dra+=( "$fda" )
    done
For instance, running the function mysearch to search for .rc files in the current directory (.) with --incl .rc:

Code:
./mysearch -p "Gnu" --incl .rc *.rc .
This produces the following list, in which most files appear twice:

Code:
linge-cellar.rc
linge-checkn.rc
linge-comint.rc
linge-comseq.rc
linge-console.rc
linge-curiplaya.rc
linge-dircolors.rc
linge-firefly.rc
linge-mosaic.rc
linge-parade.rc
./dvorak/dv-cmswap.rc
./dvorak/dv-hmcbar.rc
./linge-dircolors.rc
./linge-firefly.rc
./linge-checkn.rc
./linge-mosaic.rc
./linge-cellar.rc
./linge-console.rc
./linge-curiplaya.rc
./linge-comseq.rc
./linge-comint.rc
./linge-parade.rc
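The repeats arise because the same file reaches the loop under two spellings, bare from the glob and with a ./ prefix from the directory search, and the two spellings produce different keys in tag. A minimal demonstration:

```shell
# Two spellings of the same path compare as different strings,
# so they would produce two distinct associative-array keys.
a='linge-cellar.rc'
b='./linge-cellar.rc'
[ "$a" = "$b" ] && echo 'same key' || echo 'different keys'
# prints: different keys
```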
What can I do to remove duplicate files?
 
Old 05-29-2022, 06:20 PM   #2
jailbait
LQ Guru
 
Quote:
Originally Posted by Faki

What can I do to remove duplicate files?
Merge the two lists into a single list, sort it, then read through the sorted list looking for two or more consecutive instances of the same name and delete the extras.
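A minimal sketch of that approach in shell, using sort -u to combine the sort and the consecutive-duplicate removal (the list contents here are made up for the demonstration):

```shell
# Merge both lists, sort, and drop consecutive duplicates in one step.
# sort -u is equivalent to sort | uniq for this purpose.
printf '%s\n' linge-cellar.rc linge-comint.rc linge-comint.rc linge-parade.rc | sort -u
```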
 
Old 05-29-2022, 06:30 PM   #3
SlowCoder
Senior Member
 
There are utilities that can do this, but if you want a script, the following finds all files that have duplicates, based on their md5sum. You can then delete the duplicates manually.

Code:
# -exec md5sum {} + avoids word-splitting on file names with spaces;
# uniq -D -w32 keeps every line whose first 32 characters (the hash) repeat
find . -type f -iname '*.rc' -exec md5sum {} + | sort | uniq -D -w32 > dupelist.txt
 
Old 05-29-2022, 08:33 PM   #4
michaelk
Moderator
 
That could be the output of multiple grep commands.

Is that what you were expecting?
 
Old 05-29-2022, 10:25 PM   #5
evo2
LQ Guru
 
Hi,

Any reason not to use fdupes for this?

Evo2.
 
1 member found this post helpful.
Old 05-29-2022, 10:35 PM   #6
suramya
Member
 
Another option is to use fslint. It is a fantastic program that finds duplicates and lets you delete or rename them if required. It has both GUI and command-line modes, but I have only used the GUI mode.
 
Old 05-29-2022, 11:39 PM   #7
evo2
LQ Guru
 
Quote:
Originally Posted by suramya
Another option is to use fslint. It is a fantastic program that finds duplicates and allows you to delete/rename them if required. It has both GUI and command line modes but I have only used the GUI mode
fslint seems to be unmaintained. See https://github.com/pixelb/fslint/issues/172
 
Old 05-30-2022, 02:12 AM   #8
pan64
LQ Addict
 
Yes, it has been implemented several times in several different languages. Look for dupfinder, duplicate finder, fdupes or similar.
It looks like fslint's functionality is continued here: https://github.com/qarmin/czkawka
 
Old 05-30-2022, 02:35 PM   #9
MadeInGermany
Senior Member
 
You can strip off a leading ./ before tagging:
Code:
for rawfda in "$@"; do
  fda=${rawfda#./}    # normalise: ./foo and foo now share one key
  # ... rest of the loop as before, using "$fda"
done
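Stripping ./ handles this particular case; a more general fix (also covering absolute vs. relative spellings and symlinks) is to canonicalise each argument before using it as a key. A sketch, assuming GNU coreutils' realpath is available; the file name is made up:

```shell
# Canonicalise a path so ./foo, foo and /full/path/foo
# all map to one key (GNU coreutils realpath assumed).
touch linge-cellar.rc                    # throwaway sample file
key1=$(realpath -- 'linge-cellar.rc')
key2=$(realpath -- './linge-cellar.rc')
[ "$key1" = "$key2" ] && echo 'one key' || echo 'two keys'
# prints: one key
```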
 
Old 05-30-2022, 03:00 PM   #10
dugan
LQ Guru
 
I wrote one. It lists files in a directory tree with the same byte size. You refer to that information and decide for yourself what to remove.

https://gist.github.com/duganchen/1e917c11fce44267b4c4

It’s much faster than literally anything else, because it doesn’t read the contents of the files.
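The same size-grouping idea can be sketched directly in shell (GNU find's -printf is assumed; the sizedemo directory and its files are created just for the demonstration):

```shell
# Group files by byte size and print only those whose size is shared,
# without reading any file contents.
mkdir -p sizedemo
printf 'aa' > sizedemo/x1.rc   # 2 bytes
printf 'bb' > sizedemo/x2.rc   # 2 bytes, same size as x1.rc
printf 'c'  > sizedemo/x3.rc   # 1 byte, unique size
find sizedemo -type f -printf '%s\t%p\n' \
  | sort -n \
  | awk -F'\t' 'cnt[$1]++ { if (cnt[$1] == 2) print first[$1]; print $2 }
                { first[$1] = $2 }'
# prints sizedemo/x1.rc and sizedemo/x2.rc (same size), not x3.rc
```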
 
Old 05-30-2022, 04:50 PM   #11
evo2
LQ Guru
 
Quote:
Originally Posted by dugan
I wrote one. It lists files in a directory tree with the same byte size. You refer to that information and decide for yourself what to remove.

https://gist.github.com/duganchen/1e917c11fce44267b4c4

It’s much faster than literally anything else, because it doesn’t read the contents of the files.
fdupes starts by comparing file sizes, then MD5 signatures, and finally compares file contents byte by byte. So it is actually both fast and safe. It also doesn't crash on dangling symlinks.

Evo2.
 
Old 05-30-2022, 05:52 PM   #12
dugan
LQ Guru
 
Quote:
Originally Posted by evo2
It also doesn't crash on dangling symlinks.
Thanks for the bug report. It's fixed.
 
Old 05-31-2022, 12:58 PM   #13
GPGAgent
Senior Member
 
I've found fdupes does it for me; this removes all duplicates without prompting (-d deletes, -N preserves the first file in each set):
Code:
fdupes -r -d -N F1/ F2/ F3/
 
Old 06-01-2022, 06:03 PM   #14
suramya
Member
 
Quote:
Originally Posted by evo2
fslint seems to be unmaintained. See https://github.com/pixelb/fslint/issues/172
Interesting, I didn't know that it was no longer maintained. Since it works without issues on my machine (Debian 11 Unstable), so far I have seen no need to visit the upstream site.
 
Old 06-01-2022, 06:40 PM   #15
evo2
LQ Guru
 
Quote:
Originally Posted by suramya
Interesting..
I didn't know that it was not maintained any more. Since it works without issues on my machine (Debian 11 Unstable) so far I didn't see any need to visit the upstream site.
Debian 11 is stable, not unstable. Debian unstable does not have a version number. How did you install fslint?

Evo2.
 
  

