LinuxQuestions.org
Linux - Server This forum is for the discussion of Linux Software used in a server related context.

Old 09-16-2009, 05:36 AM   #1
Tekken
Member
 
Registered: Jun 2009
Posts: 48

Rep: Reputation: 15
Script to find the duplicate files


Hi,

Can anyone help me with a script that can be used to find duplicate files in a directory? I have a directory called dir1 which has some sub-directories, and there are some .i and .o files in them. There are some duplicate files in the different directories. I want to identify them and copy the files to another directory, leaving the duplicates in dir1.
 
Old 09-16-2009, 06:06 AM   #2
catkin
LQ 5k Club
 
Registered: Dec 2008
Location: Tamil Nadu, India
Distribution: Servers: Debian Squeeze and Wheezy. Desktop: Slackware64 14.0. Netbook: Slackware 13.37
Posts: 8,563
Blog Entries: 29

Rep: Reputation: 1179
How can you identify the duplicates? Is the name enough or do you also need to check the size and/or checksum?

I do not understand "There are some duplicate files in the different directories. I want to identify them and copy the files to other directory leaving the duplicates in the same dir1." When you identify a file in a sub-directory of dir1 which is a duplicate of a file in dir1, which "other" directory do you want to copy it to, and is it OK to change from having two duplicates to having three?
 
Old 09-16-2009, 11:27 AM   #3
Tekken
Member
 
Registered: Jun 2009
Posts: 48

Original Poster
Rep: Reputation: 15
Quote:
Originally Posted by catkin View Post
How can you identify the duplicates? Is the name enough or do you also need to check the size and/or checksum?

I do not understand "There are some duplicate files in the different directories. I want to identify them and copy the files to other directory leaving the duplicates in the same dir1." When you identify a file in a sub-directory of dir1 which is a duplicate of a file in dir1, which "other" directory do you want to copy it to, and is it OK to change from having two duplicates to having three?
Hi Catkin,

Thanks for your reply.

Identifying duplicate files by name will help me in the first place. I have a directory in which I have many sub-directories, under which I have *.i and *.o files. Now I want to identify the duplicate files and do the following:

1. If files have the same name, display the locations of the files with the same name and store that information in a file.

2. After that, check the contents of the files; if the contents are the same, delete one copy and retain the other, so I don't have a duplicate copy with the same content.

Hope you understand what I am looking for. Could you please help me here?

Thanks,
Tekken.

Last edited by Tekken; 09-16-2009 at 11:29 AM.
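A sketch of those two steps in bash (not from the thread; an untested idea assuming GNU find and md5sum, bash 4 for the associative array, and filenames without newlines or tabs):

```shell
#!/bin/bash
# Step 1: log the paths of same-named *.i / *.o files under $1 to output.txt.
# Step 2: of the files logged, keep the first copy of each
# (basename, checksum) pair and delete the rest.
dedupe_by_name_and_content() {
    local dir=$1

    # Print "basename<TAB>path" for every candidate file, group equal
    # basenames together, and keep only paths whose basename repeats.
    find "$dir" -type f \( -name '*.i' -o -name '*.o' \) -printf '%f\t%p\n' |
        sort |
        awk -F'\t' '$1 == prev { print prevline; print $2 }
                    { prev = $1; prevline = $2 }' |
        sort -u > output.txt

    local file sum key
    declare -A seen
    while IFS= read -r file; do
        sum=$(md5sum "$file" | cut -d' ' -f1)
        key="${file##*/}|$sum"          # same name AND same content
        if [[ -n ${seen[$key]} ]]; then
            echo "deleting duplicate: $file (same content as ${seen[$key]})"
            rm -- "$file"
        else
            seen[$key]=$file
        fi
    done < output.txt
}
```

Usage would be e.g. `dedupe_by_name_and_content dir1`; output.txt then holds the locations of the same-named files.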
 
Old 09-16-2009, 12:04 PM   #4
catkin
LQ 5k Club
 
Registered: Dec 2008
Location: Tamil Nadu, India
Distribution: Servers: Debian Squeeze and Wheezy. Desktop: Slackware64 14.0. Netbook: Slackware 13.37
Posts: 8,563
Blog Entries: 29

Rep: Reputation: 1179
An idea for a brute-force approach (OK if not done often on a large number of files); not tested:
Code:
find dir1 -type f \( -name '*.i' -o -name '*.o' \) -print0 | while IFS= read -r -d '' filename1  # Note 1
do
    basename=<stuff> # Remove the path from $filename1, leaving only the basename
    count=0
    while IFS= read -r -d '' filename2
    do
        echo "'$filename1' duplicate found at '$filename2'" >> output.txt
        (( count++ ))
    done < <(find dir1 -mindepth 2 -type f -name "$basename" -print0)  # process substitution, not a pipe, so $count survives the loop
    if [[ $count -ge 1 ]]; then
        <do file comparing, moving or deleting stuff>
    fi
done
Notes:
  1. Using robust method described here
 
Old 09-17-2009, 04:41 AM   #5
Tekken
Member
 
Registered: Jun 2009
Posts: 48

Original Poster
Rep: Reputation: 15
Quote:
Originally Posted by catkin View Post
An idea for a brute-force approach (OK if not done often on a large number of files); not tested:
Code:
find dir1 -type f \( -name '*.i' -o -name '*.o' \) -print0 | while IFS= read -r -d '' filename1  # Note 1
do
    basename=<stuff> # Remove the path from $filename1, leaving only the basename
    count=0
    while IFS= read -r -d '' filename2
    do
        echo "'$filename1' duplicate found at '$filename2'" >> output.txt
        (( count++ ))
    done < <(find dir1 -mindepth 2 -type f -name "$basename" -print0)  # process substitution, not a pipe, so $count survives the loop
    if [[ $count -ge 1 ]]; then
        <do file comparing, moving or deleting stuff>
    fi
done
Notes:
  1. Using robust method described here
Thanks for the script, catkin. But to be frank, I am new to Linux and I don't know what this script does.

What should I specify at basename=<stuff> and in the if/then construct?
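One common way to fill in a "strip the path, keep the basename" step in bash is parameter expansion; the external basename command does the same job (a sketch, not from the thread; the sample path is made up):

```shell
#!/bin/bash
# Two equivalent ways to strip the directory part from a path in bash.
filename1='dir1/sub/foo.i'           # made-up example path
base1="${filename1##*/}"             # parameter expansion: drop everything up to the last /
base2=$(basename "$filename1")       # the external basename command does the same
echo "$base1" "$base2"               # prints: foo.i foo.i
```

The parameter-expansion form avoids spawning a process, which matters inside a loop over many files.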
 
Old 09-18-2009, 01:46 AM   #6
Kenhelm
Member
 
Registered: Mar 2008
Location: N. W. England
Distribution: Mandriva
Posts: 333

Rep: Reputation: 141
This creates a list of all the duplicate filenames found in a directory.
It's an improved version of the code explained in this thread:
http://www.linuxquestions.org/questi...d.php?t=752129
Code:
find /path/to/dir -type f |
rev | sort | sed -nr ':a N;/^([^/]*\/).*\n\1/p;D;ba' | uniq | rev
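For illustration (not part of the original post), here is the pipeline run on a throw-away tree. It reverses each path so the basename comes first, sorts to group equal basenames, uses sed to print adjacent lines sharing the same first segment, and reverses back. GNU sed is assumed for -r:

```shell
#!/bin/bash
# Build a small tree with one duplicated basename and one unique file.
tmp=$(mktemp -d)
mkdir -p "$tmp/a" "$tmp/b"
touch "$tmp/a/foo.i" "$tmp/b/foo.i" "$tmp/a/unique.o"

# The one-liner from the post, applied to the demo tree.
find "$tmp" -type f |
rev | sort | sed -nr ':a N;/^([^/]*\/).*\n\1/p;D;ba' | uniq | rev
# Prints the two foo.i paths; unique.o is not listed.
```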
 
Old 03-30-2013, 12:29 PM   #7
David the H.
Bash Guru
 
Registered: Jun 2004
Location: Osaka, Japan
Distribution: Debian sid + kde 3.5 & 4.4
Posts: 6,823

Rep: Reputation: 1950
There's a small application called fdupes that does exactly what the OP describes.

http://code.google.com/p/fdupes/
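A hedged usage sketch (flag meanings taken from the fdupes man page; double-check on your version, especially before the destructive variant):

```shell
#!/bin/sh
# Skip quietly if fdupes is not installed or dir1 does not exist.
if command -v fdupes >/dev/null 2>&1 && [ -d dir1 ]; then
    fdupes -r dir1        # recursively list sets of files with identical content
    # fdupes -rdN dir1    # destructive: keep the first file in each set, delete the rest
fi
```

Note that fdupes matches on content (size, then checksum, then byte-by-byte), not on filename, so it finds same-content files even when their names differ.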

Last edited by colucix; 03-30-2013 at 12:57 PM. Reason: Removed parts no longer needed (addressing spam).
 
  

