LinuxQuestions.org
Linux - Server This forum is for the discussion of Linux Software used in a server related context.

Old 09-16-2009, 04:36 AM   #1
Tekken
Member
 
Registered: Jun 2009
Posts: 48

Rep: Reputation: 15
Script to find duplicate files


Hi,

Can anyone help me with a script that can be used to find the duplicate files in a directory? I have a directory called dir1 which has some sub-directories, and there are some .i and .o files in them. There are some duplicate files across the different directories. I want to identify them and copy the files to another directory, leaving the duplicates in dir1.
 
Old 09-16-2009, 05:06 AM   #2
catkin
LQ 5k Club
 
Registered: Dec 2008
Location: Tamil Nadu, India
Distribution: Servers: Debian Squeeze and Wheezy. Desktop: Slackware64 14.0. Netbook: Slackware 13.37
Posts: 8,514
Blog Entries: 27

Rep: Reputation: 1174
How can you identify the duplicates? Is the name enough or do you also need to check the size and/or checksum?

I do not understand "There are some duplicate files in the different directories. I want to identify them and copy the files to other directory leaving the duplicates in the same dir1." When you identify a file in a sub-directory of dir1 which is a duplicate of a file in dir1, which "other" directory do you want to copy it to, and is it OK to change from having two duplicates to having three?
 
Old 09-16-2009, 10:27 AM   #3
Tekken
Member
 
Registered: Jun 2009
Posts: 48

Original Poster
Rep: Reputation: 15
Quote:
Originally Posted by catkin
How can you identify the duplicates? Is the name enough or do you also need to check the size and/or checksum?

I do not understand "There are some duplicate files in the different directories. I want to identify them and copy the files to other directory leaving the duplicates in the same dir1." When you identify a file in a sub-directory of dir1 which is a duplicate of a file in dir1, which "other" directory do you want to copy it to, and is it OK to change from having two duplicates to having three?
Hi Catkin,

Thanks for your reply.

Identifying duplicate files by name will help me in the first place. I have a directory with many sub-directories, under which I have *.i and *.o files. Now I want to identify the duplicate files and do the following:

1. If files have the same name, display the locations of all files with that name and store the information in a file.

2. After that, check the contents of the files; if the contents are the same, delete one copy and retain the other, so I don't have a duplicate copy with the same content.

Hope you understand what I am looking for. Could you please help me here?

Thanks,
Tekken.

Last edited by Tekken; 09-16-2009 at 10:29 AM.
 
Old 09-16-2009, 11:04 AM   #4
catkin
LQ 5k Club
 
Registered: Dec 2008
Location: Tamil Nadu, India
Distribution: Servers: Debian Squeeze and Wheezy. Desktop: Slackware64 14.0. Netbook: Slackware 13.37
Posts: 8,514
Blog Entries: 27

Rep: Reputation: 1174
An idea for a brute-force approach (fine if it is not run often on a large number of files); not tested:
Code:
find dir1 -type f \( -name '*.i' -o -name '*.o' \) -print0 | while IFS= read -r -d '' filename1  # Note 1
do
    basename=<stuff> # Remove the path from $filename1, leaving only the basename
    count=0
    find dir1 -mindepth 2 -type f -name "$basename" -print0 | while IFS= read -r -d '' filename2
    do
        echo "'$filename1' duplicate found at '$filename2'" >> output.txt
        let count++   
    done
    if [[ $count -eq 1 ]]; then
        <do file comparing, moving or deleting here>
    fi
done
Notes:
  1. Using robust method described here
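Name matches alone will not catch identical files that were renamed, so a content-based sketch is worth having alongside the name-based loop above. The following is not from catkin's post; the dir1/a, dir1/b sample tree is hypothetical and is created here only so the pipeline has something to scan. It hashes every .i/.o file and prints the groups whose checksum repeats, i.e. byte-identical files (`uniq -w32` compares only the 32-character md5 prefix of each line):

```shell
# Hypothetical sample tree so the pipeline has input.
mkdir -p dir1/a dir1/b
printf 'same\n'  > dir1/a/x.i
printf 'same\n'  > dir1/b/x.i
printf 'other\n' > dir1/b/y.o

# Hash every .i/.o file, sort by hash, and keep only lines whose
# 32-char md5 prefix repeats; groups are separated by blank lines.
dupes=$(find dir1 -type f \( -name '*.i' -o -name '*.o' \) -exec md5sum {} + |
        sort | uniq -w32 --all-repeated=separate)
echo "$dupes"

rm -rf dir1   # clean up the demo tree
```

Each blank-line-separated group in the output is one set of files with identical content, regardless of name.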
 
Old 09-17-2009, 03:41 AM   #5
Tekken
Member
 
Registered: Jun 2009
Posts: 48

Original Poster
Rep: Reputation: 15
Quote:
Originally Posted by catkin
An idea for a brute-force approach (fine if it is not run often on a large number of files); not tested:
Code:
find dir1 -type f \( -name '*.i' -o -name '*.o' \) -print0 | while IFS= read -r -d '' filename1  # Note 1
do
    basename=<stuff> # Remove the path from $filename1, leaving only the basename
    count=0
    find dir1 -mindepth 2 -type f -name "$basename" -print0 | while IFS= read -r -d '' filename2
    do
        echo "'$filename1' duplicate found at '$filename2'" >> output.txt
        let count++   
    done
    if [[ $count -eq 1 ]]; then
        <do file comparing, moving or deleting here>
    fi
done
Notes:
  1. Using robust method described here
Thanks for the script, catkin, but to be frank I am new to Linux and I don't know what this script does.

What should I specify at basename=<stuff> and in the if...then construct?
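For reference, here is one way the placeholders in catkin's sketch could be filled in. This is an illustrative completion, not catkin's tested code: the `find_dupes` function name and the output.txt file are assumptions, `${filename1##*/}` strips the path for the `basename=<stuff>` line, the inner loop is fed from process substitution instead of a pipe (a pipe would run the loop in a subshell, so variables set there would not survive), and `cmp -s` does the content comparison for the if/then part:

```shell
#!/bin/bash
# Illustrative completion of the sketch: for each .i/.o file under $1,
# log other files sharing its basename and note when contents match.
find_dupes() {
    local dir=$1 filename1 filename2 basename
    find "$dir" -type f \( -name '*.i' -o -name '*.o' \) -print0 |
    while IFS= read -r -d '' filename1; do
        basename=${filename1##*/}     # strip the path, keep only the file name
        # Process substitution (not a pipe) keeps this loop in the current shell.
        while IFS= read -r -d '' filename2; do
            [[ $filename2 == "$filename1" ]] && continue   # skip the file itself
            echo "'$filename1' duplicate name found at '$filename2'" >> output.txt
            if cmp -s "$filename1" "$filename2"; then
                echo "'$filename1' matches content of '$filename2'" >> output.txt
                # rm "$filename2"   # uncomment to delete the extra copy
            fi
        done < <(find "$dir" -type f -name "$basename" -print0)
    done
}
```

Calling `find_dupes dir1` would then produce output.txt with one line per same-name pair and an extra line for pairs whose contents also match.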
 
Old 09-18-2009, 12:46 AM   #6
Kenhelm
Member
 
Registered: Mar 2008
Location: N. W. England
Distribution: Mandriva
Posts: 328

Rep: Reputation: 140
This creates a list of all the duplicate filenames found in a directory.
It's an improved version of the code explained in this thread:
http://www.linuxquestions.org/questi...d.php?t=752129
Code:
find /path/to/dir -type f |
rev | sort | sed -nr ':a N;/^([^/]*\/).*\n\1/p;D;ba' | uniq | rev
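The same grouping can also be expressed with awk, which may be easier to follow than the rev/sed trick (an alternative sketch, not from Kenhelm's post; the demo/ tree is hypothetical and is created only to give the pipeline input): split each path on `/`, collect paths by their last field (the basename), and print the groups whose basename occurs more than once.

```shell
# Hypothetical sample tree.
mkdir -p demo/a demo/b
touch demo/a/dup.i demo/b/dup.i demo/b/only.o

# Group full paths by basename ($NF after -F/); print recurring groups.
dupes=$(find demo -type f | awk -F/ '
    { paths[$NF] = paths[$NF] $0 "\n"; count[$NF]++ }
    END { for (b in count) if (count[b] > 1) printf "%s", paths[b] }')
echo "$dupes"

rm -rf demo   # clean up the demo tree
```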
 
Old 03-30-2013, 11:29 AM   #7
David the H.
Bash Guru
 
Registered: Jun 2004
Location: Osaka, Japan
Distribution: Debian sid + kde 3.5 & 4.4
Posts: 6,823

Rep: Reputation: 1946
There's a small application called fdupes that does exactly what the OP describes.

http://code.google.com/p/fdupes/
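Typical fdupes usage looks like this (assuming the fdupes package is installed; the dir1 sample tree is hypothetical and is created here only so the command has something to scan). `-r` recurses into sub-directories and `-d` interactively prompts which copy of each set to keep:

```shell
# Hypothetical sample tree with one duplicate pair.
mkdir -p dir1/a dir1/b
echo same > dir1/a/f.i
echo same > dir1/b/f.i

sets=""
if command -v fdupes >/dev/null 2>&1; then
    sets=$(fdupes -r dir1)    # each blank-line-separated group is one duplicate set
    echo "$sets"
    # fdupes -rd dir1         # -d: interactively delete extra copies
fi

rm -rf dir1   # clean up the demo tree
```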

Last edited by colucix; 03-30-2013 at 11:57 AM. Reason: Removed parts no longer needed (addressing spam).
 