LinuxQuestions.org


Calab 09-05-2013 03:10 PM

Removing newer identical files in same directory?
 
I'd like to be able to scour a directory, removing all duplicated files except for the one with the oldest modification date.

I can't think of a method to do so. Is this a simple task for Linux, or will it involve some scripting?

The logic is basically this...

1. Find the oldest file.
2. Remove any other files in this directory that are identical.
3. Find the next oldest file.
4. Remove any other files in the directory that are identical.
5. Repeat steps 3 and 4 until all the duplicates are gone.

...and what if I only wanted the most recent modification date instead of the oldest?

jailbait 09-05-2013 03:22 PM

If the files are identical what difference does it make as to which one you save?

-------------------------
Steve Stites

Calab 09-05-2013 03:27 PM

Quote:

Originally Posted by jailbait (Post 5022560)
If the files are identical what difference does it make as to which one you save?

We need to retain the timestamp.

If it's an easy task to delete all the duplicates without regard to the timestamp, it should be a simple matter to sort the file list before trying to delete the duplicates... shouldn't it?

ntubski 09-05-2013 05:54 PM

You could use fdupes. I checked the source, and it does actually sort by mtime. However, that feature is not documented, so it's a little dicey to rely on it. For a one-off, it should be enough to check that the order is correct:
Code:

# if fdupes sorts by mtime, blank lines should be the only diff
diff -u <(fdupes the-directory) <(fdupes -1 the-directory | awk '{system("ls -rt " $0)}')

If that's okay, then you can run:
Code:

fdupes -d -N the-directory

If you want the newest instead of the oldest, you have to do the sorting yourself:
Code:

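dupdir=$(mktemp -d --tmpdir dups-XXXXXX)
# split the fdupes output into one file per set of duplicates
fdupes the-directory | csplit --prefix="$dupdir"/dups -qz - '/^$/1' '{*}'
for dups in "$dupdir"/dups* ; do
    # newest first, then print every name except the first (the one to keep)
    # -d '\n' and -f2- keep file names containing spaces intact (GNU xargs)
    grep . "$dups" | xargs -r -d '\n' stat -c '%Y %n' | sort -r -n -k1,1 | cut -d' ' -f2- | tail -n +2
    # remove -r from sort to keep oldest
done | xargs -r -d '\n' rm --
rm -r "$dupdir"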
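
If fdupes isn't available, the logic from the original question (walk the files from oldest to newest and remove anything already seen) can be sketched with GNU find, sort and sha256sum alone. This is only a rough sketch, not a replacement for fdupes: the function name dedupe_keep is made up for illustration, it assumes bash 4+ and GNU tools, it assumes file names contain no newlines, and it treats matching SHA-256 sums as "identical" instead of comparing byte by byte.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of fdupes): keep one copy of each set of
# identical files in a single directory, chosen by modification time.
# Usage: dedupe_keep DIRECTORY [oldest|newest]
dedupe_keep() {
    local dir=$1 keep=${2:-oldest} sort_flags=-n
    [ "$keep" = newest ] && sort_flags=-rn
    local line f sum
    local -A seen=()    # checksum -> name of the copy being kept

    # find prints "mtime name"; sorting puts the copy to keep first.
    while IFS= read -r line; do
        f=${line#* }    # strip the leading "mtime " field
        # Read via stdin so the file name never appears in the output.
        sum=$(sha256sum < "$f" | cut -d' ' -f1)
        if [ -n "${seen[$sum]:-}" ]; then
            echo "removing $f (same content as ${seen[$sum]})"
            rm -- "$f"
        else
            seen[$sum]=$f
        fi
    done < <(find "$dir" -maxdepth 1 -type f -printf '%T@ %p\n' |
             sort "$sort_flags" -k1,1)
}
```

Calling dedupe_keep the-directory keeps the oldest copy of each set, and dedupe_keep the-directory newest keeps the most recent one. Changing the rm line to an echo first gives a dry run, which is worth doing before trusting it with real data.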


