LinuxQuestions.org

LinuxQuestions.org (/questions/)
-   Linux - Newbie (https://www.linuxquestions.org/questions/linux-newbie-8/)
-   -   linux creates duplicate files (https://www.linuxquestions.org/questions/linux-newbie-8/linux-creates-duplicate-files-566653/)

Wim Sturkenboom 07-08-2007 12:28 PM

Don't know about your tr and xargs, but by the looks of it you have created one long filename.
It looks like you have a mistake in your second tr (compared to the one posted earlier):
Code:

tr ' ' '\000' |
versus
Code:

tr '' '\000' |
But I'm not sure about this part.
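To illustrate the difference (a toy sketch, not the poster's actual pipeline; the filenames below are made up): when feeding names to xargs -0, it is newlines that should become NULs. Translating spaces instead, as in the first tr above, would split any filename that contains a space.

```shell
# Hypothetical two-file listing; "my photo.JPG" contains a space.
# tr '\n' '\000' turns each whole line into one NUL-terminated argument,
# so xargs -0 sees the space-containing name as a single filename.
printf 'my photo.JPG\nother.JPG\n' | tr '\n' '\000' | xargs -0 -n1 echo got:
# prints:
# got: my photo.JPG
# got: other.JPG
```

With `tr ' ' '\000'` instead, "my photo.JPG" would arrive at xargs as two separate arguments.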

alexander_bosakov 07-08-2007 03:11 PM

Well, a suggestion as to what's happening: it's not Linux that creates the duplicate entries, it's Windows. The cause is the FAT directory entry structure. It was designed to store DOS's 8.3 file names, so when Windows introduced long names it stored them in additional directory entry records. If you look at such a directory under pure DOS, you'll see more than one name for a single file, e.g. "FILEBL~1.HTM", "..B L A B L.A B", or something like that, for the file that Windows sees as FILEBLABLABLA.HTML. So I suppose you mounted your network drive as type "fat", which treats such a filesystem the way DOS does, instead of mounting it as "vfat", which is the Windows way.
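If that is the case, an fstab entry along these lines would mount the drive with long-name support. This is only a sketch: the device name and mount point are placeholders, not the poster's actual setup.

```shell
# /etc/fstab sketch -- /dev/sda1 and /mnt/photos are placeholders.
# "vfat" exposes the Windows long file names; plain "fat"/"msdos" shows
# only the DOS 8.3 aliases, which can look like duplicate entries.
/dev/sda1  /mnt/photos  vfat  defaults  0  0
```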

jschiwal 07-08-2007 05:11 PM

The stat commands you showed indicate that the files have different inodes.
That means they are duplicate files and not just duplicate directory entries.
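To see why the inodes matter (a toy sketch in a scratch directory, not the poster's files): two directory entries that share an inode are the same file, while a real copy gets its own inode.

```shell
# Work in a throwaway directory; all filenames here are made up.
cd "$(mktemp -d)"
printf 'x\n' > orig.JPG
ln orig.JPG hardlink.JPG   # second directory entry, same file
cp orig.JPG copy.jpg       # independent duplicate file
# The first two lines show the same inode number; the copy differs.
stat -c '%i %n' orig.JPG hardlink.JPG copy.jpg
```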

Don't worry about the lower-case entries being a different color. I think
that indicates they are associated with an application that can display them.
I see the same thing.

You could check whether they are truly unique by using the md5sum command to calculate their
hash values. Only identical files will produce the same hash value.

Code:

find /home/photographs/ -type f -iname "*.jpg" -exec md5sum '{}' \; | sort | uniq -w32 -D >duplicate_list
The list will contain the original files and their duplicates.
Examine it and see if you have pairs of .jpg and .JPG entries with the same md5sum value.
If the list looks OK, you could remove the .jpg entries, leaving only the .JPG files to delete:
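As a quick illustration of why this works (a toy example in /tmp, not the poster's photographs): files with identical bytes hash identically, and `uniq -w32 -D` then prints every line whose first 32 characters (the md5 hash) repeat.

```shell
# Toy demo: two identical files and one different one, all names made up.
mkdir -p /tmp/md5demo && cd /tmp/md5demo
printf 'same bytes\n' > a.jpg
printf 'same bytes\n' > A.JPG
printf 'other bytes\n' > b.jpg
# Only a.jpg and A.JPG share the first 32 columns (the hash), so only
# those two lines survive uniq -w32 -D; b.jpg is filtered out.
md5sum a.jpg A.JPG b.jpg | sort | uniq -w32 -D
```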
Code:

# Let's preview this once first.  With a wrong locale environment setting,
# sed's pattern might match both lower and upper case in some cases.
sed '/\.jpg$/d' duplicate_list
# If you see only .JPG files displayed, it is safe to proceed.
# cut -c35- strips the 32-character hash and the two spaces that md5sum
# prints before each name, so rm receives only the file paths.
sed '/\.jpg$/d' duplicate_list | cut -c35- | tr '\n' '\000' | xargs -0 -L100 rm -v
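To see what the sed filter keeps before running anything destructive, here is a toy check against a fabricated two-line list (the hash and paths below are made up, not from the poster's system):

```shell
# Fabricated duplicate_list: one made-up hash, lower- and upper-case names.
printf 'd41d8cd98f00b204e9800998ecf8427e  /photos/x.jpg\nd41d8cd98f00b204e9800998ecf8427e  /photos/x.JPG\n' > demo_list
# Deleting the .jpg lines leaves only the .JPG entry that would be removed.
sed '/\.jpg$/d' demo_list
# prints:
# d41d8cd98f00b204e9800998ecf8427e  /photos/x.JPG
```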

P.S. Could you edit one of your previous posts so that the width of this thread isn't 400 characters!


All times are GMT -5. The time now is 10:44 AM.