Linux creates duplicate files
Hi all,
I am still in the process of migrating from Windows to SuSE Linux 10.2, and I have come across a bizarre problem. All my files that were named in capitals under Windows (especially photographs, with names such as DCSF*.JPG or *.JPG) end up being duplicated by Linux without my asking. For example, I end up with two files, DCSF 001.JPG and dcsf 001.jpg. Sometimes the duplication affects the whole name, sometimes just the extension (i.e., *.JPG also appears as *.jpg). Can anyone tell me what is happening here? I am being driven berserk deleting the duplicates with Komparator, which crashes every few minutes. How can I avoid this, especially when I copy files from my CD backups?
Ben
I dual-boot XP and SuSE 10.2, and I've never seen anything like that. Where do these files exist? Did you copy them? If so, how?
Hi,
Thanks for your response. Most of the files used to be stored on a FAT-formatted network drive, connected through a NAS box, before being transferred to my Linux ext3 drive. I am experimenting to see if I can reproduce the moment they get duplicated, but right now my hard drive is littered with these duplicates, thousands of them.
Ben
How were they copied?
There is one thing to check: I want to make sure that these are indeed distinct files. Take one such pair of filenames, use stat to get information on each, and post the output. Code:
stat ABCD.JPG
If every upper-case named file has a lower-case named equivalent, a script could delete the upper-case names. For example, cd into the directory and list them first: Code:
find . -wholename './[^[:lower:]][^[:lower:]]*\.[[:upper:]][[:upper:]][[:upper:]]'
If that list is right, delete them with: Code:
find . -wholename './[^[:lower:]][^[:lower:]]*\.[[:upper:]][[:upper:]][[:upper:]]' -exec rm -v '{}' \;
(These two commands also search subdirectories.)
Alternatively, let's find all of the files that have lower-case names and delete only their upper-case equivalents. First look at the output of: Code:
find . -maxdepth 1 -wholename './[^[:upper:]][^[:upper:]]*\.[[:lower:]][[:lower:]][[:lower:]]' | less
If those are the lower-case files you expect, see what their upper-case names would be: Code:
find . -maxdepth 1 -wholename './[^[:upper:]][^[:upper:]]*\.[[:lower:]][[:lower:]][[:lower:]]' | tr '[[:lower:]]' '[[:upper:]]' | less
(Press q to leave less.) If those names look right, delete the upper-case files with: Code:
find . -maxdepth 1 -wholename './[^[:upper:]][^[:upper:]]*\.[[:lower:]][[:lower:]][[:lower:]]' |
tr '[[:lower:]]' '[[:upper:]]' |
tr '\n' '\000' |
xargs -0 -L 1000 rm -v
(It is actually a single command line that I entered in the shell, split after the pipes (|) for convenience.) The first line lists files with lower-case names. The second converts the names to upper case. Because some of the filenames can contain spaces, we need to replace the newline characters (\n) that separate the names with NULL characters (\000); the third line does this. The xargs command takes the names coming in from the left of the pipe and uses them as arguments to the command listed (rm in this case). The -0 option tells xargs that the arguments are separated by NULL, which allows for whitespace in the filenames. The -L 1000 option passes at most 1000 names to each rm, so that no single rm command line gets too long.
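A more cautious variant of the same idea, sketched as a shell function under a few assumptions: a case-sensitive filesystem such as ext3, and only the current directory, like the -maxdepth 1 commands above. The name dedupe_upper_twins is made up for this example. For each file whose name has no capitals, it looks for the all-upper-case twin and only treats the twin as a duplicate if cmp reports identical contents, so unique upper-case files are left alone. It only prints what it would remove unless you pass delete as the second argument.

```shell
#!/bin/sh
# dedupe_upper_twins DIR [delete]   (hypothetical helper name)
# For each regular file in DIR whose name has no capitals (e.g. dsc00115.jpg),
# look for an all-upper-case twin (DSC00115.JPG) and treat the twin as a
# duplicate only when both files have identical contents.
# Default is a dry run; pass "delete" as the second argument to remove twins.
# Assumes a case-sensitive filesystem such as ext3.
dedupe_upper_twins() {
  local dir=${1:-.} mode=${2:-dry-run} path name upper lower twin
  for path in "$dir"/*; do
    [ -f "$path" ] || continue
    name=${path##*/}
    upper=$(printf '%s' "$name" | tr '[:lower:]' '[:upper:]')
    lower=$(printf '%s' "$name" | tr '[:upper:]' '[:lower:]')
    # Start only from names with no capitals that change when upper-cased.
    if [ "$name" != "$lower" ] || [ "$name" = "$upper" ]; then
      continue
    fi
    twin=$dir/$upper
    [ -f "$twin" ] || continue
    # Same device and inode: one file with two names, so never delete it.
    if [ "$path" -ef "$twin" ]; then
      continue
    fi
    if cmp -s -- "$path" "$twin"; then
      if [ "$mode" = delete ]; then
        rm -v -- "$twin"
      else
        printf 'would remove: %s\n' "$twin"
      fi
    else
      printf 'contents differ, keeping both: %s %s\n' "$path" "$twin" >&2
    fi
  done
}
```

For example, cd into the photo directory, run dedupe_upper_twins . | less to preview, then dedupe_upper_twins . delete. Like the tr pipeline, it only finds twins whose name is the fully upper-cased version, so a pair such as Photo.jpg and Photo.JPG (capital P in both) is not caught.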
Wow! Thanks a lot. I'll study your suggestions carefully and post the results tomorrow (especially to see whether the files are real duplicates or links; I think the former, since Komparator, fdupe and dff report them as such).
Ben
Hi,
I tried stat, and here is the result:
stat dsc00115.jpg
  File: `dsc00115.jpg'
  Size: 21042  Blocks: 48  IO Block: 4096  regular file
Device: fd06h/64774d  Inode: 29492223  Links: 1
Access: (0777/-rwxrwxrwx)  Uid: ( 2000/ fabbs)  Gid: ( 500/administrators)
Access: 2007-07-05 22:21:09.000000000 +0200
Modify: 2005-05-31 08:50:00.000000000 +0200
Change: 2007-06-27 23:02:53.000000000 +0200
me@desktop:/home/photographs/me> stat DSC00115.JPG
  File: `DSC00115.JPG'
  Size: 21042  Blocks: 48  IO Block: 4096  regular file
Device: fd06h/64774d  Inode: 29393973  Links: 1
Access: (0666/-rw-rw-rw-)  Uid: ( 2000/ fabbs)  Gid: ( 500/administrators)
Access: 2007-07-05 22:21:09.000000000 +0200
Modify: 2005-05-31 09:50:00.000000000 +0200
Change: 2007-06-27 23:02:53.000000000 +0200
So far it appears that they are distinct files, right? (BTW, when I run ls, the lower-case files appear in green; I'm not sure whether that is how it should be.)
Since I don't know whether all the upper-case files are duplicates, and I will be working on thousands of files, I'd rather take no chances. However, I am a bit confused by the commands above (I am getting there with the command prompt, but I'm still groping in the dark). First, I did:
me@desktop:/home/photographs/> find . -wholename './[^[:lower:]][^[:lower:]]*\.[[:upper:]][[:upper:]][[:upper:]]'
./DSC00116.JPG
./DSC00181.JPG
./DSC00121.JPG
./DSC00147.JPG
./DSC00144.JPG
./DSC00115.JPG
./DSC00145.JPG
./DSC00120.JPG
./DSC00148.JPG
./DSC00117.JPG
./DSC00182.JPG
me@desktop:/home/photographs> ./DSC*.JPG
bash: ./DSC00115.JPG: Permission denied
If I do ls -l, it returns:
-rw-rw-rw- 1 me administrators 21042 2005-05-31 09:50 DSC00115.JPG
so everybody should have read and write access. Why, then, "Permission denied"?
But now it gets even more confusing for me. When I follow your commands:
find . -maxdepth 1 -wholename './[^[:upper:]][^[:upper:]]*\.[[:lower:]][[:lower:]][[:lower:]]' | less
and then
find . -maxdepth 1 -wholename './[^[:upper:]][^[:upper:]]*\.[[:lower:]][[:lower:]][[:lower:]]' | tr '[[:lower:]]' '[[:upper:]]' | less
less shows the list of lower-case files or the list of upper-case files, respectively. Now I am not sure what I am supposed to do at this stage. I pressed q to get out of less; should I be doing something different? When I run the last command, which I put all on one line for easy cutting and pasting, the output is as follows:
me@desktop:/home/photographs> find . -maxdepth 1 -wholename './[^[:upper:]][^[:upper:]]*\.[[:lower:]][[:lower:]][[:lower:]]' | tr '[[:lower:]]' '[[:upper:]]' | tr ' ' '\000' | xargs -0 -L 1000 rm -v
rm: cannot remove `./DSC00181.JPG\n./DSC00148.JPG\n./DSC00144.JPG\n./DSC00121.JPG\n./DSC00147.JPG\n./DSC00145.JPG\n./DSC00115.JPG\n./DSC00120.JPG\n./DSC00116.JPG\n./DSC00117.JPG\n./DSC00182.JPG\n': No such file or directory
I am sure I am missing something. I have just printed the find man page to understand things a little better, but any help would be most appreciated. Thanks a lot.
Ben
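The stat check can also be done without reading the inode numbers by eye. A minimal sketch, using the made-up helper name compare_pair: the shell's -ef test is true when both names point to the same device and inode (one file, two names), and cmp -s tells two separate files with identical bytes apart from genuinely different files.

```shell
#!/bin/sh
# compare_pair A B   (hypothetical helper name)
# Prints whether A and B are one file under two names (same device and inode,
# i.e. hard links), two separate files with identical bytes, or two different files.
compare_pair() {
  if [ "$1" -ef "$2" ]; then
    echo "same file, two names: $1 $2"
  elif cmp -s -- "$1" "$2"; then
    echo "separate files, identical contents: $1 $2"
  else
    echo "separate files, different contents: $1 $2"
  fi
}
```

With the inode numbers shown above (29492223 vs 29393973), compare_pair dsc00115.jpg DSC00115.JPG would not report "same file"; whether it says "identical contents" depends on the bytes in the two files.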
Well, I think I have made a mess here by trying to change file permissions. I ran chmod -R to change all permissions to 666. However, that changed the permissions on all directories as well as files. Now, if I run (with all files in 666 mode)
find . -wholename './[^[:lower:]][^[:lower:]]*\.[[:upper:]][[:upper:]][[:upper:]]'
and then:
desktop:/home/photographs/fabbs/Documents/Photographs # ./*.*
-bash: ./Babbo00.JPG: Permission denied
desktop:/home/photographs/fabbs/Documents/Photographs # ./*.JPG
-bash: ./Babbo00.JPG: Permission denied
ls -l returns (for the file in question):
-rw-rw-rw- 1 me mygroup 1325566 May 23 17:40 Babbo00.JPG
If I chmod all files to 777, the same commands then return:
-bash: ./Babbo00.JPG: cannot execute binary file
All the above results are from running as root. If I run them as my normal user, I get the following results:
chmod 666: find: cannot get current directory: Permission denied
chmod 777: bash: ./Babbo00.JPG: cannot execute binary file
How can I get out of this mess? Is there an easy way to chmod only directories to 777 and leave all files at 666? Sorry for adding more mess, but I am new to Linux and, well, messed around with the command line. Thanks a lot for your help.
Ben
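One way to do what Ben asks, sketched for GNU find; fix_perms is a made-up name. It uses find's -type d and -type f tests to treat directories and regular files separately. The directory pass uses -exec ... \; rather than +, so each directory gets its x bit back before find tries to read inside it. 755 for directories and 644 for files are the more usual, less permissive choices.

```shell
#!/bin/sh
# fix_perms DIR   (hypothetical helper name)
# Directories get 777 (they need the x bit so you can cd into and list them);
# regular files get 666 (read/write for everyone, not executable).
fix_perms() {
  # \; runs chmod as soon as each directory is visited, before find descends into it
  find "${1:-.}" -type d -exec chmod 777 {} \;
  # + batches many files into each chmod call
  find "${1:-.}" -type f -exec chmod 666 {} +
}
```

For example: fix_perms /home/photographs. A normal user can only change files they own, so for files owned by someone else it has to be run as root.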
Sorry about the previous message. I have now sorted out the problem with file permissions (I think): all files are 666 and directories are 777.
However, when I run
find . -wholename './[^[:lower:]][^[:lower:]]*\.[[:upper:]][[:upper:]][[:upper:]]'
and then ./*.JPG, I still get the message
-bash: ./Babbo00.JPG: Permission denied
I tried moving the offending file to a different directory and tried again, but got the same message for the next file on the list that has any upper-case characters. ls -l returns:
-rw-rw-rw- 1 fabbs users 1325566 2007-05-23 17:40 Babbo00.JPG
What is happening? Tx
Ben
As Wim said, why are you trying to execute (run) JPG files? They are not programs!
Hi Chris,
Thanks for your response. I'm not sure I understand what you mean by trying to execute JPG files. Are you referring to the command ./*.JPG? That was my interpretation of Wim's suggestion that I run ./ABCD.JPG. Or have I got it wrong, and I should literally run:
find . -wholename './[^[:lower:]][^[:lower:]]*\.[[:upper:]][[:upper:]][[:upper:]]'
./ABC DEF.MPG
./ABCD.JPG
./EFGH.TXT
I must admit I am trying to follow Wim's suggestions by cutting and pasting; running ./*.JPG was my only attempt at interpreting his instructions.
Ben
You did not interpret my post correctly, and I'm sorry if that was because my post was not clear.
In post #6, you were trying to execute a picture, and I only tried to explain why you got that specific error message.
Thanks for the clarification.
However, now I have all file permissions set with chmod 666 (-rw-rw-rw-, which I understand is not executable) and directories set to 777 (if a directory is 666, I cannot access it). Still, when I issue the command ./*.JPG, I get this message for all the files in capitals:
-bash: ./Babbo00.JPG: Permission denied
This also happens when I run it as root. Why do I get "Permission denied"? Shouldn't all users be able to read and write these files? That's where I get confused. Do I need to change any permissions? If so, how? And what exactly does the command ./*.JPG do?
If it helps, I have copied the directory to a different disk to experiment, and the command
find . -wholename './[^[:lower:]][^[:lower:]]*\.[[:upper:]][[:upper:]][[:upper:]]' -exec rm -v '{}' \;
seemed to work and deleted all files in capitals. Unfortunately, some of the files in capitals are unique, so I need to follow the second strategy you proposed. Tx
Ben
All users can read and write them, but you are trying to run (execute) something that is not a program.
What do you want to achieve? Do you want to view the image? If so, you have to run a program and open the file in it. Most programs will let you pass the filename as an argument (when I'm in the correct directory): Code:
my_program myfile.jpg
Code:
gqview 01.jpg
By the way, I think that we're far away from your original problem.
PS Nobody mentioned a command ./*.JPG in the previous posts as far as I can see. You might be confused by the content in post #4: Code:
find . -wholename './[^[:lower:]][^[:lower:]]*\.[[:upper:]][[:upper:]][[:upper:]]'
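To see what the shell does with a line like ./*.JPG, here is a small sketch; explain_command_line is a made-up helper name. The shell expands the glob into the matching filenames first, then runs the first word as the command and passes the rest as its arguments. So ./*.JPG tries to execute the first picture: without the x bit that gives "Permission denied", and with chmod 777 the picture still is not a program, which bash reports as "cannot execute binary file".

```shell
#!/bin/sh
# explain_command_line PATTERN   (hypothetical helper name)
# Expands PATTERN the way the shell would on a command line and shows
# which word would be run as the command and which would be its arguments.
explain_command_line() {
  # $1 is deliberately unquoted so that the glob in it is expanded
  set -- $1
  printf 'command:   %s\n' "$1"
  shift
  for arg in "$@"; do
    printf 'argument:  %s\n' "$arg"
  done
}
```

For example, in a directory holding Babbo00.JPG and DSC00115.JPG, explain_command_line './*.JPG' reports ./Babbo00.JPG as the command and ./DSC00115.JPG as its argument, which matches the file named in the error messages.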
Bingo! I thought ./*.JPG was a command, not the result.
So, to recapitulate, I do:
find . -wholename './[^[:lower:]][^[:lower:]]*\.[[:upper:]][[:upper:]][[:upper:]]'
and the result is a list of files with upper-case names in all subdirectories (I can provide the full output, if necessary). Then I do:
find . -maxdepth 1 -wholename './[^[:upper:]][^[:upper:]]*\.[[:lower:]][[:lower:]][[:lower:]]' | less
and I get a list of files with lower-case names, but only in the main directory (the ones in the subdirectories now do not appear). To exit I press q (is that what I should do at this stage?) and get back to the command prompt. Then I do:
find . -maxdepth 1 -wholename './[^[:upper:]][^[:upper:]]*\.[[:lower:]][[:lower:]][[:lower:]]' | tr '[[:lower:]]' '[[:upper:]]' | less
and I get a list of all the files in upper case, also only in the main directory (no subdirectories). Again, I press q to exit less and return to the command prompt. Then I do:
find . -maxdepth 1 -wholename './[^[:upper:]][^[:upper:]]*\.[[:lower:]][[:lower:]][[:lower:]]' | tr '[[:lower:]]' '[[:upper:]]' | tr ' ' '\000' | xargs -0 -L 1000 rm -v
pressing Enter each time after the vertical bar, and the result I get is:
me@desktop:~/Desktop/2006> find . -maxdepth 1 -wholename './[^[:upper:]][^[:upper:]]*\.[[:lower:]][[:lower:]][[:lower:]]' |
> tr '[[:lower:]]' '[[:upper:]]' |
> tr ' ' '\000' |
> xargs -0 -L 1000 rm -v
rm: cannot remove `./BAB2.JPG\n./BLACK.TIF\n./UNTITLED2.BMP\n./PSTEMP_3PICS.PSD\n./DSCF0007.JPG\n./DSC_0030.NEF\n./DSCF0002.JPG\n./SEPTEMBER.JPG\n./AUGUST.JPG\n./UNTITLED.BMP\n./DSCF1809.JPG\n./SALUTI.JPG\n./ORIGINAL.JPG\n./DSCF0006.JPG\n./DSCF0003.JPG\n./BAB1.JPG\n./LS.TXT\n./DSCF0190.JPG\n./DSCF0001.JPG\n': No such file or directory
me@desktop:~/Desktop/2006>
I'm still very confused, but I hope I am not beginning to abuse everybody's patience...
Ben
I don't know much about your tr and xargs, but by the looks of it you have created one long filename.
It looks like you have a mistake in your second tr (compared to the one posted earlier). Yours: Code:
tr ' ' '\000' |
The earlier one: Code:
tr '\n' '\000' |
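A small sketch of the difference, using the made-up helper names to_args_wrong and to_args_right. find prints one name per line, so it is the newlines that must become NUL characters for xargs -0. Replacing spaces instead leaves the newlines in place, and xargs -0 then sees the whole list as one argument (or splits it at a space inside a name), which is exactly the single long "filename" in rm's error.

```shell
#!/bin/sh
# to_args_wrong / to_args_right   (hypothetical helper names)
# Both read newline-separated names on stdin (as find prints them) and show,
# each wrapped in [...], the arguments that xargs -0 would hand to a command.

# Mistaken version: replaces spaces with NUL, so the newlines stay inside
# the data and a list without spaces reaches xargs as ONE argument.
to_args_wrong() { tr ' ' '\000' | xargs -0 printf '[%s]\n'; }

# Intended version: each newline becomes a NUL, so every name is its own
# argument, even names that contain spaces.
to_args_right() { tr '\n' '\000' | xargs -0 printf '[%s]\n'; }

printf './BAB1.JPG\n./DSC 001.JPG\n' | to_args_wrong
printf './BAB1.JPG\n./DSC 001.JPG\n' | to_args_right
```

GNU find can also produce NUL-separated names directly with -print0, which makes the tr '\n' '\000' step unnecessary; the upper-casing tr works on that stream as well.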
Well, here is a suggestion about what's happening: it's not Linux that creates the duplicate entries, it's Windows. It's because of the FAT directory entry structure. It was designed to store DOS's 8.3 file names, so when Windows introduced long names, it used additional directory entry records for them. If you look at such a directory under pure DOS, you'll see more than one name for a single file, e.g. "FILEBL~1.HTM", "..B L A B L.A B", ... or something like that, for the file that Windows sees as FILEBLABLABLA.HTML. So I suppose you mounted your network drive as type "fat", which treats such a filesystem the way DOS does, instead of as "vfat", which is the Windows way.
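To check how a drive or share is actually mounted, one option is to read the kernel's mount table. A minimal sketch, assuming Linux's /proc/mounts format (device, mount point, type, options, ...); mount_info and /mnt/nas are made-up names. It reports fat, vfat, or a network type such as smbfs or cifs. For vfat mounts, the shortname= option (for example shortname=mixed) also affects how all-upper-case 8.3 names are displayed.

```shell
#!/bin/sh
# mount_info MOUNTPOINT [MOUNTS_FILE]   (hypothetical helper name)
# Prints the filesystem type and mount options recorded for MOUNTPOINT.
# MOUNTS_FILE defaults to /proc/mounts; spaces in paths appear there as \040.
mount_info() {
  awk -v t="$1" '$2 == t { print $3, $4 }' "${2:-/proc/mounts}"
}
```

For example, mount_info /mnt/nas might print something like "vfat rw,shortname=mixed"; no output means nothing is mounted at that path.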
The stat commands you showed indicate that the files have different inodes. That means they are duplicate files and not just duplicate entries (two names for the same file). Don't worry about the lower-case entries being a different color: ls shows files with the execute bit set in green, and your stat output shows that dsc00115.jpg has mode 0777 while DSC00115.JPG has 0666. You could check whether they are truly duplicates by using the md5sum command to calculate their hash values. Identical files always produce the same hash value, and two different files are extremely unlikely to share one. Code:
find /home/photographs/ -type f -iname "*.jpg" -exec md5sum '{}' \; | sort | uniq -w32 -D >duplicate_list
Examine the list and see whether you have pairs of .jpg and .JPG files with the same md5sum values. If the list is OK, you could remove the .jpg lines from it, leaving only the .JPG entries to delete: Code:
# Let's preview this first. If an environment locale setting is wrong,
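A sketch of that filtering step, assuming duplicate_list was produced by the command above (GNU md5sum lines: 32 hex characters, two separator characters, then the name; names containing newlines or backslashes are escaped by md5sum and are not handled here). The function name upper_twins_from_list is made up. Rather than dropping every .jpg line and deleting whatever .JPG lines remain, it only prints a .JPG path when a file with some other name has the same hash, so two identical upper-case files with no lower-case twin are not both deleted.

```shell
#!/bin/sh
# upper_twins_from_list LIST   (hypothetical helper name)
# LIST holds "md5  filename" lines as written by md5sum. Print each *.JPG
# path whose hash is shared by at least one file that is not named *.JPG.
upper_twins_from_list() {
  awk '
    {
      hash = substr($0, 1, 32)
      file = substr($0, 35)
      if (file ~ /\.JPG$/) upper[hash] = upper[hash] file "\n"
      else other[hash] = 1
    }
    END {
      for (h in upper)
        if (h in other) printf "%s", upper[h]
    }
  ' "$1"
}
```

Preview with upper_twins_from_list duplicate_list | less. If the list looks right, one way to delete those files is: upper_twins_from_list duplicate_list | while IFS= read -r f; do rm -v -- "$f"; done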