LinuxQuestions.org

Linux - Newbie: linux creates duplicate files (https://www.linuxquestions.org/questions/linux-newbie-8/linux-creates-duplicate-files-566653/)

bnebradd 07-04-2007 02:56 PM

linux creates duplicate files
 
Hi all,

I am still in the process of migrating from Windows to SuSE Linux 10.2, and I have come across a bizarre problem. All my files that were named in capitals under Windows (especially photographs, with names such as DCSF*.JPG or *.JPG) end up being duplicated by Linux without my asking. For example, I end up with two files, DCSF 001.JPG and dcsf 001.jpg. Sometimes the duplicate differs in the whole name, sometimes only in the extension (i.e., *.JPG also appears as *.jpg).

Can anyone tell me what is happening here? I am being driven berserk deleting the duplicates with Komparator, which crashes every few minutes. How can I avoid this, especially when I copy files from my CD backups?

Ben

jschiwal 07-04-2007 03:00 PM

I dual-boot XP and SuSE 10.2, and I've never seen anything like that. Where do these files exist? Did you copy them? If so, how?

bnebradd 07-04-2007 03:10 PM

Hi,
Thanks for your response. Most of the files used to be stored on a FAT-formatted network drive, connected through a NAS box, before being transferred to my Linux ext3 drive. I am experimenting to see if I can reproduce the moment they get duplicated, but right now my hard drive is littered with these duplicates, in the thousands.
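A quick way to gauge how many such pairs exist is to list every path that occurs more than once when upper/lower case is ignored. This is a minimal sketch assuming the standard find, tr, sort and uniq tools; the function name list_case_collisions is hypothetical, and filenames containing newlines would confuse it.

```shell
# list_case_collisions DIR (hypothetical helper)
# Print, in lower case, every path under DIR that appears more than once
# when case is ignored, e.g. both "DSC 001.JPG" and "dsc 001.jpg".
list_case_collisions() {
    find "${1:-.}" -type f |          # one path per line
        tr '[:upper:]' '[:lower:]' |  # fold everything to lower case
        sort |                        # bring equal names together
        uniq -d                       # keep only names seen more than once
}
```

For example, `list_case_collisions /home/photographs | wc -l` gives the number of colliding names.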
Ben

jschiwal 07-04-2007 04:06 PM

How were they copied?

There is one thing to check: I want to make sure that these are indeed distinct files.
Take one such pair of filenames, run stat on each, and post the output.

Code:

stat ABCD.JPG
  File: `ABCD.JPG'
  Size: 0              Blocks: 0          IO Block: 4096  regular empty file
Device: 305h/773d      Inode: 5472320    Links: 1
Access: (0640/-rw-r-----)  Uid: ( 1000/jschiwal)  Gid: ( 1001/jschiwal)
Access: 2007-07-04 15:18:22.000000000 -0500
Modify: 2007-07-04 15:18:22.000000000 -0500
Change: 2007-07-04 15:18:22.000000000 -0500
jschiwal@hpamd64:~> stat abcd.jpg
  File: `abcd.jpg'
  Size: 0              Blocks: 0          IO Block: 4096  regular empty file
Device: 305h/773d      Inode: 5472323    Links: 1
Access: (0640/-rw-r-----)  Uid: ( 1000/jschiwal)  Gid: ( 1001/jschiwal)
Access: 2007-07-04 15:19:31.000000000 -0500
Modify: 2007-07-04 15:19:31.000000000 -0500
Change: 2007-07-04 15:19:31.000000000 -0500

I want to make sure that the inode numbers are different. If they are the same, the two names are hard links to a single file rather than two separate copies.
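To check a single pair without comparing inode numbers by eye, the shell's -ef test is true when two names refer to the same device and inode. A minimal sketch, assuming GNU coreutils stat for the `-c %i` format; same_file is a made-up helper name.

```shell
# same_file A B (hypothetical helper)
# Report whether two names are hard links to one inode or two separate files.
same_file() {
    if [ "$1" -ef "$2" ]; then
        echo "same file (hard links): $1 $2"
    else
        # stat -c %i prints the inode number (GNU coreutils)
        echo "distinct files: $1 (inode $(stat -c %i "$1")) $2 (inode $(stat -c %i "$2"))"
    fi
}
```

Usage: `same_file DSC00115.JPG dsc00115.jpg`.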

If every upper-case filename has a lower-case equivalent, a script could delete the upper-case names.

For example, cd into the directory and run:
Code:

find . -wholename './[^[:lower:]][^[:lower:]]*\.[[:upper:]][[:upper:]][[:upper:]]'
./ABC DEF.MPG
./ABCD.JPG
./EFGH.TXT
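Note that this pattern only requires the first two characters to be non-lowercase: the `*` can still match lowercase letters, and because -wholename is matched against the whole path, a file such as ./2006/dsc00115.JPG in a subdirectory also matches. A stricter sketch, assuming GNU find, tests the basename alone with -name and rejects any name that contains a lowercase letter; list_upper_only is a hypothetical name.

```shell
# list_upper_only DIR (hypothetical helper)
# Regular files whose basename contains no lower-case letter and ends in a
# three-letter upper-case extension. -name looks at the basename only, so
# lower-case directory names cannot affect the result.
list_upper_only() {
    find "${1:-.}" -type f ! -name '*[[:lower:]]*' \
        -name '*.[[:upper:]][[:upper:]][[:upper:]]'
}
```

Usage: `cd /home/photographs && list_upper_only .`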

If it is ok to delete all of these files, you could use:
Code:

find . -wholename './[^[:lower:]][^[:lower:]]*\.[[:upper:]][[:upper:]][[:upper:]]' -exec rm -v '{}' \;
If some of the ALL UPPER CASE files are unique and you don't want to delete them, then another strategy is needed.
Let's find all of the files that have lower-case names and delete their upper-case equivalents.
Look at the output of:
Code:

find . -maxdepth 1 -wholename './[^[:upper:]][^[:upper:]]*\.[[:lower:]][[:lower:]][[:lower:]]'  | less
Make sure that all of the filenames contain lowercase letters.
If so, then run:
Code:

find . -maxdepth 1 -wholename './[^[:upper:]][^[:upper:]]*\.[[:lower:]][[:lower:]][[:lower:]]'  | tr '[[:lower:]]' '[[:upper:]]' | less
The filenames should now be all upper case. If it is OK to delete these files then run:
Code:

find . -maxdepth 1 -wholename './[^[:upper:]][^[:upper:]]*\.[[:lower:]][[:lower:]][[:lower:]]'  |
tr '[[:lower:]]' '[[:upper:]]' |
tr '\n' '\000' |
xargs -0 -L 1000 rm -v

I put this on multiple lines so it wouldn't get too long. Make sure that you press enter after the vertical bar (|). This continues the command after the pipe.
(It is actually a single command line that I entered in the shell, split up after the pipes (|) for convenience.)

The first line lists files with lowercase characters. The second converts their names to upper case. Because some of the filenames contain spaces, we need to convert the newline characters (\n) that separate the names into NUL characters (\000); the third line does this. The xargs command takes the names coming in from the left of the pipe and uses them as arguments to the command listed (rm in this case). The -0 option tells xargs that the arguments are separated by NUL, which allows whitespace in the filenames. The -L 1000 option limits each rm command to at most 1000 names at a time; xargs also keeps every command line below the system's length limit, which avoids the "Argument list too long" error that a very long list of names can cause.
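The pipeline above deletes the upper-case name whether or not its contents match the lower-case file. A more cautious sketch, assuming POSIX sh plus find, tr and cmp, only touches an upper-case twin that sits in the same directory, is a different file (the -ef test also guards against a case-insensitive mount, where both names are one file), and is byte-for-byte identical. It only reports by default; the helper name remove_upper_twins and its `delete` argument are hypothetical.

```shell
# remove_upper_twins DIR [delete] (hypothetical helper)
# For each regular file under DIR whose name contains a lower-case letter,
# build the all-upper-case name in the same directory. That twin is reported
# (default) or deleted (second argument "delete") only if it exists, is not
# the same file, and has identical contents.
remove_upper_twins() {
    find "${1:-.}" -type f -name '*[[:lower:]]*' -exec sh -c '
        mode=$1
        shift
        for f in "$@"; do
            dir=${f%/*}
            upper=$(printf "%s" "${f##*/}" | tr "[:lower:]" "[:upper:]")
            twin=$dir/$upper
            [ -f "$twin" ] || continue          # no upper-case twin
            [ "$f" -ef "$twin" ] && continue    # same file, e.g. case-insensitive mount
            cmp -s "$f" "$twin" || continue     # contents differ: keep both
            if [ "$mode" = delete ]; then
                rm -v -- "$twin"
            else
                printf "would remove: %s\n" "$twin"
            fi
        done
    ' sh "${2:-report}" {} +
}
```

Run `remove_upper_twins /home/photographs` first and read the report; `remove_upper_twins /home/photographs delete` then removes the listed files.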

bnebradd 07-04-2007 04:48 PM

Wow! Thanks a lot. I'll study your suggestions carefully and post the results tomorrow (especially to see whether the files are real duplicates or links, though I think the former, since Komparator, fdupe and dff report them as such).
Ben

bnebradd 07-05-2007 01:23 AM

Hi,
I did try with stat, and here is the result:

stat dsc00115.jpg
File: `dsc00115.jpg'
Size: 21042 Blocks: 48 IO Block: 4096 regular file
Device: fd06h/64774d Inode: 29492223 Links: 1
Access: (0777/-rwxrwxrwx) Uid: ( 2000/ fabbs) Gid: ( 500/administrators)
Access: 2007-07-05 22:21:09.000000000 +0200
Modify: 2005-05-31 08:50:00.000000000 +0200
Change: 2007-06-27 23:02:53.000000000 +0200

me@desktop:/home/photographs/me> stat DSC00115.JPG
File: `DSC00115.JPG'
Size: 21042 Blocks: 48 IO Block: 4096 regular file
Device: fd06h/64774d Inode: 29393973 Links: 1
Access: (0666/-rw-rw-rw-) Uid: ( 2000/ fabbs) Gid: ( 500/administrators)
Access: 2007-07-05 22:21:09.000000000 +0200
Modify: 2005-05-31 09:50:00.000000000 +0200
Change: 2007-06-27 23:02:53.000000000 +0200

So, so far it appears that they are distinct files, right? (BTW, if I issue the ls command, the lower case appear in green, not sure if this is how it should be).

Now, since I don't know if all the uppercase files are a duplicate, and will be working on thousands of files, i'd rather take no chances. However, I am a bit confused concerning the commands above (I am beginning to get there with the command prompt, but still groping in the dark).

First, I did:

me@desktop:/home/photographs/> find . -wholename './[^[:lower:]][^[:lower:]]*\.[[:upper:]][[:upper:]][[:upper:]]'
./DSC00116.JPG
./DSC00181.JPG
./DSC00121.JPG
./DSC00147.JPG
./DSC00144.JPG
./DSC00115.JPG
./DSC00145.JPG
./DSC00120.JPG
./DSC00148.JPG
./DSC00117.JPG
./DSC00182.JPG
me@desktop:/home/photographs> ./DSC*.JPG
bash: ./DSC00115.JPG: Permission denied

If I run ls -l, it returns
-rw-rw-rw- 1 me administrators 21042 2005-05-31 09:50 DSC00115.JPG
so everybody should have rw access; why, then, is permission denied?

But now it gets even more confusing for me. When I follow your commands:
find . -maxdepth 1 -wholename './[^[:upper:]][^[:upper:]]*\.[[:lower:]][[:lower:]][[:lower:]]' | less
and then
find . -maxdepth 1 -wholename './[^[:upper:]][^[:upper:]]*\.[[:lower:]][[:lower:]][[:lower:]]' | tr '[[:lower:]]' '[[:upper:]]' | less

less shows the list of lower-case files or the list of upper-case files, respectively. Now, I am not sure I understand what I am supposed to do at this stage. I pressed q to get out of less; should I be doing something different? When I run the last command, which I put all on one line for easy cutting and pasting, the output is as follows:

me@desktop:/home/photographs> find . -maxdepth 1 -wholename './[^[:upper:]][^[:upper:]]*\.[[:lower:]][[:lower:]][[:lower:]]' | tr '[[:lower:]]' '[[:upper:]]' | tr ' ' '\000' | xargs -0 -L 1000 rm -v
rm: cannot remove `./DSC00181.JPG\n./DSC00148.JPG\n./DSC00144.JPG\n./DSC00121.JPG\n./DSC00147.JPG\n./DSC00145.JPG\n./DSC00115.JPG\n./DSC00120.JPG\n./DSC00116.JPG\n./DSC00117.JPG\n./DSC00182.JPG\n': No such file or directory

I am sure I am missing something, and I have just printed the find man page to understand things a little better, but any help would be most appreciated.

Thanks a lot.

Ben

Wim Sturkenboom 07-05-2007 08:55 AM

Quote:

Originally Posted by bnebradd
I did try with stat, and here is the result:

stat dsc00115.jpg
File: `dsc00115.jpg'
Size: 21042 Blocks: 48 IO Block: 4096 regular file
Device: fd06h/64774d Inode: 29492223 Links: 1
Access: (0777/-rwxrwxrwx) Uid: ( 2000/ fabbs) Gid: ( 500/administrators)
Access: 2007-07-05 22:21:09.000000000 +0200
Modify: 2005-05-31 08:50:00.000000000 +0200
Change: 2007-06-27 23:02:53.000000000 +0200

me@desktop:/home/photographs/me> stat DSC00115.JPG
File: `DSC00115.JPG'
Size: 21042 Blocks: 48 IO Block: 4096 regular file
Device: fd06h/64774d Inode: 29393973 Links: 1
Access: (0666/-rw-rw-rw-) Uid: ( 2000/ fabbs) Gid: ( 500/administrators)
Access: 2007-07-05 22:21:09.000000000 +0200
Modify: 2005-05-31 09:50:00.000000000 +0200
Change: 2007-06-27 23:02:53.000000000 +0200

So, so far it appears that they are distinct files, right? (BTW, if I issue the ls command, the lower case appear in green, not sure if this is how it should be).

Look at the Access line: the green ones have the execute (x) permission.
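A sketch for listing which files carry an execute bit, and so usually show up in green with ls's default colors, assuming GNU find (the `-perm /111` form means "any of these bits is set"); list_exec is a made-up name.

```shell
# list_exec DIR (hypothetical helper)
# Regular files directly in DIR with any execute bit set (user, group or
# other). -perm /MODE is GNU find syntax.
list_exec() {
    find "${1:-.}" -maxdepth 1 -type f -perm /111
}
```

To clear the execute bit on the photographs, something like `find . -type f -iname '*.jpg' -exec chmod a-x {} +` should work.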

Quote:

Originally Posted by bnebradd
Now, since I don't know if all the uppercase files are a duplicate, and will be working on thousands of files, i'd rather take no chances. However, I am a bit confused concerning the commands above (I am beginning to get there with the command prompt, but still groping in the dark).

First, I did:

me@desktop:/home/photographs/> find . -wholename './[^[:lower:]][^[:lower:]]*\.[[:upper:]][[:upper:]][[:upper:]]'
./DSC00116.JPG
./DSC00181.JPG
./DSC00121.JPG
./DSC00147.JPG
./DSC00144.JPG
./DSC00115.JPG
./DSC00145.JPG
./DSC00120.JPG
./DSC00148.JPG
./DSC00117.JPG
./DSC00182.JPG
me@desktop:/home/photographs> ./DSC*.JPG
bash: ./DSC00115.JPG: Permission denied

If I run ls -l, it returns
-rw-rw-rw- 1 me administrators 21042 2005-05-31 09:50 DSC00115.JPG
so everybody should have rw access; why, then, is permission denied?

Why would you try to execute a picture? And there is no execute permission on your upper-case files (hence the error that you get).

bnebradd 07-05-2007 01:44 PM

Well, I think I have made a mess here by trying to change file permissions. I ran chmod -R to change all permissions to 666, but that changed the permissions of all directories as well as files. Now, if I run (with all files in 666 mode)
find . -wholename './[^[:lower:]][^[:lower:]]*\.[[:upper:]][[:upper:]][[:upper:]]'
and then:
desktop:/home/photographs/fabbs/Documents/Photographs # ./*.*
-bash: ./Babbo00.JPG: Permission denied
desktop:/home/photographs/fabbs/Documents/Photographs # ./*.JPG
-bash: ./Babbo00.JPG: Permission denied

ls -l returns (for the file in question):

-rw-rw-rw- 1 me mygroup 1325566 May 23 17:40 Babbo00.JPG

If I chmod all files to 777, the same commands then return:

-bash: ./Babbo00.JPG: cannot execute binary file

All the above results are from running as root. If I run as my normal user, I get the following results:
with chmod 666: find: cannot get current directory: Permission denied
with chmod 777: bash: ./Babbo00.JPG: cannot execute binary file

How can I get out of this mess? Is there an easy way to chmod only directories to 777 and leave all files in chmod 666?

Sorry for adding more mess, but I am new to Linux and, well, messed around with the command line.

Thanks a lot for your help.

Ben

bnebradd 07-05-2007 03:28 PM

Sorry about the previous message. I have now sorted out the problem with file permissions (I think), and all files are now 666 and directories 777.
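For the record, one way to get 777 on directories and 666 on files is to run find twice, once per type. This is a minimal sketch assuming GNU find and chmod; fix_perms is a hypothetical name. Directories are handled with `-exec ... \;` so that each one is made searchable before find descends into it. (Tighter modes such as 755 and 644 are more usual, but these match the values used here.)

```shell
# fix_perms DIR (hypothetical helper)
# Directories -> 777 (rwxrwxrwx), regular files -> 666 (rw-rw-rw-).
fix_perms() {
    # "\;" runs chmod on each directory as soon as find visits it,
    # before find tries to read the directory's contents
    find "${1:-.}" -type d -exec chmod 777 {} \;
    # files can be batched with "+"
    find "${1:-.}" -type f -exec chmod 666 {} +
}
```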

However, when I run
find . -wholename './[^[:lower:]][^[:lower:]]*\.[[:upper:]][[:upper:]][[:upper:]]'
and then
./*.JPG
and I still get the message
-bash: ./Babbo00.JPG: Permission denied

I tried moving the offending file to a different directory and tried again, but got the same message for the next file on the list that has any uppercase characters.

ls -l returns:
-rw-rw-rw- 1 fabbs users 1325566 2007-05-23 17:40 Babbo00.JPG

What is happening?

Tx

Ben

chrism01 07-06-2007 05:37 AM

As per Wim, why are you trying to execute (run) jpg files? They are not programs.(!)

bnebradd 07-06-2007 06:09 AM

Hi Chris,
Thanks for your response. Not sure I understand what you mean by trying to execute jpg files. Are you referring to the command
./*.JPG? That was my interpretation of Wim's suggestion that I run ./ABCD.JPG. Or have I got it wrong here and I literally should run:
find . -wholename './[^[:lower:]][^[:lower:]]*\.[[:upper:]][[:upper:]][[:upper:]]'
./ABC DEF.MPG
./ABCD.JPG
./EFGH.TXT

I must admit I am trying to follow Wim's suggestions by cutting and pasting; running ./*.JPG was my only attempt at interpreting his instructions.

Ben

Wim Sturkenboom 07-07-2007 12:23 AM

You did not interpret my post correctly, and I'm sorry if that was because my post was not clear.

In post #6, you were trying to execute a picture, and I only tried to explain why you got the specific error message.
Quote:

chmode 777 = bash: ./Babbo00.JPG: cannot execute binary file
Then you gave it execute permission, but it is not a program (hence this specific error).

bnebradd 07-07-2007 04:09 AM

Thanks for the clarification.
However, now I have all file permissions set to chmod 666 (-rw-rw-rw-, which I understand is not executable), and directories to 777 (if I set the directories to 666, I cannot access them). Still, when I issue the command
./*.JPG
I get the message for all the files in capitals:
-bash: ./Babbo00.JPG: Permission denied
This also happens when I run the command as root. Why do I get permission denied? Shouldn't all users be able to read and write these files? That's where I get confused. Do I need to change any permissions? If so, how? And what exactly does the command ./*.JPG do?

If it helps, I have copied the directory to a different disk to experiment, and the command (find . -wholename './[^[:lower:]][^[:lower:]]*\.[[:upper:]][[:upper:]][[:upper:]]' -exec rm -v '{}' \; ) seemed to work and delete all files in capitals, but unfortunately some of the files in capitals are unique, so I need to follow the second strategy you proposed.
Tx
Ben

Wim Sturkenboom 07-08-2007 09:34 AM

All users can read and write them, but you are trying to run (execute) something that is not a program (code).

What do you want to achieve? Do you want to view the image? If so, you have to run a program and open the file in it.
Most programs will also let you pass the filename as an argument.
Code:

my_program myfile.jpg
My image viewer is gqview, so if I'm in a terminal window within my graphical environment (and in the correct directory), I can view an image like this:
Code:

gqview 01.jpg
You might be using KDE (I'm not familiar with KDE), but I think you can replace gqview with kview (if my memory serves me correctly).

By the way, I think that we're far away from your original problem.

PS
Nobody mentioned a command ./*.JPG in the previous posts as far as I can see.
You might be confused by the content in post #4
Code:

find . -wholename './[^[:lower:]][^[:lower:]]*\.[[:upper:]][[:upper:]][[:upper:]]'
./ABC DEF.MPG
./ABCD.JPG
./EFGH.TXT

which showed a command (the first line) and the results of that command.
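As for what ./*.JPG does when typed on its own: the shell expands the pattern into the matching names and then runs the first one as a program, with the rest as its arguments. That is why you got "Permission denied" (no execute bit) or "cannot execute binary file" (execute bit set, but the file is not a program). To see what a pattern matches, give it to a command that just prints its arguments, as in this sketch (show_matches is a hypothetical name; `ls -l ./*.JPG` works too).

```shell
# show_matches PATTERN... (hypothetical helper)
# Print each word the shell produced from the pattern, one per line,
# instead of running the first match as a command.
show_matches() {
    printf '%s\n' "$@"
}
```

Usage: `show_matches ./*.JPG`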

bnebradd 07-08-2007 11:37 AM

Bingo! I thought ./*.JPG was a command, not the result.

So, to recapitulate, I do:
find . -wholename './[^[:lower:]][^[:lower:]]*\.[[:upper:]][[:upper:]][[:upper:]]'
and the result is a list of files with uppercase in all sub-directories (I can provide the full output, if necessary).
Then I do:

find . -maxdepth 1 -wholename './[^[:upper:]][^[:upper:]]*\.[[:lower:]][[:lower:]][[:lower:]]' | less

and I get a list of files with lowercase letters, but only in the main directory (the ones in the sub-directory now do not appear). To exit I press q (is that what I should do at this stage?) and get back to the command prompt. Then I do:

find . -maxdepth 1 -wholename './[^[:upper:]][^[:upper:]]*\.[[:lower:]][[:lower:]][[:lower:]]' | tr '[[:lower:]]' '[[:upper:]]' | less

and I get a list of all the files in uppercase, also only in the main directory (no subdirectories). Again, I do q to exit less back to the command prompt. Then I do:

find . -maxdepth 1 -wholename './[^[:upper:]][^[:upper:]]*\.[[:lower:]][[:lower:]][[:lower:]]' |
tr '[[:lower:]]' '[[:upper:]]' |
tr ' ' '\000' |
xargs -0 -L 1000 rm -v

pressing enter each time after the vertical bar, and the result I get is:

me@desktop:~/Desktop/2006> find . -maxdepth 1 -wholename './[^[:upper:]][^[:upper:]]*\.[[:lower:]][[:lower:]][[:lower:]]' |
> tr '[[:lower:]]' '[[:upper:]]' |
> tr ' ' '\000' |
> xargs -0 -L 1000 rm -v
rm: cannot remove `./BAB2.JPG\n./BLACK.TIF\n./UNTITLED2.BMP\n./PSTEMP_3PICS.PSD\n./DSCF0007.JPG\n./DSC_0030.NEF\n./DSCF0002.JPG\n./SEPTEMBER.JPG\n./AUGUST.JPG\n./UNTITLED.BMP\n./DSCF1809.JPG\n./SALUTI.JPG\n./ORIGINAL.JPG\n./DSCF0006.JPG\n./DSCF0003.JPG\n./BAB1.JPG\n./LS.TXT\n./DSCF0190.JPG\n./DSCF0001.JPG\n': No such file or directory
me@desktop:~/Desktop/2006>

Still very confused, but I hope I am not beginning to abuse everybody's patience...

Ben

Wim Sturkenboom 07-08-2007 12:28 PM

Don't know about your tr and xargs, but you have created one long filename by the looks of it.
It looks like you have a mistake in your second tr (compared to one posted earlier)
Code:

tr ' ' '\000' |
versus
Code:

tr '\n' '\000' |
But I'm not sure about this part.
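The error message fits the one-long-filename reading. In the run above, the second tr translated spaces, and since these names contain no spaces, no NUL characters were produced; xargs -0 therefore received the whole newline-separated list as a single argument, which rm could not find. The sketch below, assuming GNU tr and xargs, prints each argument xargs would pass on, wrapped in brackets, for both versions of the tr step (the function names are made up).

```shell
# Read a newline-separated list of names on stdin and show the arguments
# xargs -0 would pass to a command, each wrapped in [ ].
args_after_space_tr()   { tr ' '  '\000' | xargs -0 printf '[%s]\n'; }  # spaces -> NUL
args_after_newline_tr() { tr '\n' '\000' | xargs -0 printf '[%s]\n'; }  # newlines -> NUL
```

With `printf './BAB2.JPG\n./BLACK.TIF\n'` as input, the first function yields one bracketed argument spanning several lines, while the second yields one argument per name.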

alexander_bosakov 07-08-2007 03:11 PM

Here is a suggestion as to what is happening: it's not Linux that creates the duplicate entries, it is Windows. It's because of the FAT directory entry structure: it was designed to store DOS 8.3 file names, so when Windows introduced long names, it used additional directory entry records. If you look at such a directory under pure DOS, you'll see more than one name for a single file, e.g. "FILEBL~1.HTM", "..B L A B L.A B", ... or something like that, for the file that Windows sees as FILEBLABLABLA.HTML. So I suppose you mounted your network drive as type "msdos" (plain FAT), which treats such a filesystem the way DOS does, instead of mounting it as "vfat", which is the Windows way.
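To check how a drive is currently mounted, Linux records the filesystem type of each mount point in /proc/mounts: plain FAT appears as msdos, long-name FAT as vfat, and a network share typically as a network type such as smbfs or cifs. A small sketch; mount_type is a hypothetical name, and it does not handle mount points containing spaces (those are stored with octal escapes in that file).

```shell
# mount_type MOUNTPOINT (hypothetical helper)
# Print the filesystem type listed for MOUNTPOINT in /proc/mounts;
# exit with status 1 if it is not a mount point. Linux-specific.
mount_type() {
    awk -v m="$1" '$2 == m { t = $3 } END { if (t == "") exit 1; print t }' /proc/mounts
}
```

Usage: `mount_type /mnt/nas`, with the mount point as it appears in the output of `mount`.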

jschiwal 07-08-2007 05:11 PM

The stat commands you showed indicate that the files have different inodes.
That means they are duplicate files and not just duplicate directory entries.

Don't worry about the lower case entries being a different color. I think
that indicates that they are associated with an application to display them.
I see the same thing.

You could check whether they are truly identical by using the md5sum command to calculate their
hash values. Identical files always have the same hash value, and two different files sharing one is extremely unlikely.

Code:

find /home/photographs/ -type f -iname "*.jpg" -exec md5sum '{}' \; | sort | uniq -w32 -D >duplicate_list
The file duplicate_list will then contain each original file together with its duplicates.
Examine the list and see if you have pairs of .jpg and .JPG with the same md5sum values.
If the list is OK, you could remove the .jpg entries leaving only the .JPG entries to delete:
Code:

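# let's preview this once first.  If an environment locale setting is wrong,
# sed might select both lower and upper case in some cases.
# The second sed expression strips the 32-character hash and separator,
# leaving only the path that rm will receive.
# Caution: if two .JPG copies share a hash, both are listed and both would be deleted.
sed -e '/\.jpg$/d' -e 's/^[0-9a-f]\{32\} [ *]//' duplicate_list
# if you see only .JPG files displayed then it is safe to proceed
sed -e '/\.jpg$/d' -e 's/^[0-9a-f]\{32\} [ *]//' duplicate_list | tr '\n' '\000' | xargs -0 -L100 rm -v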

P.S. could you edit one of your previous posts so that the width of this thread isn't 400 characters!


All times are GMT -5.