LinuxQuestions.org


Wocky 04-03-2011 06:18 PM

Duplicated filenames in the same directory
 
The (WD 320GB) drive has a single ext3 FS on it. It has had some problems in the past, but all were fixed with fsck -y. Now there are several directories with duplicate filenames. The files with duplicated names are hard links of each other, but the names are identical. I've run several diagnostics over them, looking for, e.g., non-printing characters in the names, but they are completely identical. Here are some examples:

Code:

$ ls -l | awk '$2 == 2{print}'
-rw-------    2 Wocky    users    27091695 Dec 17 22:51 MaliceAforethought.2010.12.19.mp3
-rw-------    2 Wocky    users    27091695 Dec 17 22:51 MaliceAforethought.2010.12.19.mp3
-rw-------    2 Wocky    users    26109909 Dec  1 12:04 RoundOnAWellKnownTheme.2010.12.05.mp3
-rw-------    2 Wocky    users    26109909 Dec  1 12:04 RoundOnAWellKnownTheme.2010.12.05.mp3
-rw-------    2 Wocky    users    27118862 Dec 10 11:51 ItsAGoodDay.2010.12.12.mp3
-rw-------    2 Wocky    users    27118862 Dec 10 11:51 ItsAGoodDay.2010.12.12.mp3
-rw-------    2 Wocky    users    27132655 Dec 13 22:48 AfterYouveGone.2010.12.12.mp3
-rw-------    2 Wocky    users    27132655 Dec 13 22:48 AfterYouveGone.2010.12.12.mp3
...
$ ls -l | awk '$2 == 2{print}' | wc -l
338
$ ls -l | awk '$2 == 2{print}' | sort | uniq | wc -l
169
$

These are (obviously) from a directory of mp3s, but similar duplications occur throughout the fs - there are several thousand files affected. Some of the diagnostics were programmes I wrote that accessed the directory itself (through the dirent structure).
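A minimal sketch of that kind of check in shell rather than through dirent directly, assuming GNU find (for -printf) and filenames without embedded newlines. The helper names list_entries and dup_lines are hypothetical, not taken from the programmes mentioned above. It prints every inode/name pair that occurs more than once among a directory's entries; on a healthy filesystem the output should be empty.

```shell
# Hypothetical helpers; assume GNU find (-printf) and names without newlines.

# Print "inode<TAB>name" for every entry directly inside directory $1.
list_entries() {
    find "$1" -mindepth 1 -maxdepth 1 -printf '%i\t%f\n'
}

# Read lines on stdin and print each line that occurs more than once.
# The C locale forces a plain byte-by-byte comparison.
dup_lines() {
    LC_ALL=C sort | LC_ALL=C uniq -d
}

# Usage: list_entries /path/to/dir | dup_lines
```

Comparing in the C locale avoids locale collation rules treating two different byte sequences as equal.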

I always thought duplicate filenames in the same directory were impossible in unix/linux; this appears to prove me wrong. Am I missing something?

(Kernel version 2.4.20 with xfs extensions. The installation was originally Red Hat 7, but I've changed almost everything, so it's probably more accurate to call it a custom distro.)

kbp 04-03-2011 07:38 PM

I know you said that you've checked but could you please post the output of 'ls -liq <dir>' where <dir> contains one of these duplicates ? .. your output doesn't show the full paths, can you confirm the paths are identical as well ?

carltm 04-03-2011 07:49 PM

It's possible for a filename to have a non-printing character,
such as ^H. If you had files named "file1.txt" and "file12^H.txt"
and typed "ls" it would appear that you have duplicates.

Use "ls | cat -vt" to see any non-printing characters.
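As a rough sketch of the same idea, the filter below prints only the names that contain a byte outside printable ASCII, in cat -vt notation, so suspicious names stand out in a long listing. The function name odd_names is hypothetical, and the sketch assumes a grep that supports -a and names that are meant to be plain ASCII (non-ASCII names would be flagged too).

```shell
# Hypothetical helper: read names on stdin and print only those containing
# a byte outside printable ASCII (space through tilde), in cat -vt notation.
odd_names() {
    LC_ALL=C grep -a '[^ -~]' | cat -vt
}

# Usage: ls | odd_names
```

No output means every name consists solely of printable ASCII characters.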

Wocky 04-03-2011 07:57 PM

Quote:

Originally Posted by kbp (Post 4312830)
I know you said that you've checked but could you please post the output of 'ls -liq <dir>' where <dir> contains one of these duplicates ? .. your output doesn't show the full paths, can you confirm the paths are identical as well ?

Certainly.

Code:

$ ls -liq . | awk '$3 == 2{print}'
    356 drwx------    2 Wocky    users      86016 Mar 11 13:01 .
30736822 -rw-------    2 Wocky    users    27132655 Dec 13 22:48 AfterYouveGone.2010.12.12.mp3
30736822 -rw-------    2 Wocky    users    27132655 Dec 13 22:48 AfterYouveGone.2010.12.12.mp3
30736792 -rw-------    2 Wocky    users    27118862 Dec 10 11:51 ItsAGoodDay.2010.12.12.mp3
30736792 -rw-------    2 Wocky    users    27118862 Dec 10 11:51 ItsAGoodDay.2010.12.12.mp3
28754138 -rw-------    2 Wocky    users    27091695 Dec 17 22:51 MaliceAforethought.2010.12.19.mp3
28754138 -rw-------    2 Wocky    users    27091695 Dec 17 22:51 MaliceAforethought.2010.12.19.mp3
28852283 -rw-------    2 Wocky    users    26109909 Dec  1 12:04 RoundOnAWellKnownTheme.2010.12.05.mp3
28852283 -rw-------    2 Wocky    users    26109909 Dec  1 12:04 RoundOnAWellKnownTheme.2010.12.05.mp3
...

(This is only the first few; as indicated in the OP, there're more than 300 files. I can post the whole lot, but it's really just more of the same.)

Thanks
Wocky

Wocky 04-03-2011 08:03 PM

Quote:

Originally Posted by carltm (Post 4312836)
It's possible for a filename to have a non-printing character,
such as ^H. If you had files named "file1.txt" and "file12^H.txt"
and typed "ls" it would appear that you have duplicates.

Use "ls | cat -vt" to see any non-printing characters.

Thanks carltm. "ls -l | cat -vt" gives the same output as before. I've also tried "ls -liq", which prints the inode number (-i) and shows non-printing characters as "?" (-q).

Wocky

Wocky 04-03-2011 08:08 PM

Quote:

Originally Posted by kbp (Post 4312830)
I know you said that you've checked but could you please post the output of 'ls -liq <dir>' where <dir> contains one of these duplicates ? .. your output doesn't show the full paths, can you confirm the paths are identical as well ?


Sorry, kbp, I've just re-read that. Here it is again:

Code:

$ ls -liq /home/Wocky/temp_audio/mp3 | awk '$3 == 2{print}'
30736822 -rw-------    2 Wocky    users    27132655 Dec 13 22:48 AfterYouveGone.2010.12.12.mp3
30736822 -rw-------    2 Wocky    users    27132655 Dec 13 22:48 AfterYouveGone.2010.12.12.mp3
30736792 -rw-------    2 Wocky    users    27118862 Dec 10 11:51 ItsAGoodDay.2010.12.12.mp3
30736792 -rw-------    2 Wocky    users    27118862 Dec 10 11:51 ItsAGoodDay.2010.12.12.mp3
28754138 -rw-------    2 Wocky    users    27091695 Dec 17 22:51 MaliceAforethought.2010.12.19.mp3
28754138 -rw-------    2 Wocky    users    27091695 Dec 17 22:51 MaliceAforethought.2010.12.19.mp3
28852283 -rw-------    2 Wocky    users    26109909 Dec  1 12:04 RoundOnAWellKnownTheme.2010.12.05.mp3
28852283 -rw-------    2 Wocky    users    26109909 Dec  1 12:04 RoundOnAWellKnownTheme.2010.12.05.mp3
...

Wocky

carltm 04-03-2011 08:21 PM

This is odd. I would umount the filesystem and run fsck -y again.

jschiwal 04-03-2011 08:33 PM

The 2 is the link count to the same file.

If the inode numbers were different, they would be different files with identical names and you could use "find -inum <#> -exec mv '{}' dir/ \;" to process them.
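For comparison, here is a small sketch of how link counts and inode numbers normally behave, run in a scratch directory. It assumes GNU stat (for -c); the helper name links_and_inode and the file names are made up for illustration.

```shell
# Hypothetical helper; assumes GNU stat (-c). Prints "links inode" for a path.
links_and_inode() {
    stat -c '%h %i' "$1"
}

# Scratch directory: two different names pointing at one inode.
demo=$(mktemp -d)
echo data > "$demo/a.mp3"
ln "$demo/a.mp3" "$demo/b.mp3"     # second name for the same inode
links_and_inode "$demo/a.mp3"
links_and_inode "$demo/b.mp3"      # same link count and inode as a.mp3

# On a healthy filesystem, linking a file onto its own name is refused.
ln "$demo/a.mp3" "$demo/a.mp3" 2>/dev/null || echo "ln refused: name already in use"

rm -rf "$demo"
```

Both stat lines should show a link count of 2 and the same inode number, which is the normal situation with two distinct names.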

If fsck won't correct the directory (they are files to the kernel), one thing you might try is something like:
mkdir dupes
find ./ -maxdepth 1 -links 2 -type f -exec mv '{}' dupes/ \;
mv dupes/* .
rmdir dupes

This will move files with a link count of 2 from the current directory into the dupes/ subdirectory. Since they have the same name, you should end up with one file name in the subdirectory.

Maybe back up a couple of these to another directory first, for testing, to make sure it works. I can't test it since I can't create 2 links to the same file with the same name.
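One way to take that test backup is sketched below: it copies, rather than moves, the regular files with a link count of 2 into a separate directory and leaves the originals alone. The function name backup_linked and the example paths are hypothetical. How cp behaves when the directory really holds two identical names is untested here; presumably the second copy would overwrite the first and leave a single file.

```shell
# Hypothetical helper: copy (not move) regular files with link count 2 from
# directory $1 into backup directory $2, leaving the originals untouched.
backup_linked() {
    mkdir -p "$2" &&
    find "$1" -maxdepth 1 -type f -links 2 -exec cp -p '{}' "$2"/ \;
}

# Usage: backup_linked /path/to/dir /path/on/another/filesystem/backup
```

Putting the backup directory on a different filesystem keeps the copies clear of whatever is wrong with the damaged one.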

