LinuxQuestions.org

LinuxQuestions.org (/questions/)
-   Linux - Software (https://www.linuxquestions.org/questions/linux-software-2/)
-   -   bash command or sript to list files (https://www.linuxquestions.org/questions/linux-software-2/bash-command-or-sript-to-list-files-750181/)

sepi 08-25-2009 02:26 PM

bash command or script to list files
 
Hello,

I need a command or script to
list all files recursively, without directories --
one line per file, with none of the extra lines that ls -AR1 prints.
It should print each file's size and name,
e.g.:
12 file.ext
25684 file2.ext
589 file3.ext
...

catkin 08-25-2009 02:28 PM

Is this homework? Do you have any ideas? Have you tried anything?

sepi 08-25-2009 02:33 PM

No, it is not homework :)
I have two volumes with mostly (but not exactly) the same files, but completely different directory structures. I need to know which files exist in only one volume, and which are dups.

I tried ls with awk, but everything I tried prints extra lines.

catkin 08-25-2009 02:45 PM

How can you identify a file? Are names unique within each volume (file system)? Or could files like /foo/bar and /goo/bar both exist? If so, what further characteristics, beyond the name, will be enough to uniquely identify a file -- size in bytes, modification time, checksum ...?

Are you only dealing with "normal" files, or do you have multiply-linked files, symlinks, device files, fifos ...?

catkin 08-25-2009 02:47 PM

How many files, roughly, in total?

sepi 08-25-2009 03:06 PM

Hi,
the files should be identified by name and size.
I don't need special files, but the volumes don't contain any special files anyway, just normal ones.
There are about 500k files in 900 GBytes on each volume; I think about 450k files are identical.
The directory structures are completely different.

catkin 08-25-2009 03:47 PM

Ouch! That's big! Performance will matter, and bash string manipulation is slow, but I can't think how to handle whitespace in file names using awk (I'm not very proficient in awk, so that doesn't mean it can't be done -- it almost certainly can). How about this for starters?
Code:

#!/bin/bash
# /bin/ls -l output has nine fields: perms links owner group size month day
# time name. "read" assigns the rest of the line to its last variable, so
# file names containing spaces survive; -r keeps backslashes intact.
find . -type f -exec /bin/ls -l {} \; | while read -r x x x x size x x x name
do
        echo "$size" "${name##*/}"
done

It could maybe be sped up by using xargs with the find. The output will need sorting ...
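If GNU findutils is available, find can print the size and basename itself via -printf, which avoids spawning one ls process per file and parsing its columns entirely. A sketch, assuming GNU find (%s and %f are GNU extensions, not POSIX):

```shell
#!/bin/bash
# Assumes GNU find: %s = size in bytes, %f = basename (no directory part).
# Prints one "size name" line per regular file; whitespace in names is safe
# because the name is the last thing on the line.
find . -type f -printf '%s %f\n'
```

This is also far faster on 500k files, since no external command is executed per file.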

sepi 08-25-2009 04:15 PM

Thank you very much!
The solution is exactly what I need.
Runtime is not a problem; I'll grant it one core and it will run in the background.
Sorting is not necessary: the output will be imported into MySQL, and then some simple queries should show the dups and diffs.
thx again!

PS:
The runtime was about 45 min.
The result is about 8 MB.
Fine.
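For the record, the dup/diff step can also be done without MySQL, using sort and comm on the two listings. A sketch, assuming the script's output for each volume was saved as vol1.txt and vol2.txt (those file names are just placeholders):

```shell
#!/bin/bash
# Assumes vol1.txt and vol2.txt each hold one "size name" line per file,
# as produced by the listing script. comm requires sorted input.
sort vol1.txt > v1.sorted
sort vol2.txt > v2.sorted
comm -12 v1.sorted v2.sorted > dups.txt      # lines present in both volumes
comm -23 v1.sorted v2.sorted > only-vol1.txt # lines unique to volume 1
comm -13 v1.sorted v2.sorted > only-vol2.txt # lines unique to volume 2
```

comm -1, -2, and -3 suppress the lines unique to file 1, unique to file 2, and common to both, respectively, so combining the flags selects exactly one of the three sets.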

