I need a command or script to
list all files recursively, without directories,
one line per file, with no extra lines like ls -AR1 produces.
It should print the file size and name,
e.g.:
12 file.ext
25684 file2.ext
589 file3.ext
...
no, it is not homework
I have two volumes with mostly (but not exactly) the same files, but completely different directory structures. I need to know which files exist in only one volume, and which are dups.
How can you identify a file? Are names unique within each volume (file system?)? As in, are there any /foo/bar and /goo/bar files? If so, then what further characteristics, beyond the name, will be enough to uniquely identify a file -- size in bytes, modification time, checksum ... ?
Are you only dealing with "normal" files or do you have multiply-linked files, symlinks, device files, fifos ... ?
hi,
files should be identified by name and size
special files are not needed; in any case the volumes do not contain any special files, just normal ones
there are about 500k files in 900 GBytes on each volume; I think about 450k files are identical
the directory structure is completely different
Ouch! That's big! Performance will matter and bash string manipulation is slow, but I can't think how to handle whitespace in file names using awk (I'm not very proficient in awk so that doesn't mean it can't be done -- it almost certainly can). How about this for starters?
Code:
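#!/bin/bash
# Print "size name" for every regular file below the current directory.
# ls -l fields: mode links owner group size date... name; the leftover date/time
# text in $name is dropped by ${name##*/}, since find's paths always start with "./".
find . -type f -exec /bin/ls -l {} \; | while read -r x x x x size x x name
do
    # read -r keeps backslashes in names; printf is safer than echo for odd names
    printf '%s %s\n' "$size" "${name##*/}"
done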
This could probably be sped up by using xargs with the find, so that ls runs once per batch of files rather than once per file. The output will need sorting ...
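For what it's worth, here is a minimal sketch of a variant that avoids both ls and the bash loop. It assumes GNU find, whose -printf action is a GNU extension and not part of POSIX; list_files is just a hypothetical helper name, not something from the script above. Spaces in file names are not a problem here, though names containing newlines would still break any one-line-per-file listing.

```shell
#!/bin/bash
# Sketch only: assumes GNU find (-printf is not available in POSIX find).
# list_files is a hypothetical helper name used for illustration.
list_files() {
    # %s = size in bytes, %f = file name with leading directories removed
    find "${1:-.}" -type f -printf '%s %f\n'
}
```

Usage would be something like `list_files /mnt/volume1 > volume1.txt`, where the mount point and output file are made-up examples.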
Thank you very much!
The solution is exactly what I need.
Runtime is not a problem; I have given it one core and it will run in the background.
Sorting is not necessary: the output will be imported into mysql, and then some simple queries should show the dups and diffs.
thx again!
PS:
runtime was about 45 min.
result is about 8 MB
fine
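For anyone who would rather skip the database step, the dup/diff check can probably also be done with sort and comm. This is only a sketch under assumptions: compare_listings and the three output file names are hypothetical, and each input is assumed to be a "size name" listing, one file per line, as produced earlier in the thread. Files with the same size and name are treated as identical, which is the criterion stated above; contents are not compared.

```shell
#!/bin/bash
# Sketch only: compare two "size name" listings without a database.
# compare_listings and the output file names are hypothetical.
compare_listings() {
    local a=$1 b=$2 out=${3:-.}
    local sa sb
    sa=$(mktemp)
    sb=$(mktemp)
    # comm needs sorted input; sort -u also collapses repeats within one volume
    LC_ALL=C sort -u "$a" > "$sa"
    LC_ALL=C sort -u "$b" > "$sb"
    LC_ALL=C comm -23 "$sa" "$sb" > "$out/only_in_first.txt"   # lines unique to the first listing
    LC_ALL=C comm -13 "$sa" "$sb" > "$out/only_in_second.txt"  # lines unique to the second listing
    LC_ALL=C comm -12 "$sa" "$sb" > "$out/in_both.txt"         # lines present in both
    rm -f "$sa" "$sb"
}
```

Called as `compare_listings volume1.txt volume2.txt results_dir`, where all three names are placeholders and results_dir must already exist.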