Programming
This forum is for all programming questions.
The question does not have to be directly related to Linux and any language is fair game.
Gidday, sorry for my ignorance, but I'm a truly ordinary shell scripter (and unfortunately I know nothing about any higher-level language), and I have an important script to run to count the number of files in our client directories.
The problem is that while we haven't got a lot of clients (circa 100), each might have a LOT (> 1000s) of files.
And my script is REALLY slow:
Code:
# Set my main variables
MYDATE=$(date '+%Y%m%d')
MYTMP01=/tmp/comms01.txt
MYTMP02=/tmp/comms02.txt
MYLOGDIR=/Data/Software/LOGS/Daisy/
MYLOGFILE=Comms_Assets_"$MYDATE".csv
function clientsearch ()
{
    find /Data/WIP/Comms/*/ -maxdepth 1 -type d -iname "_Assets"
}
function assetssearch ()
{
    find . -type f -iname "*" -exec ls -1 {} \;
}
Are you saying (and yes, I admit to being a bit dense and a slow learner with this stuff) that my script can be cut down to this:
Code:
for dir in /Data/WIP/Comms/*/_Assets/ ; do
    tally="$(find "$dir" -name "*" -type f -exec printf "x" ";")"
    echo "$PWD,${#tally}"
done
I have two problems with that (and I fully admit that I've probably read/interpreted you wrong) ... firstly, all those types I was originally matching with grep were exceptions - I didn't want to count any file, or any file in a directory, that started with .hsresource (a directory), .ds_store, .hsancillary, .hsicon, .hsxmap or thumbs.db (all files, I think).
And secondly, the $PWD is doing nothing; I was originally using that so I knew which client the file/asset count belonged to. Ignore that: I replaced $PWD with $dir and that fixed it! Just the above still to fix ...
But, boy, if what I've interpreted is right, it's SO much faster than my stuff! So if you have any tips to incorporate the above points, that would be greatly appreciated.
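One possible sketch of how those exclusions could be folded into the loop (untested against your tree, so treat it as a starting point): `-prune` skips the .hsresource directories entirely, and `! -iname` drops each unwanted file case-insensitively. The `/Data/WIP/Comms/*/_Assets/` path is the one from your posts; adjust to your layout.

```shell
#!/bin/bash
# Sketch: count files per client _Assets directory, skipping
# .hsresource directories and the listed junk files, case-insensitively.
for dir in /Data/WIP/Comms/*/_Assets/ ; do
    tally="$(find "$dir" \
        -type d -iname '.hsresource' -prune -o \
        -type f \
        ! -iname '.ds_store' ! -iname '.hsancillary' \
        ! -iname '.hsicon'   ! -iname '.hsxmap' \
        ! -iname 'thumbs.db' \
        -exec printf 'x' \; )"
    echo "$dir,${#tally}"
done
```

The `-prune -o` pattern stops find from descending into the excluded directories at all, so their contents never reach the counting branch.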
Quote:
... firstly, all those types I was originally matching with grep were exceptions - I didn't want to count any file, or any file in a directory, that started with .hsresource (a directory), .ds_store, .hsancillary, .hsicon, .hsxmap or thumbs.db (all files, I think)
Oops. You're right. -v on grep removes the lines matching its arguments.
Of course we are both assuming that there are no mixed-case filenames. The original script used the -i switch on grep to match case-insensitively.
Yes, and case insensitivity is important in our infrastructure. And I think you both missed the -v switch with grep too; doesn't that mean to ignore lines with those directories/files in them?
Also, there are many directories under the <client>/_Assets directory - but the updated fix handled that fine, I don't think ls (on its own) would.
If you must use find, then use xargs instead of the '-exec' option to find. It seems that the internal '-exec' would be faster, but my projects show that using xargs is *way* faster.
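For what it's worth, the speed difference usually comes down to how many processes get forked, and there is a third option worth knowing. A rough sketch (using `ls` as a stand-in for whatever per-file command you need):

```shell
#!/bin/bash
# Three ways to run a command over find's results, roughly slowest to fastest.

# 1. -exec with \; forks one process per file:
find . -type f -exec ls {} \;

# 2. xargs batches many filenames into each invocation
#    (-print0/-0 keeps filenames with spaces safe):
find . -type f -print0 | xargs -0 ls

# 3. -exec with + batches like xargs, with no pipe needed:
find . -type f -exec ls {} +
```

Variants 2 and 3 should be close in speed; both avoid the per-file fork that makes variant 1 crawl on large trees.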
I don't quite understand exactly what you want to do, or why one find command can't do it. 'find' usually takes a while to run because it has to visit many files. If you want to speed that up, I would use a command that keeps an index, something like slocate; that would probably be the easiest way, especially if these clients only add or change a few files once in a while. A single run of 'find' might be another option: parse its output for the info you need, which should be faster than running find multiple times.
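The single-run idea could look something like this sketch: one find over the whole tree, with awk tallying files per client. The `/Data/WIP/Comms` path and the field number (5, the `<client>` component of `/Data/WIP/Comms/<client>/...`) are assumptions based on the paths in this thread; adjust both to your layout.

```shell
#!/bin/bash
# Sketch: one traversal instead of one find per client.
# awk splits each path on "/" and counts files under each
# client directory (field 5 of /Data/WIP/Comms/<client>/...).
find /Data/WIP/Comms -type f |
    awk -F/ '{ count[$5]++ } END { for (c in count) print c "," count[c] }'
```

This prints one `client,count` line per client, whatever the nesting depth below each client directory.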
Thank you all, but especially TuxDev & KenJackson. This is what I've ended up with, and it seems to work REALLY well (it's super fast!)
Code:
# Set my main variables
date=$(date '+%Y%m%d')
logdir=/Data/Software/LOGS/
logfile=Company_Assets_"$date".csv

for dir in /Data/WIP/_*/*/_Assets/ /Data/WIP/_Press/_Templates\ and\ Styles/ /Data/Govt/*/ ; do
    tally="$(find "$dir" \( ! -regex '.*/\..*' \) -type f ! -iname "thumbs.db" -exec printf "x" ";")"
    echo "$dir,${#tally}" >> "$logdir$logfile"
done
I'm still not sure exactly how, or why, it works, but work it does.
My original miserable attempt took nearly two days, but this version literally ran in a fraction under two minutes!
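For anyone puzzling over the `${#tally}` part: `-exec printf 'x' ";"` emits one character per matching file, so the length of the captured string is the file count, with no `ls` or `wc` pipeline per directory. A small sketch, plus a GNU-find-only variant that also avoids forking printf once per file:

```shell
#!/bin/bash
# Counting trick: one "x" per file, string length = file count.
tally="$(find . -type f -exec printf 'x' \; )"
echo "file count: ${#tally}"

# GNU find only: -printf is built in, so no printf fork per file,
# which can be noticeably faster on very large trees.
count=$(find . -type f -printf 'x' | wc -c)
echo "file count: $count"
```

Both lines print the same number; the `-printf` form just does it with two processes total instead of one per file.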