LinuxQuestions.org
Old 03-01-2010, 07:35 PM   #1
Crowey
Member
 
Registered: Jun 2006
Location: Perth, WA
Distribution: RHEL
Posts: 36

Rep: Reputation: 16
REALLY Slow Shell Script


Gidday, and sorry for my ignorance: I'm a truly ordinary shell scripter (and unfortunately I know nothing about any higher-level language), but I have an important script to run that counts the number of files in our client directories.

The problem is that while we don't have a lot of clients (circa 100), each might have a LOT (thousands) of files.

And my script is REALLY slow:
Code:
# Set my main variables
MYDATE=$(date '+%Y%m%d')
MYTMP01=/tmp/comms01.txt
MYTMP02=/tmp/comms02.txt
MYLOGDIR=/Data/Software/LOGS/Daisy/
MYLOGFILE=Comms_Assets_"$MYDATE".csv

function clientsearch ()
{
find /Data/WIP/Comms/*/ -maxdepth 1 -type d -iname "_Assets"
}

function assetssearch ()
{
find . -type f -iname "*" -exec ls -1 {} \;
}

clientsearch | while read MYDUMMY

do
cd "$MYDUMMY"
assetssearch > $MYTMP01
grep -Evi '(.hsresource|.ds_store|.hsancillary|.hsicon|.hsxmap|thumbs.db)' $MYTMP01 > $MYTMP02
MYCOUNT=`\cat $MYTMP02 | wc -l`
echo $PWD,$MYCOUNT >>$MYLOGDIR$MYLOGFILE
done
Any advice as to how I might improve the efficiency of this script would be greatly appreciated.

PS How slow? Well, I started a cron job at 0005 yesterday, and as of 0930 today it's only a little over halfway through all our clients.

Cheers
Crowey
 
Old 03-01-2010, 08:28 PM   #2
tuxdev
Senior Member
 
Registered: Jul 2005
Distribution: Slackware
Posts: 2,012

Rep: Reputation: 111
Code:
MYDATE=$(date '+%Y%m%d')
MYTMP01=/tmp/comms01.txt
MYTMP02=/tmp/comms02.txt
MYLOGDIR=/Data/Software/LOGS/Daisy/
MYLOGFILE=Comms_Assets_"$MYDATE".csv
I generally use lowercase var names and never, ever prefix "MY" all over the place. Of *course* it's yours.

Code:
function clientsearch ()
{
find /Data/WIP/Comms/*/ -maxdepth 1 -type d -iname "_Assets"
}

function assetssearch ()
{
find . -type f -iname "*" -exec ls -1 {} \;
}
Just put these lines where you actually use them. The -exec in assetssearch starts a separate ls process for every file found, which is a ton of useless processes, and -iname "*" matches every name, so it has no effect.
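To make the cost concrete, here is a minimal sketch on a throwaway directory (the scratch path and file names are invented for the demo). Both commands give the same count, but the first starts one ls per file while the second lets find print the names itself.

```shell
# Demo only: scratch tree with 200 empty files (hypothetical names).
demo=$(mktemp -d)
i=1
while [ "$i" -le 200 ]; do
    : > "$demo/file$i.txt"
    i=$((i + 1))
done

# Original style: find starts a separate ls process for every file.
slow_count=$(find "$demo" -type f -exec ls -1 {} \; | wc -l)

# Same answer with find printing the names directly: no extra processes.
fast_count=$(find "$demo" -type f | wc -l)

echo "slow=$slow_count fast=$fast_count"
rm -rf "$demo"
```

On a tree with thousands of files per client, those per-file process starts are likely where most of the time goes.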

Code:
clientsearch | while read MYDUMMY

do
...
done
This does not handle newlines in filenames correctly. Anyway, considering what "clientsearch" does, it's simpler to do
Code:
for dir in /Data/WIP/Comms/*/_Assets/ ; do
   ...
done
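A small sketch of that loop on a scratch tree (directory names invented for the demo), showing that a client name containing a space comes through intact. The [ -d ] guard is an extra, not part of the suggestion above: when nothing matches, the shell leaves the pattern as a literal string, so the guard skips it.

```shell
# Demo only: two clients with an _Assets directory, one without.
demo=$(mktemp -d)
mkdir -p "$demo/Comms/Acme Pty/_Assets" "$demo/Comms/beta/_Assets" "$demo/Comms/empty"

found=0
for dir in "$demo"/Comms/*/_Assets/ ; do
    [ -d "$dir" ] || continue   # unmatched pattern stays literal; skip it
    found=$((found + 1))
    echo "$dir"
done

echo "matched $found directories"
rm -rf "$demo"
```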
Code:
cd "$MYDUMMY"
assetssearch > $MYTMP01
grep -Evi '(.hsresource|.ds_store|.hsancillary|.hsicon|.hsxmap|thumbs.db)' $MYTMP01 > $MYTMP02
MYCOUNT=`\cat $MYTMP02 | wc -l`
echo $PWD,$MYCOUNT >>$MYLOGDIR$MYLOGFILE
You're doing a lot of acrobatics with temp files that aren't even necessary; find handles most of it:
Code:
tally="$(find "$dir" -name ".hsresource" -o -name ".ds_store" -o -name ".hsancillary" -o -name ".hsicon" -o -name ".hsxmap" -o -name "thumbs.db" -type f -exec printf "x" ";")"
echo "$PWD,${#tally}" >> "$log"
http://mywiki.wooledge.org/BashGuide

Last edited by tuxdev; 03-01-2010 at 08:31 PM.
 
Old 03-01-2010, 08:57 PM   #3
Crowey
Member
 
Registered: Jun 2006
Location: Perth, WA
Distribution: RHEL
Posts: 36

Original Poster
Rep: Reputation: 16
Quote:
Originally Posted by tuxdev
Code:
cd "$MYDUMMY"
assetssearch > $MYTMP01
grep -Evi '(.hsresource|.ds_store|.hsancillary|.hsicon|.hsxmap|thumbs.db)' $MYTMP01 > $MYTMP02
MYCOUNT=`\cat $MYTMP02 | wc -l`
echo $PWD,$MYCOUNT >>$MYLOGDIR$MYLOGFILE
You're doing a lot of acrobatics with temp files that aren't even necessary; find handles most of it:
Code:
tally="$(find "$dir" -name ".hsresource" -o -name ".ds_store" -o -name ".hsancillary" -o -name ".hsicon" -o -name ".hsxmap" -o -name "thumbs.db" -type f -exec printf "x" ";")"
echo "$PWD,${#tally}" >> "$log"

http://mywiki.wooledge.org/BashGuide
Mate, awesome stuff, thank you.

Are you saying (and yes I admit to being a bit dense and a slow-learner with this stuff), that my script can be cut down to this:
Code:
for dir in /Data/WIP/Comms/*/_Assets/ ; do
    tally="$(find "$dir" -name "*" -type f -exec printf "x" ";")"
    echo "$PWD,${#tally}"
done
I have two problems with that (and I fully admit that I've probably read/interpreted you wrong) ... Firstly, all those types I was originally passing to grep were exclusions: I didn't want to count any file, or any file in a directory, whose name started with .hsresource (a directory), .ds_store, .hsancillary, .hsicon, .hsxmap or thumbs.db (all files, I think).

And secondly, the $PWD is doing nothing, and I was originally using that so I knew which client had which file/asset count. Ignore that one: I replaced $PWD with $dir and that fixed it! Just the first point left to fix ...

But, boy, if what I've interpreted is right, it's SO much faster than my stuff! So if you've any tips for incorporating the above points, that would be greatly appreciated.

Last edited by Crowey; 03-01-2010 at 09:00 PM.
 
Old 03-01-2010, 09:01 PM   #4
KenJackson
Member
 
Registered: Jul 2006
Location: Maryland, USA
Distribution: Fedora, Arch
Posts: 572

Rep: Reputation: 64
Quote:
Originally Posted by tuxdev
Code:
tally="$(find "$dir" -name ".hsresource" -o -name ".ds_store" -o -name ".hsancillary" -o -name ".hsicon" -o -name ".hsxmap" -o -name "thumbs.db" -type f -exec printf "x" ";")"
echo "$PWD,${#tally}" >> "$log"
Ah! That's clever, gathering up 'x's and counting them with ${#..}. But you're still using find.

I was wondering about this.
Code:
COUNT="$(ls *.hsresource *.ds_store *.hsancillary *.hsicon *.hsxmap thumbs.db|wc -l)"
Of course we are both assuming that there are no mixed-case filenames. The original script used the -i switch on grep to match case-insensitively.
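On the x-counting trick: one reason it is more robust than piping names to wc -l is that a file name may itself contain a newline. A minimal sketch on a scratch directory (the names are invented for the demo):

```shell
# Demo only: two files, one of which has a newline in its name.
demo=$(mktemp -d)
: > "$demo/plain.txt"
: > "$demo/$(printf 'two\nlines.txt')"

# Counting output lines sees the embedded newline as an extra file.
by_lines=$(find "$demo" -type f | wc -l)

# One "x" per file, then take the string length: immune to odd names.
tally=$(find "$demo" -type f -exec printf x \;)
by_marks=${#tally}

echo "wc -l says $by_lines, x-count says $by_marks"
rm -rf "$demo"
```

The x-count still starts one printf per file, so it trades a little speed for correctness.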
 
Old 03-01-2010, 09:07 PM   #5
KenJackson
Member
 
Registered: Jul 2006
Location: Maryland, USA
Distribution: Fedora, Arch
Posts: 572

Rep: Reputation: 64
Quote:
Originally Posted by Crowey
... firstly, all those types I was originally passing to grep were exclusions: I didn't want to count any file, or any file in a directory, whose name started with .hsresource (a directory), .ds_store, .hsancillary, .hsicon, .hsxmap or thumbs.db (all files, I think)
Oops. You're right. -v on grep drops the lines that match its patterns.

So the little chunk I just did would be:
Code:
COUNT="$(ls |grep -Evi '.hsresource|.ds_store|.hsancillary|.hsicon|.hsxmap|thumbs.db'|wc -l)"
 
Old 03-01-2010, 09:09 PM   #6
Crowey
Member
 
Registered: Jun 2006
Location: Perth, WA
Distribution: RHEL
Posts: 36

Original Poster
Rep: Reputation: 16
Quote:
Originally Posted by KenJackson
Of course we are both assuming that there are no mixed-case filenames. The original script used the -i switch on grep to match case-insensitively.
Yes, and case insensitivity is important in our infrastructure. And I think you both missed the -v switch on grep too: doesn't that mean to ignore lines with those directories/files in them?

Also, there are many directories under the <client>/_Assets directory. The updated fix handled that fine, but I don't think ls (on its own) would.

But I'm grateful to you both for contributing!

Cheers
Crowey
 
Old 03-01-2010, 10:32 PM   #7
tuxdev
Senior Member
 
Registered: Jul 2005
Distribution: Slackware
Posts: 2,012

Rep: Reputation: 111
ah, then try this instead:
Code:
tally="$(find "$dir" ! \( -name "*.hsresource" -o -name "*.ds_store" -o -name "*.hsancillary" -o -name "*.hsicon" -o -name "*.hsxmap" -o -name "thumbs.db" \) -type f -exec printf "x" \;)"
You can use "shopt -s nocaseglob" for case-insensitive glob patterns (like the one in the for loop). It only affects the shell's own globbing; for find, use -iname in place of -name.

http://mywiki.wooledge.org/UsingFind
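Two things to watch with the command above: -name is case-sensitive, and .hsresource was described as a directory, so files inside it would still be counted. A hedged sketch that uses -iname and -prune instead (both available in GNU and BSD find). It assumes the unwanted names are exactly .hsresource, .ds_store and so on; adjust the patterns if they are really suffixes. The scratch tree is invented for the demo.

```shell
# Demo only: one real asset plus the kinds of entries that should be skipped.
demo=$(mktemp -d)
mkdir -p "$demo/_Assets/.HSResource" "$demo/_Assets/art"
: > "$demo/_Assets/art/logo.eps"
: > "$demo/_Assets/art/Thumbs.db"
: > "$demo/_Assets/.DS_Store"
: > "$demo/_Assets/.HSResource/cache1"

dir="$demo/_Assets"
# First branch: do not descend into any .hsresource directory (any case).
# Second branch: count regular files whose names are not on the skip list.
tally=$(find "$dir" \
    \( -type d -iname '.hsresource' -prune \) -o \
    \( -type f ! -iname '.ds_store' ! -iname '.hsancillary' \
       ! -iname '.hsicon' ! -iname '.hsxmap' ! -iname 'thumbs.db' \
       -exec printf x \; \))

echo "$dir,${#tally}"
rm -rf "$demo"
```

Pruning also saves find from walking the skipped directories at all, which helps if they are large.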

Last edited by tuxdev; 03-01-2010 at 10:34 PM.
 
Old 03-02-2010, 02:44 AM   #8
gnashley
Amigo developer
 
Registered: Dec 2003
Location: Germany
Distribution: Slackware
Posts: 4,758

Rep: Reputation: 469
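If you must use find, then use xargs instead of the '-exec' option to find. You would think the built-in '-exec' would be faster, but my projects show that using xargs is *way* faster, since xargs passes many file names to each command it starts rather than starting one process per file.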
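A rough sketch of the difference, assuming GNU or BSD find and xargs (for -print0 and -0). It counts how many times the command gets started in each style; the scratch directory and file names are invented for the demo.

```shell
# Demo only: scratch directory with 300 empty files.
demo=$(mktemp -d)
i=1
while [ "$i" -le 300 ]; do
    : > "$demo/f$i"
    i=$((i + 1))
done

# -exec ... \; starts the command once per file: one "run" line each.
exec_runs=$(find "$demo" -type f -exec sh -c 'echo run' \; | wc -l)

# xargs hands many names to each invocation, so the command starts far less often.
# The trailing "sh" fills $0; the file names arrive as "$@" and are ignored here.
xargs_runs=$(find "$demo" -type f -print0 | xargs -0 sh -c 'echo run' sh | wc -l)

echo "exec runs: $exec_runs, xargs runs: $xargs_runs"
rm -rf "$demo"
```

find can also batch on its own: ending the -exec clause with {} + instead of \; passes many names per invocation, much as xargs does.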
 
Old 03-02-2010, 02:59 AM   #9
H_TeXMeX_H
Guru
 
Registered: Oct 2005
Location: $RANDOM
Distribution: slackware64
Posts: 12,928
Blog Entries: 2

Rep: Reputation: 1269
I don't quite understand exactly what you want to do, or why one find command can't do it. But 'find' usually takes a while to run because it has to walk many files. If you want to speed that up, I would use a command that keeps an index, something like slocate. That would probably be the easiest way of speeding it up, especially if these clients only add or change a few files once in a while. Another option is a single run of 'find' whose output you then parse for the info you need; that should be faster than running find multiple times.
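The single-run idea could look something like the sketch below: one find over every client, with awk tallying by the client directory name. The layout (<client>/_Assets/...) follows the thread, but the scratch tree and client names are invented, and the thumbs.db exclusion is just one example filter.

```shell
# Demo only: two clients with a few files each.
demo=$(mktemp -d)
mkdir -p "$demo/Comms/alpha/_Assets/sub" "$demo/Comms/beta/_Assets"
: > "$demo/Comms/alpha/_Assets/a1"
: > "$demo/Comms/alpha/_Assets/sub/a2"
: > "$demo/Comms/beta/_Assets/b1"
: > "$demo/Comms/beta/_Assets/Thumbs.db"

# Paths look like ./alpha/_Assets/a1, so field 2 (split on "/") is the client.
report=$(cd "$demo/Comms" &&
    find . -path './*/_Assets/*' -type f ! -iname 'thumbs.db' |
    awk -F/ '{ n[$2]++ } END { for (c in n) print c "," n[c] }' |
    sort)

echo "$report"
rm -rf "$demo"
```

Two caveats with this approach: a client with no countable files produces no line at all, and a file name containing a newline would be miscounted because the parsing is line-based.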
 
Old 03-02-2010, 07:02 PM   #10
Crowey
Member
 
Registered: Jun 2006
Location: Perth, WA
Distribution: RHEL
Posts: 36

Original Poster
Rep: Reputation: 16
Thank you all, but especially tuxdev & KenJackson. This is what I've ended up with, and it seems to work REALLY well (it's super fast!)

Code:
# Set my main variables
date=$(date '+%Y%m%d')
logdir=/Data/Software/LOGS/
logfile=Company_Assets_"$date".csv

for dir in /Data/WIP/_*/*/_Assets/ /Data/WIP/_Press/_Templates\ and\ Styles/ /Data/Govt/*/ ; do
    # Skip any path with a dot-prefixed component, and thumbs.db in any case;
    # print one "x" per remaining regular file, then count the x's.
    tally="$(find "$dir" \( ! -regex '.*/\..*' \) -type f ! -iname "thumbs.db" -exec printf "x" ";")"
    echo "$dir,${#tally}" >> "$logdir$logfile"
done
I'm still not sure exactly how, or why, it works - but work it does.
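For readers wondering the same thing, the key is that -regex is matched against the whole path, so '.*/\..*' is true for any path that has a component starting with a dot, and the leading ! drops those. A small sketch on an invented scratch tree, assuming GNU or BSD find (-regex is not in POSIX); it runs from inside the scratch directory so the temp path itself cannot affect the match.

```shell
# Demo only: one visible asset, plus hidden and thumbs.db entries.
demo=$(mktemp -d)
mkdir -p "$demo/_Assets/.hsresource" "$demo/_Assets/pics"
: > "$demo/_Assets/pics/a.jpg"
: > "$demo/_Assets/pics/.hsicon"
: > "$demo/_Assets/pics/THUMBS.DB"
: > "$demo/_Assets/.hsresource/blob"

# Same find expression as the script above, run on the scratch tree.
tally=$(cd "$demo" &&
    find _Assets/ \( ! -regex '.*/\..*' \) -type f ! -iname "thumbs.db" -exec printf "x" ";")

echo "_Assets/,${#tally}"
rm -rf "$demo"
```

Only a.jpg is counted. Note that this rule is broader than the original grep list: it skips every hidden file and everything under any hidden directory, not just the named ones.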

My original miserable attempt took nearly two days, but this version literally ran in a fraction under two minutes!

So, again, thank you very much for all your help.

Cheers
Crowey
 
  


All times are GMT -5.
