LinuxQuestions.org

steven.c.banks 04-18-2012 07:57 AM

Bash shell array processing slower than expected
 
I have a bash script on Solaris to identify the last login date for an application. It extracts an ID from the first array, and searches for it in the second array, using two nested while do done loops.

The script works as designed, but is much slower than I anticipated. I have commented out portions to try to identify what makes it slow but have not identified anything. I would have thought that since everything is in memory it would run more quickly than it does (about three hours where there are 917 entries in idarray and 558 entries in loginarray). The two arrays are unfortunately not in the same sequence.

Thanks in advance...

Code:

while [ $idarraycounter -lt $idarrayrows ]
do
  id=`echo ${idarray[$idarraycounter]}`
  loginarraycounter=0
  lastlogindate=""
  while [ $loginarraycounter -lt $loginarrayrows ]
  do
    loginid=`echo ${loginarray[$loginarraycounter]} | cut -f1 -d","`
    loginidlower=`echo $loginid | tr '[:upper:]' '[:lower:]'`
    loginidupper=`echo $loginid | tr '[:lower:]' '[:upper:]'`
    if [[ "$loginid" = "$id" || "$loginidlower" = "$id" || "$loginidupper" = "$id" ]]
    then
      lastlogindate=`echo ${loginarray[$loginarraycounter]} | cut -f2 -d","`
      loginarraycounter=$loginarrayrows
    else
      ((loginarraycounter++))
    fi
  done
  # Write ID,lastlogindate
  # Note - if ID is not found (i.e. no logins), $lastlogindate will still be blank
  echo "$id,$lastlogindate"
  ((idarraycounter++))
done


crabboy 04-18-2012 08:38 AM

Your script looks pretty good, but you are spawning a lot of processes while looping over these arrays. You can try to eliminate some of them to speed it up. Try a case-insensitive comparison instead of spawning 'tr' twice for the upper and lower compare. There is an example here:
http://www.linuxquestions.org/questi...sitive-676101/
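As a rough sketch of that suggestion (not taken from the linked thread), bash can do the comparison itself with the nocasematch option, so no 'tr' is spawned. The function name ids_match is made up for illustration, and this assumes bash 3.1 or later, which may not be what ships on an older Solaris box. Note that it matches any mix of case, which is slightly broader than the three comparisons in the original script.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeeds when the two strings are equal ignoring case.
# Assumes the nocasematch shell option exists (bash 3.1 or later).
ids_match() {
  local result
  shopt -s nocasematch      # make [[ == ]] ignore case
  [[ "$1" == "$2" ]]
  result=$?
  shopt -u nocasematch      # switch it off again (even if it was on before the call)
  return "$result"
}

# Made-up sample values.
if ids_match "JSmith" "jsmith"; then
  echo "match"
fi
```

Inside the inner loop this would replace the two tr calls and the three-way test with a single built-in comparison.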

You may also want to try splitting the loginarray into two separate arrays, one with field 1 and the second with field 2. This would eliminate the cut in the inner loop.
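One way the split could look, as a sketch only: do it once before the loops, using parameter expansion so that no cut is spawned at all. The array names loginids and logindates and the sample data are invented here, and the += array append assumes bash 3.1 or later.

```shell
#!/usr/bin/env bash
# Made-up sample data standing in for the real loginarray ("id,date" per entry).
loginarray=("JSMITH,2012-04-01" "bjones,2012-03-15")

loginids=()     # hypothetical name: field 1 of every entry
logindates=()   # hypothetical name: field 2 of every entry
for entry in "${loginarray[@]}"; do
  loginids+=("${entry%%,*}")     # text before the first comma
  rest=${entry#*,}               # text after the first comma
  logindates+=("${rest%%,*}")    # keep only field 2, like cut -f2
done

echo "${loginids[1]} last logged in on ${logindates[1]}"
```

The inner loop can then compare against ${loginids[$loginarraycounter]} and read the date from ${logindates[$loginarraycounter]} at the same index.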

The time still seems quite long for what you are doing. I generally move to a better language when there is a need for arrays in a bash script.

pan64 04-18-2012 08:39 AM

You invoke at least 8 additional processes inside the double loop, which is not really efficient.
You can try to do the same with a single awk or perl script, and you will see the difference.
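A minimal sketch of the single-awk approach, under some assumptions: the IDs are in a file with one ID per line, the logins are in a CSV of "id,date" lines, and the function name lookup_last_logins and the demo file names are made up. On Solaris the tolower function may need nawk or /usr/xpg4/bin/awk rather than the default awk.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print "id,lastlogindate" for every ID in the IDs file.
# Usage: lookup_last_logins IDS_FILE LOGINS_FILE
lookup_last_logins() {
  local ids_file=$1 logins_file=$2
  awk -F',' '
    # First file (logins): remember the first date seen for each lower-cased ID.
    NR == FNR { key = tolower($1); if (!(key in seen)) seen[key] = $2; next }
    # Second file (IDs): print the ID and its date; the date is empty when not found.
    { print $1 "," seen[tolower($1)] }
  ' "$logins_file" "$ids_file"
}

# Demo with made-up data in a temporary directory.
demo_dir=$(mktemp -d)
printf '%s\n' 'JSMITH,2012-04-01' 'bjones,2012-03-15' > "$demo_dir/logins.csv"
printf '%s\n' 'jsmith' 'nobody' 'BJONES' > "$demo_dir/ids.txt"
lookup_last_logins "$demo_dir/ids.txt" "$demo_dir/logins.csv"
rm -rf "$demo_dir"
```

This reads each file once and starts one process in total, instead of several per comparison.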

H_TeXMeX_H 04-18-2012 09:44 AM

I say don't use arrays; bash is not good at that. Use grep or sed or awk on files. Do you have input files or just arrays?

wpeckham 04-18-2012 12:15 PM

My go-to tool for something like this would be Perl. It is made for things like this, and would eliminate all external calls.

There are some BASH tricks that would reduce external calls and speed your processing, but I really think porting to perl would result in an order of magnitude improvement.
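One such trick, sketched here as an assumption rather than anything posted in this thread, is to load the logins into an associative array once and then look each ID up directly, so there is no inner loop and no external call. This needs bash 4 or later (declare -A and the ${var,,} lower-casing expansion), which may not be available on the Solaris machine in question. The names lastlogin and print_report and the sample data are invented.

```shell
#!/usr/bin/env bash
# Requires bash 4 or later. Sample data below is made up.
idarray=("jsmith" "nobody" "BJONES")
loginarray=("JSMITH,2012-04-01" "bjones,2012-03-15")

declare -A lastlogin            # hypothetical map: lower-cased ID -> date
for entry in "${loginarray[@]}"; do
  key=${entry%%,*}              # field 1
  key=${key,,}                  # lower-case it so lookups ignore case
  rest=${entry#*,}
  # Keep the first date seen for an ID, as the original loop stops at its first match.
  if [[ -z ${lastlogin[$key]+set} ]]; then
    lastlogin[$key]=${rest%%,*} # field 2
  fi
done

# Hypothetical helper: write "ID,lastlogindate" for every ID (blank date if none).
print_report() {
  local id key
  for id in "${idarray[@]}"; do
    key=${id,,}
    echo "$id,${lastlogin[$key]}"
  done
}

print_report
```

Each ID then costs one lookup instead of a scan over all 558 login entries.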

steven.c.banks 04-19-2012 01:51 PM

Thanks for your replies. After studying awk for a while, I decided to replace the entire inner loop with:

lastlogindate=`cat /u01/export/reports/login-ids-and-dates-master.csv | grep -i "^$id," | cut -f2 -d","`

where /u01/export/reports/login-ids-and-dates-master.csv is the file that I used to load the inner array. I thought catting the file 900 times would be slow, but the script now runs in less than one minute.

Thanks again...

pan64 04-20-2012 01:10 AM

You do not even need cat:
lastlogindate=`grep -i "^$id," /u01/export/reports/login-ids-and-dates-master.csv | cut -f2 -d","`
will also work.


All times are GMT -5.