LinuxQuestions.org - [SOLVED] Bash shell array processing slower than expected

I have a bash script on Solaris that identifies the last login date for each ID in an application. It extracts an ID from the first array and searches for it in the second array, using two nested while ... do ... done loops.

The script works as designed, but is much slower than I anticipated. I have commented out portions to try to identify what makes it slow but have not identified anything. I would have thought that since everything is in memory it would run more quickly than it does (about three hours where there are 917 entries in idarray and 558 entries in loginarray). The two arrays are unfortunately not in the same sequence.

Thanks in advance...

Code:

while [ $idarraycounter -lt $idarrayrows ]
do
  id=`echo ${idarray[$idarraycounter]}`
  loginarraycounter=0
  lastlogindate=""
  while [ $loginarraycounter -lt $loginarrayrows ]
  do
    loginid=`echo ${loginarray[$loginarraycounter]} | cut -f1 -d","`
    loginidlower=`echo $loginid | tr '[:upper:]' '[:lower:]'`
    loginidupper=`echo $loginid | tr '[:lower:]' '[:upper:]'`
    if [[ "$loginid" = "$id" || "$loginidlower" = "$id" || "$loginidupper" = "$id" ]]
    then
      lastlogindate=`echo ${loginarray[$loginarraycounter]} | cut -f2 -d","`
      loginarraycounter=$loginarrayrows
    else
      ((loginarraycounter++))
    fi
  done
  # Write ID,lastlogindate
  # Note - if ID is not found (i.e. no logins), $lastlogindate will still be blank
  echo "$id,$lastlogindate"
  ((idarraycounter++))
done

Your script looks pretty good, but you are spawning a lot of processes while looping over these arrays. You can try to eliminate some of them to speed it up. Try a case-insensitive compare instead of spawning the two tr processes for the upper- and lowercase compares. There is an example here:
http://www.linuxquestions.org/questi...sitive-676101/
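
For readers who cannot follow the link, here is a minimal sketch of one way to do a case-insensitive compare with shell builtins only. It assumes bash 3.1 or later (for the nocasematch option); ids_match is a made-up helper name, not something from the original script. Note that it matches any mix of case, which is slightly broader than the three-way test in the posted code.

```shell
#!/usr/bin/env bash
# Assumption: bash 3.1+ so that the nocasematch option exists.
shopt -s nocasematch   # from here on, [[ a == b ]] ignores case

# Hypothetical helper: succeeds when the two IDs are equal, ignoring case.
# No tr process is forked; the comparison happens inside the shell.
ids_match() {
  # Quoting the right-hand side forces a literal (non-glob) comparison.
  [[ "$1" == "$2" ]]
}

if ids_match "JSmith" "jsmith"; then
  echo "match"
fi
```

Because the option is global, any other [[ ]] or case statement in the script becomes case-insensitive too; run shopt -u nocasematch afterwards if that matters.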

You may also want to split loginarray into two separate arrays, one holding field 1 and the other holding field 2; this would eliminate the cut in the inner loop.
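
A rough sketch of that split, done once before the nested loops start. The sample rows and the names loginids and logindates are invented for illustration; the only thing taken from the thread is the "ID,date" layout of each loginarray entry.

```shell
#!/usr/bin/env bash
# Invented sample rows in the "ID,date" layout described in the thread.
loginarray=("JSmith,2012-01-15" "adoe,2011-11-02")

loginids=()     # field 1 of every row
logindates=()   # field 2 of every row
for entry in "${loginarray[@]}"; do
  # read splits on the comma inside the shell itself; the trailing _
  # absorbs any extra fields, so only fields 1 and 2 are kept.
  IFS=, read -r loginid logindate _ <<< "$entry"
  loginids+=("$loginid")
  logindates+=("$logindate")
done

echo "${loginids[0]} last logged in on ${logindates[0]}"
```

The inner loop can then compare against ${loginids[$loginarraycounter]} directly and read ${logindates[$loginarraycounter]} on a match, with no echo or cut per iteration.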

The time still seems quite long for what you are doing. I generally move to a better language when there is a need for arrays in a bash script.

You invoke at least 8 additional processes inside the double loop, which is not really efficient.
You can try to do the same with a single awk or perl script, and you will see the difference.
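
As an illustration of the single-awk idea, here is a sketch that assumes the IDs are available one per line in a file as well as the login CSV. The function name and the temporary sample files are made up. On Solaris you may need nawk or /usr/xpg4/bin/awk instead of the default awk for tolower to be available.

```shell
#!/usr/bin/env bash
# Hypothetical helper: prints "id,lastlogindate" for every ID in the ID file.
# $1 = file with one ID per line, $2 = CSV file of "loginid,date" rows.
report_last_logins() {
  awk -F, '
    # First file read (the login CSV): remember the first date per lowercased ID.
    NR == FNR { if (!(tolower($1) in seen)) seen[tolower($1)] = $2; next }
    # Second file read (the ID list): print the ID and its date, blank if none.
    { print $1 "," seen[tolower($1)] }
  ' "$2" "$1"
}

# Demo with invented data in temporary files.
tmpdir=$(mktemp -d)
printf '%s\n' jsmith adoe nobody > "$tmpdir/ids.txt"
printf '%s\n' 'JSmith,2012-01-15' 'ADOE,2011-11-02' > "$tmpdir/logins.csv"
report_last_logins "$tmpdir/ids.txt" "$tmpdir/logins.csv"
rm -rf "$tmpdir"
```

Each file is read once and the whole job runs in one process, instead of several forks per pair of entries.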

I say don't use arrays; bash is not good at that. Use grep, sed, or awk on files. Do you have input files or just arrays?

My go-to tool for something like this would be Perl. It is made for things like this, and would eliminate all external calls.

There are some Bash tricks that would reduce external calls and speed up your processing, but I really think porting to Perl would result in an order of magnitude improvement.
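
One such trick, sketched here under the assumption that bash 4.0 or later is available (the thread does not say which version the Solaris host runs): load the login data into an associative array keyed by the lowercased ID, then do one lookup per ID. The sample data and the names lastlogin and report are invented.

```shell
#!/usr/bin/env bash
# Assumption: bash 4.0+ for declare -A and the ${var,,} lowercase expansion.
loginarray=("JSmith,2012-01-15" "adoe,2011-11-02")   # invented sample rows
idarray=("jsmith" "ADOE" "nobody")                   # invented sample IDs

declare -A lastlogin
for entry in "${loginarray[@]}"; do
  IFS=, read -r loginid logindate _ <<< "$entry"
  key=${loginid,,}                      # lowercase without calling tr
  # Keep the first row seen for each ID, as the original inner loop does.
  [[ -n ${lastlogin[$key]+set} ]] || lastlogin[$key]=$logindate
done

report=()
for id in "${idarray[@]}"; do
  # One hash lookup per ID replaces the whole inner loop.
  report+=("$id,${lastlogin[${id,,}]}")
done
printf '%s\n' "${report[@]}"
```

This turns roughly 917 x 558 comparisons into 558 inserts plus 917 lookups, and nothing is forked along the way.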

Thanks for your replies. After studying awk for a while, I decided to replace the entire inner loop with:

lastlogindate=`cat /u01/export/reports/login-ids-and-dates-master.csv | grep -i "^$id," | cut -f2 -d","`

where /u01/export/reports/login-ids-and-dates-master.csv is the file that I used to load the inner array. I thought catting the file 900 times would be slow, but the script now runs in less than one minute.

Thanks again...

You do not even need cat:
lastlogindate=`grep -i "^$id," /u01/export/reports/login-ids-and-dates-master.csv | cut -f2 -d","`
will also work.
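
If you want to go one step further, the grep and cut pair can be folded into a single awk call. This is only a sketch: lookup_last_login is a made-up name, and the sample file is invented. Unlike the grep pattern, it compares the ID literally (so a dot in an ID is not treated as a wildcard) and stops at the first matching row, which is what the original inner loop did.

```shell
#!/usr/bin/env bash
# Hypothetical helper: prints the date from the first row whose ID matches.
# $1 = ID to look up, $2 = CSV file of "loginid,date" rows.
# Caveat: awk -v interprets backslash escapes in the value passed as id.
lookup_last_login() {
  awk -F, -v id="$1" '
    # Literal, case-insensitive comparison on field 1; quit after the first hit.
    tolower($1) == tolower(id) { print $2; exit }
  ' "$2"
}

# Demo with invented data in a temporary file.
csv=$(mktemp)
printf '%s\n' 'JSmith,2012-01-15' 'adoe,2011-11-02' > "$csv"
lastlogindate=$(lookup_last_login "jsmith" "$csv")
echo "jsmith,$lastlogindate"
rm -f "$csv"
```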