LinuxQuestions.org


rusty_acorn 10-05-2011 01:05 PM

awk / gawk / mawk optimisation
 
Hello World,

Been using this website on and off for a few years now but never really felt the need to ask my own question, until now. I really need some help with this little conundrum, and I know somebody out there has an answer. Please help; I would be very grateful and mightily impressed with a solution.

Here's the problem; it's on an awk optimisation theme.

set e=2.7182818284590452

set n=1
set nmax=`mawk 'END {print NR}' TWT.tmp`

while ($n <= $nmax)

if ($n == 1) then

mawk '{if (NR == '"$n"') {print $0, ($3/$4) * ('"$e"'^($4*((0)/2))-1) }}' TWT.tmp >! TWT.out2

else

mawk '{if (NR == '"$n"') {print $0, (($3/$4)+'"$ans"')*('"$e"'^($4*((0.004)/2))-1)+'"$ans"'}}' TWT.tmp >>! TWT.out2

endif

set ans=`mawk '{if (NR == '"$n"') {print $5}}' TWT.out2`

@ n ++
end


Basically, as you can see from the script above, the input is four columns of ASCII. We do some basic arithmetic on each row and get an answer, which is output to the fifth column, $5 (columns $1 and $2 are not required; only $3 and $4 are used here).

To calculate the next row, the answer from the previous row is required. The problem is I can't figure out how to get this into a handy and, most importantly, quick routine.

As you can see, at the moment mawk is inside a while loop (csh script) which cycles through each row and stores the previous answer as the variable $ans. This makes the whole thing far too slow to be practically useful (22 days runtime at this rate, with lots of input). What I need is to get rid of the shell script and just have a handy and efficient mawk line or two which can store the previous row's output as a variable for calculating the next row, thereby removing the need for the while loop and hopefully making it lightning quick.

Some experimental input (TWT.tmp) is given below:

0.000 0.004 1468.5207490000 0.0000100000
0.004 0.004 1468.5207490000 0.0000100000
0.008 0.004 1468.5207490000 0.0000100000
0.012 0.004 1468.5207490000 0.0000100000
0.016 0.004 1468.5207490000 0.0000100000
0.020 0.004 1468.5207490000 0.0000100000
0.024 0.004 1468.5207490000 0.0000100000
0.028 0.004 1468.5207490000 0.0000100000
0.032 0.004 1468.5207490000 0.0000100000
0.036 0.004 1468.5207490000 0.0000100000
0.040 0.004 1468.5207490000 0.0000100000
0.044 0.004 1468.5207490000 0.0000100000
0.048 0.004 1468.5207490000 0.0000100000
0.052 0.004 1468.5207490000 0.0000100000

The output (TWT.out2) should look like this:

0.000 0.004 1468.5207490000 0.0000100000 0
0.004 0.004 1468.5207490000 0.0000100000 2.93704
0.008 0.004 1468.5207490000 0.0000100000 5.87408
0.012 0.004 1468.5207490000 0.0000100000 8.81112
0.016 0.004 1468.5207490000 0.0000100000 11.7482
0.020 0.004 1468.5207490000 0.0000100000 14.6852
0.024 0.004 1468.5207490000 0.0000100000 17.6222
0.028 0.004 1468.5207490000 0.0000100000 20.5592
0.032 0.004 1468.5207490000 0.0000100000 23.4962
0.036 0.004 1468.5207490000 0.0000100000 26.4332
0.040 0.004 1468.5207490000 0.0000100000 29.3702
0.044 0.004 1468.5207490000 0.0000100000 32.3072
0.048 0.004 1468.5207490000 0.0000100000 35.2442
0.052 0.004 1468.5207490000 0.0000100000 38.1812

Hope somebody can help. Many thanks in advance for your thoughts on this.

A.Thyssen 10-06-2011 01:58 AM

First, when posting code, use a 'code' block rather than simply coloring it. That preserves indentation and avoids other problems (like smilies).

Your inefficiency is that you are passing through the file multiple times, once for each line of the file. You are also dealing with each line in sequence.

So why not just go through the file only once, saving the intermediate value in an internal variable?


Code:

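e=2.7182818284590452    # Bourne-shell (sh/bash) syntax
awk '
  BEGIN { e = '"$e"'; ans = 0 }
  # First row: the time term is zero, so this rule always yields 0.
  NR == 1 { ans = ($3/$4) * (e^($4*(0/2))-1);
            print $0, ans
            next
          }
  # Every later row reuses ans as left by the previous row.
  {        ans = ($3/$4 + ans) * (e^($4*(0.004/2))-1) + ans;
            print $0, ans
          }
  ' TWT.tmp > TWT.out2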

The results are the same as your script's, and it only makes one pass through the file.
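
For anyone who wants to check this without the real data, here is a small self-contained sketch. It feeds the first three sample rows from the question through the same one-pass program using a here-document. The variable name `out` is purely illustrative, and it assumes a plain `awk` is on the PATH (mawk or gawk should behave the same way).

```shell
#!/bin/sh
# Sketch: run the one-pass program on the first three sample rows.
e=2.7182818284590452

# "out" is an illustrative name; it collects the rows with the extra fifth column.
out=$(awk '
  BEGIN { e = '"$e"' }
  NR == 1 { ans = ($3/$4) * (e^($4*(0/2))-1); print $0, ans; next }
  { ans = ($3/$4 + ans) * (e^($4*(0.004/2))-1) + ans; print $0, ans }
' <<'EOF'
0.000 0.004 1468.5207490000 0.0000100000
0.004 0.004 1468.5207490000 0.0000100000
0.008 0.004 1468.5207490000 0.0000100000
EOF
)

printf '%s\n' "$out"
```

The fifth column should match the first three rows of the expected TWT.out2 listing in the question (0, 2.93704, 5.87408), since awk prints numbers with six significant digits by default.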

Technically the first line will always be zero, as the zero in the
exponent gives e^0 - 1 = 0, which makes the whole expression zero.

E.g. the first rule may as well simply be...
Code:

      NR == 1 { print $0, 0; next; }

Similarly, you don't actually need to declare "ans = 0", as that is the default for any variable that has not been defined yet.
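
Both simplifications can be tried out in a short sketch. The first command shows that an awk variable which was never assigned behaves as 0 in arithmetic. The second is a variation, not part of the answer above: it assumes the standard awk built-in `exp()` is an acceptable stand-in for the shell-supplied `e`, and it leaves `ans` undeclared. The names `unset_value` and `simplified` are made up for illustration.

```shell
#!/bin/sh
# Sketch: an awk variable that was never assigned acts as 0 in arithmetic.
unset_value=$(awk 'BEGIN { print ans + 0 }')
printf 'unassigned ans + 0 = %s\n' "$unset_value"

# Variation (an assumption, not from the answer above): use the awk built-in
# exp() so that e need not be passed in from the shell, and never declare ans.
simplified=$(printf '%s\n' \
  '0.000 0.004 1468.5207490000 0.0000100000' \
  '0.004 0.004 1468.5207490000 0.0000100000' |
  awk '
    NR == 1 { print $0, 0; next }
    { ans = ($3/$4 + ans) * (exp($4 * 0.004/2) - 1) + ans; print $0, ans }
  ')
printf '%s\n' "$simplified"
```

Using `exp()` also removes the quote juggling needed to splice `$e` into the program, at the cost of differing slightly from the original formulation.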

rusty_acorn 10-06-2011 04:47 AM

Thank you, that was just what I was looking for.

Learnt a lot from that one; very useful indeed. Many, many thanks.

The key being:

Code:

ans = ($3/$4 + ans) * (e^($4*(0.004/2))-1) + ans

All the best.

A.Thyssen 10-06-2011 07:06 PM

Good, glad to help.

Can you hit the "Rep" link on the side of my answer ;)


All times are GMT -5.