 LinuxQuestions.org [SOLVED] awk / gawk / mawk optimisation


 10-05-2011, 01:05 PM #1 rusty_acorn LQ Newbie   Registered: Oct 2011 Location: London Posts: 2

awk / gawk / mawk optimisation

Hello World,

Been using this website for a few years on and off now but never really felt the need to ask my own question - until now. Really need some help with this little conundrum - I know somebody out there has an answer. Please help; I would be very grateful and mightily impressed with a solution.

Here's the problem. It's an awk optimisation theme.

    set e=2.7182818284590452
    set n=1
    set nmax=`mawk 'END {print NR}' TWT.tmp`
    while ($n <= $nmax)
        if ($n == 1) then
            mawk '{if (NR == '"$n"') {print $0, ($3/$4) * ('"$e"'^($4*((0)/2))-1) }}' TWT.tmp >! TWT.out2
        else
            mawk '{if (NR == '"$n"') {print $0, (($3/$4)+'"$ans"')*('"$e"'^($4*((0.004)/2))-1)+'"$ans"'}}' TWT.tmp >>! TWT.out2
        endif
        set ans=`mawk '{if (NR == '"$n"') {print $5}}' TWT.out2`
        @ n ++
    end

Basically, as you can see from the script above, the input is four columns of ASCII. We do some basic arithmetic on each row and get an answer (this is output to the fifth column, $5; columns $1 and $2 are not required, only $3 and $4 are used here). To calculate the next row, the answer from the previous row is required.

The problem is I can't figure out how to get this into a handy - and most importantly quick - routine. As you can see, at the moment mawk is inside a while loop (csh script) which cycles through each row and stores the previous answer as the variable $ans. This makes the whole thing far too slow to be practically useful (22 days runtime at this rate - lots of input).

What I need is to get rid of the shell script and just have a handy and efficient mawk line or two, which can store the previous row's output as a variable for calculating the next row - thereby removing the need for the while loop and hopefully making it lightning quick.
Some experimental input (TWT.tmp) is given below:

    0.000 0.004 1468.5207490000 0.0000100000
    0.004 0.004 1468.5207490000 0.0000100000
    0.008 0.004 1468.5207490000 0.0000100000
    0.012 0.004 1468.5207490000 0.0000100000
    0.016 0.004 1468.5207490000 0.0000100000
    0.020 0.004 1468.5207490000 0.0000100000
    0.024 0.004 1468.5207490000 0.0000100000
    0.028 0.004 1468.5207490000 0.0000100000
    0.032 0.004 1468.5207490000 0.0000100000
    0.036 0.004 1468.5207490000 0.0000100000
    0.040 0.004 1468.5207490000 0.0000100000
    0.044 0.004 1468.5207490000 0.0000100000
    0.048 0.004 1468.5207490000 0.0000100000
    0.052 0.004 1468.5207490000 0.0000100000

The output (TWT.out2) should look like this:

    0.000 0.004 1468.5207490000 0.0000100000 0
    0.004 0.004 1468.5207490000 0.0000100000 2.93704
    0.008 0.004 1468.5207490000 0.0000100000 5.87408
    0.012 0.004 1468.5207490000 0.0000100000 8.81112
    0.016 0.004 1468.5207490000 0.0000100000 11.7482
    0.020 0.004 1468.5207490000 0.0000100000 14.6852
    0.024 0.004 1468.5207490000 0.0000100000 17.6222
    0.028 0.004 1468.5207490000 0.0000100000 20.5592
    0.032 0.004 1468.5207490000 0.0000100000 23.4962
    0.036 0.004 1468.5207490000 0.0000100000 26.4332
    0.040 0.004 1468.5207490000 0.0000100000 29.3702
    0.044 0.004 1468.5207490000 0.0000100000 32.3072
    0.048 0.004 1468.5207490000 0.0000100000 35.2442
    0.052 0.004 1468.5207490000 0.0000100000 38.1812

Hope somebody can help. Many thanks in advance for your thoughts on this.
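For readers following along, the calculation in the script can be written as a recurrence. The symbols are my own shorthand, not the poster's: $c_n$ and $d_n$ are columns 3 and 4 of row $n$, $a_n$ is the new fifth column, and $\Delta t = 0.004$ is the constant hard-coded in the script.

```latex
a_1 = \frac{c_1}{d_1}\left(e^{\,d_1 \cdot 0/2} - 1\right) = 0,
\qquad
a_n = \left(\frac{c_n}{d_n} + a_{n-1}\right)\left(e^{\,d_n \Delta t/2} - 1\right) + a_{n-1}
\quad (n \ge 2)
```

As a check against the sample data: for row 2, $c/d = 1468.520749 / 0.00001 = 146852074.9$ and $e^{2\times10^{-8}} - 1 \approx 2.00000002\times10^{-8}$, so $a_2 \approx 2.93704$, which agrees with the expected output. Each row depends only on the row before it, which is why a single pass with one remembered value is enough.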
 10-06-2011, 01:58 AM #2 A.Thyssen Member   Registered: May 2006 Location: Brisbane, Australia Posts: 119

First, when posting code in a wiki, use a 'code' block rather than simply colouring it. That will preserve indents and avoid other problems (like smilies).

Your inefficiency is that you are progressing through the file multiple times, once for each line of the file. You are also dealing with each line in sequence. So why not just go through the file once only, saving the intermediate value in an internal variable!

Code:
```
# Bourne-shell (sh/bash) syntax
e=2.7182818284590452
awk '
  BEGIN { e = '"$e"'; ans = 0 }
  NR == 1 { ans = ($3/$4) * (e^($4*(0/2))-1);
            print $0, ans
            next
          }
  { ans = ($3/$4 + ans) * (e^($4*(0.004/2))-1) + ans;
    print $0, ans
  }
' TWT.tmp > TWT.out2
```

Results match your script to within the last printed digit (this version carries ans at full precision instead of re-reading the six-digit value from TWT.out2), and it only does one pass through the file.

Technically the first line will always be zero, as the zero in the expression makes the result always zero. E.g. the first rule may as well simply be...

Code: ` NR == 1 { print $0, 0; next; }`

Similarly, you actually don't need to declare "ans = 0", as that is the default for any variable that has not been defined yet.
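A variant of the same one-pass idea, offered as a sketch rather than the poster's exact code: it passes the step in with awk's standard `-v` option and uses the built-in `exp()` instead of splicing a shell variable for e into the program text. The names `sample`, `twt`, `dt` and `round6` are mine, not from the thread. `round6` is a hypothetical switch that re-rounds `ans` to six significant digits on every row, which is what the csh loop did implicitly when it read column 5 back from TWT.out2.

```shell
#!/bin/sh
# Regenerate the 14 sample rows of TWT.tmp quoted in the first post.
sample() {
    awk 'BEGIN {
        for (i = 0; i < 14; i++)
            printf "%.3f 0.004 1468.5207490000 0.0000100000\n", i * 0.004
    }'
}

# One pass over stdin; ans carries the previous row's result.
# dt is the 0.004 step hard-coded in the thread's scripts.
# $1 = 1 turns on round6 (assumption: mimics the csh loop's rounding).
twt() {
    awk -v dt=0.004 -v round6="$1" '
        NR == 1 { ans = 0; print $0, ans; next }
        {
            ans = ($3 / $4 + ans) * (exp($4 * dt / 2) - 1) + ans
            if (round6 + 0) ans = sprintf("%.6g", ans) + 0
            print $0, ans
        }
    '
}

sample | twt 0 | tail -n 1   # ans kept at full precision between rows
sample | twt 1 | tail -n 1   # ans rounded each row, like the csh loop
```

On the sample data the last row ends in 38.1815 with full precision and 38.1812 with per-row rounding; the latter is the figure in the TWT.out2 listing from the first post. On real data, `twt 0 < TWT.tmp > TWT.out2` would be the equivalent of the one-liner above.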
 10-06-2011, 04:47 AM #3 rusty_acorn LQ Newbie   Registered: Oct 2011 Location: London Posts: 2 Original Poster

Thank you, that was just what I was looking for. Learnt a lot from that one; very useful indeed. Many, many thanks. The key being

Code: `ans = ($3/$4 + ans) * (e^($4*(0.004/2))-1) + ans`

All the best.
 10-06-2011, 07:06 PM #4 A.Thyssen Member   Registered: May 2006 Location: Brisbane, Australia Posts: 119

Good, glad to help. Can you hit the "Rep" link on the side of my answer?