LinuxQuestions.org
10-05-2011, 12:05 PM   #1
rusty_acorn
LQ Newbie
 
Registered: Oct 2011
Location: London
Posts: 2

Rep: Reputation: Disabled
awk / gawk / mawk optimisation


Hello World,

I've been using this website on and off for a few years now but never really felt the need to ask my own question, until now. I really need some help with this little conundrum, and I know somebody out there has an answer. Please help; I would be very grateful and mightily impressed with a solution.

Here's the problem: it's an awk optimisation question.

set e=2.7182818284590452

set n=1
set nmax=`mawk 'END {print NR}' TWT.tmp`

while ($n <= $nmax)

if ($n == 1) then

mawk '{if (NR == '"$n"') {print $0, ($3/$4) * ('"$e"'^($4*((0)/2))-1) }}' TWT.tmp >! TWT.out2

else

mawk '{if (NR == '"$n"') {print $0, (($3/$4)+'"$ans"')*('"$e"'^($4*((0.004)/2))-1)+'"$ans"'}}' TWT.tmp >>! TWT.out2

endif

set ans=`mawk '{if (NR == '"$n"') {print $5}}' TWT.out2`

@ n ++
end


Basically, as you can see from the script above, the input is four columns of ASCII. We do some basic arithmetic on each row and get an answer, which is output to the fifth column ($5). Columns $1 and $2 are not required; only $3 and $4 are used here.

To calculate the next row, the answer from the previous row is required. The problem is that I can't figure out how to get this into a handy and, most importantly, quick routine.

As you can see, at the moment mawk is inside a while loop (csh script) which cycles through each row and stores the previous answer in the variable $ans. This makes the whole thing far too slow to be practically useful (22 days of runtime at this rate, as there is a lot of input). What I need is to get rid of the shell loop and just have a handy and efficient mawk line or two that stores the previous row's output in a variable for calculating the next row, thereby removing the need for the while loop and hopefully making it lightning quick.

Some experimental input (TWT.tmp) is given below:

0.000 0.004 1468.5207490000 0.0000100000
0.004 0.004 1468.5207490000 0.0000100000
0.008 0.004 1468.5207490000 0.0000100000
0.012 0.004 1468.5207490000 0.0000100000
0.016 0.004 1468.5207490000 0.0000100000
0.020 0.004 1468.5207490000 0.0000100000
0.024 0.004 1468.5207490000 0.0000100000
0.028 0.004 1468.5207490000 0.0000100000
0.032 0.004 1468.5207490000 0.0000100000
0.036 0.004 1468.5207490000 0.0000100000
0.040 0.004 1468.5207490000 0.0000100000
0.044 0.004 1468.5207490000 0.0000100000
0.048 0.004 1468.5207490000 0.0000100000
0.052 0.004 1468.5207490000 0.0000100000

The output (TWT.out2) should look like this:

0.000 0.004 1468.5207490000 0.0000100000 0
0.004 0.004 1468.5207490000 0.0000100000 2.93704
0.008 0.004 1468.5207490000 0.0000100000 5.87408
0.012 0.004 1468.5207490000 0.0000100000 8.81112
0.016 0.004 1468.5207490000 0.0000100000 11.7482
0.020 0.004 1468.5207490000 0.0000100000 14.6852
0.024 0.004 1468.5207490000 0.0000100000 17.6222
0.028 0.004 1468.5207490000 0.0000100000 20.5592
0.032 0.004 1468.5207490000 0.0000100000 23.4962
0.036 0.004 1468.5207490000 0.0000100000 26.4332
0.040 0.004 1468.5207490000 0.0000100000 29.3702
0.044 0.004 1468.5207490000 0.0000100000 32.3072
0.048 0.004 1468.5207490000 0.0000100000 35.2442
0.052 0.004 1468.5207490000 0.0000100000 38.1812

Hope somebody can help. Many thanks in advance for your thoughts on this.
 
10-06-2011, 12:58 AM   #2
A.Thyssen
Member
 
Registered: May 2006
Location: Brisbane, Australia
Posts: 119

Rep: Reputation: 32
First, when posting code on a forum, use a 'code' block rather than simply coloring it. That preserves indents and avoids other problems (like smilies).

Your inefficiency is that you are going through the file multiple times, once for each line of the file, even though you only ever deal with the lines in sequence.

So why not just go through the file once only, saving the intermediate value in an internal variable!


Code:
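# sh/bash syntax, not csh
e=2.7182818284590452
awk -v e="$e" '
   BEGIN { ans = 0 }
   # Row 1: the exponent is zero, so the result is always 0.
   NR == 1 { ans = ($3/$4) * (e^($4*(0/2))-1);
             print $0, ans
             next
           }
   # Other rows: "ans" on the right-hand side still holds the previous row value.
   {         ans = ($3/$4 + ans) * (e^($4*(0.004/2))-1) + ans;
             print $0, ans
           }
   ' TWT.tmp > TWT.out2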
The results match your script's, apart from possible rounding differences in the last printed digits, and it makes only one pass through the file.
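
A note on how closely the two agree, as a rough sketch: the csh loop reads the previous answer back from TWT.out2, where print has already cut it to six significant digits (awk's default OFMT is "%.6g"), while the single pass carries ans at full precision. The snippet below imitates both over the 13 steps of the sample input. The names v, k, r, full and fed are mine, not from the scripts above, and it uses awk's built-in exp() in place of the e constant.

```shell
# Compare full-precision carry-over with re-reading the printed value.
result=$(awk 'BEGIN {
   v = 1468.520749; k = 0.00001       # columns 3 and 4 of the sample rows
   r = exp(k * 0.004 / 2) - 1         # per-row growth factor
   full = 0; fed = 0                  # both start from the row 1 answer, 0
   for (n = 2; n <= 14; n++) {
      full = (v/k + full) * r + full                      # one-pass awk
      fed  = sprintf("%.6g", (v/k + fed) * r + fed) + 0   # csh loop behaviour
   }
   print full, fed
}')
echo "$result"
```

On the sample values this should print 38.1815 38.1812. The second number is the last row of the expected output in the first post; the first is what the one-pass script gives for that row.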

Technically the first line will always be zero, because the zero in the exponent makes the factor e^0 - 1 equal to zero.

E.g. the first rule may as well simply be...
Code:
      NR == 1 { print $0, 0; next; }
Similarly, you don't actually need to declare "ans = 0", as an awk variable that has not been assigned yet evaluates to zero in arithmetic.
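
Putting those two simplifications together, and swapping the hand-typed constant for awk's built-in exp(), gives something like the sketch below. It runs on three rows of the sample input in a scratch file. The file names twt_sample.tmp and twt_sample.out and the dt variable are just placeholders I made up, and the 0.004 step is still hardcoded as in the original.

```shell
# Three rows of the sample input, written to a scratch file.
cat > twt_sample.tmp <<'EOF'
0.000 0.004 1468.5207490000 0.0000100000
0.004 0.004 1468.5207490000 0.0000100000
0.008 0.004 1468.5207490000 0.0000100000
EOF

# Single pass: "ans" is unset on row 2, which awk treats as 0 in arithmetic.
awk -v dt=0.004 '
   NR == 1 { print $0, 0; next }
   { ans = ($3/$4 + ans) * (exp($4 * dt / 2) - 1) + ans
     print $0, ans }
' twt_sample.tmp > twt_sample.out

cat twt_sample.out
```

For the real data, point it at TWT.tmp and TWT.out2 instead.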
 
10-06-2011, 03:47 AM   #3
rusty_acorn
LQ Newbie
 
Registered: Oct 2011
Location: London
Posts: 2

Original Poster
Rep: Reputation: Disabled

Thank you, that was just what I was looking for.

I learnt a lot from that one. Very useful indeed, many many thanks.

The key being

Code:
ans = ($3/$4 + ans) * (e^($4*(0.004/2))-1) + ans
All the best.
 
10-06-2011, 06:06 PM   #4
A.Thyssen
Member
 
Registered: May 2006
Location: Brisbane, Australia
Posts: 119

Rep: Reputation: 32
Good, glad to help.

Can you hit the "Rep" link on the side of my answer?
 
