Hi, I am getting this weird awk output. I am expecting those fields to add up to 0. Can anyone figure out what is going on?
It is a precision issue.
For a double-precision floating-point number, the closest representation to -124.81 is approximately -124.810000000000002273736754, and the closest representation to 24.81 is approximately 24.809999999999998721023076. This is because the machine representation of double-precision floating-point numbers is binary (base two), not decimal (base ten). Integers, on the other hand, are exact up to 2^53 = 9007199254740992. Read more in the Wikipedia article on IEEE-754, the standard that just about all computers using floating-point numbers rely on.
If you sum up the machine representations (and add the hundred), you get approximately -0.00000000000000355271367, which can also be written as -3.55271367e-15 in scientific notation.
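If you want to see the stored values for yourself, here is a minimal sketch that feeds the three numbers from this thread through awk and prints each one, plus the running sum, with many more digits than the default. The function name show_sum is made up for illustration, and the digits you get assume your awk uses IEEE-754 doubles (most do).

```shell
# show_sum is a hypothetical helper, not something from the original post.
# It prints each input as awk stores it, then the running sum.
show_sum() {
    printf '%s\n' -124.81 100 24.81 |
    awk '{
             s += $1
             # %.25f and %.17g ask for more digits than the default %.6g shows
             printf("%-8s stored as %.25f  running sum %.17g\n", $1, $1, s)
         }
         END { printf("final: %.6g\n", s) }'
}

show_sum
```

The last line should match what plain print gives you, because %.6g is the default pattern; the lines above it show where the leftover comes from.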
The solution is simple. Instead of using the default floating-point representation, use an explicit one:
Code:
awk -F \* '/^CAS.PR/ {a = a + $4 + $7} END {printf("%.2f\n", a)}'
This one uses two decimal digits on the right side of the decimal point, and always uses the normal (non-scientific) notation.
The GNU Awk User's Manual describes printf pretty well. It also mentions which features are unique to GNU awk, so with a bit of care you can use it as a reference even if you are using AIX awk.
(In case you are wondering, I always use parentheses around the parameters to printf in awk to remind myself and others that it is the traditional printf, and not just awk print. printf in awk works just about exactly the same way as the printf in C. Most awks also have sprintf, which allows you to "save" the formatted string into a variable.)
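As a small illustration of sprintf, here is a sketch that saves the formatted total into a variable before printing it. One wrinkle to be aware of: a tiny negative leftover such as -3.55e-15 is formatted by "%.2f" as "-0.00", sign included, so the sketch replaces that string with "0.00". The helper name format_total is hypothetical, and whether you want to hide the sign at all is a judgment call.

```shell
# format_total is a hypothetical helper; it sums the first field of its input.
format_total() {
    awk '{ a += $1 }
         END {
             s = sprintf("%.2f", a)          # keep the formatted text in a variable
             if (s == "-0.00") s = "0.00"    # a tiny negative leftover keeps its sign
             print s
         }'
}

printf '%s\n' -124.81 100 24.81 | format_total
```

Because the result of sprintf is a string, the comparison against "-0.00" is a plain string comparison, so no further rounding is involved.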
It's interesting though that this only shows up at zero (in this case). If you run the code snippet without $7 it resolves "accurately". The printf solution is a good one though.
What do you mean? Using gawk-3.1.8,
Code:
$ printf '%s\n' -124.81 100 24.81 | awk '{ s += $1 } END { print s }'
-3.55271e-15
$ awk 'BEGIN { print OFMT }'
%.6g
I suspect that is the default for most awk variants. For those who are unfamiliar with printf patterns, %.6g means "the floating point value using six significant digits, using the scientific notation when necessary".
Here, the result is zero, up to the given precision (actually, up to about 16 significant digits, as one can expect from double-precision IEEE-754 floating-point numbers). The issue is that the default floating-point pattern does not know the precision of the inputs, and just prints six significant digits of whatever is left. It's like saying "but there is this smudge to the right side of the number, so the actual value really is a tiny bit bigger".
I think %g is stupid. If you look at my awk snippets, I tend to use %.2f or similar. There, the 2 means two decimal digits on the right side of the decimal point. That way the "smudges" don't pollute my results -- but I need to know in advance how many decimal digits I want in my results.
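To compare patterns side by side, here is a rough sketch that sets OFMT from the command line and lets plain print do the formatting. The helper name print_sum is invented for this example; setting OFMT with -v works in the awks I know of, but treat that as an assumption for yours.

```shell
# print_sum is a hypothetical helper; its first argument becomes OFMT.
print_sum() {
    awk -v OFMT="$1" '{ s += $1 } END { print s }'
}

for fmt in '%.6g' '%.17g' '%.2f'; do
    printf '%-6s ' "$fmt"                       # label the row with the pattern
    printf '%s\n' 100 -124.81 24.81 | print_sum "$fmt"
done
```

With "%.2f" the smudge disappears from the digits, although the sign of the leftover still shows up as "-0.00".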
Of course, if you change the order of the summation, the result will change, as the loss of precision due to cancellation changes. Remember, integer values (-2^53 .. 2^53) are exact, but the other two values are not. (Just because they're exact in decimal does not make them exact in IEEE-754 representation.) In other words,
Code:
$ printf '%s\n' -124.81 24.81 100 | awk '{ s += $1 } END { print s }'
0
$ printf '%s\n' 100 -124.81 24.81 | awk '{ s += $1 } END { print s }'
-3.55271e-15
$ printf '%s\n' 24.81 100 -124.81 | awk '{ s += $1 } END { print s }'
0
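Since integers are exact, one way to make the result independent of the order is to sum whole cents instead of fractions. This is a different technique from the printf fix above, shown only as a sketch: the helper name sum_cents is hypothetical, and it assumes every input has at most two decimal digits.

```shell
# sum_cents is a hypothetical helper; it assumes inputs have at most two decimals.
sum_cents() {
    awk '{
             c = $1 * 100                                  # scale to cents (may be slightly off)
             c = (c < 0) ? int(c - 0.5) : int(c + 0.5)     # round to the nearest whole cent
             total += c                                    # sums of whole numbers are exact
         }
         END { printf("%.2f\n", total / 100) }'
}

printf '%s\n' -124.81 24.81 100 | sum_cents
printf '%s\n' 100 -124.81 24.81 | sum_cents
printf '%s\n' 24.81 100 -124.81 | sum_cents
```

All three orderings should now print the same value, because the rounding happens once per input rather than accumulating in the sum.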
Code:
$ awk -F \* '/^CAS.PR/ {a = a + $4} END {print a}' aix.txt
-24.81
That's -24.81, not -24.81+/- a "smudge".
But it is -24.81 plus a smudge!
The actual value used for representing -24.81 using IEEE-754 double precision floating point numbers is exactly -24.809999999999998721023075631819665431976318359375 = -6983394172191375 × 2^-48.
The smudge just gets hidden because the default OFMT says to print the six significant digits, which here are -24.8100. %g does not print trailing decimal zeros, so it gets output as -24.81.
When the result gets close to zero, the smudge is all that is left over, and that's why it gets printed. There is just no way for poor awk to know which part of the result is actual result, and which part is just rounding smudges.
There are no magic bullets for this, either. There is no way to "always use a good output format". The needed output format depends not only on the precision and range of the input variables, but also on the computation done on them (especially since the computation is limited in precision, and cancellation and loss of precision can and do occur). It is just one of those things that we humans have to know about, and take care of ourselves.
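One way of "taking care of it ourselves" is to decide up front how small a leftover counts as zero. The sketch below does that with a tolerance; the helper name near_zero and the default of half a cent (0.005) are assumptions for illustration, and the right tolerance depends on your data.

```shell
# near_zero is a hypothetical helper; the optional argument is the tolerance.
near_zero() {
    awk -v eps="${1:-0.005}" '{ s += $1 }
        END {
            if (s < 0) s = -s            # absolute value of the sum
            if (s < eps) print "zero"
            else         print "nonzero"
        }'
}

printf '%s\n' -124.81 100 24.81 | near_zero
```

For sums of values with two decimals, anything below half a cent cannot be a real difference, which is why 0.005 is a reasonable default here.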
Last edited by Nominal Animal; 06-26-2012 at 11:58 PM.