ostream<< operator with double variable is not accurate, solution ?
I have tried another method,
this time printing to the screen (I will also try printing it to a file),
using the function printf as follows:
Code:
printf("%.24E", lum);
Then the result has zeros after the 15th or 16th decimal digit and NO random digits;
the result will appear like this:
5.131691280577226300000000E+10
and not
5.131691280577226257324219e+10
I guess I will try using fprintf to save these numbers to a file
instead of using the operator <<.
Thank you guys.
Sergei and I were trying to get you to *think* about the issue. We failed.
If you read any of the links we gave you, you would have seen something like this:
Quote:
Double precision, called "double" in the C language family, and "double precision" or "real*8" in Fortran. This is a binary format that occupies 64 bits (8 bytes) and its significand has a precision of 53 bits (about 16 decimal digits).
You can't set a precision of "24" because a double variable doesn't *have* a precision of "24".
The BEST you can do with a double variable is "15..16".
Which (uncoincidentally) is exactly what you're seeing in your data set!
Solutions?
a) use a long double
b) ignore the discrepancy
c) set your "precision" to 15
I understand that there are only 16 decimal digits in a double, but I just said that
it had better error out when I try to exceed that precision, or something.
I guess what I am saying is that I am complaining here:
I wish the compiler would recognize that I am trying to use more digits than
possible and use only the maximum possible digits instead, whatever.
BTW,
every digit might matter,
depending on your calculations (for example f(x) = 1/(x - x0)).
In the former USSR they first taught mathematics and only then programming.
I understand that there are only 16 decimal digits in a double, but I just said that it had better error out when I try to exceed that precision, or something...
Very much like saying:
Quote:
Yes, I understand that the smoke coming from under my hood is because I haven't added any oil for the last 20,000 miles.
But the idiot light on the dashboard should have come on sooner.
I think I'll sue the auto manufacturer for negligence.
Sigh...
PS:
*We* understand that "every digit might matter".
But we're not sure whether *you* understand that you can't use what you don't have. In other words, the extra digits in your input file will *effectively be ignored*.
Again:
Quote:
a) reduce the precision of your output,
and/or
b) use a larger floating point type
PPS:
Quote:
I understand that there are only 16 decimal digits in double
<= Wrong.
There are *at most* "16" digits.
Sometimes, there might be only 15 digits.
As you observed in your own data set.
It depends entirely on the specific floating point value.
Paul, part of the OP's question is:
Quote:
solution ?
A very short answer to the OP (not that your long answer is wrong) would be: get educated (using, for example, the already provided links).
The sadness of the modern world is that, for example, in an IP core bought for tens of thousands of dollars, in its test suite (!) I saw something (actually written in Verilog) that in "C" could be written as
Did they notice that the test suite seemed to be running a bit longer than they expected ?
For the record, I'm guessing that the code resembled this:
Code:
#define LIMIT 10.0
int minor_ver;
for(minor_ver = 0; minor_ver < LIMIT; minor_ver += 0.01)
{
<run_tests>
}
// With an int counter, "minor_ver += 0.01" computes minor_ver + 0.01
// in double and then truncates it back to int: 0 + 0.01 becomes 0,
// so minor_ver never advances and the loop never terminates
This, on the other hand, would be OK:
Code:
#define LIMIT 10.0
double minor_ver;
for(minor_ver = 0; minor_ver < LIMIT; minor_ver += 0.01)
{
<run_tests>
}
// Initializing a double with the integer literal "0" is fine here:
// it converts exactly to 0.0
And this is (at best) naive, and (at worst) cataclysmic:
Code:
#define LIMIT 10.0
double minor_ver;
for(minor_ver = 0; minor_ver != LIMIT; minor_ver += 0.01)
{
<run_tests>
}
// There's no guarantee that "real + real" will ever necessarily be
// *exactly* some particular value. And even if it happens to
// "seem" to work on one combination of platform/compiler, this
// code might break using a *different* compiler or running on a
// different platform...
PPS:
As it happens, I *did* give a solution (in Post #2), and then I repeated it (in Posts #5 and #9).
But I'm not sure a "solution" is ever terribly useful without understanding the underlying "why".
And I'm not sure understanding the underlying "why" is even useful if somebody has preconceived notions of how things "should" work, or insists that everything be "easy". To quote Bob Dylan: "Now I wish I could write you a melody so plain...
The problem was that they had noticed nothing - luckily for them, they hadn't been hit by the accuracy issue for their particular set of values.
The code
Code:
for(foo = 0; foo < LIMIT; foo += any_fp_number)
is wrong by definition - even when foo is of some FP type.
The code in question was your second example, which is wrong. I.e. even if it works, it works by luck. The potential problem is that you can't guarantee the number of iterations, i.e. you can't predict when 'minor_ver < LIMIT' goes FALSE.
This is because, when you repeatedly add 0.01 as in this example, you can't guarantee that the computed sum stays a mathematical multiple of 0.01.
So, if LIMIT is 0.1, i.e. you expect 10 iterations, in reality you can get eleven - after ten iterations minor_ver might actually be something like 0.0999999999999999837, i.e. slightly less than the expected 0.1, so the comparison will still yield TRUE instead of the expected FALSE.
And that was my point.
Instead, only an integer increment should be used, and the FP values (which they didn't need in the first place - they were plain morons to use FP code to represent versions like 3.01, 3.02, etc. in Verilog just for printing, and it was easy for them to print FP numbers) should be calculated from the integer loop counter.
I am just reiterating the point - in the former USSR they first taught mathematics, then numerical methods of computation, and only then programming. IMO a person who doesn't know the relevant chapters of mathematics should not be allowed to touch floating point calculations at all.
I see threads/questions like this all over the place - the root cause is always the same: not knowing the mathematics of radix and number representation, the conceptual relationship between integer and floating point numbers in computers, and the fact that in a digital computer ultimately all numbers are integers. I.e. there is no way sqrt(2) will be represented with the infinite precision it needs, and there is no way a non-symbolic program will guarantee that sqrt(2) * sqrt(2) == 2, etc.
...
If one insists on a FP loop, it can be used, but this way:
Code:
#define LIMIT 0.1
#define STEP 0.01
#define GUARDBAND (0.5 * STEP)
double foo;
for(foo = 0.0; foo < (LIMIT - GUARDBAND); foo += STEP)
{
<loop_body>
} // the <loop_body> is guaranteed to be executed exactly LIMIT/STEP times.
The word "guaranteed" should be taken with a grain of salt - as STEP approaches FP mantissa resolution, the guarantee vanishes.
I actually had a real problem caused by a missing GUARDBAND - not in my code. It was related to IC layout software, and sometimes the number of rows/columns in the layout was off by one. The root cause turned out to be the lack of a GUARDBAND in the calculation. This was the last problem before closing a $1.5M or so contract (in 2002-2003 prices) with the customer.
I agree with you completely; that's exactly the point I was trying to make in all *three* examples:
The technical term is "Programming by Coincidence" - a practice which is, unfortunately, all too prevalent.
And, if we are talking about Dylan, then "Take what you have gathered from coincidence" from "It's All Over Now, Baby Blue" - I like it very much in the Grateful Dead's performance.
...
There was a story on Slashdot about US soldiers' deaths from a SCUD - the root cause was the accumulation of FP error while adding:
Code:
time_value += time_step;
The launcher system and the target detection system had physically different clocks which had once been synchronized; one of the clocks had greater step/accumulator precision than the other, so the command "launch at time T" was actually executed with an error of a couple of seconds or so, and the "Patriot" couldn't hit the "SCUD".
The faulty clock had a 24-bit mantissa - quite sufficient, had it been properly implemented.