ostream<< operator with double variable is not accurate, solution ?
I have tried another method,
this time printing to the screen (I will also try printing it to a file),
using the function printf as follows:
Code:
printf("%.24E", lum);
Then the result has zeros after the 15th or 16th decimal digit and NO random digits;
the result will appear like this:
5.131691280577226300000000E+10
and not
5.131691280577226257324219e+10
I guess I will try using fprintf to save these numbers to a file
instead of using the operator <<.
Thank you guys.
Sergei and I were trying to get you to *think* about the issue. We failed.
If you read any of the links we gave you, you would have seen something like this:
Quote:
Double precision, called "double" in the C language family, and "double precision" or "real*8" in Fortran. This is a binary format that occupies 64 bits (8 bytes) and its significand has a precision of 53 bits (about 16 decimal digits).
You can't set a precision of "24" because a double variable doesn't *have* a precision of "24".
The BEST you can do with a double variable is "15..16".
Which (uncoincidentally) is exactly what you're seeing in your data set!
Solutions?
a) use a long double
b) ignore the discrepancy
c) set your "precision" to 15
I understand that there are only 16 decimal digits in a double, but I just said that
it had better error out when I try to exceed that precision, or something.
I guess what I am saying is that I am complaining here:
I wish the compiler would recognize that I am trying to use more digits than
possible and use only the maximum possible digits instead, whatever.
BTW,
every digit might matter,
depending on your calculations (for example f(x) = 1/(x - x0)).
In the former USSR they first taught mathematics and only then programming.
I understand that there are only 16 decimal digits in a double, but I just said that it had better error out when I try to exceed that precision, or something...
Very much like saying:
Quote:
Yes, I understand that the smoke coming from under my hood is because I haven't added any oil for the last 20,000 miles.
But the idiot light on the dashboard should have come on sooner.
I think I'll sue the auto manufacturer for negligence.
Sigh...
PS:
*We* understand that "every digit might matter".
But we're not sure whether *you* understand that you can't use what you don't have. In other words, the extra digits in your input file will *effectively be ignored*.
Again:
Quote:
a) reduce the precision of your output,
and/or
b) use a larger floating point type
PPS:
Quote:
I understand that there are only 16 decimal digits in double
<= Wrong.
There are *at most* "16" digits.
Sometimes, there might be only 15 digits.
As you observed in your own data set.
It depends entirely on the specific floating point value.
Paul, part of the OP's question is:
Quote:
solution ?
A very short answer to the OP (not that your long answer is wrong) would be: get educated (using, for example, the already provided links).
The sadness of the modern world is that, for example, in an IP core bought for tens of thousands of dollars, in its test suite (!) I saw something (actually written in Verilog) that in "C" could be written as
Did they notice that the test suite seemed to be running a bit longer than they expected ?
For the record, I'm guessing that the code resembled this:
Code:
#define LIMIT 10.0
int minor_ver;
for(minor_ver = 0; minor_ver < LIMIT; minor_ver += 0.01)
{
<run_tests>
}
// With an int counter, "minor_ver += 0.01" computes minor_ver + 0.01
// in double and then truncates it back to int: 0 + 0.01 becomes 0,
// so minor_ver never advances and the loop never terminates
This, on the other hand, would be OK:
Code:
#define LIMIT 10.0
double minor_ver;
for(minor_ver = 0; minor_ver < LIMIT; minor_ver += 0.01)
{
<run_tests>
}
// Initializing a double with the integer literal "0" is fine here:
// it converts exactly to 0.0
And this is (at best) naive, and (at worst) cataclysmic:
Code:
#define LIMIT 10.0
double minor_ver;
for(minor_ver = 0; minor_ver != LIMIT; minor_ver += 0.01)
{
<run_tests>
}
// There's no guarantee that "real + real" will ever necessarily be
// *exactly* some particular value. And even if it happens to
// "seem" to work on one combination of platform/compiler, this
// code might break using a *different* compiler or running on a
// different platform...
PPS:
As it happens, I *did* give a solution (in Post #2), and then I repeated it (in Posts #5 and #9).
But I'm not sure a "solution" is ever terribly useful without understanding the underlying "why".
And I'm not sure understanding the underlying "why" is even useful if somebody has preconceived notions of how things "should" work, or insists that everything be "easy". To quote Bob Dylan: "Now I wish I could write you a melody so plain...
The problem was that they had noticed nothing - luckily for them, they hadn't been hit by the accuracy issue for their particular set of values.
The code
Code:
for(foo = 0; foo < LIMIT; foo += any_fp_number)
is wrong by definition - even when foo is of some FP type.
The code in question was your second example, which is wrong. I.e. even if it works, it works by luck. The potential problem is that you can't guarantee the number of iterations, i.e. you can't predict when 'minor_ver < LIMIT' goes FALSE.
This is because, when you repeatedly add 0.01 as in this example, you can't guarantee that the computed sum stays a mathematical multiple of 0.01.
So, if LIMIT is 0.1, i.e. you expect 10 iterations, in reality you can get eleven - after ten iterations minor_ver might actually be something like 0.0999999999999999837, i.e. slightly less than the expected 0.1, so the comparison will still yield TRUE instead of the expected FALSE.
And that was my point.
Instead, only an integer increment should be used, and the FP values (which they didn't need in the first place - they were plain morons to use FP code to represent versions like 3.01, 3.02, etc. in Verilog just for printing, and it was easy for them to print FP numbers) should be calculated from the integer loop counter.
I am just reiterating the point - in the former USSR they first taught mathematics, then numerical methods of computation, and only then programming. IMO a person who doesn't know the relevant chapters of mathematics should not be allowed to touch floating point calculations at all.
I see threads/questions like this all over the place - the root cause is always the same: not knowing the mathematics of radix and number representation, the conceptual relationship between integer and floating point numbers in computers, and the fact that in a digital computer ultimately all numbers are integers. I.e. there is no way sqrt(2) will be represented with the infinite precision it needs, and there is no way a non-symbolic program will guarantee that sqrt(2) * sqrt(2) == 2, etc.
...
If one insists on a FP loop, it can be used, but this way:
Code:
#define LIMIT 0.1
#define STEP 0.01
#define GUARDBAND (0.5 * STEP)
double foo;
for(foo = 0.0; foo < (LIMIT - GUARDBAND); foo += STEP)
{
<loop_body>
} // the <loop_body> is guaranteed to be executed exactly LIMIT/STEP times.
The word "guaranteed" should be taken with a grain of salt - as STEP approaches FP mantissa resolution, the guarantee vanishes.
I actually had a real problem caused by a missing GUARDBAND - not in my code. It was related to IC layout software, and sometimes the number of rows/columns in the layout was off by one. The root cause turned out to be the lack of a GUARDBAND in the calculation. This was the last problem before closing a $1.5M or so contract (in 2002-2003 prices) with the customer.
I agree with you completely; that's exactly the point I was trying to make in all *three* examples:
The technical term is "Programming by Coincidence" - a practice which is, unfortunately, all too prevalent.
And, if we are talking about Dylan, then "Take what you have gathered from coincidence" from "It's All Over Now, Baby Blue" - I like it very much in the Grateful Dead's performance.
...
There was a story on Slashdot about US soldiers' deaths from a SCUD - the root cause was the accumulation of FP error while adding:
Code:
time_value += time_step;
The launcher system and the target detection system had physically different clocks which had once been synchronized; one of the clocks had greater step/accumulator precision than the other, so the command "launch at time T" was actually executed with an error of a couple of seconds or so, and the "Patriot" couldn't hit the "SCUD".
The faulty clock had a 24-bit mantissa - quite sufficient, had it been properly implemented.