LinuxQuestions.org - c++ - how to avoid truncating or rounding a double numerical value


babag

05-18-2008 12:19 AM

c++ - how to avoid truncating or rounding a double numerical value

working in mandriva linux 2007.1. compiling with g++.

i have some lengthy variable values that are rounding off
when i don't want them to. how can i avoid this?

an example is the value:

Code:

0.13134765625

i have tried declaring it as:

Code:

float RatioTrkToFullAP =  0.13134765625;

double RatioTrkToFullAP =  0.13134765625;

long double RatioTrkToFullAP =  0.13134765625;

but all of these, when run as:

Code:

cout << RatioTrkToFullAP << endl;

print out as:

Code:

0.131348

how do i declare these so they'll retain the full value
of the variable. i really do require this level of
precision.

thanks,
BabaG

blackhole54

05-18-2008 01:03 AM

I am not really familiar with "cout," but I think the truncation is probably just happening when you print. Internally it probably has the proper precision. Try adjusting the format on cout or use fprintf formatted for sufficient precision.

paulsm4

05-18-2008 01:41 AM

Hi -

What you're looking for are the C++ stream formatting functions "width()" and "precision()", or the matching <iomanip> manipulators "setw()" and "setprecision()":

http://www.codeguru.com/forum/archiv.../t-295798.html

http://www.arachnoid.com/cpptutor/student3.html

babag

05-18-2008 12:56 PM

thanks to you both. i'd forgotten about setprecision.
this worked for the given example:

Code:

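#include <iostream>
#include <iomanip>
using namespace std;

int main() {
    double RatioTrkToFullAP = 0.13134765625;
    // setiosflags() is the <iomanip> manipulator; there is no setflags()
    cout << setiosflags(ios::fixed) << setprecision(11) << RatioTrkToFullAP << endl;
}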

however, i also have this, very long example:

Code:

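#include <iostream>
#include <iomanip>
using namespace std;

int main() {
    // the literal is rounded to the nearest double when it is compiled
    double RatioTrkToFullAP = 0.55105762217359591539022611232677;
    cout << setiosflags(ios::fixed) << setprecision(32) << RatioTrkToFullAP << endl;
}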

which returns:

Code:

0.551057622173595884618180207326076924800872802734375

which, obviously, is not the value that was input. (i had
to change the setprecision value to a larger value to
get the entire thing to print out just to find out what
the system was doing. the above is the value i finally
found. it was truncated or rounded if i kept the original
value of 32 for setprecision.) it matches for the first
fifteen digits, then veers off.

so what's going on here, now?

thanks again, gradually getting through it with your help,
BabaG

osor

05-18-2008 01:06 PM

I do not think you understand the concept of floating point numbers. The rule-of-thumb is that a double-precision float will hold around 15 decimal digits. Almost any arithmetic you do can lose further precision. The reasons for this are detailed in many places (including on this forum).

If you want arbitrary precision arithmetic, look for a bignum library such as GMP.

babag

05-18-2008 02:18 PM

for now i'd just like to know why, since no operations
have been performed on the variable, it doesn't output
the value it was given. i don't know what to search for
on the forum to clarify this as, i expect, something as
simple as 'float' will return thousands of irrelevant
hits.

also, i've recently had trouble with the search on the
site. lately it's returned an error that the search
criteria was insufficient and must be at least three
words in length; this, even if i've entered whole
sentences.

thanks,
BabaG

dmail

05-18-2008 02:49 PM

Put simply, can you correctly store 1/3 in decimal? Well floating point numbers have the same sort of problem.
There are links at the bottom of the following thread for further reading.
http://www.linuxquestions.org/questi...light=IEEE+754

osor

05-18-2008 02:53 PM

Quote:

Originally Posted by babag (Post 3157215)

for now i'd just like to know why, since no operations
have been performed on the variable, it doesn't output
the value it was given.

Well, a double-precision float variable (i.e., “double”) is, if following IEEE 754, 64 bits wide. The first bit is the sign bit, the next eleven bits are for the exponent, and the remaining 52 bits are for the mantissa (and since this is normalized, the leading one is not actually stored, so there is a “hidden bit”).

The 15 or 16-digit rule of thumb comes from the limit on the mantissa. When you try to fit a wider number into a smaller container, you must lose precision. In the C standard, the header file float.h contains the macro DBL_DIG, which must be greater than or equal to 10 (it is 15 for an IEEE 754 double), and represents “number of decimal digits, q, such that any floating-point number with q decimal digits can be rounded into a floating-point number with p radix b digits and back again without change to the q decimal digits”.

babag

05-18-2008 03:18 PM

thanks to you both. both clear and helpful!

BabaG

Dan04

05-18-2008 05:00 PM

Quote:

Originally Posted by osor (Post 3157240)

The most significant part of this being that numbers are stored in binary rather than in decimal. And that there are numbers that have terminating representations in decimal but not in binary. For example, decimal 0.2 = binary 0.0011 0011 0011 0011.... Just like dozenal 0.4 = decimal 0.333333333333...

All times are GMT -5.