c++ - how to avoid truncating or rounding a double numerical value
I am not really familiar with cout, but I think the truncation is probably just happening when you print. Internally the value probably has the proper precision. Try adjusting the output format on cout, or use fprintf with a format string that requests enough digits.
which, obviously, is not the value that was input. (I had to change the setprecision value to a larger value to get the entire thing to print out, just to find out what the system was doing. The above is the value I finally found. It was truncated or rounded if I kept the original value of 32 for setprecision.) It matches for the first fifteen digits, then veers off.
So what's going on here now?
Thanks again, gradually getting through it with your help,
BabaG
I do not think you understand the concept of floating-point numbers. The rule of thumb is that a double-precision float will hold around 15 significant decimal digits. Almost any arithmetic you do will lose some of that precision to rounding. The reasons for this are detailed in many places (including on this forum).
If you want arbitrary precision arithmetic, look for a bignum library such as GMP.
For now I'd just like to know why, since no operations have been performed on the variable, it doesn't output the value it was given. I don't know what to search for on the forum to clarify this since, I expect, something as simple as 'float' will return thousands of irrelevant hits.
Also, I've recently had trouble with the search on the site. Lately it has returned an error saying that the search criteria were insufficient and must be at least three words long, even when I've entered whole sentences.
Put simply, can you store 1/3 exactly in decimal? Well, floating-point numbers have the same sort of problem.
There are links at the bottom of the following thread for further reading. http://www.linuxquestions.org/questi...light=IEEE+754
Well, a double-precision float variable (i.e., "double") is, if following IEEE 754, 64 bits wide. The first bit is the sign bit, the next eleven bits are for the exponent, and the remaining 52 bits are for the mantissa. Since normal numbers are normalized, the leading 1 of the mantissa is not actually stored (the "hidden bit"), which gives 53 bits of precision in total.
The 15- or 16-digit rule of thumb comes from the limit on the mantissa: 53 bits correspond to about 15.95 decimal digits. When you try to fit a wider number into a smaller container, you must lose precision. In the C standard, the header file float.h contains the macro DBL_DIG, which must be greater than or equal to 10 and represents the "number of decimal digits, q, such that any floating-point number with q decimal digits can be rounded into a floating-point number with p radix b digits and back again without change to the q decimal digits".
The most significant part of the IEEE 754 layout is that numbers are stored in binary rather than in decimal, and that some numbers have terminating representations in decimal but not in binary. For example, decimal 0.2 = binary 0.0011 0011 0011 0011..., just like dozenal 0.4 = decimal 0.333333333333...