Fixed-point arithmetic means assuming the binary number has an implicit binary point at some fixed position (for example, in the middle of a 32-bit value). Another way to think about this is to imagine that all stored numbers have been scaled by a constant factor (for example 2^16). One can even make it more complicated, and scale various parameters differently, in order to maximize the precision of the result for a given word size, but I won't discuss that here.
The beauty of fixed-point arithmetic is that additions and subtractions are unaffected; one can use ordinary integer operations. Multiplications and divisions become more complicated, because the operands or result have to be scaled before or after the operation. Ideally one would make use of processor instructions such as double-precision multiplies (which take two single-precision numbers and produce a double-precision result that can then be scaled). But doing this in C is tricky, since you have to understand casting and promotion well, and have some idea about how to help the compiler optimize, because the language doesn't directly support mixed-precision arithmetic.
For example, the following code multiplies two 32-bit values with an implicit 16-bit fractional part in C. Note the 16-bit right shift at the end to put the implicit binary point back in the correct place. If compiled with optimization on for an x86 architecture, it reduces beautifully to a 32x32->64-bit multiply and a double-register shift.
Code:
#include <stdint.h>

static inline uint32_t mul(uint32_t i, uint32_t j)
{
    /* Widen to 64 bits, multiply, then shift the binary point back into place. */
    return (uint32_t)(((uint64_t)i * (uint64_t)j) >> 16);
}
Now to come back to your code: you are effectively trying to build an IIR filter. This is a simpler problem, particularly if you don't need the full 32-bit precision.

Basically you want to iteratively calculate y[i+1] = a*x[i] + b*y[i]. Typically in code one scales y, so as to retain more precision; in your example, you scale y by 10000 (the scaled value is called cy), to give cy = 10000*a*x + b*cy. The a and b are expressed as fractions that usually total 1; the closer a is to 1, the less effect the filter has.
What makes this an easier problem is that one can use the scaling to advantage, so that one doesn't have to express the factors a and b as fractions. Instead of multiplying x by 10000*a, use a new parameter aa, which is the fraction a scaled by 10000, and treat it as an integer; this gives cy = aa*x + b*cy. Similarly, since cy is scaled by 10000, b*cy is equivalent to b*10000*(cy/10000). So instead of multiplying cy/10000 by b*10000, use a new parameter bb, which is the fraction b scaled by 10000, and treat it as an integer; this gives cy = aa*x + bb*(cy/10000). The entire operation can now be done with integer arithmetic.
Ideally, instead of 10000 one would choose a scaling factor that is a power of 2, so that the compiler can optimize the expensive divide into a shift. One can optimize even further if a is a binary fraction (removing all the multiplies), but I won't go into that here. One has to be careful that cy does not overflow; this requires either the judicious use of double precision arithmetic, or simply a scaling factor that uses half the bits of the word size (e.g., for 32 bits, use 65536 as the scaling factor, and keep all the parameters, including x, within 16 bits, assuming unsigned arithmetic).