Old 06-23-2012, 10:30 PM   #1
geewhan
LQ Newbie
 
Registered: Mar 2012
Posts: 16

Rep: Reputation: Disabled
How to use kernel_fpu_begin and kernel_fpu_end correctly?


I would like to know how to use the functions kernel_fpu_begin() and kernel_fpu_end() correctly.

I did try to Google this, but the other forums I found were not clear enough for me to understand.

I tried using these functions, but I got a couple of errors. I think it may have to do with the include header file; maybe it is the wrong one.

The code below is what I tried.
Any help is really appreciated, thanks.

Code:
#include <asm/i387.h>

static void tcp_vegas_cong_avoid(struct sock *sk, u32 ack, u32 in_flight)
{
	kernel_fpu_begin();

	struct tcp_sock *tp = tcp_sk(sk);
	struct vegas *vegas = inet_csk_ca(sk);
	int target_queue_size;
	double avgRTT_Mult, rtt_Mult; 
	
	target_queue_size = 100;
	avgRTT_Mult = 0.99609375; //(255/256)
	rtt_Mult    = 0.00390625; //(1/256)


	avgRTT = (avgRTT_Mult*avgRtt) + (rtt_Mult*rtt);

	kernel_fpu_end();	
}
 
Old 06-24-2012, 12:27 AM   #2
Nominal Animal
Senior Member
 
Registered: Dec 2010
Location: Finland
Distribution: Xubuntu, CentOS, LFS
Posts: 1,723
Blog Entries: 3

Rep: Reputation: 947
Is there a reason you do not use avgRTT = (255 * avgRTT + rtt) >> 8U; for this?

If they are both unsigned 32-bit integers, and you need the full unsigned 32-bit range, you can use
Code:
avgRTT = (u32)(((((u64)avgRTT) << 8U) - (u64)avgRTT + (u64)rtt) >> 8U);
Above, << 8U is unsigned multiplication by 256, and >> 8U is unsigned division by 256, so the expression is essentially (256 * avgRTT - avgRTT + rtt) / 256, but with explicit casting to make sure the expression cannot overflow, and using a bit shift to avoid explicit 64-bit multiplication and division. It should work fine on all 32-bit and 64-bit architectures.
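For what it's worth, here is a minimal userspace sketch of that update (the function name and test values are my own, purely for illustration):
Code:
#include <stdint.h>
#include <stdio.h>

/* Exponentially weighted running average with weight 255/256,
 * done entirely in unsigned integer arithmetic. */
static uint32_t ewma_255_256(uint32_t avg, uint32_t sample)
{
    return (uint32_t)(((((uint64_t)avg) << 8U) - (uint64_t)avg + (uint64_t)sample) >> 8U);
}

int main(void)
{
    uint32_t avg = 100000;              /* say, an RTT estimate in microseconds */

    avg = ewma_255_256(avg, 120000);    /* feed in one new sample */
    printf("%u\n", (unsigned int)avg);  /* prints 100078 = (255*100000 + 120000)/256 */
    return 0;
}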
 
Old 06-25-2012, 09:35 AM   #3
ntubski
Senior Member
 
Registered: Nov 2005
Distribution: Debian, Arch
Posts: 3,508

Rep: Reputation: 1811
Quote:
Originally Posted by Nominal Animal
using a bit shift to avoid explicit 64-bit multiplication and division.
I was under the impression that the compiler could be trusted to do this kind of optimization, but I just checked with gcc 4.5.2 (32-bit), and it only avoids the division. The version with the multiplication does use fewer instructions; perhaps gcc decided it would be faster?
 
Old 06-25-2012, 09:41 PM   #4
Nominal Animal
Senior Member
 
Registered: Dec 2010
Location: Finland
Distribution: Xubuntu, CentOS, LFS
Posts: 1,723
Blog Entries: 3

Rep: Reputation: 947
Quote:
Originally Posted by ntubski
I was under the impression that the compiler could be trusted to do this kind of optimization, but I just checked with gcc 4.5.2 (32-bit), and it only avoids the division. The version with the multiplication does use fewer instructions; perhaps gcc decided it would be faster?
In normal userspace application code I never "optimize" multiplications into shifts and additions/subtractions, since profiling never points to that kind of code as a bottleneck. The compilers do a good enough job, and readability is so much more important there.

In kernel and microcontroller code this tends to matter, so we can use more complicated expressions, at the cost of requiring more skill from developers.

On x86-64 (Athlon II X4 640), using gcc -W -Wall -O3 -fomit-frame-pointer, the two functions
Code:
#include <stdint.h>

uint32_t running_average_1(const uint32_t avg, const uint32_t val)
{
    return (uint32_t)( ( (((uint64_t)avg) << 8U) - (uint64_t)avg + (uint64_t)val ) >> 8U );
}

uint32_t running_average_2(const uint32_t avg, const uint32_t val)
{
    return (uint32_t)( ( ((uint64_t)avg * (uint64_t)255U) + (uint64_t)val ) / (uint64_t)256U );
}
compile to exactly the same code:
Code:
        movl    %edi, %edi
        movl    %esi, %eax
        movq    %rdi, %rdx
        salq    $8, %rdx
        subq    %rdi, %rdx
        addq    %rdx, %rax
        shrq    $8, %rax
        ret
Adding -m32 makes gcc-4.6.3 stupid: it produces the equivalent code only for the multiplication case. gcc-4.4 -m32 -W -Wall -O3 -fomit-frame-pointer compiles the shift version to an explicit 64-bit shift sequence:
Code:
running_average_1:
	pushl	%ebx
	movl	8(%esp), %ecx
	xorl	%ebx, %ebx
	movl	%ebx, %edx
	movl	%ecx, %eax
	shldl	$8, %eax, %edx
	sall	$8, %eax
	subl	%ecx, %eax
	movl	12(%esp), %ecx
	sbbl	%ebx, %edx
	xorl	%ebx, %ebx
	addl	%ecx, %eax
	adcl	%ebx, %edx
	shrdl	$8, %edx, %eax
	popl	%ebx
	ret
and the explicit multiplication to a multiplication:
Code:
running_average_2:
	subl	$20, %esp
	movl	$255, %ecx
	movl	24(%esp), %eax
	movl	%ebx, 12(%esp)
	movl	28(%esp), %ebx
	movl	%esi, 16(%esp)
	xorl	%esi, %esi
	mull	%ecx
	addl	%eax, %ebx
	adcl	%edx, %esi
	shrdl	$8, %esi, %ebx
	movl	16(%esp), %esi
	movl	%ebx, %eax
	movl	12(%esp), %ebx
	addl	$20, %esp
	ret
with both running in exactly the same time (67 cycles, including the function call overhead) for random 32-bit unsigned inputs. (Note that gcc-4.6.3 consistently generates code that is one or two clock cycles slower!)

On the other hand, compiling it using
Code:
avr-gcc-4.5.3 -mmcu=avr2 avg.c -O3 -S
yields the following assembly:
Code:
	.file	"avg.c"
__SREG__ = 0x3f
__SP_H__ = 0x3e
__SP_L__ = 0x3d
__CCP__ = 0x34
__tmp_reg__ = 0
__zero_reg__ = 1
	.text
.global	overhead
	.type	overhead, @function
overhead:
/* prologue: function */
/* frame size = 0 */
/* stack size = 0 */
.L__stack_usage = 0
	ldi r22,lo8(0)
	ldi r23,hi8(0)
	ldi r24,hlo8(0)
	ldi r25,hhi8(0)
/* epilogue start */
	ret
	.size	overhead, .-overhead

.global	running_average_1
	.type	running_average_1, @function
running_average_1:
	push r4
	push r5
	push r6
	push r7
	push r8
	push r9
	push r10
	push r11
	push r12
	push r13
	push r14
	push r15
	push r16
	push r17
/* prologue: function */
/* frame size = 0 */
/* stack size = 14 */
.L__stack_usage = 14
	mov r4,r18
	mov r5,r19
	mov r6,r20
	mov r7,r21
	mov r8,r22
	mov r9,r23
	mov r10,r24
	mov r11,r25
	mov r18,r22
	mov r19,r9
	mov r20,r10
	mov r21,r11
	ldi r22,lo8(0)
	ldi r23,lo8(0)
	ldi r24,lo8(0)
	ldi r25,lo8(0)
	ldi r16,lo8(8)
	rcall __ashldi3
	mov r30,r18
	sub r30,r8
	ldi r26,lo8(1)
	cp r18,r30
	brlo .L3
	ldi r26,lo8(0)
.L3:
	mov r31,r19
	sub r31,r9
	ldi r18,lo8(1)
	cp r19,r31
	brlo .L4
	ldi r18,lo8(0)
.L4:
	mov r19,r31
	sub r19,r26
	mov r26,r19
	ldi r19,lo8(1)
	cp r31,r26
	brlo .L5
	ldi r19,lo8(0)
.L5:
	or r18,r19
	mov r27,r20
	sub r27,r10
	ldi r19,lo8(1)
	cp r20,r27
	brlo .L6
	ldi r19,lo8(0)
.L6:
	mov r31,r27
	sub r31,r18
	ldi r18,lo8(1)
	cp r27,r31
	brlo .L7
	ldi r18,lo8(0)
.L7:
	or r19,r18
	mov r20,r21
	sub r20,r11
	ldi r18,lo8(1)
	cp r21,r20
	brlo .L8
	ldi r18,lo8(0)
.L8:
	mov r15,r20
	sub r15,r19
	ldi r19,lo8(1)
	cp r20,r15
	brlo .L9
	ldi r19,lo8(0)
.L9:
	or r18,r19
	mov r16,r22
	sub r16,r18
	ldi r18,lo8(1)
	cp r22,r16
	brlo .L11
	ldi r18,lo8(0)
.L11:
	mov r17,r23
	sub r17,r18
	ldi r18,lo8(1)
	cp r23,r17
	brlo .L13
	ldi r18,lo8(0)
.L13:
	mov r27,r24
	sub r27,r18
	ldi r18,lo8(1)
	cp r24,r27
	brlo .L15
	ldi r18,lo8(0)
.L15:
	sub r25,r18
	mov r18,r30
	add r18,r4
	ldi r19,lo8(1)
	cp r18,r30
	brlo .L16
	ldi r19,lo8(0)
.L16:
	mov r24,r26
	add r24,r5
	ldi r20,lo8(1)
	cp r24,r26
	brlo .L17
	ldi r20,lo8(0)
.L17:
	add r19,r24
	ldi r21,lo8(1)
	cp r19,r24
	brlo .L18
	ldi r21,lo8(0)
.L18:
	or r20,r21
	mov r24,r31
	add r24,r6
	ldi r21,lo8(1)
	cp r24,r31
	brlo .L19
	ldi r21,lo8(0)
.L19:
	add r20,r24
	ldi r22,lo8(1)
	cp r20,r24
	brlo .L20
	ldi r22,lo8(0)
.L20:
	or r21,r22
	mov r24,r15
	add r24,r7
	ldi r23,lo8(1)
	cp r24,r15
	brlo .L21
	ldi r23,lo8(0)
.L21:
	add r21,r24
	ldi r22,lo8(1)
	cp r21,r24
	brlo .L22
	ldi r22,lo8(0)
.L22:
	or r22,r23
	add r22,r16
	ldi r23,lo8(1)
	cp r22,r16
	brlo .L24
	ldi r23,lo8(0)
.L24:
	add r23,r17
	ldi r24,lo8(1)
	cp r23,r17
	brlo .L26
	ldi r24,lo8(0)
.L26:
	add r24,r27
	ldi r30,lo8(1)
	cp r24,r27
	brlo .L28
	ldi r30,lo8(0)
.L28:
	add r25,r30
	ldi r16,lo8(8)
	rcall __lshrdi3
	mov r25,r21
	mov r22,r18
	mov r23,r19
	mov r24,r20
/* epilogue start */
	pop r17
	pop r16
	pop r15
	pop r14
	pop r13
	pop r12
	pop r11
	pop r10
	pop r9
	pop r8
	pop r7
	pop r6
	pop r5
	pop r4
	ret
	.size	running_average_1, .-running_average_1

.global	running_average_2
	.type	running_average_2, @function
running_average_2:
	push r6
	push r7
	push r8
	push r9
	push r10
	push r11
	push r12
	push r13
	push r14
	push r15
	push r16
	push r17
	push r29
	push r28
	in r28,__SP_L__
	in r29,__SP_H__
	sbiw r28,8
	in __tmp_reg__,__SREG__
	cli
	out __SP_H__,r29
	out __SREG__,__tmp_reg__
	out __SP_L__,r28
/* prologue: function */
/* frame size = 8 */
/* stack size = 22 */
.L__stack_usage = 22
	mov r10,r18
	mov r11,r19
	mov r12,r20
	mov r13,r21
	clr r14
	clr r15
	clr r16
	clr r17
	std Y+1,r18
	std Y+2,r11
	std Y+3,r12
	std Y+4,r13
	std Y+5,r14
	std Y+6,r15
	std Y+7,r16
	std Y+8,r17
	mov r6,r22
	mov r7,r23
	mov r8,r24
	mov r9,r25
	mov r18,r22
	mov r19,r7
	mov r20,r8
	mov r21,r9
	ldi r22,lo8(0)
	ldi r23,lo8(0)
	ldi r24,lo8(0)
	ldi r25,lo8(0)
	ldi r16,lo8(4)
	rcall __ashldi3
	mov r15,r18
	sub r15,r6
	ldi r31,lo8(1)
	cp r18,r15
	brlo .L31
	ldi r31,lo8(0)
.L31:
	mov r30,r19
	sub r30,r7
	ldi r18,lo8(1)
	cp r19,r30
	brlo .L32
	ldi r18,lo8(0)
.L32:
	mov r17,r30
	sub r17,r31
	ldi r19,lo8(1)
	cp r30,r17
	brlo .L33
	ldi r19,lo8(0)
.L33:
	or r18,r19
	mov r30,r20
	sub r30,r8
	ldi r19,lo8(1)
	cp r20,r30
	brlo .L34
	ldi r19,lo8(0)
.L34:
	mov r14,r30
	sub r14,r18
	ldi r18,lo8(1)
	cp r30,r14
	brlo .L35
	ldi r18,lo8(0)
.L35:
	or r19,r18
	mov r20,r21
	sub r20,r9
	ldi r18,lo8(1)
	cp r21,r20
	brlo .L36
	ldi r18,lo8(0)
.L36:
	mov r13,r20
	sub r13,r19
	ldi r19,lo8(1)
	cp r20,r13
	brlo .L37
	ldi r19,lo8(0)
.L37:
	or r18,r19
	mov r12,r22
	sub r12,r18
	ldi r18,lo8(1)
	cp r22,r12
	brlo .L39
	ldi r18,lo8(0)
.L39:
	mov r11,r23
	sub r11,r18
	ldi r18,lo8(1)
	cp r23,r11
	brlo .L41
	ldi r18,lo8(0)
.L41:
	mov r10,r24
	sub r10,r18
	ldi r18,lo8(1)
	cp r24,r10
	brlo .L43
	ldi r18,lo8(0)
.L43:
	mov r9,r25
	sub r9,r18
	mov r18,r15
	mov r19,r17
	mov r20,r14
	mov r21,r13
	mov r22,r12
	mov r23,r11
	mov r24,r10
	mov r25,r9
	ldi r16,lo8(4)
	rcall __ashldi3
	add r18,r15
	ldi r31,lo8(1)
	cp r18,r15
	brlo .L44
	ldi r31,lo8(0)
.L44:
	add r19,r17
	ldi r30,lo8(1)
	cp r19,r17
	brlo .L45
	ldi r30,lo8(0)
.L45:
	mov r15,r31
	add r15,r19
	ldi r31,lo8(1)
	cp r15,r19
	brlo .L46
	ldi r31,lo8(0)
.L46:
	or r30,r31
	add r20,r14
	ldi r27,lo8(1)
	cp r20,r14
	brlo .L47
	ldi r27,lo8(0)
.L47:
	mov r17,r30
	add r17,r20
	ldi r19,lo8(1)
	cp r17,r20
	brlo .L48
	ldi r19,lo8(0)
.L48:
	or r27,r19
	add r21,r13
	ldi r26,lo8(1)
	cp r21,r13
	brlo .L49
	ldi r26,lo8(0)
.L49:
	mov r13,r27
	add r13,r21
	ldi r19,lo8(1)
	cp r13,r21
	brlo .L50
	ldi r19,lo8(0)
.L50:
	or r26,r19
	add r22,r12
	ldi r30,lo8(1)
	cp r22,r12
	brlo .L51
	ldi r30,lo8(0)
.L51:
	mov r14,r26
	add r14,r22
	ldi r19,lo8(1)
	cp r14,r22
	brlo .L52
	ldi r19,lo8(0)
.L52:
	or r30,r19
	add r23,r11
	ldi r31,lo8(1)
	cp r23,r11
	brlo .L53
	ldi r31,lo8(0)
.L53:
	add r30,r23
	ldi r19,lo8(1)
	cp r30,r23
	brlo .L54
	ldi r19,lo8(0)
.L54:
	or r31,r19
	add r24,r10
	ldi r19,lo8(1)
	cp r24,r10
	brlo .L55
	ldi r19,lo8(0)
.L55:
	add r31,r24
	ldi r20,lo8(1)
	cp r31,r24
	brlo .L56
	ldi r20,lo8(0)
.L56:
	or r19,r20
	add r25,r9
	add r25,r19
	ldd r11,Y+1
	add r18,r11
	ldi r19,lo8(1)
	cp r18,r11
	brlo .L57
	ldi r19,lo8(0)
.L57:
	ldd r16,Y+2
	add r15,r16
	ldi r20,lo8(1)
	cp r15,r16
	brlo .L58
	ldi r20,lo8(0)
.L58:
	add r19,r15
	ldi r24,lo8(1)
	cp r19,r15
	brlo .L59
	ldi r24,lo8(0)
.L59:
	or r20,r24
	ldd r26,Y+3
	add r17,r26
	ldi r21,lo8(1)
	cp r17,r26
	brlo .L60
	ldi r21,lo8(0)
.L60:
	add r20,r17
	ldi r24,lo8(1)
	cp r20,r17
	brlo .L61
	ldi r24,lo8(0)
.L61:
	or r21,r24
	ldd r23,Y+4
	add r23,r13
	ldi r22,lo8(1)
	ldd r10,Y+4
	cp r23,r10
	brlo .L62
	ldi r22,lo8(0)
.L62:
	add r21,r23
	ldi r24,lo8(1)
	cp r21,r23
	brlo .L63
	ldi r24,lo8(0)
.L63:
	or r22,r24
	ldi r23,lo8(1)
	ldd r11,Y+5
	cp r14,r11
	brlo .L64
	ldi r23,lo8(0)
.L64:
	add r22,r14
	ldi r24,lo8(1)
	cp r22,r14
	brlo .L65
	ldi r24,lo8(0)
.L65:
	or r23,r24
	ldi r24,lo8(1)
	ldd r12,Y+6
	cp r30,r12
	brlo .L66
	ldi r24,lo8(0)
.L66:
	add r23,r30
	ldi r26,lo8(1)
	cp r23,r30
	brlo .L67
	ldi r26,lo8(0)
.L67:
	or r24,r26
	ldi r30,lo8(1)
	ldd r13,Y+7
	cp r31,r13
	brlo .L68
	ldi r30,lo8(0)
.L68:
	add r24,r31
	ldi r26,lo8(1)
	cp r24,r31
	brlo .L69
	ldi r26,lo8(0)
.L69:
	or r30,r26
	add r25,r30
	ldi r16,lo8(8)
	rcall __lshrdi3
	mov r25,r21
	mov r22,r18
	mov r23,r19
	mov r24,r20
/* epilogue start */
	adiw r28,8
	in __tmp_reg__,__SREG__
	cli
	out __SP_H__,r29
	out __SREG__,__tmp_reg__
	out __SP_L__,r28
	pop r28
	pop r29
	pop r17
	pop r16
	pop r15
	pop r14
	pop r13
	pop r12
	pop r11
	pop r10
	pop r9
	pop r8
	pop r7
	pop r6
	ret
	.size	running_average_2, .-running_average_2
Now, avr2 is an 8-bit instruction set, which should explain why the assembly is so long. Multi-register shifts are done in the __ashldi3() and __lshrdi3() gcc support functions. Even without going into the details, the code with the explicit shifts (running_average_1) is shorter and simpler: just 20 conditional jumps. In comparison, the code with the explicit multiplication (running_average_2) is a lot longer and more complex, with something like 36 conditional jumps.

Using hand assembly on an 8-bit architecture, all you really need is the logical equivalent of
Code:
# avg[0] is the low byte of avg, avg[3] is the highest byte
# val[0] is the low byte of val, val[3] is the highest byte
# res[0] is the low byte of result, res[3] is the highest byte
set a0 = 0
set a1 = avg[0]
set a2 = avg[1]
set a3 = avg[2]
set a4 = avg[3]

add val[0] to a0
add val[1] and carry to a1
add val[2] and carry to a2
add val[3] and carry to a3
add 0 and carry to a4

subtract avg[0] from a0
subtract avg[1] and carry from a1
subtract avg[2] and carry from a2
subtract avg[3] and carry from a3
subtract 0 and carry from a4

set res[0] = a1
set res[1] = a2
set res[2] = a3
set res[3] = a4
where carry is the carry flag (or borrow, for the subtractions). Practically all CPUs and microcontrollers have a carry flag, and addition and subtraction with and without carry are usually separate mnemonics. On an avr2 MCU, the above takes roughly twenty instructions and is at least an order of magnitude faster than the compiled C code.
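If you want to prototype that byte-wise logic in C first, a minimal sketch could look like the following (my own illustration; it cannot match hand assembly, because plain C has no way to use the hardware carry flag directly):
Code:
#include <stdint.h>
#include <stdio.h>

/* Byte-wise (avg*256 - avg + val) / 256 in a five-byte accumulator,
 * mimicking add-with-carry and subtract-with-borrow instructions. */
static uint32_t ewma_bytewise(uint32_t avg, uint32_t val)
{
    uint8_t a[5];
    unsigned int carry, i;

    a[0] = 0;                               /* a[] holds avg shifted left by one byte */
    for (i = 0; i < 4; i++)
        a[i + 1] = (uint8_t)(avg >> (8 * i));

    carry = 0;                              /* add val, propagating the carry upwards */
    for (i = 0; i < 4; i++) {
        unsigned int sum = a[i] + (uint8_t)(val >> (8 * i)) + carry;
        a[i] = (uint8_t)sum;
        carry = sum >> 8;
    }
    a[4] = (uint8_t)(a[4] + carry);

    carry = 0;                              /* subtract avg, propagating the borrow */
    for (i = 0; i < 4; i++) {
        int diff = (int)a[i] - (int)(uint8_t)(avg >> (8 * i)) - (int)carry;
        a[i] = (uint8_t)diff;
        carry = (diff < 0);
    }
    a[4] = (uint8_t)(a[4] - carry);

    /* dropping a[0] is the final division by 256 */
    return (uint32_t)a[1] | ((uint32_t)a[2] << 8) |
           ((uint32_t)a[3] << 16) | ((uint32_t)a[4] << 24);
}

int main(void)
{
    /* should print 100078 = (255*100000 + 120000)/256 */
    printf("%u\n", (unsigned int)ewma_bytewise(100000, 120000));
    return 0;
}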

It is such a drastic improvement over the C code that I'm wondering whether I should have just used the explicit multiplication after all: on limited architectures like this, the code is better written in hand assembly anyway (perhaps inlined in the C source files).

To be honest, I've only ever seen the division case being handled stupidly by the compiler, never the multiplication case. (If you ever run into a reference to __udivdi3(), you'll know what I mean.) That is why I used the explicit shift. Now I think I probably should not have, since the multiplication code is much more readable, and just as good for the generic cases. On the difficult architectures like avr2, writing it in inline assembly gives you an order of magnitude simpler and faster code, so tweaking the little bits is insignificant in comparison.

In other words, a very good point, ntubski. I guess it was an ill-advised premature optimization on my part.
 
Old 06-25-2012, 09:52 PM   #5
geewhan
LQ Newbie
 
Registered: Mar 2012
Posts: 16

Original Poster
Rep: Reputation: Disabled
I have a question about the first suggestion you posted, Nominal Animal.

The code

Code:
avgRTT = (u32)(((((u64)avgRTT) << 8U) - (u64)avgRTT + (u64)rtt) >> 8U);
This only covers integers, right? Not floating-point numbers.

Since 255/256 = 0.99609375 and 1/256 = 0.00390625, how can unsigned bits represent a fraction?

I understand that shifting left means multiplying and shifting right means dividing, but that is just for integers, correct?

Or am I wrong, and I should feel really dumb for missing something here.

Thanks
 
Old 06-25-2012, 10:29 PM   #6
sundialsvcs
LQ Guru
 
Registered: Feb 2004
Location: SE Tennessee, USA
Distribution: Gentoo, LFS
Posts: 9,078
Blog Entries: 4

Rep: Reputation: 3177
The floating-point hardware of the CPU is fast and reliable: if in your estimation you need to use it in kernel-level code, do so.

Here's why the kernel_fpu_begin/end calls are necessary: the operating system is "lazy" about saving and restoring the floating-point context. Doing so is slightly expensive, and ... gosh darn it ... in a useful number of cases the system might "bounce" into kernel mode, do something that, as it happens, doesn't involve any floating point at all, and then "bounce" right back to running the same thread/process as before. No floating-point work was done, so the floating-point context could simply have remained exactly as it was throughout, and it is a usefully beneficial optimization for that to be the case. "Floating-point kernel code" happens to be the exception rather than the rule. But if you need it, "just do it."
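Tying that back to the original question, a minimal sketch of the bracketing pattern might look like this (my illustration only, not tested against any particular kernel tree; it assumes an x86 build where this file is allowed to emit floating-point instructions, and that <asm/i387.h> declares kernel_fpu_begin()/kernel_fpu_end(), as it did in kernels of that era):
Code:
#include <linux/types.h>
#include <asm/i387.h>   /* kernel_fpu_begin() / kernel_fpu_end() on x86 */

/* Hypothetical helper: weighted average of avg and rtt using the FPU.
 * Declarations go before kernel_fpu_begin(); the FPU region is kept as
 * short as possible, and nothing inside it may sleep or schedule. */
static u32 weighted_avg_fpu(u32 avg, u32 rtt)
{
	double result;

	kernel_fpu_begin();
	result = 0.99609375 * (double)avg + 0.00390625 * (double)rtt;
	kernel_fpu_end();

	return (u32)result;
}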

Last edited by sundialsvcs; 06-25-2012 at 10:31 PM.
 
Old 06-25-2012, 11:10 PM   #7
Nominal Animal
Senior Member
 
Registered: Dec 2010
Location: Finland
Distribution: Xubuntu, CentOS, LFS
Posts: 1,723
Blog Entries: 3

Rep: Reputation: 947
Quote:
Originally Posted by geewhan
Code:
avgRTT = (u32)(((((u64)avgRTT) << 8U) - (u64)avgRTT + (u64)rtt) >> 8U);
This only covers the integers right?
It works for 32-bit unsigned integers, or for any fixed-point unsigned 32-bit type. You can use
Code:
avgRTT = (avgRTT * 255 + rtt) >> 8;
for 24-bit signed or unsigned integers if the variables are of u32/i32 type, and for 56-bit signed or unsigned integers if the variables are of u64/i64 type.
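To see where the 24-bit limit comes from: with 24-bit inputs, the largest intermediate value is 255*(2^24 - 1) + (2^24 - 1) = 256*(2^24 - 1) = 0xFFFFFF00, which still fits in a u32. Give the inputs one more bit and the sum no longer fits, which is why the full 32-bit range needs the 64-bit casts above.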

For example, if you know you need six decimal digits on the right side, i.e. x.yyyyyy, you can simply multiply the value by a million when storing it in the integer. In the 24-bit case, your unsigned range is [0, 16.777215], and the signed range is [-8.388608, 8.388607]. In the 56-bit case, your unsigned range is [0, 72057594037.927935], and the signed range is [-36028797018.963968, 36028797018.963967]. To read a fixed-point value from a string, you read the two integers on either side of the decimal point, x.y, taking only up to six initial digits for y. If you get fewer than six digits for y, append zeros (multiply by ten) until you have used six (there is a small parsing sketch further below). Then,
Code:
value = x * 1000000 + y;
gives you the fixed-point value. To output it, you can use (a variation of)
Code:
if (value < 0)
    printf("-%u.%06u", (unsigned int)(-value) / 1000000U,
                       (unsigned int)(-value) % 1000000U);
else
    printf("%u.%06u", (unsigned int)(value) / 1000000U,
                      (unsigned int)(value) % 1000000U);
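As an illustration of the parsing rule described above, here is a minimal userspace sketch (hypothetical function name; a real implementation would also need overflow and error handling):
Code:
#include <stdio.h>

/* Parse "x.y" into a fixed-point value scaled by 1,000,000,
 * reading at most six fractional digits and padding with zeros. */
static long parse_fixed6(const char *s)
{
    long x = 0, y = 0;
    int negative = 0, digits = 0;

    if (*s == '-') {
        negative = 1;
        s++;
    }

    while (*s >= '0' && *s <= '9')              /* integer part */
        x = x * 10 + (*s++ - '0');

    if (*s == '.') {
        s++;
        while (*s >= '0' && *s <= '9' && digits < 6) {
            y = y * 10 + (*s++ - '0');          /* up to six fractional digits */
            digits++;
        }
    }
    while (digits++ < 6)                        /* pad short fractions with zeros */
        y *= 10;

    return negative ? -(x * 1000000L + y) : (x * 1000000L + y);
}

int main(void)
{
    printf("%ld\n", parse_fixed6("3.14"));      /* prints 3140000 */
    printf("%ld\n", parse_fixed6("-0.000001")); /* prints -1 */
    return 0;
}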
You can use any multiplier you choose, even one that is only known at runtime. Powers of two are very common, but for times, powers of ten are much simpler. In the kernel, nanoseconds (nine decimals) seem to be used quite widely; the 56-bit variant gives you an 833-day range. A full 64 bits gives 580 years or so at nanosecond precision, but remember that the shift needs the extra eight bits. Microseconds (six decimals) seem to be used in networking; the 24-bit variant gives a range of about 16 seconds. For milliseconds (three decimals), the 24-bit variant gives a range of about four hours forty minutes.
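(For instance, 2^24 microseconds = 16,777,216 us, which is about 16.8 seconds, and 2^56 nanoseconds is roughly 7.2 * 10^16 ns, just over 833 days; the other ranges follow from the same arithmetic.)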

Parsing fixed-point values whose multiplier is not a power of ten gets a bit tedious, unless you accept some error in the least significant digit. In practice, I just parse as if the scaling were the next power of ten, multiply by the actual multiplier, and divide by the power of ten I used. It introduces a bit of rounding error, though, and it often requires 64-bit division (by a power of ten) using __udivdi3(), if that says anything to you.

The reason I'd prefer you to avoid floating point is that such code is extremely difficult to get into upstream kernels. The first thing you'll be told is to convert the math to fixed point. If you refuse, your code will only run on a limited number of architectures -- a lot of the embedded ones have no hardware floating-point support at all, and just cannot use floating-point kernel code. If you work with kernel code, why not do it in a way that gives you the skills necessary to be a Linux kernel programmer? Anybody can add their own code to the Linux kernel, but not everybody can get their code accepted upstream.

Kernel coding is not about writing code that works, it is about writing code that works and is acceptable to the maintainers/community.
 
  

