c++ sized type (word)

PatrickNew · 02-19-2008, 11:14 AM

Okay, so I know that we cannot rely on 'int' to have any particular size. If you need it to be exactly 32 bits, use int32_t (or the like).

Here's my question - what if I need a type whose size is exactly equal to the word size of the processor? On a 32-bit machine, it should be 32 bits, 64 on 64-bit, 8 bits on a little PIC, etc.

Is 'int' guaranteed to be the word size? I know it usually works out that way, but I'm really looking for portability.

osor · 02-19-2008, 01:53 PM

Quote:

Originally Posted by PatrickNew

Here's my question - what if I need a type whose size is exactly equal to the word size of the processor? On a 32-bit machine, it should be 32 bits, 64 on 64-bit, 8 bits on a little PIC, etc.

Word size (at least in this context) is generally sizeof(void*). By the C standard, the types intptr_t and uintptr_t should satisfy this requirement.

Note that these types are optional in the C99 standard, required by the POSIX standard, and used by the Windows API (so presumably required on Win32), but they are not in the current C++ standard. By most predictions, they will be in C++0x (the next C++ standard).

ta0kira · 02-19-2008, 02:58 PM

AFAIK, that will be based on the kernel's pointer size, meaning a 32-bit kernel on a 64-bit machine will have a 32-bit pointer size.
ta0kira

tuxdev · 02-19-2008, 03:02 PM

That's exactly what int is supposed to mean, the native CPU word size. I'm not sure, but I think x86_64 uses a 32-bit int for backwards compatibility; in that case you could kind of see that processor as having two native word sizes, for some warped definition of "native".

PatrickNew · 02-19-2008, 03:19 PM

Quote:

Originally Posted by tuxdev

That's exactly what int is supposed to mean, the native CPU word size.

Got it, thanks to all. I knew that held in general, but I didn't know if there was any guarantee.

ta0kira · 02-19-2008, 03:41 PM

Quote:

Originally Posted by tuxdev

That's exactly what int is supposed to mean, the native CPU word size. I'm not sure, but I think x86_64 uses a 32-bit int for backwards compatibility; in that case you could kind of see that processor as having two native word sizes, for some warped definition of "native".

Are you saying a 32-bit compiler on a 64-bit machine will give you a 64-bit int? I use 32-bit Slackware 12 on AMD64, and my pointer size is 32 bits.

Code:

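#include <stdio.h>
#include <inttypes.h> /* also provides <stdint.h>, which declares intptr_t */

int main(void)
{
    /* sizeof yields a size_t, so print it with %zu (C99), not %i */
    fprintf(stderr, "%zu %zu %zu\n", sizeof(int), sizeof(void*), sizeof(intptr_t));
    return 0;
}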

They make it that way so a program can run on both, but object sizes have to be fixed at compile time.

PatrickNew · 02-19-2008, 03:57 PM

As I understand the distinction, the x86 / x86_64 is a special case, because the designers of x86_64 intentionally made it backwards compatible. If my understanding is correct, one can run (on x86_64 hardware) a 32-bit program on a 32-bit OS, a 32-bit program on a 64-bit OS, and a 64-bit program on a 64-bit OS. If the OS is 32 bit, then 32 bits is indeed the size I want.

Essentially, what I really want is the largest size for which I can be relatively certain that there is a fairly direct mapping between my code and the generated assembly. I'm trying to move data around memory as fast as I can. If I use a type that is too small, the compiler has to do a bunch of work to adjust my variables to the word size, alignment, etc. If it's too big, the compiler will generate code to split it into two smaller types, which is no good either. I would just write it in assembly, except that I would like it to be portable.

EDIT:
And yes, I know about memcpy(), memmove(), and friends. I'm doing some pretty low-level stuff, and I need a bit more flexibility, such as the ability to bail out midway, etc.

osor · 02-19-2008, 04:28 PM

Quote:

Originally Posted by ta0kira

Are you saying a 32-bit compiler on a 64-bit machine will give you a 64-bit int?

It depends on the arch. On an ILP64 arch, yes; on an LP64 arch, no. That is what the names mean: ILP64 makes int, long, and pointers 64 bits, while LP64 makes only long and pointers 64 bits. On *nix, x86-64 is LP64, but on Windows x86-64 is LLP64 (both int and long are 32 bits, while long long and pointers are 64 bits).

Quote:

Originally Posted by PatrickNew

As I understand the distinction, the x86 / x86_64 is a special case, because the designers of x86_64 intentionally made it backwards compatible. If my understanding is correct, one can run (on x86_64 hardware) a 32-bit program on a 32-bit OS, a 32-bit program on a 64-bit OS, and a 64-bit program on a 64-bit OS.

I think this is true of other arches such as ppc64 vs. ppc and sparc64 vs. sparc32.

Also, there are other things to take into account besides pointer size. For example, some arches have vector ops that operate on two to four intptr_t-sized values at a time (e.g., SSE on x86 and AltiVec on ppc).

osor · 02-19-2008, 04:52 PM

Btw, I forgot to say that most modern 64-bit Unix arches use the LP64 model (this includes sparc64, ia64, ppc64, Alpha, HPPA, and x86-64). The only ILP64 machines I can think of are Cray variants.

Dan04 · 02-20-2008, 12:13 AM

Quote:

Originally Posted by PatrickNew

As I understand the distinction, the x86 / x86_64 is a special case, because the designers of x86_64 intentionally made it backwards compatible.

There's also the fact that

char = 8 bit
short = 16 bit
int = 32 bit
long = 64 bit

gives a nice, neat one-to-one correspondence between integer sizes and C keywords. If int were 64 bits, then you'd need a name for int32_t: "short" would work, but then what would you call int16_t? "short short"?

PatrickNew · 02-20-2008, 12:25 AM

Quote:

Originally Posted by Dan04

There's also the fact that

char = 8 bit
short = 16 bit
int = 32 bit
long = 64 bit

Maybe on your machine, but a quick test program assures me that on mine, long is 32 bit. Now, a long long is guaranteed to be 64 bits, but none of the others are guaranteed.

Here's a trivia question for anyone who knows: char is not required to be 8 bits, but has it been implemented as anything else recently (within the last 15 years)?

ta0kira · 02-20-2008, 03:40 AM

I believe it's been implemented as 7 bits. AFAIK unsigned char needs to be 8 bits, at least in C++. Could be wrong.

dmail · 02-20-2008, 04:46 AM

Quote:

Originally Posted by ta0kira

I believe it's been implemented as 7 bits. AFAIK unsigned char needs to be 8 bits, at least in C++. Could be wrong.

signed char, unsigned char, and char all have a sizeof of 1, as defined by the C++ standard.
edit:
And just for confirmation

Quote:

The sizeof operator yields the number of bytes in the object representation of its operand. The operand is either an expression, which is not evaluated, or a parenthesized type-id. The sizeof operator shall not be applied to an expression that has function or incomplete type, or to an enumeration type before all its enumerators have been declared, or to the parenthesized name of such types, or to an lvalue that designates a bit-field. sizeof(char), sizeof(signed char) and sizeof(unsigned char) are 1; the result of sizeof applied to any other fundamental type (3.9.1) is implementation-defined. [Note: in particular, sizeof(bool) and sizeof(wchar_t) are implementation-defined (footnote 69). ] [Note: See 1.7 for the definition of byte and 3.9 for the definition of object representation. ]

osor · 02-20-2008, 10:45 AM

Quote:

Originally Posted by PatrickNew

Maybe on your machine, but a quick test program assures me that on mine, long is 32 bit. Now, a long long is guaranteed to be 64 bits, but none of the others are guaranteed.

I don’t think Dan meant that was unconditionally true. He meant that if it were true (i.e., an LP64 arch), then there would be a nice 1-1 correspondence between standard C integer types and common integer sizes.

Btw, long long is guaranteed to be at least 64 bits.