run-time problems due to migration from x86 to x86_64

aryan1 · 01-25-2010, 06:01 AM

Hi All,

I recently upgraded my OS to its 64-bit version, namely Ubuntu 9.10 64-bit.

After that, I began to experience anomalies (missing output data, infinite loops, etc) when executing my C++ application, which was previously running without any problems on x86.

Even though there is no application crash, I suspect that some third-party libraries along with even my own sources have problems due to the incompatibilities in data type lengths between x86 and x86_64.

I am in a situation where debugging the sources to find out about the real cause of the problem is difficult...

I want to hear about your suggestions.

Can I build my application using GCC compiler options with x86 settings on a x86_64 platform ?

Even if I am allowed to do that, is it really the right thing to do ? What might be the negative side-effects ?

Thanks.

jf.argentino · 01-25-2010, 06:46 AM

Quote:

Even though there is no application crash, I suspect that some third-party libraries along with even my own sources have problems due to the incompatibilities in data type lengths between x86 and x86_64.

I'm quite sure this is the problem: assuming a data type's size and/or maximum value is a common source of bugs when porting an application. When you need to know the exact size, you _MUST_ use the integer types defined in stdint.h (a standard C header).

Quote:

I am in a situation where debugging the sources to find out about the real cause of the problem is difficult...

Well, you can try a prayer to the C++ god, but I think (s)he's quite busy these days...

Quote:

Can I build my application using GCC compiler options with x86 settings on a x86_64 platform ?

AFAIK you'll also need 32-bit versions of all the libraries your software uses. Depending on your distro, you can install some of them on an x86_64 system; they will live in a separate path (/lib32, /usr/lib32... in Fedora).

Quote:

Even if I am allowed to do that, is it really the right thing to do ?

IMHO, it's not, since you won't take advantage of the 64-bit system. Worse, 32-bit systems may one day become obsolete on regular computers, and then you'll face the same problem. The longer you wait to fix something, the harder it is to correct, and the less testing the new version gets to uncover potential new bugs...
This is an academic point of view. As I have never played this kind of game, I'm not aware of "real" and immediate side effects that could arise...

aryan1 · 01-25-2010, 07:04 AM

Quote:

Originally Posted by jf.argentino

When you need to know the exact size, you _MUST_ use the integer types defined in stdint.h (a standard C header)

One of the bugs, which was also caused by the migration to x86_64, that I recently fixed was due to an infinite loop where the loop condition was comparing an "unsigned int" (32-bit long on x86, 32-bit long on x86_64) to std::size_t (32-bit long on x86, 64-bit long on x86_64).

"unsigned int" is defined in stdint.h, however, it still caused me problem. Hence, I replaced that "unsigned int" variable with size_t, and it resolved my problem.

So, in this case, based on what you say, using size_t in sources is not always a good idea ? ...

jf.argentino · 01-25-2010, 07:51 AM

Quote:

"unsigned int" is defined in stdint.h

Are you sure? The only unsigned types I can find in stdint.h are uint8_t, uint16_t, uint32_t and uint64_t (you can add the "least" and "fast" families, but I never play with them).

Quote:

I replaced that "unsigned int" variable with size_t, and it resolved my problem.

This is a different problem: you need to specify the number of bits in a type (and so use stdint.h) when using the ntohX and htonX functions, for example. I agree these function names are not coherent with the fact that the sizes of short and long depend on the platform; it would be better to name them hton16, hton32..., but it's too late for that.
When you use "size_t", you don't care about the _REAL_ size of the underlying type; you care that the type is usable for an offset / dimension on the platform you're using.

To summarize:
you must use the stdint.h family of types when you need a type with an absolute size (for exchanging data over the network or with hardware)
you must use size_t, ptrdiff_t... when you need a type whose size depends on the platform.
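
As a minimal sketch of that rule (put_length is a hypothetical helper, not something from this thread): the in-memory length is a std::size_t, and only the value written to the wire gets a fixed width.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <stdexcept>

// Hypothetical helper: writes an in-memory length (std::size_t, platform
// sized) as exactly four big-endian bytes (std::uint32_t, fixed size).
void put_length(std::size_t length, unsigned char* out) {
    assert(out != nullptr);
    // On x86_64 a std::size_t can hold values that do not fit on the wire.
    if (length > std::numeric_limits<std::uint32_t>::max()) {
        throw std::length_error("length does not fit in 32 bits");
    }
    const std::uint32_t wire = static_cast<std::uint32_t>(length);
    out[0] = static_cast<unsigned char>(wire >> 24);
    out[1] = static_cast<unsigned char>((wire >> 16) & 0xFF);
    out[2] = static_cast<unsigned char>((wire >> 8) & 0xFF);
    out[3] = static_cast<unsigned char>(wire & 0xFF);
}
```

Writing the bytes one by one also sidesteps the byte-order question that the htonX functions exist to solve.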

johnsfine · 01-25-2010, 07:55 AM

Quote:

Originally Posted by aryan1

After that, I began to experience anomalies (missing output data, infinite loops, etc) when executing my C++ application, which was previously running without any problems on x86.

In my experience, most of those come from coding tricks that assume an int and a pointer are the same size. Many programs use unions or casts to convert between ints and pointers.
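
A small sketch of why that trick breaks, using hypothetical helper names: on x86 an unsigned int holds every address, while on x86_64 it keeps only the low 32 bits. std::uintptr_t from stdint.h is the integer type intended for holding a pointer value.

```cpp
#include <cassert>
#include <cstdint>

// True when an address value survives being squeezed through unsigned int,
// which is what the old "pointer in an int" tricks silently rely on.
bool fits_in_unsigned_int(std::uintptr_t address) {
    const unsigned int narrow = static_cast<unsigned int>(address);
    return static_cast<std::uintptr_t>(narrow) == address;
}

// Portable alternative: convert through std::uintptr_t instead of int.
std::uintptr_t to_integer(const void* p) {
    return reinterpret_cast<std::uintptr_t>(p);
}

const void* to_pointer(std::uintptr_t address) {
    return reinterpret_cast<const void*>(address);
}

// Usage sketch: the round trip through std::uintptr_t loses nothing.
void check_round_trip(const void* p) {
    assert(to_pointer(to_integer(p)) == p);
}
```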

Quote:

I am in a situation where debugging the sources to find out about the real cause of the problem is difficult.

It usually is for that kind of bug.

Quote:

Can I build my application using GCC compiler options with x86 settings on a x86_64 platform ?

Yes. Directly compiling 32 bit code from source should be no problem. But you may need to put a little extra effort into getting the right libraries.

Most of the 32 bit libraries you are likely to need are available in one of the packages designed as 32 bit library packages for 64 bit Debian systems. More obscure libraries you might need are only available as 32 bit packages. I don't know the best way to install a 32 bit Debian package on 64 bit Ubuntu (assuming name conflicts with some 64 bit package you also have), but it can be done.

Quote:

Even if I am allowed to do that, is it really the right thing to do ? What might be the negative side-effects ?

I think it is best in your situation. There might be some performance or capacity benefits you miss by keeping your code 32 bit. But there might be performance gained by keeping it 32 bit.

In a perfect world, finding and fixing the bad code that makes your application non portable is a more important task. In the real world, you need to budget your time.

Quote:

Originally Posted by aryan1

One of the bugs, which was also caused by the migration to x86_64, that I recently fixed was due to an infinite loop where the loop condition was comparing an "unsigned int" (32-bit long on x86, 32-bit long on x86_64) to std::size_t (32-bit long on x86, 64-bit long on x86_64).

More detail might help us understand the kind of bug to expect from whoever wrote that code. If the values fit in unsigned int, there should be nothing wrong with comparing to size_t.

I've found and fixed many bugs where someone stuffed a negative special value (error code etc.) into both an unsigned int and a size_t and then compared. The result is unequal because negative values don't really "fit" in unsigned int. I've even seen a few that stuffed a negative offset into an unsigned int and then added it to a size_t that always had a positive value big enough for the algebraic result to be positive. Try that and you'll see the result is correct on x86 but 2**32 too high on x86_64.
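
Here is a minimal sketch of that negative-offset case. The function names and the offset of -4 are made up for illustration.

```cpp
#include <cassert>
#include <cstddef>

// Buggy: -4 stored in an unsigned int becomes UINT_MAX - 3, and it is
// zero-extended (not sign-extended) when widened to a 64-bit std::size_t.
std::size_t step_back_buggy(std::size_t position) {
    const unsigned int offset = static_cast<unsigned int>(-4);
    return position + offset;
}

// Fixed: keep the offset in a signed type that is as wide as std::size_t.
std::size_t step_back_fixed(std::size_t position) {
    assert(position >= 4);
    const std::ptrdiff_t offset = -4;
    return static_cast<std::size_t>(static_cast<std::ptrdiff_t>(position) + offset);
}
```

With a 32-bit std::size_t the buggy version wraps around and returns 96 for a position of 100. With a 64-bit std::size_t nothing wraps and it returns 96 + 2**32.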

Quote:

"unsigned int" is defined in stdint.h, however, it still caused me problem. Hence, I replaced that "unsigned int" variable with size_t, and it resolved my problem.

So, in this case, based on what you say, using size_t in sources is not always a good idea ? ...

I'm not sure I understand what you're asking.

As I described above, it is generally safe to mix uses of unsigned int with size_t for true positive numbers that fit in unsigned int. It is not safe to mix them for error codes (std::string::npos etc.) or other tricks that stuff negative values into unsigned variables.

In each place you should not mix them, that means picking one or the other. I don't think size_t is either uniformly better or uniformly worse for all possible places that there is a reason not to mix them.

I think most language purists would lean toward using more std::size_t and less unsigned int because of these issues. For example, the std::string functions that might return npos are documented as returning size_t. If you put the function return in an unsigned int and then compare it to npos, it will never be equal to npos. That is just a bug of not storing the return value in the correct type.
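
A short sketch of that npos bug, with hypothetical function names. The only difference between the two versions is the type that receives the return value of find.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Buggy on x86_64: npos is truncated to 0xFFFFFFFF when stored, then widened
// back with zero bits on top, so it no longer equals std::string::npos.
bool is_missing_buggy(const std::string& text, char c) {
    const unsigned int pos = static_cast<unsigned int>(text.find(c));
    return pos == std::string::npos;
}

// Fixed: store the result in the type the function actually returns.
bool is_missing_fixed(const std::string& text, char c) {
    const std::string::size_type pos = text.find(c);
    return pos == std::string::npos;
}

// Usage sketch.
void check_npos() {
    assert(is_missing_fixed("abc", 'z'));
    assert(!is_missing_fixed("abc", 'b'));
}
```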

In the inner loops where you really care about every little speck of performance, unsigned int is the best data type for x86_64, especially for use as an index into anything like an array or vector. Either int or std::size_t tends to generate slightly larger code that likely runs a little slower in key inner loops.

I even use uint to get the return value from all those std::string functions (because I'm in the habit of always coding for efficiency). The compiler and options I use will warn about (x==npos) when x was declared uint, and I change that to (x==(uint)npos). I can't say I really recommend copying that style. The efficiency gains are modest. I have some other reasons for coding that way that likely would not apply to you.
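
For reference, a sketch of what that (x==(uint)npos) style looks like. The function name is hypothetical, and the added size check is my assumption about when the truncated comparison is safe.

```cpp
#include <cassert>
#include <limits>
#include <string>

// Both sides of the comparison are truncated to unsigned int, so the test
// gives the same answer on x86 and x86_64.
bool is_missing_truncated(const std::string& text, char c) {
    // Assumption: the string is short enough that no real index can be
    // mistaken for the truncated npos value.
    assert(text.size() < std::numeric_limits<unsigned int>::max());
    const unsigned int pos = static_cast<unsigned int>(text.find(c));
    return pos == static_cast<unsigned int>(std::string::npos);
}
```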

aryan1 · 01-25-2010, 09:11 AM

Quote:

Originally Posted by johnsfine

In my experience, most of those come from coding tricks that assume an int and a pointer are the same size. Many programs use unions or casts to convert between ints and pointers.

I agree that, at least, some of the problems that I am experiencing might be caused by this since I am casting from u_char* to C-style structs in different fragments of the code, and these structs might have members with varying sizes depending on the underlying platform, not absolute ones. To fix this, I will try to replace them by their equivalents in stdint.h
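
A rough sketch of that replacement, with hypothetical struct and field names since the real structs are not shown here. Copying with memcpy instead of casting the u_char* also avoids alignment problems.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Platform-dependent layout: on Linux, unsigned long is 4 bytes on x86 and
// 8 bytes on x86_64, so this struct changes size between the two builds.
struct LegacyHeader {
    unsigned long id;
    unsigned short kind;
};

// Fixed layout: every member has an explicit width from stdint.h.
struct FixedHeader {
    std::uint32_t id;
    std::uint16_t kind;
    std::uint16_t reserved;
};
static_assert(sizeof(FixedHeader) == 8, "FixedHeader must be 8 bytes on every target");

// Copies the bytes into the struct instead of casting the buffer pointer.
FixedHeader read_header(const unsigned char* buffer) {
    assert(buffer != nullptr);
    FixedHeader header;
    std::memcpy(&header, buffer, sizeof header);
    return header;
}
```

Fixed-width members pin down the size of each field but not the byte order, so data that comes from another machine still needs the usual ntohX handling.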

Quote:

I think it is best in your situation. There might be some performance or capacity benefits you miss by keeping your code 32 bit. But there might be performance gained by keeping it 32 bit.

I used -m32 option. However, for example, when linking my application sources to a static library that I wrote, I got the following error message:

Code:

/usr/bin/ld: skipping incompatible libclassreg.a when searching for -lclassreg
/usr/bin/ld: cannot find -lclassreg
collect2: ld returned 1 exit status

In the above code, libclassreg.a was compiled using g++ with -Wall -g -fPIC -m32 -W -c options, and my application was compiled using the same options as libclassreg.a; however, it was linked against libclassreg.a using g++ without any 32-bit specific option.

Should I also add -m32 option to linker to link against libclassreg.a ?

Quote:

As I described above, it is generally safe to mix uses of unsigned int with size_t for true positive numbers that fit in unsigned int. It is not safe to mix them for error codes (std::string::npos etc.) or other tricks that stuff negative values into unsigned variables.

I agree that this is exactly what is happening. However, I got this thing (a mixture of them for std::string error codes) working in 32-bit OS...
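
The actual loop is not shown in this thread, so the following is only one hypothetical shape it could have had. The iteration cap is added purely so that the sketch terminates.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Counts how often the search loop runs, giving up after `cap` iterations.
// Pos is the type used to store the result of std::string::find.
template <typename Pos>
std::size_t count_hits(const std::string& text, char c, std::size_t cap) {
    assert(cap > 0);
    std::size_t hits = 0;
    Pos pos = static_cast<Pos>(text.find(c));
    while (pos != std::string::npos && hits < cap) {
        ++hits;
        // With a 32-bit Pos on x86_64, a stored npos is 0xFFFFFFFF, which
        // passes the test above; pos + 1 then wraps to 0 and the search
        // starts over from the beginning of the string.
        pos = static_cast<Pos>(text.find(c, pos + 1));
    }
    return hits;
}
```

Called as count_hits<std::size_t>("a,b", ',', 1000) this returns 1. With unsigned int as Pos it also returns 1 on a 32-bit build, but on x86_64 it runs until the cap is reached.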

johnsfine · 01-25-2010, 09:29 AM

Quote:

Originally Posted by aryan1

when linking my application sources to a static library that I wrote, I got the following error message:

Do you use a gcc command to link or an ld command?

I always use a gcc command to link, rather than a ld command, because it tends to be easier to get the options right and get it to find all the right libraries.

I do always use -m32 in the gcc command for linking 32 bit images on a 64 bit platform. I'm not sure whether it is always necessary or whether gcc and/or the linker can figure that out from the type of the first .o file.

Quote:

/usr/bin/ld: skipping incompatible libclassreg.a

I don't know whether that means it found a 64 bit one when looking for 32 bit or vice versa or what.

I assume libclassreg.a exists somewhere on your system as a 64 bit library. Are you sure it isn't finding that one and failing to find the 32 bit one you say you built?

Quote:

In the above code, libclassreg.a was compiled using g++ with -Wall -g -fPIC -m32 -W -c options,

Why -fPIC? Are you linking it into a .so file?

Quote:

I agree that this is exactly what is happening. However, I got this thing (a mixture of them for std::string error codes) working in 32-bit OS...

On 32 bit std::size_t and unsigned int are the same size so any mixture of them will work if either one of them would have worked unmixed.

If you found one bug comparing a uint to npos, did you then grep the entire source code for all occurrences of npos and glance at each to see if they were portable? That is the right way to attack these problems. Don't try or expect to find every portability bug by debugging. When you find a portability bug by debugging, do a text search of the source code for that same bug anywhere else it might appear.

aryan1 · 01-26-2010, 02:02 AM

Quote:

Originally Posted by johnsfine

Do you use a gcc command to link or an ld command?

I use gcc.

Quote:

I assume libclassreg.a exists somewhere on your system as a 64 bit library. Are you sure it isn't finding that one and failing to find the 32 bit one you say you built?

I don't think so. I use the options that I mentioned in my previous message to build the single object file which goes into libclassreg.a, and then use "ar" command with -cvq option to create the static library.

After doing this, the first shared object which tries to link against libclassreg.a, using linker options -shared -Wl,-soname, generates that "skipping incompatible..." error. I double-checked the path specified for libclassreg.a in that error message, and it is the path for 32-bit libclassreg.a library...(Before checking the path, I cleaned all object files along with static and dynamic libraries (.so files) from the source tree and rebuilt the application)

Quote:

Why -fPIC? Are you linking it into a .so file?

Yes.