help: wide character ?inconsistencies?

maxreason · 01-27-2008, 08:28 PM

I am porting a large 3D simulation/graphics engine from windoze to fedora8 linux (the 32-bit version now, 64-bit thereafter) via the eclipse IDE (gcc, I presume).

All this code (about 43 zillion lines) is written with a wide character data-type called "cXX" in the code. The cXX data-type is a typedef to either "c16" or "c32" depending on the size of the OS wide character (16-bit or 32-bit wide characters), where "c16" is typedef for "unsigned short" and "c32" is typedef for "unsigned long" (but can tolerate "unsigned int" or "unsigned long int"). The only thing the code requires is --- the wide character type must be unsigned. I mean, after all, who wants negative characters? And what [reasonable] purpose does dividing the range of characters into positive and negative ranges serve, after all? At least that was probably my thinking long ago in a galaxy far, far away when I wrote the code and made this assumption. And boy did I make that assumption in spades!
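A minimal sketch of the typedef scheme described above, not the actual header from the engine. It assumes C11 and uses the fixed-width types from <stdint.h> instead of "unsigned short" and "unsigned long", because "unsigned long" is 64 bits wide on 64-bit linux. Selecting cXX by testing WCHAR_MAX is also an assumption; the original code may pick the width some other way.

```c
#include <assert.h>  /* static_assert (C11) */
#include <stdint.h>  /* uint16_t, uint32_t */
#include <wchar.h>   /* wchar_t, WCHAR_MAX */

typedef uint16_t c16;  /* 16-bit unsigned character */
typedef uint32_t c32;  /* 32-bit unsigned character, also on 64-bit builds */

/* Pick the width that matches the platform's wchar_t. */
#if WCHAR_MAX <= 0xFFFF
typedef c16 cXX;       /* platforms with 16-bit wide characters */
#else
typedef c32 cXX;       /* platforms with 32-bit wide characters */
#endif

/* Fail the build early if the assumptions do not hold. */
static_assert(sizeof(cXX) == sizeof(wchar_t), "cXX must be as wide as wchar_t");
static_assert((cXX)-1 > 0, "cXX must be unsigned");
```

With this in one header, only the signedness of cXX differs from wchar_t, and the build stops with a clear message if the widths ever disagree.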

Which leads me to the current question.

The normal linux system libraries seem to typedef the wide character (wchar_t) into a 32-bit *signed* integer ("signed long"). On the other hand, there is another data-type that I see near wchar_t in many of the headers called wint_t, which seems to typedef to a 32-bit unsigned integer ("unsigned long") - though maybe that is not even a character (not sure yet).

My program adopts zero "high-level packages", so on windoze it adopts only normal OpenGL functions, plus WGL functions to create OpenGL compatible windows and contexts. On linux it will adopt only normal OpenGL, GLX and xlib functions.

My problem is this. While the linux system functions seem to assume wchar_t is a 32-bit signed integer, the /usr/include/X11/xlib file contains a typedef of wchar_t to "unsigned long". In ***either*** case, my program *must* keep its wide character type as an unsigned integer. My code much prefers a 32-bit unsigned integer type, but works on windoze with a 16-bit unsigned type.

I cannot find any gcc compiler switch to force wide characters to be unsigned, except one for making WINE applications, which forces 16-bit unsigned. However, besides being revolting (turning down the uniform 32-bit wide character support linux function libraries provide), it would make my program non-interoperable with the bulk of linux libraries, and linux applications that wish to adopt or extend my package.

I cannot quite understand how xlib can conflict with standard function libraries on linux. It makes a mess, I would think.

I seem to be "missing something" here. It should be possible for separate packages (that are not statically linked) to consider their 32-bit characters to have opposite signs --- without causing each other any problems in execution. OTOH, obviously any statically linked code must keep the compiler's type-checking happy in order to compile into an executable (not sure whether any of the OpenGL or xlib code is statically linked).

What I do know is this. My code contains zillions of lines that assume and depend upon its wide character being unsigned to perform its computations and function correctly. And I certainly prefer not to go through and add several zillion casts to the code everywhere it calls any system or library function (that may or may not assume a signed character type).

What are my options? The first answer (if anyone has it) is - how can I force the linux function libraries to assume the wchar_t wide-character data-type is one of the 32-bit unsigned integer data-types? Any other answers?

I am not used to "faking out" system software or standard libraries, but in this case, if anyone can tell me how to let my wide character be u32 in my code, while letting system/library functions think my wide character/string arguments are s32, that will be just fine with me! How can I do that --- given the fact the gcc compiler amazingly does not [seem to] recognize a compiler switch to force 32-bit wide characters to unsigned (like they and everyone else do for 8-bit and 16-bit characters). What a bummer. :-(
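One way to do that "faking out" without scattering casts through the code is to confine them to a handful of thin wrappers. The following is only a sketch under stated assumptions: cXX is taken to be a 32-bit unsigned type, wchar_t is assumed to be 32 bits wide, and the wrapper names cxx_len and cxx_cmp are hypothetical. The pointer casts rely on both types having the same size and the same representation for non-negative values.

```c
#include <assert.h>
#include <stdint.h>
#include <wchar.h>   /* wcslen, wcscmp, wchar_t, NULL */

typedef uint32_t cXX;  /* hypothetical: the program's unsigned wide character */

/* The casts below are only meaningful when the sizes match. */
static_assert(sizeof(cXX) == sizeof(wchar_t), "cXX and wchar_t must have the same size");

/* Length of a zero-terminated cXX string, computed by the C library. */
static size_t cxx_len(const cXX *s)
{
    assert(s != NULL);
    return wcslen((const wchar_t *)s);
}

/* Compares two zero-terminated cXX strings with the C library's ordering. */
static int cxx_cmp(const cXX *a, const cXX *b)
{
    assert(a != NULL && b != NULL);
    return wcscmp((const wchar_t *)a, (const wchar_t *)b);
}
```

The rest of the program calls cxx_len() and cxx_cmp() and never mentions wchar_t. Where wchar_t is signed, values above 0x7FFFFFFF would look negative to the library, but Unicode code points stop at 0x10FFFF, so ordinary text never reaches that range.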

rubadub · 01-28-2008, 03:13 PM

to help alter all the zillions of lines of code, have a look at awk and grep

Then look at this and this, but maybe you should just do your own typedef for all your variables in a single header. This could then simply be regenerated for whatever compiler it's being compiled upon. Are you totally ANSI compliant?

maxreason · 01-28-2008, 09:27 PM

Quote:

Originally Posted by rubadub

to help alter all the zillions of lines of code, have a look at awk and grep.

Then look at this and this, but maybe you should just do your own typedef for all your variables in a single header. This could then simply be regenerated for whatever compiler it's being compiled upon. Are you totally ANSI compliant?

Thanks. Yes, I've seen those references before, and dozens of others - not counting all the header files I've looked through (though the combinatorial explosion of conditional [pre-processor] statements often prevents me from being certain which of many situations apply).

I have no idea whether my code is ANSI compliant! Perhaps I should find out, and probably I should order a couple of good/simple/concise C and C++ books that explain what ANSI compliance is. Actually, I haven't had any C or C++ books lying around for many years (while I wrote those zillions of lines of code).

However, I am very thoughtful and self-conscious about the architectures of my applications, and the code I write. And I adhere to simple, straightforward (but highly efficient) practices.

The fact that I designed and implemented programming languages/compilers/IDEs in the past is a double-edged sword, I suppose. It makes me notice, cringe-at, and mostly avoid the most horrific aspects of the design of the C (and C++) languages. OTOH, who knows how close my modus-operandi is to ANSI compliant. And frankly, you'll need to tell me what are the practical consequences of ANSI compliance, because I have no idea! All I know is, the state of system software is utterly horrific (and yes, much worse on windoze than linux). So, if *this* is "compliant", my response would be "who cares?" and "how would that help?".

Last night I read a few dozen emails between [what seemed to be] the implementors of [some of the] C/C++ compilers for linux (including gcc), including discussions of how to choose whether wchar_t would be signed or unsigned. Regardless of the final results (which I still cannot sort out), the considerations taken into account, and the thinking processes involved were mostly horrible! :-(

It appears I have two general choices. (1): keep the code in my applications clean, consistent and mostly free of endless useless syntax - but then I must write my own function libraries that hide the endless craziness and inconsistencies of common/system function libraries. (2): clutter my code with garbage (endless #if/#ifdef/#ifndef blocks, casts, etc) to make my code work with whatever variants I can *guess* may exist in different implementations. Usually I choose door #1.

It is absurd to make character data-types signed, but after reading those old forum conversations, I can understand why just about any result is possible (the issues they considered were almost ALL utterly irrelevant side-issues like "who did what when and where").

I suspect ANSI compliance for C and C++ is quite different. My code contains no classes, but I usually compile with the C++ compiler to take advantage of a few actual improvements and conveniences in C++ syntax. Unfortunately, it is looking like wchar_t might have become a C++ "built-in data-type" that cannot be changed or over-ridden (as I tried to do in my code last night). Amazingly, however, it does appear that some C++ compilers make wchar_t a signed integer, while others make wchar_t an unsigned integer. Sheesh! If this is what "standards" and "compliance" do, I want none of it. :-(

At this point in time, C++ has become a gigantic accumulation of horrible decisions made by different people along the way (implemented differently here-vs-there), and generally on the basis of arbitrary side-issues, and rarely careful thought. The only salvation at all is - the originators of C were far more thoughtful and careful than those who followed.

Thanks for your comments.

rubadub · 01-29-2008, 11:44 PM

Quote:

'The C Programming Language' by Brian Kernighan and Dennis Ritchie (Prentice Hall, 1978)

I believe that Dennis guy hitched a ride along with that Arthur Dent character. Personally I think it's worth buying just for the paper it's printed on, but I started out in graphics and like to draw on soggy toilet paper.

It's interesting that you complain about the state of affairs but then state you don't care about adhering to standards, so therefore with relation to the following quote, I need to ask:

Quote:

The fact that I designed and implemented programming languages/compilers/...

were these compliant?

Since you don't even consider using standards, why don't you just define what you want and use it?

All the books on games programming I read years ago all basically had a header file or two which used loads of typedefs to define the variable types it intended to use. This meant that only that file needed to be updated to suit the environment.