Why fwprintf writes chars instead of wchars on a Linux system
On a Linux system I noticed that 1-byte characters are written to the file instead of the 4 bytes of a wchar_t. What I actually want is to write the data to the output file in Unicode character format instead of 1-byte char format.
Could anyone who has resolved this kind of issue please throw some light on it?
Note: I'm using the gcc compiler to compile the C code. The same code worked fine on Windows and gave the expected result.
The difference is not in the compiler, but in how the locale determines the encoding used when wide characters are written to the file. On GNU/Linux systems you would typically have UTF-8 encoding rather than UTF-16.
You should probably first set the locale to a unicode one (use 'locale -a' to list the ones you have available). You don't need to open the file in binary mode.
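To make the locale point concrete, here is a minimal sketch. The helper names `use_utf8_locale` and `wide_write_size` are made up for illustration, and `en_US.utf8` is simply the locale name listed later in this thread, so substitute whatever `locale -a` shows on your system. It writes a wide string with fwprintf and then counts the bytes that actually landed in the file.

```c
#include <locale.h>
#include <stdio.h>
#include <wchar.h>

/* Try to select a UTF-8 locale for character conversion.
   Returns 1 on success, 0 if the locale is not installed.
   The locale name is an assumption; check `locale -a`. */
int use_utf8_locale(void)
{
    return setlocale(LC_CTYPE, "en_US.utf8") != NULL;
}

/* Write a wide string with fwprintf, then reopen the file and
   count the bytes in it. Returns the byte count, or -1 on error
   (for example when a character cannot be converted in the
   current locale). */
long wide_write_size(const char *path, const wchar_t *text)
{
    FILE *fp = fopen(path, "w");
    if (fp == NULL)
        return -1;

    /* The stream converts each wchar_t to the locale's multibyte
       encoding; it does not dump the raw 4-byte values. */
    int rc = fwprintf(fp, L"%ls", text);
    if (fclose(fp) != 0 || rc < 0)
        return -1;

    fp = fopen(path, "rb");
    if (fp == NULL)
        return -1;

    long count = 0;
    while (fgetc(fp) != EOF)
        count++;
    fclose(fp);
    return count;
}
```

Plain ASCII text comes out as one byte per character in any locale, which matches what the original poster saw. If `use_utf8_locale()` succeeds before the file is opened, a Japanese character such as U+65E5 should come out as its three-byte UTF-8 sequence rather than causing a conversion error.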
If you need the file to be utf16 encoded rather than utf8 for compatibility reasons, you can either call the iconv function from the program, or use the iconv program:
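For the first option, calling iconv from within the program, a rough sketch could look like the following. The wrapper name `utf8_to_utf16be` is hypothetical; `iconv_open`, `iconv` and `iconv_close` are the standard POSIX calls. It assumes the whole input fits in one buffer and that the output buffer is large enough, so a real program would need to loop and handle E2BIG.

```c
#include <iconv.h>
#include <stddef.h>

/* Convert a UTF-8 buffer to UTF-16BE in one shot.
   Returns the number of bytes written to out, or (size_t)-1 if the
   converter cannot be opened, the input is invalid, or out is too
   small. */
size_t utf8_to_utf16be(const char *in, size_t in_len,
                       unsigned char *out, size_t out_cap)
{
    /* Note the argument order: target encoding first, then source. */
    iconv_t cd = iconv_open("UTF-16BE", "UTF-8");
    if (cd == (iconv_t)-1)
        return (size_t)-1;

    char *src = (char *)in;      /* iconv() takes non-const char ** */
    char *dst = (char *)out;
    size_t src_left = in_len;
    size_t dst_left = out_cap;

    size_t rc = iconv(cd, &src, &src_left, &dst, &dst_left);
    iconv_close(cd);

    if (rc == (size_t)-1)
        return (size_t)-1;
    return out_cap - dst_left;   /* bytes actually produced */
}
```

Naming the byte order explicitly ("UTF-16BE" rather than "UTF-16") is deliberate here, since it fixes the byte order of the output instead of leaving it to the implementation.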
Thanks for your input. My requirement is to read a UTF-16BE file that has Japanese Unicode characters in the data field, process the input data, and write the data out again in UTF-16BE format.
The issue is that I'm not able to write a UTF-16BE data file under GNU/Linux. I have also checked which Unicode locales are available under GNU/Linux.
I used the 'locale -a' command and got the following list:
en_US.iso88591
en_US.iso885915
en_US.utf8
Is there any way to set Unicode UTF-16BE on a GNU/Linux system?
Thanks!
Senthil
Quote:
Is there any way to set Unicode UTF-16BE on a GNU/Linux system?
Microsoft Windows originally used UCS-2 (and now UTF-16) because of their early implementation of Unicode. GNU/Linux typically uses UTF-8 instead. If you wish to use Unicode encodings other than UTF-8 on Linux, you use conversion functions such as iconv, which will handle any encoding you can think of and more besides. Note that UTF-8 and UTF-16 can both be used to represent any of the Unicode code points, and are designed to be functionally equivalent.
If you are porting code from Windows, you also need to be careful of the wchar_t definition, since with gcc on GNU/Linux it is 32 bits (not 16 bits as on Windows). See for example the output from this code:
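A small stand-in sketch for checking the wchar_t width (this is not the listing from the post, and the function names `wchar_bits` and `raw_wide_bytes` are made up for illustration):

```c
#include <limits.h>
#include <stddef.h>
#include <wchar.h>

/* Width of wchar_t in bits for the current compiler and platform. */
int wchar_bits(void)
{
    return (int)(sizeof(wchar_t) * CHAR_BIT);
}

/* Number of bytes a raw fwrite() of n wide characters would put
   into a file, i.e. without any encoding conversion. */
size_t raw_wide_bytes(size_t n)
{
    return n * sizeof(wchar_t);
}
```

With gcc on GNU/Linux `wchar_bits()` is expected to report 32, so dumping a wchar_t array straight to disk would give 4 bytes per character in host byte order, which is neither UTF-8 nor UTF-16BE.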
Thanks for your detailed explanation of the Unicode encodings used on Windows and GNU/Linux.
I'll give a little more detail about my requirement.
The input data file (UTF-16BE) looks like this:
infield1~infield2~infield3~Japanese unicode char (UTF-16BE, fields separated by "~")
This data needs to be fed into a C component running on GNU/Linux, and the output file should look like this:
outfield1~outfield2~outfield3~Japanese unicode char (UTF-16BE, fields separated by "~")
The question is: how do I retain the Japanese Unicode characters when writing to the output file using the wide-character library functions in C on a GNU/Linux system?
Thanks!
Senthil
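One possible approach, sketched under some assumptions: the wide-character stream functions always convert through the locale's encoding, so instead of relying on them the program can decode the UTF-16BE bytes into code points itself, work on the fields, and encode back to UTF-16BE before a plain binary fwrite. The function names below are hypothetical, and `uint32_t` is used for code points instead of wchar_t so the sketch does not depend on wchar_t being 32 bits.

```c
#include <stddef.h>
#include <stdint.h>

/* Decode UTF-16BE bytes into Unicode code points.
   Returns the number of code points, or (size_t)-1 on malformed
   input or if out is too small. */
size_t utf16be_decode(const unsigned char *in, size_t in_len,
                      uint32_t *out, size_t out_cap)
{
    size_t n = 0;
    if (in_len % 2 != 0)
        return (size_t)-1;
    for (size_t i = 0; i < in_len; i += 2) {
        uint32_t u = ((uint32_t)in[i] << 8) | in[i + 1];
        if (u >= 0xD800 && u <= 0xDBFF) {        /* high surrogate */
            if (i + 3 >= in_len)
                return (size_t)-1;
            uint32_t lo = ((uint32_t)in[i + 2] << 8) | in[i + 3];
            if (lo < 0xDC00 || lo > 0xDFFF)
                return (size_t)-1;
            u = 0x10000 + ((u - 0xD800) << 10) + (lo - 0xDC00);
            i += 2;                              /* consumed the pair */
        } else if (u >= 0xDC00 && u <= 0xDFFF) {
            return (size_t)-1;                   /* stray low surrogate */
        }
        if (n == out_cap)
            return (size_t)-1;
        out[n++] = u;
    }
    return n;
}

/* Encode code points as UTF-16BE bytes (no byte order mark).
   Returns the number of bytes written, or (size_t)-1 on an invalid
   code point or if out is too small. */
size_t utf16be_encode(const uint32_t *in, size_t n,
                      unsigned char *out, size_t out_cap)
{
    size_t len = 0;
    for (size_t i = 0; i < n; i++) {
        uint32_t cp = in[i];
        if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
            return (size_t)-1;
        if (cp >= 0x10000) {                     /* needs a surrogate pair */
            if (out_cap - len < 4)
                return (size_t)-1;
            cp -= 0x10000;
            uint32_t hi = 0xD800 + (cp >> 10);
            uint32_t lo = 0xDC00 + (cp & 0x3FF);
            out[len++] = (unsigned char)(hi >> 8);
            out[len++] = (unsigned char)(hi & 0xFF);
            out[len++] = (unsigned char)(lo >> 8);
            out[len++] = (unsigned char)(lo & 0xFF);
        } else {
            if (out_cap - len < 2)
                return (size_t)-1;
            out[len++] = (unsigned char)(cp >> 8);
            out[len++] = (unsigned char)(cp & 0xFF);
        }
    }
    return len;
}
```

After decoding, the "~" separator is simply the code point 0x7E, so the record can be split into fields by scanning the code point array. The input and output files would be opened in binary mode ("rb" and "wb") and handled with fread and fwrite, so no locale conversion touches the data. Calling iconv with "UTF-16BE" on both ends of the processing step would be an alternative to hand-written conversion.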