11-18-2009, 04:03 AM
|
#1
|
|
LQ Newbie
Registered: Nov 2009
Posts: 3
Rep:
|
Why fwprintf writes chars instead of wchars on a Linux system
On a Linux system I noticed that 1-byte characters are written to the file instead of the 4 bytes of a wchar_t. What I actually wanted was to write the data to the output file in Unicode (wide) character format instead of 1-byte char format.
Could anyone who has resolved this kind of issue please throw some light on it?
Note: I'm using the gcc compiler to compile the C code. The same code works fine on Windows and gives the expected result.
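Code:
#include <wchar.h>
#include <stdio.h>
int main( int argc, char *argv[])
{
    const wchar_t *bigString = L"1234567787955_-+:";
    FILE *fp = fopen( "bigString.hex", "wb");
    if ( fp == NULL)
        return 1; /* could not create the output file */
    /* Wide characters are converted to the locale's multibyte encoding here. */
    fwprintf( fp, L"%ls", bigString);
    fclose( fp);
    return 0;
}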
Last edited by senthilpeace; 11-18-2009 at 11:36 PM.
|
|
|
|
11-18-2009, 05:18 PM
|
#2
|
|
Senior Member
Registered: Sep 2009
Location: Washington U.S.
Distribution: Damn Small Linux, KateOs, M$ Ickdows Vista, My own OS
Posts: 2,136
Rep: 
|
4-byte chars are a HUGE waste of space (4x), and I assume that glibc thinks they're not required.
That's roughly 4,000,000,000 different chars possible.
|
|
|
|
11-18-2009, 05:42 PM
|
#3
|
|
Senior Member
Registered: Sep 2004
Distribution: FreeBSD 9.1, Kubuntu 12.10
Posts: 2,967
Rep: 
|
I'm pretty sure there was a thread about the same thing in the last year. I'm not sure what came of it.
Kevin Barry
|
|
|
|
11-19-2009, 08:18 PM
|
#4
|
|
Senior Member
Registered: Jan 2005
Location: Melbourne, Australia
Distribution: Debian Squeeze (Fluxbox WM)
Posts: 1,357
|
The difference is not in the compiler, but in the way the locale is handled when writing to the file. fwprintf converts wide characters to the multibyte encoding of the current locale, and on GNU/Linux systems you would typically have UTF-8 encoding rather than UTF-16. That is why ASCII text comes out as 1 byte per character.
You should probably first set the locale to a Unicode one (use 'locale -a' to list the ones you have available). You don't need to open the file in binary mode.
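Code:
#include <wchar.h>
#include <stdio.h>
#include <locale.h>
int main( int argc, char *argv[])
{
    /* Select a UTF-8 locale so wide characters are written as UTF-8. */
    if ( setlocale( LC_ALL, "en_US.utf8") == NULL)
        return 1; /* locale not installed; check 'locale -a' */
    const wchar_t *bigString = L"1234567787955_-+:";
    FILE *fp = fopen( "bigString.hex", "w");
    if ( fp == NULL)
        return 1; /* could not create the output file */
    fwprintf( fp, L"%ls", bigString);
    fclose( fp);
    return 0;
}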
If you need the file to be utf16 encoded rather than utf8 for compatibility reasons, you can either call the iconv function from the program, or use the iconv program:
Code:
iconv -f utf-8 -t utf-16 bigString.hex
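For the in-program route, here is a minimal sketch using the iconv(3) interface as shipped with glibc. The helper name utf8_to_utf16be is made up for illustration, and it assumes the whole input fits in the output buffer in one call. It targets "UTF-16BE" rather than plain "UTF-16" on the assumption that big-endian output without a byte order mark is wanted.

```c
#include <assert.h>
#include <iconv.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: convert a NUL-terminated UTF-8 string to UTF-16BE.
 * Returns the number of bytes written to out, or (size_t)-1 on failure
 * (unsupported encoding, invalid input, or output buffer too small). */
size_t utf8_to_utf16be(const char *in, char *out, size_t out_size)
{
    assert(in != NULL && out != NULL);

    /* iconv_open takes the target encoding first, then the source. */
    iconv_t cd = iconv_open("UTF-16BE", "UTF-8");
    if (cd == (iconv_t)-1)
        return (size_t)-1;

    char *src = (char *)in;          /* iconv's prototype is not const-correct */
    size_t src_left = strlen(in);
    char *dst = out;
    size_t dst_left = out_size;

    /* iconv advances src/dst and decrements the two "left" counters. */
    size_t rc = iconv(cd, &src, &src_left, &dst, &dst_left);
    iconv_close(cd);

    if (rc == (size_t)-1)
        return (size_t)-1;
    return out_size - dst_left;
}
```

The returned byte count can be passed straight to fwrite on a file opened in binary mode. A real program would loop on E2BIG to handle inputs larger than the buffer.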
|
|
|
|
11-19-2009, 11:47 PM
|
#5
|
|
LQ Newbie
Registered: Nov 2009
Posts: 3
Original Poster
Rep:
|
Thanks for your input. My requirement is to read a UTF-16BE file that has Japanese Unicode characters in the data field, process the input data, and write the data out again in UTF-16BE format.
The issue is that I'm not able to write a data file in UTF-16BE under GNU/Linux. I have also checked which Unicode locales are available under GNU/Linux:
I used the 'locale -a' command and got the following list.
en_US.iso88591
en_US.iso885915
en_US.utf8
Is there any way to set Unicode UTF-16BE on a GNU/Linux system?
Thanks!
Senthil
|
|
|
|
11-20-2009, 01:13 AM
|
#6
|
|
Senior Member
Registered: Jan 2005
Location: Melbourne, Australia
Distribution: Debian Squeeze (Fluxbox WM)
Posts: 1,357
|
Quote:
|
Is there anyway to set the unicode UTF-16BE on GNU/Linux system?
|
Microsoft Windows originally used UCS-2 (and now UTF-16) because of their early implementation of Unicode. GNU/Linux uses the UTF-8 alternative. If you wish to use Unicode encodings other than UTF-8 on Linux, you use conversion functions such as iconv, which will handle any encoding you can think of and more besides. Note that UTF-8 and UTF-16 can both be used to represent any of the Unicode code points, and are designed to be functionally equivalent.
If you are porting code from Windows, you also need to be careful of the wchar_t definition, since with gcc on GNU/Linux it is 32 bits (not 16 bits as on Windows). See for example the output from this code:
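Code:
#include <wchar.h>
#include <stdio.h>
int main( int argc, char *argv[])
{
    const wchar_t *bigString = L"1234567787955_-+:";
    FILE *fp = fopen( "bigString.hex", "wb");
    if ( fp == NULL)
        return 1; /* could not create the output file */
    /* Dump the raw wchar_t array: sizeof(wchar_t) bytes per character, in host byte order. */
    fwrite( bigString, sizeof(wchar_t), wcslen(bigString), fp);
    fclose( fp);
    return 0;
}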
There are also wcrtomb and mbrtowc conversion functions, but again, beware of implementation differences between different systems.
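Because each wchar_t holds a full code point on this platform, UTF-16BE can also be produced by hand without going through the locale at all. The following is a rough sketch under that assumption; the names encode_utf16be and fput_utf16be are invented for illustration and are not library functions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <wchar.h>

/* Encode one Unicode code point as UTF-16BE into out (room for 4 bytes).
 * Returns the number of bytes written (2 or 4), or 0 if cp is not a
 * valid Unicode scalar value. */
size_t encode_utf16be(uint32_t cp, unsigned char *out)
{
    assert(out != NULL);
    if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
        return 0;
    if (cp < 0x10000) {                      /* fits in one 16-bit unit */
        out[0] = (unsigned char)(cp >> 8);
        out[1] = (unsigned char)(cp & 0xFF);
        return 2;
    }
    cp -= 0x10000;                           /* split into a surrogate pair */
    uint32_t hi = 0xD800 + (cp >> 10);
    uint32_t lo = 0xDC00 + (cp & 0x3FF);
    out[0] = (unsigned char)(hi >> 8);
    out[1] = (unsigned char)(hi & 0xFF);
    out[2] = (unsigned char)(lo >> 8);
    out[3] = (unsigned char)(lo & 0xFF);
    return 4;
}

/* Write a wide string to fp as UTF-16BE with no byte order mark.
 * Assumes each wchar_t is one code point (true for 32-bit wchar_t).
 * Returns 0 on success, -1 on an invalid character or write error. */
int fput_utf16be(const wchar_t *s, FILE *fp)
{
    unsigned char buf[4];
    for (; *s != L'\0'; ++s) {
        size_t n = encode_utf16be((uint32_t)*s, buf);
        if (n == 0 || fwrite(buf, 1, n, fp) != n)
            return -1;
    }
    return 0;
}
```

The file should be opened with "wb" and written only through fwrite, so the stream stays byte-oriented and no locale conversion is applied.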
Last edited by neonsignal; 11-20-2009 at 06:45 AM.
|
|
|
|
11-20-2009, 05:52 AM
|
#7
|
|
LQ Newbie
Registered: Nov 2009
Posts: 3
Original Poster
Rep:
|
Thanks for your detailed explanation about the Unicode encodings used in Windows and GNU/Linux.
I'll give you a little more detail about my requirement.
The input data file (i.e. UTF-16BE) looks like this:
infield1~infield2~infield3~Japanese unicode char (UTF-16BE with "~" as the field separator)
I need to feed this data into a C component running on GNU/Linux, and the output file should look like this:
outfield1~outfield2~outfield3~Japanese unicode char (UTF-16BE with "~" as the field separator)
The question is how to retain the Japanese Unicode characters while writing to the output file using the wide char library functions in C on a GNU/Linux system?
Thanks!
Senthil
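One possible approach to the format described above, sketched here as an assumption rather than an answer given in the thread, is to skip the wide char functions entirely and treat the file as a sequence of 16-bit big-endian units. The separator "~" is U+007E, which is always the byte pair 00 7E in UTF-16BE, and the Japanese characters pass through untouched as raw bytes. The function names below are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Find the next '~' (U+007E) in a UTF-16BE buffer, starting at the even
 * byte offset start. Scans whole 16-bit units so that a 00 7E byte pair
 * straddling two characters is not mistaken for a separator.
 * Returns the byte offset of the separator, or len if there is none. */
size_t next_tilde_utf16be(const unsigned char *buf, size_t len, size_t start)
{
    assert(buf != NULL && start % 2 == 0);
    for (size_t i = start; i + 1 < len; i += 2) {
        if (buf[i] == 0x00 && buf[i + 1] == 0x7E)
            return i;
    }
    return len;
}

/* Count the '~'-separated fields in one UTF-16BE record. */
size_t count_fields_utf16be(const unsigned char *buf, size_t len)
{
    size_t fields = 1;
    size_t pos = 0;
    while ((pos = next_tilde_utf16be(buf, len, pos)) < len) {
        fields++;
        pos += 2;                /* step over the 2-byte separator */
    }
    return fields;
}
```

Each field is then the byte range between two separators and can be copied to the output with fwrite on a stream opened with "wb". Fields that need text processing can be converted with iconv and converted back before writing.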
|
|
|
|