LinuxQuestions.org
Old 11-18-2009, 04:03 AM   #1
senthilpeace
LQ Newbie
 
Registered: Nov 2009
Posts: 3

Rep: Reputation: 0
why fwprintf writes chars instead of wchars on Linux system


On a Linux system I noticed that 1-byte characters are written to the file instead of the 4 bytes of each wchar_t. What I actually want is to write the data to the output file in Unicode (wide) character format rather than in 1-byte char format.

Could anyone who has resolved this kind of issue please throw some light on it?

Note: I'm using gcc to compile the C code. The same code worked fine on Windows and produced the expected result.

code:

#include <wchar.h>
#include <stdio.h>

int main( int argc, char *argv[])
{
wchar_t *bigString = L"1234567787955_-+:";
FILE *fp = fopen( "bigString.hex", "wb");
fwprintf( fp, L"%ls", bigString);
fclose( fp);
}

Last edited by senthilpeace; 11-18-2009 at 11:36 PM.
 
Old 11-18-2009, 05:18 PM   #2
smeezekitty
Senior Member
 
Registered: Sep 2009
Location: Washington U.S.
Distribution: M$ Windows / Debian / Ubuntu / DSL / many others
Posts: 2,227

Rep: Reputation: 170Reputation: 170
4-byte chars are a HUGE waste of space (4x), and I assume that glibc thinks it's not required.
That's about 4,000,000,000 different chars possible.
 
Old 11-18-2009, 05:42 PM   #3
ta0kira
Senior Member
 
Registered: Sep 2004
Distribution: FreeBSD 9.1, Kubuntu 12.10
Posts: 3,078

Rep: Reputation: Disabled
I'm pretty sure there was a thread about the same thing in the last year. I'm not sure what came of it.
Kevin Barry
 
Old 11-19-2009, 08:18 PM   #4
neonsignal
Senior Member
 
Registered: Jan 2005
Location: Melbourne, Australia
Distribution: Debian Wheezy (Fluxbox WM)
Posts: 1,363
Blog Entries: 52

Rep: Reputation: 353Reputation: 353Reputation: 353Reputation: 353
The difference is not in the compiler, but in how the locale is applied when the wide characters are written to the file. fwprintf converts wide characters to the multibyte encoding of the current locale, and a program that never calls setlocale runs in the default "C" locale. On GNU/Linux systems you would typically have UTF-8 encoding rather than UTF-16.

You should probably first set the locale to a Unicode one (use 'locale -a' to list the ones you have available). You don't need to open the file in binary mode.
Code:
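#include <wchar.h>
#include <stdio.h>
#include <locale.h>
int main( int argc, char *argv[])
{
	/* select a UTF-8 locale so wide characters are converted to UTF-8 on output */
	if (setlocale( LC_ALL, "en_US.utf8") == NULL)
		return 1;
	const wchar_t *bigString = L"1234567787955_-+:";
	FILE *fp = fopen( "bigString.hex", "w");
	if (fp == NULL)
		return 1;
	fwprintf( fp, L"%ls", bigString);
	fclose( fp);
	return 0;
}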
If you need the file to be utf16 encoded rather than utf8 for compatibility reasons, you can either call the iconv function from the program, or use the iconv program:
Code:
iconv -f utf-8 -t utf-16 bigString.hex
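
For the first option (calling iconv from the program), a minimal sketch might look like the one below. The helper name utf8_to_utf16be and its buffer-based interface are assumptions made up for illustration; only iconv_open, iconv and iconv_close are the real library calls. It converts a NUL-terminated UTF-8 string into UTF-16BE bytes in a caller-supplied buffer.

```c
#include <assert.h>
#include <iconv.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not from the thread): converts a NUL-terminated
 * UTF-8 string to UTF-16BE with iconv(3).
 * Returns the number of bytes stored in out, or (size_t)-1 on failure. */
size_t utf8_to_utf16be(const char *in, char *out, size_t out_size)
{
    assert(in != NULL && out != NULL);

    /* iconv_open takes the target encoding first, then the source. */
    iconv_t cd = iconv_open("UTF-16BE", "UTF-8");
    if (cd == (iconv_t)-1)
        return (size_t)-1;

    char *inp = (char *)in;          /* iconv wants char **, it does not modify the input */
    size_t in_left = strlen(in);
    char *outp = out;
    size_t out_left = out_size;

    /* One call converts the whole input if the output buffer is big enough. */
    size_t rc = iconv(cd, &inp, &in_left, &outp, &out_left);
    iconv_close(cd);

    if (rc == (size_t)-1)
        return (size_t)-1;           /* invalid input or output buffer too small */
    return out_size - out_left;      /* bytes actually produced */
}
```

The returned byte count can be passed straight to fwrite on a file opened with "wb". Asking for "UTF-16BE" rather than plain "UTF-16" fixes the byte order explicitly, so the converter does not need to emit a byte order mark.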
 
Old 11-19-2009, 11:47 PM   #5
senthilpeace
LQ Newbie
 
Registered: Nov 2009
Posts: 3

Original Poster
Rep: Reputation: 0

Thanks for your input. My requirement is to read a UTF-16BE file that contains Japanese Unicode characters in one of its data fields, process that input, and write the data out again in UTF-16BE.

The issue is that I'm not able to write a UTF-16BE data file under GNU/Linux. I have also checked which Unicode locales are available on the system.

I ran the 'locale -a' command and got the following list:

en_US.iso88591
en_US.iso885915
en_US.utf8

Is there any way to select UTF-16BE on a GNU/Linux system?

Thanks!
Senthil





 
Old 11-20-2009, 01:13 AM   #6
neonsignal
Senior Member
 
Registered: Jan 2005
Location: Melbourne, Australia
Distribution: Debian Wheezy (Fluxbox WM)
Posts: 1,363
Blog Entries: 52

Rep: Reputation: 353Reputation: 353Reputation: 353Reputation: 353
Quote:
Is there anyway to set the unicode UTF-16BE on GNU/Linux system?
Microsoft Windows originally used UCS-2 (and now UTF-16) because it adopted Unicode early. GNU/Linux uses the UTF-8 alternative. If you wish to use Unicode encodings other than UTF-8 on Linux, you use conversion functions such as iconv, which will handle any encoding you can think of and more besides. Note that UTF-8 and UTF-16 can both represent any of the Unicode code points, and are designed to be functionally equivalent.
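
If you would rather not depend on iconv for the input side, UTF-16BE is simple enough to decode by hand. The following is only a rough sketch: the function name utf16be_decode and its interface are invented for illustration, and it assumes the whole input is already in memory. It turns UTF-16BE bytes into Unicode code points, combining surrogate pairs where needed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: decodes in_len bytes of UTF-16BE into code points.
 * Returns the number of code points stored in out, or (size_t)-1 if the
 * input is malformed or out (capacity out_cap) is too small. */
size_t utf16be_decode(const unsigned char *in, size_t in_len,
                      uint32_t *out, size_t out_cap)
{
    assert(in != NULL && out != NULL);
    if (in_len % 2 != 0)
        return (size_t)-1;                      /* UTF-16 comes in 2-byte units */

    size_t n = 0;
    for (size_t i = 0; i < in_len; i += 2) {
        /* big-endian: most significant byte first */
        uint32_t u = ((uint32_t)in[i] << 8) | in[i + 1];

        if (u >= 0xD800 && u <= 0xDBFF) {       /* high surrogate */
            if (i + 4 > in_len)
                return (size_t)-1;              /* low surrogate missing */
            uint32_t lo = ((uint32_t)in[i + 2] << 8) | in[i + 3];
            if (lo < 0xDC00 || lo > 0xDFFF)
                return (size_t)-1;
            u = 0x10000 + ((u - 0xD800) << 10) + (lo - 0xDC00);
            i += 2;                             /* consumed one extra unit */
        } else if (u >= 0xDC00 && u <= 0xDFFF) {
            return (size_t)-1;                  /* low surrogate without a high one */
        }

        if (n == out_cap)
            return (size_t)-1;
        out[n++] = u;
    }
    return n;
}
```

With gcc on Linux the resulting code points can be stored directly in wchar_t variables, since wchar_t there is wide enough to hold any of them.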

If you are porting code from Windows, you also need to be careful of the wchar_t definition, since with gcc on Linux it is 32 bits (not 16 bits as on Windows). See for example the output from this code:
Code:
#include <wchar.h>
#include <stdio.h>
int main( int argc, char *argv[])
{
	const wchar_t *bigString = L"1234567787955_-+:";
	FILE *fp = fopen( "bigString.hex", "wb");
	if (fp == NULL)
		return 1;
	/* dumps the raw wchar_t array: 4 bytes per character with gcc on Linux */
	fwrite( bigString, sizeof(wchar_t), wcslen(bigString), fp);
	fclose( fp);
	return 0;
}
There are also wcrtomb and mbrtowc conversion functions, but again, beware of implementation differences between different systems.
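
Because wchar_t is 32 bits here, dumping the array with fwrite does not give UTF-16. One possible way to write a wide string as UTF-16BE without going through the locale is sketched below. The names put_utf16be and write_wcs_utf16be are hypothetical, and the code assumes that each wchar_t holds one Unicode code point, which is true for glibc but not guaranteed by the C standard.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <wchar.h>

/* Hypothetical helper: writes one Unicode code point to fp as UTF-16BE.
 * Returns 0 on success, -1 on an invalid code point or a write error. */
int put_utf16be(FILE *fp, uint32_t cp)
{
    unsigned char b[4];
    size_t n;

    if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
        return -1;                              /* not a valid scalar value */

    if (cp < 0x10000) {                         /* one 16-bit unit */
        b[0] = (unsigned char)(cp >> 8);
        b[1] = (unsigned char)(cp & 0xFF);
        n = 2;
    } else {                                    /* surrogate pair */
        cp -= 0x10000;
        uint32_t hi = 0xD800 + (cp >> 10);
        uint32_t lo = 0xDC00 + (cp & 0x3FF);
        b[0] = (unsigned char)(hi >> 8);
        b[1] = (unsigned char)(hi & 0xFF);
        b[2] = (unsigned char)(lo >> 8);
        b[3] = (unsigned char)(lo & 0xFF);
        n = 4;
    }
    return fwrite(b, 1, n, fp) == n ? 0 : -1;
}

/* Hypothetical helper: writes a wide string as UTF-16BE, no byte order mark.
 * Assumes every wchar_t is a Unicode code point (as with glibc). */
int write_wcs_utf16be(FILE *fp, const wchar_t *s)
{
    assert(fp != NULL && s != NULL);
    for (; *s != L'\0'; ++s)
        if (put_utf16be(fp, (uint32_t)*s) != 0)
            return -1;
    return 0;
}
```

The file should be opened with "wb" and written only through these helpers; mixing them with fwprintf on the same stream is best avoided, since a stream is either byte-oriented or wide-oriented.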

Last edited by neonsignal; 11-20-2009 at 06:45 AM.
 
Old 11-20-2009, 05:52 AM   #7
senthilpeace
LQ Newbie
 
Registered: Nov 2009
Posts: 3

Original Poster
Rep: Reputation: 0
Thanks for your detailed explanation of the Unicode encodings used on Windows and GNU/Linux.

Let me give a few more details about my requirement.

The input data file (UTF-16BE) looks like this:

infield1~infield2~infield3~Japanese unicode char (UTF-16BE, fields delimited by "~")

I need to feed this data into a C component running on GNU/Linux, and the output file should look like this:

outfield1~outfield2~outfield3~Japanese unicode char (UTF-16BE, fields delimited by "~")

The question is: how do I retain the Japanese Unicode characters when writing to the output file using the wide-character library functions in C on a GNU/Linux system?

Thanks!
Senthil

