LinuxQuestions.org
Old 02-15-2008, 05:33 AM   #1
cicorino
Member
 
Registered: May 2006
Location: Italy
Distribution: Slackware, Slackware64
Posts: 31

Rep: Reputation: 16
solved: why does fwprintf write chars instead of wchars?


Hello good programmers, I'm testing the wcs functions
on Linux (Slackware 12) and, strangely, my first test fails.
When compiled and run on Windows, the code below fills
bigString.hex with 16-bit wchars, while on Linux
I find only 1-byte characters in the file
instead of the 4 bytes of each wchar_t.
I tried many combinations and also tried using fwide.
Can you help me find my mistake, please?

code:

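#include <wchar.h>
#include <stdio.h>

int main( int argc, char *argv[])
{
/* a wide string literal is read-only, so point to it through const */
const wchar_t *bigString = L"1234567787955_-+:";
FILE *fp = fopen( "bigString.hex", "wb");
if( fp == NULL )
  return 1;
/* the first wide call makes the stream wide-oriented */
fwprintf( fp, L"%ls", bigString);
fclose( fp);
return 0;
}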

Last edited by cicorino; 02-19-2008 at 03:09 AM. Reason: solved
 
Old 02-16-2008, 02:36 PM   #2
ta0kira
Senior Member
 
Registered: Sep 2004
Distribution: Slackware64 13.37, Kubuntu 10.04
Posts: 2,944

Rep: Reputation: Disabled
Is your editor opening the file automatically with utf16, making it look the same?
ta0kira
 
Old 02-17-2008, 05:51 PM   #3
cicorino
Member
 
Registered: May 2006
Location: Italy
Distribution: Slackware, Slackware64
Posts: 31

Original Poster
Rep: Reputation: 16
Unfortunately not, I'm using khexedit to view the result.
I'm a bit astonished, because the test code seems
correct to me (and I get the same 8-bit behaviour at home).
This problem is putting me in great trouble because
I'm the only one working on the port of a large
codebase, and I find myself with no clues in front
of such a 'simple' problem... and the Windows programmers
around here are, understandably, not much help.
I still hope there is some missing define that switches on the
full wchar_t capabilities.
 
Old 02-18-2008, 04:33 AM   #4
cicorino
Member
 
Registered: May 2006
Location: Italy
Distribution: Slackware, Slackware64
Posts: 31

Original Poster
Rep: Reputation: 16
even worse...

The output of the test program below is:
test1: fwprintf wrote 16 bytes
test1: fwrite error: return code is 0
test2: fwrite wrote 64 bytes
test2: fwprintf error: return code is -1

It looks like I cannot mix fwprintf and fwrite.
If I call fwide first, it does not fail, but
fwrite then always fails.

I'm really lost. I can only guess that
fwprintf does its work internally with wchar_t
as expected, but when it comes to writing to the file,
it performs a UTF-8 conversion.

Code:
#include <cwchar>
#include <cstring>
#include <cstdlib>
#include <cstdio>
#include <clocale>

int main( int argc, char *argv[])
{
 const wchar_t *bigString = L"12345=+896jkgr/'";
 wchar_t writtenString[ 120];
 FILE    *fp;
 int     numberOfBytes;

// (0) setup
 setlocale( LC_ALL, "");

 fp = fopen( "bigString.hex", "wb");
 if( fp == NULL )
 { fprintf( stdout, "test1: fopen failed\n");
   return 1;
 }

// (1) test file write
// note: fwprintf returns the number of wide characters written, not bytes
 numberOfBytes = fwprintf( fp, L"%ls", bigString);
 if( numberOfBytes > 0 )
 { fprintf( stdout, "test1: fwprintf wrote %d bytes\n", numberOfBytes);
 }
 else
 { fprintf( stdout, "test1: fwprintf error: return code is %d\n", numberOfBytes);
 }

// (2) test string write
 swprintf( writtenString, 120, L"%ls", bigString);
 numberOfBytes = fwrite( writtenString, 1, wcslen( writtenString) * sizeof( wchar_t), fp);
 if( numberOfBytes > 0 )
 { fprintf( stdout, "test1: fwrite wrote %d bytes\n", numberOfBytes);
 }
 else
 { fprintf( stdout, "test1: fwrite error: return code is %d\n", numberOfBytes);
 }

 fclose( fp);

// (3) invert the tested functions
 fp = fopen( "bigString.hex", "ab");
 if( fp == NULL )
 { fprintf( stdout, "test2: fopen failed\n");
   return 1;
 }

// (4) test string write
 swprintf( writtenString, 120, L"%ls", bigString);
 numberOfBytes = fwrite( writtenString, 1, wcslen( writtenString) * sizeof( wchar_t), fp);
 if( numberOfBytes > 0 )
 { fprintf( stdout, "test2: fwrite wrote %d bytes\n", numberOfBytes);
 }
 else
 { fprintf( stdout, "test2: fwrite error: return code is %d\n", numberOfBytes);
 }

// (5) test file write
 numberOfBytes = fwprintf( fp, L"%ls", bigString);
 if( numberOfBytes > 0 )
 { fprintf( stdout, "test2: fwprintf wrote %d bytes\n", numberOfBytes);
 }
 else
 { fprintf( stdout, "test2: fwprintf error: return code is %d\n", numberOfBytes);
 }

 fclose( fp);
 return 0;
}
 
Old 02-18-2008, 12:08 PM   #5
osor
HCL Maintainer
 
Registered: Jan 2006
Distribution: (H)LFS, Gentoo
Posts: 2,450

Rep: Reputation: 64
Quote:
Originally Posted by cicorino View Post
I'm really lost. I only can guess that probably
the fwprintf does its work internally by the wchar_t
as expected, but when it comes to write inside the file,
it makes an UTF8 conversion.
This is expected behavior. On output, the wide character string (a string of 4-byte values on Linux) is converted to a multibyte string according to the current locale. On Windows, wchar_t is 16 bits wide and the output is UTF-16, so you get 2 bytes for most characters (including all the ones you picked). On most Linux machines, the default locale uses UTF-8, in which case you get one byte for each ASCII character (the ones you picked were all ASCII).

If you absolutely want UTF-16, change locale (non-thread-safe) or use iconv.
 
Old 02-19-2008, 03:07 AM   #6
cicorino
Member
 
Registered: May 2006
Location: Italy
Distribution: Slackware, Slackware64
Posts: 31

Original Poster
Rep: Reputation: 16
solved

Thank you very much, I learnt something.

Summary for ignorant people like me who are approaching wide chars:

1) a file pointer [FP] cannot be used, during its lifetime, by
both char functions [CF]: {fprintf vfprintf fputc fwrite!! fwide(fp,-1)}
and wide char functions [WCF]: {fwprintf vfwprintf fputwc fwide(fp,1)}

2) to mix char and wide char output, the code must either use CF and write
wide chars with (%ls,%lc) or use WCF and write chars with (%s,%c).

3) this 'strange' but sane behaviour (which unfortunately is not present on
Windows) is due to the internal orientation of the FP, which is set either
by fwide or implicitly by the first call to a CF (which inhibits WCF) or to
a WCF (which inhibits CF).

4) pay attention to fwrite when you want to mix binary and wide char
output in one stream, because it switches the FP to accept only CF

5) use swprintf, sprintf, iconv, open and write to start your experiments

Last edited by cicorino; 02-19-2008 at 03:10 AM. Reason: typo
 
Old 02-19-2008, 08:20 AM   #7
fantas
Member
 
Registered: Jun 2007
Location: Bavaria
Distribution: slackware, xubuntu
Posts: 143

Rep: Reputation: 22
Interesting result, to say the least; I haven't used those wide-char file functions myself yet. What I do instead is use a templated char vector (8, 16 or 32 bits) with the "normal" fwrite etc. functions, with the write size scaled by the char width.
 