Old 04-04-2007, 09:44 AM   #1
Registered: Dec 2005
Distribution: RHEL3, FC3
Posts: 383

Rep: Reputation: 30
iconv - why does it club and form a single character


I'm puzzled why iconv converts the following in a peculiar way.

Here is the example text:
dfsÃ¸nne H converu
iconv -f UTF-8 -t ISO-8859-1 < file
dfsønne H converu
Ã¸ ====>  ø
Is there any reason why the two characters are clubbed together to form ø in the latin1-encoded output?

Is there any pattern that iconv follows for this conversion?

Basically, my question is: how does iconv select the characters (from the UTF-8 input) that are to be clubbed and converted into a single character (latin1)?

Any pointers, much appreciated!

Old 04-05-2007, 07:16 PM   #2
HCL Maintainer
Registered: Jan 2006
Distribution: (H)LFS, Gentoo
Posts: 2,450

Rep: Reputation: 79
The issue here is quite interesting. Iconv does not “club” your characters. What happens is that your terminal is set to display UTF-8 and doesn’t know that the output from iconv is in latin1. So the input “Ã¸” (which in UTF-8 is the bytes 0xc3 0x83 followed by 0xc2 0xb8) gets converted, as it should be, to “Ã¸” in latin1 (the bytes 0xc3 followed by 0xb8). Your terminal assumes stdout is UTF-8, so it reads those two bytes as the single sequence 0xc3 0xb8, which is the UTF-8 encoding of ø.
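If it helps to see the bytes, here is a minimal sketch in Python that replays the same steps. It is only an illustration, not what iconv does internally: the variable names are made up for this example, and it assumes the file really contains the two characters Ã (U+00C3) and ¸ (U+00B8) encoded as UTF-8.

```python
# The two characters from the input file: Ã (U+00C3) and ¸ (U+00B8).
text = "\u00c3\u00b8"

# What is stored in the UTF-8 file: two bytes per character.
utf8_bytes = text.encode("utf-8")

# What "iconv -f UTF-8 -t ISO-8859-1" writes: one byte per character.
latin1_bytes = text.encode("latin-1")

# What a UTF-8 terminal makes of those two latin1 bytes: it decodes
# them together as a single UTF-8 sequence.
shown_by_terminal = latin1_bytes.decode("utf-8")

print(utf8_bytes.hex(" "))     # bytes in the input file
print(latin1_bytes.hex(" "))   # bytes iconv writes to stdout
print(shown_by_terminal)       # character a UTF-8 terminal displays
```

The conversion itself is one-to-one, character for character; the apparent merging only happens when the latin1 bytes are read back as if they were UTF-8.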


