
m4rtin 02-16-2009 09:45 AM

iconv us-ascii to UTF-8 or ISO-8859-15
Why isn't it possible to convert us-ascii or ASCII to UTF-8? Or am I doing something wrong?


root@martin-desktop:/home/martin/test# nano file1.txt
root@martin-desktop:/home/martin/test# file --mime file1.txt
file1.txt: text/plain charset=us-ascii
root@martin-desktop:/home/martin/test# iconv -c -f ASCII -t UTF-8 file1.txt > file2.txt
root@martin-desktop:/home/martin/test# file --mime file2.txt
file2.txt: text/plain charset=us-ascii

Even converting from us-ascii to ISO-8859-15 doesn't work:


root@martin-desktop:/home/martin/test# iconv -c -f ASCII -t ISO-8859-15 file1.txt > file3.txt
root@martin-desktop:/home/martin/test# file --mime file3.txt
file3.txt: text/plain charset=us-ascii

What might be the problem?

David the H. 02-16-2009 10:06 AM

I may be wrong, but I believe it's because the first 128 code points of UTF-8 and ISO-8859 are encoded identically to ASCII. There's no reason for the text file to be reported as anything else until non-ASCII characters are introduced. Either that, or file just can't tell the difference between them. As soon as you add a non-ASCII character, the file output should change.
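A quick way to check this is to compare a pure-ASCII file with one containing a single accented character. This is only a sketch: the file names and the count_high_bytes helper are made up for illustration, and the exact wording that file prints may differ between versions.

```shell
# Scratch directory so nothing in the current directory is touched.
tmpdir=$(mktemp -d)

# One pure-ASCII file, and one containing "é" written as the UTF-8 bytes C3 A9.
printf 'cafe\n'        > "$tmpdir/ascii.txt"
printf 'caf\303\251\n' > "$tmpdir/accented.txt"

# Hypothetical helper: count the bytes outside the 7-bit ASCII range (0x80 and up).
count_high_bytes() {
    LC_ALL=C tr -d '\000-\177' < "$1" | wc -c
}

ascii_high=$(count_high_bytes "$tmpdir/ascii.txt")
accented_high=$(count_high_bytes "$tmpdir/accented.txt")
echo "non-ASCII bytes: ascii.txt=$ascii_high accented.txt=$accented_high"

# If file is installed, its charset guess should differ only for the second file.
if command -v file >/dev/null 2>&1; then
    file --mime "$tmpdir/ascii.txt" "$tmpdir/accented.txt"
fi
```

With no bytes above 0x7F there is nothing in the data for file to base a different guess on, so it reports the narrowest charset that fits.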

servat78 02-18-2009 08:34 PM

The previous posting is correct. The 128 basic characters of ASCII are encoded exactly the same way in UTF-8. UTF-8 does its tricks only for characters above the ASCII range. Technically, an ASCII text file and a UTF-8 file with the same contents are byte-for-byte identical.
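To confirm that the conversion really changes nothing, the converted files can be compared byte for byte with cmp. A minimal sketch, assuming iconv knows the ASCII, UTF-8 and ISO-8859-15 names (glibc's does); the file names mirror the ones in the question, but are created in a scratch directory.

```shell
# Scratch directory with a small pure-ASCII input file.
tmpdir=$(mktemp -d)
printf 'Hello, world\n' > "$tmpdir/file1.txt"

# Same conversions as in the original question (without -c, since nothing needs dropping).
iconv -f ASCII -t UTF-8       "$tmpdir/file1.txt" > "$tmpdir/file2.txt"
iconv -f ASCII -t ISO-8859-15 "$tmpdir/file1.txt" > "$tmpdir/file3.txt"

# cmp -s is silent and exits 0 only if the two files are byte-identical.
if cmp -s "$tmpdir/file1.txt" "$tmpdir/file2.txt" &&
   cmp -s "$tmpdir/file1.txt" "$tmpdir/file3.txt"; then
    same=yes
else
    same=no
fi
echo "all three files identical: $same"
```

So iconv did its job; the output just happens to be the same bytes as the input.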

It would be a different case when converting ASCII to UTF-16, because UTF-16 stores every ASCII character in two bytes, so the conversion would immediately double the file size (plus a byte-order mark, if one is written).
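The size difference is easy to see with wc. This sketch uses UTF-16LE as the target so that no byte-order mark is written; with plain UTF-16, iconv implementations commonly prepend a 2-byte mark, but that behaviour is an assumption about your particular iconv. The file names are made up for the example.

```shell
# Scratch directory with a small pure-ASCII input file (13 bytes including the newline).
tmpdir=$(mktemp -d)
printf 'Hello, world\n' > "$tmpdir/file1.txt"

# UTF-16LE: two bytes per ASCII character, explicit byte order, no byte-order mark.
iconv -f ASCII -t UTF-16LE "$tmpdir/file1.txt" > "$tmpdir/file16.txt"

ascii_size=$(wc -c < "$tmpdir/file1.txt")
utf16_size=$(wc -c < "$tmpdir/file16.txt")
echo "ASCII: $ascii_size bytes, UTF-16LE: $utf16_size bytes"
```

Unlike the UTF-8 case, the output here is no longer valid ASCII, since every second byte is a NUL.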
