LinuxQuestions.org

m4rtin 02-16-2009 08:45 AM

iconv us-ascii to UTF-8 or ISO-8859-15
 
Why isn't it possible to convert us-ascii or ASCII to UTF-8? Or am I doing something wrong?

Code:

root@martin-desktop:/home/martin/test# nano file1.txt
root@martin-desktop:/home/martin/test# file --mime file1.txt
file1.txt: text/plain charset=us-ascii
root@martin-desktop:/home/martin/test# iconv -c -f ASCII -t UTF-8 file1.txt > file2.txt
root@martin-desktop:/home/martin/test# file --mime file2.txt
file2.txt: text/plain charset=us-ascii

Even converting from us-ascii to ISO-8859-15 doesn't work:

Code:

root@martin-desktop:/home/martin/test# iconv -c -f ASCII -t ISO-8859-15 file1.txt > file3.txt
root@martin-desktop:/home/martin/test# file --mime file3.txt
file3.txt: text/plain charset=us-ascii

What might be the problem?

David the H. 02-16-2009 09:06 AM

I may be wrong, but I believe it's because the first 128 characters of UTF-8 and ISO-8859-15 are encoded identically to ASCII. There's no reason for the text file to be reported as anything else until non-ASCII characters are introduced. Either that, or file just can't tell the difference between them, since it guesses the charset from the bytes it sees. As soon as you add a non-ASCII character, the file output should change.
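To see where the encodings start to differ, here is a minimal sketch that dumps the bytes of a word containing one non-ASCII character. The sample word and the variable names are just illustrations, not anything from the original post; it assumes iconv, od and tr are available.

```shell
# 'café' in ISO-8859-15: the é is the single byte 0xE9 (octal 351).
latin=$(printf 'caf\351\n' | od -An -tx1 | tr -d ' \n')

# The same text converted to UTF-8: the é becomes the two bytes C3 A9.
utf8=$(printf 'caf\351\n' | iconv -f ISO-8859-15 -t UTF-8 | od -An -tx1 | tr -d ' \n')

echo "ISO-8859-15: $latin"
echo "UTF-8:       $utf8"
```

The ASCII letters keep the same byte values in both dumps; only the é is encoded differently, and that is the kind of byte file needs before it can report anything other than us-ascii.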

servat78 02-18-2009 07:34 PM

The previous posting is correct. The ASCII encoding of the 128 basic chars is exactly the same in UTF-8. UTF-8 does its tricks only for chars above the ASCII range. Technically, an ASCII text file and a UTF-8 file with the same contents are byte-for-byte identical.
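A quick way to check this yourself is to compare the file before and after the conversion. This is only a sketch: the temporary directory and the sample text are made up for the example, and the file names just mirror the ones in the first post.

```shell
# Work in a throwaway directory so nothing existing is overwritten.
tmp=$(mktemp -d)
printf 'hello world\n' > "$tmp/file1.txt"

# Convert ASCII to UTF-8, as in the original post.
iconv -f ASCII -t UTF-8 "$tmp/file1.txt" > "$tmp/file2.txt"

# cmp -s exits 0 when the two files are byte-for-byte identical.
if cmp -s "$tmp/file1.txt" "$tmp/file2.txt"; then
    same=yes
else
    same=no
fi
echo "identical: $same"

rm -rf "$tmp"
```

If the input really is pure ASCII, iconv has nothing to change, so file has no way to report the result as anything but us-ascii.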

It would be a different case when converting ASCII to UTF-16: UTF-16 stores each of these characters in 2 bytes, so the conversion would immediately double the file size.
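The size difference is easy to measure. The sketch below uses a made-up sample string and converts to UTF-16LE rather than plain UTF-16, on the assumption that your iconv may prepend a 2-byte byte-order mark for plain UTF-16, which would throw the count off by two.

```shell
text='hello world'

# Byte count of the ASCII text (tr strips the padding some wc versions add).
ascii_size=$(printf '%s' "$text" | wc -c | tr -d ' ')

# Byte count after converting to UTF-16 (little-endian, no byte-order mark).
utf16_size=$(printf '%s' "$text" | iconv -f ASCII -t UTF-16LE | wc -c | tr -d ' ')

echo "ASCII:  $ascii_size bytes"
echo "UTF-16: $utf16_size bytes"
```

With the 11-character sample this gives 11 bytes against 22.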


All times are GMT -5.