LinuxQuestions.org - read other encoding file

ufmale

02-13-2008 02:14 AM

read other encoding file

I am trying to read a Chinese text file on Fedora 7. The file, chinese16.txt, is encoded in UTF-16LE and displays correctly on my Windows XP machine, which uses code page 437. Using Notepad, I re-saved it with "Save As" in UTF-8 under the name chinese8.txt.

I then copied the file to the Linux machine, where the default locale is en_US.utf8 (according to the "$ locale" command).

Following what I found on the web, I used the localedef command to create a zh_CN.UTF-8 locale and then ran export LANG=zh_CN.UTF-8.

When I try to view the UTF-8 file, chinese8.txt, I cannot see any of its content. I am not sure what I did wrong.

Is there an expert in this area who can help?

jschiwal

02-13-2008 02:42 AM

Can you copy the original file over as well? Run "file chinese16.txt" and see whether the encoding is detected.
There is a program called iconv that may be able to convert the file to UTF-8 or UTF-16. Then see if you can read it in kate or another editor.
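A minimal sketch of those checks, assuming a POSIX shell with file(1) and glibc's iconv(1). It re-creates chinese16.txt as a hypothetical stand-in (this sample has no byte-order mark, so file may report it only as "data"), converts it to UTF-8, and converts it back to UTF-16LE to confirm the round trip loses nothing.

```shell
#!/bin/sh
# Sketch: inspect a UTF-16LE file with file(1) and convert it with iconv(1).
# chinese16.txt is a HYPOTHETICAL stand-in for the poster's real file.
set -e
cd "$(mktemp -d)"

# Build the sample: UTF-8 text piped through iconv to get UTF-16LE bytes
# (no byte-order mark is written for an explicit UTF-16LE target).
printf '%s\n' 'Application=先生' | iconv -f UTF-8 -t UTF-16LE > chinese16.txt

# file(1) reports what it detects; skip it quietly if it is not installed.
if command -v file >/dev/null 2>&1; then
    file chinese16.txt
fi

# UTF-16LE -> UTF-8, the usual encoding on Linux ...
iconv -f UTF-16LE -t UTF-8 chinese16.txt > chinese8.txt

# ... and back again, to check that nothing was lost along the way.
iconv -f UTF-8 -t UTF-16LE chinese8.txt > roundtrip16.txt
cmp chinese16.txt roundtrip16.txt && echo "round trip OK"
```

If cmp prints nothing and "round trip OK" appears, the conversion itself is lossless and any remaining blanks are a viewing problem.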

ufmale

02-13-2008 08:48 AM

Yes. I also used iconv to convert the file. The file contains both English and Chinese words. After converting it to UTF-8, I can see only the English words; the Chinese characters show up blank.

Su-Shee

02-13-2008 12:52 PM

Stick with the original chinese16.txt file: UTF-16 is a common encoding of Unicode under Windows.

iconv knows about it and should be able to convert it to UTF-8, the usual encoding in the Unix world.

If you run iconv -l, you'll see all the UTF variants listed; try converting directly from the original to UTF-8.
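A sketch of converting straight from the original, assuming glibc's iconv. The sample chinese16.txt is hypothetical; it starts with the byte-order mark FF FE that Windows Notepad usually writes for "Unicode" files. Naming the source encoding plain UTF-16 lets iconv use that mark to pick the byte order and drop it; with UTF-16LE named explicitly, glibc's iconv typically keeps the mark as an invisible U+FEFF at the start of the output.

```shell
#!/bin/sh
# Sketch: list the UTF variants iconv knows, then convert the original
# UTF-16 file directly to UTF-8 (no Notepad step in between).
set -e
cd "$(mktemp -d)"

# Which UTF encodings does this iconv know? The output format differs
# between implementations, so split on commas/spaces before filtering.
iconv -l | tr ', ' '\n\n' | grep -i '^utf' | sort -u

# HYPOTHETICAL stand-in for the original: byte-order mark FF FE
# (octal 377 376) followed by UTF-16LE text.
{
    printf '\377\376'
    printf '%s\n' 'Application=先生' | iconv -f UTF-8 -t UTF-16LE
} > chinese16.txt

# Plain "UTF-16": iconv reads the BOM to choose the byte order and
# drops it, so chinese8.txt starts directly with "Application=".
iconv -f UTF-16 -t UTF-8 chinese16.txt > chinese8.txt
cat chinese8.txt
```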

And the locale setting you chose (LANG) is only the default for all the LC_* categories; the part that switches error messages, menus and the like to Chinese is LC_MESSAGES. What you need for "show me Chinese characters" is LC_CTYPE (character type). LC_COLLATE affects sorting, but I don't know whether it really does Chinese dictionary sorting with radicals and strokes counted and so on.

So, export LC_CTYPE set to a suitable locale that you actually have available (see locale -a). I'm not sure whether the locale name is case-sensitive.
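A sketch of picking LC_CTYPE from what locale -a actually lists. The zh_CN pattern is an assumption (adjust it for zh_TW and so on), and grep -i sidesteps the case question rather than answering it. Run the lines in your interactive shell, or put them in a startup file such as ~/.bashrc, since an export inside a script only affects that script.

```shell
#!/bin/sh
# Sketch: choose an installed Chinese UTF-8 locale and use it for
# LC_CTYPE only. The zh_CN pattern is an ASSUMPTION; adjust as needed.
# grep -i matches "zh_CN.utf8", "zh_CN.UTF-8", ... regardless of case.
zh=$(locale -a 2>/dev/null | grep -iE '^zh_CN\.utf-?8$' | head -n 1)

if [ -n "$zh" ]; then
    export LC_CTYPE="$zh"
    echo "LC_CTYPE=$LC_CTYPE"
    locale charmap    # should print UTF-8 for a UTF-8 locale
else
    echo "no zh_CN UTF-8 locale installed (see localedef)" >&2
fi
```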

ufmale

02-13-2008 07:08 PM

If I view this UTF-8 file in Windows, the content is:
Application=先生

This is what I did:

$ localedef -c -i zh_CN -f UTF-8 zh_CN.UTF-8
$ export LANG=zh_CN.UTF-8
$ locale
LANG=zh_CN.UTF-8
LC_CTYPE="zh_CN.UTF-8"
LC_NUMERIC="zh_CN.UTF-8"
LC_TIME="zh_CN.UTF-8"
LC_COLLATE="zh_CN.UTF-8"
LC_MONETARY="zh_CN.UTF-8"
LC_MESSAGES="zh_CN.UTF-8"
LC_PAPER="zh_CN.UTF-8"
LC_NAME="zh_CN.UTF-8"
LC_ADDRESS="zh_CN.UTF-8"
LC_TELEPHONE="zh_CN.UTF-8"
LC_MEASUREMENT="zh_CN.UTF-8"
LC_IDENTIFICATION="zh_CN.UTF-8"
LC_ALL=

Viewing the file in Linux with more and vi, I get blank space where the Chinese characters should be:
Application=
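Before changing more settings, it may help to check whether the Chinese bytes are actually in the file. In this sketch chinese8.txt is a hypothetical stand-in; for the real file, run only the od line on it. If the dump contains e5 85 88 e7 94 9f (先 and 生 in UTF-8) after Application=, the conversion worked and the blanks most likely come from how more, vi or the terminal display the text.

```shell
#!/bin/sh
# Sketch: tell a conversion problem from a display problem by looking
# at the raw bytes. chinese8.txt is a HYPOTHETICAL stand-in here.
cd "$(mktemp -d)"
printf '%s\n' 'Application=先生' > chinese8.txt

# Dump the file as one continuous lowercase hex string.
hex=$(od -An -tx1 -v chinese8.txt | tr -d ' \n')
echo "$hex"

# In UTF-8: 先 (U+5148) = e5 85 88, 生 (U+751F) = e7 94 9f.
case $hex in
    *e58588e7949f*) echo "Chinese bytes present: check the display side" ;;
    *)              echo "Chinese bytes missing: the conversion lost them" ;;
esac
```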

All times are GMT -5.