LinuxQuestions.org
Old 02-13-2008, 02:14 AM   #1
ufmale
Member
 
Registered: Feb 2007
Posts: 386

Rep: Reputation: 30
Reading a file in a different encoding


I am trying to read a Chinese file in Fedora 7. The file, chinese16.txt, is encoded in UTF-16LE and is viewable on a Windows XP machine. My Windows XP uses code page 437. In Notepad I used "Save As" with the UTF-8 encoding and named the result chinese8.txt.

I then copied the file to the Linux machine, where the default locale is en_US.utf8 (according to the "$ locale" command).

After searching the web, I tried the localedef command to create the zh_CN.UTF-8 locale, followed by export LANG=zh_CN.UTF-8.

When I try to view the UTF-8 file, chinese8.txt, I cannot see the Chinese content in it. I am not sure what I did incorrectly.

Is there an expert in this area who can help?
 
Old 02-13-2008, 02:42 AM   #2
jschiwal
LQ Guru
 
Registered: Aug 2001
Location: Fargo, ND
Distribution: SuSE AMD64
Posts: 15,733

Rep: Reputation: 682
Can you copy the original file over? Run "file chinese16.txt" and see whether the encoding is detected.
There is a program called iconv that may be able to convert the file to UTF-8 or UTF-16. Then see if you can read it in kate or another editor.
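
A minimal sketch of those two steps, assuming iconv is installed and the shell itself runs in a UTF-8 locale. The first line only builds a small stand-in for chinese16.txt (hypothetical sample data, written without a byte order mark) so the commands can be tried without the real file:

```shell
# Build a stand-in for chinese16.txt (sample data, not the poster's real file).
printf 'Application=先生\n' | iconv -f UTF-8 -t UTF-16LE > chinese16.txt

# Ask file(1) what it makes of the encoding, if file is installed.
if command -v file >/dev/null 2>&1; then
    file chinese16.txt
fi

# Convert to UTF-8, naming the source encoding explicitly.
iconv -f UTF-16LE -t UTF-8 chinese16.txt > chinese8.txt
```

Naming the source encoding with -f avoids relying on any guess about what the input is.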
 
Old 02-13-2008, 08:48 AM   #3
ufmale
Member
 
Registered: Feb 2007
Posts: 386

Original Poster
Rep: Reputation: 30
Yes. I also used iconv to convert the file. The file contains both English and Chinese words. After converting it to UTF-8, I can see only the English words; the Chinese characters are not visible (blank).
 
Old 02-13-2008, 12:52 PM   #4
Su-Shee
Member
 
Registered: Sep 2007
Location: Berlin
Distribution: Slackware
Posts: 510

Rep: Reputation: 53
Stick with the original chinese16.txt file. UTF-16 is a common encoding of Unicode under Windows.

iconv knows about it and should be able to convert it to UTF-8, the usual encoding in the Unix world.

If you run iconv -l you'll see all the UTF variants listed. Try converting directly from the original to UTF-8.

Note that LANG is the catch-all default: among other things it switches error messages, menus and the like to Chinese. The category for "show me Chinese characters" is LC_CTYPE (character type). LC_COLLATE affects sorting, but I don't know whether it really does Chinese dictionary sorting with radicals, stroke counts and so on.

So export LC_CTYPE set to a suitable locale from the ones you have available (locale -a). I'm not sure whether the name is case-sensitive.
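
Roughly, that suggestion could look like the sketch below. The locale name zh_CN.UTF-8 is an assumption; use whatever spelling locale -a actually prints on your machine:

```shell
# List installed locales and look for a Chinese one (there may be none).
locale -a 2>/dev/null | grep -i '^zh_CN' || echo "no zh_CN locale installed"

# Set only the character-type category; LANG and the other categories are left alone.
export LC_CTYPE=zh_CN.UTF-8

# Show the value the shell will now pass on to programs such as more and vi.
printf 'LC_CTYPE is now %s\n' "$LC_CTYPE"
```

If the locale is not installed, the variable is still set, but programs may fall back to the "C" locale and some shells print a warning.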
 
Old 02-13-2008, 07:08 PM   #5
ufmale
Member
 
Registered: Feb 2007
Posts: 386

Original Poster
Rep: Reputation: 30
If I view this UTF-8 file in Windows, the content is
Application=先生

This is what I did

$ localedef -c -i zh_CN -f UTF-8 zh_CN.UTF-8
$ export LANG=zh_CN.UTF-8
$ locale
LANG=zh_CN.UTF-8
LC_CTYPE="zh_CN.UTF-8"
LC_NUMERIC="zh_CN.UTF-8"
LC_TIME="zh_CN.UTF-8"
LC_COLLATE="zh_CN.UTF-8"
LC_MONETARY="zh_CN.UTF-8"
LC_MESSAGES="zh_CN.UTF-8"
LC_PAPER="zh_CN.UTF-8"
LC_NAME="zh_CN.UTF-8"
LC_ADDRESS="zh_CN.UTF-8"
LC_TELEPHONE="zh_CN.UTF-8"
LC_MEASUREMENT="zh_CN.UTF-8"
LC_IDENTIFICATION="zh_CN.UTF-8"
LC_ALL=

Viewing the file in Linux with $ more and $ vi, I get blank space where the Chinese characters should be:
Application=
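
One way to check whether the Chinese characters survived the conversion is to dump the bytes rather than rely on the display. This is only a sketch: the first line writes a stand-in chinese8.txt (hypothetical sample data, assuming the shell runs in a UTF-8 locale), and on the real file you would skip it:

```shell
# Stand-in for the converted file (sample data only).
printf 'Application=先生\n' > chinese8.txt

# Dump the last 7 bytes as hex: two 3-byte characters plus the newline.
# For 先生 in UTF-8 this should show: e5 85 88 e7 94 9f 0a
tail -c 7 chinese8.txt | od -An -tx1
```

If those bytes are present but more and vi still show blanks, that may point at the terminal or its fonts rather than at the file.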
 
  


All times are GMT -5.
