LinuxQuestions.org
Old 12-20-2004, 04:47 PM   #1
tcma
Member
 
Registered: Aug 2004
Distribution: gentoo, Fedora Core
Posts: 54

Rep: Reputation: 15
convert Traditional to Simplified Chinese & vice versa


I want to convert some Chinese documents from Traditional Chinese to Simplified Chinese and vice versa.
That is, select some text, click a button, and the text toggles between Traditional Chinese and Simplified Chinese.
Is there a program that does this on Linux?
 
Old 12-21-2004, 12:55 AM   #2
kngharv
Member
 
Registered: Nov 2001
Location: China, USA
Distribution: SUN JDS/SUSE 9.1
Posts: 33

Rep: Reputation: 15
You are asking a lot of questions in just one sentence.

There are several scenarios.

Scenario 1. The text you want to convert is plain text, and the two scripts use different encodings: you want to convert Traditional Chinese in Big5-encoded text to Simplified Chinese in GB-encoded text.

This is by far the simplest case.

In this case, you can run iconv and other UNIX commands to convert the text. You will be able to reach almost 100% accuracy because of the nature of the Traditional Chinese / Simplified Chinese character mapping.
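
To make Scenario 1 concrete, here is a minimal Python sketch of the two steps involved: changing the encoding and mapping the characters. As far as I know, iconv on its own only does the first step, so the character mapping is shown separately. The table T2S and the function name big5_to_gb are made up for illustration, and the table holds only a handful of entries; a real one needs thousands.

```python
# Hypothetical, tiny Traditional -> Simplified table (illustration only).
T2S = str.maketrans({"漢": "汉", "語": "语", "體": "体", "轉": "转", "換": "换"})


def big5_to_gb(data: bytes) -> bytes:
    """Decode Big5 bytes, map Traditional to Simplified, encode as GB2312."""
    text = data.decode("big5")
    simplified = text.translate(T2S)
    # errors="replace" writes "?" for any character GB2312 cannot represent,
    # which is what happens to Traditional characters missing from the table.
    return simplified.encode("gb2312", errors="replace")


sample = "漢語轉換".encode("big5")
print(big5_to_gb(sample).decode("gb2312"))  # 汉语转换
```

Characters that are absent from the table pass through unchanged, so the quality of the result depends entirely on how complete the table is.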

Scenario 2. The same as above, except that you want to convert GB-encoded text to Big5-encoded text. You can still use iconv, but you will not reach an accuracy that would be considered "acceptable" by, let's say, a government agency.

The Traditional Chinese / Simplified Chinese mapping is, strictly speaking, many-to-many. Having said that, T--->S is *MOSTLY* a MANY-TO-ONE relationship, with very few exceptions (乾 in 乾隆, for example, which stays 乾 instead of becoming 干).

In the case of those exceptions, iconv will fail to pick the proper character.

S--->T mapping has a lot of ONE-TO-MANY relationships: 乾 vs. 干 (both written 干 in Simplified), 髮 vs. 發 (both written 发), etc. And most of these ONE-TO-MANY relationships occur on frequently used characters.

This cannot be resolved easily without doing some interesting things such as lexical analysis and language modeling. I have not seen any open-source tool that is good enough to be used on a reliable basis.
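
As a rough illustration of the simplest kind of lexical analysis, the Python sketch below tries the longest known phrase first and falls back to a per-character default. The dictionaries PHRASES and CHARS and the function s2t are hypothetical and tiny; this is not how any particular tool works, only a sketch of the idea under those assumptions.

```python
# Hypothetical mini-dictionaries (illustration only).
PHRASES = {
    "头发": "頭髮",
    "发展": "發展",
    "干燥": "乾燥",
    "干部": "幹部",
    "干涉": "干涉",
}
# Per-character fallback used when no phrase matches.
CHARS = {"头": "頭", "发": "發", "干": "干"}


def s2t(text: str) -> str:
    """Simplified -> Traditional using greedy longest-phrase matching."""
    max_len = max(map(len, PHRASES))
    out = []
    i = 0
    while i < len(text):
        # Try the longest candidate phrase first, down to two characters.
        for n in range(min(max_len, len(text) - i), 1, -1):
            chunk = text[i:i + n]
            if chunk in PHRASES:
                out.append(PHRASES[chunk])
                i += n
                break
        else:
            # No phrase matched: fall back to the single-character default.
            ch = text[i]
            out.append(CHARS.get(ch, ch))
            i += 1
    return "".join(out)


print(s2t("头发干燥"))  # 頭髮乾燥
print(s2t("干部发展"))  # 幹部發展
```

A phrase table like this only covers the words it lists. Anything outside it falls back to the default character, which is exactly where the ONE-TO-MANY errors come from, and why language modeling is needed for reliable results.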


Scenario 3. You are trying to convert Traditional Chinese to Simplified Chinese *OR* vice versa, but both the Traditional Chinese and the Simplified Chinese text use the same encoding.

This scenario occurs when the document is encoded in UTF-8 or another Unicode encoding. In this case, you are out of luck. I think there is a Java tool on the mandarintools website which does that, but I am not very happy with the result, as it only works on those characters which have ONE-TO-ONE mapping relationships.

One can argue that those ONE-TO-ONE mapped characters should be merged into a single code point in Unicode. Then again, that is a completely different topic on its own.


kngharv
 
Old 12-21-2004, 07:27 AM   #3
checkchan
LQ Newbie
 
Registered: Jul 2003
Posts: 17

Rep: Reputation: 0
kngharv - does SimSci ring a bell?
 
Old 12-21-2004, 07:15 PM   #4
kngharv
Member
 
Registered: Nov 2001
Location: China, USA
Distribution: SUN JDS/SUSE 9.1
Posts: 33

Rep: Reputation: 15
please contact me :p

kngharv@hotmail.com
 
Old 02-06-2014, 09:21 PM   #5
Cybernetic1
LQ Newbie
 
Registered: Feb 2014
Posts: 1

Rep: Reputation: Disabled
This may be old, but I have tried the Java applet from the "mandarintools" web site to convert from traditional to simplified, and the result is (as far as I can tell) perfect.

The other way round (simplified to traditional) may be a one-to-many mapping, but I'm not interested in that, so I have not tried it.

In my case, I want:
UTF8 traditional --> UTF8 simplified

I have downloaded the Java source code from the above site. Inside there is a data file called "hcutf8.txt" that contains the simplified char followed by 1, 2, or 3 traditional chars, all in UTF-8 format. So basically I just need to "find and replace". It's easy to do in any other language (such as JavaScript) using that data file.
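
The "find and replace" idea can be sketched in Python as below. The sample lines only imitate the layout described above (one simplified char followed by its traditional chars); the exact format of the real hcutf8.txt, such as separators or comment lines, is an assumption, and build_t2s is a made-up helper name.

```python
# Sample data imitating the described layout: the first character is
# Simplified, the remaining ones are its Traditional variants.
# The real hcutf8.txt may differ in detail (assumption).
SAMPLE = "发發髮\n干乾幹\n头頭\n汉漢\n"


def build_t2s(lines):
    """Build a Traditional -> Simplified table usable with str.translate."""
    table = {}
    for line in lines:
        chars = "".join(line.split())  # drop any whitespace between characters
        if len(chars) < 2:
            continue  # skip blank or malformed lines
        simplified, traditional = chars[0], chars[1:]
        for t in traditional:
            table[ord(t)] = simplified
    return table


# With the real file one would presumably read it like this instead:
#   with open("hcutf8.txt", encoding="utf-8") as f:
#       T2S = build_t2s(f)
T2S = build_t2s(SAMPLE.splitlines())
print("頭髮很乾".translate(T2S))  # 头发很干
```

A plain table like this also turns the 乾 in 乾隆 into 干, which is the kind of exception mentioned earlier in the thread, so such cases would need special handling.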

Hope it helps
 