EDIT: Fixed
I am having some trouble getting Japanese text to show up properly inside my terminal/console. I do not want to use Kterm because it does not have some necessary features. Basically, I just want to know how to get something like mrxvt or konsole or xterm to work with this particular text properly.
I am running Ubuntu 6.10, and I have the Japanese language packs installed. Here is the basic problem:
I have a dictionary file that I am going to parse text from, but normal consoles (xterm, KDE's konsole) do not display the Japanese text properly when I run 'head dictionary_file' (I'm using Jim Breen's edict file) or otherwise print it to the console. My consoles do work if I type Japanese into them (hiragana, katakana, and kanji all display fine). I just cannot seem to get the text from this file to show up.
Here is a small sample of the file:
http://students.washington.edu/cdobrich/edict.head
Kterm displays the file properly. Here is the output from within Kterm with 'head dictionary_file':
Quote:
???? /EDICT, EDICT_SUB(P), EDICT2 Japanese-English Electronic Dictionary Files/Copyright Electronic Dictionary Research & Development Group - 2006/Created: 2007-02-11/
ヽ [くりかえし] /(n) repetition mark in katakana/
ヾ [くりかえし] /(n) voiced repetition mark in katakana/
ゝ [くりかえし] /(n) repetition mark in hiragana/
ゞ [くりかえし] /(n) voiced repetition mark in hiragana/
〃 [おなじく] /(n) ditto mark/
仝 [どうじょう] /(n) "as above" mark/
々 [くりかえし] /(n) repetition of kanji (sometimes voiced)/
〆 [しめ] /(n) end or closure mark/
〆切 [しめきり] /(n) closing/cut-off/end/deadline/Closed/No Entrance/
This is what it looks like in other consoles:
Quote:
�������� /EDICT, EDICT_SUB(P), EDICT2 Japanese-English Electronic Dictionary Files/Copyright Electronic Dictionary Research & Development Group - 2006/Created: 2007-02-11/
�� [���꤫����] /(n) repetition mark in katakana/
�� [���꤫����] /(n) voiced repetition mark in katakana/
�� [���꤫����] /(n) repetition mark in hiragana/
�� [���꤫����] /(n) voiced repetition mark in hiragana/
�� [���ʤ���] /(n) ditto mark/
�� [�ɤ����礦] /(n) "as above" mark/
�� [���꤫����] /(n) repetition of kanji (sometimes voiced)/
�� [����] /(n) end or closure mark/
���� [�����] /(n) closing/cut-off/end/deadline/Closed/No Entrance/
If the problem is with dictionary_file itself (perhaps it uses an unusual encoding and needs to be converted), I would be happy to explore that, but I really do not know whether that is the root of the issue.
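One way to test the encoding hypothesis is the small Python sketch below (not from the original post; the function names are hypothetical). It takes a reading from the Kterm output, encodes it as EUC-JP, and then decodes those bytes as UTF-8, substituting U+FFFD for every invalid byte sequence. For くりかえし this produces exactly the `���꤫����` string the other consoles show, which points to an EUC-JP file being read by a UTF-8 terminal.

```python
# Sketch: reproduce the "garbage" by reading EUC-JP bytes as if they were UTF-8.
# Function names here are hypothetical helpers, not part of any tool.

def euc_jp_as_seen_by_utf8(text: str) -> str:
    """Encode text as EUC-JP, then decode the bytes as UTF-8,
    replacing each invalid byte sequence with U+FFFD."""
    raw = text.encode("euc_jp")
    return raw.decode("utf-8", errors="replace")


def looks_like_euc_jp(raw: bytes) -> bool:
    """True if the bytes are NOT valid UTF-8 but ARE valid EUC-JP."""
    try:
        raw.decode("utf-8")
        return False  # valid UTF-8 (includes plain ASCII), so not flagged
    except UnicodeDecodeError:
        pass
    try:
        raw.decode("euc_jp")
        return True
    except UnicodeDecodeError:
        return False


if __name__ == "__main__":
    # Same as the bracketed reading on the garbled "repetition mark" lines
    print(euc_jp_as_seen_by_utf8("くりかえし"))
    print(looks_like_euc_jp("くりかえし".encode("euc_jp")))
    print(looks_like_euc_jp("くりかえし".encode("utf-8")))
```

To check the real file, read it in binary mode (for example `open("dictionary_file", "rb").read()`) and pass the whole byte string to `looks_like_euc_jp`. Reading only a fixed-size prefix could cut a multibyte character in half and make both decodes fail.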
I tried custom-compiling mrxvt from source with every variation of the options I could find in the ./configure --help list, but nothing worked; every build gave me garbage output. Again, I basically just want to know how to get something like mrxvt or konsole or xterm to display the text properly.
EDIT: I figured out that the dictionary file was encoded in EUC-JP, while xterm, konsole and mrxvt were expecting UTF-8 (Kterm, which displayed it correctly, handles EUC-JP). So I used a program called 'nkf' to convert the text to UTF-8.
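For anyone without nkf, the same EUC-JP to UTF-8 conversion can be sketched in Python. This is an illustration, not what nkf does internally; the helper names and the file names in the usage line are hypothetical. Decoding is strict, so a file that is not actually valid EUC-JP raises `UnicodeDecodeError` instead of being silently mangled.

```python
# Sketch: convert an EUC-JP file (such as edict) to UTF-8.
# Helper names and file paths are hypothetical.

def euc_jp_bytes_to_utf8(raw: bytes) -> bytes:
    """Decode EUC-JP bytes and re-encode them as UTF-8 (strict decoding)."""
    return raw.decode("euc_jp").encode("utf-8")


def convert_file(src_path: str, dst_path: str) -> None:
    """Read src_path as raw bytes, convert EUC-JP -> UTF-8, write dst_path.
    Binary I/O leaves line endings exactly as they were."""
    with open(src_path, "rb") as src:
        raw = src.read()
    with open(dst_path, "wb") as dst:
        dst.write(euc_jp_bytes_to_utf8(raw))


if __name__ == "__main__":
    convert_file("dictionary_file", "dictionary_file.utf8")
```

After conversion, `head dictionary_file.utf8` in a UTF-8 terminal should show the same text Kterm showed for the original file.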