LinuxQuestions.org

violagirl23 03-25-2008 07:44 PM

Convert file from ISO-8859-1 to some Japanese encoding? (iconv errors)
 
I was looking through the SCIM-anthy configuration files, and they are all 100% mojibake (gibberish) whenever a Japanese symbol is used. Why? I found out they're in ISO-8859-1. As to why, I haven't the SLIGHTEST idea, but the fact of the matter is, they are. SCIM seems to be able to read them just fine, but that leaves them uneditable to me.... and I want to edit a .sty file (making a copy first, of course!) to change that particular layout to fit my Dvorak keyboard, as Dvorak totally messes with it. It's just impossible-seeming to figure out what's what when obscure symbols are being put in place of the proper Japanese characters. @_@
This is what happened when I tried to convert one of them with iconv:
Code:

iconv --from-code=ISO-8859-1 --to-code=SHIFT-JIS ./101kana2.sty > ./101kana3.sty
iconv: illegal input sequence at position 876

iconv --from-code=ISO-8859-1 --to-code=EUC-JP ./101kana2.sty > ./101kana3.sty
iconv: illegal input sequence at position 880

iconv --from-code=ISO-8859-1 --to-code=ISO2022JP ./101kana2.sty > ./101kana3.sty
iconv: illegal input sequence at position 876

iconv --from-code=ISO-8859-1 --to-code=UTF-8 worked, but the symbols did not change to Japanese writing... :?
What should I do to fix this?

Edit: Now I'm even more befuddled... I looked at the document and it mentions near the top "Encoding = EUC-JP"
However, I had assumed it was ISO-8859-1 because when I clicked Save As, it listed the current encoding as ISO-8859-1. And iconv lets me convert from EUC-JP to other Japanese encodings, or to UTF-8, but the gibberish characters still remain. What could be causing this?
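
The symptoms described in this post are consistent with a file whose bytes are EUC-JP being read under an ISO-8859-1 label. iconv converts characters, not labels, so --from-code=ISO-8859-1 first turns every byte into a Latin-1 character and only then looks for that character in the target set. A minimal sketch of the difference, assuming a two-byte stand-in file rather than the real 101kana2.sty and a hypothetical helper named decode_as:

```shell
#!/bin/sh
# Stand-in for the .sty file: the bytes 0xA4 0xA2 are one hiragana (あ) in EUC-JP.
# The real 101kana2.sty is not reproduced here.
sample=$(mktemp)
printf '\244\242\n' > "$sample"

# decode_as ENCODING FILE: interpret FILE's bytes as ENCODING, print them as UTF-8.
# (Hypothetical helper name, used only for this illustration.)
decode_as() {
    iconv -f "$1" -t UTF-8 "$2"
}

# Wrong label: each byte becomes one Latin-1 character, i.e. mojibake.
decode_as ISO-8859-1 "$sample"    # prints ¤¢

# The label the file declares for itself: the kana comes back.
decode_as EUC-JP "$sample"        # prints あ

rm -f "$sample"
```

If the second form shows readable Japanese on a UTF-8 terminal, the file itself is probably fine and the wrong label is coming from the viewer or editor.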

jschiwal 03-25-2008 07:53 PM

According to this Wikipedia article, you may have mis-identified the input character set:
http://en.wikipedia.org/wiki/ISO-8859-1#ISO-8859-1

There are no Japanese characters in that character set. Try running the "file" command on the file and see what it reports. Some byte values are valid in more than one character set, so you may end up with several candidates.
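
One hedged way to follow up on the "file" suggestion: file only guesses, and its guess for 8-bit text can be wrong, so the sketch below also checks which candidate encodings the bytes are at least valid in. The sample bytes and the helper name decodes_cleanly are assumptions for illustration, not something from the thread.

```shell
#!/bin/sh
# Stand-in file: bytes 0xA4 0xA2 (one hiragana in EUC-JP), not the real .sty file.
sample=$(mktemp)
printf '\244\242\n' > "$sample"

# Ask file(1) for its guess, if it is installed. This is a heuristic only.
command -v file >/dev/null 2>&1 && file --brief --mime-encoding "$sample"

# decodes_cleanly ENCODING FILE: succeed only if iconv can decode FILE as ENCODING.
# (Hypothetical helper name.)
decodes_cleanly() {
    iconv -f "$1" -t UTF-8 "$2" >/dev/null 2>&1
}

# Report which candidate encodings the bytes are valid in.
for enc in UTF-8 EUC-JP SHIFT_JIS ISO-8859-1; do
    if decodes_cleanly "$enc" "$sample"; then
        echo "$enc: decodes without error"
    else
        echo "$enc: illegal sequence"
    fi
done

rm -f "$sample"
```

A clean decode only shows that an encoding is possible, not that it is the right one. ISO-8859-1 assigns a character to every byte value, so it never fails this check, which may be why a tool falls back to it when it cannot tell.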

I've noticed that some programs like less don't handle different character sets but others like "more" do.

I tried converting a RealPlayer Japanese README file from UTF-8 to SHIFT-JIS and it wasn't successful. Maybe I need to install something for SHIFT-JIS support or modprobe a kernel module. Wouldn't a UTF-8 file be usable?

violagirl23 03-25-2008 08:00 PM

Yes, that is exactly what happened, as I showed in my edit, but this doesn't solve the problem...

Ah, to further confound matters, it appears that the real issue lies in the fact that I can't save any files as EUC-JP with Japanese in them... well, I can, but they turn out as the same gibberish as that one document did... so this is obviously the problem. To save it, I just said Select other Codeset and then typed in EUC-JP.
Then I reopened it and had the same nonsense as that other file!

jschiwal 03-25-2008 11:00 PM

Shouldn't you convert the document before adding Japanese characters?
For example, first run:
iconv --from-code=ISO-8859-1 --to-code=UTF-8 file >utf8_file
And then edit the file using utf8 encoding.
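
A sketch of the same convert-then-edit idea as a round trip. It assumes the source encoding is EUC-JP, as the "Encoding = EUC-JP" line inside the file suggests, and that the program reading the file still expects EUC-JP afterwards. The helper names and file names are hypothetical.

```shell
#!/bin/sh
# to_utf8 SRC DST: make an editable UTF-8 copy of an EUC-JP file.
to_utf8() {
    iconv -f EUC-JP -t UTF-8 "$1" > "$2"
}

# to_eucjp SRC DST: convert the edited UTF-8 copy back to EUC-JP.
to_eucjp() {
    iconv -f UTF-8 -t EUC-JP "$1" > "$2"
}

# Demo on a stand-in file (bytes 0xA4 0xA2 = one hiragana in EUC-JP).
work=$(mktemp -d)
printf '\244\242\n' > "$work/orig.sty"

to_utf8  "$work/orig.sty" "$work/edit.sty"   # edit this copy in a UTF-8 editor
to_eucjp "$work/edit.sty" "$work/new.sty"    # then convert it back

# With no edits in between, the result should match the original byte for byte.
cmp -s "$work/orig.sty" "$work/new.sty" && echo "round trip preserved the bytes"

rm -rf "$work"
```

Keeping the original untouched and writing to new file names at each step means a bad conversion never overwrites the only good copy.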

Also, take a look at the nls kernel modules. If there is a matching module, you may need to modprobe it. I'm not sure whether this only affects filenames, however.

ls /lib/modules/$(uname -r)/kernel/fs/nls

violagirl23 03-25-2008 11:21 PM

But as I said, I discovered that it is not actually a conversion problem at all. It turns out my problem is that I cannot properly see the contents of any file encoded in EUC-JP. Even if I write the file myself and save it with EUC-JP encoding, when I go to view it I see strange symbols where the Japanese should be. I need to figure out how to equip my system to view such files. Does anyone have any idea what I need to do?

jschiwal 03-26-2008 12:13 AM

Can you view the same type of characters in a utf-8 file?
Would a utf-8 file be usable for your purposes?

Can you view a EUC-JP file in kate?
How about the "more" program?

This document may help. I hope it isn't too dated.
http://www.suse.de/~mfabian/suse-cjk.pdf

Also explore packages in your packaging system and see if you may be missing
some packages for EUC-JP support.


All times are GMT -5.