LinuxQuestions.org
Old 03-25-2008, 07:44 PM   #1
violagirl23
Member
 
Registered: Aug 2007
Location: Michigan
Distribution: Gentoo, Arch
Posts: 33

Rep: Reputation: 15
Convert file from ISO-8859-1 to some Japanese encoding? (iconv errors)


I was looking through the SCIM-anthy configuration files, and they are 100% mojibake (gibberish) wherever a Japanese character is used. Why? I found out they're in ISO-8859-1. As to why, I haven't the SLIGHTEST idea, but the fact of the matter is, they are. SCIM seems to be able to read them just fine, but that leaves them uneditable for me... and I want to edit a .sty file (making a copy first, of course!) to adapt that particular layout to my Dvorak keyboard, as Dvorak totally messes with it. It's practically impossible to figure out what's what when obscure symbols appear in place of the proper Japanese characters. @_@
This is what happened when I tried to convert one of them with iconv:
Code:
iconv --from-code=ISO-8859-1 --to-code=SHIFT-JIS ./101kana2.sty > ./101kana3.sty
iconv: illegal input sequence at position 876

iconv --from-code=ISO-8859-1 --to-code=EUC-JP ./101kana2.sty > ./101kana3.sty
iconv: illegal input sequence at position 880

iconv --from-code=ISO-8859-1 --to-code=ISO2022JP ./101kana2.sty > ./101kana3.sty
iconv: illegal input sequence at position 876

Running iconv --from-code=ISO-8859-1 --to-code=UTF-8 worked, but the symbols did not change to Japanese writing... :?
What should I do to fix this?

Edit: Now I'm even more befuddled... I looked at the document and it mentions near the top "Encoding = EUC-JP".
However, I had assumed it was ISO-8859-1 because when I clicked Save As, the editor listed the current encoding as ISO-8859-1. And iconv lets me convert from EUC-JP to other Japanese encodings, or to UTF-8, but the gibberish characters still remain. What could be causing this?
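
One possible explanation, sketched under assumptions: if the file really is EUC-JP as its header says, then telling iconv it is ISO-8859-1 makes every byte count as a valid Latin-1 character. Conversion to UTF-8 then succeeds but only re-encodes the gibberish, while conversion to a Japanese encoding can fail at the first Latin-1 character that has no equivalent there. The file names and the sample text below are made up for illustration; only the iconv options come from the commands above.

```shell
#!/bin/sh
# Hypothetical sample: the word "日本語" encoded as EUC-JP
# (bytes C6 FC CB DC B8 EC, written here as octal escapes).
printf '\306\374\313\334\270\354\n' > sample_eucjp.txt

# Wrong source encoding: each byte is read as one Latin-1 character,
# so iconv succeeds but writes the mojibake out as UTF-8.
iconv --from-code=ISO-8859-1 --to-code=UTF-8 sample_eucjp.txt > wrong_utf8.txt

# Right source encoding: byte pairs are decoded as Japanese characters.
iconv --from-code=EUC-JP --to-code=UTF-8 sample_eucjp.txt > right_utf8.txt

cat wrong_utf8.txt   # accented Latin letters and symbols
cat right_utf8.txt   # Japanese text, if the terminal displays UTF-8
```

Even with the right source encoding, the result only looks correct in an editor or terminal that displays UTF-8 and has a font with Japanese glyphs.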

Last edited by violagirl23; 03-25-2008 at 07:56 PM.
 
Old 03-25-2008, 07:53 PM   #2
jschiwal
LQ Guru
 
Registered: Aug 2001
Location: Fargo, ND
Distribution: SuSE AMD64
Posts: 15,733

Rep: Reputation: 682
According to this Wikipedia article, you may have mis-identified the input character set:
http://en.wikipedia.org/wiki/ISO-8859-1#ISO-8859-1

There are no Japanese characters in that character set. Try the "file" command and see what it says. Sometimes the bytes used are also valid in other character sets, so you may have several options.
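
Where the file command is unavailable or inconclusive, a rough alternative is to ask iconv which encodings the bytes are valid in. This is only a sketch, not a reliable detector: probe_encodings and its candidate list are hypothetical names chosen for this thread, and a file can be valid in more than one encoding. ISO-8859-1 is left out on purpose, because every byte sequence is valid in it, which may be why an editor falls back to it.

```shell
#!/bin/sh
# probe_encodings FILE
# Print each candidate encoding in which FILE decodes without error.
# The candidate list is an assumption; adjust it as needed.
probe_encodings() {
    for enc in UTF-8 EUC-JP SHIFT_JIS ISO-2022-JP; do
        if iconv -f "$enc" -t UTF-8 "$1" > /dev/null 2>&1; then
            echo "$enc"
        fi
    done
}

# Hypothetical sample: "日本語" encoded as EUC-JP.
printf '\306\374\313\334\270\354\n' > probe_sample.txt
probe_encodings probe_sample.txt
```

An encoding appearing in the output means only that the bytes are legal in it, so the decoded text still has to be checked by eye.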

I've noticed that some programs, like less, don't handle different character sets, while others, like "more", do.

I tried converting a RealPlayer Japanese README file from UTF-8 to SHIFT-JIS and it wasn't successful; perhaps I need to install something for SHIFT-JIS support or modprobe a kernel module. Wouldn't a UTF-8 file be usable?

Last edited by jschiwal; 03-25-2008 at 08:10 PM.
 
Old 03-25-2008, 08:00 PM   #3
violagirl23
Member
 
Registered: Aug 2007
Location: Michigan
Distribution: Gentoo, Arch
Posts: 33

Original Poster
Rep: Reputation: 15
Yes, that is exactly what happened, as I showed in my edit, but this doesn't solve the problem...

Ah, to further confound matters, it appears the real issue is that I can't save any file containing Japanese as EUC-JP... well, I can, but it turns out as the same gibberish as that one document did... so this is obviously the problem. To save it, I chose "Select other Codeset" and typed in EUC-JP.
Then I reopened it and got the same nonsense as in that other file!

Last edited by violagirl23; 03-25-2008 at 08:04 PM.
 
Old 03-25-2008, 11:00 PM   #4
jschiwal
LQ Guru
 
Registered: Aug 2001
Location: Fargo, ND
Distribution: SuSE AMD64
Posts: 15,733

Rep: Reputation: 682
Shouldn't you convert the document before adding Japanese characters?
For example, first run:
iconv --from-code=ISO-8859-1 --to-code=UTF-8 file >utf8_file
And then edit the file using utf8 encoding.
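
If the file is EUC-JP, as its "Encoding = EUC-JP" line suggests, the same idea could be applied as a round trip: convert to UTF-8, edit, then convert back so the file still matches the encoding it declares. This is a sketch under assumptions: the file names and sample contents are placeholders standing in for a real .sty file, and whether SCIM needs the file back in EUC-JP is a guess based on that header line.

```shell
#!/bin/sh
# Placeholder for a real .sty file: one ASCII header line plus
# "日本語" encoded as EUC-JP.
printf 'Encoding = EUC-JP\n\306\374\313\334\270\354\n' > sample.sty

# Keep an untouched copy before changing anything.
cp sample.sty sample.sty.bak

# Step 1: decode to UTF-8 so a UTF-8 editor shows Japanese text.
iconv --from-code=EUC-JP --to-code=UTF-8 sample.sty > sample.utf8.sty

# Step 2: edit sample.utf8.sty in a UTF-8 capable editor (not shown).

# Step 3: encode back to EUC-JP, replacing the original.
iconv --from-code=UTF-8 --to-code=EUC-JP sample.utf8.sty > sample.sty

# With no edits made, the round trip should reproduce the backup exactly.
cmp sample.sty sample.sty.bak && echo "round trip is lossless"
```

Step 3 will report an error if the edit introduced a character that EUC-JP cannot represent, which is a useful check before handing the file back to SCIM.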

Also, take a look at the NLS kernel modules. If there is a matching module, you may need to modprobe it. I'm not sure whether this affects only filenames, however.

ls /lib/modules/$(uname -r)/kernel/fs/nls

Last edited by jschiwal; 03-25-2008 at 11:03 PM.
 
Old 03-25-2008, 11:21 PM   #5
violagirl23
Member
 
Registered: Aug 2007
Location: Michigan
Distribution: Gentoo, Arch
Posts: 33

Original Poster
Rep: Reputation: 15
But as I said, I discovered that it is not actually a conversion problem at all. It turns out my problem is that I cannot properly see the contents of any file encoded in EUC-JP. Even if I write the file myself and save it in EUC-JP encoding, when I go to view it I see strange symbols where the Japanese should be. I need to figure out how to equip my system to view such files. Does anyone have any idea what I need to do?
 
Old 03-26-2008, 12:13 AM   #6
jschiwal
LQ Guru
 
Registered: Aug 2001
Location: Fargo, ND
Distribution: SuSE AMD64
Posts: 15,733

Rep: Reputation: 682
Can you view the same type of characters in a UTF-8 file?
Would a UTF-8 file be usable for your purposes?

Can you view an EUC-JP file in kate?
How about in the "more" program?

This document may help. I hope it isn't too dated.
http://www.suse.de/~mfabian/suse-cjk.pdf

Also explore the packages in your packaging system and see whether you are missing some packages for EUC-JP support.

Last edited by jschiwal; 03-28-2008 at 02:29 AM. Reason: fixed typo.
 
  

