
quiescere 02-02-2006 08:02 AM

Full Unicode support

I am a budding language scholar studying French, Japanese, and some Greek as I have time. Russian is hanging out in my future, tempting me . . . Anyway, I need to be able to work comfortably in, hopefully, all these languages, although Japanese is less critical to me at this stage. I am also working with a heterogeneous network, mixing Win2k, ME, XP and Slackware (10.1 or .2). Clearly, language encoding and typing have the potential to be a huge headache for me.

I have read the Unicode-HOWTO and done a good bit of searching in this and other forums, and still do not have a clear idea what to do.

In particular, I do not yet have an easy way to enter accented or special characters, though I understand the SCIM project will help with that considerably. Also, although I know that filenames should be clean regardless, as the filesystem (ext3, in this case) does not care how many bytes are used to encode a character, many programs that access the filesystem are not so clever. Just being able to view the filename and tags of my Japanese, English, and French mp3 files would be lovely.

First, encoding: I assume I will need to change the script in /etc/profile.d/ to include
export LANG=en_US.UTF-8
rather than
export LANG=en_US
I have also tracked down a reasonable number of Unicode fonts, including cyberbit, kochi-substitute, the MgOpen package for Greek, and the Microsoft core fonts for the web. I will need to tell such X applications as have an option for it (e.g., Firefox) to use Unicode (UTF-8) encoding. I have no direct vfat or ntfs mounts--all shares are via smbfs--so I do not need to specify utf8 in the mount options.
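Before switching LANG it may be worth confirming that the UTF-8 locale is actually installed. The following is only a minimal sketch, not what the stock Slackware script does: has_locale is a hypothetical helper name, and the case-insensitive, hyphen-stripping comparison is there because `locale -a` tends to print the name as en_US.utf8.

```sh
# has_locale NAME: succeed if NAME appears in the locale list on stdin.
# Both sides are lower-cased and stripped of hyphens, so that
# "en_US.UTF-8" matches "en_US.utf8" as printed by `locale -a`.
# (has_locale is a hypothetical helper, not part of any package.)
has_locale() {
    want=$(printf '%s' "$1" | tr 'A-Z' 'a-z' | tr -d '-')
    tr 'A-Z' 'a-z' | tr -d '-' | grep -qxF "$want"
}

# Only export the UTF-8 locale if the system really provides it.
if locale -a 2>/dev/null | has_locale "en_US.UTF-8"; then
    export LANG=en_US.UTF-8
else
    echo "en_US.UTF-8 is not installed; LANG left unchanged" >&2
fi
```

After logging in again, `locale charmap` should print UTF-8 if the setting took effect.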

Is there anything else I need to do to use and share files cleanly within X?

That brings us to the console, which I do still use regularly, especially when my laptop is running via battery. There I am completely out to sea when it comes to Unicode support. Right now I am using the basic console fonts included with the kernel under a VESA framebuffer. Do I need to track down others? Will a kernel patch be required?

Principal system: Slackware 10.1, kernel 2.6.7, fairly vanilla.

Any guidance, links to manuals or other posts, requests for clarification, etc. are greatly appreciated.


cathectic 02-03-2006 12:18 PM

Once you change the exported LANG setting, every program started afterwards will inherit it - the only other change you really need to make is to ensure all programs are using fonts with Unicode coverage.

I also found that 'man' will display strange characters under a UTF-8 locale; the cure for that was to add the following line to /etc/profile:

alias man='LC_ALL=C man'

I haven't fiddled much with console fonts, though.

Also, I haven't experimented much with non-English characters (X programs handle them fine though, in this case, using MS Arial font). You might also have problems with filenames when transitioning from a legacy 8-bit encoding (e.g. ISO-8859-1) to UTF-8, as accented characters stored in the old encoding are not valid UTF-8 (i.e. you may have to rename files whose names contain non-English characters if they were not created under UTF-8).
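A rough sketch of how such filenames could be found and converted with iconv, assuming the old names are in ISO-8859-1 (that encoding is a guess; substitute whatever the files were really created under). The helper names to_utf8_name and is_utf8 are made up for illustration, and the loop is a dry run that only prints what it would do.

```sh
# to_utf8_name NAME: print NAME re-encoded from ISO-8859-1 to UTF-8.
# (ISO-8859-1 is an assumption about the old encoding.)
to_utf8_name() {
    printf '%s' "$1" | iconv -f ISO-8859-1 -t UTF-8
}

# is_utf8 NAME: succeed if NAME is already a valid UTF-8 byte sequence.
# iconv exits non-zero when it meets an invalid input sequence.
is_utf8() {
    printf '%s' "$1" | iconv -f UTF-8 -t UTF-8 >/dev/null 2>&1
}

# Dry run over the current directory: report names that are not valid
# UTF-8 together with the name they would be converted to.
for f in *; do
    [ -e "$f" ] || continue      # skip the literal '*' in an empty directory
    is_utf8 "$f" && continue     # pure ASCII and UTF-8 names need no change
    new=$(to_utf8_name "$f")
    echo "would rename: $f -> $new"
done
```

Once the dry-run output looks right, the echo could be replaced by `mv -- "$f" "$new"`. Trying it on a copy of the directory first seems prudent.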

I don't have any experience with non-Latin-alphabet languages, so I can't help you there.

quiescere 02-03-2006 06:11 PM

Thanks for the response. So far it has gone okay (and having the new fonts has certainly improved web browsing aesthetics). I have had to compile Terminal to replace xfterm4, but that was not a big deal. There are still a few screwy things--punctuation that does not map, and SCIM just will not compile--but mostly things have gone okay. Seeing kanji in a terminal window is just weird.

SCIM is complaining about ::malloc being undefined, but once I get that sorted it will be easier for me to give this a real workout. I'm afraid I may have to give up on the console plan for the moment, though--it looks like a lot of effort for little reward. Oh, and for some reason smbmount is not working properly now. I think it was working before the change, but I could be wrong.

Again, thanks much for taking the time to answer, and thoroughly.

Mugg 02-13-2006 04:17 PM

May I ask how you did?

I don't use a graphical environment. I use stable Debian with no real extra stuff other than the web-server kit. And I'd like to use Unicode filenames. What are the exact steps to achieve that?

quiescere 02-24-2006 11:10 AM

Afraid I don't have good news for you. I only got as far as getting Unicode (mostly) working in X. I put off getting the console side working because it looked messier, and now the problem is moot because the power supply in the laptop has gone tits-up.

All I can offer you is this:
  1. You can already have UTF-8 filenames. The kernel and the Linux filesystems don't really care, from what I gather. The trick is access/display/editing.
  2. The Unicode FAQ claims

    "The console display and keyboard driver (another VT100 emulator) have to encode and decode UTF-8 and should support at least some subset of the Unicode character set. This had already been available in Linux as early as kernel 1.2 (send ESC %G to the console to activate UTF-8 mode)."

     I overlooked that bit about the ESC code in my earlier research, so that may be all you need on a modern install.
  3. As a Debian user, you might find "Step by step introduction to switching your debian installation to utf-8 encoding" useful. It's been updated as recently as late Oct. 2004, so it's relatively current.
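For what it's worth, the escape code from item 2 can be sent with printf. This is an untested sketch: the function names are made up, and the /dev/tty[0-9]* pattern used to detect a virtual console is an assumption about how the console devices are named on your system.

```sh
# console_utf8_on: emit ESC % G, which the Linux console interprets
# as "select UTF-8". (%% is a literal percent sign for printf.)
console_utf8_on() {
    printf '\033%%G'
}

# console_utf8_off: emit ESC % @, which switches the console back to
# its default (ISO 8859-1) mode.
console_utf8_off() {
    printf '\033%%@'
}

# Only send the sequence when stdin is a Linux virtual console.
# ASSUMPTION: virtual consoles show up as /dev/tty1, /dev/tty2, ...
case "$(tty 2>/dev/null)" in
    /dev/tty[0-9]*) console_utf8_on ;;
esac
```

This only changes how the console decodes output; the keyboard mode and a console font with the needed glyphs are separate matters. The kbd package also ships a unicode_start script which, as far as I know, handles the keyboard side as well.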

Good luck. I'm sorry I was not more helpful, but if you get this working I would love to hear what you learn.
