Looking for advice on terminal emulator. Utf-8 is not displayed well
Hello:
I'm setting my own minimalistic desktop, based on Fluxbox.
There are various terminal emulators that come installed with Slackware: Konsole, Xfce's Terminal... I don't want to use these, they're tied (?) to their respective desktop environments.
I've been trying xterm and rxvt (the latter only a little), but they don't seem to display UTF-8 characters well. Midnight Commander is unusable, and man pages show strange characters where accented vowels should go. Hell, strange characters show up even when xmessage tries to display the "About Fluxbox" text in my language.
I don't know whether these symptoms are caused by one or more misconfigured options. I thought it could be the terminal emulator, at least. I'm unsure about xmessage.
I have the LANG variables in lang.sh and lang.csh set to es_ES.utf8. I have also set XkbLayout to "es" in the /etc/X11/xorg.conf.d/90-keyboard-layout.conf file. I don't know if I could do anything else.
Do you know whether the problem is only in the terminal emulator? If so, which one would you suggest? Or could I solve my problems just by setting some mystical option out there?
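One way to narrow this down is to check what the shell actually inherited before blaming the terminal. The following is only a minimal sketch: the helper name is_utf8_name is made up for illustration, and it only inspects the locale name, it does not verify that the locale is installed.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if a locale name carries a UTF-8 charset suffix.
is_utf8_name() {
    case "$1" in
        *.[Uu][Tt][Ff]-8*|*.[Uu][Tt][Ff]8*) return 0 ;;
        *) return 1 ;;
    esac
}

# Effective character-type locale: LC_ALL overrides LC_CTYPE, which overrides LANG.
effective="${LC_ALL:-${LC_CTYPE:-${LANG:-}}}"

if is_utf8_name "$effective"; then
    echo "locale '$effective' names a UTF-8 charset"
else
    echo "locale '$effective' does not name a UTF-8 charset"
fi

# Emit the UTF-8 bytes for a few accented letters (a, e, i, o, u with acute
# accents, then n with tilde), written as octal escapes so that the script
# itself stays pure ASCII.
printf '\303\241\303\251\303\255\303\263\303\272 \303\261\n'
```

If the locale looks right but the last line shows garbage, the terminal emulator or its font is the more likely suspect; if the locale is wrong, lang.sh is probably not being read by the session that starts X.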
I just checked. AFAIK all terminal emulators for X shipped in Slackware can properly display UTF-8 encoded characters, at least if you use a suitable option (xterm) or a proper setting (konsole). The only exception is rxvt.
I've tried xterm, rxvt, terminal, konsole, vte.
My suggestion/request to Pat for Slackware-next: replace rxvt with rxvt-unicode (aka urxvt) and symlink the former to the latter, as sh => bash.
Or at least, ship urxvt alongside rxvt.
PS. reminder: on a Linux console or tty, just run unicode_start.
Last edited by Didier Spaier; 11-23-2014 at 03:27 PM.
I've always used either iso8859-1 or iso8859-15 (which is almost identical), and I've only recently switched to using utf-8 on my 'current' test partition.
'man' seems to cause problems for me using the terminus font on a virtual console. In a utf-8 locale man/groff appears to use 'hyphen' (U+2010) when it needs to split a word at a line end. An iso8859-1 locale appears to generate a 'hyphen-minus' (U+002D) for this; both despite there being a 'soft-hyphen' (U+00AD) in both unicode and 8859-1 encodings intended specifically for this purpose! (oh well).
It seems that the terminus font/unicode console map doesn't include a mapping to cater for U+2010, so it generates accented character(s) or graphics symbols. I'm thinking of customising a mapping file to map it to the glyph for U+002D which I suspect the soft-hyphen is also sharing anyway.
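For reference, the three characters involved look like this at the byte level. This is just an illustrative sketch; hex_bytes is a made-up helper name.

```shell
#!/bin/sh
# Hypothetical helper: print stdin as one run of lowercase hex bytes.
hex_bytes() {
    od -An -tx1 | tr -d ' \n'
    echo
}

# U+2010 HYPHEN encoded in UTF-8 (three bytes).
printf '\342\200\220' | hex_bytes   # prints e28090
# U+002D HYPHEN-MINUS, the same single byte in ASCII, ISO8859-1 and UTF-8.
printf '\055' | hex_bytes           # prints 2d
# U+00AD SOFT HYPHEN encoded in UTF-8 (two bytes); in ISO8859-1 it is the single byte 0xAD.
printf '\302\255' | hex_bytes       # prints c2ad
```

Piping a man page through od -c and looking for the 342 200 220 sequence should show whether U+2010 is really what reaches the console, though pagers and filters in the man pipeline may change what ends up in the pipe.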
Another option would be to force nroff to generate latin1 output via the man.conf file, but I'm undecided whether I want to do that.
Long story short, correct operation is going to depend on not only your terminal program and the encoding it's using, but also your choice of font and how good its unicode-to-glyph mappings are.
Though I still have this issue with the virtual consoles+terminus to solve, I've not had any problems with xterm/uxterm using the dejavu sans mono font, so perhaps give that a try.
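As a rough sketch of the xterm plus DejaVu Sans Mono combination: the wrapper name start_utf8_xterm and the font size are arbitrary choices for illustration, and the -fa/-fs options assume an xterm built with FreeType support and the DejaVu fonts installed.

```shell
#!/bin/sh
# Hypothetical wrapper: start xterm in UTF-8 mode with DejaVu Sans Mono.
start_utf8_xterm() {
    if [ -z "${DISPLAY:-}" ]; then
        echo "DISPLAY is not set; not starting xterm" >&2
        return 1
    fi
    # -u8: UTF-8 mode; -fa: font face name; -fs: point size (11 is arbitrary).
    xterm -u8 -fa 'DejaVu Sans Mono' -fs 11 &
}
```

Calling start_utf8_xterm from a shell or a Fluxbox menu entry should then open a UTF-8 capable terminal. The uxterm wrapper shipped with xterm does the UTF-8 part by itself, so uxterm with the same -fa and -fs options is an alternative.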
About the terminus fonts: you need to choose one that fits the encoding, see /usr/doc/terminus-font-4.38
As a reminder, the legacy encoding for man pages is ISO8859-1, but of course a lot of languages are not covered by this codepage. Also, groff by default uses the preconv preprocessor to convert encodings to something troff understands; see man preconv. As long as you know the file's encoding that's OK: you can set GROFF_ENCODING to either utf8 or the legacy encoding. But what if you don't know it?
AFAIK Slackware ships the man pages "as is". The result is a mess, because each upstream provider of man pages has its own policy for encoding and choosing the man pages' location.
Currently the main issue is that the encoding of a man page is easy enough to guess when the name of the directory that hosts it ends in .UTF-8 or .ISO8859-1, but otherwise only your crystal ball can help you.
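A cheap substitute for the crystal ball is to test whether a page is pure ASCII or valid UTF-8. The sketch below is only a heuristic: guess_man_encoding is a made-up name, it relies on iconv being installed, and when a page is neither ASCII nor valid UTF-8 it cannot tell which legacy encoding was used.

```shell
#!/bin/sh
# Hypothetical helper: make a rough guess at the encoding of a man page source.
# Prints "ascii", "utf-8" or "legacy" (some 8-bit encoding, which one is unknown).
guess_man_encoding() {
    # Read gzip-compressed pages through gzip, plain ones through cat.
    case "$1" in
        *.gz) reader='gzip -dc' ;;
        *)    reader='cat' ;;
    esac

    # Count the bytes that fall outside the 7-bit ASCII range.
    high=$($reader "$1" | LC_ALL=C tr -d '\000-\177' | wc -c | tr -d ' ')

    if [ "$high" -eq 0 ]; then
        echo ascii
    # iconv exits with an error on byte sequences that are not valid UTF-8.
    elif $reader "$1" | iconv -f UTF-8 -t UTF-8 >/dev/null 2>&1; then
        echo utf-8
    else
        echo legacy
    fi
}
```

Run it as guess_man_encoding followed by the path of a page under /usr/man. A "utf-8" answer is a strong hint, since text in an 8-bit encoding with accented letters is rarely valid UTF-8 by accident; a "legacy" answer still leaves you guessing between ISO8859-1, ISO8859-2 and so on before setting GROFF_ENCODING.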
Of course one can ask Pat to convert all man pages to UTF-8. But there are more than 16000 regular files in /usr/man, so he will need some help to do that.
According to Denis Barbier, the BSDs have chosen to put all man pages in per-encoding subdirectories, but that has a maintenance cost as well, and a risk of going out of sync.
Another solution is described in this post from Bruno Haible; I didn't try it.
We encountered that issue in the Slint project when choosing an encoding for the localized man pages shipped in the packages slackpkg and pkgtools. Using only ISO8859-1 was not an option, as it can't be used for Greek, Polish, Russian, Serbian, Turkish... So we ended up using UTF-8 for all languages.
About the terminus fonts: you have to use the variant that fits the encoding, as stated in /usr/doc/terminus-font-4.38/README.
Yes, DejaVu Sans Mono is a good font that covers almost all alphabetic languages.
You can even use it in a Linux console or tty, provided you have installed fbterm.
Last edited by Didier Spaier; 11-24-2014 at 11:25 AM.
that's what tmux or dvtm are for. what i don't like about st is the xft dependency, that's why i use an old snapshot of st.
Another minimalistic utf8 capable terminal emulator is uuterm (by the author of musl).
Last edited by dederon; 11-24-2014 at 11:02 AM.
Reason: fix grammar
uxterm displays unicode just fine here. I personally don't like xft/TrueType fonts and use this (from my ~/.Xresources, which is read by ~/.xinitrc or ~/.xsession depending on how you start X):
Thanks for the info on man/roff. That added a bit of detail I wasn't aware of. On the terminus front, I've been happily using it under 8859-x, and it's only since moving to utf-8 that the issue with the hyphen came to light. The problem is that the U+2010 that man/roff generates when run in utf-8 isn't mapped in any of the terminus fonts, so you end up with whatever glyph the replacement/missing character (U+FFFD) is mapped to.
Anyway, I decided adding a substitute mapping to 0x2d for U+2010 was the right way to go, so the hyphen issue is sorted, even if the rest of man-page localisation is still as messy as ever.
But if the man page itself is encoded in ISO8859-1, did you try "GROFF_ENCODING=ISO8859-1 man <page name>"?