LinuxQuestions.org

Slackware forum: Looking for advice on terminal emulator. Utf-8 is not displayed well (https://www.linuxquestions.org/questions/slackware-14/looking-for-advice-on-terminal-emulator-utf-8-is-not-displayed-well-4175526279/)

PreguntoYo 11-23-2014 12:37 PM

Looking for advice on terminal emulator. Utf-8 is not displayed well
 
Hello:

I'm setting my own minimalistic desktop, based on Fluxbox.

There are various terminal emulators that come installed with Slackware: Konsole, Xfce's Terminal... I don't want to use these; they're tied (?) to their respective desktop environments.

I've been trying xterm and rxvt (the latter only a little), but they don't seem to display UTF-8 characters well. Midnight Commander is unusable, and man pages show strange characters where accented vowels should go. Hell, strange chars show up even when xmessage tries to display the "About Fluxbox" text in my language.

I don't know whether these strange symptoms are all caused by one or more misconfigured options. I thought it could be the terminal emulator, at least. I'm unsure about xmessage.
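A likely explanation for accented vowels turning into odd symbols (an assumption about this setup, not a confirmed diagnosis): programs running in the es_ES.utf8 locale write UTF-8 bytes, but a terminal that is not in UTF-8 mode reads each byte as a separate ISO8859-1 character. This short Python sketch reproduces the effect:

```python
def as_seen_by_latin1_terminal(text: str) -> str:
    """Simulate a non-UTF-8 terminal: encode text as UTF-8 (what a program
    in a UTF-8 locale writes) and decode those bytes as ISO8859-1 (what the
    terminal assumes), so each accented letter becomes two odd characters."""
    return text.encode("utf-8").decode("latin-1")


for word in ("canción", "árbol", "niño"):
    print(word, "->", as_seen_by_latin1_terminal(word))
```

The first line printed is `canción -> canciÃ³n`. Running the terminal in UTF-8 mode (for xterm, the -u8 flag) removes the mismatch.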

I have set LANG to es_ES.utf8 in both lang.sh and lang.csh. I have also set the XkbLayout to "es" in the /etc/X11/xorg.conf.d/90-keyboard-layout.conf file. I don't know what else I could do.
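One thing worth ruling out: LANG is only the last fallback. A non-empty LC_ALL or LC_CTYPE set somewhere else (another profile script, ~/.xinitrc) overrides it for character encoding. The Python helpers below are hypothetical, written only to illustrate the POSIX lookup order; they are not part of Slackware's lang.sh:

```python
import os


def effective_ctype(env: dict) -> str:
    """Return the locale name that governs character encoding (LC_CTYPE).
    POSIX precedence: a non-empty LC_ALL wins, then LC_CTYPE, then LANG;
    with none of them set, programs fall back to the "C" locale."""
    for var in ("LC_ALL", "LC_CTYPE", "LANG"):
        value = env.get(var, "")
        if value:  # unset and empty variables are both skipped
            return value
    return "C"


def is_utf8_locale(name: str) -> bool:
    """True if the codeset part of a locale name is UTF-8,
    accepting both spellings, e.g. 'es_ES.utf8' and 'es_ES.UTF-8'."""
    codeset = name.partition(".")[2].split("@", 1)[0]  # drop any @modifier
    return codeset.replace("-", "").lower() == "utf8"


current = effective_ctype(dict(os.environ))
print(current, "(UTF-8)" if is_utf8_locale(current) else "(not UTF-8)")
```

Running `locale` in the terminal shows the same information without any code.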

Do you know whether the problem is only in the terminal emulator? If so, which one would you suggest? Could I solve my problems just by setting some mystical option somewhere?

I'm clueless :confused: Help needed.

astrogeek 11-23-2014 12:41 PM

I use (and recommend) rxvt-unicode as the terminal emulator for UTF-8, along with the tmux terminal multiplexer, in Fluxbox, and couldn't be happier.

rxvt-unicode is available from SBo (assumes Slackware).

Tmux ships with current Slackware, and you already have Fluxbox.

moisespedro 11-23-2014 12:53 PM

You can also use xterm or st

st is way better IMO

ml4711 11-23-2014 01:23 PM

Quote:

xterm and rxvt (this last only a little), but they don't seem to display utf-8 characters well.

Midnight Commander is unusable
uxterm displays utf-8 characters, and so does MC used in uxterm!

Enjoy

dugan 11-23-2014 01:26 PM

Terminator displays every character in quickbrown.txt.

genss 11-23-2014 01:29 PM

uxterm is xterm run with the -u8 flag
(open the uxterm script with a text editor to see)

xterm is the best terminal emulator I have found yet:
lowest memory/CPU usage while good enough

st would be better if it had history :)

Didier Spaier 11-23-2014 03:05 PM

I just checked. AFAIK all terminal emulators for X shipped in Slackware can properly display UTF-8 encoded characters, at least if you use a suitable option (xterm) or a proper setting (konsole). The only exception is rxvt.

I've tried xterm, rxvt, terminal, konsole, vte.

My suggestion/request to Pat for Slackware-next: replace rxvt with rxvt-unicode (aka urxvt) and symlink the former to the latter, as is done with sh and bash.

Or at least, ship urxvt alongside rxvt.

PS: Reminder: on a Linux console or tty, just run unicode_start.

GazL 11-24-2014 05:34 AM

I've always used either iso8859-1 or iso8859-15 (which is almost identical), and I've only recently switched to using utf-8 on my 'current' test partition.

'man' seems to cause problems for me using the terminus font on a virtual console. In a UTF-8 locale, man/groff appears to use 'hyphen' (U+2010) when it needs to split a word at a line end. An ISO8859-1 locale appears to generate a 'hyphen-minus' (U+002D) for this; both despite there being a 'soft hyphen' (U+00AD) in both the Unicode and 8859-1 encodings, intended specifically for this purpose! (oh well. :().
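The encoding side of this can be checked with a short Python sketch: U+2010 has no ISO8859-1 byte at all, which is consistent with groff falling back to U+002D in a Latin-1 locale, while the soft hyphen U+00AD is representable in both. (This only shows the encodings; it says nothing about which glyphs a given console font provides.)

```python
from __future__ import annotations

HYPHEN = "\u2010"        # line-end hyphen groff emits in a UTF-8 locale, as observed above
HYPHEN_MINUS = "\u002d"  # what it emits in an ISO8859-1 locale
SOFT_HYPHEN = "\u00ad"   # present in both Unicode and ISO8859-1


def encodings_of(ch: str) -> dict[str, bytes | None]:
    """Bytes for ch in UTF-8 and ISO8859-1, or None where it cannot be encoded."""
    result = {}
    for codec in ("utf-8", "latin-1"):
        try:
            result[codec] = ch.encode(codec)
        except UnicodeEncodeError:
            result[codec] = None
    return result


for label, ch in (("HYPHEN", HYPHEN), ("HYPHEN-MINUS", HYPHEN_MINUS), ("SOFT HYPHEN", SOFT_HYPHEN)):
    print(f"U+{ord(ch):04X} {label}: {encodings_of(ch)}")
```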


It seems that the terminus font/unicode console map doesn't include a mapping to cater for U+2010, so it generates accented character(s) or graphics symbols. I'm thinking of customising a mapping file to map it to the glyph for U+002D which I suspect the soft-hyphen is also sharing anyway.

Another option would be to force nroff to generate latin1 output via the man.conf file, but I'm undecided whether I want to do that.

Long story short, correct operation is going to depend on not only your terminal program and the encoding it's using, but also your choice of font and how good its unicode-to-glyph mappings are.

Though I still have this issue with the virtual consoles+terminus to solve, I've not had any problems with xterm/uxterm using the dejavu sans mono font, so perhaps give that a try.

Paulo2 11-24-2014 06:47 AM

Quote:

Originally Posted by PreguntoYo (Post 5273883)
they're tied (?) to their respective desktop environments.

If you did a full install, you can use konsole or xfce4-terminal on Fluxbox.
Konsole even appears on the menu.

Didier Spaier 11-24-2014 09:18 AM

Quote:

Originally Posted by GazL (Post 5274195)
I've always used either iso8859-1 or iso8859-15 (which is almost identical), and I've only recently switched to using utf-8 on my 'current' test partition.

'man' seems to cause problems for me using the terminus font on a virtual console. In a UTF-8 locale, man/groff appears to use 'hyphen' (U+2010) when it needs to split a word at a line end. An ISO8859-1 locale appears to generate a 'hyphen-minus' (U+002D) for this; both despite there being a 'soft hyphen' (U+00AD) in both the Unicode and 8859-1 encodings, intended specifically for this purpose! (oh well. :().

It seems that the terminus font/unicode console map doesn't include a mapping to cater for U+2010, so it generates accented character(s) or graphics symbols. I'm thinking of customising a mapping file to map it to the glyph for U+002D which I suspect the soft-hyphen is also sharing anyway.

Another option would be to force nroff to generate latin1 output via the man.conf file, but I'm undecided whether I want to do that.

Long story short, correct operation is going to depend on not only your terminal program and the encoding it's using, but also your choice of font and how good its unicode-to-glyph mappings are.

Though I still have this issue with the virtual consoles+terminus to solve, I've not had any problems with xterm/uxterm using the dejavu sans mono font, so perhaps give that a try.

As a reminder, the legacy encoding for man pages is ISO8859-1, but of course a lot of languages are not covered by this codepage. Also, groff by default uses the preconv preprocessor to convert encodings to something troff understands; see man preconv. As long as you know the file's encoding, that's OK: you can set GROFF_ENCODING to either utf8 or the legacy encoding. But what if you don't know it?

AFAIK Slackware ships the man pages "as is". The result is a mess, because each upstream provider of man pages has its own policy for encoding and choosing the man pages' location.

Currently the main issue is that the encoding of a man page is easy enough to guess when the name of the directory that hosts it ends in .UTF-8 or .ISO8859-1, but otherwise only your crystal ball can help you.
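The "easy" case can be sketched in Python as a lookup on directory names. The suffix list and the example paths are assumptions for illustration, not Slackware's actual layout; when nothing matches, the function returns None, which is the crystal-ball case:

```python
from pathlib import PurePosixPath
from typing import Optional

# Directory-name suffixes taken as encoding declarations (assumed, non-exhaustive).
SUFFIX_TO_ENCODING = {
    ".utf-8": "utf-8",
    ".utf8": "utf-8",
    ".iso8859-1": "iso8859-1",
}


def encoding_from_dirname(page: str) -> Optional[str]:
    """Guess a man page's encoding from its directory names, e.g. a
    hypothetical /usr/man/de.UTF-8/man1/foo.1. Returns None when no
    directory declares an encoding."""
    for part in PurePosixPath(page).parent.parts:
        lowered = part.lower()
        for suffix, encoding in SUFFIX_TO_ENCODING.items():
            if lowered.endswith(suffix):
                return encoding
    return None
```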

Of course one can request Pat to convert all man pages to UTF-8. But there are more than 16000 regular files in /usr/man, so he will need some help to do that ;)
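A rough Python sketch of the per-file step such a conversion would need. It relies on a heuristic (anything that is not valid UTF-8 is assumed to be in the `legacy` encoding, ISO8859-1 by default), which silently mislabels pages in other legacy encodings such as ISO8859-2 or KOI8-R; that is exactly the guessing problem described above:

```python
def to_utf8(raw: bytes, legacy: str = "latin-1") -> bytes:
    """Return raw unchanged if it is already valid UTF-8; otherwise decode it
    with the assumed legacy encoding and re-encode it as UTF-8.
    Running it a second time on its own output changes nothing."""
    try:
        raw.decode("utf-8")
        return raw
    except UnicodeDecodeError:
        return raw.decode(legacy).encode("utf-8")
```

Because already-converted files pass through untouched, the step can be rerun safely over a partly converted tree.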

According to Denis Barbier, the BSDs have chosen to put all man pages in subdirectories per encoding, but that has a maintenance cost as well, and a risk of going out of sync.

Another solution is described in this post from Bruno Haible; I didn't try it.

We encountered that issue in the Slint project when choosing an encoding for the localized man pages shipped in the slackpkg and pktools packages. Using only ISO8859-1 was not an option, as it can't be used for Greek, Polish, Russian, Serbian, Turkish... So we ended up using UTF-8 for all languages.

About the terminus fonts: you have to use the variant that fits the encoding, as stated in /usr/doc/terminus-font-4.38/README.

Yes, DejaVu Sans Mono is a good font that covers almost all alphabetic languages.

You can even use it in a Linux console or tty, provided you installed fbterm.

dederon 11-24-2014 10:22 AM

Quote:

Originally Posted by genss (Post 5273912)
st would be better if it had history :)

That's what tmux or dvtm are for. What I don't like about st is the Xft dependency; that's why I use an old snapshot of st.
Another minimalistic utf8 capable terminal emulator is uuterm (by the author of musl).

lems 11-24-2014 11:45 AM

uxterm displays Unicode just fine here. I personally don't like Xft/TrueType fonts and use this (from my ~/.Xresources, which is read by ~/.xinitrc or ~/.xsession depending on how you start X):

Code:

! 6x13
#define UFONT0 -Misc-Fixed-Medium-R-SemiCondensed--13-120-75-75-C-60-ISO10646-1
! 7x13
#define UFONT1 -Misc-Fixed-Medium-R-Normal--13-120-75-75-C-70-ISO10646-1
! 7x14
#define UFONT2 -Misc-Fixed-Medium-R-Normal--14-130-75-75-C-70-ISO10646-1
! 8x13
#define UFONT3 -Misc-Fixed-Medium-R-Normal--13-120-75-75-C-80-ISO10646-1
! 9x15
#define UFONT4 -Misc-Fixed-Medium-R-Normal--15-140-75-75-C-90-ISO10646-1
! 9x18
#define UFONT5 -Misc-Fixed-Medium-R-Normal--18-120-100-100-C-90-ISO10646-1

UXTerm*font:                    UFONT5
UXTerm*wideFont:                UFONT5

edit:
UXTerm*wideFont could be redundant, haven't bothered to check.

Some more information:
https://www.cl.cam.ac.uk/~mgk25/ucs-fonts.html

A text file you can « cat » (or open with less or an editor) to see if it's working:
https://www.cl.cam.ac.uk/~mgk25/ucs/...UTF-8-demo.txt

GazL 11-24-2014 01:50 PM

Quote:

Originally Posted by Didier Spaier (Post 5274269)
About the terminus fonts: you have to use the variant that fits the encoding, as stated in /usr/doc/terminus-font-4.38/README.

Thanks for the info on man/roff. That added a bit of detail I wasn't aware of. On the terminus front, I've been happily using it under 8859-x, and it's only since moving to UTF-8 that the issue with the hyphen came to light. The problem is that the U+2010 that man/roff generates when run in UTF-8 isn't mapped in any of the terminus fonts, so you end up with whatever glyph the replacement/missing character (U+FFFD) is mapped to.

Anyway, I decided adding a substitute mapping to 0x2d for U+2010 was the right way to go, so the hyphen issue is sorted, even if the rest of man-page localisation is still as messy as ever. ;)
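GazL's fix lives in the console's Unicode map, not in code. For comparison, the same substitution can be made at the text level: this hypothetical Python filter (not what GazL did) rewrites U+2010 to the ASCII hyphen-minus before the text reaches a display whose font lacks that glyph. Also folding U+2011 (non-breaking hyphen) is an extra assumption:

```python
# Unicode hyphens a font may lack, mapped to ASCII hyphen-minus (U+002D).
HYPHEN_SUBSTITUTES = str.maketrans({
    "\u2010": "-",  # HYPHEN, emitted by man/groff in UTF-8 locales
    "\u2011": "-",  # NON-BREAKING HYPHEN (assumed extra)
})


def fold_hyphens(text: str) -> str:
    """Replace Unicode hyphens with ASCII hyphen-minus; all other text is unchanged."""
    return text.translate(HYPHEN_SUBSTITUTES)
```

The console-map approach is the more complete fix, since it covers every program writing to the console rather than only text passed through a filter.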

Didier Spaier 11-24-2014 02:04 PM

Quote:

Originally Posted by GazL (Post 5274383)
The problem is that U+2010 that man/roff is generating when run in utf-8 isn't mapped in any of the terminus fonts, so you end up with whatever glyph the replacement/missing character (U+FFFD) is mapped to.

But if the man page itself is encoded in ISO8859-1, did you try "GROFF_ENCODING=ISO8859-1 man <page name>"?

GazL 11-24-2014 02:18 PM

Yeah, that didn't work; LANG="en_GB.iso8859-1" man <manpage> works, though.


All times are GMT -5.