Have you set up your Slackware to use UTF-8?

NonNonBa · 12-08-2012, 09:43 AM

Hello,

Nowadays, most of the widely used Linux distros use UTF-8 as their default charset. Slackware seems to be the last one providing a byte-oriented charset environment base.

Of course, Slackware does not do that for fun or out of dogmatism. Some applications don't handle UTF-8 (e.g. elvis, the default "vi" command), and others become less efficient with it (e.g. for a long time grep was known to be drastically slower in UTF-8 locales).

Nevertheless, many Slackers (including me) have chosen to adopt UTF-8 as their charset. The purpose of this thread is to estimate what share of Slackware users they represent, and to collect information about their motivations and the problems (fixed or not) they have encountered along the way. Ideally, the result could serve as a kind of bug tracker on the way to a Slackware that defaults to UTF-8 everywhere.

TobiSGD · 12-08-2012, 09:46 AM

Running with UTF-8 to prevent some glitches with displaying German umlauts.

WiseDraco · 12-08-2012, 09:49 AM

I work with Russian and Latvian (Cyrillic and some non-standard Latin characters), so I sometimes encounter filenames in Russian and the like. UTF-8 is therefore important to me.

H_TeXMeX_H · 12-08-2012, 09:58 AM

No, I don't need UTF-8 ATM. If the need comes up, I'll just change it.

sycamorex · 12-08-2012, 10:37 AM

Yes, I use UTF-8.

Didier Spaier · 12-08-2012, 11:38 AM

I do, without a hitch.

As suggested by the OP, it would be interesting to hear from people encountering problems in doing so, in order to list the obstacles in the path of generalizing UTF-8 and find ways to overcome them.

@NonNonBa: thanks for keeping your promise

bobzilla · 12-08-2012, 01:50 PM

Quote:

Originally Posted by Didier Spaier

As suggested by the OP, it would be interesting to hear from people encountering problems in doing so, in order to list the obstacles in the path of generalizing UTF-8 and find ways to overcome them.

It would be nice to have a list of common problems (and solutions if possible). If those were known, they could be added to the "Localization" article in the SDP.

markush · 12-08-2012, 02:18 PM

I use UTF-8 because of the German umlauts and ß: ä, ö, ü, ß, Ä, Ü, Ö

Markus

w1k0 · 12-08-2012, 04:15 PM

I live in Poland. The traditional encoding for the Polish language is ISO-8859-2. People in France or Russia use other encodings. I don't know how people in Poland, France, or Russia deal with the different encodings, because I tend to solve problems my own way. So I can merely describe the methods which I used in the past or which I use now. The following description is simplified: in real life I used, and still use, some additional, more or less sophisticated solutions.

Up to Slackware 9.1 I used the ISO-8859-2 encoding. At that time, to input characters in ISO-8859-2, I used the script xplkbset.iso-8859-2:

Code:

#!/bin/sh

# installs Polish keyboard for X Window (ISO-8859-2)

# Look for an existing Mode_switch modifier
PLKBOK=$(xmodmap -pm | awk '/Mode_switch/ { print $1; exit }')
if [ -z "$PLKBOK" ]
then
    # None yet: turn the first right-hand modifier key found into Mode_switch
    for MODALT in Alt Meta Super Hyper Shift
    do
        # \$1 is escaped so that awk, not the shell, expands it
        SYMALTR=$(xmodmap -pk | awk "/${MODALT}_R/ { print \$1; exit }")
        if [ -n "$SYMALTR" ]
        then
            SYMALTL=$(xmodmap -pk | awk "/${MODALT}_L/ { print \$1; exit }")
            if [ -n "$SYMALTL" ]
            then
                MODALTR=$(xmodmap -pm | awk "/${MODALT}_R/ { print \$1; exit }")
                xmodmap -e "remove $MODALTR = ${MODALT}_R"
                xmodmap -e "keycode $SYMALTR =  Mode_switch"
                # first modifier row (mod1..mod5) with no keys bound to it
                MODSWT=$(xmodmap -pm | awk '/^mod/ { if ( $2=="" ) { print $1; exit } }')
                xmodmap -e "add $MODSWT = Mode_switch"
                break
            fi
        fi
    done
fi
# Re-check: only remap the letters if Mode_switch is now available
PLKBOK=$(xmodmap -pm | awk '/Mode_switch/ { print $1; exit }')
if [ -n "$PLKBOK" ]
then
    xmodmap -e "keysym A = a A plusminus exclamdown"
    xmodmap -e "keysym C = c C ae AE"
    xmodmap -e "keysym E = e E ecircumflex Ecircumflex"
    xmodmap -e "keysym L = l L threesuperior sterling"
    xmodmap -e "keysym N = n N ntilde Ntilde"
    xmodmap -e "keysym O = o O oacute Oacute"
    xmodmap -e "keysym S = s S paragraph brokenbar"
    xmodmap -e "keysym X = x X onequarter notsign"
    xmodmap -e "keysym Z = z Z questiondown macron"
fi

(I also had analogous scripts for UTF-8 and for CP1250, the Microsoft Windows encoding for the Polish language.)

When Slackware 10.0 appeared, my scripts stopped working: xterm required ISO-8859-2 for character input, while OpenOffice.org required UTF-8. Since Slackware 10.0 the default Polish keyboard for X Window uses UTF-8 and is stored in the /etc/X11/xkb/symbols/pl file. To be able to input Polish characters in ISO-8859-2 in xterm, I prepared my own keyboard driver named pl0:
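
A note on why the keysym names in the script look unrelated to Polish letters: they are Latin-1 names, and each one stands for a byte value that ISO-8859-2 assigns to a Polish letter (for example, "plusminus" is 0xB1, which ISO-8859-2 reads as ą). The sketch below is only an illustration and not part of the original script. `decode_byte` is a hypothetical helper, and it assumes `iconv` is installed.

```shell
#!/bin/sh
# decode_byte (hypothetical helper): print one byte, given as three octal
# digits, as UTF-8 text under the 8-bit charset named in $2.
decode_byte() {
    printf "\\$1" | iconv -f "$2" -t UTF-8
}

decode_byte 261 ISO-8859-1; echo   # byte 0xB1 read as Latin-1 ("plusminus")
decode_byte 261 ISO-8859-2; echo   # the same byte read as ISO-8859-2
decode_byte 346 ISO-8859-2; echo   # byte 0xE6 (Latin-1 "ae") read as ISO-8859-2
```

On a UTF-8 terminal the three lines should show ±, ą, and ć respectively.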

Code:

// based on a keyboard map from an 'xkb/symbols/pl' file
//
// $XFree86: xc/programs/xkbcomp/symbols/pc/pl,v 1.3 2003/04/19 12:22:12 pascal Exp $

partial default alphanumeric_keys
xkb_symbols "pl" {

    include "latin"

    name[Group1]="Polish";

    key <AD03>  { [         e,          E,  ecircumflex,  Ecircumflex ] };
    key <AD09>  { [         o,          O,       oacute,       Oacute ] };

    key <AC01>  { [         a,          A,    plusminus,   exclamdown ] };
    key <AC02>  { [         s,          S,    paragraph,    brokenbar ] };
    key <AC09>  { [         l,          L,threesuperior,     sterling ] };

    key <AB01>  { [         z,          Z, questiondown,       macron ] };
    key <AB02>  { [         x,          X,   onequarter,      notsign ] };
    key <AB03>  { [         c,          C,           ae,           AE ] };
    key <AB06>  { [         n,          N,       ntilde,       Ntilde ] };

    include "level3(ralt_switch)"
};

To switch back and forth between ISO-8859-2 and UTF-8 input I used the commands setxkbmap pl0 and setxkbmap pl.

When Slackware 13.37 appeared I decided to switch to UTF-8 completely. From time to time I still need a terminal using the ISO-8859-2 encoding. In such situations I run the script xterm-ISO-8859-2:

Code:

#!/bin/sh

export GROFF_ENCODING=iso-8859-1 LESSCHARSET=latin1 LANG=en_US XTERM_LOCALE=en_US
/usr/bin/X11/xterm +sb -fg black -bg yellow -geometry 99x49+64+0 -fn -misc-fixed-medium-r-normal--15-140-75-75-c-90-iso8859-2

(The above script is customized to work in Window Maker using 1024×768 resolution.)

I have never localized the system or the programs, except for testing purposes, so I use the settings for American English:

Code:

GROFF_ENCODING=UTF-8
LANG=en_US.UTF-8
LESSCHARSET=UTF-8
XTERM_LOCALE=en_US.UTF-8
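
To tell the UTF-8 settings above apart from the byte-oriented ones used in the xterm-ISO-8859-2 script, it can help to test a locale name in a script. This is a minimal sketch, not something from the original post: `is_utf8_locale` is a hypothetical helper that only inspects the name and does not check whether the locale is actually installed.

```shell
#!/bin/sh
# is_utf8_locale (hypothetical helper): succeed if the locale *name* in $1
# carries a UTF-8 codeset suffix, with or without a modifier such as @euro.
is_utf8_locale() {
    case "$1" in
        *.UTF-8|*.UTF-8@*|*.utf8|*.utf8@*) return 0 ;;
        *) return 1 ;;
    esac
}

# Classify the values that appear in this post
for name in en_US.UTF-8 en_US pl_PL.ISO-8859-2
do
    if is_utf8_locale "$name"
    then
        echo "$name: UTF-8"
    else
        echo "$name: not UTF-8"
    fi
done
```

For the running session, `locale charmap` reports the character set that the current settings resolve to.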

The solutions described above are the ones I used, or still use, in X Window (since 1998 I have run Window Maker exclusively). Console mode required, and still requires, other solutions: up to Slackware 13.1 I used ISO-8859-2 there, and since Slackware 13.37 I use UTF-8. The same goes for HTML files: up to Slackware 13.1 I used ISO-8859-2 and since Slackware 13.37 I use UTF-8.

Another problem concerned the variety of encodings. Before ISO-8859-2 (Linux) and CP1250 (Microsoft Windows) were invented and popularized, we had about twenty different encodings in Poland (the most notable were Mazovia and IBM Latin-2). Before UTF-8 was invented, Polish TeX and LaTeX users used seven different encodings (most of them simply used different prefixes such as /, ", @, and ~, though there were also slightly more complicated ones such as the popular “ogonek”). So in those old times I wrote a few converters from and to all those encodings. Now three encodings are in use in Poland: UTF-8, ISO-8859-2, and CP1250. To convert between them, the piconv program is enough.

Before properly designed fonts offering Polish diacritic characters appeared, various ugly methods were used to substitute the eighteen Polish diacritics: Ą, Ć, Ę, Ł, Ń, Ó, Ś, Ź, Ż, ą, ć, ę, ł, ń, ó, ś, ź, and ż. But that is not the history of computing in Poland, it is its prehistory.

Didier Spaier · 12-08-2012, 04:50 PM

w1k0: very instructive, thanks!

In addition, could you tell us which fonts you use with all the needed glyphs for Polish, including the diacritics and ligatures if any are needed?

Paulo2 · 12-08-2012, 05:21 PM

I'm from Brazil, and our language is Portuguese (pt-br).
It has some differences from the Portuguese of Portugal (pt-pt),
but both have the same accented characters: á à é í ó ã õ ü ô ê ç, etc.

For me, changing to UTF-8 solved the problem in the graphical
interface, but not on the console.
None of the fonts that come with Slackware display
correctly on the console.
I downloaded the font terminus-font-4.38.tar.gz and its
SlackBuild, and the problem was solved.

I'm not a power user of shell (just a regular user) so for me there is no
problem using utf-8 in graphical or command line environment.

Quercus ruber · 12-08-2012, 05:39 PM

Yes I use it because of the German umlauts. I haven't had any problems yet, so I guess I'm not really the type of person you want to hear from.

astrogeek · 12-08-2012, 05:44 PM

I spend most of my time in a terminal doing development with heavy database use.

The only minor annoyances used to be Unicode characters showing up in data from outside, and using Vim on files from others which contained Unicode characters.

Early this year I switched from Konsole to Tmux with urxvt and changed everything to UTF-8 at that time.

There have been no big changes for me, but now I see "odd" characters in data and Vim correctly and that was probably worth the trouble (but it was no trouble at all!).

w1k0 · 12-08-2012, 07:16 PM

I can’t remember what fonts I used before Slackware 8.0.

According to my article about Slackware 8.0, the Polish fonts for X Window provided with that distribution were incomplete, so I advised users to remove the standard fonts with the removepkg xfntslt2 command and to install the set of fonts which I had put on my website. Unfortunately I don’t have that package anymore, so I can’t be more specific in that case. With Slackware 8.0 I used a Polish keyboard defined in .Xmodmap.

The most sophisticated methods were the ones I tested in 2002 with OpenOffice.org 1.0.1. I installed the Type1 fonts from the ulT1mo collection, used by the X Window type1 module, and the TrueType fonts from the Microsoft FontPack, used by the X Window freetype module. Each font type caused different problems with OpenOffice.org. The program displayed Type1 fonts properly but skipped spell checking for them. With TrueType fonts spell checking worked, but the standard Slackware keyboard driver produced invalid Polish diacritics. To get valid characters with TrueType fonts I prepared a keyboard driver using the UTF-8 encoding. As for Type1 fonts, it wasn’t possible to enable spell checking because these fonts used the non-standard adobe-fontspecific encoding.

For a long time I refused to use the Microsoft FontPack on a regular basis and I used the Type1 fonts mentioned above. Then for some time I switched to the Microsoft FontPack (at that time these fonts were the most popular among Linux users in Poland).

A few years ago I switched to GNU FreeFont (see: http://www.gnu.org/software/freefont/). These fonts are designed very well and offer a lot of special characters. I prefer them to the angular Liberation fonts and to the fonts from the Microsoft FontPack, which have poorly designed Polish diacritic characters (especially Ą, Ę, ą, and ę). GNU FreeFont also looks better on the screen than the Liberation or Microsoft fonts.

A year ago I bought a Brother HL-5340D laser printer. As it turned out, that printer has serious problems printing texts prepared with GNU FreeFont: many diacritic characters from different languages, including Polish, are printed badly. I inspected these fonts, established the causes of the problems, and repaired a lot of characters. I reported that in a rather long thread (see: http://savannah.gnu.org/bugs/?32220). To read about the partial solution see the first post comment #15 from that thread (Sun 23 Jan 2011). My bug report is still open so I suppose these fonts aren’t repaired yet.

***

I see there’s a new GNU FreeFont release, 20120503 (see: http://ftp.gnu.org/gnu/freefont/ and http://slackbuilds.org/repository/14.0/system/freefont/). I’ll test it with my Brother laser printer and report the results here.

Grischuna · 12-08-2012, 07:36 PM

As many have already mentioned, I also use UTF-8 because of the German and French special characters.

Cheers