What locale/codeset do you run your slackware box on?

GazL · 08-04-2014, 04:57 PM

I know most distros tend to be preconfigured for UTF-8 these days, but I've been giving this some thought of late and was curious how many Slackers have made the jump to Unicode and, if so, whether you have encountered any incompatible programs.

GazL · 08-04-2014, 05:01 PM

As for myself, I'm still using ISO8859-15.

metaschima · 08-04-2014, 05:25 PM

I use whatever the default one is, not UTF-8. I suppose UTF-8 will eventually become the default, but it doesn't concern me too much.

allend · 08-04-2014, 07:02 PM

I just change to en_AU in /etc/profile.d/lang.sh and /etc/profile.d/lang.csh which suits my purposes. I do not have a need for UTF-8.

Code:

bash-4.3$ locale
LANG=en_AU
LC_CTYPE="en_AU"
LC_NUMERIC="en_AU"
LC_TIME="en_AU"
LC_COLLATE=C
LC_MONETARY="en_AU"
LC_MESSAGES="en_AU"
LC_PAPER="en_AU"
LC_NAME="en_AU"
LC_ADDRESS="en_AU"
LC_TELEPHONE="en_AU"
LC_MEASUREMENT="en_AU"
LC_IDENTIFICATION="en_AU"
LC_ALL=
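For anyone wanting to reproduce this, the change amounts to something like the following in /etc/profile.d/lang.sh. This is only a sketch, assuming the file does little more than export LANG and LC_COLLATE (the unquoted LC_COLLATE=C in the output above suggests it is set explicitly); lang.csh needs the equivalent `setenv` lines.

```shell
# Sketch of the relevant lines in /etc/profile.d/lang.sh (sh/bash syntax).
# en_AU is the locale from the post above; substitute your own.
export LANG=en_AU

# Keep byte-order sorting regardless of LANG; this is why `locale`
# prints LC_COLLATE=C without quotes (explicitly set, not inherited).
export LC_COLLATE=C
```

The new values only take effect in login shells started after the edit.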

keefaz · 08-04-2014, 07:22 PM

I prefer to set the language in ~/.bash_profile; I use ISO8859-1.

ttk · 08-04-2014, 11:46 PM

ASCII4EVAR

Also, I improve the performance of all my text-parsing utilities (sort, grep, etc) by setting LANG=C and LC_ALL=C. It's like the modern equivalent to the old PC's "Turbo" switch.
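A minimal sketch of the per-command form of this override, for those who don't want the C locale session-wide. The word list is made-up sample data. Note that LC_ALL takes precedence over LANG and over the individual LC_* variables, so setting LANG=C alone does nothing for a category that is already set explicitly.

```shell
# Hypothetical sample data; pure ASCII, so the C locale is safe here.
words=$(printf '%s\n' banana Apple cherry apple)

# The assignment prefix applies to this one sort invocation only;
# the rest of the shell session keeps its normal locale.
c_sorted=$(printf '%s\n' "$words" | LC_ALL=C sort)

# In the C locale, sort compares raw byte values, so uppercase
# letters come before lowercase ones.
printf '%s\n' "$c_sorted"
# Apple
# apple
# banana
# cherry
```

How much faster this is depends on the tool and the data, so it is worth timing on your own files before relying on it.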

astrogeek · 08-05-2014, 02:30 AM

All UTF8 now, since about when Slackware 14 was released.

a4z · 08-05-2014, 03:09 AM

UTF-8.
It should be the default, especially if you have to deal with multiple languages, even if the system language is en_??.

GazL · 08-05-2014, 04:17 AM

BTW, one program I've found that breaks in UTF-8 is vi (elvis). vim is fine though.

Didier Spaier · 08-05-2014, 04:21 AM

fr_FR.utf8. This doesn't prevent me from writing "LANG=C <something>" and maybe LC_COLLATE=C [1] when <something> is happier or faster with that, of course. To properly display the man pages encoded in UTF-8, I have this in ~/.bashrc:

Code:

alias uman="GROFF_ENCODING=utf8 man"

There still remain a few non-English man pages in legacy encodings, but what can I do?

Also, I can understand that people who speak and read only English are not that interested in UTF-8. Still, apart from a very small performance cost and issues with a few legacy utilities, I hardly see any drawback for them in using UTF-8, since ASCII is functionally a subset of UTF-8.

[1] I'll add LC_CTYPE if you insist, though I rarely need to set LANG to anything other than fr_FR.utf8, and practically never find the need to set the other internationalization variables defined in the XBD volume of POSIX.

GazL · 08-05-2014, 04:48 AM

Running in UTF-8 and then overriding to LANG=C for performance is fine as long as you know there are no multibyte characters in the input data, or that you are doing no character-specific operations on it. But, as the following shows, it can break things:

Code:

gazl@ws1:/tmp$ echo -n "€x€" | wc -m
3
gazl@ws1:/tmp$ echo -n "€x€" | LANG=C wc -m
7

I don't think I'd be inclined to do this very often, if at all.
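If you do want the override, one way to make it safer is to check the input first. The following is only a sketch: `is_ascii` and `count_chars` are hypothetical helpers, not standard utilities, and the check costs an extra pass over the data, which eats into whatever speed the C locale buys.

```shell
# is_ascii is a hypothetical helper, not a standard utility.
# It succeeds only when stdin holds nothing but 7-bit bytes.
is_ascii() {
    # Delete every byte in the range 0-127, then count what is left.
    [ "$(LC_ALL=C tr -d '\000-\177' | wc -c)" -eq 0 ]
}

# count_chars is also hypothetical: it counts the characters in the
# file named by $1, using the C locale only when that cannot change
# the answer.
count_chars() {
    if is_ascii < "$1"; then
        LC_ALL=C wc -m < "$1"
    else
        wc -m < "$1"    # multibyte data: stay in the session locale
    fi
}
```

Called as `count_chars somefile`, this gives the same count as a plain `wc -m` in the session locale, because the C-locale path is only taken when bytes and characters coincide.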

brianL · 08-05-2014, 05:01 AM

I've got the locale set to en_GB. On my laptop, anyway (where I am now). But I'm pretty sure I've got en_GB.UTF-8 on my desktop - I'll check later.
I'm using a Unicode font in the console (Lat2-Terminus16), because it looks better than the default. Nothing bad has happened yet. But it probably will now I've mentioned it.

chrisretusn · 08-05-2014, 05:06 AM

Code:

~$ locale
LANG=en_US.UTF-8
LC_CTYPE="en_US.UTF-8"
LC_NUMERIC="en_US.UTF-8"
LC_TIME="en_US.UTF-8"
LC_COLLATE=C
LC_MONETARY="en_US.UTF-8"
LC_MESSAGES="en_US.UTF-8"
LC_PAPER="en_US.UTF-8"
LC_NAME="en_US.UTF-8"
LC_ADDRESS="en_US.UTF-8"
LC_TELEPHONE="en_US.UTF-8"
LC_MEASUREMENT="en_US.UTF-8"
LC_IDENTIFICATION="en_US.UTF-8"
LC_ALL=

Didier Spaier · 08-05-2014, 05:19 AM

Quote:

Originally Posted by GazL

Running in utf-8 and then overriding to LANG=C for performance is fine as long as you know there are no multibyte characters

Of course! I do that only when I *know* that the input is encoded in ASCII.

Quote:

But, as the following shows, it can break things:

Code:

gazl@ws1:/tmp$ echo -n "€x€" | wc -m
3
gazl@ws1:/tmp$ echo -n "€x€" | LANG=C wc -m
7

I don't think I'd be inclined to do this very often, if at all.

Nothing is broken IMO. You tell wc that you are feeding it one-byte characters, give it 7 bytes, and it answers that it found 7 characters. I don't see anything wrong here.
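To make the arithmetic visible: in UTF-8 the euro sign (U+20AC) is encoded as the three bytes e2 82 ac, and "x" is one byte, so 3 + 1 + 3 = 7. A small sketch, where `euro_x_euro` is just a hypothetical helper that emits the test string from octal escapes so the result doesn't depend on the terminal's encoding:

```shell
# Hypothetical helper: prints the UTF-8 bytes of "€x€" using octal
# escapes (342 202 254 is e2 82 ac, the encoding of U+20AC).
euro_x_euro() { printf '\342\202\254x\342\202\254'; }

euro_x_euro | od -An -tx1      # hex dump: e2 82 ac 78 e2 82 ac
euro_x_euro | wc -c            # byte count, independent of locale: 7
euro_x_euro | LC_ALL=C wc -m   # C locale, one byte per character: 7
```

Only under a UTF-8 locale does `wc -m` group those seven bytes into three characters.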

GazL · 08-05-2014, 05:32 AM

The breakage I was referring to is in the usage: the inappropriate override of LANG=C. I thought that was obvious from the context of what I posted, but I guess not. As you say, the 'wc' utility is clearly not broken, working as designed, and doing exactly what I told it to.