LinuxQuestions.org
Slackware: This forum is for the discussion of Slackware Linux.

10-15-2015, 10:36 AM   #1
Didier Spaier
LQ Addict
 
Registered: Nov 2008
Location: Paris, France
Distribution: Slint64-15.0
Posts: 11,057

Rep: Reputation: Disabled
Is UTF-8 (not utf-8 or utf8) needed in the locale setting for SCIM to work?


In /etc/profile.d/scim.sh, shipped in the scim package, I see:
Code:
# For SCIM to work, you need to use a UTF-8 locale.  Make sure it ends on
# ".UTF-8", not "utf-8"!  As an example, you would need to use en_US.UTF-8
# for a US locale (export LANG=en_US.UTF-8), not en_US.
However, "locale -a|grep -i utf" only returns locales ending in .utf8. I understand that utf8 is an alias for UTF-8, but still, I have never had an issue setting LANG to fr_FR.utf8, nor a complaint from a Slint user using that form.

I am not a SCIM user myself; however, my question is: is it still true that setting LANG to <something>.utf-8 or to <something>.utf8 prevents SCIM from working properly?

I ask because in the Slint installers we use the form <something>.utf8, and I don't want to stop SCIM from working.
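If an installer wanted to follow the advice in scim.sh regardless of which spelling it started from, a minimal sketch, assuming a POSIX sh, could rewrite any common UTF-8 suffix variant (utf8, utf-8, UTF8) to .UTF-8 before storing LANG. The function name canonical_utf8_locale is hypothetical; it is not taken from the Slint, Salix or Slackware installers.

```shell
# Hypothetical helper (not from any installer): rewrite the codeset part of
# a locale name to the canonical ".UTF-8" spelling when it is one of the
# common UTF-8 variants; any other name is printed unchanged.
# POSIX sh has no "local", so these variables are global.
canonical_utf8_locale() {
    name=$1
    case $name in
        *.*) ;;                                 # has a codeset part
        *) printf '%s\n' "$name"; return ;;     # e.g. en_US or sr_RS@latin
    esac
    base=${name%%.*}        # part before the first dot, e.g. fr_FR
    codeset=${name#*.}      # everything after the first dot
    modifier=
    case $codeset in
        # keep an @modifier such as @valencia after the codeset
        *@*) modifier=@${codeset#*@}; codeset=${codeset%%@*} ;;
    esac
    case $codeset in
        [Uu][Tt][Ff]8|[Uu][Tt][Ff]-8) printf '%s.UTF-8%s\n' "$base" "$modifier" ;;
        *) printf '%s\n' "$name" ;;             # other codesets untouched
    esac
}

canonical_utf8_locale fr_FR.utf8            # prints fr_FR.UTF-8
canonical_utf8_locale ca_ES.utf8@valencia   # prints ca_ES.UTF-8@valencia
```

The case patterns are deliberately narrow: only spellings that are clearly UTF-8 get rewritten, so an ISO-8859-1 locale name is never silently turned into a UTF-8 one.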

Last edited by Didier Spaier; 10-15-2015 at 10:37 AM.
 
10-15-2015, 12:40 PM   #2
Alien Bob
Slackware Contributor
 
Registered: Sep 2005
Location: Eindhoven, The Netherlands
Distribution: Slackware
Posts: 8,559

Rep: Reputation: 8106
While "locale -a" will show you ".utf-8" lowercase suffixes, the commands "locale -m" and "locale charmap" will show you uppercase ".UTF-8".
The LANG, LC_ALL, etc. environment variables need to have uppercase ".UTF-8" in their definitions; at least, that is what all the articles claim. I have not found definitive proof for that statement, however. But I think it does no harm to stick with this uppercase definition.

Nice read: https://www.cl.cam.ac.uk/~mgk25/unicode.html
 
10-15-2015, 12:55 PM   #3
Didier Spaier
LQ Addict
 
Registered: Nov 2008
Location: Paris, France
Distribution: Slint64-15.0
Posts: 11,057

Original Poster
Rep: Reputation: Disabled
Quote:
Originally Posted by Alien Bob
While "locale -a" will show you ".utf-8" lowercase suffixes, the commands "locale -m" and "locale charmap" will show you uppercase ".UTF-8".
The LANG, LC_ALL, etc. environment variables need to have uppercase ".UTF-8" in their definitions; at least, that is what all the articles claim. I have not found definitive proof for that statement, however. But I think it does no harm to stick with this uppercase definition.

Nice read: https://www.cl.cam.ac.uk/~mgk25/unicode.html
Thanks for your answer, Eric.

Yes, I have seen statements like this one in the document you linked to:
Quote:
Please do not write UTF-8 in any documentation text in other ways (such as utf8 or UTF_8), unless of course you refer to a variable name and not the encoding itself.
But I couldn't find any convincing backing for that.

I have downloaded the whole ISO-10646 docs in PDF format (150 megabytes, as that includes the glyphs...) and also looked into the latest Unicode specification (version 8.0.0), and found nothing about the alias.

Also, the POSIX specification doesn't say anything about UTF-8 (or I need better glasses): it only mentions UCS more generally.

Finally, I only know that the alias can be used in some programming languages, and I have seen it mentioned in an RFC (I can't remember which one at the moment).

Still, I can confirm that I haven't had any issues so far (maybe because glibc is lenient?), and I remain curious about whether problems could actually arise in SCIM.

Last edited by Didier Spaier; 10-15-2015 at 01:00 PM.
 
10-15-2015, 01:48 PM   #4
Alien Bob
Slackware Contributor
 
Registered: Sep 2005
Location: Eindhoven, The Netherlands
Distribution: Slackware
Posts: 8,559

Rep: Reputation: 8106
Why be so adamant about going against the advice in the Slackware script? What is there to gain? If things do go wrong because of your use of lowercase .utf-8, people will complain in this forum and not in your mailbox.
 
10-15-2015, 02:07 PM   #5
Didier Spaier
LQ Addict
 
Registered: Nov 2008
Location: Paris, France
Distribution: Slint64-15.0
Posts: 11,057

Original Poster
Rep: Reputation: Disabled
My goal is not to go against advice I only discovered today! I am just trying to figure out whether not following that advice so far (unintentionally) could really have hurt a user.

Incidentally, I also discovered today that Salix's localesetup uses the same naming scheme, so I am not alone.

Anyway, I will probably end up checking for myself if no SCIM user answers.

Last edited by Didier Spaier; 10-15-2015 at 02:10 PM.
 
10-15-2015, 03:40 PM   #6
Didier Spaier
LQ Addict
 
Registered: Nov 2008
Location: Paris, France
Distribution: Slint64-15.0
Posts: 11,057

Original Poster
Rep: Reputation: Disabled
Well, I tried SCIM in Salix with LANG=fr_FR.utf8 and it works. I will try in Slackware too.
 
10-15-2015, 03:48 PM   #7
imitheos
Member
 
Registered: May 2005
Location: Greece
Posts: 441

Rep: Reputation: 141
Quote:
Originally Posted by Alien Bob
While "locale -a" will show you ".utf-8" lowercase suffixes, the commands "locale -m" and "locale charmap" will show you uppercase ".UTF-8".
The LANG, LC_ALL, etc. environment variables need to have uppercase ".UTF-8" in their definitions; at least, that is what all the articles claim. I have not found definitive proof for that statement, however. But I think it does no harm to stick with this uppercase definition.
locale -a essentially lists the locale directories, which are indeed named in lowercase and without a dash (utf8), as you can see in /usr/lib{,64}/locale. locale charmap prints the "correct" name, which is uppercase with a dash: UTF-8.
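To see both spellings side by side, here is a minimal sketch, assuming glibc's locale utility is installed: for every installed locale whose name mentions "utf", it prints the name as locale -a reports it next to what locale charmap reports for it. The function name list_utf8_locales is hypothetical.

```shell
# Hypothetical helper: list installed locales whose name mentions "utf",
# showing the directory-style name from "locale -a" (e.g. fr_FR.utf8)
# next to the charmap name glibc reports for it (e.g. UTF-8).
# Assumes the glibc "locale" utility is available.
list_utf8_locales() {
    locale -a 2>/dev/null | grep -i utf | while IFS= read -r name; do
        printf '%s\t%s\n' "$name" "$(LC_ALL=$name locale charmap 2>/dev/null)"
    done
}

list_utf8_locales
```

The actual list depends on which locales are installed, so no fixed output is shown here.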

Quote:
Originally Posted by Didier Spaier
Thanks for your answer, Eric.
Yes, I have seen statements like this one in the document you linked to: [...] But I couldn't find any convincing backing for that.
This is a bit different, because the document is talking about the Unicode encoding (or the Unicode standard, if you like) when it says to always write it as "UTF-8", not about the Linux locale name.

Code:
% LANG=el_GR.kkk locale > /dev/null 
locale: Cannot set LC_CTYPE to default locale: No such file or directory
locale: Cannot set LC_MESSAGES to default locale: No such file or directory
locale: Cannot set LC_ALL to default locale: No such file or directory
% LANG=el_GR.utf locale > /dev/null  
locale: Cannot set LC_CTYPE to default locale: No such file or directory
locale: Cannot set LC_MESSAGES to default locale: No such file or directory
locale: Cannot set LC_ALL to default locale: No such file or directory
% LANG=el_GR.utf8 locale > /dev/null  
% LANG=el_GR.utf-8 locale > /dev/null
% LANG=el_GR.UTF8 locale > /dev/null  
% LANG=el_GR.UTF-8 locale > /dev/null
The correct term to use is .UTF-8, but on Linux (or, more precisely, with glibc) all of these spellings of UTF-8 work, as you can see. For config files, the best practice is to always use the proper term even if others work, because you might use the same config on another OS. (I learnt that the hard way some years ago when I copied one of my configs to NetBSD, and it took me a long time to find out why it didn't work.)
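A plausible reason for glibc's leniency: as far as I can tell, glibc normalizes the codeset part of a locale name before looking it up (its internal _nl_normalize_codeset lowercases letters, drops every non-alphanumeric character, and prefixes "iso" when only digits remain). That would make utf8, utf-8, UTF8 and UTF-8 all resolve to the same utf8 locale, while utf and kkk do not. Below is a rough shell imitation of that rule, for illustration only; normalize_codeset is a hypothetical name, and this is not glibc's code.

```shell
# Hypothetical imitation of glibc's codeset normalization (not glibc code):
# keep only letters and digits, lowercase them, and prefix "iso" when the
# result consists of digits only (e.g. 8859-1 -> iso88591).
normalize_codeset() {
    norm=$(printf '%s' "$1" | tr -cd '[:alnum:]' | tr '[:upper:]' '[:lower:]')
    case $norm in
        *[!0-9]*|'') printf '%s\n' "$norm" ;;
        *) printf 'iso%s\n' "$norm" ;;
    esac
}

# UTF-8, UTF8, utf-8 and utf8 all print utf8; utf and kkk stay as they are.
for cs in UTF-8 UTF8 utf-8 utf8 utf kkk; do
    printf '%-6s -> %s\n' "$cs" "$(normalize_codeset "$cs")"
done
```

Because other C libraries need not apply the same rule, this leniency is one more reason to write .UTF-8 in portable config files.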
 
10-15-2015, 05:54 PM   #8
Didier Spaier
LQ Addict
 
Registered: Nov 2008
Location: Paris, France
Distribution: Slint64-15.0
Posts: 11,057

Original Poster
Rep: Reputation: Disabled
Thanks for your answer, Imitheos; that seems to confirm my assumption about glibc, although I didn't find anything about it in the docs. I must admit that I didn't dive into the code, where I would have drowned.

I tried SCIM in Slackware-14.1, again with LANG=fr_FR.utf8, and it works. This is not surprising, as /etc/profile.d/scim.sh in Salix-Mate-14.1 was obviously borrowed from Slackware.

I will make a note to reconsider these settings if Slint ever has to migrate to a *BSD...

Meanwhile, I am marking this thread as [SOLVED].

PS: Still, generally speaking, I think you are right to try to make everything as portable as possible.

That was my guideline when writing convtags (see my signature below): strictly follow the POSIX specification for sed. For instance, I used only basic regular expressions (although I assume that most, if not all, sed implementations allow extended ones).

Last edited by Didier Spaier; 10-15-2015 at 11:10 PM. Reason: Typo fix.
 
10-16-2015, 01:56 AM   #9
Didier Spaier
LQ Addict
 
Registered: Nov 2008
Location: Paris, France
Distribution: Slint64-15.0
Posts: 11,057

Original Poster
Rep: Reputation: Disabled
I did more testing. It seems that what really counts is that the locale that is set actually has a UTF-8 encoding, regardless of its name.

For instance, I now have LANG set to fa_IR (locale -a lists no fa_IR.utf8), and as you can see in the three lines below, I can type in Persian, Tamil and Greek:
ُاهس هس حثقسهضد
டொஸ் இஸ் ட்ஃmஇல்
Τηισ ισ Γρεεκ

This also works in xfce4-terminal and kate.

But if I set LANG to fr_FR, it doesn't work everywhere: it works in this online editor as well as in, e.g., leafpad, geany or kate, but not in terminals such as xfce4-terminal.
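To check which character map a given locale name actually selects, whatever its spelling, here is a minimal sketch, assuming glibc's locale utility. If I am not mistaken, in glibc's list of supported locales a bare fr_FR is an ISO-8859-1 locale while fa_IR exists only as UTF-8, which would fit the results above. The function names are hypothetical, and the locale names in the loop are only examples that must be installed to give meaningful results.

```shell
# Hypothetical helper: print the character map glibc selects for a locale
# name. If the locale is not installed, glibc warns on stderr (discarded
# here) and falls back to the C locale, whose charmap is ANSI_X3.4-1968.
locale_charmap() {
    LC_ALL=$1 locale charmap 2>/dev/null
}

# Succeeds only if the locale's character map is UTF-8.
is_utf8_locale() {
    [ "$(locale_charmap "$1")" = "UTF-8" ]
}

for l in fa_IR fr_FR fr_FR.utf8 fr_FR.UTF-8; do
    if is_utf8_locale "$l"; then
        echo "$l: UTF-8"
    else
        echo "$l: not UTF-8 ($(locale_charmap "$l"))"
    fi
done
```

Testing the charmap rather than the spelling of the name matches the observation that fa_IR works even though its name carries no codeset suffix.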

Last edited by Didier Spaier; 10-16-2015 at 10:48 AM. Reason: kate mentioned.
 
  


All times are GMT -5.
